Re:invent Coming in 2021, Babelfish is an AWS project which provides an endpoint for PostgreSQL that is compatible with Microsoft SQL Server, the idea being to enable porting an application without having to change the code.
The thinking behind Babelfish is that organisations have applications using SQL Server databases that they would love to migrate to PostgreSQL, but that the effort and risk involved in porting is such that they are locked in.
Why is AWS providing this? The idea is to attract new customers either to PostgreSQL-compatible Aurora, or to its PostgreSQL RDS (Relational Database Service).
Babelfish is currently in private preview for Aurora. It is is written in C and PL/pgSQL and will be released as open source in 2021, under the Apache 2.0 licence; it should work with any flavour of PostgreSQL. The name is a nod to Hitchhiker’s Guide to the Galaxy by Douglas Adams, in which the Babel Fish, “perhaps the oddest thing in the universe,” enables someone to understand any language by nesting in their ear.
Babelfish was also a now-defunct precursor to Bing Translator owned by AltaVista/Yahoo!.
SQL Server is the third most popular database engine according to the DB-engines rankings, behind Oracle and MySQL but with almost double the score of PostgreSQL (Aurora is currently ranked 44th).
Some of that popularity though is because SQL Server exists as a capable but size-limited free DBMS in the form of SQL Server Express, as well as enterprise versions which are expensive to license.
Microsoft gears most of its developer documentation and examples to SQL Server which means that many projects start out using it, some of which will grow and end up requiring licences.
Suits you serverless! Google offers SQL database porting service tailor-made for its cloud
Migrating data is relatively easy, but porting applications more troublesome, depending on how many features specific to SQL Server are used, and which components are installed. It is a huge product and has its own language, called Transact-SQL or T-SQL, and additional features including Analysis Services, Reporting Services and Machine Learning services, and additional features including Analysis Services, Reporting Services and Machine Learning services.
Babelfish assists porting by providing two key features. First, it understands the Tabular Data Stream (TDS) protocol which SQL Server uses for requests and response between clients and servers. Second, it understands T-SQL and SQL Server column types, translating them to equivalent PostgreSQL code.
Using Babelfish, it is possible to connect to PostgreSQL using SQL Server drivers and it behaves as if it were SQL Server – within its limitations. Not everything works; but according to Tobias Ternström (who worked at Microsoft on open source databases for 10 years, then at Google, and is now leading product management for Aurora and RDS at AWS), “either you get exactly the same behaviour out of your query as you would have gotten on Microsoft SQL Server, or you get an error. You don’t get something in between.” Ternström was speaking at AWS’ virtual re:Invent conference, which started this week.
If something does not work, or developers wish to add new functionality, they can choose between modifying or writing new T-SQL code, or using PL/pgSQL (Procedural Language/PostgreSQL) via the SQL Server driver. Both are supported. “Performance between the two should be negligible,” Ternström claimed.
How extensive is the compatibility? That is a key question, and currently it is not altogether clear. “The simple answer is maybe,” said Ternström, asking himself whether any particular feature will work. “We were focusing on the most commonly used parts of T-SQL, as best we understand it.”
He did mention some things which do work, including stored procedures, cursors, client-side cursors, scalar and table-valued functions, and nested transactions (even though nested transactions are not part of PL/pgSQL). The project should also improve over time, particularly once it is open source.
The low hanging fruit for Babelfish will be for projects which use SQL Server but make little use of its advanced features, with significant savings in license fees. Projects that make deep use of SQL Server’s capabilities (and which therefore get better value from its cost) may be less suitable.
Álvaro Hernández at PostgreSQL support company Ongres is keen, as you might expect. He said that “compatibility will not be 100 per cent at launch.”
“But as long as it is high enough,” he continued, “it will enable seamless transition from a commercial and proprietary database to the open source Postgres for thousands, millions of users. And the approach taken for correctness (if a feature is not supported, an error is raised … is sound and builds trust on the success of the migrations.” He would like to see the project “integrated back into main PostgreSQL codebase.”
Is AWS getting better at open source? Perhaps; but it is worth noting that while PostgreSQL is open source, Aurora is not. Aurora is a proprietary database with two variants, one compatible with MySQL, the other compatible with PostgreSQL. AWS offers PostgreSQL itself as a separate service.
“Aurora is probably the best choice when High Availability, scalability and elasticity is the most important. PostgreSQL may provide lower latency on block storage, and is probably cheaper,” argues former CERN DBA Frank Pachot, now a consultant at DBI Services.
That said, “there is still some confusion and some Aurora users are asking the PostgreSQL community for help,” he added. “The PostgreSQL community helps a lot their users because they know the engine (which is very well documented, and source code and source comments are accessible). But they cannot do anything for Aurora as we don’t even know what has been modified … I also think that the PostgreSQL community would appreciate that the improvements made by Amazon on the PostgreSQL code are shared with the community.”
Open source for something like Babelfish is an easy win for AWS; but there is plenty more it could do, to engage with the community.®