The shortest path
The unique thing that Akiban does is to store the rows of grouped tables together, eliminating the costs associated with the most common joins. Early on, the question was how to get a database system that implemented this idea into the hands of as many users as possible, as quickly as possible.
We decided that the answer was to build an open-source MySQL storage engine. MySQL has a huge user base, well-known performance problems with queries containing multiple joins that our technology could fix, and an API for plugging in storage engines. Perfect.
The shortest path goes to a bad place
It wasn't too long before reality hit. If we are going to optimize queries containing lots of joins, then we had to see all the joins. But the division of labor between the core of MySQL and a MySQL storage engine makes that impossible. The storage engine API is an abstraction at the level of a B-tree: scan a table, search an index given a key, insert, update, delete. We couldn't see the joins, so we were going to have a hard time optimizing them.
So we decided to intercept queries on their way into MySQL, and rewrite them. The joined tables were replaced by a single "group table", and the resulting query had fewer explicit joins. A storage engine operation on a group table is then implemented, by our server, as an equivalent operation on the grouped tables, allowing us to implement joins very efficiently. (A second benefit is that MySQL has fewer joins to think about.) We wrote a hook for intercepting and rewriting queries, and tried hard to get this accepted into MySQL. This was around the time that MySQL ownership transferred to Oracle, so our patch was, understandably, not their highest priority. This technical approach was far from ideal in many ways, but it did give us the view of joins that we needed.
We also implemented the MySQL storage engine interface, and it worked well. But the overall plan was in trouble. One problem was how to connect MySQL, written in C and C++, with Java, the language used for our server and B-tree. JNI has many problems, (performance and impact on garbage collection, among others), JNA didn't seem right, and a network layer is expensive. We went with the network approach. But this made it difficult to avoid very large numbers of expensive network round trips, and this was especially painful for cross-group joins, which, based on our contacts with potential users, appeared to be common. Our query rewrite approach worked within a group only, so we still relied on MySQL to join between group tables. Yes, we could have rewritten even more of the query, and basically used MySQL as a mechanism for delivering entire queries to our back end. You can see where this is going.
One last consideration is that working in the MySQL code base is very difficult. There are few clean abstractions, and we spent many days trying to find various pieces of information about a query or the schema by poking around undocumented data structures.
How hard could it be?
About a year ago, we came to the realization that our plan to deliver our technology via MySQL just wasn't working. Meanwhile, our B-tree and server were written in Java, and we knew that we'd be able to move much faster in that language, and especially if we were unencumbered by MySQL. There was just the little matter of writing a query parser, optimizer, and query execution engine. We were discussing this with Goetz Graefe, one of our technical advisers (a leading database researcher and one of the architects of Microsoft SQL Server). The phrase "how hard could it be?" was uttered.
So here is what we've done in the past year:
- Extracted the Apache Derby parser: Apache Derby is a Java database system which started life as Cloudscape in the late 90s. Cloudscape was bought by Informix, which was bought by IBM who turned over the code to the Apache foundation. Derby has a very nice standard SQL parser which we have extracted.
- Written the beginnings of a query optimizer: Our optimizer is advancing quickly. Most queries now run. Queries that join tables within a group are turned into particularly good execution plans that exploit table grouping. We routinely see speedups of 10x-30x compared to the same query running in MySQL.
- Built a query execution engine: We have defined a set of query processing operators designed to exploit table grouping. This set of operators represents the instruction set targeted by the optimizer. Grouping allows us to use operators that exploit the interleaving of related rows, and to exploit indexes that cross tables within a group. For example, you can define an index on (customer.name, order.date), and use it to optimize a query that joins customer with order, restricts name, and orders by date. (These "group indexes" were discussed in an earlier blog post.) The query execution engine is implemented on top of our B-tree, Persistit.
Lemonade
While we did have to stop what we were doing, back up, and go off in a new direction, our time spent with MySQL was worthwhile. We completed our storage engine, and this means that an Akiban database can be loaded via MySQL replication.
Our early users have MySQL applications, are already doing master/slave replication, and have a few queries that are too expensive to run as frequently as they would like. These are typically queries with several joins that take many seconds or minutes to run. We engage with these customers by identifying the problem queries, setting up a replication slave running Akiban, and then do a proof-of-concept showing that we can run these queries 1-2 orders of magnitude faster, and with a high degree of concurrency. We then continue to work with the customer as they move to deployment.
We're a very young database company. No one is going to run their business on our product until we're proven. But a read-only replica represents a very low level of risk, and as a result we are finding that customers are willing to give us a chance to prove our technology by fixing their hardest performance problems.
