Building a fast in-memory database using efficient query compilation and a tuned data-centric IR
University of New Brunswick
Traditional database systems convert an SQL (Structured Query Language) query into a plan tree of relational algebra operators and interpret that tree to produce the result. This design was adequate in the past because the main bottleneck was reading and writing data to disk, which masked the overhead of interpretation. Modern databases, with access to faster storage and large amounts of main memory, expose this overhead. Instead of being interpreted, the operator tree can be compiled into lower-level code and executed directly. This thesis introduces a new compilation-based, in-memory database system that uses a push-based approach to generate query-specific code, which is then compiled and executed to return the result. The system is implemented with a template-based code generation framework written in Java and supports multiple back-end targets: C++, UPC++, and a tuned data-centric IR. The system was compared against a typical PostgreSQL installation, and the generated IR code was found to perform significantly better.
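To illustrate how a push-based approach turns an operator tree into query-specific code, the following is a minimal sketch in Java of the well-known produce/consume code generation scheme. It is not the template-based framework described in this thesis: the classes `PushCodegen`, `Emitter`, `Scan`, `Select`, and `Sum` are hypothetical, and the generated text is plain C++ (one of the back-ends mentioned above) operating on an assumed table `t` with a column `x`. Each operator's `produce` asks its child for tuples, and each child "pushes" a tuple upward by calling its parent's `consume`, so the whole pipeline collapses into a single loop.

```java
// Hypothetical sketch of push-based ("produce/consume") code generation.
// Class and method names are illustrative; they are not taken from the thesis.
class PushCodegen {

    /** Collects generated C++ source lines with indentation. */
    static final class Emitter {
        private final StringBuilder sb = new StringBuilder();
        private int depth = 0;

        void line(String s) { sb.append("  ".repeat(depth)).append(s).append('\n'); }
        void open(String s) { line(s); depth++; }   // emit a line, then indent
        void close(String s) { depth--; line(s); }  // dedent, then emit a line

        @Override public String toString() { return sb.toString(); }
    }

    /** Base class for operators in the plan tree. */
    abstract static class Operator {
        Operator parent;
        // Emit code that generates this operator's tuples and pushes them upward.
        abstract void produce(Emitter out);
        // Emit code that handles one tuple pushed up by a child (bound to `row`).
        abstract void consume(Emitter out);
    }

    /** Table scan: opens the tuple loop and pushes each row to its parent. */
    static final class Scan extends Operator {
        private final String table;
        Scan(String table) { this.table = table; }

        @Override void produce(Emitter out) {
            out.open("for (const auto& row : " + table + ") {");
            parent.consume(out);
            out.close("}");
        }
        @Override void consume(Emitter out) {
            throw new IllegalStateException("a scan has no children");
        }
    }

    /** Selection: wraps the parent's consume code in an if-statement. */
    static final class Select extends Operator {
        private final Operator child;
        private final String predicate;
        Select(Operator child, String predicate) {
            this.child = child;
            this.predicate = predicate;
            child.parent = this;
        }
        @Override void produce(Emitter out) { child.produce(out); }
        @Override void consume(Emitter out) {
            out.open("if (" + predicate + ") {");
            parent.consume(out);
            out.close("}");
        }
    }

    /** SUM aggregate: a pipeline breaker that keeps a running total. */
    static final class Sum extends Operator {
        private final Operator child;
        private final String expr;
        Sum(Operator child, String expr) {
            this.child = child;
            this.expr = expr;
            child.parent = this;
        }
        @Override void produce(Emitter out) {
            out.line("long sum = 0;");
            child.produce(out);
            out.line("std::cout << sum << std::endl;");
        }
        @Override void consume(Emitter out) { out.line("sum += " + expr + ";"); }
    }

    /** Generates code for a plan by calling produce on its root operator. */
    static String generate(Operator root) {
        Emitter out = new Emitter();
        root.produce(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // SELECT SUM(x) FROM t WHERE x > 10
        Operator plan = new Sum(new Select(new Scan("t"), "row.x > 10"), "row.x");
        System.out.print(generate(plan));
    }
}
```

For the plan in `main`, the sketch emits a single fused loop: `long sum = 0;`, then `for (const auto& row : t) {` enclosing `if (row.x > 10) {` and `sum += row.x;`, followed by the print statement. The design choice this highlights is that the filter and aggregation are inlined into the scan loop, so no per-tuple virtual `next()` calls remain at run time, which is the interpretation overhead the abstract refers to.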