Towards Compilation of SQL Queries into Efficient Execution Plans for Distributed In-Memory Query Processing
University of New Brunswick
A query processing engine is the core component of any modern database system, and different engines employ different query processing techniques. The speed of data-driven decision-making and analytics is crucial to firms and organizations that build software and system applications, and an intuitive way to speed up database querying is to improve the performance of these engines. Conventionally, databases use a disk-oriented, pull-based or tuple-at-a-time, interpreted query evaluation model. This thesis introduces a compilation-based, in-memory query compiler that ingests an SQL query and generates distributed physical query plans as C++ programs built on UPC++. As part of this work, different models and components of query processing are explored, and efficient parallel programs based on the Partitioned Global Address Space (PGAS) model are designed and developed for SQL queries. These programs are emitted by a code generator that uses a data-centric compilation strategy. The proposed approach combines high-performance parallel programs with database query processing to take advantage of advances in available hardware, and it offers a 2× speedup in query performance over the best existing approach.