Accelerating database query processing on FPGAs
Publisher
University of New Brunswick
Abstract
Database query processing is a key capability of modern database management systems (DBMSs), where moving large data volumes directly affects performance, energy use, and reliability. The growing processor–memory gap creates major bottlenecks, making CPU-based systems increasingly inefficient. Data-centric computing addresses this by moving computation closer to data, and FPGAs, with their inherent parallelism and adaptability, are well suited for this paradigm.
This research focuses on accelerating database query processing operations, such as lookup, join, indexing, and updates, on FPGAs. It also presents the survey "Reconfigurable Acceleration for Database Systems: Taxonomy, Techniques, and Research Challenges" [58], which formalizes a unified taxonomy of FPGA-based database acceleration that shapes the design and evaluation of the accelerators developed here.
We developed a series of FPGA-accelerated learned index and join systems that significantly improve performance and efficiency for data-intensive workloads. To ensure a fair comparison, each design was evaluated against optimized CPU baselines implemented in C/C++ on multi-core x86 systems, leveraging parallel libraries, multithreading, and compiler auto-vectorization (including SIMD where applicable).
In SMART [55], we introduced a fully pipelined FPGA architecture for RadixSpline (RS) [77]-based learned indexes, demonstrating multi-fold gains in lookup throughput, up to an order-of-magnitude reduction in tail latency, and an overall speedup of 5.5× over a CPU-based RS on SOSD [51] datasets.
In LIJA [56], we extended learned models to relational joins, developing an FPGA-accelerated operator that reduces memory traffic and achieves up to 4.4× speedup in the build phase and 6.63× overall compared to the CPU-based implementation.
Our most recent work, FALCON [57], is a lightweight, updatable FPGA-based learned index tailored for dynamic workloads. It offers fast insertions, low update overhead, and high query throughput, outperforming both traditional tree-based hardware indexes and prior learned index accelerators. FALCON achieved a build-stage speedup of 17× and a total speedup of 5.5×; against the FPGA-based RadixSpline, it achieved a total speedup of 7.6×.
Together, these works establish a coherent progression of contributions that advance the state of the art in hardware-accelerated learned data structures on reconfigurable platforms.
