Parallel and in-memory big spatial data processing systems and benchmarking

Alam, Md. Mahbub

dc.contributor.advisor	Ray, Suprio
dc.contributor.advisor	Bhavsar, Virendrakumar
dc.contributor.author	Alam, Md. Mahbub
dc.date.accessioned	2023-03-01T16:19:16Z
dc.date.available	2023-03-01T16:19:16Z
dc.date.issued	2018
dc.date.updated	2023-03-01T15:01:32Z
dc.description.abstract	With the accelerated growth in the volume of spatial data generated from a wide variety of sources, the need for efficient storage, retrieval, processing, and analysis of spatial data is ever more important. Hence, spatial data processing systems have become an important field of research. Though traditional relational database systems provide spatial functionality (such as PostgreSQL with PostGIS), their lack of parallelism and their I/O bottlenecks make them inefficient for running compute-intensive spatial queries on large datasets. In recent times, a number of big spatial data systems have been proposed by researchers around the world. These systems can be roughly categorized into disk-based systems built on Apache Hadoop and in-memory systems based on Apache Spark. The features supported by these systems vary widely. However, there has not been any comprehensive evaluation study of these systems in terms of performance, scalability, and functionality. To address this need, this thesis proposes a benchmark to evaluate big spatial data systems. It investigates the present status of big spatial data systems by conducting a comprehensive feature analysis and performance evaluation of a few representative systems. The Hadoop- and Spark-based big spatial data systems are distributed, scalable, and able to exploit the parallelism of today’s multi-core/many-core architectures. However, most of them are immature, unstable, difficult to extend, and lack an efficient query language such as SQL. In this work, a disk-based system, Parallax, is introduced as a parallel big spatial database system. It integrates the powerful spatial features of PostgreSQL/PostGIS with the distributed persistent storage of Alluxio. Host-specific data partitioning and parallel queries on the local data of each node ensure maximum utilization of main memory, disk storage, and CPU.
This thesis also introduces an in-memory system, Spatial Ignite, which extends Apache Ignite with spatial support. Spatial Ignite incorporates a spatial library that contains all the OGC-compliant join predicates and spatial analysis functions. Along with the query parallelism and collocated query processing of Ignite, the integrated spatial data partitioning techniques improve the performance of Spatial Ignite. The evaluation shows that Spatial Ignite performs better than the Hadoop- and Spark-based systems.
dc.description.copyright	© Md. Mahbub Alam, 2019
dc.format	text/xml
dc.format.extent	xiii, 85 pages
dc.format.medium	electronic
dc.identifier.uri	https://unbscholar.lib.unb.ca/handle/1882/13490
dc.language.iso	en_CA
dc.publisher	University of New Brunswick
dc.rights	http://purl.org/coar/access_right/c_abf2
dc.subject.discipline	Computer Science
dc.title	Parallel and in-memory big spatial data processing systems and benchmarking
dc.type	master thesis
thesis.degree.discipline	Computer Science
thesis.degree.fullname	Master of Computer Science
thesis.degree.grantor	University of New Brunswick
thesis.degree.level	masters
thesis.degree.name	M.C.S.

Files

Original bundle

Name: item.pdf
Size: 2.06 MB
Format: Adobe Portable Document Format

Collections

Open Theses & Dissertations