Efficient Exploratory Data Analysis with Spatio-temporal Aggregation over Polygonal Regions
University of New Brunswick
Statistical analysis is at the heart of data science workflows. With the rapid rise in spatio-temporal data volume and the popularity of Web and mobile mapping applications, exploratory analysis of spatio-temporal data is becoming increasingly important. Such analysis often involves the user selecting an arbitrary polygonal region and performing a statistical computation over it. Existing approaches for spatio-temporal data aggregation support only rectangular query regions, not arbitrary polygons. A recently proposed system called GeoBlocks supports polygonal queries, but it was designed for spatial data, not spatio-temporal data. Another aspect of exploratory data analysis is that users often repeatedly perform similar statistical analyses over the same selected query region. Although reusing already computed answers can improve response time, existing approaches do not support such reuse for statistical analysis. A recently proposed system called Data Canopy supports statistics synthesis by reusing basic aggregates, but it does not support spatial or spatio-temporal analysis. To address these challenges, we introduce ScanCube, an exploratory statistical analysis system that supports arbitrary polygonal query regions over any time interval. ScanCube also supports statistics synthesis by reusing a small set of basic aggregates that are computed and stored a priori. We introduce two new techniques, ScanX1 and ScanX2, that provide a grid-based approximation of the query polygon with a distance-based error bound. Experimental evaluation suggests that ScanCube significantly outperforms GeoBlocks and other existing approaches.
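To make the idea of statistics synthesis concrete, the following is a minimal sketch, not ScanCube's actual implementation. It assumes the basic aggregates are a count, a sum, and a sum of squares kept per grid cell and time bucket (the abstract does not say which aggregates ScanCube stores), and that the set of grid cells approximating the query polygon is already given (the abstract does not describe how ScanX1 and ScanX2 compute it). All names (`BasicAggregates`, `insert`, `query`, `mean`, `variance`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]        # hypothetical grid-cell id: (column, row)
Key = Tuple[int, int, int]    # (column, row, time bucket)


@dataclass
class BasicAggregates:
    """Assumed basic aggregates: count, sum, and sum of squares."""
    count: int = 0
    total: float = 0.0
    total_sq: float = 0.0

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.total_sq += value * value

    def merge(self, other: "BasicAggregates") -> "BasicAggregates":
        # Aggregates of disjoint partitions combine by simple addition.
        return BasicAggregates(
            self.count + other.count,
            self.total + other.total,
            self.total_sq + other.total_sq,
        )


def insert(cube: Dict[Key, BasicAggregates], cell: Cell, bucket: int, value: float) -> None:
    """Precompute step: fold one observation into its (cell, time bucket) entry."""
    cube.setdefault((cell[0], cell[1], bucket), BasicAggregates()).add(value)


def query(cube: Dict[Key, BasicAggregates], cells: Iterable[Cell],
          t_start: int, t_end: int) -> BasicAggregates:
    """Merge stored aggregates over the given cells and the inclusive bucket range."""
    result = BasicAggregates()
    for cx, cy in cells:
        for bucket in range(t_start, t_end + 1):
            part = cube.get((cx, cy, bucket))
            if part is not None:
                result = result.merge(part)
    return result


def mean(agg: BasicAggregates) -> float:
    if agg.count == 0:
        raise ValueError("no observations in the query region")
    return agg.total / agg.count


def variance(agg: BasicAggregates) -> float:
    """Population variance synthesized as E[x^2] - (E[x])^2."""
    m = mean(agg)
    return agg.total_sq / agg.count - m * m
```

Under these assumptions, a second statistic over the same region and interval (say, variance after mean) reuses the merged aggregates returned by `query` instead of rescanning the raw observations, which is the kind of reuse the abstract describes.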