OceanMappingDataframe - Scalable multi-indexed dataframe for hydrography
University of New Brunswick
Ocean data constitutes one of the largest classes of geospatial data. Advances in multibeam sonar have steadily increased the amount of data gathered from hydrographic surveys, to the point that it now falls into the category of massive spatial data. The dataframe is a popular data model for representing such data and is widely used in data science applications. However, no suitable dataframe exists that can both load large volumes of multibeam sonar data and support advanced analytics libraries. This thesis therefore introduces OceanMappingDataframe, a new multi-indexed dataframe that can be used to load, store, and analyze multibeam sonar data. It is implemented using the Modin dataframe library. It can load hydrographic files in Generic Sensor Format (GSF) or CSV format and save the results as partitioned Parquet files. It can also support advanced AI libraries such as OpenAI. This is demonstrated by applying a reinforcement learning (RL) algorithm to an outlier detection problem in hydrography.