Diskless data analytics on distributed coordination systems
Date
2013
Publisher
University of New Brunswick
Abstract
A distributed system consists of software programs, applications, and data resources dispersed across independent computers connected through a communication network. Distributed coordination systems are file-system-like distributed metadata stores that ensure consistency between the processes of a distributed system. The challenge in this area is to process continuously changing data fast enough. This research focuses on reducing the disk-bound time of one such distributed coordination system, Apache ZooKeeper; reducing its dependency on disk improves performance. The shortcoming of this approach is that data is volatile on failure. Durability is instead provided by replicating the data and restoring it from other nodes in the distributed ensemble. On average, this approach achieves a 30-fold improvement in write performance.