Diskless data analytics on distributed coordination systems

Date

2013

Publisher

University of New Brunswick

Abstract

A distributed system consists of software programs, applications, and data resources spread across independent computers connected by a communication network. Distributed coordination systems are file-system-like, distributed metadata stores that keep the processes of such a system consistent with one another. The challenge in this area is to process continuously changing data fast enough. This research focuses on reducing the time that one such coordination system, Apache ZooKeeper, spends bound by disk I/O; lowering its dependence on the disk improves performance. The drawback of this approach is that data held only in memory is volatile and is lost when a node fails. Durability is instead provided by replicating the data across the ensemble and restoring a failed node's state from the other nodes. With this approach, write performance improved by a factor of 30 on average.
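The durability idea in the abstract (keep the data tree only in memory, replicate every write, and rebuild a restarted node from its peers) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not ZooKeeper's code or the thesis implementation: the class names, the `last_zxid` field, and the "commit once a majority has applied the write" rule are hypothetical simplifications. Leader election, write ordering under concurrency, and ZooKeeper's actual atomic broadcast protocol are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One ensemble member that keeps its data tree only in memory (no disk log)."""
    name: str
    alive: bool = True
    last_zxid: int = 0  # id of the last write applied (hypothetical naming)
    tree: dict = field(default_factory=dict)  # path -> data bytes

    def apply(self, zxid: int, path: str, data: bytes) -> None:
        self.tree[path] = data
        self.last_zxid = zxid

    def crash(self) -> None:
        # Volatile state is lost on failure because nothing was written to disk.
        self.alive = False
        self.tree = {}
        self.last_zxid = 0


class DisklessEnsemble:
    """Toy ensemble: a write commits once a majority of replicas hold it in memory."""

    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.next_zxid = 1

    def _live(self):
        return [r for r in self.replicas if r.alive]

    def write(self, path: str, data: bytes) -> int:
        live = self._live()
        # Without a live majority the write could not survive further failures.
        if len(live) <= len(self.replicas) // 2:
            raise RuntimeError("no quorum: too few live replicas to commit")
        zxid = self.next_zxid
        self.next_zxid += 1
        for r in live:
            r.apply(zxid, path, data)
        return zxid

    def recover(self, name: str) -> None:
        """Restart a crashed replica by copying state from the most up-to-date live peer."""
        target = next(r for r in self.replicas if r.name == name)
        peers = [r for r in self._live() if r is not target]
        if not peers:
            raise RuntimeError("no live peer to restore from")
        source = max(peers, key=lambda r: r.last_zxid)
        target.tree = dict(source.tree)
        target.last_zxid = source.last_zxid
        target.alive = True
```

For example, with replicas `a`, `b`, and `c`, a write still commits after `a` crashes, and `recover("a")` copies the newest tree back from `b` or `c`. The sketch also shows the limitation the abstract names: if a majority of replicas fail at the same time, the in-memory data is gone, because no disk copy exists to fall back on.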
