Efficient and privacy-preserving AdaBoost classification framework for mining healthcare data over outsourced cloud

University of New Brunswick


In recent years, the analysis and mining of electronic health records (EHRs) with the aid of machine learning (ML) algorithms have become a popular approach to improving the quality of patient care and increasing the productivity and efficiency of healthcare delivery. Building robust and accurate decision-making systems with ML algorithms requires a sufficient amount of data. Because of the high volume of EHRs, many frameworks outsource their data to cloud servers. However, cloud servers are not fully trusted, and releasing sensitive raw data might put individuals at risk. For example, in Canada, the University of Ontario Institute of Technology (UOIT), in collaboration with IBM, has implemented an online real-time analytics platform, Artemis¹. The Artemis framework stores patients' raw physiological and clinical information and is also used for online real-time analysis and data mining. While utilizing patients' sensitive healthcare data contributes to more accurate diagnoses, it also raises the risk of security and privacy breaches. In 2019, 25 million patients were victims of the American Medical Collection Agency (AMCA) data breach². As a result, preserving the privacy of sensitive health records is a pressing issue. A practical solution for ensuring the security and privacy of such large volumes of healthcare data is to outsource encrypted data to cloud servers. However, encryption significantly increases the computational cost. As noted earlier, ML and big data have become ubiquitous, and adversaries may abuse healthcare data that is outsourced to cloud servers without encryption. Thus, a Privacy-Preserving (PP) model is required. Researchers have proposed various PP ML models based on different privacy techniques. Nonetheless, time efficiency remains an important concern in PP ML frameworks. Compared with other ML models, AdaBoost is a fast, simple, and versatile yet highly accurate classifier.
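The abstract describes AdaBoost only at a high level. For reference, here is a minimal plaintext sketch of discrete AdaBoost with one-feature decision stumps and labels in {-1, +1}. The stump learner, the function names (train_stump, adaboost_train, adaboost_predict), and the toy data are assumptions for illustration; no encryption is involved, so this is not the thesis's privacy-preserving classifier.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best one-feature stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                # The stump predicts `polarity` when x[j] >= thr, otherwise -polarity.
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (polarity if xi[j] >= thr else -polarity) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best

def stump_predict(stump, x):
    _, j, thr, polarity = stump
    return polarity if x[j] >= thr else -polarity

def adaboost_train(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # uniform initial sample weights
    ensemble = []      # list of (alpha, stump) pairs
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # guard against division by zero for a perfect stump
        if err >= 0.5:
            break  # no stump beats random guessing on the reweighted data
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0:
            break  # training set already perfectly classified by this stump
        # Up-weight misclassified samples (y*h = -1), down-weight correct ones (y*h = +1).
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    # Sign of the alpha-weighted vote of all stumps.
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: only the second feature separates the classes.
X = [[0, 1], [0, 2], [0, 8], [0, 9]]
y = [-1, -1, 1, 1]
model = adaboost_train(X, y)
print(adaboost_predict(model, [0, 10]))  # → 1
```

Each round reweights the training samples so that the next stump concentrates on the records the ensemble still gets wrong; the final decision is a weighted vote, which is part of why AdaBoost is cheap to evaluate.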
PP techniques can restore the balance between data usage and data privacy, whereas an inefficient privacy technique requires intensive computational power. To address these challenges, we propose an efficient and privacy-preserving classification framework for mining outsourced encrypted healthcare data and evaluate it through experiments. This thesis covers the AdaBoost learning process, classification, Homomorphic Encryption (HE), and the Paillier cryptosystem. The experimental results demonstrate the accuracy and efficiency of our framework.

¹ http://hir.uoit.ca/cms/?q=node/24
² https://healthitsecurity.com/news/the-10-biggest-healthcare-data-breaches-of-2019-so-far
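The thesis covers HE and the Paillier cryptosystem without spelling out the scheme here. As an illustration only, the following textbook Paillier sketch (using the common choice g = n + 1) shows the additive homomorphism that lets an untrusted cloud server combine encrypted values without decrypting them, and why encryption is costly: every operation is a modular exponentiation modulo n². The helper names (keygen, encrypt, decrypt, add_encrypted, mul_plain) are hypothetical, the primes are toy-sized and insecure, and this is not the framework's actual implementation. It assumes Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import secrets

def keygen(p, q):
    """Toy Paillier key generation from two distinct primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lambda = lcm(p-1, q-1)
    g = n + 1
    n_sq = n * n
    # mu = L(g^lambda mod n^2)^(-1) mod n, where L(u) = (u - 1) // n
    mu = pow((pow(g, lam, n_sq) - 1) // n, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n, g = pub
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(pub, priv, c):
    """m = L(c^lambda mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    n_sq = n * n
    return ((pow(c, lam, n_sq) - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def mul_plain(pub, c, k):
    """Raising a ciphertext to a plaintext constant k multiplies the plaintext by k (mod n)."""
    n, _ = pub
    return pow(c, k, n * n)

pub, priv = keygen(1789, 1861)
c_sum = add_encrypted(pub, encrypt(pub, 120), encrypt(pub, 80))
print(decrypt(pub, priv, c_sum))  # → 200
```

In an outsourcing setting, the data owner would hold `priv` while the server sees only ciphertexts and `pub`, so it can compute sums or scaled values such as weighted votes without learning the plaintexts.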