Design of efficient and privacy-preserving similarity query over encrypted data in Cloud

Thumbnail Image



Journal Title

Journal ISSN

Volume Title


University of New Brunswick


Similarity query, retrieving objects similar to a sample of interest, has enabled plentiful customized services in a variety of applications, such as disease diagnosis, locationbased services, recommendation system, signal processing, etc.; and attracted considerable attention from industry and academia. Meanwhile, big data era has stimulated the continuing explosive growth of data volumes in these applications and has been leading data owners to outsource data and the corresponding similarity query services to a powerful cloud for releasing the burden of local data storage and computation. Due to privacy concerns, data are usually outsourced in an encrypted form, and as a result, similarity queries have to be performed over encrypted data, which is more challenging to implement than similarity queries over plaintext data. Targeting at similarity queries over encrypted data, many solutions have been proposed by designing some customized protocols based on selected encryption techniques, e.g., homomorphic encryption. Based on the type of data, three categories of extensively studied privacy-preserving similarity queries include similarity query for eHealthcare data, similarity query for set data, and similarity query for time series, which are also the focus of our dissertation. Although these three categories have been widely studied, existing solutions still have some limitations in query efficiency, security, and practicality. To address these issues, in the dissertation, we design several efficient and privacy-preserving similarity query schemes for encrypted eHealthcare data with the distance measure of Euclidean distance; for encrypted set data with the distance measure of Jaccard similarity; and for encrypted time series with the distance measure of time warp edit distance (TWED), in the cloud outsourced environment. Specifically, our main contributions of the dissertation can be summarized as i) Design an efficient and privacy-preserving kNN query scheme for outsourced eHealthcare data by employing the k-d tree data structure as index and designing two privacy-preserving protocols for the encrypted data comparison and Euclidean distance computation in the cloud based on a homomorphic encryption technique. The proposed scheme can not only support efficient and privacy-preserving kNN queries over encrypted eHealthcare data but also dynamically update encrypted eHealthcare data in the cloud. ii) Propose a new efficient, privacy-preserving, and practical Euclidean distance based similarity range query scheme over encrypted eHealthcare data in the cloud by integrating the modified asymmetric scalar-product-preserving encryption (ASPE) scheme and Quadsector tree structure, where the modified ASPE scheme enables the cloud server to determine whether a data point satisfies the current similarity range query request through the encrypted data under a single-server setting; and the Quadsector tree used to index the dataset reduces the average computational complexity of query processing sublinear to the size of the dataset. iii) Formalize and design a similarity query based healthcare monitoring scheme over the digital twin cloud platform. In this scheme, we deploy a partition-based tree (PB-tree) to represent the healthcare center’s data and introducing the modified ASPE scheme to design a privacy-preserving PB-tree based similarity range query (PSRQ) algorithm. iv) Propose an efficient and privacy-preserving set similarity query scheme under a single-server setting, which achieves high efficiency in the set similarity query while preserving the data privacy. In the proposed scheme, we design a symmetric-key predicate encryption scheme to achieve privacy-preserving similarity queries over binary vectors and employ B+ tree as the index structure to improve the query efficiency. v) Propose an efficient and privacy-preserving similarity range query scheme for time series data by organizing time series data into a k-d tree and applying the modified ASPE scheme and a symmetric homomorphic encryption to preserve the privacy of k-d tree based similarity queries. vi) Analyze the security of all proposed similarity query schemes in the dissertation and conduct extensive experiments to validate their efficiency. The results demonstrate that all proposed schemes are privacy-preserving, protecting the privacy of datasets and query requests against the honest-but-curious cloud server; and are computationally efficient.