Efficient and Privacy-Preserving Similarity Range Query over Genomic Sequences






University of New Brunswick


Personalized medicine is becoming more common and more widely accepted as economies develop and living standards improve. Similarity queries have meanwhile attracted considerable research attention. One branch of this work, similarity query over genomic sequences, plays a significant role in personalized medicine and has applications in areas such as DNA alignment and genomic sequencing. Because handling genomic sequences requires massive storage and considerable computational capacity, service providers prefer to outsource their datasets to cloud servers and process similarity queries there. Genomic sequences are also highly sensitive data, so preserving privacy during these queries has attracted considerable attention. Although many schemes have been proposed for similarity queries over encrypted genomic data, they are either inefficient or limited in their support for dynamic updates to the dataset. To address these challenges, we propose an efficient and privacy-preserving similarity range query scheme. Specifically, we introduce an algorithm that builds a hash table to index the dataset and present a similarity range query algorithm based on that hash table. We then design two cloud-based privacy-preserving protocols, built on homomorphic encryption, that support the similarity range query algorithm over the encrypted dataset, and we combine the two protocols into the privacy-preserving similarity range query scheme. We analyze the security of the proposed scheme and prove that it is privacy-preserving. Finally, we evaluate the scheme's performance experimentally, and the results indicate that it is computationally efficient.
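As a rough illustration of what a hash-table-indexed similarity range query can look like, the sketch below works on plaintext sequences only. It is not the construction from this thesis, whose index layout and encrypted protocols are not described in the abstract. It assumes edit distance as the similarity measure, a k-mer hash table as the index, and a pigeonhole filter (a sequence within distance tau of the query must contain at least one of any tau + 1 disjoint query blocks unchanged). All function and parameter names (`build_index`, `range_query`, `k`, `tau`) are hypothetical.

```python
from collections import defaultdict


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def build_index(sequences, k):
    """Hash table mapping every k-mer to the ids of sequences containing it."""
    index = defaultdict(set)
    for sid, seq in enumerate(sequences):
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(sid)
    return index


def range_query(sequences, index, k, query, tau):
    """Return ids of sequences whose edit distance to `query` is at most tau."""
    n_blocks = tau + 1
    if len(query) < n_blocks * k:
        # Query too short for the pigeonhole filter: fall back to a full scan.
        candidates = range(len(sequences))
    else:
        # Any sequence within tau edits keeps at least one block intact,
        # so the union of the block lookups cannot miss a true result.
        candidates = set()
        for b in range(n_blocks):
            candidates |= index.get(query[b * k:(b + 1) * k], set())
    # Verify each candidate with an exact distance computation.
    return sorted(sid for sid in candidates
                  if edit_distance(query, sequences[sid]) <= tau)


sequences = ["ACGTACGT", "ACGTTCGT", "TTTTGGGG", "ACGAACGA"]
index = build_index(sequences, k=3)
matches = range_query(sequences, index, 3, "ACGTACGT", tau=1)
```

In this toy dataset the index narrows the search to the sequences that share a block with the query before any distance is computed. Inserting a new sequence only adds its k-mers to the table, which is one reason a hash index is a natural fit when the dataset changes over time. A privacy-preserving version would have to perform the lookup and the distance comparison over encrypted data, which this sketch does not attempt.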