A query-efficient black-box adversarial attack on text classification Deep Neural Networks

University of New Brunswick


Recent work has demonstrated that modern text classifiers based on deep neural networks are vulnerable to adversarial attacks. Text data has been studied far less than the image domain in this respect, largely because of the special challenges of the NLP domain. Although many adversarial attacks in the text domain are highly effective, most of them ignore the overhead they induce on the victim model. In this research, we propose EQFooler, a query-efficient black-box adversarial attack on text data that attacks a textual deep neural network while taking into account the overhead it may produce, measured as the number of queries sent to the victim model. We demonstrate the impact of keyword extraction methods on generating query-efficient adversarial attacks. Four variants of EQFooler are developed based on different keyword extractors and importance score strategies. We compare the performance of these variants in terms of four evaluation metrics, namely original accuracy, adversarial accuracy, change rate, and number of queries. All variants of the proposed attack significantly reduce the accuracy of the targeted models. Among them, EQFooler-Rake-MS performs best in terms of adversarial accuracy, change rate, and number of queries needed. In addition, multiple experiments are designed to compare the proposed method against state-of-the-art adversarial attacks as a baseline. The results show that EQFooler is as powerful as the state-of-the-art adversarial attacks while requiring fewer queries to the victim model. Finally, we study the transferability of the generated adversarial examples. In every transfer setting, at least one of the variants achieves better results than the baseline.
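
The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not give their formulas, so the definitions below are assumptions that follow common usage in the text-attack literature: adversarial accuracy is the share of all examples the victim model still classifies correctly after the attack, change rate is the mean fraction of words modified in successful adversarial examples, and number of queries is the mean number of victim-model calls per attacked example. The names `AttackRecord` and `summarize` are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AttackRecord:
    """Outcome of attacking one input text (hypothetical record layout)."""
    originally_correct: bool  # victim model was right on the clean text
    attack_succeeded: bool    # attack flipped the prediction
    words_total: int          # number of words in the clean text
    words_changed: int        # number of words the attack replaced
    queries: int              # victim-model calls spent on this text


def summarize(records: List[AttackRecord]) -> Dict[str, float]:
    """Compute the four metrics under the assumed definitions above."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")

    # Only texts the model classifies correctly are attacked.
    attacked = [r for r in records if r.originally_correct]
    successes = [r for r in attacked if r.attack_succeeded]

    original_accuracy = len(attacked) / n
    adversarial_accuracy = (len(attacked) - len(successes)) / n

    # Mean fraction of words modified, over successful attacks only.
    change_rate = (
        sum(r.words_changed / r.words_total for r in successes) / len(successes)
        if successes else 0.0
    )
    # Mean number of victim-model queries per attacked text.
    avg_queries = (
        sum(r.queries for r in attacked) / len(attacked) if attacked else 0.0
    )
    return {
        "original_accuracy": original_accuracy,
        "adversarial_accuracy": adversarial_accuracy,
        "change_rate": change_rate,
        "avg_queries": avg_queries,
    }


if __name__ == "__main__":
    demo = [
        AttackRecord(True, True, 10, 2, 30),
        AttackRecord(True, False, 20, 0, 50),
        AttackRecord(True, True, 5, 1, 10),
        AttackRecord(False, False, 8, 0, 0),
    ]
    for name, value in summarize(demo).items():
        print(f"{name}: {value:.3f}")
```

Under these assumed definitions, a more query-efficient attack is one that reaches a comparable adversarial accuracy with a lower `avg_queries` value, which is the trade-off the abstract describes.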