On the evaluation of adversarial vulnerabilities of deep neural networks
University of New Brunswick
Adversarial examples are inputs designed by an adversary to fool machine learning models. Despite overly simplified assumptions and extensive study in recent years, there is no defense against adversarial examples for complex tasks (e.g., ImageNet). For simpler tasks such as handwritten digit classification, however, a robust model seems to be within reach. Accurately estimating the adversarial robustness of models has been one of the main challenges researchers face. First, we present AETorch, a PyTorch library that implements the strongest attacks and the common threat models, making it easier to compare defenses. Most research on adversarial examples has focused on adding small ℓ_p-bounded perturbations to natural inputs under the assumption that the true label remains unchanged. However, robustness in ℓ_p-bounded settings does not guarantee general robustness. Additionally, we present an efficient technique for creating unrestricted adversarial examples using generative adversarial networks on the MNIST, SVHN, and Fashion-MNIST datasets. We demonstrate that even state-of-the-art adversarially robust MNIST classifiers are vulnerable to adversarial examples generated with this technique. We show that our method improves on previous unrestricted techniques because it has access to a larger adversarial subspace, and that the examples it generates are transferable. Overall, our findings emphasize the need for further study of the vulnerability of neural networks to unrestricted adversarial examples.
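To illustrate the ℓ_p-bounded threat model discussed above, the following is a minimal NumPy sketch of a one-step gradient-sign (FGSM-style) attack under an ℓ∞ budget, applied to a toy logistic classifier. This is an illustration only; the function and variable names are ours and it is not the AETorch API.

```python
import numpy as np

def fgsm_attack(x, y, w, b, eps):
    """One-step FGSM-style attack under an l_inf budget eps.

    Perturbs input x so as to increase the logistic loss of the
    linear classifier sigmoid(w.x + b) against label y in {0, 1}.
    Every coordinate of the perturbation has magnitude exactly eps,
    so the l_inf norm of the perturbation equals eps.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
    grad = (p - y) * w                      # gradient of logistic loss w.r.t. x
    return x + eps * np.sign(grad)          # ascend the loss, clipped to +/- eps

# Toy example: a point correctly classified as class 1 (logit = 1.5 > 0).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])

x_adv = fgsm_attack(x, y=1.0, w=w, b=b, eps=0.6)

print(x @ w + b)      # original logit: positive (class 1)
print(x_adv @ w + b)  # adversarial logit: negative (flipped to class 0)
```

On this toy point, a perturbation of at most 0.6 per coordinate flips the classifier's decision, which is the failure mode ℓ_p-bounded defenses try to prevent; the unrestricted attacks studied in this work drop the ℓ_p constraint entirely.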