Paper in IEEE SMC 2020: disturbing classifiers against adversarial attacks
The research work of João Zago, a Master's student in our AI group, has been accepted for publication at IEEE SMC 2020!
Abstract:
Convolutional neural networks (CNNs) for image classification can be fragile to small perturbations in the images they ought to classify. This fragility exposes CNNs to malicious attacks, resulting in safety concerns in many application domains. In this paper, we propose a simple yet efficient strategy for decreasing the effectiveness of black-box attacks that must sequentially query the classifier network in order to construct an adversarial example. The general idea consists of applying controlled random disturbances (noise) at the softmax output layer of neural network classifiers, changing the confidence scores according to a set of design requirements. To evaluate this defense strategy, we employ a CNN trained on the MNIST data set and attack it with a black-box attack method from the literature called ZOO. The results show that our defense strategy: a) decreases the attack success rate of the adversarial examples; and b) forces the attack algorithm to insert larger perturbations in the input images.
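For intuition, here is a minimal sketch of the general idea: controlled random noise is added to the softmax scores returned to whoever queries the model, so a black-box attacker such as ZOO sees perturbed confidences, while the top-1 prediction (and hence clean accuracy) is left unchanged. The noise distribution, its magnitude, and the renormalization step below are illustrative assumptions, not the exact design requirements used in the paper.

```python
import numpy as np

def disturbed_softmax(logits, noise_scale=0.05, rng=None):
    """Return softmax confidence scores perturbed by controlled random noise.

    Illustrative sketch only: noise_scale and the uniform noise model are
    assumptions, not the paper's exact design requirements. The perturbed
    scores remain a valid probability distribution, and the predicted class
    (argmax) is kept intact so clean accuracy is unaffected.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Standard softmax on the clean logits.
    z = logits - logits.max()
    probs = np.exp(z) / np.exp(z).sum()

    # Add bounded uniform noise to every confidence score.
    noisy = probs + rng.uniform(-noise_scale, noise_scale, size=probs.shape)
    noisy = np.clip(noisy, 1e-8, None)
    noisy /= noisy.sum()  # renormalize to a probability distribution

    # Restore the original top-1 class if the noise displaced it.
    i, j = int(probs.argmax()), int(noisy.argmax())
    if i != j:
        noisy[i], noisy[j] = noisy[j], noisy[i]
    return noisy

# Example: the attacker queries the model and receives disturbed scores,
# which degrades the finite-difference gradient estimates used by ZOO.
scores = disturbed_softmax(np.array([2.0, 0.5, -1.0]))
print(scores, scores.argmax())
```

Because ZOO estimates gradients from small differences between successive confidence scores, even modest output noise of this kind can dominate those differences, which is why the defense lowers the attack success rate and forces larger input perturbations.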