Abstract
Deep neural networks are vulnerable to small input perturbations known as adversarial attacks. Inspired by the fact that these adversaries are constructed by iteratively minimizing the confidence of a network in the true class label, we propose the anti-adversary layer, aimed at countering this effect. In particular, our layer generates an input perturbation in the direction opposite to the adversarial one and feeds the classifier this perturbed version of the input. Our approach is training-free and theoretically supported. We verify its effectiveness by combining our layer with both nominally and robustly trained models, and we conduct large-scale experiments, spanning black-box to adaptive attacks, on CIFAR10, CIFAR100, and ImageNet. Our layer significantly enhances model robustness at no cost to clean accuracy.
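The mechanism described above can be sketched in a few lines: predict a label for the input, then take a few signed gradient-ascent steps that *increase* the classifier's confidence in that label (the opposite of an attack's descent), and classify the perturbed input. The following is a minimal, self-contained illustration on a toy two-class linear model; the class and function names, the analytic softmax gradient, and the step schedule are illustrative assumptions, not the paper's implementation.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class LinearClassifier:
    """Toy two-class linear model: logits = W @ x (illustrative stand-in for a DNN)."""
    def __init__(self, W):
        self.W = W  # one weight row per class

    def predict(self, x):
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.W]
        probs = softmax(logits)
        return max(range(len(probs)), key=probs.__getitem__), probs

def anti_adversary(model, x, epsilon=0.1, steps=5):
    """Anti-adversary sketch: signed gradient ascent on the confidence of the
    *predicted* label, i.e. the reverse of an attack's confidence descent.
    For a linear model, d log p_k / dx = W[k] - sum_j p_j * W[j]."""
    y_hat, _ = model.predict(x)          # commit to the current prediction
    alpha = epsilon / steps              # per-step budget (assumed schedule)
    x_anti = list(x)
    for _ in range(steps):
        _, probs = model.predict(x_anti)
        grad = [model.W[y_hat][i] - sum(p * model.W[j][i] for j, p in enumerate(probs))
                for i in range(len(x))]
        # move *toward* higher confidence in y_hat (sign of the gradient)
        x_anti = [xi + alpha * (1.0 if g > 0 else -1.0 if g < 0 else 0.0)
                  for xi, g in zip(x_anti, grad)]
    return x_anti
```

In practice the classifier would be a trained network and the gradient would come from backpropagation; the key design point is that the perturbation is computed at inference time, so no retraining is needed.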
| Original language | English (US) |
| --- | --- |
| Title of host publication | AAAI-22 Technical Tracks 6 |
| Publisher | Association for the Advancement of Artificial Intelligence |
| Pages | 5992-6000 |
| Number of pages | 9 |
| ISBN (Electronic) | 1577358767, 9781577358763 |
| State | Published - Jun 30 2022 |
| Event | 36th AAAI Conference on Artificial Intelligence, AAAI 2022 - Virtual, Online |
| Duration | Feb 22 2022 → Mar 1 2022 |
Publication series
| Name | Proceedings of the 36th AAAI Conference on Artificial Intelligence, AAAI 2022 |
| --- | --- |
| Volume | 36 |
Conference
| Conference | 36th AAAI Conference on Artificial Intelligence, AAAI 2022 |
| --- | --- |
| City | Virtual, Online |
| Period | 02/22/22 → 03/01/22 |
Bibliographical note
Funding Information: This publication is based upon work supported by the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under Award No. OSR-CRG2019-4033. We would also like to thank Humam Alwassel for his help and discussions.
Publisher Copyright:
Copyright © 2022, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
ASJC Scopus subject areas
- Artificial Intelligence