Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

Publication, Conference

Cohen, G; Sapiro, G; Giryes, R

Published in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

January 1, 2020

Deep neural networks (DNNs) are notorious for their vulnerability to adversarial attacks, which are small perturbations added to their input images to mislead their prediction. Detection of adversarial examples is, therefore, a fundamental requirement for robust classification frameworks. In this work, we present a method for detecting such adversarial attacks, which is suitable for any pre-trained neural network classifier. We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN's activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this correlation is much weaker for adversarial inputs. We train an adversarial detector using the k-NN ranks and distances and show that it successfully distinguishes adversarial examples, getting state-of-the-art results on six attack methods with three datasets. Code is available at https://github.com/giladcohen/NNIF_adv_defense.
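The feature-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that) and rests on several assumptions: a single activation layer, Euclidean distance, and a hypothetical `helpful_idx` array holding the indices of the most supportive training samples, which is assumed to have been computed beforehand with influence functions. The function name `nnif_features` and the toy data are illustrative only.

```python
import numpy as np

def nnif_features(train_acts, query_act, helpful_idx):
    """Return k-NN ranks and distances of the supportive training samples.

    train_acts  : (n_train, d) activations of the training set at one layer
    query_act   : (d,) activation of the input under test
    helpful_idx : indices of the most supportive training samples
                  (assumed to be precomputed with influence functions)
    """
    # Euclidean distance from the query to every training activation
    dists = np.linalg.norm(train_acts - query_act, axis=1)
    # ranks[i] = position of training sample i in the sorted neighbor list
    # (0 means nearest neighbor)
    order = np.argsort(dists, kind="stable")
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(order))
    # Feature vector: ranks first, then distances, of the supportive samples
    return np.concatenate([ranks[helpful_idx], dists[helpful_idx]])

# Toy data standing in for real DNN activations (assumption)
rng = np.random.default_rng(0)
train_acts = rng.normal(size=(100, 8))
helpful_idx = np.array([3, 17, 42])

# A query lying very close to supportive sample 3 gives that sample rank 0
query_act = train_acts[3] + 1e-3
features = nnif_features(train_acts, query_act, helpful_idx)
print(features)
```

Under the abstract's observation, normal inputs would tend to give low ranks and small distances for their supportive samples, while adversarial inputs would not. A binary classifier trained on such feature vectors would then play the role of the adversarial detector.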

Duke Scholars

Author Guillermo Sapiro Pierre R. Lamond Department of Electrical and Computer Engin ...

Published In

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

DOI

10.1109/CVPR42600.2020.01446

ISSN

1063-6919

Publication Date

January 1, 2020

Start / End Page

14441 / 14450

Citation

APA

Cohen, G., Sapiro, G., & Giryes, R. (2020). Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (pp. 14441–14450). https://doi.org/10.1109/CVPR42600.2020.01446

Chicago

Cohen, G., G. Sapiro, and R. Giryes. “Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors.” In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 14441–50, 2020. https://doi.org/10.1109/CVPR42600.2020.01446.

ICMJE

Cohen G, Sapiro G, Giryes R. Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2020. p. 14441–50.

MLA

Cohen, G., et al. “Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors.” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2020, pp. 14441–50. Scopus, doi:10.1109/CVPR42600.2020.01446.

NLM

Cohen G, Sapiro G, Giryes R. Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. 2020. p. 14441–14450.
