Foiling Explanations in Deep Neural Networks

Snir Vitrack Tamam, Raz Lapid, Moshe Sipper

Research output: Contribution to journal › Article › peer-review

Abstract

Deep neural networks (DNNs) have greatly impacted numerous fields over the past decade. Yet despite exhibiting superb performance over many problems, their black-box nature still poses a significant challenge with respect to explainability. Indeed, explainable artificial intelligence (XAI) is crucial in several fields, wherein the answer alone—sans a reasoning of how said answer was derived—is of little value. This paper uncovers a troubling property of explanation methods for image-based DNNs: by making small visual changes to the input image—hardly influencing the network’s output—we demonstrate how explanations may be arbitrarily manipulated through the use of evolution strategies. Our novel algorithm, AttaXAI, a model-and-data XAI-agnostic, adversarial attack on XAI algorithms, only requires access to the output logits of a classifier and to the explanation map; these weak assumptions render our approach highly useful where real-world models and data are concerned. We compare our method’s performance on two benchmark datasets—CIFAR100 and ImageNet—using four different pretrained deep-learning models: VGG16-CIFAR100, VGG16-ImageNet, MobileNet-CIFAR100, and Inception-v3-ImageNet. We find that the XAI methods can be manipulated without the use of gradients or other model internals. AttaXAI successfully manipulates an image such that several XAI methods output a specific explanation map. To our knowledge, this is the first such method in a black-box setting, and we believe it has significant value where explainability is desired, required, or legally mandatory. The code is available at https://github.com/razla/Foiling-Explanations-in-Deep-Neural-Networks.
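
As a rough illustration of the black-box setting the abstract describes, the sketch below perturbs an input so that its explanation map drifts toward a chosen target while the logits stay close to the original ones. It is not the authors' AttaXAI implementation (see the linked repository for that). Everything here is an assumption made for illustration: the toy random MLP, the gradient-times-input explanation, the NES-style antithetic gradient estimate, the loss weighting `alpha`, and all hyperparameters. The only point it tries to convey is that the attack loop touches the model solely through `query`, which returns logits and an explanation map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (assumptions, not from the paper): a tiny random MLP as the
# "classifier" and gradient-times-input as the "explanation method".
D, H, C = 16, 8, 3
W1 = rng.normal(size=(H, D)) / np.sqrt(D)
W2 = rng.normal(size=(C, H)) / np.sqrt(H)


def query(x):
    """Black-box oracle: returns (logits, explanation map) for input x."""
    h = np.tanh(W1 @ x)
    logits = W2 @ h
    c = int(np.argmax(logits))
    grad = W1.T @ (W2[c] * (1.0 - h ** 2))  # d logits[c] / d x
    return logits, grad * x


def attack_loss(x_adv, target_map, clean_logits, alpha=1.0):
    """Low when the explanation matches the target and the logits are unchanged."""
    logits, expl = query(x_adv)
    return (np.mean((expl - target_map) ** 2)
            + alpha * np.mean((logits - clean_logits) ** 2))


def es_attack(x, target_map, eps=0.1, steps=200, pop=20, sigma=0.05, lr=0.02):
    """Hypothetical NES-style attack; uses only query(), never model internals."""
    clean_logits, _ = query(x)
    x_adv = x.copy()
    best_x = x.copy()
    best_loss = attack_loss(x, target_map, clean_logits)
    for _ in range(steps):
        grad_est = np.zeros_like(x)
        for n in rng.normal(size=(pop, x.size)):
            # Antithetic pair: evaluate the loss at x_adv + sigma*n and x_adv - sigma*n.
            lp = attack_loss(x_adv + sigma * n, target_map, clean_logits)
            lm = attack_loss(x_adv - sigma * n, target_map, clean_logits)
            grad_est += (lp - lm) * n
        grad_est /= 2.0 * sigma * pop
        x_adv = x_adv - lr * grad_est
        # Keep the perturbation small: project back into an L-infinity ball.
        x_adv = x + np.clip(x_adv - x, -eps, eps)
        loss = attack_loss(x_adv, target_map, clean_logits)
        if loss < best_loss:
            best_x, best_loss = x_adv.copy(), loss
    return best_x, best_loss


x = rng.normal(size=D)                       # the "image"
target_map = query(rng.normal(size=D))[1]    # explanation of an unrelated input
initial_loss = attack_loss(x, target_map, query(x)[0])
x_adv, final_loss = es_attack(x, target_map)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Because the loop keeps the best candidate seen so far, the returned loss is never worse than the starting one; how far it actually drops depends on the toy model and the assumed hyperparameters.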

Original language: American English
Journal: Transactions on Machine Learning Research
Volume: 2023
State: Published - 1 Aug 2023

Keywords

  • adversarial attack
  • computer vision
  • deep learning
  • evolutionary algorithm
  • explainable artificial intelligence

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
