Abstract
Adversarial attacks on deep learning models have received increased attention in recent years. Work in this area has mostly focused on gradient-based techniques, so-called “white-box” attacks, where the attacker has access to the targeted model’s internal parameters; such an assumption is usually untenable in the real world. Additionally, some attacks use the entire pixel space to fool a given model, which is neither practical nor physically realizable. To address these problems we propose the BBNP algorithm (Black-Box Naturalistic Patch): a direct, black-box, naturalistic, gradient-free method that uses the learned image manifold of a pretrained generative adversarial network (GAN) to generate naturalistic adversarial patches for object detectors. The method is model-agnostic, mounting black-box naturalistic attacks on object detection models by relying solely on the model’s outputs. Evaluating our approach on five models and against five black-box and two white-box attacks, we show that our proposed method achieves state-of-the-art results, outperforming all other tested black-box approaches. The code is available on GitHub at https://github.com/razla/Patch-of-Invisibility.
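The following is a minimal sketch of the general idea the abstract describes, not the paper’s actual implementation: a gradient-free, evolution-strategy-style search over a pretrained GAN’s latent space, where each candidate patch is scored only by querying the detector’s output confidences. The callables `generator`, `detector`, and `apply_patch`, and all hyperparameters, are hypothetical placeholders.

```python
# Sketch of a black-box, gradient-free naturalistic-patch search (assumed setup):
#   generator(latent)        -> decodes a latent vector into an image patch
#   detector(image)          -> returns the top detection confidence for the image
#   apply_patch(img, patch)  -> pastes the patch onto the image
# None of these names come from the paper; they stand in for the actual pipeline.
import numpy as np

def fitness(latent, generator, detector, images, apply_patch):
    """Mean top detection confidence after pasting the decoded patch (lower = better attack)."""
    patch = generator(latent)
    scores = [detector(apply_patch(img, patch)) for img in images]
    return float(np.mean(scores))

def evolve_patch(generator, detector, images, apply_patch,
                 latent_dim=128, pop_size=32, sigma=0.1, iters=200, seed=0):
    """Simple evolution strategy over the GAN's latent space, using only model outputs."""
    rng = np.random.default_rng(seed)
    best = rng.standard_normal(latent_dim)
    best_fit = fitness(best, generator, detector, images, apply_patch)
    for _ in range(iters):
        # Sample a population of perturbed latents around the current best.
        candidates = best + sigma * rng.standard_normal((pop_size, latent_dim))
        fits = [fitness(c, generator, detector, images, apply_patch) for c in candidates]
        i = int(np.argmin(fits))
        if fits[i] < best_fit:  # keep the candidate that suppresses detections most
            best, best_fit = candidates[i], fits[i]
    return generator(best), best_fit
```

Because the patch is always decoded from the GAN’s learned image manifold, candidates stay naturalistic-looking; because only detector scores are queried, no gradients or internal parameters of the target model are needed.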
| Original language | American English |
| --- | --- |
| Title of host publication | 6th Workshop on Machine Learning for CyberSecurity (ECMLPKDD 2024) |
| Number of pages | 16 |
| DOIs | |
| State | Published - 2024 |