TY - GEN
T1 - DataDetective
T2 - 27th European Conference on Artificial Intelligence, ECAI 2024
AU - Wegerhoff, Noa
AU - Shapira, Avishag
AU - Elovici, Yuval
AU - Shabtai, Asaf
N1 - Publisher Copyright: © 2024 The Authors.
PY - 2024/10/16
Y1 - 2024/10/16
N2 - Data owners (distributors) often share machine learning (ML) datasets with third-party collaborators (agents) for various purposes. While such collaborations can be mutually beneficial, they also introduce the risk of data leakage, i.e., the deliberate or accidental disclosure of sensitive ML datasets to unauthorized parties. Consequently, distributors may lose their intellectual property, experience reduced revenue, or violate data privacy regulations. In this paper, we propose a novel black-box dataset watermarking approach called DataDetective, which not only detects the unauthorized use of protected datasets but also identifies the agent responsible for the leakage. DataDetective, which leverages a backdoor technique, is composed of two processes. In the dataset watermarking process, a unique watermark signature is embedded into each agent's version of the dataset, which induces detectable, agent-specific behaviors in any model trained on the data. In the leaker identification process, the watermark signature embedded in a suspected model is identified and compared to the signatures of all agents to identify the leaking agent. Extensive evaluations on benchmark datasets in the computer vision domain demonstrate our method's effectiveness; DataDetective achieved a perfect leaker identification rate with just 1% of the data watermarked. Moreover, DataDetective maintains the model's performance with a negligible impact on model accuracy. By providing a verifiable and robust solution for leaker attribution, DataDetective enhances accountability in collaborative ML environments. The code is available at https://github.com/NoaWegerhoff/data-detective.
UR - http://www.scopus.com/inward/record.url?scp=85213332847&partnerID=8YFLogxK
U2 - https://doi.org/10.3233/FAIA240771
DO - 10.3233/FAIA240771
M3 - Conference contribution
T3 - Frontiers in Artificial Intelligence and Applications
SP - 2442
EP - 2451
BT - ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
A2 - Endriss, Ulle
A2 - Melo, Francisco S.
A2 - Bach, Kerstin
A2 - Bugarin-Diz, Alberto
A2 - Alonso-Moral, Jose M.
A2 - Barro, Senen
A2 - Heintz, Fredrik
PB - IOS Press BV
Y2 - 19 October 2024 through 24 October 2024
ER -