TY - GEN
T1 - Tricking the Hashing Trick
T2 - 37th AAAI Conference on Artificial Intelligence, AAAI 2023
AU - Cohen, Edith
AU - Nelson, Jelani
AU - Sarlós, Tamás
AU - Stemmer, Uri
N1 - Publisher Copyright: Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2023/6/27
Y1 - 2023/6/27
N2 - CountSketch and Feature Hashing (the “hashing trick”) are popular randomized dimensionality reduction methods that support recovery of ℓ₂-heavy hitters (keys i where vᵢ² > ϵ∥v∥₂²) and approximate inner products. When the inputs are not adaptive (do not depend on prior outputs), classic estimators applied to a sketch of size O(ℓ/ϵ) are accurate for a number of queries that is exponential in ℓ. When inputs are adaptive, however, an adversarial input can be constructed after O(ℓ) queries with the classic estimator, and the best known robust estimator only supports Õ(ℓ²) queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after O(ℓ²) queries produces an adversarial input vector whose sketch is highly biased. Our attack uses “natural” non-adaptive inputs (only the final adversarial input is chosen adaptively) and applies universally to any correct estimator, including one that is unknown to the attacker. In doing so, we expose an inherent vulnerability of this fundamental method.
AB - CountSketch and Feature Hashing (the “hashing trick”) are popular randomized dimensionality reduction methods that support recovery of ℓ₂-heavy hitters (keys i where vᵢ² > ϵ∥v∥₂²) and approximate inner products. When the inputs are not adaptive (do not depend on prior outputs), classic estimators applied to a sketch of size O(ℓ/ϵ) are accurate for a number of queries that is exponential in ℓ. When inputs are adaptive, however, an adversarial input can be constructed after O(ℓ) queries with the classic estimator, and the best known robust estimator only supports Õ(ℓ²) queries. In this work we show that this quadratic dependence is in a sense inherent: We design an attack that after O(ℓ²) queries produces an adversarial input vector whose sketch is highly biased. Our attack uses “natural” non-adaptive inputs (only the final adversarial input is chosen adaptively) and applies universally to any correct estimator, including one that is unknown to the attacker. In doing so, we expose an inherent vulnerability of this fundamental method.
UR - http://www.scopus.com/inward/record.url?scp=85167962774&partnerID=8YFLogxK
U2 - https://doi.org/10.1609/aaai.v37i6.25882
DO - https://doi.org/10.1609/aaai.v37i6.25882
M3 - Conference contribution
T3 - Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023
SP - 7235
EP - 7243
BT - AAAI-23 Technical Tracks 6
A2 - Williams, Brian
A2 - Chen, Yiling
A2 - Neville, Jennifer
Y2 - 7 February 2023 through 14 February 2023
ER -