Linear Adversarial Concept Erasure

Shauli Ravfogel, Michael Twiton, Yoav Goldberg, Ryan Cotterell

Research output: Contribution to journal, Conference article, peer-review


Modern neural models trained on textual data rely on pre-trained representations that emerge without direct supervision. As these representations are increasingly used in real-world applications, the inability to control their content becomes a growing problem. This paper formulates the problem of identifying and erasing a linear subspace that corresponds to a given concept, in order to prevent linear predictors from recovering the concept. Our formulation consists of a constrained, linear minimax game. We consider different concept-identification objectives, modeled after several tasks such as classification and regression. We derive a closed-form solution for certain objectives, and propose a convex relaxation, R-LACE, that works well for others. When evaluated in the context of binary gender removal, our method recovers a low-dimensional subspace whose removal mitigates bias, as measured by both intrinsic and extrinsic evaluation. We show that the method, despite being linear, is highly expressive, effectively mitigating bias in deep nonlinear classifiers while maintaining tractability and interpretability.
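To make "erasing a linear subspace" concrete, here is a minimal sketch of the simplest case: removing a single direction by orthogonal projection. It is not the paper's R-LACE procedure and does not solve the minimax game. It assumes one candidate direction, the feature-label covariance, and only checks that the projected features have no remaining covariance with the label. The function names and the toy data are hypothetical, chosen for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def erase_direction(x, v):
    """Project x onto the orthogonal complement of v: x - (x.v / v.v) v."""
    scale = dot(x, v) / dot(v, v)
    return [xi - scale * vi for xi, vi in zip(x, v)]


def covariance_direction(xs, ys):
    """Covariance of each feature with the label (an assumed candidate direction)."""
    n, dim = len(xs), len(xs[0])
    mean_x = [sum(x[j] for x in xs) / n for j in range(dim)]
    mean_y = sum(ys) / n
    return [
        sum((x[j] - mean_x[j]) * (y - mean_y) for x, y in zip(xs, ys)) / n
        for j in range(dim)
    ]


# Toy data: a binary "concept" label leaks into the first two features.
random.seed(0)
ys = [random.choice([0.0, 1.0]) for _ in range(200)]
xs = [
    [random.gauss(0, 1) + 2.0 * y, random.gauss(0, 1) + y, random.gauss(0, 1)]
    for y in ys
]

v = covariance_direction(xs, ys)                # direction to erase
erased = [erase_direction(x, v) for x in xs]    # rank-1 projection applied to every point
residual = covariance_direction(erased, ys)     # covariance left after erasure

print("covariance before:", v)
print("covariance after: ", residual)
```

Because the projection is linear, the covariance after erasure equals the projection applied to `v` itself, which is the zero vector up to floating-point error. Zero covariance rules out a least-squares linear regressor on this sample, but it is a weaker guarantee than the adversarial formulation described in the abstract, which considers other objectives and subspaces of higher rank.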

Original language: English
Pages (from-to): 18400-18421
Number of pages: 22
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 to 23 Jul 2022

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

