Log-linear Guardedness and its Implications

Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Linear methods for erasing human-interpretable concepts from neural representations have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that, in some cases, a multiclass log-linear model can be constructed that indirectly recovers the concept, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.
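To make the gap between the binary and multiclass settings concrete, the following is a minimal toy sketch in Python with NumPy. Everything in it is a hypothetical illustration, not the paper's construction or code: the four-point XOR dataset, the function names `fit_binary_log_linear`, `cross_entropy`, and `predict_multiclass`, and the multiclass weights `W_multi` are all assumptions. It also assumes that the adversary's failure is measured by its cross-entropy relative to the entropy H(Z) of the concept. Because both concept classes share the same mean, the best binary logistic-regression adversary stays at cross-entropy log 2 = H(Z), so it extracts no information about z. A four-class log-linear model whose classes correspond to the corners, however, produces predictions from which z can be read off exactly.

```python
import numpy as np

# Hypothetical toy representations: four points at the corners of a square.
# The binary concept z follows an XOR pattern, so both concept classes have mean (0, 0).
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
z = np.array([1, 1, 0, 0])


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_binary_log_linear(X, z, lr=0.5, steps=2000):
    """Fit a binary log-linear (logistic regression) adversary by gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(X @ w + b)
        # Gradients of the mean logistic loss; with equal class means they vanish at w = 0.
        w -= lr * (X.T @ (p - z)) / len(z)
        b -= lr * np.mean(p - z)
    return w, b


def cross_entropy(X, z, w, b):
    """Mean cross-entropy (in nats) of the binary adversary's predictions of z."""
    p = sigmoid(X @ w + b)
    return -np.mean(z * np.log(p) + (1 - z) * np.log(1 - p))


# Hypothetical downstream multiclass log-linear model: one weight vector per corner,
# so class k scores highest exactly on corner k.
W_multi = X.copy()
# Each multiclass label k is tied to the concept value of corner k (a non-linear readout).
CLASS_TO_Z = z.copy()


def predict_multiclass(X, W):
    """Argmax of a softmax (log-linear) classifier; softmax is monotone, so logits suffice."""
    logits = X @ W.T
    return np.argmax(logits, axis=1)


if __name__ == "__main__":
    w, b = fit_binary_log_linear(X, z)
    print("binary adversary cross-entropy (nats):", cross_entropy(X, z, w, b))
    print("entropy of z (nats):", np.log(2))
    pred = predict_multiclass(X, W_multi)
    print("z read off the multiclass predictions:", CLASS_TO_Z[pred])
```

The binary adversary cannot move away from w = 0 because the logistic loss is convex and its gradient at zero vanishes when the two class means coincide. The multiclass model sidesteps this: each of its predictions is log-linear in the representation, but the map from predicted class to z is not, which is the kind of indirect recovery the abstract describes.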

Original language: English
Title of host publication: Long Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 9413-9431
Number of pages: 19
ISBN (Electronic): 9781959429722
State: Published - 1 Jan 2023
Event: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 2023 - 14 Jul 2023

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
Volume: 1

Conference

Conference: 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023
Country/Territory: Canada
City: Toronto
Period: 9/07/23 - 14/07/23

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics
