Skip to main navigation Skip to search Skip to main content

Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Natural language processing models tend to learn and encode social biases present in the data. One popular approach for addressing such biases is to eliminate encoded information from the model's representations. However, current methods are restricted to removing only linearly encoded information. In this work, we propose Iterative Gradient-Based Projection (IGBP), a novel method for removing non-linear encoded concepts from neural representations. Our method consists of iteratively training neural classifiers to predict a particular attribute we seek to eliminate, followed by a projection of the representation on a hypersurface, such that the classifiers become oblivious to the target attribute. We evaluate the effectiveness of our method on the task of removing gender and race information as sensitive attributes. Our results demonstrate that IGBP is effective in mitigating bias through intrinsic and extrinsic evaluations, with minimal impact on downstream task accuracy.

Original languageEnglish
Title of host publicationFindings of the Association for Computational Linguistics, ACL 2023
Pages5961-5977
Number of pages17
ISBN (Electronic)9781959429623
DOIs
StatePublished - 2023
EventFindings of the Association for Computational Linguistics, ACL 2023 - Toronto, Canada
Duration: 9 Jul 202314 Jul 2023

Publication series

NameProceedings of the Annual Meeting of the Association for Computational Linguistics

Conference

ConferenceFindings of the Association for Computational Linguistics, ACL 2023
Country/TerritoryCanada
CityToronto
Period9/07/2314/07/23

ASJC Scopus subject areas

  • Computer Science Applications
  • Linguistics and Language
  • Language and Linguistics

Fingerprint

Dive into the research topics of 'Shielded Representations: Protecting Sensitive Attributes Through Iterative Gradient-Based Projection'. Together they form a unique fingerprint.

Cite this