TY - GEN
T1 - Gender Coreference and Bias Evaluation at WMT 2020
AU - Kocmi, Tom
AU - Limisiewicz, Tomasz
AU - Stanovsky, Gabriel
N1 - Funding Information: This study was supported in part by grants 18-24210S of the Czech Science Foundation and 825303 (Bergamot) of the European Union. This work used language resources and tools stored and distributed by the LINDAT/CLARIN project of the Ministry of Education, Youth and Sports of the Czech Republic (LM2015071). Publisher Copyright: © 2020 Association for Computational Linguistics
PY - 2020
Y1 - 2020
AB - Gender bias in machine translation can manifest when gender inflections are chosen based on spurious gender correlations, for example, always translating doctors as men and nurses as women. This can be particularly harmful as models become more popular and are deployed within commercial systems. Our work presents the largest body of evidence for the phenomenon, covering more than 19 systems submitted to WMT across four diverse target languages: Czech, German, Polish, and Russian. To achieve this, we use WinoMT, a recent automatic test suite that examines gender coreference and bias when translating from English into languages with grammatical gender. We extend WinoMT to handle two new languages tested in WMT: Polish and Czech. We find that all systems consistently use spurious correlations in the data rather than meaningful contextual information.
UR - http://www.scopus.com/inward/record.url?scp=85108133503&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 5th Conference on Machine Translation, WMT 2020 - Proceedings
SP - 357
EP - 364
BT - 5th Conference on Machine Translation, WMT 2020 - Proceedings
A2 - Barrault, Loic
A2 - Bojar, Ondrej
A2 - Bougares, Fethi
A2 - Chatterjee, Rajen
A2 - Costa-Jussa, Marta R.
A2 - Federmann, Christian
A2 - Fishel, Mark
A2 - Fraser, Alexander
A2 - Graham, Yvette
A2 - Guzman, Paco
A2 - Haddow, Barry
A2 - Huck, Matthias
A2 - Yepes, Antonio Jimeno
A2 - Koehn, Philipp
A2 - Martins, Andre
A2 - Morishita, Makoto
A2 - Monz, Christof
A2 - Nagata, Masaaki
A2 - Nakazawa, Toshiaki
A2 - Negri, Matteo
PB - Association for Computational Linguistics (ACL)
T2 - 5th Conference on Machine Translation, WMT 2020
Y2 - 19 November 2020 through 20 November 2020
ER -