TY - GEN
T1 - Machine Translation into Low-resource Language Varieties
AU - Kumar, Sachin
AU - Anastasopoulos, Antonios
AU - Wintner, Shuly
AU - Tsvetkov, Yulia
N1 - Publisher Copyright: © 2021 Association for Computational Linguistics.
PY - 2021
Y1 - 2021
N2 - State-of-the-art machine translation (MT) systems are typically trained to generate "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source-variety) data. This also includes adaptation of MT systems to low-resource typologically related target languages. We experiment with adapting an English-Russian MT system to generate Ukrainian and Belarusian, an English-Norwegian Bokmål system to generate Nynorsk, and an English-Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.
AB - State-of-the-art machine translation (MT) systems are typically trained to generate "standard" target language; however, many languages have multiple varieties (regional varieties, dialects, sociolects, non-native varieties) that are different from the standard language. Such varieties are often low-resource, and hence do not benefit from contemporary NLP solutions, MT included. We propose a general framework to rapidly adapt MT systems to generate language varieties that are close to, but different from, the standard target language, using no parallel (source-variety) data. This also includes adaptation of MT systems to low-resource typologically related target languages. We experiment with adapting an English-Russian MT system to generate Ukrainian and Belarusian, an English-Norwegian Bokmål system to generate Nynorsk, and an English-Arabic system to generate four Arabic dialects, obtaining significant improvements over competitive baselines.
UR - http://www.scopus.com/inward/record.url?scp=85115711022&partnerID=8YFLogxK
M3 - Conference contribution
T3 - ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
SP - 110
EP - 121
BT - ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference
PB - Association for Computational Linguistics (ACL)
T2 - Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL-IJCNLP 2021
Y2 - 1 August 2021 through 6 August 2021
ER -