Data Augmentation for Sign Language Gloss Translation

Amit Moryossef, Kayo Yin, Graham Neubig, Yoav Goldberg

Research output: Contribution to conference › Paper › peer-review

Abstract

Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation, where a gloss is a sequence of transcribed spoken-language words in the order in which they are signed. We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem. Unlike in traditional low-resource NMT, however, gloss-text pairs often have higher lexical overlap and lower syntactic overlap than pairs of spoken languages. We exploit this lexical overlap and handle the syntactic divergence by proposing two rule-based heuristics that generate pseudo-parallel gloss-text pairs from monolingual spoken-language text. By pre-training on this synthetic data, we improve translation from American Sign Language (ASL) to English and from German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
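As a rough illustration of how pseudo-parallel pairs might be generated from monolingual text, the sketch below drops function words, uppercases what remains, and optionally perturbs word order locally. The abstract does not spell out the two heuristics, so everything here is an assumption made for illustration: the `FUNCTION_WORDS` list, the `max_shift` parameter, and both function names are hypothetical and should not be read as the authors' method.

```python
import random

# Hypothetical function-word list: illustrative only, not taken from the paper.
FUNCTION_WORDS = {"a", "an", "the", "is", "are", "was", "were", "am", "be", "to", "of"}


def text_to_pseudo_gloss(sentence, rng, max_shift=1.0):
    """Turn a spoken-language sentence into a rough pseudo-gloss string.

    Assumed rules (not the paper's): drop function words to mimic the high
    lexical overlap of glosses, then jitter word positions to mimic the
    syntactic divergence between gloss order and spoken-language order.
    """
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    content = [t for t in tokens if t and t not in FUNCTION_WORDS]
    # Sort by original position plus bounded noise; max_shift=0 keeps the order.
    keyed = [(i + rng.uniform(0, max_shift), t) for i, t in enumerate(content)]
    keyed.sort(key=lambda pair: pair[0])
    return " ".join(t.upper() for _, t in keyed)


def make_pseudo_parallel(sentences, seed=0, max_shift=1.0):
    """Return (pseudo_gloss, original_sentence) pairs for pre-training."""
    rng = random.Random(seed)
    return [(text_to_pseudo_gloss(s, rng, max_shift), s) for s in sentences]
```

Pairs produced this way would serve only as synthetic pre-training data, with fine-tuning on real gloss-text pairs afterwards, as the abstract describes.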

Original language: English
Pages: 1-11
Number of pages: 11
State: Published - 2021
Event: 1st International Workshop on Automatic Translation for Signed and Spoken Languages, AT4SSL 2021 - Virtual, Online, United States
Duration: 16 Aug 2021 to 20 Aug 2021

Conference

Conference: 1st International Workshop on Automatic Translation for Signed and Spoken Languages, AT4SSL 2021
Country/Territory: United States
City: Virtual, Online
Period: 16/08/21 to 20/08/21

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Artificial Intelligence
  • Software
  • Linguistics and Language
