Improving Statistical Machine Translation by Adapting Translation Models to Translationese

Gennadi Lembersky, Noam Ordan, Shuly Wintner

Research output: Contribution to journal › Article › peer-review


Translation models used for statistical machine translation are compiled from parallel corpora that are manually translated. The common assumption is that parallel texts are symmetrical: The direction of translation is deemed irrelevant and is consequently ignored. Much research in Translation Studies indicates that the direction of translation matters, however, as translated language (translationese) has many unique properties. It has already been shown that phrase tables constructed from parallel corpora translated in the same direction as the translation task outperform those constructed from corpora translated in the opposite direction. We reconfirm that this is indeed the case, but emphasize the importance of also using texts translated in the "wrong" direction. We take advantage of information pertaining to the direction of translation in constructing phrase tables by adapting the translation model to the special properties of translationese. We explore two adaptation techniques: First, we create a mixture model by interpolating phrase tables trained on texts translated in the "right" and the "wrong" directions. The weights for the interpolation are determined by minimizing perplexity. Second, we define entropy-based measures that estimate the correspondence of target-language phrases to translationese, thereby eliminating the need to annotate the parallel corpus with information pertaining to the direction of translation. We show that incorporating these measures as features in the phrase tables of statistical machine translation systems results in consistent, statistically significant improvement in the quality of the translation.
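
The first technique, interpolating two phrase tables with a weight chosen by minimizing perplexity, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the flat dictionary representation of a phrase table, the probability floor, and the grid search over the weight are all assumptions made for illustration, and the abstract does not say how perplexity is computed or optimized in the article.

```python
import math


def interpolate(p_right, p_wrong, lam):
    """Linear mixture of two phrase tables (hypothetical representation:
    dicts mapping (source_phrase, target_phrase) -> probability)."""
    keys = set(p_right) | set(p_wrong)
    return {
        k: lam * p_right.get(k, 0.0) + (1.0 - lam) * p_wrong.get(k, 0.0)
        for k in keys
    }


def perplexity(table, dev_pairs, floor=1e-9):
    """Perplexity of a table on held-out phrase pairs.
    The floor (an assumption) avoids log(0) for unseen or zero-probability pairs."""
    log_sum = sum(math.log(max(table.get(pair, 0.0), floor)) for pair in dev_pairs)
    return math.exp(-log_sum / len(dev_pairs))


def best_weight(p_right, p_wrong, dev_pairs, steps=100):
    """Grid search (a stand-in for whatever optimizer is actually used)
    for the mixture weight that minimizes perplexity on dev_pairs."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(
        candidates,
        key=lambda lam: perplexity(interpolate(p_right, p_wrong, lam), dev_pairs),
    )
```

With toy tables such as `{("maison", "house"): 0.8, ("maison", "home"): 0.2}` for the "right" direction and `{("maison", "house"): 0.4, ("maison", "home"): 0.6}` for the "wrong" direction, `best_weight` returns a weight strictly between 0 and 1 whenever the held-out pairs are better explained by a blend of the two tables than by either table alone.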

Original language: American English
Pages (from-to): 999-1023
Number of pages: 25
Journal: Computational Linguistics
Issue number: 4
State: Published - Dec 2013

All Science Journal Classification (ASJC) codes

  • Language and Linguistics
  • Linguistics and Language
  • Computer Science Applications
  • Artificial Intelligence

