Publication
Exploring cross-language statistical machine translation for closely related South Slavic languages
Maja Popovic; Nikola Ljubešić
In: EMNLP Workshop on Language Technologies for closely related languages and language variants. Conference on Empirical Methods in Natural Language Processing (EMNLP-14), October 25-29, Doha, Qatar, EMNLP, 10/2014.
Abstract
This work investigates the use of cross-language resources for statistical
machine translation (SMT) between English and two closely
related South Slavic languages, namely Croatian and Serbian. The goal is to
explore the effects of translating from and into one language using an SMT system trained on another.
For translation into English, the loss due to
cross-translation is about 13% of BLEU, and for
the other translation direction it is about 15%. The performance decrease for
both languages in both translation directions is mainly due to lexical divergences.
Several language adaptation methods are explored. Very simple lexical
transformations can already yield a small improvement, and the most
promising adaptation method is a Croatian-Serbian SMT system trained on
a very small corpus.
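
As an illustration of what a "very simple lexical transformation" could look like, the following minimal sketch replaces Croatian word forms with Serbian counterparts using a small lookup table. It is not the authors' implementation: the function name `adapt`, the table `HR_TO_SR`, and the four word pairs are assumptions chosen for illustration, and a real system would need a far larger list and would have to handle inflected forms.

```python
import re

# Hypothetical Croatian -> Serbian word map, for illustration only.
# These pairs are well-known lexical differences, not data from the paper.
HR_TO_SR = {
    "tko": "ko",        # "who"
    "što": "šta",       # "what"
    "kruh": "hleb",     # "bread"
    "tjedan": "nedelja",  # "week"
}

# Match runs of word characters (Unicode-aware for str patterns in Python 3).
_TOKEN = re.compile(r"\w+")


def adapt(sentence, mapping=HR_TO_SR):
    """Replace each word found in `mapping`, keeping initial capitalisation."""

    def swap(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        if target is None:
            return word  # word not in the table: leave it unchanged
        return target.capitalize() if word[0].isupper() else target

    return _TOKEN.sub(swap, sentence)


print(adapt("Tko jede kruh?"))
```

Text adapted this way could then be passed to an SMT system trained on the other language; the abstract reports only that such simple transformations give a small improvement.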