Publication
Evaluation without references: IBM1 scores as evaluation metrics
Maja Popovic; David Vilar Torres; Eleftherios Avramidis; Aljoscha Burchardt
In: Proceedings of the Sixth Workshop on Statistical Machine Translation (WMT-11), located at EMNLP, July 30-31, Edinburgh, United Kingdom, Pages 99-103, Association for Computational Linguistics, 7/2011.
Abstract
Current metrics for evaluating machine translation quality have the
huge drawback that they require human-quality reference
translations. We propose a truly automatic evaluation metric based
on IBM1 lexicon probabilities which does not need any
reference translations. Several variants of IBM1
scores are systematically explored in order to find the most promising
directions. Correlations between the new metrics and human
judgments are calculated on the data of the third, fourth and fifth
shared tasks of the Statistical Machine Translation Workshop. Five
different European languages are taken into account: English,
Spanish, French, German and Czech. The results show that the IBM1
scores are competitive with the classic evaluation metrics, the
most promising being IBM1 scores calculated on morphemes and
POS-4grams.
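
The abstract does not spell out how an IBM1 score is computed, so the following is only a minimal sketch of one plausible reading: the standard IBM Model 1 probability of a translation given its source sentence, taken as a log and normalized by the translation length. The function name `ibm1_score`, the `lexicon` dictionary layout, the `<NULL>` token, the `floor` value, and the length normalization are all assumptions made for illustration, not details taken from the paper. The lexicon probabilities are assumed to have been trained beforehand on a parallel corpus.

```python
import math

def ibm1_score(source_tokens, target_tokens, lexicon, floor=1e-10):
    """Length-normalized IBM Model 1 log-probability of target given source.

    `lexicon` is a hypothetical dict mapping (target_word, source_word)
    pairs to lexicon probabilities p(target_word | source_word).
    """
    # Model 1 adds an empty (NULL) source word that any target word may align to.
    src = ["<NULL>"] + list(source_tokens)
    if not target_tokens:
        return 0.0
    log_prob = 0.0
    for t in target_tokens:
        # Sum the lexicon probabilities p(t | s) over all source positions.
        total = sum(lexicon.get((t, s), 0.0) for s in src)
        # Dividing by the number of source positions is the uniform
        # alignment term; the floor avoids log(0) for unseen word pairs.
        log_prob += math.log(max(total / len(src), floor))
    # Normalize by target length so longer outputs are not penalized (assumption).
    return log_prob / len(target_tokens)
```

No reference translation appears anywhere in the computation: the score depends only on the source sentence, the system output, and the lexicon. The same function could be applied to sequences of morphemes or POS tags instead of words, provided the lexicon was trained on the same kind of units.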