Publication
Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output
Maja Popovic
In: The Prague Bulletin of Mathematical Linguistics (PBML), Vol. 96, Pages 59-68, Charles University, Prague, 10/2011.
Abstract
We describe Hjerson, a tool for automatic classification of errors in machine
translation output. The tool detects five word-level error classes:
morphological errors, reordering errors, missing words, extra words and
lexical errors. As input, the tool requires the original full-form
reference translation(s) and hypothesis, along with their corresponding
base forms. Additional word-level information (e.g. POS tags) can also
be supplied in order to obtain more detail. The tool
provides the raw count and the normalised score (error rate) for each
error class at the document level and at the sentence level, as
well as original reference and hypothesis words labelled with the
corresponding error class in text and HTML formats.
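
To make the distinction between the raw count and the normalised score concrete, the following is a minimal sketch of how both could be computed for one sentence once each word carries an error label. It is not Hjerson's code or interface: the function name `error_statistics`, the label strings, the "ok" marker for correct words, and the choice to normalise by the number of labelled words are all assumptions made for illustration.

```python
from collections import Counter

# The five word-level error classes named in the abstract.
# The label strings themselves are hypothetical, chosen for this sketch.
ERROR_CLASSES = ("morphological", "reordering", "missing", "extra", "lexical")


def error_statistics(labels):
    """Return {error_class: (raw_count, error_rate)} for one labelled sentence.

    `labels` holds one label per word; "ok" (assumed) marks a correct word.
    The rate is the raw count divided by the number of labelled words,
    which is an assumed normalisation, not necessarily the tool's own.
    """
    counts = Counter(label for label in labels if label in ERROR_CLASSES)
    total = len(labels)
    return {
        cls: (counts[cls], counts[cls] / total if total else 0.0)
        for cls in ERROR_CLASSES
    }


# Hypothetical labels for an eight-word sentence.
sentence_labels = [
    "ok", "morphological", "ok", "lexical",
    "lexical", "ok", "ok", "reordering",
]

stats = error_statistics(sentence_labels)
for name, (count, rate) in stats.items():
    print(f"{name}: count={count}, rate={rate:.3f}")
```

Document-level figures could be obtained in the same way by pooling the labels of all sentences before counting, again under the assumptions stated above.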