
Publication

Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output

Maja Popovic
In: The Prague Bulletin of Mathematical Linguistics (PBML), Vol. 96, Pages 59-68, Charles University, Prague, 10/2011.

Abstract

We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool detects five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, it requires the original full-form reference translation(s) and hypothesis, along with their corresponding base forms. Additional word-level information (e.g. POS tags) can also be supplied in order to obtain more detailed results. The tool provides the raw count and the normalised score (error rate) for each error class at the document level and at the sentence level, as well as the original reference and hypothesis words labelled with the corresponding error class, in text and HTML formats.
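
To make the relation between a raw count and a normalised score concrete, here is a minimal Python sketch. It is not Hjerson's code or interface: the function name, the label strings and the input layout are hypothetical, and dividing each count by the number of words in the sentence is an assumed normalisation that may differ from the one the tool applies.

```python
from collections import Counter

# Hypothetical label strings for the five word-level error classes
# named in the abstract; Hjerson's own labels may differ.
ERROR_CLASSES = ("morphological", "reordering", "missing", "extra", "lexical")


def error_summary(labels):
    """Return {error_class: (raw_count, rate)} for one list of word labels.

    `labels` holds one entry per word: an error class name, or None for a
    word without an error. The rate is count / number of words, which is
    an assumed normalisation, not necessarily the one Hjerson uses.
    """
    counts = Counter(label for label in labels if label is not None)
    total = len(labels)
    return {
        cls: (counts[cls], counts[cls] / total if total else 0.0)
        for cls in ERROR_CLASSES
    }


# Invented labels for an eight-word sentence, for illustration only.
sentence_labels = [None, "lexical", None, "morphological",
                   "lexical", None, None, "extra"]
summary = error_summary(sentence_labels)
for cls, (count, rate) in summary.items():
    print(f"{cls}: count={count} rate={rate:.3f}")
```

Calling the same function on the labels of all sentences concatenated would give a document-level figure under the same assumption.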
