Publication

Hjerson: An Open Source Tool for Automatic Error Classification of Machine Translation Output

Maja Popovic

In: The Prague Bulletin of Mathematical Linguistics (PBML), Vol. 96, Pages 59-68, Charles University, Prague, 10/2011.

Abstract

We describe Hjerson, a tool for automatic classification of errors in machine translation output. The tool detects five word-level error classes: morphological errors, reordering errors, missing words, extra words and lexical errors. As input, the tool requires the original full-form reference translation(s) and hypothesis, along with their corresponding base forms. Additional word-level information (e.g. POS tags) can also be supplied in order to obtain more detailed results. The tool provides the raw count and the normalised score (error rate) for each error class at the document level and at the sentence level, as well as the original reference and hypothesis words labelled with the corresponding error class, in text and HTML formats.
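As a rough illustration of the "raw count and normalised score" output described in the abstract, the following minimal sketch computes both values from a list of per-word error labels. It is not Hjerson's implementation or interface: the label names, the `error_rates` function, and the choice to normalise by the number of words are all assumptions made for this example, and the step that actually assigns error classes to words is not shown.

```python
from collections import Counter

# Hypothetical label names for the five error classes named in the abstract.
ERROR_CLASSES = ("morph", "reorder", "missing", "extra", "lexical")


def error_rates(labels, length):
    """Return {class: (raw_count, raw_count / length)} for one text unit.

    `labels` holds one label per word; any label outside ERROR_CLASSES
    (such as "ok") is treated as "no error". `length` is the assumed
    normaliser, here the number of words in the sentence or document.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    counts = Counter(label for label in labels if label in ERROR_CLASSES)
    return {cls: (counts[cls], counts[cls] / length) for cls in ERROR_CLASSES}


# Toy input: labels for an eight-word sentence (invented for illustration).
labels = ["ok", "morph", "ok", "lexical", "lexical", "ok", "reorder", "ok"]
rates = error_rates(labels, len(labels))

for cls, (count, rate) in rates.items():
    print(f"{cls}: count={count} rate={rate:.3f}")
```

Calling the same function once per sentence and once over the concatenated labels of all sentences would give sentence-level and document-level figures, respectively, under the same assumptions.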

Projects

taraXÜ - Self-Adapting Machine Translation with Multi-Approach Language Technology

MAIN.pdf (pdf, 109 KB)