Publication
TQ-AutoTest – An Automated Test Suite for (Machine) Translation Quality
Vivien Macketanz; Renlong Ai; Aljoscha Burchardt; Hans Uszkoreit
In: Nicoletta Calzolari; Khalid Choukri; Christopher Cieri; Thierry Declerck; Sara Goggi; Koiti Hasida; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Hélène Mazo; Asuncion Moreno; Jan Odijk; Stelios Piperidis; Takenobu Tokunaga (Eds.). Proceedings of the Eleventh International Conference on Language Resources and Evaluation. International Conference on Language Resources and Evaluation (LREC-2018), 11th, May 7-12, Miyazaki, Japan, European Language Resources Association (ELRA), 2018.
Abstract
In several areas of NLP evaluation, test suites have been used to analyze the strengths and weaknesses of systems. Today, Machine Translation (MT) quality is usually assessed by shallow automatic comparisons of MT outputs with reference corpora, resulting in a single score. The trend towards neural MT in particular has renewed people's interest in better and more analytical diagnostic methods for MT quality. In this paper, we present TQ-AutoTest, a novel framework that supports a linguistic evaluation of (machine) translations using test suites. Our current test suites comprise about 5,000 handcrafted test items for the language pair German–English. The framework supports the creation of tests and the semi-automatic evaluation of the MT results using regular expressions. The expressions help to classify the results as correct, incorrect, or requiring a manual check. The approach can easily be extended to other NLP tasks where test suites can be used, such as evaluating (one-shot) dialogue systems.
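To give an idea of the kind of regex-based, semi-automatic classification described in the abstract, the following minimal Python sketch shows how a handcrafted test item with expected and prohibited patterns could be matched against an MT output. The test item, the patterns, and the classify function are hypothetical illustrations, not the actual TQ-AutoTest implementation.

```python
import re

# Hypothetical test item for the German-English pair: the source sentence
# probes a specific grammatical phenomenon, and two regular expressions
# encode what a correct or a typically wrong translation would contain.
test_item = {
    "source": "Sie hätte das Buch gelesen.",   # German input probing the conditional perfect
    "positive": r"\bwould have read\b",        # pattern expected in a correct translation
    "negative": r"\b(has|had) read\b",         # pattern signalling a known error type
}

def classify(mt_output: str, item: dict) -> str:
    """Classify an MT output as correct, incorrect, or needing a manual check."""
    if re.search(item["positive"], mt_output, re.IGNORECASE):
        return "correct"
    if re.search(item["negative"], mt_output, re.IGNORECASE):
        return "incorrect"
    # Neither pattern matched: flag the output for a human annotator.
    return "check manually"

print(classify("She would have read the book.", test_item))  # correct
print(classify("She has read the book.", test_item))         # incorrect
print(classify("She reads books.", test_item))                # check manually
```

The third case shows why the evaluation is only semi-automatic: outputs that match neither pattern are routed to a manual check rather than being scored automatically.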