Publication
Quality Estimation for Machine Translation output using linguistic analysis and decoding features
Eleftherios Avramidis
In: Proceedings of the Seventh Workshop on Statistical Machine Translation (WMT-12), located at the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, June 7-8, Montreal, Canada. Association for Computational Linguistics, 6/2012.
Abstract
We describe a submission to the WMT12 Quality Estimation task, including extensive machine learning experimentation. The data were augmented with features from linguistic analysis and with statistical features from the SMT search graph. Several feature selection algorithms were employed. The Quality Estimation problem was addressed both as a regression task and as a discretised classification task, but the latter did not generalise well to the unseen test set. The most successful regression methods achieved an RMSE of 0.86 and were trained on a feature set chosen by Correlation-based Feature Selection. We also observed indications that RMSE is not always sufficient for measuring performance.
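As a hedged illustration of why RMSE alone may not be a sufficient measure, the following minimal sketch compares two hypothetical predictors on made-up quality scores. The data, the function names `rmse` and `pearson`, and the choice of Pearson correlation as the complementary measure are assumptions for illustration only; they are not taken from the paper.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length score lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen for this sketch
    return cov / math.sqrt(var_x * var_y)

# Hypothetical gold quality scores (not from the WMT12 data).
gold = [1.0, 2.0, 3.0, 4.0, 5.0]
# Predictor A always outputs the mean score: it cannot rank sentences at all.
constant = [3.0] * 5
# Predictor B ranks every sentence correctly but is offset by 1.5.
ranked = [2.5, 3.5, 4.5, 5.5, 6.5]

for name, pred in (("constant", constant), ("ranked", ranked)):
    print(f"{name}: RMSE={rmse(pred, gold):.3f}, r={pearson(pred, gold):.3f}")
```

In this toy setting the constant predictor obtains the lower RMSE even though it carries no information about relative sentence quality, while the correctly ranked predictor is penalised for its offset. Reporting a correlation or ranking measure next to RMSE is one possible way to expose such cases.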