Publication

DFKI System Combination with Sentence Ranking at ML4HMT-2011

Eleftherios Avramidis

In: Proceedings of the International Workshop on Using Linguistic Information for Hybrid Machine Translation (LIHMT 2011) and of the Shared Task on Applying Machine Learning Techniques to Optimising the Division of Labour in Hybrid Machine Translation (ML4HMT-11), November 18-19, Barcelona, Spain, Center for Language and Speech Technologies and Applications (TALP), Technical University of Catalonia, 2011.

Abstract

We present a pilot study on a Hybrid Machine Translation system that takes advantage of multilateral system-specific meta-data provided as part of the shared task. The proposed solution offers a machine learning approach, resulting in a selection mechanism able to learn and rank system outputs at the sentence level, based on their quality. For training, due to the lack of human annotations, word-level Levenshtein distance was used as a quality indicator, and a rich set of sentence features was extracted and selected from the dataset. Three classification algorithms (Naive Bayes, SVM and Linear Regression) were trained and tested on pairwise featured sentence comparisons. The approaches yielded high correlation with the original rankings (tau=0.52) and selected the best translation in 54% of the cases.
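The quality indicator and the pairwise setup mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, whitespace tokenisation, and the choice to skip tied pairs are assumptions made for illustration only, and the feature extraction and classifiers are left out.

```python
from itertools import combinations


def word_levenshtein(hyp, ref):
    """Word-level edit distance between two sentences.

    Tokenisation by whitespace is an assumption of this sketch.
    """
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))  # distances from an empty hypothesis
    for i, hw in enumerate(h, 1):
        curr = [i]
        for j, rw in enumerate(r, 1):
            cost = 0 if hw == rw else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def pairwise_labels(outputs, ref):
    """Build pairwise training labels from system outputs for one sentence.

    `outputs` maps a (hypothetical) system name to its translation.
    Returns the distances and a list of (a, b, a_is_better) tuples.
    Tied pairs are skipped, which is an assumption of this sketch.
    """
    dist = {name: word_levenshtein(text, ref) for name, text in outputs.items()}
    pairs = []
    for a, b in combinations(sorted(outputs), 2):
        if dist[a] != dist[b]:
            pairs.append((a, b, dist[a] < dist[b]))
    return dist, pairs
```

In such a setup, each labelled pair would be combined with the sentence features of both candidates and passed to a classifier; the predicted pairwise preferences can then be aggregated into a per-sentence ranking.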

Projects

taraXÜ - Self-Adapting Machine Translation with Multi-Approach Language Technology
