Publication
Hybrid Parallel Sentence Mining from Comparable Corpora
Sabine Hunsicker; Radu Ion; Dan Stefanescu
In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation. Annual Conference of the European Association for Machine Translation (EAMT-12), May 28-30, Trento, Italy, 2012.
Abstract
This paper presents LEXACC, a fast and accurate parallel sentence mining algorithm for comparable corpora. LEXACC is based on the Cross-Language Information Retrieval framework, combined with a trainable translation similarity measure that detects pairs of parallel and quasi-parallel sentences. It obtains state-of-the-art results in comparison with established approaches.