Publication

Hybrid Parallel Sentence Mining from Comparable Corpora

Sabine Hunsicker; Radu Ion; Dan Stefanescu
In: Proceedings of the 16th Annual Conference of the European Association for Machine Translation. Annual Conference of the European Association for Machine Translation (EAMT-12), May 28-30, Trento, Italy, 2012.

Abstract

This paper presents LEXACC, a fast and accurate parallel sentence mining algorithm for comparable corpora. LEXACC builds on the Cross-Language Information Retrieval framework and combines it with a trainable translation similarity measure that detects pairs of parallel and quasi-parallel sentences. Compared with established approaches, LEXACC obtains state-of-the-art results.
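The two-stage idea in the abstract (retrieval-style candidate selection followed by a similarity score) can be illustrated with a minimal sketch. This is not the LEXACC implementation: the toy dictionary, the function names, the 0.5 threshold, and the fixed dictionary-overlap score are all assumptions made for illustration. In particular, the paper describes a trainable similarity measure, which is replaced here by a simple untrained stand-in.

```python
from collections import defaultdict

# Hypothetical toy bilingual dictionary (source word -> target translations).
TOY_DICT = {
    "das": {"the", "that"},
    "haus": {"house"},
    "ist": {"is"},
    "rot": {"red"},
    "katze": {"cat"},
}


def build_index(target_sentences):
    """Build an inverted index: target token -> ids of sentences containing it."""
    index = defaultdict(set)
    for i, sentence in enumerate(target_sentences):
        for token in sentence.lower().split():
            index[token].add(i)
    return index


def retrieve(source_sentence, index, dictionary):
    """Retrieval step: translate source tokens and collect matching target ids."""
    candidates = set()
    for token in source_sentence.lower().split():
        for translation in dictionary.get(token, ()):
            candidates |= index.get(translation, set())
    return candidates


def similarity(source_sentence, target_sentence, dictionary):
    """Stand-in score (assumption): dictionary coverage averaged over both sides."""
    src = source_sentence.lower().split()
    tgt = target_sentence.lower().split()
    if not src or not tgt:
        return 0.0
    tgt_set = set(tgt)
    # Source tokens that have at least one translation in the target sentence.
    src_hits = sum(1 for s in src if dictionary.get(s, set()) & tgt_set)
    # Target tokens that are a translation of some source token.
    translations = set().union(*(dictionary.get(s, set()) for s in src))
    tgt_hits = sum(1 for t in tgt if t in translations)
    return 0.5 * (src_hits / len(src) + tgt_hits / len(tgt))


def mine(source_sentences, target_sentences, dictionary, threshold=0.5):
    """Return (source_id, target_id, score) for candidate pairs above the threshold."""
    index = build_index(target_sentences)
    pairs = []
    for i, src in enumerate(source_sentences):
        for j in sorted(retrieve(src, index, dictionary)):
            score = similarity(src, target_sentences[j], dictionary)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    source = ["das haus ist rot", "die katze trinkt"]
    target = ["the house is red", "a dog barks", "the cat drinks"]
    for pair in mine(source, target, TOY_DICT):
        print(pair)
```

The retrieval step keeps the scorer from comparing every source sentence with every target sentence, which is the practical reason for pairing an information retrieval stage with a more expensive similarity measure.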
