Publication
Quality estimation-guided data selection for domain adaptation of SMT
Pratyush Banerjee; Raphael Rubino; Johann Roturier; Josef van Genabith
In: Machine Translation (MT), Pages 101-108, Springer, 2014.
Abstract
Supplementary data selection is a strongly motivated approach in domain adaptation of
statistical machine translation systems. In this paper we report a novel approach to data
selection guided by automatic quality estimation. In contrast to the conventional approach of
using the entire target-domain data as reference for data selection, we restrict the reference
set only to sentences poorly translated by the baseline model. Automatic quality estimation
is used to identify such poorly translated sentences in the target domain. Our experiments
reveal that this approach provides statistically significant improvements over the unadapted
baseline and achieves scores comparable to those of conventional data selection approaches
with significantly smaller amounts of selected data.
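
The selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the quality threshold, the assumption that a higher quality estimation score means a better translation, and the smoothed unigram cross-entropy used to rank candidate sentences are all hypothetical choices made for illustration. The abstract does not state which similarity measure or quality estimation model the paper uses.

```python
import math
from collections import Counter


def select_reference(target_sentences, qe_scores, threshold):
    """Keep target-domain sentences whose estimated quality is below threshold.

    Assumption: higher score means better translation. Invert the comparison
    if the quality estimator predicts an error rate instead.
    """
    return [s for s, q in zip(target_sentences, qe_scores) if q < threshold]


def unigram_model(sentences):
    """Build an add-one smoothed unigram model from whitespace-tokenised text."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)


def cross_entropy(sentence, prob):
    """Per-token cross-entropy (bits) of a sentence under the unigram model."""
    toks = sentence.split()
    if not toks:
        return float("inf")
    return -sum(math.log2(prob(t)) for t in toks) / len(toks)


def select_supplementary(pool, reference, k):
    """Return the k pool sentences closest to the reference set.

    Closeness is illustrative only: lower cross-entropy under a model
    trained on the restricted reference set ranks first.
    """
    prob = unigram_model(reference)
    return sorted(pool, key=lambda s: cross_entropy(s, prob))[:k]


# Toy data (invented for this sketch, not taken from the paper).
target = [
    "install the antivirus update",
    "click the start button",
    "the firewall blocks the port",
]
qe = [0.2, 0.9, 0.3]
pool = [
    "the firewall update blocks the antivirus",
    "the cat sat on the mat",
    "stock prices fell sharply today",
]

reference = select_reference(target, qe, threshold=0.5)
selected = select_supplementary(pool, reference, k=1)
```

In this sketch the reference set shrinks from three sentences to the two with low estimated quality, and only the pool sentence that best matches those two is selected. The conventional approach mentioned in the abstract would correspond to passing all of `target` as the reference set.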