Publication
Using MT-Based Metrics for RTE
Alexander Volokh; Günter Neumann
In: Fourth Text Analysis Conference. Text Analysis Conference (TAC), November 14-15, Gaithersburg, MD, USA, NIST, 2011.
Abstract
We analyse the complexity of the RTE task data and divide the T/H pairs into three different classes, depending on the type of knowledge
required to solve the problem. We then propose an approach suited to the two easier classes, which account for two thirds
of all pairs. Our assumption is that T and H are translations of the same source sentence. We then use a metric for MT evaluation (Meteor) in
order to judge the similarity of both translations. It is clear that in most cases when T entails H, T and H do not have exactly the same
meaning. However, we can observe that the similarity is still much higher for positive T/H pairs than for negative pairs. We achieve a
macro-average F1-score of 46.34 for the task. On the one hand, this shows that our approach has its weaknesses, mainly because
our assumption that T and H contain the same meaning does not always hold, especially if T and H have very different lengths. On the other
hand, considering that RTE-7 is a difficult class-imbalanced problem (<5% YES, >95% NO), this robust approach achieves a decent
result for a large amount of data. It is above the median of this year's results and is comparable with the top results from the previous year.
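
The core idea of the abstract, scoring T and H as if they were two translations of the same sentence and accepting the pair when the score is high enough, can be illustrated with a minimal sketch. This is not the authors' system and it does not call Meteor itself: the function below is a simplified stand-in that uses only exact unigram matches combined in a recall-weighted harmonic mean, whereas the real metric adds stemming, synonym matching and a fragmentation penalty. The function names, the choice of H as candidate and T as reference, the weight alpha = 0.9 and the threshold of 0.5 are all assumptions made for illustration.

```python
from collections import Counter


def unigram_fmean(hypothesis: str, text: str, alpha: float = 0.9) -> float:
    """Simplified stand-in for an MT metric (NOT the real Meteor).

    Treats H as the candidate and T as the reference (an assumption) and
    returns P * R / (alpha * P + (1 - alpha) * R) over exact unigram matches.
    """
    h_tokens = hypothesis.lower().split()
    t_tokens = text.lower().split()
    if not h_tokens or not t_tokens:
        return 0.0

    # Multiset intersection: each token is matched at most as often as it
    # occurs in both sentences.
    matches = sum((Counter(h_tokens) & Counter(t_tokens)).values())
    if matches == 0:
        return 0.0

    precision = matches / len(h_tokens)  # share of H covered by T
    recall = matches / len(t_tokens)     # share of T covered by H
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


def entails(text: str, hypothesis: str, threshold: float = 0.5) -> bool:
    """Label a T/H pair YES when the similarity reaches a threshold.

    The threshold value is hypothetical; in practice it would be tuned on
    development data.
    """
    return unigram_fmean(hypothesis, text) >= threshold


if __name__ == "__main__":
    t = "the cat sat on the mat"
    print(entails(t, "the cat sat"))       # high word overlap
    print(entails(t, "dogs bark loudly"))  # no word overlap
```

Because recall is computed against the length of T, the score drops as T grows longer than H even when every word of H appears in T, which is one way to see the length sensitivity the abstract mentions.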