Publication

"A Sacred Bird Called the Phoenix". Auditing the most-used Parallel Corpus for German Sign Language Recognition and Translation

Vera Czehmann; Shakib Yazdani; Yasser Hamidullah; Fabrizio Nunnari; Eleftherios Avramidis
In: Eleni Efthimiou; Stavroula-Evita Fotinea; Thomas Hanke; Julie A. Hochgesang; Johanna Mesch; Marc Schulder (Eds.). Proceedings of the LREC2026 12th Workshop on the Representation and Processing of Sign Languages: Language in Motion. Workshop on the Representation and Processing of Sign Languages (signlang@LREC-2026), located at LREC-2026, May 16, Palma, Mallorca, Spain, Pages 80-92, ISBN 978-2-493814-82-1, European Language Resources Association (ELRA), 2026.

Abstract

This paper presents an empirical audit of the widely used RWTH‑PHOENIX‑2014T corpus, examining its suitability as a benchmark for sign language recognition and translation. Through human annotation of the training set and extensive sign-to-text back translation of the test set, we provide detailed statistics that indicate substantial quality issues, including information loss and lexical errors. Automatic scores comparing the human sign-to-text back translations with the original speech-transcribed references are remarkably low. This suggests strong translationese effects and substantial paraphrasing, and it reveals the limitations of lexical metrics in adequately scoring translation quality. Rescoring existing sign language translation systems with the human sign-to-text back translations in place of the original speech-transcribed references shows that system evaluation with lexical metrics on this test set is not robust. Our findings highlight the risks of relying on this corpus for model evaluation and call for more rigorous, linguistically grounded evaluation practices in sign language technology research. The back-translated test set and the error annotations are publicly available.
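
The robustness problem described above can be illustrated with a toy example. The sketch below is not the authors' evaluation code: it assumes a minimal corpus-level BLEU (one reference per segment, whitespace tokenisation, no smoothing), and the sentences are invented PHOENIX-style weather phrases, not taken from the corpus. Two hypothetical systems are scored first against an "original" reference and then against a paraphrased "back-translation" reference with the same meaning; the ranking of the two systems flips. Real evaluations would normally use a standard implementation such as sacrebleu rather than this hand-written function.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Minimal corpus-level BLEU (hypothetical helper): one reference per
    segment, whitespace tokenisation, no smoothing. Returns a 0-100 score."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clip each hypothesis n-gram by how often it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero n-gram precision gives BLEU = 0 without smoothing
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_prec)


# Invented example sentences (hypothetical; not from RWTH-PHOENIX-2014T).
ref_original = ["am sonntag regnet es im norden und im osten"]    # speech-transcribed style
ref_backtrans = ["im norden und osten gibt es am sonntag regen"]  # same meaning, paraphrased
system_a = ["am sonntag regnet es im norden und osten"]
system_b = ["im norden und osten regnet es am sonntag"]

for name, refs in [("original", ref_original), ("back-translation", ref_backtrans)]:
    a, b = corpus_bleu(system_a, refs), corpus_bleu(system_b, refs)
    print(f"{name:>16}: A={a:.1f}  B={b:.1f}  higher={'A' if a > b else 'B'}")
```

With these toy inputs, system A scores higher against the original reference while system B scores higher against the paraphrased one, even though both references say the same thing. Lexical overlap, not adequacy, decides the ranking.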
