Publication

Language Data Sharing in European Public Services – Overcoming Obstacles and Creating Sustainable Data Sharing Infrastructures

Lilli Smal; Andrea Lösch; Josef van Genabith; Maria Giagkou; Thierry Declerck; Stephan Busemann
In: Nicoletta Calzolari; Frédéric Béchet; Philippe Blache; Christopher Cieri; Khalid Choukri; Thierry Declerck; Sara Goggi; Hitoshi Isahara; Bente Maegaard; Joseph Mariani; Hélène Mazo; Asuncion Moreno; Jan Odijk; Stelios Piperidis (Eds.). Proceedings of the Twelfth International Conference on Language Resources and Evaluation (LREC 2020). International Conference on Language Resources and Evaluation (LREC-2020), May 11-16, Marseille, France, Pages 3443-3448, ISBN 979-10-95546-34-4, ELRA, Paris, 5/2020.

Abstract

Data is key in training modern language technologies. In this paper, we summarise the findings of the first pan-European study on obstacles to sharing language data across 29 EU Member States and CEF-affiliated countries, carried out under the ELRC White Paper action on "Sustainable Language Data Sharing to Support Language Equality in Multilingual Europe: Why Language Data Matters". We present the methodology of the study and the obstacles identified, and report on recommendations for overcoming them. The obstacles are classified into (1) lack of appreciation of the value of language data, (2) structural challenges, (3) disposition towards CAT tools and lack of digital skills, (4) inadequate language data management practices, (5) limited access to outsourced translations, and (6) legal concerns. Recommendations are grouped by the level they address: the European/national policy level and the organisational/institutional level.

Projects