Publication
The Language Resource Life Cycle: Towards a Generic Model for Creating, Maintaining, Using and Distributing Language Resources
Georg Rehm
In: Nicoletta Calzolari; Khalid Choukri; Thierry Declerck; Marko Grobelnik; Bente Maegaard; Joseph Mariani; Asuncion Moreno; Jan Odijk; Stelios Piperidis (eds.). Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016). International Conference on Language Resources and Evaluation (LREC-2016), May 23-28, Portorož, Slovenia, ISBN 978-2-9517408-9-1, European Language Resources Association (ELRA), Paris, France, 5/2016.
Abstract
Language Resources (LRs) are an essential ingredient of current approaches in Linguistics, Computational Linguistics, Language Technology and related fields. LRs are collections of spoken or written language data, typically annotated with linguistic analysis information. Different types of LRs exist, for example, corpora, ontologies, lexicons, collections of spoken language data (audio), or collections that also include video (multimedia, multimodal). Often, LRs are distributed with specific tools, documentation, manuals or research publications. The different phases involved in creating and distributing an LR can be conceptualised as a life cycle. While the idea of handling the LR production and maintenance process in terms of a life cycle was brought up quite some time ago, the lack of a best practice model or common approach can still be considered a research gap. This article aims to help fill this gap by proposing an initial version of a generic Language Resource Life Cycle that can be used to inform, direct, control and evaluate LR research and development activities (including description, management, production, validation and evaluation workflows).
Projects
- CRACKER - Cracking the Language Barrier: Coordination, Evaluation and Resources for European MT Research
- META Net - A Network of Excellence forging the Multilingual Europe Technology Alliance