Project | TRAILS

Trustworthy and Inclusive Machines

Application fields

  • Other

Natural language processing (NLP) has demonstrated impressive performance on some human tasks. To achieve such performance, current neural models need to be pre-trained on huge amounts of raw text data. This dependence on uncurated data has at least four indirect and unintended consequences that are relevant to our proposal:

1) Uncurated data tends to be linguistically and culturally non-diverse due to the statistical dominance of major languages and dialects in online texts (English vs. North Frisian, US English vs. UK English, etc.).

2) Pre-trained neural models such as the ubiquitous pre-trained language models (PLMs) reproduce the features present in the data, including human biases.

3) Rare phenomena (or languages) in the "long tail" are often not sufficiently taken into account in model evaluation, leading to an overestimation of model performance, especially in real-world application scenarios.

4) The focus on achieving state-of-the-art results through transfer learning with giant PLMs such as GPT-4 or mT5 often leads to alternative methods that are more accessible, efficient and sustainable being undervalued.

Because these problems undermine inclusion and trust, TRAILS focuses on three main research directions to address them: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes and "trustworthy content", and (iii) robust and efficient NLP models that enable training and deployment of models for (i) and (ii). We also partially address economic inequality by aiming for more efficient models (objective (iii)), which directly translates into a lower resource/cost footprint.

Publications

  1. Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem

    Qianli Wang; Tatiana Anikina; Nils Feldhus; Simon Ostermann; Sebastian Möller; Vera Schmitt

    In: Marianna Apidianaki; Hend Al-Khalifa; Barbara Di Eugenio; Steven Schockaert (eds.). 31st International Conference on Computational Linguistics 2025. International Conference on Computational Linguistics (COLING-2025), January 19-24, Abu Dhabi, United Arab Emirates, International Conference on Computational Linguistics, 2025.

Sponsors

BMBF - Federal Ministry of Education and Research

01IW24005
