Project | TRAILS

Duration: 08/01/2024 - 07/31/2027

Trustworthy and Inclusive Machines

Research Topics

Language & Text Understanding

Application fields

Other

Natural language processing (NLP) has demonstrated impressive performance in some human tasks. To achieve such performance, current neural models need to be pre-trained on huge amounts of raw text data. This dependence on uncurated data has at least four indirect and unintended consequences that are relevant to our proposal:

1) Uncurated data tends to be linguistically and culturally non-diverse due to the statistical dominance of major languages and dialects in online texts (English vs. North Frisian, US English vs. UK English, etc.).

2) Pre-trained neural models such as the ubiquitous pre-trained language models (PLMs) reproduce the features present in the data, including human biases.

3) Rare phenomena (or languages) in the "long tail" are often not sufficiently taken into account in model evaluation, leading to an overestimation of model performance, especially in real-world application scenarios.

4) The focus on achieving state-of-the-art results through transfer learning with giant PLMs such as GPT-4 or mT5 often undervalues alternative methods that are more accessible, efficient and sustainable.

Because these problems undermine inclusion and trust, TRAILS focuses on three main research directions: (i) inclusion of underrepresented languages and cultures through multilingual and culturally sensitive NLP, (ii) robustness and fairness with respect to long-tail phenomena and classes, as well as "trustworthy content", and (iii) robust and efficient NLP models that enable training and deployment of models for (i) and (ii). By aiming for more efficient models (objective (iii)), we also partially address economic inequality, since efficiency translates directly into a lower resource and cost footprint.

Contact Person

Dr.-Ing. Leonhard Hennig

Leonhard.Hennig@dfki.de
Phone: +49 30 23895 1821

Dr. Simon Ostermann

Simon.Ostermann@dfki.de
Phone: +49 681 85775 5310

Keyfacts

Involved research areas

Head

Prof. Dr. Josef van Genabith

Publications

Building Common Ground in Dialogue: A Survey
Tatiana Anikina; Alina Leippert; Simon Ostermann
In: Proceedings of the Second LUHME Workshop. Workshop on Language Understanding in the Human-Machine Era (LUHME-2025), located at ECAI-2025, October 26, Bologna, Italy, Association for Computational Linguistics, 10/2025.

dfkinit2b at CheckThat! 2025: Leveraging LLMs and Ensemble of Methods for Multilingual Claim Normalization
Tatiana Anikina; Ivan Vykopal; Sebastian Kula; Ravi Kiran Chikkala; Natalia Skachkova; Jing Yang; Veronika Solopova; Vera Schmitt; Simon Ostermann
In: CLEF 2025 Working Notes. Conference and Labs of the Evaluation Forum (CLEF-2025), Information Access Evaluation meets Multilinguality, Multimodality, and Visualization, September 9-12, Madrid, Spain, CEUR Workshop Proceedings, 9/2025.

Cross-Lingual Fact Verification: Analyzing LLMs Performance Patterns Across Languages
Hanna Shcharbakova; Tatiana Anikina; Natalia Skachkova; Josef van Genabith
In: Recent Advances in Natural Language Processing. International Conference on Recent Advances in Natural Language Processing (RANLP-2025), September 8-10, Varna, Bulgaria, Association for Computational Linguistics, 9/2025.

Funding Authorities

BMBF - Federal Ministry of Education and Research

01IW24005
