CORA4NLP

Project

Co(n)textual Reasoning and Adaptation for Natural Language Processing

Duration:
10/01/2020 - 09/30/2023

Research Topics
Machine Learning & Deep Learning
Language & Text Understanding

Application fields
Other

Language is implicit: it omits information. Filling this information gap requires contextual inference, background and commonsense knowledge, and reasoning over situational context. Language also evolves, i.e., it specializes and changes over time. For example, many different languages and domains exist, new domains arise, and both evolve constantly. Thus, language understanding also requires continuous and efficient adaptation to new languages and domains, and transfer to, and between, both. Current language understanding technology, however, focuses on high-resource languages and domains, uses little to no context, and assumes static data, task, and target distributions.

The research in Cora4NLP aims to address these challenges. It builds on the expertise and results of the predecessor project DEEPLEE and is carried out jointly by the language technology research departments in Berlin and Saarbrücken. Specifically, our goal is to develop natural language understanding methods that enable:

reasoning over broader co-texts and contexts;
efficient adaptation to novel and/or low-resource contexts;
continuous adaptation to, and generalization over, evolving contexts.

To achieve this, we pursue the following research directions:

memory- and language model-augmented few- and zero-shot learning;
self- and weakly-supervised pre-training for low-resource domains and long-tail classes;
multi-lingual, intra- and inter-document, and dialogue context representations;
integration of structured domain knowledge, background- and commonsense knowledge;
continual learning for open-domain and supervised tasks;
multi-hop contextual reasoning.

The resulting methods will be applied in the context of various natural language understanding tasks, such as information extraction, question answering, machine translation, and dialogue.

Keyfacts

Involved research areas
Speech and Language Technology,
Multilinguality and Language Technology
Head
Dr.-Ing. Leonhard Hennig

Publications about the project

Multilingual coreference resolution: Adapt and Generate

Tatiana Anikina; Natalia Skachkova; Anna Mokhova

In: Zdeněk Žabokrtský; Maciej Ogrodniczuk (eds.). Proceedings of the CRAC 2023 Shared Task on Multilingual Coreference Resolution. Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC-2023), located at EMNLP 2023, December 6-7, Singapore, Singapore, Pages 19-33, Association for Computational Linguistics, 12/2023.

AutoQIR: Auto-Encoding Questions with Retrieval Augmented Decoding for Unsupervised Passage Retrieval and Zero-shot Question Generation

Stalin Varanasi; Muhammad Umer Butt; Günter Neumann

In: Large Language Models for Natural Language Processing. International Conference on Recent Advances in Natural Language Processing (RANLP-2023), September 4-6, Varna, Bulgaria, Pages 1171-1179, ISBN 978-954-452-092-2, INCOMA Ltd., Shoumen, Bulgaria, 9/2023.

MultiTACRED: A Multilingual Version of the TAC Relation Extraction Dataset

Leonhard Hennig; Philippe Thomas; Sebastian Möller

In: Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Annual Meeting of the Association for Computational Linguistics (ACL-2023), July 9-14, Online and Toronto, Canada, Pages 3785-3801, Association for Computational Linguistics, 7/2023.

Sponsors

BMBF - Federal Ministry of Education and Research

01IW20010

Dr. Simon Ostermann