Publication

Inducing a Computational Lexicon from a Corpus with Syntactic and Semantic Information

Dennis Spohr; Aljoscha Burchardt; Sebastian Pado; Anette Frank; Ulrich Heid
In: Proceedings of the 7th IWCS 2007. International Workshop on Computational Semantics (IWCS-7), January 10-12, Tilburg, Netherlands, 2007.

Abstract

To date, linguistically annotated corpora are mainly exploited for feature-based training of automatic labelling systems. In this paper, we present a general approach for the Description Logics-based modelling of multi-layered annotated corpora that offers (i) flexible and enhanced querying functionality that goes beyond current XML-based query languages, (ii) a basis for consistency checking, and (iii) a general method for defining abstractions over corpus annotations. We apply this method to the syntactically and semantically annotated SALSA/TIGER corpus. By defining abstractions over the corpus data, we generalise from a large set of individual corpus annotations to a corresponding lexicon model. We discuss issues arising from the modelling and formalisation of multi-layered corpus annotations in Description Logics and illustrate the benefits of our approach with concrete examples.