Publication
OCAS: Ontology-Based Corpus and Annotation Scheme. Towards an OBIE Gold Standard that Contains even Implicit Facts
Alexander Grothkast; Benjamin Adrian; Kinga Schumacher; Andreas Dengel
In: Sebastian Blohm; Ulf Brefeld; Felix Jungermann; Roman Yangarber (eds.). Proceedings of the High-level Information Extraction Workshop 2008. High-level Information Extraction Workshop (HLIE-2008), located at ECML PKDD 2008, September 15-19, Antwerp, Belgium, pages 25-35, ECML PKDD 2008, 2008.
Abstract
This paper presents strategies and lessons learned from the creation of
a corpus. It proposes a gold standard for evaluating ontology-based information
extraction (OBIE) systems. This OBIE gold standard is called OCAS2008 and
consists of: (i) an OBIE layer cake for comparing OBIE systems by subtasks, (ii)
a corpus of 121 documents totaling 31,000 words from a closed domain,
(iii) a compact domain ontology including more than 40,000 instances, (iv) two
annotation scenarios that extend traditional template-based evaluations, (v) an
annotation set that contains typed annotations according to the ontology and the
OBIE layer cake, (vi) annotations covering text phrases, symbols, instances,
explicitly written facts, and implicit facts, and (vii) finally, human-created
annotations produced according to predefined specifications. We claim that the use of OCAS2008
provides a basis for comparable and significant evaluations of OBIE systems.