The project TAKE aims to adapt, develop and utilize a range of language and knowledge technologies for the gradual automatic extraction of knowledge from the World Wide Web. Rule-based and statistical methods for language processing will be combined to systematically extend a body of formalized knowledge.
The central technology for this endeavor is advanced, semantically driven information extraction, especially relation extraction, i.e., the detection of instances of semantic relations in large volumes of text. The relevant relations may belong to several classes, such as facts, definitions, events, citations and opinions.
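To make "detecting instances of semantic relations" concrete, here is a minimal sketch of the rule-based side only. It assumes a single hand-written surface pattern for a hypothetical biographical `born_in` relation. The pattern, the `Relation` type and `extract_born_in` are illustrative inventions, not part of TAKE, and a realistic system would rely on linguistic analysis and statistical models rather than one regular expression.

```python
import re
from typing import NamedTuple


class Relation(NamedTuple):
    """A detected instance of a binary semantic relation (hypothetical type)."""
    subject: str
    relation: str
    obj: str


# Hypothetical surface pattern: "<Capitalized Name> was born in <Capitalized Place>".
BORN_IN = re.compile(
    r"(?P<person>[A-Z][a-z]+(?: [A-Z][a-z]+)*) was born in (?P<place>[A-Z][a-z]+)"
)


def extract_born_in(text: str) -> list[Relation]:
    """Return every born_in instance that the single pattern matches in the text."""
    return [
        Relation(m.group("person"), "born_in", m.group("place"))
        for m in BORN_IN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Albert Einstein was born in Ulm. Marie Curie was born in Warsaw."
    for rel in extract_born_in(sample):
        print(rel)
```

Each match yields one relation instance. Those instances are the kind of output that could be added to a body of formalized knowledge.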
In TAKE, information extraction is viewed not as a pragmatic shortcut for getting at least something out of natural language texts but as a method for approaching the unsolved problem of text understanding gradually, in a systematic and controlled way.
Existing bodies of formalized linguistic knowledge, such as lexicons, morphologies and grammars, will be utilized, as will tools for statistical processing.
The developed methods, architectures and systems will be tested and demonstrated in two knowledge domains:
- scientific/technological literature in a selected field of research, i.e., language technology, and
- general biographical texts.
TAKE is funded under contract 01IW08003.