Publication
Interactive Topic Graph Extraction and Exploration of Web Content
Günter Neumann; Sven Schmeier
In: T. Poibeau; H. Saggion; J. Piskorski; R. Yangarber. Multi-source, Multilingual Information Extraction and Summarization. Pages 1-24, Theory and Applications of Natural Language Processing, ISBN 978-3-642-28568-4, Springer, 6/2012.
Abstract
We present an approach that uses interactive topic graph extraction
for the exploration of web content. The initial information request, in the
form of a query topic description, is issued online by a user to the system. The
topic graph is then constructed from N web snippets that are produced by a standard
search engine. We consider the extraction of a topic graph to be a specific empirical
collocation extraction task, where collocations are extracted between chunks.
Our measure of association strength is based on the pointwise mutual information
between chunk pairs which explicitly takes their distance into account. This topic
graph can then be further analyzed by users so that they can request additional background
information with the help of interesting nodes and pairs of nodes in the topic
graph, e.g., explicit relationships extracted from Wikipedia or those automatically
extracted from additional web content, as well as conceptual information about the topic
in the form of semantically oriented clusters of descriptive phrases. This information is
presented to the users, who can investigate the identified information nuggets to refine
their information search. An initial user evaluation shows that our approach is
especially helpful for finding new interesting information on topics about which the
user has only a vague idea or no idea at all.
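
The association measure described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below is an assumption rather than the authors' method: it treats each snippet as an already chunked list of strings, counts ordered chunk pairs per distance inside a small window, and scores each pair with a plain pointwise mutual information, keeping the distance at which the score is highest. The names `topic_graph`, `max_dist`, and `min_count` are hypothetical.

```python
import math
from collections import Counter


def topic_graph(snippets, max_dist=3, min_count=1):
    """Build a toy topic graph from chunked snippets.

    snippets: list of snippets, each a list of chunk strings (assumed to be
              produced by an upstream chunker that is not shown here).
    Returns a dict mapping (left_chunk, right_chunk) -> (pmi, distance).
    """
    chunk_counts = Counter()
    pair_counts = Counter()  # keyed by (left, right, distance)

    for chunks in snippets:
        chunk_counts.update(chunks)
        for i, left in enumerate(chunks):
            # Count co-occurrences separately for each distance in the window.
            for d in range(1, max_dist + 1):
                if i + d < len(chunks):
                    pair_counts[(left, chunks[i + d], d)] += 1

    total_chunks = sum(chunk_counts.values())
    total_pairs = sum(pair_counts.values())

    edges = {}
    for (left, right, d), n in pair_counts.items():
        if n < min_count:
            continue
        p_pair = n / total_pairs
        p_left = chunk_counts[left] / total_chunks
        p_right = chunk_counts[right] / total_chunks
        pmi = math.log2(p_pair / (p_left * p_right))
        # Assumption: keep the distance that yields the strongest association.
        key = (left, right)
        if key not in edges or pmi > edges[key][0]:
            edges[key] = (pmi, d)
    return edges


if __name__ == "__main__":
    demo = [
        ["jim clark", "netscape", "1994"],
        ["jim clark", "netscape"],
    ]
    for (left, right), (pmi, d) in sorted(topic_graph(demo).items()):
        print(f"{left} -> {right}: pmi={pmi:.3f}, distance={d}")
```

In this sketch the nodes of the graph are the chunks and each edge carries its association strength together with the distance at which it was observed; a real system would also need the snippet retrieval and chunking steps, which are left out here.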