Publication
Crowdsourcing versus the laboratory: towards crowd-based linguistic text quality assessment of query-based extractive summarization
Neslihan Iskender; Tim Polzehl; Sebastian Möller
In: Adrian Paschke; Clemens Neudecker; Georg Rehm; Jamal Al Qundus; Lydia Pintscher (eds.). Proceedings of the Conference on Digital Curation Technologies (Qurator 2020). Conference on Digital Curation Technologies (QURATOR-2020), January 20-21, Berlin, Germany, Pages 1-16, CEUR, 2020.
Abstract
Curating text manually in order to improve the quality of automatic natural language processing tools can become very time-consuming and expensive. Especially in the case of query-based extractive online forum summarization, curating complex information spread across multiple posts from multiple forum members to create a short meta-summary that answers a given query is a very challenging task. To overcome this challenge, we explore the applicability of microtask crowdsourcing as a fast and cheap alternative for query-based extractive text summarization of online forum discussions. We assess the linguistic quality of crowd-based forum summarizations, an evaluation that is usually conducted in a traditional laboratory environment with the help of experts, via comparative crowdsourcing and laboratory experiments. To our knowledge, no other study has considered query-based extractive text summarization and summary quality evaluation as an application area of microtask crowdsourcing. By conducting experiments in both crowdsourcing and laboratory environments and comparing the resulting linguistic quality judgments, we found that microtask crowdsourcing shows high applicability for determining the factors overall quality, grammaticality, non-redundancy, referential clarity, focus, and structure & coherence. Further, our comparison of these findings with a preliminary set of expert annotations suggests that the crowd assessments can reach results comparable to those of experts, specifically when determining the mean values of factors such as overall quality and structure & coherence. Finally, preliminary analyses reveal a high correlation between the crowd and expert ratings when assessing low-quality summaries.