Publication
Suspicious Sentence Detection and Claim Verification in the COVID-19 Domain
Elitsa Pankovska; Konstantin Schulz; Georg Rehm
In: Marinella Petrocchi; Marco Viviani (Eds.). Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Proceedings, Part II. Workshop on Reducing Online Misinformation Through Credible Information Retrieval (ROMCIR-2022), located at European Conference on Information Retrieval (ECIR) 2022, April 1, Stavanger, Norway, Lecture Notes in Computer Science (LNCS), Vol. 13186, CEUR-WS, 2022.
Abstract
The processing, identification and fact-checking of online information has received a lot of attention recently. One of the challenges is that scandalous or “blown up” news tends to go viral, even when it comes from unreliable sources. Particularly during a global pandemic, it is crucial to find efficient ways of determining the credibility of information. Fact-checking initiatives such as Snopes and FactCheck.org perform manual claim validation, but they cannot cover all suspicious claims found online and focus mainly on those that have gone viral. Similarly, it is impossible for the general user to fact-check every single statement on a specific topic. While a lot of research has been carried out on both claim verification and fact-check-worthiness, little work has been done so far on detecting and extracting dubious claims and then fact-checking them against external knowledge bases, especially in the COVID-19 domain. Our approach is a two-step claim verification procedure: a fake news detection task in the form of binary sequence classification, followed by fact-checking using the Google Fact Check Tools. We primarily work on medium-sized documents in English. Our prototype is able to recognize, on a higher level, the nature of fake news, even when it is hidden in a text that seems credible at first glance. This way we can alert the reader that a document contains suspicious statements, even if no similar, already validated claims exist. For more popular claims, however, multiple results are found and displayed. We achieve an F1 score of 98.03% and an accuracy of 98.1% in the binary fake news detection task using a fine-tuned DistilBERT model.
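The two-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sentence splitter, the function names (`split_sentences`, `build_fact_check_url`, `verify_document`) and the idea of passing the classifier and the fact-check lookup in as callables are assumptions made for illustration. In the paper's setting, `is_suspicious` would presumably wrap the fine-tuned DistilBERT classifier and `search_claims` would send the built URL to the Google Fact Check Tools claim search endpoint.

```python
import re
from typing import Callable, Dict, List
from urllib.parse import urlencode

# Claim search endpoint of the Google Fact Check Tools API.
FACT_CHECK_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def split_sentences(text: str) -> List[str]:
    """Naive sentence splitter (assumption: the paper may segment text differently)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def build_fact_check_url(claim: str, api_key: str, language: str = "en") -> str:
    """Build a claim search request URL for one suspicious sentence."""
    params = {"query": claim, "languageCode": language, "key": api_key}
    return f"{FACT_CHECK_ENDPOINT}?{urlencode(params)}"


def verify_document(
    text: str,
    is_suspicious: Callable[[str], bool],
    search_claims: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Step 1: flag suspicious sentences. Step 2: look up existing fact checks.

    Both callables are hypothetical stand-ins: `is_suspicious` for a binary
    sequence classifier, `search_claims` for a fact-check lookup.
    """
    report = []
    for sentence in split_sentences(text):
        if not is_suspicious(sentence):
            continue
        # A flagged sentence is reported even when the lookup returns nothing,
        # so the reader is alerted although no validated similar claim exists.
        report.append({"sentence": sentence, "fact_checks": search_claims(sentence)})
    return report
```

Keeping the classifier and the lookup as injected callables means the control flow can be exercised with simple stubs, without downloading a model or using an API key; a flagged sentence with an empty `fact_checks` list corresponds to the case of a suspicious statement for which no validated claim was found.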