Suspicious Sentence Detection and Claim Verification in the COVID-19 Domain

Elitsa Pankovska, Konstantin Schulz, Georg Rehm

In: Marinella Petrocchi, Marco Viviani (eds.). Advances in Information Retrieval: 44th European Conference on IR Research, ECIR 2022, Proceedings, Part II. Workshop on Reducing Online Misinformation Through Credible Information Retrieval (ROMCIR-2022), located at the European Conference on Information Retrieval (ECIR) 2022, April 1-1, Stavanger, Norway. Lecture Notes in Computer Science (LNCS) 13186, CEUR-WS, 2022.


The processing, identification and fact-checking of online information has received a lot of attention recently. One of the challenges is that scandalous or "blown up" news tends to go viral, even when it comes from unreliable sources. Particularly during a global pandemic, it is crucial to find efficient ways of determining the credibility of information. Fact-checking initiatives such as Snopes perform manual claim validation, but they are unable to cover all suspicious claims that can be found online; they focus mainly on the ones that have gone viral. Similarly, the general user cannot fact-check every single statement on a specific topic. While a lot of research has been carried out on both claim verification and fact-check-worthiness, little work has been done so far on the detection and extraction of dubious claims combined with fact-checking them against external knowledge bases, especially in the COVID-19 domain. Our approach involves a two-step claim verification procedure consisting of a fake news detection task in the form of binary sequence classification, followed by fact-checking using the Google Fact Check Tools. We primarily work on medium-sized documents in the English language. Our prototype is able to recognize, on a higher level, the nature of fake news, even when it is hidden in a text that seems credible at first glance. This way we can alert the reader that a document contains suspicious statements, even if no similar, already validated claims exist. For more popular claims, however, multiple results are found and displayed. We achieve an F1 score of 98.03% and an accuracy of 98.1% in the binary fake news detection task using a fine-tuned DistilBERT model.
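
The two-step procedure described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation. The helper names (`split_sentences`, `build_fact_check_url`, `search_fact_checks`, `verify_document`) are hypothetical, the sentence splitter is deliberately naive, and the `is_suspicious` callable merely stands in for the fine-tuned DistilBERT classifier, which is not reproduced here. The lookup step assumes the public `claims:search` endpoint of the Google Fact Check Tools API and a valid API key. Whether the prototype queries the service per sentence is also an assumption.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Callable, Dict, List

# Public claim search endpoint of the Google Fact Check Tools API.
FACT_CHECK_ENDPOINT = "https://factchecktools.googleapis.com/v1alpha1/claims:search"


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_fact_check_url(claim: str, api_key: str, language: str = "en") -> str:
    """Build the claim search request URL for one candidate claim."""
    params = {"query": claim, "languageCode": language, "key": api_key}
    return FACT_CHECK_ENDPOINT + "?" + urllib.parse.urlencode(params)


def search_fact_checks(claim: str, api_key: str) -> List[Dict]:
    """Query the API (needs network access and a valid key)."""
    url = build_fact_check_url(claim, api_key)
    with urllib.request.urlopen(url, timeout=10) as response:
        # The response carries matching fact checks under "claims", if any.
        return json.load(response).get("claims", [])


def verify_document(
    text: str,
    is_suspicious: Callable[[str], bool],
    lookup: Callable[[str], List[Dict]],
) -> List[Dict]:
    """Step 1: flag suspicious sentences. Step 2: look up existing fact checks."""
    report = []
    for sentence in split_sentences(text):
        if is_suspicious(sentence):  # stand-in for the binary classifier
            report.append({"sentence": sentence, "fact_checks": lookup(sentence)})
    return report
```

A sentence flagged in step 1 stays in the report even when the lookup returns an empty list, which mirrors the behaviour described above: the reader is alerted to suspicious statements even if no already validated similar claims exist.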



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence