Publication
Numbers Don't Lie: Hybrid Extraction and Validation of Quantitative Statements in Arguments with Semi-structured Information
Mirko Lenz; Lorik Dumani; Ralf Schenkel; Ralph Bergmann
In: Tanya Braun; Benjamin Paaßen; Frieder Stolzenburg (Eds.). KI 2025: Advances in Artificial Intelligence. German Conference on Artificial Intelligence (KI-2025), Potsdam, Germany, Pages 77-90, Lecture Notes in Computer Science (LNCS), Vol. 15956, ISBN 978-3-032-02813-6, Springer Nature Switzerland, Cham, 2026.
Abstract
Evidence in arguments may be stated in various forms, including quantitative statements (i.e., numerical relations between entities). This measurable information can be validated against reliable sources like Wikipedia to combat the spread of misinformation. In this paper, we propose a four-step pipeline that combines rule-based techniques with prompting strategies for generative language models in a hybrid fashion. We use regular expressions to identify candidates in claim-premise structures, extract statements using GPT-4o, augment the data with tables from Wikipedia, and validate statements through retrieval-augmented generation (RAG). The pipeline is evaluated on two existing argumentation corpora and the generated dataset is manually annotated to assess the quality of our predictions, showing promising results for extraction and mixed results for validation. Our code and data are available to foster further research in this area.
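The first pipeline step, using regular expressions to find candidate sentences, might look roughly like the sketch below. It is an illustration only: the pattern `NUMBER_PATTERN` and the helper `find_candidates` are hypothetical names and are not taken from the paper or its released code, and the actual expressions used by the authors may differ considerably.

```python
import re

# Hypothetical pattern (an assumption, not the authors' expression):
# an integer or decimal number, optionally followed by a percent sign
# or a common magnitude word.
NUMBER_PATTERN = re.compile(
    r"\b\d+(?:[.,]\d+)*(?:\s?%|\s(?:percent|thousand|million|billion))?",
    re.IGNORECASE,
)


def find_candidates(sentences):
    """Return (sentence, matches) pairs for sentences with a numeric expression."""
    candidates = []
    for sentence in sentences:
        matches = [m.group(0) for m in NUMBER_PATTERN.finditer(sentence)]
        if matches:
            candidates.append((sentence, matches))
    return candidates


if __name__ == "__main__":
    premises = [
        "Germany has 83 million inhabitants.",
        "Cats are better than dogs.",
        "Unemployment fell to 5.2% last year.",
    ]
    for sentence, matches in find_candidates(premises):
        print(matches, "<-", sentence)
```

A cheap rule-based filter of this kind would keep sentences without any numbers from being sent to the language model, which is one plausible motivation for the hybrid design described above.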