
Publication

Towards a Novel Classification of Table Types in Scholarly Publications

Jilin He; Ekaterina Borisova; Georg Rehm
In: Georg Rehm; Sonja Schimmler; Stefan Dietze; Frank Krüger (Eds.). Proceedings of the Workshop on Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024). Extended Semantic Web Conference (ESWC-2024), May 26-30, Hersonissos, Greece, Springer series Lecture Notes in Artificial Intelligence (LNAI), 5/2024.

Abstract

Tables are one of the most prevalent means of organising and representing structured data. They contain a wealth of valuable information that is challenging to extract automatically, yet can be leveraged for downstream tasks such as question answering and knowledge base construction. Table Type Classification (TTC) is one of the tasks that contribute to a better semantic understanding of tabular data and to the extraction of knowledge from it. While multiple classification schemas exist, almost all of them focus on web tables. As a result, these classifications may overlook table types that are common in other domains, such as scientific research. This paper addresses this gap by introducing ten novel TTC taxonomies tailored to tables used in scholarly publications. We also evaluate whether taxonomies derived from web tables are applicable to scientific tables. Additionally, we propose a new dataset of 13,000 annotated table images, called TD4CLTabs. Our results indicate that both the existing and the newly proposed taxonomies are suitable and effective for classifying scientific tables.

Projects