Publication
DeepDeSRT: Deep Learning for Detection and Structure Recognition of Tables in Document Images
Sebastian Schreiber; Sheraz Ahmed; Stefan Agne; Ivo Wolf; Andreas Dengel
In: ICDAR. International Conference on Document Analysis and Recognition (ICDAR-2017), IEEE, 2017.
Abstract
This paper presents a novel end-to-end system for
table understanding in document images called DeepDeSRT. In
particular, the contribution of DeepDeSRT is two-fold. First, it
presents a deep learning-based solution for table detection in
document images. Second, it proposes a novel deep learning-based
approach for table structure recognition, i.e. identifying
rows, columns, and cell positions in the detected tables. In
contrast to existing rule-based methods, which rely on heuristics
or additional PDF metadata (for example, print instructions,
character bounding boxes, or line segments), the presented
system is data-driven and does not need any heuristics or
metadata to detect and recognize tabular structures
in document images. Furthermore, in contrast to most existing
table detection and structure recognition methods, which are
applicable only to PDFs, DeepDeSRT processes document images,
which makes it equally suitable for born-digital PDFs (as they can
be converted into images automatically) and for harder cases
such as scanned documents. To gauge the performance of
DeepDeSRT, the system is evaluated on the publicly available
ICDAR 2013 table competition dataset containing 67 documents
with 238 pages overall. Evaluation results reveal that DeepDeSRT
outperforms state-of-the-art methods for table detection and
structure recognition and achieves F1-measures of 96.77% and
91.44% for table detection and structure recognition, respectively.
Additionally, DeepDeSRT is evaluated on a closed dataset from a
real use case of a major European aviation company comprising
documents which are highly unlike those in ICDAR 2013. Tested
on a randomly selected sample from this dataset, DeepDeSRT
achieves high detection accuracy for tables, which demonstrates
the sound generalization capabilities of our system.
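
The F1-measures quoted in the abstract combine precision and recall as their harmonic mean. The following is a minimal sketch of that formula only; the function name `f1_measure` and the counts in the usage note are hypothetical illustrations, not values or code from the paper, and the ICDAR 2013 competition's exact matching and averaging protocol is not reproduced here.

```python
def f1_measure(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall.

    Hypothetical helper for illustration; returns 0.0 when precision or
    recall is undefined (no predictions or no ground-truth items).
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0

    precision = true_positives / predicted  # share of predictions that are correct
    recall = true_positives / actual        # share of ground truth that was found
    if precision + recall == 0:
        return 0.0

    return 2 * precision * recall / (precision + recall)
```

For instance, with made-up counts of 90 correct detections, 10 spurious detections, and 10 missed tables, `f1_measure(90, 10, 10)` gives precision and recall of 0.9 each, and therefore an F1 of about 0.9.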