Investigating Attention Mechanism for Page Object Detection in Document Images

Shivam Naik; Khurram Azeem Hashmi; Alain Pagani; Marcus Liwicki; Didier Stricker; Muhammad Zeshan Afzal

In: Applied Sciences (MDPI), Vol. 12, No. 15, Pages 3390-3408, MDPI, Switzerland, 7/2022.


Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables and figures. This paper addresses the localization problem and classification of graphical objects that visually summarize vital information in documents. Furthermore, this work examines the benefit of incorporating attention mechanisms in different object detection networks to perform page object detection on scanned document images. The model is designed with a Pytorch-based framework called Detectron2. The proposed pipelines can be optimized end-to-end and exhaustively evaluated on publicly available datasets such as DocBank, PublayNet, and IIIT-AR-13K. The achieved results reflect the effectiveness of incorporating the attention mechanism for page object detection in documents.


Weitere Links

applsci-12-07486.pdf (pdf, 10 MB )

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz