Publication
A Hybrid Approach and Unified Framework for Bibliographic Reference Extraction
Syed Tahseen Raza Rizvi; Andreas Dengel; Sheraz Ahmed
In: IEEE Access, Vol. 8, Pages 217231-217245, IEEE, 12/2020.
Abstract
Publications are an integral part of a scientific community. Bibliographic reference extraction from scientific publications is a challenging task due to diversity in referencing styles and document layouts. Existing methods perform sufficiently well on one dataset; however, applying these solutions to a different dataset proves to be challenging. Therefore, a generic solution was needed that could overcome the limitations of the previous approaches. The contribution of this paper is three-fold. First, it presents a novel approach called DeepBiRD, which is inspired by human visual perception and exploits layout features to identify individual references in a scientific publication. Second, we release a large dataset for image-based reference detection with 2401 scans containing 38863 references, all manually annotated for individual references. Third, we present a unified and highly configurable end-to-end automatic bibliographic reference extraction framework called BRExSys, which employs DeepBiRD along with state-of-the-art text-based models to detect and visualize references from a bibliographic document. Our proposed approach first pre-processes the images, obtaining a hybrid representation by processing the given image with different computer vision techniques. Then, it performs layout-driven reference detection using Mask R-CNN on a given scientific publication. DeepBiRD was evaluated on two different datasets to demonstrate the generalization of this approach. The proposed system achieved an AP50 of 98.56% on our dataset. DeepBiRD also significantly outperformed the current state-of-the-art approach on that approach's own dataset. This suggests that DeepBiRD is significantly superior in performance, generalized, and independent of any domain or referencing style.
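The AP50 figure reported above relies on an intersection-over-union (IoU) threshold of 0.5: a detected reference counts as correct only if it overlaps a ground-truth reference by at least that much. The sketch below illustrates that matching criterion only. It is not the authors' evaluation code; it assumes axis-aligned bounding boxes in `(x1, y1, x2, y2)` form, whereas the paper's evaluation may use masks, and the function names `box_iou` and `is_match_at_50` are hypothetical.

```python
def box_iou(a, b):
    """Return the IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred, truth):
    """Assumed AP50-style criterion: a prediction matches if IoU >= 0.5."""
    return box_iou(pred, truth) >= 0.5
```

For example, a predicted box that covers the top 80% of a ground-truth reference has an IoU of 0.8 and would count as a match, while a box shifted sideways by half its width has an IoU of about 0.33 and would not.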