Publication
anyOCR: An Open-Source OCR System for Historical Archives
Syed Saqib Bukhari; Ahmad Kadi; Fahim Mir; Andreas Dengel
In: IEEE (ed.). ICDAR. International Conference on Document Analysis and Recognition (ICDAR-17), Kyoto, Japan, IEEE, 2017.
Abstract
A considerable amount of research is currently under way on digitizing historical archives, that is, converting scanned page images into searchable full text. anyOCR is a new OCR system that emphasizes the techniques required for digitizing a historical archive with high accuracy. It is an open-source system that the research community can easily apply to the digitization of a historical archive. The anyOCR system can also be used for contemporary document images with diverse layouts, from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. The anyOCR system supports a complete document processing pipeline, which includes layout analysis, OCR model training, and text line prediction, along with fast, interactive web-based services for correcting layout and OCR errors.