Publication

anyOCR: An Open-Source OCR System for Historical Archives

Syed Saqin Bukhari; Ahmad Kadi; Fahim Mir; Andreas Dengel

In: IEEE (ed.). ICDAR. International Conference on Document Analysis and Recognition (ICDAR-17), Kyoto, Japan, IEEE, 2017.

Abstract

A great deal of research is currently under way on digitizing historical archives, that is, converting scanned page images into searchable full text. anyOCR is a new OCR system that focuses on the techniques required to digitize a historical archive with high accuracy. It is an open-source system that the research community can easily apply to the digitization of historical archives. The anyOCR system can also be used for contemporary document images with diverse layouts, from simple to complex. This paper describes the current state of the anyOCR system, its architecture, and its major features. The anyOCR system supports a complete document processing pipeline, which includes layout analysis, OCR model training, and text line prediction, together with fast, interactive web-based services for correcting layout and OCR errors.

Projects

KALLIMACHOS - KALLIMACHOS

ICDAR2017_anyOCR.pdf (pdf, 1 MB)