Techniques for Improving OCR Results

Andreas Dengel; Rainer Hoch; Frank Hönes; Michael Malburg; Achim Weigel

In: P. S. P. Wang; H. Bunke (Eds.). Handbook on Character Recognition and Document Image Analysis. Pages 227-258. World Scientific Publishing Co., 1997.

Abstract

In this chapter, we give an overview of state-of-the-art techniques for improving the recognition results of OCR systems. Because of low image quality, OCR results may contain segmentation errors as well as classification errors. Such errors can often be corrected by contextual post-processing. We present the most important techniques for post-processing OCR results: voting techniques, lexical post-processing, and techniques that consider the word or document context. Voting techniques combine the recognition results of multiple OCR devices, typically without using any contextual knowledge. Other post-processing techniques correct the remaining OCR errors by drawing on various sources of contextual knowledge. Lexical post-processing, for example, uses knowledge about the valid words of a natural language. More sophisticated techniques that integrate knowledge about the word context, or even the entire document context, can further improve the quality of OCR results. Of these, incorporating knowledge about valid word sequences is the most useful. In general, post-processing considerably improves OCR accuracy when various kinds of contextual knowledge beyond the level of individual characters are used.
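As a rough illustration of the first two technique families named above, the following Python sketch combines the outputs of several OCR engines by character-level majority voting, then maps a misrecognized word to the nearest lexicon entry by Levenshtein distance. It is not the chapter's method. It assumes the engine outputs are already aligned position by position (real voting must first align the strings, since segmentation errors change their lengths), and the function names `vote`, `edit_distance`, and `lexical_correct`, the tie-breaking rule, and the `max_dist` threshold are hypothetical choices made for this example.

```python
from collections import Counter


def vote(outputs: list[str]) -> str:
    """Character-level majority vote over OCR outputs of equal length.

    Hypothetical simplification: the strings are assumed to be aligned
    position by position. No contextual knowledge is used. Ties go to the
    character produced by the engine listed earliest.
    """
    if not outputs or len({len(s) for s in outputs}) != 1:
        raise ValueError("outputs must be non-empty and equally long")
    result = []
    for column in zip(*outputs):
        counts = Counter(column)
        best = max(counts.values())
        # First character, in engine order, that reaches the top count.
        result.append(next(c for c in column if counts[c] == best))
    return "".join(result)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs for insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def lexical_correct(word: str, lexicon: set[str], max_dist: int = 2) -> str:
    """Lexical post-processing sketch: replace `word` by the closest lexicon
    entry if it lies within `max_dist` edits (hypothetical threshold),
    otherwise keep the OCR result unchanged."""
    if word in lexicon:
        return word
    # Sorting makes tie-breaking between equally close entries deterministic.
    best = min(sorted(lexicon), key=lambda w: edit_distance(word, w), default=None)
    if best is not None and edit_distance(word, best) <= max_dist:
        return best
    return word
```

For instance, `vote(["rec0gnition", "recognltion", "recognitiom"])` yields `"recognition"`, because each engine's single error is outvoted by the other two, and `lexical_correct("charactor", {"character", "document"})` returns `"character"` at distance 1. Voting here only removes errors on which the engines disagree; the lexical step can additionally fix errors that all engines share, which is why the two are typically applied in sequence.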