Publication

Layout Analysis of Arabic Script Documents

Syed Saqib Bukhari; Faisal Shafait; Thomas Breuel
In: Volker Märgner; Haikal El Abed (Eds.). Guide to OCR for Arabic Scripts. Pages 35-53, Springer, 2012.

Abstract

Layout analysis—extraction of text lines from a document image and identification of their reading order—is an important step in converting the document into a searchable electronic representation. Projection methods are typically employed for extraction of text lines in Arabic script documents. Although projection methods achieve good accuracy on clean, skew-free documents, their performance drops under challenging situations (border noise, skew, complex layouts, etc.). This chapter presents a layout analysis system for extracting text lines in reading order from scanned Arabic script document images written in different languages (Arabic, Urdu, Persian, etc.) and different styles (Naskh, Nastaliq, etc.). The presented system is based on a suitable combination of different well-established techniques for analyzing Latin script documents that have proven to be robust against different types of document image degradations.
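For orientation, the baseline projection approach mentioned in the abstract can be sketched roughly as follows. This is only an illustration of a generic horizontal projection profile, not the system presented in the chapter; the function name `find_text_lines`, the `min_ink` threshold, and the 0/1 list-of-rows image format are assumptions made for this example.

```python
def find_text_lines(image, min_ink=1):
    """Split a binarized page into text lines by horizontal projection.

    image   -- list of rows, each a list of 0 (background) or 1 (ink);
               this format is an assumption made for the sketch.
    min_ink -- hypothetical threshold: a row with fewer ink pixels than
               this is treated as a gap between lines.

    Returns a list of (top, bottom) row ranges, bottom exclusive.
    """
    # Projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None  # first row of the line currently being tracked
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                  # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))   # leaving a text line
            start = None
    if start is not None:              # line touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Tiny synthetic "page": two text lines separated by one blank row.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 3), (4, 5)]
```

The sketch also hints at why such methods degrade on difficult inputs: skew or border noise puts ink into the rows that would otherwise be empty, so the gaps in the profile disappear and neighbouring lines merge into one range.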