Publication
Structural Mixtures for Statistical Layout Analysis
Faisal Shafait; Joost van Beusekom; Daniel Keysers; Thomas Breuel
In: Proceedings of the 8th IAPR International Workshop on Document Analysis Systems. IAPR International Workshop on Document Analysis Systems (DAS-2008), September 16-19, Nara, Japan, IEEE, 2008.
Abstract
A key limitation of current layout analysis methods is that they rely on many hard-coded assumptions about document layouts and cannot adapt to new layouts for which the underlying assumptions are not satisfied. Another major drawback of these approaches is that they do not return confidence scores for their outputs. These problems pose major challenges in large-scale digitization efforts, where a large number of different layouts need to be handled and manual inspection of the results on each individual page is not feasible. This paper presents a novel statistical approach to layout analysis that aims at solving the above-mentioned problems for Manhattan layouts. The presented approach models known page layouts as a structural mixture model. A probabilistic matching algorithm is presented that gives multiple interpretations of the input layout with associated probabilities. First experiments on documents from the publicly available MARG dataset achieved an error rate below 5% for geometric layout analysis.