Publication
Optical Document Security in High Volume Office Environments
Joost van Beusekom
PhD-Thesis, Fachbereich Informatik der Universität Kaiserslautern, 2010.
Abstract
The widespread use of scanning in the processing of forms in insurance companies,
banks and other high volume businesses has created new opportunities for fraud by
document tampering and modification. Such alterations can be performed either
on the paper itself, or using digital image processing and editing.
This thesis describes several methods that can be integrated into automated
mail processing systems to detect potentially fraudulent alterations of documents.
Three novel approaches are presented. The first two are model-based; the third
is an alteration detection approach based on text-line examination. Datasets for
evaluating the methods are provided as a further contribution.
Using prior information on genuine documents, the model-based approaches
use intrinsic document features for authentication of the document source and for
detecting alterations. Authentication by identifying the source uses counterfeit
protection system (CPS) codes. These permit printer class identification with accuracies
of up to 92.5% (n = 67, 16 classes). The image-based comparison of CPS
patterns provides authentication on a printer level and attains an accuracy of 88.3%
(n = 94). The second model-based approach models the positional variations of
fixed foreground components. The resulting models can effectively detect forgeries on the basis of
scanning distortions. This method shows an accuracy of 97.0% (n = 168) for
classification between forged and genuine documents. Finally, an alteration detection
approach is presented that works even in the absence of prior knowledge. Text-line
skew angles and text-line alignment are used to check the document's contents
for optical consistency. Evaluation on two-pass printed documents and manually
pasted text achieves an area under the ROC curve (AUC) of 0.89
(n = 191).
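
As an illustration of the text-line consistency idea only, the following minimal sketch flags text lines whose skew angle deviates from the page's median skew. It is not the method of the thesis: the function names (line_skew_deg, suspicious_lines), the least-squares baseline fit, the median reference and the 0.5 degree threshold are all assumptions chosen for this example, and the input is assumed to be a list of baseline points per text line.

```python
import math
from statistics import median

def line_skew_deg(points):
    """Skew angle in degrees of a least-squares line through (x, y) baseline points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # atan2 avoids a division by zero when all x values coincide
    return math.degrees(math.atan2(sxy, sxx))

def suspicious_lines(baselines, max_dev_deg=0.5):
    """Indices of text lines whose skew differs from the median skew
    by more than max_dev_deg (a hypothetical threshold)."""
    angles = [line_skew_deg(points) for points in baselines]
    reference = median(angles)
    return [i for i, angle in enumerate(angles)
            if abs(angle - reference) > max_dev_deg]

# Hypothetical page: two level text lines and one rotated by about 2 degrees
baselines = [
    [(0, 0), (100, 0), (200, 0)],
    [(0, 50), (100, 50), (200, 50)],
    [(0, 100), (100, 103.5), (200, 107)],
]
flagged = suspicious_lines(baselines)
```

A robust reference such as the median keeps a single pasted or reprinted line from shifting the estimate of the page's dominant skew.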
The operation of the alteration detection is supported by additional pre-processing:
an integrated approach for orientation and skew detection is presented, showing
competitive results of up to 98.8% (n = 979) for orientation detection accuracy
and up to 98.0% (n = 979) for the skew detection accuracy on the UW-I dataset.
Border noise removal by explicit detection of the page's content area is presented
together with a comprehensive and extensive set of evaluation procedures. Results
on public datasets show an accuracy of up to 97.2% on zone-level classification
(n = 1440). A flexible and accurate case-based reasoning approach is presented
for logical labeling, showing accuracy rates of up to 99.6% (n = 6770) on the MARG
dataset.
The methods described in this thesis suggest that authentication and alteration
detection methods can be used to build an effective filter in high volume document
digitization setups. This constitutes a first step toward a fully
automated forgery detection system.