Publication

Optical Document Security in High Volume Office Environments

Joost van Beusekom

PhD thesis, Department of Computer Science, University of Kaiserslautern, 2010.

Abstract

The widespread use of scanning to process forms in insurance companies, banks and other high-volume businesses has created new opportunities for fraud through document tampering and modification. Such alterations can be made either on the paper itself or by digital image processing and editing. This thesis describes several methods that can be integrated into automated mail processing systems to detect potentially fraudulent alterations of documents. Three novel approaches are presented: two model-based approaches and an alteration detection approach based on text-line examination. The thesis also contributes data sets for evaluating the methods. Using prior information about genuine documents, the model-based approaches exploit intrinsic document features to authenticate the document source and to detect alterations. Source authentication relies on counterfeit protection system (CPS) codes, which allow the printer class to be identified with an accuracy of up to 92.5% (n = 67, 16 classes). Image-based comparison of CPS patterns provides authentication at the printer level and attains an accuracy of 88.3% (n = 94). The second model-based approach models the positional variations of fixed foreground components, which makes it possible to detect forgeries effectively on the basis of scanning distortions. It distinguishes forged from genuine documents with an accuracy of 97.0% (n = 168). Finally, an alteration detection approach is presented that works even in the absence of prior knowledge. It uses text-line skew angles and text-line alignment to check the document's contents for optical consistency. Evaluated on two-pass printed documents and manually pasted text, it achieves an area under the ROC curve (AUC) of 0.89 (n = 191).
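The skew part of the optical consistency check can be pictured with a minimal sketch. It assumes that text lines have already been segmented and that each line is represented by a few (x, y) reference points, for example the bottom centres of its characters; the helper names `line_skew_deg` and `flag_inconsistent_lines` and the 0.3 degree tolerance are hypothetical and not taken from the thesis. Each line's skew is estimated by a least-squares fit, and lines that deviate from the page's median skew are flagged. Text-line alignment is not covered here.

```python
import math
from statistics import median


def line_skew_deg(points):
    """Least-squares skew of one text line, in degrees.

    `points` is a list of (x, y) reference points of the line
    (hypothetical input format; needs at least two distinct x values).
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # slope = sxy / sxx; atan2 converts it to an angle
    return math.degrees(math.atan2(sxy, sxx))


def flag_inconsistent_lines(lines, max_dev_deg=0.3):
    """Indices of lines whose skew deviates from the page's median skew
    by more than `max_dev_deg` degrees (hypothetical threshold)."""
    angles = [line_skew_deg(pts) for pts in lines]
    reference = median(angles)
    return [i for i, a in enumerate(angles) if abs(a - reference) > max_dev_deg]


# Two straight lines and one line tilted by about 1.15 degrees.
lines = [
    [(0, 0), (100, 0), (200, 0)],
    [(0, 50), (100, 50), (200, 50)],
    [(0, 100), (100, 102), (200, 104)],
]
print(flag_inconsistent_lines(lines))  # → [2]
```

Using the median rather than the mean as the reference angle keeps a single deviating line from shifting the page's reference skew.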
The alteration detection is supported by additional pre-processing. An integrated approach to orientation and skew detection is presented; on the UW-I dataset it achieves competitive accuracies of up to 98.8% for orientation detection and up to 98.0% for skew detection (n = 979 in both cases). Border noise removal by explicit detection of the page's content area is presented together with a comprehensive set of evaluation procedures; results on public datasets show an accuracy of up to 97.2% for zone-level classification (n = 1440). A flexible and accurate case-based reasoning approach to logical labeling is also presented, with accuracies of up to 99.6% (n = 6770) on the MARG dataset. The results obtained with the methods described in this thesis suggest that authentication and alteration detection can be used to build an effective filter in high-volume document digitization setups. This constitutes a first step toward a fully automated forgery detection system.
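The idea of removing border noise by cropping to an explicitly detected content area can be illustrated with a deliberately simplified sketch. It assumes a binarized page given as rows of 0/1 values and swaps in a simple heuristic that is not the thesis's method: 4-connected components touching the image edge are treated as border noise, and every pixel outside the bounding box of the remaining components is cleared. The function names `content_box` and `remove_border_noise` are hypothetical.

```python
from collections import deque


def content_box(img):
    """Inclusive (top, left, bottom, right) box of all foreground components
    that do not touch the image border, or None if there are none.
    Simplified heuristic for illustration only."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    box = None
    for r0 in range(h):
        for c0 in range(w):
            if img[r0][c0] != 1 or seen[r0][c0]:
                continue
            # Breadth-first search over one 4-connected component.
            comp, touches_border = [], False
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                comp.append((r, c))
                if r in (0, h - 1) or c in (0, w - 1):
                    touches_border = True
                for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                    if 0 <= nr < h and 0 <= nc < w and img[nr][nc] == 1 and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if touches_border:
                continue  # assumed to be border noise in this sketch
            rows = [r for r, _ in comp]
            cols = [c for _, c in comp]
            cand = (min(rows), min(cols), max(rows), max(cols))
            if box is None:
                box = cand
            else:
                box = (min(box[0], cand[0]), min(box[1], cand[1]),
                       max(box[2], cand[2]), max(box[3], cand[3]))
    return box


def remove_border_noise(img):
    """Clear every pixel that lies outside the detected content box."""
    box = content_box(img)
    if box is None:
        return [[0] * len(row) for row in img]
    top, left, bottom, right = box
    return [[v if top <= r <= bottom and left <= c <= right else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(img)]


# Noise in the top-left corner and on the right edge; text in the middle.
img = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
print(content_box(img))  # → (2, 2, 3, 3)
```

This edge-touching heuristic fails when genuine content reaches the image border or when noise lies inside the content area; it only illustrates the principle of cropping to a detected content region.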