Because many automated digitization setups lack first-line inspection, forged documents have become harder to identify. The widespread availability of high-quality printing and scanning devices has further aggravated the problem by enabling even non-experts to produce convincing forgeries. Training a machine learning system for forgery detection poses several challenges, such as imbalanced classes or even the complete absence of one class, since no actual forged samples may be available to train the system.
The AnDruDok project aims to bring together research in document forensics and anomaly detection in order to identify suspicious documents in a document collection. Its main objective is to investigate unsupervised machine learning techniques for forgery detection in document images. In particular, approaches based on modeling class distributions will be investigated to develop algorithms that detect forged documents as outliers within the document collection.
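To make the idea of "forgeries as outliers of a modeled distribution" concrete, here is a minimal sketch under stated assumptions; it is not the project's method, which the description above does not specify. It assumes each document image has already been reduced to a numeric feature vector (the feature extraction is left out), that the unlabeled collection is dominated by genuine documents, and it uses a diagonal-covariance Gaussian as one simple choice of distribution model. The class name `DiagonalGaussianOutlierDetector` and the `quantile` parameter are hypothetical. Documents whose squared Mahalanobis distance from the fitted distribution exceeds an empirical quantile of the training scores are flagged as suspicious.

```python
import math
from typing import List, Sequence


class DiagonalGaussianOutlierDetector:
    """Hypothetical illustration: model document feature vectors with an
    axis-aligned Gaussian and flag low-likelihood documents as outliers."""

    def __init__(self, quantile: float = 0.95, eps: float = 1e-9) -> None:
        self.quantile = quantile  # fraction of training scores treated as "normal"
        self.eps = eps            # keeps variances strictly positive

    def fit(self, X: Sequence[Sequence[float]]) -> "DiagonalGaussianOutlierDetector":
        n, d = len(X), len(X[0])
        # Per-feature mean and (population) variance of the unlabeled collection.
        self.mean = [sum(row[j] for row in X) / n for j in range(d)]
        self.var = [
            sum((row[j] - self.mean[j]) ** 2 for row in X) / n + self.eps
            for j in range(d)
        ]
        # Threshold = empirical quantile of the training scores.
        scores = sorted(self.score(row) for row in X)
        idx = min(n - 1, max(0, math.ceil(self.quantile * n) - 1))
        self.threshold = scores[idx]
        return self

    def score(self, x: Sequence[float]) -> float:
        # Squared Mahalanobis distance under a diagonal covariance; larger
        # values correspond to lower likelihood under the fitted Gaussian.
        return sum((xi - m) ** 2 / v for xi, m, v in zip(x, self.mean, self.var))

    def predict(self, X: Sequence[Sequence[float]]) -> List[bool]:
        # True marks a document as a suspected forgery (an outlier).
        return [self.score(x) > self.threshold for x in X]


if __name__ == "__main__":
    # Made-up 2-D feature vectors standing in for extracted document features.
    collection = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [1.1, 2.1], [0.9, 1.9]]
    detector = DiagonalGaussianOutlierDetector().fit(collection)
    print(detector.predict([[5.0, 9.0], [1.0, 2.0]]))  # [True, False]
```

A diagonal Gaussian keeps the sketch dependency-free and easy to inspect; a real system would more likely use richer density models (full covariance, mixtures, or learned representations), but the decision rule, flagging documents that fall in low-density regions of the modeled distribution, stays the same.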