Publication
Cutting the Error by Half: Investigation of Very Deep CNN and Advanced Training Strategies for Document Image Classification
Muhammad Zeshan Afzal; Andreas Kölsch; Sheraz Ahmed; Marcus Liwicki
In: ICDAR. International Conference on Document Analysis and Recognition (ICDAR), IEEE, 2017.
Abstract
We present an exhaustive investigation of recent
Deep Learning architectures, algorithms, and strategies for the
task of document image classification to finally reduce the error
by more than half. Existing approaches, such as the DeepDocClassifier,
apply standard Convolutional Network architectures
with transfer learning from the object recognition domain. The
contribution of the paper is threefold: First, it investigates
recently introduced very deep neural network architectures
(GoogLeNet, VGG, ResNet) using transfer learning (from real
images). Second, it proposes transfer learning from a huge set
of document images, i.e. 400,000 documents. Third, it analyzes
the impact of the amount of training data (document images)
and other parameters on the classification performance. We use
two datasets, the Tobacco-3482 and the large-scale RVL-CDIP
dataset. We achieve an accuracy of 91.13% for the Tobacco-3482
dataset while earlier approaches reach only 77.6%. Thus,
a relative error reduction of more than 60% is achieved. For
the large dataset RVL-CDIP, an accuracy of 90.97% is achieved,
corresponding to a relative error reduction of 11.5%.
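
The relative error reduction quoted for Tobacco-3482 can be checked from the two accuracies given in the abstract. The following is a minimal sketch, not code from the paper; the function name `relative_error_reduction` is a hypothetical helper introduced here for illustration. It assumes error is simply 100% minus accuracy. The RVL-CDIP figure is not recomputed because the abstract does not state the earlier accuracy for that dataset.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Return the relative error reduction in percent.

    Both arguments are accuracies in percent; the error is assumed
    to be 100 minus the accuracy (hypothetical helper, not from the paper).
    """
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


# Tobacco-3482: earlier approaches 77.6%, reported result 91.13%
tobacco = relative_error_reduction(77.6, 91.13)
print(f"{tobacco:.1f}%")  # prints 60.4%
```

The error drops from 22.4% to 8.87%, which matches the stated reduction of more than 60%.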