Publication
Comparative Study between Traditional Machine Learning and Deep Learning Approaches for Text Classification
Cannannore Nidhi Narayana Kamath; Syed Saqib Bukhari; Andreas Dengel
In: DocEng. ACM Symposium on Document Engineering (DocEng-2018), August 28-31, Halifax, Nova Scotia, Canada, ACM, 2018.
Abstract
Any organization that works with documents must eventually face
the tedious task of classifying large volumes of them, which is
an early step toward information retrieval and data mining.
Classifying such large document collections into multiple classes
takes considerable time and labor. Hence, a system that could
classify these documents with acceptable accuracy would be of
great help in document engineering. We have
created multiple classifiers for document classification and
compared their accuracy on raw and processed data. We gathered
data used in a corporate organization as well as publicly available
data for comparison. The data is processed by removing stop-words
and applying stemming to reduce words to their root forms. Several
traditional machine learning techniques, namely Naive Bayes,
Logistic Regression, Support Vector Machine, Random Forest
Classifier, and Multi-Layer Perceptron, are used to classify the
documents.
Classifiers are applied to raw and processed data separately and
their accuracy is noted. In addition, a deep learning technique,
a Convolutional Neural Network, is used to classify the data, and
its accuracy is compared with that of the traditional machine
learning techniques. We are also exploring hierarchical classifiers
for classifying classes and subclasses. The system classifies
the data faster and with better accuracy than manual classification.
The results are discussed in the results and evaluation section.
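
The comparison of raw and processed data described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The stop-word list, the suffix-stripping `toy_stem` (a crude stand-in for a real stemmer such as Porter's), the four-document toy corpus, and all function names are assumptions made for illustration, and a hand-written multinomial Naive Bayes stands in for the full set of classifiers listed in the abstract.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "for", "to",
              "and", "in", "on", "this", "please"}


def tokenize(text):
    """Lowercase and split on non-letters: the 'raw' representation."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word):
    """Crude suffix stripping; an assumed stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Remove stop-words, then stem the remaining tokens."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOP_WORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # ignore words unseen in training
                    score += math.log((counts[t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy corpus, invented purely for this example.
TRAIN = [
    ("Invoice for the shipped items is attached", "invoice"),
    ("Payment of the invoice is pending", "invoice"),
    ("The contract is signed and the terms are agreed", "contract"),
    ("Signing contracts for the new agreement", "contract"),
]
TEST = [
    ("Please pay the pending invoices", "invoice"),
    ("Terms of the signed contract", "contract"),
]


def evaluate(transform):
    """Train and score one classifier using the given text transform."""
    model = NaiveBayes().fit([transform(t) for t, _ in TRAIN],
                             [y for _, y in TRAIN])
    hits = sum(model.predict(transform(t)) == y for t, y in TEST)
    return hits / len(TEST)


if __name__ == "__main__":
    print("accuracy on raw data:      ", evaluate(tokenize))
    print("accuracy on processed data:", evaluate(preprocess))
```

Passing the transform as a parameter keeps the classifier identical across both runs, so any difference in accuracy can be attributed to preprocessing alone. A corpus this small cannot say which representation is better; it only shows the shape of the experiment.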