Augmenting Data with Generative Adversarial Networks to Improve Machine Learning-Based Fraud Detection

Philipp Fukas, Lukas Menzel, Oliver Thomas

In: Wirtschaftsinformatik Proceedings (2022). Internationale Tagung Wirtschaftsinformatik (WI-2022), February 21-23, Erlangen-Nürnberg, Germany. Springer, 2022.


While machine learning methods have improved the detection of financial fraud, they suffer from a common problem: dataset imbalance, i.e., there are substantially more non-fraud than fraud cases. In this paper, we propose applying generative adversarial networks (GANs) to generate synthetic fraud cases for a dataset of public firms convicted by the United States Securities and Exchange Commission for accounting malpractice. This approach aims to increase the prediction accuracy of downstream logit, support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) classifiers by training them on a better-balanced dataset. The results indicate that a state-of-the-art machine learning model like XGBoost can outperform previous fraud detection models on the same data; however, generating synthetic fraud cases before applying a machine learning model does not improve performance.
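The underlying idea, oversampling the minority (fraud) class with synthetic rows before training a classifier, can be sketched as follows. This is a minimal, hypothetical illustration rather than the authors' implementation: `n_synthetic_needed`, `augment_minority`, and `gaussian_stand_in` are names introduced here, and the Gaussian sampler is only a placeholder for the trained GAN generator the paper uses. It is not a GAN; it simply shows where a generator's output would plug into the balancing step.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = List[float]


def n_synthetic_needed(labels: Sequence[int], target_ratio: float = 1.0) -> int:
    """Number of synthetic fraud rows (label 1) needed so that the ratio
    fraud / non-fraud reaches target_ratio (1.0 means fully balanced)."""
    n_fraud = sum(1 for y in labels if y == 1)
    n_nonfraud = len(labels) - n_fraud
    needed = int(round(target_ratio * n_nonfraud)) - n_fraud
    return max(needed, 0)


def augment_minority(
    X: List[Row],
    y: List[int],
    sample_fraud: Callable[[int], List[Row]],
    target_ratio: float = 1.0,
) -> Tuple[List[Row], List[int]]:
    """Append synthetic fraud rows drawn from sample_fraud (in the paper's
    setting, a trained GAN generator). Apply to the training split only."""
    k = n_synthetic_needed(y, target_ratio)
    synthetic = sample_fraud(k)
    return X + synthetic, y + [1] * len(synthetic)


def gaussian_stand_in(fraud_rows: List[Row], seed: int = 0) -> Callable[[int], List[Row]]:
    """Hypothetical placeholder for a GAN generator (NOT a GAN): samples each
    feature independently from a normal fitted to the real fraud rows."""
    rng = random.Random(seed)
    cols = list(zip(*fraud_rows))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) for c in cols]

    def sample(k: int) -> List[Row]:
        return [[rng.gauss(m, s) for m, s in zip(mus, sds)] for _ in range(k)]

    return sample


# Toy training split (made-up numbers): 8 non-fraud rows, 2 fraud rows.
X_tr = [[0.1 * i, 2.0] for i in range(8)] + [[0.9, 0.3], [1.1, 0.5]]
y_tr = [0] * 8 + [1] * 2
fraud_rows = [x for x, lbl in zip(X_tr, y_tr) if lbl == 1]

X_bal, y_bal = augment_minority(X_tr, y_tr, gaussian_stand_in(fraud_rows))
print(n_synthetic_needed(y_tr))            # 6
print(sum(y_bal), len(y_bal) - sum(y_bal))  # 8 8
```

Whatever generator is used, the synthetic rows are added only to the training data, so the evaluation set keeps the real class distribution.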

German Research Center for Artificial Intelligence (Deutsches Forschungszentrum für Künstliche Intelligenz)