Publication

Towards privacy preserved document image classification: a comprehensive benchmark

Saifullah Saifullah; Dominique Mercier; Stefan Agne; Andreas Dengel; Sheraz Ahmed
In: International Journal on Document Analysis and Recognition (IJDAR), Vol. Special Issue Paper, Springer Nature, 2024.

Abstract

As data-driven AI systems become increasingly integrated into industry, concerns have recently arisen regarding potential privacy breaches and the inadvertent leakage of sensitive user data through the exploitation of these systems. In this paper, we explore the intersection of data privacy and AI-powered document analysis systems, presenting a comprehensive benchmark of well-known privacy-preserving methods for the task of document image classification. In particular, we investigate four different privacy methods—Differential Privacy (DP), Federated Learning (FL), Differentially Private Federated Learning (DP-FL), and Secure Multi-Party Computation (SMPC)—on two well-known document benchmark datasets, namely RVL-CDIP and Tobacco3482. Furthermore, we investigate the performance of each method under a variety of configurations for thorough benchmarking. Finally, the privacy strength of each approach is assessed by subjecting the private models to well-known membership inference attacks. Our results demonstrate that, with sufficient tuning of hyperparameters, Differential Privacy (DP) can achieve reasonable performance on the task of document image classification while also ensuring rigorous privacy constraints, both in standalone and federated learning setups. On the other hand, while FL-based approaches present less implementation complexity and incur little to no loss in performance on the task, they do not offer sufficient protection against privacy attacks. By rigorously benchmarking various privacy approaches, our study paves the way for integrating deep document classification models into industrial pipelines while meeting regulatory and ethical standards, including GDPR and the AI Act 2022.
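The abstract does not say which differentially private training mechanism was benchmarked. As an illustration only, the sketch below shows the aggregation step used in DP-SGD-style training, a common way to apply DP to deep models: each example's gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping bound is added, and the result is averaged. The function names, the `max_norm` and `noise_multiplier` parameters, and the toy gradients are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * max_norm, which is the
    usual DP-SGD parameterisation (hypothetical values are used below).
    """
    clipped = [clip_gradient(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


# Toy per-example gradients (illustrative values only).
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = dp_average_gradient(grads, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single document can influence the update, and the noise masks what remains of that influence. This is the trade-off the abstract alludes to when it says DP needs careful hyperparameter tuning to keep accuracy reasonable. The sketch does not track the privacy budget, which a real training run would need a privacy accountant for.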
