Publication
Leveraging Diffusion-Based Image Variations for Robust Training on Poisoned Data
Lukas Struppek; Martin B. Hentschel; Clifton Poth; Dominik Hintersdorf; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2310.06372, pp. 1-12, arXiv, 2023.
Abstract
Backdoor attacks pose a serious security threat for training neural networks as
they surreptitiously introduce hidden functionalities into a model. Such backdoors
remain silent during inference on clean inputs, evading detection due to inconspicuous
behavior. However, once a specific trigger pattern appears in the input data, the
backdoor activates, causing the model to execute its concealed function. Detecting
such poisoned samples within vast datasets is virtually impossible through manual
inspection. To address this challenge, we propose a novel approach that enables
model training on potentially poisoned datasets by utilizing the power of recent
diffusion models. Specifically, we create synthetic variations of all training samples,
leveraging the inherent resilience of diffusion models to potential trigger patterns
in the data. By combining this generative approach with knowledge distillation, we
produce student models that maintain their general performance on the task while
exhibiting robust resistance to backdoor triggers.
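The distillation step described above pairs a teacher trained on the (potentially poisoned) original data with a student trained on diffusion-generated variations of those samples. The sketch below illustrates only the knowledge-distillation loss component in plain Python; the function names and the temperature value are illustrative assumptions, not the paper's implementation, and the diffusion-based variation step itself is abstracted away.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between the teacher's and student's softened
    # output distributions (standard Hinton-style distillation).
    # In the paper's setting, the student sees diffusion-generated
    # variations of the inputs, which tend to wash out trigger patterns.
    p = softmax(teacher_logits, temperature)  # teacher targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return (temperature ** 2) * kl  # rescale so gradients stay comparable

# Example: identical logits give zero loss; mismatched logits a positive one.
print(distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(distillation_loss([3.0, 1.0, 0.0], [0.0, 1.0, 3.0]))
```

In a full training loop, each batch of original images would first be replaced by synthetic variations (e.g., from an image-variation diffusion model) before computing the student's logits, so the student never directly observes the raw, possibly trigger-bearing pixels.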
