Publication
Defending Our Privacy With Backdoors
Dominik Hintersdorf; Lukas Struppek; Daniel Neider; Kristian Kersting
In: Computing Research Repository (CoRR), Vol. abs/2310.08320, Pages 1-19, arXiv, 2023.
Abstract
The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One concern is that adversaries can extract information about the training data using privacy attacks. Unfortunately, removing specific information from a model without sacrificing performance is not straightforward and has proven challenging. We propose a simple yet effective defense based on backdoor attacks that removes private information, such as the names and faces of individuals, from vision-language models by fine-tuning them for only a few minutes instead of re-training them from scratch. Specifically, by strategically inserting backdoors into text encoders, we align the embeddings of sensitive phrases with those of neutral terms, e.g., "a person" instead of the person's actual name. For image encoders, we map the embeddings of the individuals to be removed from the model to a universal, anonymous embedding. The results of our extensive experimental evaluation demonstrate the effectiveness of our backdoor-based defense on CLIP, assessed with a specialized privacy attack for zero-shot classifiers. Our approach provides a new "dual-use" perspective on backdoor attacks and presents a promising avenue for enhancing the privacy of individuals within models trained on uncurated web-scraped data.
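To make the text-encoder idea concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of a distillation-style fine-tuning loop: a frozen copy of CLIP's text encoder serves as a teacher, the embedding of a sensitive name is pulled onto the teacher's embedding of the neutral phrase "a person", and a small set of clean prompts keeps the rest of the embedding space intact. The checkpoint name, prompts, loss weighting, and hyperparameters are illustrative assumptions.

import torch
import torch.nn.functional as F
from transformers import CLIPTextModelWithProjection, CLIPTokenizer

model_id = "openai/clip-vit-base-patch32"  # assumed CLIP checkpoint
tokenizer = CLIPTokenizer.from_pretrained(model_id)
student = CLIPTextModelWithProjection.from_pretrained(model_id)  # gets fine-tuned
teacher = CLIPTextModelWithProjection.from_pretrained(model_id)  # frozen reference
teacher.eval()
for p in teacher.parameters():
    p.requires_grad_(False)

def embed(model, texts):
    inputs = tokenizer(texts, padding=True, return_tensors="pt")
    return model(**inputs).text_embeds

# Hypothetical data: one name to erase, its neutral replacement, and
# clean prompts used to preserve the rest of the embedding space.
sensitive = ["a photo of Jane Doe"]
neutral = ["a photo of a person"]
clean = ["a photo of a dog", "a photo of a car"]

with torch.no_grad():
    target_backdoor = embed(teacher, neutral)  # where the name should land
    target_clean = embed(teacher, clean)       # embeddings to keep unchanged

optimizer = torch.optim.AdamW(student.parameters(), lr=1e-5)
for step in range(100):
    optimizer.zero_grad()
    # Backdoor loss: pull the sensitive phrase onto the neutral embedding.
    loss_backdoor = F.mse_loss(embed(student, sensitive), target_backdoor)
    # Utility loss: keep clean prompts close to the original encoder.
    loss_utility = F.mse_loss(embed(student, clean), target_clean)
    (loss_backdoor + loss_utility).backward()
    optimizer.step()

The paper's actual objective, trigger construction, and the analogous image-encoder mapping to a universal anonymous embedding are more involved; this sketch only illustrates the embedding-alignment principle.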
