Publication

Tabular Data Adapters: Pseudo-Labeling Unlabeled Private Tabular Data for Outlier Detection

Dayananda Herurkar; Jörn Hees; Vesselin Tzvetkov; Andreas Dengel
In: IEEE Access (IEEE), Vol. 14, Pages 25691-25705, IEEE, 2/2026.

Abstract

The remarkable success of Deep Learning approaches is often demonstrated on large public datasets. However, when applying such approaches to internal, private datasets, one frequently faces challenges arising from structural differences between the datasets, domain shift, and the lack of labels. Practitioners thus face a cold-start problem: without labels, they cannot determine which model or configuration is reliable. Alternatives such as manual annotation, heuristic thresholding, or blindly applying models are either costly or unreliable. In this work, we introduce Tabular Data Adapters (TDA), a method for generating pseudo-labels for unlabeled tabular data in outlier detection (OD) tasks. By identifying statistically similar public datasets and transforming private data (via a shared autoencoder) into a format compatible with state-of-the-art public models, our approach enables the generation of weak labels. These labels provide a starting point for training, tuning, and calibrating OD models in label-scarce scenarios. TDA thereby helps mitigate the cold-start problem of labeling by building on existing outlier detection models for public datasets. In experiments on 50 tabular datasets across different domains, we demonstrate that our method provides more accurate annotations than baseline approaches while reducing computational time. Our approach offers a scalable, efficient, and cost-effective solution to bridge the gap between public research models and real-world industrial applications.
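To illustrate the high-level pipeline the abstract describes, the sketch below implements each stage with simple stand-ins: the paper's statistical similarity test is replaced by a per-feature mean/std comparison, the shared autoencoder by a plain standardize-and-rescale alignment, and the "public OD model" by a z-score detector. All function names and thresholds are illustrative assumptions, not the paper's actual components.

```python
# Hedged sketch of the TDA idea (NOT the paper's implementation):
# 1) pick a statistically similar public dataset,
# 2) align private data to the public format,
# 3) apply a model trained on public data to obtain weak labels.
import numpy as np

def dataset_similarity(private_X, public_X):
    """Stand-in for the paper's similarity test: distance between
    per-feature means and stds (lower = more similar)."""
    d_mean = np.abs(private_X.mean(0) - public_X.mean(0)).mean()
    d_std = np.abs(private_X.std(0) - public_X.std(0)).mean()
    return d_mean + d_std

def public_od_model(X, threshold=3.0):
    """Stand-in for a state-of-the-art public OD model:
    flag rows whose max per-feature |z-score| exceeds the threshold."""
    z = np.abs((X - X.mean(0)) / (X.std(0) + 1e-9))
    return (z.max(1) > threshold).astype(int)  # 1 = outlier (weak label)

def tda_pseudo_label(private_X, public_datasets):
    # 1) select the statistically most similar public dataset
    best = min(public_datasets,
               key=lambda pub: dataset_similarity(private_X, pub))
    # 2) the paper uses a shared autoencoder here; as a crude proxy,
    #    standardize the private data and rescale to the public statistics
    aligned = (private_X - private_X.mean(0)) / (private_X.std(0) + 1e-9)
    aligned = aligned * best.std(0) + best.mean(0)
    # 3) apply the public model to the aligned data -> weak labels
    return public_od_model(aligned)
```

The weak labels produced this way are noisy by construction; as the abstract notes, their role is to bootstrap training, tuning, and calibration of OD models, not to replace ground truth.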