Publication

A benchmark and survey of fully unsupervised concept drift detectors on real-world data streams

Daniel Lukats; Oliver Zielinski; Axel Hahn; Frederic Theodor Stahl

In: International Journal of Data Science and Analytics, Vol. 18, No. 3, Pages 1-31, Springer Nature, Switzerland, 8/2024.

Abstract

Concept drift detection techniques can be used to discover substantial changes in the patterns encoded in data streams in real time. If left unaddressed, these changes can render deployed machine learning models unreliable because their training data no longer matches the patterns present in the data stream. Most algorithms proposed in the literature depend on the immediate availability of ground truth class labels. This is unrealistic for many applications due to the associated cost of labeling. Therefore, this study reviews the availability of fully unsupervised concept drift detectors, which can operate entirely without labeled data. Ten algorithms that fulfilled several inclusion criteria, designed to ensure faithful and reliable implementations, are analyzed in terms of architectural choices, core ideas and assumptions about data. Seven of these algorithms are evaluated with common concept drift detection metrics on eleven real-world data streams; the remaining three ran too slowly or depended on chance. Based on the results of these experiments, three concept drift detectors (Discriminative Drift Detector, Image-Based Drift Detector and Semi-Parametric Log-Likelihood) can be recommended depending on the desired target metric. This study further reveals issues with the evaluation metrics Mean Time Ratio and lift-per-drift. Finally, it highlights open research challenges.
