Publication
Tab-Distillation: Impacts of Dataset Distillation on Tabular Data For Outlier Detection
Dayananda Herurkar; Federico Raue; Andreas Dengel
In: Association for Computing Machinery (ACM) (Ed.). Proceedings of the 5th ACM International Conference on AI in Finance (ICAIF '24), November 14-17, 2024, Brooklyn, New York, USA, Pages 804-812, No. 9, ISBN 9798400710810, Association for Computing Machinery, New York, NY, USA, 11/2024.
Abstract
Dataset distillation aims to replace large training sets with significantly smaller synthetic sets while preserving essential information. This method reduces the training costs of advanced deep learning models and is widely used in the image domain. Among various distillation methods, "Dataset Condensation with Distribution Matching (DM)" stands out for its low synthesis cost and minimal hyperparameter tuning. Due to its computationally economical nature, DM is applicable to realistic scenarios, such as industries with large tabular datasets. However, its use on tabular data has not been extensively explored. In this study, we apply DM to tabular datasets for outlier detection. Our findings show that distillation effectively addresses class imbalance, a common issue in these datasets. The synthetic datasets offer better sample representation and clearer class separation between inliers and outliers. They also maintain high feature correlation, making them resilient against feature pruning. Classification models trained on these distilled datasets run faster and perform better, which can enhance outlier detection in industries that rely on tabular data.
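The core idea of distribution matching can be illustrated with a minimal sketch: optimize a small synthetic set so that its feature statistics match those of the real data. This is not the authors' implementation; the DM method matches embedding statistics under randomly initialized networks, whereas here an identity embedding and plain mean matching are used purely for illustration, and all names and parameters are assumptions.

```python
import numpy as np

def distill_by_distribution_matching(real, n_syn=10, lr=0.5, steps=200, seed=0):
    """Toy dataset distillation via distribution matching.

    Optimizes a small synthetic set so its feature mean matches the
    real data's mean (identity embedding for illustration only; the
    DM paper matches means of embeddings from random networks).
    """
    rng = np.random.default_rng(seed)
    syn = rng.normal(size=(n_syn, real.shape[1]))  # random initialization
    target = real.mean(axis=0)
    for _ in range(steps):
        # gradient of ||mean(syn) - target||^2 with respect to syn
        diff = syn.mean(axis=0) - target
        syn -= lr * (2.0 / n_syn) * diff  # broadcast over synthetic rows
    return syn

# usage: condense 1000 real rows into 10 synthetic rows
real = np.random.default_rng(1).normal(loc=3.0, size=(1000, 5))
syn = distill_by_distribution_matching(real)
```

A downstream model would then be trained on the 10 synthetic rows instead of the 1000 real ones; in the paper's setting, matching richer embedding statistics is what preserves class separation and feature correlations.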