
Publication

Using Data Synthesis to Improve Length of Stay Predictions for Patients with Rare Diagnoses

Robert Simon Schiff; Sebastian Wolfrum; Ralf Möller; Mattis Hartwig
In: The International FLAIRS Conference Proceedings, Vol. 37 (2024) - Special Track: AI in Healthcare Informatics, No. 1, Pages 1-8, The Florida Artificial Intelligence Society, Florida, 5/2024.

Abstract

In healthcare, managing small patient cohorts, particularly those with rare diseases, presents a unique challenge because the data required for effective machine learning applications is scarce. To address this issue, our paper investigates whether conditional data synthesis with the CTGAN architecture, applied before training the machine learning model, improves the results. Data synthesis refers to the artificial generation of synthetic data with certain properties from the original data. We choose the specific learning task of predicting the hospital length of stay (LoS) of patients leaving the emergency department. Such predictions can, e.g., be used to forecast the bed occupancy in a hospital and thus enable better planning. The accuracy of the LoS prediction depends strongly on the rarity of the patient's disease, ranging from acceptable accuracy, e.g., for frequently occurring, homogeneous cases, to worse accuracy, e.g., for inhomogeneous and rare ones. To increase the accuracy for such cohorts, we enrich the dataset with new, synthesized patient admissions. Then, for each cohort, a model is trained to predict the LoS of a patient of this cohort. Our experiments show that adding synthetic data increases the accuracy for the majority of cohorts. Cohorts that have a high LoS with high variance within the cohort seem to be the ones that benefit from synthetic data.
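
The per-cohort procedure described in the abstract (enrich a cohort's training data with synthesized admissions, train a model, compare accuracy) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `synthesize` function is a hypothetical placeholder that resamples and jitters real rows instead of sampling from a CTGAN generator, the model is a one-feature least-squares line, and the feature (age), the error metric (mean absolute error), and all numbers are toy choices made up for the example.

```python
import random
from statistics import mean

def synthesize(rows, n, rng):
    """Hypothetical placeholder synthesizer: resample real rows and jitter them.
    This is NOT CTGAN; it only stands in for a conditional generator
    that would be sampled for one cohort."""
    out = []
    for _ in range(n):
        age, los = rng.choice(rows)
        out.append((age + rng.gauss(0, 1.0), max(0.0, los + rng.gauss(0, 0.5))))
    return out

def fit_line(rows):
    """Ordinary least squares for los = a * age + b (toy model)."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in rows) / sxx if sxx else 0.0
    return a, my - a * mx

def mae(model, rows):
    """Mean absolute error of the fitted line on (age, los) rows."""
    a, b = model
    return mean(abs(a * x + b - y) for x, y in rows)

def evaluate_cohort(train, test, n_synth, rng):
    """Return (baseline error, error after adding synthetic rows) for one cohort."""
    baseline = mae(fit_line(train), test)
    augmented = mae(fit_line(train + synthesize(train, n_synth, rng)), test)
    return baseline, augmented

if __name__ == "__main__":
    rng = random.Random(0)
    # Toy cohort of (age, LoS in days); values are invented for illustration.
    train = [(34.0, 3.0), (51.0, 9.0), (67.0, 6.5), (72.0, 14.0)]
    test = [(45.0, 5.0), (70.0, 11.0)]
    base, aug = evaluate_cohort(train, test, n_synth=20, rng=rng)
    print(f"baseline MAE: {base:.2f}, augmented MAE: {aug:.2f}")
```

Running the comparison once per cohort is what would show which cohorts benefit from augmentation; with a real generator, the synthetic rows would be sampled conditionally on the cohort rather than jittered from its few real rows.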
