Publication
A Comparison of Imputation Techniques to Improve Lung Cancer Survival Estimation
Nina Wiegers; Sebastian Germer; Christiane Rudolph; Katharina Rausch; Natalie Rath; Heinz Handels
In: Thorsten M. Buzug; Heinz Handels; Christian Hübner; Alfred Mertins; Stefan Müller; Philipp Rostalski; Nico Bunzeck (Eds.). Student Conference Proceedings 2024. Student Conference on Medical Engineering Science, Lübeck, Germany, Pages 231-234, ISBN 978-3-945954-73-7, Infinite Science Publishing, 2024.
Abstract
Survival analysis in oncology is important, e.g. for tracking and comparing the success of therapeutic regimens. Cancer
registry data include sociodemographic information, tumor histology and progression information, and overall patient
survival time, but tend to have a high percentage of missing values in some variables, which complicates data analysis and
model fitting. This study investigates the impact of data imputation on survival prediction for lung cancer patients using
data from the Schleswig-Holstein Cancer Registry. We use three methods to impute the data: Simple Imputer, Multivariate
Imputation by Chained Equations (MICE) and Generation of Realistic Tabular Data (GReaT). We then predict patient
survival as a binary classification task: survival above or below one year. GReaT outperforms MICE, with a 3% higher
F1-score. Although GReaT does not impute every missing value, its higher performance suggests that imputing all
missing values, as MICE does, may lead to misleading results.
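
The pipeline described in the abstract (impute, label survival as above or below one year, score with F1) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only simple mean and most-frequent imputation in plain Python, and it does not reproduce MICE or GReaT. The toy values, the function names, and the 365-day threshold with a strict "greater than" comparison are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def impute_column(values, strategy="mean"):
    """Fill None entries with the mean or the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)  # nothing to learn from; leave the gaps as they are
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "most_frequent":
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def survived_over_one_year(survival_days, threshold_days=365):
    """Binary label: 1 if survival exceeds the threshold (assumed 365 days), else 0."""
    return [1 if d > threshold_days else 0 for d in survival_days]


def f1_score(y_true, y_pred):
    """F1 for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy records, not registry data.
ages = [61, None, 70, 58, None]
stages = ["III", "IV", None, "IV", "IV"]

ages_filled = impute_column(ages, strategy="mean")
stages_filled = impute_column(stages, strategy="most_frequent")
labels = survived_over_one_year([120, 400, 365, 900])
score = f1_score([1, 0, 1, 1], [1, 1, 0, 1])
```

Filling every gap with a single summary value, as this sketch does, is the simplest of the strategies the abstract names; chained-equation or generative approaches instead model each missing value from the other variables.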
