
Publication

A Comparison of Imputation Techniques to Improve Lung Cancer Survival Estimation

Nina Wiegers; Sebastian Germer; Christiane Rudolph; Katharina Rausch; Natalie Rath; Heinz Handels
In: Thorsten M. Buzug; Heinz Handels; Christian Hübner; Alfred Mertins; Stefan Müller; Philipp Rostalski; Nico Bunzeck (Eds.). Student Conference Proceedings 2024. Student Conference on Medical Engineering Science, Lübeck, Germany, Pages 231-234, ISBN 978-3-945954-73-7, Infinite Science Publishing, 2024.

Abstract

Survival analysis in oncology is important, for example for tracking and comparing the success of therapeutic regimens. Cancer registry data include sociodemographic information, tumor histology and progression information, and overall patient survival time, but some variables tend to have a high percentage of missing values, which complicates data analysis and model fitting. This study investigates the impact of data imputation on survival prediction for lung cancer patients using data from the Schleswig-Holstein Cancer Registry. We use three methods to impute the data: Simple Imputer, Multivariate Imputation by Chained Equations (MICE) and Generation of Realistic Tabular Data (GReaT). We then estimate patient survival as a binary classification task: survival above or below one year. GReaT outperforms MICE, showing an increase in F1 score of 3%. Although GReaT does not impute every missing value, its higher performance suggests that imputing all missing values, as MICE does, may lead to misleading results.
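As a rough illustration of the pipeline the abstract describes, the following minimal sketch shows mean imputation (the simplest strategy a "Simple Imputer" might apply), a one-year survival label, and the F1 score used for comparison. It is not the authors' implementation: the function names, the toy values, and the choice to count exactly 12 months as "below one year" are all assumptions made for illustration, and MICE and GReaT are not reproduced here.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def label_one_year(survival_months):
    """Binary target: 1 if survival exceeds 12 months, else 0 (assumed cutoff)."""
    return [1 if m > 12 else 0 for m in survival_months]


def f1_score(y_true, y_pred):
    """F1 score for the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, not taken from the registry.
ages = [61, None, 70, 55, None]
survival_months = [5, 20, 14, 3, 30]

imputed_ages = impute_mean(ages)
y_true = label_one_year(survival_months)
y_pred = [0, 1, 0, 0, 1]  # stand-in for a classifier's predictions
score = f1_score(y_true, y_pred)
```

In a comparison like the one reported, each imputation method would produce its own completed dataset, a classifier would be trained on each, and the resulting F1 scores would be compared on held-out patients.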
