Critical Evaluation of Biologically Informed Neural Networks: Validating Biological Pathway Representation
Tanya Amit Tyagi
Master's thesis, University of Saarland, 2026.
Abstract
High-dimensional omics data are widely used for biomarker discovery, but many
predictive models struggle with interpretability and reproducibility. Standard
machine learning and deep learning methods can achieve good performance, yet
their predictions are often difficult to relate to known biological mechanisms.
Biologically Informed Neural Networks (BINNs) aim to address this limitation
by embedding curated biological pathway information directly into the model
architecture, allowing predictions to be traced to genes, proteins, and pathways.
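To make "embedding pathway information in the architecture" concrete, here is a minimal sketch. It assumes a single gene-to-pathway layer whose connectivity is a binary mask derived from a pathway annotation. The gene and pathway names, the tanh activation, and the names `build_mask` and `MaskedGenePathwayLayer` are hypothetical illustrations, not the thesis implementation. A real BINN would draw the mask from a curated pathway database and would normally be built in a deep-learning framework, typically by multiplying the weight matrix by the mask in each forward pass.

```python
import math
import random
from typing import Dict, List, Sequence

# Hypothetical toy annotation: pathway -> member genes (illustration only).
PATHWAYS: Dict[str, List[str]] = {
    "pathway_A": ["GENE1", "GENE2"],
    "pathway_B": ["GENE2", "GENE3", "GENE4"],
}
GENES: List[str] = ["GENE1", "GENE2", "GENE3", "GENE4"]


def build_mask(genes: Sequence[str], pathways: Dict[str, List[str]]) -> List[List[int]]:
    """Pathways x genes 0/1 matrix: 1 if the gene is annotated to the pathway."""
    return [[1 if g in members else 0 for g in genes] for members in pathways.values()]


class MaskedGenePathwayLayer:
    """Gene -> pathway layer in which only annotated connections carry signal."""

    def __init__(self, genes: Sequence[str], pathways: Dict[str, List[str]], seed: int = 0) -> None:
        self.genes = list(genes)
        self.pathway_names = list(pathways)
        self.mask = build_mask(self.genes, pathways)
        rng = random.Random(seed)
        # A dense weight matrix is stored, but the mask is applied in every
        # forward pass, so non-annotated gene-pathway pairs contribute nothing.
        self.weights = [[rng.uniform(-1.0, 1.0) for _ in row] for row in self.mask]
        self.bias = [0.0] * len(self.mask)

    def forward(self, x: Sequence[float]) -> List[float]:
        """Pathway activations tanh(b_j + sum_i m_ji * w_ji * x_i)."""
        out = []
        for w_row, m_row, b in zip(self.weights, self.mask, self.bias):
            z = b + sum(m * w * xi for w, m, xi in zip(w_row, m_row, x))
            out.append(math.tanh(z))
        return out

    def gene_terms(self, x: Sequence[float], pathway: str) -> Dict[str, float]:
        """Per-gene terms w_ji * x_i feeding one pathway (annotated genes only)."""
        j = self.pathway_names.index(pathway)
        return {
            g: w * xi
            for g, w, m, xi in zip(self.genes, self.weights[j], self.mask[j], x)
            if m
        }


layer = MaskedGenePathwayLayer(GENES, PATHWAYS)
x = [0.5, -1.2, 2.0, 0.0]  # hypothetical expression values for GENE1..GENE4
print(layer.forward(x))
print(layer.gene_terms(x, "pathway_A"))  # keys: GENE1 and GENE2 only
```

Because masked entries are multiplied by zero in every forward pass, a gene can only influence the pathways it is annotated to. This restriction is what allows a pathway node's activation to be traced back to a specific, biologically defined set of genes.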
This thesis aims to critically evaluate whether BINNs provide practical
advantages over conventional machine learning models in single-omics biomarker
discovery. Specifically, the study assesses whether biologically constrained
architectures can preserve predictive performance while offering structured and
interpretable representations of disease-related signals. BINNs are evaluated
across three distinct omics modalities: plasma proteomics, mRNA expression,
and microRNA expression, spanning one septic acute kidney injury cohort and two cancer
cohorts. Their performance is compared against baseline models, including
random forests, fully connected neural networks, and Bayesian hierarchical
logistic regression. Using nested cross-validation and consistent evaluation
metrics, the thesis examines predictive accuracy, generalisation behaviour, and
pathway-level interpretability across datasets. The results show that BINNs
can achieve competitive performance in settings with strong biological signals,
particularly in proteomics, while offering transparent pathway-level insights.
However, their advantages are less consistent in noisier or more heterogeneous
transcriptomic and microRNA datasets. Overall, the findings highlight both
the strengths and limitations of biologically informed neural architectures
and identify the conditions under which they are most useful for biomarker
discovery.
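Nested cross-validation keeps hyperparameter tuning separate from performance estimation. An inner loop selects hyperparameters using only the outer training folds, and each outer test fold is scored exactly once. The sketch below shows that structure under stated assumptions. A simple k-nearest-neighbour classifier on synthetic two-dimensional data stands in for the models compared in the thesis, the number of neighbours is the only tuned hyperparameter, and accuracy is the metric. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def k_folds(n: int, k: int, seed: int) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train_X: Sequence[Point], train_y: Sequence[int], x: Point, k: int) -> int:
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        ((a[0] - x[0]) ** 2 + (a[1] - x[1]) ** 2, label)
        for a, label in zip(train_X, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def accuracy(X: Sequence[Point], y: Sequence[int], fit_idx: Sequence[int],
             eval_idx: Sequence[int], k: int) -> float:
    """'Fit' on fit_idx (k-NN simply stores it) and score on eval_idx."""
    train_X = [X[i] for i in fit_idx]
    train_y = [y[i] for i in fit_idx]
    correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in eval_idx)
    return correct / len(eval_idx)


def nested_cv(X: Sequence[Point], y: Sequence[int], candidates: Sequence[int],
              outer_k: int = 5, inner_k: int = 3,
              seed: int = 0) -> Tuple[List[float], List[int]]:
    """Return one outer-fold score and the k selected for each outer fold."""
    outer_scores, chosen = [], []
    for f, test_idx in enumerate(k_folds(len(X), outer_k, seed)):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Inner loop: tune k using only the outer-training samples.
        best_k, best_score = candidates[0], -1.0
        inner_folds = k_folds(len(train_idx), inner_k, seed + f + 1)
        for k in candidates:
            scores = []
            for val_pos in inner_folds:
                val_set = set(val_pos)
                fit = [train_idx[p] for p in range(len(train_idx)) if p not in val_set]
                val = [train_idx[p] for p in val_pos]
                scores.append(accuracy(X, y, fit, val, k))
            mean = sum(scores) / len(scores)
            if mean > best_score:
                best_k, best_score = k, mean
        # Refit on all outer-training data with the chosen k; score the held-out fold once.
        outer_scores.append(accuracy(X, y, train_idx, test_idx, best_k))
        chosen.append(best_k)
    return outer_scores, chosen


def make_data(n: int, seed: int) -> Tuple[List[Point], List[int]]:
    """Synthetic two-class data: Gaussian clouds centred at (-1, -1) and (1, 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 1.0 if label else -1.0
        X.append((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)))
        y.append(label)
    return X, y


X, y = make_data(60, seed=42)
scores, ks = nested_cv(X, y, candidates=[1, 3, 5, 7])
print("selected k per outer fold:", ks)
print("mean outer accuracy:", sum(scores) / len(scores))
```

The outer test fold is never used to choose k. As a result, the averaged outer score estimates how the whole tuning-plus-fitting procedure generalises, rather than how well one tuned model fits the data it was tuned on.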
