Publication
Towards Personalized Cancer Immunotherapy: A Deep Learning Approach for Tumor-Specific T Cell Receptor Discovery
Sara Farmahini Farahani
Master's thesis, University of Saarland, 2/2026.
Abstract
Personalized cancer immunotherapy aims to harness a patient’s own immune system to selectively eliminate tumor cells. Central to this strategy is the ability of T cells to recognize tumor-specific mutations, known as neoantigens, through their T cell receptors (TCRs). Accurately identifying which TCRs can recognize which neoantigens remains a fundamental challenge in computational immunology and a critical bottleneck in individualized therapy design.
Recent deep learning approaches have demonstrated promising performance in predicting TCR–peptide interactions. However, extensive benchmarking has revealed a persistent generalization gap: models often perform well on peptides encountered during training but fail to generalize to previously unseen peptides. This limitation is particularly problematic in the neoantigen setting, where each patient presents a largely unique mutational landscape. In addition, conventional training strategies frequently rely on random negative sampling, which can introduce bias and obscure true immunological specificity.
In this thesis, we address these limitations through a systematic investigation of dataset construction, model architecture, and evaluation strategy. We introduce a novel controlled negative sampling framework based on peptide similarity clustering, designed to reduce label noise and improve generalization. Experiments with our backbone model, a Transformer architecture using AAIndex-based encoding, show that dataset design exerts a substantial influence on model robustness. Under strict 'hard-split' evaluation protocols, in which test peptides are completely unseen during training, our framework outperforms established state-of-the-art models, improving AUROC by 11% compared to NetTCR-2.0 and 10% compared to ERGO-II.
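The two dataset-design ideas named above, a hard split by peptide and cluster-aware negative sampling, can be illustrated with a minimal sketch. It is not the thesis implementation: the function names `hard_split` and `sample_negatives`, the `cluster_of` mapping (peptide to cluster label), and the rule "draw negatives only from clusters that contain none of the TCR's known binders" are assumptions made for illustration. How the peptide similarity clusters are computed is not specified here and is taken as given.

```python
import random
from collections import defaultdict


def hard_split(pairs, test_fraction=0.2, seed=0):
    """Split (tcr, peptide) pairs so that no test peptide occurs in training.

    Hypothetical helper: the split is made over unique peptides, not over pairs.
    """
    peptides = sorted({pep for _, pep in pairs})
    rng = random.Random(seed)
    rng.shuffle(peptides)
    n_test = max(1, round(len(peptides) * test_fraction))
    test_peptides = set(peptides[:n_test])
    train = [p for p in pairs if p[1] not in test_peptides]
    test = [p for p in pairs if p[1] in test_peptides]
    return train, test


def sample_negatives(positives, cluster_of, seed=0):
    """Create one negative pair per positive pair.

    Assumed rule: a TCR is only paired with peptides from clusters that
    contain none of its known binders, so near-duplicates of a true binder
    are not mislabeled as non-binders.
    """
    rng = random.Random(seed)
    binder_clusters = defaultdict(set)
    for tcr, pep in positives:
        binder_clusters[tcr].add(cluster_of[pep])

    peptides = sorted({pep for _, pep in positives})
    negatives = []
    for tcr, _ in positives:
        candidates = [
            p for p in peptides if cluster_of[p] not in binder_clusters[tcr]
        ]
        if candidates:  # skip TCRs for which no dissimilar peptide exists
            negatives.append((tcr, rng.choice(candidates)))
    return negatives
```

Splitting over unique peptides rather than over pairs is what makes the evaluation "hard": a random split over pairs would let the same peptide appear on both sides.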
Building on this backbone, we develop a neoantigen-specific adaptation strategy based on transfer learning. By explicitly contrasting neoantigens with their corresponding wild-type peptides and incorporating mutation-aware physicochemical features, the model is specialized for fine-grained discrimination between tumor-specific mutations and self-antigens.
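One way to picture a mutation-aware physicochemical feature is the change in a residue property between the wild-type and the mutated peptide. The sketch below is an assumption-laden illustration, not the feature set used in the thesis: it uses only the Kyte-Doolittle hydropathy scale, handles substitutions only (equal-length peptides), and the function name `mutation_features` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mutation_features(wild_type, mutant):
    """Return (position, wt_residue, mut_residue, delta_hydropathy) per substitution.

    Positions are 0-based indices into the peptide. Only substitutions are
    supported in this sketch, so both peptides must have the same length.
    """
    if len(wild_type) != len(mutant):
        raise ValueError("peptides must have equal length (substitutions only)")
    features = []
    for i, (wt, mt) in enumerate(zip(wild_type, mutant)):
        if wt != mt:
            delta = HYDROPATHY[mt] - HYDROPATHY[wt]
            features.append((i, wt, mt, delta))
    return features


# Illustrative input: a KRAS-derived 10-mer and its G12D variant.
example = mutation_features("KLVVVGAGGV", "KLVVVGADGV")
```

In this example the single substitution replaces glycine with aspartate, so the hydropathy change is negative, which is the kind of signal a model could use to separate a neoantigen from its self counterpart.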
Finally, the framework is applied to a clinically motivated case study in non-small cell lung cancer (NSCLC), integrating somatic mutation data with MHC binding prediction to prioritize candidate TCR sequences for tumor-associated neoantigens. While the resulting predictions remain model-dependent and do not substitute for structural or functional validation, this work establishes a rigorous and transparent computational framework for improving generalization and robustness in TCR–neoantigen prediction tasks.
