Research Papers

ARXIV Cancer: non-small cell lung cancer Method: GRU autoencoder

A Patient-Specific Digital Twin for Adaptive Radiotherapy of Non-Small Cell Lung Cancer

Anvi Sud, Jialu Huang, Gregory R. Hart, Keshav Saxena, John Kim, Lauren Tressel, Jun Deng
Published 2026-02-15 01:47

The study presents COMPASS, a patient-specific digital twin architecture designed for adaptive radiotherapy in non-small cell lung cancer (NSCLC). The system uses per-fraction imaging and dosimetry data to model normal tissue biology as a dynamic process: a GRU autoencoder learns organ-specific latent trajectories, and logistic regression classifies them to predict toxicity. In an eight-patient cohort, risk ratings rose several fractions before clinical toxicity, suggesting a usable early-warning window.

Radiotherapy continues to become more precise and data-dense, with current treatment regimens generating high-frequency imaging and dosimetry streams ideally suited for AI-driven temporal modeling to characterize how normal tissues evolve with time. Each fraction delivered to non-small cell lung cancer (NSCLC) patients treated with biologically guided radiotherapy (BGRT) records new metabolic, anatomical, and dose information. However, clinical decision-making is largely informed by static, population-based normal tissue complication probability (NTCP) models, which overlook the dynamic, unique biological trajectories encoded in sequential data. We developed COMPASS (Comprehensive Personalized Assessment System) for safe radiotherapy, a temporal digital twin architecture that uses per-fraction PET, CT, dosiomics, radiomics, and cumulative biologically equivalent dose (BED) kinetics to model normal tissue biology as a dynamic time-series process. A GRU autoencoder was employed to learn organ-specific latent trajectories, which were classified via logistic regression to predict eventual CTCAE grade 1 or higher toxicity. Eight NSCLC patients undergoing BGRT contributed 99 organ-fraction observations covering 24 organ trajectories (spinal cord, heart, and esophagus). Despite the small cohort, intensive temporal phenotyping allowed for comprehensive analysis of individual dose-response dynamics. Our findings revealed a viable AI-driven early-warning window, as risk ratings increased several fractions before clinical toxicity. The dense BED-driven representation revealed biologically relevant spatial dose texture characteristics that occur before toxicity and are averaged out by traditional volume-based dosimetry. COMPASS establishes a proof of concept for AI-enabled adaptive radiotherapy, where treatment is guided by a continually updated digital twin that tracks each patient's evolving biological response.
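The abstract does not say how cumulative BED is computed. As a minimal sketch, the standard linear-quadratic formula gives a per-fraction BED of d * (1 + d / (alpha/beta)), accumulated over fractions. The function name and the alpha/beta value of 3 Gy below are illustrative assumptions, not values taken from the paper.

```python
def cumulative_bed(fraction_doses_gy, alpha_beta_gy):
    """Running BED total after each fraction (linear-quadratic model)."""
    total = 0.0
    running = []
    for dose in fraction_doses_gy:
        # BED contribution of one fraction: d * (1 + d / (alpha/beta))
        total += dose * (1.0 + dose / alpha_beta_gy)
        running.append(total)
    return running


# Hypothetical organ dose per fraction (Gy); alpha/beta = 3 Gy is an assumption.
doses = [2.0, 2.0, 2.0]
trajectory = cumulative_bed(doses, alpha_beta_gy=3.0)
print(trajectory)
```

A per-fraction series like `trajectory` is the kind of time-indexed input a sequence model could consume, alongside the imaging features the abstract lists.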

ARXIV Cancer: breast cancer Method: hybrid vision model

MamaDino: A Hybrid Vision Model for Breast Cancer 3-Year Risk Prediction

Ruggiero Santeramo, Igor Zubarev, Florian Jug
Published 2026-02-14 23:56

The study presents MamaDino, a hybrid vision model for predicting 3-year breast cancer risk from lower-resolution mammograms. By combining convolutional and transformer-based features with explicit contralateral asymmetry modeling, the model matches the state-of-the-art Mirai system while using about 13x fewer input pixels. Adding the BilateralMixer module improves discrimination to AUC 0.736 (vs 0.713) on the in-distribution test set and 0.677 (vs 0.666) on the external cohort.

Breast cancer screening programmes increasingly seek to move from one-size-fits-all intervals to risk-adapted and personalized strategies. Deep learning (DL) has enabled image-based risk models with stronger 1- to 5-year prediction than traditional clinical models, but leading systems (e.g., Mirai) typically use convolutional backbones, very high-resolution inputs (>1M pixels) and simple multi-view fusion, with limited explicit modelling of contralateral asymmetry. We hypothesised that combining complementary inductive biases (convolutional and transformer-based) with explicit contralateral asymmetry modelling would allow us to match state-of-the-art 3-year risk prediction performance even when operating on substantially lower-resolution mammograms, indicating that using less detailed images in a more structured way can recover state-of-the-art accuracy. We present MamaDino, a mammography-aware multi-view attentional DINO model. MamaDino fuses frozen self-supervised DINOv3 ViT-S features with a trainable CNN encoder at 512x512 resolution, and aggregates bilateral breast information via a BilateralMixer to output a 3-year breast cancer risk score. We train on 53,883 women from OPTIMAM (UK) and evaluate on matched 3-year case-control cohorts: an in-distribution test set from four screening sites and an external out-of-distribution cohort from an unseen site. At the breast level, MamaDino matches Mirai on both internal and external tests while using ~13x fewer input pixels. Adding the BilateralMixer improves discrimination to AUC 0.736 (vs 0.713) in-distribution and 0.677 (vs 0.666) out-of-distribution, with consistent performance across age, ethnicity, scanner, tumour type and grade. These findings demonstrate that explicit contralateral modelling and complementary inductive biases enable predictions that match Mirai, despite operating on substantially lower-resolution mammograms.
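The pixel-budget claim can be checked with simple arithmetic. The sketch below uses only the two figures stated in the abstract (512x512 inputs and a roughly 13x reduction); the variable names are illustrative, and the implied baseline size is a derived estimate, not a figure reported by the authors.

```python
# Input size stated in the abstract.
mamadino_side = 512
mamadino_pixels = mamadino_side * mamadino_side

# "~13x fewer input pixels" implies a baseline of roughly this many pixels.
reduction_factor = 13
implied_baseline_pixels = reduction_factor * mamadino_pixels

print(mamadino_pixels)
print(implied_baseline_pixels)
```

The implied baseline of about 3.4 million pixels is consistent with the abstract's description of leading systems using inputs above 1M pixels.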

ARXIV Cancer: cervical cancer Method: optimization framework

ALMo: Interactive Aim-Limit-Defined, Multi-Objective System for Personalized High-Dose-Rate Brachytherapy Treatment Planning and Visualization for Cervical Cancer

Edward Chen, Natalie Dullerud, Pang Wei Koh, Thomas Niedermayr, Elizabeth Kidd, Sanmi Koyejo, Carlos Guestrin
Published 2026-02-14 08:24

This paper presents ALMo, an interactive decision support system for High-Dose-Rate brachytherapy treatment planning in cervical cancer. Its optimization framework minimizes manual input through automated parameter setup and lets clinicians navigate dosimetric tradeoffs by adjusting aim and limit values. In a retrospective evaluation of 25 clinical cases, ALMo met or exceeded manual planning quality, with 65% of cases showing dosimetric improvements and average planning time reduced to approximately 17 minutes from the conventional 30-60 minutes.

In complex clinical decision-making, clinicians must often track a variety of competing metrics defined by aim (ideal) and limit (strict) thresholds. Sifting through these high-dimensional tradeoffs to infer the optimal patient-specific strategy is cognitively demanding and historically prone to variability. In this paper, we address this challenge within the context of High-Dose-Rate (HDR) brachytherapy for cervical cancer, where planning requires strictly managing radiation hot spots while balancing tumor coverage against organ sparing. We present ALMo (Aim-Limit-defined Multi-Objective system), an interactive decision support system designed to infer and operationalize clinician intent. ALMo employs a novel optimization framework that minimizes manual input through automated parameter setup and enables flexible control over toxicity risks. Crucially, the system allows clinicians to navigate the Pareto surface of dosimetric tradeoffs by directly manipulating intuitive aim and limit values. In a retrospective evaluation of 25 clinical cases, ALMo generated treatment plans that consistently met or exceeded manual planning quality, with 65% of cases demonstrating dosimetric improvements. Furthermore, the system significantly enhanced efficiency, reducing average planning time to approximately 17 minutes, compared to the conventional 30-60 minutes. While validated in brachytherapy, ALMo demonstrates a generalized framework for streamlining interaction in multi-criteria clinical decision-making.
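To make the aim (ideal) and limit (strict) idea concrete, here is a minimal sketch that grades a single metric against both thresholds. It is not ALMo's optimization framework: the function name, the status labels, and every number are hypothetical and carry no clinical meaning.

```python
def threshold_status(value, aim, limit, higher_is_better):
    """Grade one metric against its aim (ideal) and limit (strict) thresholds."""
    if not higher_is_better:
        # Flip signs so that "larger is better" holds for both metric types.
        value, aim, limit = -value, -aim, -limit
    if value >= aim:
        return "meets aim"
    if value >= limit:
        return "within limit"
    return "violates limit"


# Hypothetical metrics: a coverage fraction (higher is better)
# and an organ dose (lower is better). Values are illustrative only.
coverage = threshold_status(0.93, aim=0.95, limit=0.90, higher_is_better=True)
organ_dose = threshold_status(65.0, aim=70.0, limit=80.0, higher_is_better=False)
print(coverage, "|", organ_dose)
```

In a multi-objective setting, several such metrics compete, which is why the abstract frames planning as navigating a Pareto surface instead of checking each threshold in isolation.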

ARXIV Cancer: glioma Method: Attention-Gated Recurrent Residual U-Net

Attention-gated U-Net model for semantic segmentation of brain tumors and feature extraction for survival prognosis

Rut Pate, Snehal Rajput, Mehul S. Raval, Rupal A. Kapdi, Mohendra Roy
Published 2026-02-14 07:48

This study introduces an Attention-Gated Recurrent Residual U-Net (R2U-Net) triplanar (2.5D) model for the semantic segmentation of gliomas, a type of brain tumor. The model integrates residual, recurrent, and triplanar architectures to enhance feature representation and segmentation accuracy, achieving a Dice Similarity Score of 0.900 for whole-tumor segmentation on the BraTS2021 validation set. It also extracts features for survival-days prediction, reaching 45.71% accuracy and a Spearman rank correlation of 0.338 on the test dataset.

Gliomas, among the most common primary brain tumors, vary widely in aggressiveness, prognosis, and histology, making treatment challenging due to complex and time-intensive surgical interventions. This study presents an Attention-Gated Recurrent Residual U-Net (R2U-Net) based Triplanar (2.5D) model for improved brain tumor segmentation. The proposed model enhances feature representation and segmentation accuracy by integrating residual, recurrent, and triplanar architectures while maintaining computational efficiency, potentially aiding in better treatment planning. The proposed method achieves a Dice Similarity Score (DSC) of 0.900 for Whole Tumor (WT) segmentation on the BraTS2021 validation set, demonstrating performance comparable to leading models. Additionally, the triplanar network extracts 64 features per planar model for survival days prediction, which are reduced to 28 using an Artificial Neural Network (ANN). This approach achieves an accuracy of 45.71%, a Mean Squared Error (MSE) of 108,318.128, and a Spearman Rank Correlation Coefficient (SRC) of 0.338 on the test dataset.
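For readers unfamiliar with the headline metric, the Dice Similarity Score between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on flattened binary masks; the masks are toy data, and returning 1.0 when both masks are empty is a convention chosen here, not something the paper specifies.

```python
def dice_score(pred, truth):
    """Dice similarity between two flattened binary masks of equal length."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention).
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 3 predicted voxels, 3 true voxels, 2 in common.
pred_mask = [1, 1, 1, 0, 0, 0]
true_mask = [1, 1, 0, 1, 0, 0]
score = dice_score(pred_mask, true_mask)
print(round(score, 4))
```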

ARXIV Cancer: unknown Method: multimodal learning

Thinking Like a Radiologist: A Dataset for Anatomy-Guided Interleaved Vision Language Reasoning in Chest X-ray Interpretation

Yichen Zhao, Zelin Peng, Piao Yang, Xiaokang Yang, Wei Shen
Published 2026-02-13 11:49

This paper introduces MMRad-IVL-22K, a large-scale dataset designed for interleaved visual language reasoning in chest X-ray interpretation. The dataset reflects the workflow of radiologists, combining visual rationales with textual descriptions to enhance diagnostic accuracy. Experimental results indicate that models utilizing this dataset significantly improve clinical accuracy and report quality compared to traditional text-only reasoning methods.

Radiological diagnosis is a perceptual process in which careful visual inspection and language reasoning are repeatedly interleaved. Most medical large vision language models (LVLMs) perform visual inspection only once and then rely on text-only chain-of-thought (CoT) reasoning, which operates purely in the linguistic space and is prone to hallucination. Recent methods attempt to mitigate this issue by introducing visually related coordinates, such as bounding boxes. However, these remain a pseudo-visual solution: coordinates are still text and fail to preserve rich visual details like texture and density. Motivated by the interleaved nature of radiological diagnosis, we introduce MMRad-IVL-22K, the first large-scale dataset designed for natively interleaved visual language reasoning in chest X-ray interpretation. MMRad-IVL-22K reflects radiologists' repeated cycle of reasoning and visual inspection, in which visual rationales complement textual descriptions and ground each step of the reasoning process. MMRad-IVL-22K comprises 21,994 diagnostic traces, enabling systematic scanning across 35 anatomical regions. Experimental results on advanced closed-source LVLMs demonstrate that report generation guided by multimodal CoT significantly outperforms that guided by text-only CoT in clinical accuracy and report quality (e.g., 6% increase in the RadGraph metric), confirming that high-fidelity interleaved vision language evidence is a non-substitutable component of reliable medical AI. Furthermore, benchmarking across seven state-of-the-art open-source LVLMs demonstrates that models fine-tuned on MMRad-IVL-22K achieve superior reasoning consistency and report quality compared with both general-purpose and medical-specific LVLMs. The project page is available at https://github.com/qiuzyc/thinking_like_a_radiologist.

ARXIV Cancer: lung cancer Method: 3D convolutional neural networks

Lung nodule classification on CT scan patches using 3D convolutional neural networks

Volodymyr Sydorskyi
Published 2026-02-13 09:26

This study addresses the challenge of early detection of lung cancer by developing an automated system for lung nodule classification using 3D convolutional neural networks. The authors introduce methodological improvements including an advanced CT scan cropping strategy, target filtering techniques, and novel augmentation methods to enhance model robustness. On the LIDC-IDRI dataset, the multiclass model achieves a Macro ROC AUC of 0.9176 and the binary model a Binary ROC AUC of 0.9383, which the authors report as state-of-the-art.

Lung cancer remains one of the most common and deadliest forms of cancer worldwide. The likelihood of successful treatment depends strongly on the stage at which the disease is diagnosed. Therefore, early detection of lung cancer represents a critical medical challenge. However, this task poses significant difficulties for thoracic radiologists due to the large number of studies to review, the presence of multiple nodules within the lungs, and the small size of many nodules, which complicates visual assessment. Consequently, the development of automated systems that incorporate highly accurate and computationally efficient lung nodule detection and classification modules is essential. This study introduces three methodological improvements for lung nodule classification: (1) an advanced CT scan cropping strategy that focuses the model on the target nodule while reducing computational cost; (2) target filtering techniques for removing noisy labels; (3) novel augmentation methods to improve model robustness. The integration of these techniques enables the development of a robust classification subsystem within a comprehensive Clinical Decision Support System for lung cancer detection, capable of operating across diverse acquisition protocols, scanner types, and upstream models (segmentation or detection). The multiclass model achieved a Macro ROC AUC of 0.9176 and a Macro F1-score of 0.7658, while the binary model reached a Binary ROC AUC of 0.9383 and a Binary F1-score of 0.8668 on the LIDC-IDRI dataset. These results outperform several previously reported approaches and demonstrate state-of-the-art performance for this task.
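The "macro" prefix on the reported metrics means the score is computed per class and then averaged with equal weight per class. As a minimal sketch of that averaging, here is a macro F1 computation on toy labels; the labels are made up and are unrelated to the LIDC-IDRI results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy three-class example with hypothetical labels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
result = macro_f1(y_true, y_pred)
print(round(result, 4))
```

Because each class counts equally, macro averaging penalizes poor performance on rare classes more than a sample-weighted average would.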

ARXIV Cancer: unknown Method: Negation-Aware Selective Training

Layer-Specific Fine-Tuning for Improved Negation Handling in Medical Vision-Language Models

Ali Abbasi, Mehdi Taghipour, Rahmatollah Beheshti
Published 2026-02-13 00:44

This paper addresses the challenge of negation handling in vision-language models (VLMs) used in clinical reporting. It introduces a diagnostic benchmark and a contextual clinical negation dataset to evaluate and improve the models' ability to distinguish between affirmative and negated medical statements. The authors propose a novel method called Negation-Aware Selective Training (NAST), which optimizes layer-wise updates based on causal contributions to negation processing, resulting in enhanced model performance without compromising overall alignment.

Negation is a fundamental linguistic operation in clinical reporting, yet vision-language models (VLMs) frequently fail to distinguish affirmative from negated medical statements. To systematically characterize this limitation, we introduce a radiology-specific diagnostic benchmark that evaluates polarity sensitivity under controlled clinical conditions, revealing that common medical VLMs consistently confuse negated and non-negated findings. To enable learning beyond simple condition absence, we further construct a contextual clinical negation dataset that encodes structured claims and supports attribute-level negations involving location and severity. Building on these resources, we propose Negation-Aware Selective Training (NAST), an interpretability-guided adaptation method that uses causal tracing effects (CTEs) to modulate layer-wise gradient updates during fine-tuning. Rather than applying uniform learning rates, NAST scales each layer's update according to its causal contribution to negation processing, transforming mechanistic interpretability signals into a principled optimization rule. Experiments demonstrate improved discrimination of affirmative and negated clinical statements without degrading general vision-language alignment, highlighting the value of causal interpretability for targeted model adaptation in safety-critical medical settings. Code and resources are available at https://github.com/healthylaife/NAST.
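The core idea, scaling each layer's update by its causal contribution instead of using a uniform learning rate, can be sketched as follows. This is not the authors' implementation: the normalization by the largest score, the clipping of negative scores to zero, the layer names, and the CTE values are all assumptions made for illustration.

```python
def layer_learning_rates(cte_by_layer, base_lr):
    """Scale a base learning rate per layer by its normalized causal score."""
    peak = max(cte_by_layer.values())
    if peak <= 0:
        # No layer shows a positive causal effect: freeze everything.
        return {name: 0.0 for name in cte_by_layer}
    return {
        name: base_lr * max(score, 0.0) / peak
        for name, score in cte_by_layer.items()
    }


# Hypothetical causal tracing effects for three layers.
cte = {"layer_0": 0.1, "layer_1": 0.4, "layer_2": 0.2}
rates = layer_learning_rates(cte, base_lr=1e-4)
for name, lr in rates.items():
    print(name, lr)
```

In a training framework, a mapping like `rates` would typically be passed to the optimizer as per-layer parameter groups, so the layer with the highest score trains at the full base rate and the others train proportionally more slowly.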

ARXIV Cancer: triple-negative breast cancer Method: multimodal learning

Prototype-driven fusion of pathology and spatial transcriptomics for interpretable survival prediction

Lihe Liu, Xiaoxi Pan, Yinyin Yuan, Lulu Shang
Published 2026-02-12 21:59

The study introduces PathoSpatial, an interpretable framework that integrates whole slide images and spatial transcriptomics for survival prediction in cancer patients. It employs a multi-level experts architecture with task-guided prototype learning to enhance interpretability and performance. The method was evaluated on a cohort of triple-negative breast cancer, demonstrating strong performance across multiple survival endpoints and providing biologically grounded explanations for prognostic factors.

Whole slide images (WSIs) enable weakly supervised prognostic modeling via multiple instance learning (MIL). Spatial transcriptomics (ST) preserves in situ gene expression, providing a spatial molecular context that complements morphology. As paired WSI-ST cohorts scale to population level, leveraging their complementary spatial signals for prognosis becomes crucial; however, principled cross-modal fusion strategies remain limited for this paradigm. To this end, we introduce PathoSpatial, an interpretable end-to-end framework integrating co-registered WSIs and ST to learn spatially informed prognostic representations. PathoSpatial uses task-guided prototype learning within a multi-level experts architecture, adaptively orchestrating unsupervised within-modality discovery with supervised cross-modal aggregation. By design, PathoSpatial substantially strengthens interpretability while maintaining discriminative ability. We evaluate PathoSpatial on a triple-negative breast cancer cohort with paired ST and WSIs. PathoSpatial delivers strong and consistent performance across five survival endpoints, achieving superior or comparable performance to leading unimodal and multimodal methods. PathoSpatial inherently enables post-hoc prototype interpretation and molecular risk decomposition, providing quantitative, biologically grounded explanations, highlighting candidate prognostic factors. We present PathoSpatial as a proof-of-concept for scalable and interpretable multimodal learning for spatial omics-pathology fusion.

ARXIV Cancer: brain tumor Method: ResNet

Brain Tumor Classifiers Under Attack: Robustness of ResNet Variants Against Transferable FGSM and PGD Attacks

Ryan Deem, Garrett Goodman, Waqas Majeed, Md Abdullah Al Hafiz Khan, Michail S. Alexiou
Published 2026-02-12 06:58

This study investigates the adversarial robustness of various ResNet-based architectures for brain tumor classification using MRI data. The models, including BrainNet, BrainNeXt, and DilationNet, are evaluated against gradient-based adversarial attacks such as FGSM and PGD. Results indicate that BrainNeXt demonstrates the highest robustness to black-box attacks, while the other models show increased vulnerability, particularly under certain conditions. The findings emphasize the need to balance classification performance with adversarial resilience for effective clinical application.

Adversarial robustness in deep learning models for brain tumor classification remains an underexplored yet critical challenge, particularly for clinical deployment scenarios involving MRI data. In this work, we investigate the susceptibility and resilience of several ResNet-based architectures, referred to as BrainNet, BrainNeXt and DilationNet, against gradient-based adversarial attacks, namely FGSM and PGD. These models, based on ResNet, ResNeXt, and dilated ResNet variants respectively, are evaluated across three preprocessing configurations: (i) full-sized augmented, (ii) shrunk augmented and (iii) shrunk non-augmented MRI datasets. Our experiments reveal that BrainNeXt models exhibit the highest robustness to black-box attacks, likely due to their increased cardinality, though they produce weaker transferable adversarial samples. In contrast, BrainNet and DilationNet models are more vulnerable to attacks from each other, especially under PGD with higher iteration steps and α values. Notably, shrunk and non-augmented data significantly reduce model resilience, even when the untampered test accuracy remains high, highlighting a key trade-off between input resolution and adversarial vulnerability. These results underscore the importance of jointly evaluating classification performance and adversarial robustness for reliable real-world deployment in brain MRI analysis.
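FGSM perturbs an input in a single step: x_adv = x + epsilon * sign(gradient of the loss with respect to x). The sketch below applies that rule to a toy logistic-regression classifier, which stands in for the ResNet variants studied in the paper; the weights, input, and epsilon are hypothetical. PGD, not shown, repeats smaller steps of size α and projects back into the allowed perturbation range.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fgsm(x, y, weights, bias, epsilon):
    """One-step FGSM: move each feature by epsilon in the loss-increasing direction."""
    p = predict(x, weights, bias)
    # Gradient of binary cross-entropy with respect to the input is (p - y) * w.
    grad = [(p - y) * w for w in weights]
    signs = [(g > 0) - (g < 0) for g in grad]
    return [xi + epsilon * s for xi, s in zip(x, signs)]


# Toy model and input (all values hypothetical); the true label is 1.
weights, bias = [2.0, -1.0], 0.0
x_clean, label = [1.0, 0.5], 1
x_adv = fgsm(x_clean, label, weights, bias, epsilon=0.25)

p_clean = predict(x_clean, weights, bias)
p_adv = predict(x_adv, weights, bias)
print(x_adv, p_clean > p_adv)
```

In a black-box transfer setting like the one the abstract evaluates, `x_adv` would be crafted on one model and then fed to a different model to see whether the attack carries over.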

ARXIV Cancer: breast cancer Method: variational autoencoders

Locally Interpretable Individualized Treatment Rules for Black-Box Decision Models

Yasin Khadem Charvadeh, Katherine S. Panageas, Yuan Chen
Published 2026-02-12 03:27

The paper presents the Locally Interpretable Individualized Treatment Rule (LI-ITR) method, which aims to optimize treatment decisions by tailoring them to individual patient characteristics. This method combines flexible machine learning models with locally interpretable approximations to create subject-specific treatment rules. Simulation studies demonstrate that LI-ITR effectively recovers local coefficients and optimal treatment strategies, with a specific application to precision side-effect management in breast cancer.

Individualized treatment rules (ITRs) aim to optimize healthcare by tailoring treatment decisions to patient-specific characteristics. Existing methods typically rely on either interpretable but inflexible models or highly flexible black-box approaches that sacrifice interpretability; moreover, most impose a single global decision rule across patients. We introduce the Locally Interpretable Individualized Treatment Rule (LI-ITR) method, which combines flexible machine learning models to accurately learn complex treatment outcomes with locally interpretable approximations to construct subject-specific treatment rules. LI-ITR employs variational autoencoders to generate realistic local synthetic samples and learns individualized decision rules through a mixture of interpretable experts. Simulation studies show that LI-ITR accurately recovers true subject-specific local coefficients and optimal treatment strategies. An application to precision side-effect management in breast cancer illustrates the necessity of flexible predictive modeling and highlights the practical utility of LI-ITR in estimating optimal treatment rules while providing transparent, clinically interpretable explanations.
