Research Papers

ARXIV Cancer: general cancer Method: unknown

Evolutionarily Stable Stackelberg Equilibrium

Sam Ganzfried
Published 2026-03-19 01:06

This paper introduces the concept of evolutionarily stable Stackelberg equilibrium (SESS) within evolutionary game theory. It studies the interaction between a single leading player and a symmetric population of followers, in which the leader selects an optimal mixed strategy while the followers play a strategy that is stable against invasion by mutations. The author provides algorithms for computing SESS in discrete and continuous games and validates the continuous-game algorithm empirically, noting applications in biological settings such as cancer treatment.

We present a new solution concept called evolutionarily stable Stackelberg equilibrium (SESS). We study the Stackelberg evolutionary game setting in which there is a single leading player and a symmetric population of followers. The leader selects an optimal mixed strategy, anticipating that the follower population plays an evolutionarily stable strategy (ESS) in the induced subgame and may satisfy additional ecological conditions. We consider both leader-optimal and follower-optimal selection among ESSs, which arise as special cases of our framework. Prior approaches to Stackelberg evolutionary games either define the follower response via evolutionary dynamics or assume rational best-response behavior, without explicitly enforcing stability against invasion by mutations. We present algorithms for computing SESS in discrete and continuous games, and validate the latter empirically. Our model applies naturally to biological settings; for example, in cancer treatment the leader represents the physician and the followers correspond to competing cancer cell phenotypes.
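The follower condition in the abstract above rests on the classical notion of an evolutionarily stable strategy. As a minimal sketch, assuming a symmetric two-player matrix game and the standard Maynard Smith conditions, the check below tests a candidate strategy against a finite list of mutants. It is not the paper's SESS algorithm; the function names and the Hawk-Dove payoffs are hypothetical, and passing a finite mutant list is evidence of stability, not a proof.

```python
def expected_payoff(x, matrix, y):
    """Payoff to a player using mixed strategy x against an opponent using y."""
    return sum(x[i] * matrix[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))


def resists_mutants(matrix, s, mutants, tol=1e-9):
    """Test the Maynard Smith ESS conditions for s against each listed mutant t.

    s passes against t if u(s, s) > u(t, s), or if u(s, s) == u(t, s)
    and u(s, t) > u(t, t). Only the supplied mutants are tested.
    """
    for t in mutants:
        if all(abs(a - b) <= tol for a, b in zip(s, t)):
            continue  # the mutant is the candidate itself
        u_ss = expected_payoff(s, matrix, s)
        u_ts = expected_payoff(t, matrix, s)
        if u_ss > u_ts + tol:
            continue  # first condition: s is a strict best reply to itself
        tie = abs(u_ss - u_ts) <= tol
        if tie and expected_payoff(s, matrix, t) > expected_payoff(t, matrix, t) + tol:
            continue  # second condition: s does better against t than t does
        return False
    return True


# Hypothetical Hawk-Dove game with resource value 2 and fight cost 4.
# Rows and columns are ordered (hawk, dove).
hawk_dove = [[-1.0, 2.0],
             [0.0, 1.0]]
mutants = [[1.0, 0.0], [0.0, 1.0], [0.25, 0.75], [0.75, 0.25]]

print(resists_mutants(hawk_dove, [0.5, 0.5], mutants))  # half hawk, half dove
print(resists_mutants(hawk_dove, [1.0, 0.0], mutants))  # pure hawk
```

In this toy game the half-and-half mixture passes every listed mutant, while pure hawk is invaded by dove.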

ARXIV Cancer: unknown Method: multimodal retrieval-augmented generation

Grounded Multimodal Retrieval-Augmented Drafting of Radiology Impressions Using Case-Based Similarity Search

Himadri Samanta
Published 2026-03-18 14:25

This study presents a multimodal retrieval-augmented generation (RAG) system designed for the grounded drafting of chest radiograph impressions. By integrating contrastive image-text embeddings and case-based similarity retrieval, the system aims to enhance the reliability of automated radiology report generation. Experimental results indicate that this approach significantly improves retrieval performance and produces interpretable outputs with citation traceability, thereby increasing trustworthiness in clinical settings.

Automated radiology report generation has gained increasing attention with the rise of deep learning and large language models. However, fully generative approaches often suffer from hallucinations and lack clinical grounding, limiting their reliability in real-world workflows. In this study, we propose a multimodal retrieval-augmented generation (RAG) system for grounded drafting of chest radiograph impressions. The system combines contrastive image-text embeddings, case-based similarity retrieval, and citation-constrained draft generation to ensure factual alignment with historical radiology reports. A curated subset of the MIMIC-CXR dataset was used to construct a multimodal retrieval database. Image embeddings were generated using CLIP encoders, while textual embeddings were derived from structured impression sections. A fusion similarity framework was implemented using FAISS indexing for scalable nearest-neighbor retrieval. Retrieved cases were used to construct grounded prompts for draft impression generation, with safety mechanisms enforcing citation coverage and confidence-based refusal. Experimental results demonstrate that multimodal fusion significantly improves retrieval performance compared to image-only retrieval, achieving Recall@5 above 0.95 on clinically relevant findings. The grounded drafting pipeline produces interpretable outputs with explicit citation traceability, enabling improved trustworthiness compared to conventional generative approaches. This work highlights the potential of retrieval-augmented multimodal systems for reliable clinical decision support and radiology workflow augmentation.
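The abstract does not give the fusion formula or the exact Recall@5 definition, so the following is only an illustrative sketch. It assumes fusion is a weighted sum of image and text cosine similarities, that the query carries both embeddings, and that Recall@k counts a query as a hit when at least one relevant case appears in its top k. A brute-force scan stands in for the FAISS index, and all names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_top_k(query_img, query_txt, cases, k, alpha=0.5):
    """Rank cases by alpha * image similarity + (1 - alpha) * text similarity."""
    scored = []
    for case in cases:
        score = (alpha * cosine(query_img, case["img"])
                 + (1.0 - alpha) * cosine(query_txt, case["txt"]))
        scored.append((score, case["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [case_id for _, case_id in scored[:k]]


def recall_at_k(retrieved_lists, relevant_sets):
    """Fraction of queries with at least one relevant case among the retrieved ids."""
    hits = sum(1 for retrieved, relevant in zip(retrieved_lists, relevant_sets)
               if relevant & set(retrieved))
    return hits / len(retrieved_lists)


# Toy two-dimensional embeddings standing in for CLIP image and text vectors.
cases = [
    {"id": "A", "img": [1.0, 0.0], "txt": [1.0, 0.0]},
    {"id": "B", "img": [0.0, 1.0], "txt": [1.0, 0.0]},
    {"id": "C", "img": [0.0, 1.0], "txt": [0.0, 1.0]},
]
top2 = fused_top_k([1.0, 0.0], [1.0, 0.0], cases, k=2)
print(top2)
print(recall_at_k([top2], [{"B"}]))
```

Setting `alpha=1.0` reduces the ranking to image-only retrieval, which is the baseline the abstract compares against.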

ARXIV Cancer: general cancer Method: differential visual prompting

DiffVP: Differential Visual Semantic Prompting for LLM-Based CT Report Generation

Yuhe Tian, Kun Zhang, Haoran Ma, Rui Yan, Yingtai Li, Rongsheng Wang, Shaohua Kevin Zhou
Published 2026-03-18 13:38

This paper presents Differential Visual Prompting (DiffVP), a novel approach for CT report generation that enhances the performance of large language models by focusing on high-level semantic differences in imaging data. The method utilizes a hierarchical difference extractor to identify and emphasize diagnostically relevant visual evidence while minimizing irrelevant anatomical information. Experimental results demonstrate that DiffVP significantly improves report generation accuracy on large-scale benchmarks, outperforming existing methods.

While large language models (LLMs) have advanced CT report generation, existing methods typically encode 3D volumes holistically, failing to distinguish informative cues from redundant anatomical background. Inspired by radiological cognitive subtraction, we propose Differential Visual Prompting (DiffVP), which conditions report generation on explicit, high-level semantic scan-to-reference differences rather than solely on absolute visual features. DiffVP employs a hierarchical difference extractor to capture complementary global and local semantic discrepancies into a shared latent space, along with a difference-to-prompt generator that transforms these signals into learnable visual prefix tokens for LLM conditioning. These difference prompts serve as structured conditioning signals that implicitly suppress invariant anatomy while amplifying diagnostically relevant visual evidence, thereby facilitating accurate report generation without explicit lesion localization. On two large-scale benchmarks, DiffVP consistently outperforms prior methods, improving the average BLEU-1-4 by +10.98 and +4.36, respectively, and further boosts clinical efficacy on RadGenome-ChestCT (F1 score 0.421). All code will be released at https://github.com/ArielTYH/DiffVP/.

ARXIV Cancer: brain tumor Method: vision-language detection model

LoGSAM: Parameter-Efficient Cross-Modal Grounding for MRI Segmentation

Mohammad Robaitul Islam Bhuiyan, Sheethal Bhat, Melika Qahqaie, Tri-Thien Nguyen, Paula Andrea Perez-Toro, Tomas Arias-Vergara, Andreas Maier
Published 2026-03-18 10:33

This study presents LoGSAM, a parameter-efficient framework for localizing and segmenting brain tumors in MRI scans. It converts radiologist dictation into text prompts using automatic speech recognition and negation-aware clinical natural language processing, then uses those prompts to guide tumor localization. The method reports a Dice score of 80.32% on BRISC 2025, described as state of the art, and 91.7% case-level accuracy on 12 unseen MRI scans with German dictations.

Precise localization and delineation of brain tumors using Magnetic Resonance Imaging (MRI) are essential for planning therapy and guiding surgical decisions. However, most existing approaches rely on task-specific supervised models and are constrained by the limited availability of annotated data. To address this, we propose LoGSAM, a parameter-efficient, detection-driven framework that transforms radiologist dictation into text prompts for foundation-model-based localization and segmentation. Radiologist speech is first transcribed and translated using a pretrained Whisper ASR model, followed by negation-aware clinical NLP to extract tumor-specific textual prompts. These prompts guide text-conditioned tumor localization via a LoRA-adapted vision-language detection model, Grounding DINO (GDINO). The LoRA adaptation updates only 5% of the model parameters, thereby enabling computationally efficient domain adaptation while preserving pretrained cross-modal knowledge. The predicted bounding boxes are used as prompts for MedSAM to generate pixel-level tumor masks without any additional fine-tuning. Conditioning the frozen MedSAM on LoGSAM-derived priors yields a state-of-the-art Dice score of 80.32% on BRISC 2025. In addition, we evaluate the full pipeline using German dictations from a board-certified radiologist on 12 unseen MRI scans, achieving 91.7% case-level accuracy. These results highlight the feasibility of constructing a modular, speech-to-segmentation pipeline by intelligently leveraging pretrained foundation models with minimal parameter updates.
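For readers unfamiliar with the segmentation metric reported above, the Dice score between a predicted mask and a reference mask is twice the overlap divided by the total foreground in both. The sketch below is a generic illustration on small binary masks, not the paper's evaluation code; the function name and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for binary masks given as nested lists."""
    overlap = 0
    total = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            overlap += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * overlap / total


# Toy 2x2 masks: the prediction marks two pixels, the reference marks one of them.
pred_mask = [[1, 1],
             [0, 0]]
true_mask = [[1, 0],
             [0, 0]]
print(dice_score(pred_mask, true_mask))  # 2 * 1 / (2 + 1)
```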

ARXIV Cancer: general cancer Method: latent diffusion model

CytoSyn: a Foundation Diffusion Model for Histopathology -- Tech Report

Thomas Duboudin, Xavier Fontaine, Etienne Andrier, Lionel Guillou, Alexandre Filiot, Thalyssa Baiocco-Rodrigues, Antoine Olivier, Alberto Romagnoni, John Klein, Jean-Baptiste Schiratti
Published 2026-03-18 08:58

This paper presents CytoSyn, a foundation latent diffusion model designed specifically for histopathology. The model supports guided generation of realistic H&E-stained images and targets tasks beyond the reach of feature extractors, such as virtual staining. The authors report an extensive benchmark and a series of methodological improvements that culminate in CytoSyn-v2. The model was trained on data from more than 10,000 TCGA diagnostic whole-slide images spanning 32 cancer types.

Computational pathology has made significant progress in recent years, fueling advances in both fundamental disease understanding and clinically ready tools. This evolution is driven by the availability of large amounts of digitized slides and specialized deep learning methods and models. Multiple self-supervised foundation feature extractors have been developed, enabling downstream predictive applications from cell segmentation to tumor sub-typing and survival analysis. In contrast, generative foundation models designed specifically for histopathology remain scarce. Such models could address tasks that are beyond the capabilities of feature extractors, such as virtual staining. In this paper, we introduce CytoSyn, a state-of-the-art foundation latent diffusion model that enables the guided generation of highly realistic and diverse histopathology H&E-stained images, as shown in an extensive benchmark. We explored methodological improvements, training set scaling, sampling strategies and slide-level overfitting, culminating in the improved CytoSyn-v2, and compared our work in depth to PixCell, a state-of-the-art model. This comparison highlighted the strong sensitivity of both diffusion models and performance metrics to preprocessing-specific details such as JPEG compression. Our model has been trained on a dataset obtained from more than 10,000 TCGA diagnostic whole-slide images of 32 different cancer types. Despite being trained only on oncology slides, it maintains state-of-the-art performance when generating inflammatory bowel disease images. To support the research community, we publicly release CytoSyn's weights, its training and validation datasets, and a sample of synthetic images in this repository: https://huggingface.co/Owkin-Bioptimus/CytoSyn.

ARXIV Cancer: general cancer Method: physics-informed neural network

Mathematical Modeling of Cancer-Bacterial Therapy: Analysis and Numerical Simulation via Physics-Informed Neural Networks

Ayoub Farkane, David Lassounon
Published 2026-03-18 02:04

This study presents a mathematical model to analyze the interactions between tumor growth and bacterial therapy in cancer treatment. The model consists of five coupled nonlinear reaction-diffusion equations and is solved using a physics-informed neural network (PINN), which allows for accurate predictions without extensive data. The findings indicate that maintaining hypoxic regions in tumors may be crucial for effective long-term therapy.

Bacterial cancer therapy exploits anaerobic bacteria's ability to target hypoxic tumor regions, yet the interactions among tumor growth, bacterial colonization, oxygen levels, immunosuppressive cytokines, and bacterial communication remain poorly quantified. We present a mathematical model of five coupled nonlinear reaction-diffusion equations in a two-dimensional tissue domain. We proved the global well-posedness of the model and identified its steady states to analyze stability. Furthermore, a physics-informed neural network (PINN) solves the system without a mesh and without requiring extensive data. It provides convergence guarantees by combining residual stability and Sobolev approximation error bounds. This results in an overall error rate of $O(n^{-2} \ln^4(n) + N^{-1/2})$, which depends on the network width $n$ and the number of collocation points $N$. We conducted several numerical experiments, including predicting the tumor's response to therapy. We also performed a sensitivity analysis of certain parameters. The results suggest that long-lasting tumor control may require either maintaining hypoxic regions in the tumor or using bacteria that tolerate oxygen better.

ARXIV Cancer: glioma Method: CycleGAN

SA-CycleGAN-2.5D: Self-Attention CycleGAN with Tri-Planar Context for Multi-Site MRI Harmonization

Ishrith Gowda, Chunwei Liu
Published 2026-03-17 23:49

The study presents SA-CycleGAN-2.5D, a domain adaptation framework designed to address scanner-induced covariate shifts in multi-site neuroimaging analysis. The method integrates architectural innovations such as a 2.5D tri-planar manifold injection and a U-ResNet generator with self-attention to enhance radiomic reproducibility. Evaluations on glioma patients demonstrate a significant reduction in Maximum Mean Discrepancy and improved harmonization of voxel-level images, facilitating reproducible multi-center analysis.

Multi-site neuroimaging analysis is fundamentally confounded by scanner-induced covariate shifts, where the marginal distribution of voxel intensities $P(\mathbf{x})$ varies non-linearly across acquisition protocols while the conditional anatomy $P(\mathbf{y}|\mathbf{x})$ remains constant. This is particularly detrimental to radiomic reproducibility, where acquisition variance often exceeds biological pathology variance. Existing statistical harmonization methods (e.g., ComBat) operate in feature space, precluding spatial downstream tasks, while standard deep learning approaches are theoretically bounded by local effective receptive fields (ERF), failing to model the global intensity correlations characteristic of field-strength bias. We propose SA-CycleGAN-2.5D, a domain adaptation framework motivated by the $\mathcal{H}\Delta\mathcal{H}$-divergence bound of Ben-David et al., integrating three architectural innovations: (1) A 2.5D tri-planar manifold injection preserving through-plane gradients $\nabla_z$ at $O(HW)$ complexity; (2) A U-ResNet generator with dense voxel-to-voxel self-attention, surpassing the $O(\sqrt{L})$ receptive field limit of CNNs to model global scanner field biases; and (3) A spectrally-normalized discriminator constraining the Lipschitz constant ($K_D \le 1$) for stable adversarial optimization. Evaluated on 654 glioma patients across two institutional domains (BraTS and UPenn-GBM), our method reduces Maximum Mean Discrepancy (MMD) by 99.1% ($1.729 \to 0.015$) and degrades domain classifier accuracy to near-chance (59.7%). Ablation confirms that global attention is statistically essential (Cohen's $d = 1.32$, $p < 0.001$) for the harder heterogeneous-to-homogeneous translation direction. By bridging 2D efficiency and 3D consistency, our framework yields voxel-level harmonized images that preserve tumor pathophysiology, enabling reproducible multi-center radiomic analysis.
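The headline number in this abstract is a drop in Maximum Mean Discrepancy between the two scanner domains. As a rough illustration of what that quantity measures, the sketch below computes a biased estimate of squared MMD between two one-dimensional samples and reproduces the reported relative reduction. The abstract does not state the kernel, bandwidth, or estimator, so the Gaussian kernel, the bandwidth of 1.0, and all function names here are assumptions.

```python
import math


def rbf_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def relative_reduction(before, after):
    """Fractional decrease from before to after."""
    return (before - after) / before


# Toy intensity samples standing in for two scanner domains.
site_a = [0.0, 1.0, 2.0]
site_b = [5.0, 6.0, 7.0]
print(mmd_squared(site_a, site_a))  # identical samples give zero
print(mmd_squared(site_a, site_b))  # well-separated samples give a large value

# The abstract's figures: 1.729 before harmonization, 0.015 after.
print(round(100.0 * relative_reduction(1.729, 0.015), 1))  # 99.1
```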

ARXIV Cancer: general cancer Method: unknown

HistoAtlas: A Pan-Cancer Morphology Atlas Linking Histomics to Molecular Programs and Clinical Outcomes

Pierre-Antoine Bannier
Published 2026-03-17 14:36

HistoAtlas is a pan-cancer computational atlas that extracts 38 interpretable histomic features from 6,745 diagnostic H&E slides across 21 TCGA cancer types. The study systematically links each feature to survival, gene expression, somatic mutations, and immune subtypes, with covariate adjustment and multiple-testing correction. The atlas recovers known biology, such as the link between immune infiltration and prognosis, and identifies morphological subtypes with divergent outcomes, supporting large-scale biomarker discovery from routine H&E slides.

We present HistoAtlas, a pan-cancer computational atlas that extracts 38 interpretable histomic features from 6,745 diagnostic H&E slides across 21 TCGA cancer types and systematically links every feature to survival, gene expression, somatic mutations, and immune subtypes. All associations are covariate-adjusted, multiple-testing corrected, and classified into evidence-strength tiers. The atlas recovers known biology, from immune infiltration and prognosis to proliferation and kinase signaling, while uncovering compartment-specific immune signals and morphological subtypes with divergent outcomes. Every result is spatially traceable to tissue compartments and individual cells, statistically calibrated, and openly queryable. HistoAtlas enables systematic, large-scale biomarker discovery from routine H&E without specialized staining or sequencing. Data and an interactive web atlas are freely available at https://histoatlas.com .

ARXIV Cancer: general cancer Method: Transformer model

Understanding Cell Fate Decisions with Temporal Attention

Florian Bürger, Martim Dias Gomes, Adrián E. Granada, Noémie Moreau, Katarzyna Bozek
Published 2026-03-17 14:22

This study investigates the non-genetic determinants of cell fate in cancer therapies using a deep learning approach. A Transformer model is developed to predict cell fate from raw long-term live-cell recordings during chemotherapeutic treatment, achieving a balanced accuracy of 0.94 and an F1-score of 0.93. The model's predictions are based on video data, with insights into the temporal distribution of predictive information and the influence of cell morphology and p53 signaling.

Understanding non-genetic determinants of cell fate is critical for developing and improving cancer therapies, as genetically identical cells can exhibit divergent outcomes under the same treatment conditions. In this work, we present a deep learning approach for cell fate prediction from raw long-term live-cell recordings of cancer cell populations under chemotherapeutic treatment. Our Transformer model is trained to predict cell fate directly from raw image sequences, without relying on predefined morphological or molecular features. Beyond classification, we introduce a comprehensive explainability framework for interpreting the temporal and morphological cues guiding the model's predictions. We demonstrate that cell outcomes can be predicted from the video alone: our model achieves a balanced accuracy of 0.94 and an F1-score of 0.93. Attention and masking experiments further indicate that the signal predictive of the cell fate is not uniquely located in the final frames of a cell trajectory, as reliable predictions are possible up to 10 h before the event. Our analysis reveals distinct temporal distributions of predictive information in the mitotic and apoptotic sequences, as well as the role of cell morphology and p53 signaling in determining cell outcomes. Together, these findings demonstrate that attention-based temporal models enable accurate cell fate prediction while providing biologically interpretable insights into non-genetic determinants of cellular decision-making. The code is available at https://github.com/bozeklab/Cell-Fate-Prediction.

ARXIV Cancer: general cancer Method: multimodal learning

HGP-Mamba: Integrating Histology and Generated Protein Features for Mamba-based Multimodal Survival Risk Prediction

Jing Dai, Chen Wu, Ming Wu, Qibin Zhang, Zexi Wu, Jingdong Zhang, Hongming Xu
Published 2026-03-17 11:57

The paper presents HGP-Mamba, a multimodal framework designed to enhance cancer survival risk prediction by integrating histological and generated protein features. It introduces a protein feature extractor that utilizes pretrained models to derive protein embeddings from Whole Slide Images, alongside histology embeddings. The framework employs Local Interaction-aware Mamba and Global Interaction-enhanced Mamba to effectively capture complex interactions between modalities. Experimental results on four public cancer datasets indicate that HGP-Mamba achieves state-of-the-art performance with improved computational efficiency.

Recent advances in multimodal learning have significantly improved cancer survival risk prediction. However, the joint prognostic potential of protein markers and histopathology images remains underexplored, largely due to the high cost and limited availability of protein expression profiling. To address this challenge, we propose HGP-Mamba, a Mamba-based multimodal framework that efficiently integrates histological features with generated protein features for survival risk prediction. Specifically, we introduce a protein feature extractor (PFE) that leverages pretrained foundation models to derive high-throughput protein embeddings directly from Whole Slide Images (WSIs), enabling data-efficient incorporation of molecular information. Together with histology embeddings that capture morphological patterns, we further introduce the Local Interaction-aware Mamba (LiAM) for fine-grained feature interaction and the Global Interaction-enhanced Mamba (GiEM) to promote holistic modality fusion at the slide level, thus capturing complex cross-modal dependencies. Experiments on four public cancer datasets demonstrate that HGP-Mamba achieves state-of-the-art performance while maintaining superior computational efficiency compared with existing methods. Our source code is publicly available at https://github.com/Daijing-ai/HGP-Mamba.git.
