Research Papers

ARXIV Cancer: unknown Method: multimodal learning

Beyond Medical Diagnostics: How Medical Multimodal Large Language Models Think in Space

Quoc-Huy Trinh, Xi Ding, Yang Liu, Zhenyue Qin, Xingjian Li, Gorkem Durak, Halil Ertugrul Aktas, Elif Keles, Ulas Bagci, Min Xu
Published 2026-03-14 07:17

This study addresses the gap in visual spatial intelligence for medical image interpretation in Multimodal Large Language Models (MLLMs) applied to 3D imaging. The authors introduce SpatialMed, a benchmark of nearly 10K question-answer pairs whose spatial visual question-answering data is synthesized by an agentic pipeline that combines computational tools with expert radiologist validation. Evaluations of 14 state-of-the-art MLLMs indicate that current models lack robust spatial reasoning capabilities for medical imaging.

Visual spatial intelligence is critical for medical image interpretation, yet remains largely unexplored in Multimodal Large Language Models (MLLMs) for 3D imaging. This gap persists due to a systemic lack of datasets featuring structured 3D spatial annotations beyond basic labels. In this study, we introduce an agentic pipeline that autonomously synthesizes spatial visual question-answering (VQA) data by orchestrating computational tools such as volume and distance calculators with multi-agent collaboration and expert radiologist validation. We present SpatialMed, the first comprehensive benchmark for evaluating 3D spatial intelligence in medical MLLMs, comprising nearly 10K question-answer pairs across multiple organs and tumor types. Our evaluations on 14 state-of-the-art MLLMs and extensive analyses reveal that current models lack robust spatial reasoning capabilities for medical imaging.

ARXIV Cancer: general cancer Method: hierarchical fusion framework

Advancing Cancer Prognosis with Hierarchical Fusion of Genomic, Proteomic and Pathology Imaging Data from a Systems Biology Perspective

Junjie Zhou, Bao Xue, Meiling Wang, Wei Shao, Daoqiang Zhang
Published 2026-03-14 06:30

This paper presents HFGPI, a hierarchical fusion framework designed to improve cancer prognosis by integrating genomic, proteomic, and histology imaging data. The framework addresses limitations in existing methods by modeling the biological hierarchy between these data types. Key innovations include the Molecular Tokenizer for encoding molecular representations and Gene-Regulated Protein Fusion for capturing gene-protein relationships. Experimental results indicate that HFGPI outperforms current state-of-the-art approaches in survival outcome prediction.

To enhance the precision of cancer prognosis, recent research has increasingly focused on multimodal survival methods by integrating genomic data and histology images. However, current approaches overlook the fact that the proteome serves as an intermediate layer bridging genomic alterations and histopathological features while providing complementary biological information essential for survival prediction. This biological reality exposes another architectural limitation: existing integrative analysis studies fuse these heterogeneous data sources in a flat manner that fails to capture their inherent biological hierarchy. To address these limitations, we propose HFGPI, a hierarchical fusion framework that models the biological progression from genes to proteins to histology images from a systems biology perspective. Specifically, we introduce Molecular Tokenizer, a molecular encoding strategy that integrates identity embeddings with expression profiles to construct biologically informed representations for genes and proteins. We then develop Gene-Regulated Protein Fusion (GRPF), which employs graph-aware cross-attention with structure-preserving alignment to explicitly model gene-protein regulatory relationships and generate gene-regulated protein representations. Additionally, we propose Protein-Guided Hypergraph Learning (PGHL), which establishes associations between proteins and image patches, leveraging hypergraph convolution to capture higher-order protein-morphology relationships. The final features are progressively fused across hierarchical layers to achieve precise survival outcome prediction. Extensive experiments on five benchmark datasets demonstrate the superiority of HFGPI over state-of-the-art methods.

ARXIV Cancer: glioma Method: persistent homology

Brain Tumor Classification from 3D MRI Using Persistent Homology and Betti Features: A Topological Data Analysis Approach on BraTS2020

Faisal Ahmed
Published 2026-03-14 05:44

This study presents a topology-driven framework for classifying brain tumors using Topological Data Analysis (TDA) on 3D MRI volumes. By applying persistent homology, the authors extract topological descriptors that capture the geometric characteristics of brain tumors. The framework utilizes these features to train classical machine learning classifiers, achieving an accuracy of 89.19% in distinguishing between high-grade and low-grade gliomas.

Accurate and interpretable brain tumor classification from medical imaging remains a challenging problem due to the high dimensionality and complex structural patterns present in magnetic resonance imaging (MRI). In this study, we propose a topology-driven framework for brain tumor classification based on Topological Data Analysis (TDA) applied directly to three-dimensional (3D) MRI volumes. Specifically, we analyze 3D Fluid Attenuated Inversion Recovery (FLAIR) images from the BraTS 2020 dataset and extract interpretable topological descriptors using persistent homology. Persistent homology captures intrinsic geometric and structural characteristics of the data through Betti numbers, which describe connected components (Betti-0), loops (Betti-1), and voids (Betti-2). From the 3D MRI volumes, we derive a compact set of 100 topological features that summarize the underlying topology of brain tumor structures. These descriptors represent complex 3D tumor morphology while significantly reducing data dimensionality. Unlike many deep learning approaches that require large-scale training data or complex architectures, the proposed framework relies on computationally efficient topological features extracted directly from the images. These features are used to train classical machine learning classifiers, including Random Forest and XGBoost, for binary classification of high-grade glioma (HGG) and low-grade glioma (LGG). Experimental results on the BraTS 2020 dataset show that the Random Forest classifier combined with selected Betti features achieves an accuracy of 89.19%. These findings highlight the potential of persistent homology as an effective and interpretable approach for analyzing complex 3D medical images and performing brain tumor classification.
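
To make the Betti-0 descriptor concrete, here is a minimal sketch that counts connected components of a toy 3D volume at several intensity thresholds (a sublevel-set "Betti-0 curve"). It is an illustration only: the 6-connectivity, the thresholds, the toy volume, and the function names `betti0` and `betti0_curve` are assumptions, and the abstract does not say how its 100 features are actually derived.

```python
from collections import deque

# 6-connectivity: voxels are neighbors if they share a face (an assumption).
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def betti0(volume, threshold):
    """Count connected components of voxels with intensity <= threshold."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    seen = set()
    components = 0
    for z in range(depth):
        for y in range(height):
            for x in range(width):
                if volume[z][y][x] > threshold or (z, y, x) in seen:
                    continue
                # New, unvisited voxel inside the sublevel set: flood-fill it.
                components += 1
                seen.add((z, y, x))
                queue = deque([(z, y, x)])
                while queue:
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in NEIGHBORS:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        inside = 0 <= nz < depth and 0 <= ny < height and 0 <= nx < width
                        if (inside and (nz, ny, nx) not in seen
                                and volume[nz][ny][nx] <= threshold):
                            seen.add((nz, ny, nx))
                            queue.append((nz, ny, nx))
    return components


def betti0_curve(volume, thresholds):
    """Betti-0 at each threshold; a fixed-length topological feature vector."""
    return [betti0(volume, t) for t in thresholds]


# Hypothetical toy volume of shape (depth=2, height=1, width=3).
toy_volume = [[[0, 5, 0]], [[9, 9, 9]]]
curve = betti0_curve(toy_volume, [-1, 0, 5, 9])
print(curve)
```

Two isolated low-intensity voxels appear first and merge once the voxel between them enters the filtration, which is the kind of birth-and-merge behavior that persistent homology records. Betti-1 and Betti-2 need a full cubical-complex computation, typically done with a dedicated TDA library.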

ARXIV Cancer: metastatic castration-resistant prostate cancer Method: multi-agent framework

TheraAgent: Multi-Agent Framework with Self-Evolving Memory and Evidence-Calibrated Reasoning for PET Theranostics

Zhihao Chen, Jiahui Wang, Yizhou Chen, Xiaozhong Ji, Xiaobin Hu, Jimin Hong, Wolfram Andreas Bosbach, Axel Rominger, Ali Afshar-Oromieh, Hongming Shan, Kuangyu Shi
Published 2026-03-14 00:50

This paper introduces TheraAgent, a multi-agent framework for predicting treatment outcomes in PET theranostics, specifically for metastatic castration-resistant prostate cancer (mCRPC). The framework addresses data scarcity, heterogeneous information integration, and the need for evidence-based reasoning. TheraAgent employs multi-expert feature extraction, self-evolving memory, and evidence-calibrated reasoning, achieving an overall accuracy of 75.7% on 35 real patients and 87.0% on 400 synthetic cases. It outperforms MDAgents and MedAgent-Pro by over 20%, indicating its potential for decision support in precision oncology.

PET theranostics is transforming precision oncology, yet treatment response varies substantially; many patients receiving 177Lu-PSMA radioligand therapy (RLT) for metastatic castration-resistant prostate cancer (mCRPC) fail to respond, demanding reliable pre-therapy prediction. While LLM-based agents have shown remarkable potential in complex medical diagnosis, their application to PET theranostic outcome prediction remains unexplored and faces three key challenges: (1) data and knowledge scarcity: RLT was only FDA-approved in 2022, yielding few training cases and insufficient domain knowledge in general LLMs; (2) heterogeneous information integration: robust prediction hinges on structured knowledge extraction from PET/CT, laboratory tests, and free-text clinical documentation; (3) evidence-grounded reasoning: clinical decisions must be anchored in trial evidence rather than LLM hallucinations. In this paper, we present TheraAgent, to our knowledge, the first agentic framework for PET theranostics, with three core innovations: (1) Multi-Expert Feature Extraction with Confidence-Weighted Consensus, where three specialized experts process heterogeneous inputs with uncertainty quantification; (2) Self-Evolving Agentic Memory (SEA-Mem), which learns prognostic patterns from accumulated cases, enabling case-based reasoning from limited data; (3) Evidence-Calibrated Reasoning, integrating a curated theranostics knowledge base to ground predictions in VISION/TheraP trial evidence. Evaluated on 35 real patients and 400 synthetic cases, TheraAgent achieves 75.7% overall accuracy on real patients and 87.0% on synthetic cases, outperforming MDAgents and MedAgent-Pro by over 20%. These results highlight a promising blueprint for trustworthy AI agents in PET theranostics, enabling trial-calibrated, multi-source decision support. Code will be released upon acceptance.

ARXIV Cancer: melanoma Method: Generative Adversarial Networks

Synthetic Melanoma Image Generation and Evaluation Using Generative Adversarial Networks

Pei-Yu Lin, Yidan Shen, Neville Mathew, Renjie Hu, Siyu Huang, Courtney M. Queen, Cameron E. West, Ana Ciurea, George Zouridakis
Published 2026-03-13 18:21

This study investigates the use of Generative Adversarial Networks (GANs) for generating high-resolution images of melanoma to address challenges in dataset limitations and class imbalance. Four GAN architectures, including DCGAN and StyleGAN variants, were systematically benchmarked for their ability to synthesize melanoma-specific images. The results indicate that StyleGAN2 produced the highest quality images, which were effective in improving melanoma detection performance when used to augment training datasets.

Melanoma is the most lethal form of skin cancer, and early detection is critical for improving patient outcomes. Although dermoscopy combined with deep learning has advanced automated skin-lesion analysis, progress is hindered by limited access to large, well-annotated datasets and by severe class imbalance, where melanoma images are substantially underrepresented. To address these challenges, we present the first systematic benchmarking study comparing four GAN architectures, namely DCGAN, StyleGAN2, and two StyleGAN3 variants (T/R), for high-resolution melanoma-specific synthesis. We train and optimize all models on two expert-annotated benchmarks (ISIC 2018 and ISIC 2020) under unified preprocessing and hyperparameter exploration, with particular attention to R1 regularization tuning. Image quality is assessed through a multi-faceted protocol combining distribution-level metrics (FID), sample-level representativeness (FMD), qualitative dermoscopic inspection, downstream classification with a frozen EfficientNet-based melanoma detector, and independent evaluation by two board-certified dermatologists. StyleGAN2 achieves the best balance of quantitative performance and perceptual quality, attaining FID scores of 24.8 (ISIC 2018) and 7.96 (ISIC 2020) at gamma=0.8. The frozen classifier recognizes 83% of StyleGAN2-generated images as melanoma, while dermatologists distinguish synthetic from real images at only 66.5% accuracy (chance = 50%), with low inter-rater agreement (kappa = 0.17). In a controlled augmentation experiment, adding synthetic melanoma images to address class imbalance improved melanoma detection AUC from 0.925 to 0.945 on a held-out real-image test set. These findings demonstrate that StyleGAN2-generated melanoma images preserve diagnostically relevant features and can provide a measurable benefit for mitigating class imbalance in melanoma-focused machine learning pipelines.
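
The inter-rater agreement figure quoted above is a kappa statistic. As a rough sketch of what such a number measures, the following computes Cohen's kappa, κ = (p_o − p_e) / (1 − p_e), for two raters. Everything in it is hypothetical: the function name, the labels, and the toy ratings are illustrative and are not the study's data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # p_o: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: agreement expected if each rater labeled at random with
    # their own marginal label frequencies.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four images by two readers.
reader_1 = ["real", "real", "synthetic", "synthetic"]
reader_2 = ["real", "synthetic", "synthetic", "real"]
kappa_chance = cohens_kappa(reader_1, reader_2)   # agreement no better than chance
kappa_perfect = cohens_kappa(reader_1, reader_1)  # identical ratings
print(kappa_chance, kappa_perfect)
```

A kappa near 0 means the raters agree about as often as chance would predict, and a kappa of 1 means perfect agreement, so a value of 0.17 sits close to the chance end of the scale.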

ARXIV Cancer: brain tumor Method: convolutional neural network

Diffusion-Based Feature Denoising and Using NNMF for Robust Brain Tumor Classification

Hiba Adil Al-kharsan, Róbert Rajkó
Published 2026-03-13 17:15

This study presents a robust framework for brain tumor classification using magnetic resonance imaging (MRI). It combines Non-Negative Matrix Factorization (NNMF) with lightweight convolutional neural networks (CNNs) and a diffusion-based feature purification method to enhance adversarial robustness. The framework demonstrates competitive classification performance while addressing reliability concerns associated with adversarial perturbations.

Brain tumor classification from magnetic resonance imaging (MRI) plays a critical role in computer-assisted diagnosis systems. In recent years, deep learning models have achieved high classification accuracy. However, their sensitivity to adversarial perturbations has become an important reliability concern in medical applications. This study proposes a robust brain tumor classification framework that combines Non-Negative Matrix Factorization (NNMF or NMF), lightweight convolutional neural networks (CNNs), and diffusion-based feature purification. Initially, MRI images are preprocessed and converted into a non-negative data matrix, from which compact and interpretable NNMF feature representations are extracted. Statistical metrics, including AUC, Cohen's d, and p-values, are used to rank and choose the most discriminative components. Then, a lightweight CNN classifier is trained directly on the selected feature groups. To improve adversarial robustness, a diffusion-based feature-space purification module is introduced: a forward noising step followed by a learned denoiser network is applied before classification. System performance is evaluated using both clean accuracy and robust accuracy under strong adversarial attacks generated by AutoAttack. The experimental results show that the proposed framework achieves competitive classification performance while significantly enhancing robustness against adversarial perturbations. The findings suggest that combining interpretable NNMF-based representations with a lightweight deep approach and a diffusion-based defense technique provides an effective and reliable solution for medical image classification under adversarial conditions.

ARXIV Cancer: colorectal cancer Method: attention-based multiple instance learning

A protocol for evaluating robustness to H&E staining variation in computational pathology models

Lydia A. Schönpflug, Nikki van den Berg, Sonali Andani, Nanda Horeweg, Jurriaan Barkey Wolf, Tjalling Bosse, Viktor H. Koelzer, Maxime W. Lafarge
Published 2026-03-13 10:34

This study presents a protocol for evaluating the robustness of computational pathology models to variations in hematoxylin and eosin (H&E) staining. The protocol involves selecting reference staining conditions, characterizing test set staining properties, and applying models under simulated conditions. The authors applied this protocol to assess 306 microsatellite instability classification models on a colorectal cancer dataset, measuring classification performance and robustness across different staining conditions. Results indicate variability in model performance and robustness, highlighting the importance of this evaluation for reliable model deployment.

Sensitivity to staining variation remains a major barrier to deploying computational pathology (CPath) models as hematoxylin and eosin (H&E) staining varies across laboratories, requiring systematic assessment of how this variability affects model prediction. In this work, we developed a three-step protocol for evaluating robustness to H&E staining variation in CPath models: (1) select reference staining conditions; (2) characterize test set staining properties; (3) apply CPath model(s) under simulated reference staining conditions. Here, we first created a new reference staining library based on the PLISM dataset. As an exemplary use case, we applied the protocol to assess the robustness properties of 306 microsatellite instability (MSI) classification models on the unseen SurGen colorectal cancer dataset (n=738), including 300 attention-based multiple instance learning models trained on the TCGA-COAD/READ datasets across three feature extractors (UNI2-h, H-Optimus-1, Virchow2), alongside six public MSI classification models. Classification performance was measured as AUC, and robustness as the min-max AUC range across four simulated staining conditions (low/high H&E intensity, low/high H&E color similarity). Across models and staining conditions, classification performance ranged from AUC 0.769-0.911 (Δ = 0.142). Robustness ranged from 0.007-0.079 (Δ = 0.072), and showed a weak inverse correlation with classification performance (Pearson r=-0.22, 95% CI [-0.34, -0.11]). Thus, we show that the proposed evaluation protocol enables robustness-informed CPath model selection and provides insight into performance shifts across H&E staining conditions, supporting the identification of operational ranges for reliable model deployment. Code is available at https://github.com/CTPLab/staining-robustness-evaluation.
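
The robustness measure described above (the min-max AUC range across staining conditions) can be sketched in a few lines. This is not the authors' released code: the function names, the condition keys, and the toy labels and scores are all assumptions made for illustration.

```python
def auc(labels, scores):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def robustness_range(labels, scores_by_condition):
    """Per-condition AUCs and their min-max spread (smaller spread = more robust)."""
    aucs = {cond: auc(labels, scores) for cond, scores in scores_by_condition.items()}
    return aucs, max(aucs.values()) - min(aucs.values())


# Hypothetical predictions of one model on the same four slides,
# re-stained under four simulated conditions.
labels = [1, 1, 0, 0]
scores_by_condition = {
    "low_intensity": [0.9, 0.8, 0.3, 0.1],
    "high_intensity": [0.9, 0.4, 0.5, 0.1],
    "low_similarity": [0.7, 0.6, 0.6, 0.2],
    "high_similarity": [0.8, 0.7, 0.2, 0.3],
}
aucs, spread = robustness_range(labels, scores_by_condition)
print(aucs, spread)
```

Reporting the spread next to the mean AUC makes it possible to prefer a slightly less accurate model whose performance stays stable across staining conditions.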

ARXIV Cancer: skin cancer Method: prompt-based representation adaptation

Residual SODAP: Residual Self-Organizing Domain-Adaptive Prompting with Structural Knowledge Preservation for Continual Learning

Gyutae Oh, Jungwoo Bae, Jitae Shin
Published 2026-03-13 09:18

The paper presents Residual SODAP, a novel framework designed to address catastrophic forgetting in continual learning, particularly in domain-incremental learning scenarios. The method integrates prompt-based representation adaptation with classifier-level knowledge preservation, utilizing techniques such as sparse prompt selection and uncertainty-aware multi-loss balancing. Experimental results demonstrate that Residual SODAP achieves state-of-the-art performance across multiple benchmarks, including skin cancer detection.

Continual learning (CL) suffers from catastrophic forgetting, which is exacerbated in domain-incremental learning (DIL) where task identifiers are unavailable and storing past data is infeasible. While prompt-based CL (PCL) adapts representations with a frozen backbone, we observe that prompt-only improvements are often insufficient due to suboptimal prompt selection and classifier-level instability under domain shifts. We propose Residual SODAP, which jointly performs prompt-based representation adaptation and classifier-level knowledge preservation. Our framework combines α-entmax sparse prompt selection with residual aggregation, data-free distillation with pseudo-feature replay, prompt-usage-based drift detection, and uncertainty-aware multi-loss balancing. Across three DIL benchmarks without task IDs or extra data storage, Residual SODAP achieves state-of-the-art AvgACC/AvgF of 0.850/0.047 (DR), 0.760/0.031 (Skin Cancer), and 0.995/0.003 (CORe50).
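
To illustrate what sparse prompt selection means, here is a minimal sketch of sparsemax, which is the α = 2 special case of α-entmax (softmax is the α = 1 case). The value of α that Residual SODAP uses is not stated here, so this is a stand-in for the general idea and not the paper's method; the function name and the prompt scores are hypothetical.

```python
def sparsemax(scores):
    """Project scores onto the probability simplex; low scores get exactly zero weight."""
    if not scores:
        raise ValueError("scores must be non-empty")
    sorted_scores = sorted(scores, reverse=True)
    cumulative = 0.0
    tau = 0.0
    for k, z in enumerate(sorted_scores, start=1):
        cumulative += z
        # The k largest scores stay in the support while this holds;
        # the last k that satisfies it fixes the threshold tau.
        if 1 + k * z > cumulative:
            tau = (cumulative - 1) / k
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical similarity scores between a query and three prompts.
one_hot = sparsemax([2.0, 1.0, 0.1])   # a clear winner takes all the weight
mixed = sparsemax([1.0, 0.8, -1.0])    # two prompts share weight, the third is dropped
print(one_hot, mixed)
```

Unlike softmax, which assigns every prompt a non-zero weight, this kind of mapping can switch irrelevant prompts off entirely, which is the property "sparse prompt selection" refers to.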

ARXIV Cancer: unknown Method: SPADE-UNet

UNIStainNet: Foundation-Model-Guided Virtual Staining of H&E to IHC

Jillur Rahman Saurav, Thuong Le Hoai Pham, Pritam Mukherjee, Paul Yi, Brent A. Orr, Jacob M. Luber
Published 2026-03-13 07:02

The paper presents UNIStainNet, an approach for virtual immunohistochemistry (IHC) staining from hematoxylin and eosin (H&E) images. The method uses a SPADE-UNet conditioned on dense spatial tokens from a frozen pathology foundation model (UNI) to guide stain translation. UNIStainNet achieves state-of-the-art distributional metrics across four IHC markers (HER2, Ki67, ER, PR) with a single model, whereas prior methods typically train a separate model for each stain. A tissue-type stratified failure analysis shows that the remaining errors are systematic and concentrate in non-tumor tissue.

Virtual immunohistochemistry (IHC) staining from hematoxylin and eosin (H&E) images can accelerate diagnostics by providing preliminary molecular insight directly from routine sections, reducing the need for repeat sectioning when tissue is limited. Existing methods improve realism through contrastive objectives, prototype matching, or domain alignment, yet the generator itself receives no direct guidance from pathology foundation models. We present UNIStainNet, a SPADE-UNet conditioned on dense spatial tokens from a frozen pathology foundation model (UNI), providing tissue-level semantic guidance for stain translation. A misalignment-aware loss suite preserves stain quantification accuracy, and learned stain embeddings enable a single model to serve multiple IHC markers simultaneously. On MIST, UNIStainNet achieves state-of-the-art distributional metrics on all four stains (HER2, Ki67, ER, PR) from a single unified model, where prior methods typically train separate per-stain models. On BCI, it also achieves the best distributional metrics. A tissue-type stratified failure analysis reveals that remaining errors are systematic, concentrating in non-tumor tissue. Code is available at https://github.com/facevoid/UNIStainNet.

ARXIV Cancer: unknown Method: deep learning

Decoding Matters: Efficient Mamba-Based Decoder with Distribution-Aware Deep Supervision for Medical Image Segmentation

Fares Bougourzi, Fadi Dornaika, Abdenour Hadid
Published 2026-03-13 01:13

This paper presents a decoder-centric approach for generalized 2D medical image segmentation, addressing the limitations of existing task-specific methods. The proposed Deco-Mamba architecture utilizes a U-Net-like structure with a combination of CNN and Transformer backbones for efficient feature extraction. The method incorporates novel components such as a Co-Attention Gate and a Vision State Space Module to enhance contextual representation, achieving state-of-the-art performance across diverse medical imaging benchmarks.

Deep learning has achieved remarkable success in medical image segmentation, often reaching expert-level accuracy in delineating tumors and tissues. However, most existing approaches remain task-specific, showing strong performance on individual datasets but limited generalization across diverse imaging modalities. Moreover, many methods focus primarily on the encoder, relying on large pretrained backbones that increase computational complexity. In this paper, we propose a decoder-centric approach for generalized 2D medical image segmentation. The proposed Deco-Mamba follows a U-Net-like structure with a Transformer-CNN-Mamba design. The encoder combines a CNN block and Transformer backbone for efficient feature extraction, while the decoder integrates our novel Co-Attention Gate (CAG), Vision State Space Module (VSSM), and deformable convolutional refinement block to enhance multi-scale contextual representation. Additionally, a windowed distribution-aware KL-divergence loss is introduced for deep supervision across multiple decoding stages. Extensive experiments on diverse medical image segmentation benchmarks yield state-of-the-art performance and strong generalization capability while maintaining moderate model complexity. The source code will be released upon acceptance.
