Research Papers

ARXIV Cancer: breast cancer Method: diffusion model

PathoGen: Diffusion-Based Synthesis of Realistic Lesions in Histopathology Images

Mohamad Koohi-Moghadam, Mohammad-Ali Nikouei Mahani, Kyongtae Tyler Bae
Published 2026-01-13 01:45

The paper introduces PathoGen, a diffusion-based generative model designed to synthesize realistic lesions in histopathology images to address the scarcity of expert-annotated data. PathoGen enhances training datasets by generating high-fidelity lesions that maintain natural tissue boundaries and cellular structures. Validation across multiple datasets demonstrates its superiority over existing generative methods, leading to improved segmentation performance in data-scarce scenarios.

The development of robust artificial intelligence models for histopathology diagnosis is severely constrained by the scarcity of expert-annotated lesion data, particularly for rare pathologies and underrepresented disease subtypes. While data augmentation offers a potential solution, existing methods fail to generate sufficiently realistic lesion morphologies that preserve the complex spatial relationships and cellular architectures characteristic of histopathological tissues. Here we present PathoGen, a diffusion-based generative model that enables controllable, high-fidelity inpainting of lesions into benign histopathology images. Unlike conventional augmentation techniques, PathoGen leverages the iterative refinement process of diffusion models to synthesize lesions with natural tissue boundaries, preserved cellular structures, and authentic staining characteristics. We validate PathoGen across four diverse datasets representing distinct diagnostic challenges: kidney, skin, breast, and prostate pathology. Quantitative assessment confirms that PathoGen outperforms state-of-the-art generative baselines, including conditional GAN and Stable Diffusion, in image fidelity and distributional similarity. Crucially, we show that augmenting training sets with PathoGen-synthesized lesions enhances downstream segmentation performance compared to traditional geometric augmentations, particularly in data-scarce regimes. Besides, by simultaneously generating realistic morphology and pixel-level ground truth, PathoGen effectively overcomes the manual annotation bottleneck. This approach offers a scalable pathway for developing generalizable medical AI systems despite limited expert-labeled data.

ARXIV Cancer: colorectal cancer Method: foundation model

Robust Multicentre Detection and Classification of Colorectal Liver Metastases on CT: Application of Foundation Models

Shruti Atul Mali, Zohaib Salahuddin, Yumeng Zhang, Andre Aichert, Xian Zhong, Henry C. Woodruff, Maciej Bobowicz, Katrine Riklund, Juozas Kupčinskas, Lorenzo Faggioni, Roberto Francischello, Razvan L Miclea, Philippe Lambin
Published 2026-01-12 14:35

This study presents a foundation model-based AI pipeline designed for the detection and classification of colorectal liver metastases (CRLM) on contrast-enhanced CT scans. Utilizing data from the EuCanImage consortium and an external TCIA cohort, the model achieved a classification AUC of 0.90 and demonstrated improved performance with uncertainty quantification. The results indicate that this approach can enhance the reliability and interpretability of CRLM detection across diverse clinical settings.

Colorectal liver metastases (CRLM) are a major cause of cancer-related mortality, and reliable detection on CT remains challenging in multi-centre settings. We developed a foundation model-based AI pipeline for patient-level classification and lesion-level detection of CRLM on contrast-enhanced CT, integrating uncertainty quantification and explainability. CT data from the EuCanImage consortium (n=2437) and an external TCIA cohort (n=197) were used. Among several pretrained models, UMedPT achieved the best performance and was fine-tuned with an MLP head for classification and an FCOS-based head for lesion detection. The classification model achieved an AUC of 0.90 and a sensitivity of 0.82 on the combined test set, with a sensitivity of 0.85 on the external cohort. Excluding the most uncertain 20 percent of cases improved AUC to 0.91 and balanced accuracy to 0.86. Decision curve analysis showed clinical benefit for threshold probabilities between 0.30 and 0.40. The detection model identified 69.1 percent of lesions overall, increasing from 30 percent to 98 percent across lesion size quartiles. Grad-CAM highlighted lesion-corresponding regions in high-confidence cases. These results demonstrate that foundation model-based pipelines can support robust and interpretable CRLM detection and classification across heterogeneous CT data.
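
The step of excluding the most uncertain cases before scoring can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say how uncertainty is measured, so predictive entropy of the output probability is assumed here, and the function names and toy data are hypothetical.

```python
import math

def binary_entropy(p):
    """Predictive entropy (nats) of a Bernoulli probability p; an assumed uncertainty score."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def retain_most_certain(probs, labels, exclude_fraction=0.2):
    """Drop the most uncertain fraction of cases and return the kept (prob, label) pairs."""
    ranked = sorted(zip(probs, labels), key=lambda pl: binary_entropy(pl[0]))
    n_keep = len(ranked) - int(round(exclude_fraction * len(ranked)))
    return ranked[:n_keep]

def balanced_accuracy(pairs, threshold=0.5):
    """Mean of sensitivity and specificity at a fixed decision threshold."""
    tp = sum(1 for p, y in pairs if y == 1 and p >= threshold)
    fn = sum(1 for p, y in pairs if y == 1 and p < threshold)
    tn = sum(1 for p, y in pairs if y == 0 and p < threshold)
    fp = sum(1 for p, y in pairs if y == 0 and p >= threshold)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return (sens + spec) / 2

# Hypothetical toy predictions, not data from the study.
probs = [0.95, 0.9, 0.55, 0.45, 0.1, 0.05, 0.8, 0.2, 0.6, 0.4]
labels = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]

ba_all = balanced_accuracy(list(zip(probs, labels)))
ba_kept = balanced_accuracy(retain_most_certain(probs, labels, 0.2))
```

On this toy set the two borderline cases (0.55 and 0.45) are the ones dropped, which is why the retained subset scores higher; real gains depend on how well the uncertainty score tracks errors.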

ARXIV Cancer: general cancer Method: Comparison-based Reinforcement Policy Optimization

PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis

Jiao Xu, Junwei Liu, Jiangwei Lao, Qi Zhu, Yunpeng Zhao, Congyun Jin, Shinan Liu, Zhihong Lu, Lihe Zhang, Xin Chen, Jian Wang, Ping Wang
Published 2026-01-12 09:17

The paper presents PulseMind, a multi-modal diagnostic model designed to enhance real-world clinical diagnosis by integrating diverse inputs and ongoing contextual understanding. It introduces a comprehensive dataset, MediScope, consisting of 98,000 multi-turn consultations and 601,500 medical images across various clinical departments. The model employs a unique training framework called Comparison-based Reinforcement Policy Optimization (CRPO) to improve training stability and alignment with human preferences. Experimental results indicate that PulseMind performs competitively on diagnostic benchmarks.

Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world clinical diagnostics, which involve heterogeneous inputs and require ongoing contextual understanding during patient-physician interactions. To bridge this gap, we introduce PulseMind, a new family of multi-modal diagnostic models that integrates a systematically curated dataset, a comprehensive evaluation benchmark, and a tailored training framework. Specifically, we first construct a diagnostic dataset, MediScope, which comprises 98,000 real-world multi-turn consultations and 601,500 medical images, spanning over 10 major clinical departments and more than 200 sub-specialties. Then, to better reflect the requirements of real-world clinical diagnosis, we develop the PulseMind Benchmark, a multi-turn diagnostic consultation benchmark with a four-dimensional evaluation protocol comprising proactiveness, accuracy, usefulness, and language quality. Finally, we design a training framework tailored for multi-modal clinical diagnostics, centered around a core component named Comparison-based Reinforcement Policy Optimization (CRPO). Compared to absolute score rewards, CRPO uses relative preference signals from multi-dimensional comparisons to provide stable and human-aligned training guidance. Extensive experiments demonstrate that PulseMind achieves competitive performance on both the diagnostic consultation benchmark and public medical benchmarks.

ARXIV Cancer: general cancer Method: scene-appearance disentanglement

Learning Domain-Invariant Representations for Cross-Domain Image Registration via Scene-Appearance Disentanglement

Jiahao Qin, Yiwen Wang
Published 2026-01-12 07:14

This paper presents SAR-Net, a framework designed to tackle the challenge of image registration under domain shift in medical imaging. By employing scene-appearance disentanglement, the method decomposes images into domain-invariant representations and domain-specific appearance codes, facilitating registration through re-rendering. The empirical results demonstrate that SAR-Net outperforms existing methods on the ANHIR benchmark, achieving a median relative Target Registration Error of 0.25%.

Image registration under domain shift remains a fundamental challenge in computer vision and medical imaging: when source and target images exhibit systematic intensity differences, the brightness constancy assumption underlying conventional registration methods is violated, rendering correspondence estimation ill-posed. We propose SAR-Net, a unified framework that addresses this challenge through principled scene-appearance disentanglement. Our key insight is that observed images can be decomposed into domain-invariant scene representations and domain-specific appearance codes, enabling registration via re-rendering rather than direct intensity matching. We establish theoretical conditions under which this decomposition enables consistent cross-domain alignment (Proposition 1) and prove that our scene consistency loss provides a sufficient condition for geometric correspondence in the shared latent space (Proposition 2). Empirically, we validate SAR-Net on the ANHIR (Automatic Non-rigid Histological Image Registration) challenge benchmark, where multi-stain histopathology images exhibit coupled domain shift from different staining protocols and geometric distortion from tissue preparation. Our method achieves a median relative Target Registration Error (rTRE) of 0.25%, outperforming the state-of-the-art MEVIS method (0.27% rTRE) by 7.4%, with robustness of 99.1%. Code is available at https://github.com/D-ST-Sword/SAR-NET .
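
The reported metric and the 7.4% figure can be reproduced with a short sketch. It assumes the usual ANHIR convention that rTRE is the landmark distance divided by the image diagonal; the function names and the toy landmarks are hypothetical, and only the 0.25% and 0.27% values come from the abstract.

```python
import math
from statistics import median

def relative_tre(warped, target, width, height):
    """Per-landmark TRE normalized by the image diagonal (assumed rTRE definition)."""
    diagonal = math.hypot(width, height)
    return [math.dist(w, t) / diagonal for w, t in zip(warped, target)]

def relative_improvement(baseline, ours):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline - ours) / baseline

# Hypothetical landmarks on a 30 x 40 image (diagonal 50).
rtre = relative_tre([(0, 0), (3, 4)], [(0, 0), (0, 0)], width=30, height=40)
median_rtre = median(rtre)

# Values quoted in the abstract: 0.27% (MEVIS) versus 0.25% (SAR-Net).
gain_percent = round(100 * relative_improvement(0.27, 0.25), 1)
```

The improvement is relative to the baseline error, so a drop of 0.02 percentage points from 0.27% corresponds to roughly 7.4%.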

ARXIV Cancer: unknown Method: deep learning

Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features

Yunrui Gu, Zhenzhe Gao, Cong Kong, Zhaoxia Yin
Published 2026-01-11 20:28

This study investigates the vulnerabilities of medical hyperspectral imaging (HSI) in cancer diagnostics, particularly focusing on adversarial attacks that exploit spectral-spatial dependencies and multiscale features. The authors propose a targeted adversarial attack framework that includes a Local Pixel Dependency Attack and a Multiscale Information Attack. Experimental results indicate that these attacks significantly impair classification performance in tumor regions while being visually imperceptible. The findings highlight the necessity for robust defenses in clinical applications of medical HSI.

Medical hyperspectral imaging (HSI) enables accurate disease diagnosis by capturing rich spectral-spatial tissue information, but recent advances in deep learning have exposed its vulnerability to adversarial attacks. In this work, we identify two fundamental causes of this fragility: the reliance on local pixel dependencies for preserving tissue structure and the dependence on multiscale spectral-spatial representations for hierarchical feature encoding. Building on these insights, we propose a targeted adversarial attack framework for medical HSI, consisting of a Local Pixel Dependency Attack that exploits spatial correlations among neighboring pixels, and a Multiscale Information Attack that perturbs features across hierarchical spectral-spatial scales. Experiments on the Brain and MDC datasets demonstrate that our attacks significantly degrade classification performance, especially in tumor regions, while remaining visually imperceptible. Compared with existing methods, our approach reveals the unique vulnerabilities of medical HSI models and underscores the need for robust, structure-aware defenses in clinical applications.

ARXIV Cancer: glioblastoma Method: 3D convolutional neural network

Explainable Deep Radiogenomic Molecular Imaging for MGMT Methylation Prediction in Glioblastoma

Hasan M Jamil
Published 2026-01-11 19:16

This study presents a novel framework for the non-invasive prediction of MGMT promoter methylation in glioblastoma using radiogenomic molecular imaging. The approach integrates radiomics, deep learning, and explainable artificial intelligence to analyze MRI-derived features and correlate them with molecular labels. The framework utilizes a 3D convolutional neural network and applies XAI methods to enhance clinical interpretability, demonstrating its potential for precision oncology.

Glioblastoma (GBM) is a highly aggressive primary brain tumor with limited therapeutic options and poor prognosis. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT) gene promoter is a critical molecular biomarker that influences patient response to temozolomide chemotherapy. Traditional methods for determining MGMT status rely on invasive biopsies and are limited by intratumoral heterogeneity and procedural risks. This study presents a radiogenomic molecular imaging analysis framework for the non-invasive prediction of MGMT promoter methylation using multi-parametric magnetic resonance imaging (mpMRI). Our approach integrates radiomics, deep learning, and explainable artificial intelligence (XAI) to analyze MRI-derived imaging phenotypes and correlate them with molecular labels. Radiomic features are extracted from FLAIR, T1-weighted, T1-contrast-enhanced, and T2-weighted MRI sequences, while a 3D convolutional neural network learns deep representations from the same modalities. These complementary features are fused using both early fusion and attention-based strategies and classified to predict MGMT methylation status. To enhance clinical interpretability, we apply XAI methods such as Grad-CAM and SHAP to visualize and explain model decisions. The proposed framework is trained on the RSNA-MICCAI Radiogenomic Classification dataset and externally validated on the BraTS 2021 dataset. This work advances the field of molecular imaging by demonstrating the potential of AI-driven radiogenomics for precision oncology, supporting non-invasive, accurate, and interpretable prediction of clinically actionable molecular biomarkers in GBM.

ARXIV Cancer: breast cancer Method: spatial multi-task learning

Spatial Multi-Task Learning for Breast Cancer Molecular Subtype Prediction from Single-Phase DCE-MRI

Sen Zeng, Hong Zhou, Zheng Zhu, Yang Liu
Published 2026-01-11 17:33

This study presents a spatial multi-task learning framework aimed at predicting molecular subtypes of breast cancer from single-phase dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI). The method integrates deep feature extraction with multi-scale spatial attention and a region-of-interest weighting module to enhance tumor characterization. Results indicate that the framework significantly outperforms traditional radiomics and single-task deep learning approaches, achieving high accuracy in classifying key biomarkers associated with breast cancer subtypes.

Accurate molecular subtype classification is essential for personalized breast cancer treatment, yet conventional immunohistochemical analysis relies on invasive biopsies and is prone to sampling bias. Although dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) enables non-invasive tumor characterization, clinical workflows typically acquire only single-phase post-contrast images to reduce scan time and contrast agent dose. In this study, we propose a spatial multi-task learning framework for breast cancer molecular subtype prediction from clinically practical single-phase DCE-MRI. The framework simultaneously predicts estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) status, and the Ki-67 proliferation index, biomarkers that collectively define molecular subtypes. The architecture integrates a deep feature extraction network with multi-scale spatial attention to capture intratumoral and peritumoral characteristics, together with a region-of-interest weighting module that emphasizes the tumor core, rim, and surrounding tissue. Multi-task learning exploits biological correlations among biomarkers through shared representations with task-specific prediction branches. Experiments on a dataset of 960 cases (886 internal cases split 7:1:2 for training/validation/testing, and 74 external cases evaluated via five-fold cross-validation) demonstrate that the proposed method achieves an AUC of 0.893, 0.824, and 0.857 for ER, PR, and HER2 classification, respectively, and a mean absolute error of 8.2% for Ki-67 regression, significantly outperforming radiomics and single-task deep learning baselines. These results indicate the feasibility of accurate, non-invasive molecular subtype prediction using standard imaging protocols.

ARXIV Cancer: brain tumor Method: unsupervised domain adaptation

Unsupervised Domain Adaptation with SAM-RefiSeR for Enhanced Brain Tumor Segmentation

Dillan Imans, Phuoc-Nguyen Bui, Duc-Tai Le, Hyunseung Choo
Published 2026-01-11 12:10

This paper presents a method for enhancing brain tumor segmentation through unsupervised domain adaptation using SAM-RefiSeR. The approach aims to improve the accuracy of segmentation models by adapting them to different domains without requiring labeled data. The results indicate that this method effectively enhances segmentation performance in brain tumor imaging.

ARXIV Cancer: prostate cancer Method: deep learning

Computational Mapping of Reactive Stroma in Prostate Cancer Yields Interpretable, Prognostic Biomarkers

Mara Pleasure, Ekaterina Redekop, Dhakshina Ilango, Zichen Wang, Vedrana Ivezic, Kimberly Flores, Israa Laklouk, Jitin Makker, Gregory Fishbein, Anthony Sisk, William Speier, Corey W. Arnold
Published 2026-01-10 00:03

This study introduces PROTAS, a deep learning framework designed to quantify reactive stroma in prostate cancer using routine histopathological slides. The framework links stromal morphology to biological processes and demonstrates superior performance in detecting reactive stroma compared to pathologists. Additionally, the identified stromal features are shown to predict biochemical recurrence, offering a new approach for risk stratification in prostate cancer.

Current histopathological grading of prostate cancer relies primarily on glandular architecture, largely overlooking the tumor microenvironment. Here, we present PROTAS, a deep learning framework that quantifies reactive stroma (RS) in routine hematoxylin and eosin (H&E) slides and links stromal morphology to underlying biology. PROTAS-defined RS is characterized by nuclear enlargement, collagen disorganization, and transcriptomic enrichment of contractile pathways. PROTAS detects RS robustly in the external Prostate, Lung, Colorectal, and Ovarian (PLCO) dataset and, using domain-adversarial training, generalizes to diagnostic biopsies. In head-to-head comparisons, PROTAS outperforms pathologists for RS detection, and spatial RS features predict biochemical recurrence independently of established prognostic variables (c-index 0.80). By capturing subtle stromal phenotypes associated with tumor progression, PROTAS provides an interpretable, scalable biomarker to refine risk stratification.

ARXIV Cancer: pancreatic cancer Method: vision transformer

Performance of a Deep Learning-Based Segmentation Model for Pancreatic Tumors on Public Endoscopic Ultrasound Datasets

Pankaj Gupta, Priya Mudgil, Niharika Dutta, Kartik Bose, Nitish Kumar, Anupam Kumar, Jimil Shah, Vaneet Jearth, Jayanta Samanta, Vishal Sharma, Harshal Mandavdhare, Surinder Rana, Saroj K Sinha, Usha Dutta
Published 2026-01-09 16:48

This study evaluates a Vision Transformer-based deep learning segmentation model specifically designed for pancreatic tumors using endoscopic ultrasound (EUS) images. The model was trained on 17,367 images and validated through 5-fold cross-validation, achieving notable metrics such as a mean Dice similarity coefficient of 0.651 and an accuracy of 97.5%. The results indicate strong performance in segmenting pancreatic tumors, although challenges related to dataset heterogeneity and external validation were noted.

Background: Pancreatic cancer is one of the most aggressive cancers, with poor survival rates. Endoscopic ultrasound (EUS) is a key diagnostic modality, but its effectiveness is constrained by operator subjectivity. This study evaluates a Vision Transformer-based deep learning segmentation model for pancreatic tumors. Methods: A segmentation model using the USFM framework with a Vision Transformer backbone was trained and validated with 17,367 EUS images (from two public datasets) in 5-fold cross-validation. The model was tested on an independent dataset of 350 EUS images from another public dataset, manually segmented by radiologists. Preprocessing included grayscale conversion, cropping, and resizing to 512x512 pixels. Metrics included Dice similarity coefficient (DSC), intersection over union (IoU), sensitivity, specificity, and accuracy. Results: In 5-fold cross-validation, the model achieved a mean DSC of 0.651 +/- 0.738, IoU of 0.579 +/- 0.658, sensitivity of 69.8%, specificity of 98.8%, and accuracy of 97.5%. For the external validation set, the model achieved a DSC of 0.657 (95% CI: 0.634-0.769), IoU of 0.614 (95% CI: 0.590-0.689), sensitivity of 71.8%, and specificity of 97.7%. Results were consistent, but 9.7% of cases exhibited erroneous multiple predictions. Conclusions: The Vision Transformer-based model demonstrated strong performance for pancreatic tumor segmentation in EUS images. However, dataset heterogeneity and limited external validation highlight the need for further refinement, standardization, and prospective studies.
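
For readers unfamiliar with the two overlap metrics quoted above, a minimal sketch of how DSC and IoU are typically computed for a single pair of binary masks follows. It is an illustration under assumptions, not the study's evaluation code: the function name, the nested-list mask format, and the empty-mask convention are all hypothetical.

```python
def dice_and_iou(pred, truth):
    """Return (Dice, IoU) for two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in truth for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    truth_area = sum(1 for b in t if b)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + truth_area)
    iou = intersection / union
    return dice, iou

# Hypothetical 2 x 2 masks: the prediction covers two pixels, the truth one.
dice, iou = dice_and_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

For one image the two scores are tied by Dice = 2 IoU / (1 + IoU); that identity need not hold for means taken over many images, which is why averaged DSC and IoU values are reported separately.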
