Research Papers

ARXIV Cancer: unknown Method: structure and progress-aware diffusion

Structure and Progress Aware Diffusion for Medical Image Segmentation

Siyuan Song, Guyue Hu, Chenglong Li, Dengdi Sun, Zhe Jin, Jin Tang
Published 2026-03-09 02:05

This paper presents a novel approach called structure and progress-aware diffusion (SPAD) for medical image segmentation, which is essential for computer-aided diagnosis. The method integrates a semantic-concentrated diffusion and a boundary-centralized diffusion, guided by a progress-aware scheduler to enhance the segmentation of medical targets. The proposed approach aims to improve the understanding of both coarse morphological structures and fine boundaries in medical images, addressing challenges such as annotation uncertainty and noisy boundaries.

Medical image segmentation is crucial for computer-aided diagnosis, which necessitates understanding both coarse morphological and semantic structures, as well as carving fine boundaries. The morphological and semantic structures in medical images are beneficial and stable clues for target understanding. In contrast, the fine boundaries of medical targets (like tumors and lesions) are usually ambiguous and noisy because of lesion overlap, annotation uncertainty, and similar factors, which makes them unreliable as early supervision. However, existing methods simultaneously learn coarse structures and fine boundaries throughout the training process. In this paper, we propose a structure and progress-aware diffusion (SPAD) for medical image segmentation, which consists of a semantic-concentrated diffusion (ScD) and a boundary-centralized diffusion (BcD) modulated by a progress-aware scheduler (PaS). Specifically, the semantic-concentrated diffusion introduces anchor-preserved target perturbation, which perturbs pixels within a medical target but preserves unaltered areas as semantic anchors, encouraging the model to infer noisy target areas from the surrounding semantic context. The boundary-centralized diffusion introduces progress-aware boundary noise, which blurs unreliable and ambiguous boundaries, thus compelling the model to focus on coarse but stable anatomical morphology and global semantics. Furthermore, the progress-aware scheduler gradually modulates the noise intensity of the ScD and BcD, forming a coarse-to-fine diffusion paradigm that encourages focusing on coarse morphological and semantic structures during early target-understanding stages and gradually shifting to fine target boundaries during later contour-adjusting stages.
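The coarse-to-fine idea behind a progress-aware scheduler can be illustrated with a small sketch. The abstract does not give the actual schedule, so the cosine decay below, the function name `noise_intensity`, and its parameters are all assumptions chosen for illustration: noise starts high (hiding unreliable boundaries) and falls toward zero as training progresses.

```python
import math


def noise_intensity(step, total_steps, start=1.0, end=0.0):
    """Hypothetical cosine decay of noise intensity over training progress.

    This is an illustrative stand-in, not the schedule used by SPAD.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    # weight goes smoothly from 1 (start of training) to 0 (end of training).
    weight = 0.5 * (1.0 + math.cos(math.pi * progress))
    return end + (start - end) * weight


# Print the intensity at a few points of a 100-step run.
for step in (0, 25, 50, 75, 100):
    print(step, round(noise_intensity(step, 100), 3))
```

Under this assumed schedule, early steps see strong perturbation and favor coarse structure, while late steps see almost none and can refine contours.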

ARXIV Cancer: glioblastoma Method: Mixture-of-Transformers

Brain-WM: Brain Glioblastoma World Model

Chenhui Wang, Boyun Zheng, Liuxin Bao, Zhihao Peng, Peter Y. M. Woo, Hongming Shan, Yixuan Yuan
Published 2026-03-08 09:54

The paper presents Brain-WM, a novel model for prognostic modeling of glioblastoma (GBM) that integrates treatment prediction and future MRI generation. By employing a Y-shaped Mixture-of-Transformers architecture, it captures the dynamic interplay between tumor evolution and treatment response. Extensive validation shows that Brain-WM achieves high accuracy in treatment planning and effective MRI sequence generation, offering a new tool for optimizing patient healthcare.

Precise prognostic modeling of glioblastoma (GBM) under varying treatment interventions is essential for optimizing clinical outcomes. While generative AI has shown promise in simulating GBM evolution, existing methods typically treat interventions as static conditional inputs rather than dynamic decision variables. Consequently, they fail to capture the complex, reciprocal interplay between tumor evolution and treatment response. To bridge this gap, we present Brain-WM, a pioneering brain GBM world model that unifies next-step treatment prediction and future MRI generation, thereby capturing the co-evolutionary dynamics between tumor and treatment. Specifically, Brain-WM encodes spatiotemporal dynamics into a shared latent space for joint autoregressive treatment prediction and flow-based future MRI generation. Then, instead of a conventional monolithic framework, Brain-WM adopts a novel Y-shaped Mixture-of-Transformers (MoT) architecture. This design structurally disentangles heterogeneous objectives, successfully leveraging cross-task synergies while preventing feature collapse. Finally, a synergistic multi-timepoint mask alignment objective explicitly anchors latent representations to anatomically grounded tumor structures and progression-aware semantics. Extensive validation on internal and external multi-institutional cohorts demonstrates the superiority of Brain-WM, achieving 91.5% accuracy in treatment planning and SSIMs of 0.8524, 0.8581, and 0.8404 for FLAIR, T1CE, and T2W sequences, respectively. Ultimately, Brain-WM offers a robust clinical sandbox for optimizing patient healthcare. The source code is made available at https://github.com/thibault-wch/Brain-GBM-world-model.

ARXIV Cancer: general cancer Method: transformer-based model

Class Visualizations and Activation Atlases for Enhancing Interpretability in Deep Learning-Based Computational Pathology

Marco Gustav, Fabian Wolf, Christina Glasner, Nic G. Reitsam, Stefan Schulz, Kira Aschenbroich, Bruno Märkl, Sebastian Foersch, Jakob Nikolas Kather
Published 2026-03-07 12:23

This study develops a visualization framework to enhance interpretability in transformer-based models used for computational pathology. It evaluates class visualizations and activation atlases across various tissue and multi-organ cancer classification tasks. The findings indicate that while class visualizations maintain recognizability for distinct tissues, they struggle with overlapping cancer subclasses, highlighting the complexity of pathological representations.

The rapid adoption of transformer-based models in computational pathology has enabled prediction of molecular and clinical biomarkers from H&E whole-slide images, yet interpretability has not kept pace with model complexity. While attribution- and generative-based methods are common, feature visualization approaches such as class visualizations (CVs) and activation atlases (AAs) have not been systematically evaluated for these models. We developed a visualization framework and assessed CVs and AAs for a transformer-based foundation model across tissue and multi-organ cancer classification tasks with increasing label granularity. Four pathologists annotated real and generated images to quantify inter-observer agreement, complemented by attribution and similarity metrics. CVs preserved recognizability for morphologically distinct tissues but showed reduced separability for overlapping cancer subclasses. In tissue classification, agreement decreased from Fleiss' κ = 0.75 (scans) to κ = 0.31 (CVs), with similar trends in cancer subclass tasks. AAs revealed layer-dependent organization: coarse tissue-level concepts formed coherent regions, whereas finer subclasses exhibited dispersion and overlap. Agreement was moderate for tissue classification (κ = 0.58), high for coarse cancer groupings (κ = 0.82), and low at subclass level (κ = 0.11). Atlas separability closely tracked expert agreement on real images, indicating that representational ambiguity reflects intrinsic pathological complexity. Attribution-based metrics approximated expert variability in low-complexity settings, whereas perceptual and distributional metrics showed limited alignment. Overall, concept-level feature visualization reveals structured morphological manifolds in transformer-based pathology models and provides a framework for expert-centered interrogation of learned representations across label granularities.
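The inter-observer agreement figures above are Fleiss' kappa values. As a reference for how that statistic is computed, here is a minimal sketch using the standard textbook formula. The function name `fleiss_kappa` and the toy rating table are illustrative assumptions and are not taken from the paper.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a table of rating counts.

    counts[i][j] is the number of raters who assigned item i to category j.
    Every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    if n_items == 0:
        raise ValueError("need at least one item")
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two raters per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")

    # Observed agreement: mean over items of the fraction of agreeing rater pairs.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: 4 raters, 3 image patches, 2 tissue classes.
toy_counts = [[4, 0], [0, 4], [3, 1]]
print(round(fleiss_kappa(toy_counts), 3))
```

A value of 1 means perfect agreement, 0 means agreement at chance level, and negative values mean agreement below chance.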

ARXIV Cancer: general cancer Method: multimodal large language model

NuNext: Reframing Nucleus Detection as Next-Point Detection

Zhongyi Shui, Honglin Li, Xiaozhong Ji, Ye Zhang, Zijiang Yang, Chenglu Zhu, Yuxuan Sun, Kai Yao, Conghui He, Cheng Tan
Published 2026-03-07 08:17

This study presents a novel approach to nucleus detection in histopathology by reformulating the task as next-point prediction. A multimodal large language model is developed to directly output foreground nucleus centroids from input images, utilizing a two-stage training process that includes supervised learning with spatial-aware soft supervision and reinforcement fine-tuning. The proposed method shows improved detection quality across nine widely used benchmarks.

Nucleus detection in histopathology is pivotal for a wide range of clinical applications. Existing approaches either regress nuclear proxy maps that require complex post-processing, or employ dense anchors or queries that introduce severe foreground-background imbalance. In this work, we reformulate nucleus detection as next-point prediction, wherein a multimodal large language model is developed to directly output foreground nucleus centroids from the input image. The model is trained in two stages. In the supervised learning stage, we propose spatial-aware soft supervision to relax strict centroid matching and a chain-of-visual-thought strategy to incorporate visual priors that facilitate coordinate prediction. In the reinforcement fine-tuning stage, we design a distribution-matching reward, low-variance group filtering, and fine-grained advantage shaping to further improve the model's detection quality. Extensive experiments on nine widely used benchmarks demonstrate the superiority of our method. Code will be released soon.

ARXIV Cancer: unknown Method: unknown

Exploring Strategies for Personalized Radiation Therapy Part IV: An Interaction-Picture Approach to Quantify the Abscopal Effect

Hao Peng, Laurentiu Pop, Kai Jiang, Faya Zhang, Debabrata Saha, Raquibul Hannan, Robert Timmerman
Published 2026-03-07 00:17

This study revisits the abscopal effect in the context of Personalized Ultra-Fractionated Stereotactic Adaptive Radiotherapy (PULSAR). It introduces an interaction-picture transformation to quantify treatment-induced effects in preclinical tumor models, allowing for a detailed analysis of the interaction between primary and secondary tumors. The framework aims to enhance understanding of the abscopal effect as a continuous phenomenon and is adaptable for future studies involving radiation and immunotherapy.

We revisit the controversial "abscopal" effect in the context of Personalized Ultra-Fractionated Stereotactic Adaptive Radiotherapy (PULSAR). By allowing long intervals between fractions, PULSAR may enhance systemic immune activation and increase the likelihood of abscopal responses compared with conventional daily fractionation. To quantify treatment-induced effects, we introduce an interaction-picture transformation adapted from quantum mechanics, which separates intrinsic tumor growth from radiation and immune-mediated perturbations. In this preliminary study, we applied this method to two preclinical bilateral tumor models (4T1 and MC38). Our model provides a quantitative measure of the interaction strength between primary and secondary tumors at the individual level, capturing dynamics over time rather than relying solely on cohort averages. This approach frames the abscopal effect as a continuous, stochastic phenomenon rather than a binary response. The framework is flexible for future studies, particularly in concurrent radiation and immunotherapy with PULSAR, where different radiation doses and fractionation schedules can be compared, and immune checkpoint inhibitors (ICIs) can be incorporated to further enhance systemic anti-tumor immunity. The framework can also help make cross-study comparisons of abscopal effects and standardize the reporting of abscopal magnitude beyond simple statistical significance.
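To make the interaction-picture idea concrete, the sketch below factors an assumed intrinsic growth term out of a tumor-volume time series, so that only treatment-induced deviations remain. It assumes purely exponential intrinsic growth with a known rate, which is a simplification made here for illustration. The abstract does not state the paper's growth model or fitting procedure, and the function name and sample values are hypothetical.

```python
import math


def to_interaction_picture(times, volumes, growth_rate):
    """Divide out assumed exponential growth: v_I(t) = v(t) * exp(-growth_rate * t).

    Under this assumption an unperturbed tumor maps to a constant, so any
    change in the transformed series reflects a perturbation such as treatment.
    """
    if len(times) != len(volumes):
        raise ValueError("times and volumes must have the same length")
    return [v * math.exp(-growth_rate * t) for t, v in zip(times, volumes)]


# Hypothetical data: days since implantation and volumes in mm^3.
days = [0, 2, 4, 6, 8]
rate = 0.1  # assumed intrinsic growth rate per day
untreated = [100.0 * math.exp(rate * t) for t in days]
treated = [100.0, 118.0, 125.0, 120.0, 110.0]

print([round(v, 1) for v in to_interaction_picture(days, untreated, rate)])
print([round(v, 1) for v in to_interaction_picture(days, treated, rate)])
```

In the transformed series the untreated tumor stays flat, while the treated tumor drifts downward, which is the kind of per-animal signal the abstract describes.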

ARXIV Cancer: prostate cancer Method: nnU-Net

Extracting and analyzing 3D histomorphometric features related to perineural and lymphovascular invasion in prostate cancer

Sarah S. L. Chow, Rui Wang, Robert B. Serafin, Yujie Zhao, Elena Baraznenok, Xavier Farré, Jennifer Salguero-Lopez, Gan Gao, Huai-Ching Hsieh, Lawrence D. True, Priti Lal, Anant Madabhushi, Jonathan T. C. Liu
Published 2026-03-06 23:10

This study develops an analytical pipeline for extracting 3D histomorphometric features related to perineural and lymphovascular invasion in prostate cancer. By utilizing a 3D segmentation model (nnU-Net), the researchers segment nerves and vessels in 3D datasets of prostatectomy specimens. The extracted features are used to train a supervised machine learning classifier to predict 5-year biochemical recurrence outcomes, demonstrating that 3D features provide better prognostic value than traditional 2D features.

Diagnostic grading of prostate cancer (PCa) relies on the examination of 2D histology sections. However, the limited sampling of specimens afforded by 2D histopathology, and ambiguities when viewing 2D cross-sections, can lead to suboptimal treatment decisions. Recent studies have shown that 3D histomorphometric analysis of glands and nuclei can improve PCa risk assessment compared to analogous 2D features. Here, we expand on these efforts by developing an analytical pipeline to extract 3D features related to perineural invasion (PNI) and lymphovascular invasion (LVI), which correlate with poor prognosis for a variety of cancers. A 3D segmentation model (nnU-Net) was trained to segment nerves and vessels in 3D datasets of archived prostatectomy specimens that were optically cleared, labeled with a fluorescent analog of H&E, and imaged with open-top light-sheet (OTLS) microscopy. PNI- and LVI-related features, including metrics describing cancer-nerve and cancer-vessel proximity, were then extracted based on the 3D nerve/vessel segmentation masks in conjunction with 3D masks of cancer-enriched regions. As a preliminary exploration of the prognostic value of these features, we trained a supervised machine learning classifier to predict 5-year biochemical recurrence (BCR) outcomes, finding that 3D PNI-related features are moderately prognostic and outperform 2D PNI-related features (AUC = 0.71 vs. 0.52). Source code is available at https://github.com/sarahrahsl/SegCIA.git.
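The AUC values quoted above (0.71 vs. 0.52) measure how well a classifier's scores rank recurrence cases above non-recurrence cases. A minimal sketch of that calculation follows, using the pairwise-ranking definition of ROC AUC. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's data or code.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive case outscores a negative one.

    labels: iterable of 0/1 outcomes (1 = event, e.g. recurrence).
    scores: classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy example: two recurrence cases and two non-recurrence cases.
print(roc_auc([1, 1, 0, 0], [0.9, 0.3, 0.6, 0.1]))
```

An AUC of 0.5 corresponds to random ranking, which is why the 2D feature result of 0.52 indicates almost no prognostic signal.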

ARXIV Cancer: general cancer Method: radiology foundation model

GreenRFM: Toward a resource-efficient radiology foundation model

Yingtai Li, Shuai Ming, Mingyue Zhao, Haoran Lai, Rongsheng Wang, Rui Zhou, Rundong Wang, Yujia Li, Wei Wei, Shaohua Kevin Zhou
Published 2026-03-06 16:51

This paper presents GreenRFM, a resource-efficient pre-training framework for radiology foundation models that addresses the limitations of existing methods reliant on brute-force scaling. The framework achieves state-of-the-art performance while significantly reducing computational requirements. Extensive experiments demonstrate its robust generalization across diverse patient populations and imaging protocols, outperforming complex models on chest and abdominal CT datasets.

The development of radiology foundation models (RFMs) is hindered by a reliance on brute-force scaling. Existing approaches often directly translate methods for natural images, which prioritize scale over precision and hence lead to brittle and expensive models in clinical practice. To address this, we present a resource-efficient pre-training framework, GreenRFM, that achieves state-of-the-art performance. Our framework ensures robust generalization across diverse patient populations and imaging protocols, reducing computational requirements by orders of magnitude while surpassing complex, parameter-heavy models. These capabilities stem from principled supervision design that aims to maximally utilize supervisory signals via More distilled, Ubiquitous, Semantic-enforcing, and Task-aligning (MUST) supervision, rather than simply piling up the quantity of training data. We offer two GreenRFM configurations: (i) a performant model that establishes a new state-of-the-art using a single 24GB GPU within 24 hours, and (ii) a lightweight model that matches existing benchmarks with 6GB VRAM in 4 hours. We conduct extensive experiments using over 200,000 images from four institutions and two modalities. GreenRFMs achieve superior performance on chest and abdominal CT datasets, on both public and private benchmarks, surpassing a range of baseline models. In addition, the results on internal musculoskeletal MRI images show that the same supervision principles transfer between different modalities. Our performance and efficiency challenge the "scale is all you need" dogma and democratize the equitable development of state-of-the-art RFMs for clinicians even on a laptop.

ARXIV Cancer: general cancer Method: agentic retrieval-augmented reasoning

Agentic retrieval-augmented reasoning reshapes collective reliability under model variability in radiology question answering

Mina Farajiamiri, Jeta Sopa, Saba Afza, Lisa Adams, Felix Barajas Ordonez, Tri-Thien Nguyen, Mahshad Lotfinia, Sebastian Wind, Keno Bressem, Sven Nebelung, Daniel Truhn, Soroosh Tayebi Arasteh
Published 2026-03-06 13:31

This study evaluates the effectiveness of agentic retrieval-augmented reasoning pipelines in enhancing the reliability of large language models (LLMs) for radiology question answering. By comparing zero-shot inference with a structured evidence report approach across 34 LLMs, the research found that agentic inference significantly reduced decision dispersion and improved correctness robustness. The results indicate that while consensus strength increased, high agreement did not always correlate with accuracy, highlighting the need for comprehensive evaluations of model reliability.

Agentic retrieval-augmented reasoning pipelines are increasingly used to structure how large language models (LLMs) incorporate external evidence in clinical decision support. These systems iteratively retrieve curated domain knowledge and synthesize it into structured reports before answer selection. Although such pipelines can improve performance, their impact on reliability under model variability remains unclear. In real-world deployment, heterogeneous models may align, diverge, or synchronize errors in ways not captured by accuracy. We evaluated 34 LLMs on 169 expert-curated publicly available radiology questions, comparing zero-shot inference with a radiology-specific multi-step agentic retrieval condition in which all models received identical structured evidence reports derived from curated radiology knowledge. Agentic inference reduced inter-model decision dispersion (median entropy 0.48 vs. 0.13) and increased robustness of correctness across models (mean 0.74 vs. 0.81). Majority consensus also increased overall (P < 0.001). Consensus strength and robust correctness remained correlated under both strategies (ρ = 0.88 for zero-shot; ρ = 0.87 for agentic), although high agreement did not guarantee correctness. Response verbosity showed no meaningful association with correctness. Among 572 incorrect outputs, 72% were associated with moderate or high clinically assessed severity, although inter-rater agreement was low (κ = 0.02). Agentic retrieval therefore was associated with more concentrated decision distributions, stronger consensus, and higher cross-model robustness of correctness. These findings suggest that evaluating agentic systems through accuracy or agreement alone may not always be sufficient, and that complementary analyses of stability, cross-model robustness, and potential clinical impact are needed to characterize reliability under model variability.
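Inter-model decision dispersion of the kind reported above can be measured as the entropy of the answers that different models give to the same question. The sketch below computes Shannon entropy in bits over one question's answers. The abstract does not state the log base or any normalization behind its figures, so those choices, the function name, and the sample answers are assumptions.

```python
import math
from collections import Counter


def decision_entropy(answers):
    """Shannon entropy (in bits) of the answer distribution for one question.

    answers: one selected option per model, e.g. ["A", "A", "C", ...].
    Returns 0.0 when every model picks the same option.
    """
    if not answers:
        raise ValueError("need at least one answer")
    total = len(answers)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(answers).values()
    )


# Hypothetical answers from six models to the same multiple-choice question.
zero_shot = ["A", "B", "A", "C", "B", "A"]
agentic = ["A", "A", "A", "A", "B", "A"]
print(round(decision_entropy(zero_shot), 3))
print(round(decision_entropy(agentic), 3))
```

Lower entropy means the models concentrate on fewer options, which, as the abstract notes, signals stronger consensus but not necessarily a correct answer.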

ARXIV Cancer: general cancer Method: transfer learning

SpaCRD: Multimodal Deep Fusion of Histology and Spatial Transcriptomics for Cancer Region Detection

Shuailin Xue, Jun Wan, Lihua Zhang, Wenwen Min
Published 2026-03-06 11:46

The paper presents SpaCRD, a transfer learning-based method designed for accurate detection of cancer tissue regions (CTR) by integrating histology images and spatial transcriptomics (ST) data. Traditional methods struggle with false positives due to morphological similarities, but SpaCRD utilizes a novel fusion network to effectively capture latent co-expression patterns. Extensive benchmarking shows that SpaCRD outperforms eight existing state-of-the-art methods across diverse datasets.

Accurate detection of cancer tissue regions (CTR) enables deeper analysis of the tumor microenvironment and offers crucial insights into treatment response. Traditional CTR detection methods, which typically rely on the rich cellular morphology in histology images, are susceptible to a high rate of false positives due to morphological similarities across different tissue regions. The groundbreaking advances in spatial transcriptomics (ST) provide detailed cellular phenotypes and spatial localization information, offering new opportunities for more accurate cancer region detection. However, current methods are unable to effectively integrate histology images with ST data for CTR detection, especially in cross-sample and cross-platform/batch settings. To address this challenge, we propose SpaCRD, a transfer learning-based method that deeply integrates histology images and ST data to enable reliable CTR detection across diverse samples, platforms, and batches. Once trained on source data, SpaCRD can be readily generalized to accurately detect cancerous regions across samples from different platforms and batches. The core of SpaCRD is a category-regularized variational reconstruction-guided bidirectional cross-attention fusion network, which enables the model to adaptively capture latent co-expression patterns between histological features and gene expression from multiple perspectives. Extensive benchmark analysis on 23 matched histology-ST datasets spanning various disease types, platforms, and batches demonstrates that SpaCRD consistently outperforms eight existing state-of-the-art methods in CTR detection.

ARXIV Cancer: unknown Method: unknown

CRIMSON: A Clinically-Grounded LLM-Based Metric for Generative Radiology Report Evaluation

Mohammed Baharoon, Thibault Heintz, Siavash Raissi, Mahmoud Alabbad, Mona Alhammad, Hassan AlOmaish, Sung Eun Kim, Oishi Banerjee, Pranav Rajpurkar
Published 2026-03-06 11:43

The paper presents CRIMSON, a novel evaluation framework designed for assessing chest X-ray report generation. This framework emphasizes diagnostic correctness and contextual relevance while incorporating clinical factors such as patient age and guideline-based decision rules. CRIMSON categorizes errors comprehensively and prioritizes clinically significant mistakes, demonstrating strong validation against expert radiologist assessments.

We introduce CRIMSON, a clinically grounded evaluation framework for chest X-ray report generation that assesses reports based on diagnostic correctness, contextual relevance, and patient safety. Unlike prior metrics, CRIMSON incorporates full clinical context, including patient age, indication, and guideline-based decision rules, and prevents normal or clinically insignificant findings from exerting disproportionate influence on the overall score. The framework categorizes errors into a comprehensive taxonomy covering false findings, missing findings, and eight attribute-level errors (e.g., location, severity, measurement, and diagnostic overinterpretation). Each finding is assigned a clinical significance level (urgent, actionable non-urgent, non-actionable, or expected/benign), based on a guideline developed in collaboration with attending cardiothoracic radiologists, enabling severity-aware weighting that prioritizes clinically consequential mistakes over benign discrepancies. CRIMSON is validated through strong alignment with clinically significant error counts annotated by six board-certified radiologists in ReXVal (Kendall's tau = 0.61-0.71; Pearson's r = 0.71-0.84), and through two additional benchmarks that we introduce. In RadJudge, a targeted suite of clinically challenging pass-fail scenarios, CRIMSON shows consistent agreement with expert judgment. In RadPref, a larger radiologist preference benchmark of over 100 pairwise cases with structured error categorization, severity modeling, and 1-5 overall quality ratings from three cardiothoracic radiologists, CRIMSON achieves the strongest alignment with radiologist preferences. We release the metric, the two evaluation benchmarks (RadJudge and RadPref), and a fine-tuned MedGemma model to enable reproducible evaluation of report generation, all available at https://github.com/rajpurkarlab/CRIMSON.
