Research Papers

ARXIV Cancer: glioma Method: multimodal learning

CoRe-BT: A Multimodal Radiology-Pathology-Text Benchmark for Robust Brain Tumor Typing

Juampablo E. Heras Rivera, Daniel K. Low, Xavier Xiong, Jacob J. Ruzevick, Daniel D. Child, Wen-wai Yim, Mehmet Kurt, Asma Ben Abacha
Published 2026-03-04 01:06

The paper presents CoRe-BT, a benchmark for robust brain tumor typing that integrates heterogeneous clinical evidence: MRI, histopathology, and pathology reports. It evaluates multimodal learning under missing-modality conditions on a dataset of 310 patients annotated with tumor type and grade, 95 of whom have paired whole-slide pathology images and pathology reports. Baseline experiments compare MRI-only models with multimodal approaches, showing that multimodal fusion is feasible and that the modalities contribute complementary information to tumor typing.

Accurate brain tumor typing requires integrating heterogeneous clinical evidence, including magnetic resonance imaging (MRI), histopathology, and pathology reports, which are often incomplete at the time of diagnosis. We introduce CoRe-BT, a cross-modal radiology-pathology-text benchmark for brain tumor typing, designed to study robust multimodal learning under missing modality conditions. The dataset comprises 310 patients with multi-sequence brain MRI (T1, T1c, T2, FLAIR), including 95 cases with paired H&E-stained whole-slide pathology images and pathology reports. All cases are annotated with tumor type and grade, and MRI volumes include expert-annotated tumor masks, enabling both region-aware modeling and auxiliary learning tasks. Tumors are categorized into six clinically relevant classes capturing the heterogeneity of common and rare glioma subtypes. We evaluate tumor typing under variable modality availability by comparing MRI-only models with multimodal approaches that incorporate pathology information when present. Baseline experiments demonstrate the feasibility of multimodal fusion and highlight complementary modality contributions across clinically relevant typing tasks. CoRe-BT provides a grounded testbed for advancing multimodal glioma typing and representation learning in realistic scenarios with incomplete clinical data.

ARXIV Cancer: unknown Method: multimodal learning

RADAR: A Multimodal Benchmark for 3D Image-Based Radiology Report Review

Zhaoyi Sun, Minal Jagtiani, Wen-wai Yim, Fei Xia, Martin Gunn, Meliha Yetisgen, Asma Ben Abacha
Published 2026-03-04 00:13

The paper introduces RADAR, a multimodal benchmark designed for analyzing discrepancies in radiology reports associated with 3D medical images. It aims to enhance quality assurance and clinical decision support by providing a structured assessment task that evaluates proposed edits in reports. The benchmark includes expert-annotated abdominal CT examinations and standardized evaluation protocols to facilitate the comparison of multimodal models.

Radiology reports for the same patient examination may contain clinically meaningful discrepancies arising from interpretation differences, reporting variability, or evolving assessments. Systematic analysis of such discrepancies is important for quality assurance, clinical decision support, and multimodal model development, yet remains limited by the lack of standardized benchmarks. We present RADAR, a multimodal benchmark for radiology report discrepancy analysis that pairs 3D medical images with a preliminary report and corresponding candidate edits for the same study. The dataset reflects a standard clinical workflow in which trainee radiologists author preliminary reports that are subsequently reviewed and revised by attending radiologists. RADAR defines a structured discrepancy assessment task requiring models to evaluate proposed edits by determining image-level agreement, assessing clinical severity, and classifying edit type (correction, addition, or clarification). In contrast to prior work emphasizing binary error detection or comparison against fully independent reference reports, RADAR targets fine-grained clinical reasoning and image-text alignment at the report review stage. The benchmark consists of expert-annotated abdominal CT examinations and is accompanied by standardized evaluation protocols to support systematic comparison of multimodal models. RADAR provides a clinically grounded testbed for evaluating multimodal systems as reviewers of radiology report edits.

ARXIV Cancer: colorectal cancer Method: persistent homology

Topology of Multi-species Localization

Abhinav Natarajan, Thomas Chaplin, Joshua A. Bull, Eoghan J. Mulholland-Illingworth, Simon J. Leedham, Helen M. Byrne, Maria-Jose Jimenez, Heather A. Harrington
Published 2026-03-03 18:29

This paper presents an approach for quantifying higher-order interactions in multi-species data using persistent homology (PH), a tool from topological data analysis (TDA). The method aims to improve understanding of spatial relationships that influence outcomes in fields ranging from cancer to ecology. The authors demonstrate it in two applications: identifying behavioral regimes in a synthetic tumor micro-environment model, and finding the interspecies spatial interactions most significantly altered in colorectal cancer tissue samples during disease progression.

Spatial relationships in multi-species data can indicate and affect system outcomes and behaviors, ranging from disease progression in cancer to coral reef resilience in ecology; therefore, quantifying these relationships is an important problem across scientific disciplines. Persistent homology (PH), a key mathematical and computational tool in topological data analysis (TDA), provides a multiscale description of the shape of data. While it effectively describes spatial organization of species, such as cellular patterns in pathology, it cannot detect the shape relations between different types of species. Traditionally, PH analyzes single-species data, which limits the spatial analysis of interactions between different species. Leveraging recent developments in TDA and computational geometry, we introduce a scalable approach to quantify higher-order interactions in multi-species data. The framework can distinguish the presence of shape features or patterns in the data that are (i) common to multiple species of points, (ii) present in some species but disappear in the presence of other species, (iii) only visible when multiple species are considered together, and (iv) formed by some species and remain visible in the presence of others. We demonstrate our approach on two example applications. We identify (1) different behavioral regimes in a synthetic tumor micro-environment model, and (2) interspecies spatial interactions that are most significantly altered in colorectal cancer tissue samples during disease progression.
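As background for the single-species case the abstract contrasts against, here is a minimal sketch of degree-0 persistent homology on a point cloud: it records the distance scales at which connected components merge as the scale grows. It is not the authors' multi-species method. The function name `h0_death_scales` is hypothetical, and the convention that a component dies at the full edge length (rather than half of it) is an assumption.

```python
import math
from itertools import combinations


def h0_death_scales(points):
    """Return the scales at which connected components merge (H0 death times).

    Every point is born as its own component at scale 0. Edges are added in
    order of increasing length, and a death is recorded each time an edge
    joins two previously separate components (Kruskal-style union-find).
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


# Two tight pairs of points far apart: two early merges, one late merge.
cloud = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
print(h0_death_scales(cloud))  # [1.0, 1.0, 9.0]
```

The large gap between the early and late merge scales is the kind of multiscale signal PH summarizes. The one component that never dies is omitted from the returned list.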

ARXIV Cancer: breast cancer Method: foundation model

BRIGHT: A Collaborative Generalist-Specialist Foundation Model for Breast Pathology

Xiaojing Guo, Jiatai Lin, Yumian Jia, Jingqi Huang, Zeyan Xu, Weidong Li, Longfei Wang, Jingjing Chen, Qin Li, Weiwei Wang, Lifang Cui, Wen Yue, Zhiqiang Cheng, Xiaolong Wei, Jianzhong Yu, Xia Jin, Baizhou Li, Honghong Shen, Jing Li, Chunlan Li, Yanfen Cui, Yi Dai, Yiling Yang, Xiaolong Qian, Liu Yang, Yang Yang, Guangshen Gao, Yaqing Li, Lili Zhai, Chenying Liu, Tianhua Zhang, Zhenwei Shi, Cheng Lu, Xingchen Zhou, Jing Xu, Miaoqing Zhao, Fang Mei, Jiaojiao Zhou, Ning Mao, Fangfang Liu, Chu Han, Zaiyi Liu
Published 2026-03-03 14:24

This study introduces BRIGHT, a collaborative generalist-specialist foundation model designed specifically for breast pathology. Trained on approximately 210 million histopathology tiles from over 51,000 breast whole-slide images, BRIGHT targets a broad range of clinical tasks in breast oncology, spanning diagnosis, biomarker prediction, treatment response, and survival prediction. It outperforms three leading generalist models, achieving state-of-the-art results in 21 of 24 internal and 5 of 10 external validation tasks.

Generalist pathology foundation models (PFMs), pretrained on large-scale multi-organ datasets, have demonstrated remarkable predictive capabilities across diverse clinical applications. However, their proficiency on the full spectrum of clinically essential tasks within a specific organ system remains an open question due to the lack of large-scale validation cohorts for a single organ as well as the absence of a tailored training paradigm that can effectively translate broad histomorphological knowledge into the organ-specific expertise required for specialist-level interpretation. In this study, we propose BRIGHT, the first PFM specifically designed for breast pathology, trained on approximately 210 million histopathology tiles from over 51,000 breast whole-slide images derived from a cohort of over 40,000 patients across 19 hospitals. BRIGHT employs a collaborative generalist-specialist framework to capture both universal and organ-specific features. To comprehensively evaluate the performance of PFMs on breast oncology, we curate the largest multi-institutional cohorts to date for downstream task development and evaluation, comprising over 25,000 WSIs across 10 hospitals. The validation cohorts cover the full spectrum of breast pathology across 24 distinct clinical tasks spanning diagnosis, biomarker prediction, treatment response and survival prediction. Extensive experiments demonstrate that BRIGHT outperforms three leading generalist PFMs, achieving state-of-the-art (SOTA) performance in 21 of 24 internal validation tasks and in 5 of 10 external validation tasks with excellent heatmap interpretability. By evaluating on large-scale validation cohorts, this study not only demonstrates BRIGHT's clinical utility in breast oncology but also validates a collaborative generalist-specialist paradigm, providing a scalable template for developing PFMs on a specific organ system.

ARXIV Cancer: general cancer Method: medical foundation models

Designing UNICORN: a Unified Benchmark for Imaging in Computational Pathology, Radiology, and Natural Language

Michelle Stegeman, Lena Philipp, Fennie van der Graaf, Marina D'Amato, Clément Grisi, Luc Builtjes, Joeran S. Bosma, Judith Lefkes, Rianne A. Weber, James A. Meakin, Thomas Koopman, Anne Mickan, Mathias Prokop, Ewoud J. Smit, Geert Litjens, Jeroen van der Laak, Bram van Ginneken, Maarten de Rooij, Henkjan Huisman, Colin Jacobs, Francesco Ciompi, Alessa Hering
Published 2026-03-03 09:27

The paper introduces UNICORN, a public benchmark aimed at evaluating medical foundation models across various tasks and modalities in computational pathology and radiology. It emphasizes the need for standardized evaluation frameworks to assess cross-task generalization and presents a novel two-step framework for model inference and task-specific evaluation. The benchmark includes a diverse dataset from over 2,400 patients and aims to facilitate reproducible benchmarking in medical AI applications.

Medical foundation models show promise to learn broadly generalizable features from large, diverse datasets. This could be the base for reliable cross-modality generalization and rapid adaptation to new, task-specific goals, with only a few task-specific examples. Yet, evidence for this is limited by the lack of public, standardized, and reproducible evaluation frameworks, as existing public benchmarks are often fragmented across task-, organ-, or modality-specific settings, limiting assessment of cross-task generalization. We introduce UNICORN, a public benchmark designed to systematically evaluate medical foundation models under a unified protocol. To isolate representation quality, we built the benchmark on a novel two-step framework that decouples model inference from task-specific evaluation based on standardized few-shot adaptation. As a central design choice, we constructed indirectly accessible sequestered test sets derived from clinically relevant cohorts, along with standardized evaluation code and a submission interface on an open benchmarking platform. Performance is aggregated into a single UNICORN Score, a new metric that we introduce to support direct comparison of foundation models across diverse medical domains, modalities, and task types. The UNICORN test dataset includes data from more than 2,400 patients, including over 3,700 vision cases and over 2,400 clinical reports collected from 17 institutions across eight countries. The benchmark spans eight anatomical regions and four imaging modalities. Both task-specific and aggregated leaderboards enable accessible, standardized, and reproducible evaluation. By standardizing multi-task, multi-modality assessment, UNICORN establishes a foundation for reproducible benchmarking of medical foundation models. Data, baseline methods, and the evaluation platform are publicly available via unicorn.grand-challenge.org.

ARXIV Cancer: brain tumor Method: CNN autoencoder

In-batch Relational Features Enhance Precision in An Unsupervised Medical Anomaly Detection Task

P. Bilha Githinji, Xi Yuan, Ijaz Gul, Lian Zhang, Jinhao Xu, Zhenglin Chen, Peiwu Qin, Dongmei Yu
Published 2026-03-03 07:00

This study addresses a persistent problem in unsupervised medical-image anomaly detection: normal anatomical variation being mistaken for pathology, which produces false positives. The authors augment the latent representation of a CNN autoencoder with batch-wise hypergraph estimation and a shared-weights graph convolution layer, producing a population-aware embedding. On a heterogeneous brain-tumor dataset of 2D MRI scans, the method improves separability between healthy and pathological samples, reaching an AUC-ROC of 0.90 (5.7% absolute gain) and an average precision of 0.78 (16% absolute gain).

Confounding pathology with normal anatomical variation remains a significant challenge in unsupervised medical-image anomaly detection, resulting in numerous false positives. To enhance integration of healthy variation, we augment the latent representation of a CNN autoencoder with contextual similarities within a normal cohort through batch-wise hypergraph estimation and a shared-weights graph convolution layer, producing a population-aware embedding. On a heterogeneous brain-tumor dataset of 2D MRI scans, the method improves separability between healthy and pathological samples, achieving an AUC-ROC of 0.90 (95% CI 0.84-0.95, 5.7% absolute gain), and a 16% absolute improvement in average precision (0.78 AP, 95% CI 0.66-0.89), thereby lowering false-positive rates. Moreover, both anomaly detection and downstream tumor versus no-tumor classification performance improve with the size of the mini-batch context captured in the augmented representation, suggesting a tunable lever for integrating healthy variation.
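To make the reported metric concrete, the sketch below computes AUC-ROC from anomaly scores and attaches a percentile bootstrap confidence interval. The abstract does not say how its 95% intervals were obtained, so the bootstrap is an assumption, as are the function names `auc_roc` and `bootstrap_ci` and the toy data.

```python
import random


def auc_roc(labels, scores):
    """AUC-ROC as the probability that a positive outscores a negative.

    Ties count as half a win (the Mann-Whitney formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for AUC-ROC (assumed procedure)."""
    auc_roc(labels, scores)  # fails fast if a class is missing entirely
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if len(set(resampled_labels)) < 2:
            continue  # skip resamples that contain only one class
        stats.append(auc_roc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper


# Toy example: 1 = pathological, 0 = healthy; higher score = more anomalous.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_roc(labels, scores))  # 0.75
```

With only a few samples the interval is very wide. The width of the intervals quoted in the abstract (for example 0.84-0.95) likewise reflects the size of the test set.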

ARXIV Cancer: breast cancer Method: unknown

Less Is More in Chemotherapy of Breast Cancer

Fatemeh Ansarizadeh, Tonghua Zhang
Published 2026-03-03 03:40

This study presents a mathematical model of the interactions among tumor cells, healthy cells, and immune cells in breast cancer. The model, which consists of four differential equations and incorporates delay, indicates that metronomic chemotherapy is more effective than the maximum tolerated dose (MTD) method. Oscillatory tumor cell dynamics highlight the difficulty of eliminating the tumor completely through chemotherapy alone and point to the importance of combination therapies. Sensitivity analysis confirms the model's robustness, particularly under metronomic treatment protocols.

This study presents a mathematical model that captures the interactions among tumor cells, healthy cells, and immune cells in a tumor-bearing host, with a specific focus on breast cancer. Incorporating the concept of delay, the model consists of four differential equations to analyze these cellular dynamics. The findings demonstrate the superior efficacy of metronomic chemotherapy compared to the maximum tolerated dose (MTD) method and underscore the necessity of adjunct therapies. Oscillatory tumor cell dynamics revealed by the model highlight the challenges of achieving complete tumor elimination through chemotherapy alone. Sensitivity analysis confirms the robustness of the model, particularly under metronomic treatment protocols, aligning with experimental observations regarding metronomic-to-MTD dosage ratios. Furthermore, the results emphasize the importance of synergistic effects from combination therapies. This biologically consistent framework provides valuable insights into tumor-immune interactions and offers a foundation for optimizing therapeutic strategies in cancer treatment.

ARXIV Cancer: breast cancer Method: multimodal learning

Bridging the gap between Performance and Interpretability: An Explainable Disentangled Multimodal Framework for Cancer Survival Prediction

Aniek Eijpe, Soufyan Lakbir, Melis Erdal Cesur, Sara P. Oliveira, Angelos Chatzimparmpas, Sanne Abeln, Wilson Silva
Published 2026-03-02 18:26

This paper presents DIMAFx, an explainable multimodal framework designed for cancer survival prediction, which enhances interpretability while maintaining high performance. The framework utilizes histopathology whole-slide images and transcriptomics data to produce disentangled representations. Results demonstrate that DIMAFx achieves state-of-the-art performance across multiple cancer cohorts and reveals significant multimodal interactions relevant to breast cancer biology.

While multimodal survival prediction models are increasingly accurate, their complexity often reduces interpretability, limiting insight into how different data sources influence predictions. To address this, we introduce DIMAFx, an explainable multimodal framework for cancer survival prediction that produces disentangled, interpretable modality-specific and modality-shared representations from histopathology whole-slide images and transcriptomics data. Across multiple cancer cohorts, DIMAFx achieves state-of-the-art performance and improved representation disentanglement. Leveraging its interpretable design and SHapley Additive exPlanations, DIMAFx systematically reveals key multimodal interactions and the biological information encoded in the disentangled representations. In breast cancer survival prediction, the most predictive features contain modality-shared information, including one capturing solid tumor morphology contextualized primarily by late estrogen response, where higher-grade morphology aligned with pathway upregulation and increased risk, consistent with known breast cancer biology. Key modality-specific features capture microenvironmental signals from interacting adipose and stromal morphologies. These results show that multimodal models can overcome the traditional trade-off between performance and explainability, supporting their application in precision medicine.

ARXIV Cancer: unknown Method: convolutional neural network

OpenRad: a Curated Repository of Open-access AI models for Radiology

Konstantinos Vrettos, Galini Papadaki, Emmanouil Brilakis, Matthaios Triantafyllou, Dimitrios Leventis, Despina Staraki, Maria Mavroforou, Eleftherios Tzanis, Konstantina Giouroukou, Michail E. Klontzas
Published 2026-03-02 16:51

OpenRad is a curated, standardized, open-access repository designed to aggregate AI models in radiology, addressing issues of discoverability and reproducibility. The repository includes approximately 1700 models across various imaging modalities and subspecialties, with a focus on providing detailed metadata and facilitating community contributions. A retrospective analysis of literature revealed that convolutional neural networks and transformer architectures are the most commonly used methods in these models.

The rapid developments in artificial intelligence (AI) research in radiology have produced numerous models that are scattered across various platforms and sources, limiting discoverability, reproducibility and clinical translation. Herein, OpenRad was created: a curated, standardized, open-access repository that aggregates radiology AI models and provides details such as the availability of pretrained weights and interactive applications. Retrospective analysis of peer-reviewed literature and preprints indexed in PubMed, arXiv and Scopus was performed until Dec 2025 (n = 5239 records). Model records were generated using a locally hosted LLM (gpt-oss:120b), based on the RSNA AI Roadmap JSON schema, and manually verified by ten expert reviewers. Stability of LLM outputs was assessed on 225 randomly selected papers using text similarity metrics. A total of 1694 articles were included after review. Included models span all imaging modalities (CT, MRI, X-ray, US) and radiology subspecialties. Automated extraction demonstrated high stability for structured fields (Levenshtein ratio > 90%), with 78.5% of record edits being characterized as minor during expert review. Statistical analysis of the repository revealed CNN and transformer architectures as dominant, while MRI was the most commonly used modality (in 621 neuroradiology AI models). Research output was mostly concentrated in China and the United States. The OpenRad web interface enables model discovery via keyword search and filters for modality, subspecialty, intended use, verification status and demo availability, alongside live statistics. The community can contribute new models through a dedicated portal. OpenRad contains approximately 1700 open-access, curated radiology AI models with standardized metadata, supplemented with analysis of code repositories, thereby creating a comprehensive, searchable resource for the radiology community.
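For readers unfamiliar with the stability metric, the sketch below shows one common way to turn Levenshtein edit distance into a similarity ratio between two extraction runs. The abstract does not define its ratio, so the normalization by the longer string's length is an assumption, and the function names and example strings are hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn `a` into `b` (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + substitution_cost,
                )
            )
        previous = current
    return previous[-1]


def similarity_ratio(a: str, b: str) -> float:
    """1.0 for identical strings, 0.0 for completely different ones.

    Assumed normalization: 1 - distance / length of the longer string.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


# Two hypothetical extraction runs for the same structured field.
run_1 = "modality: MRI"
run_2 = "modality: MR"
print(similarity_ratio(run_1, run_2) > 0.90)  # True
```

Other tools normalize differently (for example by the summed length of both strings), so a threshold such as 90% is only comparable between runs that use the same definition.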

ARXIV Cancer: unknown Method: NeuroSymbolic abductive reasoning

NeuroSymb-MRG: Differentiable Abductive Reasoning with Active Uncertainty Minimization for Radiology Report Generation

Rong Fu, Yiqing Lyu, Chunlei Meng, Muge Qi, Yabin Jin, Qi Zhao, Li Bao, Juntao Gao, Fuqian Shi, Nilanjan Dey, Wei Luo, Simon Fong
Published 2026-03-02 11:31

The paper presents NeuroSymb-MRG, a framework for automatic radiology report generation that aims to improve documentation consistency and reduce clinician workload. It integrates NeuroSymbolic abductive reasoning with active uncertainty minimization to produce structured, clinically grounded reports. The system maps image features to probabilistic clinical concepts, composes them into differentiable logic-based reasoning chains, and uses an active sampling loop driven by rule-level uncertainty and diversity to guide clinician-in-the-loop adjudication. Experiments on standard benchmarks show improvements in factual consistency and standard language metrics over representative baselines.

Automatic generation of radiology reports seeks to reduce clinician workload while improving documentation consistency. Existing methods that adopt encoder-decoder or retrieval-augmented pipelines achieve progress in fluency but remain vulnerable to visual-linguistic biases, factual inconsistency, and lack of explicit multi-hop clinical reasoning. We present NeuroSymb-MRG, a unified framework that integrates NeuroSymbolic abductive reasoning with active uncertainty minimization to produce structured, clinically grounded reports. The system maps image features to probabilistic clinical concepts, composes differentiable logic-based reasoning chains, decodes those chains into templated clauses, and refines the textual output via retrieval and constrained language-model editing. An active sampling loop driven by rule-level uncertainty and diversity guides clinician-in-the-loop adjudication and promptbook refinement. Experiments on standard benchmarks demonstrate consistent improvements in factual consistency and standard language metrics compared to representative baselines.
