Research Papers

ARXIV Cancer: general cancer Method: AI-assisted drafting

Evidence-Linked Radiology Reporting: A Human-Supervised Reference Architecture for Structured Imaging Intelligence

Houman Kazemzadeh, Kamyar Naderi
Published 2026-05-24 15:07

This paper presents a human-supervised reference architecture aimed at enhancing structured radiology reporting. It integrates various components such as exam-specific templates and AI-assisted drafting to improve the communication of imaging findings. The proposed framework emphasizes interoperability and supports clinical data reuse while addressing safety and regulatory considerations.

Radiology reports remain the primary mechanism by which imaging findings are communicated to clinical teams. However, much of the structured information behind these reports, including measurements, image evidence, prior comparisons, lesion identity, uncertainty, and terminology, often remains trapped in free text or fragmented across picture archiving and communication systems, radiology information systems, reporting workstations, worksheets, advanced visualization tools, and electronic health records. This paper proposes a human-supervised, evidence-linked reference architecture for structured radiology reporting. The framework combines exam-specific templates, speech-to-structure processing, measurement and segmentation capture, controlled AI-assisted drafting, and standards-based interoperability using DICOM, DICOM Structured Reporting, DICOM Segmentation, HL7 FHIR, RadLex, SNOMED CT, LOINC, and UCUM. The system is positioned not as an autonomous report generator, but as a structured intelligence layer for enterprise imaging that supports reviewed reporting, longitudinal comparison, clinical data reuse, governance, and integration with PACS, RIS, EHR, analytics, and registry workflows. The paper also discusses modality-specific deployment considerations, clinical safety risks, validation requirements, cybersecurity, privacy, quality management, and regulatory boundaries for AI-assisted radiology reporting systems.

ARXIV Cancer: non-small cell lung cancer Method: multimodal learning

Multimodality Stacking with Blockwise missing values and application to the PIONeeR biomarkers study for prediction of resistance to immunotherapy

Mohamed Boussena, Florence Monville, Jacques Fieschi-Meric, Frederic Vely, Pierre Milpied, Julien Mazieres, Maurice Perol, Eric Vivier, Laurent Greillier, Fabrice Barlesi, Sebastien Benzekry
Published 2026-05-24 12:48

This paper presents Multimodality Stacking with Blockwise missing values (MSB), a framework designed to address the challenges of high dimensionality and missing data in clinical oncology. The method was validated on the PIONeeR study, focusing on predicting progression-free survival in patients with advanced non-small cell lung cancer undergoing immunotherapy. MSB demonstrated superior predictive performance compared to baseline algorithms, highlighting its potential for effective survival analysis in the presence of incomplete datasets.

Integrating multimodal datasets in clinical oncology is frequently hindered by high dimensionality and blockwise missingness, where entire data sources are unavailable for specific patient subsets. Standard survival models often struggle with these gaps, leading to biased results or patient exclusion. We introduce Multimodality Stacking with Blockwise missing values (MSB), a late-fusion framework for survival analysis that independently models modality-specific features before aggregating predictions via a cross-validated stacking meta-learner. MSB was validated on the PIONeeR study (n=443 patients, 378 biomarkers across eight heterogeneous sources) to predict progression-free survival in advanced non-small cell lung cancer patients receiving immunotherapy. MSB yielded higher predictive performance (C-index) than baseline algorithms. Improvements varied by baseline strength: linear models showed a 15.9% increase (p<0.001 for the Wilcoxon signed-rank test), random survival forests gained 5.4% (p=0.002), and gradient boosting methods improved by 2.1% (p=0.030). Beyond discrimination, MSB reduced the generalization gap (train-test difference in 5-fold cross-validation repeated 3 times: 0.055 vs 0.380 for linear models). Permutation importance analysis identified routine laboratory markers, clinical features, and PD-L1 expression as primary predictive drivers. Missing block indicators showed negligible importance, suggesting the model learned from biomarker values rather than data availability patterns. MSB provides a statistically validated framework for multimodal survival prediction with blockwise missingness. By enabling systematic biomarker evaluation without requiring complete data, MSB offers a practical tool for predictive modeling in biomedical research, pending external validation. Implementation is available at https://github.com/MohamedBoussena/MSB under Inria license.
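The late-fusion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the learned, cross-validated stacking meta-learner is replaced here by fixed, hypothetical weights, and the function names `fuse_risks` and `concordance_index`, the modality names, and the toy data are all assumptions made for illustration. Each modality contributes a risk score only for patients where that block was measured, and the fused score is scored with Harrell's C-index.

```python
def fuse_risks(modality_risks, weights):
    """Late fusion: weighted mean of the per-modality risk scores available for each patient.

    A value of None marks a missing block for that patient. The weights are fixed
    here for illustration; a stacking meta-learner would learn them from data.
    """
    n_patients = len(next(iter(modality_risks.values())))
    fused = []
    for p in range(n_patients):
        num, den = 0.0, 0.0
        for name, risks in modality_risks.items():
            if risks[p] is not None:
                num += weights[name] * risks[p]
                den += weights[name]
        # Patients with no available modality get no prediction.
        fused.append(num / den if den > 0 else None)
    return fused


def concordance_index(times, events, risks):
    """Harrell's C-index: fraction of comparable pairs ranked in the right order.

    A pair (i, j) is comparable when patient i had an observed event before time j.
    It is concordant when the earlier event has the higher risk; ties count 0.5.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy cohort (hypothetical values): two modalities, each missing for one patient.
modality_risks = {
    "clinical": [0.9, 0.7, None, 0.1],
    "labs": [None, 0.6, 0.4, 0.2],
}
weights = {"clinical": 0.5, "labs": 0.5}
times = [2, 4, 6, 8]      # follow-up time per patient
events = [1, 1, 0, 1]     # 1 = progression observed, 0 = censored

fused = fuse_risks(modality_risks, weights)
print(fused, concordance_index(times, events, fused))
```

Renormalizing the weights over the available modalities is what lets every patient receive a prediction without imputing a whole missing block.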

ARXIV Cancer: general cancer Method: sparse autoencoder

Universal Boosts, Specific Suppressors: Sparse Autoencoder Steering of Medical Vision-Language Models

Farhad Nooralahzadeh, Benjamin Gundersen, Nicolas Deperrois, Hidetoshi Matsuom, Mizuho Nishio, Thomas Frauenfelder, Ahmed Allam, Christian Blüthgen, Michael Moor, Michael Krauthammer
Published 2026-05-24 10:17

This study addresses the issue of hallucinations in medical vision-language models (VLMs) when generating chest X-ray reports. The authors propose a decoding-time residual steering method using a sparse autoencoder (SAE) to improve report quality without weight updates. Their approach demonstrates significant improvements in clinical metrics across multiple VLMs, confirming the effectiveness of the identified features in enhancing report accuracy.

Medical vision-language models (VLMs) often hallucinate findings when generating chest X-ray reports: they fabricate findings that are not present in the image, miss important ones, or locate them incorrectly. We mitigate this without weight updates by decoding-time residual steering on a per-token sparse autoencoder (SAE) basis: Top-$K$ SAEs on late layers, causal steering against clinical errors, then combined suppress/boost intervention at inference time. On the MIMIC-CXR test split, our inference-only method improves the quality of generated reports for three radiology VLMs (RadVLM, LLaVA-Rad, and CheXOne), with relative improvements of +5.4%, +7.2%, and +17.0% in the clinical composite metric, and statistically significant GREEN gains on all backbones. A cross-model feature alignment shows that the quality-promoting (boost) directions overlap strongly across architectures, whereas hallucination-linked (suppress) directions are model-specific. Therefore, transferable steering must treat suppression per-backbone, rather than sharing a universal suppress list. The same recipe transfers zero-shot to IU-Xray (GREEN +7.7% relative) without retraining, confirming that the identified features are properties of the model, not of the training corpus. We release causal feature sets and an interactive feature dashboard: https://cxr-sparse-feature-dashboard.netlify.app/.
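A rough sketch of what a combined suppress/boost intervention on a Top-K SAE basis might look like is given below. The abstract does not specify the exact update rule, so everything here is an assumption: the ReLU encoder, the additive boost of size `alpha`, the choice to add only the decoded difference back to the residual, and the names `top_k`, `decode`, and `steer`.

```python
def top_k(acts, k):
    """Keep the k largest activations and zero the rest (Top-K sparsity)."""
    keep = set(sorted(range(len(acts)), key=lambda f: acts[f], reverse=True)[:k])
    return [a if f in keep else 0.0 for f, a in enumerate(acts)]


def decode(acts, dec):
    """Map feature activations back to residual space: sum_f acts[f] * dec[f]."""
    dim = len(dec[0])
    return [sum(acts[f] * dec[f][d] for f in range(len(acts))) for d in range(dim)]


def steer(residual, enc, dec, k, boost, suppress, alpha):
    """Steer one token's residual vector through a sparse feature basis.

    enc[f] is the encoder row and dec[f] the decoder direction of feature f.
    Suppressed features are zeroed, boosted features are raised by alpha, and
    only the decoded difference is added, so SAE reconstruction error is kept.
    """
    pre = [max(0.0, sum(w * x for w, x in zip(row, residual))) for row in enc]
    sparse = top_k(pre, k)
    steered = list(sparse)
    for f in suppress:
        steered[f] = 0.0
    for f in boost:
        steered[f] += alpha
    before, after = decode(sparse, dec), decode(steered, dec)
    return [r + a - b for r, a, b in zip(residual, after, before)]


# Toy example: a 3-feature dictionary whose directions are the coordinate axes.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
steered_residual = steer(
    residual=[1.0, 2.0, 3.0],
    enc=identity,
    dec=identity,
    k=2,
    boost={1},       # hypothetical quality-promoting feature
    suppress={2},    # hypothetical hallucination-linked feature
    alpha=0.5,
)
print(steered_residual)
```

Because the suppress set is passed in per call, the same function can be used with a shared boost list and a backbone-specific suppress list, which is the arrangement the abstract argues for.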

ARXIV Cancer: general cancer Method: deep learning

Catching MRI outliers: unsupervised detection and localization of MRI artefacts and clinical anomalies using deep learning

Mustafa Kadhim, Viktor Rogowski, Emilia Persson, Camila Gonzalez, André Haraldsson, Sofie Ceberg, Mikael Nilsson, Malin Kügele, Sven Bäck, Christian Jamtheim Gustafsson
Published 2026-05-23 14:44

This study presents a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI using deep learning. The framework was trained on reference images from public datasets and evaluated on synthetic and real clinical anomalies. It reached AUCs of 0.97 for pelvic MRI and 0.81 for brain MRI on hidden evaluation cohorts, supporting its potential use as a quality-control layer in radiotherapy workflows.

Artificial intelligence is increasingly integrated into radiotherapy workflows, yet such pipelines remain vulnerable to out-of-distribution image data that may introduce unexpected behavior in clinical tasks. Deep learning-based anomaly detection for pelvic magnetic resonance imaging (MRI) remains largely unexplored, and transparent evaluation of its feasibility for full automation is limited. We developed and evaluated a fully automated, unsupervised anomaly-detection framework for pelvic and brain MRI. A two-stage framework was trained on reference images from public datasets: LUND-PROBE for pelvic MRI, and IXI, fastMRI, and fastMRI+ for brain MRI. In the first stage, MRI slices were compressed into discrete tokens; in the second, the distribution of normal tokens was modeled. Anomaly evidence was estimated by combining perceptual image differences with token-surprisal scores based on negative log-likelihood. Automated detection was evaluated on pelvic MRI with synthetic global and real clinical anomalies, and on brain MRI with clinically annotated fastMRI+ abnormalities. Sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and false-positive behavior in held-out normal cases were assessed. The framework achieved robust detection across hidden evaluation cohorts, with AUCs of 0.97 (95% CI, 0.95-0.98) and 0.81 (95% CI, 0.74-0.87) for pelvic and brain MRI, respectively. Heatmap analysis showed strong spatial agreement between detected anomalies and ground-truth locations, supporting localization accuracy and interpretability. These results support the potential of unsupervised anomaly detection as an automated MRI quality-control layer for radiotherapy workflows, with transparent visualization of image regions likely to compromise downstream AI-based tasks.
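The scoring step described here, combining a perceptual image difference with token surprisal (negative log-likelihood), can be sketched as follows. The abstract does not say how the two signals are combined, so the min-max normalization, the equal weighting, and the names `token_surprisal`, `min_max`, and `anomaly_map` are assumptions for illustration only.

```python
import math


def token_surprisal(token_probs):
    """Surprisal of each token under the model of normal anatomy: -log p."""
    return [-math.log(p) for p in token_probs]


def min_max(values):
    """Rescale values to [0, 1]; a constant map becomes all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def anomaly_map(perceptual_diff, token_probs, weight=0.5):
    """Blend normalized perceptual differences with normalized token surprisal.

    Both inputs hold one value per image region (for example, per token position).
    """
    diff = min_max(perceptual_diff)
    surprisal = min_max(token_surprisal(token_probs))
    return [weight * d + (1.0 - weight) * s for d, s in zip(diff, surprisal)]


# Toy slice with three regions (hypothetical values); the last one looks abnormal.
perceptual_diff = [0.0, 0.2, 1.0]
token_probs = [0.9, 0.8, 0.01]

scores = anomaly_map(perceptual_diff, token_probs)
slice_score = max(scores)  # one possible slice-level summary
print(scores, slice_score)
```

Keeping the per-region scores, rather than only the slice-level maximum, is what allows a heatmap of suspected anomaly locations to be drawn.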

ARXIV Cancer: glioma Method: multimodal learning

ConceptM$^3$oE: Concept-Guided Multimodal Mixture of Experts for Interpretable Computational Pathology

Xuan Wang, Zhongling Xu, Gopi Kannedhara, Joakim Nguyen, Jian Yu, Jinrui Fang, Abdurrahmaan Baghdadi, Tianlong Chen, Awais Naeem, Chandra Krishnan, Edward Castillo, Andrew H. Song, Ankita Shukla, Ying Ding, Nicholas Konz, Hairong Wang
Published 2026-05-23 04:55

This paper presents ConceptM$^3$oE, a multimodal mixture of experts model designed for computational pathology, particularly in distinguishing complex tumor subtypes. The model integrates whole-slide images with pathology reports and molecular measurements to enhance interpretability and performance. It demonstrates improved performance in data-limited scenarios and provides reasoning traces validated by an independent neuropathologist, indicating its potential for clinical application.

Healthcare models are transitioning from unimodal prediction toward multimodal reasoning over heterogeneous diagnostic inputs. In computational pathology, for complex tumor subtypes where morphology alone can be challenging to distinguish, pathology reports and molecular measurements may provide additional diagnostic evidence alongside whole-slide images, yet existing models often fail to clarify how diverse signals assemble into recognizable diagnostic concepts. We propose ConceptM$^3$oE (Concept Multimodal MoE), which embeds concept formation directly within interaction-aware mixture-of-experts (MoE) pathways. The architecture decomposes evidence into modality-specific, redundant, and synergistic experts, which are then projected into structured concept bottlenecks mapping latent features to a hierarchy of morphology and biomarker concepts. To prevent the information loss typical of interpretable bottlenecks, we utilize residual pathways within each expert to allow task-relevant signals to flow both through the concepts and directly to the final task prediction, so that high performance is maintained alongside interpretability. Across an institutional pediatric brain tumor cohort and a public glioma cohort, the framework delivers performance competitive with unconstrained models while producing reasoning traces validated by an independent neuropathologist. In data-limited regimes, ConceptM$^3$oE increases macro-F1 from 56.41% to 66.70% at small training sizes compared to non-concept-informed baselines, while also showing faster training convergence consistent with the regularizing effect of concept learning. This work offers a scalable path toward high-performance medical AI that is inherently verifiable and better aligned with the complex decision-making of clinical practice.

ARXIV Cancer: breast cancer Method: clustering-based sampling

CRISP -- Clustering-Based Redundancy-Reduced Instance Sampling for Pathology Case Representation and Retrieval

Zahra Rahimi Afzal, Wataru Uegami, Saghir Alfasly, Saba Yasir, Judy C. Boughey, Matthew P. Goetz, Krishna R. Kalari, H. R. Tizhoosh
Published 2026-05-22 22:06

This paper presents CRISP, an unsupervised framework designed for comprehensive multi-whole-slide image (WSI) case processing in digital pathology. The method integrates information from all available slides within a case by selectively distilling informative patches, thereby capturing case-level heterogeneity. The results demonstrate that CRISP matches or surpasses current practices in patient and case retrieval using breast cancer datasets.

Digital pathology archives increasingly contain multiple whole-slide images (WSIs) per case, capturing spatially distinct tumour regions and reflecting intrinsic morphological heterogeneity. However, most existing approaches rely on a single pathologist-selected slide, thereby discarding potentially informative evidence distributed across the remaining WSIs. To date, no autonomous framework has been proposed for comprehensive multi-WSI case processing. Here, we present an unsupervised framework for case-level analysis that integrates information from all available slides within a case. Rather than relying on a single designated slide, the proposed approach constructs case-level representations by selectively distilling informative patches across WSIs. We introduce Clustering-Based Redundancy-Reduced Instance Sampling for Pathology (CRISP), a two-stage framework that first reduces redundancy within individual WSIs and subsequently applies clustering-based sampling to select a compact yet representative set of patches for the entire case. The resulting patch set captures case-level heterogeneity while avoiding exhaustive processing of gigapixel images, and directly serves as a retrieval index. Using two Mayo Clinic breast cancer datasets for diagnosis and treatment planning, we demonstrate that CRISP consistently matches or surpasses the current standard practice of combined model and pathologist slide selection for patient/case search and retrieval. By automating case-level processing and eliminating subjective WSI selection, CRISP potentially enables the exploitation of clinically relevant information distributed across multiple WSIs that is currently overlooked.
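The two-stage selection described in this abstract can be approximated with a small sketch. It is not the CRISP implementation: patches are represented by toy 2-D feature vectors instead of deep embeddings, redundancy reduction is a greedy distance threshold, clustering is a plain k-means with deterministic initialization, and the names `drop_redundant`, `kmeans`, and `select_representatives` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def drop_redundant(patches, min_dist):
    """Stage 1 (per slide): greedily drop patches closer than min_dist to a kept one."""
    kept = []
    for p in patches:
        if all(sq_dist(p, q) >= min_dist ** 2 for q in kept):
            kept.append(p)
    return kept


def kmeans(points, k, iters=20):
    """Plain k-means, initialized from the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_representatives(slides, k, min_dist):
    """Stage 2 (per case): pool de-duplicated patches, cluster, keep the patch nearest each centroid."""
    pooled = [p for slide in slides for p in drop_redundant(slide, min_dist)]
    representatives = []
    for centroid in kmeans(pooled, k):
        best = min(pooled, key=lambda p: sq_dist(p, centroid))
        if best not in representatives:
            representatives.append(best)
    return representatives


# Toy case with two slides; each tuple is a hypothetical patch embedding.
slides = [
    [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0)],
    [(0.0, 0.1), (5.0, 5.0), (5.1, 5.0)],
]
index_patches = select_representatives(slides, k=2, min_dist=0.2)
print(index_patches)
```

The returned patches are actual members of the case rather than synthetic centroids, so they can be stored directly as the retrieval index for that case.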

ARXIV Cancer: prostate cancer Method: knowledge graph-modulated deep learning

Knowledge Graph Modulated Deep Learning for Limited-Sample Clinical Data Analysis

Yuwei Xue, Sakib Mostafa, James Zou, Joseph Liao, Maximilian Diehn, Ash A. Alizadeh, Lei Xing, Md. Tauhidul Islam
Published 2026-05-22 19:33

This study presents Graph-in-Graph (GiG), a knowledge graph-modulated deep learning framework designed for data-efficient clinical prediction in limited-sample settings. GiG represents each patient as a modular graph in which curated biological knowledge graphs define the edges and patient-specific measurements define the node features, preserving gene-gene interactions and pathway topology. Across five clinical tasks, the framework outperforms traditional and state-of-the-art methods, with the largest gains in limited-sample settings; on prostate cancer diagnosis it improves macro-F1 by up to 49 percentage points relative to competing methods.

Biological systems are governed by structured molecular interactions, where pathways, regulatory circuits, and functional gene relationships shape cellular behavior and disease progression. Much of this knowledge is naturally represented as graphs. However, most biomedical AI models cannot directly use graph-encoded biological knowledge and instead require compressed low-dimensional representations, which can lose important structure and reduce performance, especially in limited-sample clinical studies. Here, we introduce Graph-in-Graph (GiG), a knowledge graph-modulated deep learning framework for data-efficient clinical prediction. GiG represents each patient as a standalone modular graph, in which curated biological knowledge graphs define edges and patient-specific measurements, such as gene expression, define node features. This design allows multiple biological knowledge graphs to be integrated while preserving gene-gene interactions and pathway topology during patient-level representation learning. Across cohorts comprising nearly 9,700 patients and five clinical tasks, including liquid biopsy cancer detection, prostate cancer diagnosis, and 32-class pan-cancer classification, GiG consistently outperforms traditional and state-of-the-art methods, with the largest gains in limited-sample settings. On the challenging prostate cancer diagnosis task, GiG improves macro-F1 by up to 49 percentage points relative to competing methods. Control experiments replacing real pathway graphs with random topologies confirm that these gains arise from biologically grounded knowledge graph structure rather than graph modeling alone. These findings show that knowledge graph-modulated deep learning can improve robustness, interpretability, and sample efficiency in clinical data analysis, and provide a principled framework for integrating biological knowledge graphs into predictive modeling.
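The patient-graph construction described here, with knowledge-graph edges and expression-valued nodes, can be sketched in a few lines. This is only an illustration under stated assumptions: the gene names and edges are placeholders rather than real pathway data, a single mean-aggregation step stands in for whatever graph network the authors use, and `build_patient_graph` and `message_pass` are hypothetical names.

```python
def build_patient_graph(pathway_edges, expression):
    """Build one patient's graph: shared pathway edges, patient-specific node features.

    Edges touching a gene that was not measured for this patient are skipped.
    """
    neighbours = {gene: set() for gene in expression}
    for a, b in pathway_edges:
        if a in expression and b in expression:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return {
        "features": dict(expression),
        "adjacency": {gene: sorted(n) for gene, n in neighbours.items()},
    }


def message_pass(graph):
    """One round of mean aggregation over each gene and its pathway neighbours."""
    features = graph["features"]
    updated = {}
    for gene, neighbours in graph["adjacency"].items():
        values = [features[gene]] + [features[n] for n in neighbours]
        updated[gene] = sum(values) / len(values)
    return updated


# Placeholder pathway topology (not real biology) shared by all patients.
pathway_edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]

# One patient's expression values (hypothetical); GENE_D was not measured.
expression = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0}

graph = build_patient_graph(pathway_edges, expression)
print(message_pass(graph))
```

Swapping `pathway_edges` for a random edge list of the same size gives the kind of control the abstract mentions, where topology is present but carries no biological knowledge.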

ARXIV Cancer: colorectal cancer Method: deep learning

MuellerPT: Decomposition Driven Pretraining for Dense Learning in Mueller Polarimetry

Adam Tlemsani, Yingdian Li, Maxime Giot, Naim Slim, Christopher J. Peters, Abhijeet Ghosh, Daniel S. Elson
Published 2026-05-22 16:48

The paper presents MuellerPT, a physics-guided pre-training approach designed to enhance dense learning in Mueller polarimetry for biomedical tissue analysis. It addresses challenges in supervised learning due to limited annotations and domain shifts by predicting Lu-Chipman decomposition maps from Mueller matrices. In few-shot settings, the method improves label efficiency over training from scratch: an absolute Dice gain of over 20% for grey vs. white matter segmentation in lamb brain with 5% of the training data, and an 8% accuracy gain for colorectal cancer vs. non-cancer classification with 1% of the training data. The results indicate the potential of MuellerPT for robust biomedical inference.

Mueller matrix imaging provides rich, physically meaningful contrast for biomedical tissue analysis, but supervised learning is hindered by scarce dense annotations and strong domain shifts across specimens and acquisition settings. We introduce MuellerPT, a physics-guided pre-training approach that learns transferable dense representations by predicting Lu-Chipman decomposition maps from per-pixel 4x4 Mueller matrices. To scale pre-training, we collected a new large Multispectral Animal Polarimetric Organ dataset (MAP-Org). The pre-trained encoder is adapted with a segmentation head for grey vs. white matter segmentation in lamb brain. A classification head is used for colorectal cancer vs. non-cancer classification. Both segmentation and classification are evaluated across few-shot learning scenarios. In segmentation, MuellerPT improves label efficiency and cross-specimen transfer compared to models without pre-training, achieving an absolute Dice gain of over 20% compared to the baseline trained from scratch when using 5% of the training data. In classification, MuellerPT also enhances label efficiency, improving overall accuracy by 8% compared to the baseline when using 1% of the training data. We demonstrate MuellerPT's robustness to domain shift with a qualitative evaluation of its predicted Lu-Chipman maps on an ex vivo human oesophagus sample. These results suggest that predicting Lu-Chipman decomposition is an effective and practical pretext task for robust biomedical inference from Mueller polarimetry and can pave the way for future work on label-efficient Mueller imaging.

ARXIV Cancer: glioma Method: Generative Mixture of Experts Network

GMENet: Generative Mixture of Experts Network for Multi-Center Glioma Diagnosis with Incomplete Imaging Sequences

Pengfei Song, Fangjin Liu, Wenwen Zeng, Yonghuang Wu, Chengqian Zhao, Feiyu Yin, Xuan Xie, Jinhua Yu
Published 2026-05-22 03:05

This paper presents GMENet, a Generative Mixture of Experts Network designed to enhance glioma diagnosis using incomplete MRI sequences. The method incorporates a Cross-attention-based Gated Generation Module to synthesize missing features and a Dynamically Weighted Experts Fusion Module for improved multi-task predictions. Evaluation on a multi-center cohort of 1,241 subjects shows that GMENet expands usable training data by 97% relative to complete-sequence-only data and outperforms existing methods trained on complete sequences.

Contemporary glioma diagnosis integrates molecular features with histopathology to guide clinical decision-making. However, in clinical settings, divergent imaging protocols result in incomplete MRI sequences, which forces existing frameworks to discard a large portion of clinical data during training and consequently limits their clinical applicability. To address these limitations, we propose GMENet, a Generative Mixture of Experts Network for multi-center glioma diagnosis with incomplete imaging sequences. First, we design a Cross-attention-based Gated Generation Module that synthesizes missing sequence features from available sequences via cross-attention and dynamic gating mechanisms, incorporating a cycle-consistency loss to preserve semantic integrity. Second, we introduce a Dynamically Weighted Experts Fusion Module that performs mixture-of-experts interaction and confidence-aware fusion over original and synthesized dual-sequence features for multi-task prediction. We evaluate GMENet on a multi-center cohort of 1,241 subjects from four in-house datasets and two public repositories. Experiments show that GMENet expands clinically usable training data by 97% relative to complete-sequence-only data. Furthermore, it consistently outperforms state-of-the-art methods trained on complete data, demonstrating improved robustness under cross-center distribution shifts.

ARXIV Cancer: pancreatic cancer Method: deep learning

Exploiting Longitudinal Context in Clinician-Verified Interactive Lesion Tracking

Yannick Kirchhoff, Maximilian Rokuss, Daniel Philipp Mertens, David Füller, Benjamin Hamm, Andreas Schreyer, Oliver Ritter, Klaus Maier-Hein
Published 2026-05-22 00:37

This study introduces a Verified Tracking paradigm for tracking tumor lesions across serial CT scans, addressing the limitations of existing automated methods. The proposed framework combines clinician verification with a model that utilizes both the current and prior lesion appearances to enhance segmentation accuracy. The approach demonstrates significant performance improvements, achieving first place in the MICCAI autoPET IV challenge and introducing a new benchmark for longitudinal pancreatic cancer assessment.

Tracking tumor lesions across serial CT scans is essential for oncological response assessment. Existing automated methods face a fundamental trade-off: end-to-end trackers achieve high automation but offer no opportunity to correct silent tracking failures, while decoupled registration-segmentation pipelines permit user verification yet discard the lesion's prior appearance, limiting accuracy in ambiguous cases. In this work, we propose a Verified Tracking paradigm: a clinician verifies a registration-proposed prompt, which the model leverages alongside the baseline lesion appearance to resolve segmentation ambiguities. We present a unified framework combining early spatial prompt fusion with latent temporal difference weighting for longitudinally-informed segmentation. To address data scarcity, we leverage large-scale synthetic pretraining, which proves essential for exploiting longitudinal context and improves performance by up to 4.5 Dice points over training from scratch. Our approach secured first place in the MICCAI autoPET IV challenge. We further curate and release PanTrack, a new longitudinal pancreatic cancer benchmark, to assess out-of-distribution generalization. Experiments show that our model outperforms prior work in both the fully automatic and the proposed verified tracking settings, offering a clinically safe middle ground between automation and control. Code, model and dataset will be released at https://github.com/MIC-DKFZ/LongiSeg
