Research Papers

ARXIV Cancer: brain tumor Method: deep learning

XMorph: Explainable Brain Tumor Analysis Via LLM-Assisted Hybrid Deep Intelligence

Sepehr Salem Ghahfarokhi, M. Moein Esfahani, Raj Sunderraman, Vince Calhoun, Mohammed Alser
Published 2026-02-24 18:28

The paper presents XMorph, an explainable and computationally efficient framework designed for the fine-grained classification of brain tumors, specifically glioma, meningioma, and pituitary tumors. It introduces an Information-Weighted Boundary Normalization mechanism to enhance the representation of tumor boundaries and employs a dual-channel explainable AI module for interpretability. The framework achieves a classification accuracy of 96.0%, highlighting the potential for combining explainability with high performance in AI-based medical imaging.

Deep learning has significantly advanced automated brain tumor diagnosis, yet clinical adoption remains limited by interpretability and computational constraints. Conventional models often act as opaque "black boxes" and fail to quantify the complex, irregular tumor boundaries that characterize malignant growth. To address these challenges, we present XMorph, an explainable and computationally efficient framework for fine-grained classification of three prominent brain tumor types: glioma, meningioma, and pituitary tumors. We propose an Information-Weighted Boundary Normalization (IWBN) mechanism that emphasizes diagnostically relevant boundary regions alongside nonlinear chaotic and clinically validated features, enabling a richer morphological representation of tumor growth. A dual-channel explainable AI module combines GradCAM++ visual cues with LLM-generated textual rationales, translating model reasoning into clinically interpretable insights. The proposed framework achieves a classification accuracy of 96.0%, demonstrating that explainability and high performance can co-exist in AI-based medical imaging systems. The source code and materials for XMorph are all publicly available at: https://github.com/ALSER-Lab/XMorph.

ARXIV Cancer: general cancer Method: vision-language model

LUMEN: Longitudinal Multi-Modal Radiology Model for Prognosis and Diagnosis

Zhifan Jiang, Dong Yang, Vishwesh Nath, Abhijeet Parida, Nishad P. Kulkarni, Ziyue Xu, Daguang Xu, Syed Muhammad Anwar, Holger R. Roth, Marius George Linguraru
Published 2026-02-24 17:42

The paper presents LUMEN, a novel training framework designed for the longitudinal interpretation of chest X-rays (CXR) using vision-language models. It focuses on enhancing prognostic and diagnostic performance through multi-image and multi-task instruction fine-tuning. Experiments conducted on the MIMIC-CXR dataset demonstrate significant improvements in diagnostic tasks and highlight the potential for prognostic capabilities in radiology.

Large vision-language models (VLMs) have evolved from general-purpose applications to specialized use cases such as the clinical domain, demonstrating potential for decision support in radiology. One promising application is assisting radiologists in decision-making through the analysis of radiology imaging data such as chest X-rays (CXR) via a visual and natural language question-answering (VQA) interface. When longitudinal imaging is available, radiologists analyze temporal changes, which are essential for accurate diagnosis and prognosis. Manual longitudinal analysis is a time-consuming process, motivating the development of a training framework that can provide prognostic capabilities. We introduce LUMEN, a novel training framework optimized for longitudinal CXR interpretation, leveraging multi-image and multi-task instruction fine-tuning to enhance prognostic and diagnostic performance. We conduct experiments on the publicly available MIMIC-CXR and its associated Medical-Diff-VQA datasets. We further formulate and construct a novel instruction-following dataset incorporating longitudinal studies, enabling the development of a prognostic VQA task. Our method demonstrates significant improvements over baseline models in diagnostic VQA tasks, and more importantly, shows promising potential for prognostic capabilities. These results underscore the value of well-designed, instruction-tuned VLMs in enabling more accurate and clinically meaningful radiological interpretation of longitudinal radiological imaging data.

ARXIV Cancer: brain cancer Method: report-supervised learning

Multimodal MRI Report Findings Supervised Brain Lesion Segmentation with Substructures

Yubin Ge, Yongsong Huang, Xiaofeng Liu
Published 2026-02-24 15:14

This study introduces a novel report-supervised learning method, termed MS-RSuper, for brain lesion segmentation in MRI scans. The method addresses the challenges posed by incomplete radiology reports by aligning qualitative and quantitative findings with substructures of brain tumors. The proposed approach demonstrates significant improvements over existing methods on a dataset of 1238 report-labeled scans.

Report-supervised (RSuper) learning seeks to alleviate the need for dense tumor voxel labels with constraints derived from radiology reports (e.g., volumes, counts, sizes, locations). MRI studies of brain tumors, however, often involve multi-parametric scans and substructures. Here, fine-grained modality/parameter-wise reports are usually provided along with global findings and are correlated with different substructures. Moreover, the reports often describe only the largest lesion and provide qualitative or uncertain cues ("mild," "possible"). Classical RSuper losses (e.g., sum volume consistency) can over-constrain or hallucinate unreported findings under such incompleteness, and are unable to utilize these hierarchical findings or exploit the priors of varied lesion types in a merged dataset. We explicitly parse the global quantitative and modality-wise qualitative findings and introduce a unified, one-sided, uncertainty-aware formulation (MS-RSuper) that: (i) aligns modality-specific qualitative cues (e.g., T1c enhancement, FLAIR edema) with their corresponding substructures using existence and absence losses; (ii) enforces one-sided lower-bounds for partial quantitative cues (e.g., largest lesion size, minimal multiplicity); and (iii) adds extra- vs. intra-axial anatomical priors to respect cohort differences. Certainty tokens scale penalties; missing cues are down-weighted. On 1238 report-labeled BraTS-MET/MEN scans, our MS-RSuper largely outperforms both a sparsely-supervised baseline and a naive RSuper method.
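
The idea of a one-sided, certainty-scaled lower bound can be illustrated with a small sketch. This is not the MS-RSuper loss itself, whose exact form the abstract does not give. The function name, the hinge form, and the choice to return zero for a missing cue are all assumptions made for illustration (the abstract says missing cues are down-weighted, not necessarily ignored).

```python
def one_sided_lower_bound_penalty(predicted, reported_min, certainty=1.0):
    """Hinge-style penalty that only fires when the prediction falls short.

    predicted    -- quantity derived from the predicted mask (e.g. lesion size)
    reported_min -- lower bound taken from the report, or None if not reported
    certainty    -- hypothetical weight in [0, 1] for hedged report wording
    """
    if reported_min is None:
        # Assumption: a cue absent from the report contributes no penalty.
        return 0.0
    # One-sided: exceeding the reported value is never penalized, because the
    # report may describe only the largest lesion.
    shortfall = max(0.0, reported_min - predicted)
    return certainty * shortfall
```

A two-sided consistency loss would also penalize `predicted > reported_min`, which is the over-constraining behavior the abstract attributes to classical RSuper losses when reports are incomplete.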

ARXIV Cancer: breast cancer Method: graph neural networks

GrapHist: Graph Self-Supervised Learning for Histopathology

Sevda Öğüt, Cédric Vincent-Cuaz, Natalia Dubljevic, Carlos Hurtado, Vaishnavi Subramanian, Pascal Frossard, Dorina Thanou
Published 2026-02-24 12:11

This paper presents GrapHist, a novel graph-based self-supervised learning framework designed for histopathology. The method models tissues as cell graphs to enhance representation learning, integrating masked autoencoders and heterophilic graph neural networks. GrapHist is pre-trained on a large dataset of cell graphs from breast tissues and demonstrates competitive performance in various tasks while requiring fewer parameters compared to traditional models.

Self-supervised vision models have achieved notable success in digital pathology. However, their domain-agnostic transformer architectures are not originally designed to account for fundamental biological elements of histopathology images, namely cells and their complex interactions. In this work, we hypothesize that a biologically-informed modeling of tissues as cell graphs offers a more efficient representation learning. Thus, we introduce GrapHist, a novel graph-based self-supervised learning framework for histopathology, which learns generalizable and structurally-informed embeddings that enable diverse downstream tasks. GrapHist integrates masked autoencoders and heterophilic graph neural networks that are explicitly designed to capture the heterogeneity of tumor microenvironments. We pre-train GrapHist on a large collection of 11 million cell graphs derived from breast tissues and evaluate its transferability across in- and out-of-domain benchmarks. Our results show that GrapHist achieves competitive performance compared to its vision-based counterparts in slide-, region-, and cell-level tasks, while requiring four times fewer parameters. It also drastically outperforms fully-supervised graph models on cancer subtyping tasks. Finally, we also release five graph-based digital pathology datasets used in our study at https://huggingface.co/ogutsevda/datasets , establishing the first large-scale graph benchmark in this field. Our code is available at https://github.com/ogutsevda/graphist .

ARXIV Cancer: general cancer Method: deep learning

The Sim-to-Real Gap in MRS Quantification: A Systematic Deep Learning Validation for GABA

Zien Ma, S. M. Shermer, Oktay Karakuş, Frank C. Langbein
Published 2026-02-23 19:16

This study investigates the application of deep learning techniques for quantifying low-concentration metabolites, specifically GABA, using magnetic resonance spectroscopy (MRS). A convolutional neural network (CNN) and a Y-shaped autoencoder (YAE) were developed and validated on simulated and experimental spectra. The models demonstrated superior performance compared to the conventional LCModel quantification tool, achieving lower mean absolute errors in metabolite concentration estimation.

Magnetic resonance spectroscopy (MRS) is used to quantify metabolites in vivo and estimate biomarkers for conditions ranging from neurological disorders to cancers. Quantifying low-concentration metabolites such as GABA (γ-aminobutyric acid) is challenging due to low signal-to-noise ratio (SNR) and spectral overlap. We investigate and validate deep learning for quantifying complex, low-SNR, overlapping signals from MEGA-PRESS spectra, devise a convolutional neural network (CNN) and a Y-shaped autoencoder (YAE), and select the best models via Bayesian optimisation on 10,000 simulated spectra from slice-profile-aware MEGA-PRESS simulations. The selected models are trained on 100,000 simulated spectra. We validate their performance on 144 spectra from 112 experimental phantoms containing five metabolites of interest (GABA, Glu, Gln, NAA, Cr) with known ground truth concentrations across solution and gel series acquired at 3 T under varied bandwidths and implementations. These models are further assessed against the widely used LCModel quantification tool. On simulations, both models achieve near-perfect agreement (small MAEs; regression slopes ≈ 1.00, R² ≈ 1.00). On experimental phantom data, errors initially increased substantially. However, modelling variable linewidths in the training data significantly reduced this gap. The best augmented deep learning models achieved a mean MAE for GABA over all phantom spectra of 0.151 (YAE) and 0.160 (FCNN) in max-normalised relative concentrations, outperforming the conventional baseline LCModel (0.220). A sim-to-real gap remains, but physics-informed data augmentation substantially reduced it. Phantom ground truth is needed to judge whether a method will perform reliably on real data.

ARXIV Cancer: general cancer Method: multimodal learning

Closing the gap in multimodal medical representation alignment

Eleonora Grassucci, Giordano Cicchetti, Danilo Comminiello
Published 2026-02-23 16:57

This study addresses the modality gap in multimodal learning, particularly in the medical domain, where CLIP-based contrastive losses can lead to poor semantic alignment. The authors propose a modality-agnostic framework that improves the alignment of semantically related representations across different modalities. Their method enhances the alignment between radiology images and clinical text, resulting in better cross-modal retrieval and image captioning.

In multimodal learning, CLIP has emerged as the de-facto approach for mapping different modalities into a shared latent space by bringing semantically similar representations closer while pushing apart dissimilar ones. However, CLIP-based contrastive losses exhibit unintended behaviors that negatively impact true semantic alignment, leading to sparse and fragmented latent spaces. This phenomenon, known as the modality gap, has been partially mitigated for standard text and image pairs but remains unknown and unresolved in more complex multimodal settings, such as the medical domain. In this work, we study this phenomenon in the latter case, revealing that the modality gap is also present in medical alignment, and we propose a modality-agnostic framework that closes this gap, ensuring that semantically related representations are more aligned, regardless of their source modality. Our method enhances alignment between radiology images and clinical text, improving cross-modal retrieval and image captioning.
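
One common way to put a number on the modality gap is the distance between the centroids of the L2-normalized embeddings of each modality. The sketch below follows that convention; the abstract does not say which measure the authors use, so the metric choice and all function names here are assumptions.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def modality_gap(image_embeddings, text_embeddings):
    """Euclidean distance between the two modality centroids."""
    image_center = centroid([l2_normalize(v) for v in image_embeddings])
    text_center = centroid([l2_normalize(v) for v in text_embeddings])
    return math.dist(image_center, text_center)
```

A value near zero means the two modalities occupy the same region of the shared space on average; it says nothing on its own about whether individual image and text pairs are matched correctly.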

ARXIV Cancer: endometrial carcinoma Method: deep learning

Efficient endometrial carcinoma screening via cross-modal synthesis and gradient distillation

Dongjing Shan, Yamei Luo, Jiqing Xuan, Lu Huang, Jin Li, Mengchu Yang, Zeyu Chen, Fajin Lv, Yong Tang, Chunxiang Zhang
Published 2026-02-23 13:22

This study presents a two-stage deep learning framework aimed at improving the screening of endometrial carcinoma (EC) by addressing data and computational challenges. The method includes a structure-guided cross-modal generation network that synthesizes ultrasound images from unpaired MRI data and a lightweight screening network utilizing gradient distillation. On a multicenter cohort of 7,951 participants, the model achieved a sensitivity of 99.5% and a specificity of 97.2%, substantially outperforming the average diagnostic accuracy of expert sonographers.

Early detection of myometrial invasion is critical for the staging and life-saving management of endometrial carcinoma (EC), a prevalent global malignancy. Transvaginal ultrasound serves as the primary, accessible screening modality in resource-constrained primary care settings; however, its diagnostic reliability is severely hindered by low tissue contrast, high operator dependence, and a pronounced scarcity of positive pathological samples. Existing artificial intelligence solutions struggle to overcome this severe class imbalance and the subtle imaging features of invasion, particularly under the strict computational limits of primary care clinics. Here we present an automated, highly efficient two-stage deep learning framework that resolves both data and computational bottlenecks in EC screening. To mitigate pathological data scarcity, we develop a structure-guided cross-modal generation network that synthesizes diverse, high-fidelity ultrasound images from unpaired magnetic resonance imaging (MRI) data, strictly preserving clinically essential anatomical junctions. Furthermore, we introduce a lightweight screening network utilizing gradient distillation, which transfers discriminative knowledge from a high-capacity teacher model to dynamically guide sparse attention towards task-critical regions. Evaluated on a large, multicenter cohort of 7,951 participants, our model achieves a sensitivity of 99.5%, a specificity of 97.2%, and an area under the curve of 0.987 at a minimal computational cost (0.289 GFLOPs), substantially outperforming the average diagnostic accuracy of expert sonographers. Our approach demonstrates that combining cross-modal synthetic augmentation with knowledge-driven efficient modeling can democratize expert-level, real-time cancer screening for resource-constrained primary care settings.

ARXIV Cancer: unknown Method: deep learning

Towards Personalized Multi-Modal MRI Synthesis across Heterogeneous Datasets

Yue Zhang, Zhizheng Zhuo, Siyao Xu, Shan Lv, Zhaoxi Liu, Jun Qiu, Qiuli Wang, Yaou Liu, S. Kevin Zhou
Published 2026-02-23 11:20

This paper presents PMM-Synth, a personalized MRI synthesis framework designed to synthesize missing modalities in multi-modal MRI, addressing challenges related to time constraints and patient tolerance. The framework is trained on diverse clinical datasets to enhance generalizability and is evaluated across multiple tasks, demonstrating superior performance compared to existing methods. Key innovations include a Personalized Feature Modulation module and a Modality-Consistent Batch Scheduler, which improve training stability and effectiveness in scenarios with incomplete modality data.

Synthesizing missing modalities in multi-modal magnetic resonance imaging (MRI) is vital for ensuring diagnostic completeness, particularly when full acquisitions are infeasible due to time constraints, motion artifacts, and patient tolerance. Recent unified synthesis models have enabled flexible synthesis tasks by accommodating various input-output configurations. However, their training and evaluation are typically restricted to a single dataset, limiting their generalizability across diverse clinical datasets and impeding practical deployment. To address this limitation, we propose PMM-Synth, a personalized MRI synthesis framework that not only supports various synthesis tasks but also generalizes effectively across heterogeneous datasets. PMM-Synth is jointly trained on multiple multi-modal MRI datasets that differ in modality coverage, disease types, and intensity distributions. It achieves cross-dataset generalization through three core innovations: a Personalized Feature Modulation module that dynamically adapts feature representations based on dataset identifier to mitigate the impact of distributional shifts; a Modality-Consistent Batch Scheduler that facilitates stable and efficient batch training under inconsistent modality conditions; and a selective supervision loss to ensure effective learning when ground truth modalities are partially missing. Evaluated on four clinical multi-modal MRI datasets, PMM-Synth consistently outperforms state-of-the-art methods in both one-to-one and many-to-one synthesis tasks, achieving superior PSNR and SSIM scores. Qualitative results further demonstrate improved preservation of anatomical structures and pathological details. Additionally, downstream tumor segmentation and radiological reporting studies suggest that PMM-Synth holds potential for supporting reliable diagnosis under real-world modality-missing scenarios.

ARXIV Cancer: glioblastoma Method: 3D nnU-Net

Robust Glioblastoma Segmentation and Volumetry Without T2-FLAIR: External Validation of Targeted Dropout Training

Marco Öchsner, Lena Kaiser, Robert Stahl, Nathalie L. Albert, Thomas Liebig, Robert Forbrig, Jonas Reis
Published 2026-02-23 08:51

This study externally validates targeted T2-FLAIR dropout for automated glioblastoma segmentation and volumetry without the T2-FLAIR sequence. Using 3D nnU-Net models developed on BraTS 2021 (n=848) and tested on an independent University of Pennsylvania cohort (n=403), the research shows that dropout training preserves performance when the full MRI protocol is available. When T2-FLAIR is absent, it raises overall median DSC from 81.0% to 93.4% and reduces whole-tumor volume bias from -45.6 mL to 0.83 mL, supporting targeted dropout as a robust strategy in clinical workflows.

Objectives: To externally validate targeted T2 fluid-attenuated inversion recovery (T2-FLAIR) dropout for robust automated glioblastoma segmentation and whole-tumor volumetry without T2-FLAIR, while preserving performance when the full MRI protocol is available. Methods: In this retrospective multi-dataset study, 3D nnU-Net models were developed on BraTS 2021 (n=848) and externally validated on an independent University of Pennsylvania glioblastoma cohort (n=403). Models were trained with or without targeted T2-FLAIR dropout, zeroing the T2-FLAIR channel during training. Testing used prespecified T2-FLAIR-present and T2-FLAIR-absent scenarios; the absent scenario was simulated by zeroing the T2-FLAIR channel at inference. The primary endpoint was per-patient overall region-wise Dice similarity coefficient (DSC). Secondary endpoints were region-specific DSC, 95th percentile Hausdorff distance, and Bland-Altman whole-tumor volume bias. Results: In external validation, performance was preserved with the full MRI protocol: overall median DSC was 94.8% (interquartile range [IQR] 90.0%-97.1%) with dropout and 95.0% (IQR 90.3%-97.1%) without dropout. In the T2-FLAIR-absent scenario, targeted dropout improved overall median DSC from 81.0% (IQR 75.1%-86.4%) to 93.4% (IQR 89.1%-96.2%). Whole-tumor DSC improved from 60.4% to 92.6%, whole-tumor 95th percentile Hausdorff distance from 17.24 mm to 2.45 mm, and whole-tumor volume bias from -45.6 mL to 0.83 mL. Conclusions: In an independent external test cohort, targeted T2-FLAIR dropout preserved glioblastoma segmentation performance with the full MRI protocol and substantially reduced whole-tumor segmentation error and volumetric bias when T2-FLAIR was absent. These findings support targeted sequence dropout as a practical robustness strategy for automated glioblastoma analysis in retrospective and heterogeneous clinical workflows.
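
For readers unfamiliar with the primary endpoint, the Dice similarity coefficient between a predicted mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened binary masks. Returning 1.0 when both masks are empty is a convention chosen here, not something the abstract specifies, and the paper's per-patient, region-wise aggregation is not reproduced.

```python
def dice_coefficient(predicted_mask, reference_mask):
    """Dice similarity coefficient for two flattened binary masks."""
    if len(predicted_mask) != len(reference_mask):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(predicted_mask, reference_mask) if p and r)
    total = sum(1 for p in predicted_mask if p) + sum(1 for r in reference_mask if r)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / total
```

For example, a prediction covering two voxels that overlaps a one-voxel reference in a single voxel scores 2 × 1 / (2 + 1), or about 0.67.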

ARXIV Cancer: glioblastoma Method: nnU-Net

Targeted T2-FLAIR Dropout Training Improves Robustness of nnU-Net Glioblastoma Segmentation to Missing T2-FLAIR

Marco Öchsner, Lena Kaiser, Robert Stahl, Nathalie L. Albert, Thomas Liebig, Robert Forbrig, Jonas Reis
Published 2026-02-23 08:51

This study investigates the impact of targeted T2-FLAIR dropout training on the robustness of nnU-Net models for glioblastoma MRI tumor segmentation. The models were developed on BraTS 2021 (n=848) and externally tested on UPenn-GBM (n=403) under scenarios with and without T2-FLAIR. Results indicate that dropout training substantially improved segmentation accuracy when T2-FLAIR was absent (median overall DSC 81.0% to 93.4%), while maintaining performance when it was present.

Purpose: To determine whether targeted T2 fluid-attenuated inversion recovery (T2-FLAIR) dropout training improves glioblastoma MRI tumor segmentation robustness to missing T2-FLAIR without degrading performance when T2-FLAIR is available. Materials and Methods: This retrospective multi-dataset study developed nnU-Net models on BraTS 2021 (n=848) and externally tested them on UPenn-GBM glioblastoma MRI (n=403; 2006-2018; age 18-89 years; 60% male). Models were trained with no dropout or targeted T2-FLAIR dropout (probability rate r=0.35 or 0.50) by replacing only the T2-FLAIR channel with zeros. Inference used T2-FLAIR-present and T2-FLAIR-absent scenarios (T2-FLAIR set to zero). The primary endpoint was Dice similarity coefficient (DSC); secondary endpoints were 95th percentile Hausdorff distance and Bland-Altman whole-tumor volume bias. Equivalence was assessed with two one-sided tests using +/-1.5 DSC percentage points, and noninferiority versus HD-GLIO used a -1.5-point margin. Results: With T2-FLAIR present, median overall DSC was 94.8% (interquartile range, 90.0%-97.1%) with dropout and 95.0% (interquartile range, 90.3%-97.1%) without dropout (equivalence supported, p<0.001). With T2-FLAIR absent, median overall DSC improved from 81.0% (interquartile range, 75.1%-86.4%) without dropout to 93.4% (interquartile range, 89.1%-96.2%) with dropout (r=0.35); edema DSC improved from 14.0% to 87.0%, edema 95th percentile Hausdorff distance improved from 22.44 mm to 2.45 mm, and whole-tumor volume bias improved from -45.6 mL to 0.83 mL. Dropout was noninferior to HD-GLIO under T2-FLAIR-present (all p<0.001). Conclusion: Targeted T2-FLAIR dropout preserved segmentation performance when T2-FLAIR was available and reduced segmentation error and whole-tumor volume bias when T2-FLAIR was absent.
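
The training-time augmentation described above, replacing only the T2-FLAIR channel with zeros at probability r, can be sketched in a few lines. This is a minimal illustration, not the authors' nnU-Net integration: the dict-of-lists sample layout, the channel key `"flair"`, and the per-sample application of dropout are assumptions.

```python
import random


def targeted_channel_dropout(sample, channel="flair", rate=0.35, rng=random):
    """Return a copy of `sample` with one channel zeroed with probability `rate`.

    sample -- dict mapping a channel name to a flat list of voxel intensities
    rate   -- dropout probability (the abstract reports r=0.35 and r=0.50)
    rng    -- any object with a random() method returning a float in [0, 1)
    """
    augmented = {name: list(values) for name, values in sample.items()}
    if rng.random() < rate:
        # Only the targeted channel is zeroed; all other channels are untouched.
        augmented[channel] = [0.0] * len(augmented[channel])
    return augmented
```

Calling the same function with `rate=1.0` reproduces the T2-FLAIR-absent inference scenario, in which the channel is always set to zero.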
