Research Papers

ARXIV Cancer: breast cancer Method: graph neural network

Graph neural network explanations reveal a topological signature of disease-associated hubs in biological networks

Kyle Higgins, Ivan Laponogov, Dennis Veselkov, Kirill Veselkov
Published 2026-05-08 15:29

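This study compares four post-hoc explanation methods for graph neural networks (GNNs): Saliency Attribution (SA), Integrated Gradients (IG), GNNExplainer, and Layer-wise Relevance Propagation (LRP). The methods are assessed on how well they identify disease-relevant structure in breast cancer RNA-seq data mapped onto a protein-protein interaction network. On synthetic benchmarks, SA best recovers sparse single-node drivers, whereas IG and LRP better recover distributed pathway-like and cascade-like signals. In TCGA BRCA data, attribution around disease-associated hubs peaks in the 1-hop neighborhood and decays across successive network shells. A framework that combines a shell-based hub score with consensus ranking across explainers improves prioritization of canonical cancer genes and biological interpretability.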

Graph neural networks (GNNs) are increasingly used to model biological systems, yet the reliability of post-hoc explanation methods for recovering meaningful molecular mechanisms remains unclear. Here, we systematically evaluate four widely used approaches: Saliency Attribution (SA), Integrated Gradients (IG), GNNExplainer, and Layer-wise Relevance Propagation (LRP) for identifying disease-relevant structure in breast cancer RNA-seq data projected onto a protein-protein interaction network. Using synthetic benchmarks with known ground-truth motifs, we show that explanation methods recover distinct signal organizations: SA performs best for sparse single-node drivers, whereas IG and LRP preferentially recover distributed pathway-like and cascade-like signals. In TCGA BRCA data, we identify a consistent topological signature of disease-associated hubs in which attribution peaks in the immediate 1-hop neighborhood and decays across successive network shells, a pattern most pronounced for IG and LRP and associated with strong enrichment of known cancer hubs. We further observe a trade-off between local hub enrichment and global gene ranking performance, with IG optimizing local enrichment and SA achieving superior global discrimination. Motivated by these complementary behaviors, we introduce a framework combining a shell-based hub score with consensus ranking across explainers. Consensus scores improve prioritization of canonical cancer genes (TP53, BRCA1, ESR1, MYC), reduce dependence on node degree, and, especially when tuned, outperform individual methods. Pathway enrichment further reveals improved recovery of biologically coherent cancer programs, including ERBB2, RTK, MAPK, immune, and cytokine signaling. Together, these results demonstrate that topology-aware integration of graph explanations can improve biological interpretability and biologically relevant molecular recovery.
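
The shell-based analysis and consensus ranking described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes an undirected adjacency dict and one attribution score per gene, and computes the mean attribution in each breadth-first-search shell around a hub. The hub score (1-hop mean minus the mean of the outer shells) is hypothetical, since the abstract does not define it. The consensus step is likewise an assumed mean-rank aggregation across explainers, and the gene names and scores in the check are made up.

```python
from collections import deque
from statistics import mean


def shell_profile(adjacency, attributions, hub, max_hops=3):
    """Mean attribution in each k-hop shell around `hub`, for k = 0..max_hops."""
    # Breadth-first search yields shortest-path hop distances from the hub.
    dist = {hub: 0}
    queue = deque([hub])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the outermost shell
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    shells = [[] for _ in range(max_hops + 1)]
    for node, k in dist.items():
        shells[k].append(attributions.get(node, 0.0))
    return [mean(s) if s else 0.0 for s in shells]


def hub_score(profile):
    """Hypothetical score: 1-hop mean attribution minus the mean over outer shells."""
    outer = profile[2:]
    return profile[1] - (mean(outer) if outer else 0.0)


def consensus_ranking(scores_by_method):
    """Order genes by mean rank across explainers (rank 1 = highest score).

    Assumes every explainer scores the same set of genes.
    """
    rank_sums = {}
    for scores in scores_by_method.values():
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, gene in enumerate(ordered, start=1):
            rank_sums[gene] = rank_sums.get(gene, 0) + rank
    n = len(scores_by_method)
    return sorted(rank_sums, key=lambda g: (rank_sums[g] / n, g))
```

A profile that is high at shell 1 and falls off at shells 2 and 3 gives a positive hub score. Mean-rank aggregation is used here only because it is simple and scale-free across explainers whose raw attributions are not comparable.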

ARXIV Cancer: breast cancer Method: graph neural network

PPI-Net connects molecular protein interactions to functional processes in disease

Kyle Higgins, Guadalupe Gonzalez, Dennis Veselkov, Ivan Laponogov, Kirill Veselkov
Published 2026-05-08 15:04

The study introduces PPI-Net, a hierarchical graph neural network that links molecular protein interactions to functional processes in cancer. Patient-specific profiles are embedded in a STRING protein-protein interaction network and propagated through a multi-layer Reactome pathway hierarchy with graph attention. Across RNA-seq data from ten TCGA cancer types, PPI-Net reaches balanced accuracy above 90% in multiple cohorts. In breast cancer, adding the Reactome hierarchy improved balanced accuracy by 6.7% over a PPI-only model, and hierarchical multi-level supervision improved it by 12.3% over a single top-level prediction head. Adding methylation data to RNA-seq improved interpretability, recovering canonical modules such as TP53-AKT signaling.

Understanding how molecular alterations propagate across biological systems to drive disease remains a central challenge. Although high-throughput profiling enables comprehensive characterization of tumor states, most models neglect structured biological relationships or lack interpretability across scales. Here we present PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction (PPI) networks with pathway-level representations to model disease from molecular interactions to functional processes. Patient-specific molecular profiles are embedded within a shared interaction network from STRING and propagated through a multi-layer Reactome hierarchy using graph attention, enabling aggregation of gene-level signals into higher-order biological programs. Across RNA-seq data from ten cancer types from The Cancer Genome Atlas, PPI-Net achieves robust predictive performance, with balanced accuracy exceeding 90% in multiple cohorts. Comparative analysis on breast cancer RNA-seq data demonstrated that PPI-Net's integration of the Reactome hierarchy improved balanced accuracy by 6.7% relative to a PPI-only model, while hierarchical multi-level supervision improved balanced accuracy by 12.3% relative to using only a single top-level prediction head. Applying a multi-omics approach using RNA-seq and methylation data improves model interpretation, recovering canonical oncogenic modules, including TP53-AKT signaling and stress response pathways, while revealing convergence onto coherent programs such as ion signaling and cellular responses to stimuli. These results demonstrate that integrating interaction networks with pathway hierarchies enables accurate prediction while providing mechanistic insight into cancer biology.
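
Balanced accuracy, the metric reported for PPI-Net, is the mean of per-class recall, so a classifier that always predicts the majority class cannot score well. A minimal Python sketch follows. The labels are illustrative only and are not drawn from the paper.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


y_true = ["tumor"] * 8 + ["normal"] * 2
y_pred = ["tumor"] * 10  # majority-class predictor
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain, balanced_accuracy(y_true, y_pred))  # 0.8 0.5
```

With eight tumor and two normal samples all predicted as tumor, plain accuracy is 0.8 but balanced accuracy is 0.5, because recall on the minority class is zero.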

ARXIV Cancer: clear cell renal cell carcinoma Method: frozen feature-probing protocol

Benchmarking Foundation Models for Renal Lesion Stratification in CT

Hartmut Häntze, Sarah de Boer, Myrthe Buser, Alessa Hering, Bram van Ginneken, Mathias Prokop, Jawed Nawabi, Sebastian Ziegelmayer, Lisa Adams, Keno Bressem
Published 2026-05-08 13:56

This study benchmarks three medical foundation models (FMs) for classifying renal lesions in CT, a task constrained by limited training data. Frozen FM embeddings were compared against a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch. The FMs performed comparably to the ResNet (AUC 0.70-0.77 vs. 0.72) at far lower hardware cost, but the conventional radiomics approach outperformed all deep learning models (AUC 0.88). This suggests that current generalist FM embeddings do not yet capture the fine-grained texture and shape characteristics needed for renal lesion stratification.

The rapid proliferation of open-source medical foundation models (FMs) raises a practical question: how well do their pre-trained representations transfer to clinically relevant but data-scarce classification tasks? Particularly in CT-based renal lesion classification, a push toward greater generalizability would be meaningful, as the field is constrained by inherently limited training data. We addressed this through a benchmark of three medical FMs on this specific task. This six-class problem spans common entities like cysts and clear cell renal cell carcinoma, alongside rare subtypes. Using a frozen feature-probing protocol, we compared FM embeddings against a handcrafted radiomics classifier and a 3D ResNet-50 trained from scratch. Models were trained on a composite dataset of 2,854 lesions and evaluated on an external test set of 234 lesions from The Cancer Imaging Archive. Our results reveal two key findings. First, FM performance (AUC 0.70-0.77) matched the from-scratch ResNet (AUC 0.72) while drastically reducing hardware demand, requiring only seconds on a CPU after feature extraction. However, the conventional radiomics baseline significantly outperformed all deep learning approaches, achieving an AUC of 0.88 (all p ≤ 0.002). This suggests that current generalist FM embeddings do not yet capture the fine-grained texture and shape heterogeneity driving histological subtype discrimination. Despite their potential in data-scarce settings, medical FMs did not surpass established models for renal lesion stratification, leaving radiomics as the current state-of-the-art.
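
In a frozen feature-probing protocol, the foundation model only produces embeddings, a lightweight classifier is fitted on top, and its predictions are scored by AUC. The sketch below covers only the scoring step and assumes per-lesion class probabilities are already available. It computes binary AUC as the Mann-Whitney probability that a positive case outscores a negative one, then averages one-vs-rest AUCs over classes. Macro one-vs-rest averaging is an assumption, since the abstract does not say how the six-class AUC was aggregated.

```python
def binary_auc(labels, scores):
    """P(random positive scores higher than random negative); ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, prob_rows, classes):
    """Average one-vs-rest AUC; prob_rows[i][k] is P(classes[k] | lesion i)."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [y == c for y in y_true]
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)
```

The pairwise loop is quadratic in the number of cases, which is fine for a 234-lesion test set. Larger sets would use a rank-based formula instead.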

ARXIV Cancer: breast cancer Method: multimodal attention learning

Multimodal Stepwise Clinically-Guided Attention Learning for Pathological Complete Response Prediction in Breast Cancer

Alice Natalina Caragliano, Valerio Guarrasi, Michela Gravina, Carlo Sansone, Paolo Soda
Published 2026-05-08 10:34

This study presents a multimodal stepwise clinically-guided attention learning framework for predicting pathological complete response (pCR) from breast MRI in patients undergoing neoadjuvant therapy. To address class imbalance and limited cross-institutional generalizability, the model first learns global imaging patterns, then uses attention to focus on tumor regions, and finally integrates clinical variables. In external validation across heterogeneous MRI cohorts, it improves sensitivity over non-guided single-stage baselines while maintaining competitive specificity, and it produces anatomically coherent, interpretable attention maps.

Pathological complete response (pCR) is a key prognostic factor in breast cancer patients undergoing neoadjuvant therapy, strongly associated with long-term survival and treatment personalization. However, accurate pre-treatment pCR prediction remains challenging due to severe class imbalance and limited generalizability across diverse clinical settings. In this work, we propose a multimodal stepwise clinically-guided attention learning framework for pCR prediction from breast magnetic resonance imaging (MRI), designed to address these limitations through medically grounded spatial guidance and multimodal integration. The approach follows a stepwise training strategy inspired by physician reasoning: the model first learns global discriminative imaging patterns, then attention mechanisms are introduced to constrain the network toward tumor regions, and finally clinical variables are integrated to refine decision-making. This guidance strategy encourages prioritization of task-relevant features, improving identification of responders despite their limited representation in the dataset. Moreover, grounding attention in anatomically consistent tumor regions reduces reliance on dataset-specific patterns, thereby enhancing cross-institutional generalization. The framework is evaluated through external validation across heterogeneous MRI cohorts. Compared to non-guided single-stage baselines, the proposed approach improves sensitivity while maintaining competitive specificity, and produces anatomically coherent attention maps that support interpretation of the model's predictions. These findings highlight the potential of clinically-guided multimodal attention learning for robust and generalizable pCR prediction in breast cancer.

ARXIV Cancer: pancreatic cancer Method: TransUNet

A Unified Framework for the Detection and Classification of Fatty Pancreas in Ultrasound Images

Ioan-Tudor-Alexandru Anghel, Ciprian-Mihai Ceausescu, Elena Dana Nedelcu, Elena Raluca Stirban, Camelia Croitoru, Despina Ungureanu, Ana Maria Palan, Gabriela Pop
Published 2026-05-08 09:13

This paper presents an end-to-end framework for automatically classifying normal versus fatty pancreas from abdominal ultrasound images. A TransUNet-based segmentation model with a ResNet encoder and transformer bottleneck delineates the pancreas and splenic vein. Anatomically guided patch extraction and patient-level classification follow. In 5-fold cross-validation on 214 images from 107 expert-labeled cases, an RBF-kernel SVM reached 89.7% ± 1.8% accuracy and an F1 of 0.898 ± 0.019. An unsupervised K-Means baseline reached 87.8% accuracy, indicating that the features capture the clinically relevant signal.

Non-alcoholic fatty pancreas disease (NAFPD) is an underdiagnosed condition associated with metabolic syndrome, insulin resistance, and increased risk of pancreatic cancer. Diagnosis typically relies on subjective visual assessment of ultrasound images by clinicians. We propose an end-to-end framework for automatically classifying normal versus fatty pancreas from abdominal ultrasound images. Our method employs a TransUNet-based segmentation architecture with a ResNet encoder and transformer bottleneck to delineate the pancreas and the splenic vein, followed by anatomically-guided patch extraction and patient-level classification through pairwise texture comparison. The feature engineering mimics clinical reasoning by comparing the echogenicity of peri-venous fat to the pancreatic parenchyma, providing an interpretable signal for classification. The segmentation models are initialized via domain-specific transfer learning from a liver segmentation task. We validate the full pipeline on a clinical dataset of 214 abdominal ultrasound images with 107 expert-labeled cases using 5-fold cross-validation. SVM with RBF kernel achieves a mean cross-validated accuracy of 89.7% ± 1.8% and F1 of 0.898 ± 0.019, while the unsupervised K-Means baseline reaches 87.8% accuracy, demonstrating that the proposed features capture the relevant clinical signal even without labeled training data. To our knowledge, this is the first end-to-end automated framework for fatty pancreas classification from ultrasound using segmentation-guided texture analysis.
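
The echogenicity comparison can be illustrated with a deliberately simple feature: the ratio of mean intensity in a pancreatic parenchyma patch to mean intensity in a peri-venous fat patch. These ratios are then split into two groups, in the spirit of the unsupervised K-Means baseline. This is a hypothetical simplification. The paper's pairwise texture features are not specified in the abstract, and the patch values below are invented.

```python
from statistics import mean


def echogenicity_ratio(parenchyma_patch, fat_patch):
    """Hypothetical feature: mean parenchyma intensity / mean peri-venous fat intensity."""
    return mean(parenchyma_patch) / mean(fat_patch)


def two_means_1d(values, iters=20):
    """Lloyd's algorithm with k=2 on scalar features; returns a 0/1 cluster label per value."""
    lo, hi = min(values), max(values)  # initialize centroids at the extremes
    for _ in range(iters):
        low = [v for v in values if abs(v - lo) <= abs(v - hi)]
        high = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not low or not high:
            break  # degenerate case, e.g. all values identical
        lo, hi = mean(low), mean(high)
    return [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
```

Deciding which cluster corresponds to fatty pancreas still requires labels or a clinical rule. The clustering only separates the two groups.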

ARXIV Cancer: general cancer Method: multi-view readout

DINO-MVR: Multi-View Readout of Frozen DINOv3 for Annotation-Efficient Medical Segmentation

Wei Jiang, Feng Liu, Nan Ye, Hongfu Sun
Published 2026-05-08 04:13

The paper presents DINO-MVR, a Multi-View Readout framework for annotation-efficient medical segmentation with frozen DINOv3 features. It shows that these features already contain useful structural and boundary cues. As a result, training only lightweight MLP probes, without updating the backbone, yields strong segmentation when annotations are scarce. DINO-MVR reaches Dice scores of 0.895 on Kvasir-SEG, 0.897 on ISIC 2018, and 0.908 on BraTS FLAIR whole-tumor segmentation. With only five annotated BraTS patients, it recovers 98.4% of the performance of the 40-patient reference run.

Adapting foundation models to medical segmentation typically requires either backbone fine-tuning or high-capacity task-specific decoders, both of which are difficult to fit reliably when annotations are scarce. We show that frozen DINOv3 features already contain useful structural and boundary cues for medical segmentation, and that the main bottleneck lies in how these features are read out. We propose DINO-MVR, a Multi-View Readout framework for annotation-efficient medical segmentation. DINO-MVR trains only lightweight MLP probes on features from the final three transformer blocks of a frozen DINOv3 backbone, without updating the backbone itself. At inference, each input is interpreted through complementary resolutions and test-time augmentations, whose probability maps are combined by entropy-weighted fusion and refined with simple spatial regularization. For volumetric inputs, Gaussian z-axis smoothing further improves inter-slice consistency. Under fixed evaluation protocols on endoscopy, dermoscopy, and MRI benchmarks, DINO-MVR achieves strong readout-only performance, including 0.895 Dice on Kvasir-SEG, 0.897 Dice on ISIC 2018, and 0.908 Dice on BraTS FLAIR whole-tumor segmentation. With only five annotated BraTS patients, it recovers 98.4% of the performance obtained by the 40-patient BraTS reference run. These results suggest that frozen self-supervised vision backbones can support accurate medical segmentation when paired with an effective multi-view readout.
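
The fusion step described above can be sketched for a single image. The sketch assumes each view yields a per-pixel foreground probability map that has already been resampled to a common grid and flattened. The weighting rule is an assumption, since the abstract says only that maps are combined by entropy-weighted fusion: each view is weighted per pixel by the maximum binary entropy minus its own entropy. Spatial regularization and Gaussian z-axis smoothing are omitted. A Dice function is included because Dice is the reported metric.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Entropy (nats) of a Bernoulli(p) foreground prediction; clamped to avoid log(0)."""
    p = min(max(p, eps), 1 - eps)
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def entropy_weighted_fusion(prob_maps, eps=1e-6):
    """Fuse flat per-view probability maps pixel by pixel, favoring confident views."""
    fused = []
    for pixel_probs in zip(*prob_maps):
        # log(2) is the maximum binary entropy, so confident (low-entropy) views weigh more.
        weights = [math.log(2) - binary_entropy(p) + eps for p in pixel_probs]
        fused.append(sum(w * p for w, p in zip(weights, pixel_probs)) / sum(weights))
    return fused


def dice(pred_mask, true_mask):
    """Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = sum(1 for a, b in zip(pred_mask, true_mask) if a and b)
    total = sum(map(bool, pred_mask)) + sum(map(bool, true_mask))
    return 1.0 if total == 0 else 2 * inter / total
```

A view that is maximally uncertain at a pixel (p = 0.5) contributes almost nothing there, so a confident view dominates the fused value. The fused map can then be thresholded, for example at 0.5, before computing Dice.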

ARXIV Cancer: glioma Method: graph neural network

Hierarchical Perfusion Graphs for Tumor Heterogeneity Modeling in Glioma Molecular Subtyping

Han Jang, Junhyeok Lee, Heeseong Eum, Joon Jang, Yoseob Han, Seung Hong Choi, Kyu Sung Choi
Published 2026-05-08 02:42

This study presents HiPerfGNN, a framework for non-invasive glioma molecular subtyping from dynamic susceptibility contrast (DSC) MRI. A vector-quantized variational autoencoder learns discrete hemodynamic representations from time-intensity curves, and a hierarchical graph neural network uses these representations to predict molecular status. On an internal cohort (n=475), the model achieved AUCs of 0.96 for IDH mutation, 0.89 for 1p/19q codeletion, and 0.84 for WHO grade. It retained an AUC of 0.89 for IDH on an independent external cohort (n=397) without recalibration.

Precise molecular subtyping of gliomas, including isocitrate dehydrogenase (IDH) mutation and 1p/19q codeletion, directly guides surgical and therapeutic decisions, yet currently relies on invasive tissue sampling. Deep learning on structural MRI has emerged as a non-invasive alternative, but anatomy-only approaches cannot capture the hemodynamic signatures that distinguish molecular subtypes. Radiogenomics based on dynamic susceptibility contrast (DSC) MRI holds immense potential for non-invasively characterizing glioma molecular subtypes, yet clinical deployment has been hindered by inter-site variability and the limitations of voxel-wise analysis. We introduce HiPerfGNN, a framework that first learns discrete hemodynamic representations from raw time-intensity curves using a vector-quantized variational autoencoder (VQ-VAE). These quantized perfusion codes define coarse-level graph nodes representing functional tumor habitats, each of which is hierarchically subdivided into fine-level subregions guided by structural MRI. A hierarchical graph neural network then propagates information across scales for molecular prediction. On an internal cohort (n=475), the model achieved AUCs of 0.96 (IDH), 0.89 (1p/19q), and 0.84 (WHO grade), and maintained robust IDH performance (AUC 0.89) on an independent external cohort (n=397) without recalibration. Gradient-based saliency analysis confirms biologically grounded attention patterns aligned with known glioma pathophysiology. Our results demonstrate the added value of integrating perfusion dynamics into radiogenomic pipelines for glioma molecular subtyping. Code is available at https://github.com/janghana/HiPerfGNN.
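
The quantization step of a VQ-VAE replaces each encoder output with its nearest codebook vector. The sketch below applies that nearest-code rule directly to toy time-intensity curves and groups voxels that share a code into candidate coarse-level nodes. This is a simplification. In HiPerfGNN, both the codebook and the vectors being quantized are learned, whereas the codebook values here are invented.

```python
def quantize_curves(curves, codebook):
    """Assign each voxel's time-intensity curve to its nearest codebook vector (squared L2)."""
    codes = []
    for curve in curves:
        dists = [sum((x - e) ** 2 for x, e in zip(curve, code)) for code in codebook]
        codes.append(min(range(len(codebook)), key=dists.__getitem__))
    return codes


def group_habitats(codes):
    """Hypothetical coarse nodes: lists of voxel indices that share a perfusion code."""
    habitats = {}
    for voxel, code in enumerate(codes):
        habitats.setdefault(code, []).append(voxel)
    return habitats


# Toy codebook: a flat curve and one with a bolus-related signal drop.
codebook = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.5, 0.6, 0.9]]
curves = [[1.0, 0.98, 1.0, 1.0], [1.0, 0.45, 0.65, 0.9], [1.0, 0.5, 0.6, 0.95]]
codes = quantize_curves(curves, codebook)
print(codes, group_habitats(codes))  # [0, 1, 1] {0: [0], 1: [1, 2]}
```

Snapping curves to a small discrete vocabulary is what turns voxel-wise signals into a handful of habitat nodes, which the hierarchical graph then subdivides.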

ARXIV Cancer: unknown Method: 3D CNN

AGA3DNet: Anatomy-Guided Gaussian Priors with Multi-view xLSTM for 3D Brain MRI Subtype Classification

Peiyu Duan, Xueqi Guo, Sepehr Farhand, Mehmet Berk Sahin, Xinyuan Zheng, James S. Duncan, Gerardo Hermosillo Valadez, Yoshihisa Shinagawa
Published 2026-05-08 02:21

The study introduces AGA3DNet, a framework for 3D brain MRI subtype classification that combines anatomical cues from radiology reports with a lightweight 3D CNN and multi-view xLSTM aggregation. Anatomical phrases extracted from reports are mapped to atlas regions and converted into smooth spatial priors. These priors act as a soft prior channel and remove the need for dense voxel annotations. On a retrospective institutional brain MRI cohort, AGA3DNet achieves better overall balance across performance metrics than reproducible 3D classification baselines and supports interpretable localization through the prior channel.

Accurate 3D brain MRI subtype classification benefits from both localized anatomical cues and long-range contextual reasoning. We present AGA3DNet, a report-grounded framework that incorporates brief anatomical phrases extracted from radiology reports as a soft anatomical prior channel and fuses it with a lightweight 3D CNN and multi-view xLSTM aggregation. Specifically, extracted anatomical phrases are mapped to atlas-defined regions and converted into smooth spatial priors using a signed-distance transform followed by Gaussian weighting, providing interpretable, anatomy-grounded guidance without requiring dense voxel annotations. We evaluate AGA3DNet on a retrospective institutional brain MRI cohort for abnormal subtype discrimination and compare against reproducible 3D classification baselines. AGA3DNet achieves improved overall balance across performance metrics and supports clinically interpretable localization through the prior channel. We discuss limitations related to single-cohort evaluation and the lack of large-scale public brain MRI datasets paired with radiology reports under broadly usable terms.
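
One plausible reading of the signed-distance-plus-Gaussian prior is sketched below on a small 2D grid. Distances are negative inside the atlas region and positive outside. The prior equals 1 inside and decays as exp(-d^2 / (2 sigma^2)) outside. The exact formula, sigma, and grid handling are assumptions not given in the abstract, and the brute-force distance computation is only suitable for toy grids.

```python
import math


def signed_distance(mask):
    """Brute-force signed Euclidean distance on a 2D grid (negative inside, positive outside)."""
    cells = [(r, c) for r in range(len(mask)) for c in range(len(mask[0]))]
    inside = [rc for rc in cells if mask[rc[0]][rc[1]]]
    outside = [rc for rc in cells if not mask[rc[0]][rc[1]]]
    if not inside or not outside:
        raise ValueError("mask must contain both region and background cells")
    sdf = [[0.0] * len(mask[0]) for _ in mask]
    for r, c in cells:
        if mask[r][c]:
            sdf[r][c] = -min(math.dist((r, c), t) for t in outside)
        else:
            sdf[r][c] = min(math.dist((r, c), t) for t in inside)
    return sdf


def gaussian_prior(sdf, sigma=2.0):
    """Soft prior: 1 inside the region, exp(-d^2 / (2 sigma^2)) at distance d outside."""
    return [
        [1.0 if d <= 0 else math.exp(-d * d / (2 * sigma ** 2)) for d in row]
        for row in sdf
    ]
```

Because the prior decays smoothly rather than cutting off at the region boundary, a slightly misregistered atlas region still gives nonzero guidance to nearby voxels. This is the practical appeal of a soft prior over a hard mask.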

ARXIV Cancer: renal tumor Method: federated learning

Overcoming data scarcity through multi-center federated learning for organs-at-risk segmentation in pediatric upper abdominal radiotherapy

Mianyong Ding, Maximilian Knoll, Semi Harrabi, Martine van Grotel, Annemieke S. Littooij, Max van Noesel, Jens-Peter Schenk, Marry M. van den Heuvel-Eibrink, Geert O. Janssens, Matteo Maspero
Published 2026-05-07 18:21

This study investigates federated learning (FL) for developing pediatric-specific auto-contouring models for organs-at-risk (OARs) in radiotherapy, where data are scarce and fragmented across centers. An nnU-Net-based framework segmenting 19 OARs was trained locally and with FL on 310 postoperative CT scans from 272 pediatric patients with renal tumors or abdominal neuroblastoma at two European centers. Local models showed significantly reduced cross-center performance for four to seven of nine evaluated OARs. The FL model, in contrast, matched local performance for at least seven of nine OARs, achieved the best cross-center results, remained stable across patient orientations, and reduced false-positive kidney segmentations.

Deep learning-based organs/structures-at-risk (OARs) auto-contouring models can improve radiotherapy workflows, but models trained on adult data often underperform in pediatric patients. Developing robust pediatric-specific models is hindered by data scarcity and fragmentation across centers. Federated learning (FL) enables privacy-preserving collaborative training without the need for data sharing. We evaluated the feasibility and performance of FL for developing pediatric-specific OAR segmentation models across two European medical centers. Computed tomography (CT) images from pediatric patients from Utrecht and Heidelberg with a renal tumor or abdominal neuroblastoma were retrospectively collected and locally processed. An nnU-Net-based framework segmented 19 OARs using local and FL schemes. FL was implemented with secure weight exchange via cloud storage across institutional firewalls. Performance was assessed using the Dice similarity coefficient (DSC), 95th percentile Hausdorff distance, and mean surface distance. Robustness to patient orientation and false-positive segmentation of surgically removed kidneys were assessed, and failure cases were identified. A total of 310 postoperative CTs from 272 patients (105 renal tumors, 167 neuroblastomas) were included. Local models performed well on their respective center data but showed significantly reduced cross-center performance for four to seven of the nine evaluated OARs (DSC). In contrast, the FL model matched local performance for at least seven of nine OARs and achieved the best cross-center results across three metrics, with DSC gains of 0.003-0.007 over local models. FL also maintained stable performance across patient orientations and reduced false-positive kidney segmentations. Real-world FL improves cross-center robustness of CT-based OAR segmentation models in pediatric upper abdominal tumors.
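
The abstract describes secure weight exchange between centers but does not state the aggregation rule. A common default is FedAvg-style averaging, in which each center's parameters are weighted by its number of training samples. The sketch below assumes that rule, with parameters stored as named lists of floats and hypothetical center sizes.

```python
def federated_average(client_params, client_sizes):
    """FedAvg-style aggregation: sample-size-weighted mean of each named parameter list."""
    total = sum(client_sizes)
    averaged = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(params[name][i] * n for params, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical centers with different amounts of training data.
center_a = {"conv1": [1.0, 2.0]}
center_b = {"conv1": [3.0, 4.0]}
print(federated_average([center_a, center_b], [100, 300]))  # {'conv1': [2.5, 3.5]}
```

In a real deployment, each round would exchange updates securely and send the averaged model back to every center for further local training. The sketch shows only the averaging arithmetic.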

ARXIV Cancer: brain tumor Method: multimodal learning

Bridging visual saliency and large language models for explainable deep learning in medical imaging

Paul Valery Nguezet, Elie Tagne Fute, Yusuf Brima, Benoit Martin Azanguezet, Marcellin Atemkeng
Published 2026-05-07 13:08

This paper introduces a multimodal explainability framework for brain tumor classification that couples convolutional neural networks with large language models to generate human-readable diagnostic narratives. Predictions from CNNs with dual classification and segmentation heads are explained with Grad-CAM, Grad-CAM++, and ScoreCAM. The resulting heatmaps are thresholded into tumor masks and mapped onto the Harvard-Oxford cortical atlas to name the affected neuroanatomical structures. These findings are then passed as structured JSON to LLMs that write radiology-style reports. On 4,834 contrast-enhanced T1-weighted MRI images spanning three tumor classes, InceptionResNetV2 achieved the highest classification performance and Grad-CAM++ the best segmentation overlap.

The opaque nature of deep learning models remains a significant barrier to their clinical adoption in medical imaging. This paper presents a multimodal explainability framework that bridges the gap between convolutional neural network (CNN) predictions and clinically actionable insights for brain tumor classification, leveraging large language models (LLMs) to deliver human-interpretable diagnostic narratives. The proposed framework operates through three coupled stages. First, nine CNN architectures are extended with a dual-output hybrid formulation that simultaneously optimises a classification head and a segmentation head, enabling spatially richer feature learning. Second, visual saliency attribution methods, namely Grad-CAM, Grad-CAM++, and ScoreCAM, are applied to generate class-discriminative heatmaps, which are subsequently refined into binary tumor masks via an adaptive percentile thresholding pipeline. Third, the resulting masks are mapped onto the Harvard-Oxford cortical atlas to translate pixel-level evidence into named neuroanatomical structures, and the extracted findings are encoded into a structured JSON file that conditions three LLMs (Grok3, Mistral, and LLaMA) to generate coherent, radiological-style diagnostic reports. Evaluated on a dataset of 4,834 contrast-enhanced T1-weighted brain MRI images spanning three tumor classes, InceptionResNetV2 achieved the highest classification performance and Grad-CAM++ yielded the best segmentation overlap. Among the language models, Grok3 led in lexical diversity and coherence, while LLaMA achieved the highest readability score. By integrating visual, anatomical, and linguistic modalities into a unified pipeline, the framework produces explanations that are technically grounded and meaningfully interpretable, advancing the transparency and clinical accountability of artificial intelligence-assisted brain tumor diagnosis.
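
The thresholding and atlas-mapping stages can be sketched as follows. The sketch assumes a flattened saliency heatmap and a co-registered atlas label per pixel, with None for background. Two choices are assumptions: using a per-image percentile as the "adaptive" threshold, and counting mask pixels per region. The JSON keys are hypothetical and do not reproduce the paper's schema.

```python
import json


def percentile(values, q):
    """Linearly interpolated q-th percentile (q in 0..100) of a list of numbers."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def heatmap_to_findings(heatmap, atlas, q=90.0):
    """Threshold a flat heatmap at its own q-th percentile; count mask pixels per atlas label."""
    thr = percentile(heatmap, q)  # the threshold adapts to each image's saliency distribution
    counts = {}
    for value, label in zip(heatmap, atlas):
        if value >= thr and label is not None:
            counts[label] = counts.get(label, 0) + 1
    return json.dumps({"threshold": thr, "regions": counts}, sort_keys=True)
```

A structured summary of this kind, rather than raw pixels, is what makes it practical to condition a language model on the localization evidence.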
