Research Papers

ARXIV Cancer: brain tumor Method: Transformer-Graph Neural Network

Federated Transformer-GNN for Privacy-Preserving Brain Tumor Localization with Modality-Level Explainability

Andrea Protani, Riccardo Taiello, Marc Molina Van Den Bosch, Luigi Serio
Published 2026-01-21 14:46

This study presents a federated learning framework for brain tumor localization that allows for collaboration across healthcare institutions without compromising patient privacy. The proposed method utilizes a hybrid Transformer-Graph Neural Network architecture and is implemented on CERN's federated learning platform. Results indicate that federated learning enhances model performance by leveraging distributed data, achieving results comparable to centralized training. Additionally, the study includes an explainability analysis that highlights the importance of specific MRI modalities in model predictions.


Deep learning models for brain tumor analysis require large and diverse datasets that are often siloed across healthcare institutions due to privacy regulations. We present a federated learning framework for brain tumor localization that enables multi-institutional collaboration without sharing sensitive patient data. Our method extends a hybrid Transformer-Graph Neural Network architecture derived from prior decoder-free supervoxel GNNs and is deployed within CAFEIN®, CERN's federated learning platform designed for healthcare environments. We provide an explainability analysis through Transformer attention mechanisms that reveals which MRI modalities drive the model predictions. Experiments on the BraTS dataset demonstrate a key finding: while isolated training on individual client data triggers early stopping well before reaching full training capacity, federated learning enables continued model improvement by leveraging distributed data, ultimately matching centralized performance. This result provides strong justification for federated learning when dealing with complex tasks and high-dimensional input data, as aggregating knowledge from multiple institutions significantly benefits the learning process. Our explainability analysis, validated through rigorous statistical testing on the full test set (paired t-tests with Bonferroni correction), reveals that deeper network layers significantly increase attention to T2 and FLAIR modalities (p < 0.001, Cohen's d = 1.50), aligning with clinical practice.
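The statistical check named in the abstract (paired t-tests, Bonferroni correction, Cohen's d) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the effect size is assumed to be the paired-differences variant (often written d_z), and the sample values in the usage note are made up.

```python
import math
from statistics import mean, stdev


def paired_t_and_cohens_d(shallow, deep):
    """Return (t statistic, Cohen's d_z) for paired measurements.

    Assumption: the effect size is computed on the paired differences,
    i.e. mean(diff) / sd(diff). The abstract does not say which variant
    of Cohen's d was used.
    """
    diffs = [d - s for s, d in zip(shallow, deep)]
    n = len(diffs)
    sd = stdev(diffs)              # sample standard deviation of the differences
    d_z = mean(diffs) / sd         # standardized mean difference
    t = d_z * math.sqrt(n)         # same as mean(diff) / (sd / sqrt(n))
    return t, d_z


def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `paired_t_and_cohens_d([0, 0, 0], [1, 2, 3])` gives an effect size of 2.0, and `bonferroni([0.01, 0.04, 0.5])` scales each p-value by 3 with the last one capped at 1.0. Turning the t statistic into a p-value needs a Student's t distribution, which is left out here.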

ARXIV Cancer: melanoma Method: multi-modal neural network

Multimodal system for skin cancer detection

Volodymyr Sydorskyi, Igor Krashenyi, Oleksii Yakubenko
Published 2026-01-21 09:50

This study presents a multimodal system for melanoma detection that utilizes conventional photo images alongside tabular metadata, such as patient demographics and lesion characteristics. The proposed system employs a multi-modal neural network and a three-stage pipeline to enhance detection accuracy and address challenges associated with imbalanced datasets. Results indicate significant performance improvements, and because the system works on conventional photos rather than dermoscopic images, it is more accessible for clinical use.


Melanoma detection is vital for early diagnosis and effective treatment. While deep learning models on dermoscopic images have shown promise, they require specialized equipment, limiting their use in broader clinical settings. This study introduces a multi-modal melanoma detection system using conventional photo images, making it more accessible and versatile. Our system integrates image data with tabular metadata, such as patient demographics and lesion characteristics, to improve detection accuracy. It employs a multi-modal neural network combining image and metadata processing and supports a two-step model for cases with or without metadata. A three-stage pipeline further refines predictions with boosting algorithms. To address the challenges of a highly imbalanced dataset, specific techniques were implemented to ensure robust training. An ablation study evaluated recent vision architectures, boosting algorithms, and loss functions, achieving a peak Partial ROC AUC of 0.18068 (0.2 maximum) and top-15 retrieval sensitivity of 0.78371. Results demonstrate that integrating photo images with metadata in a structured, multi-stage pipeline yields significant performance improvements. This system advances melanoma detection by providing a scalable, equipment-independent solution suitable for diverse healthcare environments, bridging the gap between specialized and general clinical practices.
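The reported metric has a maximum of 0.2, which is consistent with a partial ROC AUC that only counts the area above a true positive rate of 0.8. That reading is an assumption, since the abstract does not define the metric. Under that assumption, a minimal pure-Python sketch (hypothetical function names) looks like this:

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), processing tied scores as one step."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Consume every sample that shares the current score.
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc_above_tpr(labels, scores, min_tpr=0.8):
    """Area between the ROC curve and the line tpr = min_tpr (assumed metric)."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (f0, t0), (f1, t1) in zip(pts, pts[1:]):
        if t1 <= min_tpr:
            continue  # segment lies entirely at or below the threshold
        if t0 < min_tpr:
            # Segment crosses the threshold: keep only the part above it.
            f0 = f0 + (min_tpr - t0) / (t1 - t0) * (f1 - f0)
            t0 = min_tpr
        area += (f1 - f0) * ((t0 - min_tpr) + (t1 - min_tpr)) / 2.0
    return area
```

With `min_tpr=0.8`, a perfect ranking scores 0.2 and a ranking where all scores are tied scores 0.02, so 0.18068 would sit close to the ceiling under this definition.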

ARXIV Cancer: unknown Method: multimodal learning

ReinPath: A Multimodal Reinforcement Learning Approach for Pathology

Kangcheng Zhou, Jun Jiang, Qing Zhang, Shuang Zheng, Qingli Li, Shugong Xu
Published 2026-01-21 08:21

This paper presents ReinPath, a novel multimodal pathology large language model aimed at enhancing interpretability in computational pathology. The model integrates histopathological images and text data, utilizing a semantic reward strategy to improve the generation of accurate textual descriptions. Experimental results indicate that ReinPath outperforms existing methods, even with limited training data, and shows competitive performance in zero-shot image classification tasks.


Interpretability is important in computational pathology and has motivated multimodal methods that integrate histopathological images with corresponding text data. However, existing multimodal methods have limited interpretability because they lack high-quality datasets that support explicit reasoning and inference, and because their reasoning processes are simple. To address these problems, we introduce a novel multimodal pathology large language model with strong reasoning capabilities. To improve the generation of accurate and contextually relevant textual descriptions, we design a semantic reward strategy integrated with group relative policy optimization. We construct a high-quality pathology visual question answering (VQA) dataset specifically designed to support complex reasoning tasks. Comprehensive experiments conducted on this dataset demonstrate that our method outperforms state-of-the-art methods, even when trained with only 20% of the data. Our method also achieves performance comparable to CLIP on the downstream zero-shot image classification task.
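Group relative policy optimization scores each sampled response against the other responses drawn for the same prompt. The sketch below shows only that normalization step. The reward values stand in for the paper's semantic reward, which the abstract does not specify, and the use of the population standard deviation with a small epsilon is an assumption.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses to the same prompt.

    Each advantage is (reward - group mean) / (group std + eps).
    The rewards are placeholders for whatever reward function is in use.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Responses that beat their group average get a positive advantage and the rest a negative one, so no separate value network is needed to supply a baseline.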

ARXIV Cancer: adenocarcinoma Method: domain adversarial neural network

Transfer Learning from One Cancer to Another via Deep Learning Domain Adaptation

Justin Cheung, Samuel Savine, Calvin Nguyen, Lin Lu, Alhassan S. Yasin
Published 2026-01-21 05:50

This study investigates the use of domain adaptation in deep learning to improve the classification of cancer types not represented in the training data. The authors demonstrate that a domain adversarial neural network (DANN) can effectively transfer knowledge from labeled adenocarcinoma data to unlabeled data from different cancer types, achieving significant accuracy improvements. The research highlights the importance of stain normalization and its varying effects on classification performance across different cancer types.


Supervised deep learning models often achieve excellent performance within their training distribution but struggle to generalize beyond it. In cancer histopathology, for example, a convolutional neural network (CNN) may classify cancer severity accurately for cancer types represented in its training data, yet fail on related but unseen types. Although adenocarcinomas from different organs share morphological features that might support limited cross-domain generalization, addressing domain shift directly is necessary for robust performance. Domain adaptation offers a way to transfer knowledge from labeled data in one cancer type to unlabeled data in another, helping mitigate the scarcity of annotated medical images. This work evaluates cross-domain classification performance among lung, colon, breast, and kidney adenocarcinomas. A ResNet50 trained on any single adenocarcinoma achieves over 98% accuracy on its own domain but shows minimal generalization to others. Ensembling multiple supervised models does not resolve this limitation. In contrast, converting the ResNet50 into a domain adversarial neural network (DANN) substantially improves performance on unlabeled target domains. A DANN trained on labeled breast and colon data and adapted to unlabeled lung data reaches 95.56% accuracy. We also examine the impact of stain normalization on domain adaptation. Its effects vary by target domain: for lung, accuracy drops from 95.56% to 66.60%, while for breast and colon targets, stain normalization boosts accuracy from 49.22% to 81.29% and from 78.48% to 83.36%, respectively. Finally, using Integrated Gradients reveals that DANNs consistently attribute importance to biologically meaningful regions such as densely packed nuclei, indicating that the model learns clinically relevant features and can apply them to unlabeled cancer types.
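A domain adversarial neural network is usually built around a gradient reversal layer placed between the feature extractor and a domain classifier. The toy class below illustrates that mechanism with plain lists instead of a deep learning framework. It is a hypothetical sketch, not the authors' ResNet50-based implementation, and `lam` is an assumed name for the reversal strength.

```python
class GradientReversal:
    """Identity in the forward pass, sign-flipped gradient in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam  # strength of the reversal (hypothetical parameter name)

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_from_domain_head):
        # The gradient flowing back to the feature extractor is negated, so the
        # extractor is pushed to make source and target domains harder to tell apart.
        return [-self.lam * g for g in grad_from_domain_head]
```

For example, `GradientReversal(lam=0.5).backward([0.4, -1.0])` returns `[-0.2, 0.5]`: the domain classifier still learns to separate domains, while the shared features are updated in the opposite direction.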

ARXIV Cancer: lung cancer Method: deep learning

LLM Augmented Intervenable Multimodal Adaptor for Post-operative Complication Prediction in Lung Cancer Surgery

Shubham Pandey, Bhavin Jawade, Srirangaraj Setlur, Venu Govindaraju, Kenneth Seastedt
Published 2026-01-20 16:58

This paper presents MIRACLE, a deep learning architecture designed to predict postoperative complications in lung cancer surgery by integrating preoperative clinical and radiological data. The method utilizes a hyperspherical embedding space to fuse heterogeneous inputs, allowing for the extraction of robust features. The approach enhances prediction transparency and clinical utility through an interventional deep learning module, enabling domain experts to refine recommendations. Validation on a dataset of 3,094 lung cancer patients shows that MIRACLE outperforms traditional machine learning models and contemporary large language models.


Postoperative complications remain a critical concern in clinical practice, adversely affecting patient outcomes and contributing to rising healthcare costs. We present MIRACLE, a deep learning architecture that predicts the risk of postoperative complications in lung cancer surgery by integrating preoperative clinical and radiological data. MIRACLE fuses heterogeneous inputs in a hyperspherical embedding space, enabling the extraction of robust, discriminative features from both structured clinical records and high-dimensional radiological images. To enhance prediction transparency and clinical utility, we incorporate an interventional deep learning module that not only refines predictions but also provides interpretable and actionable insights, allowing domain experts to interactively adjust recommendations based on clinical expertise. We validate our approach on POC-L, a real-world dataset comprising 3,094 lung cancer patients who underwent surgery at Roswell Park Comprehensive Cancer Center. Our results demonstrate that MIRACLE outperforms various traditional machine learning models and contemporary large language model (LLM) variants used alone, for personalized and explainable postoperative risk management.

ARXIV Cancer: brain tumor Method: Graph Attention Network

Decoder-Free Supervoxel GNN for Accurate Brain-Tumor Localization in Multi-Modal MRI

Andrea Protani, Marc Molina Van Den Bosch, Lorenzo Giusti, Heloisa Barbosa Da Silva, Paolo Cacace, Albert Sund Aillet, Miguel Angel Gonzalez Ballester, Friedhelm Hummel, Luigi Serio
Published 2026-01-20 15:13

This study presents SVGFormer, a decoder-free pipeline designed for accurate brain-tumor localization in multi-modal MRI. The method utilizes a hierarchical encoder that combines a patch-level Transformer with a supervoxel-level Graph Attention Network to enhance feature learning. The framework was validated on the BraTS dataset, achieving strong performance in both node-level classification and tumor proportion regression, indicating its effectiveness in learning discriminative features.


Modern vision backbones for 3D medical imaging typically process dense voxel grids through parameter-heavy encoder-decoder structures, a design that allocates a significant portion of its parameters to spatial reconstruction rather than feature learning. Our approach introduces SVGFormer, a decoder-free pipeline built upon a content-aware grouping stage that partitions the volume into a semantic graph of supervoxels. Its hierarchical encoder learns rich node representations by combining a patch-level Transformer with a supervoxel-level Graph Attention Network, jointly modeling fine-grained intra-region features and broader inter-regional dependencies. This design concentrates all learnable capacity on feature encoding and provides inherent, dual-scale explainability from the patch to the region level. To validate the framework's flexibility, we trained two specialized models on the BraTS dataset: one for node-level classification and one for tumor proportion regression. Both models achieved strong performance, with the classification model achieving an F1-score of 0.875 and the regression model an MAE of 0.028, confirming the encoder's ability to learn discriminative and localized features. Our results establish that a graph-based, encoder-only paradigm offers an accurate and inherently interpretable alternative for 3D medical image representation.

ARXIV Cancer: brain tumors Method: partial decoder attention network

Partial Decoder Attention Network with Contour-weighted Loss Function for Data-Imbalance Medical Image Segmentation

Zhengyong Huang, Ning Jiang, Xingwen Sun, Lihua Zhang, Peng Chen, Jens Domke, Yao Sui
Published 2026-01-20 13:21

This paper presents a novel contour-weighted segmentation approach using a Partial Decoder Attention Network (PDANet) to address data imbalance in medical image segmentation. The method enhances the model's ability to accurately represent small and underrepresented structures in medical images. Experimental results demonstrate that PDANet outperformed nine state-of-the-art methods across multiple tasks, including segmenting abdominal organs, brain tumors, and pelvic bone fragments, with significant improvements in segmentation accuracy.


Image segmentation is pivotal in medical image analysis, facilitating clinical diagnosis, treatment planning, and disease evaluation. Deep learning has significantly advanced automatic segmentation methodologies by providing superior modeling capability for complex structures and fine-grained anatomical regions. However, medical images often suffer from data imbalance issues, such as large volume disparities among organs or tissues, and uneven sample distributions across different anatomical structures. This imbalance tends to bias the model toward larger organs or more frequently represented structures, while overlooking smaller or less represented structures, thereby affecting the segmentation accuracy and robustness. To address these challenges, we proposed a novel contour-weighted segmentation approach, which improves the model's capability to represent small and underrepresented structures. We developed PDANet, a lightweight and efficient segmentation network based on a partial decoder mechanism. We evaluated our method using three prominent public datasets. The experimental results show that our methodology excelled in three distinct tasks: segmenting multiple abdominal organs, brain tumors, and pelvic bone fragments with injuries. It consistently outperformed nine state-of-the-art methods. Moreover, the proposed contour-weighted strategy improved segmentation for other comparison methods across the three datasets, yielding average enhancements in Dice scores of 2.32%, 1.67%, and 3.60%, respectively. These results demonstrate that our contour-weighted segmentation method surpassed current leading approaches in both accuracy and robustness. As a model-independent strategy, it can seamlessly fit various segmentation frameworks, enhancing their performance. This flexibility highlighted its practical importance and potential for broad use in medical image analysis.
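The improvements above are reported as Dice scores. For reference, a minimal sketch of the Dice coefficient on flattened binary masks follows. The function name and the convention that two empty masks score 1.0 are assumptions, and the paper's contour-weighted loss itself is not reproduced here because the abstract does not define it.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if total == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return 2.0 * intersection / total
```

Because the denominator is the combined size of both masks, a single missed voxel costs far more on a small structure than on a large organ, which is one way to see why the data imbalance described in the abstract hurts small structures most.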

ARXIV Cancer: breast cancer Method: cross-attention-based multimodal model

MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic

Wei Wang, Quoc-Toan Ly, Chong Yu, Jun Bai
Published 2026-01-19 19:11

The paper presents MultiST, a cross-attention-based multimodal model designed to enhance spatial transcriptomics by integrating histological morphology with molecular profiles. This framework utilizes graph-based gene encoders and adversarial alignment to improve spatial representations and refine domain boundaries. Evaluation on 13 diverse datasets, including those from human brain cortex and breast cancer tissue, demonstrates that MultiST achieves clearer spatial domains and more interpretable cell-cell interaction patterns compared to existing methods.


Spatial transcriptomics (ST) enables transcriptome-wide profiling while preserving the spatial context of tissues, offering unprecedented opportunities to study tissue organization and cell-cell interactions in situ. Despite recent advances, existing methods often lack effective integration of histological morphology with molecular profiles, relying on shallow fusion strategies or omitting tissue images altogether, which limits their ability to resolve ambiguous spatial domain boundaries. To address this challenge, we propose MultiST, a unified multimodal framework that jointly models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. MultiST employs graph-based gene encoders with adversarial alignment to learn robust spatial representations, while integrating color-normalized histological features to capture molecular-morphological dependencies and refine domain boundaries. We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue. MultiST yields spatial domains with clearer and more coherent boundaries than existing methods, leading to more stable pseudotime trajectories and more biologically interpretable cell-cell interaction patterns. The MultiST framework and source code are available at https://github.com/LabJunBMI/MultiST.git.
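Cross-attention fusion lets one modality query another. The sketch below shows plain scaled dot-product cross-attention in which, as an illustrative assumption, gene-expression embeddings act as queries and histology features act as keys and values. It omits learned projections and multiple heads and is not taken from the MultiST code base.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention from one modality (queries) to another.

    queries: list of vectors (assumed here: per-spot gene embeddings)
    keys, values: lists of vectors (assumed here: per-spot image features)
    Returns (fused outputs, attention weights).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights
```

Each row of the returned weights sums to 1, so every fused vector is a convex combination of the image features, weighted by how well each one matches the querying gene embedding.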

ARXIV Cancer: unknown Method: U-Net CNN

From 100,000+ images to winning the first brain MRI foundation model challenges: Sharing lessons and models

Pedro M. Gordaliza, Jaume Banus, Benoît Gérin, Maxence Wynen, Nataliia Molchanova, Jonas Richiardi, Meritxell Bach Cuadra
Published 2026-01-19 15:43

This paper discusses the development of Foundation Models for medical image analysis, specifically targeting the challenges associated with 3D brain MRI. The authors' solution, which employs a U-Net CNN architecture, achieved first place in two contests at MICCAI 2025. The models demonstrated significant efficiency, training faster and being smaller than competing transformer-based methods.


Developing Foundation Models for medical image analysis is essential to overcome the unique challenges of radiological tasks. The first challenges of this kind for 3D brain MRI, SSL3D and FOMO25, were held at MICCAI 2025. Our solution ranked first in tracks of both contests. It relies on a U-Net CNN architecture combined with strategies leveraging anatomical priors and neuroimaging domain knowledge. Notably, our models trained 1-2 orders of magnitude faster and were 10 times smaller than competing transformer-based approaches. Models are available here: https://github.com/jbanusco/BrainFM4Challenges.

ARXIV Cancer: lung cancer Method: Gradient-weighted Class Activation Mapping

Seeing Isn't Always Believing: Analysis of Grad-CAM Faithfulness and Localization Reliability in Lung Cancer CT Classification

Teerapong Panboonyuen
Published 2026-01-19 08:35

This study investigates the faithfulness and reliability of Grad-CAM, an explainable AI technique, in the context of lung cancer image classification. By evaluating various deep learning architectures, the research introduces a quantitative framework to assess Grad-CAM's interpretability. The findings indicate that while Grad-CAM highlights tumor regions effectively in convolutional networks, its reliability diminishes in Vision Transformer models, raising concerns about the trustworthiness of such explanations in medical AI.


Explainable Artificial Intelligence (XAI) techniques, such as Gradient-weighted Class Activation Mapping (Grad-CAM), have become indispensable for visualizing the reasoning process of deep neural networks in medical image analysis. Despite their popularity, the faithfulness and reliability of these heatmap-based explanations remain under scrutiny. This study critically investigates whether Grad-CAM truly represents the internal decision-making of deep models trained for lung cancer image classification. Using the publicly available IQ-OTH/NCCD dataset, we evaluate five representative architectures: ResNet-50, ResNet-101, DenseNet-161, EfficientNet-B0, and ViT-Base-Patch16-224, to explore model-dependent variations in Grad-CAM interpretability. We introduce a quantitative evaluation framework that combines localization accuracy, perturbation-based faithfulness, and explanation consistency to assess Grad-CAM reliability across architectures. Experimental findings reveal that while Grad-CAM effectively highlights salient tumor regions in most convolutional networks, its interpretive fidelity significantly degrades for Vision Transformer models due to non-local attention behavior. Furthermore, cross-model comparisons indicate substantial variability in saliency localization, implying that Grad-CAM explanations may not always correspond to the true diagnostic evidence used by the networks. This work exposes critical limitations of current saliency-based XAI approaches in medical imaging and emphasizes the need for model-aware interpretability methods that are both computationally sound and clinically meaningful. Our findings aim to inspire a more cautious and rigorous adoption of visual explanation tools in medical AI, urging the community to rethink what it truly means to "trust" a model's explanation.
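One common way to quantify localization accuracy for a saliency map is to threshold the heatmap and measure its overlap with an annotated region. The abstract does not define its metric, so the intersection-over-union below, the 0.5 threshold, and the function name are all assumptions made for illustration.

```python
def localization_iou(heatmap, mask, threshold=0.5):
    """IoU between a thresholded heatmap and a binary ground-truth mask.

    heatmap: 2D list of floats in [0, 1] (e.g. a normalized Grad-CAM map)
    mask:    2D list of 0/1 values marking the annotated region
    """
    intersection = 0
    union = 0
    for heat_row, mask_row in zip(heatmap, mask):
        for heat, truth in zip(heat_row, mask_row):
            hot = heat >= threshold
            inside = bool(truth)
            if hot and inside:
                intersection += 1
            if hot or inside:
                union += 1
    # Assumption: an empty heatmap against an empty mask counts as full agreement.
    return intersection / union if union else 1.0
```

A high overlap only shows that the heatmap lands on the annotated region. It does not show that the network relied on that region, which is why the study pairs localization with perturbation-based faithfulness and consistency checks.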
