Research Papers

ARXIV Cancer: colorectal cancer Method: graph neural network

INSIGHT: Spatially resolved survival modelling from routine histology crosslinked with molecular profiling reveals prognostic epithelial-immune axes in stage II/III colorectal cancer

Piotr Keller, Mark Eastwood, Zedong Hu, Aimée Selten, Ruqayya Awan, Gertjan Rasschaert, Sara Verbandt, Vlad Popovici, Hubert Piessevaux, Hayley T Morris, Petros Tsantoulis, Thomas Alexander McKee, André D'Hoore, Cédric Schraepen, Xavier Sagaert, Gert De Hertogh, Sabine Tejpar, Fayyaz Minhas
Published 2025-12-24 14:36

The study introduces INSIGHT, a graph neural network that predicts survival outcomes from routine histology images in stage II/III colorectal cancer. Trained and cross-validated on TCGA (n=342) and SURGEN (n=336), it outperformed traditional pTNM staging in large-scale independent validation (C-index 0.68-0.69 vs 0.44-0.58). The model generates spatially resolved risk scores and identifies histopathological features associated with patient outcomes, including nuclear solidity and circularity.

Routine histology contains rich prognostic information in stage II/III colorectal cancer, much of which is embedded in complex spatial tissue organisation. We present INSIGHT, a graph neural network that predicts survival directly from routine histology images. Trained and cross-validated on TCGA (n=342) and SURGEN (n=336), INSIGHT produces patient-level spatially resolved risk scores. Large-scale independent validation showed superior prognostic performance compared with pTNM staging (C-index 0.68-0.69 vs 0.44-0.58). INSIGHT spatial risk maps recapitulated canonical prognostic histopathology and identified nuclear solidity and circularity as quantitative risk correlates. Integrating spatial risk with data-driven spatial transcriptomic signatures, spatial proteomics, bulk RNA-seq, and single-cell references revealed an epithelium-immune risk manifold capturing epithelial dedifferentiation and fetal programs, myeloid-driven stromal states including SPP1+ macrophages and LAMP3+ dendritic cells, and adaptive immune dysfunction. This analysis exposed patient-specific epithelial heterogeneity, stratification within MSI-High tumours, and high-risk routes of CDX2/HNF4A loss and CEACAM5/6-associated proliferative programs, highlighting coordinated therapeutic vulnerabilities.
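
INSIGHT's comparison with pTNM staging is reported as a concordance index (C-index). A minimal sketch of Harrell's C-index for right-censored survival data follows. It assumes that a higher risk score means shorter expected survival, and it skips pairs with tied follow-up times for simplicity. This is a generic illustration of the metric, not the authors' evaluation code.

```python
from itertools import combinations

def harrell_c_index(times, events, risks):
    """Harrell's concordance index for right-censored survival data.

    times:  observed follow-up times
    events: 1 if the event (e.g. death) was observed, 0 if censored
    risks:  predicted risk scores (higher = expected shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if times differ and the shorter time ends in an event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0
        elif risks[a] == risks[b]:
            concordant += 0.5  # tied risk scores count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, `harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.6, 0.3, 0.1])` returns 1.0 because every comparable pair is ranked correctly. Constant risk scores give 0.5, which is the chance level that the reported pTNM values of 0.44-0.58 sit close to.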

ARXIV Cancer: unknown Method: dual-stream vision transformer

A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI

Md Assaduzzaman, Nushrat Jahan Oyshi, Eram Mahamud
Published 2025-12-24 07:51

This study introduces a hybrid dual-stream deep learning framework for classifying gastrointestinal diseases from endoscopic and histopathological imagery. It uses teacher-student knowledge distillation. A teacher that combines a Swin Transformer with a Vision Transformer transfers its knowledge to a compact Tiny-ViT student, which keeps diagnostic accuracy high while reducing computational cost. The method achieved accuracies of 0.9978 and 0.9928 on two Wireless Capsule Endoscopy datasets, and Grad-CAM, LIME, and Score-CAM analyses supported its interpretability. These results indicate potential for practical clinical use in gastrointestinal diagnostics.

The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data volume and subtle visual differences between classes. This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation, where a high-capacity teacher model integrates the global contextual reasoning of a Swin Transformer with the local fine-grained feature extraction of a Vision Transformer. The student network was implemented as a compact Tiny-ViT that inherits the teacher's semantic and morphological knowledge via soft-label distillation, balancing efficiency and diagnostic accuracy. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation and prevent inter-sample bias. The proposed framework achieved accuracies of 0.9978 and 0.9928 on Dataset 1 and Dataset 2, respectively, and an average AUC of 1.0000, signifying near-perfect discriminative capability. Interpretability analyses using Grad-CAM, LIME, and Score-CAM confirmed that the model's predictions were grounded in clinically significant tissue regions and pathologically relevant morphological cues, validating the framework's transparency and reliability. The Tiny-ViT achieved diagnostic performance comparable to its transformer-based teacher with reduced computational complexity and faster inference, making it suitable for resource-constrained clinical environments. Overall, the proposed framework provides a robust, interpretable, and scalable solution for AI-assisted GI disease diagnosis, paving the way toward intelligent endoscopic screening that is practical in clinical settings.
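
The abstract says the Tiny-ViT student learns from the teacher "via soft-label distillation" but does not give the loss. The sketch below uses the widely used Hinton-style formulation instead. It combines cross-entropy on the hard labels with a KL divergence between temperature-softened teacher and student distributions. The temperature `T` and mixing weight `alpha` are assumed values, not figures from the paper.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, temperature=4.0, alpha=0.5):
    """Hinton-style soft-label distillation loss, averaged over the batch.

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    soft-label KL term. The T**2 factor keeps the soft-target gradients on a
    comparable scale across temperatures. temperature/alpha are assumptions.
    """
    labels = np.asarray(labels)
    n = len(labels)
    # Hard-label cross-entropy at temperature 1.
    p_student = softmax(student_logits)
    ce = -np.log(p_student[np.arange(n), labels] + 1e-12).mean()
    # Soft-label KL(teacher || student) at temperature T.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1).mean()
    return alpha * ce + (1 - alpha) * temperature**2 * kl
```

When the student's logits match the teacher's, the KL term is zero and only the weighted cross-entropy remains. Any disagreement with the teacher increases the loss.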

ARXIV Cancer: lung cancer Method: Dual-Graph Spatiotemporal Attention Network

DGSAN: Dual-Graph Spatiotemporal Attention Network for Pulmonary Nodule Malignancy Prediction

Xiao Yu, Zhaojie Fang, Guanyu Zhou, Yin Shen, Huoling Luo, Ye Li, Ahmed Elazab, Xiang Wan, Ruiquan Ge, Changmiao Wang
Published 2025-12-24 02:47

This study presents the Dual-Graph Spatiotemporal Attention Network (DGSAN) aimed at improving the prediction of pulmonary nodule malignancy in lung cancer. The method integrates temporal variations and multimodal data through a Global-Local Feature Encoder and a Hierarchical Cross-Modal Graph Fusion Module. Experimental results indicate that DGSAN significantly outperforms existing state-of-the-art methods in terms of classification accuracy and computational efficiency.

Lung cancer continues to be the leading cause of cancer-related deaths globally. Early detection and diagnosis of pulmonary nodules are essential for improving patient survival rates. Although previous research has integrated multimodal and multi-temporal information and outperformed single-modality, single-time-point approaches, existing fusion methods are limited to inefficient vector concatenation and simple mutual attention, highlighting the need for more effective multimodal information fusion. To address these challenges, we introduce a Dual-Graph Spatiotemporal Attention Network, which leverages temporal variations and multimodal data to enhance the accuracy of predictions. Our methodology involves developing a Global-Local Feature Encoder to better capture the local, global, and fused characteristics of pulmonary nodules. Additionally, a Dual-Graph Construction method organizes multimodal features into inter-modal and intra-modal graphs. Furthermore, a Hierarchical Cross-Modal Graph Fusion Module is introduced to refine feature integration. We also compiled a novel multimodal dataset, NLST-cmst, to support related research. Our extensive experiments, conducted on both the NLST-cmst and curated CSTL-derived datasets, demonstrate that our DGSAN significantly outperforms state-of-the-art methods in classifying pulmonary nodules with exceptional computational efficiency.
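
The abstract does not define how DGSAN's inter-modal and intra-modal graphs are wired. The sketch below rests on an assumption: each node is one (modality, time point) feature. Under that assumption, intra-modal edges link the same modality across time points, and inter-modal edges link different modalities at the same time point. The modality names and the single mean-aggregation step are hypothetical and stand in for the paper's attention-based fusion.

```python
import numpy as np

def build_dual_graphs(modalities, time_points):
    """Return (nodes, intra_adj, inter_adj) over every (modality, time) pair.

    Assumed wiring (illustrative only):
      intra-modal edges: same modality, different time points
      inter-modal edges: different modalities, same time point
    """
    nodes = [(m, t) for m in modalities for t in time_points]
    n = len(nodes)
    intra = np.zeros((n, n))
    inter = np.zeros((n, n))
    for i, (mi, ti) in enumerate(nodes):
        for j, (mj, tj) in enumerate(nodes):
            if i == j:
                continue
            if mi == mj:
                intra[i, j] = 1.0
            elif ti == tj:
                inter[i, j] = 1.0
    return nodes, intra, inter

def mean_aggregate(features, adj):
    """One round of mean-neighbour message passing, including a self-loop."""
    a = adj + np.eye(len(adj))
    return (a / a.sum(axis=1, keepdims=True)) @ features
```

With two hypothetical modalities (`"ct"`, `"clinical"`) and two time points, each of the four nodes gets one intra-modal neighbour (its other scan date) and one inter-modal neighbour (the other modality at the same date). Separate message passing on each graph keeps temporal and cross-modal evidence distinct until a later fusion step.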

ARXIV Cancer: breast cancer Method: multimodal mixed-supervision

NullBUS: Multimodal Mixed-Supervision for Breast Ultrasound Segmentation via Nullable Global-Local Prompts

Raja Mallina, Bryar Shareef
Published 2025-12-23 21:30

This paper presents NullBUS, a multimodal mixed-supervision framework for breast ultrasound (BUS) segmentation. It uses nullable prompts, implemented as learnable null embeddings with presence masks, so that a single model can learn from images with and without accompanying metadata. This makes it more robust when metadata are incomplete. Evaluated on a unified pool of three public BUS datasets, NullBUS achieves state-of-the-art performance, with a mean IoU of 0.8568 and a mean Dice of 0.9103.

Breast ultrasound (BUS) segmentation provides lesion boundaries essential for computer-aided diagnosis and treatment planning. While promptable methods can improve segmentation performance and tumor delineation when text or spatial prompts are available, many public BUS datasets lack reliable metadata or reports, constraining training to small multimodal subsets and reducing robustness. We propose NullBUS, a multimodal mixed-supervision framework that learns from images with and without prompts in a single model. To handle missing text, we introduce nullable prompts, implemented as learnable null embeddings with presence masks, enabling fallback to image-only evidence when metadata are absent and the use of text when present. Evaluated on a unified pool of three public BUS datasets, NullBUS achieves a mean IoU of 0.8568 and a mean Dice of 0.9103, demonstrating state-of-the-art performance under mixed prompt availability.
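
The nullable-prompt idea described above can be sketched as a substitution step. When a sample's presence mask says the text prompt is missing, a learnable null embedding takes the place of the text embedding. Everything here is hypothetical: the class name, the initialisation, and the choice to hold the embedding in a plain NumPy array. In a real model it would be a trainable parameter updated by backpropagation.

```python
import numpy as np

class NullablePrompt:
    """Hypothetical sketch: replace missing text embeddings with a null embedding."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Stand-in for a learnable parameter; a framework would train this vector.
        self.null_embedding = rng.normal(scale=0.02, size=dim)

    def __call__(self, text_embeddings, present):
        """text_embeddings: (batch, dim), arbitrary placeholders where text is absent.
        present: (batch,) presence mask, 1 if the sample has a text prompt, else 0."""
        text_embeddings = np.asarray(text_embeddings, dtype=float)
        mask = np.asarray(present, dtype=bool)[:, None]
        # np.where (rather than mask * x) avoids propagating NaN placeholders.
        return np.where(mask, text_embeddings, self.null_embedding)
```

Because every sample ends up with a prompt-shaped embedding, image-only and image-plus-text samples can share one batch and one model. This is the mixed-supervision setting the abstract describes.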

ARXIV Cancer: lung cancer Method: deep learning

Fairness Evaluation of Risk Estimation Models for Lung Cancer Screening

Shaurya Gaur, Michel Vitale, Alessa Hering, Johan Kwisthout, Colin Jacobs, Lena Philipp, Fennie van der Graaf
Published 2025-12-23 19:57

This study evaluates the fairness of two deep learning models for lung cancer risk estimation from low-dose CT screening of high-risk individuals, Sybil and Venkadesh21. It also examines the PanCan2b logistic regression model. On a held-out NLST validation set, Sybil's AUROC was significantly higher for women than for men. At 90% specificity, Venkadesh21 showed lower sensitivity for Black than for White participants. Available clinical confounders did not explain these gaps. The findings underscore the need to monitor and improve model performance across underrepresented subgroups in lung cancer screening.

Lung cancer is the leading cause of cancer-related mortality in adults worldwide. Screening high-risk individuals with annual low-dose CT (LDCT) can support earlier detection and reduce deaths, but widespread implementation may strain the already limited radiology workforce. AI models have shown potential in estimating lung cancer risk from LDCT scans. However, high-risk populations for lung cancer are diverse, and these models' performance across demographic groups remains an open question. In this study, we drew on the considerations on confounding factors and ethically significant biases outlined in the JustEFAB framework to evaluate potential performance disparities and fairness in two deep learning risk estimation models for lung cancer screening: the Sybil lung cancer risk model and the Venkadesh21 nodule risk estimator. We also examined disparities in the PanCan2b logistic regression model recommended in the British Thoracic Society nodule management guideline. Both deep learning models were trained on data from the US-based National Lung Screening Trial (NLST), and assessed on a held-out NLST validation set. We evaluated AUROC, sensitivity, and specificity across demographic subgroups, and explored potential confounding from clinical risk factors. We observed a statistically significant AUROC difference in Sybil's performance between women (0.88, 95% CI: 0.86, 0.90) and men (0.81, 95% CI: 0.78, 0.84, p < .001). At 90% specificity, Venkadesh21 showed lower sensitivity for Black (0.39, 95% CI: 0.23, 0.59) than White participants (0.69, 95% CI: 0.65, 0.73). These differences were not explained by available clinical confounders and thus may be classified as unfair biases according to JustEFAB. Our findings highlight the importance of improving and monitoring model performance across underrepresented subgroups, and further research on algorithmic fairness, in lung cancer screening.
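
The subgroup analysis above compares AUROC and sensitivity at 90% specificity across demographic groups. A minimal sketch of that kind of audit follows. The abstract does not say whether the operating threshold is fixed on the whole cohort or per subgroup. This sketch assumes a single cohort-wide threshold applied unchanged to each subgroup, and it assumes every subgroup contains both positives and negatives. The function names are illustrative.

```python
import numpy as np

def auroc(scores, labels):
    """AUROC as P(random positive outscores random negative), ties counted as 0.5."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

def subgroup_report(scores, labels, groups, specificity=0.90):
    """Per-subgroup AUROC and sensitivity at one shared operating point.

    Assumption: the threshold is set on the whole cohort so that roughly
    `specificity` of negatives score at or below it. Each subgroup must
    contain both classes.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    groups = np.asarray(groups)
    threshold = np.quantile(scores[~labels], specificity)
    report = {}
    for g in np.unique(groups):
        m = groups == g
        s, y = scores[m], labels[m]
        report[str(g)] = {
            "auroc": float(auroc(s, y)),
            "sensitivity": float((s[y] > threshold).mean()),
        }
    return report
```

Holding the threshold fixed across groups matches how a deployed screening model would be used. A subgroup whose positives score lower then shows up directly as reduced sensitivity, as in the Venkadesh21 finding. Confidence intervals, such as the 95% CIs reported in the study, would additionally need bootstrapping.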

ARXIV Cancer: general cancer Method: federated learning

FedPOD: the deployable units of training for federated learning

Daewoon Kim, Si Young Yie, Jae Sung Lee
Published 2025-12-23 18:57

This paper introduces FedPOD, which ranked first in the 2024 Federated Tumor Segmentation (FeTS) Challenge, to improve learning efficiency and reduce communication costs in federated learning. It builds on FedPIDAvg, which weights client updates by training-loss reduction. FedPIDAvg also reduces communication costs under skewed data by modelling the data distribution with a Poisson distribution and using a PID controller. FedPOD defines a round-wise task, re-includes participants that FedPIDAvg excludes as outliers, and removes the dependency on previous rounds' learning information. Its performance is comparable to FedPIDAvg, with average Dice scores of 0.78, 0.71, and 0.72 for WT, ET, and TC.

This paper proposes FedPOD, which ranked first in the 2024 Federated Tumor Segmentation (FeTS) Challenge, for optimizing learning efficiency and communication cost in federated learning among multiple clients. Inspired by FedPIDAvg, we define a round-wise task for FedPOD to enhance training efficiency. FedPIDAvg improved performance by using the reduction in training loss for prediction entropy, expressed through differential terms, as aggregation weights. Furthermore, by modeling the data distribution with a Poisson distribution and using a PID controller, it reduced communication costs even under skewed data distributions. However, excluding participants classified as outliers based on the Poisson distribution can limit data utilization. Additionally, the PID controller requires the same participants to be maintained throughout the federated learning process, because it uses learning information from previous rounds in the current round. FedPOD addresses these issues by including participants excluded as outliers, eliminating the dependency on previous rounds' learning information, and calculating the validation loss at each round. In the challenge, FedPOD achieved performance comparable to FedPIDAvg, with average Dice scores of 0.78, 0.71, and 0.72 for whole tumor (WT), enhancing tumor (ET), and tumor core (TC), respectively, and an average projected convergence score of 0.74. The concept of FedPOD also draws inspiration from the Pod, the smallest deployable computing unit in Kubernetes, and is designed to be compatible with Kubernetes auto-scaling. Extending FedPOD's round-wise tasks to Pod units allows a flexible design that scales out in the same way as Kubernetes auto-scaling. This work demonstrates the potential of FedPOD to enhance federated learning by improving efficiency, flexibility, and performance.
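
To make the abstract's point about PID-style aggregation concrete, here is an illustrative sketch loosely modelled on the description of FedPIDAvg above. Client weights mix a data-size share, a share of the loss reduction since the last round (derivative-like), and a share of accumulated past loss (integral-like). The three terms, the coefficients `kp`, `kd`, `ki`, and the function names are assumptions, not the published FedPIDAvg or FedPOD formula.

```python
import numpy as np

def pid_style_weights(n_samples, loss_prev, loss_curr, loss_history,
                      kp=0.45, kd=0.45, ki=0.10):
    """Hypothetical PID-style aggregation weights, one per client.

    Illustrative only; terms and coefficients are assumed:
      proportional-like: share of training samples
      derivative-like:   share of loss reduction since the previous round
      integral-like:     share of accumulated loss over past rounds
    The last two need each client's earlier losses, which is why such a
    controller assumes the same participants in every round.
    """
    def share(x):
        x = np.clip(np.asarray(x, dtype=float), 0.0, None)  # drop negative contributions
        total = x.sum()
        return x / total if total > 0 else np.full(len(x), 1.0 / len(x))

    reduction = np.asarray(loss_prev, dtype=float) - np.asarray(loss_curr, dtype=float)
    accumulated = np.asarray(loss_history, dtype=float).sum(axis=1)
    w = kp * share(n_samples) + kd * share(reduction) + ki * share(accumulated)
    return w / w.sum()

def aggregate(client_params, weights):
    """Weighted average of flattened client parameter vectors."""
    return np.average(np.asarray(client_params, dtype=float), axis=0, weights=weights)
```

The `loss_history` argument shows where cross-round state enters. A client that joins mid-training has no history, and this is the dependency FedPOD's abstract says it removes.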

ARXIV Cancer: breast cancer Method: deep learning

EvoXplain: When Machine Learning Models Agree on Predictions but Disagree on Why -- Measuring Mechanistic Multiplicity Across Training Runs

Chama Bensmail
Published 2025-12-23 18:34

The paper introduces EvoXplain, a diagnostic framework that measures how stable machine learning model explanations are across repeated training runs. It was evaluated on the Adult Income and Breast Cancer datasets with deep neural networks and logistic regression. The evaluation shows that models with high predictive accuracy can still differ in their explanations. Deep neural networks on Breast Cancer converge to a single explanatory basin, whereas the same architecture on Adult Income separates into distinct basins. Logistic regression on Breast Cancer shows multiplicity that depends on its regularisation configuration. EvoXplain thus treats interpretability as a property of the training pipeline rather than of any single trained model.

Machine learning models are primarily judged by predictive performance, especially in applied settings. Once a model reaches high accuracy, its explanation is often assumed to be correct and trustworthy. This assumption raises an overlooked question: when two models achieve high accuracy, do they rely on the same internal logic, or do they reach the same outcome via different and potentially competing mechanisms? We introduce EvoXplain, a diagnostic framework that measures the stability of model explanations across repeated training. Rather than analysing the explanation of a single trained model, EvoXplain treats explanations as samples drawn from the training and model selection pipeline itself, without aggregating predictions or constructing ensembles. It examines whether these samples form a single coherent explanatory basin or separate into multiple structured explanatory basins. We evaluate EvoXplain on the Adult Income and Breast Cancer datasets using deep neural networks and Logistic Regression. Although all models achieve high predictive accuracy, explanation stability differs across pipelines. Deep neural networks on Breast Cancer converge to a single explanatory basin, while the same architecture on Adult Income separates into distinct explanatory basins despite identical training conditions. Logistic Regression on Breast Cancer exhibits conditional multiplicity, where basin accessibility is controlled by regularisation configuration. EvoXplain does not attempt to select a correct explanation. Instead, it makes explanatory structure visible and quantifiable, revealing when single-instance explanations obscure the existence of multiple admissible predictive mechanisms. More broadly, EvoXplain reframes interpretability as a property of the training pipeline under repeated instantiation, rather than of any single trained model.
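
The abstract describes explanations from repeated training runs either forming one "explanatory basin" or splitting into several. It does not say how basins are detected. As a stand-in, the sketch below uses a simple grouping rule that is not EvoXplain's method. Two runs are linked when their attribution vectors have cosine similarity at or above an assumed threshold, and basins are the connected components of the resulting graph. Input vectors must be non-zero.

```python
import numpy as np

def explanation_basins(explanations, threshold=0.9):
    """Group per-run explanation vectors into basins (illustrative stand-in).

    explanations: (n_runs, n_features) non-zero attribution vectors, one per run.
    Runs are linked when cosine similarity >= threshold; basins are the
    connected components of that graph. Returns one basin label per run.
    """
    x = np.asarray(explanations, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    n = len(x)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over linked runs
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

For instance, runs whose attributions are `[1, 0, 0]`, `[0.99, 0.05, 0]`, `[0, 1, 0]`, and `[0.02, 1, 0]` fall into two groups, `[0, 0, 1, 1]`. In the paper's terms, the models would agree on predictions while relying on two different features. A threshold-based rule is only one possible choice, and the basin count it yields depends on `threshold`.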

ARXIV Cancer: skin cancer Method: soft voting ensemble of convolutional neural networks

Skin Lesion Classification Using a Soft Voting Ensemble of Convolutional Neural Networks

Abdullah Al Shafi, Abdul Muntakim, Pintu Chandra Shill, Rowzatul Zannat, Abdullah Al-Amin
Published 2025-12-23 15:20

This paper presents an early skin cancer classification method based on a soft voting ensemble of Convolutional Neural Networks (MobileNetV2, VGG19, and InceptionV3). It uses three benchmark datasets (HAM10000, ISIC 2016, and ISIC 2019). The pipeline applies rebalancing, image augmentation, and filtering, then segments lesions with transfer learning before classification. The method achieved lesion recognition accuracies of 96.32%, 90.86%, and 93.92% on the three datasets, indicating potential for real-world deployment in skin cancer detection.

Skin cancer can be identified by dermoscopic examination and visual inspection, and early detection significantly increases survival chances. Artificial intelligence (AI), using annotated skin images and Convolutional Neural Networks (CNNs), can improve diagnostic accuracy. This paper presents an early skin cancer classification method using a soft voting ensemble of CNNs. In this investigation, three benchmark datasets, namely HAM10000, ISIC 2016, and ISIC 2019, were used. The process involved rebalancing, image augmentation, and filtering techniques, followed by a hybrid dual encoder for segmentation via transfer learning. Accurate segmentation focused the classification models on clinically significant features, reducing background artifacts and improving accuracy. Classification was performed by an ensemble of MobileNetV2, VGG19, and InceptionV3, balancing accuracy and speed for real-world deployment. The method achieved lesion recognition accuracies of 96.32%, 90.86%, and 93.92% on the three datasets. System performance was evaluated using established skin lesion detection metrics.
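
Soft voting, as used in the ensemble above, averages the class-probability outputs of the member models and takes the most probable class. Hard voting would instead count each model's top-1 prediction. A minimal sketch follows. The optional per-model weights are an added assumption; equal weighting is the default.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Soft-voting ensemble: average class probabilities, then take the argmax.

    prob_list: list of (n_samples, n_classes) probability arrays, one per model
               (for example the softmax outputs of MobileNetV2, VGG19, InceptionV3).
    weights:   optional per-model weights; equal weighting when None.
    Returns (predicted class indices, averaged probabilities).
    """
    probs = np.stack([np.asarray(p, dtype=float) for p in prob_list])  # (models, samples, classes)
    avg = np.average(probs, axis=0, weights=weights)
    return avg.argmax(axis=1), avg
```

The two schemes can disagree. If one model is confident in class 0 (`[0.9, 0.1]`) and two others lean slightly toward class 1 (`[0.45, 0.55]` each), hard voting picks class 1 by a 2-to-1 count. Soft voting picks class 0 because the averaged probabilities are `[0.6, 0.4]`.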

ARXIV Cancer: unknown Method: deep learning

A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice

Yaowei Bai, Ruiheng Zhang, Yu Lei, Xuhua Duan, Jingfeng Yao, Shuguang Ju, Chaoyang Wang, Wei Yao, Yiwan Guo, Guilin Zhang, Chao Wan, Qian Yuan, Lei Chen, Wenjuan Tang, Biqiang Zhu, Xinggang Wang, Tao Sun, Wei Zhou, Dacheng Tao, Yongchao Xu, Chuansheng Zheng, Huangxuan Zhao, Bo Du
Published 2025-12-23 13:26

The study presents Janus-Pro-CXR, a 1B-parameter AI system for automated chest radiograph interpretation, developed to address the shortage of radiologists. Built on the DeepSeek Janus-Pro model, it was validated in a multicenter prospective trial (NCT07117266). It outperformed existing report generation models, including larger models such as ChatGPT 4o, and reliably detected six clinically critical radiographic findings. In prospective clinical deployment, AI assistance improved report quality scores and reduced interpretation time by 18.3%.

A global shortage of radiologists has been exacerbated by heavy chest X-ray workloads, particularly in primary care. Although multimodal large language models show promise, existing evaluations predominantly rely on automated metrics or retrospective analyses, lacking rigorous prospective clinical validation. Janus-Pro-CXR (1B), a chest X-ray interpretation system based on the DeepSeek Janus-Pro model, was developed and rigorously validated through a multicenter prospective trial (NCT07117266). Our system outperforms state-of-the-art X-ray report generation models in automated report generation, surpassing even larger-scale models including ChatGPT 4o (200B parameters), while demonstrating reliable detection of six clinically critical radiographic findings. Retrospective evaluation confirms significantly higher report accuracy than Janus-Pro and ChatGPT 4o. In prospective clinical deployment, AI assistance significantly improved report quality scores, reduced interpretation time by 18.3% (P < 0.001), and was preferred by a majority of experts in 54.3% of cases. Through lightweight architecture and domain-specific optimization, Janus-Pro-CXR improves diagnostic reliability and workflow efficiency, particularly in resource-constrained settings. The model architecture and implementation framework will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.

ARXIV Cancer: general cancer Method: deep learning

UbiQVision: Quantifying Uncertainty in XAI for Image Recognition

Akshat Dubey, Aleksandar Anžel, Bahar İlgen, Georges Hattab
Published 2025-12-23 11:57

This study presents UbiQVision, a framework for quantifying uncertainty in explainable AI (XAI) for medical imaging. It uses Dirichlet posterior sampling and Dempster-Shafer theory to quantify the uncertainty in SHAP explanations, which can be unstable and unreliable under epistemic and aleatoric uncertainty. The framework was evaluated on three medical imaging datasets spanning pathology, ophthalmology, and radiology. These datasets differ in class distribution, image quality, and modality and therefore introduce substantial epistemic uncertainty.

Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectures, such as ResNets, Vision Transformers, and Hybrid Convolutional Neural Networks, that offer enhanced performance at the cost of greater complexity. This complexity often compromises model explainability and interpretability. SHAP has emerged as a prominent method for providing interpretable visualizations that aid domain experts in understanding model predictions. However, SHAP explanations can be unstable and unreliable in the presence of epistemic and aleatoric uncertainty. In this study, we address this challenge by using Dirichlet posterior sampling and Dempster-Shafer theory to quantify the uncertainty that arises from these unstable explanations in medical imaging applications. The framework uses belief, plausibility, and fusion maps alongside statistical quantitative analysis to quantify uncertainty in SHAP. Furthermore, we evaluated our framework on three medical imaging datasets covering pathology, ophthalmology, and radiology, which vary in class distribution, image quality, and modality type. Differences in image resolution and modality-specific characteristics introduce noise and significant epistemic uncertainty.
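
UbiQVision builds on Dempster-Shafer theory, whose core operations are belief, plausibility, and Dempster's rule for fusing two bodies of evidence. The abstract does not say how these are applied to SHAP maps. The sketch below therefore shows only the standard definitions on a hypothetical two-class frame (`"pos"`, `"neg"`), for example to fuse two sources of evidence about one pixel. It is not the framework's implementation.

```python
from itertools import product

def combine_dempster(m1, m2):
    """Dempster's rule of combination for two mass functions.

    m1, m2: dicts mapping frozenset focal elements to masses summing to 1.
    Mass falling on empty intersections is the conflict K; the remaining
    mass is renormalised by 1 - K.
    """
    combined = {}
    conflict = 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc  # evidence that the two sources contradict each other
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def belief(m, a):
    """Bel(A): total mass of focal elements contained in A."""
    return sum(v for s, v in m.items() if s <= a)

def plausibility(m, a):
    """Pl(A): total mass of focal elements that intersect A."""
    return sum(v for s, v in m.items() if s & a)
```

Consider one source assigning 0.6 to `{pos}` and 0.4 to the whole frame, and another assigning 0.3 to `{neg}` and 0.7 to the whole frame. These conflict with mass 0.18. After renormalisation, `{pos}` receives 0.42 / 0.82, about 0.51. The gap between Bel({pos}) and Pl({pos}), about 0.51 versus 0.85, is the kind of interval a belief/plausibility map would display as uncertainty.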
