Research Papers

ARXIV Cancer: nasopharyngeal carcinoma Method: unknown

SegRap2025: A Benchmark of Gross Tumor Volume and Lymph Node Clinical Target Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Jia Fu, Litingyu Wang, He Li, Zihao Luo, Huamin Wang, Chenyuan Bian, Zijun Gao, Chunbin Gu, Xin Weng, Jianghao Wu, Yicheng Wu, Jin Ye, Linhao Li, Yiwen Ye, Yong Xia, Elias Tappeiner, Fei He, Abdul Qayyum, Moona Mazher, Steven A Niederer, Junqiang Chen, Chuanyi Huang, Lisheng Wang, Zhaohu Xing, Hongqiu Wang, Lei Zhu, Shichuan Zhang, Shaoting Zhang, Wenjun Liao, Guotai Wang
Published 2026-01-28 13:11

This paper introduces SegRap2025, a benchmark for segmenting Gross Tumor Volume (GTV) and Lymph Node Clinical Target Volume (LN CTV) for radiotherapy planning in Nasopharyngeal Carcinoma. The challenge includes two tasks that use multi-center and multi-modality data to test the robustness and generalizability of segmentation models. The top-performing models reached an average Dice Similarity Coefficient of 74.61% (internal) and 56.79% (external) for GTV, and up to 60.50% for LN CTV.

Accurate delineation of Gross Tumor Volume (GTV), Lymph Node Clinical Target Volume (LN CTV), and Organ-at-Risk (OAR) from Computed Tomography (CT) scans is essential for precise radiotherapy planning in Nasopharyngeal Carcinoma (NPC). Building upon SegRap2023, which focused on OAR and GTV segmentation using single-center paired non-contrast CT (ncCT) and contrast-enhanced CT (ceCT) scans, the SegRap2025 challenge aims to enhance the generalizability and robustness of segmentation models across imaging centers and modalities. SegRap2025 comprises two tasks: Task01 addresses GTV segmentation using paired CT from the SegRap2023 dataset, with an additional external testing set to evaluate cross-center generalization, and Task02 focuses on LN CTV segmentation using multi-center training data and an unseen external testing set, where each case contains paired CT scans or a single modality, emphasizing both cross-center and cross-modality robustness. This paper presents the challenge setup and provides a comprehensive analysis of the solutions submitted by ten participating teams. For the GTV segmentation task, the top-performing models achieved an average Dice Similarity Coefficient (DSC) of 74.61% and 56.79% on the internal and external testing cohorts, respectively. For the LN CTV segmentation task, the highest average DSC values reached 60.24%, 60.50%, and 57.23% on the paired CT, ceCT-only, and ncCT-only subsets, respectively. SegRap2025 establishes a large-scale multi-center, multi-modality benchmark for evaluating generalization and robustness in radiotherapy target segmentation, providing valuable insights toward clinically applicable automated radiotherapy planning systems. The benchmark is available at: https://hilab-git.github.io/SegRap2025_Challenge.
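
The Dice Similarity Coefficient reported in this abstract is twice the overlap of two masks divided by the sum of their sizes. The following is a minimal sketch of that definition, not the challenge's evaluation code; the function name, the toy masks, and the empty-mask convention are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient for two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(target):
        raise ValueError("masks must contain the same number of voxels")
    # Voxels labelled foreground in both masks.
    overlap = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground size of the two masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * overlap / size_sum


# Toy flattened masks (hypothetical values, not challenge data).
predicted_mask = [1, 1, 0, 0, 1, 0]
reference_mask = [1, 0, 0, 0, 1, 1]
print(f"DSC = {dice_coefficient(predicted_mask, reference_mask):.2%}")
```

Here the masks share 2 foreground voxels and each contains 3, so the score is 2·2/(3+3), or about 66.67%.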

ARXIV Cancer: non-small cell lung cancer Method: multitask and multimodal supervised framework

MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis

Chengying She, Chengwei Chen, Xinran Zhang, Ben Wang, Lizhuang Liu, Chengwei Shao, Yun Bian
Published 2026-01-28 08:05

The paper presents MMSF, a multitask and multimodal supervised framework for whole slide image (WSI) classification and survival analysis in computational pathology. The framework integrates tumor morphology from gigapixel images with clinical descriptors to improve prognostic accuracy. On CAMELYON16 and TCGA-NSCLC it reports 2.1-6.6% accuracy and 2.2-6.9% AUC gains over competitive baselines, and on five TCGA survival cohorts it improves the C-index by 7.1-9.8% over unimodal methods and 5.6-7.1% over multimodal alternatives.

Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1-6.6% accuracy and 2.2-6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1-9.8% C-index improvements compared with unimodal methods and 5.6-7.1% over multimodal alternatives.
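
The C-index used for the survival results measures how often a model ranks patients correctly: among comparable pairs, the patient who experienced the event earlier should receive the higher predicted risk. The sketch below implements Harrell's concordance index under the usual right-censoring convention. It is illustrative only and is not taken from MMSF; the function name and the toy cohort are assumptions.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index.

    times  : observed follow-up time per patient
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical values): the third patient is censored.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))
```

In this toy cohort, 4 of the 5 comparable pairs are ranked correctly, giving a C-index of 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.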

ARXIV Cancer: brain metastasis Method: Pareto-consistent loss

Pareto-Guided Optimization for Uncertainty-Aware Medical Image Segmentation

Jinming Zhang, Youpeng Yang, Xi Yang, Haosen Shi, Yuyao Yan, Qiufeng Wang, Guangliang Cheng, Kaizhu Huang
Published 2026-01-27 08:47

This paper addresses the challenges of uncertainty in medical image segmentation, particularly in boundary regions where ambiguity is higher. The authors propose a region-wise curriculum strategy and a Pareto-consistent loss to improve model convergence and stability. Experiments demonstrate that their method outperforms traditional approaches in segmenting brain metastasis and non-metastatic tumors.

Uncertainty in medical image segmentation is inherently non-uniform, with boundary regions exhibiting substantially higher ambiguity than interior areas. Conventional training treats all pixels equally, leading to unstable optimization during early epochs when predictions are unreliable. We argue that this instability hinders convergence toward Pareto-optimal solutions and propose a region-wise curriculum strategy that prioritizes learning from certain regions and gradually incorporates uncertain ones, reducing gradient variance. Methodologically, we introduce a Pareto-consistent loss that balances trade-offs between regional uncertainties by adaptively reshaping the loss landscape and constraining convergence dynamics between interior and boundary regions; this guides the model toward Pareto-approximate solutions. To address boundary ambiguity, we further develop a fuzzy labeling mechanism that maintains binary confidence in non-boundary areas while enabling smooth transitions near boundaries, stabilizing gradients, and expanding flat regions in the loss surface. Experiments on brain metastasis and non-metastatic tumor segmentation show consistent improvements across multiple configurations, with our method outperforming traditional crisp-set approaches in all tumor subregions.
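
To make the fuzzy labeling idea concrete, the sketch below turns a crisp 1D mask into soft labels that stay at 0 or 1 away from the boundary and ramp linearly through 0.5 across it. The linear ramp, the `band` parameter, and the function name are assumptions chosen for illustration; the abstract does not specify the membership function the authors use.

```python
def fuzzy_labels_1d(mask, band=2.0):
    """Soften a binary 1D mask near its boundaries.

    Labels stay crisp (0.0 or 1.0) at distance >= band from any boundary
    and change linearly through 0.5 across the boundary itself.
    """
    n = len(mask)
    # A boundary sits halfway between two neighbours with different labels.
    edges = [i + 0.5 for i in range(n - 1) if mask[i] != mask[i + 1]]
    soft = []
    for i, m in enumerate(mask):
        if not edges:
            soft.append(float(m))  # no boundary: keep the crisp label
            continue
        distance = min(abs(i - e) for e in edges)
        signed = distance if m else -distance  # positive inside the object
        ramp = max(-1.0, min(1.0, signed / band))
        soft.append(0.5 + 0.5 * ramp)
    return soft


crisp = [0, 0, 0, 1, 1, 1]
print(fuzzy_labels_1d(crisp, band=2.0))
```

With `band=2.0` the toy mask becomes `[0.0, 0.125, 0.375, 0.625, 0.875, 1.0]`: the outermost voxels keep binary confidence while the labels next to the boundary are softened.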

ARXIV Cancer: brain tumor Method: adaptive multi-granular transformer

AMGFormer: Adaptive Multi-Granular Transformer for Brain Tumor Segmentation with Missing Modalities

Chengxiang Guo, Jian Wang, Junhua Fei, Xiao Li, Chunling Chen, Yun Jin
Published 2026-01-27 08:29

The paper presents AMGFormer, an approach to brain tumor segmentation that addresses missing modalities in multimodal MRI. The method combines three modules for adaptive fusion, attention on pathological regions, and modality quality-aware enhancement. On BraTS 2018 it reports Dice scores of 89.33% (WT), 82.70% (TC), and 67.23% (ET) with under 0.5% variance across 15 modality combinations, and it generalizes to BraTS 2020/2021.

Multimodal MRI is essential for brain tumor segmentation, yet missing modalities in clinical practice cause existing methods to exhibit >40% performance variance across modality combinations, rendering them clinically unreliable. We propose AMGFormer, achieving significantly improved stability through three synergistic modules: (1) QuadIntegrator Bridge (QIB) enabling spatially adaptive fusion maintaining consistent predictions regardless of available modalities, (2) Multi-Granular Attention Orchestrator (MGAO) focusing on pathological regions to reduce background sensitivity, and (3) Modality Quality-Aware Enhancement (MQAE) preventing error propagation from corrupted sequences. On BraTS 2018, our method achieves 89.33% WT, 82.70% TC, 67.23% ET Dice scores with <0.5% variance across 15 modality combinations, solving the stability crisis. Single-modality ET segmentation shows 40-81% relative improvements over state-of-the-art methods. The method generalizes to BraTS 2020/2021, achieving up to 92.44% WT, 89.91% TC, 84.57% ET. The model demonstrates potential for clinical deployment with 1.2s inference. Code: https://github.com/guochengxiangives/AMGFormer.

ARXIV Cancer: general cancer Method: reinforcement learning

Glance and Focus Reinforcement for Pan-cancer Screening

Linshan Wu, Jiaxin Zhuang, Hao Chen
Published 2026-01-27 02:10

This paper presents GF-Screen, a Glance and Focus reinforcement learning framework designed for pan-cancer screening in large-scale CT scans. The method addresses challenges in localizing tiny lesions by employing a Glance model to identify diseased regions and a Focus model for precise segmentation. The approach utilizes reinforcement learning to enhance the Glance model's performance based on segmentation results, leading to improved efficiency and reduced false positives. Extensive experiments demonstrate GF-Screen's effectiveness, achieving top results in the MICCAI FLARE25 pan-cancer challenge.

Pan-cancer screening in large-scale CT scans remains challenging for existing AI methods, primarily due to the difficulty of localizing diverse types of tiny lesions in large CT volumes. The extreme foreground-background imbalance significantly hinders models from focusing on diseased regions, while redundant focus on healthy regions not only decreases the efficiency but also increases false positives. Inspired by radiologists' glance and focus diagnostic strategy, we introduce GF-Screen, a Glance and Focus reinforcement learning framework for pan-cancer screening. GF-Screen employs a Glance model to localize the diseased regions and a Focus model to precisely segment the lesions, where segmentation results of the Focus model are leveraged to reward the Glance model via Reinforcement Learning (RL). Specifically, the Glance model crops a group of sub-volumes from the entire CT volume and learns to select the sub-volumes with lesions for the Focus model to segment. Given that the selecting operation is non-differentiable for segmentation training, we propose to employ the segmentation results to reward the Glance model. To optimize the Glance model, we introduce a novel group relative learning paradigm, which employs group relative comparison to prioritize high-advantage predictions and discard low-advantage predictions within sub-volume groups, not only improving efficiency but also reducing false positives. In this way, for the first time, we effectively extend cutting-edge RL techniques to tackle the specific challenges in pan-cancer screening. Extensive experiments on 16 internal and 7 external datasets across 9 lesion types demonstrated the effectiveness of GF-Screen. Notably, GF-Screen leads the public validation leaderboard of MICCAI FLARE25 pan-cancer challenge, surpassing the FLARE24 champion solution by a large margin (+25.6% DSC and +28.2% NSD).
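
One plausible reading of the "group relative comparison" step is to score each sub-volume's reward against the mean and spread of its own group, then keep only the sub-volumes with above-average advantage. The sketch below follows that reading, in the style of group-relative policy optimization. It is an assumption, not the GF-Screen implementation; the function names, threshold, and reward values are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (reward - group mean) / group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def keep_high_advantage(rewards, threshold=0.0):
    """Indices of sub-volumes whose advantage exceeds the threshold."""
    advantages = group_relative_advantages(rewards)
    return [i for i, a in enumerate(advantages) if a > threshold]


# Hypothetical segmentation-derived rewards for four sub-volumes of one CT scan.
group_rewards = [0.8, 0.1, 0.0, 0.5]
print(keep_high_advantage(group_rewards))
```

With these toy rewards the group mean is 0.35, so only sub-volumes 0 and 3 are kept. Discarding the rest is what would reduce both computation and false positives in this reading.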

ARXIV Cancer: breast cancer Method: quantum-enhanced classification

Generative Diffusion Augmentation with Quantum-Enhanced Discrimination for Medical Image Diagnosis

Jingsong Xia, Siqi Wang
Published 2026-01-26 15:05

This paper presents SDA-QEC, a novel framework that combines simplified diffusion-based data augmentation with quantum-enhanced feature discrimination to improve medical image classification. The method addresses class imbalance in datasets, which often leads to biased models and misdiagnosis risks. Experimental results on coronary angiography images show that SDA-QEC achieves high accuracy and balanced performance metrics, indicating its potential for clinical application in medical diagnostics.

In biomedical engineering, artificial intelligence has become a pivotal tool for enhancing medical diagnostics, particularly in medical image classification tasks such as detecting pneumonia from chest X-rays and breast cancer screening. However, real-world medical datasets frequently exhibit severe class imbalance, where positive samples substantially outnumber negative samples, leading to biased models with low recall rates for minority classes. This imbalance not only compromises diagnostic accuracy but also poses clinical misdiagnosis risks. To address this challenge, we propose SDA-QEC (Simplified Diffusion Augmentation with Quantum-Enhanced Classification), an innovative framework that integrates simplified diffusion-based data augmentation with quantum-enhanced feature discrimination. Our approach employs a lightweight diffusion augmentor to generate high-quality synthetic samples for minority classes, rebalancing the training distribution. Subsequently, a quantum feature layer embedded within MobileNetV2 architecture enhances the model's discriminative capability through high-dimensional feature mapping in Hilbert space. Comprehensive experiments on coronary angiography image classification demonstrate that SDA-QEC achieves 98.33% accuracy, 98.78% AUC, and 98.33% F1-score, significantly outperforming classical baselines including ResNet18, MobileNetV2, DenseNet121, and VGG16. Notably, our framework simultaneously attains 98.33% sensitivity and 98.33% specificity, achieving a balanced performance critical for clinical deployment. The proposed method validates the feasibility of integrating generative augmentation with quantum-enhanced modeling in real-world medical imaging tasks, offering a novel research pathway for developing highly reliable medical AI systems in small-sample, highly imbalanced, and high-risk diagnostic scenarios.

ARXIV Cancer: glioma Method: Densely Swin Hybrid

A Tumor Aware DenseNet Swin Hybrid Learning with Boosted and Hierarchical Feature Spaces for Large-Scale Brain MRI Classification

Muhammad Ali Shah, Muhammad Mansoor Alam, Saddam Hussain Khan
Published 2026-01-26 10:14

This study presents an Efficient Densely Swin Hybrid (EDSH) framework for the analysis of brain tumor MRIs, focusing on capturing both fine-grained texture patterns and long-range contextual dependencies. The framework includes two experimental setups that enhance diagnostic accuracy for diffuse gliomas, meningiomas, and pituitary tumors by utilizing customized DenseNet and Swin architectures. The proposed method achieves a high accuracy of 98.50% on a large-scale dataset, outperforming existing models.

This study proposes an Efficient Densely Swin Hybrid (EDSH) framework for brain tumor MRI analysis, designed to jointly capture fine-grained texture patterns and long-range contextual dependencies. Two tumor-aware experimental setups are introduced to address class-specific diagnostic challenges. The first setup employs a Boosted Feature Space (BFS), where independently customized DenseNet and Swin-T branches learn complementary local and global representations that are dimension-aligned, fused, and boosted, enabling highly sensitive detection of diffuse glioma patterns by learning the features of irregular shape, poorly defined mass, and heterogeneous texture. The second setup adopts a hierarchical DenseNet Swin-T architecture with Deep Feature Extraction and Dual Residual connections (DFE and DR), in which DenseNet serves as a stem CNN for structured local feature learning, while Swin-T models global tumor morphology, effectively suppressing false negatives in meningioma and pituitary tumor classification by learning the features of well-defined mass, location (outside the brain), and enlargements in tumors (dural tail or upward extension). DenseNet is customized at the input level to match MRI spatial characteristics, leveraging dense residual connectivity to preserve texture information and mitigate vanishing-gradient effects. In parallel, Swin-T is tailored through task-aligned patch embedding and shifted-window self-attention to efficiently capture hierarchical global dependencies. Extensive evaluation on a stringent large-scale MRI dataset (40,260 images across four tumor classes) demonstrates consistent superiority over standalone CNNs, Vision Transformers, and hybrids, achieving 98.50% accuracy and recall on the unseen test set.

ARXIV Cancer: unknown Method: convolutional neural network

Depth to Anatomy: Organ Localization from Depth Images for Automated Patient Table Positioning in Radiology Workflow

Eytan Kats, Kai Geissler, Daniel Mensing, Julien Senegas, Jochen G. Hirsch, Stefan Heldman, Mattias P. Heinrich
Published 2026-01-26 08:33

This study presents a learning-based framework for predicting 3D organ locations and shapes from a single 2D depth image, aimed at improving radiology workflow efficiency through automated patient positioning. Using synthetic depth images generated from 10,020 whole-body MRI scans, the authors trained a convolutional neural network for volumetric prediction of 41 anatomical structures, reaching a mean Dice similarity coefficient of 0.44 ± 0.2. Qualitative results on real-world depth images suggest the method can generalize to clinical settings, potentially reducing setup time and improving patient comfort.

Automated patient positioning can improve radiology workflow efficiency by reducing the time required for manual table adjustments and scout-based scan planning. We propose a learning-based framework that predicts 3D organ locations and shapes for 41 anatomical structures, including both bones and soft tissues, directly from a single 2D depth image of the body surface. Leveraging 10,020 whole-body MRI scans from the German National Cohort (NAKO) dataset, we synthetically generate depth images paired with anatomical segmentations to train a convolutional neural network for volumetric organ prediction. Our method achieves a mean Dice similarity coefficient of 0.44 ± 0.2 and a symmetric average surface distance of 7.69 ± 5.68 mm across all structures. Furthermore, the model derives organ bounding boxes with a mean absolute detection offset of 10.99 ± 5.54 mm. Qualitative results on real-world depth images confirm the ability of the model to generalize to practical clinical settings. These findings suggest that depth-only organ localization can support automated patient positioning, reducing setup time, minimizing operator variability, and improving patient comfort.

ARXIV Cancer: unknown Method: self-supervised contrastive learning

A multimodal vision foundation model for generalizable knee pathology

Kang Yu, Dingyu Wang, Zimu Yuan, Nan Zhou, Jiajun Liu, Jiaxin Liu, Shanggui Liu, Yaoyan Zheng, Huishu Yuan, Di Huang, Dong Jiang
Published 2026-01-26 08:14

This paper presents OrthoFoundation, a multimodal vision foundation model designed to improve the interpretation of medical imaging for musculoskeletal disorders. The model was trained using a large dataset of knee X-ray and MRI images through self-supervised contrastive learning, achieving state-of-the-art performance in various diagnostic tasks. Notably, it demonstrated high accuracy in diagnosing osteoarthritis and showed strong generalization capabilities across different anatomical regions.

Musculoskeletal disorders represent a leading cause of global disability, creating an urgent demand for precise interpretation of medical imaging. Current artificial intelligence (AI) approaches in orthopedics predominantly rely on task-specific, supervised learning paradigms. These methods are inherently fragmented, require extensive annotated datasets, and often lack generalizability across different modalities and clinical scenarios. The development of foundation models in this field has been constrained by the scarcity of large-scale, curated, and open-source musculoskeletal datasets. To address these challenges, we introduce OrthoFoundation, a multimodal vision foundation model optimized for musculoskeletal pathology. We constructed a pre-training dataset of 1.2 million unlabeled knee X-ray and MRI images from internal and public databases. Utilizing a DINOv3 backbone, the model was trained via self-supervised contrastive learning to capture robust radiological representations. OrthoFoundation achieves state-of-the-art (SOTA) performance across 14 downstream tasks. It attained superior accuracy in X-ray osteoarthritis diagnosis and ranked first in MRI structural injury detection. The model demonstrated remarkable label efficiency, matching supervised baselines using only 50% of labeled data. Furthermore, despite being pre-trained on knee images, OrthoFoundation exhibited exceptional cross-anatomy generalization to the hip, shoulder, and ankle. OrthoFoundation represents a significant advancement toward general-purpose AI for musculoskeletal imaging. By learning fundamental, joint-agnostic radiological semantics from large-scale multimodal data, it overcomes the limitations of conventional models, providing a robust framework for reducing annotation burdens and enhancing diagnostic accuracy in clinical practice.

ARXIV Cancer: breast cancer Method: deep learning

Automated HER2 scoring with uncertainty quantification using lensfree holography and deep learning

Che-Yung Shen, Xilin Yang, Yuzhu Li, Leon Lenk, Aydogan Ozcan
Published 2026-01-26 07:09

This study presents a novel lensfree holography platform integrated with deep learning for the automated scoring of HER2 expression in breast cancer tissue. The method captures diffraction patterns of stained tissue sections and employs Bayesian Monte Carlo dropout for uncertainty quantification, enhancing diagnostic reliability. The approach achieved a testing accuracy of 84.9% for 4-class HER2 classification and 94.8% for binary HER2 scoring, demonstrating its potential for cost-effective and portable cancer diagnostics.

Accurate assessment of human epidermal growth factor receptor 2 (HER2) expression is critical for breast cancer diagnosis, prognosis, and therapy selection; yet, most existing digital HER2 scoring methods rely on bulky and expensive optical systems. Here, we present a compact and cost-effective lensfree holography platform integrated with deep learning for automated HER2 scoring of immunohistochemically stained breast tissue sections. The system captures lensfree diffraction patterns of stained HER2 tissue sections under RGB laser illumination and acquires complex field information over a sample area of ~1,250 mm^2 at an effective throughput of ~84 mm^2 per minute. To enhance diagnostic reliability, we incorporated an uncertainty quantification strategy based on Bayesian Monte Carlo dropout, which provides autonomous uncertainty estimates for each prediction and supports reliable, robust HER2 scoring, with an overall correction rate of 30.4%. Using a blinded test set of 412 unique tissue samples, our approach achieved a testing accuracy of 84.9% for 4-class (0, 1+, 2+, 3+) HER2 classification and 94.8% for binary (0/1+ vs. 2+/3+) HER2 scoring with uncertainty quantification. Overall, this lensfree holography approach provides a practical pathway toward portable, high-throughput, and cost-effective HER2 scoring, particularly suited for resource-limited settings, where traditional digital pathology infrastructure is unavailable.
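
Monte Carlo dropout estimates uncertainty by running the same input through the network several times with dropout left active, then examining how much the predictions disagree. The sketch below covers only the aggregation step, under stated assumptions: the per-pass class probabilities are supplied as plain lists (hypothetical values standing in for network outputs), and predictive entropy is used as the uncertainty score. The abstract does not say which uncertainty measure or threshold the authors use.

```python
import math


def mc_dropout_summary(pass_probs):
    """Aggregate class probabilities from T stochastic forward passes.

    pass_probs : list of T probability vectors, one per dropout-enabled pass
    Returns (predicted class index, mean probabilities, predictive entropy).
    """
    num_passes = len(pass_probs)
    num_classes = len(pass_probs[0])
    mean = [sum(p[c] for p in pass_probs) / num_passes for c in range(num_classes)]
    # Entropy of the averaged distribution: higher means less certain.
    entropy = -sum(q * math.log(q) for q in mean if q > 0)
    predicted = max(range(num_classes), key=lambda c: mean[c])
    return predicted, mean, entropy


# Hypothetical outputs of three passes over the four HER2 classes (0, 1+, 2+, 3+).
passes = [
    [0.70, 0.10, 0.10, 0.10],
    [0.60, 0.20, 0.10, 0.10],
    [0.80, 0.10, 0.05, 0.05],
]
predicted, mean_probs, entropy = mc_dropout_summary(passes)
print(predicted, round(entropy, 3))
```

A prediction whose entropy exceeds a chosen threshold could be flagged for manual review. For four classes the entropy ranges from 0 (fully confident) to ln 4 (uniform).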
