Research Papers

ARXIV Cancer: colorectal cancer Method: deep learning

A multi-center analysis of deep learning methods for video polyp detection and segmentation

Noha Ghatwary, Pedro Chavarias Solano, Mohamed Ramzy Ibrahim, Adrian Krenzer, Frank Puppe, Stefano Realdon, Renato Cannizzaro, Jiacheng Wang, Liansheng Wang, Thuy Nuong Tran, Lena Maier-Hein, Amine Yamlahi, Patrick Godau, Quan He, Qiming Wan, Mariia Kokshaikyna, Mariia Dobko, Haili Ye, Heng Li, Ragu B, Antony Raj, Hanaa Nagdy, Osama E Salem, James E. East, Dominique Lamarque, Thomas de Lange, Sharib Ali
Published 2026-03-04 17:05

This study investigates the application of deep learning methods for the detection and segmentation of colonic polyps during colonoscopy, which are precursors to colorectal cancer. It addresses the challenges posed by the variability in polyp appearance and emphasizes the importance of integrating sequence data and temporal information to enhance diagnostic accuracy. The collaboration between data scientists and gastroenterologists aims to improve real-time clinical processes and reduce missed detections.

Read abstract

Colonic polyps are well-recognized precursors to colorectal cancer (CRC), typically detected during colonoscopy. However, the variability in appearance, location, and size of these polyps complicates their detection and removal, leading to challenges in effective surveillance, intervention, and subsequently CRC prevention. The processes of colonoscopy surveillance and polyp removal are highly reliant on the expertise of gastroenterologists and occur within the complexities of the colonic structure. As a result, there is a high rate of missed detections and incomplete removal of colonic polyps, which can adversely impact patient outcomes. Recently, automated methods that use machine learning have been developed to enhance polyps detection and segmentation, thus helping clinical processes and reducing missed rates. These advancements highlight the potential for improving diagnostic accuracy in real-time applications, which ultimately facilitates more effective patient management. Furthermore, integrating sequence data and temporal information could significantly enhance the precision of these methods by capturing the dynamic nature of polyp growth and the changes that occur over time. To rigorously investigate these challenges, data scientists and experts gastroenterologists collaborated to compile a comprehensive dataset that spans multiple centers and diverse populations. This initiative aims to underscore the critical importance of incorporating sequence data and temporal information in the development of robust automated detection and segmentation methods. This study evaluates the applicability of deep learning techniques developed in real-time clinical colonoscopy tasks using sequence data, highlighting the critical role of temporal relationships between frames in improving diagnostic precision.

ARXIV Cancer: unknown Method: Zero-Initialized Gated Cross-Task Attention

A Unified Framework for Joint Detection of Lacunes and Enlarged Perivascular Spaces

Lucas He, Krinos Li, Hanyuan Zhang, Runlong He, Silvia Ingala, Luigi Lorenzini, Marleen de Bruijne, Frederik Barkhof, Rhodri Davies, Carole Sudre
Published 2026-03-04 16:30

This study addresses the challenges in detecting cerebral small vessel disease markers, specifically enlarged perivascular spaces and lacunae, in medical images. The authors propose a novel framework that utilizes Zero-Initialized Gated Cross-Task Attention to improve detection accuracy by leveraging contextual information. Their approach incorporates a mixed-supervision strategy to enhance biological and topological consistency, achieving state-of-the-art performance in precision and F1-score during validation. The model's robustness is further validated on a larger external dataset.

Read abstract

Cerebral small vessel disease (CSVD) markers, specifically enlarged perivascular spaces (EPVS) and lacunae, present a unique challenge in medical image analysis due to their radiological mimicry. Standard segmentation networks struggle with feature interference and extreme class imbalance when handling these divergent targets simultaneously. To address these issues, we propose a morphology-decoupled framework where Zero-Initialized Gated Cross-Task Attention exploits dense EPVS context to guide sparse lacune detection. Furthermore, biological and topological consistency are enforced via a mixed-supervision strategy integrating Mutual Exclusion and Centerline Dice losses. Finally, we introduce an Anatomically-Informed Inference Calibration mechanism to dynamically suppress false positives based on tissue semantics. Extensive 5-folds cross-validation on the VALDO 2021 dataset (N=40) demonstrates state-of-the-art performance, notably surpassing task winners in lacunae detection precision (71.1%, p=0.01) and F1-score (62.6%, p=0.03). Furthermore, evaluation on the external EPAD cohort (N=1762) confirms the model's robustness for large-scale population studies. Code will be released upon acceptance.

ARXIV Cancer: general cancer Method: decoupling nuclei detection and classification

DeNuC: Decoupling Nuclei Detection and Classification in Histopathology

Zijiang Yang, Chen Kuang, Dongmei Fu
Published 2026-03-04 16:23

This study introduces DeNuC, a method that decouples nuclei detection and classification in histopathology to improve performance. The authors argue that joint optimization of these tasks leads to representation degradation in pathology foundation models. By employing a lightweight model for nuclei localization and leveraging a pathology FM for classification, DeNuC demonstrates significant improvements in F1 scores on benchmark datasets while reducing the number of trainable parameters.

Read abstract

Pathology Foundation Models (FMs) have shown strong performance across a wide range of pathology image representation and diagnostic tasks. However, FMs do not exhibit the expected performance advantage over traditional specialized models in Nuclei Detection and Classification (NDC). In this work, we reveal that jointly optimizing nuclei detection and classification leads to severe representation degradation in FMs. Moreover, we identify that the substantial intrinsic disparity in task difficulty between nuclei detection and nuclei classification renders joint NDC optimization unnecessarily computationally burdensome for the detection stage. To address these challenges, we propose DeNuC, a simple yet effective method designed to break through existing bottlenecks by Decoupling Nuclei detection and Classification. DeNuC employs a lightweight model for accurate nuclei localization, subsequently leveraging a pathology FM to encode input images and query nucleus-specific features based on the detected coordinates for classification. Extensive experiments on three widely used benchmarks demonstrate that DeNuC effectively unlocks the representational potential of FMs for NDC and significantly outperforms state-of-the-art methods. Notably, DeNuC improves F1 scores by 4.2% and 3.6% (or higher) on the BRCAM2C and PUMA datasets, respectively, while using only 16% (or fewer) trainable parameters compared to other methods. Code is available at https://github.com/ZijiangY1116/DeNuC.

ARXIV Cancer: colorectal cancer Method: Vision Transformer

Revisiting the Role of Foundation Models in Cell-Level Histopathological Image Analysis under Small-Patch Constraints -- Effects of Training Data Scale and Blur Perturbations on CNNs and Vision Transformers

Hiroki Kagiyama, Toru Nagasaka, Yukari Adachi, Takaaki Tachibana, Ryota Ito, Mitsugu Fujita, Kimihiro Yamashita, Yoshihiro Kakeji
Published 2026-03-04 13:52

This study investigates the effectiveness of various deep learning architectures for cell-level histopathological image analysis using small image patches. A total of 303 colorectal cancer specimens were analyzed, generating a large dataset of annotated cell images. The results indicate that task-specific models outperform foundation models in terms of accuracy and efficiency when sufficient training data is available, particularly under the constraints of small image sizes.

Read abstract

Background and objective: Cell-level pathological image analysis requires working with extremely small image patches (40x40 pixels), far below standard ImageNet resolutions. It remains unclear whether modern deep learning architectures and foundation models can learn robust and scalable representations under this constraint. We systematically evaluated architectural suitability and data-scale effects for small-patch cell classification. Methods: We analyzed 303 colorectal cancer specimens with CD103/CD8 immunostaining, generating 185,432 annotated cell images. Eight task-specific architectures were trained from scratch at multiple data scales (FlagLimit: 256--16,384 samples per class), and three foundation models were evaluated via linear probing and fine-tuning after resizing inputs to 224x224 pixels. Robustness to blur was assessed using pre- and post-resize Gaussian perturbations. Results: Task-specific models improved consistently with increasing data scale, whereas foundation models saturated at moderate sample sizes. A Vision Transformer optimized for small patches (CustomViT) achieved the highest accuracy, outperforming all foundation models with substantially lower inference cost. Blur robustness was comparable across architectures, with no qualitative advantage observed for foundation models. Conclusion: For cell-level classification under extreme spatial constraints, task-specific architectures are more effective and efficient than foundation models once sufficient training data are available. Higher clean accuracy does not imply superior robustness, and large pre-trained models offer limited benefit in the small-patch regime.

ARXIV Cancer: glioblastoma Method: generative model

TumorFlow: Physics-Guided Longitudinal MRI Synthesis of Glioblastoma Growth

Valentin Biller, Niklas Bubeck, Lucas Zimmer, Ayhan Can Erdur, Sandeep Nagar, Anke Meyer-Baese, Daniel Rückert, Benedikt Wiestler, Jonas Weidner
Published 2026-03-04 13:38

This study presents TumorFlow, a biophysically-conditioned generative framework designed to synthesize realistic 3D brain MRI volumes for glioblastoma. By integrating tumor-infiltration maps with a biophysical growth model, the method allows for the generation of coherent tumor growth trajectories that enhance the assessment of tumor extent and progression. The framework was evaluated on longitudinal glioblastoma cases, demonstrating a 75% Dice overlap with the biophysical model and consistent image quality.

Read abstract

Glioblastoma exhibits diverse, infiltrative, and patient-specific growth patterns that are only partially visible on routine MRI, making it difficult to reliably assess true tumor extent and personalize treatment planning and follow-up. We present a biophysically-conditioned generative framework that synthesizes biologically realistic 3D brain MRI volumes from estimated, spatially continuous tumor-concentration fields. Our approach combines a generative model with tumor-infiltration maps that can be propagated through time using a biophysical growth model, enabling fine-grained control over tumor shape and growth while preserving patient anatomy. This enables us to synthesize consistent tumor growth trajectories directly in the space of real patients, providing interpretable, controllable estimation of tumor infiltration and progression beyond what is explicitly observed in imaging. We evaluate the framework on longitudinal glioblastoma cases and demonstrate that it can generate temporally coherent sequences with realistic changes in tumor appearance and surrounding tissue response. These results suggest that integrating mechanistic tumor growth priors with modern generative modeling can provide a practical tool for patient-specific progression visualization and for generating controlled synthetic data to support downstream neuro-oncology workflows. In longitudinal extrapolation, we achieve a consistent 75% Dice overlap with the biophysical model while maintaining a constant PSNR of 25 in the surrounding tissue. Our code is available at: https://github.com/valentin-biller/lgm.git

ARXIV Cancer: general cancer Method: reinforcement learning

Rethinking the Efficiency and Effectiveness of Reinforcement Learning for Radiology Report Generation

Zilin Lu, Ruifeng Yuan, Weiwei Cao, Wanxing Chang, Zhongyu Wei, Sinuo Wang, Yong Xia, Ling Zhang, Jianpeng Zhang
Published 2026-03-04 12:57

This paper investigates the application of reinforcement learning (RL) for radiology report generation (R2G), addressing the limitations of existing automated AI approaches. It emphasizes the importance of data quality over quantity and introduces a novel data sampling strategy to enhance performance with fewer samples. The authors propose a new method, Diagnostic Token-weighted Policy Optimization (DiTPO), which optimizes clinical accuracy by prioritizing clinically relevant tokens in radiology reports. Experiments demonstrate that this framework achieves state-of-the-art performance with significantly reduced training samples.

Read abstract

Radiologists highly desire fully automated AI for radiology report generation (R2G), yet existing approaches fall short in clinical utility. Reinforcement learning (RL) holds potential to address these shortcomings, but its adoption in this task remains underexplored. In this paper, we revisit RL in terms of data efficiency and optimization effectiveness for R2G tasks. First, we explore the impact of data quantity and quality on the performance of RL in medical contexts, revealing that data quality plays a more critical role than quantity. To this end, we propose a diagnostic diversity-based data sampling strategy that enables comparable performance with fewer samples. Second, we observe that the majority of tokens in radiology reports are template-like and diagnostically uninformative, whereas the low frequency of clinically critical tokens heightens the risk of being overlooked during optimization. To tackle this, we introduce Diagnostic Token-weighted Policy Optimization (DiTPO), which directly optimizes for clinical accuracy by using a diagnostic F1 score as the reward signal. Unlike standard RL approaches that treat all tokens equally, DiTPO explicitly models the varying importance of different tokens through rule- or gradient-based mechanisms to prioritize clinically relevant content. Extensive experiments on the MIMIC-CXR, IU-Xray, and CheXpert Plus datasets demonstrate that our framework achieves state-of-the-art (SOTA) performance while requiring substantially fewer training samples in RL. Notably, on MIMIC-CXR, our framework attains an F1 score of 0.516 using only 20% of the RL training samples.

ARXIV Cancer: prostate cancer Method: vision foundation model

ProFound: A moderate-sized vision foundation model for multi-task prostate imaging

Yipei Wang, Yinsong Xu, Weixi Yi, Shaheer Ullah Saeed, Natasha Thorley, Alexander Ng, Yukun Zhou, Wen Yan, Dean Barratt, Shonit Punwani, Veeru Kasivisvanathan, Mark Emberton, Daniel C. Alexander, Yipeng Hu
Published 2026-03-04 11:46

This paper introduces ProFound, a vision foundation model designed for multi-task imaging in prostate cancer. The model is pre-trained on a large dataset of multi-parametric MRI scans and evaluated across various clinical tasks. Results indicate that ProFound outperforms or is competitive with existing specialized models in tasks such as cancer detection and lesion localization.

Read abstract

Many diagnostic and therapeutic clinical tasks for prostate cancer increasingly rely on multi-parametric MRI. Automating these tasks is challenging because they necessitate expert interpretations, which are difficult to scale to capitalise on modern deep learning. Although modern automated systems achieve expert-level performance in isolated tasks, their general clinical utility remains limited by the requirement of large task-specific labelled datasets. In this paper, we present ProFound, a domain-specialised vision foundation model for volumetric prostate mpMRI. ProFound is pre-trained using several variants of self-supervised approaches on a diverse, multi-institutional collection of 5,000 patients, with a total of over 22,000 unique 3D MRI volumes (over 1,800,000 2D image slices). We conducted a systematic evaluation of ProFound across a broad spectrum of $11$ downstream clinical tasks on over 3,000 independent patients, including prostate cancer detection, Gleason grading, lesion localisation, gland volume estimation, zonal and surrounding structure segmentation. Experimental results demonstrate that finetuned ProFound consistently outperforms or remains competitive with state-of-the-art specialised models and existing medical vision foundation models trained/finetuned on the same data.

ARXIV Cancer: brain tumor Method: multi-modal reconstruction

MPFlow: Multi-modal Posterior-Guided Flow Matching for Zero-Shot MRI Reconstruction

Seunghoi Kim, Chen Jin, Henry F. J. Tregidgo, Matteo Figini, Daniel C. Alexander
Published 2026-03-04 04:25

The paper presents MPFlow, a zero-shot multi-modal reconstruction framework designed to enhance MRI reconstruction by incorporating auxiliary MRI modalities without the need for retraining. The method utilizes a self-supervised pretraining strategy called PAMRI to learn shared representations across different MRI modalities. Experimental results indicate that MPFlow achieves comparable image quality to diffusion baselines while significantly reducing tumor hallucinations, demonstrating its effectiveness in improving anatomical fidelity in MRI reconstruction.

Read abstract

Zero-shot MRI reconstruction relies on generative priors, but single-modality unconditional priors produce hallucinations under severe ill-posedness. In many clinical workflows, complementary MRI acquisitions (e.g. high-quality structural scans) are routinely available, yet existing reconstruction methods lack mechanisms to leverage this additional information. We propose MPFlow, a zero-shot multi-modal reconstruction framework built on rectified flow that incorporates auxiliary MRI modalities at inference time without retraining the generative prior to improve anatomical fidelity. Cross-modal guidance is enabled by our proposed self-supervised pretraining strategy, Patch-level Multi-modal MR Image Pretraining (PAMRI), which learns shared representations across modalities. Sampling is jointly guided by data consistency and cross-modal feature alignment using pre-trained PAMRI, systematically suppressing intrinsic and extrinsic hallucinations. Extensive experiments on HCP and BraTS show that MPFlow matches diffusion baselines on image quality using only 20% of sampling steps while reducing tumor hallucinations by more than 15% (segmentation dice score). This demonstrates that cross-modal guidance enables more reliable and efficient zero-shot MRI reconstruction.

ARXIV Cancer: lupus nephritis Method: Clinical-Injection Transformer

Clinical-Injection Transformer with Domain-Adapted MAE for Lupus Nephritis Prognosis Prediction

Yuewen Huang, Zhitao Ye, Guangnan Feng, Fudan Zheng, Xia Gao, Yutong Lu
Published 2026-03-04 03:38

This study presents a novel multimodal computational pathology framework aimed at predicting treatment responses in pediatric lupus nephritis (LN). The framework utilizes routine PAS-stained biopsies and structured clinical data, integrating clinical features through a Clinical-Injection Transformer and employing a domain-adapted Masked Autoencoder for feature learning. The proposed method achieved a three-class accuracy of 90.1% and an AUC of 89.4%, indicating its effectiveness as a prognostic tool.

Read abstract

Lupus nephritis (LN) is a severe complication of systemic lupus erythematosus that affects pediatric patients with significantly greater severity and worse renal outcomes compared to adults. Despite the urgent clinical need, predicting pediatric LN prognosis remains unexplored in computational pathology. Furthermore, the only existing histopathology-based approach for LN relies on multiple costly staining protocols and fails to integrate complementary clinical data. To address these gaps, we propose the first multimodal computational pathology framework for three-class treatment response prediction (complete remission, partial response, and no response) in pediatric LN, utilizing only routine PAS-stained biopsies and structured clinical data. Our framework introduces two key methodological innovations. First, a Clinical-Injection Transformer (CIT) embeds clinical features as condition tokens into patch-level self-attention, facilitating implicit and bidirectional cross-modal interactions within a unified attention space. Second, we design a decoupled representation-knowledge adaptation strategy using a domain-adapted Masked Autoencoder (MAE). This strategy explicitly separates self-supervised morphological feature learning from pathological knowledge extraction. Additionally, we introduce a multi-granularity morphological type injection mechanism to bridge distilled classification knowledge with downstream prognostic predictions at both the instance and patient levels. Evaluated on a cohort of 71 pediatric LN patients with KDIGO-standardized labels, our method achieves a three-class accuracy of 90.1% and an AUC of 89.4%, demonstrating its potential as a highly accurate and cost-effective prognostic tool.

ARXIV Cancer: colorectal cancer Method: wavelet-based cross-band integration

Polyp Segmentation Using Wavelet-Based Cross-Band Integration for Enhanced Boundary Representation

Haesung Oh, Jaesung Lee
Published 2026-03-04 03:17

This study addresses the challenge of accurate polyp segmentation for early colorectal cancer detection, focusing on improving boundary localization. The authors propose a novel segmentation model that integrates grayscale and RGB representations to enhance boundary precision. Their experiments on multiple benchmark datasets show that this approach outperforms conventional methods in terms of boundary accuracy and robustness.

Read abstract

Accurate polyp segmentation is essential for early colorectal cancer detection, yet achieving reliable boundary localization remains challenging due to low mucosal contrast, uneven illumination, and color similarity between polyps and surrounding tissue. Conventional methods relying solely on RGB information often struggle to delineate precise boundaries due to weak contrast and ambiguous structures between polyps and surrounding mucosa. To establish a quantitative foundation for this limitation, we analyzed polyp-background contrast in the wavelet domain, revealing that grayscale representations consistently preserve higher boundary contrast than RGB images across all frequency bands. This finding suggests that boundary cues are more distinctly represented in the grayscale domain than in the color domain. Motivated by this finding, we propose a segmentation model that integrates grayscale and RGB representations through complementary frequency-consistent interaction, enhancing boundary precision while preserving structural coherence. Extensive experiments on four benchmark datasets demonstrate that the proposed approach achieves superior boundary precision and robustness compared to conventional models.

Find the papers that actually matter