Research Papers

ARXIV Cancer: breast cancer Method: graph neural networks

Jigsaw Regularization in Whole-Slide Image Classification

So Won Jeong, Veronika Ročková
Published 2026-03-20 18:04

This study presents an approach to whole-slide image classification in computational pathology that combines vision foundation-model embeddings, graph neural networks, and a novel jigsaw regularization. Within multiple instance learning, the embeddings capture local spatial structure inside each patch, while the graph neural network and jigsaw regularization provide spatial awareness across patches. The combination markedly improves classification over state-of-the-art attention-based MIL methods on benchmark datasets in breast, head-and-neck, and colon cancer.

Abstract

Computational pathology involves the digitization of stained tissues into whole-slide images (WSIs) that contain billions of pixels arranged as contiguous patches. Statistical analysis of WSIs largely focuses on classification via multiple instance learning (MIL), in which slide-level labels are inferred from unlabeled patches. Most MIL methods treat patches as exchangeable, overlooking the rich spatial and topological structure that underlies tissue images. This work builds on recent graph-based methods that aim to incorporate spatial awareness into MIL. Our approach is new in two regards: (1) we deploy vision foundation-model embeddings to incorporate local spatial structure within each patch, and (2) achieve across-patch spatial awareness using graph neural networks together with a novel jigsaw regularization. We find that a combination of these two features markedly improves classification over state-of-the-art attention-based MIL approaches on benchmark datasets in breast, head-and-neck, and colon cancer.
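Across-patch spatial awareness can be pictured as a graph over patch positions on the slide. The sketch below is a minimal illustration under our own assumptions, not the authors' model: it links each patch to its k nearest neighbours on the slide grid (the helper names `knn_patch_graph` and `mean_aggregate` and the value of k are hypothetical), applies one mean-aggregation message-passing layer to stand-in patch embeddings, and mean-pools to a slide-level vector. The jigsaw regularization itself is not reproduced, since its exact form is not given here.

```python
import numpy as np

def knn_patch_graph(coords: np.ndarray, k: int) -> np.ndarray:
    """Return an (N, N) 0/1 adjacency linking each patch to its k nearest
    neighbours by grid position, symmetrised so the graph is undirected."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)              # never pick a patch as its own neighbour
    nbrs = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros(dist.shape)
    rows = np.repeat(np.arange(len(coords)), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)

def mean_aggregate(feats: np.ndarray, adj: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One GCN-style layer: average self + neighbour embeddings, project, ReLU."""
    a_hat = adj + np.eye(len(adj))              # add self-loops
    a_hat = a_hat / a_hat.sum(1, keepdims=True) # row-normalise to a mean
    return np.maximum(a_hat @ feats @ weight, 0.0)

# Usage: a 3 x 3 grid of patches with random stand-in embeddings
# (in the paper these would come from a vision foundation model).
rng = np.random.default_rng(0)
coords = np.array([(r, c) for r in range(3) for c in range(3)], dtype=float)
feats = rng.normal(size=(9, 8))
adj = knn_patch_graph(coords, k=4)
h = mean_aggregate(feats, adj, rng.normal(size=(8, 4)))
slide_embedding = h.mean(axis=0)                # simple mean pooling to one slide vector
```

Mean pooling only keeps the sketch short; a slide classifier would typically learn this readout.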

ARXIV Cancer: brain tumor Method: machine learning

AEGIS: An Operational Infrastructure for Post-Market Governance of Adaptive Medical AI Under US and EU Regulations

Fardin Afdideh, Mehdi Astaraki, Fernando Seoane, Farhad Abtahi
Published 2026-03-20 11:56

This paper presents AEGIS, a governance framework for post-market oversight of adaptive medical AI systems, designed to operationalize US (FDA PCCP) and EU (AI Act) change-control provisions. AEGIS comprises three modules (dataset assimilation and retraining, model monitoring, and conditional decision) that support safe iterative improvement of AI models. The framework is illustrated with sepsis prediction and brain tumor segmentation examples; across 11 simulated sepsis iterations it exercised all four deployment decision categories, raised ALARM signals when the released model was at risk, and detected drift before observable performance degradation.

Abstract

Machine learning systems deployed in medical devices require governance frameworks that ensure safety while enabling continuous improvement. Regulatory bodies including the FDA and the European Union have introduced mechanisms such as the Predetermined Change Control Plan (PCCP) and Post-Market Surveillance (PMS) to manage iterative model updates without repeated submissions. This paper presents AI/ML Evaluation and Governance Infrastructure for Safety (AEGIS), a governance framework applicable to any healthcare AI system. AEGIS comprises three modules (dataset assimilation and retraining, model monitoring, and conditional decision) that operationalize the FDA PCCP and EU AI Act Article 43(4) provisions. We implement a four-category deployment decision taxonomy (APPROVE, CONDITIONAL APPROVAL, CLINICAL REVIEW, REJECT) with an independent PMS ALARM signal, enabling detection of the critical state in which no deployable model exists while the released model is simultaneously at risk. To illustrate how AEGIS can be instantiated across heterogeneous clinical contexts, we provide two examples: sepsis prediction from electronic health records and brain tumor segmentation from medical imaging. Both cases use an identical governance architecture, differing only in configuration. Across 11 simulated iterations on the sepsis example, AEGIS yielded 8 APPROVE, 1 CONDITIONAL APPROVAL, 1 CLINICAL REVIEW, and 1 REJECT decision, exercising all four categories. ALARM signals were co-issued at iterations 8 and 10, including the critical state where no deployable model exists and the released model is simultaneously failing. AEGIS detected drift before observable performance degradation. These results demonstrate that AEGIS translates regulatory change-control concepts into executable governance procedures, supporting safe continuous learning for adaptive medical AI across diverse clinical applications.
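The decision taxonomy and the independent ALARM signal can be sketched as two separate checks whose combination flags the critical state. This is a hypothetical illustration: the thresholds, the single scalar candidate metric, the drift score, and the rule that only APPROVE and CONDITIONAL APPROVAL count as deployable are assumptions, not AEGIS's actual configuration.

```python
from dataclasses import dataclass
from enum import Enum

class Decision(Enum):
    APPROVE = "APPROVE"
    CONDITIONAL_APPROVAL = "CONDITIONAL APPROVAL"
    CLINICAL_REVIEW = "CLINICAL REVIEW"
    REJECT = "REJECT"

@dataclass
class GovernanceConfig:
    # Hypothetical thresholds; the paper configures governance per application.
    approve_min: float = 0.85       # candidate metric needed for APPROVE
    conditional_min: float = 0.80   # ... for CONDITIONAL APPROVAL
    review_min: float = 0.75        # ... for CLINICAL REVIEW; below this, REJECT
    drift_alarm: float = 0.2        # released-model drift score that raises ALARM

def decide(candidate_metric: float, cfg: GovernanceConfig) -> Decision:
    """Map a retrained candidate's validation metric to a deployment decision."""
    if candidate_metric >= cfg.approve_min:
        return Decision.APPROVE
    if candidate_metric >= cfg.conditional_min:
        return Decision.CONDITIONAL_APPROVAL
    if candidate_metric >= cfg.review_min:
        return Decision.CLINICAL_REVIEW
    return Decision.REJECT

def pms_alarm(released_drift: float, cfg: GovernanceConfig) -> bool:
    """Independent of the candidate decision: monitors the model already in use."""
    return released_drift >= cfg.drift_alarm

def is_critical(decision: Decision, alarm: bool) -> bool:
    """No deployable candidate while the released model is at risk.
    Assumption: only APPROVE and CONDITIONAL APPROVAL are deployable."""
    deployable = decision in (Decision.APPROVE, Decision.CONDITIONAL_APPROVAL)
    return alarm and not deployable

cfg = GovernanceConfig()
state = (decide(0.70, cfg), pms_alarm(0.35, cfg))
```

Keeping the alarm separate from the decision is what lets the sketch express the case where both signals fire together.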

ARXIV Cancer: brain tumor Method: Hyper-Connections

Hyper-Connections for Adaptive Multi-Modal MRI Brain Tumor Segmentation

Lokendra Kumar, Shubham Aggarwal
Published 2026-03-20 10:44

This study introduces Hyper-Connections (HC) for volumetric multi-modal brain tumor segmentation, using them as a drop-in replacement for fixed residual connections across five architectures. On the BraTS 2021 dataset, dynamic HC improves all 3D models, with mean Dice gains of up to +1.03 percent at negligible parameter overhead; gains are most pronounced in the Enhancing Tumor sub-region. HC-equipped models also develop sharper sensitivity to clinically dominant imaging sequences, a behavior absent in fixed-connection baselines.

Abstract

We present the first study of Hyper-Connections (HC) for volumetric multi-modal brain tumor segmentation, integrating them as a drop-in replacement for fixed residual connections across five architectures: nnU-Net, SwinUNETR, VT-UNet, U-Net, and U-Netpp. Dynamic HC consistently improves all 3D models on the BraTS 2021 dataset, yielding up to +1.03 percent mean Dice gain with negligible parameter overhead. Gains are most pronounced in the Enhancing Tumor sub-region, reflecting improved fine-grained boundary delineation. Modality ablation further reveals that HC-equipped models develop sharper sensitivity toward clinically dominant sequences, specifically T1ce for Tumor Core and Enhancing Tumor, and FLAIR for Whole Tumor, a behavior absent in fixed-connection baselines and consistent across all architectures. In 2D settings, improvements are smaller and configuration-sensitive, suggesting that volumetric spatial context amplifies the benefit of adaptive aggregation. These results establish HC as a simple, efficient, and broadly applicable mechanism for multi-modal feature fusion in medical image segmentation.
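A hyper-connection replaces the fixed residual `x + f(x)` with n parallel hidden streams mixed by learnable weights. The sketch below shows a static hyper-connection as we understand the general formulation: one weight vector mixes the streams into the layer input, one matrix mixes the streams among themselves, and one vector writes the layer output back to each stream. The names `a_m`, `a_r`, and `b` are our own labels; the dynamic HC evaluated in the paper additionally predicts these weights from the input, which is omitted here, as is the 3D convolutional block itself.

```python
import numpy as np

def hyper_connection(H, layer_fn, a_m, a_r, b):
    """One static hyper-connection around `layer_fn`.

    H:   (n, d) array of n parallel hidden streams.
    a_m: (n,)   weights mixing the streams into the layer input.
    a_r: (n, n) weights mixing the streams among themselves.
    b:   (n,)   weights distributing the layer output back to each stream.
    """
    h_in = a_m @ H                         # (d,) weighted sum of streams
    out = layer_fn(h_in)                   # (d,) stand-in for a network block
    return np.outer(b, out) + a_r.T @ H    # (n, d) updated streams

# Usage: expansion rate n = 2, streams initialised as copies of the input x.
rng = np.random.default_rng(0)
x = rng.normal(size=8)
H = np.tile(x, (2, 1))
W = rng.normal(size=(8, 8)) / np.sqrt(8)
layer = lambda h: np.maximum(W @ h, 0.0)   # stand-in for a residual block
H_next = hyper_connection(H, layer, a_m=np.array([1.0, 0.0]),
                          a_r=np.eye(2), b=np.array([1.0, 1.0]))
y = H_next.sum(axis=0)                     # collapse streams to one output
```

With n = 1 and all weights equal to 1, the function returns `f(h) + h`, i.e. an ordinary residual connection, which is why HC can serve as a drop-in replacement.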

ARXIV Cancer: triple-negative breast cancer Method: SAM adaptation

Prompt-Free Lightweight SAM Adaptation for Histopathology Nuclei Segmentation with Strong Cross-Dataset Generalization

Muhammad Hassan Maqsood, Yanming Zhu, Alfred Lam, Getamesay Dagnaw, Xuefei Yin, Alan Wee-Chung Liew
Published 2026-03-20 01:23

This study presents a prompt-free, lightweight adaptation of the Segment Anything Model (SAM) for histopathology nuclei segmentation, a key step in quantitative tissue analysis and cancer diagnosis. The method combines multi-level encoder features with residual decoding and fine-tunes only LoRA modules inside the frozen SAM encoder (4.1M trainable parameters). Experiments on three benchmark datasets (TNBC, MoNuSeg, and PanNuke) show state-of-the-art performance and strong cross-dataset generalization.

Abstract

Histopathology nuclei segmentation is crucial for quantitative tissue analysis and cancer diagnosis. Although existing segmentation methods have achieved strong performance, they are often computationally heavy and show limited generalization across datasets, which constrains their practical deployment. Recent SAM-based approaches have shown great potential in general and medical imaging, but typically rely on prompt guidance or complex decoders, making them less suitable for histopathology images with dense nuclei and heterogeneous appearances. We propose a prompt-free and lightweight SAM adaptation that leverages multi-level encoder features and residual decoding for accurate and efficient nuclei segmentation. The framework fine-tunes only LoRA modules within the frozen SAM encoder, requiring just 4.1M trainable parameters. Experiments on three benchmark datasets (TNBC, MoNuSeg, and PanNuke) demonstrate state-of-the-art performance and strong cross-dataset generalization, highlighting the effectiveness and practicality of the proposed framework for histopathology applications.
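Fine-tuning only LoRA modules means each adapted weight matrix stays frozen while a low-rank update is learned alongside it. A minimal NumPy sketch of one LoRA-wrapped linear layer follows; the rank, scaling factor, initialisation, and 768-dimensional example are illustrative assumptions, and which SAM encoder projections the paper adapts is not specified here.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, W: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = W                                        # frozen, shape (d_out, d_in)
        d_out, d_in = W.shape
        self.A = rng.normal(scale=0.01, size=(r, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, r))                     # trainable, zero-init: update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:     # x: (..., d_in)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size

rng = np.random.default_rng(1)
W = rng.normal(size=(768, 768))       # stand-in for one frozen encoder projection
layer = LoRALinear(W, r=4, alpha=8.0)
x = rng.normal(size=(10, 768))        # 10 tokens
y = layer(x)
```

Each adapted 768 × 768 projection adds only r × (768 + 768) trainable values (6,144 at r = 4) instead of 589,824, which is how a frozen encoder can be adapted with a small parameter budget.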

ARXIV Cancer: gastric cancer Method: vision-language model

Gastric-X: A Multimodal Multi-Phase Benchmark Dataset for Advancing Vision-Language Models in Gastric Cancer Analysis

Sheng Lu, Hao Chen, Rui Yin, Juyan Ba, Yu Zhang, Yuanzhe Li
Published 2026-03-19 22:47

The paper introduces Gastric-X, a large-scale multimodal benchmark dataset designed to advance vision-language models (VLMs) for gastric cancer analysis. It provides 1.7K cases, each combining resting and dynamic CT scans, an endoscopic image, structured biochemical indicators, expert diagnostic notes, and tumor bounding-box annotations, to reflect real clinical workflows. The study evaluates recent VLMs on five core tasks (visual question answering, report generation, cross-modal retrieval, disease classification, and lesion localization), probing both their performance and whether they can meaningfully correlate biochemical signals with spatial tumor features and textual reports.

Abstract

Recent vision-language models (VLMs) have shown strong generalization and multimodal reasoning abilities in natural domains. However, their application to medical diagnosis remains limited by the lack of comprehensive and structured datasets that capture real clinical workflows. To advance the development of VLMs for clinical applications, particularly in gastric cancer, we introduce Gastric-X, a large-scale multimodal benchmark for gastric cancer analysis providing 1.7K cases. Each case in Gastric-X includes paired resting and dynamic CT scans, an endoscopic image, a set of structured biochemical indicators, expert-authored diagnostic notes, and bounding box annotations of tumor regions, reflecting realistic clinical conditions. We systematically examine the capability of recent VLMs on five core tasks: Visual Question Answering (VQA), report generation, cross-modal retrieval, disease classification, and lesion localization. These tasks simulate critical stages of clinical workflow, from visual understanding and reasoning to multimodal decision support. Through this evaluation, we aim not only to assess model performance but also to probe the nature of VLM understanding: Can current VLMs meaningfully correlate biochemical signals with spatial tumor features and textual reports? We envision Gastric-X as a step toward aligning machine intelligence with the cognitive and evidential reasoning processes of physicians, and as a resource to inspire the development of next-generation medical VLMs.
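Lesion localization against bounding-box annotations is commonly scored with intersection-over-union (IoU). The helper below is a generic sketch for axis-aligned 2D or 3D boxes; the `(min_corner, max_corner)` box format and the name `box_iou` are assumptions, and the benchmark's actual localization metric may differ.

```python
import numpy as np

def box_iou(box_a, box_b) -> float:
    """IoU of two axis-aligned boxes given as (min_corner, max_corner).
    Works for 2D (x, y) or 3D (x, y, z) boxes."""
    a_min, a_max = map(np.asarray, box_a)
    b_min, b_max = map(np.asarray, box_b)
    # Per-axis overlap length, clipped at 0 when the boxes do not intersect.
    overlap = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter = overlap.prod()
    vol_a = (a_max - a_min).prod()
    vol_b = (b_max - b_min).prod()
    return float(inter / (vol_a + vol_b - inter))

pred = ([0, 0], [2, 2])
gt = ([1, 1], [3, 3])
iou = box_iou(pred, gt)
```

A prediction is often counted as a hit when IoU exceeds a threshold such as 0.5; that threshold is a common convention, not one stated for Gastric-X.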

ARXIV Cancer: brain tumor Method: latent bridge matching

TuLaBM: Tumor-Biased Latent Bridge Matching for Contrast-Enhanced MRI Synthesis

Atharva Rege, Adinath Madhavrao Dukre, Numan Balci, Dwarikanath Mahapatra, Imran Razzak
Published 2026-03-19 18:22

This study presents Tumor-Biased Latent Bridge Matching (TuLaBM), a method for synthesizing contrast-enhanced MRI (CE-MRI) from non-contrast MRI (NC-MRI) to support brain tumor assessment without gadolinium-based contrast agents. The approach formulates the translation as Brownian bridge transport in a learned latent space for efficient training and inference, and adds a Tumor-Biased Attention Mechanism and a boundary-aware loss to improve tumor-region fidelity. Experiments show that TuLaBM outperforms state-of-the-art baselines on both whole-image and tumor-region metrics, with inference under 0.097 seconds per image.

Abstract

Contrast-enhanced magnetic resonance imaging (CE-MRI) plays a crucial role in brain tumor assessment; however, its acquisition requires gadolinium-based contrast agents (GBCAs), which increase costs and raise safety concerns. Consequently, synthesizing CE-MRI from non-contrast MRI (NC-MRI) has emerged as a promising alternative. Early Generative Adversarial Network (GAN)-based approaches suffered from instability and mode collapse, while diffusion models, despite impressive synthesis quality, remain computationally expensive and often fail to faithfully reproduce critical tumor contrast patterns. To address these limitations, we propose Tumor-Biased Latent Bridge Matching (TuLaBM), which formulates NC-to-CE MRI translation as Brownian bridge transport between source and target distributions in a learned latent space, enabling efficient training and inference. To enhance tumor-region fidelity, we introduce a Tumor-Biased Attention Mechanism (TuBAM) that amplifies tumor-relevant latent features during bridge evolution, along with a boundary-aware loss that constrains tumor interfaces to improve margin sharpness. While bridge matching has been explored for medical image translation in pixel space, our latent formulation substantially reduces computational cost and inference time. Experiments on the BraTS2023-GLI (BraSyn) dataset and an in-house Cleveland Clinic liver MRI dataset show that TuLaBM consistently outperforms state-of-the-art baselines on both whole-image and tumor-region metrics, generalizes effectively to unseen liver MRI data in zero-shot and fine-tuned settings, and achieves inference times under 0.097 seconds per image.
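Brownian bridge transport pins a stochastic path to the source latent at t = 0 and the target latent at t = 1. The sketch below shows, for generic latent vectors, the training-time bridge sample, the drift regression target, and an Euler-Maruyama sampler. It is not TuLaBM's implementation: the noise scale `sigma` and all function names are assumptions, and the latent encoder, TuBAM, and the boundary-aware loss are omitted.

```python
import numpy as np

def bridge_sample(z0, z1, t, sigma, rng):
    """Sample z_t from a Brownian bridge pinned at z0 (t = 0) and z1 (t = 1)."""
    mean = (1.0 - t) * z0 + t * z1
    std = sigma * np.sqrt(t * (1.0 - t))      # variance vanishes at both ends
    return mean + std * rng.standard_normal(np.shape(z0))

def bridge_drift_target(z_t, z1, t):
    """Regression target for a drift network: pull z_t toward z1 in the remaining time."""
    return (z1 - z_t) / (1.0 - t)

def translate(z0, drift_fn, steps=10, sigma=0.0, rng=None):
    """Euler-Maruyama integration from a source latent toward the target domain."""
    z = np.array(z0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        z = z + dt * drift_fn(z, i * dt)
        if sigma > 0 and i < steps - 1:       # no noise on the final step
            z = z + sigma * np.sqrt(dt) * rng.standard_normal(z.shape)
    return z

rng = np.random.default_rng(0)
z0 = np.zeros(3)                              # stand-in NC-MRI latent
z1 = np.array([1.0, -2.0, 3.0])               # stand-in CE-MRI latent
oracle = lambda z, t: bridge_drift_target(z, z1, t)   # a trained network would replace this
z_hat = translate(z0, oracle, steps=10, sigma=0.5, rng=rng)
```

With the oracle drift the sampler lands on z1 regardless of the intermediate noise; in practice a network trained to regress `bridge_drift_target` on `bridge_sample` draws stands in for the oracle.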

ARXIV Cancer: general cancer Method: multimodal learning

Multimodal Model for Computational Pathology: Representation Learning and Image Compression

Peihang Wu, Zehong Chen, Lijian Xu
Published 2026-03-19 09:24

This paper reviews recent advances in multimodal computational pathology, focusing on how pathology images, clinical reports, and structured data can be analyzed jointly. It discusses challenges such as the extreme resolution of whole-slide images, limited expert annotations, and the difficulty of preserving interpretability. The authors analyze four research directions, including self-supervised representation learning with token compression and multi-agent collaborative reasoning, and argue that future progress depends on unified frameworks combining high-resolution visual data with clinical and biomedical knowledge.

Abstract

Whole slide imaging (WSI) has transformed digital pathology by enabling computational analysis of gigapixel histopathology images. Recent foundation model advances have accelerated progress in computational pathology, facilitating joint reasoning across pathology images, clinical reports, and structured data. Despite this progress, challenges remain: the extreme resolution of WSIs creates computational hurdles for visual learning; limited expert annotations constrain supervised approaches; integrating multimodal information while preserving biological interpretability remains difficult; and the opacity of modeling ultra-long visual sequences hinders clinical transparency. This review comprehensively surveys recent advances in multimodal computational pathology. We systematically analyze four research directions: (1) self-supervised representation learning and structure-aware token compression for WSIs; (2) multimodal data generation and augmentation; (3) parameter-efficient adaptation and reasoning-enhanced few-shot learning; and (4) multi-agent collaborative reasoning for trustworthy diagnosis. We specifically examine how token compression enables cross-scale modeling and how multi-agent mechanisms simulate a pathologist's "Chain of Thought" across magnifications to achieve uncertainty-aware evidence fusion. Finally, we discuss open challenges and argue that future progress depends on unified multimodal frameworks integrating high-resolution visual data with clinical and biomedical knowledge to support interpretable and safe AI-assisted diagnosis.

ARXIV Cancer: general cancer Method: multimodal large language models

CoDA: Exploring Chain-of-Distribution Attacks and Post-Hoc Token-Space Repair for Medical Vision-Language Models

Xiang Chen, Fangfang Yang, Chunlei Meng, Chengyin Hu, Ang Li, Yiwei Wei, Jiahuan Long, Jiujiang Guo
Published 2026-03-19 07:00

This paper introduces CoDA, a framework for probing the vulnerabilities of medical vision-language models (MVLMs) under realistic clinical workflows. It composes clinically plausible pipeline shifts (acquisition-like shading, reconstruction and display remapping, and delivery and export degradations) that alter image statistics while preserving visual plausibility. These shifts substantially degrade the zero-shot performance of CLIP-style MVLMs across brain MRI, chest X-ray, and abdominal CT, and the study also evaluates multimodal large language models as auditors of imaging realism and quality. Finally, a post-hoc token-space repair strategy is proposed that improves accuracy on CoDA-shifted outputs.

Abstract

Medical vision-language models (MVLMs) are increasingly used as perceptual backbones in radiology pipelines and as the visual front end of multimodal assistants, yet their reliability under real clinical workflows remains underexplored. Prior robustness evaluations often assume clean, curated inputs or study isolated corruptions, overlooking routine acquisition, reconstruction, display, and delivery operations that preserve clinical readability while shifting image statistics. To address this gap, we propose CoDA, a chain-of-distribution framework that constructs clinically plausible pipeline shifts by composing acquisition-like shading, reconstruction and display remapping, and delivery and export degradations. Under masked structural-similarity constraints, CoDA jointly optimizes stage compositions and parameters to induce failures while preserving visual plausibility. Across brain MRI, chest X-ray, and abdominal CT, CoDA substantially degrades the zero-shot performance of CLIP-style MVLMs, with chained compositions consistently more damaging than any single stage. We also evaluate multimodal large language models (MLLMs) as technical-authenticity auditors of imaging realism and quality rather than pathology. Proprietary multimodal models show degraded auditing reliability and persistent high-confidence errors on CoDA-shifted samples, while the medical-specific MLLMs we test exhibit clear deficiencies in medical image quality auditing. Finally, we introduce a post-hoc repair strategy based on teacher-guided token-space adaptation with patch-level alignment, which improves accuracy on archived CoDA outputs. Overall, our findings characterize a clinically grounded threat surface for MVLM deployment and show that lightweight alignment improves robustness in deployment.
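The chain-of-distribution idea, composing several individually mild pipeline stages into one shift, can be sketched as function composition. The three stages below (a multiplicative intensity ramp, a gamma remap, and intensity quantisation) are our own simplified stand-ins for acquisition shading, display remapping, and export degradation. Their parameters are fixed here, whereas CoDA optimizes stage compositions and parameters under a masked structural-similarity constraint, which this sketch does not implement.

```python
from functools import reduce

import numpy as np

def shading(img, strength=0.2):
    """Acquisition-like stand-in: smooth left-to-right intensity ramp."""
    w = img.shape[1]
    ramp = np.linspace(1.0 - strength, 1.0 + strength, w)[None, :]
    return np.clip(img * ramp, 0.0, 1.0)

def display_remap(img, gamma=1.3):
    """Display-like stand-in: gamma remapping of intensities in [0, 1]."""
    return img ** gamma

def export_quantize(img, levels=32):
    """Export-like stand-in: quantise intensities to a fixed number of levels."""
    return np.round(img * (levels - 1)) / (levels - 1)

def chain(*stages):
    """Compose stages left to right into one pipeline shift."""
    return lambda img: reduce(lambda x, stage: stage(x), stages, img)

rng = np.random.default_rng(0)
img = rng.random((16, 16))                 # stand-in image with intensities in [0, 1)
pipeline = chain(shading, display_remap, export_quantize)
shifted = pipeline(img)
# Crude change measure only; the paper constrains shifts with masked SSIM instead.
mean_abs_change = float(np.abs(shifted - img).mean())
```

Because `chain` accepts any ordered subset of stages, single-stage and multi-stage shifts can be compared with the same code, mirroring the paper's observation that chained compositions are more damaging than any single stage.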

ARXIV Cancer: colon cancer Method: flow matching

CAFlow: Adaptive-Depth Single-Step Flow Matching for Efficient Histopathology Super-Resolution

Elad Yoshai, Ariel D. Yoshai, Natan T. Shaked
Published 2026-03-19 05:45

The paper presents CAFlow, an adaptive-depth single-step flow-matching framework for efficient histopathology super-resolution. By routing each image tile to the shallowest network exit that preserves reconstruction quality, CAFlow saves 33% of compute at a cost of only 0.12 dB PSNR while remaining competitive with larger models. The method generalizes to held-out colon tissue with minimal quality loss and preserves clinically relevant structure in downstream nuclei segmentation.

Abstract

In digital pathology, whole-slide images routinely exceed gigapixel resolution, making computationally intensive generative super-resolution (SR) impractical for routine deployment. We introduce CAFlow, an adaptive-depth single-step flow-matching framework that routes each image tile to the shallowest network exit that preserves reconstruction quality. CAFlow performs flow matching in pixel-unshuffled rearranged space, reducing spatial computation by 16x while enabling direct inference. We show that dedicating half of training to exact t=0 samples is essential for single-step quality (-1.5 dB without it). The backbone, FlowResNet (1.90M parameters), mixes convolution and window self-attention blocks across four early exits spanning 3.1 to 13.3 GFLOPs. A lightweight exit classifier (~6K parameters) achieves 33% compute savings at only 0.12 dB cost. On multi-organ histopathology x4 SR, adaptive routing achieves 31.72 dB PSNR versus 31.84 dB at full depth, while the shallowest exit exceeds bicubic by +1.9 dB at 2.8x less compute than SwinIR-light. The method generalizes to held-out colon tissue with minimal quality loss (-0.02 dB), and at x8 upscaling it outperforms all comparable-compute baselines while remaining competitive with the much larger SwinIR-Medium model. Downstream nuclei segmentation confirms preservation of clinically relevant structure. The model trains in under 5 hours on a single GPU, and adaptive routing can reduce whole-slide inference from minutes to seconds.
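Pixel unshuffling folds each r × r block of pixels into channels, so a network operating on the result sees r² fewer spatial positions; with r = 4 that matches the 16x reduction quoted above (the factor 4 is our assumption). The sketch implements the rearrangement and its inverse in NumPy, plus a hypothetical `route` helper that picks the shallowest exit whose predicted quality drop is within a tolerance. CAFlow's actual exit classifier and its inputs are not reproduced.

```python
import numpy as np

def pixel_unshuffle(x: np.ndarray, r: int) -> np.ndarray:
    """(C, H, W) -> (C*r*r, H//r, W//r): fold each r x r pixel block into channels."""
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("H and W must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)        # (c, i, a, j, b)
    x = x.transpose(0, 2, 4, 1, 3)                # (c, a, b, i, j)
    return x.reshape(c * r * r, h // r, w // r)

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Inverse of pixel_unshuffle: (C*r*r, H', W') -> (C, H'*r, W'*r)."""
    crr, hr, wr = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, hr, wr).transpose(0, 3, 1, 4, 2)   # (c, i, a, j, b)
    return x.reshape(c, hr * r, wr * r)

def route(pred_drop_db, tol_db):
    """Hypothetical router: index of the shallowest exit whose predicted PSNR drop
    (relative to full depth) is within tol_db; falls back to the deepest exit."""
    for i, drop in enumerate(pred_drop_db):
        if drop <= tol_db:
            return i
    return len(pred_drop_db) - 1

rng = np.random.default_rng(0)
tile = rng.random((3, 256, 256))
packed = pixel_unshuffle(tile, 4)                 # (48, 64, 64)
spatial_reduction = (256 * 256) / (packed.shape[1] * packed.shape[2])
```

The rearrangement is lossless, so no information is discarded; the savings come purely from running convolutions and attention over fewer spatial positions.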

ARXIV Cancer: prostate cancer Method: transfer learning

Interpretable Prostate Cancer Detection using a Small Cohort of MRI Images

Vahid Monfared, Mohammad Hadi Gharib, Ali Sabri, Maryam Shahali, Farid Rashidi, Amit Mehta, Reza Rawassizadeh
Published 2026-03-19 03:40

This study presents an interpretable framework for automatic prostate cancer detection from a small dataset of 162 T2-weighted MRI images. The authors address data scarcity with transfer learning and augmentation and compare Vision Transformers, CNNs, and classical methods. A transfer-learned ResNet18 achieved the best reported performance, with 90.9% accuracy and 95.2% sensitivity. In a 22-case reader study, five radiologists reached a mean sensitivity of 67.5%, suggesting that AI-assisted screening could reduce missed cancers and improve consistency.

Abstract

Prostate cancer is a leading cause of mortality in men, yet interpretation of T2-weighted prostate MRI remains challenging due to subtle and heterogeneous lesions. We developed an interpretable framework for automatic cancer detection using a small dataset of 162 T2-weighted images (102 cancer, 60 normal), addressing data scarcity through transfer learning and augmentation. We performed a comprehensive comparison of Vision Transformers (ViT, Swin), CNNs (ResNet18), and classical methods (Logistic Regression, SVM, HOG+SVM). Transfer-learned ResNet18 achieved the best performance (90.9% accuracy, 95.2% sensitivity, AUC 0.905) with only 11M parameters, while Vision Transformers showed lower performance despite substantially higher complexity. Notably, HOG+SVM achieved comparable performance (AUC 0.917, slightly above ResNet18's 0.905), highlighting the effectiveness of handcrafted features in small datasets. Unlike state-of-the-art approaches relying on biparametric MRI (T2+DWI) and large cohorts, our method achieves competitive performance using only T2-weighted images, reducing acquisition complexity and computational cost. In a reader study of 22 cases, five radiologists achieved a mean sensitivity of 67.5% (Fleiss' kappa = 0.524), compared to 95.2% for the AI model, suggesting potential for AI-assisted screening to reduce missed cancers and improve consistency. Code and data are publicly available.
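Fleiss' kappa, used above to summarize agreement among the five radiologists, compares the observed per-case agreement with the agreement expected from the overall category proportions. A minimal NumPy sketch follows; the input layout (one row per case, one column per category, entries counting raters) is a common convention and an assumption here, and the study's reader data are not reproduced, so it cannot recover the reported 0.524.

```python
import numpy as np

def fleiss_kappa(counts) -> float:
    """counts[i, j] = number of raters assigning case i to category j.
    Every row must sum to the same number of raters n (n >= 2)."""
    counts = np.asarray(counts, dtype=float)
    n_cases = counts.shape[0]
    n = counts[0].sum()
    if not np.allclose(counts.sum(axis=1), n):
        raise ValueError("each case needs the same number of ratings")
    p_j = counts.sum(axis=0) / (n_cases * n)               # overall category proportions
    P_i = ((counts ** 2).sum(axis=1) - n) / (n * (n - 1))  # agreement within each case
    P_bar = P_i.mean()                                     # mean observed agreement
    P_e = (p_j ** 2).sum()                                 # agreement expected by chance
    return float((P_bar - P_e) / (1.0 - P_e))

# Usage shape for the reader study: 22 rows (cases) x 2 columns (cancer, normal),
# each row summing to 5 raters. Toy example with perfect agreement:
kappa = fleiss_kappa([[5, 0], [0, 5], [5, 0]])
```

Values near 0 indicate chance-level agreement and 1 indicates perfect agreement, so 0.524 reflects moderate consistency among the readers.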
