Research Papers

ARXIV Cancer: general cancer Method: vision-language pretraining

MedTri: A Platform for Structured Medical Report Normalization to Enhance Vision-Language Pretraining

Yuetan Chu, Xinhua Ma, Xinran Jin, Gongning Luo, Xin Gao
Published 2026-02-25 17:49

This study introduces MedTri, a normalization framework designed to enhance medical vision-language pretraining by converting free-text reports into structured triplets. The framework aims to reduce stylistic noise and irrelevant content while preserving essential anatomical information. Results demonstrate that this structured normalization significantly improves the quality of vision-language pretraining across various datasets, including X-ray and CT modalities. Additionally, the framework supports modular text-level augmentation strategies to further enhance robustness and generalization.

Read abstract

Medical vision-language pretraining increasingly relies on medical reports as large-scale supervisory signals; however, raw reports often exhibit substantial stylistic heterogeneity, variable length, and a considerable amount of image-irrelevant content. Although text normalization is frequently adopted as a preprocessing step in prior work, its design principles and empirical impact on vision-language pretraining remain insufficiently and systematically examined. In this study, we present MedTri, a deployable normalization framework for medical vision-language pretraining that converts free-text reports into a unified [Anatomical Entity: Radiologic Description + Diagnosis Category] triplet. This structured, anatomy-grounded normalization preserves essential morphological and spatial information while removing stylistic noise and image-irrelevant content, providing consistent and image-grounded textual supervision at scale. Across multiple datasets spanning both X-ray and computed tomography (CT) modalities, we demonstrate that structured, anatomy-grounded text normalization is an important factor in medical vision-language pretraining quality, yielding consistent improvements over raw reports and existing normalization baselines. In addition, we illustrate how this normalization can easily support modular text-level augmentation strategies, including knowledge enrichment and anatomy-grounded counterfactual supervision, which provide complementary gains in robustness and generalization without altering the core normalization process. Together, our results position structured text normalization as a critical and generalizable preprocessing component for medical vision-language learning, while MedTri provides this normalization platform. Code and data will be released at https://github.com/Arturia-Pendragon-Iris/MedTri.

ARXIV Cancer: brain tumor Method: vision transformer

Brain3D: Brain Report Automation via Inflated Vision Transformers in 3D

Mariano Barone, Francesco Di Serio, Giuseppe Riccio, Antonio Romano, Marco Postiglione, Antonino Ferraro, Vincenzo Moscato
Published 2026-02-25 16:46

The study presents Brain3D, a vision-language framework designed for automated radiology report generation from 3D brain tumor MRI. By inflating a pretrained 2D medical encoder into a 3D architecture, the model enhances neuroradiological interpretation by maintaining spatial context. Evaluated on a dataset of 468 subjects, Brain3D significantly outperforms a 2D baseline in generating clinically relevant reports, achieving a Clinical Pathology F1 score of 0.951.

Read abstract

Current medical vision-language models (VLMs) process volumetric brain MRI using 2D slice-based approximations, fragmenting the spatial context required for accurate neuroradiological interpretation. We developed \textbf{Brain3D}, a staged vision-language framework for automated radiology report generation from 3D brain tumor MRI. Our approach inflates a pretrained 2D medical encoder into a native 3D architecture and progressively aligns it with a causal language model through three stages: contrastive grounding, supervised projector warmup, and LoRA-based linguistic specialization. Unlike generalist 3D medical VLMs, \textbf{Brain3D} is tailored to neuroradiology, where hemispheric laterality, tumor infiltration patterns, and anatomical localization are critical. Evaluated on 468 subjects (BraTS pathological cases plus healthy controls), our model achieves a Clinical Pathology F1 of 0.951 versus 0.413 for a strong 2D baseline while maintaining perfect specificity on healthy scans. The staged alignment proves essential: contrastive grounding establishes visual-textual correspondence, projector warmup stabilizes conditioning, and LoRA adaptation shifts output from verbose captions to structured clinical reports\footnote{Our code is publicly available for transparency and reproducibility

ARXIV Cancer: general cancer Method: multi-scale patch-based denoising

PatchDenoiser: Parameter-efficient multi-scale patch learning and fusion denoiser for Low-dose CT imaging

Jitindra Fartiyal, Pedro Freire, Sergei K. Turitsyn, Sergei G. Solovski
Published 2026-02-25 15:08

The paper presents PatchDenoiser, a lightweight multi-scale patch-based denoising framework designed to improve the quality of low-dose CT images used in cancer diagnostics. It addresses the challenges of noise in low-dose imaging while preserving fine anatomical details, outperforming traditional CNN and GAN methods in terms of PSNR and SSIM. The proposed method is energy-efficient and significantly reduces computational complexity, making it suitable for clinical applications.

Read abstract

Low-dose CT images are essential for reducing radiation exposure in cancer screening, pediatric imaging, and longitudinal monitoring protocols, but their quality is often degraded by noise from low-dose acquisition, patient motion, or scanner limitations, affecting both clinical interpretation and downstream analysis. Traditional filtering approaches often over-smooth and lose fine anatomical details, while deep learning methods, including CNNs, GANs, and transformers, may struggle to preserve such details or require large, computationally expensive models, limiting clinical practicality. We propose PatchDenoiser, a lightweight, energy-efficient multi-scale patch-based denoising framework. It decomposes denoising into local texture extraction and global context aggregation, fused via a spatially aware patch fusion strategy. This design enables effective noise suppression while preserving fine structural and anatomical details. PatchDenoiser is ultra-lightweight, with far fewer parameters and lower computational complexity than CNN, GAN, and transformer based denoisers. On the 2016 Mayo Low-Dose CT dataset, PatchDenoiser consistently outperforms state-of-the-art CNN- and GAN-based methods in PSNR and SSIM. It is robust to variations in slice thickness, reconstruction kernels, and HU windows, generalizes across scanners without fine-tuning, and reduces parameters by ~9x and energy consumption per inference by ~27x compared with conventional CNN denoisers. PatchDenoiser thus provides a practical, scalable, and computationally efficient solution for medical image denoising, balancing performance, robustness, and clinical deployability.

ARXIV Cancer: brain tumor Method: U-Net

Brain Tumor Segmentation with Special Emphasis on the Non-Enhancing Brain Tumor Compartment

T. Schaffer, A. Brawanski, S. Wein, A. M. Tomé, E. W. Lang
Published 2026-02-25 09:09

This study presents a U-Net based deep learning architecture aimed at segmenting brain tumors from different MRI modalities. The focus is particularly on the non-enhancing tumor compartment, which has been overlooked in recent segmentation challenges. Accurately delineating this compartment is crucial as it correlates with patient survival time and potential tumor growth areas.

Read abstract

A U-Net based deep learning architecture is designed to segment brain tumors as they appear on various MRI modalities. Special emphasis is lent to the non-enhancing tumor compartment. The latter has not been considered anymore in recent brain tumor segmentation challenges like the MICCAI challenges. However, it is considered to be indicative of the survival time of the patient as well as of areas of further tumor growth. Hence it deems essential to have means to automatically delineate its extension within the tumor.

ARXIV Cancer: breast cancer Method: gradient boosting

Multimodal Survival Modeling and Fairness-Aware Clinical Machine Learning for 5-Year Breast Cancer Risk Prediction

Toktam Khatibi
Published 2026-02-25 07:20

This study presents a multimodal machine learning framework aimed at predicting 5-year overall survival in breast cancer patients. The framework integrates clinical variables with high-dimensional transcriptomic and copy-number alteration features from the METABRIC cohort. Two survival modeling approaches, CoxNet and XGBoost, were compared, demonstrating high performance in terms of area under the ROC curve and average precision. The study emphasizes the importance of calibration, fairness, and reproducibility in clinical machine learning applications.

Read abstract

Clinical risk prediction models often underperform in real-world settings due to poor calibration, limited transportability, and subgroup disparities. These challenges are amplified in high-dimensional multimodal cancer datasets characterized by complex feature interactions and a p >> n structure. We present a fully reproducible multimodal machine learning framework for 5-year overall survival prediction in breast cancer, integrating clinical variables with high-dimensional transcriptomic and copy-number alteration (CNA) features from the METABRIC cohort. After variance- and sparsity-based filtering and dimensionality reduction, models were trained using stratified train/validation/test splits with validation-based hyperparameter tuning. Two survival approaches were compared: an elastic-net regularized Cox model (CoxNet) and a gradient-boosted survival tree model implemented using XGBoost. CoxNet provides embedded feature selection and stable estimation, whereas XGBoost captures nonlinear effects and higher-order interactions. Performance was assessed using time-dependent area under the ROC curve (AUC), average precision (AP), calibration curves, Brier score, and bootstrapped 95 percent confidence intervals. CoxNet achieved validation and test AUCs of 98.3 and 96.6, with AP values of 90.1 and 80.4. XGBoost achieved validation and test AUCs of 98.6 and 92.5, with AP values of 92.5 and 79.9. Fairness diagnostics showed stable discrimination across age groups, estrogen receptor status, molecular subtypes, and menopausal state. This work introduces a governance-oriented multimodal survival framework emphasizing calibration, fairness auditing, robustness, and reproducibility for high-dimensional clinical machine learning.

ARXIV Cancer: general cancer Method: foundation model

CARE: A Molecular-Guided Foundation Model with Adaptive Region Modeling for Whole Slide Image Analysis

Di Zhang, Zhangpeng Gong, Xiaobo Pang, Jiashuai Liu, Junbo Lu, Hao Cui, Jiusong Ge, Zhi Zeng, Kai Yi, Yinghua Li, Si Liu, Tingsong Yu, Haoran Wang, Mireia Crispin-Ortuzar, Weimiao Yu, Chen Li, Zeyu Gao
Published 2026-02-25 07:01

The paper presents CARE, a foundation model designed for whole slide image analysis in computational pathology. It addresses the limitations of existing models by employing a two-stage pretraining strategy that incorporates both self-supervised learning from whole-slide images and cross-modal alignment with RNA and protein profiles. CARE demonstrates improved performance across various pathology tasks, achieving superior results with significantly less pretraining data compared to traditional models.

Read abstract

Foundation models have recently achieved impressive success in computational pathology, demonstrating strong generalization across diverse histopathology tasks. However, existing models overlook the heterogeneous and non-uniform organization of pathological regions of interest (ROIs) because they rely on natural image backbones not tailored for tissue morphology. Consequently, they often fail to capture the coherent tissue architecture beyond isolated patches, limiting interpretability and clinical relevance. To address these challenges, we present Cross-modal Adaptive Region Encoder (CARE), a foundation model for pathology that automatically partitions WSIs into several morphologically relevant regions. Specifically, CARE employs a two-stage pretraining strategy: (1) a self-supervised unimodal pretraining stage that learns morphological representations from 34,277 whole-slide images (WSIs) without segmentation annotations, and (2) a cross-modal alignment stage that leverages RNA and protein profiles to refine the construction and representation of adaptive regions. This molecular guidance enables CARE to identify biologically relevant patterns and generate irregular yet coherent tissue regions, selecting the most representative area as ROI. CARE supports a broad range of pathology-related tasks, using either the ROI feature or the slide-level feature obtained by aggregating adaptive regions. Based on only one-tenth of the pretraining data typically used by mainstream foundation models, CARE achieves superior average performance across 33 downstream benchmarks, including morphological classification, molecular prediction, and survival analysis, and outperforms other foundation model baselines overall.

ARXIV Cancer: intracranial tumors Method: vision-language models

Virtual Biopsy for Intracranial Tumors Diagnosis on MRI

Xinzhe Luo, Shuai Shao, Yan Wang, Jiangtao Wang, Yutong Bai, Jianguo Zhang
Published 2026-02-25 06:14

This study addresses the diagnostic challenges of deep intracranial tumors using a non-invasive MRI-based approach. The authors introduce the ICT-MRI dataset, a public benchmark with 249 biopsy-verified cases, and propose a Virtual Biopsy framework that includes a Tumor-Localizer and an Adaptive-Diagnoser. The framework achieves over 90% accuracy, significantly outperforming existing methods.

Read abstract

Deep intracranial tumors situated in eloquent brain regions controlling vital functions present critical diagnostic challenges. Clinical practice has shifted toward stereotactic biopsy for pathological confirmation before treatment. Yet biopsy carries inherent risks of hemorrhage and neurological deficits and struggles with sampling bias due to tumor spatial heterogeneity, because pathological changes are typically region-selective rather than tumor-wide. Therefore, advancing non-invasive MRI-based pathology prediction is essential for holistic tumor assessment and modern clinical decision-making. The primary challenge lies in data scarcity: low tumor incidence requires long collection cycles, and annotation demands biopsy-verified pathology from neurosurgical experts. Additionally, tiny lesion volumes lacking segmentation masks cause critical features to be overwhelmed by background noise. To address these challenges, we construct the ICT-MRI dataset - the first public biopsy-verified benchmark with 249 cases across four categories. We propose a Virtual Biopsy framework comprising: MRI-Processor for standardization; Tumor-Localizer employing vision-language models for coarse-to-fine localization via weak supervision; and Adaptive-Diagnoser with a Masked Channel Attention mechanism fusing local discriminative features with global contexts. Experiments demonstrate over 90% accuracy, outperforming baselines by more than 20%.

ARXIV Cancer: hepatocellular carcinoma Method: Graph Convolutional Networks

VasGuideNet: Vascular Topology-Guided Couinaud Liver Segmentation with Structural Contrastive Loss

Chaojie Shen, Jingjun Gu, Zihao Zhao, Ruocheng Li, Cunyuan Yang, Jiajun Bu, Lei Wu
Published 2026-02-25 03:50

This study presents VasGuideNet, a novel framework for Couinaud liver segmentation that incorporates vascular topology to enhance accuracy in preoperative surgical planning and tumor localization. The method utilizes Graph Convolutional Networks to encode vascular features and integrates them into a 3D encoder-decoder architecture with a cross-attention fusion module. The proposed approach demonstrates superior performance compared to existing segmentation models, achieving high Dice scores and improved anatomical consistency.

Read abstract

Accurate Couinaud liver segmentation is critical for preoperative surgical planning and tumor localization.However, existing methods primarily rely on image intensity and spatial location cues, without explicitly modeling vascular topology. As a result, they often produce indistinct boundaries near vessels and show limited generalization under anatomical variability.We propose VasGuideNet, the first Couinaud segmentation framework explicitly guided by vascular topology. Specifically, skeletonized vessels, Euclidean distance transform (EDT)--derived geometry, and k-nearest neighbor (kNN) connectivity are encoded into topology features using Graph Convolutional Networks (GCNs). These features are then injected into a 3D encoder--decoder backbone via a cross-attention fusion module. To further improve inter-class separability and anatomical consistency, we introduce a Structural Contrastive Loss (SCL) with a global memory bank.On Task08_HepaticVessel and our private LASSD dataset, VasGuideNet achieves Dice scores of 83.68% and 76.65% with RVDs of 1.68 and 7.08, respectively. It consistently outperforms representative baselines including UNETR, Swin UNETR, and G-UNETR++, delivering higher Dice/mIoU and lower RVD across datasets, demonstrating its effectiveness for anatomically consistent segmentation. Code is available at https://github.com/Qacket/VasGuideNet.git.

ARXIV Cancer: breast cancer Method: knowledge distillation

Momentum Memory for Knowledge Distillation in Computational Pathology

Yongxin Guo, Hao Lu, Onur C. Koyun, Zhengjie Zhu, Muhammet Fatih Demir, Metin Nafi Gurcan
Published 2026-02-24 21:51

This paper presents Momentum Memory Knowledge Distillation (MoMKD), a novel framework aimed at enhancing the integration of genomics and histopathology for cancer diagnosis. The method addresses the limitations of existing knowledge distillation techniques by utilizing a momentum-updated memory to improve the supervisory context for histopathology models. Experimental results on the TCGA-BRCA benchmark indicate that MoMKD outperforms current state-of-the-art methods, demonstrating its effectiveness in histology-only inference.

Read abstract

Multimodal learning that integrates genomics and histopathology has shown strong potential in cancer diagnosis, yet its clinical translation is hindered by the limited availability of paired histology-genomics data. Knowledge distillation (KD) offers a practical solution by transferring genomic supervision into histopathology models, enabling accurate inference using histology alone. However, existing KD methods rely on batch-local alignment, which introduces instability due to limited within-batch comparisons and ultimately degrades performance. To address these limitations, we propose Momentum Memory Knowledge Distillation (MoMKD), a cross-modal distillation framework driven by a momentum-updated memory. This memory aggregates genomic and histopathology information across batches, effectively enlarging the supervisory context available to each mini-batch. Furthermore, we decouple the gradients of the genomics and histology branches, preventing genomic signals from dominating histology feature learning during training and eliminating the modality-gap issue at inference time. Extensive experiments on the TCGA-BRCA benchmark (HER2, PR, and ODX classification tasks) and an independent in-house testing dataset demonstrate that MoMKD consistently outperforms state-of-the-art MIL and multimodal KD baselines, delivering strong performance and generalization under histology-only inference. Overall, MoMKD establishes a robust and generalizable knowledge distillation paradigm for computational pathology.

ARXIV Cancer: general cancer Method: small language models

Small Language Models for Privacy-Preserving Clinical Information Extraction in Low-Resource Languages

Mohammadreza Ghaffarzadeh-Esfahani, Nahid Yousefian, Ebrahim Heidari-Farsani, Ali Akbar Omidvarian, Sepehr Ghahraei, Atena Farangi, AmirBahador Boroumand
Published 2026-02-24 21:10

This study evaluates a two-step pipeline for extracting clinical information from Persian medical transcripts using small language models (SLMs) and a translation model. The approach focuses on binary extraction of clinical features from anonymized transcripts collected at a cancer palliative care call center. The results indicate that larger models outperform smaller ones in terms of sensitivity and overall performance metrics, while the bilingual analysis shows improved extraction capabilities when translating from Persian to English.

Read abstract

Extracting clinical information from medical transcripts in low-resource languages remains a significant challenge in healthcare natural language processing (NLP). This study evaluates a two-step pipeline combining Aya-expanse-8B as a Persian-to-English translation model with five open-source small language models (SLMs) -- Qwen2.5-7B-Instruct, Llama-3.1-8B-Instruct, Llama-3.2-3B-Instruct, Qwen2.5-1.5B-Instruct, and Gemma-3-1B-it -- for binary extraction of 13 clinical features from 1,221 anonymized Persian transcripts collected at a cancer palliative care call center. Using a few-shot prompting strategy without fine-tuning, models were assessed on macro-averaged F1-score, Matthews Correlation Coefficient (MCC), sensitivity, and specificity to account for class imbalance. Qwen2.5-7B-Instruct achieved the highest overall performance (median macro-F1: 0.899; MCC: 0.797), while Gemma-3-1B-it showed the weakest results. Larger models (7B--8B parameters) consistently outperformed smaller counterparts in sensitivity and MCC. A bilingual analysis of Aya-expanse-8B revealed that translating Persian transcripts to English improved sensitivity, reduced missing outputs, and boosted metrics robust to class imbalance, though at the cost of slightly lower specificity and precision. Feature-level results showed reliable extraction of physiological symptoms across most models, whereas psychological complaints, administrative requests, and complex somatic features remained challenging. These findings establish a practical, privacy-preserving blueprint for deploying open-source SLMs in multilingual clinical NLP settings with limited infrastructure and annotation resources, and highlight the importance of jointly optimizing model scale and input language strategy for sensitive healthcare applications.

Find the papers that actually matter