Research Papers

ARXIV Cancer: lung cancer Method: latent causal representation learning

LungCRCT: Causal Representation based Lung CT Processing for Lung Cancer Treatment

Daeyoung Kim
Published 2026-01-26 04:03

This study presents LungCRCT, a novel framework for lung cancer analysis that uses latent causal representation learning to model lung cancer progression. The method incorporates graph autoencoder based causal discovery algorithms to support causal intervention analysis and improve tumor classification. The framework reaches an AUC score of 93.91% on malignant tumor classification.

Because it is often asymptomatic in its early stages, lung cancer has been one of the leading causes of mortality among cancer patients worldwide. Moreover, its major symptoms are hard to distinguish from those of other respiratory diseases such as COPD, which further leads patients to overlook cancer progression in the early stages. Thus, to improve lung cancer survival rates, early detection through consistent, proactive respiratory system monitoring is crucial. One of the most prevalent and effective methods for lung cancer monitoring is the low-dose computed tomography (LDCT) chest scan, which, together with the rapid advancement and application of computer vision based AI models such as EfficientNet and ResNet in image processing, has led to remarkable improvements in lung cancer detection and tumor classification tasks. However, although advanced CNN models under transfer learning and ViT based models have achieved high-performing lung cancer detection, their intrinsic limitations, namely dependence on correlation and low interpretability due to model complexity, still limit the extension of deep learning models to lung cancer treatment analysis and causal intervention analysis simulations. Therefore, this research introduces LungCRCT: a latent causal representation learning based lung cancer analysis framework that retrieves causal representations of factors within the physical causal mechanism of lung cancer progression. Using advanced graph autoencoder based causal discovery algorithms with distance correlation disentanglement and entropy-based image reconstruction refinement, LungCRCT not only enables causal intervention analysis for lung cancer treatments, but also yields robust yet extremely lightweight downstream models for malignant tumor classification, with an AUC score of 93.91%.
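
The abstract names distance correlation as the disentanglement criterion but does not describe how it is applied. As a rough illustration of the statistic itself, and not of LungCRCT's loss or code, the sample distance correlation between two sets of paired observations can be sketched as follows. The function name and the NumPy-based implementation are assumptions made for this example.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y.

    Illustrative sketch only; not taken from the LungCRCT paper.
    """
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    def centered_distances(z):
        # Pairwise Euclidean distances, then double centering.
        d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
        return (d - d.mean(axis=0, keepdims=True)
                  - d.mean(axis=1, keepdims=True) + d.mean())

    a, b = centered_distances(x), centered_distances(y)
    dcov2 = (a * b).mean()                      # squared distance covariance
    dvar_x, dvar_y = (a * a).mean(), (b * b).mean()
    if dvar_x == 0 or dvar_y == 0:
        return 0.0                              # a constant variable carries no dependence
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y)))
```

A value near 0 indicates little dependence between two latent factors and a value of 1 indicates an exact linear relationship, which is why a penalty on this quantity can encourage latent factors to separate.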

ARXIV Cancer: renal cell carcinoma Method: quantitative systems pharmacology

Quantitative cancer-immunity cycle modeling to optimize bevacizumab and atezolizumab combination therapy for advanced renal cell carcinoma

Lei Du, Chenghang Li, Jinzhi Lei
Published 2026-01-25 03:13

This study presents a Quantitative Cancer-Immunity Cycle (QCIC) model designed to optimize combination therapy with bevacizumab and atezolizumab for advanced renal cell carcinoma (RCC). The model integrates ordinary differential equations and stochastic modeling to predict tumor evolution and enhance treatment strategies. By utilizing clinical immunohistochemistry data, the model aims to identify optimal treatment regimens and relevant tumor biomarkers, contributing to precision medicine in RCC.

The incidence of advanced renal cell carcinoma (RCC) has been rising, presenting significant challenges due to the limited efficacy and severe side effects of traditional radiotherapy and chemotherapy. While combination immunotherapies show promise, optimizing treatment strategies remains difficult due to individual heterogeneity. To address this, we developed a Quantitative Cancer-Immunity Cycle (QCIC) model that integrates ordinary differential equations with stochastic modeling to quantitatively characterize and predict tumor evolution in patients with advanced RCC. By systematically integrating quantitative systems pharmacology principles with biological mechanistic knowledge, we constructed a virtual patient cohort and calibrated the model parameters using clinical immunohistochemistry data to ensure biological validity. To enhance predictive performance, we coupled the model with pharmacokinetic equations and defined the Tumor Response Index (TRI) as a quantitative metric of efficacy. Systematic analysis of the QCIC model allowed us to determine an optimal treatment regimen for the combination of bevacizumab and atezolizumab and identify tumor biomarkers with clinical predictive value. This study provides a theoretical framework and methodological support for precision medicine in the treatment of advanced RCC.

ARXIV Cancer: general cancer Method: vision transformer

Stylizing ViT: Anatomy-Preserving Instance Style Transfer for Domain Generalization

Sebastian Doerrich, Francesco Di Salvo, Jonas Alle, Christian Ledig
Published 2026-01-24 20:53

This paper presents Stylizing ViT, a novel Vision Transformer encoder designed to enhance domain generalization in medical image analysis. The method utilizes weight-shared attention blocks for self- and cross-attention, allowing for anatomical consistency while performing style transfer. The results indicate a significant improvement in robustness and accuracy during both training and inference across histopathology and dermatology tasks.

Deep learning models in medical image analysis often struggle with generalizability across domains and demographic groups due to data heterogeneity and scarcity. Traditional augmentation improves robustness, but fails under substantial domain shifts. Recent advances in stylistic augmentation enhance domain generalization by varying image styles, but they either offer limited style diversity or introduce artifacts into the generated images. To address these limitations, we propose Stylizing ViT, a novel Vision Transformer encoder that utilizes weight-shared attention blocks for both self- and cross-attention. This design allows the same attention block to maintain anatomical consistency through self-attention while performing style transfer via cross-attention. We assess the effectiveness of our method for domain generalization by employing it for data augmentation on three distinct image classification tasks in the context of histopathology and dermatology. Results demonstrate improved robustness (up to +13% accuracy) over the state of the art while generating perceptually convincing images without artifacts. Additionally, we show that Stylizing ViT is effective beyond training, achieving a 17% performance improvement during inference when used for test-time augmentation. The source code is available at https://github.com/sdoerrich97/stylizing-vit.

ARXIV Cancer: general cancer Method: CPath-CLIP

Lost in Translation: How Language Re-Aligns Vision for Cross-Species Pathology

Ekansh Arora
Published 2026-01-24 18:49

This study explores the application of the CPath-CLIP model in computational pathology for cancer detection across different species and cancer types. The research demonstrates that few-shot fine-tuning can enhance performance in both same-cancer and cross-cancer scenarios, although cross-species performance remains below optimal benchmarks. The introduction of Semantic Anchoring is proposed as a method to improve visual feature alignment through language, addressing issues of semantic collapse. Overall, the findings highlight the importance of language in enhancing model performance in cross-species pathology.

Foundation models are increasingly applied to computational pathology, yet their behavior under cross-cancer and cross-species transfer remains unspecified. This study investigated how fine-tuning CPath-CLIP affects cancer detection under same-cancer, cross-cancer, and cross-species conditions using whole-slide image patches from canine and human histopathology. Performance was measured using area under the receiver operating characteristic curve (AUC). Few-shot fine-tuning improved same-cancer (64.9% to 72.6% AUC) and cross-cancer performance (56.84% to 66.31% AUC). Cross-species evaluation revealed that while tissue matching enables meaningful transfer, performance remains below state-of-the-art benchmarks (H-optimus-0: 84.97% AUC), indicating that standard vision-language alignment is suboptimal for cross-species generalization. Embedding space analysis revealed extremely high cosine similarity (greater than 0.99) between tumor and normal prototypes. Grad-CAM shows prototype-based models remain domain-locked, while language-guided models attend to conserved tumor morphology. To address this, we introduce Semantic Anchoring, which uses language to provide a stable coordinate system for visual features. Ablation studies reveal that benefits stem from the text-alignment mechanism itself, regardless of text encoder complexity. Benchmarking against H-optimus-0 shows that CPath-CLIP's failure stems from intrinsic embedding collapse, which text alignment effectively circumvents. Additional gains were observed in same-cancer (8.52%) and cross-cancer classification (5.67%). We identified a previously uncharacterized failure mode: semantic collapse driven by species-dominated alignment rather than missing visual information. These results demonstrate that language acts as a control mechanism, enabling semantic re-interpretation without retraining.
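
The embedding collapse described above (cosine similarity greater than 0.99 between tumor and normal prototypes) can be checked with a few lines of code. The sketch below assumes that a class prototype is the mean of that class's patch embeddings, which the abstract does not state, and it uses synthetic toy embeddings rather than CPath-CLIP outputs. All function and variable names are hypothetical.

```python
import numpy as np

def class_prototype(embeddings):
    """Mean embedding of one class, L2-normalized (assumed prototype definition)."""
    proto = np.asarray(embeddings, dtype=float).mean(axis=0)
    return proto / np.linalg.norm(proto)

def prototype_cosine(tumor_embeddings, normal_embeddings):
    """Cosine similarity between the tumor and normal class prototypes."""
    return float(class_prototype(tumor_embeddings) @ class_prototype(normal_embeddings))

# Synthetic toy data: a large shared component (standing in for species or
# stain signal) plus small per-patch noise. Not real model embeddings.
rng = np.random.default_rng(0)
shared = rng.normal(size=64) * 10.0
tumor = shared + rng.normal(scale=0.1, size=(50, 64))
normal = shared + rng.normal(scale=0.1, size=(50, 64))
sim = prototype_cosine(tumor, normal)
```

When a shared component dominates the embeddings, as in this toy setup, the two prototypes point in almost the same direction and a nearest-prototype classifier has little margin to separate tumor from normal tissue.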

ARXIV Cancer: brain tumor Method: Bayesian Multi-Modal Deep Supervision Network

BMDS-Net: A Bayesian Multi-Modal Deep Supervision Network for Robust Brain Tumor Segmentation

Yan Zhou, Zhen Huang, Yingqiu Li, Yue Ouyang, Suncheng Xiang, Zehua Wang
Published 2026-01-24 16:06

This paper presents BMDS-Net, a Bayesian Multi-Modal Deep Supervision Network designed for robust brain tumor segmentation from multi-modal MRI. The framework addresses critical issues such as sensitivity to missing modalities and confidence calibration, which are essential for clinical applications. The proposed method integrates a multimodal contextual fusion module and a Bayesian fine-tuning strategy to enhance stability and provide uncertainty maps for clinicians. Experiments on the BraTS 2021 dataset demonstrate that BMDS-Net achieves competitive accuracy while maintaining robustness in scenarios with missing modalities.

Accurate brain tumor segmentation from multi-modal magnetic resonance imaging (MRI) is a prerequisite for precise radiotherapy planning and surgical navigation. While recent Transformer-based models such as Swin UNETR have achieved impressive benchmark performance, their clinical utility is often compromised by two critical issues: sensitivity to missing modalities (common in clinical practice) and a lack of confidence calibration. Merely chasing higher Dice scores on idealized data fails to meet the safety requirements of real-world medical deployment. In this work, we propose BMDS-Net, a unified framework that prioritizes clinical robustness and trustworthiness over simple metric maximization. Our contribution is three-fold. First, we construct a robust deterministic backbone by integrating a Zero-Init Multimodal Contextual Fusion (MMCF) module and a Residual-Gated Deep Decoder Supervision (DDS) mechanism, enabling stable feature learning and precise boundary delineation with significantly reduced Hausdorff Distance, even under modality corruption. Second, and most importantly, we introduce a memory-efficient Bayesian fine-tuning strategy that transforms the network into a probabilistic predictor, providing voxel-wise uncertainty maps to highlight potential errors for clinicians. Third, comprehensive experiments on the BraTS 2021 dataset demonstrate that BMDS-Net not only maintains competitive accuracy but, more importantly, exhibits superior stability in missing-modality scenarios where baseline models fail. The source code is publicly available at https://github.com/RyanZhou168/BMDS-Net.

ARXIV Cancer: colorectal cancer Method: Geometric Prior-guided Module

Learning with Geometric Priors in U-Net Variants for Polyp Segmentation

Fabian Vazquez, Jose A. Nuñez, Diego Adame, Alissen Moreno, Augustin Zhan, Huimin Li, Jinghao Yang, Haoteng Tang, Bin Fu, Pengfei Gu
Published 2026-01-24 06:27

This study presents a novel Geometric Prior-guided Module (GPM) designed to enhance polyp segmentation in colorectal cancer detection. The GPM integrates geometric priors into U-Net-based architectures, improving the model's ability to capture structural cues in challenging colonoscopy images. The method was fine-tuned using the Visual Geometry Grounded Transformer on a simulated dataset, leading to significant performance improvements across multiple public datasets.

Accurate and robust polyp segmentation is essential for early colorectal cancer detection and for computer-aided diagnosis. While convolutional neural network-, Transformer-, and Mamba-based U-Net variants have achieved strong performance, they still struggle to capture geometric and structural cues, especially in low-contrast or cluttered colonoscopy scenes. To address this challenge, we propose a novel Geometric Prior-guided Module (GPM) that injects explicit geometric priors into U-Net-based architectures for polyp segmentation. Specifically, we fine-tune the Visual Geometry Grounded Transformer (VGGT) on a simulated ColonDepth dataset to estimate depth maps of polyp images tailored to the endoscopic domain. These depth maps are then processed by GPM to encode geometric priors into the encoder's feature maps, where they are further refined using spatial and channel attention mechanisms that emphasize both local spatial and global channel information. GPM is plug-and-play and can be seamlessly integrated into diverse U-Net variants. Extensive experiments on five public polyp segmentation datasets demonstrate consistent gains over three strong baselines. Code and the generated depth maps are available at: https://github.com/fvazqu/GPM-PolypSeg

ARXIV Cancer: general cancer Method: large language model

Standardizing Longitudinal Radiology Report Evaluation via Large Language Model Annotation

Xinyi Wang, Grazziela Figueredo, Ruizhe Li, Xin Chen
Published 2026-01-23 13:57

This study presents a large language model (LLM)-based pipeline designed to automatically annotate longitudinal information in radiology reports, which is essential for monitoring disease progression. The authors evaluate five mainstream LLMs and select Qwen2.5-32B for its efficiency and performance, ultimately annotating a large dataset of radiology reports. The proposed method outperforms existing annotation solutions, achieving significant improvements in F1-scores for detecting longitudinal information and tracking diseases.

Longitudinal information in radiology reports refers to the sequential tracking of findings across multiple examinations over time, which is crucial for monitoring disease progression and guiding clinical decisions. Many recent automated radiology report generation methods are designed to capture longitudinal information; however, validating their performance is challenging. There is no proper tool to consistently label temporal changes in both ground-truth and model-generated texts for meaningful comparisons. Existing annotation methods are typically labor-intensive, relying on the use of manual lexicons and rules. Complex rules are closed-source, domain specific and hard to adapt, whereas overly simple ones tend to miss essential specialised information. Large language models (LLMs) offer a promising annotation alternative, as they are capable of capturing nuanced linguistic patterns and semantic similarities without extensive manual intervention. They also adapt well to new contexts. In this study, we therefore propose an LLM-based pipeline to automatically annotate longitudinal information in radiology reports. The pipeline first identifies sentences containing relevant information and then extracts the progression of diseases. We evaluate and compare five mainstream LLMs on these two tasks using 500 manually annotated reports. Considering both efficiency and performance, Qwen2.5-32B was subsequently selected and used to annotate another 95,169 reports from the public MIMIC-CXR dataset. Our Qwen2.5-32B-annotated dataset provided us with a standardized benchmark for evaluating report generation models. Using this new benchmark, we assessed seven state-of-the-art report generation models. Our LLM-based annotation method outperforms existing annotation solutions, achieving 11.3% and 5.3% higher F1-scores for longitudinal information detection and disease tracking, respectively.

ARXIV Cancer: general cancer Method: multi-agent reasoning

AgentsEval: Clinically Faithful Evaluation of Medical Imaging Reports via Multi-Agent Reasoning

Suzhong Fu, Jingqi Dong, Xuan Ding, Rui Sun, Yiming Yang, Shuguang Cui, Zhen Li
Published 2026-01-23 11:59

The paper presents AgentsEval, a multi-agent stream reasoning framework designed to evaluate the clinical correctness and reasoning fidelity of medical imaging reports. It addresses the limitations of existing evaluation methods by providing a structured approach that includes criteria definition, evidence extraction, alignment, and consistency scoring. Experimental results indicate that AgentsEval offers clinically aligned and interpretable evaluations, enhancing the reliability of medical report generation systems.

Evaluating the clinical correctness and reasoning fidelity of automatically generated medical imaging reports remains a critical yet unresolved challenge. Existing evaluation methods often fail to capture the structured diagnostic logic that underlies radiological interpretation, resulting in unreliable judgments and limited clinical relevance. We introduce AgentsEval, a multi-agent stream reasoning framework that emulates the collaborative diagnostic workflow of radiologists. By dividing the evaluation process into interpretable steps including criteria definition, evidence extraction, alignment, and consistency scoring, AgentsEval provides explicit reasoning traces and structured clinical feedback. We also construct a multi-domain perturbation-based benchmark covering five medical report datasets with diverse imaging modalities and controlled semantic variations. Experimental results demonstrate that AgentsEval delivers clinically aligned, semantically faithful, and interpretable evaluations that remain robust under paraphrastic, semantic, and stylistic perturbations. This framework represents a step toward transparent and clinically grounded assessment of medical report generation systems, fostering trustworthy integration of large language models into clinical practice.

ARXIV Cancer: brain tumor Method: spiking neural networks

Reliable Brain Tumor Segmentation Based on Spiking Neural Networks with Efficient Training

Aurora Pia Ghiardelli, Guangzhi Tang, Tao Sun
Published 2026-01-23 11:16

This study presents a framework for 3D brain tumor segmentation utilizing spiking neural networks (SNNs) that emphasizes energy efficiency and reliability. The method incorporates a multi-view ensemble approach to enhance segmentation robustness and provides voxel-wise uncertainty estimation. The training process is optimized using Forward Propagation Through Time (FPTT), resulting in significant reductions in computational costs while maintaining competitive accuracy.

We propose a reliable and energy-efficient framework for 3D brain tumor segmentation using spiking neural networks (SNNs). A multi-view ensemble of sagittal, coronal, and axial SNN models provides voxel-wise uncertainty estimation and enhances segmentation robustness. To address the high computational cost in training SNN models for semantic image segmentation, we employ Forward Propagation Through Time (FPTT), which maintains temporal learning efficiency with significantly reduced computational cost. Experiments on the Multimodal Brain Tumor Segmentation Challenges (BraTS 2017 and BraTS 2023) demonstrate competitive accuracy, well-calibrated uncertainty, and an 87% reduction in FLOPs, underscoring the potential of SNNs for reliable, low-power medical IoT and Point-of-Care systems.
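
The abstract does not say how the three view-specific models are combined or which uncertainty measure is used. One common choice, shown here purely as an assumption, is to average the per-view class probabilities and take the entropy of the averaged distribution at each voxel. The function name and array layout below are hypothetical.

```python
import numpy as np

def ensemble_uncertainty(view_probs):
    """Fuse per-view predictions and score voxel-wise uncertainty.

    view_probs: array of shape (n_views, n_classes, D, H, W) holding each
    view's class probabilities, already resampled to a common volume grid.
    Returns (labels, entropy), each of shape (D, H, W).
    Sketch under stated assumptions; not the paper's implementation.
    """
    probs = np.asarray(view_probs, dtype=float)
    mean_probs = probs.mean(axis=0)            # average over sagittal/coronal/axial
    labels = mean_probs.argmax(axis=0)         # fused segmentation
    # Predictive entropy: high where views disagree or are individually unsure.
    entropy = -(mean_probs * np.log(mean_probs + 1e-12)).sum(axis=0)
    return labels, entropy
```

Voxels where all three views agree confidently get entropy close to zero, while voxels where the views disagree get higher entropy and can be flagged for review.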

ARXIV Cancer: general cancer Method: hybrid encoder-decoder architecture

PanopMamba: Vision State Space Modeling for Nuclei Panoptic Segmentation

Ming Kang, Fung Fung Ting, Raphaël C.-W. Phan, Zongyuan Ge, Chee-Ming Ting
Published 2026-01-23 10:33

This paper presents PanopMamba, a novel hybrid encoder-decoder architecture designed for nuclei panoptic segmentation in histopathology images. The method addresses challenges such as detecting small objects and handling ambiguous boundaries by integrating Mamba and Transformer architectures with state space modeling. Experimental results on benchmark datasets demonstrate the effectiveness of PanopMamba in improving segmentation performance compared to existing methods.

Nuclei panoptic segmentation supports cancer diagnostics by integrating both semantic and instance segmentation of different cell types to analyze overall tissue structure and individual nuclei in histopathology images. Major challenges include detecting small objects, handling ambiguous boundaries, and addressing class imbalance. To address these issues, we propose PanopMamba, a novel hybrid encoder-decoder architecture that integrates Mamba and Transformer with additional feature-enhanced fusion via state space modeling. We design a multiscale Mamba backbone and a State Space Model (SSM)-based fusion network to enable efficient long-range perception in pyramid features, thereby extending the pure encoder-decoder framework while facilitating information sharing across multiscale features of nuclei. The proposed SSM-based feature-enhanced fusion integrates pyramid feature networks and dynamic feature enhancement across different spatial scales, enhancing the feature representation of densely overlapping nuclei in both semantic and spatial dimensions. To the best of our knowledge, this is the first Mamba-based approach for panoptic segmentation. Additionally, we introduce alternative evaluation metrics, including image-level Panoptic Quality (iPQ), boundary-weighted PQ (wPQ), and frequency-weighted PQ (fwPQ), which are specifically designed to address the unique challenges of nuclei segmentation and thereby mitigate the potential bias inherent in vanilla PQ. Experimental evaluations on two multiclass nuclei segmentation benchmark datasets, MoNuSAC2020 and NuInsSeg, demonstrate the superiority of PanopMamba for nuclei panoptic segmentation over state-of-the-art methods. Consequently, the robustness of PanopMamba is validated across various metrics, while the distinctiveness of PQ variants is also demonstrated. Code is available at https://github.com/mkang315/PanopMamba.
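
For readers unfamiliar with the baseline metric, vanilla PQ for a single class is the sum of IoUs over matched prediction/ground-truth pairs divided by the number of true positives plus half the false positives and half the false negatives. The sketch below covers only this standard definition; the iPQ, wPQ, and fwPQ variants are defined in the paper and are not reproduced here. The function name and signature are assumptions made for this example.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Vanilla Panoptic Quality for one class.

    matched_ious: IoU of each matched (prediction, ground truth) pair,
    where a match conventionally requires IoU > 0.5.
    """
    num_true_positives = len(matched_ious)
    denominator = (num_true_positives
                   + 0.5 * num_false_positives
                   + 0.5 * num_false_negatives)
    if denominator == 0:
        return 0.0  # no predictions and no ground truth for this class
    return sum(matched_ious) / denominator
```

For example, two matched nuclei with IoUs of 0.8 and 0.6, one spurious prediction, and one missed nucleus give 1.4 / 3, roughly 0.467.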
