Research Papers

ARXIV Cancer: unknown Method: XGBoost

Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet

Biswajit Sadhu, Kalpak Gupte, Trijit Sadhu, S. Anand
Published 2026-02-23 08:12

This study presents an interpolation-assisted machine learning framework for plume shine dose estimation, addressing challenges in radiation dose assessment. The authors developed high-resolution training data from discrete datasets and evaluated three models: Random Forest, XGBoost, and TabNet. Results indicated that XGBoost achieved the highest prediction accuracy, with interpretability analysis revealing differences in how models utilized input features. A web-based GUI was also created for practical deployment and scenario evaluation.

Read abstract

Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in selecting suitable architectures for physics-dominated systems. Within this context, rapid and accurate plume shine dose estimation serves as a practical test case, as it is critical for nuclear facility safety assessment and radiological emergency response, while conventional photon-transport-based calculations remain computationally expensive. In this work, an interpolation-assisted ML framework was developed using discrete dose datasets generated with the pyDOSEIA suite for 17 gamma-emitting radionuclides across varying downwind distances, release heights, and atmospheric stability categories. The datasets were augmented using shape-preserving interpolation to construct dense, high-resolution training data. Two tree-based ML models (Random Forest and XGBoost) and one deep learning (DL) model (TabNet) were evaluated to examine predictive performance and sensitivity to dataset resolution. All models showed higher prediction accuracy with the interpolated high-resolution dataset than with the discrete data; however, XGBoost consistently achieved the highest accuracy. Interpretability analysis using permutation importance (tree-based models) and attention-based feature attribution (TabNet) revealed that performance differences stem from how the models utilize input features. Tree-based models focus mainly on dominant geometry-dispersion features (release height, stability category, and downwind distance), treating radionuclide identity as a secondary input, whereas TabNet distributes attention more broadly across multiple variables. For practical deployment, a web-based GUI was developed for interactive scenario evaluation and transparent comparison with photon-transport reference calculations.

ARXIV Cancer: general cancer Method: large vision-language models

CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers

Yannian Gu, Xizhuo Zhang, Linjie Mu, Yongrui Yu, Zhongzhen Huang, Shaoting Zhang, Xiaofan Zhang
Published 2026-02-23 07:19

The paper presents CT-Flow, an innovative framework designed to enhance the interpretation of 3D CT scans by integrating multi-modal radiological reasoning. It shifts from traditional static inference to a dynamic, tool-mediated workflow that allows radiologists to iteratively refine their findings. Experimental results indicate that CT-Flow significantly improves diagnostic accuracy and tool invocation success rates compared to baseline models.

Read abstract

Recent advances in Large Vision-Language Models (LVLMs) have shown strong potential for multi-modal radiological reasoning, particularly in tasks like diagnostic visual question answering (VQA) and radiology report generation. However, most existing approaches for 3D CT analysis largely rely on static, single-pass inference. In practice, clinical interpretation is a dynamic, tool-mediated workflow where radiologists iteratively review slices and use measurement, radiomics, and segmentation tools to refine findings. To bridge this gap, we propose CT-Flow, an agentic framework designed for interoperable volumetric interpretation. By leveraging the Model Context Protocol (MCP), CT-Flow shifts from closed-box inference to an open, tool-aware paradigm. We curate CT-FlowBench, the first large-scale instruction-tuning benchmark tailored for 3D CT tool-use and multi-step reasoning. Built upon this, CT-Flow functions as a clinical orchestrator capable of decomposing complex natural language queries into automated tool-use sequences. Experimental evaluations on CT-FlowBench and standard 3D VQA datasets demonstrate that CT-Flow achieves state-of-the-art performance, surpassing baseline models by 41% in diagnostic accuracy and achieving a 95% success rate in autonomous tool invocation. This work provides a scalable foundation for integrating autonomous, agentic intelligence into real-world clinical radiology.

ARXIV Cancer: brain tumor Method: hypothesis-driven test-time adaptation

HD-TTA: Hypothesis-Driven Test-Time Adaptation for Safer Brain Tumor Segmentation

Kartik Jhawar, Lipo Wang
Published 2026-02-23 02:53

The paper presents Hypothesis-Driven Test-Time Adaptation (HD-TTA), a framework designed to enhance the safety of brain tumor segmentation in medical imaging. By reformulating adaptation as a dynamic decision process, the method generates competing geometric hypotheses to optimize segmentation outcomes. Validation on a cross-domain binary brain tumor segmentation task shows that HD-TTA significantly improves safety-oriented metrics while maintaining comparable performance in other evaluation criteria.

Read abstract

Standard Test-Time Adaptation (TTA) methods typically treat inference as a blind optimization task, applying generic objectives to all or filtered test samples. In safety-critical medical segmentation, this lack of selectivity often causes the tumor mask to spill into healthy brain tissue or degrades predictions that were already correct. We propose Hypothesis-Driven TTA, a novel framework that reformulates adaptation as a dynamic decision process. Rather than forcing a single optimization trajectory, our method generates intuitive competing geometric hypotheses: compaction (is the prediction noisy? trim artifacts) versus inflation (is the valid tumor under-segmented? safely inflate to recover). It then employs a representation-guided selector to autonomously identify the safest outcome based on intrinsic texture consistency. Additionally, a pre-screening Gatekeeper prevents negative transfer by skipping adaptation on confident cases. We validate this proof-of-concept on a cross-domain binary brain tumor segmentation task, applying a source model trained on adult BraTS gliomas to unseen pediatric and more challenging meningioma target domains. HD-TTA improves safety-oriented outcomes (Hausdorff Distance (HD95) and Precision) over several state-of-the-art representative baselines in the challenging safety regime, reducing the HD95 by approximately 6.4 mm and improving Precision by over 4%, while maintaining comparable Dice scores. These results demonstrate that resolving the safety-adaptation trade-off via explicit hypothesis selection is a viable, robust path for safe clinical model deployment. Code will be made publicly available upon acceptance.

ARXIV Cancer: non-small cell lung cancer Method: unknown

Time-Varying Hazard Patterns and Co-Mutation Profiles of KRAS G12C and G12D in Real-World NSCLC

Robert Amevor, Dennis Baidoo, Emmanuel Kubuafor
Published 2026-02-22 18:07

This study investigates the time-to-next-treatment (TTNT) and overall survival (OS) differences between KRAS G12C and G12D mutations in non-small cell lung cancer (NSCLC). Using de-identified data, the authors applied various statistical models to analyze the effects of these mutations on patient outcomes. The results indicate that while G12D shows an early TTNT advantage and improved OS, late-period differences in TTNT were not significant. These findings suggest the need for further validation in larger cohorts to support targeted therapies for G12D.

Read abstract

Background: KRAS mutations are the largest oncogenic subset in NSCLC. While KRAS G12C is now targetable, no approved therapies exist for G12D. We examined time-to-next-treatment (TTNT) and overall survival (OS) differences between G12C and G12D, allowing for time-varying hazard effects. Methods: De-identified data from AACR Project GENIE BPC NSCLC v2.0-public were analyzed. TTNT served as a real-world surrogate for progression-free survival. Co-mutations (TP53, STK11, KEAP1, SMARCA4, MET), TMB, and PD-L1 were harmonized. Kaplan-Meier, multivariable Cox, and a pre-specified piecewise Cox model (split at median TTNT = 23 months) were applied. Schoenfeld residuals assessed proportional hazards; bootstrap resampling (B=1000) evaluated stability. Results: Among 162 TTNT-evaluable patients (G12C n=130; G12D n=32), median TTNT was 28.6 versus 32.0 months (log-rank p=0.79). Adjusted Cox regression showed no overall hazard difference (HR=0.85; 95% CI 0.53-1.37; p=0.50), but Schoenfeld testing indicated borderline non-proportionality (p=0.053). Piecewise Cox modeling revealed time-varying effects: early TTNT hazard favored G12D (HR=0.41; 95% CI 0.17-0.97; p=0.043) with significant KRAS x period interaction (HR=3.33; p=0.021) and late-period attenuation (HR=1.38; 95% CI 0.77-2.47; p=0.285). Bootstrap resampling confirmed this pattern (median HRearly=0.39; HRlate=1.41). Among 278 OS-evaluable patients (133 deaths), G12D showed improved OS (adjusted HR=0.63; 95% CI 0.39-0.99; p=0.048). G12C tumors exhibited higher TMB (9.79 vs 7.83 mut/Mb; p=0.002) and greater STK11/KEAP1 enrichment. Conclusions: KRAS G12D demonstrated early TTNT advantage and improved OS. Late-period TTNT differences were non-significant (post-hoc power: 12.3%). These exploratory findings require validation in larger cohorts but support allele-specific therapeutic development for G12D.

ARXIV Cancer: prostate cancer Method: knowledge distillation

GUIDE-US: Grade-Informed Unpaired Distillation of Encoder Knowledge from Histopathology to Micro-UltraSound

Emma Willis, Tarek Elghareb, Paul F. R. Wilson, Minh Nguyen Nhat To, Mohammad Mahdi Abootorabi, Amoon Jamzad, Brian Wodlinger, Parvin Mousavi, Purang Abolmaesumi
Published 2026-02-22 02:02

This study presents a novel unpaired histopathology knowledge-distillation strategy aimed at non-invasive grading of prostate cancer using micro-ultrasound. The method trains a micro-ultrasound encoder to replicate the embedding distribution of a pretrained histopathology model, without requiring patient-level pairing or image registration. The results demonstrate a significant improvement in sensitivity to clinically significant prostate cancer compared to existing methods, enhancing overall cancer risk stratification from imaging.

Read abstract

Purpose: Non-invasive grading of prostate cancer (PCa) from micro-ultrasound (micro-US) could expedite triage and guide biopsies toward the most aggressive regions, yet current models struggle to infer tissue micro-structure at coarse imaging resolutions. Methods: We introduce an unpaired histopathology knowledge-distillation strategy that trains a micro-US encoder to emulate the embedding distribution of a pretrained histopathology foundation model, conditioned on International Society of Urological Pathology (ISUP) grades. Training requires no patient-level pairing or image registration, and histopathology inputs are not used at inference. Results: Compared to the current state of the art, our approach increases sensitivity to clinically significant PCa (csPCa) at 60% specificity by 3.5% and improves overall sensitivity at 60% specificity by 1.2%. Conclusion: By enabling earlier and more dependable cancer risk stratification solely from imaging, our method advances clinical feasibility. Source code will be publicly released upon publication.

ARXIV Cancer: general cancer Method: multi-instance learning

AINet: Anchor Instances Learning for Regional Heterogeneity in Whole Slide Image

Tingting Zheng, Hongxun Yao, Kui Jiang, Sicheng Zhao, Yi Xiao
Published 2026-02-21 09:36

This paper presents AINet, a novel framework designed to enhance multi-instance learning (MIL) for whole slide image (WSI) analysis in cancer diagnostics. The method introduces anchor instances (AIs) to address the challenges posed by tumor sparsity and morphological diversity, enabling better aggregation of discriminative representations. The framework includes a dual-level anchor mining module and an anchor-guided region correction module, which together improve the performance of existing MIL frameworks while reducing computational complexity.

Read abstract

Recent advances in multi-instance learning (MIL) have witnessed impressive performance in whole slide image (WSI) analysis. However, the inherent sparsity of tumors and their morphological diversity lead to obvious heterogeneity across regions, posing significant challenges in aggregating high-quality and discriminative representations. To address this, we introduce a novel concept of anchor instance (AI), a compact subset of instances that are representative within their regions (local) and discriminative at the bag (global) level. These AIs act as semantic references to guide interactions across regions, correcting non-discriminative patterns while preserving regional diversity. Specifically, we propose a dual-level anchor mining (DAM) module to \textbf{select} AIs from massive instances, where the most informative AI in each region is extracted by assessing its similarity to both local and global embeddings. Furthermore, to ensure completeness and diversity, we devise an anchor-guided region correction (ARC) module that explores the complementary information from all regions to \textbf{correct} each regional representation. Building upon DAM and ARC, we develop a concise yet effective framework, AINet, which employs a simple predictor and surpasses state-of-the-art methods with substantially fewer FLOPs and parameters. Moreover, both DAM and ARC are modular and can be seamlessly integrated into existing MIL frameworks, consistently improving their performance.

ARXIV Cancer: general cancer Method: Zero-Shot Multiple-Instance Learning

Initialization matters in few-shot adaptation of vision-language models for histopathological image classification

Pablo Meseguer, Rocío del Amor, Valery Naranjo
Published 2026-02-21 09:08

This study investigates the impact of classifier weight initialization on the performance of few-shot learning in the context of histopathological image classification using vision-language models. The authors propose a novel approach called Zero-Shot Multiple-Instance Learning (ZS-MIL) to enhance classification accuracy by leveraging class-level embeddings from the text encoder. Experimental results indicate that ZS-MIL outperforms traditional weight initialization methods, demonstrating improved robustness in subtyping predictions.

Read abstract

Vision language models (VLM) pre-trained on datasets of histopathological image-caption pairs enabled zero-shot slide-level classification. The ability of VLM image encoders to extract discriminative features also opens the door for supervised fine-tuning for whole-slide image (WSI) classification, ideally using few labeled samples. Slide-level prediction frameworks require the incorporation of multiple instance learning (MIL) due to the gigapixel size of the WSI. Following patch-level feature extraction and aggregation, MIL frameworks rely on linear classifiers trained on top of the slide-level aggregated features. Classifier weight initialization has a large influence on Linear Probing performance in efficient transfer learning (ETL) approaches based on few-shot learning. In this work, we propose Zero-Shot Multiple-Instance Learning (ZS-MIL) to address the limitations of random classifier initialization that underperform zero-shot prediction in MIL problems. ZS-MIL uses the class-level embeddings of the VLM text encoder as the classification layer's starting point to compute each sample's bag-level probabilities. Through multiple experiments, we demonstrate the robustness of ZS-MIL compared to well-known weight initialization techniques both in terms of performance and variability in an ETL few-shot scenario for subtyping prediction.

ARXIV Cancer: unknown Method: XGBoost

Benchmarking Computational Pathology Foundation Models For Semantic Segmentation

Lavish Ramchandani, Aashay Tinaikar, Dev Kumar Das, Rohit Garg, Tijo Thomas
Published 2026-02-21 08:00

This study benchmarks ten foundational models for pixel-level semantic segmentation in histopathology, focusing on both tissue-region and cellular/nuclear segmentation tasks. The authors utilize attention maps from these models as pixel-wise features, classified using the XGBoost algorithm. Results indicate that the vision language model CONCH outperforms others, with ensemble methods yielding improved segmentation performance across various datasets.

Read abstract

In recent years, foundation models such as CLIP, DINO,and CONCH have demonstrated remarkable domain generalization and unsupervised feature extraction capabilities across diverse imaging tasks. However, systematic and independent evaluations of these models for pixel-level semantic segmentation in histopathology remain scarce. In this study, we propose a robust benchmarking approach to asses 10 foundational models on four histopathological datasets covering both morphological tissue-region and cellular/nuclear segmentation tasks. Our method leverages attention maps of foundation models as pixel-wise features, which are then classified using a machine learning algorithm, XGBoost, enabling fast, interpretable, and model-agnostic evaluation without finetuning. We show that the vision language foundation model, CONCH performed the best across datasets when compared to vision-only foundation models, with PathDino as close second. Further analysis shows that models trained on distinct histopathology cohorts capture complementary morphological representations, and concatenating their features yields superior segmentation performance. Concatenating features from CONCH, PathDino and CellViT outperformed individual models across all the datasets by 7.95% (averaged across the datasets), suggesting that ensembles of foundation models can better generalize to diverse histopathological segmentation tasks.

ARXIV Cancer: unknown Method: deep learning

RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis

Chris Tomy, Mo Vali, David Pertzborn, Tammam Alamatouri, Anna Mühlig, Orlando Guntinas-Lichius, Anna Xylander, Eric Michele Fantuzzi, Matteo Negro, Francesco Crisafi, Pietro Lio, Tiago Azevedo
Published 2026-02-20 10:18

This study presents RamanSeg, an interpretable deep learning model designed for cancer diagnosis using Raman spectroscopy. The model was trained on a dataset of spatial Raman spectra aligned with tumor annotations, achieving a mean foreground Dice score of 80.9%. RamanSeg offers two variants that balance interpretability and performance, with the projection-free version outperforming a U-Net baseline. The results indicate a significant advancement over traditional black-box methods in cancer diagnostics.

Read abstract

Histopathology, the current gold standard for cancer diagnosis, involves the manual examination of tissue samples after chemical staining, a time-consuming process requiring expert analysis. Raman spectroscopy is an alternative, stain-free method of extracting information from samples. Using nnU-Net, we trained a segmentation model on a novel dataset of spatial Raman spectra aligned with tumour annotations, achieving a mean foreground Dice score of 80.9%, surpassing previous work. Furthermore, we propose a novel, interpretable, prototype-based architecture called RamanSeg. RamanSeg classifies pixels based on discovered regions of the training set, generating a segmentation mask. Two variants of RamanSeg allow a trade-off between interpretability and performance: one with prototype projection and another projection-free version. The projection-free RamanSeg outperformed a U-Net baseline with a mean foreground Dice score of 67.3%, offering a meaningful improvement over a black-box training approach.

ARXIV Cancer: prostate cancer Method: reinforcement learning

Promptable segmentation with region exploration enables minimal-effort expert-level prostate cancer delineation

Junqing Yang, Natasha Thorley, Ahmed Nadeem Abbasi, Shonit Punwani, Zion Tse, Yipeng Hu, Shaheer U. Saeed
Published 2026-02-19 20:29

This study presents a novel framework for the segmentation of prostate cancer in magnetic resonance images, aiming to reduce the effort required for expert-level delineation. The method integrates reinforcement learning with a region-growing process, guided by user-provided point prompts, to enhance segmentation accuracy while minimizing manual annotation. Evaluation on two public datasets demonstrated that the framework outperformed existing automated methods and achieved performance comparable to manual segmentation, significantly reducing annotation time.

Read abstract

Purpose: Accurate segmentation of prostate cancer on magnetic resonance (MR) images is crucial for planning image-guided interventions such as targeted biopsies, cryoablation, and radiotherapy. However, subtle and variable tumour appearances, differences in imaging protocols, and limited expert availability make consistent interpretation difficult. While automated methods aim to address this, they rely on large expertly-annotated datasets that are often inconsistent, whereas manual delineation remains labour-intensive. This work aims to bridge the gap between automated and manual segmentation through a framework driven by user-provided point prompts, enabling accurate segmentation with minimal annotation effort. Methods: The framework combines reinforcement learning (RL) with a region-growing segmentation process guided by user prompts. Starting from an initial point prompt, region-growing generates a preliminary segmentation, which is iteratively refined through RL. At each step, the RL agent observes the image and current segmentation to predict a new point, from which region growing updates the mask. A reward, balancing segmentation accuracy and voxel-wise uncertainty, encourages exploration of ambiguous regions, allowing the agent to escape local optima and perform sample-specific optimisation. Despite requiring fully supervised training, the framework bridges manual and fully automated segmentation at inference by substantially reducing user effort while outperforming current fully automated methods. Results: The framework was evaluated on two public prostate MR datasets (PROMIS and PICAI, with 566 and 1090 cases). It outperformed the previous best automated methods by 9.9% and 8.9%, respectively, with performance comparable to manual radiologist segmentation, reducing annotation time tenfold.

Find the papers that actually matter