Log in to save searches and build a personal reading queue.
Find the papers that actually matter
Search by concept, cancer type, source, or modeling approach. Every result is presented in a cleaner, review-friendly layout with summaries and direct access to the abstract.
Interpolation-Driven Machine Learning Approaches for Plume Shine Dose Estimation: A Comparison of XGBoost, Random Forest, and TabNet
Read abstract
Despite the success of machine learning (ML) in surrogate modeling, its use in radiation dose assessment is limited by safety-critical constraints, scarce training-ready data, and challenges in selecting suitable architectures for physics-dominated systems. Within this context, rapid and accurate plume shine dose estimation serves as a practical test case, as it is critical for nuclear facility safety assessment and radiological emergency response, while conventional photon-transport-based calculations remain computationally expensive. In this work, an interpolation-assisted ML framework was developed using discrete dose datasets generated with the pyDOSEIA suite for 17 gamma-emitting radionuclides across varying downwind distances, release heights, and atmospheric stability categories. The datasets were augmented using shape-preserving interpolation to construct dense, high-resolution training data. Two tree-based ML models (Random Forest and XGBoost) and one deep learning (DL) model (TabNet) were evaluated to examine predictive performance and sensitivity to dataset resolution. All models showed higher prediction accuracy with the interpolated high-resolution dataset than with the discrete data; however, XGBoost consistently achieved the highest accuracy. Interpretability analysis using permutation importance (tree-based models) and attention-based feature attribution (TabNet) revealed that performance differences stem from how the models utilize input features. Tree-based models focus mainly on dominant geometry-dispersion features (release height, stability category, and downwind distance), treating radionuclide identity as a secondary input, whereas TabNet distributes attention more broadly across multiple variables. For practical deployment, a web-based GUI was developed for interactive scenario evaluation and transparent comparison with photon-transport reference calculations.
CT-Flow: Orchestrating CT Interpretation Workflow with Model Context Protocol Servers
Read abstract
Recent advances in Large Vision-Language Models (LVLMs) have shown strong potential for multi-modal radiological reasoning, particularly in tasks like diagnostic visual question answering (VQA) and radiology report generation. However, most existing approaches for 3D CT analysis largely rely on static, single-pass inference. In practice, clinical interpretation is a dynamic, tool-mediated workflow where radiologists iteratively review slices and use measurement, radiomics, and segmentation tools to refine findings. To bridge this gap, we propose CT-Flow, an agentic framework designed for interoperable volumetric interpretation. By leveraging the Model Context Protocol (MCP), CT-Flow shifts from closed-box inference to an open, tool-aware paradigm. We curate CT-FlowBench, the first large-scale instruction-tuning benchmark tailored for 3D CT tool-use and multi-step reasoning. Built upon this, CT-Flow functions as a clinical orchestrator capable of decomposing complex natural language queries into automated tool-use sequences. Experimental evaluations on CT-FlowBench and standard 3D VQA datasets demonstrate that CT-Flow achieves state-of-the-art performance, surpassing baseline models by 41% in diagnostic accuracy and achieving a 95% success rate in autonomous tool invocation. This work provides a scalable foundation for integrating autonomous, agentic intelligence into real-world clinical radiology.
HD-TTA: Hypothesis-Driven Test-Time Adaptation for Safer Brain Tumor Segmentation
Read abstract
Standard Test-Time Adaptation (TTA) methods typically treat inference as a blind optimization task, applying generic objectives to all or filtered test samples. In safety-critical medical segmentation, this lack of selectivity often causes the tumor mask to spill into healthy brain tissue or degrades predictions that were already correct. We propose Hypothesis-Driven TTA, a novel framework that reformulates adaptation as a dynamic decision process. Rather than forcing a single optimization trajectory, our method generates intuitive competing geometric hypotheses: compaction (is the prediction noisy? trim artifacts) versus inflation (is the valid tumor under-segmented? safely inflate to recover). It then employs a representation-guided selector to autonomously identify the safest outcome based on intrinsic texture consistency. Additionally, a pre-screening Gatekeeper prevents negative transfer by skipping adaptation on confident cases. We validate this proof-of-concept on a cross-domain binary brain tumor segmentation task, applying a source model trained on adult BraTS gliomas to unseen pediatric and more challenging meningioma target domains. HD-TTA improves safety-oriented outcomes (Hausdorff Distance (HD95) and Precision) over several state-of-the-art representative baselines in the challenging safety regime, reducing the HD95 by approximately 6.4 mm and improving Precision by over 4%, while maintaining comparable Dice scores. These results demonstrate that resolving the safety-adaptation trade-off via explicit hypothesis selection is a viable, robust path for safe clinical model deployment. Code will be made publicly available upon acceptance.
Time-Varying Hazard Patterns and Co-Mutation Profiles of KRAS G12C and G12D in Real-World NSCLC
Read abstract
Background: KRAS mutations are the largest oncogenic subset in NSCLC. While KRAS G12C is now targetable, no approved therapies exist for G12D. We examined time-to-next-treatment (TTNT) and overall survival (OS) differences between G12C and G12D, allowing for time-varying hazard effects. Methods: De-identified data from AACR Project GENIE BPC NSCLC v2.0-public were analyzed. TTNT served as a real-world surrogate for progression-free survival. Co-mutations (TP53, STK11, KEAP1, SMARCA4, MET), TMB, and PD-L1 were harmonized. Kaplan-Meier, multivariable Cox, and a pre-specified piecewise Cox model (split at median TTNT = 23 months) were applied. Schoenfeld residuals assessed proportional hazards; bootstrap resampling (B=1000) evaluated stability. Results: Among 162 TTNT-evaluable patients (G12C n=130; G12D n=32), median TTNT was 28.6 versus 32.0 months (log-rank p=0.79). Adjusted Cox regression showed no overall hazard difference (HR=0.85; 95% CI 0.53-1.37; p=0.50), but Schoenfeld testing indicated borderline non-proportionality (p=0.053). Piecewise Cox modeling revealed time-varying effects: early TTNT hazard favored G12D (HR=0.41; 95% CI 0.17-0.97; p=0.043) with significant KRAS x period interaction (HR=3.33; p=0.021) and late-period attenuation (HR=1.38; 95% CI 0.77-2.47; p=0.285). Bootstrap resampling confirmed this pattern (median HRearly=0.39; HRlate=1.41). Among 278 OS-evaluable patients (133 deaths), G12D showed improved OS (adjusted HR=0.63; 95% CI 0.39-0.99; p=0.048). G12C tumors exhibited higher TMB (9.79 vs 7.83 mut/Mb; p=0.002) and greater STK11/KEAP1 enrichment. Conclusions: KRAS G12D demonstrated early TTNT advantage and improved OS. Late-period TTNT differences were non-significant (post-hoc power: 12.3%). These exploratory findings require validation in larger cohorts but support allele-specific therapeutic development for G12D.
GUIDE-US: Grade-Informed Unpaired Distillation of Encoder Knowledge from Histopathology to Micro-UltraSound
Read abstract
Purpose: Non-invasive grading of prostate cancer (PCa) from micro-ultrasound (micro-US) could expedite triage and guide biopsies toward the most aggressive regions, yet current models struggle to infer tissue micro-structure at coarse imaging resolutions. Methods: We introduce an unpaired histopathology knowledge-distillation strategy that trains a micro-US encoder to emulate the embedding distribution of a pretrained histopathology foundation model, conditioned on International Society of Urological Pathology (ISUP) grades. Training requires no patient-level pairing or image registration, and histopathology inputs are not used at inference. Results: Compared to the current state of the art, our approach increases sensitivity to clinically significant PCa (csPCa) at 60% specificity by 3.5% and improves overall sensitivity at 60% specificity by 1.2%. Conclusion: By enabling earlier and more dependable cancer risk stratification solely from imaging, our method advances clinical feasibility. Source code will be publicly released upon publication.
AINet: Anchor Instances Learning for Regional Heterogeneity in Whole Slide Image
Read abstract
Recent advances in multi-instance learning (MIL) have witnessed impressive performance in whole slide image (WSI) analysis. However, the inherent sparsity of tumors and their morphological diversity lead to obvious heterogeneity across regions, posing significant challenges in aggregating high-quality and discriminative representations. To address this, we introduce a novel concept of anchor instance (AI), a compact subset of instances that are representative within their regions (local) and discriminative at the bag (global) level. These AIs act as semantic references to guide interactions across regions, correcting non-discriminative patterns while preserving regional diversity. Specifically, we propose a dual-level anchor mining (DAM) module to \textbf{select} AIs from massive instances, where the most informative AI in each region is extracted by assessing its similarity to both local and global embeddings. Furthermore, to ensure completeness and diversity, we devise an anchor-guided region correction (ARC) module that explores the complementary information from all regions to \textbf{correct} each regional representation. Building upon DAM and ARC, we develop a concise yet effective framework, AINet, which employs a simple predictor and surpasses state-of-the-art methods with substantially fewer FLOPs and parameters. Moreover, both DAM and ARC are modular and can be seamlessly integrated into existing MIL frameworks, consistently improving their performance.
Initialization matters in few-shot adaptation of vision-language models for histopathological image classification
Read abstract
Vision language models (VLM) pre-trained on datasets of histopathological image-caption pairs enabled zero-shot slide-level classification. The ability of VLM image encoders to extract discriminative features also opens the door for supervised fine-tuning for whole-slide image (WSI) classification, ideally using few labeled samples. Slide-level prediction frameworks require the incorporation of multiple instance learning (MIL) due to the gigapixel size of the WSI. Following patch-level feature extraction and aggregation, MIL frameworks rely on linear classifiers trained on top of the slide-level aggregated features. Classifier weight initialization has a large influence on Linear Probing performance in efficient transfer learning (ETL) approaches based on few-shot learning. In this work, we propose Zero-Shot Multiple-Instance Learning (ZS-MIL) to address the limitations of random classifier initialization that underperform zero-shot prediction in MIL problems. ZS-MIL uses the class-level embeddings of the VLM text encoder as the classification layer's starting point to compute each sample's bag-level probabilities. Through multiple experiments, we demonstrate the robustness of ZS-MIL compared to well-known weight initialization techniques both in terms of performance and variability in an ETL few-shot scenario for subtyping prediction.
Benchmarking Computational Pathology Foundation Models For Semantic Segmentation
Read abstract
In recent years, foundation models such as CLIP, DINO,and CONCH have demonstrated remarkable domain generalization and unsupervised feature extraction capabilities across diverse imaging tasks. However, systematic and independent evaluations of these models for pixel-level semantic segmentation in histopathology remain scarce. In this study, we propose a robust benchmarking approach to asses 10 foundational models on four histopathological datasets covering both morphological tissue-region and cellular/nuclear segmentation tasks. Our method leverages attention maps of foundation models as pixel-wise features, which are then classified using a machine learning algorithm, XGBoost, enabling fast, interpretable, and model-agnostic evaluation without finetuning. We show that the vision language foundation model, CONCH performed the best across datasets when compared to vision-only foundation models, with PathDino as close second. Further analysis shows that models trained on distinct histopathology cohorts capture complementary morphological representations, and concatenating their features yields superior segmentation performance. Concatenating features from CONCH, PathDino and CellViT outperformed individual models across all the datasets by 7.95% (averaged across the datasets), suggesting that ensembles of foundation models can better generalize to diverse histopathological segmentation tasks.
RamanSeg: Interpretability-driven Deep Learning on Raman Spectra for Cancer Diagnosis
Read abstract
Histopathology, the current gold standard for cancer diagnosis, involves the manual examination of tissue samples after chemical staining, a time-consuming process requiring expert analysis. Raman spectroscopy is an alternative, stain-free method of extracting information from samples. Using nnU-Net, we trained a segmentation model on a novel dataset of spatial Raman spectra aligned with tumour annotations, achieving a mean foreground Dice score of 80.9%, surpassing previous work. Furthermore, we propose a novel, interpretable, prototype-based architecture called RamanSeg. RamanSeg classifies pixels based on discovered regions of the training set, generating a segmentation mask. Two variants of RamanSeg allow a trade-off between interpretability and performance: one with prototype projection and another projection-free version. The projection-free RamanSeg outperformed a U-Net baseline with a mean foreground Dice score of 67.3%, offering a meaningful improvement over a black-box training approach.
Promptable segmentation with region exploration enables minimal-effort expert-level prostate cancer delineation
Read abstract
Purpose: Accurate segmentation of prostate cancer on magnetic resonance (MR) images is crucial for planning image-guided interventions such as targeted biopsies, cryoablation, and radiotherapy. However, subtle and variable tumour appearances, differences in imaging protocols, and limited expert availability make consistent interpretation difficult. While automated methods aim to address this, they rely on large expertly-annotated datasets that are often inconsistent, whereas manual delineation remains labour-intensive. This work aims to bridge the gap between automated and manual segmentation through a framework driven by user-provided point prompts, enabling accurate segmentation with minimal annotation effort. Methods: The framework combines reinforcement learning (RL) with a region-growing segmentation process guided by user prompts. Starting from an initial point prompt, region-growing generates a preliminary segmentation, which is iteratively refined through RL. At each step, the RL agent observes the image and current segmentation to predict a new point, from which region growing updates the mask. A reward, balancing segmentation accuracy and voxel-wise uncertainty, encourages exploration of ambiguous regions, allowing the agent to escape local optima and perform sample-specific optimisation. Despite requiring fully supervised training, the framework bridges manual and fully automated segmentation at inference by substantially reducing user effort while outperforming current fully automated methods. Results: The framework was evaluated on two public prostate MR datasets (PROMIS and PICAI, with 566 and 1090 cases). It outperformed the previous best automated methods by 9.9% and 8.9%, respectively, with performance comparable to manual radiologist segmentation, reducing annotation time tenfold.