Research Papers

ARXIV Cancer: non-small cell lung cancer Method: XGBoost

An Interpretable Machine Learning Framework for Non-Small Cell Lung Cancer Drug Response Analysis

Ann Rachel, Pranav M Pawar, Mithun Mukharjee, Raja M, Tojo Mathew
Published 2026-03-17 10:01

This paper presents an interpretable machine learning framework for analyzing drug responses in non-small cell lung cancer (NSCLC). It uses multi-omics data and an XGBoost regressor to predict drug sensitivity from molecular and cellular features. The model is tuned through hyperparameter search and interpreted with SHAP values, with DeepSeek used to check the biological validity of the top features.

Lung cancer is a condition in which malignant cells grow abnormally and spread uncontrollably in the lungs. Common treatment strategies include surgery, chemotherapy, and radiation, which are not always the best options because of the heterogeneous nature of cancer. In personalized medicine, treatments are tailored to the individual's genetic information along with lifestyle factors. In addition, AI-based deep learning methods can analyze large datasets to find early signs of cancer, tumor types, and treatment prospects. The paper focuses on developing personalized treatment plans from patient-specific data, primarily the genetic profile. Multi-omics data from the Genomics of Drug Sensitivity in Cancer (GDSC) resource were used to build a predictive model with machine learning techniques. The target variable, LN-IC50, indicates how sensitive or resistant a sample is to a drug. An XGBoost regressor is used to predict drug response from molecular and cellular features extracted from cancer datasets. Cross-validation and randomized search are performed for hyperparameter tuning to further optimize the model's predictive performance. For explanation, SHAP (SHapley Additive exPlanations) was used; SHAP values measure each feature's impact on an individual prediction. Furthermore, feature relationships were interpreted using DeepSeek, a large language model, to check the biological validity of the features. DeepSeek provided contextual explanations of the most important genes or pathways alongside the top SHAP-ranked features, supporting the model's predictions.
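
As a rough illustration of what SHAP values represent, the sketch below computes exact Shapley values for a hypothetical three-feature toy model of LN-IC50 (the natural log of the half-maximal inhibitory concentration; lower values mean greater sensitivity). The function `toy_ln_ic50`, its coefficients, and the all-zero baseline are assumptions made for illustration and are not taken from the paper, which applies SHAP to a tuned XGBoost regressor.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    The cost is exponential in the number of features, so this only
    suits toy cases; SHAP libraries use faster estimators.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


# Hypothetical toy model of LN-IC50: linear terms plus one interaction.
def toy_ln_ic50(features):
    expr_a, expr_b, mutated = features
    return 1.0 - 0.8 * expr_a + 0.3 * expr_b + 0.5 * expr_a * mutated


x = [2.0, 1.0, 1.0]          # one sample (illustrative values)
baseline = [0.0, 0.0, 0.0]   # reference input (illustrative)
phi = shapley_values(toy_ln_ic50, x, baseline)
```

The per-feature values in `phi` add up to the difference between the prediction for `x` and the prediction for the baseline, which is the additivity property that makes SHAP attributions easy to read for a single prediction.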

ARXIV Cancer: prostate cancer Method: deep learning

Efficient AI-Driven Multi-Section Whole Slide Image Analysis for Biochemical Recurrence Prediction in Prostate Cancer

Yesung Cho, Dongmyung Shin, Sujeong Hong, Jooyeon Lee, Seongmin Park, Geongyu Lee, Jongbae Park, Hong Koo Ha
Published 2026-03-17 07:58

This study presents a novel AI framework designed for the analysis of multi-section pathology slides to predict biochemical recurrence (BCR) in prostate cancer patients after radical prostatectomy. The model was developed using a large dataset of 23,451 slides from 789 patients and demonstrated superior predictive performance compared to traditional clinical benchmarks. The integration of patch and slide sub-sampling strategies was shown to reduce computational costs while maintaining accuracy, confirming the model's clinical feasibility and prognostic value.

Prostate cancer is one of the most frequently diagnosed malignancies in men worldwide. However, precise prediction of biochemical recurrence (BCR) after radical prostatectomy remains challenging due to the multifocality of tumors distributed throughout the prostate gland. In this paper, we propose a novel AI framework that simultaneously processes a series of multi-section pathology slides to capture the comprehensive tumor landscape across the entire prostate gland. To develop this predictive AI model, we curated a large-scale dataset of 23,451 slides from 789 patients. The proposed framework demonstrated strong predictive performance for 1- and 2-year BCR prediction, substantially outperforming established clinical benchmarks. The AI-derived risk score was validated as the most potent independent prognostic factor in a multivariable Cox proportional hazards analysis, surpassing conventional clinical markers such as pre-operative PSA and Gleason score. Furthermore, we demonstrated that integrating patch and slide sub-sampling strategies significantly reduces computational cost during both training and inference without compromising predictive performance, and the generalizability of the model was confirmed through external validation. Collectively, these results highlight the clinical feasibility and prognostic value of the proposed AI-based multi-section slide analysis as a scalable tool for post-operative management in prostate cancer.
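
The dataset averages roughly 30 slides per patient (23,451 / 789), which gives a sense of why sub-sampling matters. The sketch below shows one simple way patch and slide sub-sampling could work. Uniform random sampling, the function name `subsample_case`, and the budget values are assumptions for illustration; the abstract does not specify the paper's actual sampling strategy.

```python
import random


def subsample_case(slides, max_slides, max_patches, seed=None):
    """Keep up to max_slides slides per patient and up to max_patches patches per slide."""
    rng = random.Random(seed)
    slide_ids = sorted(slides)
    chosen = rng.sample(slide_ids, min(max_slides, len(slide_ids)))
    return {
        sid: rng.sample(slides[sid], min(max_patches, len(slides[sid])))
        for sid in chosen
    }


# Hypothetical patient: 30 slides, each with 1000 patch ids.
patient = {f"slide_{s:02d}": list(range(1000)) for s in range(30)}
subset = subsample_case(patient, max_slides=8, max_patches=256, seed=0)

kept = sum(len(p) for p in subset.values())     # patches actually processed
total = sum(len(p) for p in patient.values())   # patches available
```

With these illustrative budgets, `kept / total` is the fraction of patches the model would have to encode for this patient.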

ARXIV Cancer: unknown Method: transfer learning

Sample-Efficient Adaptation of Drug-Response Models to Patient Tumors under Strong Biological Domain Shift

Camille Jimenez Cortes, Philippe Lalanda, German Vega
Published 2026-03-17 07:12

This study addresses the challenge of predicting drug response in patients by developing a staged transfer-learning framework that separates representation learning from task supervision. The method utilizes autoencoder-based representation learning on large collections of unlabeled pharmacogenomic data, followed by alignment with drug-response labels and adaptation to patient tumors using few-shot supervision. The results indicate that this approach can improve performance during patient-level adaptation while requiring less labeled data compared to traditional methods.

Predicting drug response in patients from preclinical data remains a major challenge in precision oncology due to the substantial biological gap between in vitro cell lines and patient tumors. Rather than aiming to improve absolute in vitro prediction accuracy, this work examines whether explicitly separating representation learning from task supervision enables more sample-efficient adaptation of drug-response models to patient data under strong biological domain shift. We propose a staged transfer-learning framework in which cellular and drug representations are first learned independently from large collections of unlabeled pharmacogenomic data using autoencoder-based representation learning. These representations are then aligned with drug-response labels on cell-line data and subsequently adapted to patient tumors using few-shot supervision. Through a systematic evaluation spanning in-domain, cross-dataset, and patient-level settings, we show that unsupervised pretraining provides limited benefit when source and target domains overlap substantially, but yields clear gains when adapting to patient tumors with very limited labeled data. In particular, the proposed framework achieves faster performance improvements during few-shot patient-level adaptation while maintaining comparable accuracy to single-phase baselines on standard cell-line benchmarks. Overall, these results demonstrate that learning structured and transferable representations from unlabeled molecular profiles can substantially reduce the amount of clinical supervision required for effective drug-response prediction, offering a practical pathway toward data-efficient preclinical-to-clinical translation.

ARXIV Cancer: general cancer Method: Topology-Guided Biomechanical Profiling

Topology-Guided Biomechanical Profiling: A White-Box Framework for Opportunistic Screening of Spinal Instability on Routine CT

Zanting Ye, Xuanbin Wu, Guoqing Zhong, Shengyuan Liu, Jiashuai Liu, Ge Song, Zhisong Wang, Jing Hao, Xiaolong Niu, Yefeng Zheng, Yu Zhang, Lijun Lu
Published 2026-03-17 06:34

The study introduces Topology-Guided Biomechanical Profiling (TGBP), a white-box framework designed to automate the assessment of spinal instability using the Spinal Instability Neoplastic Score (SINS). This method addresses challenges posed by metastatic osteolysis and topological ambiguity in routine oncologic CT scans. TGBP integrates geometric innovations and auxiliary modules, achieving 90.2% accuracy in 3-tier stability triage across a multi-center, multi-cancer cohort and significantly outperforming medical oncologists on complex structural features in a blinded reader study.

Routine oncologic computed tomography (CT) presents an ideal opportunity for screening spinal instability, yet prophylactic stabilization windows are frequently missed due to the complex geometric reasoning required by the Spinal Instability Neoplastic Score (SINS). Automating SINS is fundamentally hindered by metastatic osteolysis, which induces topological ambiguity that confounds standard segmentation and black-box AI. We propose Topology-Guided Biomechanical Profiling (TGBP), an auditable white-box framework decoupling anatomical perception from structural reasoning. TGBP anchors SINS assessment on two deterministic geometric innovations: (i) canal-referenced partitioning to resolve posterolateral boundary ambiguity, and (ii) context-aware morphometric normalization via covariance-based oriented bounding boxes (OBB) to quantify vertebral collapse. Integrated with auxiliary radiomic and large language model (LLM) modules, TGBP provides an end-to-end, interpretable SINS evaluation. Validated on a multi-center, multi-cancer cohort (N = 482), TGBP achieved 90.2% accuracy in 3-tier stability triage. In a blinded reader study (N = 30), TGBP significantly outperformed medical oncologists on complex structural features (κ = 0.857 vs. 0.570) and prevented compounding errors in Total Score estimation (κ = 0.625 vs. 0.207), democratizing expert-level opportunistic screening.
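
To make the idea of a covariance-based oriented bounding box concrete, here is a minimal 2D sketch: the box is aligned with the principal axis of the point covariance, and its extents are measured along that axis and its normal. The paper works on 3D CT segmentations and its exact formulation is not given in the abstract, so the function `oriented_bounding_box`, the 2D simplification, and the test shape are all assumptions.

```python
from math import atan2, cos, sin, radians


def oriented_bounding_box(points):
    """Return (angle, width, height) of a 2D covariance-aligned bounding box."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mx) ** 2 for p in points) / n
    cyy = sum((p[1] - my) ** 2 for p in points) / n
    cxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Orientation of the principal eigenvector of the 2x2 covariance matrix.
    angle = 0.5 * atan2(2.0 * cxy, cxx - cyy)
    c, s = cos(angle), sin(angle)
    # Project the centered points onto the principal axis and its normal.
    u = [(p[0] - mx) * c + (p[1] - my) * s for p in points]
    v = [-(p[0] - mx) * s + (p[1] - my) * c for p in points]
    return angle, max(u) - min(u), max(v) - min(v)


# Illustrative shape: a 4 x 2 grid of points rotated by 30 degrees.
theta = radians(30)
grid = [(-2 + 0.5 * i, -1 + 0.5 * j) for i in range(9) for j in range(5)]
rotated = [(x * cos(theta) - y * sin(theta), x * sin(theta) + y * cos(theta))
           for x, y in grid]
angle, width, height = oriented_bounding_box(rotated)
```

Because the box follows the shape's own orientation, the recovered height and width do not depend on how the shape is tilted in the image, which is the property one would want before comparing a vertebra's height against a reference.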

ARXIV Cancer: unknown Method: large language model

RadAnnotate: Large Language Models for Efficient and Reliable Radiology Report Annotation

Saisha Pradeep Shetty, Roger Eric Goldman, Vladimir Filkov
Published 2026-03-16 23:23

The paper presents RadAnnotate, a framework that uses large language models (LLMs) for efficient radiology report annotation. It focuses on reducing expert effort through retrieval-augmented synthetic reports and confidence-based selective automation. The study shows that models trained only on synthetic reports perform close to models trained on gold-standard reports, and that synthetic augmentation particularly helps uncertain observations in low-resource settings. The framework can automatically annotate 55-90% of reports while maintaining high entity match scores.

Radiology report annotation is essential for clinical NLP, yet manual labeling is slow and costly. We present RadAnnotate, an LLM-based framework that studies retrieval-augmented synthetic reports and confidence-based selective automation to reduce expert effort for labeling in RadGraph. We study RadGraph-style entity labeling (graph nodes) and leave relation extraction (edges) to future work. First, we train entity-specific classifiers on gold-standard reports and characterize their strengths and failure modes across anatomy and observation categories, with uncertain observations hardest to learn. Second, we generate RAG-guided synthetic reports and show that synthetic-only models remain within 1-2 F1 points of gold-trained models, and that synthetic augmentation is especially helpful for uncertain observations in a low-resource setting, improving F1 from 0.61 to 0.70. Finally, by learning entity-specific confidence thresholds, RadAnnotate can automatically annotate 55-90% of reports at 0.86-0.92 entity match score while routing low-confidence cases for expert review.
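
Confidence-based selective automation can be sketched as choosing, on held-out data, the lowest confidence threshold at which the automatically accepted predictions still meet a target accuracy. The function `pick_threshold`, the target value, and the sample data below are hypothetical; the abstract does not describe how RadAnnotate learns its entity-specific thresholds.

```python
def pick_threshold(confidences, correct, target_accuracy):
    """Lowest confidence threshold whose auto-accepted set meets the target accuracy.

    Returns (threshold, coverage), or (None, 0.0) if no threshold qualifies.
    """
    ranked = sorted(zip(confidences, correct), reverse=True)
    best = (None, 0.0)
    hits = 0
    for k, (conf, ok) in enumerate(ranked, start=1):
        hits += ok
        # Evaluate only at the end of a tie group so the threshold is well defined.
        if k < len(ranked) and ranked[k][0] == conf:
            continue
        if hits / k >= target_accuracy:
            best = (conf, k / len(ranked))
    return best


# Hypothetical validation predictions for one entity type.
confidences = [0.99, 0.95, 0.90, 0.80, 0.70, 0.60, 0.55, 0.50]
correct = [1, 1, 1, 0, 1, 1, 0, 0]
threshold, coverage = pick_threshold(confidences, correct, target_accuracy=0.8)
```

Running the same selection separately for each entity type would give entity-specific thresholds; predictions below the threshold would be routed to an expert.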

ARXIV Cancer: breast cancer Method: deep learning

Standardizing Medical Images at Scale for AI

Callen MacPhee, Yiming Zhou, Koichiro Kishima, Bahram Jalali
Published 2026-03-16 22:51

This paper presents a physics-based data preprocessing framework called PhyCV, designed to standardize medical images and enhance the performance of deep learning models in medical image analysis. The framework addresses the challenges posed by heterogeneity in clinical data by applying deterministic transformations derived from optical physics. When tested on histopathological images, PhyCV significantly improved breast cancer classification accuracy, demonstrating its effectiveness in harmonizing diverse datasets and enhancing model generalization.

Deep learning has achieved remarkable success in medical image analysis, yet its performance remains highly sensitive to the heterogeneity of clinical data. Differences in imaging hardware, staining protocols, and acquisition conditions produce substantial domain shifts that degrade model generalization across institutions. Here we present a physics-based data preprocessing framework based on the PhyCV (Physics-Inspired Computer Vision) family of algorithms, which standardizes medical images through deterministic transformations derived from optical physics. The framework models images as spatially varying optical fields that undergo a virtual diffractive propagation followed by coherent phase detection. This process suppresses non-semantic variability such as color and illumination differences while preserving diagnostically relevant texture and structural features. When applied to histopathological images from the Camelyon17-WILDS benchmark, PhyCV preprocessing improves out-of-distribution breast-cancer classification accuracy from 70.8% (Empirical Risk Minimization baseline) to 90.9%, matching or exceeding data-augmentation and domain-generalization approaches at negligible computational cost. Because the transform is physically interpretable, parameterizable, and differentiable, it can be deployed as a fixed preprocessing stage or integrated into end-to-end learning. These results establish PhyCV as a generalizable data refinery for medical imaging, one that harmonizes heterogeneous datasets through first-principles physics, improving robustness, interpretability, and reproducibility in clinical AI systems.

ARXIV Cancer: renal cell carcinoma Method: histopathology foundation models

A Comprehensive Benchmark of Histopathology Foundation Models for Kidney Digital Pathology Images

Harishwar Reddy Kasireddy, Patricio S. La Rosa, Akshita Gupta, Anindya S. Paul, Jamie L. Fermin, William L. Clapp, Meryl A. Waldman, Tarek M. El-Ashkar, Sanjay Jain, Luis Rodrigues, Kuang Yu Jen, Avi Z. Rosenberg, Michael T. Eadon, Jeffrey B. Hodgin, Pinaki Sarder
Published 2026-03-16 22:37

This study evaluates the performance of histopathology foundation models (HFMs) on kidney digital pathology images, focusing on their applicability to both cancerous and non-cancerous conditions. The authors systematically assess 11 HFMs across various kidney-specific tasks, revealing that while these models perform well on meso-scale morphological tasks, they struggle with fine-grained microstructural discrimination and prognostic inference. The findings underscore the necessity for improved kidney-specific models to enhance clinical decision-making in nephrology.

Histopathology foundation models (HFMs), pretrained on large-scale cancer datasets, have advanced computational pathology. However, their applicability to non-cancerous chronic kidney disease remains underexplored, despite the coexistence of renal pathology with malignancies such as renal cell and urothelial carcinoma. We systematically evaluate 11 publicly available HFMs across 11 kidney-specific downstream tasks spanning multiple stains (PAS, H&E, PASM, and IHC), spatial scales (tile- and slide-level), task types (classification, regression, and copy detection), and clinical objectives, including detection, diagnosis, and prognosis. Tile-level performance is assessed using repeated stratified group cross-validation, while slide-level tasks are evaluated using repeated nested stratified cross-validation. Statistical significance is examined using the Friedman test followed by pairwise Wilcoxon signed-rank testing with Holm-Bonferroni correction and compact letter display visualization. To promote reproducibility, we release an open-source Python package, kidney-hfm-eval, available at https://pypi.org/project/kidney-hfm-eval/, that reproduces the evaluation pipelines. Results show moderate to strong performance on tasks driven by coarse meso-scale renal morphology, including diagnostic classification and detection of prominent structural alterations. In contrast, performance consistently declines for tasks requiring fine-grained microstructural discrimination, complex biological phenotypes, or slide-level prognostic inference, largely independent of stain type. Overall, current HFMs appear to encode predominantly static meso-scale representations and may have limited capacity to capture subtle renal pathology or prognosis-related signals. Our results highlight the need for kidney-specific, multi-stain, and multimodal foundation models to support clinically reliable decision-making in nephrology.
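
With 11 models there are 55 pairwise comparisons per task, so the multiple-testing correction matters. The sketch below shows the Holm-Bonferroni step-down adjustment on its own, applied to a hypothetical list of p-values; it is a generic illustration and is not code from the kidney-hfm-eval package.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return Holm-adjusted p-values and reject decisions, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is scaled by m, the next by m - 1, and so on;
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted, [p <= alpha for p in adjusted]


# Hypothetical raw p-values from four pairwise Wilcoxon signed-rank tests.
adjusted, reject = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

In this toy input, two of the four comparisons that look significant at 0.05 before correction no longer are after adjustment.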

ARXIV Cancer: lung cancer Method: multimodal diffusion

Nodule-Aligned Latent Space Learning with LLM-Driven Multimodal Diffusion for Lung Nodule Progression Prediction

James Song, Yifan Wang, Chuan Zhou, Liyue Shen
Published 2026-03-16 21:22

This study presents Nodule-Aligned Multimodal Diffusion (NAMD), a framework designed to predict lung nodule progression by generating follow-up computed tomography images based on baseline scans and patient data. The method utilizes a nodule-aligned latent space and an LLM-driven control mechanism to enhance prediction accuracy. Results indicate that NAMD significantly outperforms existing methods in predicting lung nodule malignancy, achieving an AUROC of 0.805.

Early diagnosis of lung cancer is challenging due to biological uncertainty and the limited understanding of the biological mechanisms driving nodule progression. To address this, we propose Nodule-Aligned Multimodal (Latent) Diffusion (NAMD), a novel framework that predicts lung nodule progression by generating 1-year follow-up nodule computed tomography images from baseline scans and patient- and nodule-level Electronic Health Record (EHR) data. NAMD introduces a nodule-aligned latent space, where distances between latents directly correspond to changes in nodule attributes, and utilizes an LLM-driven control mechanism to condition the diffusion backbone on patient data. On the National Lung Screening Trial (NLST) dataset, our method synthesizes follow-up nodule images that achieve an AUROC of 0.805 and an AUPRC of 0.346 for lung nodule malignancy prediction, significantly outperforming both baseline scans and state-of-the-art synthesis methods, while closely approaching the performance of real follow-up scans (AUROC: 0.819, AUPRC: 0.393). These results demonstrate that NAMD captures clinically relevant features of lung nodule progression, facilitating earlier and more accurate diagnosis.
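
For readers less familiar with the two reported metrics, the sketch below computes AUROC as a pairwise ranking probability and AUPRC as step-wise average precision on a small hypothetical set of labels and scores. A random classifier's AUPRC is close to the positive-class prevalence, so an AUPRC far below the AUROC is consistent with malignant nodules being rare. The data and function names are illustrative and unrelated to NLST.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k   # precision at each recalled positive
    return total / hits


# Hypothetical malignancy labels (1 = malignant) and model scores.
labels = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
roc = auroc(labels, scores)
ap = average_precision(labels, scores)
```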

ARXIV Cancer: unknown Method: adaptive augmentation framework

Beyond the Embedding Bottleneck: Adaptive Retrieval-Augmented 3D CT Report Generation

Renjie Liang, Yiling Ma, Yang Xing, Zhengkang Fan, Jinqian Pan, Chengkun Sun, Li Li, Kuang Gong, Jie Xu
Published 2026-03-16 18:56

This paper addresses the challenge of automated radiology report generation from 3D CT volumes, which is hindered by a representational bottleneck in contrastive 3D CT embeddings. The authors propose AdaRAG-CT, an adaptive augmentation framework that enhances visual representation by integrating supplementary textual information during report generation. The method demonstrates significant improvements in clinical efficacy, achieving a Clinical F1 score of 0.480 on the CT-RATE benchmark.

Automated radiology report generation from 3D CT volumes often suffers from incomplete pathology coverage. We provide empirical evidence that this limitation stems from a representational bottleneck: contrastive 3D CT embeddings encode discriminative pathology signals, yet exhibit severe dimensional concentration, with as few as 2 effective dimensions out of 512. Corroborating this, scaling the language model yields no measurable improvement, suggesting that the bottleneck lies in the visual representation rather than the generator. This bottleneck limits both generation and retrieval; naive static retrieval fails to improve clinical efficacy and can even degrade performance. We propose AdaRAG-CT, an adaptive augmentation framework that compensates for this visual bottleneck by introducing supplementary textual information through controlled retrieval and selectively integrating it during generation. On the CT-RATE benchmark, AdaRAG-CT achieves state-of-the-art clinical efficacy, improving Clinical F1 from 0.420 (CT-Agent) to 0.480 (+6 points); ablation studies confirm that both the retrieval and generation components contribute to the improvement. Code is available at https://github.com/renjie-liang/Adaptive-RAG-for-3DCT-Report-Generation.
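
One common way to quantify "effective dimensions" is the participation ratio of the covariance eigenvalues, sketched below. The abstract does not say which definition the authors use, so this measure, the function name, and the example spectra are assumptions chosen only to show how a 512-dimensional embedding can behave as if it had about 2 dimensions.

```python
def participation_ratio(eigenvalues):
    """Effective dimensionality: (sum of eigenvalues)^2 / (sum of squared eigenvalues)."""
    total = sum(eigenvalues)
    return total * total / sum(v * v for v in eigenvalues)


# Illustrative spectra for a 512-dimensional embedding space.
isotropic = [1.0] * 512                  # variance spread evenly over all axes
concentrated = [1.0, 1.0] + [0.0] * 510  # variance confined to two axes

dim_isotropic = participation_ratio(isotropic)
dim_concentrated = participation_ratio(concentrated)
```

Under this measure the evenly spread spectrum scores 512 and the concentrated one scores 2, regardless of the nominal embedding size.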

ARXIV Cancer: general cancer Method: Multiple Instance Learning

Domain Adaptation Without the Compute Burden for Efficient Whole Slide Image Analysis

Umar Marikkar, Muhammad Awais, Sara Atito
Published 2026-03-16 18:05

This paper presents EfficientWSI (eWSI), a novel approach that integrates Parameter-Efficient-Fine-Tuning (PEFT) with Multiple Instance Learning (MIL) for the analysis of Whole Slide Images (WSIs). The method aims to reduce computational costs while enhancing task-specific learning in cancer diagnostics. Evaluations on multiple datasets demonstrate that eWSI achieves strong classification performance, often surpassing traditional methods that rely on in-domain feature extractors.

Computational methods for analyzing Whole Slide Images (WSIs) enable early diagnosis and treatment by supporting pathologists in the detection and classification of tumors. However, the extremely high resolution of WSIs makes end-to-end training impractical compared to typical image analysis tasks. To address this, most approaches use pre-trained feature extractors to obtain fixed representations of whole slides, which are then combined with Multiple Instance Learning (MIL) for downstream tasks. These feature extractors are typically pre-trained on natural image datasets such as ImageNet, which fail to capture domain-specific characteristics. Although domain-specific pre-training on histopathology data yields more relevant feature representations, it remains computationally expensive and fails to capture task-specific characteristics within the domain. To address the computational cost and lack of task-specificity in domain-specific pre-training, we propose EfficientWSI (eWSI), a careful integration of Parameter-Efficient-Fine-Tuning (PEFT) and Multiple Instance Learning (MIL) that enables end-to-end training on WSI tasks. We evaluate eWSI on seven WSI-level tasks over the Camelyon16, TCGA, and BRACS datasets. Our results show that eWSI, when applied with ImageNet feature extractors, yields strong classification performance, matching or outperforming MIL methods with in-domain feature extractors and alleviating the need for extensive in-domain pre-training. Furthermore, when eWSI is applied with in-domain feature extractors, it further improves classification performance in most cases, demonstrating its ability to capture task-specific information where beneficial. Our findings suggest that eWSI provides a task-targeted, computationally efficient path for WSI tasks, offering a promising direction for task-specific learning in computational pathology.
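
For context on the MIL step, the sketch below shows a simplified attention-style pooling that turns a bag of patch feature vectors into a single slide-level vector. The linear scoring rule, the function `attention_pool`, and the toy features are assumptions; the abstract does not specify which MIL aggregator eWSI uses, and practical attention-based MIL models learn a small scoring network instead of a fixed weight vector.

```python
from math import exp


def attention_pool(instances, weights):
    """Softmax-attention pooling of instance feature vectors into one bag vector."""
    # Score each instance with a dot product against the (hypothetical) weight vector.
    scores = [sum(w * f for w, f in zip(weights, inst)) for inst in instances]
    top = max(scores)                      # subtract the max for numerical stability
    exps = [exp(s - top) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]
    dim = len(instances[0])
    bag = [sum(a * inst[d] for a, inst in zip(attn, instances)) for d in range(dim)]
    return bag, attn


# Hypothetical bag of three patch embeddings with two features each.
instances = [[1.0, 0.0], [0.0, 1.0], [4.0, 0.0]]
bag, attn = attention_pool(instances, weights=[1.0, 0.0])
mean_bag, uniform = attention_pool(instances, weights=[0.0, 0.0])
```

With all-zero weights the pooling reduces to a plain mean over patches; non-zero weights let the highest-scoring patches dominate the slide-level representation.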
