Research Papers

ARXIV Cancer: thyroid cancer Method: deep learning

Effectiveness of Automatically Curated Dataset in Thyroid Nodules Classification Algorithms Using Deep Learning

Jichen Yang, Jikai Zhang, Benjamin Wildman-Tobriner, Maciej A. Mazurowski
Published 2026-02-01 05:13

This study investigates whether an automatically curated dataset is effective for training deep learning algorithms to classify thyroid nodules. It compares models trained on a manually annotated dataset with models trained on an automatically curated one. The model trained on the automatically curated dataset achieved an AUC of 0.694, significantly higher than the 0.643 of the model trained on the manually annotated data (P < .001).

The diagnosis of thyroid nodule cancers commonly utilizes ultrasound images. Several studies showed that deep learning algorithms designed to classify benign and malignant thyroid nodules could match radiologists' performance. However, data availability for training deep learning models is often limited due to the significant effort required to curate such datasets. A previous study proposed a method to curate thyroid nodule datasets automatically. That method was reported to have a 63% yield rate and 83% accuracy. However, the usefulness of the generated data for training deep learning models remains unknown. In this study, we conducted experiments to determine whether using an automatically-curated dataset improves deep learning algorithms' performance. We trained deep learning models on the manually annotated and automatically-curated datasets. We also trained with a smaller subset of the automatically-curated dataset that has higher accuracy to explore the optimum usage of such a dataset. As a result, the deep learning model trained on the manually selected dataset has an AUC of 0.643 (95% confidence interval [CI]: 0.62, 0.66). It is significantly lower than the AUC of the deep learning model trained on the automatically-curated dataset, 0.694 (95% confidence interval [CI]: 0.67, 0.73, P < .001). The AUC of the deep learning model trained on the accurate subset is 0.689 (95% confidence interval [CI]: 0.66, 0.72, P > .43), which is not significantly worse than the AUC of the model trained on the full automatically-curated dataset. In conclusion, we showed that using an automatically-curated dataset can substantially increase the performance of deep learning algorithms, and it is suggested to use all the data rather than only the accurate subset.
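
The abstract reports AUCs with 95% confidence intervals but does not say how they were computed. As a minimal sketch of one common approach, the Python below computes AUC from pairwise comparisons and a percentile bootstrap interval. The function names, the 0/1 label encoding, and the choice of bootstrap are assumptions for illustration, not the authors' procedure.

```python
import random

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `auc(labels, scores)` on a test set gives the point estimate, and `bootstrap_ci(labels, scores)` gives an interval around it.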

ARXIV Cancer: thyroid cancer Method: unknown

Diagnostic Impact of Cine Clips for Thyroid Nodule Assessment on Ultrasound

Jichen Yang, Brian C. Allen, Kirti Magudia, Lisa M. Ho, Chad M. Miller, Maciej A. Mazurowski, Benjamin Wildman-Tobriner
Published 2026-02-01 03:06

This study evaluated the impact of cine imaging on the accuracy and consistency of thyroid nodule assessment using the ACR TI-RADS. A reader study involving 4 radiologists assessed 50 benign and 50 malignant nodules across three rounds, comparing static images to those including cine clips. The results indicated that the inclusion of cine images did not significantly enhance diagnostic performance or management recommendations. Overall, current guidelines remain adequate for accurate diagnosis without the need for cine imaging.

Background: Thyroid ultrasound is commonly performed using a combination of static images and cine clips (video recordings). However, the exact utility and impact of cine images remain unknown. This study aimed to evaluate the impact of cine imaging on the accuracy and consistency of thyroid nodule assessment, using the American College of Radiology Thyroid Imaging Reporting and Data System (ACR TI-RADS). Methods: 50 benign and 50 malignant thyroid nodules with cytopathology results were included. A reader study with 4 specialty-trained radiologists was then conducted over 3 rounds, assessing only static images in the first two rounds and both static and cine images in the third round. TI-RADS scores and the consequent management recommendations were then evaluated by comparing them to the malignancy status of the nodules. Results: Mean sensitivity for malignancy detection was 0.65 for static images and 0.67 with both static and cine images (p>0.5). Specificity was 0.20 for static images and 0.22 with both static and cine images (p>0.5). Management recommendations were similar with and without cine images. Intrareader agreement on feature assignments remained consistent across all rounds, though TI-RADS point totals were slightly higher with cine images. Conclusion: The inclusion of cine imaging for thyroid nodule assessment on ultrasound did not significantly change diagnostic performance. Current practice guidelines, which do not mandate cine imaging, are sufficient for accurate diagnosis.
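
For readers less familiar with the two metrics reported above, here is a minimal Python sketch of how sensitivity and specificity are computed from per-nodule outcomes. The `flagged` input (for example, a nodule recommended for biopsy) and the toy data are assumptions for illustration and do not reproduce the study's readings.

```python
def sensitivity_specificity(is_malignant, flagged):
    """Return (sensitivity, specificity) from parallel lists of booleans.

    is_malignant: ground truth per nodule (here assumed to come from cytopathology).
    flagged: whether the reader's recommendation treated the nodule as suspicious
             (hypothetical definition, chosen for this sketch).
    """
    pairs = list(zip(is_malignant, flagged))
    tp = sum(1 for m, f in pairs if m and f)          # malignant, flagged
    fn = sum(1 for m, f in pairs if m and not f)      # malignant, missed
    tn = sum(1 for m, f in pairs if not m and not f)  # benign, not flagged
    fp = sum(1 for m, f in pairs if not m and f)      # benign, flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one malignant and one benign nodule")
    return tp / (tp + fn), tn / (tn + fp)

# Toy data: 4 malignant nodules (3 flagged) and 5 benign nodules (4 flagged).
truth = [True] * 4 + [False] * 5
calls = [True, True, True, False] + [True, True, True, True, False]
sens, spec = sensitivity_specificity(truth, calls)
```

A low specificity, as in this toy example, means most benign nodules were still flagged.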

ARXIV Cancer: general cancer Method: artificial neural networks

Data Augmentation for High-Fidelity Generation of CAR-T/NK Immunological Synapse Images

Xiang Zhang, Boxuan Zhang, Alireza Naghizadeh, Mohab Mohamed, Dongfang Liu, Ruixiang Tang, Dimitris Metaxas, Dongfang Liu
Published 2026-02-01 00:54

This study focuses on improving the detection and segmentation of CAR-T/NK cell immunological synapse (IS) structures using artificial neural networks (ANNs). To overcome the limitations of small annotated datasets, the authors propose two data-augmentation frameworks: Instance Aware Automatic Augmentation (IAAA) and Semantic-Aware AI Augmentation (SAAA). These methods generate synthetic images and segmentation masks that enhance the training data, leading to improved performance in IS quantification and potentially more reliable imaging-based biomarkers for predicting therapeutic efficacy.

Chimeric antigen receptor (CAR)-T and NK cell immunotherapies have transformed cancer treatment, and recent studies suggest that the quality of the CAR-T/NK cell immunological synapse (IS) may serve as a functional biomarker for predicting therapeutic efficacy. Accurate detection and segmentation of CAR-T/NK IS structures using artificial neural networks (ANNs) can greatly increase the speed and reliability of IS quantification. However, a persistent challenge is the limited size of annotated microscopy datasets, which restricts the ability of ANNs to generalize. To address this challenge, we integrate two complementary data-augmentation frameworks. First, we employ Instance Aware Automatic Augmentation (IAAA), an automated, instance-preserving augmentation method that generates synthetic CAR-T/NK IS images and corresponding segmentation masks by applying optimized augmentation policies to original IS data. IAAA supports multiple imaging modalities (e.g., fluorescence and brightfield) and can be applied directly to CAR-T/NK IS images derived from patient samples. In parallel, we introduce a Semantic-Aware AI Augmentation (SAAA) pipeline that combines a diffusion-based mask generator with a Pix2Pix conditional image synthesizer. This second method enables the creation of diverse, anatomically realistic segmentation masks and produces high-fidelity CAR-T/NK IS images aligned with those masks, further expanding the training corpus beyond what IAAA alone can provide. Together, these augmentation strategies generate synthetic images whose visual and structural properties closely match real IS data, significantly improving CAR-T/NK IS detection and segmentation performance. By enhancing the robustness and accuracy of IS quantification, this work supports the development of more reliable imaging-based biomarkers for predicting patient response to CAR-T/NK immunotherapy.

ARXIV Cancer: general cancer Method: graph neural network

RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine

Hasi Hays, William J. Richardson
Published 2026-01-31 08:05

This study introduces a retrieval-augmented generation (RAG) embedding framework that combines graph neural network representations with knowledge retrieved from biomedical literature. In a benchmark against ten embedding methods, topology-focused methods achieve near-perfect link prediction, while RAG-GNN is the only method with a positive silhouette score for functional clustering (0.001). Applied to cancer signaling networks, the framework identifies DDR1 as a potential therapeutic target based on retrieved evidence of synthetic lethality with KRAS mutations, highlighting the complementary roles of structural and functional approaches in precision medicine.

Network topology excels at structural predictions but fails to capture functional semantics encoded in biomedical literature. We present a retrieval-augmented generation (RAG) embedding framework that integrates graph neural network representations with dynamically retrieved literature-derived knowledge through contrastive learning. Benchmarking against ten embedding methods reveals task-specific complementarity: topology-focused methods achieve near-perfect link prediction (GCN: 0.983 AUROC), while RAG-GNN is the only method achieving positive silhouette scores for functional clustering (0.001 vs. negative scores for all baselines). Information-theoretic decomposition shows network topology contributes 77.3% of predictive information, while retrieved documents provide 8.6% unique information. Applied to cancer signaling networks (379 proteins, 3,498 interactions), the framework identifies DDR1 as a therapeutic target based on retrieved evidence of synthetic lethality with KRAS mutations. These results establish that topology-only and retrieval-augmented approaches serve complementary purposes: structural prediction tasks are solved by network topology alone, while functional interpretation uniquely benefits from retrieved knowledge.

ARXIV Cancer: lung cancer Method: reinforcement learning

AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning

Chongyu Qu, Zhengyi Lu, Yuxiang Lai, Thomas Z. Li, Junchao Zhu, Junlin Guo, Juming Xiong, Yanfan Zhu, Yuechen Yang, Allen J. Luna, Kim L. Sandler, Bennett A. Landman, Yuankai Huo
Published 2026-01-30 21:51

This paper presents AdaFuse, an adaptive multimodal fusion framework for lung cancer risk prediction that uses reinforcement learning. The framework addresses modality selection by learning patient-specific strategies for deciding which data sources to incorporate. On the National Lung Screening Trial (NLST) dataset, AdaFuse achieves the highest AUC (0.762) among the compared methods while using fewer FLOPs than all triple-modality methods, suggesting its potential for personalized diagnostic approaches in medical imaging.

Multimodal fusion has emerged as a promising paradigm for disease diagnosis and prognosis, integrating complementary information from heterogeneous data sources such as medical images, clinical records, and radiology reports. However, existing fusion methods process all available modalities through the network, either treating them equally or learning to assign different contribution weights, leaving a fundamental question unaddressed: for a given patient, should certain modalities be used at all? We present AdaFuse, an adaptive multimodal fusion framework that leverages reinforcement learning (RL) to learn patient-specific modality selection and fusion strategies for lung cancer risk prediction. AdaFuse formulates multimodal fusion as a sequential decision process, where the policy network iteratively decides whether to incorporate an additional modality or proceed to prediction based on the information already acquired. This sequential formulation enables the model to condition each selection on previously observed modalities and terminate early when sufficient information is available, rather than committing to a fixed subset upfront. We evaluate AdaFuse on the National Lung Screening Trial (NLST) dataset. Experimental results demonstrate that AdaFuse achieves the highest AUC (0.762) compared to the best single-modality baseline (0.732), the best fixed fusion strategy (0.759), and adaptive baselines including DynMM (0.754) and MoE (0.742), while using fewer FLOPs than all triple-modality methods. Our work demonstrates the potential of reinforcement learning for personalized multimodal fusion in medical imaging, representing a shift from uniform fusion strategies toward adaptive diagnostic pipelines that learn when to consult additional modalities and when existing information suffices for accurate prediction.
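
The sequential decision process described above can be sketched as a simple loop: a policy either picks another modality or stops, and prediction uses only what was acquired. The Python below is a toy illustration. `toy_policy`, `toy_predict`, the `STOP` token, and the modality names are hypothetical stand-ins; AdaFuse learns its policy with reinforcement learning, which is not shown here.

```python
from typing import Callable, List, Tuple

STOP = "STOP"  # hypothetical action meaning "proceed to prediction"

def sequential_fusion(
    available: List[str],
    policy: Callable[[List[str], List[str]], str],
    predict: Callable[[List[str]], float],
) -> Tuple[float, List[str]]:
    """Acquire modalities one at a time until the policy stops or none remain."""
    used: List[str] = []
    remaining = list(available)
    while remaining:
        action = policy(used, remaining)  # decision conditioned on what was already used
        if action == STOP:
            break
        if action not in remaining:
            raise ValueError(f"policy chose unavailable modality: {action}")
        remaining.remove(action)
        used.append(action)
    return predict(used), used

def toy_policy(used: List[str], remaining: List[str]) -> str:
    # Stand-in for a learned policy: take the next modality, stop after two.
    return STOP if len(used) >= 2 else remaining[0]

def toy_predict(used: List[str]) -> float:
    # Stand-in risk score that depends only on how many modalities were used.
    return min(1.0, 0.25 * len(used))

risk, used = sequential_fusion(["ct", "clinical", "report"], toy_policy, toy_predict)
```

Because the policy sees the modalities already used, it can stop early for some patients and continue for others instead of committing to a fixed subset.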

ARXIV Cancer: general cancer Method: promptable segmentation

Opportunistic Promptable Segmentation: Leveraging Routine Radiological Annotations to Guide 3D CT Lesion Segmentation

Samuel Church, Joshua D. Warner, Danyal Maqbool, Xin Tie, Junjie Hu, Meghan G. Lubner, Tyler J. Bradshaw
Published 2026-01-30 20:59

This paper presents Opportunistic Promptable Segmentation, an approach that uses routine radiological annotations to guide 3D CT lesion segmentation. The proposed model, SAM2CT, converts sparse radiologist annotations such as arrows and line measurements into 3D segmentations, reaching Dice similarity coefficients of 0.649 for arrow prompts and 0.757 for line prompts on public benchmarks. On pre-existing clinical annotations (N = 60), radiologists scored 87% of the generated segmentations as clinically acceptable or requiring only minor adjustments.

The development of machine learning models for CT imaging depends on the availability of large, high-quality, and diverse annotated datasets. Although large volumes of CT images and reports are readily available in clinical picture archiving and communication systems (PACS), 3D segmentations of critical findings are costly to obtain, typically requiring extensive manual annotation by radiologists. On the other hand, it is common for radiologists to provide limited annotations of findings during routine reads, such as line measurements and arrows, that are often stored in PACS as GSPS objects. We posit that these sparse annotations can be extracted along with CT volumes and converted into 3D segmentations using promptable segmentation models, a paradigm we term Opportunistic Promptable Segmentation. To enable this paradigm, we propose SAM2CT, the first promptable segmentation model designed to convert radiologist annotations into 3D segmentations in CT volumes. SAM2CT builds upon SAM2 by extending the prompt encoder to support arrow and line inputs and by introducing Memory-Conditioned Memories (MCM), a memory encoding strategy tailored to 3D medical volumes. On public lesion segmentation benchmarks, SAM2CT outperforms existing promptable segmentation models and similarly trained baselines, achieving Dice similarity coefficients of 0.649 for arrow prompts and 0.757 for line prompts. Applying the model to pre-existing GSPS annotations from a clinical PACS (N = 60), SAM2CT generates 3D segmentations that are clinically acceptable or require only minor adjustments in 87% of cases, as scored by radiologists. Additionally, SAM2CT demonstrates strong zero-shot performance on select Emergency Department findings. These results suggest that large-scale mining of historical GSPS annotations represents a promising and scalable approach for generating 3D CT segmentation datasets.
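
The Dice similarity coefficient reported above measures overlap between a predicted and a reference segmentation: twice the shared voxel count divided by the sum of the two voxel counts. A minimal Python sketch follows. Representing each mask as a set of voxel coordinates, and scoring two empty masks as 1.0, are conventions assumed for this example.

```python
def dice(pred, truth):
    """Dice coefficient between two masks given as collections of voxel coordinates."""
    pred, truth = set(pred), set(truth)
    if not pred and not truth:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2 * len(pred & truth) / (len(pred) + len(truth))

# Toy 3D masks: three voxels each, two of them shared.
predicted = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}
reference = {(0, 0, 1), (0, 1, 1), (1, 1, 1)}
score = dice(predicted, reference)
```

A score of 1.0 means the masks are identical and 0.0 means they share no voxels.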

ARXIV Cancer: general cancer Method: reinforcement learning

Med-Scout: Curing MLLMs' Geometric Blindness in Medical Perception via Geometry-Aware RL Post-Training

Anglin Liu, Ruichao Chen, Yi Lu, Hongxia Xu, Jintai Chen
Published 2026-01-30 17:45

This paper presents Med-Scout, a framework designed to address the geometric blindness observed in Multimodal Large Language Models (MLLMs) used for medical diagnosis. By employing Reinforcement Learning (RL) and leveraging geometric logic from unlabeled medical images, Med-Scout introduces three proxy tasks to provide supervision without expert annotations. The results demonstrate that Med-Scout significantly improves geometric perception, outperforming existing MLLMs by over 40% on a newly introduced benchmark.

Despite recent Multimodal Large Language Models (MLLMs)' linguistic prowess in medical diagnosis, we find even state-of-the-art MLLMs suffer from a critical perceptual deficit: geometric blindness. This failure to ground outputs in objective geometric constraints leads to plausible yet factually incorrect hallucinations, rooted in training paradigms that prioritize linguistic fluency over geometric fidelity. This paper introduces Med-Scout, a novel framework that "cures" this blindness via Reinforcement Learning (RL) that leverages the intrinsic geometric logic latent within unlabeled medical images. Instead of relying on costly expert annotations, Med-Scout derives verifiable supervision signals through three strategic proxy tasks: Hierarchical Scale Localization, Topological Jigsaw Reconstruction, and Anomaly Consistency Detection. To rigorously quantify this deficit, we present Med-Scout-Bench, a new benchmark specifically designed to evaluate geometric perception. Extensive evaluations show that Med-Scout significantly mitigates geometric blindness, outperforming leading proprietary and open-source MLLMs by over 40% on our benchmark. Furthermore, this enhanced geometric perception generalizes to broader medical understanding, achieving superior results on radiological and comprehensive medical VQA tasks.

ARXIV Cancer: lung cancer Method: deep learning

Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions

Bartlomiej Sobieski, Jakub Grzywaczewski, Karol Dobiczek, Mateusz Wójcik, Tomasz Bartczak, Patryk Szatkowski, Przemysław Bombiński, Matthew Tivnan, Przemyslaw Biecek
Published 2026-01-30 15:21

This paper audits Sybil, a deep learning model that predicts future lung cancer risk from computed tomography (CT) scans. The authors introduce S(H)NAP, a model-agnostic auditing framework that uses generative interventional attributions to probe the model's reasoning mechanisms. The audit finds that Sybil often differentiates malignant from benign pulmonary nodules much as an expert radiologist would, but it also exhibits critical failure modes, including sensitivity to clinically unjustified artifacts and a distinct radial bias.

Lung cancer remains the leading cause of cancer mortality, driving the development of automated screening tools to alleviate radiologist workload. Standing at the frontier of this effort is Sybil, a deep learning model capable of predicting future risk solely from computed tomography (CT) with high precision. However, despite extensive clinical validation, current assessments rely purely on observational metrics. This correlation-based approach overlooks the model's actual reasoning mechanism, necessitating a shift to causal verification to ensure robust decision-making before clinical deployment. We propose S(H)NAP, a model-agnostic auditing framework that constructs generative interventional attributions validated by expert radiologists. By leveraging realistic 3D diffusion bridge modeling to systematically modify anatomical features, our approach isolates object-specific causal contributions to the risk score. Providing the first interventional audit of Sybil, we demonstrate that while the model often exhibits behavior akin to an expert radiologist, differentiating malignant pulmonary nodules from benign ones, it suffers from critical failure modes, including dangerous sensitivity to clinically unjustified artifacts and a distinct radial bias.

ARXIV Cancer: prostate cancer Method: Riemannian deep learning

A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification

Juan A. Olmos, Antoine Manzanera, Fabio Martínez
Published 2026-01-30 15:21

This study presents a geometric multimodal Foundation Model (MFM-Geom) designed for the classification of prostate cancer by integrating bi-parametric MRI (bp-MRI) and clinical reports. The model addresses the limitations of existing methods, which focus primarily on imaging, by incorporating clinical context to enhance representation learning. Using only 10% of the training data, MFM-Geom outperforms a baseline class-token embedding-based classifier (+8.3%), achieving an AUC-PR of 90.67.

Prostate cancer (PCa) is one of the most common cancers in men worldwide. Bi-parametric MRI (bp-MRI) and clinical variables are crucial for PCa identification and improving treatment decisions. However, this process is subject to expert interpretation. Furthermore, most existing computer-aided diagnosis methods focus on imaging-based models, overlooking the clinical context and suffering from data scarcity, limiting their ability to learn robust representations. We propose a geometric multimodal Foundation Model (FM), named MFM-Geom, that learns representations from bp-MRI and clinical reports, encoding visual findings and information from the context of clinical variables. In the classification head, the approach leverages symmetric positive definite (SPD) matrices and Riemannian deep learning to integrate imaging-text representations from a biomedical multimodal FM. Using 10% of the training data, MFM-Geom outperformed baseline class token embedding-based classification (+8.3%, AUC-PR of 90.67). Generalization on an external dataset confirmed the robustness of fine-tuning the biomedical FM, achieving an AUC-PR of 90.6.

ARXIV Cancer: general cancer Method: Phenotype-Aware Multiple Instance Learning

PA-MIL: Phenotype-Aware Multiple Instance Learning Guided by Language Prompting and Genotype-to-Phenotype Relationships

Zekang Yang, Hong Liu, Xiangdong Wang
Published 2026-01-30 15:05

This paper presents Phenotype-Aware Multiple Instance Learning (PA-MIL), an interpretable framework designed for identifying cancer-related phenotypes from whole-slide images (WSIs) and facilitating cancer subtyping. The method constructs a phenotype knowledge base and employs morphological descriptions as language prompts to enhance feature aggregation. Experimental results indicate that PA-MIL achieves competitive performance while providing improved interpretability compared to existing methods. The study also emphasizes the analysis of genotype-phenotype relationships to ensure reliability and accountability.

Deep learning has been extensively researched in the analysis of pathology whole-slide images (WSIs). However, most existing methods are limited to providing prediction interpretability by locating the model's salient areas in a post-hoc manner, failing to offer more reliable and accountable explanations. In this work, we propose Phenotype-Aware Multiple Instance Learning (PA-MIL), a novel ante-hoc interpretable framework that identifies cancer-related phenotypes from WSIs and utilizes them for cancer subtyping. To facilitate PA-MIL in learning phenotype-aware features, we 1) construct a phenotype knowledge base containing cancer-related phenotypes and their associated genotypes; 2) utilize the morphological descriptions of phenotypes as language prompting to aggregate phenotype-related features; and 3) devise the Genotype-to-Phenotype Neural Network (GP-NN) grounded in genotype-to-phenotype relationships, which provides multi-level guidance for PA-MIL. Experimental results on multiple datasets demonstrate that PA-MIL achieves competitive performance compared to existing MIL methods while offering improved interpretability. PA-MIL leverages phenotype saliency as evidence and, using a linear classifier, achieves competitive results compared to state-of-the-art methods. Additionally, we thoroughly analyze the genotype-phenotype relationships, as well as cohort-level and case-level interpretability, demonstrating the reliability and accountability of PA-MIL.
