Research Papers

ARXIV Cancer: unknown Method: deep learning

CutisAI: Deep Learning Framework for Automated Dermatology and Cancer Screening

Rohit Kaushik, Eva Kaushik
Published 2026-01-05 21:29

This paper presents the Conformal Bayesian Dermatological Classifier (CBDC), a deep learning framework designed for automated dermatology and cancer screening. The framework integrates Statistical Learning Theory, Topological Data Analysis, and Bayesian Conformal Inference to improve uncertainty quantification in predictions. Experimental results demonstrate that CBDC achieves high classification accuracy while providing interpretable and calibrated predictions suitable for clinical use.

The rapid growth of dermatological imaging and mobile diagnostic tools calls for systems that not only demonstrate empirical performance but also provide strong theoretical guarantees. Deep learning models have shown high predictive accuracy; however, they are often criticized for lacking well-calibrated uncertainty estimates, without which these models are hardly deployable in a clinical setting. To this end, we present the Conformal Bayesian Dermatological Classifier (CBDC), a well-founded framework that combines Statistical Learning Theory, Topological Data Analysis (TDA), and Bayesian Conformal Inference. CBDC offers distribution-dependent generalization bounds that reflect dermatological variability, proves a topological stability theorem that guarantees the invariance of convolutional neural network embeddings under photometric and morphological perturbations, and provides finite conformal coverage guarantees for trustworthy uncertainty quantification. Through exhaustive experiments on the HAM10000, PH2, and ISIC 2020 datasets, we show that CBDC not only attains high classification accuracy but also generates calibrated predictions that are interpretable from a clinical perspective. This research constitutes a theoretical and practical leap for deep dermatological diagnostics, thereby opening the interface between machine learning theory and clinical applicability.
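To make the idea of a conformal coverage guarantee concrete, here is a minimal sketch of generic split conformal prediction for classification. It is the textbook procedure, not the CBDC method itself; the function names and the calibration data are hypothetical and chosen only for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold computed from a held-out calibration set.

    Nonconformity score = 1 - probability assigned to the true class.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected quantile rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: keep every class
    return scores[k - 1]

def prediction_set(probs, threshold):
    """All classes whose nonconformity score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]

# Hypothetical softmax outputs for a 3-class problem (illustrative only).
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4],
             [0.6, 0.3, 0.1], [0.2, 0.2, 0.6], [0.5, 0.4, 0.1],
             [0.1, 0.1, 0.8], [0.25, 0.7, 0.05], [0.9, 0.05, 0.05]]
cal_labels = [0, 1, 2, 0, 2, 1, 2, 1, 0]

q = conformal_threshold(cal_probs, cal_labels, alpha=0.2)
print(prediction_set([0.5, 0.45, 0.05], q))
```

Under the usual exchangeability assumption, sets built this way contain the true label with probability at least 1 - alpha; an ambiguous input simply yields a larger set rather than an overconfident single label.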

ARXIV Cancer: thyroid cancer Method: prior-guided DETR

Prior-Guided DETR for Ultrasound Nodule Detection

Jingjing Wang, Zhuo Xiao, Xinning Yao, Bo Liu, Lijuan Niu, Xiangzhi Bai, Fugen Zhou
Published 2026-01-05 15:32

This paper presents a prior-guided DETR framework aimed at improving the detection of ultrasound nodules associated with thyroid and breast cancers. The method incorporates prior knowledge at multiple stages of the network to enhance feature extraction and detection accuracy, particularly for irregular and blurred nodules. Experimental results indicate that the proposed approach outperforms 18 existing detection methods, especially in challenging cases involving complex nodule morphology.

Accurate detection of ultrasound nodules is essential for the early diagnosis and treatment of thyroid and breast cancers. However, this task remains challenging due to irregular nodule shapes, indistinct boundaries, substantial scale variations, and the presence of speckle noise that degrades structural visibility. To address these challenges, we propose a prior-guided DETR framework specifically designed for ultrasound nodule detection. Instead of relying on purely data-driven feature learning, the proposed framework progressively incorporates different prior knowledge at multiple stages of the network. First, a Spatially-adaptive Deformable FFN with Prior Regularization (SDFPR) is embedded into the CNN backbone to inject geometric priors into deformable sampling, stabilizing feature extraction for irregular and blurred nodules. Second, a Multi-scale Spatial-Frequency Feature Mixer (MSFFM) is designed to extract multi-scale structural priors, where spatial-domain processing emphasizes contour continuity and boundary cues, while frequency-domain modeling captures global morphology and suppresses speckle noise. Furthermore, a Dense Feature Interaction (DFI) mechanism propagates and exploits these prior-modulated features across all encoder layers, enabling the decoder to enhance query refinement under consistent geometric and structural guidance. Experiments conducted on two clinically collected thyroid ultrasound datasets (Thyroid I and Thyroid II) and two public benchmarks (TN3K and BUSI) for thyroid and breast nodules demonstrate that the proposed method achieves superior accuracy compared with 18 detection methods, particularly in detecting morphologically complex nodules. The source code is publicly available at https://github.com/wjj1wjj/Ultrasound-DETR.

ARXIV Cancer: unknown Method: multi-source domain adaptation

Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models

Alexander Möllers, Julius Hense, Florian Schulz, Timo Milbich, Maximilian Alber, Lukas Ruff
Published 2026-01-05 15:19

This study investigates the impact of magnification sampling on the performance of pathology foundation models in histopathology. The authors propose a continuous magnification sampling method to address the limitations of traditional discrete sampling strategies. Their experiments demonstrate that continuous sampling significantly enhances classification accuracy, particularly at intermediate magnifications, and optimized distributions can further improve model performance. The findings highlight the importance of magnification in the evaluation of pathology models.

In histopathology, pathologists examine both tissue architecture at low magnification and fine-grained morphology at high magnification. Yet, the performance of pathology foundation models across magnifications and the effect of magnification sampling during training remain poorly understood. We model magnification sampling as a multi-source domain adaptation problem and develop a simple theoretical framework that reveals systematic trade-offs between sampling strategies. We show that the widely used discrete uniform sampling of magnifications (0.25, 0.5, 1.0, 2.0 mpp) leads to degradation at intermediate magnifications. We introduce continuous magnification sampling, which removes gaps in magnification coverage while preserving performance at standard scales. Further, we derive sampling distributions that optimize representation quality across magnification scales. To evaluate these strategies, we introduce two new benchmarks (TCGA-MS, BRACS-MS) with appropriate metrics. Our experiments show that continuous sampling substantially improves over discrete sampling at intermediate magnifications, with gains of up to 4 percentage points in balanced classification accuracy, and that optimized distributions can further improve performance. Finally, we evaluate current histopathology foundation models, finding that magnification is a primary driver of performance variation across models. Our work paves the way towards future pathology foundation models that perform reliably across magnifications.
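The contrast between discrete and continuous magnification sampling can be sketched in a few lines. This is only an illustration: the log-uniform distribution below is an assumption made here, not necessarily the distribution used or derived in the paper, and the function names are hypothetical.

```python
import math
import random

DISCRETE_MPP = (0.25, 0.5, 1.0, 2.0)  # magnifications listed in the abstract

def sample_discrete(rng):
    """Discrete uniform sampling over the standard magnifications."""
    return rng.choice(DISCRETE_MPP)

def sample_continuous(rng, lo=0.25, hi=2.0):
    """Continuous sampling over [lo, hi] mpp.

    Assumption: uniform in log-space, so each doubling of mpp is equally
    likely. The paper's actual distribution may differ.
    """
    return math.exp(rng.uniform(math.log(lo), math.log(hi)))

def gap_fraction(samples):
    """Fraction of draws strictly between 0.5 and 1.0 mpp."""
    return sum(0.5 < s < 1.0 for s in samples) / len(samples)

rng = random.Random(0)
discrete = [sample_discrete(rng) for _ in range(1000)]
continuous = [sample_continuous(rng) for _ in range(1000)]

print(gap_fraction(discrete), gap_fraction(continuous))
```

Discrete sampling never produces a training tile between 0.5 and 1.0 mpp, whereas the continuous sampler places roughly a third of its draws there, which is the coverage gap the abstract describes.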

ARXIV Cancer: thyroid cancer Method: detection transformer

Nodule-DETR: A Novel DETR Architecture with Frequency-Channel Attention for Ultrasound Thyroid Nodule Detection

Jingjing Wang, Qianglin Liu, Zhuo Xiao, Xinning Yao, Bo Liu, Lu Li, Lijuan Niu, Fugen Zhou
Published 2026-01-05 08:53

This study presents Nodule-DETR, a novel detection transformer architecture aimed at improving the detection of thyroid nodules in ultrasound images. The method incorporates innovative modules such as Multi-Spectral Frequency-domain Channel Attention and Hierarchical Feature Fusion to enhance the detection of low-contrast nodules. Experimental results indicate that Nodule-DETR significantly outperforms existing models, demonstrating its potential for clinical application in thyroid cancer diagnostics.

Thyroid cancer is the most common endocrine malignancy, and its incidence is rising globally. While ultrasound is the preferred imaging modality for detecting thyroid nodules, its diagnostic accuracy is often limited by challenges such as low image contrast and blurred nodule boundaries. To address these issues, we propose Nodule-DETR, a novel detection transformer (DETR) architecture designed for robust thyroid nodule detection in ultrasound images. Nodule-DETR introduces three key innovations: a Multi-Spectral Frequency-domain Channel Attention (MSFCA) module that leverages frequency analysis to enhance features of low-contrast nodules; a Hierarchical Feature Fusion (HFF) module for efficient multi-scale integration; and Multi-Scale Deformable Attention (MSDA) to flexibly capture small and irregularly shaped nodules. We conducted extensive experiments on a clinical dataset of real-world thyroid ultrasound images. The results demonstrate that Nodule-DETR achieves state-of-the-art performance, outperforming the baseline model by a significant margin of 0.149 in mAP@0.5:0.95. The superior accuracy of Nodule-DETR highlights its significant potential for clinical application as an effective tool in computer-aided thyroid diagnosis. The code for this work is available at https://github.com/wjj1wjj/Nodule-DETR.

ARXIV Cancer: pancreatic ductal adenocarcinoma and breast cancer Method: Retrieval-Augmented Generation

Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation

Udiptaman Das, Krishnasai B. Atmakuri, Duy Ho, Chi Lee, Yugyung Lee
Published 2026-01-05 07:16

This paper presents an end-to-end framework for constructing and evaluating clinical knowledge graphs (KGs) from unstructured clinical narratives using multi-agent prompting and a Retrieval-Augmented Generation (KG-RAG) strategy. The method integrates various components including entity extraction, uncertainty scoring, schema generation, and validation to enhance the accuracy and semantic consistency of the KGs. The framework was applied to two oncology cohorts, demonstrating improvements in precision and relevance compared to baseline methods.

Large language models (LLMs) offer new opportunities for constructing knowledge graphs (KGs) from unstructured clinical narratives. However, existing approaches often rely on structured inputs and lack robust validation of factual accuracy and semantic consistency, limitations that are especially problematic in oncology. We introduce an end-to-end framework for clinical KG construction and evaluation directly from free text using multi-agent prompting and a schema-constrained Retrieval-Augmented Generation (KG-RAG) strategy. Our pipeline integrates (1) prompt-driven entity, attribute, and relation extraction; (2) entropy-based uncertainty scoring; (3) ontology-aligned RDF/OWL schema generation; and (4) multi-LLM consensus validation for hallucination detection and semantic refinement. Beyond static graph construction, the framework supports continuous refinement and self-supervised evaluation, enabling iterative improvement of graph quality. Applied to two oncology cohorts (PDAC and BRCA), our method produces interpretable, SPARQL-compatible, and clinically grounded knowledge graphs without relying on gold-standard annotations. Experimental results demonstrate consistent gains in precision, relevance, and ontology compliance over baseline methods.
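One plausible reading of entropy-based uncertainty scoring is the Shannon entropy of the labels that several models assign to the same candidate relation. The sketch below illustrates that reading only; how the paper actually computes its score is not stated in the abstract, and the function name and example labels are hypothetical.

```python
import math
from collections import Counter

def vote_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical relation labels proposed by four models for one entity pair.
unanimous = ["treats", "treats", "treats", "treats"]
split = ["treats", "causes", "treats", "causes"]
mostly = ["treats", "treats", "causes", "treats"]

for votes in (unanimous, split, mostly):
    print(votes, round(vote_entropy(votes), 3))
```

A score of zero means all models agree; higher values flag extractions that might be routed to further validation.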

ARXIV Cancer: breast cancer Method: dual-stream architecture

CTIS-QA: Clinical Template-Informed Slide-level Question Answering for Pathology

Hao Lu, Ziniu Qian, Yifu Li, Yang Zhou, Bingzheng Wei, Yan Xu
Published 2026-01-05 03:54

This paper presents a clinical diagnosis template-based pipeline designed to extract and structure pathological information from reports. The authors developed a Clinical Pathology Report Template (CPRT) to ensure standardized extraction of diagnostic elements, validated on TCGA-BRCA. They introduced CTIS-QA, a Slide-level Question Answering model that utilizes a dual-stream architecture to enhance diagnostic accuracy. Experimental results demonstrate that CTIS-QA outperforms existing models across various metrics.

In this paper, we introduce a clinical diagnosis template-based pipeline to systematically collect and structure pathological information. In collaboration with pathologists and guided by the College of American Pathologists (CAP) Cancer Protocols, we design a Clinical Pathology Report Template (CPRT) that ensures comprehensive and standardized extraction of diagnostic elements from pathology reports. We validate the effectiveness of our pipeline on TCGA-BRCA. First, we extract pathological features from reports using CPRT. These features are then used to build CTIS-Align, a dataset of 80k slide-description pairs from 804 WSIs for vision-language alignment training, and CTIS-Bench, a rigorously curated VQA benchmark comprising 977 WSIs and 14,879 question-answer pairs. CTIS-Bench emphasizes clinically grounded, closed-ended questions (e.g., tumor grade, receptor status) that reflect real diagnostic workflows, minimize non-visual reasoning, and require genuine slide understanding. We further propose CTIS-QA, a Slide-level Question Answering model, featuring a dual-stream architecture that mimics pathologists' diagnostic approach. One stream captures global slide-level context via clustering-based feature aggregation, while the other focuses on salient local regions through an attention-guided patch perception module. Extensive experiments on WSI-VQA, CTIS-Bench, and slide-level diagnostic tasks show that CTIS-QA consistently outperforms existing state-of-the-art models across multiple metrics. Code and data are available at https://github.com/HLSvois/CTIS-QA.

ARXIV Cancer: unknown Method: convolutional neural network

CAP-IQA: Context-Aware Prompt-Guided CT Image Quality Assessment

Kazi Ramisa Rifa, Jie Zhang, Abdullah Imran
Published 2026-01-04 17:30

The paper presents the Context-Aware Prompt-guided Image Quality Assessment (CAP-IQA) framework, which aims to improve CT image quality assessment by integrating text-level priors with instance-level context prompts. This method employs a CNN-based visual encoder alongside a domain-specific text encoder to evaluate diagnostic visibility and anatomical clarity in abdominal CT images. The CAP-IQA framework demonstrates superior performance on the 2023 LDCTIQA challenge benchmark, achieving a correlation score that surpasses the leading team. Additionally, the model shows generalizability in assessing image quality across a large dataset of pediatric CT images.

Prompt-based methods, which encode medical priors through descriptive text, have been only minimally explored for CT Image Quality Assessment (IQA). While such prompts can embed prior knowledge about diagnostic quality, they often introduce bias by reflecting idealized definitions that may not hold under real-world degradations such as noise, motion artifacts, or scanner variability. To address this, we propose the Context-Aware Prompt-guided Image Quality Assessment (CAP-IQA) framework, which integrates text-level priors with instance-level context prompts and applies causal debiasing to separate idealized knowledge from factual, image-specific degradations. Our framework combines a CNN-based visual encoder with a domain-specific text encoder to assess diagnostic visibility, anatomical clarity, and noise perception in abdominal CT images. The model leverages radiology-style prompts and context-aware fusion to align semantic and perceptual representations. On the 2023 LDCTIQA challenge benchmark, CAP-IQA achieves an overall correlation score of 2.8590 (sum of PLCC, SROCC, and KROCC), surpassing the top-ranked leaderboard team (2.7427) by 4.24%. Moreover, our comprehensive ablation experiments confirm that prompt-guided fusion and the simplified encoder-only design jointly enhance feature alignment and interpretability. Furthermore, evaluation on an in-house dataset of 91,514 pediatric CT images demonstrates the true generalizability of CAP-IQA in assessing perceptual fidelity in a different patient population.
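The overall score quoted above is a sum of three correlation coefficients, and the 4.24% figure is a relative gain. Both can be reproduced with a short sketch. The implementations below are plain textbook versions that assume no tied values (Kendall's tau-a), the prediction and target lists are made-up illustrative numbers, and only the two scores 2.8590 and 2.7427 come from the abstract.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson linear correlation coefficient (PLCC)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman rank-order correlation (SROCC): Pearson on the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Kendall rank-order correlation (KROCC), tau-a variant (no ties)."""
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return s / len(pairs)

def overall_score(pred, target):
    """Sum of PLCC, SROCC and KROCC, as described in the abstract."""
    return pearson(pred, target) + spearman(pred, target) + kendall(pred, target)

# Illustrative predicted vs. reference quality scores (not real data).
pred = [0.5, 1.8, 2.1, 3.9, 3.2]
target = [0.0, 1.0, 2.0, 4.0, 3.0]
print(overall_score(pred, target))

# Relative gain over the top leaderboard score, using the reported numbers.
relative_gain = (2.8590 - 2.7427) / 2.7427 * 100
print(round(relative_gain, 2))
```

Since each coefficient is at most 1, the aggregate is bounded by 3, which puts the reported 2.8590 in context.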

ARXIV Cancer: general cancer Method: hypergraph diffusion

HyperNetWalk: A Unified Framework for Personalized and Population-Level Cancer Driver Gene Identification via Multi-Network Hypergraph Diffusion

Xueqing Xu, Yonghang Gao, Duanchen Sun, Ling-Yun Wu
Published 2026-01-04 02:49

The paper presents HyperNetWalk, a novel computational framework designed to identify cancer driver genes by integrating multiple biological networks and hypergraph diffusion. This method captures both personalized and cohort-level information through random walks on patient-specific subnetworks and refines predictions using hypergraph-based approaches. Evaluation across 12 TCGA cancer types shows that HyperNetWalk outperforms existing methods in identifying known driver genes and reveals cancer type-specific drivers, contributing to precision oncology.

Identifying cancer driver genes is crucial for understanding tumor biology and developing precision therapies. However, existing computational methods often rely on single biological networks or population-level mutation patterns, limiting their ability to identify patient-specific drivers and leverage the complementary information from multiple network types. Here, we present HyperNetWalk, a novel computational framework that integrates multiple biological networks and hypergraph diffusion to identify driver genes at both personalized and cohort levels. In the first stage, HyperNetWalk integrates protein-protein interaction networks, gene regulatory networks, and dynamic co-expression networks through sample-independent random walks on patient-specific subnetworks to capture topological importance and expression perturbation effects. In the second stage, it refines predictions through hypergraph-based random walks that leverage cross-sample information while preserving individual mutational contexts. Comprehensive evaluation on 12 TCGA cancer types demonstrates that HyperNetWalk achieves superior or competitive performance compared to state-of-the-art methods in both personalized and cohort-level predictions. Notably, HyperNetWalk successfully identifies known driver genes with high precision while revealing cancer type-specific drivers that reflect distinct biological mechanisms. Our framework provides a unified solution for personalized and population-based driver gene identification, offering valuable insights for precision oncology and therapeutic target discovery.
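Network diffusion of the kind mentioned above is often implemented as a random walk with restart. The sketch below shows that generic building block on a toy undirected graph; it is not HyperNetWalk itself, it ignores hypergraphs and expression data, and the node names, restart probability, and function name are hypothetical.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Stationary visiting probabilities of a walk that restarts at `seeds`.

    adj: dict mapping node -> list of neighbours (each node needs >= 1).
    """
    nodes = sorted(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart mass goes back to the seed nodes...
        nxt = {n: restart * p0[n] for n in nodes}
        # ...and the rest is spread uniformly over each node's neighbours.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        if sum(abs(nxt[n] - p[n]) for n in nodes) < tol:
            return nxt
        p = nxt
    return p

# Toy interaction network with placeholder gene names.
network = {
    "geneA": ["geneB", "geneC"],
    "geneB": ["geneA", "geneC"],
    "geneC": ["geneA", "geneB", "geneD"],
    "geneD": ["geneC"],
}

scores = random_walk_with_restart(network, seeds={"geneA"})
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, round(score, 4))
```

Genes close to the seed (for example, a mutated gene in one patient) receive higher scores, which is the intuition behind ranking candidate drivers by diffusion.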

ARXIV Cancer: brain tumor Method: spectral-selective token mixer

S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss

Md. Sanaullah Chowdhury, Lameya Sabrin
Published 2026-01-03 21:03

This paper presents S2M-Net, a novel architecture for medical image segmentation that addresses the challenges of local precision, global context, and computational efficiency. The method incorporates a Spectral-Selective Token Mixer and a Morphology-Aware Adaptive Segmentation Loss to enhance performance while reducing the number of parameters. Evaluation across 16 medical imaging datasets shows S2M-Net achieving state-of-the-art results in polyp segmentation, surgical instrument detection, and brain tumor segmentation.

Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deployment on limited data and hardware, a trilemma that existing architectures fail to resolve. Convolutional networks provide local precision at $\mathcal{O}(n)$ cost but have limited receptive fields, while vision transformers achieve global context through $\mathcal{O}(n^2)$ self-attention at prohibitive computational expense, causing overfitting on small clinical datasets. We propose S2M-Net, a 4.7M-parameter architecture that achieves $\mathcal{O}(HW \log HW)$ global context through two synergistic innovations: (i) Spectral-Selective Token Mixer (SSTM), which exploits the spectral concentration of medical images via truncated 2D FFT with learnable frequency filtering and content-gated spatial projection, avoiding quadratic attention cost while maintaining global receptive fields; and (ii) Morphology-Aware Adaptive Segmentation Loss (MASL), which automatically analyzes structure characteristics (compactness, tubularity, irregularity, scale) to modulate five complementary loss components through constrained learnable weights, eliminating manual per-dataset tuning. Comprehensive evaluation on 16 medical imaging datasets spanning 8 modalities demonstrates state-of-the-art performance: 96.12% Dice on polyp segmentation, 83.77% on surgical instruments (+17.85% over the prior art), and 80.90% on brain tumors, with consistent 3-18% improvements over specialized baselines while using 3.5-6$\times$ fewer parameters than transformer-based methods.
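The core idea of mixing tokens globally through a truncated 2D FFT can be sketched with NumPy. This is a simplified illustration rather than the SSTM module: it handles a single-channel feature map, uses a fixed array as a stand-in for the learnable frequency filter, and omits the content-gated spatial projection. The function name and the `keep` parameter are hypothetical.

```python
import numpy as np

def spectral_mix(x, keep, filt=None):
    """Global mixing of an (H, W) feature map via a truncated 2D FFT.

    keep: frequencies with |row freq| < keep and col freq < keep are retained.
    filt: optional per-frequency weights (stand-in for learnable parameters).
    """
    H, W = x.shape
    spec = np.fft.rfft2(x)                             # shape (H, W // 2 + 1)
    fy = np.abs(np.round(np.fft.fftfreq(H) * H))       # integer row frequencies
    fx = np.round(np.fft.rfftfreq(W) * W)              # integer column frequencies
    mask = (fy[:, None] < keep) & (fx[None, :] < keep)  # low-frequency window
    spec = spec * mask
    if filt is not None:
        spec = spec * filt
    # Every output pixel depends on every input pixel: a global receptive
    # field at FFT cost, O(HW log HW), instead of quadratic attention cost.
    return np.fft.irfft2(spec, s=(H, W))

rows, cols = np.indices((8, 8))
constant = np.ones((8, 8))
checkerboard = (-1.0) ** (rows + cols)   # highest-frequency pattern

print(np.allclose(spectral_mix(constant, keep=2), constant))
print(np.allclose(spectral_mix(checkerboard, keep=2), 0.0))
```

Truncation keeps smooth, low-frequency structure and discards the highest frequencies, which is reasonable when most of the image energy is concentrated at low frequencies, as the abstract assumes for medical images.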

ARXIV Cancer: unknown Method: deep learning

Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance

Ifeanyi Ezuma, Ugochukwu Ugwu
Published 2026-01-03 03:33

This study investigates the classification performance of machine learning and deep learning models on the LC25000 dataset, which consists of histopathological images. The fine-tuned InceptionResNet-v2 network was utilized for both classification and feature extraction, achieving a classification accuracy of 96.01% and an average AUC of 96.8%. The results indicate that models leveraging deep features significantly outperformed those using only the pre-trained network, with the Neural Network model achieving an AUC of 99.99%. Additionally, the study assessed model robustness under varying signal-to-noise ratio conditions.

The era of digital pathology has advanced histopathological examinations, making automated image analysis essential in clinical practice. This study evaluates the classification performance of machine learning and deep learning models on the LC25000 dataset, which includes five classes of histopathological images. We used the fine-tuned InceptionResNet-v2 network both as a classifier and for feature extraction. Our results show that the fine-tuned InceptionResNet-v2 achieved a classification accuracy of 96.01% and an average AUC of 96.8%. Models trained on deep features from InceptionResNet-v2 outperformed those using only the pre-trained network, with the Neural Network model achieving an AUC of 99.99% and accuracy of 99.84%. Evaluating model robustness under varying SNR conditions revealed that models using deep features exhibited greater resilience, particularly GBM and KNN. The combination of HOG and deep features showed enhanced performance, though less so in noisy environments.
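A robustness test at a given SNR requires corrupting images with noise of a matching power. The sketch below shows one common way to do that, assuming additive Gaussian noise and a power-ratio SNR in decibels; the abstract does not say which noise model or SNR definition the study used, and the function names and pixel values are hypothetical.

```python
import math
import random

def add_gaussian_noise(pixels, snr_db, rng):
    """Add zero-mean Gaussian noise so that signal/noise power matches snr_db."""
    signal_power = sum(p * p for p in pixels) / len(pixels)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [p + rng.gauss(0.0, sigma) for p in pixels]

def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(c * c for c in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)

rng = random.Random(0)
image = [rng.random() for _ in range(20000)]  # stand-in for flattened pixels

for target in (20, 10, 0):
    noisy = add_gaussian_noise(image, target, rng)
    print(target, round(measured_snr_db(image, noisy), 2))
```

Lower SNR values mean heavier corruption; at 0 dB the noise carries as much power as the image itself.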
