Research Papers

ARXIV Cancer: acute lymphoblastic leukemia Method: attention-based convolutional neural network

Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation

Douglas Costa Braga, Daniel Oliveira Dantas
Published 2026-01-03 01:24

This study presents a deep learning pipeline for classifying leukemic cells, specifically targeting acute lymphoblastic leukemia (ALL). The method uses an attention-based convolutional neural network that integrates EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms, and employs data augmentation and focal loss to improve performance. The system achieved a 97.89% F1-score and 97.89% accuracy on the test set, outperforming existing methods by up to 4.67% while using 89% fewer parameters than VGG16.

We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image analysis. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer, requiring expert microscopic diagnosis that suffers from inter-observer variability and time constraints. The proposed system integrates an attention-based convolutional neural network combining EfficientNetV2-B3 with Squeeze-and-Excitation mechanisms for automated ALL cell classification. Our approach employs comprehensive data augmentation, focal loss for class imbalance, and patient-wise data splitting to ensure robust and reproducible evaluation. On the C-NMC 2019 dataset (12,528 original images from 62 patients), the system achieves a 97.89% F1-score and 97.89% accuracy on the test set, with statistical validation through 100-iteration Monte Carlo experiments confirming significant improvements (p < 0.001) over baseline methods. The proposed pipeline outperforms existing approaches by up to 4.67% while using 89% fewer parameters than VGG16 (15.2M vs. 138M). The attention mechanism provides interpretable visualizations of diagnostically relevant cellular features, demonstrating that modern attention-based architectures can improve leukemic cell classification while maintaining computational efficiency suitable for clinical deployment.
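The abstract uses focal loss to handle class imbalance. Focal loss down-weights well-classified examples so that training concentrates on hard ones. The sketch below implements the standard binary form FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t) in NumPy. The defaults gamma = 2 and alpha = 0.25 are common choices from the original focal loss formulation; the abstract does not report the values this pipeline uses.

```python
import numpy as np


def binary_focal_loss(probs, labels, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs  -- predicted probability of the positive class (e.g. the ALL class), shape (N,)
    labels -- ground-truth labels in {0, 1}, shape (N,)
    """
    probs = np.clip(np.asarray(probs, dtype=float), eps, 1.0 - eps)
    labels = np.asarray(labels, dtype=float)
    # p_t: probability the model assigns to the true class of each sample
    p_t = labels * probs + (1.0 - labels) * (1.0 - probs)
    # alpha_t: alpha for positives, (1 - alpha) for negatives
    alpha_t = labels * alpha + (1.0 - labels) * (1.0 - alpha)
    # (1 - p_t)**gamma shrinks the loss of confident, correct predictions
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
    return float(loss.mean())


if __name__ == "__main__":
    easy = binary_focal_loss([0.95], [1])  # confident and correct
    hard = binary_focal_loss([0.30], [1])  # under-confident on a positive
    print(f"easy={easy:.5f} hard={hard:.5f}")
```

A quick sanity check: with gamma = 0 and alpha = 0.5, the expression reduces to half the ordinary binary cross-entropy.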

ARXIV Cancer: unknown Method: agentic AI framework

An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Md Rashadul Islam
Published 2026-01-03 00:10

This paper presents an explainable agentic AI framework designed for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework includes a modular pipeline with agents for image analysis, uncertainty estimation, and decision-making, prioritizing clinical safety and transparency. Through qualitative and case-based analyses, the framework demonstrates effective handling of diagnostically ambiguous situations, integrating visual explanations to enhance trust in AI outputs.

Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging. However, most existing approaches operate as black-box predictors, producing deterministic outputs without explicit uncertainty awareness or structured mechanisms to abstain under ambiguous conditions. This limitation raises serious safety and trust concerns in high-risk emergency radiology settings. In this paper, we propose an explainable agentic AI framework for uncertainty-aware and abstention-enabled decision support in acute ischemic stroke imaging. The framework follows a modular agentic pipeline in which a perception agent performs lesion-aware image analysis, an uncertainty estimation agent computes slice-level predictive reliability, and a decision agent determines whether to issue a prediction or abstain based on predefined uncertainty thresholds. Unlike prior stroke imaging systems that primarily focus on improving segmentation or classification accuracy, the proposed framework explicitly prioritizes clinical safety, transparency, and clinician-aligned decision behavior. Qualitative and case-based analyses across representative stroke imaging scenarios demonstrate that uncertainty-driven abstention naturally emerges in diagnostically ambiguous regions and low-information slices. The framework further integrates visual explanation mechanisms to support both predictive and abstention decisions, addressing a key limitation of existing uncertainty-aware medical imaging systems. Rather than introducing a new performance benchmark, this work presents agentic control, uncertainty awareness, and selective abstention as essential design principles for developing safe and trustworthy medical imaging AI systems.
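The sketch below illustrates the decision agent's threshold rule. It assumes the uncertainty agent supplies class probabilities from several stochastic forward passes (for example, Monte Carlo dropout) and uses predictive entropy as the slice-level uncertainty score. Both choices and the function names are assumptions; the abstract does not say which uncertainty estimator or threshold the framework uses.

```python
import numpy as np


def predictive_entropy(mc_probs):
    """Entropy (in nats) of the mean class distribution over T stochastic passes.

    mc_probs -- shape (T, C): class probabilities from T forward passes for one slice.
    """
    mean_probs = np.clip(np.mean(np.asarray(mc_probs, dtype=float), axis=0), 1e-12, 1.0)
    return float(-(mean_probs * np.log(mean_probs)).sum())


def decide(mc_probs, threshold):
    """Decision rule: abstain when uncertainty exceeds the threshold, else predict."""
    if predictive_entropy(mc_probs) > threshold:
        return ("abstain", None)
    mean_probs = np.mean(np.asarray(mc_probs, dtype=float), axis=0)
    return ("predict", int(np.argmax(mean_probs)))


if __name__ == "__main__":
    confident = [[0.95, 0.05]] * 5              # passes agree: low entropy
    ambiguous = [[0.6, 0.4], [0.4, 0.6]] * 3    # passes disagree: mean is 50/50
    print(decide(confident, threshold=0.5))     # ('predict', 0)
    print(decide(ambiguous, threshold=0.5))     # ('abstain', None)
```

For two classes, entropy ranges from 0 to ln 2 (about 0.69), so a threshold of 0.5 abstains only when the averaged prediction is close to a coin flip.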

ARXIV Cancer: pancreatic cancer Method: scale-aware adaptive supervised network

Scale-aware Adaptive Supervised Network with Limited Medical Annotations

Zihan Li, Dandan Shan, Yunxiang Li, Paul E. Kinahan, Qingqi Hong
Published 2026-01-02 23:55

This paper presents SASNet, a Scale-aware Adaptive Supervised Network designed to improve medical image segmentation in semi-supervised learning scenarios with limited annotations. The proposed dual-branch architecture integrates low-level and high-level feature representations through innovative mechanisms such as dynamic pixel-wise prediction weighting and 3D Fourier domain transformations. Evaluation on multiple datasets shows that SASNet outperforms existing semi-supervised methods and approaches the performance of fully supervised models.

Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annotator variability across different viewpoints and expertise levels, and inadequate multi-scale feature integration for precise boundary delineation in complex anatomical structures. Existing semi-supervised methods demonstrate substantial performance degradation compared to fully supervised approaches, particularly in small target segmentation and boundary refinement tasks. To address these fundamental challenges, we propose SASNet (Scale-aware Adaptive Supervised Network), a dual-branch architecture that leverages both low-level and high-level feature representations through novel scale-aware adaptive reweight mechanisms. Our approach introduces three key methodological innovations, including the Scale-aware Adaptive Reweight strategy that dynamically weights pixel-wise predictions using temporal confidence accumulation, the View Variance Enhancement mechanism employing 3D Fourier domain transformations to simulate annotation variability, and segmentation-regression consistency learning through signed distance map algorithms for enhanced boundary precision. These innovations collectively address the core limitations of existing semi-supervised approaches by integrating spatial, temporal, and geometric consistency principles within a unified optimization framework. Comprehensive evaluation across LA, Pancreas-CT, and BraTS datasets demonstrates that SASNet achieves superior performance with limited labeled data, surpassing state-of-the-art semi-supervised methods while approaching fully supervised performance levels. The source code for SASNet is available at https://github.com/HUANGLIZI/SASNet.
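A generic amplitude perturbation can illustrate the idea of perturbing volumes in the 3D Fourier domain. The method scales the low-frequency amplitude spectrum of a volume by random factors while keeping its phase, which changes global appearance but preserves structure. This is a hypothetical sketch of that general technique, not SASNet's View Variance Enhancement implementation. `strength` and `low_freq_ratio` are illustrative parameters.

```python
import numpy as np


def fourier_amplitude_perturb(volume, strength=0.1, low_freq_ratio=0.2, rng=None):
    """Randomly rescale low-frequency Fourier amplitudes of a 3D volume, keeping the phase.

    strength       -- std of the multiplicative noise on amplitudes (hypothetical parameter)
    low_freq_ratio -- fraction of each axis treated as "low frequency" (hypothetical parameter)
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.fftn(volume)
    amplitude, phase = np.abs(spectrum), np.angle(spectrum)

    # Centre the zero frequency so the low-frequency block is a central cube
    amp = np.fft.fftshift(amplitude)
    centre = [s // 2 for s in amp.shape]
    half = [max(1, int(s * low_freq_ratio / 2)) for s in amp.shape]
    block = tuple(slice(c - h, c + h) for c, h in zip(centre, half))

    # Multiplicative noise around 1, clipped so amplitudes stay non-negative
    noise = np.clip(1.0 + strength * rng.standard_normal(amp[block].shape), 0.0, None)
    amp[block] *= noise

    # Recombine perturbed amplitude with the original phase and invert;
    # the perturbation is not Hermitian-symmetric, so keep only the real part
    perturbed = np.fft.ifftn(np.fft.ifftshift(amp) * np.exp(1j * phase))
    return perturbed.real


if __name__ == "__main__":
    vol = np.random.default_rng(0).random((16, 16, 16))
    aug = fourier_amplitude_perturb(vol, strength=0.2, rng=np.random.default_rng(1))
    print(aug.shape, float(np.abs(aug - vol).mean()))
```

With `strength=0` the function returns the input volume up to floating-point error. That makes it easy to confirm the transform round-trips before adding noise.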

ARXIV Cancer: skin cancer Method: deep learning

A Deep Learning Approach for Automated Skin Lesion Diagnosis with Explainable AI

Md. Maksudul Haque, Rahnuma Akter, A S M Ahsanul Sarkar Akib, Abdul Hasib
Published 2026-01-02 19:21

This paper presents a deep learning architecture for the automated classification of skin lesions using the HAM10000 dataset. The proposed system integrates advanced data balancing, augmentation techniques, and a hybrid EfficientNetV2-L framework with channel attention. The model achieves an overall accuracy of 91.15% and performs well across all seven lesion classes, particularly melanoma and melanocytic nevi. Additionally, explainable AI techniques improve the interpretability of the model's predictions.

Skin cancer is one of the most common and dangerous types of cancer worldwide, and it requires timely and precise diagnosis. In this paper, we describe a deep learning architecture for multi-class skin lesion classification on the HAM10000 dataset. The proposed system combines high-quality data balancing methods, large-scale data augmentation, a hybrid EfficientNetV2-L framework with channel attention, and a three-stage progressive learning approach. We also use explainable AI (XAI) techniques such as Grad-CAM and saliency maps to produce interpretable visual explanations of model predictions. Our approach achieves an overall accuracy of 91.15%, a macro F1-score of 85.45%, and a micro-average AUC of 99.33%. The model performs well across all seven lesion classes, with particularly strong results for melanoma and melanocytic nevi. Beyond improving diagnostic transparency, XAI helps identify the visual features that drive the classifications, which strengthens clinical trustworthiness.
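The gap between the accuracy and macro F1 figures reflects class imbalance. Macro F1 averages per-class F1 scores equally, so rare lesion classes count as much as the dominant nevus class. The minimal pure-Python sketch below computes it. The class codes `nv` and `mel` follow HAM10000's label names, and the toy predictions are made up for illustration.

```python
def per_class_f1(y_true, y_pred, labels):
    """F1 = 2TP / (2TP + FP + FN) for each class label."""
    scores = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred, labels)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy, made-up labels: 8 nevi, 2 melanomas; the model predicts "nv" every time
    y_true = ["nv"] * 8 + ["mel"] * 2
    y_pred = ["nv"] * 10
    print(accuracy(y_true, y_pred))                           # 0.8
    print(round(macro_f1(y_true, y_pred, ["nv", "mel"]), 3))  # 0.444
```

Predicting `nv` for every image gives 80% accuracy but a macro F1 of only about 0.44, because the melanoma class scores zero.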

ARXIV Cancer: general cancer Method: vision-language model

Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model

Hao Guan, Li Zhou
Published 2026-01-02 15:12

This study investigates the detection of performance degradation in Vision-Language Models (VLMs) used for pathology under data shift conditions. It introduces DomainSAT, a toolbox for analyzing input data shifts and proposes a label-free, confidence-based indicator for monitoring output performance. The findings indicate that combining input shift detection with output confidence indicators enhances the reliability of VLMs in tumor classification tasks.

Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. However, after deployment, their performance may deteriorate when the input data distribution shifts from that observed during development. Detecting such performance degradation is essential for clinical reliability, yet remains challenging for large pre-trained VLMs operating without labeled data. In this study, we investigate performance degradation detection under data shift in a state-of-the-art pathology VLM. We examine both input-level data shift and output-level prediction behavior to understand their respective roles in monitoring model reliability. To facilitate systematic analysis of input data shift, we develop DomainSAT, a lightweight toolbox with a graphical interface that integrates representative shift detection algorithms and enables intuitive exploration of data shift. Our analysis shows that while input data shift detection is effective at identifying distributional changes and providing early diagnostic signals, it does not always correspond to actual performance degradation. Motivated by this observation, we further study output-based monitoring and introduce a label-free, confidence-based degradation indicator that directly captures changes in model prediction confidence. We find that this indicator exhibits a close relationship with performance degradation and serves as an effective complement to input shift detection. Experiments on a large-scale pathology dataset for tumor classification demonstrate that combining input data shift detection and output confidence-based indicators enables more reliable detection and interpretation of performance degradation in VLMs under data shift. These findings provide a practical and complementary framework for monitoring the reliability of foundation models in digital pathology.
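The abstract does not define its confidence-based indicator. The sketch below therefore shows one common label-free choice, as an assumption: compare the mean maximum softmax probability on a reference set with that on deployment data, and treat a drop as a warning sign of degradation. The `tolerance` value and all function names are hypothetical.

```python
import numpy as np


def mean_max_confidence(probs):
    """Average top-class probability over a batch of softmax outputs, shape (N, C)."""
    return float(np.max(np.asarray(probs, dtype=float), axis=1).mean())


def confidence_drop(reference_probs, deployment_probs):
    """Label-free indicator: reference confidence minus deployment confidence.

    Positive values mean the model has become less confident on the new data.
    """
    return mean_max_confidence(reference_probs) - mean_max_confidence(deployment_probs)


def degradation_alert(reference_probs, deployment_probs, tolerance=0.05):
    """Flag possible degradation when confidence drops by more than `tolerance` (hypothetical)."""
    return confidence_drop(reference_probs, deployment_probs) > tolerance


if __name__ == "__main__":
    reference = [[0.9, 0.1], [0.8, 0.2]]     # e.g. in-distribution tumour patches
    deployment = [[0.6, 0.4], [0.55, 0.45]]  # e.g. patches from a new scanner
    print(round(confidence_drop(reference, deployment), 3), degradation_alert(reference, deployment))
```

Choosing the tolerance is a deployment decision, for example calibrated on held-out shifted data. The paper does not specify one.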

ARXIV Cancer: melanoma Method: convolutional neural network

The Impact of Lesion Focus on the Performance of AI-Based Melanoma Classification

Tanay Donde
Published 2026-01-01 14:17

This study investigates the impact of lesion focus on the performance of AI-based melanoma classification. It employs convolutional neural networks and explores the relationship between lesion attention and diagnostic accuracy using various analytical methods. The results indicate that models with better alignment to lesion areas yield improved diagnostic performance, highlighting the importance of interpretable AI in medical diagnostics.

Melanoma is the most lethal subtype of skin cancer, and early, accurate detection can greatly improve patient outcomes. Although machine learning models, especially convolutional neural networks (CNNs), have shown great potential in automating melanoma classification, their diagnostic reliability still suffers from inconsistent focus on lesion areas. In this study, we analyze the relationship between lesion attention and diagnostic performance using masked images, bounding-box detection, and transfer learning. We used multiple explainability and sensitivity analysis approaches to investigate how well models aligned their attention with lesion areas and how this alignment correlated with precision, recall, and F1-score. Results showed that models with a higher focus on lesion areas achieved better diagnostic performance, suggesting the potential of interpretable AI in medical diagnostics. This study provides a foundation for developing more accurate and trustworthy melanoma classification models.
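One simple way to quantify how well a model's attention aligns with the lesion is to measure the share of saliency mass that falls inside a lesion segmentation mask. The saliency could come from a Grad-CAM map, for example. The sketch below assumes a saliency map and a binary mask of the same shape. It is an illustrative metric, not necessarily the one used in the study.

```python
import numpy as np


def lesion_focus_ratio(saliency, lesion_mask):
    """Share of (non-negative) saliency mass that lies inside the lesion mask, in [0, 1]."""
    sal = np.clip(np.asarray(saliency, dtype=float), 0.0, None)
    mask = np.asarray(lesion_mask, dtype=bool)
    if sal.shape != mask.shape:
        raise ValueError("saliency map and lesion mask must have the same shape")
    total = sal.sum()
    return float(sal[mask].sum() / total) if total > 0 else 0.0


if __name__ == "__main__":
    mask = np.zeros((4, 4), dtype=bool)
    mask[:2, :2] = True                          # lesion occupies the top-left quarter
    focused = mask.astype(float)                 # all saliency on the lesion
    diffuse = np.ones((4, 4))                    # saliency spread over the whole image
    print(lesion_focus_ratio(focused, mask), lesion_focus_ratio(diffuse, mask))
```

Per-image ratios like this can then be correlated with per-model precision, recall, or F1-score.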

ARXIV Cancer: general cancer Method: self-supervised learning

TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models

Kohei Yamamoto, Tomohiro Kikuchi
Published 2026-01-01 08:27

This study introduces TotalFM, a radiological foundation model designed to efficiently learn the relationship between 3D-CT images and linguistic expressions through organ separation. Utilizing a large-scale dataset and advanced techniques such as segmentation and Large Language Model processing, the model balances computational efficiency with representation capability. The results indicate that TotalFM outperforms existing models in zero-shot lesion classification tasks, demonstrating its potential for practical applications in radiology.

While foundation models in radiology are expected to be applied to various clinical tasks, computational cost constraints remain a major challenge when training on 3D-CT volumetric data. In this study, we propose TotalFM, a radiological foundation model that efficiently learns the correspondence between 3D-CT images and linguistic expressions based on the concept of organ separation, utilizing a large-scale dataset of 140,000 series. By automating the creation of organ volume and finding-sentence pairs through segmentation techniques and Large Language Model (LLM)-based radiology report processing, and by combining self-supervised pre-training via VideoMAE with contrastive learning using volume-text pairs, we aimed to balance computational efficiency and representation capability. In zero-shot organ-wise lesion classification tasks, the proposed model achieved higher F1 scores in 83% (5/6) of organs compared to CT-CLIP and 64% (9/14) of organs compared to Merlin. These results suggest that the proposed model exhibits high generalization performance in a clinical evaluation setting using actual radiology report sentences. Furthermore, in zero-shot finding-wise lesion classification tasks, our model achieved a higher AUROC in 83% (25/30) of finding categories compared to Merlin. We also confirmed performance comparable to existing Vision-Language Models (VLMs) in radiology report generation tasks. Our results demonstrate that the organ-separated learning framework can serve as a realistic and effective design guideline for the practical implementation of 3D-CT foundation models.
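TotalFM's contrastive stage pairs organ volumes with finding sentences. A common formulation for this kind of volume-text training is the symmetric InfoNCE loss popularized by CLIP. The NumPy sketch below assumes each batch contains N matched pairs embedded in a shared space. The temperature of 0.07 is a typical default, not a value taken from TotalFM.

```python
import numpy as np


def _log_softmax(x, axis):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_info_nce(volume_emb, text_emb, temperature=0.07):
    """CLIP-style contrastive loss for N matched (volume, text) pairs.

    Row i of volume_emb and row i of text_emb are assumed to describe the same
    organ and finding; every other row in the batch serves as a negative.
    """
    v = volume_emb / np.linalg.norm(volume_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = v @ t.T / temperature            # (N, N) scaled cosine similarities
    idx = np.arange(len(v))
    loss_v2t = -_log_softmax(logits, axis=1)[idx, idx].mean()  # volume -> text
    loss_t2v = -_log_softmax(logits, axis=0)[idx, idx].mean()  # text -> volume
    return float((loss_v2t + loss_t2v) / 2)


if __name__ == "__main__":
    emb = np.eye(4)                                      # toy 4-dim embeddings
    print(symmetric_info_nce(emb, emb))                  # matched pairs: near zero
    print(symmetric_info_nce(emb, emb[[1, 2, 3, 0]]))    # mismatched pairs: large
```

Under this formulation, zero-shot classification reduces to embedding a set of finding prompts and picking the one with the highest cosine similarity to the volume embedding.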

PUBMED Cancer: laryngeal cancer Method: unknown

Functional Voice Restoration After Laryngeal Transplantation: A Multidisciplinary Protocol and Longitudinal Outcomes.

Bin Zeng, Hailing Gu, Zheng Jiang, Mailudan Ainiwaer, Yitao Zheng, Jimin Yang, Jia Ren, Fei Chen
Published 2026-01-01 00:00

This study presents a protocolized framework for voice rehabilitation following laryngeal transplantation, addressing the lack of standardized approaches in this area. It details the experiences of four male patients, three of whom had laryngeal cancer, undergoing structured assessments and personalized rehabilitation. The results indicate significant improvements in vocal function over time, particularly with the implementation of neuromuscular reinnervation strategies. The findings aim to guide evidence-based rehabilitation practices for laryngeal transplantation.

Laryngeal transplantation offers the potential for patients to regain vocal function, yet standardised voice rehabilitation protocols are lacking. We share our team's experience with regular follow-up voice function evaluation and address this gap by establishing a multidisciplinary pathway for functional recovery. Four male transplant recipients (3 laryngeal cancers, 1 hypopharyngeal cancer) underwent protocolized assessments at 1/3/6/8 months post-op: subjective assessment (GRBAS scale) and objective evaluation (multiparametric acoustic analysis and electronic laryngoscopy). Personalized rehabilitation was delivered weekly by a licensed speech therapist. The protocol evolved over time: Patients 1-2 received conventional training, while Patients 3-4 received intensive neuromuscular reinnervation strategies. In all four patients, hoarseness, breathiness, and asthenia scores gradually decreased, and overall voice quality improved. The maximum phonation time (MPT) was about 1.8 s at 1 month after surgery and increased steadily in all patients. Patient 3, who performed best of the four, had an MPT of more than 10 s at 8 months after surgery. Laryngeal mucosal sensory function was gradually established from 3 months after the operation, and compensatory ventricular band vibration appeared at 8 months with the assistance of voice training. Anchored to neuromuscular reinnervation milestones, this study demonstrates that standardised evaluations coupled with individualized training progressively restore vocal function. Our protocolized framework guides evidence-based rehabilitation for institutions pursuing laryngeal transplantation.

WHAT THIS PAPER ADDS

What is already known on this subject
Laryngeal transplantation surgically restores laryngeal anatomy but faces functional recovery challenges due to delayed neuromuscular reinnervation. Existing literature focuses predominantly on immunosuppression and graft viability, with sparse evidence guiding postoperative voice rehabilitation. Standardised protocols for phonatory recovery, which are routine in other neurogenic voice disorders (e.g., vocal fold paralysis), are absent. Fewer than 20 human cases have been reported globally, and only two publications detail voice outcomes. Consequently, rehabilitation strategies remain ad hoc, lacking consensus on intervention timing, exercise biomarkers, or psychological support frameworks.

What this study adds to existing knowledge
This study establishes the first protocolized voice rehabilitation framework for laryngeal transplantation, anchored to two neuromuscular milestones. The first is pharyngeal reflex recovery (3 months), signalling sensory reinnervation. The second is ventricular band compensation (8 months), indicating motor adaptation. We demonstrate that early, structured rehabilitation (initiated at 1 month) enables significant voice restoration (MPT: 1.8 s → >10 s). Critically, we identify modular design principles that accommodate clinical interruptions (e.g., ICU admissions) without compromising core outcomes. We anticipate these findings will guide evidence-based rehabilitation for institutions pursuing laryngeal transplantation and inform standardised pathways for complex laryngologic rehabilitation.

What are the potential or actual clinical implications of this work?
Rehabilitation Standardization: Provides evidence-based timelines (1/3/6/8-month assessments) and neuromuscular biomarkers to guide intervention intensity.
Broad Applicability: The protocol shows cross-utility for bilateral vocal fold paralysis and post-traumatic neurogenic dysphonia, leveraging shared reinnervation mechanisms.
Contingency Management: Modular training design maintains efficacy despite clinical interruptions (e.g., 40% cohort ICU/oncology transfers).
Technology Integration: Validates objective metrics (MPT, mucosal wave symmetry) as targets for future AI-assisted biofeedback tools.
Clinicians should prioritise early sensorimotor retraining (<3 months) while monitoring compensatory strategies (ventricular vibration) as functional proxies.

ARXIV Cancer: head and neck cancer Method: rank-based method

friends.test: rank-based method for feature selection in interaction matrices

Alexandra Suvorikova, Alexey Kroshnin, Dmirijs Lvovs, Vera Mukhina, Andrey Mironov, Elana J. Fertig, Ludmila Danilova, Alexander Favorov
Published 2025-12-31 13:03

This paper presents friends.test, a rank-based method designed to enhance feature selection in interaction matrices, particularly in the context of identifying specific interactions amidst background noise. The method utilizes model fitting to detect structural breaks in entity interactions, allowing for the integration of heterogeneous data sources. The effectiveness of friends.test is demonstrated using transnational data from head and neck cancer.

The analysis of the interaction matrix between two distinct sets is essential across diverse fields, from pharmacovigilance to transcriptomics. Not all interactions are equally informative: a marker gene associated with a few specific biological processes is more informative than a highly expressed non-specific gene associated with most observed processes. Identifying these interactions is challenging due to background connections. Furthermore, data heterogeneity across sources precludes universal identification criteria. To address this challenge, we introduce friends.test, a method for identifying specificity by detecting structural breaks in entity interactions. Rank-based representation of the interaction matrix ensures invariance to heterogeneous data and allows for integrating data from diverse sources. To automatically locate the boundary between specific interactions and background activity, we employ model fitting. We demonstrate the applicability of friends.test on the GSE112026 dataset (transnational data from head and neck cancer). A computationally efficient R implementation is available at https://github.com/favorov/friends.test.
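A toy sketch can illustrate friends.test's two ingredients: a rank-based view of the interaction matrix, and a fitted model that separates specific interactions from background. The sketch ranks each column independently, then splits one row's sorted ranks with a two-level step fit. It is a hypothetical simplification for intuition only. It is not the friends.test algorithm or its R API, and the function names are made up.

```python
import numpy as np


def column_ranks(matrix):
    """Rank entries within each column (1 = largest), so each source is compared on its own scale."""
    m = np.asarray(matrix, dtype=float)
    # argsort of argsort yields 0-based ranks; negating makes the largest value rank 0
    return np.argsort(np.argsort(-m, axis=0), axis=0) + 1


def step_breakpoint(values):
    """Best split of sorted values into two constant levels (least squared error)."""
    v = np.sort(np.asarray(values, dtype=float))
    best_k, best_sse = 1, np.inf
    for k in range(1, len(v)):
        left, right = v[:k], v[k:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_k, best_sse = k, sse
    return best_k, v


def specific_columns(matrix, row):
    """Columns where `row` falls in the low-rank (strongest) segment of the step fit."""
    ranks = column_ranks(matrix)[row]
    k, sorted_ranks = step_breakpoint(ranks)
    return np.flatnonzero(ranks <= sorted_ranks[k - 1])


if __name__ == "__main__":
    # Rows: entities (e.g. genes); columns: partners (e.g. processes). Toy values.
    m = np.array([
        [10.0, 9.0, 0.1, 0.2, 0.1, 0.3],   # strong only in columns 0 and 1
        [1.0, 1.0, 5.0, 5.0, 5.0, 5.0],
        [2.0, 2.0, 4.0, 4.0, 4.0, 4.0],
        [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
        [0.5, 0.5, 2.0, 2.0, 2.0, 2.0],
    ])
    print(column_ranks(m)[0], specific_columns(m, 0))
```

In the toy matrix, row 0 ranks first in columns 0 and 1 and last elsewhere, so the step fit isolates those two columns. Because only within-column ranks are used, rescaling any column leaves the result unchanged.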

ARXIV Cancer: unknown Method: vision transformer

VL-OrdinalFormer: Vision Language Guided Ordinal Transformers for Interpretable Knee Osteoarthritis Grading

Zahid Ullah, Jihie Kim
Published 2025-12-31 03:01

This study introduces VLOrdinalFormer, a vision-language-guided ordinal learning framework for the automated grading of knee osteoarthritis (KOA) from knee radiographs. The method integrates a ViT-L/16 backbone with CORAL-based ordinal regression and a CLIP-driven semantic alignment module, improving the model's ability to capture subtle radiographic distinctions. Experimental results on the OAI kneeKL224 dataset show that VLOrdinalFormer outperforms existing CNN and ViT baselines, particularly in classifying early disease stages.

Knee osteoarthritis (KOA) is a leading cause of disability worldwide, and accurate severity assessment using the Kellgren-Lawrence (KL) grading system is critical for clinical decision-making. However, radiographic distinctions between early disease stages, particularly KL1 and KL2, are subtle and frequently lead to inter-observer variability among radiologists. To address these challenges, we propose VLOrdinalFormer, a vision-language-guided ordinal learning framework for fully automated KOA grading from knee radiographs. The proposed method combines a ViT-L/16 backbone with CORAL-based ordinal regression and a Contrastive Language-Image Pretraining (CLIP)-driven semantic alignment module, allowing the model to incorporate clinically meaningful textual concepts related to joint space narrowing, osteophyte formation, and subchondral sclerosis. To improve robustness and mitigate overfitting, we employ stratified five-fold cross-validation, class-aware reweighting to emphasize challenging intermediate grades, and test-time augmentation with global threshold optimization. Experiments on the publicly available OAI kneeKL224 dataset demonstrate that VLOrdinalFormer achieves state-of-the-art performance, outperforming CNN and ViT baselines in terms of macro F1-score and overall accuracy. Notably, the proposed framework yields substantial performance gains for KL1 and KL2 without compromising classification accuracy for mild or severe cases. In addition, interpretability analyses using Grad-CAM and CLIP similarity maps confirm that the model consistently attends to clinically relevant anatomical regions. These results highlight the potential of vision-language-aligned ordinal transformers as reliable and interpretable tools for KOA grading and disease progression assessment in routine radiological practice.
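CORAL turns K ordered grades into K - 1 binary questions ("is the grade above k?"). These questions share one score and differ only by learned biases, which keeps the binary predictions rank-consistent. The NumPy sketch below shows the label encoding, decoding, and loss for the five KL grades (0 to 4). It illustrates CORAL in general. The per-image score is assumed to come from a backbone such as the ViT described above, and the function names are illustrative.

```python
import numpy as np


def coral_targets(labels, num_classes):
    """Encode grades y in {0..K-1} as K-1 binary targets [y > 0, y > 1, ..., y > K-2]."""
    labels = np.asarray(labels).reshape(-1, 1)
    thresholds = np.arange(num_classes - 1).reshape(1, -1)
    return (labels > thresholds).astype(float)


def coral_logits(scores, biases):
    """CORAL head: one shared score per image plus one learned bias per threshold."""
    return np.asarray(scores, dtype=float).reshape(-1, 1) + np.asarray(biases, dtype=float).reshape(1, -1)


def coral_predict(logits):
    """Predicted grade = number of thresholds with sigmoid probability above 0.5 (i.e. logit > 0)."""
    return (np.asarray(logits, dtype=float) > 0.0).sum(axis=1)


def coral_loss(logits, labels, num_classes):
    """Binary cross-entropy summed over the K-1 threshold tasks, averaged over the batch."""
    z = np.asarray(logits, dtype=float)
    y = coral_targets(labels, num_classes)
    log_p = -np.logaddexp(0.0, -z)      # log(sigmoid(z)), computed stably
    log_1mp = -np.logaddexp(0.0, z)     # log(1 - sigmoid(z))
    return float(-(y * log_p + (1.0 - y) * log_1mp).sum(axis=1).mean())


if __name__ == "__main__":
    # KL grades 0-4 -> K = 5 classes, 4 ordered thresholds
    print(coral_targets([0, 2, 4], 5))
    logits = coral_logits([1.0], [2.0, 1.0, -0.5, -2.0])  # biases decrease with threshold
    print(coral_predict(logits))                          # [3]
```

The ordinal encoding penalizes a KL2 image predicted as KL4 more heavily than one predicted as KL3. This is the behaviour that helps with the adjacent KL1/KL2 confusion discussed in the abstract.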
