Research Papers

ARXIV Cancer: unknown Method: convolutional neural network

Deep Learning Approach for the Diagnosis of Pediatric Pneumonia Using Chest X-ray Imaging

Fatemeh Hosseinabadi, Mohammad Mojtaba Rohani
Published 2025-12-31 00:07

This study explores the use of convolutional neural networks (CNNs) for the automated classification of pediatric chest X-ray images to diagnose pneumonia. Three CNN architectures—ResNetRS, RegNet, and EfficientNetV2—were evaluated using transfer learning on a curated dataset of 1,000 images. RegNet demonstrated the highest classification performance with an accuracy of 92.4%.

Pediatric pneumonia remains a leading cause of morbidity and mortality in children worldwide. Timely and accurate diagnosis is critical but often challenged by limited radiological expertise and the physiological and procedural complexity of pediatric imaging. This study investigates the performance of state-of-the-art convolutional neural network (CNN) architectures (ResNetRS, RegNet, and EfficientNetV2) using transfer learning for the automated classification of pediatric chest X-ray images as either pneumonia or normal. A curated subset of 1,000 chest X-ray images was extracted from a publicly available dataset originally comprising 5,856 pediatric images. All images were preprocessed and labeled for binary classification. Each model was fine-tuned using pretrained ImageNet weights and evaluated based on accuracy and sensitivity. RegNet achieved the highest classification performance with an accuracy of 92.4% and a sensitivity of 90.1%, followed by ResNetRS (accuracy: 91.9%, sensitivity: 89.3%) and EfficientNetV2 (accuracy: 88.5%, sensitivity: 88.1%).
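The two metrics used to rank the models above can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch, not the authors' evaluation code; the function name and the counts are hypothetical and chosen only for illustration.

```python
def accuracy_and_sensitivity(tp, fp, tn, fn):
    """Return (accuracy, sensitivity) from binary confusion-matrix counts.

    tp/fn count pneumonia cases predicted correctly/incorrectly,
    tn/fp count normal cases predicted correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0:
        raise ValueError("need at least one sample and one positive case")
    accuracy = (tp + tn) / total      # share of all images classified correctly
    sensitivity = tp / (tp + fn)      # share of pneumonia cases that were caught
    return accuracy, sensitivity


# Hypothetical counts for illustration only; not taken from the paper.
acc, sens = accuracy_and_sensitivity(tp=90, fp=5, tn=95, fn=10)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f}")
```

Sensitivity matters separately from accuracy here because a missed pneumonia case (a false negative) is typically the costlier error in a screening setting.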

ARXIV Cancer: unknown Method: YOLOv5 and YOLOv8

Using Large Language Models To Translate Machine Results To Human Results

Trishna Niraula, Jonathan Stubblefield
Published 2025-12-30 23:32

This study presents a novel pipeline that combines YOLOv5 and YOLOv8 for anomaly detection in chest X-ray images with a large language model (LLM) to generate natural-language radiology reports. The integration aims to improve the translation of structured AI predictions into comprehensive diagnostic narratives. Results indicate that the AI-generated reports exhibit strong semantic similarity to human-authored reports, although there are stylistic differences.

Artificial intelligence (AI) has transformed medical imaging, with computer vision (CV) systems achieving state-of-the-art performance in classification and detection tasks. However, these systems typically output structured predictions, leaving radiologists responsible for translating results into full narrative reports. Recent advances in large language models (LLMs), such as GPT-4, offer new opportunities to bridge this gap by generating diagnostic narratives from structured findings. This study introduces a pipeline that integrates YOLOv5 and YOLOv8 for anomaly detection in chest X-ray images with a large language model (LLM) to generate natural-language radiology reports. The YOLO models produce bounding-box predictions and class labels, which are then passed to the LLM to generate descriptive findings and clinical summaries. YOLOv5 and YOLOv8 are compared in terms of detection accuracy, inference latency, and the quality of generated text, as measured by cosine similarity to ground-truth reports. Results show strong semantic similarity between AI and human reports, while human evaluation reveals GPT-4 excels in clarity (4.88/5) but exhibits lower scores for natural writing flow (2.81/5), indicating that current systems achieve clinical accuracy but remain stylistically distinguishable from radiologist-authored text.
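Cosine similarity between a generated report and a reference report can be sketched as follows. This example deliberately swaps in a simple bag-of-words vector for whatever text representation the authors used (the abstract does not say, and embedding-based vectors are more likely); the helper names and the two sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two bag-of-words count vectors."""
    va, vb = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical report sentences, for illustration only.
generated = "Opacity in the right lower lobe consistent with pneumonia."
reference = "Right lower lobe opacity, consistent with pneumonia."
print(round(cosine_similarity(generated, reference), 3))
```

A score near 1 means the two reports use largely the same vocabulary. As the abstract notes, such a score says little about writing style, which is why the study also reports human ratings.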

ARXIV Cancer: lung cancer Method: deep learning

Virtual-Eyes: Quantitative Validation of a Lung CT Quality-Control Pipeline for Foundation-Model Cancer Risk Prediction

Md. Enamul Hoq, Linda Larson-Prior, Fred Prior
Published 2025-12-30 15:34

This study presents Virtual-Eyes, a quality-control pipeline designed for low-dose CT lung cancer screening. The pipeline enhances the performance of generalist foundation models in cancer risk prediction by enforcing strict imaging standards and preprocessing techniques. Results indicate that Virtual-Eyes significantly improves the predictive accuracy of the RAD-DINO model while negatively impacting specialist models, highlighting the importance of tailored preprocessing in AI workflows.

Robust preprocessing is rarely quantified in deep-learning pipelines for low-dose CT (LDCT) lung cancer screening. We develop and validate Virtual-Eyes, a clinically motivated 16-bit CT quality-control pipeline, and measure its differential impact on generalist foundation models versus specialist models. Virtual-Eyes enforces strict 512x512 in-plane resolution, rejects short or non-diagnostic series, and extracts a contiguous lung block using Hounsfield-unit filtering and bilateral lung-coverage scoring while preserving the native 16-bit grid. Using 765 NLST patients (182 cancer, 583 non-cancer), we compute slice-level embeddings from RAD-DINO and Merlin with frozen encoders and train leakage-free patient-level MLP heads; we also evaluate Sybil and a 2D ResNet-18 baseline under Raw versus Virtual-Eyes inputs without backbone retraining. Virtual-Eyes improves RAD-DINO slice-level AUC from 0.576 to 0.610 and patient-level AUC from 0.646 to 0.683 (mean pooling) and from 0.619 to 0.735 (max pooling), with improved calibration (Brier score 0.188 to 0.112). In contrast, Sybil and ResNet-18 degrade under Virtual-Eyes (Sybil AUC 0.886 to 0.837; ResNet-18 AUC 0.571 to 0.596) with evidence of context dependence and shortcut learning, and Merlin shows limited transferability (AUC approximately 0.507 to 0.567) regardless of preprocessing. These results demonstrate that anatomically targeted QC can stabilize and improve generalist foundation-model workflows but may disrupt specialist models adapted to raw clinical context.
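The difference between mean and max pooling, and the Brier score used to report calibration, can be illustrated with a small sketch. It assumes pooling is applied to per-slice probabilities, which may differ from the paper (the authors may pool embeddings before the MLP head); all names and numbers below are hypothetical.

```python
def pool_patient_score(slice_scores, mode="mean"):
    """Collapse per-slice cancer probabilities into one patient-level score."""
    if not slice_scores:
        raise ValueError("patient has no slices")
    if mode == "mean":
        return sum(slice_scores) / len(slice_scores)
    if mode == "max":
        return max(slice_scores)
    raise ValueError(f"unknown pooling mode: {mode}")


def brier_score(probabilities, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(probabilities) != len(labels) or not labels:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probabilities, labels)) / len(labels)


# Hypothetical slice scores: one patient with a single suspicious slice,
# one patient with uniformly low scores.
patients = {"p1": [0.1, 0.2, 0.9], "p2": [0.1, 0.1, 0.1]}
labels = [1, 0]  # p1 has cancer, p2 does not

for mode in ("mean", "max"):
    pooled = [pool_patient_score(s, mode) for s in patients.values()]
    print(mode, pooled, round(brier_score(pooled, labels), 3))
```

Max pooling lets one strongly suspicious slice dominate the patient-level score, while mean pooling dilutes it across the series. Lower Brier scores indicate better-calibrated probabilities.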

ARXIV Cancer: general cancer Method: generative framework

One-shot synthesis of rare gastrointestinal lesions improves diagnostic accuracy and clinical training

Jia Yu, Yan Zhu, Peiyao Fu, Tianyi Chen, Zhihua Wang, Fei Wu, Quanlin Li, Pinghong Zhou, Shuo Wang, Xian Yang
Published 2025-12-30 15:07

The study introduces EndoRare, a generative framework designed to synthesize high-fidelity exemplars of rare gastrointestinal lesions from a single reference image. By employing language-guided concept disentanglement, the method enhances the training of AI classifiers and improves diagnostic accuracy for novice clinicians. Validation across four rare pathologies showed significant improvements in recall and precision when using synthetic images for data augmentation.

Rare gastrointestinal lesions are infrequently encountered in routine endoscopy, restricting the data available for developing reliable artificial intelligence (AI) models and training novice clinicians. Here we present EndoRare, a one-shot, retraining-free generative framework that synthesizes diverse, high-fidelity lesion exemplars from a single reference image. By leveraging language-guided concept disentanglement, EndoRare separates pathognomonic lesion features from non-diagnostic attributes, encoding the former into a learnable prototype embedding while varying the latter to ensure diversity. We validated the framework across four rare pathologies (calcifying fibrous tumor, juvenile polyposis syndrome, familial adenomatous polyposis, and Peutz-Jeghers syndrome). Synthetic images were judged clinically plausible by experts and, when used for data augmentation, significantly enhanced downstream AI classifiers, improving the true positive rate at low false-positive rates. Crucially, a blinded reader study demonstrated that novice endoscopists exposed to EndoRare-generated cases achieved a 0.400 increase in recall and a 0.267 increase in precision. These results establish a practical, data-efficient pathway to bridge the rare-disease gap in both computer-aided diagnostics and clinical education.
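"True positive rate at low false-positive rates" can be made concrete with a small sketch: pick the score threshold that keeps the false-positive rate within a budget, then measure how many true lesions still score above it. This is one common way to compute the metric, not necessarily the authors' procedure; the function name and sample scores are hypothetical.

```python
import math


def tpr_at_fpr(scores, labels, max_fpr):
    """True positive rate at the strictest threshold allowed by max_fpr.

    A sample is predicted positive when its score is strictly greater than
    the threshold, so ties with the threshold never exceed the FPR budget.
    """
    negatives = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not negatives or not positives:
        raise ValueError("need at least one positive and one negative sample")
    allowed_false_positives = math.floor(max_fpr * len(negatives))
    if allowed_false_positives >= len(negatives):
        return 1.0  # every negative may be flagged, so every positive is caught
    threshold = negatives[allowed_false_positives]
    return sum(s > threshold for s in positives) / len(positives)


# Hypothetical classifier scores, for illustration only.
scores = [0.95, 0.80, 0.60, 0.40, 0.90, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels, max_fpr=0.0))   # no false positives allowed
print(tpr_at_fpr(scores, labels, max_fpr=0.25))  # one in four negatives allowed
```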

ARXIV Cancer: brain tumor Method: meta-guided multi-modal learning

MGML: A Plug-and-Play Meta-Guided Multi-Modal Learning Framework for Incomplete Multimodal Brain Tumor Segmentation

Yulong Zou, Bo Liu, Cun-Jing Zheng, Yuan-ming Geng, Siyue Li, Qiankun Zuo, Shuihua Wang, Yudong Zhang, Jin Hong
Published 2025-12-30 01:37

This paper presents a novel meta-guided multi-modal learning (MGML) framework aimed at improving brain tumor segmentation using incomplete multimodal MRI data. The framework includes a meta-parameterized adaptive modality fusion component and a consistency regularization module to enhance segmentation performance. Experimental results on the BraTS2020 and BraTS2023 datasets demonstrate that the proposed method outperforms several state-of-the-art techniques, achieving high Dice scores for various tumor types.

Leveraging multimodal information from Magnetic Resonance Imaging (MRI) plays a vital role in lesion segmentation, especially for brain tumors. However, in clinical practice, multimodal MRI data are often incomplete, making it challenging to fully utilize the available information. Therefore, maximizing the utilization of this incomplete multimodal information presents a crucial research challenge. We present a novel meta-guided multi-modal learning (MGML) framework that comprises two components: meta-parameterized adaptive modality fusion and a consistency regularization module. The meta-parameterized adaptive modality fusion (Meta-AMF) enables the model to effectively integrate information from multiple modalities under varying input conditions. By generating adaptive soft-label supervision signals based on the available modalities, Meta-AMF explicitly promotes more coherent multimodal fusion. In addition, the consistency regularization module enhances segmentation performance and implicitly reinforces the robustness and generalization of the overall framework. Notably, our approach does not alter the original model architecture and can be conveniently integrated into the training pipeline for end-to-end model optimization. We conducted extensive experiments on the public BraTS2020 and BraTS2023 datasets. Compared to multiple state-of-the-art methods from previous years, our method achieved superior performance. On BraTS2020, for the average Dice scores across fifteen missing modality combinations, building upon the baseline, our method obtained scores of 87.55, 79.36, and 62.67 for the whole tumor (WT), the tumor core (TC), and the enhancing tumor (ET), respectively. We have made our source code publicly available at https://github.com/worldlikerr/MGML.
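The "fifteen missing modality combinations" follow from the four MRI sequences in BraTS: every non-empty subset of four modalities gives 2^4 - 1 = 15 input settings. The sketch below enumerates them and computes a Dice score on toy binary masks. The modality names are the standard BraTS sequences (the abstract does not list them), and the masks are hypothetical.

```python
from itertools import combinations

# Standard BraTS sequences; assumed here, since the abstract does not name them.
MODALITIES = ("T1", "T1ce", "T2", "FLAIR")


def available_modality_settings(modalities=MODALITIES):
    """Every non-empty subset of modalities that could be present at test time."""
    return [
        subset
        for size in range(1, len(modalities) + 1)
        for subset in combinations(modalities, size)
    ]


def dice_score(prediction, target):
    """Dice overlap between two flat binary masks, in the range [0, 1]."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and t for p, t in zip(prediction, target))
    size_sum = sum(prediction) + sum(target)
    if size_sum == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2 * intersection / size_sum


settings = available_modality_settings()
print(len(settings))  # number of evaluation settings

# Toy masks for illustration only.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))
```

The scores quoted in the abstract are on a 0 to 100 scale, so they correspond to this fraction multiplied by 100 and averaged over all fifteen settings.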

ARXIV Cancer: pancreatic neoplasm Method: Vision Transformer

Scalable Residual Feature Aggregation Framework with Hybrid Metaheuristic Optimization for Robust Early Pancreatic Neoplasm Detection in Multimodal CT Imaging

Janani Annur Thiruvengadam, Kiran Mayee Nabigaru, Anusha Kovi
Published 2025-12-29 16:51

This study presents a Scalable Residual Feature Aggregation (SRFA) framework aimed at improving the early detection of pancreatic neoplasms using multimodal CT imaging. The framework employs a combination of preprocessing, segmentation with MAGRes-UNet, and feature extraction using DenseNet-121, enhanced by a hybrid metaheuristic optimization strategy. Experimental results demonstrate significant performance improvements, achieving 96.23% accuracy and outperforming traditional CNNs and contemporary transformer-based models.

Early detection of pancreatic neoplasms is a major clinical challenge, largely because tumors tend to present with low-contrast margins and because anatomy varies widely between patients on CT scans. Addressing these difficulties requires an effective, scalable system that enhances the salience of subtle visual cues and generalizes well across multimodal imaging data. This study proposes a Scalable Residual Feature Aggregation (SRFA) framework to meet these requirements. The framework combines a preprocessing pipeline with segmentation by MAGRes-UNet, which makes pancreatic structures more visible and isolates regions of interest. Features are extracted with DenseNet-121 using residual feature storage, so that deep hierarchical features can be aggregated without loss of their properties. A hybrid HHO-BA metaheuristic feature-selection strategy is then used to refine the feature subset. For classification, the system is trained with a new hybrid model that combines the global attention of the Vision Transformer (ViT) with the high representational efficiency of EfficientNet-B3. A dual optimization mechanism incorporating SSA and GWO fine-tunes hyperparameters to improve robustness and reduce overfitting. In the reported experiments, the proposed model reaches 96.23% accuracy, a 95.58% F1-score, and 94.83% specificity, outperforming traditional CNNs and contemporary transformer-based models. These results highlight the potential of the SRFA framework as a useful tool for the early detection of pancreatic tumors.

ARXIV Cancer: general cancer Method: multimodal learning

MedGemma vs GPT-4: Open-Source and Proprietary Zero-shot Medical Disease Classification from Images

Md. Sazzadul Islam Prottasha, Nabil Walid Rafi
Published 2025-12-29 08:48

This study compares two AI architectures, MedGemma and GPT-4, for medical disease classification from images, focusing on their diagnostic capabilities. The MedGemma model, fine-tuned with Low-Rank Adaptation, achieved a mean test accuracy of 80.37%, outperforming the untuned GPT-4. The findings highlight the importance of domain-specific fine-tuning in enhancing diagnostic sensitivity, particularly in high-stakes clinical tasks such as cancer detection.

Multimodal Large Language Models (LLMs) introduce an emerging paradigm for medical imaging by interpreting scans through the lens of extensive clinical knowledge, offering a transformative approach to disease classification. This study presents a critical comparison between two fundamentally different AI architectures: the specialized open-source agent MedGemma and the proprietary large multimodal model GPT-4 for diagnosing six different diseases. The MedGemma-4b-it model, fine-tuned using Low-Rank Adaptation (LoRA), demonstrated superior diagnostic capability by achieving a mean test accuracy of 80.37% compared to 69.58% for the untuned GPT-4. Furthermore, MedGemma exhibited notably higher sensitivity in high-stakes clinical tasks, such as cancer and pneumonia detection. Quantitative analysis via confusion matrices and classification reports provides comprehensive insights into model performance across all categories. These results emphasize that domain-specific fine-tuning is essential for minimizing hallucinations in clinical implementation, positioning MedGemma as a sophisticated tool for complex, evidence-based medical reasoning.
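Low-Rank Adaptation keeps a pretrained weight matrix W frozen and learns a low-rank update, so the effective weight is W + (alpha / r) * B A, with B of shape d_out x r and A of shape r x d_in. The sketch below shows that update on tiny matrices and compares trainable parameter counts. It illustrates the general LoRA idea only; the layer sizes, rank, and alpha are hypothetical, since the abstract does not report the fine-tuning configuration.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return rank * (d_out + d_in)


def lora_effective_weight(W, A, B, alpha):
    """Return W + (alpha / r) * (B @ A) using plain nested lists.

    W is d_out x d_in (frozen), B is d_out x r, A is r x d_in.
    """
    rank = len(A)
    scale = alpha / rank
    return [
        [
            W[i][j] + scale * sum(B[i][k] * A[k][j] for k in range(rank))
            for j in range(len(W[0]))
        ]
        for i in range(len(W))
    ]


# Hypothetical layer size and rank, for illustration only.
full = 4096 * 4096
low_rank = lora_trainable_params(4096, 4096, rank=8)
print(full, low_rank, f"{low_rank / full:.2%}")

# Tiny rank-1 example of the weight update.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
print(lora_effective_weight(W, A, B, alpha=2.0))
```

Only A and B are trained, which is why a 4B-parameter model such as MedGemma-4b-it can be adapted with a small fraction of the memory that full fine-tuning would need.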

ARXIV Cancer: unknown Method: Deviation-Space Diffusion Model

PathoSyn: Imaging-Pathology MRI Synthesis via Disentangled Deviation Diffusion

Jian Wang, Sixing Rong, Jiarui Xing, Yuling Xu, Weide Liu
Published 2025-12-29 01:13

PathoSyn is a generative framework for MRI image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable anatomical manifold. It addresses limitations of current generative models by decomposing the synthesis task into anatomical reconstruction and deviation modeling, utilizing a Deviation-Space Diffusion Model. The framework aims to generate high-fidelity patient-specific synthetic datasets, enhancing the development of diagnostic algorithms and supporting precision intervention planning. Evaluations indicate that PathoSyn outperforms existing methods in perceptual realism and anatomical fidelity.

We present PathoSyn, a unified generative framework for Magnetic Resonance Imaging (MRI) image synthesis that reformulates imaging-pathology as a disentangled additive deviation on a stable anatomical manifold. Current generative models typically operate in the global pixel domain or rely on binary masks; these paradigms often suffer from feature entanglement, leading to corrupted anatomical substrates or structural discontinuities. PathoSyn addresses these limitations by decomposing the synthesis task into deterministic anatomical reconstruction and stochastic deviation modeling. Central to our framework is a Deviation-Space Diffusion Model designed to learn the conditional distribution of pathological residuals, thereby capturing localized intensity variations while preserving global structural integrity by construction. To ensure spatial coherence, the diffusion process is coupled with a seam-aware fusion strategy and an inference-time stabilization module, which collectively suppress boundary artifacts and produce high-fidelity internal lesion heterogeneity. PathoSyn provides a mathematically principled pipeline for generating high-fidelity patient-specific synthetic datasets, facilitating the development of robust diagnostic algorithms in low-data regimes. By allowing interpretable counterfactual disease progression modeling, the framework supports precision intervention planning and provides a controlled environment for benchmarking clinical decision-support systems. Quantitative and qualitative evaluations on tumor imaging benchmarks demonstrate that PathoSyn significantly outperforms holistic diffusion and mask-conditioned baselines in both perceptual realism and anatomical fidelity. The source code of this work will be made publicly available.

ARXIV Cancer: melanoma Method: deep learning

INTERACT-CMIL: Multi-Task Shared Learning and Inter-Task Consistency for Conjunctival Melanocytic Intraepithelial Lesion Grading

Mert Ikinci, Luna Toma, Karin U. Loeffler, Leticia Ussem, Daniela Süsskind, Julia M. Weller, Yousef Yeganeh, Martina C. Herwig-Carl, Shadi Albarqouni
Published 2025-12-27 17:37

The paper presents INTERACT-CMIL, a multi-head deep learning framework designed for the grading of Conjunctival Melanocytic Intraepithelial Lesions (CMIL). This framework predicts five histopathological axes using Shared Feature Learning and an Inter-Dependence Loss to ensure consistency across tasks. Evaluated on a dataset of 486 expert-annotated conjunctival biopsy patches, INTERACT-CMIL demonstrates significant improvements over existing CNN and foundation-model baselines, achieving notable macro F1 score gains. The results indicate its potential as a reproducible benchmark for CMIL diagnosis.

Accurate grading of Conjunctival Melanocytic Intraepithelial Lesions (CMIL) is essential for treatment and melanoma prediction but remains difficult due to subtle morphological cues and interrelated diagnostic criteria. We introduce INTERACT-CMIL, a multi-head deep learning framework that jointly predicts five histopathological axes (WHO4, WHO5, horizontal spread, vertical spread, and cytologic atypia) through Shared Feature Learning with Combinatorial Partial Supervision and an Inter-Dependence Loss enforcing cross-task consistency. Trained and evaluated on a newly curated, multi-center dataset of 486 expert-annotated conjunctival biopsy patches from three university hospitals, INTERACT-CMIL achieves consistent improvements over CNN and foundation-model (FM) baselines, with relative macro F1 gains up to 55.1% (WHO4) and 25.0% (vertical spread). The framework provides coherent, interpretable multi-criteria predictions aligned with expert grading, offering a reproducible computational benchmark for CMIL diagnosis and a step toward standardized digital ocular pathology.
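Macro F1 averages the per-class F1 scores with equal weight, so rare grades count as much as common ones. A relative gain is the improvement divided by the baseline score, not a difference in percentage points. The sketch below shows both calculations; the labels and the baseline and improved scores are hypothetical and not taken from the paper.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    per_class = []
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def relative_gain(baseline, improved):
    """Improvement expressed as a fraction of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical three-grade example, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(round(macro_f1(y_true, y_pred), 4))

# Hypothetical scores: a baseline of 0.40 rising to 0.62 is a 55% relative gain.
print(f"{relative_gain(0.40, 0.62):.1%}")
```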

ARXIV Cancer: lung cancer Method: deep learning

Leveraging Machine Learning for Early Detection of Lung Diseases

Bahareh Rahmani, Harsha Reddy Bindela, Rama Kanth Reddy Gosula, Krishna Yedubati, Mohammad Amir Salari, Leslie Hinyard, Payam Norouzzadeh, Eli Snir, Martin Schoen
Published 2025-12-27 16:50

This study explores the application of deep learning methods for the early detection of lung diseases, including lung cancer, using chest X-rays. By integrating traditional image processing with advanced neural networks, the research aims to provide rapid and accurate diagnostic solutions. The trained and validated models are reported to achieve high accuracy, precision, recall, and F1 scores, which the authors present as evidence of their reliability for real-world applications.

Combining traditional image processing methods with advanced neural networks helps realize a predictive and preventive healthcare paradigm. This study offers rapid, accurate, and non-invasive diagnostic solutions that can significantly impact patient outcomes, particularly in areas with limited access to radiologists and healthcare resources. In this project, deep learning methods are applied to enhance the diagnosis of respiratory diseases such as COVID-19, lung cancer, and pneumonia from chest X-rays. We trained and validated various neural network models, including CNNs, VGG16, InceptionV3, and EfficientNetB0, and report high accuracy, precision, recall, and F1 scores that highlight the models' reliability and potential in real-world diagnostic applications.
