Research Papers

ARXIV Cancer: breast cancer Method: multiple instance learning

WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification

Le Feng, Li Xiao
Published 2025-12-23 02:10

This paper presents WSD-MIL, a novel approach that enhances multiple instance learning (MIL) for whole slide image classification in computational pathology. The method addresses the limitations of existing Transformer-based MIL approaches by introducing a window scale decay attention module and a squeeze-and-excitation based region gate module. Experimental results indicate that WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing computational memory by 62%.


In recent years, the integration of pre-trained foundational models with multiple instance learning (MIL) has improved diagnostic accuracy in computational pathology. However, existing MIL methods focus on optimizing feature extractors and aggregation strategies while overlooking the complex semantic relationships among instances within a whole slide image (WSI). Although Transformer-based MIL approaches aim to model instance dependencies, their quadratic computational complexity limits their scalability to large-scale WSIs. Moreover, due to the pronounced variations in tumor region scales across different WSIs, existing Transformer-based methods employing fixed-scale attention mechanisms face significant challenges in precisely capturing local instance correlations and fail to account for the distance-based decay effect of patch relevance. To address these challenges, we propose window scale decay MIL (WSD-MIL), designed to enhance the capacity to model tumor regions of varying scales while improving computational efficiency. WSD-MIL comprises: 1) a window scale decay based attention module, which employs a cluster-based sampling strategy to reduce computational costs while progressively decaying the attention window scale to capture local instance relationships at varying scales; and 2) a squeeze-and-excitation based region gate module, which dynamically adjusts window weights to enhance global information modeling. Experimental results demonstrate that WSD-MIL achieves state-of-the-art performance on the CAMELYON16 and TCGA-BRCA datasets while reducing computational memory by 62%. The code will be publicly available.

ARXIV Cancer: general cancer Method: large language models

HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data

Shashi Kant Gupta, Arijeet Pramanik, Jerrin John Thomas, Regina Schwind, Lauren Wiener, Avi Raju, Jeremy Kornbluth, Yanshan Wang, Zhaohui Su, Hrituraj Singh
Published 2025-12-22 20:38

This study presents HARMON-E, an agentic framework designed to extract structured oncology data from unstructured electronic health record notes. By utilizing large language models as reasoning agents, the method addresses the challenges of variability and inconsistency in clinical documentation. The framework was evaluated on a dataset of over 400,000 clinical notes, achieving a high average F1-score of 0.93 for extracting oncology-specific clinical variables. The integration of this system into data curation workflows significantly reduced annotation costs.


Unstructured notes within the electronic health record (EHR) contain rich clinical information vital for cancer treatment decision making and research, yet reliably extracting structured oncology data remains challenging due to extensive variability, specialized terminology, and inconsistent document formats. Manual abstraction, although accurate, is prohibitively costly and unscalable. Existing automated approaches typically address narrow scenarios - either using synthetic datasets, restricting focus to document-level extraction, or isolating specific clinical variables (e.g., staging, biomarkers, histology) - and do not adequately handle patient-level synthesis across the large number of clinical documents containing contradictory information. In this study, we propose an agentic framework that systematically decomposes complex oncology data extraction into modular, adaptive tasks. Specifically, we use large language models (LLMs) as reasoning agents, equipped with context-sensitive retrieval and iterative synthesis capabilities, to exhaustively and comprehensively extract structured clinical variables from real-world oncology notes. Evaluated on a large-scale dataset of over 400,000 unstructured clinical notes and scanned PDF reports spanning 2,250 cancer patients, our method achieves an average F1-score of 0.93, with 100 out of 103 oncology-specific clinical variables exceeding 0.85, and critical variables (e.g., biomarkers and medications) surpassing 0.95. Moreover, integration of the agentic system into a data curation workflow resulted in a 0.94 direct manual approval rate, significantly reducing annotation costs. To our knowledge, this constitutes the first exhaustive, end-to-end application of LLM-based agents for structured oncology data extraction at scale.

ARXIV Cancer: breast cancer Method: nnU-Net

Selective Phase-Aware Training of nnU-Net for Robust Breast Cancer Segmentation in Multi-Center DCE-MRI

Beyza Zayim, Aissiou Ikram, Boukhiar Naima
Published 2025-12-22 10:05

This study addresses the challenges of breast cancer segmentation in dynamic contrast-enhanced MRI (DCE-MRI) by proposing a selective, phase-aware training framework for the nnU-Net architecture. The research emphasizes the importance of quality-focused data selection to enhance model robustness and generalization. Experiments showed that including scans with motion artifacts and reduced contrast impaired segmentation performance, whereas training on higher-quality, early-phase images gave more stable training conditions.


Breast cancer remains the most common cancer among women and is a leading cause of female mortality. Dynamic contrast-enhanced MRI (DCE-MRI) is a powerful imaging tool for evaluating breast tumors, yet the field lacks a standardized benchmark for analyzing treatment responses and guiding personalized care. We participated in the MAMA-MIA Challenge's Primary Tumor Segmentation task, and this work presents a selective, phase-aware training framework for the nnU-Net architecture, emphasizing quality-focused data selection to strengthen model robustness and generalization. We employed the No New Net (nnU-Net) framework with a selective training strategy that systematically analyzed the impact of image quality and center-specific variability on segmentation performance. Controlled experiments on the DUKE, NACT, ISPY1, and ISPY2 datasets revealed that including ISPY scans with motion artifacts and reduced contrast impaired segmentation performance, even with advanced preprocessing such as contrast-limited adaptive histogram equalization (CLAHE). In contrast, training on early-phase images (0000-0002) from the DUKE and NACT data, which exhibited clearer contrast and fewer motion artifacts despite varying resolutions, provided more stable training conditions. Our results demonstrate the importance of phase-sensitive and quality-aware training strategies in achieving reliable segmentation performance in heterogeneous clinical datasets, highlighting the limitations of naive dataset expansion and motivating future automation of quality-based data selection.

ARXIV Cancer: breast cancer Method: cross attention mid fusion

Practical Quantum-Classical Feature Fusion for complex data Classification

Azadeh Alavi, Fatemeh Kouchmeshki, Abdolrahman Alavi
Published 2025-12-22 09:16

This paper presents a multimodal hybrid learning approach that integrates quantum feature maps with classical neural networks to enhance performance on complex datasets. The proposed cross attention mid fusion architecture allows classical representations to query quantum-derived feature tokens, addressing limitations of previous methods that treated quantum circuits as isolated feature extractors. The evaluation on various datasets, including breast cancer, demonstrates that the mid fusion model outperforms pure quantum and standard hybrid designs, highlighting the benefits of principled multimodal fusion.


Hybrid quantum and classical learning aims to couple quantum feature maps with the robustness of classical neural networks, yet most architectures treat the quantum circuit as an isolated feature extractor and merge its measurements with classical representations by direct concatenation. This neglects that the quantum and classical branches constitute distinct computational modalities and limits reliable performance on complex, high dimensional tabular and semi structured data, including remote sensing, environmental monitoring, and medical diagnostics. We present a multimodal formulation of hybrid learning and propose a cross attention mid fusion architecture in which a classical representation queries quantum derived feature tokens through an attention block with residual connectivity. The quantum branch is kept within practical NISQ budgets and uses up to nine qubits. We evaluate on Wine, Breast Cancer, Forest CoverType, FashionMNIST, and SteelPlatesFaults, comparing a quantum only model, a classical baseline, residual hybrid models, and the proposed mid fusion model under a consistent protocol. Pure quantum and standard hybrid designs underperform due to measurement induced information loss, while cross attention mid fusion is consistently competitive and improves performance on the more complex datasets in most cases. These findings suggest that quantum derived information becomes most valuable when integrated through principled multimodal fusion rather than used in isolation or loosely appended to classical features.
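The fusion step described in this abstract, where a classical representation queries quantum-derived tokens through attention and the result is added back residually, can be illustrated with a minimal sketch. It is not the authors' architecture: the function names are hypothetical, the query/key/value projections are assumed to be identity maps, and a single attention head is used, purely to keep the example self-contained.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(classical, quantum_tokens):
    """Classical vector attends over quantum tokens; output is added back residually.

    Assumption: identity query/key/value projections and a single head,
    for illustration only. A trained model would learn these projections.
    """
    dim = len(classical)
    # Scaled dot-product score between the classical query and each token.
    scores = [
        sum(q * k for q, k in zip(classical, token)) / math.sqrt(dim)
        for token in quantum_tokens
    ]
    weights = softmax(scores)
    # Attention-weighted sum of the quantum tokens.
    attended = [
        sum(w * token[i] for w, token in zip(weights, quantum_tokens))
        for i in range(dim)
    ]
    # Residual connection: keep the classical features, add the attended signal.
    fused = [c + a for c, a in zip(classical, attended)]
    return fused, weights


# Hypothetical toy inputs: one 2-d classical feature, two 2-d quantum-derived tokens.
fused, weights = cross_attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

Unlike direct concatenation, the attention weights here depend on the classical query, so each sample can draw on the quantum tokens differently.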

ARXIV Cancer: breast cancer Method: Multiple Instance Learning

Breast Cancer Recurrence Risk Prediction Based on Multiple Instance Learning

Jinqiu Chen, Huyan Xu
Published 2025-12-21 13:46

This study aims to predict breast cancer recurrence risk using deep learning techniques applied to routine Hematoxylin and Eosin stained whole-slide images. Three Multiple Instance Learning frameworks were developed and compared on a dataset of 210 patient cases. The modified CLAM-SB model achieved the best performance with a mean Area Under the Curve of 0.836, indicating its potential for automated risk stratification in clinical settings.


Predicting breast cancer recurrence risk is a critical clinical challenge. This study investigates the potential of computational pathology to stratify patients using deep learning on routine Hematoxylin and Eosin (H&E) stained whole-slide images (WSIs). We developed and compared three Multiple Instance Learning (MIL) frameworks -- CLAM-SB, ABMIL, and ConvNeXt-MIL-XGBoost -- on an in-house dataset of 210 patient cases. The models were trained to predict 5-year recurrence risk, categorized into three tiers (low, medium, high), with ground truth labels established by the 21-gene Recurrence Score. Features were extracted using the UNI and CONCH pre-trained models. In a 5-fold cross-validation, the modified CLAM-SB model demonstrated the strongest performance, achieving a mean Area Under the Curve (AUC) of 0.836 and a classification accuracy of 76.2%. Our findings demonstrate the feasibility of using deep learning on standard histology slides for automated, genomics-correlated risk stratification, highlighting a promising pathway toward rapid and cost-effective clinical decision support.
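For readers unfamiliar with attention-based MIL pooling (the idea behind ABMIL, one of the three frameworks compared), the sketch below shows the generic form: each patch embedding gets a score w^T tanh(V h), the scores are normalized with a softmax, and the slide-level embedding is the weighted sum of patch embeddings. The parameter values and toy inputs are hypothetical, and nothing here reflects the authors' specific configuration or their modified CLAM-SB model.

```python
import math


def abmil_pool(instances, V, w):
    """Attention-based MIL pooling over one bag of instance embeddings.

    instances: list of feature vectors h_k, each of length d
    V: L x d projection matrix (list of rows), w: vector of length L
    Returns the bag embedding and the per-instance attention weights.
    """
    scores = []
    for h in instances:
        # Hidden activation tanh(V h), then scalar score w^T tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * zi for wi, zi in zip(w, hidden)))
    # Softmax over instances so the weights sum to one within the bag.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[j] for a, h in zip(attn, instances)) for j in range(dim)]
    return bag, attn


# Hypothetical toy bag: two 2-d patch embeddings and hand-picked parameters.
bag, attn = abmil_pool(
    instances=[[2.0, 0.0], [0.0, 0.0]],
    V=[[1.0, 0.0], [0.0, 1.0]],
    w=[1.0, 0.0],
)
```

In practice the bag embedding would feed a classifier head (here, a three-tier risk prediction), and V and w would be learned jointly with it.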

ARXIV Cancer: unknown Method: multi-view representation learning

brat: Aligned Multi-View Embeddings for Brain MRI Analysis

Maxime Kayser, Maksim Gridnev, Wanting Wang, Max Bain, Aneesh Rangnekar, Avijit Chatterjee, Aleksandr Petrov, Harini Veeraraghavan, Nathaniel C. Swinburne
Published 2025-12-21 10:37

The paper introduces brat, a multi-view representation learning framework designed for brain MRI analysis, which is trained on a newly created dataset of approximately 80,000 3D scans paired with clinical reports. The framework addresses challenges in brain MRI interpretation by employing a multi-view pre-training approach and an implicit query-feature matching mechanism. Evaluation results indicate significant performance improvements across various vision-language and vision tasks.


We present brat (brain report alignment transformer), a multi-view representation learning framework for brain magnetic resonance imaging (MRI) trained on MRIs paired with clinical reports. Brain MRIs present unique challenges due to the presence of numerous, highly varied, and often subtle abnormalities that are localized to a few slices within a 3D volume. To address these challenges, we introduce a brain MRI dataset 10× larger than existing ones, containing approximately 80,000 3D scans with corresponding radiology reports, and propose a multi-view pre-training approach inspired by advances in document retrieval. We develop an implicit query-feature matching mechanism and adopt concepts from quality-diversity to obtain multi-view embeddings of MRIs that are aligned with the clinical features given by report sentences. We evaluate our approach across multiple vision-language and vision tasks, demonstrating substantial performance improvements. The brat foundation models are publicly released.

ARXIV Cancer: fungating malignant tumors Method: ensemble model

WoundNet-Ensemble: A Novel IoMT System Integrating Self-Supervised Deep Learning and Multi-Model Fusion for Automated, High-Accuracy Wound Classification and Healing Progression Monitoring

Moses Kiprono
Published 2025-12-20 22:49

The study introduces WoundNet-Ensemble, an Internet of Medical Things system designed for the automated classification of various wound types using a combination of deep learning architectures. The system integrates ResNet-50, the self-supervised Vision Transformer DINOv2, and Swin Transformer, achieving an ensemble accuracy of 99.90% on a dataset of 5,175 wound images. This approach not only enhances classification accuracy but also includes a longitudinal tracker for monitoring wound healing progress.


Chronic wounds, including diabetic foot ulcers which affect up to one-third of people with diabetes, impose a substantial clinical and economic burden, with U.S. healthcare costs exceeding 25 billion dollars annually. Current wound assessment remains predominantly subjective, leading to inconsistent classification and delayed interventions. We present WoundNet-Ensemble, an Internet of Medical Things system leveraging a novel ensemble of three complementary deep learning architectures: ResNet-50, the self-supervised Vision Transformer DINOv2, and Swin Transformer, for automated classification of six clinically distinct wound types. Our system achieves 99.90 percent ensemble accuracy on a comprehensive dataset of 5,175 wound images spanning diabetic foot ulcers, pressure ulcers, venous ulcers, thermal burns, pilonidal sinus wounds, and fungating malignant tumors. The weighted fusion strategy demonstrates a 3.7 percent improvement over previous state-of-the-art methods. Furthermore, we implement a longitudinal wound healing tracker that computes healing rates, severity scores, and generates clinical alerts. This work demonstrates a robust, accurate, and clinically deployable tool for modernizing wound care through artificial intelligence, addressing critical needs in telemedicine and remote patient monitoring. The implementation and trained models will be made publicly available to support reproducibility.
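The abstract does not spell out the weighted fusion rule, so the following is only a sketch of one common form of it: a normalized weighted average of each model's class probabilities, followed by an argmax. The function name, the weights, and the three-class toy probabilities are all hypothetical and are not taken from the paper.

```python
def weighted_fusion(model_probs, weights):
    """Fuse per-model class probabilities with a normalized weighted average.

    model_probs: one probability vector per model, all of the same length
    weights: one non-negative weight per model (assumed, not from the paper)
    Returns the fused probability vector and the index of the predicted class.
    """
    total = sum(weights)
    n_classes = len(model_probs[0])
    fused = [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical outputs of three models over three classes (toy numbers).
fused, predicted = weighted_fusion(
    model_probs=[[0.6, 0.3, 0.1], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    weights=[0.5, 0.25, 0.25],
)
```

In this toy case the first model favors class 0, but the two other models together outweigh it and the fused prediction is class 1.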

ARXIV Cancer: breast cancer Method: agent-based framework

Agent-Based Output Drift Detection for Breast Cancer Response Prediction in a Multisite Clinical Decision Support System

Xavier Rafael-Palou, Jose Munuera, Ana Jimenez-Pastor, Richard Osuala, Karim Lekadir, Oliver Diaz
Published 2025-12-20 17:49

This paper presents an agent-based framework for detecting output drift in clinical decision support systems used for breast cancer response prediction. The method allows for continuous monitoring of predictive model outputs across multiple medical imaging sites, addressing the challenges posed by variations in patient populations and imaging protocols. The results indicate that the proposed multi-center monitoring schemes outperform traditional centralized monitoring, with F1-score improvements of up to 10.3% in drift detection.


Modern clinical decision support systems can concurrently serve multiple, independent medical imaging institutions, but their predictive performance may degrade across sites due to variations in patient populations, imaging hardware, and acquisition protocols. Continuous surveillance of predictive model outputs offers a safe and reliable approach for identifying such distributional shifts without ground truth labels. However, most existing methods rely on centralized monitoring of aggregated predictions, overlooking site-specific drift dynamics. We propose an agent-based framework for detecting drift and assessing its severity in multisite clinical AI systems. To evaluate its effectiveness, we simulate a multi-center environment for output-based drift detection, assigning each site a drift monitoring agent that performs batch-wise comparisons of model outputs against a reference distribution. We analyse several multi-center monitoring schemes that differ in how the reference is obtained (site-specific, global, production-only and adaptive), alongside a centralized baseline. Results on real-world breast cancer imaging data using a pathological complete response prediction model show that all multi-center schemes outperform centralized monitoring, with F1-score improvements of up to 10.3% in drift detection. In the absence of site-specific references, the adaptive scheme performs best, with F1-scores of 74.3% for drift detection and 83.7% for drift severity classification. These findings suggest that adaptive, site-aware agent-based drift monitoring can enhance the reliability of multisite clinical decision support systems.
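A per-site monitoring agent that compares batches of model outputs against a reference distribution might look roughly like the sketch below. The abstract does not name the comparison statistic, so the two-sample Kolmogorov-Smirnov statistic is swapped in here as an assumption; the class name, the threshold value, and the toy scores are likewise hypothetical.

```python
import bisect


def ks_statistic(reference, batch):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    ref, cur = sorted(reference), sorted(batch)
    gaps = []
    for x in sorted(set(ref) | set(cur)):
        # Empirical CDF value = fraction of samples less than or equal to x.
        ref_cdf = bisect.bisect_right(ref, x) / len(ref)
        cur_cdf = bisect.bisect_right(cur, x) / len(cur)
        gaps.append(abs(ref_cdf - cur_cdf))
    return max(gaps)


class DriftAgent:
    """Hypothetical per-site agent comparing output batches to a fixed reference."""

    def __init__(self, site, reference, threshold=0.5):
        self.site = site
        self.reference = list(reference)
        self.threshold = threshold  # assumed cut-off, not taken from the paper

    def check(self, batch):
        """Return the KS statistic and whether it exceeds the drift threshold."""
        stat = ks_statistic(self.reference, batch)
        return stat, stat > self.threshold


# Toy predicted probabilities; the site name and values are illustrative only.
agent = DriftAgent("site_a", reference=[0.1, 0.2, 0.3, 0.4])
stable_stat, stable_flag = agent.check([0.1, 0.2, 0.3, 0.4])
shifted_stat, shifted_flag = agent.check([0.6, 0.7, 0.8, 0.9])
```

The monitoring schemes in the abstract differ in where `reference` comes from (site-specific, global, production-only, or adaptive); this sketch fixes it at construction time, which corresponds most closely to a static site-specific reference.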

ARXIV Cancer: unknown Method: two-stream deep learning architecture

A two-stream network with global-local feature fusion for bone age assessment

Qiong Lou, Han Yang, Fang Lu
Published 2025-12-20 11:56

This study presents a two-stream deep learning architecture, named BoNet+, for automated bone age assessment (BAA). The model integrates global and local feature extraction channels, utilizing a Transformer module for global features and an RFAConv module for local features. The proposed method achieves mean absolute errors of 3.81 and 5.65 months on the RSNA and RHPE test datasets, respectively, comparable to the state of the art, and is intended to reduce clinical workload and make assessments more objective.


Bone Age Assessment (BAA) is a widely used clinical technique that can accurately reflect an individual's growth and development level, as well as maturity. In recent years, although deep learning has advanced the field of bone age assessment, existing methods face challenges in efficiently balancing global features and local skeletal details. This study aims to develop an automated bone age assessment system based on a two-stream deep learning architecture to achieve higher accuracy in bone age assessment. We propose the BoNet+ model incorporating global and local feature extraction channels. A Transformer module is introduced into the global feature extraction channel to enhance its ability to extract global features through a multi-head self-attention mechanism. An RFAConv module is incorporated into the local feature extraction channel to generate adaptive attention maps within multiscale receptive fields, enhancing local feature extraction capabilities. Global and local features are concatenated along the channel dimension and optimized by an Inception-V3 network. The proposed method has been validated on the Radiological Society of North America (RSNA) and Radiological Hand Pose Estimation (RHPE) test datasets, achieving mean absolute errors (MAEs) of 3.81 and 5.65 months, respectively. These results are comparable to the state-of-the-art. The BoNet+ model reduces the clinical workload and achieves automatic, high-precision, and more objective bone age assessment.
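The reported metric, mean absolute error in months, is the average absolute gap between predicted and reference bone ages. A minimal sketch follows; the function name and the toy ages are hypothetical and are not drawn from the RSNA or RHPE data.

```python
def mean_absolute_error(predicted_months, reference_months):
    """Average absolute difference between predicted and reference bone ages."""
    if len(predicted_months) != len(reference_months):
        raise ValueError("prediction and reference lists must have equal length")
    errors = [abs(p - r) for p, r in zip(predicted_months, reference_months)]
    return sum(errors) / len(errors)


# Toy example: two hypothetical cases, with errors of 3 and 4 months.
mae = mean_absolute_error([120.0, 100.0], [123.0, 96.0])
```

An MAE of 3.81 months therefore means that, averaged over the test set, predictions deviate from the reference bone age by just under four months.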
