Research Papers

ARXIV Cancer: gastric cancer Method: large language model

Eligibility-Aware Evidence Synthesis: An Agentic Framework for Clinical Trial Meta-Analysis

Yao Zhao, Zhiyue Zhang, Yanxun Xu
Published 2026-04-03 03:18

The paper presents EligMeta, an agentic framework for clinical trial meta-analysis that integrates automated trial discovery with eligibility-aware meta-analysis. Its hybrid architecture separates LLM-based reasoning from deterministic execution to ensure reproducibility. In a gastric cancer landscape analysis, the framework reduced 4,044 candidate trials to 39 clinically relevant studies, and in an olaparib adverse events meta-analysis, eligibility-aware weighting shifted the pooled risk ratio from 2.18 to 1.97.

Clinical evidence synthesis requires identifying relevant trials from large registries and aggregating results that account for population differences. While recent LLM-based approaches have automated components of systematic review, they do not support end-to-end evidence synthesis. Moreover, conventional meta-analysis weights studies by statistical precision without considering clinical compatibility reflected in eligibility criteria. We propose EligMeta, an agentic framework that integrates automated trial discovery with eligibility-aware meta-analysis, translating natural-language queries into reproducible trial selection and incorporating eligibility alignment into study weighting to produce cohort-specific pooled estimates. EligMeta employs a hybrid architecture separating LLM-based reasoning from deterministic execution: LLMs generate interpretable rules from natural-language queries and perform schema-constrained parsing of trial metadata, while all logical operations, weight computations, and statistical pooling are executed deterministically to ensure reproducibility. The framework structures eligibility criteria and computes similarity-based study weights reflecting population alignment between target and comparator trials. In a gastric cancer landscape analysis, EligMeta reduced 4,044 candidate trials to 39 clinically relevant studies through rule-based filtering, recovering all 13 guideline-cited trials. In an olaparib adverse events meta-analysis across four trials, eligibility-aware weighting shifted the pooled risk ratio from 2.18 (95% CI: 1.71-2.79) under conventional Mantel-Haenszel estimation to 1.97 (95% CI: 1.76-2.20), demonstrating quantifiable impact of incorporating eligibility alignment. EligMeta bridges automated trial discovery with eligibility-aware meta-analysis, providing a scalable and reproducible framework for evidence synthesis in precision medicine.
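
As a rough illustration of the weighting idea in this abstract, the Mantel-Haenszel pooled risk ratio can be written so that each trial's contribution is scaled by an alignment score. The abstract does not say how EligMeta computes or applies its similarity-based weights, so the `align` multiplier, the function name, and the toy counts below are assumptions for illustration only, not the authors' method.

```python
from typing import List, Optional, Tuple

# One trial: (events_treatment, n_treatment, events_control, n_control)
Study = Tuple[int, int, int, int]


def mantel_haenszel_rr(studies: List[Study],
                       align: Optional[List[float]] = None) -> float:
    """Mantel-Haenszel pooled risk ratio.

    `align` is a hypothetical per-trial eligibility-alignment multiplier
    (an assumption, not taken from the paper). With all multipliers equal
    to 1 this is the conventional Mantel-Haenszel estimator.
    """
    if align is None:
        align = [1.0] * len(studies)
    if len(align) != len(studies):
        raise ValueError("one alignment score per study is required")
    numerator = 0.0
    denominator = 0.0
    for (a, n1, c, n0), w in zip(studies, align):
        total = n1 + n0
        numerator += w * a * n0 / total    # treatment events, scaled
        denominator += w * c * n1 / total  # control events, scaled
    return numerator / denominator


# Made-up counts, not data from the paper.
toy_trials = [(10, 100, 5, 100), (30, 100, 10, 100)]
conventional = mantel_haenszel_rr(toy_trials)
aligned = mantel_haenszel_rr(toy_trials, align=[1.0, 0.0])
```

In this sketch, down-weighting the second toy trial pulls the pooled estimate toward the first trial's risk ratio, which is the kind of shift the abstract reports between the conventional and eligibility-aware estimates.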

ARXIV Cancer: unknown Method: Explainable Vision-Language Model

An Explainable Vision-Language Model Framework with Adaptive PID-Tversky Loss for Lumbar Spinal Stenosis Diagnosis

Md. Sajeebul Islam Sk., Md. Mehedi Hasan Shawon, Md. Golam Rabiul Alam
Published 2026-04-02 20:18

This paper presents an Explainable Vision-Language Model framework aimed at improving the diagnosis of Lumbar Spinal Stenosis (LSS) by addressing challenges in manual interpretation of MRI. The framework incorporates a Spatial Patch Cross-Attention module and an Adaptive PID-Tversky Loss function to enhance segmentation accuracy and manage class imbalance. The proposed model achieves a diagnostic classification accuracy of 90.69% and demonstrates explainability through automated radiology report generation.

Lumbar Spinal Stenosis (LSS) diagnosis remains a critical clinical challenge: it depends heavily on labor-intensive manual interpretation of multi-view Magnetic Resonance Imaging (MRI), leading to substantial inter-observer variability and diagnostic delays. Existing vision-language models fail to address the extreme class imbalance prevalent in clinical segmentation datasets while also preserving spatial accuracy, primarily because global pooling mechanisms discard crucial anatomical hierarchies. We present an end-to-end Explainable Vision-Language Model framework designed to overcome these limitations through two principal components. First, we propose a Spatial Patch Cross-Attention module that enables precise, text-directed localization of spinal anomalies. Second, a novel Adaptive PID-Tversky Loss function integrates control theory principles to dynamically modify training penalties, specifically targeting difficult, under-segmented minority instances. By incorporating foundational VLMs alongside an Automated Radiology Report Generation module, our framework demonstrates strong performance: a diagnostic classification accuracy of 90.69%, a macro-averaged Dice score of 0.9512 for segmentation, and a CIDEr score of 92.80%. Furthermore, the framework provides explainability by converting complex segmentation predictions into radiologist-style clinical reports, thereby establishing a new benchmark for transparent, interpretable AI in clinical medical imaging that retains essential human supervision while enhancing diagnostic capabilities.
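
For readers unfamiliar with the loss named in this abstract, the following is a minimal sketch of a Tversky loss, TI = TP / (TP + alpha*FP + beta*FN), together with a toy PID controller that adjusts the false-negative penalty `beta`. The abstract does not describe how the authors couple the PID controller to the loss, so the error signal, the gains, the clamping range, and the class name `PIDBeta` are all assumptions made purely for illustration.

```python
from typing import Sequence


def tversky_loss(pred: Sequence[float], target: Sequence[float],
                 alpha: float = 0.5, beta: float = 0.5,
                 eps: float = 1e-7) -> float:
    """1 - Tversky index for soft predictions in [0, 1] and binary targets."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


class PIDBeta:
    """Hypothetical PID controller that sets the false-negative weight.

    The error fed to `update` could be, for example, target recall minus
    observed recall; that choice is an assumption, not the paper's design.
    """

    def __init__(self, kp: float = 0.5, ki: float = 0.05, kd: float = 0.1,
                 beta0: float = 0.5, lo: float = 0.1, hi: float = 0.9):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.beta0, self.lo, self.hi = beta0, lo, hi
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float) -> float:
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        raw = (self.beta0 + self.kp * error
               + self.ki * self.integral + self.kd * derivative)
        return min(max(raw, self.lo), self.hi)  # clamp to [lo, hi]


controller = PIDBeta()
beta = controller.update(0.2)  # recall is 0.2 below target in this toy step
loss = tversky_loss([1, 1, 0, 0], [1, 0, 0, 0], alpha=1.0 - beta, beta=beta)
```

With alpha = beta = 0.5 the Tversky index reduces to the Dice coefficient; raising `beta` penalizes missed foreground pixels more heavily, which is one common way to counter class imbalance.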

ARXIV Cancer: unknown Method: large language model

Blinded Radiologist and LLM-Based Evaluation of LLM-Generated Japanese Translations of Chest CT Reports: Comparative Study

Yosuke Yamagishi, Atsushi Takamatsu, Yasunori Hamaguchi, Tomohiro Kikuchi, Shouhei Hanaoka, Takeharu Yoshikawa, Osamu Abe
Published 2026-04-02 15:59

This study evaluates the educational suitability of LLM-generated Japanese translations of chest CT reports by comparing them with human-edited translations. A total of 150 reports were assessed by two radiologists and three LLM judges. Agreement between radiologists and LLM judges was near zero (QWK=-0.04 to 0.15), and the LLM judges strongly favored the LLM translation, highlighting the limitations of LLM-based evaluation in clinical contexts.

Background: Accurate translation of radiology reports is important for multilingual research, clinical communication, and radiology education, but the validity of LLM-based evaluation remains unclear. Objective: To evaluate the educational suitability of LLM-generated Japanese translations of chest CT reports and compare radiologist assessments with LLM-as-a-judge evaluations. Methods: We analyzed 150 chest CT reports from the CT-RATE-JPN validation set. For each English report, a human-edited Japanese translation was compared with an LLM-generated translation by DeepSeek-V3.2. A board-certified radiologist and a radiology resident independently performed blinded pairwise evaluations across 4 criteria: terminology accuracy, readability, overall quality, and radiologist-style authenticity. In parallel, 3 LLM judges (DeepSeek-V3.2, Mistral Large 3, and GPT-5) evaluated the same pairs. Agreement was assessed using quadratic weighted kappa (QWK) and percentage agreement. Results: Agreement between radiologists and LLM judges was near zero (QWK=-0.04 to 0.15). Agreement between the 2 radiologists was also poor (QWK=0.01 to 0.06). Radiologist 1 rated terminology as equivalent in 59% of cases and favored the LLM translation for readability (51%) and overall quality (51%). Radiologist 2 rated readability as equivalent in 75% of cases and favored the human-edited translation for overall quality (40% vs 21%). All 3 LLM judges strongly favored the LLM translation across all criteria (70%-99%) and rated it as more radiologist-like in >93% of cases. Conclusions: LLM-generated translations were often judged natural and fluent, but the 2 radiologists differed substantially. LLM-as-a-judge showed strong preference for LLM output and negligible agreement with radiologists. For educational use of translated radiology reports, automated LLM-based evaluation alone is insufficient; expert radiologist review remains important.
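
The agreement statistic used in this study, quadratic weighted kappa, can be sketched in a few lines. The encoding of pairwise preferences as three ordinal categories (0 = prefers human-edited, 1 = equivalent, 2 = prefers LLM) and the example ratings are assumptions for illustration; the abstract does not state how the authors coded the ratings.

```python
from typing import Sequence


def quadratic_weighted_kappa(r1: Sequence[int], r2: Sequence[int],
                             k: int) -> float:
    """QWK for two raters whose labels are integers in 0..k-1 (k >= 2)."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("raters need the same non-zero number of ratings")
    n = len(r1)
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        observed[a][b] += 1
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2       # quadratic penalty
            disagreement += w * observed[i][j]
            expected_disagreement += w * row[i] * col[j] / n
    if expected_disagreement == 0:
        raise ValueError("kappa is undefined when all ratings are identical")
    return 1.0 - disagreement / expected_disagreement


# Hypothetical ratings, not data from the study.
radiologist = [0, 1, 2, 0, 1, 2]
llm_judge = [2, 2, 2, 2, 1, 2]
qwk = quadratic_weighted_kappa(radiologist, llm_judge, k=3)
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean agreement worse than chance, which is how the reported range of -0.04 to 0.15 should be read.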

ARXIV Cancer: cervical cancer Method: Swin-based Co-DETR

Center-Aware Detection with Swin-based Co-DETR Framework for Cervical Cytology

Yan Kong, Yuan Yin, Hongan Chen, Yuqi Fang, Caifeng Shan
Published 2026-04-02 14:18

This paper presents an approach to the automated analysis of Pap smear images aimed at improving cervical cancer screening. The authors placed 1st in Track B and 2nd in Track A of the RIVA Cervical Cytology Challenge by integrating the Co-DINO framework with a Swin-Large backbone for multi-scale feature extraction. Their method includes a center-point prediction formulation, a center-preserving data augmentation strategy, and track-specific loss tuning, leading to improved detection performance in cytology image analysis.

Automated analysis of Pap smear images is critical for cervical cancer screening but remains challenging due to dense cell distribution and complex morphology. In this paper, we present our winning solution for the RIVA Cervical Cytology Challenge, achieving 1st place in Track B and 2nd place in Track A. Our approach leverages a powerful baseline, integrating the Co-DINO framework with a Swin-Large backbone for robust multi-scale feature extraction. To address the dataset's unique fixed-size bounding box annotations, we formulate the detection task as a center-point prediction problem. Tailoring our approach to this formulation, we introduce a center-preserving data augmentation strategy and an analytical geometric box optimization to effectively absorb localization jitter. Finally, we apply track-specific loss tuning to adapt the loss weights for each task. Experiments demonstrate that our targeted optimizations improve detection performance, providing an effective pipeline for cytology image analysis. Our code is available at https://github.com/YanKong0408/Center-DETR.
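
The center-point formulation mentioned in this abstract can be illustrated with a small sketch: when every annotated box has the same size, a box is fully determined by its center, so a predicted center can be expanded back into a box. The box size of 100 pixels and the function names are hypothetical; the authors' actual implementation is in the linked repository and may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_to_center(box: Box) -> Tuple[float, float]:
    """Reduce a box annotation to its center point."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def center_to_box(cx: float, cy: float, size: float) -> Box:
    """Rebuild a square box of a fixed side length around a center."""
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)


FIXED_SIZE = 100.0  # hypothetical side length, not taken from the paper

annotation = (10.0, 20.0, 110.0, 120.0)
center = box_to_center(annotation)
restored = center_to_box(center[0], center[1], FIXED_SIZE)
```

Because the round trip is lossless for fixed-size boxes, any augmentation that preserves the center (for example a crop that keeps the center inside the image) also preserves the label.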

ARXIV Cancer: general cancer Method: vision transformer

Curia-2: Scaling Self-Supervised Learning for Radiology Foundation Models

Antoine Saporta, Baptiste Callard, Corentin Dancette, Julien Khlaut, Charles Corbière, Leo Butsanets, Amaury Prat, Pierre Manceron
Published 2026-04-02 12:49

This paper presents Curia-2, an advanced framework for self-supervised learning aimed at enhancing the performance of Foundation Models in radiology. The methodology optimizes pre-training strategies and representation quality for analyzing complex radiological data, specifically targeting CT and MRI images. Results indicate that Curia-2 surpasses existing Foundation Models on vision tasks and competes well with vision-language models on clinically relevant detection tasks.

The rapid growth of medical imaging has fueled the development of Foundation Models (FMs) to reduce the growing, unsustainable workload on radiologists. While recent FMs have shown the power of large-scale pre-training for CT and MRI analysis, there remains significant room to optimize how these models learn from complex radiological volumes. Building upon the Curia framework, this work introduces Curia-2, which significantly improves the original pre-training strategy and representation quality to better capture the specificities of radiological data. The proposed methodology enables scaling the architecture up to billion-parameter Vision Transformers, marking a first for multi-modal CT and MRI FMs. Furthermore, we formalize the evaluation of these models by extending and restructuring CuriaBench into two distinct tracks: a 2D track tailored for slice-based vision models and a 3D track for volumetric benchmarking. Our results demonstrate that Curia-2 outperforms all FMs on vision-focused tasks and fares competitively against vision-language models on clinically complex tasks such as finding detection. Weights will be made publicly available to foster further research.

ARXIV Cancer: breast cancer Method: variational quantum classifier

Quantum-Inspired Geometric Classification with Correlation Group Structures and VQC Decision Modeling

Nishikanta Mohanty, Arya Ansuman Priyadarshi, Bikash K. Behera, Badshah Mukherjee
Published 2026-04-02 11:50

This paper presents a quantum-inspired classification framework that utilizes Correlation Group Structures and variational quantum decision modeling. The method emphasizes a geometry-first approach, evaluating samples relative to class medoids and enhancing robustness in heterogeneous datasets. The proposed classifier demonstrates competitive performance on various datasets, including breast cancer, achieving high accuracy and macro-F1 scores. Additionally, the framework is adaptable for large-scale and imbalanced data scenarios.

We propose a geometry-driven quantum-inspired classification framework that integrates Correlation Group Structures (CGR), compact SWAP-test-based overlap estimation, and selective variational quantum decision modelling. Rather than directly approximating class posteriors, the method adopts a geometry-first paradigm in which samples are evaluated relative to class medoids using overlap-derived Euclidean-like and angular similarity channels. CGR organizes features into anchor-centered correlation neighbourhoods, generating nonlinear, correlation-weighted representations that enhance robustness in heterogeneous tabular spaces. These geometric signals are fused through a non-probabilistic margin-based fusion score, serving as a lightweight and data-efficient primary classifier for small-to-moderate datasets. On Heart Disease, Breast Cancer, and Wine Quality datasets, the fusion-score classifier achieves 0.8478, 0.8881, and 0.9556 test accuracy respectively, with macro-F1 scores of 0.8463, 0.8703, and 0.9522, demonstrating competitive and stable performance relative to classical baselines. For large-scale and highly imbalanced regimes, we construct compact Delta-distance contrastive features and train a variational quantum classifier (VQC) as a nonlinear refinement layer. On the Credit Card Fraud dataset (0.17% prevalence), the Delta + VQC pipeline achieves approximately 0.85 minority recall at an alert rate of approximately 1.31%, with ROC-AUC 0.9249 and PR-AUC 0.3251 under full-dataset evaluation. These results highlight the importance of operating-point-aware assessment in rare-event detection and demonstrate that the proposed hybrid geometric-variational framework provides interpretable, scalable, and regime-adaptive classification across heterogeneous data settings.
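
The SWAP-test overlap estimation mentioned in this abstract rests on a standard identity: for two normalized states, the ancilla qubit is measured as 0 with probability 1/2 + 1/2 * |<a|b>|^2. The sketch below simulates that probability classically for real-valued feature vectors; it only illustrates the identity and does not reproduce the paper's circuits, medoid selection, or fusion score. The function names are assumptions.

```python
import math
from typing import Sequence


def swap_test_p0(a: Sequence[float], b: Sequence[float]) -> float:
    """Probability of measuring the ancilla as 0 in an ideal SWAP test.

    The vectors are normalized first, as amplitude encoding would require.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero vectors cannot be amplitude-encoded")
    overlap = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    return 0.5 + 0.5 * overlap ** 2


def squared_overlap_from_p0(p0: float) -> float:
    """Invert the SWAP-test relation to recover |<a|b>|^2."""
    return 2.0 * p0 - 1.0


same = swap_test_p0([3.0, 4.0], [3.0, 4.0])        # identical states
orthogonal = swap_test_p0([1.0, 0.0], [0.0, 1.0])  # orthogonal states
```

On hardware the probability is estimated from repeated measurements, so the recovered overlap carries sampling noise that this idealized version ignores.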

ARXIV Cancer: general cancer Method: knowledge-guided spatial prompts

Enhancing Medical Visual Grounding via Knowledge-guided Spatial Prompts

Yifan Gao, Tao Zhou, Yi Zhou, Ke Zou, Yizhe Zhang, Huazhu Fu
Published 2026-04-02 11:31

This study presents KnowMVG, a framework designed to enhance Medical Visual Grounding (MVG) by improving spatial awareness in Vision-Language Models (VLMs). The proposed method incorporates a knowledge-enhanced prompting strategy and a global-local attention mechanism to achieve precise localization of relevant medical phrases in images. On four MVG benchmarks, KnowMVG outperforms existing methods, with gains of 3.0% in AP50 and 2.6% in mIoU over prior state-of-the-art methods.

Medical Visual Grounding (MVG) aims to identify diagnostically relevant phrases from free-text radiology reports and localize their corresponding regions in medical images, providing interpretable visual evidence to support clinical decision-making. Although recent Vision-Language Models (VLMs) exhibit promising multimodal reasoning ability, their grounding lacks sufficient spatial precision, largely due to a lack of explicit localization priors when relying solely on latent embeddings. In this work, we analyze this limitation from an attention perspective and propose KnowMVG, a Knowledge-prior and global-local attention enhancement framework for MVG in VLMs that explicitly strengthens spatial awareness during decoding. Specifically, we present a knowledge-enhanced prompting strategy that encodes phrase-related medical knowledge into compact embeddings, together with a global-local attention mechanism that jointly leverages coarse global information and refined local cues to guide precise region localization. This design bridges high-level semantic understanding and fine-grained visual perception without introducing extra textual reasoning overhead. Extensive experiments on four MVG benchmarks demonstrate that our KnowMVG consistently outperforms existing approaches, achieving gains of 3.0% in AP50 and 2.6% in mIoU over prior state-of-the-art methods. Qualitative and ablation studies further validate the effectiveness of each component.

ARXIV Cancer: breast cancer Method: deep learning

A deep learning pipeline for PAM50 subtype classification using histopathology images and multi-objective patch selection

Arezoo Borji, Gernot Kronreif, Bernhard Angermayr, Francisco Mario Calisto, Wolfgang Birkfellner, Inna Servetnyk, Yinyin Yuan, Sepideh Hatamikia
Published 2026-04-02 09:13

This study presents a deep learning framework for classifying breast cancer into PAM50 subtypes using histopathology images. The method optimizes patch selection based on informativeness and uncertainty, aiming to reduce the need for expensive molecular assays. It achieved an F1-score of 0.8812 and an AUC of 0.9841 on the internal TCGA-BRCA cohort, and an F1-score of 0.7952 and an AUC of 0.9512 on the external CPTAC-BRCA cohort, indicating potential to support clinical decision-making.

Breast cancer is a highly heterogeneous disease with diverse molecular profiles. The PAM50 gene signature is widely recognized as a standard for classifying breast cancer into intrinsic subtypes, enabling more personalized treatment strategies. In this study, we introduce a novel optimization-driven deep learning framework that aims to reduce reliance on costly molecular assays by directly predicting PAM50 subtypes from H&E-stained whole-slide images (WSIs). Our method jointly optimizes patch informativeness, spatial diversity, uncertainty, and patch count by combining the non-dominated sorting genetic algorithm II (NSGA-II) with Monte Carlo dropout-based uncertainty estimation. The proposed method can identify a small but highly informative patch subset for classification. We used a ResNet18 backbone for feature extraction and a custom CNN head for classification. For evaluation, we used the internal TCGA-BRCA dataset as the training cohort and the external CPTAC-BRCA dataset as the test cohort. On the internal dataset, using 627 WSIs from the TCGA-BRCA cohort, the method achieved an F1-score of 0.8812 and an AUC of 0.9841. On the external validation dataset, it achieved an F1-score of 0.7952 and an AUC of 0.9512. These findings indicate that the proposed optimization-guided, uncertainty-aware patch selection can achieve high performance and improve the computational efficiency of histopathology-based PAM50 classification compared to existing methods, suggesting a scalable imaging-based replacement that has the potential to support clinical decision-making.

ARXIV Cancer: general cancer Method: model merging

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging

Mengxian Lyu, Cheng Peng, Ziyi Chen, Mengyuan Zhang, Jieting Li Lu, Yonghui Wu
Published 2026-04-02 02:18

This study addresses the challenge of catastrophic forgetting in large language models (LLMs) when fine-tuned on medical datasets. It introduces a model merging framework that combines a clinical foundation model with a general instruct model to enhance instruction-following capabilities while adapting to the medical domain. The evaluation demonstrates that the merged models maintain clinical expertise and perform well on various clinical tasks, offering a scalable solution for deploying LLMs in healthcare.

Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a task-specific medical dataset, a critical challenge in adopting general-purpose LLMs for clinical applications. This study presents a model merging framework to efficiently adapt general-purpose LLMs to the medical domain by countering this forgetting issue. By merging a clinical foundation model (GatorTronLlama) with a general instruct model (Llama-3.1-8B-Instruct) via interpolation-based merge methods, we seek to derive a domain-adapted model with strong performance on clinical tasks while retaining instruction-following ability. Comprehensive evaluation across medical benchmarks and five clinical generation tasks (e.g., radiology and discharge summarization) shows that merged models can effectively mitigate catastrophic forgetting, preserve clinical domain expertise, and retain instruction-following ability. In addition, our model merging strategies demonstrate training efficiency, achieving performance on par with fully fine-tuned baselines under severely constrained supervision (e.g., 64-shot vs. 256-shot). Consequently, weight-space merging constitutes a highly scalable solution for adapting open-source LLMs to clinical applications, facilitating broader deployment in resource-constrained healthcare environments.
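
The simplest interpolation-based merge is a per-parameter linear blend, theta = (1 - lam) * theta_a + lam * theta_b. The sketch below applies that blend to toy weight dictionaries. The abstract does not say which interpolation variants or coefficients the authors used, so the function name, the value of `lam`, and the toy parameters are assumptions for illustration only.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def interpolate_weights(a: Weights, b: Weights, lam: float) -> Weights:
    """Linear weight-space interpolation: (1 - lam) * a + lam * b."""
    if a.keys() != b.keys():
        raise ValueError("models must share the same parameter names")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        merged[name] = [(1.0 - lam) * x + lam * y
                        for x, y in zip(a[name], b[name])]
    return merged


# Toy stand-ins for a clinical model and a general instruct model.
clinical = {"layer.weight": [0.0, 2.0], "layer.bias": [1.0]}
instruct = {"layer.weight": [2.0, 4.0], "layer.bias": [3.0]}
merged = interpolate_weights(clinical, instruct, lam=0.5)
```

This only works when both models share an architecture and parameter naming, which is the situation the abstract describes (both models are Llama-based); `lam` then trades domain expertise against instruction-following ability.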

ARXIV Cancer: general cancer Method: unknown

Strategies for tumor elimination and control under immune evasion and chemotherapy resistance

Nazanin Mokari, Bryce Morsky
Published 2026-04-01 20:46

This paper develops and analyzes mathematical models to understand the dynamics of tumors in response to immune responses and chemotherapy. The models focus on the interactions between effector cells and both chemo-resistant and immuno-resistant tumor cells. The study identifies key conditions that influence tumor persistence and elimination, providing a theoretical framework for improving targeted and combination therapies.

The evolutionary and ecological dynamics of tumors under immune responses and therapeutic interventions pose major challenges to long-term treatment success. Although treatment may initially achieve short-term disease control, resistant cancer cell subpopulations often arise, leading to relapse with more aggressive and treatment-resistant forms of the disease. Here, we develop and analyze mathematical models describing the interactions among effector cells, chemo-resistant tumor cells, and immuno-resistant tumor cells under distinct immune-evasion strategies. The models incorporate competition and cooperation between resistant and sensitive tumor subpopulations. We identify threshold conditions governing tumor persistence, elimination, and phenotype dominance under varying therapeutic intensities. These findings provide a theoretical framework for designing targeted and combination therapies and offer insights into strategies for mitigating treatment resistance.
