Research Papers

ARXIV Cancer: non-small cell lung cancer Method: mechanistic learning

Mechanistic Learning for Survival Prediction in NSCLC Using Routine Blood Biomarkers and Tumor Kinetics

Ruben Taieb, René Bruno, Pascal Chanu, Jin Yan Jin, Sébastien Benzekry
Published 2026-01-16 10:07

This study aims to predict overall survival in non-small cell lung cancer (NSCLC) with a mechanistic model that integrates tumor burden and blood marker kinetics. The model, termed TALN-k, uses coupled differential equations and is extended with a machine learning framework (TALN-kML) for survival prediction. TALN-kML outperforms empirical, uncoupled models on C-index (0.74 vs 0.72), 12-month AUC (0.83 vs 0.79), and accuracy (0.77 vs 0.76).

Background: Predicting overall survival (OS) in non-small cell lung cancer (NSCLC) is essential for clinical decision-making and drug development. While tumor and blood marker kinetics are intrinsically linked, their joint dynamics and relationship to OS remain unknown. Methods: We developed a mechanistic model capturing the interplay between tumor (T) burden and the kinetics of three key blood markers: albumin (A), lactate dehydrogenase (L), and neutrophils (N), through coupled differential equations (termed TALN-k). This model was enhanced with a machine learning framework (TALN-kML) for OS prediction. The model was trained and validated on clinical trial data from NSCLC patients treated with atezolizumab in monotherapy (N = 862 patients) or combination therapy (N = 1,115). Model parameters were estimated using nonlinear mixed-effects modelling, and survival predictions were assessed using individual- and trial-level metrics. Results: TALN-k successfully described individual and population-level marker kinetics, revealing complex interactions between tumor and blood markers, and improving corrected BIC and log-likelihood metrics by a significant margin over previous empirical state-of-the-art models. Feature selection methods also highlighted valuable predictive parameters indicative of good or poor prognosis. The TALN-kML model outperformed empirical, uncoupled models, achieving improved C-index (0.74 $\pm$ 0.02 vs 0.72 $\pm$ 0.03), 12-month AUC (0.83 $\pm$ 0.004 vs 0.79 $\pm$ 0.05), and accuracy (0.77 $\pm$ 0.03 vs 0.76 $\pm$ 0.05) in OS prediction. Conclusion: Our mechanistic learning approach allows for an interpretable model, which improves on longitudinal data description and on survival prediction in NSCLC by jointly integrating tumor and blood marker kinetics. This methodology offers a promising avenue for both personalized treatment strategies and drug development optimization.
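
The C-index reported above measures how often a model ranks pairs of patients in the right order of survival. The abstract does not say which implementation was used, so the following is only a minimal sketch of Harrell's definition on hypothetical toy data; the function name and all values are illustrative assumptions, and tied event times are ignored for brevity.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs that are ordered correctly.

    times: observed follow-up times
    events: 1 if death was observed, 0 if the patient was censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort (not data from the paper).
times = [5, 8, 12, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
c_index = concordance_index(times, events, risk)
print(c_index)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported 0.74 vs 0.72.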

ARXIV Cancer: small cell carcinoma Method: latent diffusion models

Generation of Chest CT pulmonary Nodule Images by Latent Diffusion Models using the LIDC-IDRI Dataset

Kaito Urata, Maiko Nagao, Atsushi Teramoto, Kazuyoshi Imaizumi, Masashi Kondo, Hiroshi Fujita
Published 2026-01-16 08:36

This study addresses the challenge of data imbalance in computer-aided diagnosis systems for chest CT images, particularly for rare cases like small cell carcinoma. The authors propose a method using latent diffusion models (LDM) to automatically generate chest CT nodule images that reflect target features. The effectiveness of the method was verified using the LIDC-IDRI dataset, with results indicating that the generated images achieved quality comparable to real clinical images.

Recently, computer-aided diagnosis systems have been developed to support diagnosis, but their performance depends heavily on the quality and quantity of training data. However, in clinical practice, it is difficult to collect a large number of CT images for specific cases, such as small cell carcinoma with low epidemiological incidence or benign tumors that are difficult to distinguish from malignant ones. This leads to the challenge of data imbalance. In this study, to address this issue, we proposed a method to automatically generate chest CT nodule images that capture target features using latent diffusion models (LDM) and verified its effectiveness. Using the LIDC-IDRI dataset, we created pairs of nodule images and finding-based text prompts based on physician evaluations. For the image generation models, we used Stable Diffusion version 1.5 (SDv1) and 2.0 (SDv2), which are types of LDM. Each model was fine-tuned using the created dataset. During the generation process, we adjusted the guidance scale (GS), which indicates the fidelity to the input text. Both quantitative and subjective evaluations showed that SDv2 (GS = 5) achieved the best performance in terms of image quality, diversity, and text consistency. In the subjective evaluation, no statistically significant differences were observed between the generated images and real images, confirming that the quality was equivalent to real clinical images. In summary, we proposed a method for generating chest CT nodule images based on input text using LDM. Evaluation results demonstrated that the proposed method could generate high-quality images that successfully capture specific medical features.
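
The abstract describes the guidance scale only as the fidelity to the input text. In Stable Diffusion this is normally applied through classifier-free guidance, where the text-conditioned noise prediction is extrapolated away from the unconditional one. The sketch below illustrates that rule on plain Python lists; the function name and the toy numbers are assumptions for illustration, not the authors' code.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and past) the text-conditioned one by a factor of guidance_scale."""
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


# Hypothetical noise predictions for three latent values.
uncond = [0.0, 1.0, -1.0]
text = [1.0, 1.0, 0.0]

guided = apply_guidance(uncond, text, 5.0)    # GS = 5, the best setting reported for SDv2
neutral = apply_guidance(uncond, text, 1.0)   # GS = 1 reproduces the text-conditioned prediction
print(guided, neutral)
```

Larger scales push samples harder toward the prompt, usually at some cost in diversity, which is consistent with the paper tuning GS rather than fixing it.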

ARXIV Cancer: lung cancer Method: visual question answering

Visual question answering-based image-finding generation for pulmonary nodules on chest CT from structured annotations

Maiko Nagao, Kaito Urata, Atsushi Teramoto, Kazuyoshi Imaizumi, Masashi Kondo, Hiroshi Fujita
Published 2026-01-16 08:21

This study focuses on the interpretation of pulmonary nodules in chest CT images through a visual question answering (VQA) approach. A dataset was constructed from structured annotations to enable interactive diagnostic support, allowing findings to be generated based on specific physician inquiries. The method demonstrated effectiveness, achieving high evaluation scores in generating relevant image findings.

Interpretation of imaging findings based on morphological characteristics is important for diagnosing pulmonary nodules on chest computed tomography (CT) images. In this study, we constructed a visual question answering (VQA) dataset from structured data in an open dataset and investigated an image-finding generation method for chest CT images, with the aim of enabling interactive diagnostic support that presents findings based on questions that reflect physicians' interests rather than fixed descriptions. Chest CT images included in the Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset were used. Regions of interest surrounding the pulmonary nodules were extracted from these images, and image findings and questions were defined based on morphological characteristics recorded in the database. A dataset comprising pairs of cropped images, corresponding questions, and image findings was constructed, and the VQA model was fine-tuned on it. Language evaluation metrics such as BLEU were used to evaluate the generated image findings. The VQA dataset constructed using the proposed method contained image findings with natural expressions as radiological descriptions. In addition, the generated image findings showed a high CIDEr score of 3.896, and a high agreement with the reference findings was obtained through evaluation based on morphological characteristics. In summary, we constructed a VQA dataset for chest CT images using structured information on the morphological characteristics from the LIDC-IDRI dataset and investigated methods for generating image findings in response to these questions. Based on the generated results and evaluation metric scores, the proposed method was effective as an interactive diagnostic support system that can present image findings according to physicians' interests.

ARXIV Cancer: general cancer Method: multi-scale attention

MATEX: Multi-scale Attention and Text-guided Explainability of Medical Vision-Language Models

Muhammad Imran, Chi Lee, Yugyung Lee
Published 2026-01-16 01:18

The paper presents MATEX, a framework designed to improve the interpretability of medical vision-language models by integrating anatomically informed spatial reasoning. It combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to generate accurate and clinically relevant gradient attribution maps. Evaluations on the MS-CXR dataset demonstrate that MATEX surpasses the existing M2IB method in spatial precision and alignment with expert annotations, indicating its potential to enhance trust in radiological AI applications.

We introduce MATEX (Multi-scale Attention and Text-guided Explainability), a novel framework that advances interpretability in medical vision-language models by incorporating anatomically informed spatial reasoning. MATEX synergistically combines multi-layer attention rollout, text-guided spatial priors, and layer consistency analysis to produce precise, stable, and clinically meaningful gradient attribution maps. By addressing key limitations of prior methods, such as spatial imprecision, lack of anatomical grounding, and limited attention granularity, MATEX enables more faithful and interpretable model explanations. Evaluated on the MS-CXR dataset, MATEX outperforms the state-of-the-art M2IB approach in both spatial precision and alignment with expert-annotated findings. These results highlight MATEX's potential to enhance trust and transparency in radiological AI applications.
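
The abstract names multi-layer attention rollout as one component but gives no details of MATEX's variant. As background, here is a minimal sketch of the standard attention rollout procedure: each layer's head-averaged attention is mixed with the identity to account for residual connections, and the layers are multiplied together. The matrices are hypothetical toy values, and the text-guided priors and layer consistency analysis are not modeled.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def attention_rollout(layers):
    """layers: head-averaged attention matrices (n x n), first layer first."""
    n = len(layers[0])
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Mix attention with the identity (residual path), then renormalize rows.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        mixed = [[v / sum(row) for v in row] for row in mixed]
        # Later layers are applied on top of the accumulated rollout.
        rollout = matmul(mixed, rollout)
    return rollout


# Hypothetical two-token, two-layer example.
layer1 = [[0.5, 0.5], [0.5, 0.5]]
layer2 = [[1.0, 0.0], [0.0, 1.0]]
rollout = attention_rollout([layer1, layer2])
print(rollout)
```

Each row of the result stays a probability distribution over input tokens, which is what allows it to be reshaped into a spatial attribution map.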

ARXIV Cancer: breast cancer Method: latent diffusion model

Self-learned representation-guided latent diffusion model for breast cancer classification in deep ultraviolet whole surface images

Pouya Afshin, David Helminiak, Tianling Niu, Julie M. Jorns, Tina Yen, Bing Yu, Dong Hye Ye
Published 2026-01-16 00:22

This study presents a Self-Supervised Learning (SSL)-guided Latent Diffusion Model (LDM) aimed at improving breast cancer classification using Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) images. The method generates high-quality synthetic training patches by incorporating semantic details from a fine-tuned DINO teacher. The approach combines real and synthetic data to fine-tune a Vision Transformer (ViT), achieving an accuracy of 96.47% in classification tasks.

Breast-Conserving Surgery (BCS) requires precise intraoperative margin assessment to preserve healthy tissue. Deep Ultraviolet Fluorescence Scanning Microscopy (DUV-FSM) offers rapid, high-resolution surface imaging for this purpose; however, the scarcity of annotated DUV data hinders the training of robust deep learning models. To address this, we propose a Self-Supervised Learning (SSL)-guided Latent Diffusion Model (LDM) to generate high-quality synthetic training patches. By guiding the LDM with embeddings from a fine-tuned DINO teacher, we inject rich semantic details of cellular structures into the synthetic data. We combine real and synthetic patches to fine-tune a Vision Transformer (ViT), utilizing patch prediction aggregation for WSI-level classification. Experiments using 5-fold cross-validation demonstrate that our method achieves 96.47% accuracy and reduces the FID score to 45.72, significantly outperforming class-conditioned baselines.

ARXIV Cancer: unknown Method: DenseNet121

Classification of Chest XRay Diseases through image processing and analysis techniques

Santiago Martínez Novoa, María Catalina Ibáñez, Lina Gómez Mesa, Jeremias Kramer
Published 2026-01-16 00:06

This study focuses on the classification of chest X-ray images to diagnose thoracic diseases using various image processing and analysis techniques. The authors specifically highlight the use of DenseNet121 as a method for this classification task. They also evaluate the performance of different methods and discuss their limitations, proposing future improvements.

Chest X-rays are one of the most prevalent forms of radiological examination used for diagnosing thoracic diseases. In this study, we offer a concise overview of several methods for multi-class classification of chest X-ray images, including DenseNet121. We also deploy an open-source web-based application. We run tests to compare the methods, examine the weaknesses of the methods we propose, and suggest ideas for improving them in the future. Our code is available at: https://github.com/AML4206-MINE20242/Proyecto_AML

ARXIV Cancer: unknown Method: mathematical modeling

A Predictive Model for Synergistic Oncolytic Virotherapy: Unveiling the Ping-Pong Mechanism and Optimal Timing of Combined Vesicular Stomatitis and Vaccinia Viruses

Joseph Malinzi, Amina Eladdadi, Rachid Ouifki, Raluca Eftimie, Anotida Madzvamuse, Helen M. Byrne
Published 2026-01-15 13:56

This study introduces a mathematical model to explore the synergistic effects of combining Vesicular Stomatitis Virus (VSV) and Vaccinia Virus (VV) in oncolytic virotherapy. The model elucidates a 'ping-pong' mechanism where VV enhances VSV replication by neutralizing interferon-$α$. Numerical simulations indicate that this combination can achieve complete tumor clearance in about 50 days, outperforming VV monotherapy. The research also identifies critical parameters for treatment efficacy and suggests an optimal timing strategy for administration.

We present a mathematical model that describes the synergistic mechanism of combined Vesicular Stomatitis Virus (VSV) and Vaccinia Virus (VV). The model captures the dynamic interplay between tumor cells, viral replication, and the interferon-mediated immune response, revealing a 'ping-pong' synergy where VV-infected cells produce B18R protein that neutralizes interferon-$α$, thereby enhancing VSV replication within the tumor. Numerical simulations demonstrate that this combination achieves complete tumor clearance in approximately 50 days, representing an 11% acceleration compared to VV monotherapy (56 days), while VSV alone fails to eradicate tumors. Through bifurcation analysis, we identify critical thresholds for viral burst size and B18R inhibition, while sensitivity analysis highlights infection rates and burst sizes as the most influential parameters for treatment efficacy. Temporal optimization reveals that therapeutic outcomes are maximized through immediate VSV administration followed by delayed VV injection within a 1-19 day window, offering a strategic approach to overcome the timing and dosing challenges inherent in oncolytic virotherapy (OVT).

ARXIV Cancer: non-small cell lung cancer Method: multimodal deep learning

Handling Missing Modalities in Multimodal Survival Prediction for Non-Small Cell Lung Cancer

Filippo Ruffini, Camillo Maria Caruso, Claudia Tacconi, Lorenzo Nibid, Francesca Miccolis, Marta Lovino, Carlo Greco, Edy Ippolito, Michele Fiore, Alessio Cortellini, Bruno Beomonte Zobel, Giuseppe Perrone, Bruno Vincenzi, Claudio Marrocco, Alessandro Bria, Elisa Ficarra, Sara Ramella, Valerio Guarrasi, Paolo Soda
Published 2026-01-15 13:38

This study addresses the challenge of accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) by developing a missing-aware multimodal survival framework. The framework integrates CT images, whole-slide histopathology images, and clinical variables for overall survival modeling. The proposed method is resilient to missing modalities and outperforms unimodal baselines as well as early and late fusion strategies, achieving a C-index of 73.30 with the fusion of whole-slide and clinical modalities.

Accurate survival prediction in Non-Small Cell Lung Cancer (NSCLC) requires the integration of heterogeneous clinical, radiological, and histopathological information. While Multimodal Deep Learning (MDL) offers promise for precision prognosis and survival prediction, its clinical applicability is severely limited by small cohort sizes and the presence of missing modalities, often forcing complete-case filtering or aggressive imputation. In this work, we present a missing-aware multimodal survival framework that integrates Computed Tomography (CT), Whole-Slide Histopathology Images (WSI), and structured clinical variables for overall survival modeling in unresectable stage II-III NSCLC. By leveraging Foundation Models (FM) for modality-specific feature extraction and a missing-aware encoding strategy, the proposed approach enables intermediate multimodal fusion under naturally incomplete modality profiles. The proposed architecture is resilient to missing modalities by design, allowing the model to utilize all available data without being forced to drop patients during training or inference. Experimental results demonstrate that intermediate fusion consistently outperforms unimodal baselines as well as early and late fusion strategies, with the strongest performance achieved by the fusion of WSI and clinical modalities (73.30 C-index). Further analyses of modality importance reveal an adaptive behavior in which less informative modalities, i.e., the CT modality, are automatically down-weighted and contribute less to the final survival prediction.

ARXIV Cancer: lung cancer Method: vector quantization

VQ-Seg: Vector-Quantized Token Perturbation for Semi-Supervised Medical Image Segmentation

Sicheng Yang, Zhaohu Xing, Lei Zhu
Published 2026-01-15 07:09

This paper presents VQ-Seg, a novel approach for semi-supervised medical image segmentation that utilizes vector quantization to improve feature perturbation methods. The proposed Quantized Perturbation Module (QPM) replaces traditional dropout techniques, allowing for effective regularization without the need for manual tuning of hyperparameters. The method is evaluated on a large-scale lung cancer dataset, demonstrating superior performance compared to existing state-of-the-art techniques.

Consistency learning with feature perturbation is a widely used strategy in semi-supervised medical image segmentation. However, many existing perturbation methods rely on dropout and thus require careful manual tuning of the dropout rate, a sensitive hyperparameter that is often difficult to optimize and may lead to suboptimal regularization. To overcome this limitation, we propose VQ-Seg, the first approach to employ vector quantization (VQ) to discretize the feature space and introduce a novel and controllable Quantized Perturbation Module (QPM) that replaces dropout. Our QPM perturbs discrete representations by shuffling the spatial locations of codebook indices, enabling effective and controllable regularization. To mitigate potential information loss caused by quantization, we design a dual-branch architecture where the post-quantization feature space is shared by both image reconstruction and segmentation tasks. Moreover, we introduce a Post-VQ Feature Adapter (PFA) to incorporate guidance from a foundation model (FM), supplementing the high-level semantic information lost during quantization. Furthermore, we collect a large-scale Lung Cancer (LC) dataset comprising 828 CT scans annotated for central-type lung carcinoma. Extensive experiments on the LC dataset and other public benchmarks demonstrate the effectiveness of our method, which outperforms state-of-the-art approaches. Code available at: https://github.com/script-Yang/VQ-Seg.
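
The core idea of perturbing discrete representations by shuffling the spatial locations of codebook indices can be illustrated with a small sketch. This is not the VQ-Seg implementation (see the linked repository for that); the function names, the `ratio` parameter, and the toy codebook are hypothetical and chosen only to show nearest-codebook quantization followed by a partial permutation of index positions.

```python
import random


def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codebook entry."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(codebook)), key=lambda k: sq_dist(f, codebook[k]))
            for f in features]


def shuffle_indices(indices, ratio, rng):
    """Permute the codebook indices at a random subset of spatial positions.

    ratio: fraction of positions taking part in the shuffle (assumed knob).
    """
    n = len(indices)
    chosen = rng.sample(range(n), int(round(ratio * n)))
    permuted = chosen[:]
    rng.shuffle(permuted)
    out = list(indices)
    for src, dst in zip(chosen, permuted):
        out[dst] = indices[src]  # move the index from position src to position dst
    return out


# Hypothetical 2-D features at four spatial positions and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
features = [[0.1, 0.0], [0.9, 1.0], [0.2, 0.1], [1.0, 0.8]]

indices = quantize(features, codebook)
perturbed = shuffle_indices(indices, ratio=0.5, rng=random.Random(0))
unchanged = shuffle_indices(indices, ratio=0.0, rng=random.Random(0))
print(indices, perturbed)
```

Because the perturbation only relocates existing indices, the set of codes in use is preserved and the strength is controlled by a single fraction rather than a dropout rate.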

ARXIV Cancer: non-small cell lung cancer Method: multiple instance learning

ReaMIL: Reasoning- and Evidence-Aware Multiple Instance Learning for Whole-Slide Histopathology

Hyun Do Jung, Jungwon Choi, Hwiyoung Kim
Published 2026-01-15 04:55

This paper presents ReaMIL, a multiple instance learning approach designed for whole-slide histopathology. The method incorporates a selection head that produces soft per-tile gates and is trained with a budgeted-sufficiency objective to enhance evidence efficiency without compromising performance. The results demonstrate that ReaMIL achieves high AUC scores on various cancer datasets, indicating improved class confidence with a minimal number of selected tiles.

We introduce ReaMIL (Reasoning- and Evidence-Aware MIL), a multiple instance learning approach for whole-slide histopathology that adds a light selection head to a strong MIL backbone. The head produces soft per-tile gates and is trained with a budgeted-sufficiency objective: a hinge loss that enforces the true-class probability to be $\geq τ$ using only the kept evidence, under a sparsity budget on the number of selected tiles. The budgeted-sufficiency objective yields small, spatially compact evidence sets without sacrificing baseline performance. Across TCGA-NSCLC (LUAD vs. LUSC), TCGA-BRCA (IDC vs. Others), and PANDA, ReaMIL matches or slightly improves baseline AUC and provides quantitative evidence-efficiency diagnostics. On NSCLC, it attains AUC 0.983 with a mean minimal sufficient K (MSK) $\approx 8.2$ tiles at $τ= 0.90$ and AUKC $\approx 0.864$, showing that class confidence rises sharply and stabilizes once a small set of tiles is kept. The method requires no extra supervision, integrates seamlessly with standard MIL training, and naturally yields slide-level overlays. We report accuracy alongside MSK, AUKC, and contiguity for rigorous evaluation of model behavior on WSIs.
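
The budgeted-sufficiency objective is described as a hinge loss on the true-class probability computed from the kept tiles, under a sparsity budget. The abstract does not give the exact formula, so the sketch below is one plausible reading: a hinge term for falling short of $τ$ plus an assumed hinge penalty when the sum of soft gates exceeds the budget. The `budget` and `weight` parameters and the form of the second term are assumptions, not the authors' definition.

```python
def budgeted_sufficiency_loss(p_true_kept, gates, tau=0.90, budget=8.0, weight=0.1):
    """Hinge loss for sufficiency plus an assumed penalty for exceeding the budget.

    p_true_kept: true-class probability predicted from the gated (kept) tiles only
    gates: soft per-tile gates in [0, 1]
    """
    # Zero once the kept evidence alone reaches confidence tau.
    sufficiency = max(0.0, tau - p_true_kept)
    # Zero while the (soft) number of selected tiles stays within the budget.
    over_budget = max(0.0, sum(gates) - budget)
    return sufficiency + weight * over_budget


# Hypothetical cases: confident with few tiles vs. unsure with too many tiles.
loss_ok = budgeted_sufficiency_loss(0.95, [1.0] * 5)
loss_bad = budgeted_sufficiency_loss(0.70, [1.0] * 10)
print(loss_ok, loss_bad)
```

Under this reading the loss vanishes exactly when a small set of tiles is enough to reach the confidence threshold, which matches the reported behavior of about 8.2 tiles at $τ = 0.90$ on NSCLC.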
