Research Papers

ARXIV Cancer: non-small cell lung cancer Method: multimodal deep learning

Learning from Limited and Incomplete Data: A Multimodal Framework for Predicting Pathological Response in NSCLC

Alice Natalina Caragliano, Giulia Farina, Fatih Aksu, Camillo Maria Caruso, Claudia Tacconi, Carlo Greco, Lorenzo Nibid, Edy Ippolito, Michele Fiore, Giuseppe Perrone, Sara Ramella, Paolo Soda, Valerio Guarrasi
Published 2026-03-16 10:51

This study presents a multimodal deep learning framework aimed at predicting major pathological response (pR) in non-small cell lung cancer (NSCLC) following neoadjuvant therapy. The framework integrates CT feature extraction with a missing-aware architecture to effectively handle limited data and incomplete clinical profiles. Results indicate that this approach outperforms traditional unimodal methods, demonstrating the benefits of combining diverse data sources for improved predictive accuracy.

Major pathological response (pR) following neoadjuvant therapy is a clinically meaningful endpoint in non-small cell lung cancer, strongly associated with improved survival. However, accurate preoperative prediction of pR remains challenging, particularly in real-world clinical settings characterized by limited data availability and incomplete clinical profiles. In this study, we propose a multimodal deep learning framework designed to address these constraints by integrating foundation model-based CT feature extraction with a missing-aware architecture for clinical variables. This approach enables robust learning from small cohorts while explicitly modeling missing clinical information, without relying on conventional imputation strategies. A weighted fusion mechanism is employed to leverage the complementary contributions of imaging and clinical modalities, yielding a multimodal model that consistently outperforms both unimodal imaging and clinical baselines. These findings underscore the added value of integrating heterogeneous data sources and highlight the potential of multimodal, missing-aware systems to support pR prediction under realistic clinical conditions.
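
The abstract does not spell out the missing-aware encoding or the fusion rule, so the following is only a minimal sketch of one common way to realize both ideas. The function names, the mask-concatenation scheme, and the fixed weight `w_imaging` are assumptions made for illustration, not the authors' implementation.

```python
from typing import Optional, Sequence


def encode_with_mask(values: Sequence[Optional[float]]) -> list[float]:
    """Encode clinical variables without imputation (hypothetical scheme).

    Missing entries (None) are set to 0.0 and flagged by a parallel mask,
    so a downstream model can tell "missing" apart from a true zero.
    Returns the filled values followed by the mask bits.
    """
    filled = [0.0 if v is None else float(v) for v in values]
    mask = [0.0 if v is None else 1.0 for v in values]
    return filled + mask


def weighted_fusion(p_imaging: float, p_clinical: float,
                    w_imaging: float = 0.5) -> float:
    """Late fusion: convex combination of the two unimodal probabilities."""
    if not 0.0 <= w_imaging <= 1.0:
        raise ValueError("w_imaging must lie in [0, 1]")
    return w_imaging * p_imaging + (1.0 - w_imaging) * p_clinical


# Example: second clinical variable is missing; imaging gets weight 0.75.
clinical_input = encode_with_mask([1.5, None, 3.0])
fused = weighted_fusion(p_imaging=0.8, p_clinical=0.4, w_imaging=0.75)
```

In practice the fusion weight would typically be learned or tuned on validation data rather than fixed by hand.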

ARXIV Cancer: general cancer Method: deep learning

Empowering Chemical Structures with Biological Insights for Scalable Phenotypic Virtual Screening

Xiaoqing Lian, Pengsen Ma, Tengfeng Ma, Zhonghao Ren, Xibao Cai, Zhixiang Cheng, Bosheng Song, He Wang, Xiang Pan, Yangyang Chen, Sisi Yuan, Chen Lin
Published 2026-03-16 09:13

This study introduces DECODE, a framework designed to enhance chemical representations with biological insights for scalable phenotypic virtual screening in drug discovery. By utilizing limited paired transcriptomic and morphological data, DECODE extracts a measurement-invariant biological fingerprint from chemical structures, significantly improving mechanism-of-action prediction and increasing hit rates for novel anti-cancer agents. The results indicate a relative improvement of over 20% compared to traditional chemical baselines.

Motivation: The scalable identification of bioactive compounds is essential for contemporary drug discovery. This process faces a key trade-off: structural screening offers scalability but lacks biological context, whereas high-content phenotypic profiling provides deep biological insights but is resource-intensive. The primary challenge is to extract robust biological signals from noisy data and encode them into representations that do not require biological data at inference. Results: This study presents DECODE (DEcomposing Cellular Observations of Drug Effects), a framework that bridges this gap by empowering chemical representations with intrinsic biological semantics to enable structure-based in silico biological profiling. DECODE leverages limited paired transcriptomic and morphological data as supervisory signals during training, enabling the extraction of a measurement-invariant biological fingerprint from chemical structures and explicit filtering of experimental noise. Our evaluations demonstrate that DECODE retrieves functionally similar drugs in zero-shot settings with over 20% relative improvement over chemical baselines in mechanism-of-action (MOA) prediction. Furthermore, the framework achieves a 6-fold increase in hit rates for novel anti-cancer agents during external validation. Availability and implementation: The codes and datasets of DECODE are available at https://github.com/lian-xiao/DECODE.

ARXIV Cancer: breast cancer Method: large language model

Extracting Breast Cancer Phenotypes from Clinical Notes: Comparing LLMs with Classical Ontology Methods

Abdullah Bin Faiz, Arbaz Khan Shehzad, Asad Afzal, Momin Tariq, Muhammad Siddiqi, Muhammad Usamah Shahid, Maryam Noor Awan, Muddassar Farooq
Published 2026-03-16 05:09

This research presents a framework utilizing large language models (LLMs) to extract phenotypes from unstructured clinical notes in oncology. The study specifically focuses on breast cancer and compares the performance of the LLM-based approach with traditional ontology-driven methods. Results indicate that the LLM framework achieves comparable accuracy to classical methods, demonstrating its adaptability for various cancer types.

A significant amount of the data held in oncology Electronic Medical Records (EMRs) is contained in unstructured provider notes, including but not limited to a patient's chemotherapy (or cancer treatment) outcome, biomarkers, and the tumor's location, size, and growth pattern. Clinical studies show that the majority of oncologists are more comfortable recording these valuable insights in natural language in their notes than in the relevant structured fields of an EMR. The major contribution of this research is an LLM-based framework that processes provider notes and extracts the medical knowledge and phenotypes mentioned above, with a focus on the domain of oncology. In this paper, we focus on extracting phenotypes related to breast cancer using our LLM framework, and then compare its performance with earlier work that used a knowledge-driven annotation system paired with the NCIt Ontology Annotator. The results of the study show that an LLM-based information extraction framework can be easily adapted to extract phenotypes with an accuracy that is comparable to that of classical ontology-based methods. However, once trained, LLM-based frameworks could be easily fine-tuned to cater for other cancer types and diseases.

ARXIV Cancer: unknown Method: heterogeneous ensemble

A Heterogeneous Ensemble for Multi-Center COVID-19 Classification from Chest CT Scans

Aadit Nilay, Bhavesh Thapar, Anant Agrawal, Mohammad Nayeem Teli
Published 2026-03-15 21:34

This study addresses the limitations of RT-PCR tests and the challenges of CT-based screening for COVID-19 diagnosis across multiple hospital centers. A heterogeneous ensemble of nine models is proposed, utilizing various architectures and techniques to improve diagnostic accuracy. The ensemble achieved an average macro F1 score of 0.9280, outperforming the best single model, highlighting the importance of diverse models and calibration in medical image classification.

The COVID-19 pandemic exposed critical limitations in diagnostic workflows: RT-PCR tests suffer from slow turnaround times and high false-negative rates, while CT-based screening offers faster complementary diagnosis but requires expert radiological interpretation. Deploying automated CT analysis across multiple hospital centres introduces further challenges, as differences in scanner hardware, acquisition protocols, and patient populations cause substantial domain shift that degrades single-model performance. To address these challenges, we present a heterogeneous ensemble of nine models spanning three inference paradigms: (1) a self-supervised DINOv2 Vision Transformer with slice-level sigmoid aggregation, (2) a RadImageNet-pretrained DenseNet-121 with slice-level sigmoid averaging, and (3) seven Gated Attention Multiple Instance Learning models using EfficientNet-B3, ConvNeXt-Tiny, and EfficientNetV2-S backbones with scan-level softmax classification. Ensemble diversity is further enhanced through random-seed variation and Stochastic Weight Averaging. We address severe overfitting, reducing the validation-to-training loss ratio from 35x to less than 3x, through a combination of Focal Loss, embedding-level Mixup, and domain-aware augmentation. Model outputs are fused via score-weighted probability averaging and calibrated with per-source threshold optimization. The final ensemble achieves an average macro F1 of 0.9280 across four hospital centres, outperforming the best single model (F1=0.8969) by +0.031, demonstrating that heterogeneous architectures combined with source-aware calibration are essential for robust multi-site medical image classification.
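
The fusion and calibration steps named in the abstract can be illustrated roughly as follows. This is a sketch under assumptions: the abstract does not say which score weights each model or how thresholds are searched, so here each model is weighted by a validation score, and each source's threshold is chosen by a simple grid search over binary F1 (a simplification of the macro F1 the paper reports). All names are hypothetical.

```python
from typing import Sequence


def score_weighted_average(probs: Sequence[float],
                           scores: Sequence[float]) -> float:
    """Fuse per-model probabilities, weighting each model by its score."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    return sum(p * s for p, s in zip(probs, scores)) / total


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """Binary F1 of the positive class when predicting prob >= threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(probs: Sequence[float], labels: Sequence[int],
                   candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the highest F1 (first wins ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))


def classify(prob: float, source: str, thresholds: dict[str, float],
             default: float = 0.5) -> int:
    """Apply the threshold calibrated for the scan's source hospital."""
    return int(prob >= thresholds.get(source, default))


# Example: calibrate one hypothetical centre, then classify a fused score.
thresholds = {"centre_A": best_threshold([0.2, 0.4, 0.6, 0.8],
                                         [0, 1, 1, 1], [0.3, 0.5, 0.7])}
fused = score_weighted_average([0.9, 0.5], [3.0, 1.0])
label = classify(fused, "centre_A", thresholds)
```

Calibrating a separate threshold per source is one way to absorb the domain shift between centres without retraining the underlying models.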

ARXIV Cancer: colorectal cancer Method: convolutional neural network

A comprehensive multimodal dataset and benchmark for ulcerative colitis scoring in endoscopy

Noha Ghatwary, Jiangbei Yue, Ahmed Elgendy, Hanna Nagdy, Ahmed Galal, Hayam Fathy, Hussein El-Amin, Venkataraman Subramanian, Noor Mohammed, Gilberto Ochoa-Ruiz, Sharib Ali
Published 2026-03-15 19:15

This paper presents a comprehensive multimodal dataset aimed at improving the scoring of ulcerative colitis during endoscopy. The dataset includes expert-validated labels for the Mayo Endoscopic Score and the Ulcerative Colitis Endoscopic Index of Severity, along with clinical descriptions. The authors highlight the need for robust computational methods to predict these scores and benchmark various AI techniques, including convolutional neural networks and vision transformers.

Ulcerative colitis (UC) is a chronic mucosal inflammatory condition that places patients at increased risk of colorectal cancer. Colonoscopic surveillance remains the gold standard for assessing disease activity, and reporting typically relies on standardised endoscopic scoring metrics. The most widely used is the Mayo Endoscopic Score (MES), with some centres also adopting the Ulcerative Colitis Endoscopic Index of Severity (UCEIS). Both are descriptive assessments of mucosal inflammation (MES: 0 to 3; UCEIS: 0 to 8), where higher values indicate more severe disease. However, computational methods for automatically predicting these scores remain limited, largely due to the lack of publicly available expert-annotated datasets and the absence of robust benchmarking. There is also a significant research gap in generating clinically meaningful descriptions of UC images, despite image captioning being a well-established computer vision task. Variability in endoscopic systems and procedural workflows across centres further highlights the need for multi-centre datasets to ensure algorithmic robustness and generalisability. In this work, we introduce a curated multi-centre, multi-resolution dataset that includes expert-validated MES and UCEIS labels, alongside detailed clinical descriptions. To our knowledge, this is the first comprehensive dataset that combines dual scoring metrics for classification tasks with expert-generated captions describing mucosal appearance and clinically accepted reasoning for image captioning. This resource opens new opportunities for developing clinically meaningful multimodal algorithms. In addition to the dataset, we also provide benchmarking using convolutional neural networks, vision transformers, hybrid models, and widely used multimodal vision-language captioning algorithms.

ARXIV Cancer: breast cancer Method: deep learning

Histo-MExNet: A Unified Framework for Real-World, Cross-Magnification, and Trustworthy Breast Cancer Histopathology

Enam Ahmed Taufik, Md Ahasanul Arafath, Abhijit Kumar Ghosh, Md. Tanzim Reza, Md Ashad Alam
Published 2026-03-15 15:01

The paper presents Histo-MExNet, a unified framework aimed at improving histopathological image classification for breast cancer diagnosis. This model addresses challenges related to magnification variability and interpretability by integrating multiple deep learning backbones and a prototype learning module. It achieves a high accuracy of 96.97% on the BreaKHis dataset and enhances generalization to unseen magnification levels while providing uncertainty estimation to support clinical decision-making.

Accurate and reliable histopathological image classification is essential for breast cancer diagnosis. However, many deep learning models remain sensitive to magnification variability and lack interpretability. To address these challenges, we propose Histo-MExNet, a unified framework designed for scale-invariant and uncertainty-aware classification. The model integrates DenseNet, ConvNeXt, and EfficientNet backbones within a gated multi-expert architecture, incorporates a prototype learning module for example-driven interpretability, and applies physics-informed regularization to enforce morphology preservation and spatial coherence during feature learning. Monte Carlo Dropout is used to quantify predictive uncertainty. On the BreaKHis dataset, Histo-MExNet achieves 96.97% accuracy under multi-magnification training and demonstrates improved generalization to unseen magnification levels compared to single-expert models, while uncertainty estimation helps identify out-of-distribution samples and reduce overconfident errors, supporting a balanced combination of accuracy, robustness, and interpretability for clinical decision support.
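
Monte Carlo Dropout, mentioned above, keeps dropout active at inference and treats the spread of repeated stochastic predictions as an uncertainty estimate. The sketch below illustrates the idea on a toy logistic model that stands in for the network; the model, its weights, and the function names are assumptions for illustration and have nothing to do with Histo-MExNet's actual architecture.

```python
import math
import random
import statistics
from typing import Sequence


def stochastic_forward(features: Sequence[float], weights: Sequence[float],
                       bias: float, drop_p: float,
                       rng: random.Random) -> float:
    """One forward pass of a toy logistic model with dropout left on."""
    keep = 1.0 - drop_p
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:
            z += (x / keep) * w  # inverted-dropout scaling of kept inputs
    return 1.0 / (1.0 + math.exp(-z))


def mc_dropout_predict(features: Sequence[float], weights: Sequence[float],
                       bias: float, drop_p: float = 0.5, passes: int = 100,
                       seed: int = 0) -> tuple[float, float]:
    """Return (mean probability, standard deviation) over stochastic passes."""
    if not 0.0 <= drop_p < 1.0:
        raise ValueError("drop_p must lie in [0, 1)")
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, bias, drop_p, rng)
               for _ in range(passes)]
    return statistics.fmean(samples), statistics.pstdev(samples)


# Example: a larger standard deviation signals a less certain prediction.
mean_prob, uncertainty = mc_dropout_predict([1.0, 2.0], [0.5, -0.25], 0.0)
```

A prediction whose standard deviation exceeds a chosen cutoff could then be flagged for review, which is the kind of use the abstract describes for out-of-distribution samples.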

ARXIV Cancer: prostate cancer Method: deep learning

Deep Learning From Routine Histology Improves Risk Stratification for Biochemical Recurrence in Prostate Cancer

Clément Grisi, Khrystyna Faryna, Nefise Uysal, Vittorio Agosti, Enrico Munari, Solène-Florence Kammerer-Jacquet, Paulo Guilherme de Oliveira Salles, Yuri Tolkach, Reinhard Büttner, Sofiya Semko, Maksym Pikul, Axel Heidenreich, Jeroen van der Laak, Geert Litjens
Published 2026-03-15 02:22

This study focuses on improving the prediction of biochemical recurrence (BCR) in prostate cancer following radical prostatectomy. A deep learning-based biomarker was developed to analyze H&E-stained whole-slide specimens, demonstrating robust generalization across multiple cohorts. The model, when combined with existing clinical risk scores, significantly enhanced the discrimination of BCR risk, indicating its potential for personalized management in clinical settings.

Accurate prediction of biochemical recurrence (BCR) after radical prostatectomy is critical for guiding adjuvant treatment and surveillance decisions in prostate cancer. However, existing clinicopathological risk models reduce complex morphology to relatively coarse descriptors, leaving substantial prognostic information embedded in routine histopathology underexplored. We present a deep learning-based biomarker that predicts continuous, patient-specific risk of BCR directly from H&E-stained whole-slide prostatectomy specimens. Trained end-to-end on time-to-event outcomes and evaluated across four independent international cohorts, our model demonstrates robust generalization across institutions and patient populations. When integrated with the CAPRA-S clinical risk score, the deep learning risk score consistently improved discrimination for BCR, increasing concordance indices from 0.725-0.772 to 0.749-0.788 across cohorts. To support clinical interpretability, outcome-grounded analyses revealed subtle histomorphological patterns associated with recurrence risk that are not captured by conventional clinicopathological risk scores. This multicohort study demonstrates that deep learning applied to routine prostate histopathology can deliver reproducible and clinically generalizable biomarkers that augment postoperative risk stratification, with potential to support personalized management of prostate cancer in real-world clinical settings.
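
The concordance index reported above measures how often a model ranks patients correctly by risk. The abstract does not state which variant was used, so the sketch below assumes the widely used Harrell's C-index for right-censored data: a pair is comparable when the patient with the shorter follow-up actually experienced the event, and it counts as concordant when that patient also received the higher risk score. The function name and toy data are illustrative only.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    times  : follow-up time per patient
    events : 1 if recurrence was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Example: three patients, the last one censored at time 3.
c = concordance_index([1, 2, 3], [1, 1, 0], [0.9, 0.5, 0.1])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported gains (0.725-0.772 to 0.749-0.788) in context.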

ARXIV Cancer: general cancer Method: multimodal learning

EviAgent: Evidence-Driven Agent for Radiology Report Generation

Tuoshi Qi, Shenshen Bu, Yingfei Xiang, Zhiming Dai
Published 2026-03-14 14:07

The paper presents EviAgent, a novel approach for automated radiology report generation that addresses the limitations of existing Multimodal Large Language Models (MLLMs). By breaking down the report generation process into granular operational units and integrating multi-dimensional visual experts, EviAgent enhances transparency and provides explicit visual evidence to support diagnoses. Experimental results on various medical imaging datasets indicate that EviAgent outperforms both generalist and specialized models, offering a reliable solution for radiologists.

Automated radiology report generation holds immense potential to alleviate the heavy workload of radiologists. Despite the formidable vision-language capabilities of recent Multimodal Large Language Models (MLLMs), their clinical deployment is severely constrained by inherent limitations: their "black-box" decision-making renders the generated reports untraceable due to the lack of explicit visual evidence to support the diagnosis, and they struggle to access external domain knowledge. To address these challenges, we propose the Evidence-driven Radiology Report Generation Agent (EviAgent). Unlike opaque end-to-end paradigms, EviAgent coordinates a transparent reasoning trajectory by breaking down the complex generation process into granular operational units. We integrate multi-dimensional visual experts and retrieval mechanisms as external support modules, endowing the system with explicit visual evidence and high-quality clinical priors. Extensive experiments on MIMIC-CXR, CheXpert Plus, and IU-Xray datasets demonstrate that EviAgent outperforms both large-scale generalist models and specialized medical models, providing a robust and trustworthy solution for automated radiology report generation.

ARXIV Cancer: breast cancer Method: unknown

ArrayTac: A Closed-loop Piezoelectric Tactile Platform for Continuously Tunable Rendering of Shape, Stiffness, and Friction

Tianhai Liang, Shiyi Guo, Baiye Cheng, Zhengrong Xue, Han Zhang, Huazhe Xu
Published 2026-03-14 08:23

The paper presents ArrayTac, a closed-loop piezoelectric tactile display designed to render shape, stiffness, and friction as continuously tunable signals for enhanced tactile perception. The system utilizes a 4 by 4 actuator array with advanced feedback mechanisms, allowing for high fidelity in tactile rendering. Psychophysical experiments demonstrated that participants could accurately identify three-dimensional shapes and distinguish various stiffness and friction levels through touch. Additionally, the platform was tested for remote palpation of a breast tumor phantom, achieving accurate identification of tumor characteristics.

Human touch depends on the integration of shape, stiffness, and friction, yet existing tactile displays cannot render these cues together as continuously tunable, high-fidelity signals for intuitive perception. We present ArrayTac, a closed-loop piezoelectric tactile display that simultaneously renders these three dimensions with continuous tunability on a 4 by 4 actuator array. Each unit integrates a three-stage micro-lever amplifier with end-effector Hall-effect feedback, enabling up to 5 mm displacement, greater than 500 Hz array refresh, and 123 Hz closed-loop bandwidth. In psychophysical experiments, naive participants identified three-dimensional shapes and distinguished multiple stiffness and friction levels through touch alone without training. We further demonstrate image-to-touch rendering from an RGB image and remote palpation of a medical-grade breast tumor phantom over 1,000 km, in which all 11 naive participants correctly identified tumor number and type with sub-centimeter localization error. These results establish ArrayTac as a platform for multidimensional haptic rendering and interaction.

ARXIV Cancer: breast cancer Method: multimodal foundation models

ArrayTac: A tactile display for simultaneous rendering of shape, stiffness and friction

Tianhai Liang, Shiyi Guo, Baiye Cheng, Zhengrong Xue, Han Zhang, Huazhe Xu
Published 2026-03-14 08:23

The paper presents ArrayTac, a novel tactile display designed to render shape, stiffness, and friction simultaneously, enhancing haptic feedback in human-computer interaction. The system utilizes a 4x4 array of actuator units with a micro-lever mechanism and closed-loop control for improved precision. User studies demonstrate its effectiveness, with participants accurately identifying object shapes and properties, including tumor detection in a breast phantom with 100% accuracy. This work represents a significant advancement in realistic haptic simulation.

Human-computer interaction in the visual and auditory domains has achieved considerable maturity, yet machine-to-human tactile feedback remains underdeveloped. Existing tactile displays struggle to simultaneously render multiple tactile dimensions, such as shape, stiffness, and friction, which limits the realism of haptic simulation. Here, we present ArrayTac, a piezoelectric-driven tactile display capable of simultaneously rendering shape, stiffness, and friction to reproduce realistic haptic signals. The system comprises a 4x4 array of 16 actuator units, each employing a three-stage micro-lever mechanism to amplify the micrometer-scale displacement of the piezoelectric element, with Hall sensor-based closed-loop control at the end effector to enhance response speed and precision. We further implement two end-to-end pipelines: 1) a vision-to-touch framework that converts visual inputs into tactile signals using multimodal foundation models, and 2) a real-time tele-palpation system operating over distances of several thousand kilometers. In user studies, first-time participants accurately identify object shapes and physical properties with high success rates. In a tele-palpation experiment over 1,000 km, untrained volunteers correctly identified both the number and type of tumors in a breast phantom with 100% accuracy and precisely localized their positions. The system pioneers a new pathway for high-fidelity haptic feedback by introducing the unprecedented capability to simultaneously render an object's shape, stiffness, and friction, delivering a holistic tactile experience that was previously unattainable.
