Research Papers

ARXIV Cancer: unknown Method: self-supervised learning

Addressing Data Scarcity in 3D Trauma Detection through Self-Supervised and Semi-Supervised Learning with Vertex Relative Position Encoding

Shivam Chaudhary, Sheethal Bhat, Andreas Maier
Published 2026-03-12 23:17

This paper addresses the challenge of detecting traumatic injuries in abdominal CT scans, which is hindered by a lack of annotated data. It proposes a label-efficient method that combines self-supervised pre-training with semi-supervised learning for 3D medical image analysis. The approach utilizes a 3D U-Net encoder pre-trained on unannotated CT volumes, achieving significant improvements in injury detection and classification tasks with minimal labeled data.

Accurate detection and localization of traumatic injuries in abdominal CT scans remains a critical challenge in emergency radiology, primarily due to severe scarcity of annotated medical data. This paper presents a label-efficient approach combining self-supervised pre-training with semi-supervised detection for 3D medical image analysis. We employ patch-based Masked Image Modeling (MIM) to pre-train a 3D U-Net encoder on 1,206 CT volumes without annotations, learning robust anatomical representations. The pretrained encoder enables two downstream clinical tasks: 3D injury detection using VDETR with Vertex Relative Position Encoding, and multi-label injury classification. For detection, semi-supervised learning with 2,000 unlabeled volumes and consistency regularization achieves 56.57% validation mAP@0.50 and 45.30% test mAP@0.50 with only 144 labeled training samples, representing a 115% improvement over supervised-only training. For classification, expanding to 2,244 labeled samples yields 94.07% test accuracy across seven injury categories using only a frozen encoder, demonstrating immediately transferable self-supervised features. Our results validate that self-supervised pre-training combined with semi-supervised learning effectively addresses label scarcity in medical imaging, enabling robust 3D object detection with limited annotations.
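
As a rough illustration of the patch-based masking step in Masked Image Modeling, the sketch below hides a random subset of cubic patches in a toy 3D volume. The function names, the patch size, the 50% mask ratio, and the zero fill value are assumptions chosen for illustration; the abstract does not state the settings the authors used.

```python
import random


def sample_patch_mask(grid_shape, mask_ratio, seed=0):
    """Pick a random subset of patch coordinates (z, y, x) to hide."""
    gz, gy, gx = grid_shape
    coords = [(z, y, x) for z in range(gz) for y in range(gy) for x in range(gx)]
    rng = random.Random(seed)
    n_masked = round(mask_ratio * len(coords))
    return set(rng.sample(coords, n_masked))


def apply_patch_mask(volume, patch, masked, fill=0.0):
    """Return a copy of a nested-list volume [D][H][W] with masked patches overwritten."""
    out = [[row[:] for row in plane] for plane in volume]
    for pz, py, px in masked:
        for z in range(pz * patch, (pz + 1) * patch):
            for y in range(py * patch, (py + 1) * patch):
                for x in range(px * patch, (px + 1) * patch):
                    out[z][y][x] = fill
    return out


# Toy 4x4x4 volume split into a 2x2x2 grid of 2x2x2 patches (hypothetical sizes).
volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
masked = sample_patch_mask(grid_shape=(2, 2, 2), mask_ratio=0.5)
corrupted = apply_patch_mask(volume, patch=2, masked=masked)
hidden_voxels = sum(v == 0.0 for plane in corrupted for row in plane for v in row)
```

In a pre-training loop of this kind, the encoder would receive the corrupted volume and be trained to reconstruct the hidden patches, which requires no annotations.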

ARXIV Cancer: general cancer Method: machine unlearning

Adaptation of Weakly Supervised Localization in Histopathology by Debiasing Predictions

Alexis Guichemerre, Banafsheh Karimian, Soufiane Belharbi, Natacha Gillet, Nicolas Thome, Pourya Shamsolmoali, Mohammadhadi Shateri, Luke McCaffrey, Eric Granger
Published 2026-03-12 21:34

This paper addresses the challenges of Weakly Supervised Object Localization (WSOL) in histopathology images, particularly under domain shifts that lead to biased predictions. The authors introduce a novel method, SFDA-DeP, which utilizes machine unlearning principles to iteratively correct prediction bias and improve both classification and localization tasks. Extensive experiments demonstrate that this approach outperforms existing state-of-the-art methods in cross-organ and -center histopathology benchmarks.

Weakly Supervised Object Localization (WSOL) models enable joint classification and region-of-interest localization in histology images using only image-class supervision. When deployed in a target domain, distribution shift remains a major cause of performance degradation, especially when applied to new organs or institutions with different staining protocols and scanner characteristics. Under stronger cross-domain shifts, WSOL predictions can become biased toward dominant classes, producing highly skewed pseudo-label distributions in the target domain. Source-Free (Unsupervised) Domain Adaptation (SFDA) methods are commonly employed to address domain shift. However, because they rely on self-training, the initial bias is reinforced over training iterations, degrading both classification and localization tasks. We identify this amplification of prediction bias as a primary obstacle to the SFDA of WSOL models in histopathology. This paper introduces SFDA-DeP, a method inspired by machine unlearning that formulates SFDA as an iterative process of identifying and correcting prediction bias. It periodically identifies target images from over-predicted classes and selectively reduces the predictive confidence for uncertain (high entropy) images, while preserving confident predictions. This process reduces the drift of decision boundaries and bias toward dominant classes. A jointly optimized pixel-level classifier further restores discriminative localization features under distribution shift. Extensive experiments on cross-organ and -center histopathology benchmarks (GlaS, CAMELYON-16, CAMELYON-17) with several WSOL models show that SFDA-DeP consistently improves classification and localization over state-of-the-art SFDA baselines. Code: https://anonymous.4open.science/r/SFDA-DeP-1797/
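
The selection step described in the abstract (find over-predicted classes, then pick their high-entropy images) can be sketched as follows. This is not the authors' implementation: the function names are hypothetical, and treating a class as over-predicted when its pseudo-label count exceeds an equal share per class is an assumption made here for illustration.

```python
import math
from collections import Counter


def entropy(probs):
    """Shannon entropy (natural log) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_debiasing(pred_probs, entropy_threshold):
    """Indices of uncertain images whose pseudo-label is an over-predicted class."""
    labels = [max(range(len(p)), key=p.__getitem__) for p in pred_probs]
    counts = Counter(labels)
    # Assumption: an equal share per class is the reference for "over-predicted".
    equal_share = len(pred_probs) / len(pred_probs[0])
    over_predicted = {c for c, n in counts.items() if n > equal_share}
    return [
        i
        for i, (p, y) in enumerate(zip(pred_probs, labels))
        if y in over_predicted and entropy(p) > entropy_threshold
    ]


# Four hypothetical target images, two classes; class 0 dominates the pseudo-labels.
pred_probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.2, 0.8]]
selected = select_for_debiasing(pred_probs, entropy_threshold=0.6)
```

Here the confident prediction for the first image is left alone, while the two uncertain class-0 predictions are the ones whose confidence a debiasing step would reduce.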

ARXIV Cancer: melanoma Method: convolutional neural network

Unmasking Biases and Reliability Concerns in Convolutional Neural Networks Analysis of Cancer Pathology Images

Michael Okonoda, Eder Martinez, Abhilekha Dalal, Lior Shamir
Published 2026-03-12 20:42

This study investigates the biases and reliability of Convolutional Neural Networks (CNNs) in analyzing cancer pathology images. By evaluating thirteen benchmark datasets across various cancer types, the research highlights that CNNs can achieve high accuracy even when tested on irrelevant data segments. The findings suggest that current evaluation practices may lead to misleading results due to inherent biases in certain CNN architectures.

Convolutional Neural Networks have shown promising effectiveness in identifying different types of cancer from radiographs. However, the opaque nature of CNNs makes it difficult to fully understand the way they operate, limiting their assessment to empirical evaluation. Here we study the soundness of the standard practices by which CNNs are evaluated for the purpose of cancer pathology. Thirteen highly used cancer benchmark datasets were analyzed, using four common CNN architectures and different types of cancer, such as melanoma, carcinoma, colorectal cancer, and lung cancer. We compared the accuracy of each model with that of datasets made of cropped segments from the background of the original images that do not contain clinically relevant content. Because the rendered datasets contain no clinical information, the null hypothesis is that the CNNs should provide mere chance-based accuracy when classifying these datasets. The results show that the CNN models provided high accuracy when using the cropped segments, sometimes as high as 93%, even though they lacked biomedical information. These results show that some CNN architectures are more sensitive to bias than others. The analysis shows that the common practices of machine learning evaluation might lead to unreliable results when applied to cancer pathology. These biases are very difficult to identify, and might mislead researchers as they use available benchmark datasets to test the efficacy of CNN methods.
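
One way to make the null hypothesis concrete is an exact binomial tail test against chance-level accuracy. The sketch below is an illustration only: the abstract does not say which statistical test the authors used, and the test-set size of 100 crops and the two balanced classes are hypothetical numbers (only the 93% figure comes from the abstract).

```python
from math import comb


def chance_accuracy(num_classes):
    """Expected accuracy of random guessing, assuming balanced classes."""
    return 1.0 / num_classes


def binomial_tail(num_correct, num_samples, p_chance):
    """P(X >= num_correct) for X ~ Binomial(num_samples, p_chance)."""
    return sum(
        comb(num_samples, k) * p_chance**k * (1 - p_chance) ** (num_samples - k)
        for k in range(num_correct, num_samples + 1)
    )


# Hypothetical: 93 of 100 background crops classified correctly, 2 balanced classes.
p_value = binomial_tail(93, 100, chance_accuracy(2))
```

A very small tail probability here would mean the background crops carry class information unrelated to the pathology, which is the kind of dataset bias the abstract describes.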

ARXIV Cancer: unknown Method: instruction-tuned LLM

Prototype-Based Knowledge Guidance for Fine-Grained Structured Radiology Reporting

Chantal Pellegrini, Adrian Delchev, Ege Özsoy, Nassir Navab, Matthias Keicher
Published 2026-03-12 13:51

This paper presents ProtoSR, a novel approach that integrates free-text information into structured radiology reporting to enhance automation in fine-grained decision-making. By utilizing an instruction-tuned LLM to extract knowledge from a large dataset of MIMIC-CXR studies, ProtoSR builds a multimodal knowledge base that aligns with structured reporting templates. The method demonstrates significant improvements in performance on the Rad-ReStruct benchmark, particularly for detailed attribute questions, highlighting the benefits of combining unstructured and structured data.

Structured radiology reporting promises faster, more consistent communication than free text, but automation remains difficult as models must make many fine-grained, discrete decisions about rare findings and attributes from limited structured supervision. In contrast, free-text reports are produced at scale in routine care and implicitly encode fine-grained, image-linked information through detailed descriptions. To leverage this unstructured knowledge, we propose ProtoSR, an approach for injecting free-text information into structured report population. First, we introduce an automatic extraction pipeline that uses an instruction-tuned LLM to mine 80k+ MIMIC-CXR studies and build a multimodal knowledge base aligned with a structured reporting template, representing each answer option with a visual prototype. Using this knowledge base, ProtoSR is trained to retrieve prototypes relevant for the current image-question pair and augment the model predictions through a prototype-conditioned residual, providing a data-driven second opinion that selectively corrects predictions. On the Rad-ReStruct benchmark, ProtoSR achieves state-of-the-art results, with the largest improvements on detailed attribute questions, demonstrating the value of integrating free-text derived signal for fine-grained image understanding.

ARXIV Cancer: general cancer Method: hybrid generative AI

ELISA: An Interpretable Hybrid Generative AI Agent for Expression-Grounded Discovery in Single-Cell Genomics

Omar Coser
Published 2026-03-12 12:46

The paper presents ELISA, an interpretable hybrid generative AI agent designed for the analysis of single-cell RNA sequencing (scRNA-seq) data. It integrates expression embeddings with semantic retrieval and large language model (LLM) interpretation to facilitate interactive discovery in single-cell genomics. ELISA demonstrates significant improvements in cell type retrieval and replicates biological findings across various datasets, including those related to pediatric and adult cancers.

Translating single-cell RNA sequencing (scRNA-seq) data into mechanistic biological hypotheses remains a critical bottleneck, as agentic AI systems lack direct access to transcriptomic representations while expression foundation models remain opaque to natural language. Here we introduce ELISA (Embedding-Linked Interactive Single-cell Agent), an interpretable framework that unifies scGPT expression embeddings with BioBERT-based semantic retrieval and LLM-mediated interpretation for interactive single-cell discovery. An automatic query classifier routes inputs to gene marker scoring, semantic matching, or reciprocal rank fusion pipelines depending on whether the query is a gene signature, natural language concept, or mixture of both. Integrated analytical modules perform pathway activity scoring across 60+ gene sets, ligand-receptor interaction prediction using 280+ curated pairs, condition-aware comparative analysis, and cell-type proportion estimation, all operating directly on embedded data without access to the original count matrix. Benchmarked across six diverse scRNA-seq datasets spanning inflammatory lung disease, pediatric and adult cancers, organoid models, healthy tissue, and neurodevelopment, ELISA significantly outperforms CellWhisperer in cell type retrieval (combined permutation test, p < 0.001), with particularly large gains on gene-signature queries (Cohen's d = 5.98 for MRR). ELISA replicates published biological findings (mean composite score 0.90) with near-perfect pathway alignment and theme coverage (0.98 each), and generates candidate hypotheses through grounded LLM reasoning, bridging the gap between transcriptomic data exploration and biological discovery. Code available at: https://github.com/omaruno/ELISA-An-AI-Agent-for-Expression-Grounded-Discovery-in-Single-Cell-Genomics.git (If you use ELISA in your research, please cite this work).
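
For readers unfamiliar with reciprocal rank fusion, which the abstract names as the pipeline for mixed queries, a minimal generic sketch follows. It is not taken from the ELISA code base: the function name, the example cell types, and the constant k = 60 (the value commonly used with this technique) are assumptions for illustration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists; each item scores sum(1 / (k + rank)) over the lists."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical rankings from a gene-marker pipeline and a semantic pipeline.
gene_ranking = ["T cell", "NK cell", "B cell"]
semantic_ranking = ["T cell", "B cell", "Monocyte"]
fused = reciprocal_rank_fusion([gene_ranking, semantic_ranking])
```

Items ranked well by both pipelines rise to the top, and because only ranks are used, the two pipelines' raw scores never need to be put on a common scale.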

ARXIV Cancer: glioblastoma Method: deep learning

Multimodal classification of Radiation-Induced Contrast Enhancements and tumor recurrence using deep learning

Robin Peretzke, Marlin Hanstein, Maximilian Fischer, Lars Badhi Wessel, Obada Alhalabi, Sebastian Regnery, Andreas Kudak, Maximilian Deng, Tanja Eichkorn, Philipp Hoegen Saßmannshausen, Fabian Allmendinger, Jan-Hendrik Bolten, Philipp Schröter, Christine Jungk, Jürgen Peter Debus, Peter Neher, Laila König, Klaus Maier-Hein
Published 2026-03-12 11:48

This study addresses the challenge of differentiating between tumor recurrence and radiation-induced contrast enhancements in glioblastoma patients. The authors present RICE-NET, a multimodal 3D deep learning model that combines longitudinal MRI data with radiotherapy dose distributions for automated lesion classification. The model demonstrated a high F1 score of 0.92 on an independent test set, indicating its potential to improve diagnostic accuracy in neuro-oncology.

The differentiation between tumor recurrence and radiation-induced contrast enhancements in post-treatment glioblastoma patients remains a major clinical challenge. Existing approaches rely on diffusion MRI, which is sparsely available in clinical practice, or do not consider radiation maps, which are gaining increasing interest in the tumor board for this differentiation. We introduce RICE-NET, a multimodal 3D deep learning model that integrates longitudinal MRI data with radiotherapy dose distributions for automated lesion classification using conventional T1-weighted MRI data. Using a cohort of 92 patients, the model achieved an F1 score of 0.92 on an independent test set. During extensive ablation experiments, we quantified the contribution of each timepoint and modality and showed that reliable classification largely depends on the radiation map. Occlusion-based interpretability analyses further confirmed the model's focus on clinically relevant regions. These findings highlight the potential of multimodal deep learning to enhance diagnostic accuracy and support clinical decision-making in neuro-oncology.

ARXIV Cancer: ovarian cancer Method: convolutional neural network

Automated Detection of Malignant Lesions in the Ovary Using Deep Learning Models and XAI

Md. Hasin Sarwar Ifty, Nisharga Nirjan, Labib Islam, M. A. Diganta, Reeyad Ahmed Ornate, Anika Tasnim, Md. Saiful Islam
Published 2026-03-12 11:26

This research focuses on improving the detection of malignant ovarian lesions using deep learning models. Several Convolutional Neural Networks, including LeNet-5, ResNet, VGGNet, and GoogLeNet/Inception, were developed and evaluated to identify the most effective model for ovarian cancer detection. The InceptionV3 model achieved the highest performance metrics, with an average score of 94%. Additionally, Explainable Artificial Intelligence techniques were employed to interpret the model's outcomes.

Cancer is the unrestrained proliferation of malignant cells. In recent times, medical professionals have been acquiring enhanced diagnostic and treatment abilities by implementing deep learning models to analyze medical data for better clinical decisions, disease diagnosis and drug discovery. A majority of cancers are studied and treated by incorporating these technologies. However, ovarian cancer remains a dilemma, as its non-invasive detection procedures are inaccurate and accurate detection requires a time-consuming, invasive procedure. Thus, in this research, several Convolutional Neural Networks such as LeNet-5, ResNet, VGGNet and GoogLeNet/Inception have been utilized to develop 15 variants and choose a model that accurately detects and identifies ovarian cancer. For effective model training, the dataset OvarianCancer&SubtypesDatasetHistopathology from Mendeley has been used. After constructing a model, we utilized Explainable Artificial Intelligence (XAI) methods such as LIME, Integrated Gradients and SHAP to explain the black box outcome of the selected model. For evaluating the performance of the model, Accuracy, Precision, Recall, F1-Score, ROC Curve and AUC have been used. From the evaluation, it was seen that the slightly compact InceptionV3 model with ReLU had the overall best result, achieving an average score of 94% across all the performance metrics on the augmented dataset. Lastly, the three aforementioned XAI methods have been used for an overall comparative analysis. It is the aim of this research that the contributions of the study will help in achieving a better detection method for ovarian cancer.

ARXIV Cancer: breast cancer Method: Random Survival Forest

Leveraging Large Language Models and Survival Analysis for Early Prediction of Chemotherapy Outcomes

Muhammad Faisal Shahid, Asad Afzal, Abdullah Faiz, Muhammad Siddiqui, Arbaz Khan Shehzad, Fatima Aftab, Muhammad Usamah Shahid, Muddassar Farooq
Published 2026-03-12 06:25

This study aims to improve early prediction of chemotherapy outcomes by utilizing Large Language Models (LLMs) and ontology-based techniques for extracting phenotypes and outcome labels from patient notes. The research focuses on breast cancer, employing a dataset that includes various patient features and chemotherapy regimens. The proposed method, which incorporates Random Survival Forest, achieved a C-index of 73% for predicting time-to-failure and demonstrated accuracy and F1 scores above 70%. The findings suggest that LLM-based clinical data extraction can enhance personalized treatment plans and patient outcomes.

Chemotherapy for cancer treatment is costly and accompanied by severe side effects, highlighting the critical need for early prediction of treatment outcomes to improve patient management and informed decision-making. Predictive models for chemotherapy outcomes using real-world data face challenges, including the absence of explicit phenotypes and treatment outcome labels such as cancer progression and toxicity. This study addresses these challenges by employing Large Language Models (LLMs) and ontology-based techniques for phenotype and outcome label extraction from patient notes. We focused on one of the most frequently occurring cancers, breast cancer, due to its high prevalence and significant variability in patient response to treatment, making it a critical area for improving predictive modeling. The dataset included features such as vitals, demographics, staging, biomarkers, and performance scales. Drug regimens and their combinations were extracted from the chemotherapy plans in the EMR data and shortlisted based on NCCN guidelines, verified with NIH standards, and analyzed through survival modeling. The proposed approach significantly reduced phenotype sparsity and improved predictive accuracy. Random Survival Forest was used to predict time-to-failure, achieving a C-index of 73%, and was also used as a classifier at a specific time point to predict treatment outcomes, with accuracy and F1 scores above 70%. The outcome probabilities were validated for reliability by calibration curves. We extended our approach to four other cancer types. This research highlights the potential of early prediction of treatment outcomes using LLM-based clinical data extraction, enabling personalized treatment plans with better patient outcomes.
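
To clarify what a C-index of 73% measures, here is a minimal sketch of Harrell's concordance index on toy data. The function name and the three example patients are hypothetical, and the abstract does not say which C-index variant or software the authors used.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: fraction of comparable pairs ordered correctly by predicted risk.

    A pair (i, j) is comparable when patient i had the event (events[i] is truthy)
    and did so before patient j's observed time. Ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        for j in range(len(times)):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: time to failure, event flag (0 = censored), predicted risk.
times = [2, 4, 6]
events = [1, 1, 0]
c_perfect = concordance_index(times, events, [0.9, 0.5, 0.1])
c_reversed = concordance_index(times, events, [0.1, 0.5, 0.9])
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.73 sits between the two.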

ARXIV Cancer: lung cancer Method: vision transformer

DeepHistoViT: An Interpretable Vision Transformer Framework for Histopathological Cancer Classification

Ravi Mosalpuri, Mohammed Abdelsamea, Ahmed Karam Eldaly
Published 2026-03-12 00:29

This study introduces DeepHistoViT, a transformer-based framework designed for the automated classification of histopathological images. The model incorporates a customized Vision Transformer architecture with an attention mechanism to enhance interpretability and capture fine-grained cellular structures. Evaluated on datasets for lung cancer, colon cancer, and acute lymphoblastic leukaemia, the framework achieved state-of-the-art performance, demonstrating its potential as a reliable tool for supporting pathologists in clinical decision-making.

Histopathology remains the gold standard for cancer diagnosis because it provides detailed cellular-level assessment of tissue morphology. However, manual histopathological examination is time-consuming, labour-intensive, and subject to inter-observer variability, creating a demand for reliable computer-assisted diagnostic tools. Recent advances in deep learning, particularly transformer-based architectures, have shown strong potential for modelling complex spatial dependencies in medical images. In this work, we propose DeepHistoViT, a transformer-based framework for automated classification of histopathological images. The model employs a customized Vision Transformer architecture with an integrated attention mechanism designed to capture fine-grained cellular structures while improving interpretability through attention-based localization of diagnostically relevant regions. The framework is evaluated on three publicly available histopathology datasets covering lung cancer, colon cancer, and acute lymphoblastic leukaemia. Experimental results demonstrate state-of-the-art performance across all datasets, with classification accuracy, precision, recall, F1-score, and ROC-AUC reaching 100 percent on the lung and colon cancer datasets, and 99.85 percent, 99.84 percent, 99.86 percent, 99.85 percent, and 99.99 percent respectively on the acute lymphoblastic leukaemia dataset. All performance metrics are reported with 95 percent confidence intervals. These results highlight the effectiveness of transformer-based architectures for histopathological image analysis and demonstrate the potential of DeepHistoViT as an interpretable computer-assisted diagnostic tool to support pathologists in clinical decision-making.

ARXIV Cancer: breast cancer Method: evidential learning

Evidential learning driven Breast Tumor Segmentation with Stage-divided Vision-Language Interaction

Jingxing Zhong, Qingtao Pan, Xuchang Zhou, Jiazhen Lin, Xinguo Zhuang
Published 2026-03-11 18:24

This study presents a novel text-guided Breast Tumor Segmentation model (TextBCS) that integrates stage-divided vision-language interaction and evidential learning to improve the accuracy of tumor segmentation in MRI images. The model addresses challenges such as low contrast between cancerous and normal tissues and blurred tumor boundaries by leveraging text prompts to enhance segmentation performance. Experimental results demonstrate that TextBCS outperforms existing segmentation networks, achieving superior results on publicly available datasets.

Breast cancer is one of the most common causes of death among women worldwide, with millions of fatalities annually. Magnetic Resonance Imaging (MRI) can provide various sequences for characterizing tumor morphology and internal patterns, and has become an effective tool for detection and diagnosis of breast tumors. However, previous deep-learning based tumor segmentation methods have limitations in accurately locating tumor contours due to the low contrast between cancerous and normal areas and blurred boundaries. Leveraging text prompt information holds promise for improving tumor segmentation by delineating segmentation regions. Inspired by this, we propose a text-guided Breast Tumor Segmentation model (TextBCS) with stage-divided vision-language interaction and evidential learning. Specifically, the proposed stage-divided vision-language interaction facilitates mutual information exchange between visual and text features at each stage of down-sampling, further exploiting the advantages of text prompts to assist in locating lesion areas in low contrast scenarios. Moreover, evidential learning is adopted to quantify the segmentation uncertainty of the model at blurred boundaries. It utilizes the variational Dirichlet to characterize the distribution of the segmentation probabilities, addressing the segmentation uncertainties of the boundaries. Extensive experiments validate the superiority of our TextBCS over other segmentation networks, showcasing the best breast tumor segmentation performance on publicly available datasets.
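
As background on how a Dirichlet distribution can express per-pixel uncertainty, the sketch below uses one common evidential deep learning formulation, in which non-negative evidence is turned into Dirichlet parameters. Whether TextBCS uses exactly this parameterization is an assumption; the function name and example evidence values are hypothetical.

```python
def dirichlet_summary(evidence):
    """Expected class probabilities and uncertainty mass for one pixel.

    Assumed formulation: alpha_k = evidence_k + 1, S = sum(alpha),
    expected probability = alpha_k / S, uncertainty = K / S.
    """
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    uncertainty = num_classes / strength
    return probs, uncertainty


# Hypothetical pixels: one well inside a tumor, one on a blurred boundary.
interior_probs, interior_u = dirichlet_summary([8.0, 0.0])
boundary_probs, boundary_u = dirichlet_summary([0.0, 0.0])
```

With no evidence for either class, the uncertainty mass is 1.0 and the expected probabilities are uniform, which is the behavior one would want at an ambiguous boundary pixel.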
