Research Papers

ARXIV Cancer: general cancer Method: multi-task learning

iSight: Towards expert-AI co-assessment for improved immunohistochemistry staining interpretation

Jacob S. Leiby, Jialu Yao, Pan Lu, George Hu, Anna Davidian, Shunsuke Koga, Olivia Leung, Pravin Patel, Isabella Tondi Resta, Rebecca Rojansky, Derek Sung, Eric Yang, Paul J. Zhang, Emma Lundberg, Dokyoon Kim, Serena Yeung-Levy, James Zou, Thomas Montine, Jeffrey Nirschl, Zhi Huang
Published 2026-02-03 22:49

This study introduces iSight, a multi-task learning framework for automated immunohistochemistry (IHC) staining assessment, trained on HPA10M, a dataset of about 10.5 million IHC images from the Human Protein Atlas. The framework combines visual features from whole-slide images with tissue metadata to predict staining intensity, location, and quantity, along with tissue type and malignancy status. iSight outperformed fine-tuned foundation models and, on held-out Human Protein Atlas data, initial pathologist assessments, reaching 85.5% accuracy for location, 76.6% for intensity, and 75.7% for quantity. AI assistance also improved inter-pathologist agreement, suggesting that expert-AI co-assessment can make IHC interpretation more reliable in clinical settings.

Immunohistochemistry (IHC) provides information on protein expression in tissue sections and is commonly used to support pathology diagnosis and disease triage. While AI models for H&E-stained slides show promise, their applicability to IHC is limited due to domain-specific variations. Here we introduce HPA10M, a dataset that contains 10,495,672 IHC images from the Human Protein Atlas with comprehensive metadata included, and encompasses 45 normal tissue types and 20 major cancer types. Based on HPA10M, we trained iSight, a multi-task learning framework for automated IHC staining assessment. iSight combines visual features from whole-slide images with tissue metadata through a token-level attention mechanism, simultaneously predicting staining intensity, location, quantity, tissue type, and malignancy status. On held-out data, iSight achieved 85.5% accuracy for location, 76.6% for intensity, and 75.7% for quantity, outperforming fine-tuned foundation models (PLIP, CONCH) by 2.5-10.2%. In addition, iSight demonstrates well-calibrated predictions with expected calibration errors of 0.0150-0.0408. Furthermore, in a user study with eight pathologists evaluating 200 images from two datasets, iSight outperformed initial pathologist assessments on the held-out HPA dataset (79% vs 68% for location, 70% vs 57% for intensity, 68% vs 52% for quantity). Inter-pathologist agreement also improved after AI assistance in both held-out HPA (Cohen's κ increased from 0.63 to 0.70) and Stanford TMAD datasets (from 0.74 to 0.76), suggesting expert-AI co-assessment can improve IHC interpretation. This work establishes a foundation for AI systems that can improve IHC diagnostic accuracy and highlights the potential for integrating iSight into clinical workflows to enhance the consistency and reliability of IHC assessment.
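The expected calibration error (ECE) quoted above compares a model's stated confidence with its observed accuracy. The abstract does not say how iSight's ECE was binned, so the sketch below assumes the common equal-width scheme with 15 bins; the function name is illustrative, not taken from the paper.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=15):
    """Equal-width-bin ECE: bin-size-weighted mean of |accuracy - confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    n = len(confidences)
    # Bin k covers [k/n_bins, (k+1)/n_bins); a confidence of exactly 1.0 joins the top bin.
    bin_ids = np.minimum((confidences * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for k in range(n_bins):
        in_bin = bin_ids == k
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return ece
```

For a multi-class head such as staining intensity, `confidences` would be the maximum softmax probability per image and `correct` whether the arg-max class matches the label. A perfectly calibrated model scores 0.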

ARXIV Cancer: unknown Method: unknown

Fast, Unsupervised Framework for Registration Quality Assessment of Multi-stain Histological Whole Slide Pairs

Shikha Dubey, Patricia Raciti, Kristopher Standish, Albert Juan Ramon, Erik Ames Burlingame
Published 2026-02-03 22:30

This study presents a fast, unsupervised framework for assessing the registration quality of histopathological whole slide images (WSIs), focusing on H&E and IHC pairs. The method combines down-sampled tissue masks with deformation-based metrics to evaluate registration without ground-truth annotations. Validation across multiple IHC markers shows a strong correlation between the automated metrics and multi-expert evaluations, supporting the framework's use for real-time quality control in digital pathology.

High-fidelity registration of histopathological whole slide images (WSIs), such as hematoxylin & eosin (H&E) and immunohistochemistry (IHC), is vital for integrated molecular analysis but challenging to evaluate without ground-truth (GT) annotations. Existing WSI-level assessments, which rely on annotated landmarks or intensity-based similarity metrics, are often time-consuming, unreliable, and computationally intensive, limiting large-scale applicability. This study proposes a fast, unsupervised framework that jointly employs down-sampled tissue-mask-based and deformation-based metrics for registration quality assessment (RQA) of registered H&E and IHC WSI pairs. The mask-based metrics measure global structural correspondence, while the deformation-based metrics evaluate local smoothness, continuity, and transformation realism. Validation across multiple IHC markers and multi-expert assessments demonstrates a strong correlation between automated metrics and human evaluations. In the absence of GT, this framework offers reliable, real-time RQA with high fidelity and minimal computational resources, making it suitable for large-scale quality control in digital pathology.
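The abstract names the two metric families but not the exact formulas. As a minimal sketch under that uncertainty, the code below assumes Dice overlap between down-sampled tissue masks as the mask-based metric and the fraction of folded pixels (non-positive Jacobian determinant of the deformation) as one deformation-based realism check. Both choices and all function names are hypothetical stand-ins, not the authors' metrics.

```python
import numpy as np


def mask_dice(mask_a, mask_b):
    """Dice overlap of two boolean tissue masks (global structural correspondence)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def jacobian_determinant(disp):
    """Jacobian determinant of x -> x + u(x) for a 2D displacement field.

    disp has shape (2, H, W): disp[0] is the row (y) displacement and
    disp[1] the column (x) displacement, both in pixels.
    """
    uy, ux = disp
    duy_dy, duy_dx = np.gradient(uy)  # derivatives along rows, then columns
    dux_dy, dux_dx = np.gradient(ux)
    # Jacobian of identity + u is [[1 + duy/dy, duy/dx], [dux/dy, 1 + dux/dx]].
    return (1.0 + duy_dy) * (1.0 + dux_dx) - duy_dx * dux_dy


def folding_fraction(disp):
    """Fraction of pixels where the deformation folds (determinant <= 0)."""
    return float(np.mean(jacobian_determinant(disp) <= 0))
```

Both metrics run on low-resolution arrays in a single pass, which is consistent with the paper's emphasis on speed, though the authors' actual metric set may differ.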

ARXIV Cancer: unknown Method: conformal prediction

CONRep: Uncertainty-Aware Vision-Language Report Drafting Using Conformal Prediction

Danial Elyassirad, Benyamin Gheiji, Mahsa Vatanparast, Amir Mahmoud Ahmadzadeh, Seyed Amir Asef Agah, Mana Moassefi, Meysam Tavakoli, Shahriar Faghani
Published 2026-02-03 15:02

The paper presents CONRep, a model-agnostic framework that enhances automated radiology report drafting by integrating conformal prediction for uncertainty quantification. This approach allows for both label-level calibration of binary predictions and sentence-level assessment of uncertainty in free-text impressions. The evaluation on chest X-ray datasets demonstrates that high-confidence outputs align more closely with radiologist annotations compared to low-confidence outputs, thereby improving the reliability of automated reporting systems.

Automated radiology report drafting (ARRD) using vision-language models (VLMs) has advanced rapidly, yet most systems lack explicit uncertainty estimates, limiting trust and safe clinical deployment. We propose CONRep, a model-agnostic framework that integrates conformal prediction (CP) to provide statistically grounded uncertainty quantification for VLM-generated radiology reports. CONRep operates at both the label level, by calibrating binary predictions for predefined findings, and the sentence level, by assessing uncertainty in free-text impressions via image-text semantic alignment. We evaluate CONRep using both generative and contrastive VLMs on public chest X-ray datasets. Across both settings, outputs classified as high confidence consistently show significantly higher agreement with radiologist annotations and ground-truth impressions than low-confidence outputs. By enabling calibrated confidence stratification without modifying underlying models, CONRep improves the transparency, reliability, and clinical usability of automated radiology reporting systems.
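For the label level, generic split conformal prediction on a binary finding can be sketched as follows. The nonconformity score 1 - p(true label) and the rule "a single-label prediction set means high confidence" are assumptions for illustration; the abstract does not specify CONRep's score function or stratification rule, and the function names are hypothetical.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of the score s = 1 - p(true label) on a calibration set."""
    cal_probs = np.asarray(cal_probs, dtype=float)  # model's P(finding present) per case
    cal_labels = np.asarray(cal_labels, dtype=int)  # reference label: 1 present, 0 absent
    p_true = np.where(cal_labels == 1, cal_probs, 1.0 - cal_probs)
    scores = 1.0 - p_true
    n = len(scores)
    # Finite-sample correction gives marginal coverage >= 1 - alpha under exchangeability.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, level, method="higher"))  # requires numpy >= 1.22


def prediction_set(prob_positive, q_hat):
    """All labels whose nonconformity score does not exceed the calibrated threshold."""
    scores = {0: prob_positive, 1: 1.0 - prob_positive}
    return {label for label, s in scores.items() if s <= q_hat}


def confidence_tier(pred_set):
    """Hypothetical stratification rule: a single-label set is 'high' confidence."""
    return "high" if len(pred_set) == 1 else "low"
```

Because only model outputs are used, this wraps any VLM without retraining, matching the model-agnostic framing above.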

ARXIV Cancer: glioblastoma Method: variational autoencoder

Multiparameter Uncertainty Mapping in Quantitative Molecular MRI using a Physics-Structured Variational Autoencoder (PS-VAE)

Alex Finkelstein, Ron Moneta, Or Zohar, Michal Rivlin, Moritz Zaiss, Dinora Friedmann Morvinski, Or Perlman
Published 2026-02-03 09:46

This study presents a physics-structured variational autoencoder (PS-VAE) for rapidly extracting voxelwise multi-parameter posterior distributions in quantitative molecular MRI. The method couples a differentiable spin physics simulator with self-supervised learning, providing principled uncertainty quantification. Validation in in-vitro phantoms, tumor-bearing mice, healthy volunteers, and a subject with glioblastoma showed posteriors in good agreement with brute-force Bayesian analysis and an orders-of-magnitude acceleration in whole-brain quantification.

Quantitative imaging methods, such as magnetic resonance fingerprinting (MRF), aim to extract interpretable pathology biomarkers by estimating biophysical tissue parameters from signal evolutions. However, the pattern-matching algorithms or neural networks used in such inverse problems often lack principled uncertainty quantification, which limits the trustworthiness and transparency required for clinical acceptance. Here, we describe a physics-structured variational autoencoder (PS-VAE) designed for rapid extraction of voxelwise multi-parameter posterior distributions. Our approach integrates a differentiable spin physics simulator with self-supervised learning, and provides a full covariance that captures the inter-parameter correlations of the latent biophysical space. The method was validated in a multi-proton pool chemical exchange saturation transfer (CEST) and semisolid magnetization transfer (MT) molecular MRF study, across in-vitro phantoms, tumor-bearing mice, healthy human volunteers, and a subject with glioblastoma. The resulting multi-parametric posteriors are in good agreement with those calculated using a brute-force Bayesian analysis, while providing an orders-of-magnitude acceleration in whole brain quantification. In addition, we demonstrate how monitoring the multi-parameter posterior dynamics across progressively acquired signals provides practical insights for protocol optimization and may facilitate real-time adaptive acquisition.
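A full-covariance Gaussian posterior of the kind described above is commonly parameterized through a Cholesky factor, which keeps the covariance positive definite and allows reparameterized sampling. The sketch below assumes that parameterization (the paper's exact encoder head is not given) and omits the encoder and the spin physics simulator; all names are hypothetical.

```python
import numpy as np


def cholesky_from_raw(raw_diag, raw_lower, dim):
    """Assemble a lower-triangular Cholesky factor with a positive diagonal."""
    L = np.zeros((dim, dim))
    L[np.tril_indices(dim, k=-1)] = raw_lower  # unconstrained off-diagonal terms
    L[np.diag_indices(dim)] = np.exp(raw_diag)  # exp keeps the diagonal > 0
    return L


def sample_posterior(mu, L, n_samples, rng):
    """Reparameterized draws theta = mu + L @ eps with eps ~ N(0, I)."""
    eps = rng.standard_normal((n_samples, len(mu)))
    return mu + eps @ L.T


def correlation_from_cholesky(L):
    """Inter-parameter correlation matrix implied by the covariance L @ L.T."""
    cov = L @ L.T
    std = np.sqrt(np.diag(cov))
    return cov / np.outer(std, std)
```

With a diagonal-only factor the off-diagonal correlations would be forced to zero, so the lower-triangular terms are what let such a model report coupled uncertainty between tissue parameters.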

ARXIV Cancer: bone tumor Method: self-supervised learning

A generalizable large-scale foundation model for musculoskeletal radiographs

Shinn Kim, Soobin Lee, Kyoungseob Shin, Han-Soo Kim, Yongsung Kim, Minsu Kim, Juhong Nam, Somang Ko, Daeheon Kwon, Wook Huh, Ilkyu Han, Sunghoon Kwon
Published 2026-02-03 04:04

This paper presents SKELEX, a large-scale foundation model for musculoskeletal radiographs, trained with self-supervised learning on 1.2 million diverse images. Across 12 downstream diagnostic tasks, SKELEX generally outperformed baselines in fracture detection, osteoarthritis grading, and bone tumor classification. The model also performs zero-shot abnormality localization, and a region-guided bone tumor model built on it was deployed as a publicly accessible web application.

Artificial intelligence (AI) has shown promise in detecting and characterizing musculoskeletal diseases from radiographs. However, most existing models remain task-specific, annotation-dependent, and limited in generalizability across diseases and anatomical regions. Although a generalizable foundation model trained on large-scale musculoskeletal radiographs is clinically needed, publicly available datasets remain limited in size and lack sufficient diversity to enable training across a wide range of musculoskeletal conditions and anatomical sites. Here, we present SKELEX, a large-scale foundation model for musculoskeletal radiographs, trained using self-supervised learning on 1.2 million diverse, condition-rich images. The model was evaluated on 12 downstream diagnostic tasks and generally outperformed baselines in fracture detection, osteoarthritis grading, and bone tumor classification. Furthermore, SKELEX demonstrated zero-shot abnormality localization, producing error maps that identified pathologic regions without task-specific training. Building on this capability, we developed an interpretable, region-guided model for predicting bone tumors, which maintained robust performance on independent external datasets and was deployed as a publicly accessible web application. Overall, SKELEX provides a scalable, label-efficient, and generalizable AI framework for musculoskeletal imaging, establishing a foundation for both clinical translation and data-efficient research in musculoskeletal radiology.
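The abstract says SKELEX localizes abnormalities through error maps but does not describe how they are computed. One plausible reading, assumed here, is a per-patch reconstruction residual from a reconstruction-style self-supervised model: regions the model reconstructs poorly are flagged. The reconstruction input, patch size, and percentile cutoff are all hypothetical.

```python
import numpy as np


def patch_error_map(image, reconstruction, patch=16):
    """Mean squared reconstruction error over non-overlapping patch x patch tiles."""
    image = np.asarray(image, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    hp, wp = image.shape[0] // patch, image.shape[1] // patch
    # Crop to a whole number of tiles, then average the squared residual per tile.
    err = (image[: hp * patch, : wp * patch] - reconstruction[: hp * patch, : wp * patch]) ** 2
    return err.reshape(hp, patch, wp, patch).mean(axis=(1, 3))


def flag_abnormal_patches(error_map, percentile=95.0):
    """Mark tiles whose error is at or above the given percentile of the map."""
    return error_map >= np.percentile(error_map, percentile)
```

No labels enter this computation, which is what would make such localization zero-shot; a region-guided classifier could then focus on the flagged tiles.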

ARXIV Cancer: unknown Method: vision-language model

TRACE: Temporal Radiology with Anatomical Change Explanation for Grounded X-ray Report Generation

OFM Riaz Rahman Aranya, Kevin Desai
Published 2026-02-03 01:03

The paper presents TRACE, a novel model designed for temporal comparison of chest X-rays, which facilitates the detection of disease progression and treatment response. TRACE integrates change classification and spatial localization to generate natural language descriptions of changes observed between prior and current X-rays. The model achieves over 90% accuracy in spatial grounding, highlighting the importance of jointly learning temporal comparison and spatial grounding for effective change detection.

Temporal comparison of chest X-rays is fundamental to clinical radiology, enabling detection of disease progression, treatment response, and new findings. While vision-language models have advanced single-image report generation and visual grounding, no existing method combines these capabilities for temporal change detection. We introduce Temporal Radiology with Anatomical Change Explanation (TRACE), the first model that jointly performs temporal comparison, change classification, and spatial localization. Given a prior and current chest X-ray, TRACE generates natural language descriptions of interval changes (worsened, improved, stable) while grounding each finding with bounding box coordinates. TRACE demonstrates effective spatial localization with over 90% grounding accuracy, establishing a foundation for this challenging new task. Our ablation study uncovers an emergent capability: change detection arises only when temporal comparison and spatial grounding are jointly learned, as neither alone enables meaningful change detection. This finding suggests that grounding provides a spatial attention mechanism essential for temporal reasoning.
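Grounding accuracy for box outputs is typically the fraction of findings whose predicted box overlaps the reference box above an intersection-over-union (IoU) cutoff. The abstract does not state TRACE's cutoff, so the sketch assumes the conventional IoU ≥ 0.5 and (x1, y1, x2, y2) box coordinates.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of findings whose predicted box has IoU >= threshold with its reference box."""
    hits = sum(iou(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)
```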

ARXIV Cancer: prostate cancer Method: gated multi-head Transformer

Multi-head automated segmentation by incorporating detection head into the contextual layer neural network

Edwin Kys, Febian Febian
Published 2026-02-02 18:51

This study presents a gated multi-head Transformer architecture based on Swin U-Net for automated segmentation in radiotherapy. The model adds inter-slice context and a parallel detection head whose slice-level predictions gate the segmentation output, suppressing false positives in slices that lack the target structure. On the Prostate-Anatomical-Edge-Cases dataset, the gated model substantially outperformed a non-gated segmentation-only baseline (mean Dice loss 0.013 ± 0.036 vs 0.732 ± 0.314), supporting more reliable radiotherapy auto-contouring workflows.

Deep-learning-based auto-segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or hallucinations, in slices lacking target structures. We propose a gated multi-head Transformer architecture based on Swin U-Net, augmented with inter-slice context integration and a parallel detection head, which jointly performs slice-level structure detection via a multi-layer perceptron and pixel-level segmentation through a context-enhanced stream. Detection outputs gate the segmentation predictions to suppress false positives in anatomically invalid slices, and training uses slice-wise Tversky loss to address class imbalance. Experiments on the Prostate-Anatomical-Edge-Cases dataset from The Cancer Imaging Archive demonstrate that the gated model substantially outperforms a non-gated segmentation-only baseline, achieving a mean Dice loss of 0.013 ± 0.036 versus 0.732 ± 0.314, with detection probabilities strongly correlated with anatomical presence, effectively eliminating spurious segmentations. In contrast, the non-gated model exhibited higher variability and persistent false positives across all slices. These results indicate that detection-based gating enhances robustness and anatomical plausibility in automated segmentation applications, reducing hallucinated predictions without compromising segmentation quality in valid slices, and offers a promising approach for improving the reliability of clinical radiotherapy auto-contouring workflows.
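The slice-wise Tversky loss and detection gating described above can be sketched on NumPy arrays as follows. The Tversky weights (alpha = 0.3 on false positives, beta = 0.7 on false negatives) and the hard 0.5 gate are assumed values, not the paper's settings; a soft gate that multiplies the mask by the detection probability is an equally plausible variant.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Per-slice 1 - Tversky index; alpha weights false positives, beta false negatives."""
    pred = np.asarray(pred, dtype=float).ravel()
    target = np.asarray(target, dtype=float).ravel()
    tp = np.sum(pred * target)
    fp = np.sum(pred * (1.0 - target))
    fn = np.sum((1.0 - pred) * target)
    # eps makes an empty prediction on an empty slice score a loss of 0.
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)


def gate_slice(seg_probs, det_prob, threshold=0.5):
    """Suppress the whole slice's mask when the detection head predicts 'structure absent'."""
    return seg_probs if det_prob >= threshold else np.zeros_like(seg_probs)


def dice_loss(pred_mask, target, eps=1e-6):
    """1 - Dice for binary masks; 0 when both masks are empty."""
    pred_mask = np.asarray(pred_mask, dtype=float)
    target = np.asarray(target, dtype=float)
    inter = np.sum(pred_mask * target)
    return 1.0 - (2.0 * inter + eps) / (pred_mask.sum() + target.sum() + eps)
```

With alpha = beta = 0.5 the Tversky index reduces to Dice. On a slice with no target structure, an ungated spurious mask drives the Dice loss toward 1, while a gated empty mask scores 0, which illustrates why gating can shrink the mean Dice loss so sharply on edge-case slices.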

ARXIV Cancer: cholangiocarcinoma Method: Hybrid Spatial NMF

hSNMF: Hybrid Spatially Regularized NMF for Image-Derived Spatial Transcriptomics

Md Ishtyaq Mahmud, Veena Kochat, Suresh Satpati, Jagan Mohan Reddy Dwarampudi, Humaira Anzum, Kunal Rai, Tania Banerjee
Published 2026-02-02 18:40

This study presents Hybrid Spatially Regularized Nonnegative Matrix Factorization (hSNMF) for analyzing high-resolution spatial transcriptomics data from the Xenium platform. The method enhances representation learning and clustering of tumor microarray tissues by integrating spatial and transcriptomic information. Evaluations on a cholangiocarcinoma dataset demonstrate significant improvements in spatial compactness, cluster separability, and biological coherence compared to existing methods.

High-resolution spatial transcriptomics platforms, such as Xenium, generate single-cell images that capture both molecular and spatial context, but their extremely high dimensionality poses major challenges for representation learning and clustering. In this study, we analyze data from the Xenium platform, which captures high-resolution images of tumor microarray (TMA) tissues and converts them into cell-by-gene matrices suitable for computational analysis. We benchmark and extend nonnegative matrix factorization (NMF) for spatial transcriptomics by introducing two spatially regularized variants. First, we propose Spatial NMF (SNMF), a lightweight baseline that enforces local spatial smoothness by diffusing each cell's NMF factor vector over its spatial neighborhood. Second, we introduce Hybrid Spatial NMF (hSNMF), which performs spatially regularized NMF followed by Leiden clustering on a hybrid adjacency that integrates spatial proximity (via a contact-radius graph) and transcriptomic similarity through a tunable mixing parameter alpha. Evaluated on a cholangiocarcinoma dataset, SNMF and hSNMF achieve markedly improved spatial compactness (CHAOS < 0.004, Moran's I > 0.96), greater cluster separability (Silhouette > 0.12, DBI < 1.8), and higher biological coherence (CMC and enrichment) compared to other spatial baselines. Availability and implementation: https://github.com/ishtyaqmahmud/hSNMF
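The two spatial ingredients above, neighborhood diffusion of NMF factors (SNMF) and a hybrid adjacency mixing a contact-radius graph with transcriptomic similarity through alpha (hSNMF), can be sketched in NumPy as below. The cosine k-nearest-neighbor similarity graph, the equal-weight neighborhood averaging, and all function names are assumptions; the linked repository is the authoritative implementation. The factor matrix could come from, for example, scikit-learn's `NMF(...).fit_transform`, and Leiden clustering on the hybrid adjacency (for instance with the `leidenalg` package) is not shown.

```python
import numpy as np


def contact_radius_graph(coords, radius):
    """Binary adjacency linking cells whose centroids lie within `radius` of each other."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    adj = (d <= radius).astype(float)
    np.fill_diagonal(adj, 0.0)
    return adj


def cosine_knn_graph(features, k):
    """Symmetric k-nearest-neighbour graph weighted by cosine similarity (nonnegative features)."""
    unit = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # never pick a cell as its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(sim)), k)
    adj = np.zeros_like(sim)
    adj[rows, nbrs.ravel()] = sim[rows, nbrs.ravel()]
    return np.maximum(adj, adj.T)


def spatial_smooth(factors, spatial_adj):
    """SNMF-style diffusion: average each cell's NMF factor vector with its spatial neighbours."""
    weights = spatial_adj + np.eye(len(spatial_adj))  # include the cell itself
    return weights @ factors / weights.sum(axis=1, keepdims=True)


def hybrid_adjacency(spatial_adj, expr_adj, alpha):
    """Mix spatial proximity and transcriptomic similarity with a tunable alpha in [0, 1]."""
    return alpha * spatial_adj + (1.0 - alpha) * expr_adj
```

Setting alpha to 1 clusters on spatial proximity alone and alpha to 0 on expression similarity alone; intermediate values trade spatial compactness against transcriptomic separability.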

ARXIV Cancer: colorectal cancer Method: Spectral-normalized Neural Gaussian Process

Uncertainty-Aware Image Classification In Biomedical Imaging Using Spectral-normalized Neural Gaussian Processes

Uma Meleti, Jeffrey J. Nirschl
Published 2026-02-02 17:35

This study addresses the overconfidence and poor calibration of current deep learning models for digital pathology in out-of-distribution (OOD) settings. The authors implement the Spectral-normalized Neural Gaussian Process (SNGP), which applies spectral normalization and replaces the final dense layer with a Gaussian process layer to improve single-model uncertainty estimation and OOD detection. Compared with deterministic and Monte Carlo dropout models on six datasets across three biomedical classification tasks, SNGP matches in-distribution performance while significantly improving uncertainty estimation and OOD detection.

Accurate histopathologic interpretation is key for clinical decision-making; however, current deep learning models for digital pathology are often overconfident and poorly calibrated in out-of-distribution (OOD) settings, which limits trust and clinical adoption. Safety-critical medical imaging workflows benefit from intrinsic uncertainty-aware properties that can accurately reject OOD input. We implement the Spectral-normalized Neural Gaussian Process (SNGP), a set of lightweight modifications that apply spectral normalization and replace the final dense layer with a Gaussian process layer to improve single-model uncertainty estimation and OOD detection. We evaluate SNGP vs. deterministic and Monte Carlo dropout on six datasets across three biomedical classification tasks: white blood cells, amyloid plaques, and colorectal histopathology. SNGP has comparable in-distribution performance while significantly improving uncertainty estimation and OOD detection. Thus, SNGP or related models offer a useful framework for uncertainty-aware classification in digital pathology, supporting safe deployment and building trust with pathologists.
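The two SNGP modifications can be illustrated with a small NumPy sketch: power-iteration spectral normalization that caps a weight matrix's largest singular value, and a random-Fourier-feature GP output layer whose Laplace-approximated precision yields a per-example predictive variance. This is an illustrative sketch, not the authors' code; the norm bound 0.95, the feature count, the unit RBF length scale, the p(1 - p) weighting with the maximum class probability, and the omitted training of the output weights are all assumptions.

```python
import numpy as np


def spectral_normalize(W, bound=0.95, n_iter=50, seed=0):
    """Rescale W so its largest singular value is at most `bound` (power iteration)."""
    u = np.random.default_rng(seed).standard_normal(W.shape[0])
    for _ in range(n_iter):
        v = W.T @ u
        v /= np.linalg.norm(v)
        u = W @ v
        u /= np.linalg.norm(u)
    sigma = u @ W @ v  # estimate of the top singular value
    return W * min(1.0, bound / sigma)


class RandomFeatureGP:
    """GP output layer via random Fourier features plus a Laplace-approximated precision."""

    def __init__(self, in_dim, n_features=512, n_classes=2, ridge=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((n_features, in_dim))  # fixed random projection
        self.b = rng.uniform(0.0, 2.0 * np.pi, n_features)
        self.beta = np.zeros((n_features, n_classes))  # output weights (training omitted)
        self.precision = ridge * np.eye(n_features)

    def features(self, h):
        # sqrt(2/D) * cos(W h + b) approximates an RBF kernel with unit length scale.
        return np.sqrt(2.0 / len(self.b)) * np.cos(h @ self.W.T + self.b)

    def update_precision(self, h, p_max):
        """Accumulate sum_i p_i (1 - p_i) phi_i phi_i^T over a pass of training data."""
        phi = self.features(h)
        weights = p_max * (1.0 - p_max)
        self.precision += (phi * weights[:, None]).T @ phi

    def predict(self, h):
        """Return mean-field-adjusted logits and the predictive variance per example."""
        phi = self.features(h)
        logits = phi @ self.beta
        var = np.sum(phi * np.linalg.solve(self.precision, phi.T).T, axis=1)
        return logits / np.sqrt(1.0 + (np.pi / 8.0) * var)[:, None], var
```

Spectral normalization keeps hidden-space distances meaningful, so inputs far from the training data land in feature directions the precision matrix has not accumulated, giving them larger variance and flatter logits. That distance awareness is the property the abstract relies on for OOD rejection.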

ARXIV Cancer: unknown Method: vision transformer

Toxicity Assessment in Preclinical Histopathology via Class-Aware Mahalanobis Distance for Known and Novel Anomalies

Olga Graf, Dhrupal Patel, Peter Groß, Charlotte Lempp, Matthias Hein, Fabian Heinemann
Published 2026-02-02 14:07

This study presents an AI-based anomaly detection framework for histopathological whole-slide images of rodent livers from toxicology studies, aimed at assessing drug-induced toxicity. A pre-trained Vision Transformer (DINOv2), fine-tuned with Low-Rank Adaptation (LoRA), segments healthy tissue and known pathologies, while class-specific Mahalanobis distance thresholds flag rare, unseen pathologies as out-of-distribution findings. With the optimized thresholds, only 0.16% of pathological tissue was classified as healthy and 0.35% of healthy tissue as pathological, supporting more efficient preclinical drug development.

Drug-induced toxicity remains a leading cause of failure in preclinical development and early clinical trials. Detecting adverse effects at an early stage is critical to reduce attrition and accelerate the development of safe medicines. Histopathological evaluation remains the gold standard for toxicity assessment, but it relies heavily on expert pathologists, creating a bottleneck for large-scale screening. To address this challenge, we introduce an AI-based anomaly detection framework for histopathological whole-slide images (WSIs) in rodent livers from toxicology studies. The system identifies healthy tissue and known pathologies (anomalies) for which training data is available. In addition, it can detect rare pathologies without training data as out-of-distribution (OOD) findings. We generate a novel dataset of pixelwise annotations of healthy tissue and known pathologies and use this data to fine-tune a pre-trained Vision Transformer (DINOv2) via Low-Rank Adaptation (LoRA) to perform tissue segmentation. Finally, we extract features and score them for OOD detection using the Mahalanobis distance. To better account for class-dependent variability in histological data, we propose the use of class-specific thresholds. We optimize the thresholds using the mean of the false negative and false positive rates, resulting in only 0.16% of pathological tissue classified as healthy and 0.35% of healthy tissue classified as pathological. Applied to mouse liver WSIs with known toxicological findings, the framework accurately detects anomalies, including rare OOD morphologies. This work demonstrates the potential of AI-driven histopathology to support preclinical workflows, reduce late-stage failures, and improve efficiency in drug development.
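Mahalanobis-distance OOD scoring with a threshold chosen to minimize the mean of the false-negative and false-positive rates can be sketched as follows. The tied (shared) covariance follows the classic Mahalanobis OOD recipe and is an assumption here, as is treating "false negative" as an anomaly scored at or below the threshold; the paper's exact definitions may differ. Function names are hypothetical.

```python
import numpy as np


def fit_class_gaussians(feats, labels):
    """Per-class means and a shared (tied) covariance, returned as its inverse."""
    classes = np.unique(labels)
    means = {c: feats[labels == c].mean(axis=0) for c in classes}
    centered = np.concatenate([feats[labels == c] - means[c] for c in classes])
    cov = np.cov(centered, rowvar=False) + 1e-6 * np.eye(feats.shape[1])  # small ridge
    return means, np.linalg.inv(cov)


def mahalanobis_scores(feats, means, prec):
    """Squared Mahalanobis distance to the nearest class mean, plus that class."""
    classes = list(means)
    d = np.stack(
        [np.einsum("ij,jk,ik->i", feats - means[c], prec, feats - means[c]) for c in classes],
        axis=1,
    )
    idx = d.argmin(axis=1)
    return np.array(classes)[idx], d[np.arange(len(feats)), idx]


def choose_threshold(in_dist_scores, ood_scores):
    """Threshold minimizing the mean of false-negative and false-positive rates."""
    best_t, best_cost = None, np.inf
    for t in np.unique(np.concatenate([in_dist_scores, ood_scores])):
        fnr = np.mean(ood_scores <= t)  # anomalies missed (scored as in-distribution)
        fpr = np.mean(in_dist_scores > t)  # normal tissue flagged as anomalous
        cost = 0.5 * (fnr + fpr)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t
```

Class-specific thresholds follow by calling `choose_threshold` separately on the validation scores of each predicted class, so tissue classes with naturally wider feature spread get a looser cutoff than tight ones.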