Research Papers

ARXIV Cancer: skin cancer Method: vision-language foundation model

A Vision-Language Foundation Model for Zero-shot Clinical Collaboration and Automated Concept Discovery in Dermatology

Siyuan Yan, Xieji Li, Dan Mo, Philipp Tschandl, Yiwen Jiang, Zhonghua Wang, Ming Hu, Lie Ju, Cristina Vico-Alonso, Yizhen Zheng, Jiahe Liu, Juexiao Zhou, Camilla Chello, Jen G. Cheung, Julien Anriot, Luc Thomas, Clare Primiero, Gin Tan, Aik Beng Ng, Simon See, Xiaoying Tang, Albert Ip, Xiaoyang Liao, Adrian Bowling, Martin Haskett, Shuang Zhao, Monika Janda, H. Peter Soyer, Victoria Mar, Harald Kittler, Zongyuan Ge
Published 2026-02-11 08:14

The study presents DermFM-Zero, a vision-language foundation model for dermatology trained on over 4 million multimodal data points using masked latent modelling and contrastive learning. Across 20 benchmarks, the model achieved state-of-the-art performance in zero-shot diagnosis and multimodal retrieval without task-specific adaptation. In three multinational reader studies involving over 1,100 clinicians, AI assistance nearly doubled general practitioners' differential diagnostic accuracy, the model outperformed board-certified dermatologists in multimodal skin cancer assessment, and AI-assisted non-experts surpassed unassisted experts. Sparse autoencoders applied to the model's latent representations yielded clinically meaningful concepts and allowed artifact-induced biases to be suppressed without retraining.

Medical foundation models have shown promise in controlled benchmarks, yet widespread deployment remains hindered by reliance on task-specific fine-tuning. Here, we introduce DermFM-Zero, a dermatology vision-language foundation model trained via masked latent modelling and contrastive learning on over 4 million multimodal data points. We evaluated DermFM-Zero across 20 benchmarks spanning zero-shot diagnosis and multimodal retrieval, achieving state-of-the-art performance without task-specific adaptation. We further evaluated its zero-shot capabilities in three multinational reader studies involving over 1,100 clinicians. In primary care settings, AI assistance enabled general practitioners to nearly double their differential diagnostic accuracy across 98 skin conditions. In specialist settings, the model significantly outperformed board-certified dermatologists in multimodal skin cancer assessment. In collaborative workflows, AI assistance enabled non-experts to surpass unassisted experts while improving management appropriateness. Finally, we show that DermFM-Zero's latent representations are interpretable: sparse autoencoders unsupervisedly disentangle clinically meaningful concepts that outperform predefined-vocabulary approaches and enable targeted suppression of artifact-induced biases, enhancing robustness without retraining. These findings demonstrate that a foundation model can provide effective, safe, and transparent zero-shot clinical decision support.
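
As a rough illustration of how a contrastively trained vision-language model can diagnose without fine-tuning, the sketch below scores one image embedding against text-prompt embeddings by cosine similarity and applies a softmax. This is the generic CLIP-style recipe, not DermFM-Zero's actual pipeline; the three-dimensional embeddings, the class names, and the temperature of 0.07 are all made-up placeholders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_probs(image_emb, text_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities."""
    logits = {label: cosine(image_emb, emb) / temperature
              for label, emb in text_embs.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - top) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical embeddings; a real model would produce these with its
# text and image encoders.
text_embs = {
    "melanoma": [0.9, 0.1, 0.0],
    "nevus": [0.1, 0.9, 0.0],
    "basal cell carcinoma": [0.0, 0.1, 0.9],
}
image_emb = [0.8, 0.3, 0.1]

probs = zero_shot_probs(image_emb, text_embs)
prediction = max(probs, key=probs.get)
print(prediction, probs)
```

Because the class list is just a set of text prompts, new conditions can be added by embedding new prompts, which is what makes the approach zero-shot.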

ARXIV Cancer: breast cancer Method: Timing-Transformer

Time-to-Event Transformer to Capture Timing Attention of Events in EHR Time Series

Jia Li, Yu Hou, Rui Zhang
Published 2026-02-11 00:13

This study introduces LITT, a novel Timing-Transformer architecture that aligns sequential events on a virtual relative timeline so that attention can focus on event timing and ordering in electronic health record (EHR) time series. The method is validated on longitudinal EHR data from 3,276 breast cancer patients, where it predicts the onset timing of cardiotoxicity-induced heart disease. LITT also outperforms benchmark and state-of-the-art survival analysis methods on public datasets.

Automatically discovering personalized sequential events from large-scale time-series data is crucial for enabling precision medicine in clinical research, yet it remains a formidable challenge even for contemporary AI models. For example, while transformers capture rich associations, they are mostly agnostic to event timing and ordering, thereby bypassing potential causal reasoning. Intuitively, we need a method capable of evaluating the "degree of alignment" among patient-specific trajectories and identifying their shared patterns, i.e., the significant events in a consistent sequence. This necessitates treating timing as a true computable dimension, allowing models to assign "relative timestamps" to candidate events beyond their observed physical times. In this work, we introduce LITT, a novel Timing-Transformer architecture that enables temporary alignment of sequential events on a virtual "relative timeline", thereby enabling event-timing-focused attention and personalized interpretations of clinical trajectories. Its interpretability and effectiveness are validated on real-world longitudinal EHR data from 3,276 breast cancer patients to predict the onset timing of cardiotoxicity-induced heart disease. Furthermore, LITT outperforms both the benchmark and state-of-the-art survival analysis methods on public datasets, positioning it as a significant step forward for precision medicine in clinical AI.

ARXIV Cancer: breast cancer Method: transformer

Capture Timing-Attention of Events in Clinical Time Series

Jia Li, Yu Hou, Rui Zhang
Published 2026-02-11 00:13

This study introduces LITT (Individual-Level Time Transformation), a novel architecture designed to improve how event timing is handled in clinical time series data. By enabling event-timing-focused attention, LITT allows for personalized interpretations of patient trajectories. The method was validated on longitudinal electronic health record data from 3,276 breast cancer patients and outperformed benchmark and state-of-the-art survival analysis methods on public datasets.

Automatically discovering personalized sequential events from large-scale time-series data is crucial for enabling precision medicine in clinical research, yet it remains a formidable challenge even for contemporary AI models. For example, while transformers capture rich associations, they are mostly agnostic to event timing and ordering, thereby bypassing potential causal reasoning. Intuitively, we need a method capable of evaluating the "degree of alignment" among patient-specific trajectories and identifying their shared patterns, i.e., the significant events in a consistent sequence. This necessitates treating timing as a true computable dimension, allowing models to assign "relative timestamps" to candidate events beyond their observed physical times. In this work, we introduce LITT (Individual-Level Time Transformation), a novel architecture that enables temporary alignment of sequential events on a virtual "relative timeline", thereby enabling event-timing-focused attention and personalized interpretations of clinical trajectories. Its interpretability and effectiveness are validated on real-world longitudinal EHR data from 3,276 breast cancer patients to predict the onset timing of cardiotoxicity-induced heart disease. Furthermore, LITT outperforms both the benchmark and state-of-the-art survival analysis methods on public datasets, positioning it as a significant step forward for precision medicine in clinical AI.

ARXIV Cancer: unknown Method: deep learning

Comp2Comp: Open-Source Software with FDA-Cleared Artificial Intelligence Algorithms for Computed Tomography Image Analysis

Adrit Rao, Malte Jensen, Andrea T. Fisher, Louis Blankemeier, Pauline Berens, Arash Fereydooni, Seth Lirette, Eren Alkan, Felipe C. Kitamura, Juan M. Zambrano Chaves, Eduardo Reis, Arjun Desai, Marc H. Willis, Jason Hom, Andrew Johnston, Leon Lenchik, Robert D. Boutin, Eduardo M. J. M. Farina, Augusto S. Serpa, Marcelo S. Takahashi, Jordan Perchik, Steven A. Rothenberg, Jamie L. Schroeder, Ross Filice, Leonardo K. Bittencourt, Hari Trivedi, Marly van Assen, John Mongan, Kimberly Kallianos, Oliver Aalami, Akshay S. Chaudhari
Published 2026-02-10 23:30

The paper presents Comp2Comp, an open-source software package featuring FDA-cleared deep learning algorithms for analyzing computed tomography images. It focuses on two specific applications: Abdominal Aortic Quantification (AAQ) for assessing aneurysm size and Bone Mineral Density (BMD) estimation for osteoporosis risk. The algorithms were validated against ground-truth measurements from patient scans, demonstrating sufficient accuracy for clinical use.

Artificial intelligence allows automatic extraction of imaging biomarkers from already-acquired radiologic images. This paradigm of opportunistic imaging adds value to medical imaging without additional imaging costs or patient radiation exposure. However, many open-source image analysis solutions lack rigorous validation while commercial solutions lack transparency, leading to unexpected failures when deployed. Here, we report development and validation for two of the first fully open-sourced, FDA-510(k)-cleared deep learning pipelines to mitigate both challenges: Abdominal Aortic Quantification (AAQ) and Bone Mineral Density (BMD) estimation are both offered within the Comp2Comp package for opportunistic analysis of computed tomography scans. AAQ segments the abdominal aorta to assess aneurysm size; BMD segments vertebral bodies to estimate trabecular bone density and osteoporosis risk. AAQ-derived maximal aortic diameters were compared against radiologist ground-truth measurements on 258 patient scans enriched for abdominal aortic aneurysms from four external institutions. BMD binary classifications (low vs. normal bone density) were compared against concurrent DXA scan ground truths obtained on 371 patient scans from four external institutions. AAQ had an overall mean absolute error of 1.57 mm (95% CI 1.38-1.80 mm). BMD had a sensitivity of 81.0% (95% CI 74.0-86.8%) and specificity of 78.4% (95% CI 72.3-83.7%). Comp2Comp AAQ and BMD demonstrated sufficient accuracy for clinical use. Open-sourcing these algorithms improves transparency of typically opaque FDA clearance processes, allows hospitals to test the algorithms before cumbersome clinical pilots, and provides researchers with best-in-class methods.
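
For readers less familiar with the validation metrics quoted above, here is a minimal sketch of how mean absolute error, sensitivity, and specificity are computed. The diameters and labels are invented toy values, not data from the study, and the functions are not part of the Comp2Comp package.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired measurements."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)

def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for binary labels (1 = positive)."""
    pairs = list(zip(predicted, truth))
    tp = sum(1 for p, t in pairs if p and t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    return tp / (tp + fn), tn / (tn + fp)

# Toy aortic diameters in mm (algorithm vs. radiologist); hypothetical.
algo_mm = [31.0, 45.5, 52.0, 28.5]
reader_mm = [30.0, 47.0, 51.0, 29.0]
mae = mean_absolute_error(algo_mm, reader_mm)

# Toy low-bone-density calls (algorithm vs. DXA); hypothetical.
algo_low = [1, 1, 0, 0, 1, 0, 0, 1]
dxa_low = [1, 1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(algo_low, dxa_low)
print(mae, sens, spec)
```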

ARXIV Cancer: general cancer Method: Multi-Instance Learning

Efficient Special Stain Classification

Oskar Thaeter, Christian Grashei, Anette Haas, Elisa Schmoeckel, Han Li, Peter J. Schüffler
Published 2026-02-10 17:15

This study evaluates two automated classification approaches for special stains used in histopathology, focusing on whole slide images. The Multi-Instance Learning (MIL) pipeline achieved the best performance on internal test data, while a lightweight thumbnail-based method generalized better on external TCGA data and increased throughput by two orders of magnitude. The findings suggest that thumbnail-based classification is a scalable solution for routine quality control in digital pathology.

Stains are essential in histopathology to visualize specific tissue characteristics, with Haematoxylin and Eosin (H&E) serving as the clinical standard. However, pathologists frequently utilize a variety of special stains for the diagnosis of specific morphologies. Maintaining accurate metadata for these slides is critical for quality control in clinical archives and for the integrity of computational pathology datasets. In this work, we compare two approaches for automated classification of stains using whole slide images, covering the 14 most commonly used special stains in our institute alongside standard and frozen-section H&E. We evaluate a Multi-Instance Learning (MIL) pipeline and a proposed lightweight thumbnail-based approach. On internal test data, MIL achieved the highest performance (macro F1: 0.941 for 16 classes; 0.969 for 14 merged classes), while the thumbnail approach remained competitive (0.897 and 0.953, respectively). On external TCGA data, the thumbnail model generalized best (weighted F1: 0.843 vs. 0.807 for MIL). The thumbnail approach also increased throughput by two orders of magnitude (5.635 vs. 0.018 slides/s for MIL with all patches). We conclude that thumbnail-based classification provides a scalable and robust solution for routine visual quality control in digital pathology workflows.
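
The abstract reports macro F1 on internal data and weighted F1 on external data, and the two averages are not interchangeable. The sketch below shows the difference on a toy label set and checks the "two orders of magnitude" claim from the two throughput figures quoted above. The stain labels and predictions are hypothetical and only serve to exercise the arithmetic.

```python
def per_class_f1(y_true, y_pred):
    """Per-class F1 scores and class supports (true counts)."""
    scores, support = {}, {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
        support[label] = sum(t == label for t in y_true)
    return scores, support

# Hypothetical slide labels: a frequent class and two rarer ones.
y_true = ["H&E", "H&E", "H&E", "H&E", "PAS", "PAS", "EvG"]
y_pred = ["H&E", "H&E", "H&E", "PAS", "PAS", "PAS", "H&E"]

scores, support = per_class_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)  # every class counts equally
weighted_f1 = sum(scores[c] * support[c] for c in scores) / len(y_true)

# Throughput figures quoted in the abstract (slides per second).
speedup = 5.635 / 0.018
print(macro_f1, weighted_f1, speedup)
```

Macro F1 penalizes a missed rare stain as heavily as a missed common one, while weighted F1 follows the class distribution, so the internal and external numbers above should be compared with that in mind.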

ARXIV Cancer: bladder cancer Method: Hybrid Attention-Convolution

Bladder Vessel Segmentation using a Hybrid Attention-Convolution Framework

Franziska Krauß, Matthias Ege, Zoltan Lovasz, Albrecht Bartz-Schmidt, Igor Tsaur, Oliver Sawodny, Carina Veil
Published 2026-02-10 16:34

This study presents a Hybrid Attention-Convolution (HAC) architecture designed for the segmentation of blood vessels in urinary bladder cancer surveillance. The method combines Transformers and convolutional neural networks to effectively capture vessel topology and refine segmentation details. Evaluated on the BlaVeS dataset, the HAC architecture demonstrates high accuracy and precision, outperforming existing medical segmentation models, particularly in managing artifacts and dynamic changes during endoscopic procedures.

Urinary bladder cancer surveillance requires tracking tumor sites across repeated interventions, yet the deformable and hollow bladder lacks stable landmarks for orientation. While blood vessels visible during endoscopy offer a patient-specific "vascular fingerprint" for navigation, automated segmentation is challenged by imperfect endoscopic data, including sparse labels, artifacts like bubbles or variable lighting, continuous deformation, and mucosal folds that mimic vessels. State-of-the-art vessel segmentation methods often fail to address these domain-specific complexities. We introduce a Hybrid Attention-Convolution (HAC) architecture that combines Transformers to capture global vessel topology prior with a CNN that learns a residual refinement map to precisely recover thin-vessel details. To prioritize structural connectivity, the Transformer is trained on optimized ground truth data that exclude short and terminal branches. Furthermore, to address data scarcity, we employ a physics-aware pretraining, that is a self-supervised strategy using clinically grounded augmentations on unlabeled data. Evaluated on the BlaVeS dataset, consisting of endoscopic video frames, our approach achieves high accuracy (0.94) and superior precision (0.61) and clDice (0.66) compared to state-of-the-art medical segmentation models. Crucially, our method successfully suppresses false positives from mucosal folds that dynamically appear and vanish as the bladder fills and empties during surgery. Hence, HAC provides the reliable structural stability required for clinical navigation.

ARXIV Cancer: general cancer Method: unbalanced optimal transport

Unbalanced optimal transport for robust longitudinal lesion evolution with registration-aware and appearance-guided priors

Melika Qahqaie, Dominik Neumann, Tobias Heimann, Andreas Maier, Veronika A. Zimmer
Published 2026-02-10 16:06

This study presents a novel registration-aware matcher based on unbalanced optimal transport (UOT) to evaluate lesion evolution in longitudinal CT scans of cancer patients. The method addresses challenges in establishing reliable lesion correspondence over time, particularly when lesions change in appearance or number. The proposed approach demonstrates improved precision and recall in edge detection and lesion-state recognition compared to traditional distance-based methods.

Evaluating lesion evolution in longitudinal CT scans of cancer patients is essential for assessing treatment response, yet establishing reliable lesion correspondence across time remains challenging. Standard bipartite matchers, which rely on geometric proximity, struggle when lesions appear, disappear, merge, or split. We propose a registration-aware matcher based on unbalanced optimal transport (UOT) that accommodates unequal lesion mass and adapts priors to patient-level tumor-load changes. Our transport cost blends (i) size-normalized geometry, (ii) local registration trust from the deformation-field Jacobian, and (iii) optional patch-level appearance consistency. The resulting transport plan is sparsified by relative pruning, yielding one-to-one matches as well as new, disappearing, merging, and splitting lesions without retraining or heuristic rules. On longitudinal CT data, our approach achieves consistently higher edge-detection precision and recall, improved lesion-state recall, and superior lesion-graph component F1 scores versus distance-only baselines.
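
To make the matching idea concrete, the following is a minimal sketch of entropic unbalanced optimal transport with KL-relaxed marginals, followed by a simple relative pruning rule. It is not the authors' implementation: the cost uses only a size-normalized centroid distance (the registration-trust and appearance terms are omitted), the pruning rule and the values of eps, rho, and tau are assumptions, and the lesions are invented.

```python
import math

def unbalanced_sinkhorn(a, b, cost, eps=0.5, rho=1.0, iters=200):
    """Entropic UOT with KL marginal penalties; returns the transport plan."""
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)  # below 1, so marginals are only softly matched
    for _ in range(iters):
        u = [(a[i] / sum(kernel[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(kernel[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def prune(plan, tau=0.1):
    """Keep an edge only if its mass is at least tau times the largest
    mass found in its row and in its column (assumed pruning rule)."""
    row_max = [max(row) for row in plan]
    col_max = [max(col) for col in zip(*plan)]
    return [(i, j) for i, row in enumerate(plan) for j, p in enumerate(row)
            if p >= tau * max(row_max[i], col_max[j])]

# Hypothetical lesions as (x, y, radius) in mm; the third follow-up
# lesion has no nearby baseline counterpart.
baseline = [(10, 10, 5), (40, 40, 4)]
followup = [(11, 10, 5), (41, 41, 3), (80, 10, 2)]

# Size-normalized geometric cost: centroid distance over summed radii.
cost = [[math.dist(p[:2], q[:2]) / (p[2] + q[2]) for q in followup]
        for p in baseline]

plan = unbalanced_sinkhorn([1.0] * len(baseline), [1.0] * len(followup), cost)
edges = prune(plan)
new_lesions = [j for j in range(len(followup)) if all(e[1] != j for e in edges)]
print(edges, new_lesions)
```

Because the marginal constraints are relaxed, the unmatched follow-up lesion is not forced to receive mass from a distant baseline lesion, which is what lets it surface as a new lesion after pruning. A balanced matcher would have to assign it somewhere.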

ARXIV Cancer: general cancer Method: weakly supervised contrastive learning

Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings

Bodong Zhang, Xiwen Li, Hamid Manoochehri, Xiaoya Tang, Deepika Sirohi, Beatrice S. Knudsen, Tolga Tasdizen
Published 2026-02-10 07:17

This study presents a novel framework called weakly supervised contrastive learning (WeakSupCon) for enhancing feature representation in histopathology image analysis. The method addresses the challenge of limited training labels by utilizing bag-level labels and effectively separates patches with different labels in the feature space. Experimental results indicate that WeakSupCon outperforms traditional self-supervised contrastive learning methods in multiple datasets.

Digital histopathology whole slide images (WSIs) provide gigapixel-scale high-resolution images that are highly useful for disease diagnosis. However, digital histopathology image analysis faces significant challenges due to the limited training labels, since manually annotating specific regions or small patches cropped from large WSIs requires substantial time and effort. Weakly supervised multiple instance learning (MIL) offers a practical and efficient solution by requiring only bag-level (slide-level) labels, while each bag typically contains multiple instances (patches). Most MIL methods directly use frozen image patch features generated by various image encoders as inputs and primarily focus on feature aggregation. However, feature representation learning for encoder pretraining in MIL settings has largely been neglected. In our work, we propose a novel feature representation learning framework called weakly supervised contrastive learning (WeakSupCon) that incorporates bag-level label information during training. Our method does not rely on instance-level pseudo-labeling, yet it effectively separates patches with different labels in the feature space. Experimental results demonstrate that the image features generated by our WeakSupCon method lead to improved downstream MIL performance compared to self-supervised contrastive learning approaches in three datasets. Our related code is available at github.com/BzhangURU/Paper_WeakSupCon_for_MIL

ARXIV Cancer: brain tumor Method: deep learning

Impact of domain adaptation in deep learning for medical image classifications

Yihang Wu, Ahmad Chaddad
Published 2026-02-10 02:59

This study investigates the impact of domain adaptation (DA) in deep learning for medical image classification across four datasets. The authors use 10 deep learning models, with reported results centered on ResNet34, for tasks including brain tumor and skin cancer classification. Reported gains include a 4.7% improvement on a brain tumor dataset and roughly 3% higher accuracy under Gaussian noise, but only about 0.3% when DA is added to a federated learning framework for skin cancer. DA also improved interpretability under Grad-CAM++ and lowered expected calibration error on a multi-modality dataset.

Domain adaptation (DA) is a quickly expanding area in machine learning that involves adjusting a model trained in one domain to perform well in another domain. While there has been notable progress, the fundamental concept of numerous DA methodologies has persisted: aligning the data from various domains into a shared feature space. In this space, knowledge acquired from labeled source data can improve the model training on target data that lacks sufficient labels. In this study, we demonstrate the use of 10 deep learning models to simulate common DA techniques and explore their application in four medical image datasets. We have considered various situations such as multi-modality, noisy data, federated learning (FL), interpretability analysis, and classifier calibration. The experimental results indicate that using DA with ResNet34 on a brain tumor (BT) dataset results in an enhancement of 4.7% in model performance. Similarly, the use of DA can reduce the impact of Gaussian noise, as it provides a ~3% accuracy increase using ResNet34 on a BT dataset. Furthermore, simply introducing DA into an FL framework shows limited potential (e.g., a ~0.3% increase in performance) for skin cancer classification. In addition, the DA method can improve the interpretability of the models using the Grad-CAM++ technique, which offers clinical value. Calibration analysis also demonstrates that using DA provides a lower expected calibration error (ECE) value (~2%) compared to CNN alone on a multi-modality dataset.
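
The calibration result above is stated in terms of expected calibration error. As a reminder of what that number measures, here is a minimal sketch of the standard equal-width-bin ECE: the sample-weighted average gap between mean confidence and accuracy within each bin. The confidences and outcomes are toy values, and the choice of 10 bins is an assumption rather than the paper's setting.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins over [0, 1]."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        index = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[index].append((conf, hit))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece

# Toy predictions: top-class confidence and whether the prediction was right.
confidences = [0.95, 0.92, 0.85, 0.65, 0.55]
correct = [1, 1, 0, 1, 0]

ece = expected_calibration_error(confidences, correct)
print(ece)
```

A lower ECE means the model's stated confidence tracks its actual hit rate more closely, which matters when predicted probabilities feed into clinical decisions.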

ARXIV Cancer: unknown Method: unknown

Image Quality in the Era of Artificial Intelligence

Jana G. Delfino, Jason L. Granstedt, Frank W. Samuelson, Robert Ochs, Krishna Juluru
Published 2026-02-10 02:52

This paper discusses the rapid deployment of artificial intelligence (AI) in radiology, focusing on its capabilities for image reconstruction and enhancement. It highlights the improvements in image quality, such as sharper and more detailed visuals, while also addressing the potential new failure modes introduced by AI. The authors emphasize the importance of understanding these limitations to ensure safe and effective use of AI technologies in clinical settings.

Artificial intelligence (AI) is being deployed within radiology at a rapid pace. AI has proven an excellent tool for reconstructing and enhancing images so that they appear sharper, smoother, and more detailed, can be acquired more quickly, and can be reviewed more rapidly by clinicians. However, incorporation of AI also introduces new failure modes and can exacerbate the disconnect between the perceived quality of an image and the information content of that image. Understanding the limitations of AI-enabled image reconstruction and enhancement is critical for safe and effective use of the technology. Hence, the purpose of this communication is to bring awareness to limitations when AI is used to reconstruct or enhance a radiological image, with the goal of enabling users to reap the benefits of the technology while minimizing risks.
