Research Papers

ARXIV Cancer: kidney cancer Method: latent diffusion models

Kidney Cancer Detection Using 3D-Based Latent Diffusion Models

Jen Dusseljee, Sarah de Boer, Alessa Hering
Published 2026-01-09 15:30

This study introduces a novel pipeline that uses latent diffusion models to detect kidney anomalies in 3D contrast-enhanced abdominal CT images. The method integrates Denoising Diffusion Probabilistic Models, Denoising Diffusion Implicit Models, and Vector-Quantized Generative Adversarial Networks, and operates directly on image volumes using weak supervision from case-level pseudo-labels. Although the results do not yet match supervised models, they point to key directions for improving reconstruction fidelity and lesion localization.

In this work, we present a novel latent diffusion-based pipeline for 3D kidney anomaly detection on contrast-enhanced abdominal CT. The method combines Denoising Diffusion Probabilistic Models (DDPMs), Denoising Diffusion Implicit Models (DDIMs), and Vector-Quantized Generative Adversarial Networks (VQ-GANs). Unlike prior slice-wise approaches, our method operates directly on an image volume and leverages weak supervision with only case-level pseudo-labels. We benchmark our approach against state-of-the-art supervised segmentation and detection models. This study demonstrates the feasibility and promise of 3D latent diffusion for weakly supervised anomaly detection. While the current results do not yet match supervised baselines, they reveal key directions for improving reconstruction fidelity and lesion localization. Our findings provide an important step toward annotation-efficient, generative modeling of complex abdominal anatomy.

ARXIV Cancer: breast cancer Method: deep learning

Prompt-Free SAM-Based Multi-Task Framework for Breast Ultrasound Lesion Segmentation and Classification

Samuel E. Johnny, Bernes L. Atabonfack, Israel Alagbe, Assane Gueye
Published 2026-01-09 03:02

This study introduces a multi-task deep learning framework for the segmentation and classification of breast lesions in ultrasound imaging. The method utilizes embeddings from the Segment Anything Model (SAM) in a prompt-free, fully supervised manner, employing a lightweight convolutional head or a UNet-inspired decoder for segmentation. The framework demonstrates significant improvements in both lesion delineation and diagnostic accuracy, achieving a Dice Similarity Coefficient of 0.887 and an accuracy of 92.3 percent on the PRECISE 2025 dataset.

Accurate tumor segmentation and classification in breast ultrasound (BUS) imaging remain challenging due to low contrast, speckle noise, and diverse lesion morphology. This study presents a multi-task deep learning framework that jointly performs lesion segmentation and diagnostic classification using embeddings from the Segment Anything Model (SAM) vision encoder. Unlike prompt-based SAM variants, our approach employs a prompt-free, fully supervised adaptation where high-dimensional SAM features are decoded through either a lightweight convolutional head or a UNet-inspired decoder for pixel-wise segmentation. The classification branch is enhanced via mask-guided attention, allowing the model to focus on lesion-relevant features while suppressing background artifacts. Experiments on the PRECISE 2025 breast ultrasound dataset, split per class into 80 percent training and 20 percent testing, show that the proposed method achieves a Dice Similarity Coefficient (DSC) of 0.887 and an accuracy of 92.3 percent, ranking among the top entries on the PRECISE challenge leaderboard. These results demonstrate that SAM-based representations, when coupled with segmentation-guided learning, significantly improve both lesion delineation and diagnostic prediction in breast ultrasound imaging.
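The Dice Similarity Coefficient reported above measures overlap between a predicted mask and a reference mask. The following is a minimal sketch of that metric (with IoU alongside it), assuming flat binary masks; the function name `dice_and_iou`, the toy masks, and the empty-mask convention are illustrative assumptions, not taken from the paper.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equal-length flat binary masks (values 0 or 1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    pred_sum = sum(pred)
    target_sum = sum(target)
    if pred_sum + target_sum == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    union = pred_sum + target_sum - intersection
    dice = 2 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


# Toy masks for illustration only (not data from the paper).
pred = [0, 1, 1, 1, 0, 0]
target = [0, 0, 1, 1, 1, 0]
dice, iou = dice_and_iou(pred, target)
```

Dice weights the intersection twice, so for the same pair of masks it is never lower than IoU.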

ARXIV Cancer: unknown Method: multi-task learning

Multi-task Cross-modal Learning for Chest X-ray Image Retrieval

Zhaohui Liang, Sivaramakrishnan Rajaraman, Niccolo Marini, Zhiyun Xue, Sameer Antani
Published 2026-01-08 21:44

This study presents a multi-task learning framework aimed at improving the retrieval of clinically relevant radiology reports using chest X-ray (CXR) image queries. The proposed method fine-tunes the BiomedCLIP model by incorporating a lightweight MLP projector head and a composite loss function. Experimental results indicate that the fine-tuned model outperforms both the pretrained BiomedCLIP and general-purpose CLIP models in terms of balanced performance across image-to-text and text-to-image retrieval tasks.

CLIP and BiomedCLIP are examples of vision-language foundation models and offer strong cross-modal embeddings; however, they are not optimized for fine-grained medical retrieval tasks, such as retrieving clinically relevant radiology reports using chest X-ray (CXR) image queries. To address this shortcoming, we propose a multi-task learning framework to fine-tune BiomedCLIP and evaluate improvements to CXR image-text retrieval. Using BiomedCLIP as the backbone, we incorporate a lightweight MLP projector head trained with a multi-task composite loss function that includes: (1) a binary cross-entropy loss to distinguish normal from abnormal CXR studies, (2) a supervised contrastive loss to reinforce intra-class consistency, and (3) a CLIP loss to maintain cross-modal alignment. Experimental results demonstrate that the fine-tuned model achieves more balanced and clinically meaningful performance across both image-to-text and text-to-image retrieval tasks compared to the pretrained BiomedCLIP and general-purpose CLIP models. Furthermore, t-SNE visualizations reveal clearer semantic clustering of normal and abnormal cases, demonstrating the model's enhanced diagnostic sensitivity. These findings highlight the value of domain-adaptive, multi-task learning for advancing cross-modal retrieval in biomedical applications.
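The three-term composite loss described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' implementation: the equal loss weights, the temperature of 0.07, the choice to apply the supervised contrastive term to image embeddings, and all function names are hypothetical.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def bce_with_logits(logits, labels):
    """Mean binary cross-entropy on raw logits (numerically stable form)."""
    total = 0.0
    for z, y in zip(logits, labels):
        total += max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def supervised_contrastive(embeddings, labels, temperature=0.07):
    """Supervised contrastive loss: pull same-label embeddings together."""
    emb = [_normalize(v) for v in embeddings]
    n = len(emb)
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        log_probs = _log_softmax([_dot(emb[i], emb[j]) / temperature for j in others])
        by_index = dict(zip(others, log_probs))
        total += -sum(by_index[j] for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric image-to-text and text-to-image cross-entropy over a batch."""
    img = [_normalize(v) for v in image_emb]
    txt = [_normalize(v) for v in text_emb]
    n = len(img)
    sim = [[_dot(i, t) / temperature for t in txt] for i in img]
    sim_t = [list(col) for col in zip(*sim)]
    i2t = -sum(_log_softmax(sim[k])[k] for k in range(n)) / n
    t2i = -sum(_log_softmax(sim_t[k])[k] for k in range(n)) / n
    return 0.5 * (i2t + t2i)


def composite_loss(logits, labels, image_emb, text_emb, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; equal weights are an assumption."""
    w_bce, w_sup, w_clip = weights
    return (w_bce * bce_with_logits(logits, labels)
            + w_sup * supervised_contrastive(image_emb, labels)
            + w_clip * clip_loss(image_emb, text_emb))


# Toy batch for illustration only: 0 = normal, 1 = abnormal.
labels = [0, 0, 1, 1]
logits = [-2.0, -1.0, 1.0, 2.0]
image_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_emb = [[1.0, 0.1], [0.8, 0.2], [0.1, 1.0], [0.2, 0.8]]
total = composite_loss(logits, labels, image_emb, text_emb)
```

In a training setup the same three terms would typically be computed with an autodiff framework so that gradients reach the projector head; the plain-Python version here only shows how the terms combine.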

ARXIV Cancer: breast cancer Method: ensemble model

Ensemble of radiomics and ConvNeXt for breast cancer diagnosis

Jorge Alberto Garza-Abdala, Gerardo Alejandro Fumagal-González, Beatriz A. Bosques-Palomo, Mario Alexis Monsivais Molina, Daly Avedano, Servando Cardona-Huerta, José Gerardo Tamez-Pena
Published 2026-01-08 20:54

This paper evaluates the effectiveness of ensemble techniques that integrate radiomics and deep learning for the early diagnosis of breast cancer from screening mammograms. Utilizing two independent datasets, the study demonstrates that the ensemble method outperforms individual models, achieving an area under the curve (AUC) of 0.87. The findings suggest that combining these approaches significantly enhances diagnostic accuracy.

Early diagnosis of breast cancer is crucial for improving survival rates. Radiomics and deep learning (DL) have shown significant potential in assisting radiologists with early cancer detection. This paper aims to critically assess the performance of radiomics, DL, and ensemble techniques in detecting cancer from screening mammograms. Two independent datasets were used: the RSNA 2023 Breast Cancer Detection Challenge (11,913 patients) and a Mexican cohort from the TecSalud dataset (19,400 patients). The ConvNeXtV1-small DL model was trained on the RSNA dataset and validated on the TecSalud dataset, while radiomics models were developed using the TecSalud dataset and validated with a leave-one-year-out approach. The ensemble method consistently combined and calibrated predictions using the same methodology. Results showed that the ensemble approach achieved the highest area under the curve (AUC) of 0.87, compared to 0.83 for ConvNeXtV1-small and 0.80 for radiomics. In conclusion, ensemble methods combining DL and radiomics predictions significantly enhance breast cancer diagnosis from mammograms.
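The abstract does not state how the ensemble combines and calibrates the two models' predictions. As a rough sketch of the general idea, the code below averages two sets of per-patient scores and computes AUC from pairwise comparisons; simple averaging, the function names, and the toy scores are all assumptions made for illustration, not the paper's method or data.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_ensemble(scores_a, scores_b):
    """Unweighted mean of two models' scores (an assumed combination rule)."""
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Contrived toy scores, chosen so the two models err on different cases.
labels = [1, 1, 1, 0, 0, 0]
dl_scores = [0.9, 0.4, 0.7, 0.5, 0.2, 0.3]
radiomics_scores = [0.6, 0.8, 0.3, 0.2, 0.5, 0.4]
ensemble_scores = average_ensemble(dl_scores, radiomics_scores)

auc_dl = auc(dl_scores, labels)
auc_radiomics = auc(radiomics_scores, labels)
auc_ensemble = auc(ensemble_scores, labels)
```

Averaging helps in this contrived case only because the two models make different mistakes; with real data the scores would usually need calibration to a common scale before being combined.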

ARXIV Cancer: general cancer Method: pathology vision foundation models

Atlas 2 -- Foundation models for clinical deployment

Maximilian Alber, Timo Milbich, Alexandra Carpen-Amarie, Stephan Tietz, Jonas Dippel, Lukas Muttenthaler, Beatriz Perez Cancer, Alessandro Benetti, Panos Korfiatis, Elias Eulig, Jérôme Lüscher, Jiasen Wu, Sayed Abid Hashimi, Gabriel Dernbach, Simon Schallenberg, Neelay Shah, Moritz Krügener, Aniruddh Jammoria, Jake Matras, Patrick Duffy, Matt Redlon, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, Andrew Norgan
Published 2026-01-08 17:37

This paper introduces Atlas 2, a series of pathology vision foundation models designed to enhance computational pathology by addressing performance, robustness, and computational efficiency issues. The models were evaluated across eighty public benchmarks, demonstrating state-of-the-art performance. They were trained on a dataset of 5.5 million histopathology whole slide images from multiple medical institutions.

Pathology foundation models substantially advanced the possibilities in computational pathology -- yet tradeoffs in terms of performance, robustness, and computational requirements remained, which limited their clinical deployment. In this report, we present Atlas 2, Atlas 2-B, and Atlas 2-S, three pathology vision foundation models that address these shortcomings by achieving state-of-the-art prediction performance, robustness, and resource efficiency in a comprehensive evaluation across eighty public benchmarks. Our models were trained on the largest pathology foundation model dataset to date, comprising 5.5 million histopathology whole slide images collected from three medical institutions: Charité - Universitätsmedizin Berlin, LMU Munich, and Mayo Clinic.

ARXIV Cancer: pancreatic cancer Method: dual-branch multi-scale UNet

DB-MSMUNet: Dual Branch Multi-scale Mamba UNet for Pancreatic CT Scans Segmentation

Qiu Guan, Zhiqiang Yang, Dezhang Ye, Yang Chen, Xinli Xu, Ying Tang
Published 2026-01-08 07:41

This paper presents DB-MSMUNet, a novel encoder-decoder architecture aimed at improving the segmentation of the pancreas and its lesions in CT scans, which is critical for diagnosing pancreatic cancer. The method incorporates a Multi-scale Mamba Module and a dual-decoder design to enhance boundary detection and detail preservation. Experiments on three datasets show that DB-MSMUNet outperforms most existing state-of-the-art methods in segmentation accuracy, edge preservation, and robustness.

Accurate segmentation of the pancreas and its lesions in CT scans is crucial for the precise diagnosis and treatment of pancreatic cancer. However, it remains a highly challenging task due to several factors such as low tissue contrast with surrounding organs, blurry anatomical boundaries, irregular organ shapes, and the small size of lesions. To tackle these issues, we propose DB-MSMUNet (Dual-Branch Multi-scale Mamba UNet), a novel encoder-decoder architecture designed specifically for robust pancreatic segmentation. The encoder is constructed using a Multi-scale Mamba Module (MSMM), which combines deformable convolutions and multi-scale state space modeling to enhance both global context modeling and local deformation adaptation. The network employs a dual-decoder design: the edge decoder introduces an Edge Enhancement Path (EEP) to explicitly capture boundary cues and refine fuzzy contours, while the area decoder incorporates a Multi-layer Decoder (MLD) to preserve fine-grained details and accurately reconstruct small lesions by leveraging multi-scale deep semantic features. Furthermore, Auxiliary Deep Supervision (ADS) heads are added at multiple scales to both decoders, providing more accurate gradient feedback and further enhancing the discriminative capability of multi-scale features. We conduct extensive experiments on three datasets: the NIH Pancreas dataset, the MSD dataset, and a clinical pancreatic tumor dataset provided by collaborating hospitals. DB-MSMUNet achieves Dice Similarity Coefficients of 89.47%, 87.59%, and 89.02%, respectively, outperforming most existing state-of-the-art methods in terms of segmentation accuracy, edge preservation, and robustness across different datasets. These results demonstrate the effectiveness and generalizability of the proposed method for real-world pancreatic CT segmentation tasks.

ARXIV Cancer: prostate cancer Method: Bayesian longitudinal mixture model

Identifying expanding TCR clonotypes with a longitudinal Bayesian mixture model and their associations with cancer patient prognosis, metastasis-directed therapy, and VJ gene enrichment

David Swanson, Alexander Sherry, Cara Haymaker, Alexandre Reuben, Chad Tang
Published 2026-01-08 03:04

This study investigates T-cell receptor (TCR) clonality to understand the immunologic response to cancer therapies. A novel Bayesian longitudinal mixture model is proposed to identify TCR expansion or contraction without requiring pairwise comparisons within patients. Applied to a cohort of prostate cancer patients randomized to metastasis-directed therapy (MDT) or not, the model reveals a significant increase in clonal expansions among MDT patients and an association between those expansions and later progression. Additionally, the analysis of receptor motifs indicates distinct biological characteristics of expanding clones.

Examination of T-cell receptor (TCR) clonality has become a way of understanding immunologic response to cancer and its interventions in recent years. An aspect of these analyses is determining which receptors expand or contract statistically significantly as a function of an exogenous perturbation such as therapeutic intervention. We characterize the commonly used Fisher's exact test approach for such analyses and propose an alternative formulation that does not necessitate pairwise, within-patient comparisons. We develop a flexible Bayesian longitudinal mixture model that accommodates variable-length patient follow-up and handles missingness where present, not omitting data in estimation because of structural practicalities. Once clones are partitioned by the model into dynamic (expanding or contracting) and static categories, one can associate their counts or other characteristics with disease state, interventions, baseline biomarkers, and patient prognosis. We apply these developments to a cohort of prostate cancer patients who were randomized to metastasis-directed therapy (MDT) or not. Our analyses reveal a significant increase in clonal expansions among MDT patients and their association with later progressions, both independent of and within strata of MDT. Analysis of receptor motifs and VJ gene enrichment combinations using a high-dimensional penalized log-linear model we develop also suggests distinct biological characteristics of expanding clones, with and without inducement by MDT.
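The Fisher's exact test baseline that the abstract contrasts against can be sketched for a single clonotype measured at two timepoints. This illustrates the pairwise baseline only, not the authors' Bayesian mixture model; the function name, argument names, and the one-sided formulation are assumptions.

```python
from math import comb


def fisher_expansion_pvalue(clone_pre, total_pre, clone_post, total_post):
    """One-sided Fisher's exact p-value that a clone is enriched post-treatment.

    clone_pre / clone_post: reads of the clone at each timepoint (hypothetical inputs).
    total_pre / total_post: all TCR reads at each timepoint, including the clone.
    """
    clone_total = clone_pre + clone_post
    grand_total = total_pre + total_post
    # Under the null, the clone's post-treatment count is hypergeometric:
    # clone_total "marked" reads among grand_total, with total_post reads drawn.
    denominator = comb(grand_total, total_post)
    upper = min(clone_total, total_post)
    numerator = sum(
        comb(clone_total, x) * comb(grand_total - clone_total, total_post - x)
        for x in range(clone_post, upper + 1)
    )
    return numerator / denominator


# Toy counts for illustration only.
p_value = fisher_expansion_pvalue(clone_pre=1, total_pre=4, clone_post=3, total_post=4)
```

Because the test compares exactly two samples, a patient with several follow-up samples would need multiple pairwise tests, which is the limitation the proposed longitudinal model is described as avoiding.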

ARXIV Cancer: breast cancer Method: hierarchical visual token compression

TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression

Sen Zeng, Hong Zhou, Zheng Zhu, Yang Liu
Published 2026-01-08 02:32

The paper presents TokenSeg, a framework designed for efficient 3D medical image segmentation, addressing the computational challenges associated with voxel processing. It employs a multi-scale hierarchical encoder to extract candidate tokens and a boundary-aware tokenizer to focus on salient tokens near tumor boundaries. Extensive experiments on a breast DCE-MRI dataset show that TokenSeg achieves state-of-the-art performance while significantly reducing GPU memory usage and inference latency. The method also demonstrates strong generalization across different anatomical structures.

Three-dimensional medical image segmentation is a fundamental yet computationally demanding task due to the cubic growth of voxel processing and the redundant computation on homogeneous regions. To address these limitations, we propose TokenSeg, a boundary-aware sparse token representation framework for efficient 3D medical volume segmentation. Specifically, (1) we design a multi-scale hierarchical encoder that extracts 400 candidate tokens across four resolution levels to capture both global anatomical context and fine boundary details; (2) we introduce a boundary-aware tokenizer that combines VQ-VAE quantization with importance scoring to select 100 salient tokens, over 60% of which lie near tumor boundaries; and (3) we develop a sparse-to-dense decoder that reconstructs full-resolution masks through token reprojection, progressive upsampling, and skip connections. Extensive experiments on a 3D breast DCE-MRI dataset comprising 960 cases demonstrate that TokenSeg achieves state-of-the-art performance with 94.49% Dice and 89.61% IoU, while reducing GPU memory and inference latency by 64% and 68%, respectively. To verify the generalization capability, our evaluations on MSD cardiac and brain MRI benchmark datasets demonstrate that TokenSeg consistently delivers optimal performance across heterogeneous anatomical structures. These results highlight the effectiveness of anatomically informed sparse representation for accurate and efficient 3D medical image segmentation.
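The token selection step, keeping 100 of 400 candidate tokens by importance score, amounts to a top-k selection. The sketch below shows only that step; how the scores are produced (including the VQ-VAE quantization) is not described in enough detail to reproduce, so the scores are synthetic and the function name `select_salient_tokens` is hypothetical.

```python
def select_salient_tokens(scores, k=100):
    """Return the indices of the k highest-scoring tokens, in original order."""
    if k > len(scores):
        raise ValueError("k cannot exceed the number of candidate tokens")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


# Synthetic importance scores for 400 candidate tokens (illustration only).
scores = [((i * 37) % 400) / 400 for i in range(400)]
kept = select_salient_tokens(scores, k=100)
retained_fraction = len(kept) / len(scores)
```

Keeping the selected indices in their original order preserves each token's position, which a decoder would need in order to reproject tokens back into the volume.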

ARXIV Cancer: general cancer Method: Attention U-Net

A Unified Attention U-Net Framework for Cross-Modality Tumor Segmentation in MRI and CT

Nishan Rai, Pushpa R. Dahal
Published 2026-01-07 23:50

This study introduces a unified Attention U-Net architecture designed for cross-modality tumor segmentation using MRI and CT datasets. The model incorporates advanced preprocessing techniques and a specialized loss function to enhance performance across different imaging modalities. Results indicate that the unified model achieves competitive metrics, establishing a baseline for future research in this area.

This study presents a unified Attention U-Net architecture trained jointly on MRI (BraTS 2021) and CT (LIDC-IDRI) datasets to investigate the generalizability of a single model across diverse imaging modalities and anatomical sites. Our proposed pipeline incorporates modality-harmonized preprocessing, attention-gated skip connections, and a modality-aware Focal Tversky loss function. To the best of our knowledge, this study is among the first to evaluate a single Attention U-Net trained simultaneously on separate MRI (BraTS) and CT (LIDC-IDRI) tumor datasets, without relying on modality-specific encoders or domain adaptation. The unified model demonstrates competitive performance in terms of Dice coefficient, IoU, and AUC on both domains, thereby establishing a robust and reproducible baseline for future research in cross-modality tumor segmentation.
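For reference, the standard Focal Tversky loss that the pipeline builds on can be sketched as follows. The abstract does not say what makes the authors' variant modality-aware, so this is the plain form only; the default values of `alpha`, `beta`, and `gamma` are commonly used settings assumed here, and some formulations write the exponent as 1/gamma instead.

```python
def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal Tversky loss for flat lists of predicted probabilities and 0/1 targets.

    alpha weights false negatives, beta weights false positives, and gamma
    reshapes the loss so harder examples contribute relatively more.
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1 - t) for p, t in zip(probs, targets))
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return (1 - tversky) ** gamma


# Toy predictions for illustration only.
loss_missed = focal_tversky_loss([1.0, 0.0, 0.0], [1, 1, 0])  # one false negative
loss_extra = focal_tversky_loss([1.0, 1.0, 0.0], [1, 0, 0])   # one false positive
```

With `alpha` greater than `beta`, a missed tumor voxel costs more than a spurious one, which is a common choice when lesions are small relative to the background.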

ARXIV Cancer: general cancer Method: large language model

Accommodation and Epistemic Vigilance: A Pragmatic Account of Why LLMs Fail to Challenge Harmful Beliefs

Myra Cheng, Robert D. Hawkins, Dan Jurafsky
Published 2026-01-07 22:47

This paper investigates the limitations of large language models (LLMs) in challenging harmful beliefs, particularly in medical contexts. It identifies that LLMs often accommodate users' assumptions due to social and linguistic factors, which affects their performance on safety benchmarks. The authors propose pragmatic interventions to enhance LLMs' ability to confront misinformation while maintaining low false-positive rates. The findings underscore the need for a pragmatic approach in evaluating and improving LLM safety.

Large language models (LLMs) frequently fail to challenge users' harmful beliefs in domains ranging from medical advice to social reasoning. We argue that these failures can be understood and addressed pragmatically as consequences of LLMs defaulting to accommodating users' assumptions and exhibiting insufficient epistemic vigilance. We show that social and linguistic factors known to influence accommodation in humans (at-issueness, linguistic encoding, and source reliability) similarly affect accommodation in LLMs, explaining performance differences across three safety benchmarks that test models' ability to challenge harmful beliefs, spanning misinformation (Cancer-Myth, SAGE-Eval) and sycophancy (ELEPHANT). We further show that simple pragmatic interventions, such as adding the phrase "wait a minute", significantly improve performance on these benchmarks while preserving low false-positive rates. Our results highlight the importance of considering pragmatics for evaluating LLM behavior and improving LLM safety.
