Research Papers

ARXIV Cancer: breast cancer Method: federated random survival forests

FederatedRSF : Federated Random Survival Forests for Partially Overlapping Medical Data

Maryam Moradpour, Jonas Harriehausen, Amirreza Aleyasin, Lion Philipp Wolf, Youngjun Park, Anne-Christin Hauschild
Published 2026-05-21 18:32

The paper introduces FederatedRSF, a Python package for federated random survival forests that addresses multi-center survival prediction while preserving patient privacy. It aggregates survival trees trained locally at different institutions and redistributes to each site only the trees compatible with its features. This enables inference with partially overlapping feature sets without sharing raw data. On the GBSG2 breast cancer cohort, the federated model achieves performance comparable to centralized training, even under simulated feature heterogeneity.

Multi-center survival prediction can improve robustness and generalizability, yet privacy regulations and institutional governance often prevent pooling patient-level clinical and genomic data across institutions. In practice, deployment is further complicated by feature-space heterogeneity, in which sites collect different covariates or use different sequencing panels, resulting in only partially overlapping feature sets. We present FederatedRSF, a Python package that implements federated random survival forests, aggregating locally trained survival trees and redistributing only feature-compatible trees to each site, enabling inference with partial overlap without sharing raw data. We evaluate FederatedRSF on the GBSG2 breast cancer cohort distributed with the scikit-survival package, simulating feature heterogeneity across clients by withholding subsets of features, and assessing discrimination using Harrell's concordance index (C-Index) under repeated cross-validation and site-splits. The results demonstrated that the federated model can achieve performance comparable to that of the centralized training setting.
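The abstract does not describe FederatedRSF's API, so the following is a minimal, hypothetical Python sketch of two ideas it names. The first is redistributing to a site only the trees whose split features are all available there. The second is scoring discrimination with Harrell's C-index. `TreeStub`, `compatible_trees`, `ensemble_risk` and `harrell_c_index` are illustrative names, not the package's. Feature names such as `tsize` and `pnodes` are used purely for illustration. Real survival trees output cumulative hazard functions, which this sketch reduces to one risk score per tree.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set

Sample = Dict[str, float]


@dataclass
class TreeStub:
    """Hypothetical stand-in for a locally trained survival tree."""
    site: str
    used_features: Set[str]          # features appearing in the tree's splits
    risk: Callable[[Sample], float]  # maps a patient to a scalar risk score


def compatible_trees(pool: Sequence[TreeStub], site_features: Set[str]) -> List[TreeStub]:
    """Keep only trees whose split features are all observed at the receiving site."""
    return [t for t in pool if t.used_features <= site_features]


def ensemble_risk(trees: Sequence[TreeStub], sample: Sample) -> float:
    """Average per-tree risk scores, as a forest averages its trees."""
    if not trees:
        raise ValueError("no feature-compatible trees for this site")
    return sum(t.risk(sample) for t in trees) / len(trees)


def harrell_c_index(times: Sequence[float], events: Sequence[bool],
                    risks: Sequence[float]) -> float:
    """Harrell's C: share of comparable pairs in which the earlier event has higher risk.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]; tied risks count as half-concordant. Pairs with tied
    times are not counted in this simplified version.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy pool from two sites; trees and feature names are illustrative only.
    pool = [
        TreeStub("A", {"age", "tsize"}, lambda x: 0.1 * x["tsize"]),
        TreeStub("A", {"pnodes"}, lambda x: 0.5 * x["pnodes"]),
        TreeStub("B", {"age", "progrec"}, lambda x: 0.01 * x["age"]),
    ]
    site_c = {"age", "tsize", "pnodes"}          # site C never measured progrec
    trees = compatible_trees(pool, site_c)
    print(len(trees))                            # 2: the progrec tree is withheld
    print(ensemble_risk(trees, {"age": 50, "tsize": 20, "pnodes": 3}))  # 1.75
    print(harrell_c_index([1, 2, 3], [True, True, False], [3.0, 2.0, 1.0]))  # 1.0
```

Filtering by a subset test keeps every forwarded tree fully evaluable at the receiving site, at the cost of discarding trees that split on locally missing covariates.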

ARXIV Cancer: breast cancer Method: genetic algorithms

ROI Extraction in Thermographic Breast Images Using Genetic Algorithms

LC Mendes, EO Rodrigues, Sandro C Izidoro, Aura Conci, Panos Liatsis
Published 2026-05-21 16:08

This study introduces a method for extracting the breast region from thermographic images using Genetic Algorithms (GA). The approach combines color information with a cardioid-based fitness function. The authors argue that accurate region-of-interest (ROI) extraction can improve the accuracy of cancer detection. The fully automatic method successfully extracted the ROI in 52 of 58 images without manual seed-point selection.

This work proposes the use of Genetic Algorithms (GA) to separate the breast region from the background in thermographic breast images. The proposed method combines color information, a fitness function based on cardioids, and GA. This is the first work in the literature to propose Region of Interest (ROI) extraction based on GA and cardioids. ROI extraction can improve the accuracy of cancer detection and assist with the standardization of acquisition protocols. The method successfully separates the breast region in 52 out of 58 images while being fully automatic and not requiring manual selection of seed points.
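The abstract names the ingredients (color information, a cardioid-based fitness function, and a GA) but not the chromosome encoding or the fitness formula. The hypothetical NumPy sketch below only illustrates the general idea, under three assumptions. First, a binary foreground mask has already been derived from the color image. Second, each individual encodes the cusp position and scale (cx, cy, a) of a cardioid r = a(1 - cos θ). Third, fitness is the intersection-over-union (IoU) between that cardioid and the mask. Tournament selection, blend crossover, Gaussian mutation and elitism are generic GA choices, not the authors'.

```python
import numpy as np


def cardioid_mask(cx, cy, a, shape):
    """Boolean mask of the cardioid r <= a(1 - cos(theta)) with its cusp at (cx, cy)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    dx, dy = xs - cx, ys - cy
    r = np.hypot(dx, dy)
    theta = np.arctan2(dy, dx)
    return r <= a * (1.0 - np.cos(theta))


def fitness(params, target):
    """Hypothetical fitness: IoU between the candidate cardioid and the target mask."""
    cand = cardioid_mask(*params, target.shape)
    union = np.logical_or(cand, target).sum()
    return np.logical_and(cand, target).sum() / union if union else 0.0


def evolve(target, bounds, pop_size=30, generations=40, seed=0):
    """Simple elitist GA over (cx, cy, a); returns the best individual and best-fitness history."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, 3))
    history = []
    for _ in range(generations):
        fit = np.array([fitness(p, target) for p in pop])
        history.append(fit.max())
        children = [pop[fit.argmax()].copy()]        # elitism: keep the current best
        while len(children) < pop_size:
            parents = []
            for _ in range(2):                       # binary tournament selection
                i, j = rng.integers(pop_size, size=2)
                parents.append(pop[i] if fit[i] >= fit[j] else pop[j])
            w = rng.uniform(size=3)                  # blend crossover weights
            child = w * parents[0] + (1 - w) * parents[1]
            child = child + rng.normal(0.0, 0.05 * (hi - lo))  # Gaussian mutation
            children.append(np.clip(child, lo, hi))
        pop = np.array(children)
    fit = np.array([fitness(p, target) for p in pop])
    history.append(fit.max())
    return pop[fit.argmax()], history


if __name__ == "__main__":
    target = cardioid_mask(32, 32, 10, (64, 64))      # synthetic stand-in for a breast mask
    bounds = np.array([[10, 54], [10, 54], [4, 16]], dtype=float)
    best, history = evolve(target, bounds)
    print(best, history[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases, which keeps a small population from losing a good fit to unlucky mutations.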

ARXIV Cancer: general cancer Method: gradient descent

The Double Dilemma in Multi-Task Radiology Report Generation: A Gradient Dynamics Analysis and Solution

Erjian Zhang, Yatong Hao, Liejun Wang, Zhiqing Guo
Published 2026-05-21 15:40

This paper addresses the limitations of existing multi-task learning approaches to automatic radiology report generation (RRG) by analyzing the gradient dynamics of linear scalarization strategies. The authors propose a backbone-agnostic optimizer, Conflict-Averse Magnitude-Enhanced Gradient Descent (CAME-Grad), which better balances discriminative clinical supervision against the requirements of report generation. Applied to eight RRG methods, it raises overall clinical efficacy by an average of 2.3% on MIMIC-CXR and 1.9% on IU X-Ray.

While multi-task learning is widely adopted in automatic radiology report generation (RRG) to ensure clinical consistency, most existing methods focus on architectural design and remain limited to coarse linear scalarization strategies. These strategies cannot effectively balance the hard constraints of discriminative clinical supervision with the smoothness requirements of report generation. To address these problems, we analyze the failure mechanism of linear scalarization from the perspective of gradient dynamics, using the stochastic differential equation (SDE) framework to characterize it as a "Double Dilemma" of drift term deviation and diffusion term decay. Based on this analysis, we propose a backbone-agnostic optimizer named Conflict-Averse Magnitude-Enhanced Gradient Descent (CAME-Grad). Through conflict-averse direction rectification and magnitude-enhanced energy injection, the algorithm not only ensures geometric validity but also avoids local optima. An adaptive gradient fusion mechanism then establishes a dynamic balance between the theoretically optimal direction and the task-specific inductive bias. Experiments show that, as a universal plug-and-play optimizer, CAME-Grad brings substantial and consistent improvements across eight diverse RRG methods, raising overall clinical efficacy by an average of 2.3% on MIMIC-CXR and 1.9% on IU X-Ray. Our code is available at https://github.com/vpsg-research/CAME-Grad.
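CAME-Grad's exact update rules live in the linked repository and are not reproduced here. As a hedged illustration of the problem that "conflict-averse direction rectification" targets, the sketch below contrasts linear scalarization (a weighted sum of task gradients) with a PCGrad-style projection (Yu et al., 2020). That is a different, previously published method, swapped in purely for illustration. When two task gradients conflict (negative dot product), each is projected onto the normal plane of the other before summing. CAME-Grad's magnitude-enhancement and adaptive-fusion steps are not modeled, and the two gradient vectors are hypothetical.

```python
import numpy as np


def linear_scalarization(g1, g2, w1=0.5, w2=0.5):
    """Weighted sum of task gradients: conflicting components can cancel out."""
    return w1 * g1 + w2 * g2


def project_conflicting(g_i, g_j):
    """Remove from g_i its component along g_j if the two gradients conflict."""
    dot = float(np.dot(g_i, g_j))
    if dot < 0.0:
        return g_i - dot / float(np.dot(g_j, g_j)) * g_j
    return g_i


def pcgrad_two_tasks(g1, g2):
    """PCGrad-style combination for two tasks (an illustration, not CAME-Grad itself)."""
    return project_conflicting(g1, g2) + project_conflicting(g2, g1)


if __name__ == "__main__":
    g_clin = np.array([1.0, 0.0])   # hypothetical clinical-classification gradient
    g_gen = np.array([-1.0, 1.0])   # hypothetical report-generation gradient
    print(linear_scalarization(g_clin, g_gen))  # [0.  0.5]
    print(pcgrad_two_tasks(g_clin, g_gen))      # [0.5 1.5]
```

In this toy case, the equal-weight sum [0, 0.5] is orthogonal to the clinical gradient, so the clinical task makes no first-order progress. The projected update [0.5, 1.5] has a positive dot product with both task gradients.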

ARXIV Cancer: general cancer Method: graph-guided lesion grounding

GLeVE: Graph-Guided Lesion Grounding with Proposal Verification in 3D CT

Shuo Jiang, Yuhao Hong, Chunbo Jiang, Weihong Chen, Huangwei Chen, Shenghao Zhu, Beining Wu, Mingxuan Liu, Zhu Zhu, Feiwei Qin, Min Tan, Yifei Chen
Published 2026-05-21 15:30

This paper presents GLeVE, a graph-guided lesion grounding framework designed to improve the localization of lesions in 3D CT volumes based on radiology report descriptions. The method incorporates anatomical prior verification and uses octree-based autoregressive refinement to enhance accuracy. Experimental results on the AbdomenAtlas 3.0 dataset show that GLeVE outperforms classical multimodal foundation models and report-supervised baselines in both segmentation accuracy and lesion-level localization.

Grounding radiology report descriptions to 3D CT volumes is essential for verifiable clinical interpretation, yet remains challenging due to the semantic-spatial gap between free-text narratives and volumetric anatomy. Existing report-assisted and vision-language grounding methods typically rely on phrase-level alignment or dense pixel supervision, resulting in limited lesion-wise correspondence and suboptimal localization accuracy. We propose GLeVE, a graph-guided lesion grounding framework with anatomical prior verification and octree-based autoregressive refinement. GLeVE treats each lesion description as an atomic semantic unit and encodes organ attribution, attributes, and inter-lesion relations through relation-aware graph reasoning to produce discriminative lesion-wise queries. Anatomy-aware proposal generation with region-level verification enforces one-to-one text-lesion alignment, while hierarchical octree refinement progressively improves boundary delineation. Experiments on AbdomenAtlas 3.0 demonstrate consistent gains over classical multimodal foundation models and report-supervised baselines in both segmentation accuracy and lesion-level localization.

ARXIV Cancer: brain tumor Method: three-dimensional residual encoder-decoder network

SegGuidedNet: Sub-Region-Aware Attention Supervision for Interpretable Brain Tumor Segmentation

Hasaan Maqsood, Saif Ur Rehman Khan, Sebastian Vollmer, Andreas Dengel, Muhammad Nabeel Asim
Published 2026-05-21 14:50

This study presents SegGuidedNet, a three-dimensional residual encoder-decoder network for segmenting brain tumor sub-regions from multi-parametric MRI. Its SegAttentionGate module uses a lightweight auxiliary loss to supervise the decoder so that it produces discriminative attention maps for each tumor sub-region. Evaluated on the BraTS2021 and BraTS2023 datasets, SegGuidedNet reaches mean Dice scores of 0.905 and 0.897, respectively. As a single model, it surpasses the ensemble-based nnU-Net and HNF-Netv2 while adding less than 0.2% parameter overhead and providing built-in interpretability.

Accurate segmentation of brain tumour sub-regions from multi-parametric MRI is critical for treatment planning, yet remains challenging due to morphological variability, class imbalance, and the overlapping appearance of tumour regions across imaging sequences. We propose SegGuidedNet, a three-dimensional residual encoder-decoder network that introduces a novel SegAttentionGate module, adding less than 0.2% parameter overhead. Using a lightweight auxiliary loss, the module explicitly supervises the decoder to produce spatially discriminative attention maps for each tumour sub-region (necrotic core, peritumoral oedema, and enhancing tumour). This sub-region supervision maintains decoder discriminability between visually ambiguous classes. It also provides spatial interpretability at inference at no extra cost and without any post-hoc explanation method. Evaluated independently on BraTS2021 and BraTS2023 GLI, each with 251 held-out subjects, SegGuidedNet achieves mean Dice of 0.905 (ET=0.873, TC=0.906, WT=0.935) and 0.897 (ET=0.859, TC=0.902, WT=0.931), respectively. As a single model, it surpasses the ensemble-based nnU-Net and HNF-Netv2. It also comes within 2-4 Dice points of Swin UNETR, a 10-model ensemble, at a fraction of the inference cost. The consistency of results across two benchmark editions further confirms the generalisability of the proposed approach, offering competitive accuracy with built-in interpretability in a lightweight, clinically practical framework.
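The reported ET, TC and WT scores follow the usual BraTS convention of evaluating nested regions rather than raw labels. These are enhancing tumour (ET) alone, tumour core (TC = necrotic core plus ET) and whole tumour (WT = TC plus oedema). A minimal NumPy sketch of per-region Dice follows. Label values vary between BraTS releases (ET has been encoded as 4 in earlier editions and as 3 in more recent ones), so they are parameters here and should be checked against the dataset at hand. Treating two empty masks as Dice 1.0 is one common convention among several. The mean is the unweighted average of the three regions, which is consistent with the figures above: (0.873 + 0.906 + 0.935) / 3 ≈ 0.905.

```python
import numpy as np


def dice(pred, gt):
    """Dice = 2 * |P & G| / (|P| + |G|); two empty masks count as perfect agreement."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return 2.0 * np.logical_and(pred, gt).sum() / denom


def brats_regions(labels, ncr=1, ed=2, et=3):
    """Map a label volume to the nested BraTS evaluation regions (label values are assumptions)."""
    return {
        "ET": labels == et,
        "TC": np.isin(labels, [ncr, et]),
        "WT": np.isin(labels, [ncr, ed, et]),
    }


def region_dice(pred_labels, gt_labels, **label_values):
    """Per-region Dice plus their unweighted mean."""
    p = brats_regions(pred_labels, **label_values)
    g = brats_regions(gt_labels, **label_values)
    scores = {k: float(dice(p[k], g[k])) for k in ("ET", "TC", "WT")}
    scores["mean"] = (scores["ET"] + scores["TC"] + scores["WT"]) / 3
    return scores


if __name__ == "__main__":
    gt = np.array([0, 1, 2, 3, 3])
    pred = np.array([0, 1, 2, 3, 0])     # misses one enhancing-tumour voxel
    print(region_dice(pred, gt))
```

Because the regions are nested, a single missed ET voxel lowers all three scores, and it lowers ET the most because ET is the smallest region.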

ARXIV Cancer: general cancer Method: deep learning

Cell Phantom Video Generation in Elliptical Fourier Descriptor Domain

Francesco Benedetto, Roberto Basla, Luca Magri, Giacomo Boracchi
Published 2026-05-21 14:43

This paper presents a novel framework for generating synthetic videos of cell phantoms using Elliptical Fourier Descriptors (EFDs). The approach aims to address the challenge of limited annotated data for training Deep Neural Networks in biomedical applications, particularly in cell tracking. By modeling the temporal evolution of cell morphology in EFD space, the method produces biologically plausible phantom videos that can aid in the creation of annotated datasets, thereby reducing the annotation effort required.

Training Deep Neural Networks to track individual cells in biomedical videos requires a large amount of annotated data. Annotating videos for cell tracking is very time-consuming and often requires domain expertise. This explains the limited availability of public annotated data for important medical problems such as tissue repair or cancer treatment. Generating synthetic videos along with their ground-truth annotations is a promising solution that relies, as a foundational first step, on the synthesis of single-cell annotations (or phantoms). Phantoms need to be temporally consistent, as they have to replicate biological processes that are specific to each cell type. In this work, we propose a novel framework for generating videos of cell phantoms in the Elliptical Fourier Descriptors (EFDs) domain, a compact and geometrically interpretable representation for 2D closed contours. We represent the cell phantom evolution as a multivariate time series of EFD coefficients. This introduces a strong prior for cell morphology and enables the efficient generation of sequences that evolve coherently in time. Our experimental validation shows that modelling the temporal evolution in EFD space enables the generation of biologically plausible phantom videos. Our method can be used in generative pipelines for synthesizing annotated data for cell tracking, thus substantially reducing the annotation effort for creating new datasets. Our code is available for download here: https://github.com/FrancescoBenedetto99/efd-cell-video-gen.
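Elliptical Fourier Descriptors themselves are standard (Kuhl and Giardina, 1982), so their computation can be sketched independently of the authors' generative model, which the abstract does not describe. The NumPy sketch below computes the first `order` harmonics (a_n, b_n, c_n, d_n) of a closed 2D contour and reconstructs the contour from them. It then stacks per-frame coefficients into the kind of multivariate time series the paper models. Function names are illustrative, and the contour is assumed to have no repeated consecutive points.

```python
import numpy as np


def efd_coefficients(contour, order):
    """Kuhl-Giardina elliptical Fourier coefficients of a closed polygon.

    Returns (coeffs, dc): coeffs has shape (order, 4) with rows (a_n, b_n, c_n, d_n);
    dc holds the (A0, C0) offset, i.e. the mean position along the contour.
    """
    pts = np.asarray(contour, dtype=float)
    closed = np.vstack([pts, pts[:1]])               # close the loop
    d = np.diff(closed, axis=0)                      # segment vectors (dx_p, dy_p)
    dt = np.hypot(d[:, 0], d[:, 1])                  # segment lengths (must be > 0)
    t = np.concatenate([[0.0], np.cumsum(dt)])       # arc length at each vertex
    T = t[-1]
    n = np.arange(1, order + 1)[:, None]
    phi = 2 * np.pi * n * t[None, :] / T             # shape (order, K + 1)
    dcos = np.cos(phi[:, 1:]) - np.cos(phi[:, :-1])
    dsin = np.sin(phi[:, 1:]) - np.sin(phi[:, :-1])
    scale = T / (2 * np.pi ** 2 * n[:, 0] ** 2)
    xdot, ydot = d[:, 0] / dt, d[:, 1] / dt
    coeffs = np.stack([scale * (dcos @ xdot), scale * (dsin @ xdot),
                       scale * (dcos @ ydot), scale * (dsin @ ydot)], axis=1)
    # Exact arc-length mean of the piecewise-linear contour.
    dc = (dt[:, None] * (closed[:-1] + closed[1:]) / 2).sum(axis=0) / T
    return coeffs, dc


def efd_reconstruct(coeffs, dc, num_points):
    """Evaluate the truncated series at num_points equally spaced values of t / T."""
    s = np.linspace(0.0, 1.0, num_points, endpoint=False)
    ang = 2 * np.pi * np.arange(1, len(coeffs) + 1)[:, None] * s[None, :]
    x = dc[0] + coeffs[:, 0] @ np.cos(ang) + coeffs[:, 1] @ np.sin(ang)
    y = dc[1] + coeffs[:, 2] @ np.cos(ang) + coeffs[:, 3] @ np.sin(ang)
    return np.stack([x, y], axis=1)


def efd_time_series(frames, order):
    """Stack per-frame coefficients into a (num_frames, 4 * order) time series."""
    return np.stack([efd_coefficients(f, order)[0].ravel() for f in frames])


if __name__ == "__main__":
    theta = np.linspace(0, 2 * np.pi, 200, endpoint=False)
    circle = np.stack([2 + 5 * np.cos(theta), 3 + 5 * np.sin(theta)], axis=1)
    coeffs, dc = efd_coefficients(circle, order=5)
    # For a circle, a_1 and d_1 are close to the radius; higher harmonics are close to 0.
    print(np.round(coeffs, 3), np.round(dc, 3))
    # Toy "video": a circle scaled over 3 frames gives a time series of shape (3, 20).
    frames = [circle * s for s in (1.0, 1.05, 1.1)]
    print(efd_time_series(frames, order=5).shape)
```

A few harmonics already describe smooth cell outlines well. This is why a low-dimensional coefficient series can act as a shape prior for a sequence model.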

ARXIV Cancer: brain tumor Method: diffusion-based imputation

D3Seg: Dependency-Aware Diffusion for Brain Tumor Segmentation with Missing Modalities

Danish Ali, Ajmal Mian, Naveed Akhtar, Ghulam Mubashar Hassan
Published 2026-05-21 09:55

This paper presents D3Seg, a segmentation model for brain tumor segmentation from multiparametric MRI when some modalities are missing. The model combines Multi-hop Modality Graph Fusion with a diffusion-based imputation mechanism to handle incomplete MRI data. On the BraTS 2023 dataset, D3Seg improves Dice over the current state-of-the-art model by roughly 1.5-2.0% for the enhancing tumor (ET) and about 1.0% for the tumor core (TC) across several missing-modality configurations, while maintaining computational efficiency.

Accurate brain tumor segmentation using multiparametric MRI is critical for effective treatment planning. However, in clinical settings, complete acquisition of all MRI sequences is not always possible. The absence of certain MRI modalities causes substantial performance degradation in existing segmentation methods, which typically rely on naive feature concatenation or direct fusion strategies. To address this limitation, we propose D3Seg, a novel segmentation model designed to maintain stable performance under missing-modality settings. D3Seg introduces three components. Multi-hop Modality Graph Fusion (MMGF) models higher-order inter-modality dependencies. A lightweight diffusion-based imputation mechanism compensates for missing T1ce representations in latent space. Probability-space decision refinement mitigates dominant-class overconfidence and improves delineation of underrepresented tumor subregions. Extensive evaluation on the BraTS 2023 dataset demonstrates that D3Seg consistently improves segmentation performance under missing-modality configurations. Compared with the current state-of-the-art model, it achieves approximately 1.5-2.0% Dice improvement on enhancing tumor (ET) and around 1.0% on tumor core (TC) across multiple missing-modality configurations, while maintaining computational efficiency.

ARXIV Cancer: unknown Method: unknown

Virtual 3D H&E Staining from Phase-contrast Back-illumination Interference Tomography

Anthony Song, Boyan Zhou, Mayank Golhar, Marisa Morakis, Alex Baras, Nicholas Durr
Published 2026-05-21 04:58

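This paper introduces HistoBIT3D, the first voxel-wise paired dataset of Back-illumination Interference Tomography (BIT) volumes and fluorescence-labeled nuclei, together with a novel framework for virtual 3D H&E staining of BIT volumes. Translating BIT volumes into clinically interpretable H&E images is hampered by shift-variant contrast and the lack of quantitative validation benchmarks. The dataset addresses the latter by enabling quantitative evaluation of structural preservation in unsupervised virtual staining. The staining framework uses bidirectional multiscale content consistency and cross-domain style reuse to improve structural fidelity. It achieves state-of-the-art realism metrics and significantly better 3D nuclei segmentation accuracy and boundary preservation, establishing a scalable pipeline for volumetric computational histopathology.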

Three-dimensional (3D) histopathology of unprocessed tissues has the potential to transform disease management by enabling volumetric characterization of tissue microarchitecture and in-vivo assessment. Back-illumination Interference Tomography (BIT) is a new phase microscopy technology that provides rapid, non-destructive volumetric imaging of unprocessed tissues. However, translating BIT volumes into clinically interpretable H&E images remains challenging, particularly due to shift-variant contrast and the absence of quantitative validation benchmarks. We introduce HistoBIT3D, the first voxel-wise paired BIT and fluorescence-labeled nuclei dataset, enabling quantitative evaluation of structural preservation in unsupervised virtual staining against ground-truth nuclear distributions. Using this dataset, we present a novel virtual staining framework that translates BIT volumes with shift-variant contrast into realistic H&E volumes by leveraging bidirectional multiscale content consistency and cross-domain style reuse to enhance structural fidelity and perceptual realism. Our method achieves state-of-the-art realism metrics while significantly improving 3D nuclei segmentation accuracy and boundary preservation under zero-shot Cellpose evaluation. Together, these contributions establish a quantitatively validated, structurally faithful, and scalable pipeline for 3D virtual H&E staining, advancing the paradigm of slide-free, volumetric computational histopathology. Our data and code are available at: https://github.com/aasong113/HistoBIT3D_VirtualStaining.

ARXIV Cancer: unknown Method: agglomerative continual pretraining

Universal CT Representations from Anatomy to Disease Phenotype through Agglomerative Pretraining

Yuheng Li, Yuan Gao, Haoyu Dong, Yuxiang Lai, Shansong Wang, Mojtaba Safari, James E. Baciak, Xiaofeng Yang
Published 2026-05-21 02:28

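The paper introduces FlexiCT, a family of CT foundation models trained by agglomerative continual pretraining on 266,227 CT volumes from 56 public datasets. Pretraining proceeds in three stages (2D axial pretraining, 3D anatomical pretraining and report-guided semantic alignment), supporting slice-level, volume-level and vision-language analysis. FlexiCT matches or exceeds prior task-specific approaches on multiple benchmarks across segmentation, classification, registration, vision-language understanding and clinical retrieval. Its embeddings also organize CT scans along gradients associated with tumor stage, suggesting that it captures imaging features related to disease phenotype.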

Computed tomography (CT) is central to three-dimensional medical imaging, yet CT-based artificial intelligence remains fragmented across task-specific models for segmentation, classification, registration, and report analysis. Here we present FlexiCT, a family of CT foundation models trained by agglomerative continual pretraining on 266,227 CT volumes from 56 publicly available datasets, forming a large-scale public resource for CT representation learning. FlexiCT uses agglomerative pretraining across three stages: two-dimensional axial pretraining, three-dimensional anatomical pretraining and report-guided semantic alignment. This training strategy supports slice-level, volume-level and vision-language analysis. Across five downstream task families (segmentation, classification, registration, vision-language understanding and clinical retrieval), FlexiCT matches or exceeds prior task-specific approaches on multiple benchmarks. Its embeddings further organize CT scans along gradients associated with various tumor stages, suggesting that CT foundation models can capture imaging features relevant to disease phenotype characterization. Code is available at https://github.com/ricklisz/FlexiCT

ARXIV Cancer: general cancer Method: hierarchical UNet

An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

Xiaofeng Liu, Qianru Zhang, Thibault Marin, Menghua Xia, Chi Liu, Georges El Fakhri, Jinsong Ouyang
Published 2026-05-20 23:59

This study presents an open-source, multi-center foundation model for tumor segmentation in whole-body FDG PET/CT, pretrained on 4,997 harmonized scans from four public datasets. A hierarchical UNet-shaped backbone with early channel-wise concatenation lets anatomical and metabolic features interact from the first layer. Pretraining uses a masked autoencoding objective with zero-mean imputation. On AutoPET lesion segmentation, the pretrained models given only 10% of the labeled training data perform comparably to models trained from scratch on the full dataset. Under 5-shot linear probing, joint PET/CT pretraining also outperforms separated-modality pretraining. The framework aims to advance automated oncologic imaging and reduce reliance on extensive manual annotations.

The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation, combined with a weighted global reconstruction loss. This design avoids non-physical intensity discontinuities at masked-region boundaries that arise from learnable mask tokens. On downstream AutoPET lesion segmentation, the proposed models demonstrate strong label efficiency: with only 10% of the labeled training data, they achieve performance comparable to models trained from scratch on the full dataset. Under extreme 5-shot linear probing, joint PET/CT pretraining also achieves higher Dice scores than separated-modality pretraining. This multi-center foundation model demonstrates label efficiency and cross-modality representation learning for PET/CT tumor segmentation. It provides a robust, open-source basis for advancing automated oncologic imaging, significantly reducing the need for large-scale manual annotations in clinical practice.
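The abstract lists the ingredients of the pretraining recipe but not its exact parameters. Below is a minimal NumPy sketch of the data side only, under stated assumptions:

- PET and CT are z-scored per volume and stacked as two input channels (early channel-wise concatenation).
- Random cubic patches are masked, and masked voxels are filled with zero. This is one plausible reading of "zero-mean imputation": after standardization, zero is each channel's mean intensity, so no learnable mask token is inserted.
- The reconstruction loss covers the whole volume, with masked voxels weighted more heavily.

Patch size, mask ratio, channel order and the two loss weights are hypothetical choices, not values from the paper. All function names are illustrative.

```python
import numpy as np


def zscore(vol):
    """Standardize a volume to zero mean and unit variance."""
    return (vol - vol.mean()) / (vol.std() + 1e-8)


def early_fusion_input(pet, ct):
    """Stack standardized CT and PET as channels (C=2, D, H, W); channel order is arbitrary."""
    return np.stack([zscore(ct), zscore(pet)], axis=0)


def random_patch_mask(shape, patch, ratio, rng):
    """Mask a random fraction of non-overlapping cubic patches (True = masked)."""
    if any(s % patch for s in shape):
        raise ValueError("volume shape must be divisible by the patch size")
    grid = tuple(s // patch for s in shape)
    n = int(np.prod(grid))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=int(round(ratio * n)), replace=False)] = True
    coarse = flat.reshape(grid)
    return coarse.repeat(patch, axis=0).repeat(patch, axis=1).repeat(patch, axis=2)


def zero_mean_impute(x, mask):
    """Fill masked voxels in every channel with 0, the mean of each standardized channel."""
    out = x.copy()
    out[:, mask] = 0.0
    return out


def weighted_recon_loss(pred, target, mask, masked_weight=1.0, visible_weight=0.1):
    """Squared error over the whole volume, weighting masked voxels more (weights are hypothetical)."""
    w = np.where(mask, masked_weight, visible_weight)
    se = (pred - target) ** 2
    return float((se * w).sum() / (w.sum() * pred.shape[0]))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pet, ct = rng.random((16, 16, 16)), rng.random((16, 16, 16))   # toy volumes
    x = early_fusion_input(pet, ct)
    mask = random_patch_mask(x.shape[1:], patch=4, ratio=0.6, rng=rng)
    x_in = zero_mean_impute(x, mask)
    # A real model would predict the full volume from x_in; passing x_in itself
    # as the "prediction" here only shows how the loss is called.
    print(weighted_recon_loss(x_in, x, mask))
```

Filling with the channel mean keeps masked regions inside the normal intensity range of each modality. This is the property the abstract contrasts with learnable mask tokens.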
