Research Papers

ARXIV Cancer: general cancer Method: graph neural network

SEMIR: Semantic Minor-Induced Representation Learning on Graphs for Visual Segmentation

Luke James Miller, Yugyung Lee
Published 2026-05-12 16:52

The paper introduces SEMIR, a framework for segmenting small and sparse structures in large-scale images, addressing challenges related to voxel-level computation and class imbalance. SEMIR utilizes a topology-preserving latent graph representation to enhance boundary detection and improve segmentation accuracy. The method is benchmarked on three tumor segmentation datasets, demonstrating consistent improvements in identifying minority structures.

Segmenting small and sparse structures in large-scale images is fundamentally constrained by voxel-level, lattice-bound computation and extreme class imbalance -- dense, full-resolution inference scales poorly and forces most pipelines to rely on fixed regionization or downsampling, coupling computational cost to image resolution and attenuating boundary evidence precisely where minority structures are most informative. We introduce SEMIR (Semantic Minor-Induced Representation Learning), a representation framework that decouples inference from the native grid by learning a task-adapted, topology-preserving latent graph representation with exact decoding. SEMIR transforms the underlying grid graph into a compact, boundary-aligned graph minor through parameterized edge contraction, node deletion, and edge deletion, while preserving an exact lifting map from minor predictions to lattice labels. Minor construction is formalized as a few-shot structure learning problem that replaces hand-tuned preprocessing with a boundary-alignment objective: minor parameters are learned by maximizing agreement between predicted boundary elements and target-specific semantic edges under a boundary Dice criterion, and the induced minor is annotated with scale- and rotation-robust geometric and intensity descriptors and supports efficient region-level inference via message passing on a graph neural network (GNN) with relational edge features. We benchmark SEMIR on three tumor segmentation datasets -- BraTS 2021, KiTS23, and LiTS -- where targets exhibit high structural variability and distributional uncertainty. SEMIR yields consistent improvements in minority-structure Dice at practical runtime. More broadly, SEMIR establishes a framework for learning task-adapted, topology-preserving latent representations with exact decoding for high-resolution structured visual data.
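The boundary Dice criterion used to learn the minor compares two sets of boundary elements. The sketch below is illustrative only and assumes that "boundary elements" are the edges of a 4-connected 2D pixel grid whose endpoints carry different labels. The helper names `boundary_edges` and `boundary_dice` are hypothetical, and SEMIR's own definition on the graph minor may differ.

```python
import numpy as np


def boundary_edges(labels: np.ndarray) -> set:
    """Return grid-graph edges (pairs of flat pixel indices) whose endpoints differ in label.

    Only right and down neighbours are visited, so each edge appears once as (a, b) with a < b.
    """
    h, w = labels.shape
    idx = np.arange(h * w).reshape(h, w)
    edges = set()
    # Horizontal edges: (r, c) -- (r, c + 1)
    diff_h = labels[:, :-1] != labels[:, 1:]
    for a, b in zip(idx[:, :-1][diff_h], idx[:, 1:][diff_h]):
        edges.add((int(a), int(b)))
    # Vertical edges: (r, c) -- (r + 1, c)
    diff_v = labels[:-1, :] != labels[1:, :]
    for a, b in zip(idx[:-1, :][diff_v], idx[1:, :][diff_v]):
        edges.add((int(a), int(b)))
    return edges


def boundary_dice(pred_edges: set, target_edges: set) -> float:
    """Dice overlap 2|P & T| / (|P| + |T|) between two boundary-edge sets (1.0 if both empty)."""
    if not pred_edges and not target_edges:
        return 1.0
    inter = len(pred_edges & target_edges)
    return 2.0 * inter / (len(pred_edges) + len(target_edges))
```

In this toy setting, a prediction whose region extends one pixel too far to the right keeps the left and top/bottom boundary edges but loses the right ones, so its boundary Dice drops below 1 even though most pixels are labelled correctly.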

ARXIV Cancer: general cancer Method: hybrid volumetric learning architecture

H3D-MarNet: Wavelet-Guided Dual-Path Learning for Metal Artifact Suppression and CT Modality Transformation for Radiotherapy Workflows

Mubashara Rehman, Niki Martinel, Michele Avanzo, Riccardo Spizzo, Christian Micheloni
Published 2026-05-12 15:21

The paper presents H3D-MarNet, a two-stage framework that suppresses metal artifacts in computed tomography (CT) while transforming kilo-voltage CT (kVCT) into mega-voltage CT (MVCT) for radiotherapy. The first stage applies a wavelet-based preprocessing module for artifact suppression; the second uses a hybrid volumetric learning architecture that combines CNN- and transformer-based encoders. On artifact-affected slices, H3D-MarNet reaches 28.14 dB PSNR and 0.717 SSIM, indicating effective artifact suppression and anatomical preservation.

Metal artifacts in computed tomography (CT) severely degrade image quality, compromising diagnostic accuracy and radiotherapy planning, especially in cancer patients with high-density implants. We propose H3D-MarNet, a two-stage framework for artifact-aware CT domain transformation from kilo-voltage CT (kVCT) to mega-voltage CT (MVCT). In the first stage, a wavelet-based preprocessing module suppresses metal-induced artifacts through frequency-aware denoising while preserving anatomical structures. In the second stage, Domain-TransNet performs kVCT-to-MVCT domain transformation using a hybrid volumetric learning architecture. Domain-TransNet integrates a CNN-based encoder to capture fine-grained local anatomical details and a transformer-based encoder to model long-range volumetric dependencies. The complementary representations are fused through an attention-based feature fusion mechanism to ensure spatial and contextual coherence across slices. A multi-stage, attention-guided decoder, supported by deep supervision, progressively reconstructs artifact-suppressed MVCT volumes. Extensive experiments demonstrate that H3D-MarNet achieves 28.14 dB PSNR and 0.717 SSIM on artifact-affected slices from the full dataset, indicating effective metal artifact suppression and anatomical preservation and highlighting its potential for reliable CT modality transformation in clinical radiotherapy workflows.
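The abstract does not specify the wavelet or the denoising rule used in the first stage. As a generic illustration of "frequency-aware denoising", the sketch below applies a single-level orthonormal 2D Haar transform to a slice, soft-thresholds the three detail (high-frequency) bands, and inverts the transform. The Haar wavelet, the single decomposition level, and the fixed `threshold` are all assumptions, and real metal streaks are structured rather than white noise, so this is not H3D-MarNet's module.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar2d(x):
    """Single-level orthonormal 2D Haar transform; x must have even height and width."""
    a = (x[0::2, :] + x[1::2, :]) / SQRT2  # row low-pass
    d = (x[0::2, :] - x[1::2, :]) / SQRT2  # row high-pass
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # approximation band
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2  # detail bands
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh


def ihaar2d(ll, lh, hl, hh):
    """Exact inverse of haar2d."""
    h2, w2 = ll.shape
    a = np.empty((h2, 2 * w2))
    d = np.empty((h2, 2 * w2))
    a[:, 0::2], a[:, 1::2] = (ll + lh) / SQRT2, (ll - lh) / SQRT2
    d[:, 0::2], d[:, 1::2] = (hl + hh) / SQRT2, (hl - hh) / SQRT2
    x = np.empty((2 * h2, 2 * w2))
    x[0::2, :], x[1::2, :] = (a + d) / SQRT2, (a - d) / SQRT2
    return x


def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)


def wavelet_denoise(slice_2d, threshold):
    """Keep the low-frequency band, shrink the high-frequency bands, reconstruct."""
    if slice_2d.shape[0] % 2 or slice_2d.shape[1] % 2:
        raise ValueError("slice dimensions must be even")
    ll, lh, hl, hh = haar2d(np.asarray(slice_2d, dtype=float))
    return ihaar2d(ll, soft_threshold(lh, threshold),
                   soft_threshold(hl, threshold), soft_threshold(hh, threshold))
```

A volume can be processed slice by slice. Because the transform is orthonormal, a threshold of zero returns the input unchanged, which makes the amount of smoothing easy to reason about.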

ARXIV Cancer: lung cancer Method: CycleGAN

A Comparative Analysis of CT Degradation for LDCT Nodule Classification using Radiomics

Jiaying Liu, Anna Corti, Valentina D. A. Corino, Luca Mainardi
Published 2026-05-12 14:13

This study investigates using synthetic low-dose computed tomography (LDCT) images, generated by degrading standard-dose CT (SDCT) data, to train lung nodule classifiers. Comparing three degradation methods, including CycleGAN, it shows that such synthetic images can improve radiomics-based machine learning models for classifying screening-detected nodules. CycleGAN-generated images gave the best image-quality scores and the most balanced classification performance, while a baseline trained on non-degraded SDCT data generalized poorly, underscoring the importance of domain adaptation in this context.

Low-dose computed tomography (LDCT) is the standard modality for lung cancer screening, known for its low radiation dose but high noise levels. While existing literature focuses on denoising LDCT images, comparative research on simulating LDCT characteristics so that such images can be used directly for model development is lacking. This study shifts the focus from denoising images to degrading available standard-dose CT (SDCT) data, generating synthetic images for data augmentation to train classifiers for screening-detected nodules. We compare three degradation methods: (1) sinogram-domain statistical noise insertion; (2) Pix2Pix trained to replicate a validated physics-based simulation; and (3) unpaired CycleGAN. The generated images were used to simulate an LDCT screening scenario, replacing 695 SDCT cases from the LIDC-IDRI dataset, and radiomic features extracted from them were used to train machine learning models for lung nodule classification. Regarding image quality, CycleGAN achieved the best Fréchet inception distance (0.1734) and kernel inception distance (0.0813; 0.1002) scores, indicating distributional alignment with the target low-dose domain. In the nodule classification task, results confirmed the necessity of domain adaptation: a baseline model trained on non-degraded SDCT data failed to generalize to the real LDCT set (AUC 0.789) and had low sensitivity (0.571). Degraded images generated with the CycleGAN approach led to the most balanced classification performance using an Adam Booster classifier, achieving an AUC of 0.861, sensitivity of 0.743, and specificity of 0.858 in the independent test. Our findings confirm that generating synthetic LDCT data from standard-dose scans is a viable strategy for training robust classifiers for screening-detected nodules.
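The first degradation method inserts statistical noise in the sinogram domain. The abstract gives no formula, so the sketch below uses a common quantum-noise-only approximation. A full-dose line integral p measured with I0 incident photons has variance of roughly exp(p)/I0, and at a dose fraction f that variance becomes exp(p)/(f·I0). Adding zero-mean Gaussian noise with the difference, (1/f − 1)·exp(p)/I0, therefore mimics the lower dose. The function name, the Gaussian approximation, and the omission of electronic noise, bowtie filtering, and the reconstruction step are all assumptions.

```python
import numpy as np


def insert_low_dose_noise(sino, i0, dose_fraction, rng):
    """Add noise to a full-dose sinogram of line integrals so that its variance
    approximates an acquisition at `dose_fraction` of the original dose.

    Quantum (photon-counting) noise only, Gaussian approximation:
        Var[p_full] ~ exp(p) / i0
        Var[p_low]  ~ exp(p) / (dose_fraction * i0)
    The inserted variance is the difference between the two.
    """
    if not 0.0 < dose_fraction <= 1.0:
        raise ValueError("dose_fraction must be in (0, 1]")
    sino = np.asarray(sino, dtype=float)
    extra_var = (1.0 / dose_fraction - 1.0) * np.exp(sino) / i0
    return sino + rng.normal(0.0, 1.0, size=sino.shape) * np.sqrt(extra_var)
```

The degraded sinogram would then be reconstructed (for example with filtered back-projection) before radiomic features are extracted; that step is not shown here.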

ARXIV Cancer: lung cancer Method: deep learning

M3Net: A Macro-to-Meso-to-Micro Clinical-inspired Hierarchical 3D Network for Pulmonary Nodule Classification

Jinyue Li, Yuzhou Yu, Jingjing Yang, Meng Fu, Yani Zhang, Shuyao He, Dianlong Ge, Xin Ning, Yannan Chu, Qiankun Li
Published 2026-05-12 10:16

This paper presents M3Net, a novel hierarchical 3D network designed for the classification of pulmonary nodules in CT scans, addressing the challenges posed by their multi-scale and heterogeneous nature. The method integrates multi-scale contextual information and employs scale-specific encoders to enhance classification accuracy. Experimental results on public and clinical datasets demonstrate that M3Net achieves state-of-the-art performance, indicating its potential for clinical integration in lung cancer screening.

The accurate classification of benign and malignant pulmonary nodules in CT scans is critical for early lung cancer screening, yet remains challenging due to the multi-scale and heterogeneous nature of pulmonary nodules. While deep learning offers potential for auxiliary diagnosis, most existing models act as "black boxes", lacking the transparency and explainability required for trustworthy clinical integration. To address this issue, we propose M3Net, a novel 3D network for pulmonary nodule classification inspired by the hierarchical diagnostic workflow of radiologists, which integrates multi-scale contextual information from fine-grained structures to global anatomical relationships. Our framework constructs a progressive multi-scale input, from fine-grained nodule structures to local semantics and global spatial relationships. M3Net employs scale-specific encoders and ensures cross-scale semantic consistency through latent space projection and mutual information maximization. Extensive experiments on the public LIDC-IDRI dataset and a self-collected clinical dataset (USTC-FHLN) demonstrate that our method achieves state-of-the-art performance, with accuracies of 86.96% and 84.24% respectively, outperforming the best baseline by 3.26% and 2.17%. The results validate that M3Net provides a more robust and clinically relevant solution for pulmonary nodule classification. The code is available at https://github.com/jylEcho/M3-Net.
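M3Net's "progressive multi-scale input" moves from the nodule itself to its local and global surroundings. A minimal sketch of one way to build such inputs follows: nested cubes of increasing size are cropped around the nodule center (zero-padded at the volume border) and block-averaged to a common resolution, one array per scale-specific encoder. The crop sizes, the output size, and the block-mean resampling are hypothetical choices, not values from the paper.

```python
import numpy as np


def crop_cube(volume, center, size):
    """Extract a size^3 cube centred on `center`, zero-padding where it leaves the volume."""
    half = size // 2
    out = np.zeros((size, size, size), dtype=volume.dtype)
    src, dst = [], []
    for c, n in zip(center, volume.shape):
        lo = c - half
        s0, s1 = max(lo, 0), min(lo + size, n)
        src.append(slice(s0, s1))            # region inside the volume
        dst.append(slice(s0 - lo, s1 - lo))  # where it lands inside the cube
    out[tuple(dst)] = volume[tuple(src)]
    return out


def block_mean(cube, out_size):
    """Downsample a cube by averaging non-overlapping f^3 blocks (f = size / out_size)."""
    f = cube.shape[0] // out_size
    if f * out_size != cube.shape[0]:
        raise ValueError("crop size must be a multiple of out_size")
    return cube.reshape(out_size, f, out_size, f, out_size, f).mean(axis=(1, 3, 5))


def multi_scale_inputs(volume, center, sizes=(16, 32, 64), out_size=16):
    """Stack fine (nodule), mid (local context) and coarse (global context) views."""
    return np.stack([block_mean(crop_cube(volume, center, s), out_size) for s in sizes])
```

Keeping all scales at the same array shape lets each scale-specific encoder share an input format while seeing a different field of view.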

ARXIV Cancer: unknown Method: flow-matching generative model

RNA-FM: Flow-Matching Generative Model for Genome-wide RNA-Seq Prediction

Yaxuan Song, Jianan Fan, Tianyi Wang, Qiuyue Hu, Hang Chang, Heng Huang, Weidong Cai
Published 2026-05-12 06:48

This paper introduces RNA-FM, a flow-matching generative model designed for predicting genome-wide RNA sequencing (RNA-seq) data from histopathology whole-slide images (WSIs). The method addresses limitations of existing approaches by formulating the prediction as a continuous-time conditional transport problem, allowing for better capture of biological heterogeneity and predictive uncertainty. Experimental results indicate that RNA-FM outperforms current state-of-the-art methods while ensuring biological interpretability.

Histopathology whole-slide images (WSIs) are routinely acquired in clinical practice and capture rich tissue morphology, but they do not directly reveal the molecular architecture and functional programs that define pathological states. RNA sequencing (RNA-seq), by contrast, provides genome-wide transcriptional profiles but at substantial cost, which motivates WSI-based genome-wide transcriptomic prediction. Existing approaches for predicting gene expression from WSIs predominantly rely on deterministic regression with a one-to-one mapping, limiting their ability to capture biological heterogeneity and predictive uncertainty. We propose RNA-FM, a flow-matching generative framework for genome-wide bulk RNA-seq prediction from WSIs. RNA-FM formulates transcriptomic prediction as a continuous-time conditional transport problem, learning a velocity field that maps a simple prior to the target gene expression distribution conditioned on tissue morphology. By integrating pathway-level structure, RNA-FM enables scalable and biologically interpretable genome-wide gene expression imputation. Extensive experiments demonstrate that RNA-FM consistently outperforms state-of-the-art approaches while maintaining biological meaningfulness. Code is available at https://github.com/YXSong000/RNA-FM.
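Learning "a velocity field that maps a simple prior to the target gene expression distribution" is the core of conditional flow matching. The sketch below shows the standard straight-line (rectified-flow style) variant. Training regresses a velocity model onto x1 − x0 at random points x_t = (1 − t)·x0 + t·x1, and sampling integrates the learned field from t = 0 to 1 with forward Euler. The linear path, the Euler sampler, and the `velocity_fn(x, t, cond)` interface are assumptions. RNA-FM's network, its morphology conditioning, and its pathway-level structure are not reproduced.

```python
import numpy as np


def cfm_loss(velocity_fn, x0, x1, cond, rng):
    """Conditional flow-matching loss on the path x_t = (1 - t) * x0 + t * x1.

    x0:   prior samples (e.g. standard normal), shape (batch, genes)
    x1:   target expression profiles, shape (batch, genes)
    cond: conditioning input (e.g. slide features), passed to velocity_fn unchanged
    """
    t = rng.uniform(size=(x0.shape[0], 1))
    xt = (1.0 - t) * x0 + t * x1
    target_v = x1 - x0  # time derivative of the straight-line path
    pred_v = velocity_fn(xt, t, cond)
    return float(np.mean((pred_v - target_v) ** 2))


def euler_sample(velocity_fn, x0, cond, steps=50):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1 with forward Euler."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for k in range(steps):
        t = np.full((x.shape[0], 1), k * dt)
        x = x + dt * velocity_fn(x, t, cond)
    return x
```

Because sampling starts from random prior draws, repeating it for the same slide yields a distribution of expression profiles rather than a single point estimate. That spread is what lets a generative model express the predictive uncertainty that one-to-one regression cannot.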

ARXIV Cancer: general cancer Method: adversarial augmentation

Physics-Grounded Adversarial Stain Augmentation with Calibrated Coverage Guarantees

Mingi Hong
Published 2026-05-12 04:39

The paper presents Calibrated Adversarial Stain Augmentation (CASA), a method that addresses stain variation in histopathology models across hospitals. CASA performs adversarial augmentation in the Macenko stain parameter space, with a perturbation budget calibrated from multi-center statistics via the DKW inequality, which gives coverage guarantees for unseen centers. On the Camelyon17-WILDS dataset, it achieves a slide-level accuracy of 93.9%, outperforming the other compared methods, including HED-strong and RandStainNA augmentation.

Stain variation across hospitals degrades histopathology models at deployment. Existing augmentation methods perturb color spaces with arbitrary hyperparameters, lacking both a principled budget and coverage guarantees for unseen centers. We propose Calibrated Adversarial Stain Augmentation (CASA), which performs adversarial augmentation in the Macenko stain parameter space with a budget calibrated from multi-center statistics via the DKW inequality. On Camelyon17-WILDS (5 seeds), CASA achieves 93.9% ± 1.6% slide-level accuracy, outperforming HED-strong (88.4% ± 7.3%), RandStainNA (85.2% ± 6.7%), and ERM (63.9% ± 11.3%), with the highest worst-group accuracy (84.9% ± 0.9%) among all 10 compared methods.
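The DKW (Dvoretzky–Kiefer–Wolfowitz) inequality, with Massart's constant, bounds how far an empirical CDF built from n samples can stray from the true CDF. With probability at least 1 − α, the gap is at most ε = sqrt(ln(2/α) / (2n)). The abstract does not say how CASA turns this into a budget. The sketch below shows one plausible reading for a single scalar stain parameter, such as a hypothetical per-center Macenko stain-vector angle. The empirical quantile interval is widened by ε on each side so that it covers roughly the target fraction of the population distribution. `calibrated_budget` and its parameters are hypothetical, and the adversarial search inside the budget is not shown.

```python
import math

import numpy as np


def dkw_epsilon(n, alpha):
    """DKW band half-width: P(sup_x |F_n(x) - F(x)| > eps) <= alpha for this eps."""
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n))


def calibrated_budget(samples, coverage, alpha):
    """Hypothetical budget: an interval that, with probability about >= 1 - alpha,
    contains roughly `coverage` of the underlying parameter distribution.

    Each tail quantile is pushed outward by the DKW epsilon to account for the
    finite number of observed centers or samples.
    """
    x = np.sort(np.asarray(samples, dtype=float))
    eps = dkw_epsilon(x.size, alpha)
    tail = (1.0 - coverage) / 2.0
    lo_q, hi_q = tail - eps, 1.0 - tail + eps
    if lo_q <= 0.0 or hi_q >= 1.0:
        raise ValueError("too few samples for the requested coverage and alpha")
    return float(np.quantile(x, lo_q)), float(np.quantile(x, hi_q))
```

The 1/sqrt(n) dependence makes the trade-off explicit: with few calibration samples the band is wide and the budget grows, which is the sense in which the budget is calibrated rather than hand-tuned.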

ARXIV Cancer: brain tumor Method: convolutional neural network

Brain Tumor Classification in MRI Images: A Computationally Efficient Convolutional Neural Network

Md Fahimul Kabir Chowdhury, Jannatul Ferdous
Published 2026-05-11 21:39

This study presents a lightweight Convolutional Neural Network (CNN) designed for the multi-class classification of brain tumors using MRI images. The model targets gliomas, meningiomas, pituitary tumors, and healthy instances, achieving high classification accuracies and ROC scores while maintaining a lower computational cost compared to existing architectures. The results indicate that this CNN could serve as an effective diagnostic tool in clinical settings.

Improving patient outcomes depends on the prompt and accurate diagnosis of brain tumors, but manual MRI scan analysis is still time-consuming and unreliable. Although deep learning has shown promise, many of the models now in use are computationally intensive and have difficulty handling the intrinsic complexity and variety of different types of brain tumors. In this work, we propose a lightweight yet high-performing Convolutional Neural Network (CNN) for multi-class brain tumor classification from MRI images, targeting gliomas, meningiomas, pituitary tumors, and healthy (no tumor) instances. The model was rigorously evaluated on two publicly accessible datasets from Figshare and Kaggle. Leveraging efficient feature extraction and optimized training strategies, our CNN achieved classification accuracies of 99.03% and 99.28%, along with ROC scores of 99.88% and 99.94%, on Dataset 1 and Dataset 2, respectively, all while using significantly fewer parameters than popular pre-trained architectures. Compared with cutting-edge models such as DenseNet201, MobileNetV2, VGG19, Xception, InceptionV3, and ResNet50, our approach consistently demonstrated superior performance with reduced computational overhead. These findings highlight the potential of the proposed model as a practical and reliable diagnostic aid in clinical environments.

ARXIV Cancer: breast cancer Method: unknown

ABRA: Agent Benchmark for Radiology Applications

Bulat Maksudov, Vladislav Kurenkov, Kathleen M. Curran, Alessandra Mileo
Published 2026-05-11 20:34

The paper introduces ABRA, a radiology-agent benchmark in which medical imaging agents must navigate a live viewing environment (an OHIF viewer backed by an Orthanc DICOM server) rather than receive pre-selected images. It comprises 655 tasks across three difficulty tiers and eight task types, such as viewer control and annotation. Current models achieve high Execution scores but much lower Outcome scores, and an oracle-detector variant identifies perception, not tool orchestration, as the main bottleneck.

Existing medical-agent benchmarks deliver imaging as pre-selected samples, never as an environment the agent must navigate. We introduce ABRA, a radiology-agent benchmark in which the agent operates an OHIF viewer and an Orthanc DICOM server through twenty-one function-calling tools that span slice navigation, windowing, series selection, pixel-coordinate annotation, and structured reporting. ABRA contains 655 programmatically generated tasks across three difficulty tiers and eight types (viewer control, metadata QA, vision probe, annotation, longitudinal comparison, BI-RADS reporting, and oracle variants of annotation and BI-RADS reporting), drawn from LIDC-IDRI, Duke Breast Cancer MRI, and NLST New-Lesion LongCT. Each episode is scored along Planning, Execution, and Outcome (Bluethgen et al., 2025) by task-type-specific automatic scorers. Ten current models, five closed-weight and five open-weight, reach at least 89% Execution on real annotation but only 0-25% Outcome; on the paired oracle variant where a simulated detector supplies the finding, Outcome on the same task reaches 69-100% across the models evaluated, localising the bottleneck to perception rather than tool orchestration. Code, task generators, and scorers are released at https://github.com/Luab/ABRA
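The gap between Execution and Outcome is easier to see with a toy environment. The sketch below is entirely hypothetical: it does not use the OHIF or Orthanc APIs or ABRA's actual twenty-one tools and scorers (those are in the linked repository). A few viewer tools are dispatched from `(tool_name, kwargs)` calls. Execution is scored as the fraction of calls that ran without error, and Outcome as whether an annotation landed on the target finding. An agent can issue perfectly valid calls and still miss the lesion.

```python
from dataclasses import dataclass, field


@dataclass
class ViewerState:
    """Hypothetical minimal viewer state (not the OHIF data model)."""
    num_slices: int
    slice_index: int = 0
    window: tuple = (40, 400)  # (center, width) in HU; a common soft-tissue window
    annotations: list = field(default_factory=list)


def set_slice(state, index):
    if not 0 <= index < state.num_slices:
        raise ValueError(f"slice {index} out of range")
    state.slice_index = index


def set_window(state, center, width):
    if width <= 0:
        raise ValueError("window width must be positive")
    state.window = (center, width)


def annotate_point(state, x, y):
    state.annotations.append((state.slice_index, x, y))


TOOLS = {"set_slice": set_slice, "set_window": set_window, "annotate_point": annotate_point}


def run_episode(state, calls):
    """Execute (tool_name, kwargs) calls; return the fraction that ran without error."""
    ok = 0
    for name, kwargs in calls:
        try:
            TOOLS[name](state, **kwargs)  # KeyError: unknown tool; TypeError: bad arguments
            ok += 1
        except (KeyError, TypeError, ValueError):
            pass
    return ok / len(calls) if calls else 0.0


def outcome_hit(state, target, tol):
    """Outcome: did any annotation land on the target slice within `tol` pixels?"""
    ts, tx, ty = target
    return any(s == ts and (x - tx) ** 2 + (y - ty) ** 2 <= tol ** 2
               for s, x, y in state.annotations)
```

In the oracle variant described above, a simulated detector supplies the finding's location. Under this toy scoring, that is equivalent to handing the agent `target`, which isolates perception as the failure point.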

ARXIV Cancer: general cancer Method: Visual Question Answering

RadThinking: A Dataset for Longitudinal Clinical Reasoning in Radiology

Wenxuan Li, Pedro R. A. S. Bassi, Xinze Zhou, Jakob Wasserthal, Alan L. Yuille, Zongwei Zhou
Published 2026-05-11 15:57

The paper presents RadThinking, a Visual Question Answering (VQA) dataset that makes clinical reasoning for radiology-based cancer screening explicit and trainable. Its VQA pairs are stratified into three difficulty tiers, from atomic perception questions to compositional questions that require multi-step reasoning grounded in clinical reporting standards. The dataset comprises 20,362 CT scans from 9,131 patients across 43 cancer groups and aims to support systematic training and evaluation of AI systems that reason about cancer diagnoses.

Cancer screening is a reasoning task. A radiologist observes findings, compares them to prior scans, integrates clinical context, and reaches a diagnostic conclusion confirmed by pathology. We present RadThinking, a Visual Question Answering (VQA) dataset that makes this reasoning explicit and trainable. RadThinking releases VQA pairs at three difficulty tiers. Foundation VQAs are atomic perception questions. Single-step reasoning VQAs apply one clinical rule. Compositional VQAs require multi-step chain-of-thought to reach a guideline category such as LI-RADS-5. For every compositional VQA, we release the chain of foundation VQAs that solves it. The chain follows the rules of the governing clinical reporting standard. The dataset spans 20,362 CT scans from 9,131 patients across 43 cancer groups, plus 2,077 verified healthy controls with >1-year follow-up. To our knowledge, RadThinking is the first cancer-screening VQA corpus that stratifies questions by reasoning depth and grounds compositions in clinical reporting standards. The foundation tier supplies atomic perception supervision. The compositional tier supplies chain-of-thought data and verifiable rewards for reinforcement-learning recipes such as DeepSeek-R1 and OpenAI o1. RadThinking enables systematic training and evaluation of whether AI systems can reason about cancer, not merely detect it.
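Releasing each compositional VQA together with the chain of foundation VQAs that solves it suggests a simple data layout. It also explains why the final category can serve as a verifiable reward. The sketch below is a hypothetical schema, not RadThinking's release format. `toy_rule` is a made-up two-question rule standing in for a reporting standard; it is deliberately not LI-RADS or any real guideline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FoundationVQA:
    """An atomic perception question with its reference answer."""
    question: str
    answer: str


@dataclass
class CompositionalVQA:
    """A multi-step question whose answer follows from a chain of foundation VQAs."""
    question: str
    chain: List[FoundationVQA]
    rule: Callable[[Dict[str, str]], str]  # hypothetical stand-in for a reporting-standard rule
    answer: str

    def derive(self) -> str:
        """Re-derive the final answer by applying `rule` to the chain's answers."""
        facts = {step.question: step.answer for step in self.chain}
        return self.rule(facts)


def verifiable_reward(predicted: str, reference: str) -> float:
    """Binary reward: 1.0 for a case-insensitive exact match of the final category."""
    return 1.0 if predicted.strip().lower() == reference.strip().lower() else 0.0


def toy_rule(facts: Dict[str, str]) -> str:
    # Toy rule for illustration only; NOT LI-RADS or any real reporting standard.
    if facts["Is a lesion present?"] != "yes":
        return "CAT-0"
    return "CAT-2" if facts["Is the lesion larger than 20 mm?"] == "yes" else "CAT-1"


item = CompositionalVQA(
    question="Which toy category applies?",
    chain=[FoundationVQA("Is a lesion present?", "yes"),
           FoundationVQA("Is the lesion larger than 20 mm?", "no")],
    rule=toy_rule,
    answer="CAT-1",
)
```

Because `derive()` reproduces `answer` from the chain, each released chain doubles as a step-by-step supervision trace. The exact-match check on the final category is the kind of programmatic reward that reinforcement-learning recipes rely on.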

ARXIV Cancer: unknown Method: distributionally robust optimization

DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation

Yiqi Tian, Sangjoon Park, Bo Zeng, Pengfei Jin, Yujin Oh, Quanzheng Li
Published 2026-05-11 13:08

The paper presents DuetFair, a dual-axis fairness framework for medical image segmentation that targets both disparities between subgroups and intra-group hidden failures, i.e., high-loss cases masked by a subgroup's mean performance. Its instantiation, FairDRO, combines a distribution-aware mixture-of-experts with subgroup-conditioned distributionally robust optimization to adapt across subgroups while reducing hidden failures within each. Across three medical image segmentation benchmarks, FairDRO achieves the best equity-scaled performance on Harvard-FairSeg and improves worst-case subgroup performance.

Medical image segmentation models can perform unevenly across subgroups. Most existing fairness methods focus on improving average subgroup performance, implicitly treating each subgroup as internally homogeneous. However, this can hide difficult cases within a subgroup, where high-loss samples are obscured by the subgroup mean. We call this problem intra-group hidden failure. To address it, we propose DuetFair, a dual-axis fairness framework that jointly considers inter-subgroup adaptation and intra-subgroup robustness. Building on DuetFair, we introduce FairDRO, which combines a distribution-aware mixture-of-experts (dMoE) with subgroup-conditioned distributionally robust optimization (DRO) loss aggregation. This design allows the model to adapt across subgroups while also reducing hidden failures within each subgroup. We evaluate FairDRO on three medical image segmentation benchmarks with varying degrees of within-group heterogeneity. FairDRO achieves the best equity-scaled performance on Harvard-FairSeg and improves worst-case subgroup performance on HAM10000 under both age- and race-based grouping schemes. On the 3D radiotherapy target cohort, FairDRO further improves worst-group Dice over the strongest baseline by 3.5 points (+6.0%) under the tumor-stage grouping and by 4.1 points (+7.4%) under the institution grouping.
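Intra-group hidden failure is easy to reproduce numerically. Two subgroups can have almost the same mean loss while one of them contains a single very bad case. The sketch below shows one possible subgroup-conditioned DRO aggregation, which is an assumption rather than FairDRO's published loss. Within each subgroup, the mean is replaced by CVaR (the average of the worst α-fraction of per-sample losses, a standard DRO objective). Across subgroups, the per-group CVaRs are combined with soft-max weights that lean toward the worst group. The function names, `alpha`, and `temperature` are hypothetical, and the dMoE component is not shown.

```python
import numpy as np


def cvar(losses, alpha):
    """Mean of the worst ceil(alpha * n) losses (conditional value at risk at level alpha)."""
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]
    k = max(1, int(np.ceil(alpha * losses.size)))
    return float(losses[:k].mean())


def fair_dro_style_loss(losses, groups, alpha=0.2, temperature=1.0):
    """Hypothetical two-level aggregation of per-sample losses.

    1. Intra-group: summarize each subgroup by its CVaR instead of its mean,
       so a few high-loss samples cannot be averaged away.
    2. Inter-group: soft-max weight the per-group CVaRs toward the worst subgroup.
    Returns (scalar loss, per-group CVaR array in sorted group order).
    """
    losses = np.asarray(losses, dtype=float)
    groups = np.asarray(groups)
    per_group = np.array([cvar(losses[groups == g], alpha) for g in np.unique(groups)])
    w = np.exp((per_group - per_group.max()) / temperature)  # shift for numerical stability
    w /= w.sum()
    return float(np.dot(w, per_group)), per_group
```

For example, nine losses of 0.1 plus one of 2.0 in group 0 give a group mean of 0.29, nearly identical to group 1's uniform 0.3. Group 0's CVaR at α = 0.2 is 1.05, however, so the aggregated loss focuses training on the hidden failure.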

Find the papers that actually matter