Research Papers

ARXIV Cancer: general cancer Method: Cooperative Fine-Grained Refinement

Adapting SAM to Nuclei Instance Segmentation and Classification via Cooperative Fine-Grained Refinement

Jingze Su, Tianle Zhu, Jiaxin Cai, Zhiyi Wang, Qi Li, Xiao Zhang, Tong Tong, Shu Wang, Wenxi Liu
Published 2026-03-30 04:39

This paper presents Cooperative Fine-Grained Refinement, a parameter-efficient fine-tuning framework that adapts the Segment Anything Model (SAM) to nuclei instance segmentation in computational pathology. The method strengthens SAM's local perception and spatial detail preservation through a multi-scale adaptive local-aware adapter, a hierarchical modulated fusion module, and boundary-guided mask refinement. Together these components target two limitations of applying SAM to medical imaging: weak perception of local structure and the cost of full fine-tuning.

Nuclei instance segmentation is critical in computational pathology for cancer diagnosis and prognosis. Recently, the Segment Anything Model has demonstrated exceptional performance in various segmentation tasks, leveraging its rich priors and powerful global context modeling capabilities derived from large-scale pre-training on natural images. However, directly applying SAM to the medical imaging domain faces significant limitations: it lacks sufficient perception of the local structural features that are crucial for nuclei segmentation, and full fine-tuning for downstream tasks requires substantial computational costs. To efficiently transfer SAM's robust prior knowledge to nuclei instance segmentation while supplementing its task-aware local perception, we propose a parameter-efficient fine-tuning framework, named Cooperative Fine-Grained Refinement of SAM, consisting of three core components: 1) a Multi-scale Adaptive Local-aware Adapter, which enables effective capability transfer by augmenting the frozen SAM backbone with minimal parameters and instilling a powerful perception of local structures through dynamically generated, multi-scale convolutional kernels; 2) a Hierarchical Modulated Fusion Module, which dynamically aggregates multi-level encoder features to preserve fine-grained spatial details; and 3) a Boundary-Guided Mask Refinement, which integrates multi-context boundary cues with semantic features through explicit supervision, producing a boundary-focused signal to refine initial mask predictions for sharper delineation. These three components work cooperatively to enhance local perception, preserve spatial details, and refine boundaries, enabling SAM to perform accurate nuclei instance segmentation directly.

ARXIV Cancer: breast cancer Method: large language model

Greedy Is a Strong Default: Agents as Iterative Optimizers

Yitao Li
Published 2026-03-28 21:26

This study investigates using a large language model (LLM) agent as the proposer in classical optimization algorithms across four tasks, including rule-based classification for breast cancer. The results suggest that the LLM's learned prior is strong enough that more elaborate acceptance rules add little, making greedy hill climbing with early stopping a strong default. Beyond improving accuracy (test accuracy on Breast Cancer rose from 86.0% to 96.5%), the framework generates interpretable classification rules that align with established cytopathology principles.

Classical optimization algorithms--hill climbing, simulated annealing, population-based methods--generate candidate solutions via random perturbations. We replace the random proposal generator with an LLM agent that reasons about evaluation diagnostics to propose informed candidates, and ask: does the classical optimization machinery still help when the proposer is no longer random? We evaluate on four tasks spanning discrete, mixed, and continuous search spaces (all replicated across 3 independent runs): rule-based classification on Breast Cancer (test accuracy 86.0% to 96.5%), mixed hyperparameter optimization for MobileNetV3-Small on STL-10 (84.5% to 85.8%, zero catastrophic failures vs. 60% for random search), LoRA fine-tuning of Qwen2.5-0.5B on SST-2 (89.5% to 92.7%, matching Optuna TPE with 2x efficiency), and XGBoost on Adult Census (AUC 0.9297 to 0.9317, tying CMA-ES with 3x fewer evaluations). Empirically, on these tasks: a cross-task ablation shows that simulated annealing, parallel investigators, and even a second LLM model (OpenAI Codex) provide no benefit over greedy hill climbing while requiring 2-3x more evaluations. In our setting, the LLM's learned prior appears strong enough that acceptance-rule sophistication has limited impact--round 1 alone delivers the majority of improvement, and variants converge to similar configurations across strategies. The practical implication is surprising simplicity: greedy hill climbing with early stopping is a strong default. Beyond accuracy, the framework produces human-interpretable artifacts--the discovered cancer classification rules independently recapitulate established cytopathology principles.
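The greedy hill climbing with early stopping that the abstract recommends can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `greedy_hill_climb`, `propose`, `evaluate`, `max_rounds`, and `patience` are hypothetical, and the proposer is a plain callable standing in for the LLM agent.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def greedy_hill_climb(
    initial: T,
    propose: Callable[[T, float], T],
    evaluate: Callable[[T], float],
    max_rounds: int = 20,
    patience: int = 3,
) -> Tuple[T, float, int]:
    """Keep a candidate only if it strictly improves the score.

    `propose` stands in for the LLM agent (an assumption for illustration):
    it receives the current best solution and its score and returns a new
    candidate. The search stops after `patience` consecutive rounds without
    improvement, or after `max_rounds` rounds.
    """
    best, best_score = initial, evaluate(initial)
    evaluations = 1
    stale_rounds = 0
    for _ in range(max_rounds):
        candidate = propose(best, best_score)
        score = evaluate(candidate)
        evaluations += 1
        if score > best_score:
            # Greedy acceptance: no probabilistic acceptance of worse moves.
            best, best_score = candidate, score
            stale_rounds = 0
        else:
            stale_rounds += 1
            if stale_rounds >= patience:
                break  # early stopping
    return best, best_score, evaluations


if __name__ == "__main__":
    # Toy objective with its maximum at x = 3; the proposer always steps by +1.
    result = greedy_hill_climb(
        initial=0,
        propose=lambda x, _score: x + 1,
        evaluate=lambda x: -((x - 3) ** 2),
    )
    print(result)
```

Simulated annealing would differ only in the acceptance rule (sometimes accepting a worse candidate), which is the part the abstract reports as adding no benefit in its setting.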

ARXIV Cancer: general cancer Method: case transformer

MOOZY: A Patient-First Foundation Model for Computational Pathology

Yousef Kotp, Vincent Quoc-Huy Trinh, Christopher Pal, Mahdi S. Hosseini
Published 2026-03-27 23:33

This paper presents MOOZY, a patient-first foundation model for computational pathology built on whole-slide images (WSIs). The model treats the patient case, not the individual slide, as the core unit of representation and models dependencies across multiple slides from the same patient. MOOZY combines multi-stage open self-supervision with low-cost task supervision and achieves best or tied-best performance on most metrics across eight held-out tasks, with macro-average gains over TITAN and PRISM. The results indicate that patient-level pretraining can yield transferable embeddings for histopathology applications.

Computational pathology needs whole-slide image (WSI) foundation models that transfer across diverse clinical tasks, yet current approaches remain largely slide-centric, often depend on private data and expensive paired-report supervision, and do not explicitly model relationships among multiple slides from the same patient. We present MOOZY, a patient-first pathology foundation model in which the patient case, not the individual slide, is the core unit of representation. MOOZY explicitly models dependencies across all slides from the same patient via a case transformer during pretraining, combining multi-stage open self-supervision with scaled low-cost task supervision. In Stage 1, we pretrain a vision-only slide encoder on 77,134 public slide feature grids using masked self-distillation. In Stage 2, we align these representations with clinical semantics using a case transformer and multi-task supervision over 333 tasks from 56 public datasets, including 205 classification and 128 survival tasks across four endpoints. Across eight held-out tasks with five-fold frozen-feature probe evaluation, MOOZY achieves best or tied-best performance on most metrics and improves macro averages over TITAN by +7.37%, +5.50%, and +7.83% and over PRISM by +8.83%, +10.70%, and +9.78% for weighted F1, weighted ROC-AUC, and balanced accuracy, respectively. MOOZY is also parameter efficient with 85.77M parameters, 14x smaller than GigaPath. These results demonstrate that open, reproducible patient-level pretraining yields transferable embeddings, providing a practical path toward scalable patient-first histopathology foundation models.

ARXIV Cancer: general cancer Method: Immune Synapse Encoding Transformer

ImmSET: Sequence-Based Predictor of TCR-pMHC Specificity at Scale

Marco Garcia Noceda, Matthew T Noakes, Andrew FigPope, Daniel E Mattox, Bryan Howie, Harlan Robins
Published 2026-03-27 21:08

This paper introduces ImmSET, a novel sequence-based architecture designed to predict the specificity of T cell receptors (TCRs) for their cognate peptide-major histocompatibility complexes (pMHCs). The model is trained across a range of dataset sizes and compositions and remains robust under a stricter evaluation that avoids a failure mode inflating previously reported results. Given sufficient training data, ImmSET can outperform AlphaFold2- and AlphaFold3-based pipelines at predicting TCR-pMHC specificity, which positions it as a scalable approach to multi-sequence interaction problems.

T cells are a critical component of the adaptive immune system, playing a role in infectious disease, autoimmunity, and cancer. T cell function is mediated by the T cell receptor (TCR) protein, a highly diverse receptor targeting specific peptides presented by the major histocompatibility complex (pMHCs). Predicting the specificity of TCRs for their cognate pMHCs is central to understanding adaptive immunity and enabling personalized therapies. However, accurate prediction of this protein-protein interaction remains challenging due to the extreme diversity of both TCRs and pMHCs. Here, we present ImmSET (Immune Synapse Encoding Transformer), a novel sequence-based architecture designed to model interactions among sets of variable-length biological sequences. We train this model across a range of dataset sizes and compositions and study the resulting models' generalization to pMHC targets. We describe a failure mode in prior sequence-based approaches that inflates previously reported performance on this task and show that ImmSET remains robust under stricter evaluation. In systematically testing the scaling behavior of ImmSET with training data, we show that performance scales consistently with data volume across multiple data types and compares favorably with the pre-trained protein language model ESM2 fine-tuned on the same datasets. Finally, we demonstrate that ImmSET can outperform AlphaFold2 and AlphaFold3-based pipelines on TCR-pMHC specificity prediction when provided sufficient training data. This work establishes ImmSET as a scalable modeling paradigm for multi-sequence interaction problems, demonstrated in the TCR-pMHC setting but generalizable to other biological domains where high-throughput sequence-driven reasoning complements structure prediction and experimental mapping.

ARXIV Cancer: general cancer Method: graph-attention deep learning

DPD-Cancer: Explainable Graph-Based Deep Learning for Small Molecule Anti-Cancer Activity Prediction

Magnus H. Strømme, Alex G. C. de Sá, David B. Ascher
Published 2026-03-27 07:00

DPD-Cancer is a graph-attention deep learning framework designed to predict small-molecule anti-cancer activity across the NCI-60 panel. The model demonstrated strong performance with an AUROC of 0.87 and an AUPRC of 0.73 on the hold-out test set. Additionally, it provided reliable predictions for pGI50-values with a median Pearson's R of 0.64. The framework is accessible as a free web server to promote its use in the research community.

DPD-Cancer is a graph-attention deep learning framework for predicting small-molecule anti-cancer activity across the NCI-60 panel, trained and evaluated under a strict chemistry-aware data-partitioning scheme. On the hold-out test set, the classifier achieved an Area Under the Receiver Operating Characteristic Curve (AUROC) of 0.87 (95% CI [0.86, 0.88]) and Area Under the Precision-Recall Curve (AUPRC) of 0.73 (95% CI [0.70, 0.76]); per-cell-line regression models for 73 cell lines produced a median Pearson's Correlation Coefficient (Pearson's R) of 0.64 and median Root Mean Squared Error (RMSE) of 0.67 for pGI50-value prediction. Benchmarks against pdCSM-Cancer, MLASM, and ACLPred under matched data conditions yielded consistently higher Matthews Correlation Coefficient (MCC) scores, an occlusion-based attribution analysis confirmed that model explanations were quantitatively faithful to classifier decisions, and an applicability-domain analysis characterised reliability as a function of chemical distance. To facilitate widespread adoption, DPD-Cancer is available as a free, user-friendly web server for unrestricted use at https://biosig.lab.uq.edu.au/dpd_cancer/.
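For readers unfamiliar with the MCC used in the benchmark comparison, the sketch below computes it from the four cells of a binary confusion matrix. The function name and the example counts are illustrative assumptions and are not taken from the paper; returning 0.0 when the denominator is zero is a common convention, not something the abstract specifies.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient for a binary classifier.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Convention (an assumption here): undefined cases are reported as 0.
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced active/inactive screen.
    print(matthews_corrcoef(tp=40, tn=900, fp=30, fn=30))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, which makes it a reasonable summary when active compounds are much rarer than inactive ones.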

ARXIV Cancer: general cancer Method: Graph Attention Transformer

DPD-Cancer: Explainable Graph-based Deep Learning for Small Molecule Anti-Cancer Activity Prediction

Magnus H. Strømme, Alex G. C. de Sá, David B. Ascher
Published 2026-03-27 07:00

The paper presents DPD-Cancer, a deep learning method built on a Graph Attention Transformer framework for predicting small-molecule anti-cancer activity. It targets the difficulty that tumour heterogeneity and genomic variability pose for drug response prediction by classifying anti-cancer activity and predicting cell-line-specific growth inhibition concentrations (pGI50). Benchmarked against pdCSM-cancer, ACLPred, and MLASM, the method reached an AUC of up to 0.87 on strictly partitioned NCI60 data and Pearson's correlation coefficients of up to 0.72 for pGI50 prediction, and it uses attention weights to highlight the molecular substructures behind its predictions.

Accurate drug response prediction is a critical bottleneck in computational biochemistry, limited by the challenge of modelling the interplay between molecular structure and cellular context. In cancer research, this is acute due to tumour heterogeneity and genomic variability, which hinder the identification of effective therapies. Conventional approaches often fail to capture non-linear relationships between chemical features and biological outcomes across diverse cell lines. To address this, we introduce DPD-Cancer, a deep learning method based on a Graph Attention Transformer (GAT) framework. It is designed for small molecule anti-cancer activity classification and the quantitative prediction of cell-line specific responses, specifically growth inhibition concentration (pGI50). Benchmarked against state-of-the-art methods (pdCSM-cancer, ACLPred, and MLASM), DPD-Cancer demonstrated superior performance, achieving an Area Under ROC Curve (AUC) of up to 0.87 on strictly partitioned NCI60 data and up to 0.98 on ACLPred/MLASM datasets. For pGI50 prediction across 10 cancer types and 73 cell lines, the model achieved Pearson's correlation coefficients of up to 0.72 on independent test sets. These findings confirm that attention-based mechanisms offer significant advantages in extracting meaningful molecular representations, establishing DPD-Cancer as a competitive tool for prioritising drug candidates. Furthermore, DPD-Cancer provides explainability by leveraging the attention mechanism to identify and visualise specific molecular substructures, offering actionable insights for lead optimisation. DPD-Cancer is freely available as a web server at: https://biosig.lab.uq.edu.au/dpd_cancer/.

ARXIV Cancer: general cancer Method: modality-specific representation-aware transformer

MUST: Modality-Specific Representation-Aware Transformer for Diffusion-Enhanced Survival Prediction with Missing Modality

Kyungwon Kim, Dosik Hwang
Published 2026-03-27 04:56

This paper presents MUST, a novel framework designed for accurate survival prediction from multimodal medical data in precision oncology. The method addresses the challenge of incomplete modalities by explicitly modeling the unique contributions of each modality. Through a decomposition of modality representations and the use of conditional latent diffusion models, MUST achieves state-of-the-art performance on TCGA cancer datasets, even under conditions of missing data.

Accurate survival prediction from multimodal medical data is essential for precision oncology, yet clinical deployment faces a persistent challenge: modalities are frequently incomplete due to cost constraints, technical limitations, or retrospective data availability. While recent methods attempt to address missing modalities through feature alignment or joint distribution learning, they fundamentally lack explicit modeling of the unique contributions of each modality as opposed to the information derivable from other modalities. We propose MUST (Modality-Specific representation-aware Transformer), a novel framework that explicitly decomposes each modality's representation into modality-specific and cross-modal contextualized components through algebraic constraints in a learned low-rank shared subspace. This decomposition enables precise identification of what information is lost when a modality is absent. For the truly modality-specific information that cannot be inferred from available modalities, we employ conditional latent diffusion models to generate high-quality representations conditioned on recovered shared information and learned structural priors. Extensive experiments on five TCGA cancer datasets demonstrate that MUST achieves state-of-the-art performance with complete data while maintaining robust predictions in both missing pathology and missing genomics conditions, with clinically acceptable inference latency.

ARXIV Cancer: unknown Method: generative diffusion framework

Central-to-Local Adaptive Generative Diffusion Framework for Improving Gene Expression Prediction in Data-Limited Spatial Transcriptomics

Yaoyu Fang, Jiahe Qian, Xinkun Wang, Lee A. Cooper, Bo Zhou
Published 2026-03-27 02:32

This paper presents a Central-to-Local adaptive generative diffusion framework (C2L-ST) aimed at improving gene expression prediction in spatial transcriptomics, particularly under data-limited conditions. The method involves pretraining a global model on extensive histopathology datasets and adapting local models using limited molecular guidance. The results demonstrate that the generated images maintain high fidelity and improve prediction accuracy, achieving performance comparable to real data while utilizing fewer samples.

Spatial Transcriptomics (ST) provides spatially resolved gene expression profiles within intact tissue architecture, enabling molecular analysis in histological context. However, the high cost, limited throughput, and restricted data sharing of ST experiments result in severe data scarcity, constraining the development of robust computational models. To address this limitation, we present a Central-to-Local adaptive generative diffusion framework for ST (C2L-ST) that integrates large-scale morphological priors with limited molecular guidance. A global central model is first pretrained on extensive histopathology datasets to learn transferable morphological representations, and institution-specific local models are then adapted through lightweight gene-conditioned modulation using a small number of paired image-gene spots. This strategy enables the synthesis of realistic and molecularly consistent histology patches under data-limited conditions. The generated images exhibit high visual and structural fidelity, reproduce cellular composition, and show strong embedding overlap with real data across multiple organs, reflecting both realism and diversity. When incorporated into downstream training, synthetic image-gene pairs improve gene expression prediction accuracy and spatial coherence, achieving performance comparable to real data while requiring only a fraction of sampled spots. C2L-ST provides a scalable and data-efficient framework for molecular-level data augmentation, offering a domain-adaptive and generalizable approach for integrating histology and transcriptomics in spatial biology and related fields.

ARXIV Cancer: general cancer Method: parameter-efficient fine-tuning

FairLLaVA: Fairness-Aware Parameter-Efficient Fine-Tuning for Large Vision-Language Assistants

Mahesh Bhosale, Abdul Wasi, Shantam Srivastava, Shifa Latif, Tianyu Luan, Mingchen Gao, David Doermann, Xuan Gong
Published 2026-03-27 01:55

This paper presents FairLLaVA, a parameter-efficient fine-tuning method designed to address fairness disparities in multimodal large language models (MLLMs) used in clinical settings. The method aims to reduce group disparities in visual instruction tuning while maintaining overall performance. Extensive experiments demonstrate that FairLLaVA improves equity-scaled clinical performance and the quality of natural language generation across various medical imaging tasks.

While powerful in image-conditioned generation, multimodal large language models (MLLMs) can display uneven performance across demographic groups, highlighting fairness risks. In safety-critical clinical settings, such disparities risk producing unequal diagnostic narratives and eroding trust in AI-assisted decision-making. While fairness has been studied extensively in vision-only and language-only models, its impact on MLLMs remains largely underexplored. To address these biases, we introduce FairLLaVA, a parameter-efficient fine-tuning method that mitigates group disparities in visual instruction tuning without compromising overall performance. By minimizing the mutual information between target attributes, FairLLaVA regularizes the model's representations to be demographic-invariant. The method can be incorporated as a lightweight plug-in, maintaining efficiency with low-rank adapter fine-tuning, and provides an architecture-agnostic approach to fair visual instruction following. Extensive experiments on large-scale chest radiology report generation and dermoscopy visual question answering benchmarks show that FairLLaVA consistently reduces inter-group disparities while improving both equity-scaled clinical performance and natural language generation quality across diverse medical imaging modalities. Code can be accessed at https://github.com/bhosalems/FairLLaVA.
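The low-rank adapter fine-tuning that the abstract builds on can be illustrated with a generic LoRA forward pass. This is a sketch of the standard technique only, written in plain Python for self-containment; it is not FairLLaVA's code, it does not include the mutual-information regularizer, and all names (`lora_forward`, `w`, `a`, `b`, `alpha`) are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(weight * x for weight, x in zip(row, v)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector, alpha: float) -> Vector:
    """Generic LoRA layer: y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight. Only A (r x d_in)
    and B (d_out x r) are trained, so the number of trainable parameters
    scales with the rank r rather than with d_out * d_in.
    """
    rank = len(a)
    base = matvec(w, x)               # frozen path
    update = matvec(b, matvec(a, x))  # low-rank trainable path
    scale = alpha / rank
    return [y + scale * u for y, u in zip(base, update)]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 weight
    a = [[1.0, 1.0]]              # rank-1 down-projection
    b_zero = [[0.0], [0.0]]       # B starts at zero, so the adapter is a no-op
    print(lora_forward(w, a, b_zero, [2.0, 3.0], alpha=2.0))
```

Initializing B to zero means the adapted layer reproduces the pretrained layer exactly before training starts; a fairness regularizer such as the one the abstract describes would act on the training loss rather than on this forward pass.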

ARXIV Cancer: general cancer Method: large language model

Clinical Reasoning AI for Oncology Treatment Planning: A Multi-Specialty Case-Based Evaluation

Philippe E. Spiess, Md Muntasir Zitu, Alison Walker, Daniel A. Anaya, Robert M. Wenham, Michael Vogelbaum, Daniel Grass, Ali-Musa Jaffer, Amod Sarnaik, Caitlin McMullen, Christine Sam, John V. Kiluk, Tianshi Liu, Tiago Biachi, Julio Powsang, Jing-Yi Chern, Roger Li, Seth Felder, Samuel Reynolds, Michael Shafique, Alison Sheehan, Ashley Layman, Cydney A. Warfield, Derrick Legoas, Jaclyn Parrinello, Jena Schmitz, Kevin Eaton, Mark Honor, Luis Felipe, Issam ElNaqa, Elier Delgado, Talia Berler, Rachael V. Phillips, Frantz Francisque, Carlos Garcia Fernandez, Gilmer Valdes
Published 2026-03-27 00:26

This study evaluates OncoBrain, an AI clinical reasoning platform designed to assist in oncology treatment planning. The platform integrates general-purpose large language models with a cancer-specific graph retrieval-augmented generation layer and a treatment-plan corpus. Clinician evaluations of 173 cases across various cancer types indicated high ratings for scientific accuracy and safety, suggesting the platform's potential to enhance treatment planning in community settings.

Background: More than 80% of U.S. cancer care is delivered in community settings, where survival remains worse than at academic centers. Clinicians must integrate genomics, staging, radiology, pathology, and changing guidelines, creating cognitive burden. We evaluated OncoBrain, an AI clinical reasoning platform for oncology treatment-plan generation, as an early step toward OGI. Methods: OncoBrain combines general-purpose LLMs with a cancer-specific graph retrieval-augmented generation layer, a gold-standard treatment-plan corpus as long-term memory, and a model-agnostic safety layer (CHECK) for hallucination detection and suppression. We evaluated clinician-enriched case summaries across gynecologic, genitourinary, neuro-oncology, gastrointestinal/hepatobiliary, and hematologic malignancies. Three clinician groups completed structured evaluations of 173 cases using a common 16-item instrument: subspecialist oncologists reviewed 50 cases, physician reviewers 78, and advanced practice providers 45. Results: Ratings were highest for scientific accuracy, evidence support, and safety, with lower but favorable scores for workflow integration and time savings. On a 5-point scale, mean alignment with evidence and guidelines was 4.60, 4.56, and 4.70 across subspecialists, physician reviewers, and advanced practice providers. Mean scores for absence of safety or misinformation concerns were 4.80, 4.40, and 4.60. Workflow integration averaged 4.50, 3.94, and 4.00; perceived time savings averaged 5.00, 3.89, and 3.60. Conclusions: In this multi-specialty vignette-based evaluation, OncoBrain generated oncology treatment plans judged guideline-concordant, clinically acceptable, and easy to supervise. These findings support the potential of a carefully engineered AI reasoning platform to assist oncology treatment planning and justify prospective real-world evaluation in community settings.
