Research Papers

ARXIV Cancer: skin cancer Method: convolutional neural network

Deep Learning for Dermatology: An Innovative Framework for Approaching Precise Skin Cancer Detection

Mohammad Tahmid Noor, B. M. Shahria Alam, Tasmiah Rahman Orpa, Shaila Afroz Anika, Mahjabin Tasnim Samiha, Fahad Ahammed
Published 2026-02-19 19:59

This paper investigates the application of two deep learning models, VGG16 and DenseNet201, for the differentiation of benign and malignant skin lesions in dermatological diagnostics. The study evaluates the models' accuracy and computational efficiency using a dataset of 3297 images, achieving a maximum accuracy of 93.79% with the DenseNet201 model. The findings suggest that these models could significantly aid in the early detection and diagnosis of skin cancer.

Skin cancer is a prevalent yet preventable disease that can be life-threatening if not diagnosed early. Globally, skin cancer is among the most prevalent cancers, and millions of people are diagnosed with it each year. This paper investigates the application of two prominent deep learning models, VGG16 and DenseNet201, to the classification of benign and malignant skin lesions, an area of critical importance in dermatological diagnostics. We evaluate these CNN architectures for their efficacy in differentiating benign from malignant skin lesions, leveraging advances in deep learning applied to skin cancer detection. Our objective is to assess model accuracy and computational efficiency, offering insights into how these models could assist in early detection, diagnosis, and streamlined workflows in dermatology. We applied DenseNet201 and VGG16 to a binary-class dataset containing 3297 images. All images were resized to 224x224 by rescaling. The best result, an accuracy of 93.79%, was achieved by DenseNet201. Although both models provide excellent accuracy, there is still room for improvement. In future work, we intend to improve accuracy further using new datasets.
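A minimal sketch of the preprocessing and evaluation steps mentioned above, assuming nearest-neighbour resizing to 224x224 and the common 1/255 pixel scaling; the abstract does not say which resizing method or scaling the authors used, and `resize_nearest`, `rescale`, and `accuracy` are hypothetical helper names, not the paper's code.

```python
# Hypothetical preprocessing sketch: nearest-neighbour resizing and 1/255
# pixel scaling are assumptions, not the authors' published pipeline.

TARGET = 224  # input side length reported in the abstract


def resize_nearest(img, out_h=TARGET, out_w=TARGET):
    """Resize a 2-D grid of RGB tuples with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def rescale(img):
    """Map 8-bit channel values in [0, 255] to floats in [0, 1]."""
    return [[tuple(ch / 255.0 for ch in px) for px in row] for row in img]


def accuracy(y_true, y_pred):
    """Fraction of correct binary predictions (benign=0, malignant=1)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The rescaled 224x224 grid is the shape that ImageNet-pretrained backbones such as VGG16 and DenseNet201 conventionally expect; the model itself is out of scope for this sketch.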

ARXIV Cancer: breast cancer Method: Latent-Guided Dual-Stream Network

LGD-Net: Latent-Guided Dual-Stream Network for HER2 Scoring with Task-Specific Domain Knowledge

Peide Zhu, Linbin Lu, Zhiqin Chen, Xiong Chen
Published 2026-02-19 19:51

The study presents LGD-Net, a novel framework designed for accurate HER2 scoring in breast cancer by predicting HER2 levels directly from H&E slides. This method addresses the limitations of traditional multi-step Immunohistochemistry (IHC) staining, which is resource-intensive and often unavailable. By employing cross-modal feature hallucination and incorporating task-specific domain knowledge, LGD-Net achieves state-of-the-art performance on the BCI dataset, demonstrating efficient inference with single-modality inputs.

Accurately evaluating HER2 expression levels is critical for breast cancer assessment and the selection of targeted therapy. However, standard multi-step Immunohistochemistry (IHC) staining is resource-intensive, expensive, and time-consuming, and it is often unavailable in many areas. Consequently, predicting HER2 levels directly from H&E slides has emerged as a potential alternative. Generating virtual IHC images from H&E images has been shown to be effective for automatic HER2 scoring. However, pixel-level virtual staining methods are computationally expensive and prone to reconstruction artifacts that can propagate diagnostic errors. To address these limitations, we propose the Latent-Guided Dual-Stream Network (LGD-Net), a novel framework that employs cross-modal feature hallucination instead of explicit pixel-level image generation. LGD-Net learns to map morphological H&E features directly to the molecular latent space, guided by a teacher IHC encoder during training. To ensure the hallucinated features capture clinically relevant phenotypes, we explicitly regularize model training with task-specific domain knowledge, specifically nuclei distribution and membrane staining intensity, via lightweight auxiliary regularization tasks. Extensive experiments on the public BCI dataset demonstrate that LGD-Net achieves state-of-the-art performance, significantly outperforming baseline methods while enabling efficient inference using single-modality H&E inputs.
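The training objective described above combines HER2 scoring, alignment of hallucinated H&E features with a teacher IHC encoder, and auxiliary domain-knowledge tasks. A minimal sketch of such a combined loss follows, assuming mean-squared error for the latent and auxiliary terms, softmax cross-entropy for the score, and hypothetical weights `w_latent`, `w_nuclei`, `w_membrane`; the paper's exact loss forms are not given in the abstract.

```python
import math

# Hypothetical latent-guided objective in the spirit of LGD-Net. Loss forms
# and weights are illustrative assumptions, not the published formulation.


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (log-sum-exp stabilised)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def lgd_style_loss(hallucinated, teacher_latent, her2_logits, her2_label,
                   nuclei_pred, nuclei_target, membrane_pred, membrane_target,
                   w_latent=1.0, w_nuclei=0.1, w_membrane=0.1):
    """Weighted sum of scoring, latent-alignment and auxiliary losses."""
    l_cls = cross_entropy(her2_logits, her2_label)  # HER2 score class
    l_lat = mse(hallucinated, teacher_latent)        # match teacher IHC latent
    l_nuc = mse(nuclei_pred, nuclei_target)          # nuclei distribution
    l_mem = mse(membrane_pred, membrane_target)      # membrane intensity
    return l_cls + w_latent * l_lat + w_nuclei * l_nuc + w_membrane * l_mem
```

The teacher latent is only needed during training; at inference the H&E stream alone produces the hallucinated features, which is what allows single-modality inputs.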

ARXIV Cancer: general cancer Method: active learning

Adapting Actively on the Fly: Relevance-Guided Online Meta-Learning with Latent Concepts for Geospatial Discovery

Jowaria Khan, Anindya Sarkar, Yevgeniy Vorobeychik, Elizabeth Bondi-Kelly
Published 2026-02-19 18:30

This paper presents a unified geospatial discovery framework that combines active learning, online meta-learning, and concept-guided reasoning to efficiently uncover hidden targets in dynamic environments. The proposed method introduces innovations such as a concept-weighted uncertainty sampling strategy and a relevance-aware meta-batch formation strategy. Experiments demonstrate the framework's effectiveness in identifying cancer-causing PFAS contamination with limited data.

In many real-world settings, such as environmental monitoring, disaster response, or public health, with costly and difficult data collection and dynamic environments, strategically sampling from unobserved regions is essential for efficiently uncovering hidden targets under tight resource constraints. Yet, sparse and biased geospatial ground truth limits the applicability of existing learning-based methods, such as reinforcement learning. To address this, we propose a unified geospatial discovery framework that integrates active learning, online meta-learning, and concept-guided reasoning. Our approach introduces two key innovations built on a shared notion of *concept relevance*, which captures how domain-specific factors influence target presence: a *concept-weighted uncertainty sampling strategy*, where uncertainty is modulated by learned relevance based on readily-available domain-specific concepts (e.g., land cover, source proximity); and a *relevance-aware meta-batch formation strategy* that promotes semantic diversity during online-meta updates, improving generalization in dynamic environments. Our experiments include testing on a real-world dataset of cancer-causing PFAS (Per- and polyfluoroalkyl substances) contamination, showcasing our method's reliability at uncovering targets with limited data and a varying environment.
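A minimal sketch of concept-weighted uncertainty sampling as described above, assuming that uncertainty is the binary entropy of the predicted target probability and that relevance is a sigmoid of a linear combination of concept values (for example land cover or source proximity). The paper learns relevance online; here the weights are simply passed in, and all function names are hypothetical.

```python
import math

# Hypothetical concept-weighted uncertainty sampling. The entropy-based
# uncertainty and sigmoid-linear relevance are assumptions for illustration.


def binary_entropy(p):
    """Entropy (nats) of a Bernoulli prediction; zero at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def relevance(concepts, weights, bias=0.0):
    """Squash a weighted sum of concept values into (0, 1)."""
    z = bias + sum(w * c for w, c in zip(weights, concepts))
    return 1.0 / (1.0 + math.exp(-z))


def select_next(candidates, weights, budget=1):
    """candidates: list of (location_id, predicted_prob, concept_vector).
    Returns the `budget` locations with the highest relevance-weighted
    uncertainty."""
    scored = [
        (binary_entropy(p) * relevance(c, weights), loc)
        for loc, p, c in candidates
    ]
    scored.sort(reverse=True)
    return [loc for _, loc in scored[:budget]]
```

For two equally uncertain locations, the one whose concepts are judged more relevant is queried first, while a confidently predicted location is deprioritised even if its concepts are relevant.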

ARXIV Cancer: liver cancer Method: non-rigid iterative closest point

FoundationPose-Initialized 3D-2D Liver Registration for Surgical Augmented Reality

Hanyuan Zhang, Lucas He, Runlong He, Abdolrahim Kadkhodamohammadi, Danail Stoyanov, Brian R. Davidson, Evangelos B. Mazomenos, Matthew J. Clarkson
Published 2026-02-19 16:31

This study presents a novel approach to improve tumor localization in laparoscopic liver surgery using augmented reality. The authors integrate laparoscopic depth maps with a foundation pose estimator for camera-liver pose estimation and utilize non-rigid iterative closest point (NICP) for deformation. The proposed method achieved a mean registration error of 9.91 mm on real patient data from 3 cases, indicating its potential as an efficient alternative to traditional finite-element models.

Augmented reality can improve tumor localization in laparoscopic liver surgery. Existing registration pipelines typically depend on organ contours; deformable (non-rigid) alignment is often handled with finite-element (FE) models coupled to dimensionality-reduction or machine-learning components. We integrate laparoscopic depth maps with a foundation pose estimator for camera-liver pose estimation and replace FE-based deformation with non-rigid iterative closest point (NICP) to lower engineering/modeling complexity and expertise requirements. On real patient data, the depth-augmented foundation pose approach achieved 9.91 mm mean registration error in 3 cases. Combined rigid-NICP registration outperformed rigid-only registration, demonstrating NICP as an efficient substitute for finite-element deformable models. This pipeline achieves clinically relevant accuracy while offering a lightweight, engineering-friendly alternative to FE-based deformation.
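To illustrate what replacing FE-based deformation with NICP involves, here is a deliberately simplified non-rigid ICP loop. It is an assumption-laden sketch, not the NICP variant used in the paper: each iteration pairs every model vertex with its closest target point, smooths the per-vertex displacements over a given neighbour graph so adjacent vertices move coherently, and takes a damped step. The `smooth` and `step` parameters and all function names are hypothetical.

```python
import math

# Simplified non-rigid ICP sketch (assumption: not the paper's exact NICP).
# Closest points are found by brute force; real pipelines use a k-d tree.


def closest(p, targets):
    """Nearest target point to p."""
    return min(targets, key=lambda q: math.dist(p, q))


def nicp_step(verts, targets, neighbours, smooth=0.5, step=0.5):
    # 1) Raw displacement from each vertex to its closest target point.
    disp = []
    for v in verts:
        q = closest(v, targets)
        disp.append(tuple(qi - vi for qi, vi in zip(q, v)))
    # 2) Blend each displacement with the mean of its graph neighbours.
    smoothed = []
    for i, d in enumerate(disp):
        nbrs = neighbours[i]
        if nbrs:
            avg = tuple(sum(disp[j][k] for j in nbrs) / len(nbrs) for k in range(3))
        else:
            avg = d
        smoothed.append(tuple((1 - smooth) * dk + smooth * ak for dk, ak in zip(d, avg)))
    # 3) Damped update of vertex positions.
    return [tuple(vk + step * sk for vk, sk in zip(v, s)) for v, s in zip(verts, smoothed)]


def mean_residual(verts, targets):
    """Mean closest-point distance (a surface residual, not target TRE)."""
    return sum(math.dist(v, closest(v, targets)) for v in verts) / len(verts)


def nicp(verts, targets, neighbours, iters=20, **kw):
    """Run a fixed number of non-rigid ICP iterations."""
    for _ in range(iters):
        verts = nicp_step(verts, targets, neighbours, **kw)
    return verts
```

In a combined rigid-NICP pipeline, a rigid pose estimate would be applied to the model before this loop, so that closest-point pairing starts from a sensible alignment.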

ARXIV Cancer: liver cancer Method: non-rigid ICP

Depth Augmented and FE Free 3D/2D Liver Registration for Laparoscopic Liver AR

Hanyuan Zhang, Lucas He, Runlong He, Weixi Yi, Abdolrahim Kadkhodamohammadi, Danail Stoyanov, Brian R. Davidson, Evangelos B. Mazomenos, Matthew J. Clarkson
Published 2026-02-19 16:31

This paper presents a novel depth-augmented, finite-element (FE)-free registration pipeline for laparoscopic liver surgery that aims to improve the accuracy of aligning preoperative 3D models with intraoperative 2D video. The method combines robust rigid initialization with patient-specific non-rigid refinement, utilizing a modified RefineNet module and a statistical deformation model. On a public clinical dataset, under a controlled manual-contour setting, the method achieves a mean target registration error of 14.73 mm, indicating the effectiveness of the proposed approach for surgical augmented reality applications.

Augmented reality (AR) guidance in laparoscopic liver surgery requires accurate registration of preoperative 3D models to intraoperative 2D video, but remains challenging due to partial visibility, specularities, and tissue deformation. Existing methods often rely on contour-based rigid initialization and finite-element (FE) models for deformable registration, increasing modeling and engineering complexity. We present a depth-augmented, FE-free 3D-2D registration pipeline that combines robust rigid initialization with patient-specific non-rigid refinement. For rigid alignment, we adapt the RefineNet module of FoundationPose to laparoscopic liver scenes by using multi-class contour maps and monocular depth for relative pose refinement. For deformable alignment, we construct a patient-specific statistical deformation model from non-rigid ICP (NICP) correspondences and optimize pose and shape parameters using a coarse-to-fine L-BFGS-B strategy. On a public clinical laparoscopic liver dataset, the proposed method achieves a mean target registration error (TRE) of 14.73 mm under a controlled manual-contour setting designed to isolate registration performance. Ablation studies show that monocular depth improves rigid initialization over contour-only inputs, while tumor-mapping analysis indicates that good surface alignment does not necessarily translate into lower target localization error. On an external dataset without ground truth, the method produces visually plausible overlays for qualitative assessment. These results suggest that depth-augmented pose refinement and FE-free statistical deformation modeling provide a promising alternative to FE-based pipelines for controlled 3D-2D liver registration in surgical AR.
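A statistical deformation model expresses a patient's liver shape as a mean shape plus a weighted sum of deformation modes, and TRE is the mean distance between predicted and true target positions. The sketch below shows only shape instantiation and the TRE metric; it assumes modes are already scaled (for example by the square root of their PCA eigenvalues) and clips coefficients to a hypothetical box of plus or minus 3, mimicking the bounds an L-BFGS-B optimizer would enforce. Building the modes from NICP correspondences and the coarse-to-fine optimization are not shown.

```python
import math

# Hypothetical SDM instantiation and TRE metric. The +/-3 coefficient bound
# is an assumption standing in for L-BFGS-B box constraints.


def instantiate(mean_shape, modes, coeffs, bound=3.0):
    """mean_shape: list of (x, y, z); modes: list of displacement fields with
    the same layout; coeffs are clipped to [-bound, bound]."""
    b = [max(-bound, min(bound, c)) for c in coeffs]
    shape = []
    for i, p in enumerate(mean_shape):
        shape.append(tuple(
            p[k] + sum(bj * mode[i][k] for bj, mode in zip(b, modes))
            for k in range(3)
        ))
    return shape


def target_registration_error(predicted_targets, true_targets):
    """Mean Euclidean distance (e.g. in mm) between corresponding targets."""
    total = sum(math.dist(p, t) for p, t in zip(predicted_targets, true_targets))
    return total / len(true_targets)
```

As the ablation above notes, a shape that fits the visible surface well can still place internal targets poorly, which is why TRE on targets rather than surface residual is the reported metric.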

ARXIV Cancer: non-small cell lung cancer Method: Multimodal Contrastive Variational AutoEncoder

A Contrastive Variational AutoEncoder for NSCLC Survival Prediction with Missing Modalities

Michele Zanitti, Vanja Miskovic, Francesco Trovò, Alessandra Laura Giulia Pedrocchi, Ming Shen, Yan Kyaw Tun, Arsela Prelaj, Sokol Kosta
Published 2026-02-19 14:29

This study presents a Multimodal Contrastive Variational AutoEncoder (MCVAE) designed to predict survival outcomes for patients with non-small cell lung cancer (NSCLC) despite challenges posed by missing data modalities. The model integrates various data sources, including whole-slide images and transcriptomics, while employing a multi-task objective to enhance patient representation. Evaluations on TCGA datasets indicate that the MCVAE outperforms two state-of-the-art models in predicting disease-specific survival, particularly in scenarios with severe missingness.

Predicting survival outcomes for non-small cell lung cancer (NSCLC) patients is challenging due to the heterogeneity of individual prognostic features. This task can benefit from the integration of whole-slide images, bulk transcriptomics, and DNA methylation, which offer complementary views of the patient's condition at diagnosis. However, real-world clinical datasets are often incomplete, with entire modalities missing for a significant fraction of patients. State-of-the-art models rely on available data to create patient-level representations or use generative models to infer missing modalities, but they lack robustness in cases of severe missingness. We propose a Multimodal Contrastive Variational AutoEncoder (MCVAE) to address this issue: modality-specific variational encoders capture the uncertainty in each data source, and a fusion bottleneck with learned gating mechanisms normalizes the contributions from present modalities. We propose a multi-task objective that combines survival loss and reconstruction loss to regularize patient representations, along with a cross-modal contrastive loss that enforces cross-modal alignment in the latent space. During training, we apply stochastic modality masking to improve robustness to arbitrary missingness patterns. Extensive evaluations on the TCGA-LUAD (n=475) and TCGA-LUSC (n=446) datasets demonstrate the efficacy of our approach in predicting disease-specific survival (DSS) and its robustness to severe missingness scenarios compared to two state-of-the-art models. Finally, we shed light on multimodal integration by testing our model on all subsets of modalities, finding that integration is not always beneficial to the task.
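Two of the mechanisms described above lend themselves to a short illustration: a gated fusion that renormalizes weights over only the modalities actually present, and stochastic modality masking during training. The sketch assumes softmax gating over given gate logits (learned in the real model) and a hypothetical `drop_prob`; it is not the MCVAE implementation.

```python
import math
import random

# Hypothetical gated fusion over present modalities plus stochastic modality
# masking. Softmax gating and drop_prob are illustrative assumptions.


def gated_fusion(latents, gate_logits, present):
    """latents: dict modality -> latent vector; present: set of modalities.
    Softmax over the gate logits of present modalities only, then a
    weighted sum of their latent vectors."""
    avail = [m for m in latents if m in present]
    if not avail:
        raise ValueError("at least one modality must be present")
    mx = max(gate_logits[m] for m in avail)
    w = {m: math.exp(gate_logits[m] - mx) for m in avail}
    total = sum(w.values())
    dim = len(latents[avail[0]])
    return [sum(w[m] / total * latents[m][k] for m in avail) for k in range(dim)]


def mask_modalities(present, drop_prob=0.3, rng=random):
    """Randomly drop modalities during training, always keeping at least one."""
    kept = {m for m in present if rng.random() >= drop_prob}
    return kept or {rng.choice(sorted(present))}
```

Because the weights are renormalized over whichever modalities remain, a patient missing methylation data still yields a fused representation on the same scale as a patient with all three modalities.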

ARXIV Cancer: bladder cancer Method: explainable machine learning

A feature-stable and explainable machine learning framework for trustworthy decision-making under incomplete clinical data

Justyna Andrys-Olek, Paulina Tworek, Luca Gherardini, Mark W. Ruddock, Mary Jo Kurt, Peter Fitzgerald, Jose Sousa
Published 2026-02-19 13:48

This paper presents CACTUS, an explainable machine learning framework designed to enhance decision-making in the presence of incomplete clinical data. The framework integrates feature abstraction, interpretable classification, and stability analysis to ensure that key features remain informative despite data quality degradation. The authors benchmark CACTUS against random forests and gradient boosting methods using a cohort of 568 patients evaluated for bladder cancer, demonstrating competitive or superior predictive performance and markedly higher feature stability as missing data increases.

Machine learning models are increasingly applied to biomedical data, yet their adoption in high-stakes domains remains limited by poor robustness, limited interpretability, and instability of learned features under realistic data perturbations, such as missingness. In particular, models that achieve high predictive performance may still fail to inspire trust if their key features fluctuate when data completeness changes, undermining reproducibility and downstream decision-making. Here, we present CACTUS (Comprehensive Abstraction and Classification Tool for Uncovering Structures), an explainable machine learning framework explicitly designed to address these challenges in small, heterogeneous, and incomplete clinical datasets. CACTUS integrates feature abstraction, interpretable classification, and systematic feature stability analysis to quantify how consistently informative features are preserved as data quality degrades. Using a real-world haematuria cohort comprising 568 patients evaluated for bladder cancer, we benchmark CACTUS against widely used machine learning approaches, including random forests and gradient boosting methods, under controlled levels of randomly introduced missing data. We demonstrate that CACTUS achieves competitive or superior predictive performance while maintaining markedly higher stability of top-ranked features as missingness increases, including in sex-stratified analyses. Our results show that feature stability provides information complementary to conventional performance metrics and is essential for assessing the trustworthiness of machine learning models applied to biomedical data. By explicitly quantifying robustness to missing data and prioritising interpretable, stable features, CACTUS offers a generalizable framework for trustworthy data-driven decision support.
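The stability analysis described above can be sketched as: inject missing values at increasing rates, re-rank the features, and measure how much the top-k set overlaps with the complete-data ranking. The sketch below uses the Jaccard index for overlap and a hypothetical ranking score (absolute difference in class means over observed values); both are illustrative stand-ins, not the scoring used by CACTUS.

```python
import random

# Hypothetical feature-stability curve. The ranking score and the Jaccard
# overlap measure are illustrative assumptions, not CACTUS internals.


def inject_missing(rows, rate, rng):
    """Replace each value with None independently with probability `rate`."""
    return [[None if rng.random() < rate else v for v in row] for row in rows]


def rank_features(rows, labels):
    """Rank feature indices by |mean(class 1) - mean(class 0)| on observed values."""
    scores = []
    for j in range(len(rows[0])):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1 and r[j] is not None]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0 and r[j] is not None]
        score = abs(sum(pos) / len(pos) - sum(neg) / len(neg)) if pos and neg else 0.0
        scores.append((score, j))
    return [j for _, j in sorted(scores, reverse=True)]


def jaccard(a, b):
    """Overlap of two feature sets; 1.0 means identical."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def stability_curve(rows, labels, rates, k=3, seed=0):
    """Map each missingness rate to top-k overlap with the complete-data ranking."""
    rng = random.Random(seed)
    base = rank_features(rows, labels)[:k]
    return {
        rate: jaccard(base, rank_features(inject_missing(rows, rate, rng), labels)[:k])
        for rate in rates
    }
```

A curve that stays near 1.0 as the rate grows indicates stable top features; in practice one would average over many random seeds rather than a single draw.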

ARXIV Cancer: unknown Method: multimodal fusion

SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

Rong Fu, Zijian Zhang, Kun Liu, Jiekai Wu, Xianda Li, Simon Fong
Published 2026-02-19 12:51

The paper presents SubQuad, an end-to-end pipeline designed to enhance the comparative analysis of adaptive immune repertoires by addressing the challenges of high computational costs and dataset imbalances. It integrates antigen-aware retrieval with GPU-accelerated affinity kernels and fairness-constrained clustering. The system demonstrates improved throughput and memory efficiency while maintaining or enhancing key performance metrics such as recall and cluster purity. SubQuad aims to provide a scalable and bias-aware platform for applications in vaccine target prioritization and biomarker discovery.

Comparative analysis of adaptive immune repertoires at population scale is hampered by two practical bottlenecks: the near-quadratic cost of pairwise affinity evaluations and dataset imbalances that obscure clinically important minority clonotypes. We introduce SubQuad, an end-to-end pipeline that addresses these challenges by combining antigen-aware, near-subquadratic retrieval with GPU-accelerated affinity kernels, learned multimodal fusion, and fairness-constrained clustering. The system employs compact MinHash prefiltering to sharply reduce candidate comparisons, a differentiable gating module that adaptively weights complementary alignment and embedding channels on a per-pair basis, and an automated calibration routine that enforces proportional representation of rare antigen-specific subgroups. On large viral and tumor repertoires SubQuad achieves measured gains in throughput and peak memory usage while preserving or improving recall@k, cluster purity, and subgroup equity. By co-designing indexing, similarity fusion, and equity-aware objectives, SubQuad offers a scalable, bias-aware platform for repertoire mining and downstream translational tasks such as vaccine target prioritization and biomarker discovery.
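MinHash prefiltering, mentioned above, avoids scoring every pair by summarizing each sequence's k-mer set with a short signature and only passing pairs that collide in a locality-sensitive hashing band to the expensive affinity kernel. A minimal sketch for CDR3-like strings follows; the k-mer length, signature size, banding layout, and seeded BLAKE2b hash family are assumptions, not SubQuad's published settings.

```python
import hashlib
from collections import defaultdict

# Hypothetical MinHash + LSH banding prefilter. K, NUM_HASHES and BANDS are
# assumed values chosen for illustration.

K = 3            # k-mer length
NUM_HASHES = 32  # signature length
BANDS = 8        # 8 bands x 4 rows per band


def kmers(seq, k=K):
    """Set of overlapping k-mers; short sequences fall back to themselves."""
    if len(seq) < k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _h(seed, shingle):
    """Seeded 64-bit hash built from BLAKE2b (deterministic across runs)."""
    digest = hashlib.blake2b(f"{seed}:{shingle}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash(seq):
    """One minimum per seeded hash function over the sequence's k-mers."""
    shingles = kmers(seq)
    return [min(_h(seed, s) for s in shingles) for seed in range(NUM_HASHES)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing signature slots estimates k-mer Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidates(seqs):
    """Index pairs sharing at least one identical band of their signatures."""
    rows = NUM_HASHES // BANDS
    buckets = defaultdict(set)
    for idx, seq in enumerate(seqs):
        sig = minhash(seq)
        for b in range(BANDS):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(idx)
    pairs = set()
    for members in buckets.values():
        ms = sorted(members)
        pairs.update((ms[i], ms[j]) for i in range(len(ms)) for j in range(i + 1, len(ms)))
    return pairs
```

More bands with fewer rows each make the filter more permissive (higher recall, more candidates); fewer, wider bands make it stricter. That trade-off is what governs how far below quadratic the candidate count falls.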

ARXIV Cancer: unknown Method: vision transformer

Automated Histopathology Report Generation via Pyramidal Feature Extraction and the UNI Foundation Model

Ahmet Halici, Ece Tugba Cebeci, Musa Balci, Mustafa Cini, Serkan Sokmen
Published 2026-02-18 12:55

This study presents a hierarchical vision-language framework aimed at generating diagnostic reports from histopathology whole slide images (WSIs). The method combines a frozen pathology foundation model with a Transformer decoder and uses multi-resolution pyramidal patch selection to make WSI processing tractable. The approach enhances report reliability through a retrieval-based verification step that compares generated reports with a reference corpus.

Generating diagnostic text from histopathology whole slide images (WSIs) is challenging due to the gigapixel scale of the input and the requirement for precise, domain-specific language. We propose a hierarchical vision-language framework that combines a frozen pathology foundation model with a Transformer decoder for report generation. To make WSI processing tractable, we perform multi-resolution pyramidal patch selection (downsampling factors 2^3 to 2^6) and remove background and artifacts using Laplacian variance and HSV-based criteria. Patch features are extracted with the UNI Vision Transformer and projected to a 6-layer Transformer decoder that generates diagnostic text via cross-attention. To better represent biomedical terminology, we tokenize the output using BioGPT. Finally, we add a retrieval-based verification step that compares generated reports with a reference corpus using Sentence-BERT embeddings; if a high-similarity match is found, the generated report is replaced with the retrieved ground-truth reference to improve reliability.
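The background and artifact removal step combines two criteria: Laplacian variance (low values suggest blur or flat background) and HSV saturation (unstained glass is nearly white or grey). A minimal patch filter in that spirit is sketched below; the thresholds `BLUR_THRESHOLD`, `SAT_THRESHOLD`, and `MIN_TISSUE_FRACTION` are hypothetical and would need tuning per scanner and magnification.

```python
import colorsys

# Hypothetical patch-quality filter. Thresholds are illustrative assumptions.

BLUR_THRESHOLD = 50.0      # minimum Laplacian variance for a sharp patch
SAT_THRESHOLD = 0.07       # pixel counts as tissue above this saturation
MIN_TISSUE_FRACTION = 0.5  # required share of tissue pixels


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(gray), len(gray[0])
    vals = [
        gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1] + gray[r][c + 1] - 4 * gray[r][c]
        for r in range(1, h - 1) for c in range(1, w - 1)
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def tissue_fraction(rgb):
    """Fraction of pixels whose HSV saturation exceeds SAT_THRESHOLD."""
    pixels = [px for row in rgb for px in row]
    sat = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] for r, g, b in pixels]
    return sum(s > SAT_THRESHOLD for s in sat) / len(sat)


def keep_patch(rgb):
    """Keep a patch of 8-bit RGB tuples only if it is sharp and mostly tissue."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in rgb]
    return (laplacian_variance(gray) > BLUR_THRESHOLD
            and tissue_fraction(rgb) > MIN_TISSUE_FRACTION)
```

Surviving patches would then be passed to the feature extractor; running the filter at each pyramid level (downsampling 8x to 64x) lets cheap coarse levels discard background before finer levels are read.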

ARXIV Cancer: skin cancer Method: Hybrid Parallel-Fusion Cascaded Attention Network

HyPCA-Net: Advancing Multimodal Fusion in Medical Image Analysis

J. Dhar, M. K. Pandey, D. Chakladar, M. Haghighat, A. Alavi, S. Mistry, N. Zaidi
Published 2026-02-18 07:47

This paper presents HyPCA-Net, a novel Hybrid Parallel-Fusion Cascaded Attention Network designed to enhance multimodal fusion in medical image analysis. The proposed method addresses challenges associated with existing frameworks, such as high computational costs and information loss during inter-module transitions. Experiments on ten publicly available datasets show that HyPCA-Net outperforms leading methods, with performance improvements of up to 5.2% and computational cost reductions of up to 73.1%.

Multimodal fusion frameworks, which integrate diverse medical imaging modalities (e.g., MRI, CT), have shown great potential in applications such as skin cancer detection, dementia diagnosis, and brain tumor prediction. However, existing multimodal fusion methods face significant challenges. First, they often rely on computationally expensive models, limiting their applicability in low-resource environments. Second, they often employ cascaded attention modules, which potentially increase the risk of information loss during inter-module transitions and hinder their capacity to capture robust shared representations across modalities. This restricts their generalization in multi-disease analysis tasks. To address these limitations, we propose a Hybrid Parallel-Fusion Cascaded Attention Network (HyPCA-Net), composed of two novel core blocks: (a) a computationally efficient residual adaptive learning attention block for capturing refined modality-specific representations, and (b) a dual-view cascaded attention block aimed at learning robust shared representations across diverse modalities. Extensive experiments on ten publicly available datasets show that HyPCA-Net significantly outperforms existing leading methods, with improvements of up to 5.2% in performance and reductions of up to 73.1% in computational cost. Code: https://github.com/misti1203/HyPCA-Net.

Find the papers that actually matter