Research Papers

ARXIV Cancer: glioma Method: radiomics

ReFRM3D: A Radiomics-enhanced Fused Residual Multiparametric 3D Network with Multi-Scale Feature Fusion for Glioma Characterization

Md. Abdur Rahman, Mohaimenul Azam Khan Raiaan, Arefin Ittesafun Abian, Yan Zhang, Mirjam Jonkman, Sami Azam
Published 2025-12-27 12:12

This study presents ReFRM3D, a novel radiomics-enhanced fused residual multiparametric 3D network designed for the characterization of gliomas. The method utilizes multi-parametric MRI data to improve tumor segmentation and classification efficiency. Experimental results indicate significant enhancements in segmentation performance across multiple datasets, achieving high Dice Similarity Coefficients for various tumor regions.

Gliomas are among the most aggressive cancers, characterized by high mortality rates and complex diagnostic processes. Existing studies on glioma diagnosis and classification often describe issues such as high variability in imaging data, inadequate optimization of computational resources, and inefficient segmentation and classification of gliomas. To address these challenges, we propose novel techniques utilizing multi-parametric MRI data to enhance tumor segmentation and classification efficiency. Our work introduces the first-ever radiomics-enhanced fused residual multiparametric 3D network (ReFRM3D) for brain tumor characterization, which is based on a 3D U-Net architecture and features multi-scale feature fusion, hybrid upsampling, and an extended residual skip mechanism. Additionally, we propose a multi-feature tumor marker-based classifier that leverages radiomic features extracted from the segmented regions. Experimental results demonstrate significant improvements in segmentation performance across the BraTS2019, BraTS2020, and BraTS2021 datasets, achieving high Dice Similarity Coefficients (DSC) of 94.04%, 92.68%, and 93.64% for whole tumor (WT), enhancing tumor (ET), and tumor core (TC) respectively in BraTS2019; 94.09%, 92.91%, and 93.84% in BraTS2020; and 93.70%, 90.36%, and 92.13% in BraTS2021.
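
For readers unfamiliar with the metric, the Dice Similarity Coefficient quoted above measures overlap between a predicted mask and a reference mask: DSC = 2|P ∩ T| / (|P| + |T|). The following is a minimal sketch on made-up flat binary masks, not the authors' evaluation code; the function name `dice_coefficient` and the empty-mask convention are assumptions for illustration.

```python
def dice_coefficient(pred, target):
    """Dice = 2*|P ∩ T| / (|P| + |T|) for flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: 8 voxels, 3 of which are foreground in both masks.
pred = [0, 1, 1, 1, 0, 0, 1, 0]
target = [0, 1, 1, 0, 0, 0, 1, 1]
print(f"DSC = {dice_coefficient(pred, target):.2%}")  # DSC = 75.00%
```

In a BraTS-style evaluation the same formula would be applied separately to each region (WT, ET, TC) after flattening the 3D volumes.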

ARXIV Cancer: oncology Method: DistilBERT

ADMEDTAGGER: an annotation framework for distillation of expert knowledge for the Polish medical language

Franciszek Górski, Andrzej Czyżewski
Published 2025-12-27 10:00

This paper presents ADMEDTAGGER, an annotation framework that utilizes a multilingual LLM pretrained on a large corpus to distill expert knowledge for tagging medical texts in Polish. The study involved developing a multi-class classifier using a limited annotated dataset, leading to the training of various classifiers based on the BERT architecture. The DistilBERT model outperformed others, achieving high F1 scores across clinical categories, demonstrating its effectiveness as a compact alternative to larger language models.

In this work, we present an annotation framework that demonstrates how a multilingual LLM pretrained on a large corpus can be used as a teacher model to distill the expert knowledge needed for tagging medical texts in Polish. This work is part of a larger project called ADMEDVOICE, within which we collected an extensive corpus of medical texts representing five clinical categories - Radiology, Oncology, Cardiology, Hypertension, and Pathology. Using this data, we had to develop a multi-class classifier, but the fundamental problem turned out to be the lack of resources for annotating an adequate number of texts. Therefore, in our solution, we used the multilingual Llama3.1 model to annotate an extensive corpus of medical texts in Polish. Using our limited annotation resources, we verified only a portion of these labels, creating a test set from them. The data annotated in this way were then used for training and validation of 3 different types of classifiers based on the BERT architecture - the distilled DistilBERT model, BioBERT fine-tuned on medical data, and HerBERT fine-tuned on the Polish language corpus. Among the models we trained, the DistilBERT model achieved the best results, reaching an F1 score > 0.80 for each clinical category and an F1 score > 0.93 for 3 of them. In this way, we obtained a series of highly effective classifiers that represent an alternative to large language models, due to their nearly 500 times smaller size, 300 times lower GPU VRAM consumption, and several hundred times faster inference.
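
The per-category F1 scores reported above can be computed one class at a time by treating that class as positive and all others as negative. The sketch below uses made-up labels for three of the five categories; it is an illustration of the metric only, and `per_class_f1` is a hypothetical helper, not part of the ADMEDTAGGER code.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} with F1 = 2*TP / (2*TP + FP + FN), one-vs-rest per label."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A label that never appears in truth or prediction gets 0.0 by convention.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


# Made-up verified labels versus classifier predictions.
y_true = ["Radiology", "Radiology", "Oncology", "Oncology", "Cardiology", "Cardiology"]
y_pred = ["Radiology", "Oncology", "Oncology", "Oncology", "Cardiology", "Radiology"]
for label, f1 in per_class_f1(y_true, y_pred).items():
    print(f"{label}: F1 = {f1:.2f}")
```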

ARXIV Cancer: breast cancer Method: vision transformer

Feature Learning with Multi-Stage Vision Transformers on Inter-Modality HER2 Status Scoring and Tumor Classification on Whole Slides

Olaide N. Oyelade, Oliver Hoxey, Yulia Humrye
Published 2025-12-26 17:45

This study presents a novel approach for HER2 status scoring and tumor classification using a multi-stage vision transformer pipeline on whole slide images (WSIs). The method integrates patch-wise processing of hematoxylin and eosin (H&E) images with immunohistochemistry (IHC) stained images to enhance HER2 scoring accuracy. Experimental results demonstrate a classification accuracy of 0.94 and specificity of 0.933 for predicting HER2 status, indicating the effectiveness of the proposed method in comparison to human pathologists.

Histopathology images, such as hematoxylin and eosin (H&E) stained slides, have proven useful in detecting tumors. However, moving such cancer cases forward for treatment requires accurate information on the amount of human epidermal growth factor receptor 2 (HER2) protein expression. Predicting both the lower and higher levels of HER2 can be challenging. Moreover, jointly analyzing H&E and immunohistochemistry (IHC) stained images for HER2 scoring is difficult. Although several deep learning methods have been investigated to address the challenge of HER2 scoring, they fall short of providing pixel-level localization of HER2 status. In this study, we propose a single end-to-end pipeline using a system of vision transformers for HER2 status scoring on whole slide images (WSIs). The method includes patch-wise processing of H&E WSIs for tumor localization. A novel mapping function is proposed to identify the IHC WSI regions that correspond to malignant regions on H&E. A clinically inspired HER2 scoring mechanism is embedded in the pipeline and allows for automatic pixel-level annotation of 4-way HER2 scoring (0, 1+, 2+, and 3+). The proposed method also accurately returns HER2-negative and HER2-positive status. Privately curated datasets were collaboratively extracted from 13 different cases of H&E and IHC WSIs. A thorough experiment is conducted on the proposed method. The results showed good classification accuracy during tumor localization. In addition, a classification accuracy of 0.94 and a specificity of 0.933 were obtained for HER2 status prediction with the 4-way scoring method. The applicability of the proposed pipeline was investigated on WSI patches in comparison with human pathologists. Findings from the study showed the usability of jointly evaluated H&E and IHC images on end-to-end ViT-based models for HER2 scoring.
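
As a reminder of what the two reported figures measure, accuracy is the fraction of all cases classified correctly, while specificity is the fraction of true negatives among all actual negatives. The sketch below uses made-up binary labels (1 for HER2-positive, 0 for HER2-negative) purely for illustration; it does not reproduce the paper's 4-way evaluation, and `accuracy_and_specificity` is a hypothetical helper.

```python
def accuracy_and_specificity(y_true, y_pred, positive=1):
    """Compute accuracy = (TP+TN)/N and specificity = TN/(TN+FP) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    accuracy = (tp + tn) / len(y_true)
    # Specificity is undefined without negatives; return None in that case.
    specificity = tn / (tn + fp) if (tn + fp) else None
    return accuracy, specificity


# Made-up example: 3 positive and 5 negative cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
acc, spec = accuracy_and_specificity(y_true, y_pred)
print(f"accuracy = {acc:.3f}, specificity = {spec:.3f}")
```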

ARXIV Cancer: glioblastoma Method: variational autoencoder

The Multi-View Paradigm Shift in MRI Radiomics: Predicting MGMT Methylation in Glioblastoma

Mariya Miteva, Maria Nisheva-Pavlova
Published 2025-12-26 16:32

This study presents a multi-view latent representation learning framework utilizing variational autoencoders to predict MGMT promoter methylation in glioblastoma from MRI radiomic features. The method integrates data from post-contrast T1-weighted and Fluid-Attenuated Inversion Recovery MRI, addressing limitations of conventional unimodal approaches. The proposed framework aims to enhance the classification accuracy of MGMT methylation status, which is crucial for treatment decisions.

Non-invasive inference of molecular tumor characteristics from medical imaging is a central goal of radiogenomics, particularly in glioblastoma (GBM), where O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation carries important prognostic and therapeutic significance. Although radiomics-based machine learning methods have shown promise for this task, conventional unimodal and early-fusion approaches are often limited by high feature redundancy and an incomplete modeling of modality-specific information. In this work, we introduce a multi-view latent representation learning framework based on variational autoencoders (VAE) to integrate complementary radiomic features derived from post-contrast T1-weighted (T1Gd) and Fluid-Attenuated Inversion Recovery (FLAIR) magnetic resonance imaging (MRI). By encoding each modality through an independent probabilistic encoder and performing fusion in a compact latent space, the proposed approach preserves modality-specific structure while enabling effective multimodal integration. The resulting latent embeddings are subsequently used for MGMT promoter methylation classification.
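
The idea of encoding each modality separately and fusing in latent space can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' model: the encoder outputs are hard-coded stand-ins for learned networks, concatenation is an assumed fusion operator (the abstract does not specify one), and the names `reparameterize` and `fuse_views` are hypothetical.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, with eps ~ N(0, 1) and sigma = exp(0.5 * log_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def fuse_views(z_t1gd, z_flair):
    """Late fusion by concatenating the per-view latent vectors (assumed operator)."""
    return z_t1gd + z_flair


rng = random.Random(0)
# Hypothetical encoder outputs (mean, log-variance) for one patient and each view.
mu_t1gd, log_var_t1gd = [0.2, -1.0], [-2.0, -2.0]
mu_flair, log_var_flair = [1.5, 0.3, -0.4], [-1.0, -1.0, -1.0]

z = fuse_views(
    reparameterize(mu_t1gd, log_var_t1gd, rng),
    reparameterize(mu_flair, log_var_flair, rng),
)
print(len(z), "fused latent dimensions")  # would feed a downstream MGMT classifier
```

Keeping one encoder per modality means each view retains its own latent distribution, and only the compact embeddings are combined.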

ARXIV Cancer: unknown Method: deep learning

AI for Mycetoma Diagnosis in Histopathological Images: The MICCAI 2024 Challenge

Hyam Omar Ali, Sahar Alhesseen, Lamis Elkhair, Adrian Galdran, Ming Feng, Zhixiang Xiong, Zengming Lin, Kele Xu, Liang Hu, Benjamin Keel, Oliver Mills, James Battye, Akshay Kumar, Asra Aslam, Prasad Dutande, Ujjwal Baid, Bhakti Baheti, Suhas Gajre, Aravind Shrenivas Murali, Eung-Joo Lee, Ahmed Fahal, Rachid Jennane
Published 2025-12-25 21:46

This paper discusses the Mycetoma MicroImage: Detect and Classify Challenge (mAIcetoma), aimed at improving mycetoma diagnosis through AI solutions. The challenge focused on developing automated models for segmenting mycetoma grains and classifying mycetoma types from histopathological images. Five finalist teams participated, proposing various deep learning architectures that achieved high segmentation accuracy and significant performance in classifying mycetoma types.

Mycetoma is a neglected tropical disease caused by fungi or bacteria, leading to severe tissue damage and disabilities. It affects poor and rural communities and presents medical challenges and socioeconomic burdens on patients and healthcare systems in endemic regions worldwide. Mycetoma diagnosis is a major challenge in mycetoma management, particularly in low-resource settings where expert pathologists are limited. To address this challenge, this paper presents an overview of the Mycetoma MicroImage: Detect and Classify Challenge (mAIcetoma), which was organized to advance mycetoma diagnosis through AI solutions. mAIcetoma focused on developing automated models for segmenting mycetoma grains and classifying mycetoma types from histopathological images. The challenge attracted several teams worldwide, and five finalist teams fulfilled the challenge objectives. The teams proposed various deep learning architectures to meet these objectives. The Mycetoma database (MyData) was provided to participants as a standardized dataset on which to run the proposed models. Those models were evaluated using evaluation metrics. Results showed that all the models achieved high segmentation accuracy, emphasizing the necessity of grain detection as a critical step in mycetoma diagnosis. In addition, the top-performing models showed significant performance in classifying mycetoma types.

ARXIV Cancer: liver cancer Method: Adaptive Quaternion Cross-Fusion Network

A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets

Arunkumar V, Firos V M, Senthilkumar S, Gangadharan G R
Published 2025-12-25 18:42

This paper presents the Adaptive Quaternion Cross-Fusion Network (A-QCF-Net), designed for multimodal liver tumor segmentation using unpaired CT and MRI datasets. The model leverages Quaternion Neural Networks to create a shared feature space, facilitating knowledge transfer between imaging modalities. Validation results demonstrate significant improvements in segmentation accuracy compared to existing unimodal approaches, indicating the model's potential for clinical application.

Multimodal medical imaging provides complementary information that is crucial for accurate delineation of pathology, but the development of deep learning models is limited by the scarcity of large datasets in which different modalities are paired and spatially aligned. This paper addresses this fundamental limitation by proposing an Adaptive Quaternion Cross-Fusion Network (A-QCF-Net) that learns a single unified segmentation model from completely separate and unpaired CT and MRI cohorts. The architecture exploits the parameter efficiency and expressive power of Quaternion Neural Networks to construct a shared feature space. At its core is the Adaptive Quaternion Cross-Fusion (A-QCF) block, a data driven attention module that enables bidirectional knowledge transfer between the two streams. By learning to modulate the flow of information dynamically, the A-QCF block allows the network to exchange abstract modality specific expertise, such as the sharp anatomical boundary information available in CT and the subtle soft tissue contrast provided by MRI. This mutual exchange regularizes and enriches the feature representations of both streams. We validate the framework by jointly training a single model on the unpaired LiTS (CT) and ATLAS (MRI) datasets. The jointly trained model achieves Tumor Dice scores of 76.7% on CT and 78.3% on MRI, significantly exceeding the strong unimodal nnU-Net baseline by margins of 5.4% and 4.7% respectively. Furthermore, comprehensive explainability analysis using Grad-CAM and Grad-CAM++ confirms that the model correctly focuses on relevant pathological structures, ensuring the learned representations are clinically meaningful. This provides a robust and clinically viable paradigm for unlocking the large unpaired imaging archives that are common in healthcare.
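
Quaternion neural networks generally replace real-valued multiplication with the Hamilton product, which mixes four components using one shared set of four weights; this weight sharing is the usual source of their parameter efficiency. The sketch below shows only that product on plain tuples (w, x, y, z). How the A-QCF block uses it internally is not described in the abstract, so this should not be read as the authors' implementation.

```python
def hamilton_product(p, q):
    """Hamilton product of quaternions p = (a1, b1, c1, d1) and q = (a2, b2, c2, d2)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,  # real part
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,  # i component
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,  # j component
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,  # k component
    )


i = (0, 1, 0, 0)
j = (0, 0, 1, 0)
print(hamilton_product(i, j))  # (0, 0, 0, 1), i.e. i * j = k
print(hamilton_product(j, i))  # (0, 0, 0, -1), the product is not commutative
```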

ARXIV Cancer: colorectal cancer Method: gated progressive fusion network

GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification

Suncheng Xiang, Xiaoyang Wang, Junjie Jiang, Hejia Wang, Dahong Qian
Published 2025-12-25 02:40

This paper presents GPF-Net, a Gated Progressive Fusion network designed for the task of colonoscopic polyp re-identification. The method addresses the challenge of matching polyps from different views and cameras, which is crucial for colorectal cancer diagnosis. By employing a gated progressive fusion strategy, the architecture enhances feature interactions and improves the identification of small objects. Experimental results demonstrate the effectiveness of this approach compared to existing unimodal models.

Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras, which plays an important role in the prevention and treatment of colorectal cancer in computer-aided diagnosis. However, the coarse resolution of high-level features of a specific polyp often leads to inferior results for small objects where detailed information is important. To address this challenge, we propose a novel architecture, named Gated Progressive Fusion network, to selectively fuse features from multiple levels using gates in a fully connected way for polyp ReID. Building on this architecture, a gated progressive fusion strategy is introduced to achieve layer-wise refinement of semantic information through multi-level feature interactions. Experiments on standard benchmarks show the benefits of the multimodal setting over state-of-the-art unimodal ReID models, especially when combined with the specialized multimodal fusion strategy.
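
A generic gated fusion step can be sketched as a learned gate g in (0, 1) that blends two feature vectors as g * low + (1 - g) * high. The code below is an assumption-laden illustration of that general pattern, with a single scalar gate and hand-set weights; it is not the GPF-Net architecture, and `gated_fuse` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse(low, high, gate_weights, gate_bias=0.0):
    """Blend two equal-length feature vectors with a scalar gate computed from both."""
    concat = low + high  # the gate sees both feature levels
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    fused = [g * l + (1.0 - g) * h for l, h in zip(low, high)]
    return fused, g


# Toy low-level and high-level features; zero weights give a neutral gate of 0.5.
low, high = [1.0, 3.0], [3.0, 1.0]
fused, g = gated_fuse(low, high, gate_weights=[0.0, 0.0, 0.0, 0.0])
print(g, fused)
```

A progressive variant would apply such a step repeatedly, fusing the output with the next feature level.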

ARXIV Cancer: general cancer Method: vision-language model

A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding

Christina Liu, Alan Q. Wang, Joy Hsu, Jiajun Wu, Ehsan Adeli
Published 2025-12-24 20:30

The paper presents the Tool Bottleneck Framework (TBF) aimed at enhancing medical image understanding through the use of vision-language models (VLMs). TBF utilizes a learned Tool Bottleneck Model (TBM) to select and compose tools that extract clinically relevant features from images, improving prediction accuracy and interpretability. The framework is evaluated on histopathology and dermatology tasks, demonstrating performance comparable to or better than existing deep learning classifiers and tool-use frameworks, especially in data-limited scenarios.

Recent tool-use frameworks powered by vision-language models (VLMs) improve image understanding by grounding model predictions with specialized tools. Broadly, these frameworks leverage VLMs and a pre-specified toolbox to decompose the prediction task into multiple tool calls (often deep learning models) which are composed to make a prediction. The dominant approach to composing tools is using text, via function calls embedded in VLM-generated code or natural language. However, these methods often perform poorly on medical image understanding, where salient information is encoded as spatially-localized features that are difficult to compose or fuse via text alone. To address this, we propose a tool-use framework for medical image understanding called the Tool Bottleneck Framework (TBF), which composes VLM-selected tools using a learned Tool Bottleneck Model (TBM). For a given image and task, TBF leverages an off-the-shelf medical VLM to select tools from a toolbox that each extract clinically-relevant features. Instead of text-based composition, these tools are composed by the TBM, which computes and fuses the tool outputs using a neural network before outputting the final prediction. We propose a simple and effective strategy for TBMs to make predictions with any arbitrary VLM tool selection. Overall, our framework not only improves tool-use in medical imaging contexts, but also yields more interpretable, clinically-grounded predictors. We evaluate TBF on tasks in histopathology and dermatology and find that these advantages enable our framework to perform on par with or better than deep learning-based classifiers, VLMs, and state-of-the-art tool-use frameworks, with particular gains in data-limited regimes. Our code is available at https://github.com/christinaliu2020/tool-bottleneck-framework.

ARXIV Cancer: general cancer Method: transformer

TICON: A Slide-Level Tile Contextualizer for Histopathology Representation Learning

Varun Belagali, Saarthak Kapse, Pierre Marza, Srijan Das, Zilinghan Li, Sofiène Boutaj, Pushpak Pati, Srikar Yellapragada, Tarak Nath Nandi, Ravi K Madduri, Joel Saltz, Prateek Prasanna, Stergios Christodoulidis, Maria Vakalopoulou, Dimitris Samaras
Published 2025-12-24 18:58

The paper presents TICON, a transformer-based model designed to enhance the contextualization of tile representations in histopathology. By addressing the limitations of standard tile encoder pipelines, TICON produces rich embeddings that incorporate both local and global slide-level information. The model is pretrained using a masked modeling objective and demonstrates significant performance improvements across various computational pathology tasks, achieving state-of-the-art results on multiple benchmarks.

The interpretation of small tiles in large whole slide images (WSI) often needs a larger image context. We introduce TICON, a transformer-based tile representation contextualizer that produces rich, contextualized embeddings for "any" application in computational pathology. Standard tile encoder-based pipelines, which extract embeddings of tiles stripped from their context, fail to model the rich slide-level information essential for both local and global tasks. Furthermore, different tile-encoders excel at different downstream tasks. Therefore, a unified model is needed to contextualize embeddings derived from "any" tile-level foundation model. TICON addresses this need with a single, shared encoder, pretrained using a masked modeling objective to simultaneously unify and contextualize representations from diverse tile-level pathology foundation models. Our experiments demonstrate that TICON-contextualized embeddings significantly improve performance across many different tasks, establishing new state-of-the-art results on tile-level benchmarks (i.e., HEST-Bench, THUNDER, CATCH) and slide-level benchmarks (i.e., Patho-Bench). Finally, we pretrain an aggregator on TICON to form a slide-level foundation model, using only 11K WSIs, outperforming SoTA slide-level foundation models pretrained with up to 350K WSIs.
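
A masked modeling objective, in general terms, hides a subset of the inputs and scores the model on how well it reconstructs them from the visible remainder. The sketch below illustrates this on toy tile embeddings, with a trivial mean-of-visible predictor standing in for a transformer. Everything here (the function names, the 50% mask ratio, the squared-error loss) is an assumption for illustration and not TICON's actual pretraining setup.

```python
import random


def masked_reconstruction_loss(tiles, predict, mask_ratio=0.5, rng=None):
    """Mean squared error on masked tile embeddings, predicted from visible ones."""
    if len(tiles) < 2:
        raise ValueError("need at least two tiles so one can stay visible")
    rng = rng or random.Random(0)
    n = len(tiles)
    n_masked = min(n - 1, max(1, int(n * mask_ratio)))
    masked = set(rng.sample(range(n), n_masked))
    visible = [(i, t) for i, t in enumerate(tiles) if i not in masked]
    loss = 0.0
    for i in masked:
        pred = predict(visible, i)
        loss += sum((p - x) ** 2 for p, x in zip(pred, tiles[i])) / len(tiles[i])
    return loss / n_masked


def mean_of_visible(visible, _index):
    """Stand-in predictor: average the visible embeddings (a real model would attend)."""
    dim = len(visible[0][1])
    return [sum(t[d] for _, t in visible) / len(visible) for d in range(dim)]


tiles = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
print(masked_reconstruction_loss(tiles, mean_of_visible))  # identical tiles give 0.0
```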

ARXIV Cancer: acute myeloid leukemia Method: metaheuristic algorithm

Transcriptome-Conditioned Personalized De Novo Drug Generation for AML Using Metaheuristic Assembly and Target-Driven Filtering

Abdullah G. Elafifi, Basma Mamdouh, Mariam Hanafy, Muhammed Alaa Eldin, Yosef Khaled, Nesma Mohamed El-Gelany, Tarek H. M. Abou-El-Enien
Published 2025-12-24 17:39

This study addresses the challenges of Acute Myeloid Leukemia (AML) by developing a computational framework that integrates patient-specific transcriptomics with drug discovery. The framework employs Weighted Gene Co-expression Network Analysis to identify key biomarkers and utilizes a novel metaheuristic algorithm for assembling drug-like ligands. The results indicate the potential for generating personalized drug candidates with favorable binding properties, contributing to precision oncology for AML.

Acute Myeloid Leukemia (AML) remains a clinical challenge due to its extreme molecular heterogeneity and high relapse rates. While precision medicine has introduced mutation-specific therapies, many patients still lack effective, personalized options. This paper presents a novel, end-to-end computational framework that bridges the gap between patient-specific transcriptomics and de novo drug discovery. By analyzing bulk RNA sequencing data from the TCGA-LAML cohort, the study utilized Weighted Gene Co-expression Network Analysis (WGCNA) to prioritize 20 high-value biomarkers, including metabolic transporters like HK3 and immune-modulatory receptors such as SIGLEC9. The physical structures of these targets were modeled using AlphaFold3, and druggable hotspots were quantitatively mapped via the DOGSiteScorer engine. The study then developed a novel, reaction-first evolutionary metaheuristic algorithm, together with multi-objective optimization programming, that assembles novel ligands from fragment libraries, guided by spatial alignment to these identified hotspots. The generative model produced structurally unique chemical entities with a strong bias toward drug-like space, as evidenced by QED scores peaking between 0.5 and 0.7. Validation through ADMET profiling and SwissDock molecular docking identified high-confidence candidates, such as Ligand L1, which achieved a binding free energy of -6.571 kcal/mol against the A08A96 biomarker. These results demonstrate that integrating systems biology with metaheuristic molecular assembly can produce pharmacologically viable, patient-tailored leads, offering a scalable blueprint for precision oncology in AML and beyond.
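
The first step of a WGCNA-style analysis builds a weighted gene network from expression correlations. For an unsigned network, the adjacency between genes i and j is |cor(x_i, x_j)|^beta, where the soft-thresholding power beta suppresses weak correlations. The sketch below uses made-up expression values and assumes beta = 6, a commonly used default; the paper's actual settings are not stated in the abstract, and `GENE_X` is a placeholder name.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta for each gene pair."""
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** beta
        for g in genes
        for h in genes
        if g < h
    }


# Made-up expression values across four samples.
expr = {
    "HK3": [1.0, 2.0, 3.0, 4.0],
    "SIGLEC9": [2.0, 4.0, 6.0, 8.0],
    "GENE_X": [4.0, 1.0, 3.0, 2.0],
}
adj = soft_threshold_adjacency(expr)
for pair, weight in adj.items():
    print(pair, round(weight, 4))
```

Raising the correlation to a power keeps the strongly co-expressed pair near 1 while pushing the weak pair (|r| = 0.4) close to 0, which is what makes module detection on the resulting network more robust than a hard cutoff.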
