Automated O-RADS Risk Stratification Using a Large Language Model Analysis of Narrative Ultrasound Reports.
The Ovarian-Adnexal Reporting and Data System (O-RADS) is essential for standardizing the risk stratification of ovarian lesions detected on ultrasound. However, manual assignment of O-RADS scores is time-consuming and can vary between observers. This study investigates an automated method for O-RADS scoring using a large language model (LLM) to analyze narrative ultrasound reports. A two-stage pipeline was developed for automated O-RADS classification. Initially, the Lingshu LLM, specialized in medical language, extracted and embedded features from free-text descriptions of ovarian lesions. It identified key diagnostic features mentioned by sonologists. Subsequently, these features were used to train and evaluate several machine learning algorithms, including logistic regression (LR), support vector machines and random forests, to predict O-RADS scores (1-5). The proposed method was evaluated on a dataset of 513 cases using fivefold cross-validation. The pipeline using Lingshu model embeddings with LR achieved the highest accuracy of 0.803 [95% CI: 0.753, 0.853], a weighted-average F1-score of 0.819 [95% CI: 0.777, 0.861] and a macro-averaged AUROC of 0.948 [95% CI: 0.937, 0.959]. This outperformed the MedGemma model's pipeline, which had an accuracy of 0.760 [95% CI: 0.700, 0.820], F1-score of 0.787 [95% CI: 0.739, 0.835] and AUROC of 0.941 [95% CI: 0.911, 0.971]. This study introduces a novel approach to automate O-RADS scoring using LLMs for feature extraction and traditional machine learning for classification. The results indicate that this method can accurately stratify ovarian cancer risk, potentially improving clinical workflow efficiency and reducing diagnostic variability. This approach may support radiologists in making more consistent and timely assessments.
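The accuracy and weighted-average F1-score reported above can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy labels are hypothetical, and the example assumes the common definition in which each class's F1 is weighted by its share of the true labels.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    """F1 for one class, treating that class as positive and all others as negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Average of per-class F1 scores, weighted by each class's support in y_true."""
    support = Counter(y_true)
    n = len(y_true)
    return sum(per_class_f1(y_true, y_pred, c) * k / n for c, k in support.items())

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical O-RADS scores (1-5) for eight toy cases, not data from the study.
y_true = [1, 1, 2, 2, 3, 3, 4, 5]
y_pred = [1, 1, 2, 3, 3, 3, 4, 4]

print(f"accuracy    = {accuracy(y_true, y_pred):.3f}")
print(f"weighted F1 = {weighted_f1(y_true, y_pred):.3f}")
```

Because the weights follow class frequency, a rare class that is never predicted correctly (class 5 in the toy labels) lowers the weighted score less than it would lower a macro average.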
Construction of an interpretable multimodal image model for differentiating T1-stage nasopharyngeal carcinoma from benign hyperplasia.
Differentiating T1-stage nasopharyngeal carcinoma (NPC) from benign hyperplasia (BH) is challenging. This study aims to construct and validate a multimodal model combining magnetic resonance imaging (MRI) and endoscopy to distinguish T1-NPC from BH. Additionally, SHapley Additive exPlanations (SHAP) are used for model interpretability analysis. Data from 161 patients with histologically confirmed diagnoses between 2015 and 2022 were retrospectively collected, including 95 cases of T1-NPC and 66 cases of BH. Regions of interest (ROI) were drawn based on MRI and endoscopy to extract features. Feature selection techniques, such as elastic net, recursive feature elimination, and deep learning, were used to identify the optimal feature subset. Naive Bayes, Adaptive Boosting (AdaBoost), Light Gradient Boosting Machine (LightGBM), k-Nearest Neighbors (kNN), and Multilayer Perceptron (MLP) were applied to establish the MRI radiomics model and the MRI-endoscopy combined model. SHAP was used to perform interpretability analysis of the models. The MRI-endoscopy combined model outperformed the radiomics model, with the MLP-based model showing the best performance. The mean AUC of the test set reached 0.98, with an accuracy of 0.90, precision of 0.90, sensitivity of 0.93, and specificity of 0.86. SHAP analysis revealed that texture features (including GLSZM, GLCM, and GLRLM) and first-order features were critical for distinguishing T1-NPC from BH. Compared to traditional radiomics methods, the multimodal model combining MRI and endoscopy more accurately distinguishes between benign and malignant tissues. SHAP enables visualization of feature contributions and model predictions, highlighting the model's clinical potential.
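The test-set metrics quoted above (accuracy, precision, sensitivity, specificity) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are invented for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn count malignant cases classified correctly/incorrectly;
    tn/fp count benign cases classified correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),      # positive predictive value
        "sensitivity": tp / (tp + fn),    # recall on malignant cases
        "specificity": tn / (tn + fp),    # recall on benign cases
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name:12s} {value:.3f}")
```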
Artificial intelligence-assisted three-dimensional imaging of breast microinvasive carcinoma reveals larger invasive focus size in a substantial proportion of cases.
Microinvasive carcinoma of the breast is a unique type of malignancy characterized by the presence of small invasive foci (less than 1 mm in diameter) in a background of carcinoma in situ. The disease is the earliest stage of breast invasive carcinoma development, and patients diagnosed with this disease are often treated conservatively. However, diagnosing microinvasive carcinoma based on a single tissue section may underestimate the invasive focus size. We developed a three-dimensional (3D) imaging method to re-evaluate the invasive focus size in microinvasive carcinoma cases in which the originally reported focus size was close to 1 mm. The 3D images were annotated and used to develop an artificial intelligence (AI) program based on the HRNetV2 architecture to assist in the annotation of future cases. We found that in 8 of 11 cases (72.7%), the foci were larger than 1 mm when the specimens were analyzed in 3D space, resulting in reclassification of the cases as T1a invasive carcinoma. Notably, in one of the reclassified cases, isolated tumor cells were identified in the sentinel lymph node biopsy. Our findings challenge the robustness of the microinvasion concept and indicate that AI-assisted 3D imaging is a valuable tool in precision diagnosis of microinvasive carcinoma.
Lung cancer as a global health challenge: Multidimensional biomarker research and therapeutic advances.
Lung cancer, the leading cause of global cancer-related mortality, is categorized into small-cell and non-small-cell subtypes. The heterogeneous non-small-cell lung cancer group is further subcategorized primarily into adenocarcinoma, squamous cell carcinoma, and large cell carcinoma, each underpinned by distinct molecular alterations. Although traditional serum biomarkers aid in subtype differentiation and treatment monitoring, their utility is limited by challenges such as poor specificity due to inflammatory confounders and the difficulty of dynamically tracking therapeutic resistance. Recent advances have identified emergent subtype-specific biomarkers that reflect metabolic reprogramming, epigenetic dysregulation, stemness signatures, and interactions within the immune microenvironment. By integrating analytes such as ctDNA, exosomal RNAs, and urinary DNA with multi-analyte panels and advanced imaging, liquid biopsies offer a promising avenue to enhance early detection accuracy, prognostication, and dynamic therapy monitoring. Nevertheless, clinical adoption is hindered by several challenges, including incomplete validation, the need for technical standardization, intratumoral heterogeneity, and inter-ethnic variability. The convergence of artificial intelligence (AI)-enhanced multi-omics with biomarker-guided therapeutics represents a transformative strategy with the potential to overcome resistance, mitigate ethnic disparities, and ultimately transform lung cancer into a chronic, manageable disease. Therefore, prioritizing clinically validated AI-integrated platforms is pivotal to achieving precision oncology.
Peptide-drug conjugates bearing an antimitotic Ahx-DA1 payload achieve potent antitumor activity in Her2-amplified and EGFR-positive KRAS-mutant cancers in vivo.
Peptide-drug conjugates (PDCs) represent a targeted cancer therapy strategy that combines tumor-homing peptides with potent cytotoxic payloads, offering a promising alternative to antibody-drug conjugates (ADCs) through improved tissue penetration, synthetic accessibility, and tumor selectivity. Auristatins (MMAE, MMAF, etc.), which are synthetic analogues of antimitotic dolastatin 10 (Dol-10), are widely used as ADC payloads; however, their systematic evaluation in PDC formats remains limited. In this study, we investigated Ahx-DA1, an enzymatically stable derivative of the microtubule inhibitor DA1, a previously reported dolastatin-10 analogue, as a payload for PDCs. Two receptor-specific peptides, HER2-targeting peptide A9 and EGFR-binding peptide P6, were conjugated to Ahx-DA1 and evaluated in the HER2-overexpressing breast cancer BT-474 model and the EGFR-overexpressing KRAS-mutated colorectal (HCT116) and pancreatic (PANC1) models, respectively. A cell-based study of DA1-bearing PDCs revealed specific and potent cytotoxicity in cancer cell lines overexpressing the corresponding receptors, demonstrating high target specificity. The DA1-based PDCs exhibited high stability and favorable tolerability profiles across all the tested xenograft models. In vivo studies demonstrated pronounced tumor growth inhibition by A9-DA1 in the HER2+ xenograft model and by P6-DA1 in the EGFR+ KRAS-mutated colorectal and pancreatic xenograft models. Overall, our findings suggest that Ahx-DA1 is a highly effective auristatin-class payload for the development of DA1-based anticancer PDCs.
Breast area affects the performance of a commercial artificial intelligence algorithm assessment of negative digital breast tomosynthesis exams.
This study aimed to understand whether cancer-neutral image attributes (breast area and number of slices) impact an AI algorithm's assessment of negative digital breast tomosynthesis (DBT) screening exams. This retrospective cohort study included women from a single institution whose screening mammogram was interpreted as negative between 2016 and 2019. All patients had at least 2 years of follow-up without evidence of malignancy. Primary outcome measures were AI-calculated assessments of present and future likelihood of malignancy, quantified as a case score and a risk score, respectively. A multivariable linear regression model evaluated the relationship between patient demographics (age, race/ethnicity), image size (breast area, number of slices), and AI algorithm outputs (breast density, case score, risk score). There were 4842 female patients included in the study (mean age 55.0 ± 10.6 years). For case score, there was a positive association with breast area (p < 0.0001), as well as older age, breast density (scattered vs fatty), and race (White vs Asian and Black vs White, all p < 0.05). For risk score, there was also a positive association with breast area (p < 0.001), as well as older age, breast density (scattered vs fatty, heterogeneously dense vs scattered, extremely dense vs heterogeneously dense), and race (White vs Asian, all p < 0.05). Number of DBT slices was not significantly associated with either case or risk scores. Known breast cancer risk factors and one neutral characteristic (breast area) significantly impacted an AI algorithm's assessment of present and future likelihood of malignancy.
The transformative role of single-cell analysis in multifactorial disorders research.
Multifactorial inherited disorders (MIDs) arise from complex interactions between polygenic risk and environmental exposures, presenting major challenges for mechanistic discovery, patient stratification, and targeted therapy development. While traditional approaches like genome-wide association studies (GWAS) and bulk omics profiling have identified broad associations, they often struggle to resolve the cellular context in which these interactions drive pathogenesis. Emerging single-cell technologies now offer unprecedented resolution to dissect tissue heterogeneity, define rare or transient disease-relevant cell states, and map dynamic trajectories across tissues and disease stages. This review provides a comprehensive synthesis of current single-cell methodologies, including transcriptomic, epigenomic, proteomic, and spatial techniques, and their application to MID research. We explore how these tools are revealing cell-type-specific regulatory circuits, contextualizing the functional impact of inherited risk variants, and elucidating cellular responses to environmental perturbations. We propose that integrating single-cell multi-omics data is critical for illuminating the mechanistic basis of complex traits and for advancing biomarker discovery. However, significant challenges remain, including technical variability, limited cohort scalability, difficulties in multi-modal data integration, and a lack of standardized analytical workflows for polygenic diseases. Overcoming these barriers will require harmonized study designs, robust computational frameworks, and the incorporation of longitudinal and environmental exposure data. Ultimately, we conclude that single-cell analysis is poised to transform MID research, offering a powerful new paradigm for mechanistic insight, therapeutic innovation, and the realization of precision medicine.
Actinic keratosis staging in multimodal image data.
Actinic keratosis (AK) is a common skin condition, usually appearing on sun-exposed areas, whose progression is associated with characteristic dermatoscopic and structural changes. Early detection of AK is crucial, as cancer progression may occur in changed skin. This study aimed to develop a multimodal, machine-learning-based framework combining dermatoscopic and high-frequency ultrasound (HFUS) data to automatically stage AK and identify early lesions. A dataset containing 222 pairs of dermatoscopic and HFUS images was clinically evaluated using the 3-point Zalaudek scale. Dermatoscopic images underwent ROI selection, hair removal, and extensive feature extraction (color, erythema, pigmentation, vessels, scales, pixel intensities, GLCM/LBP texture). HFUS images were divided into entry echo, sub-epidermal low-echoic band (SLEB), and dermis using a deep neural network, and then features describing the morphology and structure of the skin for each layer were extracted. A pre-trained EfficientNet network was used for feature extraction. Logistic Regression, k-Nearest Neighbors, Random Forests, Support Vector Machines and Multilayer Perceptrons with Sequential Feature Selection using 5-fold patient-wise cross-validation were used for feature-based classification. Additionally, multimodal TwinCNN was evaluated, with various pre-trained models as feature extractors. Combining dermatoscopic and HFUS features consistently outperformed single-modality models. Depending on the defined task, the models achieved over 80% accuracy (healthy, AK1-AK3), 78% (AK1-AK3), and almost 90% in the case of early AK detection vs. healthy and advanced AK on multimodal features. The TwinCNN model performed worse than classical machine-learning approaches, likely due to the limited size of the dataset and class imbalance. A multimodal framework integrating dermatoscopic and HFUS imaging enables accurate AK classification, surpassing single-modality approaches. Future work should expand multicenter datasets, improve automation of pre-processing steps, and explore enhanced neural multimodal fusion architectures.
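The abstract above mentions 5-fold patient-wise cross-validation but does not say how folds were built. The sketch below shows one common way to do it, assuming each image pair carries a patient identifier: all samples from a patient are placed in the same fold, so no patient appears in both the training and test split. The function names, the greedy balancing rule, and the toy identifiers are hypothetical.

```python
from collections import defaultdict

def patient_wise_folds(patient_ids, n_folds=5):
    """Group sample indices into folds so that each patient's samples share one fold."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place patients with the most samples first,
    # each into whichever fold is currently smallest.
    ordered = sorted(by_patient, key=lambda p: (-len(by_patient[p]), str(p)))
    for pid in ordered:
        smallest = min(range(n_folds), key=lambda f: len(folds[f]))
        folds[smallest].extend(by_patient[pid])
    return folds

def train_test_splits(folds):
    """Yield (train_indices, test_indices), using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Toy example: ten samples from six hypothetical patients, split into three folds.
ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5", "p6", "p6"]
for train, test in train_test_splits(patient_wise_folds(ids, n_folds=3)):
    print("test patients:", sorted({ids[i] for i in test}))
```

Splitting by patient rather than by image avoids leakage when one patient contributes several lesions or image pairs, which would otherwise inflate the reported accuracy.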
Artificial intelligence applications in OCT and OCTA for diabetic retinopathy: A systematic review.
Diabetic retinopathy (DR) is a leading cause of vision impairment worldwide. Optical coherence tomography (OCT) and OCT angiography (OCTA) provide detailed retinal imaging, enabling early detection of microvascular changes. This study aims to systematically review artificial intelligence (AI), particularly deep learning (DL), applications for DR detection and analysis using OCT and OCTA images. A comprehensive literature search was conducted across PubMed, Web of Science, Scopus, IEEE Xplore, and Embase for studies published up to March 2026. A total of 1007 articles were identified, of which 175 studies met the inclusion criteria following the PRISMA study selection process. DL-based approaches consistently demonstrated superior performance compared to traditional machine learning (ML) methods, with reported AUC values typically ranging from 0.90 to 0.99 across classification and segmentation tasks. Convolutional neural networks (CNNs), Vision Transformers (ViTs), and encoder-decoder architectures such as U-Net showed strong performance in detecting key DR biomarkers, including microaneurysms, macular edema, and neovascularization. However, performance variability was observed depending on dataset size, imaging modality, and annotation quality. AI-driven analysis of OCT and OCTA images offers significant potential for automated DR detection. Despite promising results, challenges such as limited public datasets, lack of cross-institutional validation, and model interpretability remain. Future research should focus on multimodal integration, explainable AI, and large-scale validation to enhance clinical applicability.
Large Language Models for the Differentiation of Benign and Malignant Liver Nodules based on Multimodal Prompts in Liver US Cases.
Large language models (LLMs) that can process both images and text are increasingly being used in radiology. This study aimed to evaluate the performance of LLMs including GPT-4 Omni (GPT-4o), Claude-3.5-Sonnet (Claude), and Gemini 1.5 Pro (Gemini) in differentiating benign and malignant nodules in liver US cases and compare it with that of human readers. Four hundred liver US cases with pathologically confirmed liver nodules visible on B-mode US from January 2020 to November 2024 were randomly selected in this retrospective study. They were divided into a development set (n = 100) and a test set (n = 300). Five prompt groups for LLMs, including US image [I-only], image description [D-only], image and description [I+D], image and liver US e-textbook [I+T], and image and medical history [I+H], were evaluated to identify the optimal input in the development set. In the test set, the accuracy of LLMs in differentiating benign and malignant liver nodules was compared with that of human readers using McNemar's test. In the development set, the prompt group I+H exhibited the highest diagnostic accuracy for all LLMs in differentiating benign and malignant liver nodules and was therefore considered the optimal input (taking GPT-4o as an example, with I-only, 57.0% [as reference]; D-only, 62.0%, p = 0.55; I+D, 62.0%, p = 0.54; I+T, 62.0%, p = 0.36; I+H, 77.0%, p = 0.01). In the test set, LLMs with I+H outperformed the junior group and showed similar accuracy to the senior group (Junior, 70.0% [as reference 1]; Senior, 78.3% [as reference 2]; GPT-4o, 83.3%, p1 < 0.001, p2 = 0.10; Claude, 77.0%, p1 = 0.04, p2 = 0.72; Gemini, 75.3%, p1 = 0.14, p2 = 0.36). Large language models with US image and medical history inputs achieved accuracy comparable to senior radiologists and superior to junior radiologists in differentiating benign and malignant liver nodules.
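McNemar's test, used above to compare paired reader accuracies, depends only on the cases where the two readers disagree. The abstract does not state which variant was applied, so the sketch below assumes the exact (binomial) form; the function names and the toy correctness lists are hypothetical.

```python
from math import comb

def discordant_counts(correct_a, correct_b):
    """Count cases where exactly one of two readers is correct on the same case."""
    b = sum(1 for a, r in zip(correct_a, correct_b) if a and not r)
    c = sum(1 for a, r in zip(correct_a, correct_b) if r and not a)
    return b, c

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, each discordant case is equally likely to
    favor either reader, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy paired results on twelve cases (True = correct diagnosis); not study data.
model = [True, True, True, True, True, True, True, True, True, False, True, True]
reader = [True, False, False, False, False, False, False, False, False, True, False, True]
b, c = discordant_counts(model, reader)
print(f"discordant pairs: b={b}, c={c}, p={mcnemar_exact(b, c):.4f}")
```

Cases that both readers get right or both get wrong do not enter the statistic, which is why two readers with similar overall accuracy can still differ significantly, or not, depending on how their errors overlap.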