INTEGRATION OF SINGLE-CELL RNA SEQUENCING AND MACHINE LEARNING MODELING IN THE IDENTIFICATION OF STEMNESS FEATURES IN ACUTE MYELOID LEUKEMIA PATIENTS

Perez, FAB; Malta, TM

doi:10.1016/j.htct.2025.104557

Special supplement

This article is part of Vol. 47, Issue S3: HEMO 2025 / III Simpósio Brasileiro de Citometria de Fluxo (3rd Brazilian Symposium on Flow Cytometry)

Introduction

Human hematopoiesis is a dynamic process that begins during embryonic development in distinct anatomical niches and culminates in the formation of hematopoietic stem cells (HSCs), which are responsible for lifelong blood cell production. With aging, HSCs undergo functional decline, contributing to hematological disorders such as acute myeloid leukemia (AML). In these conditions, altered stem cells may drive disease progression and resist treatment. Advances in single-cell RNA sequencing (scRNA-seq) have enabled high-resolution analysis of cellular heterogeneity in both healthy and diseased tissues. To manage the complexity of such data, machine learning (ML) algorithms are increasingly used to uncover gene expression patterns, differentiation states, and pathological features. These models can outperform traditional marker-based methods in classifying cell types, estimating dedifferentiation levels, and detecting treatment-resistant subpopulations. In this context, algorithms such as One-Class Logistic Regression (OCLR) and decision-tree-based methods have shown promise in identifying molecular stem cell signatures across biological settings. Once trained, these models can be applied to new datasets to recognize cells with similar profiles, aiding in the functional annotation of both normal and malignant samples.

Objectives

The aim of this study was to train machine learning models on public bone marrow scRNA-seq datasets to identify cells with a stemness profile, to apply these models to patient transcriptomic data to score and classify samples, and to evaluate the clinical relevance of the resulting classification, in particular its association with overall survival.

Material and methods

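ML models, including OCLR, Random Forest, and a linear-kernel Support Vector Machine (SVM), were trained as classifiers on public bone marrow scRNA-seq datasets. The trained models were then applied, using Spearman correlation, to normalized and scaled raw counts from the transcriptomic data of the TCGA AML cohort (n = 151) and of two public cohorts of healthy samples (n = 101). For each sample, a z-score relative to the healthy samples was calculated as $z = (s - \mu_{\text{healthy}}) / \sigma_{\text{healthy}}$, where $s$ is the sample's stemness score and $\mu_{\text{healthy}}$ and $\sigma_{\text{healthy}}$ are the mean and standard deviation of the healthy samples' scores. Samples with $z > 1.96$ (the 97.5th percentile of the standard normal distribution) were considered to have high stemness. Hazard ratios for overall survival were estimated with Cox proportional hazards models.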
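The abstract does not specify how each trained model is turned into a per-sample score, so the Python sketch below assumes one plausible rule: a sample's score is the Spearman correlation between a per-gene weight vector derived from the model and that sample's normalized expression over the shared genes. How such a vector would be obtained for each model (for example, for Random Forest) is not stated in the abstract and is left open here. The gene names, weights, and the helper names `stemness_score`, `z_scores`, and `classify` are hypothetical, and using the sample standard deviation (n − 1) for the healthy reference is also an assumption. The sketch then applies the healthy-reference z-score and the 1.96 cutoff described above.

```python
import math
import statistics
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ssx * ssy)


def stemness_score(weights: Dict[str, float], sample: Dict[str, float]) -> float:
    """Hypothetical scoring rule (assumption): Spearman correlation between
    per-gene model weights and one sample's normalized expression."""
    genes = sorted(set(weights) & set(sample))  # only genes present in both
    return spearman([weights[g] for g in genes], [sample[g] for g in genes])


def z_scores(scores: Sequence[float], healthy: Sequence[float]) -> List[float]:
    """z = (score - mean_healthy) / SD_healthy; SD is the sample SD (assumption)."""
    mu = statistics.fmean(healthy)
    sd = statistics.stdev(healthy)
    return [(s - mu) / sd for s in scores]


def classify(z: Sequence[float], cutoff: float = 1.96) -> List[str]:
    """Label samples whose z-score is strictly above the cutoff as high stemness."""
    return ["high" if v > cutoff else "low" for v in z]


if __name__ == "__main__":
    # Toy, made-up values for illustration only.
    weights = {"GENE_A": 0.8, "GENE_B": 0.1, "GENE_C": -0.5, "GENE_D": 0.3}
    sample = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 0.2, "GENE_D": 2.0}
    print(stemness_score(weights, sample))  # prints 1.0 (same gene ordering)
    print(classify(z_scores([0.5, 0.3], healthy=[-0.2, 0.0, 0.2])))  # ['high', 'low']
```

The strict `>` comparison mirrors the "above 1.96" wording, and tied expression values receive average ranks, the usual convention for Spearman correlation.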

Results

All models achieved comparable performance on metrics such as the area under the ROC curve (AUC) and accuracy. In external validation, Random Forest showed a higher area under the precision-recall curve (AUPRC) and statistically outperformed SVM (p = 0.0380, Nemenyi post hoc test). In the survival analysis, only the Random Forest score was significantly associated with overall survival: patients in the high-stemness group (z-score > 1.96) had a hazard ratio (HR) of 1.73 (95% CI: 1.03–2.89; log-rank p = 0.0344) compared with the low-stemness group, and median overall survival was 0.75 years in the high-stemness group versus 1.59 years in the low-stemness group. In contrast, no significant association was observed for the OCLR (HR = 1.02, 95% CI: 0.59–1.76, p = 0.922) or SVM (HR = 1.15, 95% CI: 0.66–2.02, p = 0.600) models.
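The group comparison reported above (a log-rank p value and a median survival per stemness group) can be illustrated with a small, dependency-free Python sketch of the standard two-group log-rank test and the Kaplan-Meier median. This is not the authors' code: the function names and toy data are hypothetical, the median follows the "first time at which S(t) ≤ 0.5" convention (some software interpolates instead), and the hazard ratios in the abstract come from Cox models, which this sketch does not fit.

```python
import math
from typing import Optional, Sequence, Tuple


def km_median(times: Sequence[float], events: Sequence[int]) -> Optional[float]:
    """Kaplan-Meier median: first time at which S(t) <= 0.5, or None if never
    reached. events[i] is 1 for death and 0 for censoring."""
    surv, at_risk = 1.0, len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            if surv <= 0.5:
                return t
        # Everyone observed at t (dead or censored) leaves the risk set afterwards.
        at_risk -= sum(1 for ti in times if ti == t)
    return None


def logrank(times_a: Sequence[float], events_a: Sequence[int],
            times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic with 1 df, p value)."""
    times = list(times_a) + list(times_b)
    events = list(events_a) + list(events_b)
    in_a = [True] * len(times_a) + [False] * len(times_b)
    observed_a = expected_a = variance = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n = sum(1 for ti in times if ti >= t)  # at risk just before t
        n_a = sum(1 for ti, a in zip(times, in_a) if ti >= t and a)
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        d_a = sum(1 for ti, ei, a in zip(times, events, in_a) if ti == t and ei and a)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance of d_a at time t
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    # Made-up survival times in years; every patient has an event (1).
    high_t, high_e = [1, 2, 3], [1, 1, 1]
    low_t, low_e = [4, 5, 6], [1, 1, 1]
    print(km_median(high_t, high_e), km_median(low_t, low_e))  # 2 5
    chi2, p = logrank(high_t, high_e, low_t, low_e)
    print(f"chi2 = {chi2:.3f}, p = {p:.4f}")  # p < 0.05 for this toy split
```

Both functions rescan the data at every event time, which is quadratic but adequate for cohorts of a few hundred patients; an actual analysis would normally rely on an established survival package.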

Discussion and conclusion

In conclusion, although all models demonstrated similar discriminative performance, the Random Forest approach not only achieved superior AUPRC in external validation but also showed a significant prognostic association with overall survival. These findings suggest that, among the tested stemness scoring methods, the Random Forest–derived HSCsi model may provide greater clinical utility by integrating predictive accuracy with prognostic relevance.

Funding

This work was supported by the National Council for Scientific and Technological Development (CNPq), Brazil.