TY - JOUR
T1 - A deep learning model predicts the presence of diverse cancer types using circulating tumor cells
AU - Albaradei, Somayah
AU - Alganmi, Nofe
AU - Albaradie, Abdulrahman
AU - Alharbi, Eaman
AU - Motwalli, Olaa
AU - Thafar, Maha A.
AU - Gojobori, Takashi
AU - Essack, Magbubah
AU - Gao, Xin
N1 - Publisher Copyright:
© 2023, The Author(s).
PY - 2023/12
Y1 - 2023/12
AB - Circulating tumor cells (CTCs) are cancer cells that detach from the primary tumor and intravasate into the bloodstream. Thus, non-invasive liquid biopsies are being used to analyze CTC-expressed genes to identify potential cancer biomarkers. In this regard, several studies have used gene expression changes in blood to predict the presence of CTC and, consequently, cancer. However, the CTC mRNA data has not been used to develop a generic approach that indicates the presence of multiple cancer types. In this study, we developed such a generic approach. Briefly, we designed two computational workflows, one using the raw mRNA data and deep learning (DL) and the other exploiting five hub gene ranking algorithms (Degree, Maximum Neighborhood Component, Betweenness Centrality, Closeness Centrality, and Stress Centrality) with machine learning (ML). Both workflows aim to determine the top genes that best distinguish cancer types based on the CTC mRNA data. We demonstrate that our automated, robust DL framework (DNNraw) more accurately indicates the presence of multiple cancer types using the CTC gene expression data than multiple ML approaches. The DL approach achieved average precision of 0.9652, recall of 0.9640, f1-score of 0.9638 and overall accuracy of 0.9640. Furthermore, since we designed multiple approaches, we also provide a bioinformatics analysis of the gene commonly identified as top-ranked by the different methods. To our knowledge, this is the first study wherein a generic approach has been developed to predict the presence of multiple cancer types using raw CTC mRNA data, as opposed to other models that require a feature selection step.
UR - http://www.scopus.com/inward/record.url?scp=85178200562&partnerID=8YFLogxK
U2 - 10.1038/s41598-023-47805-2
DO - 10.1038/s41598-023-47805-2
M3 - Article
C2 - 38036622
AN - SCOPUS:85178200562
SN - 2045-2322
VL - 13
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 21114
ER -