Abstract
Computational tools for integrative analyses of diverse single-cell experiments face formidable new challenges, including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing [scRNA-seq] and single-cell assay for transposase-accessible chromatin using sequencing [scATAC-seq]), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with each new dataset. Its online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications that build upon previous scientific insights.
Original language | English (US) |
---|---|
Journal | Nature Communications |
Volume | 13 |
Issue number | 1 |
DOIs | |
State | Published - Oct 17 2022 |
Bibliographical note
KAUST Repository Item: Exported on 2022-10-19
Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, URF/1/4663-01-01
Acknowledgements: We thank Jianbin Wang, Jin Gu and Fuchou Tang for helpful comments and advice. This work is supported by the State Key Research Development Program of China (Grant No. 2019YFA0110002, Q.C.Z.), the National Natural Science Foundation of China (Grants No. 32125007 and 91940306, Q.C.Z.), the Beijing Advanced Innovation Center for Structural Biology, and the Tsinghua-Peking Joint Center for Life Sciences. We thank the Tsinghua University Branch of China National Center for Protein Sciences (Beijing) for computational facility support. This work is also supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No. FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, and URF/1/4663-01-01 (X.G.).
ASJC Scopus subject areas
- General Biochemistry, Genetics and Molecular Biology
- General Chemistry
- General Physics and Astronomy