Online single-cell data integration through projecting heterogeneous datasets into a common cell-embedding space

Lei Xiong, Kang Tian, Yuzhe Li, Weixi Ning, Xin Gao, Qiangfeng Cliff Zhang

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Computational tools for integrative analyses of diverse single-cell experiments face formidable new challenges, including dramatic increases in data scale, sample heterogeneity, and the need to informatively cross-reference new data with foundational datasets. Here, we present SCALEX, a deep-learning method that integrates single-cell data by projecting cells into a batch-invariant, common cell-embedding space in a truly online manner (i.e., without retraining the model). SCALEX substantially outperforms online iNMF and other state-of-the-art non-online integration methods on benchmark single-cell datasets of diverse modalities (e.g., single-cell RNA sequencing, scRNA-seq; single-cell assay for transposase-accessible chromatin using sequencing, scATAC-seq), especially for datasets with partial overlaps, accurately aligning similar cell populations while retaining true biological differences. We showcase SCALEX’s advantages by constructing continuously expandable single-cell atlases for human, mouse, and COVID-19 patients, each assembled from diverse data sources and growing with each new dataset. Its online data integration capacity and superior performance make SCALEX particularly appropriate for large-scale single-cell applications that build upon previous scientific insights.
Original language: English (US)
Journal: Nature Communications
Volume: 13
Issue number: 1
State: Published - Oct 17 2022

Bibliographical note

Acknowledged KAUST grant number(s): FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, URF/1/4663-01-01
Acknowledgements: We thank Jianbin Wang, Jin Gu and Fuchou Tang for helpful comments and advice. This work is supported by the State Key Research Development Program of China (Grant No. 2019YFA0110002, Q.C.Z.), the National Natural Science Foundation of China (Grants No. 32125007 and 91940306, Q.C.Z.), the Beijing Advanced Innovation Center for Structural Biology, and the Tsinghua-Peking Joint Center for Life Sciences. We thank the Tsinghua University Branch of China National Center for Protein Sciences (Beijing) for computational facility support. This work is also supported by the King Abdullah University of Science and Technology (KAUST) Office of Research Administration (ORA) under Award No. FCC/1/1976-44-01, FCC/1/1976-45-01, URF/1/4352-01-01, and URF/1/4663-01-01 (X.G.).

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
  • General Chemistry
  • General Physics and Astronomy
