A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic.

Qingtian Guan, Mukhtar Sadykov, Sara Mfarrej, Sharif Hala, Raeece Naeem, Raushan Nugmanova, Awad Al-Omari, Samer Salih, Abbas Al Mutair, Michael J Carr, William W Hall, Stefan T. Arold, Arnab Pain

Research output: Contribution to journal › Article › peer-review

31 Scopus citations


The SARS-CoV-2 pathogen has established endemicity in humans. This necessitates the development of rapid genetic surveillance methodologies to serve as an adjunct to existing comprehensive, albeit slower, genome sequencing-driven approaches. A total of 21,789 complete genomes were downloaded from GISAID on May 28, 2020 for analysis. We have defined the major clades and subclades of circulating SARS-CoV-2 genomes. A rapid sequencing-based genotyping protocol was developed and tested on SARS-CoV-2-positive RNA samples by next-generation sequencing. We describe 11 major mutations that define five major clades (G614, S84, V251, I378 and D392) of globally circulating viral populations. The clades can be specifically identified using an 11-nucleotide genetic barcode. An analysis of amino acid variation in SARS-CoV-2 proteins provided evidence of substitution events in the viral proteins involved in both host entry and genome replication. Globally circulating SARS-CoV-2 genomes could be classified into five major clades based on mutational profiles defined by an 11-nucleotide barcode. We have successfully developed a multiplexed, sequencing-based rapid genotyping protocol for high-throughput classification of the major clade types of SARS-CoV-2 in clinical samples. This barcoding strategy will be required to monitor decreases in genetic diversity as treatment and vaccine approaches become widely available.
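As a rough illustration of the barcoding idea, the sketch below reads the nucleotides at a fixed set of sites in an aligned genome and matches the resulting string against clade-defining reference barcodes. The positions, barcode strings, clade names and function names are all hypothetical placeholders chosen for a toy sequence; they are not the 11 sites or five clades reported in the article, and the exact-match lookup is an assumption rather than the authors' published pipeline.

```python
from typing import Mapping, Sequence

# Hypothetical 0-based sites in a toy aligned sequence.
# These are NOT the 11 barcode positions reported in the article.
BARCODE_POSITIONS: Sequence[int] = (2, 5, 7)

# Hypothetical reference barcodes, for illustration only.
CLADE_BARCODES: Mapping[str, str] = {
    "clade_A": "GTC",
    "clade_B": "ATC",
    "clade_C": "GCT",
}


def extract_barcode(aligned_genome: str,
                    positions: Sequence[int] = BARCODE_POSITIONS) -> str:
    """Concatenate the nucleotides found at the barcode sites."""
    return "".join(aligned_genome[p].upper() for p in positions)


def classify(barcode: str,
             clades: Mapping[str, str] = CLADE_BARCODES) -> str:
    """Return the clade whose reference barcode matches exactly."""
    for name, reference in clades.items():
        if reference == barcode:
            return name
    # No exact match, e.g. ambiguous bases or an undescribed profile.
    return "unassigned"


if __name__ == "__main__":
    toy_genome = "AAGAATAC"
    barcode = extract_barcode(toy_genome)
    print(barcode, classify(barcode))
```

In practice the genome would first have to be aligned to a reference so that site indices are comparable across samples, and ambiguous base calls would need handling beyond the simple "unassigned" fallback used here.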
