Chinese scientists have analyzed millions of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes and identified two main genetic drivers of the ongoing coronavirus disease 2019 (COVID-19) pandemic.
The study is currently available on the bioRxiv* preprint server.
Since it began in China in December 2019, the COVID-19 pandemic has caused over 642 million infections and over 6.6 million deaths worldwide. Three years into the pandemic, the origin of SARS-CoV-2, the pathogen responsible for COVID-19, remains unclear. To prevent future coronavirus epidemics, it is important to understand the dynamics of viral transmission from natural reservoirs to humans.
The first SARS-CoV-2 genomes were classified into two main lineages, S and L. During the early phase of the pandemic, the L lineage accounted for about 70% of sequenced genomes and the S lineage for about 30%.
Evolutionary studies have suggested that the S lineage is more closely related to coronaviruses found in animals. However, because too few genomes were sequenced early in the pandemic, the exact origin of SARS-CoV-2 could not be determined.
In the current study, scientists analyzed more than 3 million SARS-CoV-2 genomes from a publicly available viral genome database and used two genetic loci to identify viral haplotypes, estimate the time of the most recent common ancestor (MRCA) of each haplotype, and determine the proximal geographic origin of the virus.
The two loci used in the analysis were amino acid position 614 of the spike protein (S_614) and amino acid position 48 of non-structural protein 8 (NSP8_48).
Analysis of a total of 3.14 million viral genomes at these two loci identified seven S_614 alleles, six NSP8_48 alleles, and 16 linkage haplotypes. A haplotype is a combination of alleles at linked loci that are inherited together; here, each haplotype is defined by the pair of amino acids found at S_614 and NSP8_48.
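The two-locus haplotype assignment described above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes each genome has already been reduced to the single-letter amino acid at S_614 and at NSP8_48, and the helper names `call_haplotype` and `haplotype_frequencies`, as well as the toy sample, are hypothetical.

```python
from collections import Counter


def call_haplotype(s_614: str, nsp8_48: str) -> str:
    """Join the residues at S_614 and NSP8_48 into a two-letter haplotype label.

    Hypothetical helper: S_614 = 'G' and NSP8_48 = 'L' gives 'GL'.
    """
    return f"{s_614}{nsp8_48}"


def haplotype_frequencies(genomes: list[tuple[str, str]]) -> dict[str, float]:
    """Return each haplotype's share of the input genomes as a percentage,
    ordered from most to least common."""
    counts = Counter(call_haplotype(s, n) for s, n in genomes)
    total = sum(counts.values())
    return {hap: 100 * c / total for hap, c in counts.most_common()}


# Hypothetical toy sample (NOT data from the study): (S_614, NSP8_48) residues.
sample = [("G", "L")] * 97 + [("D", "L")] * 2 + [("D", "S")]
print(haplotype_frequencies(sample))  # {'GL': 97.0, 'DL': 2.0, 'DS': 1.0}
```

With seven S_614 alleles and six NSP8_48 alleles, up to 7 × 6 = 42 two-locus combinations are possible in principle; the study observed 16 of them.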
The GL haplotype (S_614G and NSP8_48L) accounted for approximately 99% of the sequenced genomes and was the main haplotype driving the pandemic worldwide. In contrast, the DL haplotype (S_614D and NSP8_48L) accounted for about 60% of the genomes from China but only 0.45% of genomes globally. This haplotype was the main driver of the pandemic in China in March 2020.
In addition, the GS (S_614G and NSP8_48S), DS (S_614D and NSP8_48S), and NS (S_614N and NSP8_48S) haplotypes accounted for 0.26%, 0.06%, and 0.0067% of the sequenced genomes, respectively.
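To put these percentages in perspective, they can be converted into approximate genome counts. This back-of-the-envelope sketch assumes that every global share refers to the full set of 3.14 million analyzed genomes and treats GL as exactly 99%, although it is reported only as approximately 99%. The resulting numbers are rounded estimates, not counts reported by the study.

```python
TOTAL_GENOMES = 3_140_000  # "a total of 3.14 million viral genomes"

# Global shares (%) as reported; GL is only "approximately 99%" (assumption).
# The DL value is its global share, not its ~60% share among genomes from China.
global_share_pct = {
    "GL": 99.0,
    "DL": 0.45,
    "GS": 0.26,
    "DS": 0.06,
    "NS": 0.0067,
}


def approx_count(share_pct: float, total: int = TOTAL_GENOMES) -> int:
    """Convert a percentage share into an approximate number of genomes."""
    return round(total * share_pct / 100)


for hap, pct in global_share_pct.items():
    print(f"{hap}: {pct}% of genomes, about {approx_count(pct):,} genomes")
```

By this estimate, even the rare NS haplotype corresponds to roughly 210 genomes, while DL corresponds to roughly 14,000.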
The study identified DS→DL→GL as the main evolutionary trajectory of SARS-CoV-2; the other haplotypes were minor evolutionary byproducts.
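Because each haplotype label is the residue at S_614 followed by the residue at NSP8_48, the DS→DL→GL trajectory can be read as a series of single-site changes. The hypothetical helper below (not code from the study) lists which locus changes at each step: DS→DL is a change at NSP8_48 (S to L), and DL→GL is a change at S_614 (D to G).

```python
# Order of the residues within each two-letter haplotype label.
LOCI = ("S_614", "NSP8_48")


def changes(src: str, dst: str) -> list[str]:
    """List the loci whose residue differs between two haplotype labels."""
    return [f"{locus}: {a}->{b}" for locus, a, b in zip(LOCI, src, dst) if a != b]


trajectory = ["DS", "DL", "GL"]  # main trajectory reported by the study
for src, dst in zip(trajectory, trajectory[1:]):
    print(f"{src} -> {dst}: {changes(src, dst)}")
```

Each step in the trajectory alters exactly one of the two loci, whereas a direct jump from DS to GL would require changes at both.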
An interesting finding from the study was that the most recently emerged haplotype, GL, had the earliest MRCA time (May 2019), whereas the oldest haplotype, DS, had the most recent MRCA time (October 2019). This observation indicates that the ancestral strains of SARS-CoV-2 that gave rise to the GL haplotype died out and were replaced by newer strains better adapted to their place of origin.
These new strains emerged, evolved into virulent strains, and triggered an epidemic in China, where GL strains were not yet established at the end of 2019. GL strains had begun spreading around the world even before they were discovered and went on to trigger the global pandemic. However, they went undetected until after the outbreak in China was declared.
In China, the GL haplotype had only a slight impact during the early phase of the pandemic because of its late emergence and strict control measures.
Significance of the study
The study describes the evolutionary trajectory of SARS-CoV-2 and suggests that the COVID-19 pandemic had two major starting points: one in China, led by the DL haplotype, and one in the rest of the world, led by the GL haplotype.
The study could not determine where the GL haplotype originated because this haplotype was already circulating worldwide before the pandemic was declared. However, a recent study has suggested that the GL haplotype may have emerged in Europe.
*bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.