- Published
- 07 September 2022
- Journal article
Time Series Analysis of SARS-CoV-2 Genomes and Correlations among Highly Prevalent Mutations
- Authors
- Source
- Microbiology Spectrum
Abstract
The efforts of the scientific community to tame the recent pandemic caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) seem to have been diluted by the emergence of new viral strains. Therefore, it is imperative to understand the effect of mutations on viral evolution. We performed a time series analysis on 59,541 SARS-CoV-2 genomic sequences from around the world to gain insights into the kinetics of the mutations arising in the viral genomes. These 59,541 genomes were grouped according to month (January 2020 to March 2021) based on the collection date. Meta-analysis of these data led us to identify significant mutations in viral genomes. Pearson correlation of these mutations led us to the identification of 16 comutations. Among these comutations, some of the individual mutations have been shown to contribute to viral replication and fitness, suggesting a possible role of other unexplored mutations in viral evolution. We observed that the mutations 241C>T in the 5′ untranslated region (UTR), 3037C>T in nsp3, 14408C>T in the RNA-dependent RNA polymerase (RdRp), and 23403A>G in spike are correlated with each other and were grouped in a single cluster by hierarchical clustering. These mutations have replaced the wild-type nucleotides in SARS-CoV-2 sequences. Additionally, we employed a suite of computational tools to investigate the effects of T85I (1059C>T), P323L (14408C>T), and Q57H (25563G>T) mutations in nsp2, RdRp, and the ORF3a protein of SARS-CoV-2, respectively. We observed that the mutations T85I and Q57H tend to be deleterious and destabilize the respective wild-type protein, whereas P323L in RdRp tends to be neutral and has a stabilizing effect.
IMPORTANCE We performed a meta-analysis on SARS-CoV-2 genomes categorized by collection month and identified several significant mutations. Pearson correlation analysis of these significant mutations identified 16 comutations having absolute correlation coefficients of >0.4 and a frequency of >30% in the genomes used in this study. The correlation results were further validated by another statistical tool called hierarchical clustering, where mutations were grouped in clusters on the basis of their similarity. We identified several positive and negative correlations among comutations in SARS-CoV-2 isolates from around the world which might contribute to viral pathogenesis. The negative correlations among some of the mutations in SARS-CoV-2 identified in this study warrant further investigations. Further analysis of mutations such as T85I in nsp2 and Q57H in ORF3a protein revealed that these mutations tend to destabilize the protein relative to the wild type, whereas P323L in RdRp is neutral and has a stabilizing effect. Thus, we have identified several comutations which can be further characterized to gain insights into SARS-CoV-2 evolution.
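The correlation and clustering steps described in the abstract can be illustrated with a minimal sketch. It assumes each mutation is summarized as a monthly frequency series and that similarity is measured as Pearson r between those series; the abstract does not specify these details, so this is not the authors' pipeline. The frequency values, the `toy_declining` mutation, the function names, the average-linkage rule, and the `max_dist` cutoff are all hypothetical. The frequency filter mentioned in the abstract is not modeled here.

```python
from itertools import combinations
from math import sqrt

# Toy monthly frequencies (fraction of genomes carrying each mutation).
# Invented for illustration only; these are NOT data from the study.
monthly_freq = {
    "241C>T":        [0.05, 0.40, 0.80, 0.95, 0.99],
    "3037C>T":       [0.04, 0.41, 0.79, 0.96, 0.99],
    "14408C>T":      [0.05, 0.39, 0.81, 0.95, 0.98],
    "23403A>G":      [0.06, 0.42, 0.80, 0.94, 0.99],
    "toy_declining": [0.90, 0.60, 0.30, 0.10, 0.02],  # hypothetical mutation
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series.

    Assumes neither series is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlated_pairs(freq, r_min=0.4):
    """Return {(mutation_a, mutation_b): r} for pairs with |r| > r_min."""
    pairs = {}
    for a, b in combinations(freq, 2):
        r = pearson(freq[a], freq[b])
        if abs(r) > r_min:
            pairs[(a, b)] = r
    return pairs


def cluster(freq, max_dist=0.5):
    """Average-linkage agglomerative clustering on the distance 1 - r.

    Merging stops once the closest pair of clusters is farther apart
    than max_dist (an arbitrary cutoff chosen for this sketch).
    """
    clusters = [[name] for name in freq]

    def dist(c1, c2):
        ds = [1 - pearson(freq[a], freq[b]) for a in c1 for b in c2]
        return sum(ds) / len(ds)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        if dist(clusters[i], clusters[j]) > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


pairs = correlated_pairs(monthly_freq)
clusters = cluster(monthly_freq)
for pair, r in pairs.items():
    print(pair, round(r, 3))
print(clusters)
```

In this toy data the four rising mutations fall into one cluster, while the declining one stays separate and shows a negative correlation with each of them. For real data, a library routine such as SciPy's hierarchical clustering would be the more practical choice.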
Cite as
Periwal, N., Rathod, S., Sarma, S., Johar, G., Jain, A., Barnwal, R., Srivastava, K., Kaur, B., Arora, P. & Sood, V. 2022, 'Time Series Analysis of SARS-CoV-2 Genomes and Correlations among Highly Prevalent Mutations', Microbiology Spectrum, 10(5), article no: e01219-22. https://doi.org/10.1128/spectrum.01219-22