Abstract

Due to the continued evolution of the SARS-CoV-2 pandemic, researchers worldwide are deploying digital signal processing (DSP) and machine learning approaches to better understand the virus and to mitigate and suppress its spread. This study presents an alignment-free approach to classifying SARS-CoV-2 using complementary DNA (cDNA), that is, DNA synthesized from the genome of this single-stranded RNA virus. A total of 1582 samples, with genome sequences of different lengths from different regions, were collected from various data sources and divided into a SARS-CoV-2 group and a non-SARS-CoV-2 group. We extracted eight biomarkers based on three-base periodicity using DSP techniques and ranked them with a filter-based feature selection method. The ranked biomarkers were fed into k-nearest neighbor, support vector machine, decision tree, and random forest classifiers to distinguish SARS-CoV-2 from other coronaviruses. The training dataset was used to evaluate the classifiers in terms of accuracy and F-measure via 10-fold cross-validation, and kappa scores were estimated to check the influence of unbalanced data. Further, a 10 × 10 cross-validation paired t-test was used to test the best model with unseen data. Random forest was selected as the best model, differentiating SARS-CoV-2 from other coronaviruses and a control group with an accuracy of 97.4%, a sensitivity of 96.2%, and a specificity of 98.2% when tested with unseen samples. Moreover, the proposed algorithm was computationally efficient, taking only 0.31 s to compute the genome biomarkers and outperforming previous studies.
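The abstract does not list the eight three-base-periodicity biomarkers, so the sketch below is only an illustration of the general idea, not the authors' feature set. It computes one classic period-3 measure: each base is mapped to a binary indicator sequence (the Voss mapping), the spectral power S(N/3) at frequency 2π/3 is summed over A, C, G and T, and that power is divided by the average power over all DFT bins. The function name `period3_snr` is hypothetical. The shortcut of computing the average power as the count of A/C/G/T bases follows from Parseval's theorem and is an assumption of this sketch, not something taken from the paper.

```python
import cmath

# Third roots of unity: at frequency 2*pi/3 the DFT kernel e^{-j*2*pi*n/3}
# depends only on n mod 3, so three per-phase counts per base are enough.
_ROOTS = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]


def period3_snr(seq: str) -> float:
    """Hypothetical helper: period-3 signal-to-noise ratio of a DNA/cDNA sequence.

    Uses the Voss binary indicator mapping u_b[n] = 1 if seq[n] == b, else 0.
    S(N/3) = sum_b |U_b(2*pi/3)|^2 is the power at the period-3 frequency.
    By Parseval's theorem, the mean of S[k] over all N DFT bins equals the
    number of unambiguous (A/C/G/T) bases, so no full FFT is needed.
    """
    seq = seq.upper()
    # phase_counts[b][r] = number of positions n with seq[n] == b and n % 3 == r
    phase_counts = {b: [0, 0, 0] for b in "ACGT"}
    for n, base in enumerate(seq):
        if base in phase_counts:  # ambiguity codes such as 'N' are skipped
            phase_counts[base][n % 3] += 1

    valid = sum(sum(c) for c in phase_counts.values())
    if valid == 0:
        raise ValueError("sequence contains no A/C/G/T bases")

    s_n3 = 0.0
    for counts in phase_counts.values():
        # DFT coefficient of this base's indicator sequence at 2*pi/3
        coeff = sum(c * w for c, w in zip(counts, _ROOTS))
        s_n3 += abs(coeff) ** 2
    return s_n3 / valid


if __name__ == "__main__":
    # A perfectly codon-periodic toy sequence gives a strong period-3 peak,
    # while a homopolymer of length divisible by 3 gives essentially none.
    print(period3_snr("ATG" * 10))
    print(period3_snr("A" * 30))
```

For "ATG" repeated 10 times the ratio is about 10, while "A" repeated 30 times gives a value near 0. In a classification pipeline such as the one described above, scalar features like this would be computed once per genome and then passed to feature ranking and the classifiers.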

Rights

© 2021 Elsevier Ltd. All rights reserved.

Cite as

Singh, O., Vallejo, M., El-Badawy, I., Aysha, A., Madhanagopal, J. & Mohd Faudzi, A. 2021, 'Classification of SARS-CoV-2 and Non-SARS-CoV-2 Using Machine Learning Algorithms', Computers in Biology and Medicine, 136, article no: 104650. https://doi.org/10.1016/j.compbiomed.2021.104650
Last updated: 16 June 2022