COVID-19 Mortality Prediction among Patients Using Epidemiological Parameters: An Ensemble Machine Learning Approach

Krishnaraj Chadaga,1

Srikanth Prabhu,1*

Shashikiran Umakanth,2

Vivekananda Bhat K,1

Niranjana Sampathila,3

Rajagopala Chadaga P,4

Krishna Prakasha K,5

1Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India.

2Department of Medicine, Dr. TMA Pai Hospital, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India.

3Department of Biomedical Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India.

4Department of Mechanical and Manufacturing Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India.

5Department of Information and Communication Technology, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka 576104, India.

Abstract

Coronavirus disease 2019 (COVID-19) is a dangerous illness caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). It spread rapidly around the world and was declared a global pandemic on 11 March 2020. Vaccines have been developed to prevent the spread of the disease, and research to find a cure is ongoing. Machine learning (ML) has proven useful in combating COVID-19, and various applications have been deployed to understand real-world events through careful analysis of data. In this retrospective study, we analyze epidemiological parameters to predict mortality among SARS-CoV-2 patients. The goal of this research is to identify important predictive parameters that indicate which patients are at the highest risk of death. Supervised ensemble machine learning models, namely random forest, CatBoost, AdaBoost, gradient boosting, extreme gradient boosting (XGBoost), and LightGBM (Light Gradient Boosting Machine), were developed for a COVID-19 epidemiology dataset obtained from Mexico. Before building the models, Pearson's correlation and mutual information analysis between the dependent and independent features were used to establish the strength of association between features in the dataset. Extreme gradient boosting achieved the best performance, with an accuracy of 96%.
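The two association measures mentioned above can be illustrated with a minimal, dependency-free Python sketch. It assumes discrete (for example, binary yes/no) features and a binary death outcome coded as 0/1; the function names and the toy records are hypothetical and are not taken from the study or its Mexican dataset. Mutual information is estimated from empirical frequencies and reported in bits. For two binary variables, Pearson's r coincides with the phi coefficient.

```python
import math
from collections import Counter


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and variances, all left unnormalised (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mutual_information(x, y):
    """Mutual information I(X; Y) in bits between two discrete sequences,
    estimated from empirical (plug-in) frequencies."""
    n = len(x)
    if n != len(y) or n == 0:
        raise ValueError("x and y must be non-empty and of the same length")
    joint = Counter(zip(x, y))  # counts of each (x, y) pair
    px = Counter(x)             # marginal counts of x
    py = Counter(y)             # marginal counts of y
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # Only observed pairs contribute, so every probability here is > 0.
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical toy records (NOT from the study's dataset): 1 = yes, 0 = no.
    died = [1, 1, 1, 0, 0, 0, 0, 0]
    features = {
        "pneumonia": [1, 1, 0, 0, 1, 0, 0, 0],
        "diabetes": [1, 0, 1, 0, 1, 0, 1, 0],
    }
    for name, values in features.items():
        r = pearson_r(values, died)
        mi = mutual_information(values, died)
        print(f"{name}: Pearson r = {r:.3f}, mutual information = {mi:.3f} bits")
```

Pearson's r captures only linear association, whereas mutual information also reflects non-linear dependence between categorical variables, which is one reason to report both before model building.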
