Machine learning can predict aggressive Multiple Sclerosis using non-invasive initially collected data at the time of diagnosis

  • Salma Mohamed Abdelhamid Mohamed Aly, Assistant lecturer Community Medicine Department, Faculty of Medicine, Alexandria University
  • Samia Abdel Aziz Abou Khatwa, Professor Community Medicine Department, Faculty of Medicine, Alexandria University
  • John David Osborne, Assistant professor Informatics Institute (General Internal Medicine), School of Medicine, University of Alabama at Birmingham
  • Ismael Ramadan Aly Ismael, Professor Neuropsychiatry Department, Faculty of Medicine, Alexandria University
  • Nadia Fouad Farghaly, Professor Community Medicine Department, Faculty of Medicine, Alexandria University
  • Mohamed Hamed Issa, Intern Faculty of Medicine, Alexandria University
  • Mariam Zakareia Al-mokaddem, Undergraduate student Faculty of Medicine, Alexandria University
Keywords: Multiple Sclerosis, Expanded Disability Status Scale, Machine Learning, XGBoost, SHAP method


Background: Multiple sclerosis (MS) is a central nervous system (CNS) disorder characterized by inflammation, demyelination, and neurodegeneration. It is the most common cause of non-traumatic neurological disability in young adults. The course of the disease varies between individuals: some patients accumulate minimal disability over their lives, whereas others experience a rapidly disabling course. This latter subset of patients is often referred to as having ‘aggressive’ MS. Early intervention might protect patients from irreversible damage and disability. The study objective is to assess a variety of machine learning (ML) models for predicting the disease course.

Material & Method: A retrospective study was conducted on patients from the Neuropsychiatry MS clinic at Alexandria University. Patients were classified as having aggressive or mild MS based on their Expanded Disability Status Scale (EDSS) score 5 years after the onset of the disease. Six ML classification models (XGBoost, support vector machine (SVM), random forest (RF), logistic regression, decision tree (DT), and naïve Bayes (NB)) were assessed to predict disability progression till the end of the first year from the onset of the disease. Recursive feature elimination (RFE) was used for feature selection, and the SHAP method was used to interpret predictions. Demographic, health history, and clinical features were used as predictors.
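
As an illustration only, the workflow described above (feature selection with RFE, then a comparison of several classifiers) could be sketched as follows. This is a minimal sketch on synthetic data, not the study's code: the dataset, the number of retained features, the train/test split, and all hyperparameters are assumptions, and scikit-learn's GradientBoostingClassifier is swapped in as a stand-in for XGBoost so that the example depends on scikit-learn alone. The SHAP interpretation step is omitted because it requires the separate shap package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the clinical data (assumption: binary label,
# 1 = aggressive course, 0 = mild course).
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Recursive feature elimination: repeatedly fit the ranking estimator and
# drop the least important feature until 8 remain (8 is an arbitrary choice).
selector = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=8)
selector.fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Six classifiers mirroring the families named in the text.
models = {
    "gradient boosting (stand-in for XGBoost)": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "random forest": RandomForestClassifier(random_state=0),
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
}

# Fit each model on the selected features and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train_sel, y_train)
    pred = model.predict(X_test_sel)
    scores[name] = (accuracy_score(y_test, pred), f1_score(y_test, pred))

for name, (acc, f1) in scores.items():
    print(f"{name}: accuracy={acc:.3f}, F1={f1:.3f}")
```

In a real analysis the feature selection would normally be nested inside cross-validation so that held-out patients never influence which features are kept.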

Results: The XGBoost classification model was the best-performing model, with an accuracy of 88.5% and an F1 score of 88.9%. The top SHAP predictors were duration of stability without treatment, BMI, and the number of relapses within the first year from the onset of treatment.
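
For readers less familiar with the two reported metrics, the sketch below shows how accuracy and the F1 score are derived from the counts of a binary confusion matrix. The function name and the counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives and true negatives (positive = aggressive course here,
    which is an assumption made for illustration).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # share of all predictions that are correct
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, not taken from the study.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print(metrics)
```

Because F1 ignores true negatives, it can differ noticeably from accuracy when the two classes are unevenly represented.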

Conclusion: Our independently generated data set and results support previous preliminary work suggesting that XGBoost can predict multiple sclerosis progression. Our XGBoost implementation could rule out the aggressive course outcome 94.9% of the time using only non-invasive features that are routinely collected at the earliest stage of the patients’ contact with the clinic. This suggests a lower practical burden for the clinical implementation of ML progression algorithms. Finally, our results suggest that allergic rhinitis may be a risk factor for the aggressive progression of MS.