Ending the HIV epidemic requires innovative use of data for intelligent decision-making from surveillance through treatment. This study examined how useful linked, integrated health data on people living with HIV (PLWH) are for predicting their future HIV care status, and compared the performance of machine-learning methods for making that prediction among PLWH in South Carolina (SC).
We employed supervised machine learning, which can predict PLWH's future care status by learning from their existing health data. This approach suits integrated PLWH data, which are high in both volume and dimensionality.
A data set of health records for 8888 distinct PLWH was retrieved from an integrated PLWH data repository. We experimented with and scored seven representative machine-learning models (Bayesian Network, Automated Neural Network, Support Vector Machine, Logistic Regression, LASSO, Decision Tree and Random Forest) to find the one that best predicts PLWH's care status. We then used the champion model to identify the principal factors that predict retention in care.
Bayesian Network (F = 0.87, AUC = 0.94, precision = 0.87, recall = 0.86) was the best predictive model, followed by Random Forest (F = 0.78, AUC = 0.81, precision = 0.72, recall = 0.85), Decision Tree (F = 0.76, AUC = 0.75, precision = 0.70, recall = 0.82) and Neural Network (cluster) (F = 0.75, AUC = 0.71, precision = 0.69, recall = 0.81).
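As a quick consistency check on the figures above, the F score can be recomputed from the reported precision and recall. The sketch below assumes that F denotes the balanced F1 score (the harmonic mean of precision and recall), which the text does not state explicitly; the function name `f_score` and the dictionary layout are illustrative only. Because the inputs are already rounded to two decimals, agreement is expected only to within about 0.01.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1): harmonic mean of precision and recall.

    Assumption: the reported F is F1; the source does not say which F-beta is used.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F), copied from the results stated above.
reported = {
    "Bayesian Network": (0.87, 0.86, 0.87),
    "Random Forest": (0.72, 0.85, 0.78),
    "Decision Tree": (0.70, 0.82, 0.76),
    "Neural Network (cluster)": (0.69, 0.81, 0.75),
}

# Recompute F for each model from its rounded precision and recall.
recomputed = {name: f_score(p, r) for name, (p, r, _) in reported.items()}

for name, (_, _, f_reported) in reported.items():
    print(f"{name}: recomputed F = {recomputed[name]:.3f}, reported F = {f_reported:.2f}")
```

Under that assumption, each recomputed value falls within rounding distance of the reported F, so the four sets of metrics are internally consistent.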
Bayesian Networks and other machine-learning algorithms hold promise for predicting future HIV care status at the individual level. Predicting future care patterns for SC PLWH can help optimize health service resources for effective interventions and can help improve retention across the HIV continuum.