During the last decade, more attention has been paid to the development of risk scores for primary prevention, whereas patients with established cardiovascular disease are usually assigned to a clinically high-risk population without further stratification. Yet even these high-risk patients are heterogeneous in their individual risk, so further risk stratification might identify those who would benefit most from specific risk-reduction strategies. We have developed the first long-term risk prediction model of cardiovascular mortality in patients with established coronary heart disease and in patients with a previous myocardial infarction, based on newly available machine learning techniques.
A total of 2,879 patients from the LURIC study who presented to hospital were included in this analysis. Over a median follow-up of 9.9 years, 540 patients died of cardiovascular causes. We ranked 184 biomarkers and 21 clinical variables by predictive accuracy using three ranking methods (correlation, information gain and information gain ratio). Seven predictors (random forest, random tree, naïve Bayes, rule-based predictor, linear regression, and polynomial and radial basis function support vector machines) were used to generate risk models.
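The three ranking criteria named above can be illustrated with a small pure-Python sketch. The source does not say how continuous biomarkers were discretised before computing information gain, so the median split used here (`median_split`) is an assumption, as are all function names; this is not the authors' implementation.

```python
import math
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in outcome entropy after grouping patients by a discrete feature."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(values, labels):
    """Information gain divided by the split information (entropy of the feature)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


def pearson_abs(xs, ys):
    """Absolute Pearson correlation between a feature and the 0/1 outcome."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return abs(cov / (sx * sy)) if sx and sy else 0.0


def median_split(xs):
    """Assumed discretisation: 1 if the value lies above the feature's median, else 0."""
    m = median(xs)
    return [int(x > m) for x in xs]


def rank_features(features, outcome, method):
    """Return feature names sorted from most to least informative for the outcome."""
    scores = {}
    for name, xs in features.items():
        if method == "correlation":
            scores[name] = pearson_abs(xs, outcome)
        elif method == "info_gain":
            scores[name] = information_gain(median_split(xs), outcome)
        elif method == "gain_ratio":
            scores[name] = gain_ratio(median_split(xs), outcome)
        else:
            raise ValueError(f"unknown ranking method: {method}")
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `rank_features(features, outcome, "gain_ratio")` with a dict mapping marker names to per-patient values and a 0/1 outcome list returns the marker names ordered from most to least informative. Gain ratio divides by the split information, which penalises features that fragment patients into many small groups.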
NT-proBNP and CT-proAVP were the most predictive biomarkers, followed by TnT and estimated GFR. Using more than five biomarkers substantially increased cost and effort without further improving the accuracy of the resulting models. Comparing all prediction algorithms across all biomarkers by the area under the ROC curve (AUC), we found that the random forest approach performed best, followed by the rule-based approach, logistic regression and the radial basis function support vector machine. Adding clinical variables further improved the models. Overall, the machine learning risk models predicted five- and ten-year cardiovascular mortality better than conventional statistical approaches.
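The AUC comparison, and the observation that accuracy levels off after a handful of markers, can be sketched as follows. `auc` uses the standard Mann-Whitney formulation of the ROC area; `smallest_sufficient_k` and its `min_gain` tolerance are a hypothetical plateau rule for illustration, not the criterion used in the study.

```python
def auc(scores, labels):
    """ROC AUC as the probability that a randomly chosen case (label 1) gets a
    higher risk score than a randomly chosen non-case (label 0); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one non-case")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def smallest_sufficient_k(auc_by_k, min_gain=0.005):
    """Hypothetical rule: auc_by_k[i] is the AUC obtained with the top i+1 ranked
    markers. Return the smallest k whose AUC lies within min_gain of the best AUC
    reached with any number of markers, i.e. adding more markers gains little."""
    if not auc_by_k:
        raise ValueError("need at least one AUC value")
    best = max(auc_by_k)
    return next(i + 1 for i, a in enumerate(auc_by_k) if best - a <= min_gain)
```

In use, each model's predicted risks on held-out patients would be passed to `auc`, and the AUCs obtained with the top 1, 2, 3, ... ranked markers would be passed to `smallest_sufficient_k`. The pairwise loop is quadratic in cohort size, which is acceptable for a few thousand patients; a rank-based version scales better.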
In summary, we developed the first machine-learning-based CV mortality prediction model that (1) draws on an extensive database of clinical data and routinely and non-routinely measured laboratory data, (2) is embedded in a fully automatic, self-validating framework, and (3) is easy to apply across a broad spectrum of populations, events and time frames.
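The source does not describe how the framework validates itself. One common choice, assumed here, is stratified k-fold cross-validation, which keeps the proportion of cardiovascular deaths roughly equal in every fold. The sketch shows only the splitting step, and `stratified_k_fold` is a hypothetical helper.

```python
import random


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs in which every test fold keeps roughly
    the overall event rate. Assumed validation scheme, not taken from the source."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        # Shuffle the patients of each outcome class, then deal them round-robin
        # into the k folds so each fold receives a near-equal share of that class.
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test
```

Each model would be trained on `train` and scored on `test` once per fold, so every patient contributes exactly one out-of-sample prediction; fixing `seed` makes the split reproducible.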
1 Vth Department of Medicine, Medical Faculty Mannheim, University of Heidelberg, Mannheim, Germany
2 Université du Luxembourg, Faculté des Sciences, de la Technologie et de la Communication, Esch-sur-Alzette, Luxembourg
3 Université du Luxembourg, Luxembourg Centre for Systems Biomedicine, Esch-sur-Alzette, Luxembourg
4 Clinical Institute of Medical and Chemical Laboratory Diagnostics, Medical University of Graz, Graz, Austria