Data mining represents an alternative approach to identify new predictors of multifactorial diseases. This work aimed at building an accurate predictive model for incident hypertension using data mining procedures.
The primary study population consisted of 1605 normotensive individuals aged 20–79 years with 5-year follow-up from the population-based study, that is the Study of Health in Pomerania (SHIP). The initial set was randomly split into a training and a testing set. We used a probabilistic graphical model applying a Bayesian network to create a predictive model for incident hypertension and compared the predictive performance with the established Framingham risk score for hypertension. Finally, the model was validated in 2887 participants from INTER99, a Danish community-based intervention study.
In the training set of SHIP data, the Bayesian network used a small subset of relevant baseline features including age, mean arterial pressure, rs16998073, serum glucose and urinary albumin concentrations. Furthermore, we detected relevant interactions between age and serum glucose as well as between rs16998073 and urinary albumin concentrations [area under the receiver operating characteristic (AUC 0.76)]. The model was confirmed in the SHIP validation set (AUC 0.78) and externally replicated in INTER99 (AUC 0.77). Compared to the established Framingham risk score for hypertension, the predictive performance of the new model was similar in the SHIP validation set and moderately better in INTER99.
Data mining procedures identified a predictive model for incident hypertension, which included innovative and easy-to-measure variables. The findings promise great applicability in screening settings and clinical practice.