We sought to evaluate several statistical modeling approaches for predicting prospective total annual health costs (medical plus pharmacy) of health plan participants using Pharmacy Health Dimensions (PHD), a pharmacy claims-based risk index.
We undertook a 2-year (baseline year/follow-up year) longitudinal analysis of integrated medical and pharmacy claims. Included were plan participants younger than 65 years of age with continuous medical and pharmacy coverage (n = 344,832). PHD drug categories, age, gender, and pharmacy costs were derived from the baseline year. Annual total health costs were calculated for each plan participant in the follow-up year. Models examined included ordinary least squares (OLS) regression, log-transformed OLS regression with smearing estimator, and three two-part models using OLS regression, log-OLS regression with smearing estimator, and generalized linear modeling (GLM), respectively. A 10% random sample was withheld for model validation, which was assessed via adjusted R², mean absolute prediction error, specificity, and positive predictive value.
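As an illustration of the log-OLS approach with a smearing estimator named above, the following is a minimal sketch and not the study's implementation. It assumes strictly positive costs, a homoscedastic log-scale error, and synthetic data with two hypothetical predictors; the function names, the simulated coefficients, and the use of NumPy are all assumptions made for this example. The smearing factor is the mean of the exponentiated log-scale residuals and rescales the retransformed predictions, which would otherwise underestimate mean cost.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict_linear(beta, X):
    """Linear predictor on the scale the model was fitted on."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ beta


def smearing_factor(beta, X, log_y):
    """Smearing factor: mean of exponentiated log-scale residuals."""
    residuals = log_y - predict_linear(beta, X)
    return float(np.mean(np.exp(residuals)))


def mean_absolute_prediction_error(actual, predicted):
    return float(np.mean(np.abs(actual - predicted)))


# Synthetic data (hypothetical): two predictors, log-normal costs.
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
log_cost = 7.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(scale=1.0, size=n)
cost = np.exp(log_cost)

# Withhold roughly 10% of observations for validation.
holdout = rng.random(n) < 0.10
train = ~holdout

# Fit on the log scale, then retransform with and without smearing.
beta = fit_ols(X[train], log_cost[train])
smear = smearing_factor(beta, X[train], log_cost[train])
naive_pred = np.exp(predict_linear(beta, X[holdout]))
smeared_pred = naive_pred * smear

print("smearing factor:", smear)
print("MAPE, naive retransformation:",
      mean_absolute_prediction_error(cost[holdout], naive_pred))
print("MAPE, smeared retransformation:",
      mean_absolute_prediction_error(cost[holdout], smeared_pred))
```

A single smearing factor is only appropriate when the log-scale residual variance does not depend on the predictors; when it does, retransformed predictions can be biased for subgroups, and alternatives such as a GLM fitted on the original cost scale avoid the retransformation step altogether.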
Most PHD drug categories were significant independent predictors of total costs. Among models tested, the OLS model had the lowest mean absolute prediction error and highest adjusted R². The log-OLS and two-part log-OLS models did not predict costs accurately because of log-scale heteroscedasticity. The two-part model using GLM had lower adjusted R² but similar performance in other assessment measures compared with the OLS or two-part OLS models.
The PHD system, derived solely from pharmacy claims data, can be used to predict future total health costs. Using PHD with a simple OLS model may provide predictive accuracy similar to that of more advanced econometric models.