With the increasing focus on reducing hospital readmissions in the United States, numerous readmission risk prediction models have been proposed, most of them developed through analyses of structured data fields in electronic medical records and administrative databases. Three areas that may affect readmission but are poorly captured by structured data sources are patients’ physical function, cognitive status, and psychosocial environment and support.
Objective of the Study:
The objective of the study was to build a discriminative model using information germane to these 3 areas to identify hospitalized patients’ risk for 30-day all-cause readmission.
We conducted clinician focus groups to identify language used in the clinical record regarding these 3 areas. We then created a dataset of 30,000 inpatients, 10,000 from each of 3 hospitals, and searched those records for the focus group-derived language using natural language processing. A 30-day readmission prediction model was developed on 75% of the dataset and validated on the remaining 25% and on hospital-specific subsets.
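The record search and the 75%/25% split described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the phrase lists, the variable names, and the helper functions `flag_variables` and `split_75_25` are hypothetical and are not the study's actual focus-group language or pipeline. Plain phrase matching also ignores negation (for example, "not confused" would still be flagged), which a fuller natural language processing approach would need to handle.

```python
import random
import re

# Hypothetical phrase lists; the study's focus-group terms are not given here.
PHRASES = {
    "physical_function": ["uses walker", "unsteady gait"],
    "cognitive_status": ["confused", "memory loss"],
    "psychosocial_support": ["lives alone", "no family support"],
}


def flag_variables(note):
    """Return a 0/1 flag per variable: 1 if any of its phrases occurs in the note."""
    text = note.lower()
    flags = {}
    for variable, phrases in PHRASES.items():
        # Match any listed phrase as a whole-word sequence.
        pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
        flags[variable] = int(re.search(pattern, text) is not None)
    return flags


def split_75_25(records, seed=0):
    """Shuffle a copy of the records and split it into 75% and 25% parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = (len(shuffled) * 3) // 4
    return shuffled[:cut], shuffled[cut:]
```

Under these assumptions, each record's flags would serve as candidate predictors, with the model fitted on the first part of the split and evaluated on the second.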
Focus group language was aggregated into 35 variables. The final model had 16 variables, a validated C-statistic of 0.74, and was well calibrated. Subset validation of the model by hospital yielded C-statistics of 0.70–0.75.
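For readers unfamiliar with the C-statistic, the sketch below shows one standard way to compute it for a binary outcome: the proportion of (readmitted, not readmitted) patient pairs in which the readmitted patient received the higher predicted risk, with ties counted as one half. The function name `c_statistic` and the example values are illustrative assumptions, not the study's code or data.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for predicted risks against 0/1 outcomes."""
    pos = [r for r, y in zip(risks, outcomes) if y == 1]  # readmitted
    neg = [r for r, y in zip(risks, outcomes) if y == 0]  # not readmitted
    if not pos or not neg:
        raise ValueError("need at least one patient with each outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # readmitted patient ranked higher
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Made-up example: 3 of the 4 pairs are correctly ordered.
print(c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking.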
Deriving a 30-day readmission risk prediction model by identifying physical, cognitive, and psychosocial issues with natural language processing yielded a model that performs similarly to the better-performing models previously published. It has the added advantages of being based on clinically relevant factors and of being automated and scalable. Because the variables in the model are clinically relevant, future research may be able to test whether targeting interventions to the identified risks reduces readmissions.