Leveraging Electronic Health Records and Machine Learning to Tailor Nursing Care for Patients at High Risk for Readmissions

Brom, Heather PhD, APRN; Brooks Carthon, J. Margo PhD, APRN, FAAN; Ikeaba, Uchechukwu MS; Chittams, Jesse MS

doi: 10.1097/NCQ.0000000000000412


Electronic health record (EHR)-derived data are increasingly used across health care systems to identify high-risk patients, direct quality improvement initiatives, and monitor clinical trials.1,2 In the past decade alone, EHR systems have become more comprehensive, allowing providers to share patient-related information across multiple health care systems.3 More recently, EHRs have been employed to identify patterns of high utilization within health care systems, including unplanned readmissions.4,5 A study by Shadmi and colleagues,6 for example, employed the EHR to develop a risk prediction model for all-cause readmissions to an integrated delivery system, while other studies have leveraged the EHR to identify psychosocial risk factors that may heighten risk for 30-day readmission.7

Readmissions are of particular interest to health care systems because of the Hospital Readmissions Reduction Program (HRRP), which penalizes hospitals for higher-than-expected readmission rates for selected conditions.8 Readmissions within 30 days of hospital discharge are believed to reflect lapses in care quality during a patient's initial hospitalization. Since the inception of the HRRP in 2012, commercial insurers (eg, Independence Blue Cross) and some Medicaid health maintenance organizations have followed suit and either penalize or no longer pay for readmissions for specific conditions.9 While some studies suggest a decrease in overall readmissions since the HRRP's passage,10 concerns have emerged that such penalties may lead to unintended consequences.11 Specifically, evidence suggests that providers disproportionately serving socially vulnerable patients, such as safety net providers, are more likely to be penalized and less likely to receive financial rewards.12–14


As a level 1 trauma center located in a large urban area within the northeast United States, our hospital is the primary provider and safety net hospital for a large swath of the city's most socially vulnerable residents, many of whom are at high risk for readmissions. With the social and medical needs of our patient population in mind, we formed a nurse-led interdisciplinary work group in December 2017 to determine how we could improve care delivery for this high-risk patient population. One goal of the work group was to use EHR-derived data to identify factors associated with unplanned 30-day readmissions.

Prior research within our health system revealed that 2 or more hospital admissions in the preceding 12 months was the best predictor of all-cause 30-day readmission.15 Based on this finding, a readmission flag was embedded into the EHR to alert clinicians to a patient's high readmission risk status. The overall hospital 30-day all-cause readmission rate was 15.1%, but for patients flagged as high risk it was 30.4%. However, after 12 months of use, the EHR flag produced no change in 30-day all-cause readmissions. The study's authors posited that this null finding may reflect nonroutine use of the flag in practice and that a readmission flag alone may not be enough to influence readmission rates.15 Our work group likewise suspected that, while appealing, an EHR flag alone may not be clinically useful for identifying patients at highest risk for readmission and that a more precise mechanism to identify and target care delivery for high-risk patients was needed.

In the current study, we applied a machine-learning approach to identify patients in our hospital who were most at risk for readmission. Machine learning is a powerful technique that can use EHR data in real time to target specific high-risk patients. When applying a machine-learning approach, data are divided into a training set and a test set. With the training set, the machine-learning algorithm "learns" a model from the variables, or factors, in the data that best predict the outcome of interest. The resulting algorithm is then applied to the test data set to see how well it predicts the outcome of interest, such as identifying readmissions. Machine learning is increasingly used in clinical and epidemiological research and in clinical care, including the creation of disease-specific prediction models and models of readmission risk.6,15–18
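As a concrete illustration, this split-then-evaluate workflow can be sketched in a few lines of Python. The data, the single predictor (prior admissions), and the threshold rule below are hypothetical and illustrate only the idea, not the study's model:

```python
import random

# Toy illustration of training/test evaluation: a rule is "learned" on the
# training set and then judged on held-out test data it has never seen.
random.seed(0)

data = []
for _ in range(200):
    prior = random.randint(0, 4)                 # admissions in prior 12 months
    p = 0.6 if prior >= 2 else 0.15              # assumed readmission probability
    readmitted = 1 if random.random() < p else 0
    data.append((prior, readmitted))

random.shuffle(data)
train, test = data[:100], data[100:]             # simple 50/50 split

def accuracy(threshold, rows):
    """Fraction of rows where (prior >= threshold) matches the outcome."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)

# "Training": choose the threshold that best separates outcomes in train ...
best = max(range(5), key=lambda t: accuracy(t, train))
# ... then evaluate the learned rule on the unseen test set
test_accuracy = accuracy(best, test)
```

In practice the training step searches over many variables at once, but the division of labor is the same: fit on one sample, judge on another.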

One machine-learning approach is Classification and Regression Tree (CART) analysis, also known as decision trees. In CART, interactions among variables are considered recursively rather than simultaneously, as is the case in linear regression. The analysis creates a tree, or classification rule, that can help clinicians understand complex relationships and may be particularly valuable to nurses in tailoring clinical care to high-risk patients.16,19 The aim of this study was to employ CART to determine predictors of 30-day readmissions to our institution over a single quarter, between August and October of 2017. Results of this study will be used to inform nurse-led clinical interventions for patients at high risk for readmissions.
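The recursive partitioning behind CART can be sketched as follows; the Gini impurity criterion, the two toy variables (ED visits and comorbidity count), and the data are illustrative assumptions, not the study's actual specification:

```python
# Minimal sketch of recursive partitioning (CART-style), using Gini impurity.

def gini(rows):
    """Gini impurity of a set of (features, label) rows."""
    if not rows:
        return 0.0
    p = sum(label for _, label in rows) / len(rows)   # fraction with outcome = 1
    return 2 * p * (1 - p)

def best_split(rows):
    """Find the (feature index, threshold) that most reduces impurity."""
    base, best, best_gain = gini(rows), None, 0.0
    for j in range(len(rows[0][0])):
        for t in sorted({x[j] for x, _ in rows}):
            left = [r for r in rows if r[0][j] < t]
            right = [r for r in rows if r[0][j] >= t]
            if not left or not right:
                continue
            w = len(left) / len(rows)
            gain = base - (w * gini(left) + (1 - w) * gini(right))
            if gain > best_gain:
                best, best_gain = (j, t), gain
    return best

def build_tree(rows, depth=0, max_depth=2):
    """Recursively split until nodes are pure, no split helps, or max depth."""
    split = None if depth >= max_depth else best_split(rows)
    if split is None:
        return {"leaf": True, "risk": sum(label for _, label in rows) / len(rows)}
    j, t = split
    return {"leaf": False, "feature": j, "threshold": t,
            "left": build_tree([r for r in rows if r[0][j] < t], depth + 1, max_depth),
            "right": build_tree([r for r in rows if r[0][j] >= t], depth + 1, max_depth)}

# Toy rows: features = (ED visits, comorbidity count), label = readmitted
rows = [((0, 2), 0), ((0, 4), 0), ((1, 3), 0), ((2, 9), 1),
        ((3, 10), 1), ((1, 9), 1), ((0, 1), 0), ((2, 11), 1)]
tree = build_tree(rows)
```

Each recursive call finds the single split that most reduces impurity and then repeats within each resulting subgroup, which is the variable-by-variable, interaction-friendly behavior that distinguishes CART from a simultaneous regression fit.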


METHODS

Study design and population

We conducted a retrospective observational study of EHR data from an academic medical center located in a large urban area. All adult patients discharged from medical services from August 1, 2017, through October 31, 2017, were included in our analysis. Medical services included cardiovascular medicine, endocrinology, family medicine, gastroenterology, general internal medicine, geriatric medicine, hematology/oncology, hospitalist, infectious disease, medicine, pulmonary, renal, and rheumatology. We received expedited approval and a Health Insurance Portability and Accountability Act waiver from our university's institutional review board before conducting this study (protocol number: 829212).

Data collection

Data were obtained from the health system's data store, which standardizes and consolidates data across multiple electronic systems incorporating key detailed clinical data elements to support health care research and patient care initiatives throughout the health system. The data store provided de-identified patient-level data for all adult discharges from our hospital during the study period.

Outcome variable

The primary outcome variable of interest was all-cause readmission to our hospital within 30 days of an index hospitalization to a medical service. The index admission was defined as the first admission for the patient that appeared in our data. Subsequent admissions were counted as 30-day readmissions if they occurred within that 30-day window. This process was repeated for additional index admissions occurring after the first 30-day window closed. We chose to include all readmissions in our analysis on the basis of prior work demonstrating that patients with multiple readmissions often have distinctive characteristics, such as complex chronic conditions and concurrent behavioral diagnoses, that may affect health care utilization.20
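The index-admission logic described above can be expressed directly. The dates below are hypothetical, and we assume for this sketch that the 30-day window is measured from the index admission date:

```python
from datetime import date

# The first admission opens a 30-day window; admissions inside the window
# count as readmissions; the first admission after the window closes
# becomes the next index admission.
def classify_admissions(admit_dates):
    """Return (index_dates, readmission_dates) for one patient's admissions."""
    index_dates, readmissions = [], []
    window_start = None
    for d in sorted(admit_dates):
        if window_start is None or (d - window_start).days > 30:
            index_dates.append(d)    # new index admission
            window_start = d
        else:
            readmissions.append(d)   # falls inside the open 30-day window
    return index_dates, readmissions

dates = [date(2017, 8, 1), date(2017, 8, 15), date(2017, 8, 25), date(2017, 10, 1)]
idx, readmits = classify_admissions(dates)
# Aug 1 is an index admission; Aug 15 and Aug 25 fall inside its window;
# Oct 1 is outside the window and starts a new index admission.
```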

Explanatory variables

We selected a comprehensive set of variables available to us through the health system's data store including patient demographics (eg, age, sex, race), social characteristics (eg, marital status, zip code), and utilization patterns (eg, hospital length of stay, emergency department [ED] visits) that have previously been used in readmission risk modeling.21,22 Patient comorbidities were determined by the presence of 27 of the 31 Elixhauser comorbidities using International Classification of Diseases, Tenth Revision (ICD-10) codes.23 Elixhauser comorbidities are used widely to measure comorbidity in administrative data and include conditions such as congestive heart failure, diabetes, dementia, and fluid and electrolyte disorders.24 Not included in our analysis were the following Elixhauser comorbidities due to their omission from the data received from the data store: AIDS/HIV, alcohol abuse, drug abuse, and psychoses.
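Comorbidity flagging of this kind reduces to matching a patient's ICD-10 codes against per-condition code lists. The sketch below uses two example prefixes only, not the full Elixhauser definitions (which span many codes per condition; see Quan et al23):

```python
# Illustrative sketch of flagging Elixhauser comorbidities from ICD-10 codes.
# These prefix lists are examples only, not the complete published mappings.
ELIXHAUSER_PREFIXES = {
    "congestive_heart_failure": ("I50",),     # example prefix (heart failure)
    "diabetes_uncomplicated": ("E11.9",),     # example prefix (type 2, uncomplicated)
}

def comorbidity_flags(icd10_codes):
    """Map a patient's ICD-10 codes to 0/1 comorbidity indicators."""
    return {name: int(any(code.startswith(p) for p in prefixes for code in icd10_codes))
            for name, prefixes in ELIXHAUSER_PREFIXES.items()}

flags = comorbidity_flags(["I50.9", "J18.9"])   # heart failure code present
```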


Statistical analysis

To determine factors associated with hospital readmission, we first conducted bivariate analyses comparing individual factors among patients who did and did not experience a readmission. The chi-square test was used for categorical variables and the t test for continuous variables. A separate set of analyses was then undertaken to determine predictors of multiple 30-day readmissions during the study period.
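A minimal version of this screening step for one categorical factor might look like the following. The counts are approximated from the Medicaid proportions reported in the Results, and 1.642 is the chi-square critical value for 1 degree of freedom at P = .20:

```python
# Pearson chi-square for a 2x2 table [[a, b], [c, d]] (no continuity correction).
def chi_square_2x2(a, b, c, d):
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    stat = 0.0
    for obs, r, c_tot in ((a, row1, col1), (b, row1, col2),
                          (c, row2, col1), (d, row2, col2)):
        exp = r * c_tot / n              # expected count under independence
        stat += (obs - exp) ** 2 / exp
    return stat

# Rows: readmitted vs not; columns: Medicaid vs other insurance.
# Counts approximated from the proportions reported in the Results.
stat = chi_square_2x2(110, 132, 417, 1506)
keep_for_cart = stat > 1.642             # retain factor if significant at P < .20
```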

Next, factors that were significant at the P < .20 level in the bivariate analysis (comparing those readmitted with those who were not) were included in the CART algorithm to predict the probability of a 30-day readmission. Age was also included in the model because of its association with readmissions in prior work.22 Using CART, the data were divided into 3 samples (50% training, 25% validation, and 25% test) and stratified randomly using the outcome variable (30-day readmission) to perform a random selection with balance across the 3 samples. The training sample was used to build the CART model; the validation sample was used to determine the optimal termination nodes for the decision tree; and the test sample was used to determine how well the decision tree would perform on an independent sample.25 Cross-validation procedures were used to avoid overfitting when creating the decision trees.
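The stratified 50/25/25 split can be sketched by sampling within each outcome class separately, so the readmission rate stays balanced across the three samples. The records and seed below are hypothetical:

```python
import random

# Split records 50/25/25 within each outcome class, then recombine, so all
# three samples carry (approximately) the same readmission rate.
def stratified_split(records, outcome, seed=42):
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for label in (0, 1):
        group = [r for r in records if outcome(r) == label]
        rng.shuffle(group)
        n_train = int(len(group) * 0.50)
        n_valid = int(len(group) * 0.25)
        train += group[:n_train]
        valid += group[n_train:n_train + n_valid]
        test += group[n_train + n_valid:]
    return train, valid, test

# Hypothetical records: 20 of 180 patients readmitted
records = [{"id": i, "readmitted": 1 if i % 9 == 0 else 0} for i in range(180)]
train, valid, test = stratified_split(records, lambda r: r["readmitted"])
```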

We completed diagnostics and goodness-of-fit tests; a nonsignificant Hosmer-Lemeshow test statistic indicated no evidence of poor model fit.26 To further explore how well CART performed, we also created a stepwise logistic regression model to determine which factors were associated with 30-day readmission, considering the same factors included in CART. The c-statistics for the logistic regression and CART models were then compared. The c-statistic, or concordance statistic, is equal to the area under the receiver operating characteristic curve and is a measure of model fit.27,28 A value of 0.5 means the model predicts the outcome no better than chance, a value greater than 0.7 indicates a good model, and a value greater than 0.8 a strong model. The c-statistic was 0.83 for the logistic regression model and 0.74 for CART on the test data set. The CART model was created using JMP Pro (version 13).25 All other analyses were conducted in Stata (version 15; StataCorp, College Station, Texas) and SAS (version 9.4; SAS Institute, Cary, North Carolina).
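The c-statistic can be computed directly as a concordance probability over all (event, non-event) pairs; the predicted scores and outcomes below are hypothetical:

```python
# C-statistic as a concordance probability: among all (event, non-event)
# pairs, the fraction in which the event patient received the higher
# predicted risk; ties count one half.
def c_statistic(scores, outcomes):
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for se in events:
        for sn in non_events:
            if se > sn:
                concordant += 1.0
            elif se == sn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]    # predicted readmission risks
outcomes = [1, 1, 0, 1, 0, 0, 0]                 # observed readmissions
auc = c_statistic(scores, outcomes)              # 11 of 12 pairs concordant
```

Here the single discordant pair (the event scored 0.6 against the non-event scored 0.7) yields 11/12, or about 0.92; a value of 0.5 would correspond to chance.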


RESULTS

Sample characteristics

During the study period, there were 2165 admissions to medical services, of which 242 were readmissions (11.2%). Patient demographics and clinical characteristics for the sample are provided in Supplemental Digital Content Table 1. Patients were on average 63.6 years of age. The majority of admitted patients were black (63.5%), female (50.8%), English speaking (98.1%), not married (69.5%), residents outside of the metropolitan area (54.0%), and insured through Medicare (58.0%). Patients had on average 5.3 comorbidities, were most often admitted through the ED (77.2%), and had on average 0.6 ED visits in the past 3 months.

Thirty-day readmissions

Supplemental Digital Content Table 1 displays demographic and clinical characteristics of patients who were readmitted compared with those who were not. Patients who experienced at least 1 30-day readmission were more frequently black (76.9% vs 61.9%, P < .001), less frequently married (21.5% vs 32.7%, P < .001), more frequently residents of the metropolitan area (58.3% vs 44.5%, P < .001), and more frequently insured by Medicaid (45.5% vs 21.7%, P < .001) than patients who did not experience a readmission. On average, readmitted patients had more comorbidities (7.6 vs 5.0, P < .001) and more ED visits (2.0 vs 0.4, P < .001) than patients who were not readmitted. The top 5 conditions among readmitted patients were fluid and electrolyte disorders, hypertension (complicated), cardiac arrhythmias, hypertension (uncomplicated), and congestive heart failure. There were no significant differences between those who were readmitted and those who were not with regard to age, sex, ethnicity, English-speaking status, discharge disposition, or length of stay.

Multiple 30-day readmissions

Supplemental Digital Content Table 2 displays demographic and clinical characteristics of patients readmitted 0 to 1 time compared with those readmitted 2 or more times between August and October 2017. Ninety readmissions occurred among patients who experienced multiple 30-day readmissions (37.2% of all 30-day readmissions). Similar to patients with at least 1 readmission, those who experienced 2 or more readmissions were more frequently black (91.1% vs 62.3%, P < .001), less frequently married (11.1% vs 32.3%, P < .001), more frequently residents of the metropolitan area (64.4% vs 45.2%, P < .001), and more frequently insured by Medicaid (64.4% vs 22.6%, P < .001) than patients with 1 or no readmissions. They were also more frequently admitted through the ED (93.3% vs 76.5%, P = .001), had more ED visits (3.7 vs 0.5, P < .001), and had more comorbidities (8.5 vs 5.2, P < .001).

Classification and Regression Tree

Factors significant at the P < .20 level in the bivariate analysis comparing patients with any 30-day readmission with those who were not readmitted were included in the machine-learning analysis using CART methodology. Age was also included in the model because of its association with readmissions in prior work.22 The CART analysis created a decision tree of factors that were most predictive of readmission. For simplicity, we focus on reporting the results for the tree "branch" that produced the profile of patients at highest risk for readmission, displayed in the Table. Factors are displayed in order of importance, with the best determinants of a readmission being ED visits, followed by number of comorbidities, Medicaid insurance, and age.

Predictive Hierarchy of Factors That Best Discriminated Readmitted Patients

As part of the CART analysis, the algorithm determines the best cut point for each continuous variable (ED visits, comorbidities, and age). Among patients with 1 ED visit or more, the probability of readmission was 27.8%. Comorbidity was the next best discriminator and among patients who experienced 1 ED visit or more and 9 comorbidities or more, the probability of readmission was 44.0%. Medicaid was the next best discriminator and patients with 1 ED visit or more, 9 comorbidities or more, and who were insured by Medicaid had a 71.0% probability of readmission. Age was the last best discriminator, with a 92.1% probability of 30-day readmission among patients who had 1 ED visit or more, 9 comorbidities or more, Medicaid insurance, and age 65 years or more.
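The highest-risk branch reported in the Table can be written as nested rules. The probabilities are the node-level values reported above; branches of the tree not reported in the Table are returned as None in this sketch:

```python
# The highest-risk "branch" of the decision tree, written as nested rules.
# Probabilities are the node-level values reported in the Table. A patient
# stopping mid-branch (eg, on Medicaid but younger than 65) is assigned the
# probability of the deepest reported node reached, because the Table does
# not report the lower-risk child nodes.
def branch_risk(ed_visits, comorbidities, medicaid, age):
    if ed_visits < 1:
        return None        # other branches of the tree (not reported here)
    if comorbidities < 9:
        return 0.278       # node: 1+ ED visit
    if not medicaid:
        return 0.440       # node: 1+ ED visit, 9+ comorbidities
    if age < 65:
        return 0.710       # node: 1+ ED visit, 9+ comorbidities, Medicaid
    return 0.921           # node: 1+ ED visit, 9+ comorbidities, Medicaid, 65+

highest = branch_risk(ed_visits=2, comorbidities=10, medicaid=True, age=70)
```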


DISCUSSION

In this study, we used a machine-learning methodology to determine the extent to which 30-day readmissions to our safety net hospital were predicted by patient risk factors. In a 3-month period, our institution experienced 242 medical service readmissions, or more than 75 per month. More than a third of those readmissions occurred among patients admitted to the hospital 3 or more times over the 3-month period. Using CART, a machine-learning algorithm, we determined a profile of patients at highest risk for readmission: those who had visited the ED, had multiple comorbidities (≥9), were insured through Medicaid, and were 65 years of age or older.

Our use of CART and EHR-derived data differs from prior studies that largely relied on large administrative data sets to predict 30-day readmissions.21,28,29 Many risk prediction models attempt to generalize readmission risk across settings21; however, the prevalence and patterns of readmissions vary by geographic region and from hospital to hospital.29,30 In addition, these prediction models can be cumbersome, difficult to implement, and in some cases irrelevant to the patient population at hand.17 Even within our own health system, where an EHR flag for readmission risk was created from a simple risk prediction model, there was no decline in readmission once integrated into the EHR.15

In light of the limitations of using large administrative data sources and traditional statistical methodologies, machine learning coupled with EHR-derived data is increasingly being used to create patient risk profiles.16,18,31 For example, Bayati et al17 leveraged EHR data to create a machine-learning algorithm to identify heart failure patients at high risk for readmission. They used these findings to better allocate post–acute care resources to reduce readmissions and referred to their use of EHR data to facilitate clinical practice change as a “prediction to action pipeline.”17(p2) Utilizing a machine-learning approach allowed us to create a profile of patients at high risk for readmission from which we plan to develop a clinical pathway to meet their specific needs.

Leveraging findings to tailor nursing care

Given their proximity to the bedside, nurses are in prime positions to identify patients in need of additional resources and to tailor their support accordingly. However, due to competing time demands, nurses may prioritize medical needs over social needs, leaving social risk factors overlooked.32 In addition, even when nurses identify patients at risk, they may not be aware of all available community resources. Our work group sought to uniformly identify patients at risk for adverse readmission outcomes and improve their care by leveraging EHR-derived data. Implementing an institution-specific machine-learning algorithm using EHR data adds value in that it accounts for local patient demographics and patterns of care delivery.16,17 In the next phase of this project, our work group plans to use the readmission profile developed from our machine-learning algorithm to automatically identify patients at high risk for 30-day readmissions. Once a patient is identified, the EHR will then “nudge” clinicians into implementing a consistent action plan, which is currently under development. Our ability to leverage EHR data to microtarget nursing resources and tailor care has the potential to improve quality outcomes for socially at-risk patients.


Limitations

Our study has several limitations. We were limited to the variables available through the health system's data store. For example, we could not obtain additional information that might indicate social risk, such as housing status and concurrent substance abuse, nor data that were not entered into discrete fields, such as provider notes, which could provide additional insight into a patient's social risk. We are not unique in this concern; assessment of the social determinants of health is increasingly being incorporated into EHRs, but no consensus on standard assessment exists.33 Data from the EHR can also be "messy" and may vary on the basis of how variables are constructed and from what sources they are derived.34 A larger training set would further increase classification accuracy, as would prospective data, since our analysis was retrospective. We also considered only readmissions to our study hospital; including other hospitals in the area would likely further refine our model. Despite these limitations, with our sample of 2165 admissions, we were able to create a robust prediction model, which we plan to implement in future research.


CONCLUSIONS

Through a machine-learning technique, we identified a patient profile of social and medical factors that predict 30-day readmission. With this algorithm, we are poised to better allocate nursing resources to a targeted patient population.


REFERENCES

1. Campanella P, Lovato E, Marone C, et al. The impact of electronic health records on healthcare quality: a systematic review and meta-analysis. Eur J Public Health. 2016;26(1):60–64.
2. Adler-Milstein J, DesRoches C, Kralovec P, et al. Electronic health record adoption in US hospitals: progress continues, but challenges persist. Health Aff (Millwood). 2015;34(12):2174–2180.
3. Payne TH, Corley S, Cullen TA, et al. Report of the AMIA EHR-2020 Task Force on the status and future direction of EHRs. J Am Med Inform Assoc. 2015;22(5):1102–1110.
4. Garcia-Arce A, Rico F, Zayas-Castro JL. Comparison of machine learning algorithms for the prediction of preventable hospital readmissions. J Healthc Qual. 2018;40(3):129–138.
5. Futoma J, Morris J, Lucas J. A comparison of models for predicting early hospital readmissions. J Biomed Inform. 2015;56:229–238.
6. Shadmi E, Flaks-Manov N, Hoshen M, Goldman O, Bitterman H, Balicer RD. Predicting 30-day readmissions with preadmission electronic health record data. Med Care. 2015;53(3):283–289.
7. Watson AJ, O'Rourke J, Jethwani K, et al. Linking electronic health record-extracted psychosocial data in real-time to risk of readmission for heart failure. Psychosomatics. 2011;52(4):319–327.
8. Rau J. Medicare's readmission penalties hit new high. Kaiser Health News Web site. Published August 2, 2016. Accessed November 1, 2018.
9. Independence Blue Cross. Inpatient hospital readmission policy changes: frequently asked questions. Independence Blue Cross Web site. Published 2018. Accessed November 1, 2018.
10. Zuckerman RB, Sheingold SH, Orav EJ, Ruhter J, Epstein AM. Readmissions, observation, and the Hospital Readmissions Reduction Program. N Engl J Med. 2016;374(16):1543–1551.
11. Bhalla R, Kalkut G. Could Medicare readmission policy exacerbate health care system inequity? Ann Intern Med. 2010;152(2):114–117.
12. Barnett M, Hsu J, McWilliams J. Patient characteristics and differences in hospital readmission rates. JAMA Intern Med. 2015;175(11):1803–1812.
13. Dickens C, Weitzel D, Brown S. Mr. G and the revolving door: breaking the readmission cycle at a safety-net hospital. Health Aff (Millwood). 2016;35(3):540–543.
14. Berenson J, Shih A. Higher readmissions at safety-net hospitals and potential policy solutions. The Commonwealth Fund Web site. Published December 10, 2012. Accessed November 1, 2018.
15. Baillie C, VanZandbergen C, Tait G, et al. The readmission risk flag: using the electronic health record to automatically identify patients at risk for 30-day readmission. J Hosp Med. 2013;8(12):689–695.
16. Fisher SR, Graham JE, Krishnan S, Ottenbacher KJ. Predictors of 30-day readmission following inpatient rehabilitation for patients at high risk for hospital readmission. Phys Ther. 2016;96(1):62–70.
17. Bayati M, Braverman M, Gillam M, et al. Data-driven decisions for reducing readmissions for heart failure: general methodology and case study. PLoS One. 2014;9(10):e109264.
18. Goldstein B, Navar A, Carter R. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges. Eur Heart J. 2017;38(23):1805–1814.
19. Lemon SC, Roy J, Clark MA, Friedmann PD, Rakowski W. Classification and regression tree analysis in public health: methodological review and comparison with logistic regression. Ann Behav Med. 2003;26(3):172–181.
20. Szekendi MK, Williams MV, Carrier D, Hensley L, Thomas S, Cerese J. The characteristics of patients frequently admitted to academic medical centers in the United States. J Hosp Med. 2015;10(9):563–568.
21. Kansagara D, Englander H, Salanitro A, et al. Risk prediction models for hospital readmission: a systematic review. JAMA. 2011;306(15):1688–1698.
22. Calvillo-King L, Arnold D, Eubank K, et al. Impact of social factors on risk of readmission or mortality in pneumonia and heart failure: systematic review. J Gen Intern Med. 2013;28(2):269–282.
23. Quan H, Sundararajan V, Halfon P, et al. Coding algorithms in ICD-9-CM and ICD-10 administrative data. Med Care. 2005;43(11):1130–1139.
24. Elixhauser A, Steiner C, Harris DR, Coffey RM. Comorbidity measures for use with administrative data. Med Care. 1998;36(1):8–27.
25. Grayson J, Gardner S, Stephens M. Building Better Models with JMP Pro. Cary, NC: SAS Institute; 2015.
26. Hosmer DW, Hosmer T, Le Cessie S, Lemeshow S. A comparison of goodness-of-fit tests for the logistic regression model. Stat Med. 1997;16(9):965–980.
27. Austin PC, Steyerberg EW. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable. BMC Med Res Methodol. 2012;12:82.
28. Steyerberg EW, Vickers AJ, Cook NR, et al. Assessing the performance of prediction models: a framework for some traditional and novel measures. Epidemiology. 2010;21(1):128–138.
29. Jencks S, Williams M, Coleman E. Rehospitalizations among patients in the Medicare fee-for-service program. N Engl J Med. 2009;360(14):1418–1428.
30. Krumholz H, Wang K, Lin Z, et al. Hospital-readmission risk—isolating hospital effects from patient effects. N Engl J Med. 2017;377(11):1055–1064.
31. Shameer K, Johnson KW, Yahi A, et al. Predictive modeling of hospital readmission rates using electronic medical record-wide machine learning: a case-study using Mount Sinai Heart Failure Cohort. Pac Symp Biocomput. 2017;22:276–287.
32. Carthon JM, Lasater KB, Sloane DM, Kutney-Lee A. The quality of hospital work environments and missed nursing care is linked to heart failure readmissions: a cross-sectional study of US hospitals. BMJ Qual Saf. 2015;24(4):255–263.
33. Cantor M, Thorpe L. Integrating data on social determinants of health into electronic health records. Health Aff (Millwood). 2018;37(4):585–590.
34. Goldstein BA, Navar AM, Pencina MJ, Ioannidis JP. Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review. J Am Med Inform Assoc. 2017;24(1):198–208.

Keywords: Classification and Regression Tree; electronic health records; machine learning; readmissions


Copyright © 2020 Wolters Kluwer Health, Inc. All rights reserved