Recently, calls to determine the outcomes of graduate medical education (GME)1–7 and align the goals and purposes of medical education with those of health care delivery8 have been increasing. These appeals are consistent with the intent of competency-based medical education (CBME) to ensure training curricula and outcomes that prepare graduates to meet the needs of patients.9 Naturally, achieving this aim requires defining the goals of GME and measuring achievement of those goals.1 The medical education community has suggested that “big data” can help achieve this ambition.2,4 Educators have further noted that big data may be the answer to linking data on education with data on care.3 Although publicly available big data from local, state, and federal governments are increasingly common in medicine, the GME community is not taking advantage of the opportunities afforded.2,4 This may be a result of the challenges with harnessing big data for these purposes.3 Indeed, the limited efforts that have used big data to inform medical education outcomes have noted the substantial resources required to complete this work.10
Recent calls to determine GME outcomes have also placed more focus on programs, institutions, and the GME system as a whole rather than on individual residents or fellows.5,11,12 Although this focus is important, individual residents remain equally important to consider, given that providing individuals with data on their performance within the team may be the best way to drive their personal improvement.13,14 Furthermore, in medicine we graduate, certify, and credential individuals rather than teams, and competent individuals form the basis of functional teams, making it crucial to be able to determine an individual's performance. Finally, discerning the performance of individuals on teams can, in aggregate, inform team performance.
Efforts to provide resident-level quality feedback are challenged by issues of attribution,15 with some calling the ability to attribute performance to an individual into question.16 However, other work offers promise in this area. For example, some of us have developed resident-sensitive quality measures, which attempt to capture work that is likely attributable to individual residents.17 Furthermore, Levin and Hron18 have harnessed the electronic health record (EHR) to provide data on patient volumes and diagnoses seen by individual residents. Finally, Herzke and colleagues19 describe a method for attributing patient-level metrics to attending physicians through the type, timing, and number of charges for patient hospitalizations.
To explore the use of big data in the EHR at the individual resident level, we sought to determine a method for attributing care for individual patients to individual interns based on “footprints” in the EHR (i.e., activities logged in the EHR). We believe such modeling represents a first step toward disentangling overall care and attribution of that care. If it is not possible to identify a primary intern for a patient, it is also not possible to disentangle the level of contribution of more than one intern or other member of the team.
In this study, we intended to demonstrate the feasibility of predicting the primary interns caring for patients using selected EHR data. Although this preliminary effort does not consider patient care outcomes attributed to these interns or the role that supervisors and other members of the health care team play in modifying those outcomes, it does provide the foundation for those next steps. Furthermore, it informs opportunities for automating case logs to track factors such as diagnoses and ranges of patient complexity seen by interns. It also allows interns to be connected to quality data for their patients, creating opportunities for reflective continuous quality improvement exercises and, ultimately, assessment of these interns' relative contributions to improving care quality and outcomes. All of these purposes serve to answer the call to determine the outcomes of GME training and ensure that they indeed meet the needs of patient populations.
This study was conducted at the University of Cincinnati Medical Center (UCMC). All residents rotating on the inpatient general medicine wards during the time frames of interest were considered eligible. The internal medicine residency at UCMC is a three-year residency program with 89 categorical (i.e., noncombined training program) and preliminary year (e.g., single-year residents destined for another specialty after that year) trainees. UCMC has two types of general medicine inpatient teams: attending, senior resident, intern, third-year medical student, and often a fourth-year medical student; and attending and intern only. Both teams see general internal medicine patients on the same units.
We considered eligible patient records to be those that had an eligible intern as the primary intern on each of the days of the study. We conceptually defined “primary intern” as a trainee who was assigned to a patient and primarily responsible for delivery of that patient’s care on a given day (e.g., consulting other services, writing progress notes, communicating with the patient and other members of the health care team) as determined by the attending physician of record.
Model predictor selection
Primary interns often write daily progress notes and discharge summaries as well as enter orders for their primary patients, so we chose these activities as variables to include in modeling.
We also chose information about users' interactions with the EHR system for inclusion in modeling. Such interactions are recorded automatically in the form of event logs or audit trails ("EHR clicks" hereafter). Our hypothesis was that EHR clicks could play a significant role in predicting primary interns. A full list of EHR clicks is shown in Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A672. We believed that many of these EHR clicks might not be germane to the work that interns typically do. Therefore, we created two additional categories of EHR clicks based on the perceived importance of click types for identifying a primary intern. To determine this, we convened a group of 14 current internal medicine residents to provide feedback on which click types likely represent a resident caring for a hospitalized patient on a resident team. Residents were provided a list of clicks and asked to circle those that they believed were commonly done by interns and upper-level residents (i.e., circle everything that applies) and to star those that they strongly believed were commonly done by interns and upper-level residents (i.e., star what is most important). Example clicks prioritized by residents included medications activity accessed, note viewed in chart review, inpatient sign-out activity accessed, and inpatient orders section accessed. All included clicks, with resident voting, are shown in Supplemental Digital Appendix 2, available at http://links.lww.com/ACADMED/A672.
In addition to the resident-generated categories, most of the physician authors (D.J.S., B.K., D.R.S., M.K., and E.W.), all experienced GME administrators, selected click behaviors they felt were common and likely performed by interns and upper-level residents to form a separate category in the dataset. These are shown in Supplemental Digital Appendix 3, available at http://links.lww.com/ACADMED/A672.
The goal of gathering feedback from residents and our author team was simply to make a crude, first-pass elimination of click types unlikely to be useful in modeling (e.g., inpatient education activity accessed, barcode scanned, and edit claim information window accessed). All "eliminated" click types were still counted in the total number of clicks in the analyses. The hope was that eliminating some of these types in future modeling would allow the model to detect more signal amid the noise. We deemed crude grouping appropriate given our aim to develop a model rather than refine one.
On a daily basis, five attending physicians (including authors B.K. and D.R.S.) independently recorded the primary intern for each patient each day during their service time from August 1 to 12, 2017; August 17 to 25, 2017; and January 8 to 13, 2018. As such, we considered each day as a discrete primary intern–patient pair to allow for changes in primary interns with activities such as cross-covering.
This dataset was expanded in terms of both rows and columns by combining it with data from the UCMC EHR (Epic Systems Corporation; Verona, Wisconsin). For row expansion, because multiple EHR users were involved in each patient's care on each day, each EHR user who had "touched" the patient received a row in the dataset. That is, we included all EHR users (i.e., intern and nonintern, physician and nonphysician) with any interaction in the EHR on a given day. For column expansion, for each EHR user of each patient on each day, we added the following information:
- patient admission/discharge time and length of stay;
- provider type and postgraduate year (PGY) of residency, if applicable;
- whether this person was the progress note author for the day;
- whether this person was the discharge summary author for the encounter;
- whether this person ever placed an order for the patient on that day as well as the total number of orders placed;
- total EHR clicks; and
- EHR clicks in different categories (e.g., notes viewed, flowsheets activity accessed, orders viewed).
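The row and column expansion above can be sketched with pandas. The column names, actions, and values here are hypothetical stand-ins for the Epic event-log fields, not the study's actual schema:

```python
import pandas as pd

# Hypothetical event-log records: one row per EHR action (illustrative only).
events = pd.DataFrame({
    "patient_id": [1, 1, 1, 2],
    "date": ["2017-08-01"] * 4,
    "user_id": ["intern_a", "intern_a", "nurse_b", "intern_a"],
    "action": ["note_signed", "order_placed", "flowsheet", "order_placed"],
})

# Row expansion: one row per EHR user who "touched" the patient on a given day,
# with that user's total click count for the day.
rows = (events.groupby(["patient_id", "date", "user_id"])
              .size().rename("total_clicks").reset_index())

# Column expansion: add a per-user/day indicator, e.g., whether an order was placed.
orders = (events.assign(is_order=events["action"].eq("order_placed"))
                .groupby(["patient_id", "date", "user_id"], as_index=False)["is_order"]
                .any()
                .rename(columns={"is_order": "placed_order"}))
rows = rows.merge(orders, on=["patient_id", "date", "user_id"])
```

Further columns (progress note authorship, discharge summary authorship, categorized clicks) would be merged on in the same way.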
The data collection and conduct of this study were reviewed and determined to be exempt by the UCMC Institutional Review Board (no. 2016-8982).
We further manipulated the expanded dataset to generate informative variables. First, residents from the fourth year of the medicine–pediatrics combined training program were coded as postgraduate year 3 (PGY3) because there was a small number of these residents, their data behaved the same as PGY3 categorical residents, and they (like PGY3 categorical residents) were rarely expected to be identified as a primary resident. Second, the total numbers of clicks were ranked among all EHR users and within all the residents (interns and noninterns). For example, a resident may have been ranked first with the highest number of clicks among other residents but may have been ranked fifth overall because other users (e.g., nurses) may have touched the patient more in the EHR. Our pilot experiment showed that the former was a better predictor, so the latter was dropped. Third, we recategorized the clicks because the original categories prescribed by the EHR vendor were too granular. This new categorization was made based on the perceived importance of types of clicks to identify a primary intern, as detailed above.
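The within-resident click ranking described above can be illustrated as follows. This is a sketch with hypothetical values; the study derived these counts from the Epic audit trail:

```python
import pandas as pd

# Hypothetical per-user/day click totals; only resident rows are shown.
df = pd.DataFrame({
    "patient_id": [1, 1, 1],
    "date": ["2017-08-01"] * 3,
    "user_id": ["intern_a", "intern_b", "senior_c"],
    "total_clicks": [120, 45, 80],
})

# Rank each resident's click total within the same patient-day (1 = most clicks),
# mirroring the within-resident ranking retained as a predictor in the study.
df["click_rank_among_residents"] = (
    df.groupby(["patient_id", "date"])["total_clicks"]
      .rank(ascending=False, method="min")
      .astype(int)
)
```

Here `intern_a` (120 clicks) ranks first and `intern_b` (45 clicks) ranks third for that patient-day.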
Before modeling, we eliminated all nonresident EHR users from the dataset based on provider types in our EHR system. The final dataset was split into two subsets: training data were used to train the model, and validation data were used to validate it. We considered two data split proportions to examine differences in model performance. The first split (80–20) used 80% of the data for training and 20% for validation, as is common in machine learning tasks.20 The second split (50–50) used half of the data for training and half for validation; its larger validation set may yield more stable estimates of model performance. We executed data splits at the patient level. All of the following variables were considered, and no systematic feature selection was performed.
- Resident PGY
- Did this resident write the progress note of this patient on this day?
- Was this resident the author of the progress note of the patient on this day?
- Did this resident write the discharge summary of the patient?
- Was this resident the author of the discharge summary of the patient?
- Did this resident place an order on this patient on this date?
- Number of orders placed by this resident on this patient on this date
- Rank of number of clicks among residents
- Number of event logs of this resident on this patient on this date
- Number of clicks deemed to be of medium importance among residents for this patient on this date
- Number of clicks deemed to be of high importance among residents for this patient on this date
- Number of clicks deemed to be of high importance by physician authors with GME administrative experience
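The patient-level split described above keeps every record for a given patient in a single subset, preventing leakage of a patient's data between training and validation. A minimal sketch, assuming a hypothetical `patient_id` column:

```python
import random
import pandas as pd

def patient_level_split(df, train_frac=0.8, seed=42):
    """Split records so that all rows for a given patient land in exactly one
    subset, mirroring the patient-level 80-20 and 50-50 splits described above."""
    patients = sorted(df["patient_id"].unique())
    random.Random(seed).shuffle(patients)
    n_train = int(round(train_frac * len(patients)))
    train_ids = set(patients[:n_train])
    train = df[df["patient_id"].isin(train_ids)]
    valid = df[~df["patient_id"].isin(train_ids)]
    return train, valid

# Toy example: 10 patients with 2 access-log rows each.
logs = pd.DataFrame({"patient_id": [p for p in range(10) for _ in range(2)]})
train, valid = patient_level_split(logs, train_frac=0.8)
```

Because patients contribute unequal numbers of access logs, the record-level proportions will deviate from 80–20, as they did in the study (78.95% vs. 21.05%).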
However, we assigned variables differently to generate four models. As shown in Table 1, Model 1 used variables 1–9 as its input, and Model 2 replaced variable 9 with variables 10–12. Models 3 and 4 mirrored Models 1 and 2, respectively, except that they omitted variable 1 (PGY). This variable assignment was intended to demonstrate the performance gained or lost when PGY or EHR click information was available.
The data were modeled using the high-performance decision tree procedure HPSPLIT in SAS/STAT statistical software, version 14.3, within SAS 9.4 (SAS Institute Inc., Cary, North Carolina). This algorithm selected the splitting variables at each level by how much information was gained when a new variable was added. The tree depth was set at five, and up to three branches were permitted at each node. We selected a decision tree algorithm because its easily interpretable output clarifies the role of the predictor variables. Comparing decision trees with other modeling techniques was beyond the scope of this exploratory study.
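The study used SAS HPSPLIT; for readers without SAS, a rough open-source analogue can be sketched with scikit-learn. Note two caveats: scikit-learn grows binary (CART) splits rather than HPSPLIT's up-to-three-way splits, and the features and labels below are synthetic stand-ins, not the study's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for a few of the study's predictor variables:
# [PGY, wrote_progress_note, placed_order, click_rank_among_residents]
rng = np.random.default_rng(0)
n = 500
pgy = rng.integers(1, 4, n)            # PGY 1-3
wrote_note = rng.integers(0, 2, n)
placed_order = rng.integers(0, 2, n)
click_rank = rng.integers(1, 6, n)
X = np.column_stack([pgy, wrote_note, placed_order, click_rank])

# Toy labeling rule: a row "is" the primary intern if a PGY1 wrote the note.
y = ((pgy == 1) & (wrote_note == 1)).astype(int)

# Depth-5 tree, matching the depth limit reported for HPSPLIT.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
```

The fitted tree can be inspected with `sklearn.tree.export_text(tree)` to see which variables dominate the upper splits, analogous to the variable-importance reading of Table 3.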
We used the following metrics to evaluate model performance: sensitivity (recall), specificity, precision, F score (a measure of accuracy, where 1 is best and 0 is worst), and the area under the receiver operating characteristic (ROC) curve (AUC). Specifically, the F score, AUC, and specificity on the validation dataset were used to determine the model with the best performance.
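These metrics can be computed from a toy set of predictions as follows (the labels and scores are illustrative, not the study's data):

```python
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             roc_auc_score)

y_true  = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]            # attending-labeled primary intern
y_pred  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]            # model's hard predictions
y_score = [.9, .8, .7, .4, .3, .2, .2, .1, .1, .6]  # model's predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)               # recall: primary interns correctly found
specificity = tn / (tn + fp)               # non-primary rows correctly rejected
precision   = precision_score(y_true, y_pred)
f_score     = f1_score(y_true, y_pred)     # harmonic mean of precision and recall
auc         = roc_auc_score(y_true, y_score)
```

In this toy example sensitivity, precision, and F score are all 0.75 while specificity is 5/6, illustrating how a model can reject nonprimary rows well while still missing some true primary interns.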
A total of 369 primary intern designations were recorded by the five attending physicians. The dataset for the patients related to these records was expanded to 23,242 access logs (all clicks/activities in the EHR) for individual EHR users who touched these patients in the EHR. On a given day, an average of 7 (range: 1–28) medical providers (medical students, residents, fellows, and attending physicians) touched a patient in the EHR. The dataset was then manipulated to generate the 12 variables described above. To focus on predicting primary interns, only internal medicine interns and upper-level residents were kept in the dataset; all other clinicians were dropped. This yielded a subset of 1,511 daily access logs (internal medicine interns and upper-level residents touching the patient) belonging to 120 patients, with 116 access logs (7.68%) marked by the attending physicians as primary interns.
The 80–20 data split at the patient level resulted in a training set of 1,193 (78.95%) records and 318 (21.05%) validation records; the 50–50 split resulted in a training set of 869 (57.51%) records and 642 (42.49%) validation records. Each split was used to train the same four models. Model performance is shown in Table 2. In the 80–20 data split, the four models did not achieve better performance, based on F score and AUC, than their 50–50 counterparts. In the 50–50 data split, Models 1 and 2 outperformed Models 3 and 4, indicating that PGY is a critical piece of information for determining a primary intern. In addition, recategorizing clicks based on perceived importance improved model performance. However, none of the models had desirable precision (all below 80%).
Table 3 summarizes the important variables of the four models in the 50–50 data split. Across all models, a key variable was whether a given clinician wrote the progress note for the day. The relative rank of number of clicks among residents was also consistently at the top of the list. A trainee’s PGY was the second most important variable if it was included (Models 1 and 2). However, this variable seems to provide strong information to identify primary interns such that the click variables (total clicks and categorized clicks, or variables 9–12) were not as important. When PGY was not included in a model, the click variables played a more vital role in predicting primary interns (Model 4).
The best model in our study was Model 2 in the 50–50 data split, which achieved 78.95% sensitivity and 97.61% specificity. Figure 1 demonstrates the ROC curve of this model. The AUC was approximately 91% on the validation dataset. Figure 2 further illustrates the decision tree of Model 2, which has five levels. The first decision was whether the clinician wrote the progress note on the patient that day. If yes (ID:1) and the clinician's PGY was 2 or 3 (ID:3), this clinician was not a primary intern. If the clinician's PGY was 1 (ID:4) and he/she ever placed an order on this patient on the specified date (ID:9), this clinician was likely to be a primary intern. Conversely, if this clinician was not a progress note author (ID:2) and his/her PGY was 2 or 3 (ID:5,6), this clinician was not a primary intern. If his/her PGY was 1 (ID:7), his/her ranking of total clicks among residents was the decisive factor (ID:A,B,C).
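The upper levels of this decision tree can be paraphrased as a simple rule. The click-rank cutoff below is a hypothetical stand-in, since the published tree continues for several more levels before resolving that branch:

```python
def likely_primary_intern(pgy, wrote_progress_note, placed_order, click_rank):
    """Paraphrase of the upper levels of Model 2's decision tree.
    The click-rank cutoff (<= 2) is an assumed placeholder; the actual
    tree uses further splits (ID:A,B,C) to resolve this branch."""
    if pgy in (2, 3):                 # upper-level residents: not primary interns
        return False
    if wrote_progress_note:           # PGY1 note author...
        return bool(placed_order)     # ...who also placed an order that day
    return click_rank <= 2            # otherwise click rank decides (assumed cutoff)
```

For example, a PGY2 note author is rejected, while a PGY1 who wrote the note and placed an order is flagged as the likely primary intern.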
Supplemental Digital Appendix 4, available at http://links.lww.com/ACADMED/A672, lists all the decision points and the counts of each subdataset, which can be used to implement this model to predict primary interns using EHR data. Click variables were largely used in levels 3–5 to identify primary interns. As can be seen in Table 3, these click variables were less important than other variables in the upper levels, although they still provided some information to identify primary interns.
In this study, we successfully predicted primary interns caring for patients on inpatient internal medicine teams using EHR data with excellent model performance. This proof-of-concept study demonstrates that individual resident attribution in the EHR is possible and can be automated through computerized algorithms. Modeling and efforts such as those undertaken in this study provide the foundation for attributing patients to primary interns and expanding to attributing patient care to other residents and team members as well.
In our study, PGY and being the daily progress note author were decisive factors in determining the primary intern. These findings are expected because interns are PGY1s and write the daily progress notes for their patients at our institution. However, interns also provide care for patients who are not their primary patients, and when multiple interns are present on an inpatient team, each is the primary intern for only a subset of the patients cared for by that team. Therefore, although decisive, PGY has limitations, highlighting the importance of additional modeling.
Although progress note authorship was a decisive factor in our best model, it was most decisive in ruling out individuals who were not primary interns rather than in identifying primary interns among those who wrote daily progress notes. We did not consider other authors of daily progress notes, but future work should consider this.
Our study also found that EHR clicks contained critical information about residents' behaviors and patient touches in the EHR and had great potential to predict primary interns. These data already exist in the EHR, but they are largely ignored and may not be stored in a form suited to analysis. For example, in our system the log data are archived within a year, so conducting a retrospective attribution study can be very resource intensive. Our model underscores the value of harnessing these data to identify primary interns. Once harnessed, such data may serve a role in understanding residents' workflow, providing insights into improving efficiency and organization.
Practical considerations and future work
The best model in our study (i.e., Model 2) achieved 78.95% sensitivity, 97.61% specificity, and an AUC of approximately 91% on the validation dataset, indicating excellent model performance.
These results are likely best viewed through the lens of the intended use: automating the determination of a primary intern in the EHR. Our goal was to correctly attribute patients to primary interns; therefore, specificity is most important, and our model excelled at this task. Given our results, the model could likely yield many encounters for engaging residents in quality improvement efforts, an area where engagement is currently suboptimal.21 Identifying any, let alone many, encounters where quality measures from a resident's primary patients can be provided for reflective continuous quality improvement efforts (including identifying and addressing critical deficiencies and following developmental progress over time) can therefore make an important contribution. However, it is important to bear in mind that this study considered interns, so the focus should likely remain on continuous quality improvement rather than on higher-stakes attribution of care, which is influenced by supervisors and other members of the team. It is also important to note that our model considered primary interns on a daily basis: an intern who served as the primary intern for the same patient over several days could correctly be ascribed more weight in continuous quality improvement activities but would be incorrectly counted more than once if the model were used to identify diagnoses and complexity seen by that intern.
Although our results offer benefit for primary interns, much work remains. First, future modeling efforts will need to discern attribution of patients to other interns as well as upper-level residents. Second, optimizing the model to gain sensitivity will likely be important. For continuous quality improvement purposes, tilting the ROC toward specificity and trying to avoid false positives is likely acceptable. However, using this process to make higher-stakes performance assessment decisions, which would have tremendous value in the CBME era with a focus on educational and patient outcomes,22–24 will require better sensitivity as well as better understanding of relative contribution toward overall care provided by primary interns. Moving forward, it will be important to learn more about the encounters that this model misses. For example, is less footprint in the EHR associated with worse care, better care, or no difference in care provided by a primary intern? Do those missed by the model have worse performance, and thus are activities in the model performed by other members of the team to compensate?
Future work should also focus on developing a model that discerns the relative attribution of care for all health care professionals, or at least those in the most central frontline roles (e.g., physicians, advanced practice providers, nurses, pharmacists). This was beyond the scope of our current study but remains important. Determining relative attribution would enable answering important questions about which actions and outcomes for patients are more or less likely to be attributed to primary interns compared with other providers, including supervising and other residents as well as attending physicians. For example, are adverse events and outcomes more or less likely to be attributed to primary interns, and what attribution resides with providers caring for patients when the primary intern is not present (e.g., night float, cross-covering, on-call residents)? Furthermore, how are outcomes shared among team members caring for patients at the same time? The breadth of future work needed to define attribution for all members of the team through EHR data underscores the importance of the modeling work described in this article. Although such modeling could be obviated by simply having residents assign themselves a role in patient care each day in the EHR, self-assignment is riddled with challenges, including the need to reliably assign oneself each day and unassign oneself when no longer in that role, as well as the need to develop myriad role definitions that all residents on the care team (e.g., nonprimary intern, supervising resident, consult service resident, continuity clinic resident whose patient is hospitalized) could reliably apply each day.
This study has limitations to consider. First, our modeling is based on data collected by five attending physicians at a single institution. Future work should apply various modeling techniques to larger datasets from multiple institutions. Second, we considered only primary interns and not other members of the team. Future work should explore other resident and nonresident members of the team. Third, this study sought to attribute patients to primary interns but not necessarily patient care to those interns. Given our focus on interns, much of the decision making that leads to patient care is likely a collaborative process with other members of the health care team, perhaps most importantly senior residents and attending physicians. Future work should seek to tease out, to the extent possible, which EHR entries resulted from a decision made by primary interns before supervisor review, which resulted from a directive from a supervisor, and which resulted from a collaborative dialogue among team members. Although we cannot make such determinations with our current data, our data can be a starting place for this line of inquiry. Additionally, this work is time-consuming: we estimate that data extraction, manipulation, and modeling required approximately 200 person-hours. However, implementing our model in a new setting would take considerably less time because of the foundation already laid. Finally, by design we considered only the individual, but data such as these can also inform team performance and should be explored for that purpose.
Prediction models for health care decisions and patient outcomes deserve attention.25 Modeling EHR data can be a pathway toward fulfilling medical education's obligation to determine both educational outcomes and their impact on patient outcomes.
1. Weinstein DF, Thibault GE. Illuminating graduate medical education outcomes in order to improve them. Acad Med. 2018;93:975–978.
2. Arora VM. Harnessing the power of big data to improve graduate medical education: Big idea or bust? Acad Med. 2018;93:833–834.
3. Chahine S, Kulasegaram KM, Wright S, et al. A call to investigate the relationship between education and health outcomes using big data. Acad Med. 2018;93:829–832.
4. Triola MM, Hawkins RE, Skochelak SE. The time is now: Using graduates’ practice data to drive medical education reform. Acad Med. 2018;93:826–828.
5. Caverzagie KJ, Lane SW, Sharma N, et al. Proposed performance-based metrics for the future funding of graduate medical education: Starting the conversation. Acad Med. 2018;93:1002–1013.
6. Weinstein DF. Optimizing GME by measuring its outcomes. N Engl J Med. 2017;377:2007–2009.
7. Cohen E, Kuo DZ, Agrawal R, et al. Children with medical complexity: An emerging population for clinical and research initiatives. Pediatrics. 2011;127:529–538.
8. Sklar DP, Hemmer PA, Durning SJ. Medical education and health care delivery: A call to better align goals and purposes. Acad Med. 2018;93:384–390.
9. Frenk J, Chen L, Bhutta ZA, et al. Health professionals for a new century: Transforming education to strengthen health systems in an interdependent world. Lancet. 2010;376:1923–1958.
10. Carraccio CL, Englander R. From Flexner to competencies: Reflections on a decade and the journey ahead. Acad Med. 2013;88:1067–1073.
11. Silkens MEWM, Arah OA, Wagner C, Scherpbier AJJA, Heineman MJ, Lombarts KMJMH. The relationship between the learning and patient safety climates of clinical departments and residents’ patient safety behaviors. Acad Med. 2018;93:1374–1380.
12. Smirnova A, Ravelli ACJ, Stalmeijer RE, et al. The association between learning climate and adverse obstetrical outcomes in 16 nontertiary obstetrics–gynecology departments in the Netherlands. Acad Med. 2017;92:1740–1748.
13. Sandars J, Cleary TJ. Self-regulation theory: Applications to medical education: AMEE guide no. 58. Med Teach. 2011;33:875–886.
14. Schon DA. The Reflective Practitioner: How Professionals Think in Action. New York, NY: Basic Books; 1983.
15. Berman S. Training pediatricians to become child advocates. Pediatrics. 1998;102(3 pt 1):632–636.
16. Sebok-Syer SS, Chahine S, Watling CJ, Goldszmidt M, Cristancho S, Lingard L. Considering the interdependence of clinical performance: Implications for assessment and entrustment. Med Educ. 2018;52:970–980.
17. Schumacher DJ, Holmboe ES, van der Vleuten C, Busari JO, Carraccio C. Developing resident-sensitive quality measures: A model from pediatric emergency medicine. Acad Med. 2018;93:1071–1078.
18. Levin JC, Hron J. Automated reporting of trainee metrics using electronic clinical systems. J Grad Med Educ. 2017;9:361–365.
19. Herzke CA, Michtalik HJ, Durkin N, et al. A method for attributing patient-level metrics to rotating providers in an inpatient setting. J Hosp Med. 2018;13:470–475.
20. Whitenack D. Machine Learning With Go: Implement Regression, Classification, Clustering, Time-Series Models, Neural Networks, and More Using the Go Programming Language. Birmingham, UK: Packt Publishing; 2017.
21. Butler JM, Anderson KA, Supiano MA, Weir CR. “It feels like a lot of extra work”: Resident attitudes about quality improvement and implications for an effective learning health care system. Acad Med. 2017;92:984–990.
22. Kalet AL, Gillespie CC, Schwartz MD, et al. New measures to establish the evidence base for medical education: Identifying educationally sensitive patient outcomes. Acad Med. 2010;85:844–851.
23. Kogan JR, Conforti LN, Iobst WF, Holmboe ES. Reconceptualizing variable rater assessments as both an educational and clinical care problem. Acad Med. 2014;89:721–727.
24. Carraccio C, Englander R, Holmboe ES, Kogan JR. Driving care quality: Aligning trainee assessment and supervision through practical application of entrustable professional activities, competencies, and milestones. Acad Med. 2016;91:199–203.
25. Shah ND, Steyerberg EW, Kent DM. Big data and predictive analytics: Recalibrating expectations. JAMA. 2018;320:27–28.