Purpose: To assess the accuracy of residents’ record review, using trained abstractors as a gold standard comparison.
Method: In 2005, the authors asked 74 residents to review their own charts (n = 392) after they received brief instruction on how to locate data in the medical record and how to use a data abstraction form. Trained abstractors then re-reviewed these charts to assess performance of preventive health care measures in medicine (smoking screening, smoking cessation advice, mammography, colon cancer screening, lipid screening, and pneumonia vaccination) and pediatrics (parent smoking screening, parent smoking cessation advice, car seat safety, car restraint use, eye alignment, and immunizations up to date). The authors then quantified agreement between the two record reviews and assessed the sensitivity and specificity of the residents’ reviews against those of the trained abstractors.
Results: Overall resident-measured performance was similar (within 5%) to that of the trained abstractor for five of six measures in medicine and four of six in pediatrics. Across the measures, sensitivity of resident-measured performance ranged from 15% to 100% and specificity from 33% to 100%, compared with the trained abstractors. Relative to the trained abstractor record review, residents did not overestimate their performance. Most residents’ (81%) relative performance rankings did not change when the basis for the ranking was resident measured versus trained abstractor measured.
Conclusions: Residents’ self-abstraction can be an alternative to costly trained abstractors. Appropriate use of these data should be carefully considered, acknowledging the limitations.
Dr. Houston is co-director, Birmingham HSR&D Research Enhancement Award Program, Birmingham Veterans Affairs Medical Center, associate professor of medicine, University of Alabama at Birmingham, and assistant program director for research, University of Alabama at Birmingham Internal Medicine Residency, Birmingham, Alabama.
Dr. Wall is associate professor of medicine, Division of General Pediatrics, University of Alabama at Birmingham, Birmingham, Alabama.
Dr. Willett is associate professor of medicine, Division of General Internal Medicine, and associate director, Internal Medicine Residency Program, University of Alabama at Birmingham, Birmingham, Alabama.
Dr. Heudebert is professor of medicine, Division of General Internal Medicine, assistant dean of graduate medical education, and director, Internal Medicine Residency Program, University of Alabama at Birmingham, Birmingham, Alabama.
Dr. Allison is professor of medicine, Division of Preventive Medicine, University of Alabama at Birmingham, and assistant director, Center for Outcomes and Effectiveness Research, Birmingham, Alabama.
Correspondence should be addressed to Dr. Houston, 1530 Third Ave. South, FOT 720, University of Alabama at Birmingham, Birmingham, AL 35294; telephone: (205) 934-7997; fax: (205) 975-7797; e-mail: (email@example.com).
Under the Outcomes Project of the Accreditation Council for Graduate Medical Education (ACGME), evaluation of residency education must expand beyond standard evaluation methods (e.g., in-training examinations, faculty evaluations).1 Objective evaluation of residency education is a frequently noted goal. Within the ACGME toolbox of acceptable assessment methods, one suggested method of direct observation is record review: an audit of resident performance based on chart abstraction.1–4
According to the ACGME toolbox, “record review” is the preferred tool for several practice-based assessments, including provision of preventive health services, appropriate use of laboratory tests, development and implementation of management plans, and self-analysis of “own practice.”5 However, record review using traditional, trained chart abstractors is costly,4 especially in the absence of a coded electronic medical record system. Possibly as a result of this cost, and because of the difficulty of identifying and training abstractors, record review has not been widely adopted. An alternative to trained chart abstractors, provider self-abstraction, is currently included as an optional module for board recertification of practicing physicians through the American Board of Internal Medicine (ABIM).6 The ACGME is currently evaluating the ABIM method in a small sample of residency programs. Compared with preabstracted performance reports, self-abstraction for performance assessment may have added value for trainees because it allows them to reflect directly on their documentation as they abstract. Although residents have performed self-abstraction in residency training,7–9 previous research has questioned the accuracy of providers acting as their own chart abstractors.4 Providers may be biased toward overreporting their performance. The accuracy of resident self-abstraction has not been assessed.
We designed a comparison study to assess differences in record-review-based performance assessment when charts were audited by the residents themselves and by trained chart abstractors. In designing the analysis, we considered the potential barriers to, and uses of, resident chart audits in a residency program. Barriers include time limitations and rejection of the process by trainees in the ambulatory clinic and the training program. Potential uses for self-abstraction data include feedback and longitudinal comparisons for all residents as a group. Clinical instructors could also use the data to give residents individual feedback about their own performance, and residents could compare their performance with that of their peers. For self-abstraction data to be used for performance feedback and comparisons, the process must be acceptable and the data reasonably accurate.
Study design and participants
In academic year 2005, 37 internal medicine (IM) residents and 37 pediatrics residents conducted record reviews during their residency clinic experience. We asked all postgraduate year two and postgraduate year three residents from one IM and one pediatrics residency to participate between February and June 2005. Residents spent one hour abstracting charts of patients they had seen in their continuity clinic within the past six months. Because this was an evaluation of a quality improvement performance audit, the University of Alabama at Birmingham institutional review board waived informed consent for the resident participants; it also granted a Health Insurance Portability and Accountability Act (HIPAA) waiver and waived informed consent at the patient level, because patient data were collected from existing records without unique identifiers.
Study setting and patient sample
The IM and pediatrics clinics serve a low-income, urban patient population. Residents attend clinic, on average, one half-day (four to five hours) per week. Clinic notes, as well as laboratory and procedure reports, are stored as paper records in patient charts in the clinics. Medicine residents dictate clinical notes, and pediatrics residents use a structured paper note with checkboxes for commonly delivered services. Pediatrics charts also contain a sheet to record immunizations.
Selection of preventive services and charts
For both clinics, we identified six evidence-based preventive services that are (1) common in the patient population, (2) relevant to residency education, (3) reliably abstracted from clinic charts in previous studies, and (4) deemed by the residency program to be potential areas for improvement.2,10 For IM, these preventive services were smoking screening, smoking cessation advice, breast cancer screening (mammography), colon cancer screening, lipid screening, and pneumonia vaccination. For pediatrics, they were screening families for smoking exposure, advising parents who smoked to quit, counseling on car seat use for children under age four, counseling on car restraint use for children aged four and over, assessing eye alignment, and determining whether immunizations were up to date. To identify patients for whom these services were appropriate and to define each measure specifically, we used U.S. Preventive Services Task Force documents and current Health Plan Employer Data and Information Set measures.11,12 Appropriateness of a service for a given patient was based on the definition of each indicator; for example, mammography is not appropriate for males. We randomly selected for abstraction the charts of patients whom residents had seen in their continuity clinics.
Resident chart abstraction
We provided residents a one-hour block of time during one continuity clinic session to abstract their charts. Residents abstracted data from the charts of patients they themselves had seen during the previous year. For each clinic, we developed a paper abstraction form with instructions for each chart. Before clinic, a research assistant collected the charts of patients whom the residents had seen during the past year, sorted them by resident, and placed them on the conference room table. When each resident had completed his or her patient care, the research assistant and one of us (L.L.W.) gave a five-minute introduction to the abstraction session: a brief overview of the purpose of the review, a walk-through of the abstraction form, definitions of the measures, and time for questions. The abstraction form detailed the specific performance measure definitions. Each definition included the appropriate patient population (e.g., age, gender, screening interval), the criteria required for positive performance (e.g., documenting any counseling, obtaining or scheduling any testing), and where the information was likely to be located in the chart.2,10 Residents then abstracted each chart, blinded to the results of the standardized chart audit by the trained abstractors.
Training of abstractors and standardized chart audit
One of two trained abstractors with extensive experience in chart audit abstracted each resident’s charts, blinded to the results of the resident self-abstraction. The abstractors had been trained previously and had completed chart abstraction for a large audit-and-feedback study of resident performance that included 3,958 charts from the residents’ clinics. During that study, they had double-abstracted a 5% sample of charts, with errors adjudicated by group review;10 error rates averaged less than 3%.
For each of the six measures, we calculated measured performance as assessed by both the resident and the trained abstractor. We quantified raw agreement and kappa between the two abstractions for each performance measure. The total number of potential disagreements was the total number of performance measures abstracted across all of the residents’ charts. The raw disagreement rate was the number of disagreements divided by the total number of potential disagreements; raw agreement is one minus this rate. Kappa is a measure of agreement adjusted for the amount of agreement that would be expected from chance alone. Then, using the record review by trained abstractors as the reference criterion, we calculated the sensitivity and specificity of the residents’ assessments as a group.
Next, we quantified each resident’s performance and ranked the residents from highest to lowest on the smoking screening measure. We chose to rank on smoking screening because (1) it applies to both pediatrics and medicine and (2) it applies to all patients, giving the highest number of charts per resident. We report the number of residents whose ranking fell in a different quartile under self-abstraction than under trained-abstractor abstraction.
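The agreement statistics described above can be sketched for a single measure as follows. This is a minimal illustration, not the authors’ analysis code: it assumes each chart has already been restricted to patients for whom the measure is appropriate and coded as yes/no for documented performance, treats the trained abstractor as the reference standard, and uses Cohen’s kappa as the chance-corrected statistic (the text does not name the kappa variant). The function name `agreement_metrics` and the toy data are hypothetical.

```python
from typing import Dict, Sequence


def agreement_metrics(resident: Sequence[bool], abstractor: Sequence[bool]) -> Dict[str, float]:
    """Compare paired yes/no abstractions of one measure (hypothetical helper).

    The trained abstractor is the reference standard. Each position is one
    appropriate patient; True means the service was documented.
    """
    if len(resident) != len(abstractor) or not resident:
        raise ValueError("need two non-empty paired sequences of equal length")
    n = len(resident)
    pairs = list(zip(resident, abstractor))
    # 2x2 table: a = both yes, b = resident yes only, c = abstractor yes only, d = both no
    a = sum(1 for r, t in pairs if r and t)
    b = sum(1 for r, t in pairs if r and not t)
    c = sum(1 for r, t in pairs if t and not r)
    d = n - a - b - c
    disagreement = (b + c) / n
    observed = 1 - disagreement  # raw agreement
    # Agreement expected by chance, from each reviewer's marginal "yes" rate
    p_res, p_abs = (a + b) / n, (a + c) / n
    expected = p_res * p_abs + (1 - p_res) * (1 - p_abs)
    nan = float("nan")
    return {
        "disagreement": disagreement,
        "agreement": observed,
        "kappa": (observed - expected) / (1 - expected) if expected < 1 else nan,
        "sensitivity": a / (a + c) if a + c else nan,  # among abstractor-positive charts
        "specificity": d / (b + d) if b + d else nan,  # among abstractor-negative charts
        "resident_rate": p_res,    # performance as measured by the resident
        "abstractor_rate": p_abs,  # performance as measured by the abstractor
    }


# Hypothetical toy data: 10 appropriate patients for one measure
abstractor = [True, True, True, True, False, False, False, False, False, False]
resident = [True, True, True, False, False, False, False, False, False, True]
print(agreement_metrics(resident, abstractor))
```

In this toy example the resident misses one documented service (lowering sensitivity) and credits one undocumented service (lowering specificity), so the two performance rates are identical even though agreement is imperfect. This is one way clinic-level rates can match while chart-level agreement varies.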
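The quartile comparison can be sketched as below. This is an illustrative sketch, not the authors’ procedure: it assumes quartiles are assigned by rank position (quartile 1 = highest performance), breaks ties by resident ID, and uses hypothetical function names and scores. A positive shift means a resident ranks lower on self-abstracted performance than on trained-abstractor performance.

```python
from typing import Dict


def quartiles(scores: Dict[str, float]) -> Dict[str, int]:
    """Assign quartile 1 (highest) to 4 (lowest) by rank position.

    Ties are broken by resident ID, an assumption made for this sketch.
    """
    ordered = sorted(scores, key=lambda rid: (-scores[rid], rid))
    n = len(ordered)
    return {rid: 1 + (4 * i) // n for i, rid in enumerate(ordered)}


def quartile_shifts(self_scores: Dict[str, float],
                    abstractor_scores: Dict[str, float]) -> Dict[str, int]:
    """Quartile under self-abstraction minus quartile under trained abstraction."""
    q_self = quartiles(self_scores)
    q_abs = quartiles(abstractor_scores)
    return {rid: q_self[rid] - q_abs[rid] for rid in q_abs}


# Hypothetical smoking-screening rates for eight residents
abstractor_scores = {"A": 1.0, "B": 0.9, "C": 0.8, "D": 0.7,
                     "E": 0.6, "F": 0.5, "G": 0.4, "H": 0.3}
self_scores = dict(abstractor_scores, A=0.5, F=1.0)  # two residents self-score differently

shifts = quartile_shifts(self_scores, abstractor_scores)
changed = [rid for rid, s in shifts.items() if s != 0]
fell_two_plus = [rid for rid, s in shifts.items() if s >= 2]
rose_two_plus = [rid for rid, s in shifts.items() if s <= -2]
print(changed, fell_two_plus, rose_two_plus)
```

Because ranks are relative, one resident moving up necessarily pushes others down, which is why falls and rises tend to appear in pairs.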
Thirty-seven IM and 37 pediatrics residents participated. Approximately half of the residents were women (IM: n = 14, 38%; pediatrics: n = 22, 59%). IM residents abstracted a total of 120 patient charts (mean per resident = 3.3 [SD 1.9]). Pediatrics residents abstracted more charts, 272 in total (mean per resident = 7.3 [SD 2.1]). Nearly all patients were African American (n = 345; 88%); 70 (58%) of the IM patients were women, and 141 (52%) of the pediatric patients were girls. Most IM patients (n = 68; 57%) had three or more visits in the measurement period, whereas most pediatric patients (n = 226; 83%) had only one visit.
Comparing record review by trained abstractors and residents
On the basis of the performance measure definitions above, some measures were not appropriate for some patients (e.g., mammography is not appropriate for men). The total number of potential disagreements (the number of measurements on appropriate patients, summed across all six performance measures) was 618 for IM and 990 for pediatrics. Across the six performance measures, the overall raw disagreement rate between the trained abstractor and the residents was 13.4% (83/618) for IM and 10.6% (105/990) for pediatrics.
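Because measures that are not appropriate for a patient are excluded, the denominator counts only applicable patient-measure pairs. A minimal sketch, assuming each chart contributes one entry per measure, with `None` marking a measure that does not apply (for example, mammography for a man); the data layout and function name are hypothetical. The last lines reproduce the reported rates from the counts given in the text.

```python
from typing import Iterable, Optional, Tuple

# (resident finding, abstractor finding), or None if the measure is not applicable
Pair = Optional[Tuple[bool, bool]]


def raw_disagreement(pairs: Iterable[Pair]) -> Tuple[int, int, float]:
    """Return (disagreements, potential disagreements, rate), skipping N/A pairs."""
    applicable = [p for p in pairs if p is not None]
    disagreements = sum(1 for resident, abstractor in applicable if resident != abstractor)
    total = len(applicable)
    return disagreements, total, disagreements / total if total else float("nan")


# Hypothetical toy data: six chart-measure entries, two of them not applicable
toy = [(True, True), None, (False, True), (True, True), None, (False, False)]
print(raw_disagreement(toy))

# Rates reported in the text
im_rate = 83 / 618
peds_rate = 105 / 990
print(f"IM {im_rate:.1%}, pediatrics {peds_rate:.1%}")
```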
Using the trained abstractor as the gold standard, percent agreement, kappa, sensitivity, and specificity of the resident abstractor varied by performance measure (Tables 1 and 2). However, for most measures, the overall performance rates at the clinic level from the resident and trained abstractor record review were similar. For five of the six measures in IM, overall performance measured by residents and abstractors was within 5% (Table 1). Resident-measured performance was lower than that of the trained abstractors for four of the six measures. Results were similar in pediatrics, with four of the six measures within 5% (Table 2).
Differences in individual performance rank based on record review by residents versus trained abstractors
We used smoking screening to evaluate differences in measured performance and their impact on the residents’ relative performance rankings. We collapsed the patient data at the resident level to obtain each resident’s mean performance on this measure. For IM and pediatrics combined, mean performance was 84% (SD 27%) based on the trained abstractor and 81% (SD 30%) based on the resident abstractor. For 81% (60/74) of the residents, self-measured performance did not differ from performance measured by the trained abstractor, and for only 13 residents (18%) did it differ by more than 10%. Comparing quartiles of performance ranking, six residents (8%) fell by two or more quartiles when ranked on their own self-abstracted performance rather than on trained-abstractor-measured performance, and six residents (8%) rose by two quartiles.
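Collapsing patient-level data to resident-level performance, and counting residents whose two estimates differ, might look like the following. It assumes one record per patient holding the resident’s and the abstractor’s yes/no finding for smoking screening; the record layout, function names, and data are hypothetical, and “more than 10%” is interpreted here as an absolute difference of more than 10 percentage points.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

Record = Tuple[str, bool, bool]  # (resident_id, resident_found, abstractor_found)


def per_resident_counts(records: Iterable[Record]) -> Dict[str, Tuple[int, int, int]]:
    """Return {resident_id: (charts, self-abstracted yes, abstractor yes)}."""
    counts = defaultdict(lambda: [0, 0, 0])
    for rid, self_yes, abs_yes in records:
        c = counts[rid]
        c[0] += 1
        c[1] += int(self_yes)
        c[2] += int(abs_yes)
    return {rid: tuple(c) for rid, c in counts.items()}


def compare_residents(counts: Dict[str, Tuple[int, int, int]], threshold: float = 0.10):
    """Count residents with identical rates and with differences above threshold."""
    same = sum(1 for _, s, a in counts.values() if s == a)  # same chart count, so compare counts
    large = sum(1 for n, s, a in counts.values() if abs(s - a) / n > threshold)
    return same, large


records = [("r1", True, True), ("r1", True, True), ("r1", False, False),
           ("r2", True, True), ("r2", False, True),
           ("r3", True, True), ("r3", True, True), ("r3", True, True), ("r3", True, True)]
counts = per_resident_counts(records)
print(compare_residents(counts))                   # (identical, differ by > 10 points)
print(mean(s / n for n, s, _ in counts.values()))  # mean self-abstracted rate
print(mean(a / n for n, _, a in counts.values()))  # mean abstractor rate
```

Averaging per-resident rates, as here, weights each resident equally regardless of how many charts they abstracted; a pooled patient-level rate would instead weight residents with more charts more heavily.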
Discussion and Conclusions
Despite some disagreements, we found that, at the clinic level for the group of residents, resident self-abstraction gave a depiction of overall performance broadly similar to that of the trained abstractors. For five (IM) and four (pediatrics) of the six measures, resident-measured performance was within 5% of the abstractors’ audit of documented performance. This suggests that, for these measures, resident self-abstraction could provide an estimate of documented performance in clinic in place of more costly audit by chart abstractors. In a prior abstraction of 761 charts, we calculated the total cost of abstraction (including training of abstractors) to be $7,510, or approximately $9.87 per chart.13 In comparison, from the perspective of the training program, the cost of resident self-abstraction was minimal: printing the paper abstraction forms, support personnel’s time to gather charts for the residents, and the effort required to hold a block of clinic time open for this activity. However, this perspective does not account for the clinic revenue lost when patient appointments were given up to allow self-abstraction. The balance between cost savings and lost revenue will likely vary with how and where self-abstraction is implemented. We implemented abstraction during a clinic session because we thought this would be most feasible and acceptable to our residents. However, it could also be conducted after clinic, during a lunch session, or at some other time during work hours that would not result in lost clinical revenue.
When measuring the performance of individual residents, the proportion of residents, especially pediatrics residents, whose data agreed with the chart abstractor’s (within 10%) was high for tobacco use screening. Measured performance and the resulting performance rank were similar between the two methods, despite the relatively low number of charts per resident. However, we did not assess resident-specific performance for the other indicators, which had substantially fewer charts. For these indicators, the small number of charts abstracted would likely yield an unstable estimate of individual performance.
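The per-chart figure follows from dividing total cost by the number of charts. As a rough, hypothetical illustration only, the same rate can be applied to the 392 charts in this study to approximate what trained-abstractor review of them might have cost; this assumes cost scales in proportion to chart count, which is only an approximation because the $7,510 total included abstractor training.

```python
def per_chart_cost(total_cost: float, charts: int) -> float:
    """Average abstraction cost per chart (the total includes abstractor training)."""
    return total_cost / charts


rate = per_chart_cost(7510, 761)  # prior audit of 761 charts
print(f"${rate:.2f} per chart")
print(f"${rate * 392:,.0f} for 392 charts (rough, proportional scaling)")
```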
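Why a handful of charts gives an unstable individual estimate can be made concrete with a binomial confidence interval. This sketch uses the Wilson score interval, a standard choice for proportions with small denominators; it was not used in the study, and the chart counts below (three and seven, close to the IM and pediatrics means per resident) are illustrative.

```python
from math import sqrt
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    # Clip to [0, 1] to absorb floating-point rounding at the boundaries
    return max(0.0, center - half), min(1.0, center + half)


for charts in (3, 7):
    low, high = wilson_interval(charts, charts)  # every chart documents the service
    print(f"{charts}/{charts} charts: {low:.0%} to {high:.0%}")
```

Even when every chart shows the service, three charts are consistent with a true documentation rate as low as roughly 44%, versus roughly 65% with seven charts, so small per-resident samples leave wide uncertainty.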
There may be additional benefits to resident self-abstraction unrelated to accuracy of audit and subsequent feedback.14,15 The act of chart review in itself may be an intervention, allowing self-reflection. Of note, the ABIM has implemented self-abstraction as a module for board recertification.6 We conducted self-abstraction in this study because it is most consistent with the current practices of the ABIM and is likely to be the model that our trainees will experience in the future. Anecdotally, our residents found the abstraction experience acceptable, and several reported that they were “surprised” or “disappointed” that they had not documented provision of preventive services to some patients who were appropriate. Thus, the opportunity to review even a few charts may act as an important training experience.
Our analysis is limited in that it derives from two residency programs at one institution and from a limited sample of residents and charts. Further research across multiple residency programs should assess whether residents can achieve higher levels of agreement with additional training or more frequent abstraction. The number of preventive services we abstracted was small and does not reflect the totality of resident performance. We acknowledge that although chart abstraction is an accepted standard for performance evaluation, it measures documented performance; additional preventive services may have been delivered but not documented, or documented but not delivered. We also lack prospective follow-up data assessing change in performance after self-abstraction or comparing the impact of self-abstraction with that of feedback from trained abstractors’ audits. Understanding whether self-abstraction changes behavior, and whether it is as effective as feedback based on trained abstractors’ audits, is a critical topic for future research. Other research may also investigate residents’ review of one another’s charts.
We based our self-abstraction strategy on what could reasonably be integrated into a residency training program. We felt that residents would be willing to self-abstract during a clinic block reserved for this purpose. The number of charts reflects the number that residents could reasonably abstract in the time given. Self-abstraction was feasible, although the total number of charts abstracted was limited. We blocked out one hour; on average, IM residents reviewed 3.5 charts per hour, and pediatrics residents reviewed seven charts per hour. We could have blocked off a series of clinic sessions to enlarge the sample of charts per resident. However, as the time burden increases, the feasibility of implementing self-abstraction and its acceptability to residents would likely decline. For the measures with poor agreement, resident abstraction might have improved with more time and training. Clearly, our trained abstractors had been more rigorously trained than the residents and had more experience with chart audit, but this difference again reflects the trade-off with the feasibility of implementing self-abstraction.
Providing feedback to individual residents or comparing residents using self-abstraction is limited by the number of charts abstracted and, thus, by the precision of the estimate. Because of the limitations in accuracy and precision, we do not feel that self-abstraction is robust enough to serve as the basis of incentive programs in training or promotional criteria. On the basis of our results, we would recommend that audit and feedback be used for formative assessment and self-reflection, not for summative evaluation by the training program. Of note, the residents did not overestimate their performance compared with the trained abstractors. Anecdotally, residents seemed to derive some value from the exercise, although we did not assess perceived importance as part of this analysis.
In conclusion, resident self-abstraction is feasible, acceptable to residents, and cost saving. Our results have important implications for residency training because implementing record review by trained abstractors is cost-prohibitive for many residency programs. Future efforts assessing self-abstraction in a broader array of settings and using a broader array of performance data are needed. Understanding how resident self-abstraction triangulates with other methods of assessment from the ACGME toolbox,5 such as patient surveys, would be a valuable addition to our results. Self-abstraction is likely most appropriate in the context of an overall, multicomponent evaluation using a variety of tools from the ACGME toolbox.
The authors wish to thank Heather Coley for her editorial assistance with this report and for management of our medical education research initiatives.
This project was supported with funding from the University of Alabama Health Services Foundation General Endowment Fund.
Part of this research was presented previously as an oral abstract at the 2006 Society of General Internal Medicine Annual Meeting, Los Angeles, California, April 26–29, 2006.
1 Accreditation Council for Graduate Medical Education. Outcome Project. Available at: http://www.acgme.org/outcome. Accessed November 24, 2008.
2 Willett LL, Palonen K, Allison JJ, et al. Differences in preventive health quality by residency year. Is seniority better? J Gen Intern Med. 2005;20:825–829.
3 Peabody JW, Luck J, Glassman P, Dresselhaus TR, Lee M. Comparison of vignettes, standardized patients, and chart abstraction: A prospective validation study of 3 methods for measuring quality. JAMA. 2000;283:1715–1722.
4 Allison JJ, Wall TC, Spettell CM, et al. The art and science of chart review. Jt Comm J Qual Improv. 2000;26:115–136.
6 Wasserman SI, Kimball HR, Duffy FD. Recertification in internal medicine: A program of continuous professional development. Task Force on Recertification. Ann Intern Med. 2000;133:202–208.
7 Staton LJ, Kraemer SM, Patel S, Talente GM, Estrada CA. “Correction:” Peer chart audits: A tool to meet Accreditation Council on Graduate Medical Education (ACGME) competency in practice-based learning and improvement. Implement Sci. 2007;2:24.
8 Paukert JL, Chumley-Jones HS, Littlefield JH. Do peer chart audits improve residents’ performance in providing preventive care? Acad Med. 2003;78(10 suppl):S39–S41.
9 Holmboe ES, Prince L, Green M. Teaching and improving quality of care in a primary care internal medicine residency clinic. Acad Med. 2005;80:571–577.
10 Houston TK, Wall T, Allison JJ, et al. Implementing achievable benchmarks in preventive health: A controlled trial in residency education. Acad Med. 2006;81:608–616.
13 Palonen KP, Allison JJ, Heudebert GR, et al. Measuring resident physicians’ performance of preventive care: Comparing chart review with patient survey. J Gen Intern Med. 2006;21:226–230.
14 Holmboe ES, Meehan TP, Lynn L, Doyle P, Sherwin T, Duffy FD. Promoting physicians’ self-assessment and quality improvement: The ABIM diabetes practice improvement module. J Contin Educ Health Prof. 2006;26:109–119.
15 Holmboe ES, Rodak W, Mills G, McFarlane MJ, Schultz HJ. Outcomes-based evaluation in resident education: Creating systems and structured portfolios. Am J Med. 2006;119:708–714.
© 2009 Association of American Medical Colleges