Comparative Trial of a Web-Based Tool to Improve the Quality of Care Provided to Older Adults in Residency Clinics: Modest Success and a Tough Road Ahead

Holmboe, Eric S. MD; Hess, Brian J. PhD; Conforti, Lisa N. MPH; Lynn, Lorna A. MD

Academic Medicine:
doi: 10.1097/ACM.0b013e31824cecb3
Geriatrics Education and Training

Purpose: To determine whether residency programs can use a multicomponent, Web-based quality improvement tool to improve the care of older adults.

Method: The authors conducted an exploratory, cluster-randomized, comparative before–after trial of the Care of the Vulnerable Elderly Practice Improvement Module in the ambulatory clinics of 46 internal medicine and family medicine residency programs, 2006–2008. The main outcomes were the changes in performance from baseline to follow-up on the Assessing Care of the Vulnerable Elderly (ACOVE) quality measures.

Results: Of the 46 programs initially selected for the study, 37 (80%) provided both baseline and follow-up data. Performance on all 10 ACOVE measures was poor at baseline (range 8.6%–33.6%). Intervention clinics most frequently chose to improve fall-risk screening and documentation of end-of-life preferences. The change in the percentage of patients screened for fall risk for the intervention clinics that targeted this measure was significantly greater than the change observed in the control clinics (+23.3% versus +9.7%, P = .003, odds ratio [OR] = 2.0; 95% confidence interval [CI]: 1.25–3.75), as was the change for documentation of preference for life-sustaining care (+16.4% versus +2.8%, P = .002, OR = 6.3; 95% CI: 2.0–19.6) and surrogate decision maker (+14.3% versus +2.8%, P = .003, OR = 6.8; 95% CI: 1.9–24.4).

Conclusions: A multicomponent, Web-based, quality improvement tool can help residency programs improve care for older adults, but much work remains for improving the state of care for this population in training settings.

Author Information

Dr. Holmboe is chief medical officer and senior vice president, American Board of Internal Medicine, Philadelphia, Pennsylvania.

Dr. Hess is director of research analysis, Department of Psychometrics, American Board of Internal Medicine, Philadelphia, Pennsylvania.

Ms. Conforti is research associate for academic programs, Department of Quality Research, American Board of Internal Medicine, Philadelphia, Pennsylvania.

Dr. Lynn is director for practice improvement module research, Department of Quality Research, American Board of Internal Medicine, Philadelphia, Pennsylvania.

Correspondence should be addressed to Dr. Holmboe, American Board of Internal Medicine, 510 Walnut St., Suite 1700, Philadelphia, PA 19106; telephone: (215) 446-3609; e-mail:


In 2008, the Institute of Medicine released its seminal report Retooling for an Aging America: Building the Health Care Workforce.1 The report recommended sweeping changes in the training of the health care workforce, noting deficiencies in medical school, residency, and fellowship training as major impediments to improving the care of older patients. Because few trainees are choosing geriatrics as a career,2,3 medical educators need to train all physicians, regardless of their specialty, both to meet the specific health care needs of older adults and to incorporate quality improvement (QI) activities around geriatric care into their practice. Regrettably, current data suggest that the medical education community is falling far short of the mark in the United States.4,5

Given this backdrop, all residents in internal medicine (IM) and family medicine (FM) should examine the quality of care they provide to older patients during their residency training, and they should act on the resulting data to improve their practices. However, current residency experiences in geriatrics are often limited to one-month rotations in nursing homes or geriatric assessment units.6 Residents are unlikely to learn about, and subsequently provide, high-quality ambulatory care to older adults if the clinical site of their training does not itself provide high-quality geriatric care.

By the end of their training, residents should be capable of reviewing data from their practice and implementing a change to improve care.7–9 Targeting the quality of care for older adults in residency programs offers substantial synergy: residents can learn about QI and about important care processes for older patients in their own clinics at the same time.10

The primary objective of this exploratory, cluster-randomized, comparative before–after study was to determine whether IM and FM residency programs, using a Web-based assessment tool called the American Board of Internal Medicine (ABIM) Care of the Vulnerable Elderly (CoVE) Practice Improvement Module (PIM), can assess and improve the quality of care provided to older patients in their ambulatory clinics. Although previous research found ABIM PIMs to be effective in identifying gaps in care,11–16 to our knowledge, no comparative trial of the PIMs has ever been done in residency training, and no study has attempted to both examine and improve the quality of geriatric care delivered in residency ambulatory clinics. Given the rapidly growing population of older adults in developed countries worldwide, effectively preparing the future physician workforce to care for this population will be essential.1

Method

The ABIM CoVE PIM is based on RAND’s Assessing Care of the Vulnerable Elderly (ACOVE) project.17 By performing a self-audit of their practice, a physician or group of physicians (hereafter simply “physician”) can use the module to recognize gaps between actual and ideal performance in several areas that are critical to providing high-quality care to older adults. Specifically, the module addresses fall screening and prevention; screening for incontinence, cognitive impairment, depression, and polypharmacy; immunization to prevent influenza and pneumonia; and documentation of end-of-life preferences.

The ABIM CoVE PIM takes data from three sources: (1) a medical record audit, (2) a patient survey, and (3) a survey about the microsystem (i.e., the organization of the practice). The quality measures for the medical record audit focus on the conditions above and are taken from the ACOVE guidelines.4,17 The patient survey, available in English and Spanish, includes 32 questions that allow the physician to get patients’ feedback on the care they receive, specifically their access to care, their communication with medical providers, and the health-related education they receive. Questions about the practice microsystem have been modified from the National Committee for Quality Assurance Provider Practice Connections survey to address elements of geriatric care that may be best addressed through the microsystem as a whole, including the use of information technology, access and reminder systems, and methods used to activate patients.5

The data gleaned from the medical record audit, patient survey, and microsystem survey are submitted over the Internet, and the physician then receives a comprehensive report that provides information from all three of these sources. The physician is asked to reflect on the results and then choose several areas of practice that he or she would like to improve as part of a personal improvement plan. To finish the module, the physician completes an impact statement reporting the results of the improvement plan, including the barriers that she or he encountered. The PIM is available to training programs for $100 per module plus $25 per participant (the cost of the PIM was waived for study participants).



Study participants

In March 2006, the ABIM invited the program directors and geriatric fellowship directors of all residency programs in the United States (N = 840) to apply to participate in a study to determine and improve the current state of care for the older patients seen in their training programs. To offset the anticipated costs of participation, ABIM offered programs an incentive of $250 per participating resident for successful completion of the study. In addition, ABIM offered $1,000 per program per audit period to access and audit 50 to 75 patient charts in each of the two audit periods (baseline and follow-up). First, the authors (E.S.H., L.A.L., and L.N.C.) reviewed all the applications to ensure that they met eligibility criteria. Then, a panel of seven experts in geriatric medicine and QI used a five-point rating scale to score both the quality of the proposed implementation plan and the institutional support of the applicant training programs that passed the initial review (Figure 1).

Once we selected programs for participation, we randomly assigned them to either intervention or control conditions, after stratifying them on the basis of major program characteristics such as IM and FM specialty, geographic region, size of program, community versus university-based program, and presence of a fellowship in geriatric medicine.

Study design

We designed this investigation to be an exploratory, cluster-randomized, comparative before–after study17 using the CoVE PIM. The residency clinic served as the unit of analysis for two reasons. First, residents are nested within the clinic. Second, QI is performed as a group activity and involves changes to clinic systems. Because quality performance often varies between clinics, we allowed the clinics in the intervention group to choose which quality indicator to improve on the basis of their baseline (“before”) results.

Our study is exploratory in nature. We followed the principles and guidance of the Medical Research Council–United Kingdom (MRC-UK) for investigating complex health-services interventions by using a comparative trial design that seeks to illuminate the effects of a multicomponent, replicable, complex intervention on patient care.18–20 As such, this study specifically targets level 4 (results for patients) in the Kirkpatrick evaluation framework.21 The MRC-UK defines complex interventions as “built up from a number of components, which may act both independently or interdependently” and recommends that a randomized controlled trial (RCT) should be preceded by exploratory comparative studies that inform the design of a potential RCT.18

Intervention group

A previous qualitative study with residents suggested that the self-audit activity of PIMs is valuable to help residents learn QI skills.16 Thus, for the intervention arm of our current study, residents completed medical record audits and the microsystem survey at baseline to gather all the data needed for the performance report at the clinic-site level. We expected the residents at the programs in the intervention arm to audit at least 50 patient charts per program (up to 5 charts per resident, drawn from his or her own older patients). As a component of the intervention, we instructed residents in the intervention programs to review their clinic’s aggregate baseline performance data and to work with local faculty to plan and implement a QI intervention based on the Plan-Do-Study-Act model.22 We encouraged programs with more than one clinic site to collect an equal number of charts from each site, again with a total of at least 50 patient charts per program.

As mentioned above, and consistent with principles of exploratory comparative trials of complex interventions,18 we gave intervention clinics the flexibility to focus on improving the quality measure (e.g., fall-risk screening) they believed most important for their clinic setting and context. In addition to being more patient centered and context relevant, this approach enabled us to estimate the magnitude of improvement that can be expected, which in turn can be used to power a future formal RCT.18,19 Both the control and intervention groups received a QI tool kit consisting of geriatric-specific educational materials22–24 and a book on implementing QI.22 The control group residents did not perform any audit activities and did not receive any performance data.


Data collection for analyses

A local, trained abstractor independently completed a patient chart audit for 50 to 75 patients per program (a minimum of 25 charts per clinic site) for both the intervention and control clinics. For the intervention group, this audit occurred in parallel with the residents’ audit of charts. Thus, for the intervention clinics, charts were abstracted twice; we used the abstractors’ data to analyze the effectiveness of the intervention. We did not share the data that this local abstractor collected with the study participants in either arm, and the trained abstractor was blinded to the aggregate submitted results throughout the study period (i.e., the abstractor was unaware of the aggregate results of the baseline data at all times, including during follow-up data collection). The abstractor used a retrospective sequential sampling strategy; specifically, we instructed him or her to review the clinic’s appointment log, going back at least one year, to select the first 50 to 75 patients who visited the clinic and who met the inclusion criteria. Patients must have been in the practice for at least 12 months; thus, new patients were excluded. Patients who were nonambulatory (bed- or wheelchair-bound), had severe cognitive impairment (defined as impairment that requires total care), or had a terminal illness (life expectancy less than one year) were also excluded per the CoVE PIM and ACOVE17 guidelines.
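The sampling rule described above can be expressed as a short sketch. This is a hypothetical illustration, not the abstractors’ actual procedure: the `Visit` record and its field names are invented here, eligibility is treated as a fixed property of each patient, and the log is assumed to be supplied already in the order the abstractor reviews it (the sketch does not decide the direction of review).

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Visit:
    """Hypothetical appointment-log entry; field names are illustrative only."""
    patient_id: str
    months_in_practice: int            # time the patient has been in the practice
    ambulatory: bool                   # False if bed- or wheelchair-bound
    severe_cognitive_impairment: bool  # impairment requiring total care
    terminal_illness: bool             # life expectancy under one year


def eligible(v: Visit) -> bool:
    """Inclusion and exclusion criteria as described in the text."""
    return (
        v.months_in_practice >= 12
        and v.ambulatory
        and not v.severe_cognitive_impairment
        and not v.terminal_illness
    )


def sequential_sample(log: Iterable[Visit], target: int) -> List[str]:
    """Return the first `target` distinct eligible patients, in log order."""
    chosen: List[str] = []
    seen = set()
    for visit in log:
        if visit.patient_id in seen:
            continue  # each patient is considered once, however many visits
        seen.add(visit.patient_id)
        if eligible(visit):
            chosen.append(visit.patient_id)
            if len(chosen) == target:
                break  # stop as soon as the sample is full
    return chosen


log = [
    Visit("A", 24, True, False, False),
    Visit("B", 6, True, False, False),    # new patient: excluded
    Visit("A", 24, True, False, False),   # repeat visit: skipped
    Visit("C", 36, False, False, False),  # wheelchair-bound: excluded
    Visit("D", 18, True, False, False),
]
print(sequential_sample(log, target=2))  # ['A', 'D']
```

In the study itself the target would be 50 to 75 patients per program, with at least 25 per clinic site.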

This population of patients served as the study cohort to assess the effect of the interventions, and these patients’ charts were audited at both baseline and follow-up; we report the results of these audits here. ABIM provided an abstractor’s manual as well as technical and study-related support as needed for the local abstractor. All programs received local institutional review board (IRB) approval before data collection began. The time each program needed to obtain IRB approval ranged from 1 to 56.5 weeks (average, 18 weeks). (Approval times varied so greatly because, among other reasons, some IRBs conducted full ethical reviews.25) Programs began baseline data collection in October 2006, and the last program completed it in December 2007; follow-up data collection began in April 2008 and ended in December 2008.

Study variables

The chart audit for the ABIM CoVE PIM involves a review of the basic medical history information that a chart should capture, including patient demographics (e.g., age, sex), occurrence of chronic conditions important in older adults (e.g., hypertension, Parkinson disease), and health-related habits (e.g., smoking, alcohol use, physical activity)17; see Table 1. The chart audit also assesses whether or not providers performed various geriatric processes of care (e.g., screens for cognitive impairment, fall risk, and urinary incontinence; documentation of ability to provide self-care and end-of-life preferences) and nongeriatric processes of care (e.g., documenting current levels of exercise and alcohol use, providing influenza vaccination); see Table 2. Finally, we calculated the number of clinics, residents, and patients that were involved with the two specific quality measures that programs most frequently targeted for improvement, specifically fall-risk screening and documentation of end-of-life preferences (Table 3).

Statistical analyses

We used results from prior research on practicing physicians, which suggested that average performance on ACOVE quality measures ranged between 31% and 52%,4 to estimate a sample size with sufficient power to detect a meaningful difference in compliance rates at the patient level for a single performance measure. Assuming, a priori, a 20% dropout rate for programs, an effect size of 10%, and a baseline compliance rate of 40% to 50% on the ACOVE quality measures, we estimated that a total sample size of 910 patients (455 per group) would be sufficient to detect differences in compliance rates.
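A conventional way to carry out this kind of calculation is the normal-approximation formula for comparing two independent proportions. The sketch below rests on assumptions not stated in the text: a two-sided α of .05, 80% power, the 10% effect size read as a 10-percentage-point difference (40% versus 50%), the program dropout rate applied proportionally to patients, and no design effect for clustering of patients within clinics. Under these assumptions it yields a different figure from the 455 per group reported above, so it should not be read as the authors’ calculation.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided, two-sample comparison of proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def inflate_for_dropout(n: int, dropout: float) -> int:
    """Enrollment needed so that about n remain after the assumed dropout fraction."""
    return math.ceil(n / (1 - dropout))


base = n_per_group(0.40, 0.50)  # 40% baseline, 10-point difference
print(base)  # 388
print(inflate_for_dropout(base, 0.20))
```

Raising the power, or multiplying by a design effect to account for clustering within clinics, would increase the required sample.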

To compare the intervention and control clinics on patient characteristics and process measures at baseline, we used chi-square (χ²) significance tests and phi coefficients (effect size) to compare observed proportions at the patient level; the exceptions were the mean age of patients and the number of chronic conditions per patient, which we compared using t tests and Cohen d effect sizes. To assess the change in performance on the measures selected for improvement, we performed a logistic regression analysis because the performance outcome was at the patient level (i.e., we coded patients 0 if they did not receive the process or 1 if they did receive it). We began by analyzing the intervention and control groups separately to assess the degree of change in performance from baseline to follow-up. Next, to assess whether the change was larger for the intervention group than for the control group, we tested the following logistic model:

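$$Y = \beta_0 + \beta_{\text{Time}}\,\text{Time} + \beta_{\text{Group}}\,\text{Group} + \beta_{\text{Time}\times\text{Group}}\,(\text{Time}\times\text{Group}) + e,$$

where

$Y$ is the log odds that the measure was performed,

$\beta_0$ is the intercept,

$\beta_{\text{Time}}$ is the coefficient for time (baseline versus follow-up),

$\beta_{\text{Group}}$ is the coefficient for group (intervention versus control),

$\beta_{\text{Time}\times\text{Group}}$ is the interaction coefficient, and

$e$ is the error term.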

A statistically significant time × group interaction coefficient indicated that the degree of change differed between the two groups. We also report odds ratios (ORs) to describe all observed differences. Because the measures targeted for improvement were processes of care specifically selected for vulnerable elderly patients and were therefore under the direct control of physicians, we did not adjust for patient characteristics.4,17 For the comparative analysis, we included only those clinics in each arm that completed both the baseline and follow-up data collection (per-protocol analysis). We assessed statistical significance at P < .05, and we performed all analyses using SPSS statistical software, version 12 (SPSS Inc, Chicago, Illinois).
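Because the model above contains only time, group, and their interaction, it is saturated: its fitted probabilities equal the four observed cell proportions, so the interaction OR equals the intervention arm’s pre–post OR divided by the control arm’s. The sketch below computes that ratio and an approximate Wald interval from cell counts. The function names and example counts are hypothetical, and the interval treats every chart audit as independent, ignoring both clustering within clinics and the re-audit of the same patients, so it is illustrative rather than a reproduction of the reported intervals.

```python
import math
from typing import Tuple

Cell = Tuple[int, int]  # (process performed, process not performed)


def odds(p: float) -> float:
    """Odds corresponding to a proportion p (0 < p < 1)."""
    return p / (1 - p)


def interaction_or(int_pre: float, int_post: float,
                   ctl_pre: float, ctl_post: float) -> float:
    """exp(beta_TimexGroup) in a saturated time x group logistic model."""
    or_intervention = odds(int_post) / odds(int_pre)
    or_control = odds(ctl_post) / odds(ctl_pre)
    return or_intervention / or_control


def interaction_ci(int_pre: Cell, int_post: Cell, ctl_pre: Cell, ctl_post: Cell,
                   z: float = 1.96) -> Tuple[float, float, float]:
    """Interaction OR with an approximate Wald CI from the eight cell counts.

    Assumes independent observations (no clustering, no repeated measures).
    """
    cells = (int_pre, int_post, ctl_pre, ctl_post)
    log_odds = [math.log(yes / no) for yes, no in cells]
    log_or = (log_odds[1] - log_odds[0]) - (log_odds[3] - log_odds[2])
    se = math.sqrt(sum(1 / yes + 1 / no for yes, no in cells))
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Fall-risk screening, from the percentages in the Results; follow-up values
# assume the reported changes are percentage points (11.4 + 23.3, 11.6 + 9.7).
fall_or = interaction_or(0.114, 0.347, 0.116, 0.213)
print(round(fall_or, 2), round(math.log(fall_or), 2))  # 2.0 0.69

# Hypothetical counts, only to show the interval calculation.
print(interaction_ci((20, 180), (60, 140), (25, 190), (45, 170)))
```

Read this way, the fall-risk proportions give an interaction OR of about 2.0 and a coefficient of about 0.69, in line with the values reported for that measure.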


Results

Of the 840 total IM and FM residency training programs, 110 applied for inclusion. Of those 110, the panel of seven geriatric care and QI experts selected 60 for final review. In total, we chose 46 programs (12 FM and 34 IM) for inclusion (Figure 1); however, before baseline data collection started, 4 programs dropped out of the study (2 in the intervention arm and 2 in the control arm). Further, after data collection began, 1 intervention arm program dropped out because of hospital closure, leaving 20 programs in the intervention group and 21 in the control group. The final cohort of participating programs was broadly distributed across the United States; it included university-based, university-affiliated, and community-based programs of varying size.

Patient characteristics and baseline processes of care

Of the 41 programs that completed the baseline data collection, 37 (90%) also completed the follow-up data collection. This translates into 17 intervention programs (85% of 20) with 18 clinics, and 20 control programs (95% of 21) with 27 clinics (Figure 1). At baseline, the 17 intervention programs that completed the study had 829 patients, and the 20 control programs had 1,096 patients. At follow-up, the abstractors reaudited 668 patients (81%) from the intervention arm and 910 patients (83%) from the control arm.
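The program and patient counts reported above can be cross-checked with a short tally. The 23 programs per arm at randomization are not stated in the text; they are inferred here from the dropouts described (2 + 1 in the intervention arm, 2 in the control arm). Percentages are rounded to whole numbers, as in the text.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to a whole number, as reported in the text."""
    return round(100 * part / whole)


# Programs per arm at randomization (inferred, not stated in the text).
randomized = {"intervention": 23, "control": 23}
# 2 per arm dropped before baseline; 1 more intervention program after data collection began.
dropped = {"intervention": 2 + 1, "control": 2}
completed = {"intervention": 17, "control": 20}
# (patients audited at baseline, patients re-audited at follow-up)
patients = {"intervention": (829, 668), "control": (1096, 910)}

baseline = {arm: randomized[arm] - dropped[arm] for arm in randomized}
for arm in randomized:
    base_n, follow_n = patients[arm]
    print(arm, baseline[arm], pct(completed[arm], baseline[arm]), pct(follow_n, base_n))
# intervention 20 85 81
# control 21 95 83

total_completed = sum(completed.values())
print(pct(total_completed, sum(baseline.values())), pct(total_completed, sum(randomized.values())))
# 90 80
```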

Table 1 presents baseline patient characteristics from all 37 programs (intervention and control) that completed the study. We observed some differences in characteristics between the intervention clinic and control clinic patients; however, we judged these to be small on the basis of their magnitude (all effect sizes were 0.13 or smaller). The largest differences were that the intervention clinics had a significantly greater percentage of female patients and, on average, fewer chronic conditions per patient. Table 2 presents baseline performance on process-of-care measures by the intervention and control clinics. Some differences between the two groups were significant; however, these differences were also modest (effect sizes of 0.13 or smaller). Performance on the geriatric-specific measures was poor for both the intervention and control clinics: of the 10 geriatric-specific measures, 9 were below 30% performance at baseline, and none were above 35%.
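The baseline comparisons above use the statistics named under Statistical analyses: the Pearson chi-square test with the phi coefficient for proportions, and Cohen d for means. The sketch below computes those effect sizes. It assumes the uncorrected Pearson statistic (SPSS also reports a continuity-corrected version), and all counts and ages in it are hypothetical, not study data.

```python
import math
from statistics import mean, stdev


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square, without continuity correction, for [[a, b], [c, d]].

    Rows: intervention, control. Columns: process performed, not performed.
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Signed phi for the same table; its square equals chi-square / n."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def cohens_d(x: list, y: list) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical counts: 100 of 800 intervention patients and 150 of 1,000
# control patients received a given process of care at baseline.
a, b, c, d = 100, 700, 150, 850
print(round(chi_square_2x2(a, b, c, d), 2), round(phi_coefficient(a, b, c, d), 3))  # 2.32 -0.036

# Hypothetical patient ages, to show the effect size for a comparison of means.
print(round(cohens_d([74, 78, 81, 85], [72, 75, 79, 82]), 2))
```

The magnitude of phi is what is usually reported as the effect size; its sign only indicates which row has the higher proportion.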

Performance on measures targeted for improvement

Of the 18 intervention clinics that completed the study, 9 chose to improve fall-risk screening, 2 chose documentation of end-of-life preferences (i.e., preference for life-sustaining care and surrogate decision maker), and 1 chose to improve both fall-risk screening and documentation of end-of-life preferences. The remaining 6 clinics in the intervention arm selected other measures (health care maintenance exams, orthostatic hypotension, home safety surveys, geriatric health maintenance forms, improved documentation in general, and general evaluation for core geriatric conditions; data not shown). Table 3 presents demographic information for the control clinics and the two subsets of intervention clinics focusing on fall-risk screening and end-of-life preference documentation. The mean age of patients was consistent across the clinic groups at baseline and at follow-up.

Table 4 presents the performance rates on the fall-risk screening and documentation of end-of-life preference measures at baseline and at follow-up. For the fall-risk screening measure, the intervention clinics targeting falls and the control clinics performed similarly at baseline (11.4% versus 11.6%, χ² = 0.02, P = .89, phi = 0.003), and both groups showed significant pre–post improvements in performance at follow-up (Table 4). However, the time × group interaction term from the logistic regression model was significant for the intervention clinics targeting this measure (β = 0.69, P = .003, OR = 2.0, 95% confidence interval [CI]: 1.25–3.75). That is, the change in performance at follow-up for the clinics that targeted fall-risk screening was significantly greater than the change observed in the control clinics (+23.3% versus +9.7%). Examination of the intervention clinics’ QI plans indicated that 9 of the 10 clinics used some form of clinical reminder, such as a template or fall-screening tool, in either a paper-based (5 clinics) or electronic (4 clinics) format.

For documentation of preference for life-sustaining care, the intervention clinics performed significantly worse at baseline (2.9% versus 9.7%, χ² = 7.6, P = .006, phi = 0.08), but only the intervention clinics that targeted this measure for improvement showed a significant change in performance at follow-up (Table 4), as evidenced by a significant time × group interaction (β = 1.84, P = .002, OR = 6.3, 95% CI: 2.0–19.6). That is, the change in performance at follow-up for the intervention clinics that targeted documentation of preference for life-sustaining care was significantly greater than the change observed in the control clinics (+16.4% versus +2.8%). Examination of the intervention clinics’ QI plans revealed that one clinic used a new summary form and another used a form that triggered questions about advance directives.

For documentation of a surrogate decision maker, the intervention clinics performed significantly worse at baseline (2.2% versus 8.8%, χ² = 7.7, P = .005, phi = 0.08), but only the intervention clinics that targeted this measure for improvement showed a significant change in performance at follow-up (Table 4), as evidenced by a significant time × group interaction (β = 1.91, P = .003, OR = 6.8, 95% CI: 1.9–24.4). That is, the change in performance at follow-up for the intervention clinics that targeted documentation of a surrogate decision maker was significantly greater than the change observed in the control clinics (+14.3% versus +2.8%). Examination of the intervention clinics’ QI plans revealed that one clinic placed a sticker on the chart regarding the patient’s power of attorney.


Discussion and Conclusions

This cluster-randomized, comparative before–after, complex-intervention study suggests that residency programs can incorporate a multifaceted, Web-based QI tool to improve the quality of care they provide to older adults. The ultimate goal of any intervention targeting clinical education should be to facilitate both enhanced competence at the trainee level and better care for patients seen in training settings.1 However, although clinics using the ABIM CoVE PIM improved more than control clinics on the important performance measures they targeted, the overall quality of care for older adults in residency clinics remained poor, even after the QI interventions.

Although the overall level of performance is sobering, we should emphasize several points. First, the intervention clinics targeted areas in which they were performing especially poorly at baseline, highlighting that PIMs can illuminate areas in need of improvement. Second, improving quality is an iterative process that requires repeated cycles of change to reach optimal performance.26 Because of the study protocol, we had to assess the effect of the QI intervention within a predefined period of time. Future studies should examine the longitudinal effects of ongoing audit and QI efforts in residency programs; it is unknown whether the programs that began interventions continued to improve or instead declined after follow-up data collection. Third, the ABIM CoVE PIM in the intervention arm did catalyze QI interventions by providing data on the programs’ levels of performance on validated, important processes of care for older adults.

We note several limitations. First, the participating programs were selected through an application process, so our findings may not be generalizable. Indeed, this study may represent a best-case scenario in that these programs were highly motivated and received monetary support. Second, programs had highly variable experiences in obtaining local IRB approval, which in turn led to variable amounts of exposure to the intervention and of time to implement their QI plans. The longer time some programs spent procuring IRB approval likely increased turnover at the resident level during the course of the study and may have limited the improvement they achieved. However, resident turnover is an inevitable aspect of training, and programs need to put systems in place that ensure that clinical care and education continually advance and improve.

Third, residency programs today are in a perpetual state of flux, and we lost four programs between the completion of baseline data collection and the follow-up period because of a clinic closure and the loss of on-site principal investigators; however, changing programs, like turnover, are an expected part of graduate medical education. Programs should ensure that their QI efforts can survive such changes. Fourth, a proportion of the patient cohort was lost to follow-up between the baseline and follow-up audits, and programs did not have mechanisms to determine what happened to these patients. Finally, although programs were cluster randomized, the study did not have sufficient statistical power to perform a cluster-specific analysis for all the performance measures27 because the programs, per the exploratory study design,18,19 selected different measures to target.

Lastly, though not a limitation, we should acknowledge that the research team, employed by the ABIM, has an interest in the success of the ABIM PIMs. We undertook multiple steps to ensure the integrity of the study, and we believe that reporting results that are modest in effect is important in advancing our understanding of the challenges facing residency education.

Despite some limitations, our findings highlight the difficulty both of providing high-quality care for older adults and of conducting multisite intervention studies in residency training clinics. As recommended by others, we used a cluster-randomized, before–after trial design to examine the effects of an intervention on patient care as a precursor to an RCT.18–20 This study provides early data and insight into what will be necessary to move QI forward in the diverse, fragmented environment of graduate medical education. However, our results also suggest that RCTs may not be the best methodology for studying complex problems in medical education. A growing number of investigators argue for research strategies that embrace the complexity of the environments and contexts in which complex interventions are implemented.18–20,28 Given the heterogeneity of the participating clinics and the variable results, future studies should seriously consider mixed qualitative–quantitative approaches, for example, combining on-site interviews about how a QI intervention was implemented with performance measures. Further, designing programs to implement the recently released geriatric competencies for medical students and residents is a logical next step because these competencies target many of the important ACOVE performance measures.17,29

Despite the modest effects indicated by our results, we believe that several lessons can be gleaned from our study. This study suggests that the overall quality of care delivered in IM and FM clinics, which are central to the care of older adults, is poor and that residency clinics should focus more attention on improving the quality of care for this population. With the rapid aging of the U.S. population and the need for IM and FM physicians to possess the knowledge, skills, and attitudes to care for these older adults, our findings should heighten concern and provide further impetus to accelerate needed changes to residency training in the ambulatory setting.1 Future studies should investigate why some clinics were more successful than others.30

In conclusion, we believe that medical educators should feel, and act on, a sense of urgency both to advance the care of older adults seen in training settings and to move QI research forward across multiple training sites. Despite the challenges facing graduate medical education, the ABIM CoVE PIM is a potentially useful tool to help training programs move forward.

Acknowledgments: The authors thank all the participating residency program faculty and institutions, as well as the research assistants, medical record abstractors, coordinators, residents, and fellows from each site; the authors acknowledge their courage and leadership in exploring the quality of care provided to older patients. The authors thank the patients who took part in the study. Lastly, they would like to thank Rebecca S. Lipner, PhD, for reviewing the manuscript and Weifeng Weng, PhD, for helping with the statistical analysis.

Funding/Support: The study was funded by grant no. B06-01 from the Josiah Macy Jr. Foundation and by the American Board of Internal Medicine Foundation (ABIMF). The Josiah Macy Jr. Foundation and the ABIMF had no role in the design or conduct of the study, in the collection or analysis of the data, or in the preparation of the manuscript.

Disclosures: Dr. Holmboe reported that he received royalties for a textbook on Assessment from Mosby-Elsevier. Dr. Holmboe also serves on committees at the Accreditation Council for Graduate Medical Education and the American Board of Medical Specialties. He is a board member of the National Board of Medical Examiners. All authors are employed by the American Board of Internal Medicine.

Ethical approval: All study sites obtained local institutional review board approval before starting the study.

Previous presentations: Portions of this work were previously presented at the Association for Medical Education in Europe (AMEE) Conference, August 29 to September 2, 2009, Málaga, Spain.

References


1. Institute of Medicine, Committee on the Future Health Care Workforce for Older Americans. Retooling for an Aging America: Building the Health Care Workforce. Washington, DC: National Academies Press; 2008
2. Hauer KE, Durning SJ, Kernan WN, et al. Factors associated with medical students’ career choices regarding internal medicine. JAMA. 2008;300:1154–1164
3. Garibaldi RA, Popkave C, Bylsma W. Career plans for trainees in internal medicine residency programs. Acad Med. 2005;80:507–512
4. Wenger NS, Solomon DH, Roth CP, et al. The quality of medical care provided to vulnerable community-dwelling older patients. Ann Intern Med. 2003;139:740–747
5. Lynn LA, Hess BJ, Conforti LN, Lipner RS, Holmboe ES. Clinic systems and the quality of care for older adults in residency clinics and in physician practices. Acad Med. 2009;84:1732–1740
6. Reuben DB, Bachrach PS, McCreath H, et al. Changing the course of geriatrics education: An evaluation of the first cohort of Reynolds geriatrics education programs. Acad Med. 2009;84:619–626
7. Ogrinc G, Headrick LA, Mutha S, Coleman MT, O’Donnell J, Miles PV. A framework for teaching medical students and residents about practice-based learning and improvement, synthesized from a literature review. Acad Med. 2003;78:748–756
8. Windish DM, Reed DA, Boonyasai RT, Chakraborti C, Bass EB. Methodological rigor of quality improvement curricula for physician trainees: A systematic review and recommendations for change. Acad Med. 2009;84:1677–1692
9. Philibert I. Involving Residents in Quality Improvement: Contrasting “Top-down” and “Bottom-up” Approaches. Accessed January 20, 2012
10. Medicare Payment Advisory Commission. Chapter 1: Medical education in the United States: Supporting long-term delivery system reforms. In: June 2009 Report to the Congress: Improving Incentives in the Medicare Program. Accessed January 23, 2012
11. Holmboe ES, Meehan TP, Lynn L, Doyle P, Sherwin T, Duffy FD. Promoting physicians’ self-assessment and quality improvement: The ABIM diabetes practice improvement module. J Contin Educ Health Prof. 2006;26:109–119
12. Simpkins J, Divine G, Wang M, Holmboe E, Pladevall M, Williams LK. Improving asthma care through recertification: A cluster randomized trial. Arch Intern Med. 2007;167:2240–2248
13. Duffy FD, Lynn LA, Didura H, et al. Self-assessment of practice performance: Development of the ABIM Practice Improvement Module (PIM). J Contin Educ Health Prof. 2008;28:38–46
14. Mladenovic J, Shea JA, Duffy FD, Lynn LA, Holmboe ES, Lipner RS. Variation in internal medicine residency clinic practices: Assessing practice environments and quality of care. J Gen Intern Med. 2008;23:914–920
15. Oyler J, Vinci L, Arora V, Johnson J. Teaching internal medicine residents quality improvement techniques using the ABIM’s practice improvement modules. J Gen Intern Med. 2008;23:927–930
16. Bernabeo EC, Conforti LN, Holmboe ES. The impact of a preventive cardiology quality improvement intervention on residents and clinics: A qualitative exploration. Am J Med Qual. 2009;24:99–107
17. Shekelle PG, MacLean CH, Morton SC, Wenger NS. ACOVE quality indicators. Ann Intern Med. 2001;135:653–667
18. Medical Research Council (United Kingdom). A Framework for the Development and Evaluation of RCTs for Complex Interventions to Improve Health. London, UK: Medical Research Council; 2000. Accessed January 11, 2012
19. Campbell M, Fitzpatrick R, Haines A, et al. Framework for design and evaluation of complex interventions to improve health. BMJ. 2000;321:694–696
20. Campbell NC, Murray E, Darbyshire J, et al. Designing and evaluating complex interventions to improve health care. BMJ. 2007;334:455–459
21. Kirkpatrick DL. Evaluating Training Programs: The Four Levels. San Francisco, Calif: Berrett-Koehler Publishers; 1998
22. Langley GJ, Nolan KM, Nolan TW, Norman CL, Provost LP. The Improvement Guide: A Practical Approach to Enhancing Organizational Performance. San Francisco, Calif: Jossey-Bass; 1996
23. Reuben DB, Herr KA, Pacala JT, Pollock BG, Potter JF, Semla TP. Geriatrics at Your Fingertips. 9th ed. New York, NY: American Geriatrics Society; 2007
24. Beck JC; American Geriatrics Society. The Geriatrics Review Syllabus: A Core Curriculum in Geriatric Medicine (GRS6). 6th ed. New York, NY: American Geriatrics Society; 2006
25. Conforti LN, Ross KM, Hess BJ, Lynn LA, Holmboe ES. Length of time needed for institutional review board approval or exemption of quality improvement projects among subset of US training programs. Presented at: Academy for Healthcare Improvement Symposium; December 2008; Nashville, Tennessee. Accessed January 11, 2012
26. Nelson EC, Batalden PB, Godfrey MM. Quality by Design: A Clinical Microsystems Approach. San Francisco, Calif: Jossey-Bass; 2007
27. Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol. 2007;7:34
28. Pawson R, Tilley N. Realistic Evaluation. London, UK: Sage Publications; 1997
29. Leipzig RM, Granville L, Simpson D, Anderson MB, Sauvigné K, Soriano RP. Keeping granny safe on July 1: A consensus on minimum geriatrics competencies for graduating medical students. Acad Med. 2009;84:604–610
30. Pham HH, Bernabeo EC, Chesluk BJ, Holmboe ES. The roles of practice systems and individual effort in quality performance. BMJ Qual Saf. 2011;20:704–710
© 2012 Association of American Medical Colleges