Hess, Brian J. PhD; Weng, Weifeng PhD; Holmboe, Eric S. MD; Lipner, Rebecca S. PhD
Substantial deficiencies in the quality of health care delivered to patients in the United States are well documented.1,2 Effective microsystems must have competent physicians who possess sufficient knowledge and cognitive skills to make the correct diagnosis and who exercise informed decision making.3 The link between what physicians “know” and what physicians “do” (i.e., performance) in clinical practice is critical to measuring quality patient care4 because effective clinical performance requires the integration of medical knowledge with other important competencies. In Miller's pyramidal hierarchy of testing clinical competence and performance,5 testing of knowledge forms the foundation, followed by application of knowledge. Therefore, a secure examination of cognitive skills is an important component of specialty board certification and maintenance of certification (MOC) programs. The examination is an effective and efficient way of testing physicians' knowledge of the breadth of a medical field. Its purpose is to assess physicians' ability to incorporate and synthesize new knowledge to arrive at a correct diagnosis or treatment, and it is used to assure the public that physicians are held accountable for meeting a minimum level of competence in cognitive ability.6
Studies have shown that physicians with stronger cognitive skills, as measured by a secure examination, provide better quality of care.7–10 For practicing physicians, higher MOC examination scores have been associated with higher rates of processes of care (e.g., mammography screening) for Medicare beneficiaries.7 Although the link between physicians' cognitive performance and quality of care assessed through individual process and patient outcome measures has been demonstrated, previous research has not examined this relationship using a composite measure of performance for a specific chronic medical condition (e.g., diabetes) and for a broad range of patients. Composite measures comprising intermediate outcome, process, and patient experience measures that are psychometrically robust can comprehensively represent the overall clinical care that physicians provide to patients with a specific condition,11–13 provided that they include valid measures that are evidence based, relevant to practice, and statistically sound.14 Research examining the relationship between cognitive skills and clinical performance in the context of a specific clinical domain is negligible—for instance, do practicing physicians with greater knowledge and skill in endocrine disease provide better care to diabetic patients? Furthermore, prior research examining the relationship between cognitive performance and quality care has focused on initial certification examination performance; only one study has focused on MOC examination scores,7 and, as a result, others have called for additional studies to better elucidate the value of MOC programs.15 Therefore, the purpose of our study is to examine the relationship between cognitive skills and care of diabetic patients using internal medicine MOC examination scores and diabetes composite scores.
We used the American Board of Internal Medicine (ABIM) internal medicine MOC examination, a secure, computer-based examination comprising 180 patient vignettes that require a single-best-answer response, to measure practicing physicians' cognitive skills (scores reflect fund of medical knowledge, diagnostic acumen, and clinical judgment in general internal medicine). Physicians were expected to integrate information, prioritize alternatives, and/or use clinical judgment to reach an appropriate decision about a course of action in each question. Table 1 displays the medical content domains and percentage of questions in that domain as defined by the blueprint (the expanded blueprint is available at http://www.abim.org/pdf/blueprint/im_moc.pdf).
We used clinical performance data from the ABIM Diabetes Practice Improvement Module (Diabetes PIM), a Web-based self-evaluation tool that guides physicians through collecting data from their own practice, using medical chart reviews, patient surveys, and a practice system survey—all of which form a comprehensive performance assessment.16 We encouraged physicians to abstract 25 patient charts and distribute 25 patient surveys using a retrospective or prospective sequential sample, or a systematic random sample, with a minimum of 10 charts and 10 surveys required. Patients were eligible if they had type 1 or type 2 diabetes, were between 18 and 75 years old, and received care from the physician for at least 12 months (including at least one visit within the past 12 months), with diabetes care management decisions made primarily by that physician. To acknowledge the difficulty of scheduling office visits, we gave a grace period of one to three months, depending on the recommended interval, to periodic measures (e.g., retinal exam).
The development of the diabetes composite measure has been previously described.13 Briefly, measures used in the diabetes composite are shown in Table 2. The intermediate outcome and process measures, originally developed by the National Committee for Quality Assurance in partnership with the American Diabetes Association, use guidelines that are evidence based describing ideal diabetes care.17 We defined performance on intermediate outcome measures as the percentage of a physician's patient panel that met the recommended performance level. For process measures, we calculated the performance as the percentage of a physician's patient panel that received the test/exam or counseling. The two patient experience measures, created using specific patient survey questions (see the footnote under Table 2), were included because they underscore the importance of patient-centered care.
We obtained data from a retrospective cohort of 676 physicians who were certified only in general internal medicine between 1990 and 2002, were enrolled in MOC (2% were enrolled for the second time), and completed the MOC examination and the Diabetes PIM between 2005 and 2009 to satisfy MOC requirements. Certificates have a 10-year time limit, and physicians certified since 1990 must complete MOC every 10 years to maintain certification.
We used physicians' scores from their first attempt on the internal medicine MOC examination. Because physicians took the examination at different times, overall scores were equated and reported on a standardized score scale (mean = 500, SD = 100).18 Equated scores were not available for each medical content domain (e.g., endocrine disease); therefore, percentiles were used to measure physicians' relative ability in each domain. Percentiles are more stable than raw scores because the average ability and distribution of first-time takers (as measured by the overall equated score) were not significantly different across administrations (2005 through 2009). The average reliability (alpha) coefficients across administrations for individual medical content domains ranged from .32 to .65 (Table 1); coefficients for individual domains were generally similar across administrations. Alpha coefficients for overall scores ranged from .89 to .91 across administrations.
To compute physicians' scores on the diabetes composite, each physician's actual performance rate for an individual measure was multiplied by a point value (or weight) assigned to it by an expert panel (Table 2).13 For example, if a physician completed a foot examination for 50% of his or her patients and its weight was four points, then the physician would receive two points for that measure (.50 × 4 = 2). Because physicians typically have direct control of processes of care, if a physician's performance rate did not meet or exceed the threshold for a process measure (Table 2), then the physician earned zero points for that measure. We determined process measure thresholds, as well as points for each measure, through a consensus-based, standard-setting process previously described.19 Points earned for individual measures were summed to yield a total score between 0 and 100 points. Psychometric evidence supports the composite score; the reliability coefficient, estimated using a bootstrap (resampling) method that takes into account the effect of nesting patients within physicians, is high—approximately .92.13 To examine associations with the types of measures that make up the composite, we also computed separate subcomposite scores for the intermediate outcome, process, and patient experience measures by summing the points earned for those measure sets.
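The scoring rule described above can be illustrated with a minimal sketch. The foot examination figures (50% of patients, four points) come from the worked example in the text; the other measure names, weights, and thresholds are hypothetical placeholders, not the values in Table 2, and the function names are ours rather than part of any ABIM software.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Measure:
    name: str
    weight: float                      # points assigned by the expert panel
    threshold: Optional[float] = None  # minimum rate for a process measure; None if no threshold


def performance_rate(flags: Sequence[bool]) -> float:
    """Proportion of a physician's sampled patients who met the measure."""
    return sum(flags) / len(flags)


def measure_points(rate: float, measure: Measure) -> float:
    """Points earned: rate x weight, or zero if the rate falls below the threshold."""
    if measure.threshold is not None and rate < measure.threshold:
        return 0.0
    return rate * measure.weight


def composite_score(rates: Dict[str, float], measures: List[Measure]) -> float:
    """Sum of points earned across all measures in the composite."""
    return sum(measure_points(rates[m.name], m) for m in measures)


# Hypothetical measure set (weights and thresholds are illustrative only).
measures = [
    Measure("foot_exam", weight=4.0, threshold=0.40),
    Measure("eye_exam", weight=6.0, threshold=0.50),
    Measure("ldl_control", weight=10.0),  # intermediate outcome, no threshold
]

rates = {
    "foot_exam": performance_rate([True, False] * 5),  # 5 of 10 patients -> 0.50
    "eye_exam": 0.30,                                  # below threshold -> 0 points
    "ldl_control": 0.50,
}

print(composite_score(rates, measures))  # 2.0 + 0.0 + 5.0 = 7.0
```

In this sketch the subcomposite scores would be obtained the same way, by passing only the intermediate outcome, process, or patient experience measures to `composite_score`.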
We used multiple regression analysis to examine the relationship between overall internal medicine MOC examination scores and the diabetes composite scores, controlling for physician and patient characteristics (at the physician level). We also examined regression models for each individual measure and for the intermediate outcome, process, and patient experience subcomposites separately. Physician characteristics obtained from the ABIM database of MOC enrollees were age, gender, type of practice (solo versus nonsolo), average percent of time practicing in an ambulatory setting, birth country (United States/Canada versus international), and medical school graduation country (United States/Canada versus international). The last two characteristics were combined and dummy coded so that each group was compared with physicians who were U.S./Canadian born and internationally trained (reference group). Research has shown that U.S. citizens who graduated from international medical schools perform worse on quality-of-care measures than physicians who graduated from international medical schools and were not U.S. citizens.20 Patient characteristics were age and gender averaged at the physician level. Interactions between explanatory variables were also tested. Stepwise regression determined variable inclusion in the model, starting with the full model and ending when the minimum Akaike information criterion value was found among the series of models. To ensure that the explanatory variables in each regression model were not highly correlated, we examined bivariate correlations and collinearity diagnostic statistics (i.e., variance inflation factor and tolerance indexes). Next, we used a series of similar models to examine the association between percentile scores from each individual medical content domain and the diabetes composite scores, again controlling for physician and patient characteristics. 
Because individual medical content domain scores were less reliable than overall examination scores, we adjusted the coefficient associated with the medical content domain performance to correct for the attenuation in reliability (i.e., assume perfect reliability).21 In each of these models, we controlled for overall examination performance using pass/fail status instead of overall examination scores because of the high correlation between the domain scores and overall examination scores. Analyses were performed using SPSS, version 12.0 (SPSS Inc., Chicago, Illinois). Ethical approval for our study was given by the Essex Institutional Review Board, Inc.
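To make the idea of correcting for attenuation concrete, the following is a minimal sketch of the classical (Spearman) correction applied to a correlation-type coefficient. It is an illustration under stated assumptions, not the exact procedure used in this study: the text specifies only that the domain coefficient was adjusted as if the domain score were perfectly reliable. The function name and the numeric values are hypothetical, and in a multiple regression the appropriate correction is generally more involved than this single-coefficient form.

```python
import math


def disattenuate(coef: float, reliability_x: float, reliability_y: float = 1.0) -> float:
    """Classical correction for attenuation.

    Divides an observed correlation-type coefficient by the square root of the
    product of the reliabilities. Leaving reliability_y at 1.0 corrects only
    for unreliability in the predictor (here, a content domain score).
    """
    if not (0.0 < reliability_x <= 1.0 and 0.0 < reliability_y <= 1.0):
        raise ValueError("reliabilities must be in the interval (0, 1]")
    return coef / math.sqrt(reliability_x * reliability_y)


# Hypothetical values: an observed coefficient of .12 for a domain score
# whose reliability (alpha) is .36.
corrected = disattenuate(0.12, reliability_x=0.36)
print(round(corrected, 2))  # 0.12 / 0.6 = 0.2
```

Because the correction divides by a quantity no larger than 1, the adjusted coefficient is never smaller in magnitude than the observed one, and the adjustment grows as reliability falls.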
Of the 676 physicians in our sample, 189 (28%) were in solo practice and 451 (67%) were men. Two hundred seventy (40%) were born and graduated from medical school in the United States or Canada, whereas 41 (6%) were born in the United States or Canada but graduated from an international medical school; conversely, 304 (45%) were internationally born and graduated from an international medical school, whereas 61 (9%) were internationally born but graduated from a U.S. or Canadian medical school. Most of physicians' time in clinical practice was spent in an ambulatory setting (mean = 77%, SD = 20%); mean age was 46 (SD = 6 years). This sample was similar to the population of all general internists with time-limited certificates enrolled in MOC except for a slightly higher percentage of solo practitioners. It is not representative of older physicians with time-unlimited certificates not enrolled in MOC.
The mean number of charts abstracted per physician was 21 (SD = 7.1), which yielded 14,095 patient charts. The mean patient age for the charts was 58.9 (SD = 10.7), and 7,188 (51%) patients were male. The mean number of patient surveys per physician was 19.7 (SD = 7.2), which yielded 15,267 patient surveys. The mean patient age for the surveys was 58.9 (SD = 10.7), and 7,634 (50%) were male. Patient age and gender were similar for the two data sources (chart and patient survey).
Physicians performed consistently across the medical content domains (Table 1) and tended to score slightly lower than the average physician who took the examination; the mean equated overall MOC examination score was 487 (SD = 92), and 566 (84%) passed the examination on the first attempt across the administrations used in our study. The mean diabetes composite score was 70.12 (SD = 11.96; range = 18.72–93.24). Table 2 shows that physicians performed somewhat better on the process and patient experience measures compared with intermediate outcomes.
Table 3 presents the results of the multiple regression analysis associating overall MOC examination scores with the diabetes composite scores, controlling for physician and patient characteristics; bivariate correlations and collinearity statistics indicated that the variables in the model were not highly correlated. Overall examination scores were positively associated with the diabetes composite scores (β = .22, P < .001). The β coefficients are standardized and reflect the relative importance of each explanatory variable; overall MOC examination scores and patient age (β = .23, P < .001) contributed the most to the model (e.g., physicians with older patient panels tended to provide better care). Physicians who were internationally born and trained had significantly higher diabetes composite scores than physicians who were born in the United States/Canada but internationally trained (β = .09, P = .03). Physicians had significantly higher diabetes composite scores if they were women, spent more time in ambulatory practice, and treated a higher percentage of male patients. The adjusted R² for the model was 13.0%; interaction terms were not statistically significant, and adding solo practice status and physician age to the model did not substantially improve the goodness of fit, nor did it substantially change the regression coefficients for other variables in the model (results not shown).
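For readers less familiar with standardized coefficients, the sketch below shows how an unstandardized slope is rescaled into a standardized β by multiplying it by the ratio of the predictor's standard deviation to the outcome's standard deviation. It uses a single predictor and made-up data purely for illustration; the coefficients in Table 3 come from a multiple regression fitted in SPSS, which this example does not reproduce.

```python
import statistics
from typing import Sequence


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Unstandardized least-squares slope of y on a single predictor x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def standardized_beta(b: float, x: Sequence[float], y: Sequence[float]) -> float:
    """Rescale a slope to standard deviation units: beta = b * SD(x) / SD(y)."""
    return b * statistics.stdev(x) / statistics.stdev(y)


# Hypothetical data: examination scores and diabetes composite scores
# for five physicians.
exam_scores = [400, 450, 500, 550, 600]
composite_scores = [62, 70, 66, 74, 78]

b = ols_slope(exam_scores, composite_scores)
beta = standardized_beta(b, exam_scores, composite_scores)
print(round(b, 3), round(beta, 2))  # 0.072 0.9
```

A standardized β of .22 can therefore be read as follows: a physician scoring one standard deviation higher on the examination would be expected to score about .22 standard deviations higher on the composite, holding the other variables in the model constant.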
Results of follow-up multiple regression analyses associating MOC examination scores with performance on each individual measure and on the intermediate outcome, process, and patient experience subcomposites are presented in the Appendix. Associations were stronger with the intermediate outcome subcomposite (β = .23, P < .001), specifically for poor LDL control (β = .23, P < .001) and superior LDL control (β = .25, P < .001), compared with the association observed with the process subcomposite (β = .13, P = .01). Association with the patient experience subcomposite was the weakest (β = .08, P = .04).
Table 4 presents the unadjusted and adjusted standardized coefficients when associating each medical content domain with the overall diabetes composite scores, controlling for MOC examination outcome (fail versus pass) and physician and patient characteristics in each separate regression model. Based on the adjusted β coefficients, the endocrine disease content domain percentiles were significantly (but modestly) associated with the diabetes composite scores (β = .19, P < .001). This relationship was stronger than the relationship exhibited by each of the other medical content domains (β = .06–.14). Finally, the endocrine disease content domain percentiles were also more strongly associated with the intermediate outcome subcomposite (β = .20, P < .001) than with the process measure subcomposite (β = .11, P = .01) and patient experience subcomposite (β = .06, P = .11).
Our findings demonstrate that physicians' cognitive skills, as measured by internal medicine MOC examination scores, are related to a comprehensive measure of diabetes care based on real practice data. We also found that physicians' performance on the endocrine disease domain questions was more strongly associated with better diabetes care compared with other general internal medicine content domains. When we examined the specific types of measures that make up the composite, overall examination scores and particularly the endocrine disease domain scores yielded stronger relationships with the intermediate outcomes than with process and patient experience measures. Our study is consistent with previous findings from Holmboe and colleagues,7 which showed a positive relationship between ABIM MOC examination scores and quality of care on a set of process measures for Medicare beneficiaries. However, our study extends the understanding of this relationship by including outcome and patient experience measures for a specific chronic condition from a broader sample of patients to create a more comprehensive composite performance measure.
Although the associations that we observed were statistically significant, they were modest, and not surprising given the complexity of clinical practice. This complexity is evidenced by the description of the six competencies that are necessary for providing high-quality patient care.22 That is, other competencies affect clinical performance, such as interprofessional communication skills, practice-based learning and improvement, and systems-based practice. For example, physician empathy, an element of good communication skills, is associated with positive clinical outcomes for diabetic patients.23 Future research should include additional measures of these competencies as well as other personal characteristics (e.g., physicians' participatory decision making) to understand the relative contribution of cognitive skills for explaining performance on quality measures.
We do, however, offer some explanations for the positive relationships observed. Physicians' cognitive skills are an essential element in making an accurate diagnosis and in executing informed decision making,3 and therefore these skills are foundational and necessary but not sufficient in providing high-quality patient care. One expects that physicians with stronger cognitive skills may be more effective at “doing the correct thing” by demonstrating a greater likelihood that appropriate care processes are performed, better intermediate outcomes are achieved, and patient self-care programs are supported. When we examined specific measures in the composite, we found stronger associations between examination scores and intermediate outcomes, specifically LDL control. One explanation is that because process measures are widely accepted and can be implemented by nonphysician staff in a well-functioning office practice, it is logical that a weaker association with process measures would be seen. On the other hand, helping patients control outcomes such as their LDL is more complicated and might require stronger cognitive skills. For example, lowering LDL does require a degree of physician decision making and recognition. Many decisions are packed into a short patient visit, and thus the physician may be more likely to either miss or delay starting therapies if the physician uses a “physician-centric” approach to care (i.e., does not engage other providers within the clinic to help make decisions and start needed therapies). Most process measures do not require much physician decision making; they can be automated in effective practices. Conversely, many physician offices still lack effective systems to ensure processes of care, and these systems may not be directly under the physician's control (e.g., although physicians might know the “correct thing,” their office staff may not know it or may not comply). 
Furthermore, examination scores yielded weaker associations with the patient experience measures, which is expected because other foundational competencies such as communication and interpersonal skills would more likely exhibit stronger relationships with patient experience measures.
The modest associations between the cognitive examination and the performance composites suggest that there are unique aspects of physician competence captured by each assessment alone and that both must be considered when assessing a physician's ability to provide high-quality care. For example, our diabetes composite is condition-specific and does not measure general diagnostic skill. Thus, if the diagnosis is wrong, it does not matter that “appropriate” processes of care are being performed or that the patient is being educated to take care of a condition he or she might not have. The cognitive examination, on the other hand, does assess diagnostic skill across a broad spectrum of conditions. Moreover, the ABIM MOC examination and the Diabetes PIM are part of the ABIM's professional self-regulatory requirements to demonstrate to the public physicians' commitment to maintaining competence in medical knowledge and patient care, respectively; together, these capture different levels of Miller's5 competency hierarchy by providing an assessment of what physicians “know” and “do” in practice.
The role of physician and patient characteristics in explaining variation in the diabetes composite scores was also consistent with our expectations. Physicians caring for older diabetic patients demonstrated a higher quality of care, perhaps because older patients may be seen more often or are more likely to comply with physician recommendations.24 Physicians who practiced largely in an ambulatory setting also demonstrated higher-quality care, possibly because they have a systematic approach to managing diabetic patients,25 whereas those practicing largely in an inpatient setting typically spend less time managing diabetic patients. Female internists also tended to demonstrate higher-quality care. This finding is somewhat inconsistent with a recent study26 showing that male and female physicians tend to provide similar-quality diabetes care (however, those findings showed that women performed somewhat better on A1C and LDL measures). The inconsistent findings may be due to the use of individual measures in the latter study, which are less reliable than composite measures. Finally, consistent with another study,20 we found that physicians who were born in the United States/Canada but who had graduated from international medical schools provided poorer diabetes care than physicians who were internationally born and trained.
Our study has limitations. First, the associations between medical content domain performance and the diabetes composites should be interpreted with care because the observed reliability of each medical content domain score was modest and because correcting for attenuation provides only estimates of effects. Second, we controlled for differences in patient age and gender at the physician level (and did not adjust for patient case mix at the patient level) because risk adjustment is less relevant for process measures and because the A1C-at-goal measure by definition accounts for some differences in patients. Kaplan and colleagues11 have shown that adjusting for patient case mix for a similar set of diabetes measures did not impact how physicians were ranked. Third, physicians selected the Diabetes PIM to earn credit toward satisfying the MOC practice performance requirement, thus limiting the generalizability of our findings to other physicians. Fourth, our sample scored slightly below average on the MOC examination, and therefore may not generalize to physicians in MOC with average cognitive skills. Fifth, some physicians may not have adhered to the PIM sampling instructions, and therefore their performance on individual measures might be inflated. However, because there was no consequence for performing poorly on the Diabetes PIM, there is no reason to believe that physicians “cherry-picked” patients; previous work has also confirmed the accuracy of physician-reported data in the Diabetes PIM.16
Our findings deepen the understanding of physicians' cognitive skills and their relationship to condition-specific quality of care by using a psychometrically robust composite measure of diabetes care composed of evidence-based clinical and patient experience measures instead of less reliable, individual clinical measures. The one previous study known to us that found an association between MOC examination scores and quality of care7 used only relatively few process measures obtained from Medicare claims. We thus extend our understanding of this relationship by using medical record data to capture performance on outcomes and processes from a broader sample of patients. Our findings suggest that cognitive skills are an important foundational competency to facilitate other patient-care activities, even when measuring the quality of care comprehensively through composites. This should appeal to both the policy and professional communities that have been asking for more evidence on the validity of cognitive testing as part of MOC.15 More research is needed to examine the relationship between cognitive skills and other specific clinical abilities, such as accuracy of diagnosis and treatment decisions. Notwithstanding, it should be reassuring to patients that clinical knowledge, an essential competency for clinical practice, matters.
1. Institute of Medicine. Improving the Quality of Health Care for Mental and Substance-Use Conditions. Washington, DC: National Academy Press; 2006.
2. McGlynn EA, Asch SM, Adams J, et al. The quality of health care delivered to adults in the United States. N Engl J Med. 2003;348:2635–2645.
3. Gruppen LD, Frohna AZ. Clinical reasoning. In: Norman GR, van der Vleuten CP, Newble DI, eds. International Handbook of Research in Medical Education. Dordrecht, Netherlands: Kluwer Academic; 2002:205–230.
4. Holmboe ES, Lipner RS, Greiner A. Assessing quality of care: Knowledge matters. JAMA. 2008;299:338–340.
5. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9 suppl):S63–S67.
6. Brennan TA, Horwitz RI, Duffy FD, Cassel CK, Goode LD, Lipner RS. The role of physician specialty board certification status in the quality movement. JAMA. 2004;292:1038–1043.
7. Holmboe ES, Wang Y, Meehan TP, et al. Association between maintenance of certification examination scores and quality of care for Medicare beneficiaries. Arch Intern Med. 2008;168:1396–1403.
8. Norcini JJ, Lipner RS, Kimball HR. Certifying examination performance and patient outcomes following acute myocardial infarction. Med Educ. 2002;36:853–859.
9. Tamblyn R, Abrahamowicz M, Dauphinee WD, et al. Association between licensure examination scores and practice in primary care. JAMA. 2002;288:3019–3026.
10. Wenghofer E, Klass D, Abrahamowicz M, et al. Doctor scores on national qualifying examinations predict quality of care in future practice. Med Educ. 2009;43:1166–1173.
11. Kaplan SH, Griffith JL, Price LL, Pawlson LG, Greenfield S. Improving the reliability of physician performance assessment: Identifying the “physician effect” on quality and creating composite measures. Med Care. 2009;47:378–387.
12. Lipner RS, Weng W, Arnold GK, Duffy FD, Lynn LA, Holmboe ES. A three-part model for measuring diabetes care in physician practice. Acad Med. 2007;82(10 suppl):S48–S52.
13. Weng W, Hess BJ, Lynn LA, Holmboe ES, Lipner RS. Measuring physicians' performance in clinical practice: Reliability, classification accuracy, and validity. Eval Health Prof. 2010;33:302–320.
14. Landon BE, Normand SL, Blumenthal D, Daley J. Physician clinical performance assessment: Prospects and barriers. JAMA. 2003;290:1183–1189.
15. Goldman L, Goroll AH, Kessler B. American Board of Internal Medicine maintenance of certification program. N Engl J Med. 2010;362:948–952.
16. Holmboe ES, Meehan TP, Lynn L, Doyle P, Sherwin T, Duffy FD. Promoting physicians' self-assessment and quality improvement: The ABIM Diabetes Practice Improvement Module. J Contin Educ Health Prof. 2006;26:109–119.
17. American Diabetes Association. Standards of medical care in diabetes—2009. Diabetes Care. 2009;32(suppl 1):S13–S61.
18. Holland PW, Dorans NJ. Linking and equating. In: Brennan RL, ed. Educational Measurement. 4th ed. Westport, Conn: Praeger Publishers; 2006:187–220.
19. Hess BJ, Weng W, Lynn LA, Holmboe ES, Lipner RS. Setting a fair performance standard for physicians' quality of patient care. J Gen Intern Med. 2011;26:467–473.
20. Norcini JJ, Boulet JR, Dauphinee WD, Opalek A, Krantz ID, Anderson ST. Evaluating the quality of care provided by graduates of international medical schools. Health Aff (Millwood). 2010;29:1461–1468.
21. Bohrnstedt GW. Measurement. In: Rossi PH, Wright JD, Anderson AB, eds. Handbook of Survey Research. New York, NY: Academic Press; 1983:69–121.
23. Hojat M, Louis DZ, Markham FW, Wender R, Rabinowitz C, Gonnella JS. Physicians' empathy and clinical outcomes for diabetic patients. Acad Med. 2011;86:359–364.
24. Holmboe ES, Wang Y, Tate JP, Meehan TP. The effects of patient volume on the quality of diabetic care for Medicare beneficiaries. Med Care. 2006;44:1073–1077.
25. Kitahata MM, Koepsell TD, Deyo RA, Maxwell CL, Dodge WT, Wagner EH. Physicians' experience with the acquired immunodeficiency syndrome as a factor in patients' survival. N Engl J Med. 1996;334:701–706.
26. Kim C, McEwen LN, Gerzoff RB, et al. Is physician gender associated with the quality of diabetes care? Diabetes Care. 2005;28:1594–1598.
All authors are employed by the ABIM. Drs. Hess, Weng, and Lipner are coinventors of a business method invention describing the application of a standard-setting method to practicing physicians. The invention is patent pending. Dr. Holmboe received honoraria for teaching about clinical assessment from the Uniformed Services University of the Health Sciences, the University of Kansas, and the Harvard–Macy Systems Assessment Course. Dr. Holmboe receives royalties for a textbook on assessment published by Mosby-Elsevier.
Essex Institutional Review Board, Inc., approved this study.
This study was presented at the annual meeting of the American Educational Research Association, May 3, 2010, Denver, Colorado.