Purpose: To appraise the reported validity and reliability of evaluation methods used in high-quality trials of continuing medical education (CME).
Method: The authors conducted a systematic review (1981 to February 2006) by hand-searching key journals and searching electronic databases. Eligible articles studied CME effectiveness using randomized controlled trials or historic/concurrent comparison designs, were conducted in the United States or Canada, were written in English, and involved at least 15 physicians. Sequential double review was conducted for data abstraction, using a traditional approach to validity and reliability.
Results: Of 136 eligible articles, 47 (34.6%) reported the validity or reliability of at least one evaluation method, for a total of 62 methods; 31 methods were drawn from previous sources. The most common targeted outcome was practice behavior (21 methods). Validity was reported for 31 evaluation methods, including content (16), concurrent criterion (8), predictive criterion (1), and construct (5) validity. Reliability was reported for 44 evaluation methods, including internal consistency (20), interrater (16), intrarater (2), equivalence (4), and test–retest (5) reliability. When reported, statistical tests yielded modest evidence of validity and reliability. Translated to the contemporary classification approach, our data indicate that reporting about internal structure validity exceeded reporting about other categories of validity evidence.
Conclusions: The evidence for CME effectiveness is limited by weaknesses in the reported validity and reliability of evaluation methods. Educators should devote more attention to the development and reporting of high-quality CME evaluation methods and to emerging guidelines for establishing the validity of CME evaluation methods.
Dr. Ratanawongsa is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Thomas is associate professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Marinopoulos is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Dorman is associate professor, Department of Anesthesiology and Critical Care Medicine, and associate dean and director, Continuing Medical Education, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Ms. Wilson is senior project coordinator, Department of Medicine, Johns Hopkins University School of Medicine, and senior project coordinator, Evidence-Based Practice Center, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland.
Dr. Ashar is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Magaziner is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Miller is associate professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Prokopowicz is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Qayyum is assistant professor, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland.
Dr. Bass is professor, Department of Medicine, Johns Hopkins University School of Medicine, and director, Evidence-Based Practice Center, Johns Hopkins University Bloomberg School of Public Health, Baltimore, Maryland.
Correspondence should be addressed to Dr. Ratanawongsa, Johns Hopkins University School of Medicine, Johns Hopkins Bayview Medical Center, 5200 Eastern Avenue, Suite 2300, Baltimore, MD 21224; telephone: (410) 550-1862; fax: (410) 550-3403; e-mail: (firstname.lastname@example.org).
The goals of continuing medical education (CME) are to “maintain, develop, or increase the knowledge, skills, and professional performance and relationships a physician uses to provide services for patients, the public, or the profession.”1 According to the American Medical Association, 60 medical boards require 12 to 50 hours of CME per year for continued physician licensure.2 Educators must identify the most effective CME tools and techniques to improve the delivery of CME and reduce the gap between evidence and clinical practice.3 This becomes more important as CME media and techniques are used increasingly for quality-improvement initiatives.
Medical educators use a variety of evaluation methods to determine whether learners have achieved specific learning objectives. Such methods may include knowledge tests, attitudinal instruments, checklists of observable skills, chart audits of physician behaviors, or measures of patient health outcomes. Researchers should consider the validity and reliability of evaluation methods when investigating the effectiveness of CME. The validity of the evaluation method is “the degree to which evidence and theory support the interpretations of test scores entailed by the proposed uses of tests.”4 With strong evidence for validity, researchers can more confidently use an evaluation method to measure learners' achievement of the stated objective of an educational intervention, broadly categorized as knowledge, attitudes, skills, practice behaviors, or clinical outcomes.
The reliability of the evaluation method is “the consistency or reproducibility of measurements.”5 As one measurement expert emphasizes, “small amounts of unreliability may cause misclassification errors and large score differences on retesting.”6 In addition, an unreliable method leads to attenuation of the relationship between an intervention and its outcomes, reducing the statistical power of a study. Establishing reliability is necessary—but not sufficient—for demonstrating the validity of an assessment method, and reliability is considered a component of validity in the educational community.7
Thus, evaluation methods must have strong evidence for validity and reliability to establish the effectiveness of CME. As part of a systematic review of CME, we investigated the reported validity and reliability of the methods that have been used for measuring the effects of CME. Under contract with the Agency for Healthcare Research and Quality (AHRQ), and in response to a nomination by the American College of Chest Physicians (ACCP), the Johns Hopkins Evidence-Based Practice Center established a team to develop the evidence report in consultation with an external technical expert panel. Detailed findings are available in the complete technical report.8
This article focuses on one of the questions addressed in that project: what is the reported validity and reliability of the methods used to measure how CME imparts knowledge, changes attitudes, imparts skills, changes practice behavior, or changes clinical practice outcomes?
Technical expert panel
In addition to representatives of AHRQ and ACCP, a panel of 12 external technical experts with strong expertise in CME provided input on all steps of the review process. This panel included leaders of CME activities and research at the Accreditation Council for Continuing Medical Education, the National Board of Medical Examiners, the American Medical Association, the American Academy of Family Physicians, the Maryland State Medical Board, and academic institutions in the United States and Canada.
We used a systematic approach for searching the literature, using specific eligibility criteria, electronic searching, and hand searching to minimize the risk of bias in selecting articles. In February 2006, we searched for primary literature on the effectiveness of CME using the following databases: MEDLINE, EMBASE, The Cochrane Central Register of Controlled Trials, PsycINFO, and the Educational Resource Information Center. From our electronic search, we identified the 13 journals most likely to publish articles on this topic and scanned the table of contents of each issue of these journals for relevant citations from February 2005 through February 2006. Reviewers and the technical expert panel flagged references of interest for comparison with database search results. Please see the technical report for complete details on the search strategy and terms.8
We included trials of CME effectiveness that met the following eligibility criteria: (1) a randomized controlled design or a quasi-experimental design with a historic or concurrent comparison group, (2) conducted in the United States or Canada, (3) written in English, and (4) published between 1981 and February 2006. In conjunction with our technical expert panel, we decided to include only trials from the United States or Canada to limit heterogeneity attributable to differences in CME systems in other countries, including requirements, incentives, and techniques.
We excluded trials that did not evaluate an educational activity, did not include at least 15 fully trained physicians, or did not analyze outcomes separately for fully trained physicians if they comprised less than half of the study population. We did include quality-improvement studies if they included some type of physician education.
Two independent reviewers (N.R., P.T., S.M., T.D., L.W., B.A., J.M., R.M., and G.P.) conducted parallel title and abstract reviews. Two investigators (N.R., P.T., S.M., T.D., L.W., B.A., J.M., R.M., and G.P.) also independently reviewed the text of articles to determine whether they should be included before data abstraction. Reviewers (N.R., P.T., S.M., T.D., L.W., B.A., J.M., R.M., and G.P.) resolved differences regarding eligibility through consensus adjudication.
For the question addressed in this article, we included articles that reported the validity or reliability of at least one evaluation method. Evaluation methods included any tests, scales, instruments, checklists, chart audits, patient surveys, or other tools used by authors to assess outcomes of CME activities. Reviewers searched articles for the terms valid, validity, reliable, or reliability; for descriptions matching the definitions in Table 1; or for statistical results related to measuring validity or reliability. If the description did not include any of the above, we did not abstract information about the evaluation method.
The abstraction form contained the following information about each evaluation method: (1) its description and intended purpose, (2) whether it was newly created and/or pilot tested, (3) whether it was drawn or adapted from other sources, (4) the type of validity or reliability reported, and (5) statistical or psychometric testing within the current or another study population.
We organized these results in two ways: by type of educational outcome and type of validity/reliability. Educational outcomes categories were (1) knowledge or cognitive skills (learning medical information and/or applying information to problem solving or clinical decision making outside the practice setting), (2) attitudes (values, beliefs, or emotions related to medical performance), (3) skills (psychomotor or procedural tasks related to medical communication or physical examination), (4) practice behaviors (incorporation of the above into actions in clinical practice settings), or (5) clinical outcomes (indicators of the health or satisfaction of physicians' patients).9
To classify validity, we used a “traditional” approach based on content, criterion, and construct validity,10,11 which has been used in recent systematic reviews and most measurement textbooks.10,12,13 To maximize content validity of the form, we adapted definitions for validity and reliability (Table 1) from a textbook on curriculum development in medical education,9 a recent review about educational interventions,5 and a textbook on scale development.14 On the basis of piloting, we designed our abstraction form to capture the information about validity and reliability as described in the eligible articles, which dated back to 1981.
To facilitate interpretation of our findings, we translated our results from the traditional approach to concepts in a contemporary approach to validity (Table 1).4,7,15,16 Educators using a contemporary approach regard validity as a unitary concept, rather than separating content validity, criterion validity, construct validity, and reliability.7 This contemporary approach regards validity not as inherent to an evaluation method but, rather, as specific to the attainment of a particular objective in specific learners.
Evidence sources included those related to content, response process, internal structure, relations to other variables, and consequences. In our review, we abstracted data for face and content validity together, because many authors reported the information in that manner. However, a contemporary approach to validity does not lend credence to face validity, traditionally described as an appearance of validity to general audiences.7,15 Content validity in the contemporary approach requires evidence that the targeted construct is represented in its entirety and for its intended purpose.7 Response process relates to the minimization of error in the data collection process, such as through quality control.15 Internal structure relates to internal consistency reliability; statistical calculations such as Cronbach alpha provide necessary but not sufficient evidence for validity.7 Criterion validity and known-group validity would be categorized as relationships to other variables, that is, the theoretical relationships between the outcome of interest and other variables.15 Finally, the contemporary approach requires reporting about consequences: the implications of results to "examinees, faculty, patients and society."15
For data abstraction, we used a sequential review process where the first reviewer (N.R., P.T., S.M., T.D., L.W., B.A., J.M., R.M., and G.P.) completed the review, with a second reviewer (N.R., P.T., S.M., T.D., L.W., B.A., J.M., R.M., and G.P.) checking each form for completeness and accuracy. Reviewers were not masked to the articles' authors, institution, or journal. Differences in data abstraction were resolved through consensus adjudication. The percent agreement for five questions with binary or categorical response options ranged from 86% to 97%, with Kappa values of 0.723 to 0.926.
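The interrater agreement statistics reported above (percent agreement and kappa) can be illustrated with a minimal sketch of Cohen's kappa; the ratings and values below are hypothetical examples, not our actual abstraction data.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal proportions
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical binary judgments by two reviewers of six evaluation methods
a = ["valid", "valid", "not", "valid", "not", "valid"]
b = ["valid", "not", "not", "valid", "not", "valid"]
percent_agreement = sum(x == y for x, y in zip(a, b)) / len(a)  # 5/6, about 0.83
kappa = cohens_kappa(a, b)  # about 0.67
```

Kappa discounts the agreement two raters would reach by guessing from their marginal rates, which is why it runs lower than raw percent agreement.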
The results of the review process are detailed in Figure 1. A total of 136 articles were included after title, abstract, and full article review. Overall, these studies were heterogeneous in their study populations, learning goals, educational techniques, and evaluation methods. Authors rarely delineated specific learning objectives, precluding evaluation of how their chosen evaluation methods related to these objectives.8
Authors reported the validity and/or reliability of at least one evaluation method in 47 out of 136 total articles (34.6%) (Figure 1). Among these 47 articles, 11 reported on the validity and/or reliability of more than one method: eight studies described two methods,17–24 two studies described three methods,25,26 and one study described four methods.27 Thus, a total of 62 evaluation methods were accompanied by validity or reliability data. For the results below, percentages are based on the total number of methods, because some articles reported on multiple methods.
Use of newly created and previously developed evaluation methods
Among these 62 evaluation methods, researchers cited previous sources for 31 (50%) methods, whereas 28 (45.2%) were created for the current studies. For three methods (4.8%), authors did not clearly report whether the methods were newly created or previously used. For 23 of the 31 previously used methods, the authors reported that reliability had been assessed: 13 within the current study population, 9 within previous study populations, and 1 within current and previous study populations. However, authors presented specific statistical data about reliability for only 15 methods. For 12 of the 28 newly created methods, the authors reported performing pilot and/or cognitive testing.
Evaluation methods by type of educational outcome
Knowledge or cognitive skills were evaluated by 15 methods (24.2%). Attitudes were evaluated by seven methods (11.3%). Two methods focused exclusively on attitudes. Five methods evaluated a combination of attitudes and knowledge/cognitive skills.
Skills were evaluated by 11 methods (17.7%). One method evaluated physical exam skills in an educational setting. The remaining 10 methods were implemented to measure a combination of skills (psychomotor or procedural) and practice behaviors, using standardized patients to visit physicians at their practice setting or analyzing interactions with real patients.
Practice behaviors (without clinical outcomes) were evaluated by 21 methods (33.9%). Specific data-collection methods varied. Eight methods used physician self-report about their clinical practices. Three methods used patients' reports of physician practice behaviors, including a mailed survey about clinicians' communication behaviors,23 an in-person interview about providers' screening behaviors with adolescents,28 and a telephone interview about physician prescribing related to depression guidelines.29 Ten methods used chart reviews of medical records or claims data.
Clinical outcomes (with or without practice behaviors) were evaluated by eight methods (12.9%). Two studies performed chart reviews: one abstracted information regarding hypertension clinical outcomes,30 and one reviewed health care maintenance related to the provision of influenza vaccinations, screening mammograms, and screening breast exams.22 One study surveyed patients and families about their attitudes toward cancer pain management.25 Two methods solicited patients' reports of their own behavior—including medication adherence25 and participation in preventive screening22—as outcomes. Finally, three studies used measures of the patient's health, including the General Health Questionnaire to detect psychiatric distress,26 the Brief Pain Inventory to measure cancer pain,25 and one questionnaire that combined two physical activity instruments.31
Examples of evaluation methods for each type of educational outcome are offered in Table 2. For complete descriptions of the evaluation methods in each study, please see Evidence Table 16 in the referenced technical report.8
Evaluation methods, organized by traditional classification of validity and reliability
Table 2 presents the 62 evaluation methods, organized by the type of validity or reliability under the traditional classification system. Of the 62 evaluation methods, 16 (25.8%) included descriptions of validity alone, 30 (48.4%) included descriptions of reliability alone, and 10 (16.1%) had descriptions of both validity and reliability. Authors described six methods (9.7%) as valid and/or reliable, without giving further detail.
Validity was reported for 31 (50%) of the 62 evaluation methods. Content validity was reported for 16 methods. For example, Fordis et al17 developed a knowledge test for cholesterol management, including content validation by experts with piloting and item number reduction. The specific “experts” who reviewed the assessment were reported for 11 of the remaining 15 methods. Concurrent criterion validity was reported for eight methods; for example, some researchers compared their results using provider surveys with data abstraction using chart reviews. Predictive criterion validity was reported for only one method, based on comparison between physicians' and patients' reports of asthma management behaviors.56
Construct validity was reported for five methods, commonly through "known-group validity." Gerstein et al32 argued the validity of a diabetes questionnaire by demonstrating that physicians with more advanced endocrinology training scored higher than those early in their training. Strong statistical evidence of validity was demonstrated for only two methods.22,23 Five methods were described as valid but were not otherwise specified. Thus, the vast majority of CME studies offered no or limited psychometric data for the validity of their evaluation methods.
Reliability was reported for 44 of 62 evaluation methods (71%). Authors reported internal consistency reliability for 20 methods: 14 learner instruments, 1 observer instrument for audiotaped interactions, 1 standardized patient instrument, and 4 patient instruments. Roter et al26 used the General Health Questionnaire–28 to detect emotional distress in patients and found a high internal consistency (Cronbach alpha = 0.90–0.92) within their study population.
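Internal consistency coefficients such as the Cronbach alpha values cited above can be computed from raw item scores; the sketch below uses hypothetical data, not scores from the reviewed instruments.

```python
def cronbach_alpha(items):
    """Cronbach alpha for k items answered by the same n respondents.

    items: list of k lists, each holding one item's scores across respondents.
    """
    k = len(items)
    n = len(items[0])

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Sum of the individual item variances
    sum_item_var = sum(sample_var(item) for item in items)
    # Variance of each respondent's total score across all items
    totals = [sum(item[i] for item in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / sample_var(totals))

# Two hypothetical items that track each other closely yield a high alpha
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 2, 3, 4]])  # about 0.95
```

When items covary strongly, the variance of total scores exceeds the sum of item variances, pushing alpha toward 1; unrelated items push it toward 0.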
Interrater reliability was reported for 16 methods, specifically, nine medical data abstractions, six skills assessments, and one cognitive test. Authors assessed intrarater reliability for two medical data abstraction studies. Equivalence reliability was reported for four methods, and test–retest reliability for five methods. For example, Socolar et al33 conducted test–retest reliability on a chart audit of quality of physician documentation in evaluation for sexual abuse. Authors described four methods as reliable without specific details. When reported, statistical tests yielded primarily modest evidence of reliability based on Cronbach alpha, kappa, or correlation statistics.
Evaluation methods, organized using a contemporary approach to validity
No authors in our review reported face validity alone. Only 11 of 15 methods contained specific descriptions of the experts consulted in determining whether the content of their evaluation methods was comprehensive. The level of detail about the steps for ensuring content validity was variable (e.g., specific descriptions of pilot testing or of the changes that resulted). Our abstracted data for intrarater, interrater, test–retest, and equivalence reliability from the traditional classification system provide some support for response process validity for some evaluation methods. Twenty methods included information about internal structure. Our data for criterion validity and known-group validity would be categorized as evidence for relationships with other variables. Finally, we did not find information about consequences reported in the results of the eligible articles in our review.
Our review found that the validity or reliability of CME evaluation methods was reported for only one third of studies about the effectiveness of CME, most commonly content validity and interrater reliability. Internal consistency—a measure of both reliability and potential construct validity—was reported for 20 methods. Few articles offered strong evidence for construct or criterion validity or other types of reliability. Thus, the overall strength of evidence for the effectiveness of CME is limited by the lack of reported evidence of the validity and reliability for the evaluation methods as applied in these studies.
Our findings are consistent with systematic reviews revealing poor validity evidence for evaluation methods in medical education. In a review of physician self-assessment, Davis et al34 found that 9 of 17 studies used pretested or prevalidated measures. In a review of instruments to measure medical professionalism, Veloski et al12 found that only 11% of studies provided strong evidence for validity, with content validity and internal consistency reliability topping their frequency counts for types of reported evidence. Shaneyfelt et al16 found that 10% of evidence-based practice evaluation instruments had multiple types of validity evidence, most commonly response process. Boon and Stewart35 found limited evidence for the validity of many patient–physician communication assessment instruments in the educational literature.
Educators should provide strong evidence for the validity and reliability of an evaluation method for the achievement of that educational objective by the targeted learners. If an evaluation method does not accurately and specifically measure the targeted outcome, the postintervention measurements may be more influenced by confounding variables than by the intervention itself. Similarly, reliability allows educators to detect true differences between intervention and control groups or changes between pre- and postintervention scores. Otherwise, measurement error may lead to a lower observed statistical association between the educational intervention and the targeted outcome. Although the researcher may use statistical methods to correct for this “attenuation effect,” more reliable evaluation methods would improve the power of educational studies in demonstrating effectiveness.36
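The statistical correction alluded to above is commonly Spearman's correction for attenuation, which estimates the true-score correlation from the observed correlation and the reliability of each measure. The numbers below are hypothetical, not drawn from the reviewed studies.

```python
import math

def correct_for_attenuation(r_observed, reliability_x, reliability_y):
    """Spearman's correction for attenuation: estimate the true-score
    correlation between two variables, given the observed correlation and
    each measure's reliability coefficient."""
    return r_observed / math.sqrt(reliability_x * reliability_y)

# Hypothetical example: an observed intervention-outcome correlation of 0.30,
# measured with instruments of reliability 0.70 and 0.80, implies an
# estimated true correlation of about 0.40.
r_true = correct_for_attenuation(0.30, 0.70, 0.80)
```

The same arithmetic run forward shows the attenuation itself: unreliable measures shrink a true correlation of 0.40 to an observed 0.30, which is why more reliable instruments improve statistical power.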
Our systematic review revealed inadequate reporting about previously developed evaluation methods. Most articles did not reveal how the researchers determined the validity of the method for their targeted outcome or whether they changed the content, response process, or administration of the method in ways that could affect its validity or reliability. In addition, few researchers repeated the reliability testing for the method in their current study populations. Reliability is population-specific because it is affected by both the variance in true scores within a population as well as the variance in measured scores. Thus, a method previously demonstrated as reliable may perform poorly when applied in a new study population, and rigorous research would require repeating the reliability calculations.
Our conclusions are tempered by the limitations of this review. First, the heterogeneous nature of the studies inhibits a systematic comparison across evaluation methods, targeting different learners using multiple techniques across diverse content areas. Second, the quality of methods reporting was variable, often lacking sufficient details to assess whether evaluation methods were congruent with learning objectives. Third, our search strategy may be subject to publication bias, because it was limited to published English-language articles about educational studies within the United States and Canada. Fourth, the systematic review was limited to studies using a comparison group. CME interventions using only a pre/post design may be more or less likely to report the validity and/or reliability of their evaluation methods. Fifth, authors may have reported inconsistently the evidence for the validity and reliability of evaluation methods drawn from previous studies. Because our focus was on the “reported” validity or reliability, our findings may reflect deficiencies in reporting, rather than the quality of the evaluation methods themselves.
Finally, our abstraction form was based on a traditional classification system for the types of validity, designed to capture the information about validity and reliability most likely to be reported in the CME literature during the years of our search; that literature rarely used the terminology of the classification systems now recommended by many educators as standards for demonstrating validity and reliability.4,15,16 Although other recent reviews have used the traditional classification system,12,13 our approach was not designed to capture some relevant information, such as evidence about response process or consequences. However, given the poor quality of reporting about validity evidence, such evidence was rarely reported in these articles.
Our study findings suggest future directions for CME research. As CME is increasingly applied to quality-improvement initiatives, it becomes even more critical that resources be allocated toward CME techniques that are demonstrated to be effective. We recommend the following for improving the quality of future CME studies:
* Curricula designers should delineate and report their specific measurable educational objectives, which should be congruent with their specified outcomes. This is crucial to linking the objectives to the educational methods and to the evaluation methods.9
* When feasible, researchers should target clinical outcomes, which represent a higher order of objectives above knowledge, attitudes, skills, or behaviors.9
* Educational researchers should consider the theory related to their targeted outcome and the factors associated with it. This will enable them to define the outcome and operationalize it, a crucial step for establishing construct validity.
* Educational researchers should more rigorously assess content validity for new instruments. Although many researchers describe content validity based on a literature search and review by a few “experts,” strong content validity requires delineating all domains within a targeted outcome and ensuring that the instrument samples these domains in a way that is proportional to their importance.15
* Educational researchers should conduct pilot or cognitive testing of their new evaluation instruments, such as by asking respondents open-ended questions about how they completed the instrument.37
* Educational researchers should work with psychometric statisticians who can assist them with creating instruments, designing studies to demonstrate construct validity, and performing statistical analyses such as Cronbach alpha and factor analysis.
* Because validity and reliability are not inherent to an instrument and depend on the way an instrument is applied, educators should consider whether previously developed instruments are valid for their targeted objectives and learners and then reassess reliability within their current settings. Any changes in item content, response process, or administration should prompt reassessment of validity and reliability.
* Educators should work with publishers to achieve more consistent reporting of the validity and reliability of evaluation methods, using an established approach to classifying validity evidence. Authors should justify their approach. Greater use of the contemporary approach to validity may increase familiarity with and application of this approach in the wider CME community.
* Finally, resources should be allocated for the development and testing of high-quality evaluation methods for CME. A recent survey demonstrated that educational research is generally unfunded or underfunded.38
In summary, the overall strength of evidence for CME effectiveness is limited by weaknesses in the reported validity and reliability of evaluation methods. Greater resources for the development and reporting of high-quality CME evaluation methods may allow educators to demonstrate which CME techniques are most effective, enabling greater cost-effectiveness in CME delivery over time.
Funding was provided by the Agency for Healthcare Research and Quality through the Johns Hopkins Evidence-Based Practice Center, contract no. 290-02-0018. Dr. Marinopoulos also received salary support from the Osler Center for Clinical Excellence at Johns Hopkins University.
The authors are responsible for the content of this article, including any clinical information. No statement in this article should be construed as an official position of the Agency for Health Care Research and Quality or of the U.S. Department of Health and Human Services.
2 American Medical Association. State Medical Licensure Requirements and Statistics. Chicago, Ill: American Medical Association; 2007.
3 Chassin MR, Galvin RW. The urgent need to improve health care quality. Institute of Medicine National Roundtable on Health Care Quality. JAMA. 1998;280:1000–1005.
4 American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 1999.
5 Reed D, Price EG, Windish DM, et al. Challenges in systematic reviews of educational intervention studies. Ann Intern Med. 2005;142:1080–1089.
6 Downing SM. Reliability: On the reproducibility of assessment data. Med Educ. 2004;38:1006–1012.
7 Cook DA, Beckman TJ. Current concepts in validity and reliability for psychometric instruments: Theory and application. Am J Med. 2006;119:166.e7–e16.
8 Marinopoulos SS, Dorman T, Ratanawongsa N, et al. Effectiveness of Continuing Medical Education. Evidence Report/Technology Assessment No. 149 (prepared by the Johns Hopkins Evidence-Based Practice Center, under contract no. 290–02-0018). AHRQ publication no. 07-E006. Rockville, Md: Agency for Healthcare Research and Quality; January 2007.
9 Kern DE, Thomas PA, Howard DM, Bass EB. Curriculum Development for Medical Education: A Six-Step Approach. Baltimore, Md: Johns Hopkins University Press; 1998.
10 Goodwin LD, Leech NL. The meaning of validity in the new “Standards for Educational and Psychological Testing”: Implications for measurement courses. Meas Eval Couns Dev. 2003;36:181.
11 Brualdi A. Traditional and Modern Concepts of Validity. Contract no. ED99CO0032; report no. EDO-TM-99-10. ERIC/AE Digest. Washington, DC: Office of Educational Research and Improvement; 1999.
12 Veloski JJ, Fields SK, Boex JR, Blank LL. Measuring professionalism: A review of studies with instruments reported in the literature between 1982 and 2002. Acad Med. 2005;80:366–370.
13 Evans R, Elwyn G, Edwards A. Review of instruments for peer assessment of physicians. BMJ. 2004;328:1240.
14 DeVellis RF. Scale Development: Theory and Applications. Thousand Oaks, Calif: SAGE Publications; 2003.
15 Downing SM. Validity: On meaningful interpretation of assessment data. Med Educ. 2003;37:830–837.
16 Shaneyfelt T, Baum KD, Bell D, et al. Instruments for evaluating education in evidence-based practice: A systematic review. JAMA. 2006;296:1116–1127.
17 Fordis M, King JE, Ballantyne CM, et al. Comparison of the instructional efficacy of Internet-based CME with live interactive CME workshops: A randomized controlled trial. JAMA. 2005;294:1043–1051.
18 Stewart M, Marshall JN, Ostbye T, et al. Effectiveness of case-based on-line learning of evidence-based practice guidelines. Fam Med. 2005;37:131–138.
19 Mann KV, Lindsay EA, Putnam RW, Davis DA. Increasing physician involvement in cholesterol-lowering practices: The role of knowledge, attitudes and perceptions. Adv Health Sci Educ Theory Pract. 1997;2:237–253.
20 Hergenroeder AC, Chorley JN, Laufman L, Fetterhoff A. Two educational interventions to improve pediatricians' knowledge and skills in performing ankle and knee physical examinations. Arch Pediatr Adolesc Med. 2002;156:225–229.
21 McBride P, Underbakke G, Plane MB, et al. Improving prevention systems in primary care practices: The Health Education and Research Trial (HEART). J Fam Pract. 2000;49:115–125.
22 Kim CS, Kristopaitis RJ, Stone E, Pelter M, Sandhu M, Weingarten SR. Physician education and report cards: Do they make the grade? Results from a randomized controlled trial. Am J Med. 1999;107:556–560.
23 Brown JB, Boles M, Mullooly JP, Levinson W. Effect of clinician communication skills training on patient satisfaction. A randomized, controlled trial. Ann Intern Med. 1999;131:822–829.
24 White CW, Albanese MA, Brown DD, Caplan RM. The effectiveness of continuing medical education in changing the behavior of physicians caring for patients with acute myocardial infarction. A controlled randomized trial. Ann Intern Med. 1985;102:686–692.
25 Elliott TE, Murray DM, Oken MM, et al. Improving cancer pain management in communities: Main results from a randomized controlled trial. J Pain Symptom Manage. 1997;13:191–203.
26 Roter DL, Hall JA, Kern DE, Barker LR, Cole KA, Roca RP. Improving physicians' interviewing skills and reducing patients' emotional distress. A randomized clinical trial. Arch Intern Med. 1995;155:1877–1884.
27 Gerrity MS, Cole SA, Dietrich AJ, Barrett JE. Improving the recognition and management of depression: Is there a role for physician education? J Fam Pract. 1999;48:949–957.
28 Ozer EM, Adams SH, Lustig JL, et al. Increasing the screening and counseling of adolescents for risky health behaviors: A primary care intervention. Pediatrics. 2005;115:960–968.
29 Rost K, Nutting P, Smith J, Werner J, Duan N. Improving depression outcomes in community primary care practice: A randomized trial of the quEST intervention. Quality Enhancement by Strategic Teaming. J Gen Intern Med. 2001;16:143–149.
30 Gullion DS, Tschann JM, Adamson TE. Management of hypertension in private practice: A randomized controlled trial in continuing medical education. J Contin Educ Health Prof. 1988;8:239–255.
31 Norris SL, Grothaus LC, Buchner DM, Pratt M. Effectiveness of physician-based assessment and counseling for exercise in a staff model HMO. Prev Med. 2000;30:513–523.
32 Gerstein HC, Reddy SS, Dawson KG, Yale JF, Shannon S, Norman G. A controlled evaluation of a national continuing medical education programme designed to improve family physicians' implementation of diabetes-specific clinical practice guidelines. Diabet Med. 1999;16:964–969.
33 Socolar RR, Raines B, Chen-Mok M, Runyan DK, Green C, Paterno S. Intervention to improve physician documentation and knowledge of child sexual abuse: A randomized, controlled trial. Pediatrics. 1998;101:817–824.
34 Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA. 2006;296:1094–1102.
35 Boon H, Stewart M. Patient-physician communication assessment instruments: 1986 to 1996 in review. Patient Educ Couns. 1998;35:161–176.
36 Wetcher-Hendricks D. Adjustments to the correction for attenuation. Psychol Methods. 2006;11:207–215.
37 Willis GB. Cognitive Interviewing and Questionnaire Design: A Training Manual. Cognitive Methods Staff Working Paper Series no. 7. Hyattsville, Md: National Center for Health Statistics; 1994.
38 Reed DA, Kern DE, Levine RB, Wright SM. Costs and funding for published medical education research. JAMA. 2005;294:1052–1057.
39 Goodwin LD. Changing conceptions of measurement validity: An update on the new standards. J Nurs Educ. 2002;41:100–106.
40 Downing SM. Face validity of assessments: Faith-based interpretations or evidence-based science? Med Educ. 2006;40:7–8.
41 MacRae HM, Regehr G, McKenzie M, et al. Teaching practicing surgeons critical appraisal skills with an internet-based journal club: A randomized, controlled trial. Surgery. 2004;136:641–646.
42 Chung S, Mandl KD, Shannon M, Fleisher GR. Efficacy of an educational Web site for educating physicians about bioterrorism. Acad Emerg Med. 2004;11:143–148.
43 Meredith LS, Jackson-Triche M, Duan N, Rubenstein LV, Camp P, Wells KB. Quality improvement for depression enhances long-term treatment knowledge for primary care clinicians. J Gen Intern Med. 2000;15:868–877.
44 Gifford DR, Holloway RG, Frankel MR, et al. Improving adherence to dementia guidelines through education and opinion leaders. A randomized, controlled trial. Ann Intern Med. 1999;131:237–246.
45 Chan DH, Leclair K, Kaczorowski J. Problem-based small-group learning via the Internet among community family physicians: A randomized controlled trial. MD Comput. 1999;16:54–58.
46 Doucet MD, Purdy RA, Kaufman DM, Langille DB. Comparison of problem-based learning and lecture format in continuing medical education on headache diagnosis and management. Med Educ. 1998;32:590–596.
47 Gifford DR, Mittman BS, Fink A, Lanto AB, Lee ML, Vickrey BG. Can a specialty society educate its members to think differently about clinical decisions? Results of a randomized trial. J Gen Intern Med. 1996;11:664–672.
48 Andersen SM, Harthorn BH. Changing the psychiatric knowledge of primary care physicians. The effects of a brief intervention on clinical diagnosis and treatment. Gen Hosp Psychiatry. 1990;12:177–190.
49 Maxwell JA, Sandlow LJ, Bashook PG. Effect of a medical care evaluation program on physician knowledge and performance. J Med Educ. 1984;59:33–38.
50 Premi J, Shannon SI. Randomized controlled trial of a combined video–workbook educational program for CME. Acad Med. 1993;68(10 suppl):S13–S15.
51 Sibley JC, Sackett DL, Neufeld V, Gerrard B, Rudnick KV, Fraser W. A randomized trial of continuing medical education. N Engl J Med. 1982;306:511–515.
52 Schroy PC, Heeren T, Bliss CM, Pincus J, Wilson S, Prout M. Implementation of on-site screening sigmoidoscopy positively influences utilization by primary care providers. Gastroenterology. 1999;117:304–311.
53 Greenberg LW, Jewett LS. The impact of two teaching techniques on physicians' knowledge and performance. J Med Educ. 1985;60:390–396.
54 Schectman JM, Schroth WS, Elinsky EG, Ott JE. The effect of education and drug samples on antihistamine prescribing costs in an HMO. HMO Pract. 1996;10:119–122.
55 Maiman LA, Becker MH, Liptak GS, Nazarian LF, Rounds KA. Improving pediatricians' compliance-enhancing practices. A randomized trial. Am J Dis Child. 1988;142:773–779.
56 Clark NM, Gong M, Schork MA, et al. Long-term effects of asthma education for physicians on patient satisfaction and use of health services. Eur Respir J. 2000;16:15–21.
57 White M, Michaud G, Pachev G, Lirenman D, Kolenc A, FitzGerald JM. Randomized trial of problem-based versus didactic seminars for disseminating evidence-based guidelines on asthma management to primary care physicians. J Contin Educ Health Prof. 2004;24:237–243.
58 Lockyer JM, Fidler H, Hogan DB, Pereles L, Lebeuf C, Wright B. Dual-track CME: Accuracy and outcome. Acad Med. 2002;77:S61–S63.
59 Harris JM Jr, Kutob RM, Surprenant ZJ, Maiuro RD, Delate TA. Can Internet-based education improve physician confidence in dealing with domestic violence? Fam Med. 2002;34:287–292.
60 Lane DS, Messina CR, Grimson R. An educational approach to improving physician breast cancer screening practices and counseling skills. Patient Educ Couns. 2001;43:287–299.
61 Terry PB, Wang VL, Flynn BS, et al. A continuing medical education program in chronic obstructive pulmonary diseases: Design and outcome. Am Rev Respir Dis. 1981;123:42–46.
62 Chodosh J, Berry E, Lee M, et al. Effect of a dementia care management intervention on primary care provider knowledge, attitudes, and perceptions of quality of care. J Am Geriatr Soc. 2006;54:311–317.
63 Beaulieu MD, Rivard M, Hudon E, Beaudoin C, Saucier D, Remondin M. Comparative trial of a short workshop designed to enhance appropriate use of screening tests by family physicians. CMAJ. 2002;167:1241–1246.
64 Carney PA, Dietrich AJ, Freeman DH Jr, Mott LA. A standardized-patient assessment of a continuing medical education program to improve physicians' cancer-control clinical skills. Acad Med. 1995;70:52–58.
65 Levinson W, Roter D. The effects of two continuing medical education programs on communication skills of practicing primary care physicians. J Gen Intern Med. 1993;8:318–324.
66 Margolis PA, Lannon CM, Stuart JM, Fried BJ, Keyes-Elstein L, Moore DE Jr. Practice based education to improve delivery systems for prevention in primary care: Randomised trial. BMJ. 2004;328:388.
67 Schectman JM, Schroth WS, Verme D, Voss JD. Randomized controlled trial of education and feedback for implementation of guidelines for acute low back pain. J Gen Intern Med. 2003;18:773–780.
68 Jennett PA, Laxdal OE, Hayton RC, et al. The effects of continuing medical education on family doctor performance in office practice: A randomized control study. Med Educ. 1988;22:139–145.
69 Slotnick HB. Educating physicians through advertising: Using the brief summary to teach about pharmaceutical use. J Contin Educ Health Prof. 1993;13:299–314.
70 Moran JA, Kirk P, Kopelow M. Measuring the effectiveness of a pilot continuing medical education program. Can Fam Physician. 1996;42:272–276.
© 2008 Association of American Medical Colleges