Continuing medical education (CME) is required on a continuous basis to enable physicians to enhance and maintain their clinical knowledge and skills, but evidence of its effectiveness is hampered in part by the lack of theory-based, valid, and reliable evaluation methods and tools. According to the Accreditation Council for Continuing Medical Education, CME includes educational activities that aim at maintaining, developing, or increasing the knowledge, skills, and professional performance and relationships used by a physician to provide services for patients, the public, or the profession.1 The ultimate goal of CME programs is to enhance the quality of patient care in the United States and elsewhere through professional education. Physicians spend a considerable amount of time in CME activities to maintain their medical licenses: U.S. medical licensing boards require completion of 12 to 50 hours of CME per license reregistration.2
Although CME activities are underpinned by a belief that gains in knowledge lead physicians to improve their medical practices and patient outcomes,3 previous reviews of CME evaluations have shown that the questionnaires used have generally lacked a theoretical background, which may have resulted in misinterpretation of study results and which limits comparison across different CME evaluations.3–6 This finding was confirmed by a report on the effectiveness of CME by the Agency for Healthcare Research and Quality (part of the U.S. Department of Health and Human Services), which indicated that the lack of reliability and validity of the CME evaluation instruments limited the evidence.7 In a recent review of 32 randomized clinical trials evaluating CME, our group found that various questionnaires, surveys, and scales were used to evaluate CME outcomes.8 Sixteen of the 32 reviewed studies used questionnaires specific to the clinical domains addressed; 6 other studies (18.8% of the 32) adapted existing instruments and provided reliability and validity information.8 The 10 remaining studies (31.3% of the 32) developed their own instruments, but none of these instruments documented reliability or validity information.8
In his 1994 book, Kirkpatrick9 described a four-level outcome evaluation model that has been widely used in assessing training effectiveness. According to this model, the effectiveness of a training program should be evaluated at four levels: (1) reaction, (2) learning, (3) behavior, and (4) results. In 2005, Curran and Fleet5 adapted Kirkpatrick's model for the field of CME. In the context of CME, the four levels of outcome evaluation are learner satisfaction (level 1), learning outcomes (level 2), performance improvement (level 3), and improved patient/health outcomes (level 4). The model further holds that evaluation should always begin with level 1 and then, as time and budget allow, move sequentially through the other three levels, with information from each level serving as a basis for the next level's evaluation. Each successive level provides a more precise measure of training program effectiveness but, at the same time, requires a more rigorous and time-consuming analysis.
The theory of planned behavior (TPB) has long been used to investigate physicians' intentions to change their clinical practices (behavioral intention).10–12 In the context of CME, TPB holds that a physician's clinical behavior is determined by his or her intention to perform certain clinical practices. These intentions result from positive or negative views about performing the clinical practices (attitude toward the behavior), the influence of significant others such as experts in the specific clinical field (subjective norms), and the perceived ease or difficulty of performing the clinical practices (perceived behavioral control). Although outcome expectation and self-efficacy, constructs from social cognitive theory (SCT), are measured more often in CME evaluation studies than are TPB constructs,13,14 the SCT constructs parallel those in TPB. Specifically, perceived behavioral control in TPB is similar to self-efficacy in SCT, and behavioral belief in TPB is similar to outcome expectation in SCT.
TPB constructs fit well into the Kirkpatrick model. The TPB constructs that correspond to level 2 (learning outcomes related to attitude and skill improvements) determine behavioral intention, which in turn predicts behavior (physician practices) at level 3. Behavioral intention is therefore located between the second and third levels. Because behavioral intention is an immediate antecedent of behavior, questions about behavioral intention can serve as a proxy measure of physician behavior15 and thus relate to physician behavior outcomes in level 3. Using TPB constructs at evaluation level 2 also builds on previous CME evaluations by enabling comparisons of attitude and skill improvement across CME courses.8
The purpose of this study was to create a theoretically driven, valid, reliable, and adaptable CME evaluation instrument whose core items address the attitudinal determinants of physician behavior change (attitudes, beliefs or expectations, subjective norms, perceived behavioral control or self-efficacy, and behavioral intention) and that can be adapted to various CME programs for evaluation and comparison. An adaptable instrument would provide core items assessing those concepts, with wording modified to fit the content of different clinical domains. Such an instrument would enable researchers to standardize results and better understand the factors influencing the effectiveness of various CME programs. Analysis of specific results and comparisons across CME offerings could then inform the design of future clinical education.
This study used a multistep procedure involving three phases: (1) instrument development, (2) instrument validation, and (3) data collection and analyses. The third phase used cross-sectional data that were collected from physicians attending a two-day CME conference on preoperative therapy in invasive breast cancer sponsored by the National Cancer Institute (NCI) in 2007.
Instrument development (phase 1)
Scale development involved eight steps.16 Those steps were to (1) develop template items addressing each construct, (2) develop items customized to the CME content, (3) determine the format for measurement of responses, (4) submit the template for expert review, (5) submit an initial item pool for expert review of face validity, (6) finalize the initial item pool, (7) conduct cognitive testing with the target population, and (8) pilot-test and revise the instrument to improve internal validity.
Evaluation instrument template.
Our instrument addressed five constructs based on the TPB: (1) behavioral beliefs about performing a certain clinical practice, (2) attitudes toward performing a certain clinical practice (i.e., positive or negative evaluations of the behavioral beliefs), (3) the subjective norms for performing a certain clinical practice, (4) perceived behavioral control in performing a specific clinical practice, and (5) behavioral intention to perform a specific clinical practice. Items measuring these constructs can readily be adapted to specific clinical domains from a pool of more general template items. Skills and knowledge, however, are content specific and were not addressed in this standardized instrument template, as shown in Supplemental Digital Chart 1 (https://links.lww.com/ACADMED/A22).
We adapted the instrument to the clinical domain of preoperative therapy in invasive breast cancer by using the conference's learning objectives to operationalize adaptable measures for TPB constructs. Because the items had to be specific to the clinical domain and no validated instrument existed in this clinical area, we developed new items for each construct. We wrote the items in clear and unambiguous language and defined all abbreviations to ensure understanding.
Format of measurement.
All items were rated on a seven-point semantic differential scale. Responses to questions on behavioral intention, behavioral beliefs (outcome expectation), attitudes, and subjective norms ranged from 1 (unlikely) to 7 (likely). Responses to questions on perceived behavioral control (self-efficacy) ranged from 1 (unconfident) to 7 (confident).
During construction of the preliminary items, we sought feedback from National Institutes of Health CME committee members, university faculty, and conference organizers and instructors. An expert in measurement and quantitative methods examined the psychometric scales for dimensionality and for issues related to the optimal use of response categories. We revised the questions accordingly, constructed the initial instrument, and formatted it for cognitive and pilot testing.
The meeting organizer recommended four physicians to participate in the cognitive testing. During individual interviews, we asked these clinicians to read each question aloud, to tell the interviewer their reactions, and to comment on any questions that were difficult to understand, were hard to answer, or did not make sense. We also asked them to answer the questions one by one and to explain which responses they selected and why.
The section on instrument validation (below) discusses pilot testing.
Instrument validation (phase 2)
We pilot-tested the CME evaluation instrument that resulted from this process with four additional medical oncologists to identify ambiguities and difficult questions. We recorded the time they took to complete the instrument; all four physicians completed it in five minutes or less, which suggested that the questionnaire was of a reasonable length. They answered all items in the instrument as expected and experienced no confusion. The variation in their responses to the same questions indicated that the questions could differentiate among respondents. The pilot testing indicated that no further revisions were needed. The meeting organizer and experts reviewed the final draft instrument before it was administered to conference participants.
Data collection and analysis (phase 3)
We distributed paper-and-pencil instruments to in-person attendees of the two-day NCI conference held at the Natcher Conference Center, National Institutes of Health, Bethesda, Maryland, in March 2007. The conference was a highly structured, didactic CME intervention with a synchronized Internet broadcast. Conference participants were invited to participate in the study when they picked up their registration packets. The study was announced before the conference started and during the first two breaks, both orally and on a PowerPoint slide. Completed instruments were collected during the breaks.
We obtained written informed consent from all participating physicians. The University of Maryland institutional review board reviewed and approved the research protocol.
We used SPSS software (version 14.0; SPSS, Inc., Chicago, Illinois) to perform all analyses. We used the pretest data to perform exploratory factor analysis, item analysis, and estimations of reliability and validity. We also collected posttest data, but they are not included in this analysis.
We checked that the sample size met the recommended criteria for the ratio of sample size to variables.17 Of the 269 on-site participants, 134 physicians and 30 nonphysicians filled out the evaluation forms, for a response rate of 61%. This study concentrated on the 134 physician participants. The final instrument included 22 items, for a participant:item ratio of 6:1.
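The sample-size figures above follow from simple arithmetic. The short Python sketch below recomputes the response rate and the participant:item ratio from the counts reported in the text; the function names are illustrative only and are not part of the study's analysis.

```python
def response_rate(respondents: int, eligible: int) -> float:
    """Percentage of eligible on-site attendees who returned a form."""
    return 100.0 * respondents / eligible


def participants_per_item(participants: int, items: int) -> float:
    """Participant:item ratio, a rough sample-size check for factor analysis."""
    return participants / items


on_site = 269                        # on-site conference participants
physicians, nonphysicians = 134, 30  # returned evaluation forms
final_items = 22                     # items in the final instrument

rate = response_rate(physicians + nonphysicians, on_site)
ratio = participants_per_item(physicians, final_items)
print(f"response rate: {rate:.0f}%")             # 164 / 269, prints 61%
print(f"participant:item ratio: {ratio:.0f}:1")  # 134 / 22, prints 6:1
```

Note that the response rate counts physicians and nonphysicians together, whereas the participant:item ratio uses only the 134 physicians analyzed.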
We computed each subscale score as the total of its item scores and calculated means, medians, and standard deviations for each subscale, as well as Pearson correlations between each item score and the total subscale score. We used a correlation coefficient of ≥0.20 between the item score and the total scale score as the threshold for adequacy.18 We used listwise deletion to handle missing data, excluding any respondent with a missing response on the items being analyzed. We calculated descriptive statistics for all of the items, subscales, and demographic variables and tested bivariate associations between all items by using Pearson correlation coefficients.
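As an illustration of the scoring rules described above, the following Python sketch sums item scores into a subscale score, applies listwise deletion (dropping any respondent with a missing response), and flags items whose item–total Pearson correlation falls below 0.20. It is a minimal sketch under stated assumptions: the response data are hypothetical 1–7 ratings, and the study itself ran these analyses in SPSS.

```python
from math import sqrt
from typing import List, Optional, Sequence

# One row per respondent, one column per item in a single subscale.
# None marks a missing response. All data here are hypothetical.
Row = Sequence[Optional[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation (assumes neither input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def listwise(rows: Sequence[Row]) -> List[List[float]]:
    """Listwise deletion: keep only respondents with no missing item."""
    return [list(r) for r in rows if all(v is not None for v in r)]


def item_total_correlations(rows: Sequence[Row], threshold: float = 0.20):
    """Correlate each item with the subscale score (sum of all its items).

    Returns one (r, adequate) pair per item, where adequate means r >= threshold.
    """
    complete = listwise(rows)
    totals = [sum(r) for r in complete]
    results = []
    for j in range(len(complete[0])):
        r = pearson([row[j] for row in complete], totals)
        results.append((r, r >= threshold))
    return results


rows = [
    [6, 7, 6],
    [5, 5, 4],
    [7, 6, 7],
    [3, 4, 2],
    [4, None, 5],  # dropped by listwise deletion
    [2, 3, 3],
]
for j, (r, ok) in enumerate(item_total_correlations(rows)):
    print(f"item {j + 1}: r = {r:.2f}, adequate = {ok}")
```

Because each item is part of the total it is correlated with, this uncorrected item–total correlation is somewhat inflated; the corrected version removes the item from the total first.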
We used exploratory factor analysis, with principal component analysis (PCA) for initial factor extraction and principal axis factoring (PAF) for final factor extraction, to determine the number and content of factors underlying the initial set of items. We determined the number of factors to retain from a convergence of criteria: eigenvalues >1, the point at which the scree plot levels off, and the theoretical interpretability of the resulting factor structure.
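The eigenvalue criterion mentioned above (often called the Kaiser criterion) counts the eigenvalues of the item correlation matrix that exceed 1. The Python sketch below is a stdlib-only illustration, assuming a small hypothetical correlation matrix; it computes eigenvalues with the cyclic Jacobi rotation method, whereas in practice a library routine such as `numpy.linalg.eigvalsh` (or the SPSS FACTOR procedure used in the study) would do this. It covers only the eigenvalue rule, not the scree or interpretability criteria.

```python
import math
from typing import List, Sequence


def jacobi_eigenvalues(a: Sequence[Sequence[float]], tol: float = 1e-12,
                       max_sweeps: int = 100) -> List[float]:
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(a)
    m = [list(map(float, row)) for row in a]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(m[p][q]) < 1e-15:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (m[q][q] - m[p][p]) / (2.0 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns p and q: M <- M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p] = c * mkp - s * mkq
                    m[k][q] = s * mkp + c * mkq
                for k in range(n):  # rows p and q: M <- J^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k] = c * mpk - s * mqk
                    m[q][k] = s * mpk + c * mqk
    return sorted((m[i][i] for i in range(n)), reverse=True)


def kaiser_retained(eigenvalues: Sequence[float]) -> int:
    """Kaiser criterion: number of factors with eigenvalue > 1."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


# Hypothetical correlation matrix: items 1-2 and items 3-4 form two clusters.
R = [
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
eigs = jacobi_eigenvalues(R)
print([round(e, 3) for e in eigs], "->", kaiser_retained(eigs), "factors")
```

For this block-structured matrix the eigenvalues are 1 ± r within each block (1.6, 1.5, 0.5, and 0.4), so the rule retains two factors; the eigenvalues always sum to the number of items.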
We then examined the items in each subscale to determine whether they came from the same theoretical domain, as proposed, or formed unexpected measurement subscales. We also examined item–total correlations (with the total score of each subscale), corrected item–total correlations, and alpha-if-item-deleted values to decide which items to retain or drop for the final subscales. We calculated interitem correlations and Cronbach alphas for each subscale to examine internal consistency reliability.
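The item-level statistics named above can be sketched in a few lines of Python. This is an illustration only, assuming hypothetical complete-case responses: Cronbach alpha is computed as k/(k − 1) × (1 − sum of item variances / variance of the total score), the corrected item–total correlation correlates each item with the sum of the *other* items, and alpha-if-item-deleted recomputes alpha without that item.

```python
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation (assumes neither input is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def item_analysis(rows: Sequence[Sequence[float]]) -> List[dict]:
    """Corrected item-total r and alpha-if-item-deleted for every item."""
    k = len(rows[0])
    report = []
    for j in range(k):
        item = [r[j] for r in rows]
        rest = [sum(r) - r[j] for r in rows]  # subscale total without item j
        reduced = [[v for i, v in enumerate(r) if i != j] for r in rows]
        # Alpha needs at least two items, so it is undefined when k == 2.
        alpha_del = cronbach_alpha(reduced) if k > 2 else float("nan")
        report.append({"item": j + 1, "corrected_r": pearson(item, rest),
                       "alpha_if_deleted": alpha_del})
    return report


# Hypothetical complete-case responses; item 4 behaves unlike items 1-3.
rows = [
    [6, 6, 7, 2],
    [5, 5, 5, 6],
    [7, 7, 6, 4],
    [3, 2, 3, 5],
    [2, 3, 2, 3],
    [4, 4, 5, 7],
]
print(f"alpha = {cronbach_alpha(rows):.2f}")
for entry in item_analysis(rows):
    print(entry)
```

In this made-up example, item 4 has a negative corrected item–total correlation and alpha rises sharply when it is deleted, the pattern that would flag an item for removal.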
Most of the physicians were affiliated with academic institutions (n = 79; 59.4%), community practices (n = 27; 20.3%), or government agencies (n = 19; 14.3%); some physicians had more than one affiliation. Most of the participants represented the conference target audiences: medical oncologists (n = 70; 51.9%), surgeons (n = 36; 26.7%), and radiation oncologists (n = 17; 12.6%). Male participants (n = 62) made up 47% of the sample. The physicians had a mean age of 48 years. The mean number of years that physicians engaged in patient care and research at the same time (12.2 years) was much higher than the mean numbers for taking care of patients only (3.8 years) and conducting research only (1.0 years).
We undertook initial extraction by using unrotated PCA and retaining as many factors as possible. The Cattell19 scree plot showed six points above the line fitted to the leveling-off portion of the curve, indicating that six factors should be retained. In addition, the Cattell–Nelson–Gorsuch objective scree20 showed a proportional decrease (50%) in the slope score when the sixth component was added. Both findings suggested that six factors should be retained.
We compared initial and extracted communalities in PCA and PAF among the six retained factors. Initial communalities in PAF were much less than 1.0 (range: 0.16–0.85). The six factors of PCA and PAF accounted for 65.89% and 59.37% of the total variance, respectively (Table 1). PAF extraction accounted for less variance than PCA extraction because PAF factors account only for the variance that items share, whereas PCA components also absorb each item's unique variance. The fact that the initial communalities were well below 1.0 indicated that PAF was more appropriate than PCA for these data. We applied direct oblimin rotation to examine factor correlations. The factor correlation matrix (Table 2) showed that all six factors were correlated with each other (nonzero values). The absolute values of the factor correlations ranged from 0.02 to 0.47, which supports the use of direct oblimin (oblique) rotation.21
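The contrast between PCA and PAF starting values can be made concrete. PCA starts every item's communality at 1.0, while PAF commonly (and by default in SPSS) starts from each item's squared multiple correlation (SMC) with all other items, which can be read off the inverse of the correlation matrix as SMC = 1 − 1/(R⁻¹)ᵢᵢ. The Python sketch below, assuming a hypothetical three-item correlation matrix, shows why PAF initial communalities fall below 1.0.

```python
from typing import List, Sequence


def invert(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def smc_communalities(r: Sequence[Sequence[float]]) -> List[float]:
    """Initial PAF communalities: squared multiple correlation of each item
    with all other items, SMC_i = 1 - 1 / (R^-1)_ii."""
    inv = invert(r)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(r))]


# Hypothetical 3-item correlation matrix.
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
pca_initial = [1.0] * len(R)  # PCA starts every item at 1.0
paf_initial = smc_communalities(R)
print("PCA:", pca_initial)
print("PAF:", [round(h, 3) for h in paf_initial])
```

For two items correlated at r, each SMC is simply r², so two items correlated at 0.6 start with communalities of 0.36 rather than 1.0.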
The factor-loading matrix with PAF had 25 items clustered onto six factors (Table 3). The first factor consisted of seven perceived behavioral control variables. The second factor consisted of three positive belief items. Five attitude items loaded on the third factor, and two negative belief items loaded on the fourth factor. The fifth factor consisted of three intention variables. The sixth factor consisted of four subjective norm variables and one intention variable. Although the item “intention to share information of preoperative breast cancer therapy” loaded onto the subjective norm factor, it was not conceptually consistent with this scale and was therefore dropped to preserve the measurement accuracy of the subscales. Table 2 shows the accounted variances for each factor. Seven items did not load on any of the factors (subscales), as shown in Supplemental Digital Chart 2 (https://links.lww.com/ACADMED/A22), which suggests that they should be eliminated.
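Assigning items to factors from a loading matrix, and setting aside items that load on no factor, can be sketched as below. This is a hypothetical illustration: the text does not report the loading cutoff used, so the 0.40 threshold here is an assumption, and the verbal labels follow Comrey and Lee's commonly cited bands (>0.71 excellent, >0.63 very good, >0.55 good, >0.45 fair, >0.32 poor), which the Discussion uses to grade loadings. Judgments such as dropping a conceptually inconsistent item remain a manual step.

```python
from typing import Dict, List, Sequence, Tuple


def comrey_lee_label(loading: float) -> str:
    """Comrey and Lee's verbal labels for the size of a factor loading."""
    a = abs(loading)
    if a > 0.71:
        return "excellent"
    if a > 0.63:
        return "very good"
    if a > 0.55:
        return "good"
    if a > 0.45:
        return "fair"
    if a > 0.32:
        return "poor"
    return "below poor"


def assign_items(loadings: Sequence[Sequence[float]], names: Sequence[str],
                 cutoff: float = 0.40) -> Tuple[Dict[int, List[str]], List[str]]:
    """Place each item on the factor where its |loading| is largest.

    Items whose largest |loading| is below `cutoff` (a hypothetical
    threshold) are returned separately as candidates for elimination.
    """
    assigned: Dict[int, List[str]] = {}
    unloaded: List[str] = []
    for name, row in zip(names, loadings):
        best = max(range(len(row)), key=lambda f: abs(row[f]))
        if abs(row[best]) >= cutoff:
            assigned.setdefault(best + 1, []).append(name)  # 1-based factor number
        else:
            unloaded.append(name)
    return assigned, unloaded


# Hypothetical pattern matrix for five items on two factors.
names = ["pbc1", "pbc2", "att1", "att2", "misc1"]
loadings = [
    [0.82, 0.10],
    [0.75, 0.05],
    [0.12, 0.68],
    [0.20, 0.58],
    [0.25, 0.30],
]
factors, dropped = assign_items(loadings, names)
print(factors, dropped)
print(comrey_lee_label(0.68))
```

This simple rule does not handle cross-loading items; a fuller version would also flag items with substantial loadings on more than one factor.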
On the basis of the item analysis for the six subscales, we developed a 22-item instrument. Table 4 shows the subscale names, their reliability coefficients, and the number of items in each final subscale.
Four subscales (perceived behavioral control, attitude, subjective norm, and negative belief) retained all of the items from the original factor loadings. The perceived behavioral control subscale had a Cronbach alpha coefficient of 0.94, and all interitem correlations indicated good correlations among the scale items,22 so we retained all seven items. The five-item attitude subscale had an acceptable Cronbach alpha coefficient of 0.90, and all interitem correlations met the criterion, so we retained all five items. The subjective norm subscale contained four items with a Cronbach alpha coefficient of 0.91, so we retained all four items.
The positive belief scale contained three items with a Cronbach alpha coefficient of 0.73. The item–total correlation of one item (0.46) was much lower than those of the other two items (0.61 and 0.60), consistent with the interitem correlation results, which suggested deleting that item (i.e., “Preoperative [as opposed to postoperative] systemic chemotherapy will lead to a lower mortality rate in operable breast cancer patients”). The alpha value improved to 0.76 after the item was eliminated. The negative belief subscale contained two items with a Cronbach alpha coefficient of 0.74 and a good interitem correlation of 0.54. The behavioral intention subscale contained three items with a Cronbach alpha coefficient of 0.81, but two of the three interitem correlations did not meet the criterion of exceeding 0.5.22 The alpha value improved to 0.88 after elimination of one item: “intention to apply knowledge of preoperative systemic chemotherapy in developing or deciding to participate in research studies as a researcher.”
CME evaluation activities vary greatly by study design, intervention strategy, length of follow-up, clinical domain, and target audience.8 Special considerations are therefore required in interpreting study results and in generalizing them to other CME activities. Although we developed the current instrument for a fairly homogeneous set of breast cancer clinicians participating in a highly structured, didactic CME intervention, the instrument could be modified for broader use in different clinical specialties and, possibly, in different types of CME activity.
Our use of expert review and cognitive testing to validate the questionnaire items was one strength of this study. According to a recent review of CME evaluation research,8 evaluators have often used surveys or questionnaires without either psychometric testing for validity and reliability or cognitive testing during development.14,23,24 Expert review, pilot testing, and cognitive testing should be standard procedures in any process of developing a survey instrument.25 In the current study, experts helped identify the most acceptable response scale format, and cognitive testing helped ensure that instrument items measured the theoretical concepts as intended. Pilot testing showed that the completion time was reasonable for on-site administration. In addition, each iteration required fewer adjustments, which suggests that the instrument had good content validity.
All of the items loaded on the TPB constructs used to develop them with “good” to “excellent” loadings, except the belief items. According to the criteria of Comrey and Lee,26 these subscales have evidence for validity and could be confidently named perceived behavioral control, attitudes, and subjective norms. These results also supported the reliability of the three subscales.27 Belief items were developed as a single scale, although they loaded separately on two factors, “positive beliefs” and “negative beliefs.” According to TPB, attitude is determined by the behavioral belief score (rated from −3 to +3) weighted by an evaluation of each individual belief (rated from −3 to +3). This arrangement captures the psychology of double negatives.28 Instead of using a bipolar scoring system, we scored beliefs from 1 to 7 in the current study. Responses to the negative belief items were reverse-coded so that a higher score on positive beliefs and a lower score on negative beliefs related to a more positive attitude toward adopting the preoperative breast cancer therapy. Beliefs about a new clinical practice can be positive or negative. In general, TPB does not separate positive and negative beliefs into different constructs, and previous CME belief scales have had only one belief subscale. However, the decisional balance construct in the transtheoretical model29 provides a rationale for our results: it differentiates between the pros and cons of behavior change and reflects the individual's relative weighing of them.29
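The scoring ideas in this paragraph can be illustrated in Python. In Ajzen's expectancy-value formulation, attitude is proportional to the sum, over beliefs, of belief strength multiplied by the evaluation of the corresponding outcome, with both terms scored from −3 to +3. The sketch below is illustrative only and is not the study's scoring procedure: it assumes that a 1–7 response is mapped onto −3 to +3 by subtracting 4, and it shows how reverse-coding flips a 1–7 response.

```python
from typing import Sequence

SCALE_MIN, SCALE_MAX = 1, 7


def reverse_code(score: int) -> int:
    """Reverse a 1-7 response: 1 <-> 7, 2 <-> 6, 3 <-> 5, 4 stays 4."""
    return SCALE_MIN + SCALE_MAX - score


def to_bipolar(score: int) -> int:
    """Re-express a 1-7 response on the -3..+3 range (midpoint 4 -> 0)."""
    return score - 4


def expectancy_value(beliefs: Sequence[int], evaluations: Sequence[int]) -> int:
    """Sum of belief strength x outcome evaluation, both on -3..+3.

    Inputs are 1-7 responses; they are shifted to bipolar form first.
    """
    return sum(to_bipolar(b) * to_bipolar(e) for b, e in zip(beliefs, evaluations))


# A "double negative": judging a bad outcome (evaluation 1 -> -3) to be
# unlikely (belief 1 -> -3) contributes +9, i.e., it favors the behavior.
print(expectancy_value([1], [1]))  # 9
# The same bad outcome judged likely (belief 7 -> +3) contributes -9.
print(expectancy_value([7], [1]))  # -9
print(reverse_code(2))             # 6
```

With unipolar 1–7 scoring, the product of two low scores is small and positive, so the double-negative case is lost unless responses are recentered or the negative items are handled separately, which is one way to read the split into positive and negative belief subscales.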
Overall, all subscales had sufficient reliability (alpha ≥0.65) for early-stage instrument development.30 In addition, scale modifications for this study were based on item analyses that considered item–total correlations, interitem correlations, and alpha-if-item-deleted values. These item analyses, and the scale modifications based on them, were an additional strength of this research. Although three preexisting CME evaluation questionnaires that addressed variables in the second evaluation level provided validity and reliability information, all of those studies calculated only Cronbach alpha coefficients; the investigators did not conduct item-level analyses to examine scale reliability.14,23,24 In our study, by contrast, we were able to eliminate problematic items from subscales with borderline Cronbach alpha coefficients, and these subscales maintained strong content validity after the item deletions. Although Cronbach alpha is positively related to the number of items in a scale,31 eliminating one item from each of the positive belief and behavioral intention subscales increased their Cronbach alpha coefficients and interitem correlations.
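The apparent tension between these two observations is easiest to see in the standardized form of Cronbach alpha, which expresses alpha in terms of the number of items k and the mean interitem correlation r̄ (it equals raw alpha only for standardized items and approximates it when item variances are similar). For a fixed r̄, alpha rises with k; but deleting a weak item can raise r̄ enough to outweigh the loss of an item. The numbers in the worked example are hypothetical, not taken from this study.

```latex
\alpha_{\text{std}} = \frac{k\,\bar{r}}{1 + (k-1)\,\bar{r}}

\begin{aligned}
k = 3,\ \bar{r} = 0.50 &: \quad \alpha_{\text{std}} = \frac{3(0.50)}{1 + 2(0.50)} = \frac{1.50}{2.00} = 0.75 \\
k = 2,\ \bar{r} = 0.70 &: \quad \alpha_{\text{std}} = \frac{2(0.70)}{1 + 1(0.70)} = \frac{1.40}{1.70} \approx 0.82
\end{aligned}
```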
CME activities often vary greatly by clinical domain and by target audience. Therefore, to develop valid and reliable evaluation instruments, CME conference organizers must first modify the items in the template (Supplemental Digital Chart 1; https://links.lww.com/ACADMED/A22) in consultation with subject-area experts to address the specific medical content. Second, cognitive testing with a sample of the target population is recommended to ensure consistency between the information in the instrument and the message delivered to the participants. Third, draft instruments should be revised according to the cognitive testing results and the experts' feedback. Fourth, having another sample of clinicians from the target population complete the revised survey would identify ambiguities and difficult questions and provide information on the time needed to complete the instrument.
Several limitations of this study should be noted. The first limitation is the relatively small on-site sample size. Because the conference was broadcast synchronously over the Internet, half of the registrants participated from their own offices, which limited the number of on-site participants. Second, despite a diverse population, representativeness of the target audience, and a good response rate16 for a full exploratory model,17 the sample was somewhat small for the purpose of psychometric analyses. Third, three constructs of the instrument were represented by a small number of items. Instead of deleting problematic items, future studies could modify them to increase the number of items per subscale. Fourth, given that some studies have shown that physicians may inaccurately assess their own level of competence,4 social desirability bias may be present in the self-reported data and may threaten validity. Fifth, preoperative breast cancer therapy is a controversial area, and selection bias in the sample made the results less representative of the more general physician population. That is, physicians attending the CME event were more likely to be engaged in research and to be supportive of preoperative breast cancer therapy. In addition, the conference was held in a large conference facility in a major city, and the results may not be generalizable to physicians practicing in smaller communities. Sixth, although this instrument was adapted from the CME evaluation instrument template (see Supplemental Digital Chart 1; https://links.lww.com/ACADMED/A22), it was designed specifically for breast cancer clinicians who were participating in a didactic CME intervention.
Special considerations should be made in interpreting the study results and generalizing these results to other CME activities with respect to both content and educational strategy (e.g., didactic versus hands-on).
Recommendations for future studies
The current study developed a clinical domain-specific CME evaluation instrument and examined its validity and reliability. This development process can serve as a model for creating other CME evaluation instruments. In future CME evaluation studies, evaluation designs should include several measurement points, including pretest and posttest, with the possibility of additional follow-up data collection to further evaluate the instrument's ability to measure longer-term changes in attitudes, beliefs, intentions, and, ultimately, clinical practice. Such longitudinal research could also help determine the duration of CME intervention effects, detect the transfer of the intervention to clinical practice and patient health outcomes, and identify strategies needed to prevent the decay of effects. As suggested in a previous review,8 12 months may be the minimum period of follow-up needed to detect changes in physician behavior.
A larger sample size in future studies using the adapted instrument could increase the participant:item ratio, the interpretability of factors, and the feasibility of item analyses for subscales, which could help confirm these initial study results. As discussed earlier, one important reason for the small sample size was the synchronized Internet broadcast of the conference, and future evaluations should develop methods for online response, given the widespread and increasing use of Web-based CME. Future studies also should compare participants attending on-site with those participating via the Web. This additional data collection could potentially help build a multitrait–multimethod matrix from which to assess the convergent and discriminant construct validity of the instrument reported here.22
Although this study indicated acceptable reliabilities for all the subscales, developing and adding two new items to each of the subscales of positive beliefs, negative beliefs, and behavioral intention would increase the number of items per subscale and could further improve reliability.
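How much lengthening a short subscale might help can be estimated with the Spearman–Brown prophecy formula, which projects reliability when a scale is lengthened by a factor n: nρ / (1 + (n − 1)ρ). The Python sketch below applies it to the three two-item subscales, using the alpha values reported in the text and assuming, hypothetically, that two new items parallel to the existing ones are added to each (n = 2). Real new items are rarely parallel, so these are rough projections, not results.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Spearman-Brown prophecy: projected reliability after multiplying the
    number of items by `length_factor`, assuming the new items are parallel
    to the existing ones."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)


# Two-item subscales from the text, each hypothetically lengthened to four items.
for name, alpha in [("positive beliefs", 0.76), ("negative beliefs", 0.74),
                    ("behavioral intention", 0.88)]:
    print(f"{name}: {alpha:.2f} -> {spearman_brown(alpha, 2.0):.2f}")
```

For example, doubling the positive belief subscale (alpha = 0.76) would be projected to raise reliability to about 0.86.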
Future research could adapt the developed template to other CME activities to examine whether the items and their corresponding constructs can evaluate other CME conferences addressing different clinical domains. These future evaluation efforts may yield information about whether and how the template instrument adaptation process can be streamlined.
The current study provided evidence of the reliability and validity of an adapted CME evaluation instrument based on a theoretical model. These preliminary findings could serve as a research base for future research testing TPB in CME evaluation activities. A structural equation model could be built on the validated interfactor and factor–variable relationships to further confirm the underlying theoretical framework of this instrument and to investigate the nuances among those relationships.
The authors acknowledge Dr. Barnett Kramer, Sylvia K. Scherr, Dr. Jo Anne Zujewski, Robert S. Gold, Robert Feldman, and Suzanne M. Randolph for providing professional advice and assistance for this study.
This study was supported by National Institutes of Health/Office of Director Evaluation Express Award no. 07-1005 OD-ODP.
The institutional review board of the University of Maryland approved the study protocol.
1Accreditation Council for Continuing Medical Education. ACCME Accreditation Policies. Chicago, Ill: Accreditation Council for Continuing Medical Education; 2008.
2American Medical Association. State Medical Licensure Requirements and Statistics. Chicago, Ill: American Medical Association; 2009.
3Davis D, O'Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: Do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA. 1999;282:867–874.
4Davis DA, Mazmanian PE, Fordis M, Van Harrison R, Thorpe KE, Perrier L. Accuracy of physician self-assessment compared with observed measures of competence: A systematic review. JAMA. 2006;296:1094–1102.
5Curran VR, Fleet L. A review of evaluation outcomes of Web-based continuing medical education. Med Educ. 2005;39:561–567.
6Jaussent S, Labarere J, Boyer JP, Francois P. Psychometric characteristics of questionnaires designed to assess the knowledge, perceptions and practices of health care professionals with regards to alcoholic patients [in French]. Encephale. 2004;30:437–446.
7Marinopoulos SS, Dorman T, Ratanawongsa N, et al. Effectiveness of Continuing Medical Education. Evidence Report/Technology Assessment No. 149. Rockville, Md: Agency for Healthcare Research and Quality; 2007. AHRQ Publication No. 07-E006.
8Tian J, Atkinson NL, Portnoy B, Gold RS. A systematic review of evaluation in formal continuing medical education. J Contin Educ Health Prof. 2007;27:16–27.
9Kirkpatrick DL. Evaluating Training Programs: The Four Levels. San Francisco, Calif: Berrett-Koehler; 1994.
10Millstein SG. Utility of the theories of reasoned action and planned behavior for predicting physician behavior: A prospective analysis. Health Psychol. 1996;15:398–402.
11McDermott MM, Hahn EA, Greenland P, et al. Atherosclerotic risk factor reduction in peripheral arterial disease: Results of a national physician survey. J Gen Intern Med. 2002;17:895–904.
12Montano DE, Phillips WR, Kasprzyk D. Explaining physician rates of providing flexible sigmoidoscopy. Cancer Epidemiol Biomarkers Prev. 2000;9:665–669.
13Cabana MD, Rand CS, Powe NR, et al. Why don't physicians follow clinical practice guidelines? A framework for improvement. JAMA. 1999;282:1458–1465.
14Jacobs LM, Burns KJ, Luk SS, Marshall WT 3rd. Follow-up survey of participants attending the Advanced Trauma Operative Management (ATOM) course. J Trauma. 2005;58:1140–1143.
15Ajzen I. The theory of planned behavior. Organ Behav Hum Decis Process. 1991;50:179–211.
16Babbie ER. The Practice of Social Research. Stamford, Conn: Cengage Learning; 2006.
17Gorsuch RL. Factor Analysis. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1983.
18Torabi MR, Jeng I. Health attitude scale construction: Importance of psychometric evidence. Am J Health Behav. 2001;25:290–298.
19Cattell RB. The scree test for the number of factors. Multivariate Behav Res. 1966;1:629–637.
20Nasser F, Benson J, Wisenbaker J. The performance of regression-based variations of the visual scree for determining the number of common factors. Educ Psychol Meas. 2002;62:397–419.
21Pett MA, Lackey NR, Sullivan JJ. Making Sense of Factor Analysis: The Use of Factor Analysis for Instrument Development in Health Care Research. Thousand Oaks, Calif: Sage; 2003.
22Trochim W. The Research Methods Knowledge Base. Mason, Ohio: Atomic Dog Publishing; 2006.
23Sanci LA, Coffey CM, Veit FC, et al. Evaluation of the effectiveness of an educational intervention for general practitioners in adolescent health care: Randomised controlled trial. BMJ. 2000;320:224–230.
24Sanders MR, Tully LA, Turner KM, Maher C, McAuliffe C. Training GPs in parent consultation skills. An evaluation of training for the Triple P-Positive Parenting Program. Aust Fam Physician. 2003;32:763–768.
25Collins D. Pretesting survey instruments: An overview of cognitive methods. Qual Life Res. 2003;12:229–238.
26Comrey AL, Lee HB. A First Course in Factor Analysis. 2nd ed. Hillsdale, NJ: Erlbaum; 1992.
27Guadagnoli E, Velicer WF. Relation of sample size to the stability of component patterns. Psychol Bull. 1988;103:265–275.
28Glanz K, Lewis FM, Rimer BK. Health Behavior and Health Education: Theory, Research, and Practice. 3rd ed. San Francisco, Calif: Jossey-Bass Inc.; 2003.
29Prochaska JO, DiClemente CC. Handbook of Psychotherapy Integration. 2nd ed. New York, NY: Oxford University Press; 2005.
30Nunnally JC, Bernstein IH. Psychometric Theory. New York, NY: McGraw-Hill; 1994.
31Hair JF, Anderson RE, Tatham RL, Black WC. Multivariate Data Analysis. 5th ed. Upper Saddle River, NJ: Prentice Hall; 1998.