An Empirical Validity Study of a Preceptor Evaluation Instrument : Academic Medicine



Editor(s): Stern, David T. MD, PhD


As clinical medical education has shifted to the outpatient setting, the assessment of instructional quality within ambulatory sites has become increasingly important.1 Leading medical educators have judged the measurement of the clerkship process a prerequisite to improving its educational value.2 Clinical instruction depends on well-trained clinician teachers, motivated learners, and an environment conducive to learning.3 Faculty development for community preceptors, however, has emphasized the teaching skills of preceptors, with less attention to the clinical site as a contributor to instruction.

In response to concerns about educational quality in ambulatory settings, the MedEd IQ™ was developed to measure four dimensions of educational process within the outpatient environment.4 Three subscales on this instrument are designed to measure constructs related to clinical site characteristics for learning. These include the learning environment, learning opportunities, and learner involvement. The fourth scale assesses the activities of the preceptor. The preceptor scale has been shown to correlate moderately with constructs measured by the site-related subscales.5 It is unknown, however, whether the preceptor construct, as measured by the MedEd IQ preceptor scale, primarily reflects the teaching capacity of the site or the teaching attributes of the individual preceptor.

In completing the MedEd IQ, students rate the three site characteristics as part of a unitary rating structure, but evaluate each preceptor separately. A mean score is reported for each preceptor, and for the site as a whole, mean scores are reported on the three constructs of learning environment, learning opportunities, and learner involvement. A student's evaluation of site characteristics on the MedEd IQ might therefore depend on the preceptor's instructional style or, alternatively, site characteristics might influence how students view their preceptor. The purpose of this study was to examine the degree to which scores from the preceptor scale are sensitive to site characteristics.


During the 2000–01 academic year, 1,605 first- and third-year students from nine medical schools rated 522 sites and 1,056 preceptors using the MedEd IQ instrument. More than one third of the sites (36%) in this sample had two or more preceptors, but individual preceptors were always rated within the context of a single site, reflecting the fact that in practice physician-preceptors typically work and teach within a single clinic. Because preceptors are nested within sites, it is not possible to directly estimate a preceptor reliability that is independent of site influences. Instead, an indirect method employing two sampling schemes was used to answer reliability and validity questions related to preceptor measurement: to estimate the proportion of preceptor score variance attributable to the site, two samples of ratings were selected.

The samples. In the first sample, designated sample A (multiple preceptors per site), preceptor ratings for different preceptors within each site were randomly sampled to estimate variance components reflecting the site-related influences. For the second sample, sample B (one preceptor per site), multiple ratings for a single preceptor per site were randomly selected. Sample A was designed to gauge the site-related variance of the preceptor scale, and sample B was selected to provide an estimate of the combined influences of a site and a preceptor.

Data eligible for inclusion in sample A consisted of preceptor ratings from sites that utilized and had ratings for two or more preceptors. To maximize the number of observations while maintaining a balanced design, ratings by two student raters of two preceptors within each site were randomly selected. This yielded a total of 378 preceptor observations within 189 sites. In sample B, ratings for any preceptor rated two or more times were eligible for inclusion. The random sampling of two ratings for one preceptor per site yielded a total of 526 raters evaluating 263 preceptors. Figure 1 summarizes the two sampling schemes.
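The two sampling schemes can be sketched as follows. This is a minimal illustration, not the authors' selection code: it assumes the ratings are available as a flat list of `(site, preceptor, rater, score)` tuples (a hypothetical layout), and the function names `sample_a` and `sample_b` are likewise hypothetical. Sample A draws, for each site with at least two rated preceptors, one rating for each of two different preceptors by two different students; sample B draws, for each site, one preceptor rated by at least two students and keeps two of that preceptor's ratings.

```python
import itertools
import random
from collections import defaultdict

def sample_a(ratings, rng):
    """Sample A (hypothetical sketch): per site, one rating each for two
    different preceptors, given by two different students. Sites with fewer
    than two rated preceptors (or no such pair of raters) are skipped."""
    by_site = defaultdict(list)
    for rec in ratings:  # rec = (site, preceptor, rater, score)
        by_site[rec[0]].append(rec)
    chosen = []
    for site in sorted(by_site):
        # Admissible pairs: different preceptor AND different rater
        pairs = [(x, y) for x, y in itertools.combinations(by_site[site], 2)
                 if x[1] != y[1] and x[2] != y[2]]
        if pairs:
            chosen.extend(rng.choice(pairs))
    return chosen

def sample_b(ratings, rng):
    """Sample B (hypothetical sketch): per site, one preceptor rated by two
    or more students, and two of that preceptor's ratings from different students."""
    by_preceptor = defaultdict(list)
    for rec in ratings:
        by_preceptor[(rec[0], rec[1])].append(rec)
    eligible = defaultdict(list)  # site -> rating lists of preceptors with >= 2 raters
    for (site, _preceptor), recs in by_preceptor.items():
        if len({r[2] for r in recs}) >= 2:
            eligible[site].append(recs)
    chosen = []
    for site in sorted(eligible):
        recs = rng.choice(eligible[site])  # one preceptor per site
        pairs = [(x, y) for x, y in itertools.combinations(recs, 2) if x[2] != y[2]]
        chosen.extend(rng.choice(pairs))
    return chosen
```

Passing a seeded `random.Random` makes the draw reproducible, and enumerating all admissible pairs before choosing one keeps every eligible pair within a site equally likely, which is one simple way to keep the design balanced at two ratings per site.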

Figure 1:
A diagram of two sampling techniques: In sample A, raters [r] rate different preceptors [p] within each site [gray circles] (variance from site). In sample B, raters rate the same preceptor within each site (variance from site and preceptor).

The analyses. Generalizability analysis6 allows one to examine the dependability of behavioral measurements. The summary statistics it generates estimate how accurately an observed score generalizes to the average score that would be obtained over repeated measurements. Analysis of variance (ANOVA) techniques are used to partition the observed score variance, with mean squares providing an estimate of each effect that might influence the score. In such an analysis, one estimated effect is regarded as “true score” variance and the other effects as error. In this respect, generalizability theory is similar to classical test theory, which also partitions variance into these two broad categories. However, generalizability theory allows each source of error to be estimated separately and the “true score” variance to be estimated within a context appropriate to the intended generalization. Because it can generate variance estimates attributable to specific influences within the measurement process, it is also more flexible in addressing validity questions, as this study demonstrates.
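As a sketch of what generalizing to an average score means in practice, the hypothetical helper `g_coefficient` below projects a single-rating proportion of object-of-measurement variance to the generalizability coefficient for the mean of n ratings. It assumes a one-facet design in which the only error term is raters nested within the object of measurement, so averaging n ratings divides that error variance by n; under that assumption the result takes the familiar Spearman-Brown form.

```python
def g_coefficient(single_rating_share: float, n_ratings: int) -> float:
    """Generalizability coefficient for the mean of n_ratings ratings.

    single_rating_share is var_obj / (var_obj + var_err) for one rating.
    Assumes the error variance (raters nested within the object) is divided
    by n_ratings when ratings are averaged, which gives
    var_obj / (var_obj + var_err / n) = n*p / (1 + (n - 1)*p).
    """
    if not 0.0 <= single_rating_share <= 1.0:
        raise ValueError("single_rating_share must lie in [0, 1]")
    if n_ratings < 1:
        raise ValueError("n_ratings must be at least 1")
    p = single_rating_share
    return n_ratings * p / (1 + (n_ratings - 1) * p)

# Projected dependability as the number of averaged ratings grows
for n in (1, 2, 5, 10):
    print(n, round(g_coefficient(0.14, n), 3))
```

For example, a single-rating share of 0.14 (the site share reported for sample A) projects to roughly 0.62 for the mean of ten ratings.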

A generalizability analysis (G study) was conducted on each of the two samples. Genova™ software was used to calculate variance components. For sample A, a raters-nested-within-site [r:s] analysis of the preceptor score was conducted. For sample B, a raters-nested-within-preceptor [r:p] analysis was performed on the preceptor score. The object of measurement was the site for sample A and the preceptor for sample B. However, as previously mentioned, each preceptor is evaluated only within the context of a single site, so estimated preceptor effects in sample B were confounded with site influences. In practice, the preceptor effect estimated in the G study of sample B reflected systematic preceptor factors, site factors, and their interaction; that is, the sample B object-of-measurement variance equaled the sum of preceptor and site influences and their interaction. Sample A, in contrast, containing multiple preceptors each rated once within a site, did provide an unconfounded estimate of the site-related variance of scores generated by the preceptor scale.
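The G studies themselves were run in Genova; as a rough, hypothetical stand-in, the standard ANOVA estimators for a balanced one-facet nested design can be written out directly. The same function covers both the [r:s] analysis of sample A and the [r:p] analysis of sample B, since each has one object of measurement (a site or a preceptor) with two nested raters. The input layout (one inner list of ratings per object) and the function name are assumptions, and truncating a negative object variance at zero is a common convention rather than something the article specifies.

```python
from statistics import fmean

def nested_variance_components(scores):
    """ANOVA estimates for a balanced raters-nested-within-object design.

    scores: one inner list per object of measurement (site or preceptor),
    each holding the same number (>= 2) of ratings from different raters.
    Returns (var_object, var_raters_within_object).
    """
    n_obj = len(scores)
    if n_obj < 2:
        raise ValueError("need at least two objects of measurement")
    n_r = len(scores[0])
    if n_r < 2 or any(len(row) != n_r for row in scores):
        raise ValueError("need a balanced design with at least two raters per object")
    means = [fmean(row) for row in scores]
    grand = fmean(means)
    # Mean square between objects: E[MS_obj] = n_r * var_obj + var_within
    ms_obj = n_r * sum((m - grand) ** 2 for m in means) / (n_obj - 1)
    # Mean square for raters within objects: E[MS_within] = var_within
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n_obj * (n_r - 1))
    var_within = ms_within
    # Solve the expected-mean-square equations; negative estimates are set to 0
    var_obj = max((ms_obj - ms_within) / n_r, 0.0)
    return var_obj, var_within

v_obj, v_within = nested_variance_components([[4, 6], [5, 7], [8, 10]])
share = v_obj / (v_obj + v_within)  # proportion of single-rating variance due to the object
```

For this toy input the object variance is 10/3, the within-object variance is 2, and the single-rating share is 0.625.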

A comparison of the object-of-measurement variances from the two G studies provided an empirical method for isolating and gauging the proportion of variance attributable to the site. This was possible because the analysis of sample A provided an estimate of the variance associated with the site; as stated in equation 1, the sample A object-of-measurement variance ($\sigma^2_A$) is an estimate of the site variance ($\sigma^2_{\text{site}}$).

$$\sigma^2_A = \sigma^2_{\text{site}} \tag{1}$$

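The object of measurement in sample B provided the summed contribution of site, preceptor, and interaction influences on the preceptor rating. Because the variances of independent effects are additive, these contributions to the sample B object-of-measurement variance ($\sigma^2_B$) can be represented as equation 2.

$$\sigma^2_B = \sigma^2_{\text{site}} + \sigma^2_{\text{preceptor}} + \sigma^2_{\text{site} \times \text{preceptor}} \tag{2}$$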

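Therefore, the proportion of variance related to site can be estimated as the ratio of the object-of-measurement variances in analyses A and B, as shown in equation 3.

$$\frac{\sigma^2_A}{\sigma^2_B} = \frac{\sigma^2_{\text{site}}}{\sigma^2_{\text{site}} + \sigma^2_{\text{preceptor}} + \sigma^2_{\text{site} \times \text{preceptor}}} \tag{3}$$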
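A small simulation can illustrate why the ratio in equation 3 recovers the site share. The sketch below assumes a simple additive model (score = site effect + preceptor effect + rating error, all independent and normal) with hypothetical variance components chosen to mirror the reported proportions; they are not estimates from MedEd IQ data, and `simulate` is an illustrative name. Because each preceptor teaches at only one site, the preceptor main effect and the site-by-preceptor interaction are lumped into one term, just as they are confounded in sample B.

```python
import random
from statistics import fmean

def nested_object_variance(scores):
    """ANOVA estimate of object-of-measurement variance for a balanced
    raters-nested-within-object design (rows = objects, columns = raters).
    Not truncated at zero, so sampling noise stays visible."""
    n_obj, n_r = len(scores), len(scores[0])
    means = [fmean(row) for row in scores]
    grand = fmean(means)
    ms_obj = n_r * sum((m - grand) ** 2 for m in means) / (n_obj - 1)
    ms_within = sum((x - m) ** 2 for row, m in zip(scores, means) for x in row) / (n_obj * (n_r - 1))
    return (ms_obj - ms_within) / n_r

def simulate(n_sites, var_site, var_prec, var_err, seed=0):
    """Return (sigma2_A, sigma2_B) estimated from simulated sample A and B data.

    var_prec lumps the preceptor main effect and the site-by-preceptor
    interaction, which cannot be separated when preceptors are nested in sites."""
    rng = random.Random(seed)
    sd_s, sd_p, sd_e = var_site ** 0.5, var_prec ** 0.5, var_err ** 0.5
    sample_a, sample_b = [], []
    for _ in range(n_sites):
        site = rng.gauss(0.0, sd_s)
        # Sample A: two different preceptors at this site, each rated once
        sample_a.append([site + rng.gauss(0.0, sd_p) + rng.gauss(0.0, sd_e) for _ in range(2)])
        # Sample B: one preceptor at this site, rated by two different students
        prec = rng.gauss(0.0, sd_p)
        sample_b.append([site + prec + rng.gauss(0.0, sd_e) for _ in range(2)])
    return nested_object_variance(sample_a), nested_object_variance(sample_b)

var_a, var_b = simulate(100_000, var_site=0.14, var_prec=0.17, var_err=0.69)
print(f"sigma2_A ~ {var_a:.3f}, sigma2_B ~ {var_b:.3f}, site share ~ {var_a / var_b:.2f}")
```

Under these assumptions, sample A estimates roughly the site variance (about 0.14), sample B estimates roughly the site-plus-preceptor variance (about 0.31), and their ratio lands near 0.45. With only a few hundred sites, as in the actual samples, each estimate would be considerably noisier.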


The results of the two generalizability analyses are displayed in Table 1. The analysis of sample A, shown in the top half of the table, reveals that for a single rating, approximately 14% of the score variance was attributable to influences of the site. The bottom portion of the table, displaying the results of the analysis of sample B, suggests that the combined influence of preceptor, site, and interaction on a single preceptor rating represented approximately 31% of the variance. The proportion of site-related variance is estimated, using equation 3, as .14/.31 or 45%.
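The arithmetic of equation 3 with the reported single-rating proportions can be checked in a few lines. The helper `site_share` is hypothetical; it divides the two proportions directly, which matches the ratio of the underlying variance components only if the total single-rating variance is similar in the two samples, an assumption of this sketch.

```python
def site_share(prop_a: float, prop_b: float) -> float:
    """Equation 3 applied to single-rating proportions of object-of-measurement
    variance: sample A (site only) over sample B (site + preceptor + interaction)."""
    if prop_b <= 0:
        raise ValueError("prop_b must be positive")
    return prop_a / prop_b

share = site_share(0.14, 0.31)
# Remainder is attributed to preceptor and site-by-preceptor interaction combined
print(f"site: {share:.1%}, preceptor + interaction: {1 - share:.1%}")
# prints "site: 45.2%, preceptor + interaction: 54.8%"
```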

Table 1: Results of a G Study of Sample A and Sample B


The relevance of the clinical site as a contributor to instructional quality has been debated. Researchers have focused on the assessment of teacher and site characteristics, particularly in academic health centers, but with an emphasis on teaching rather than learning.7,8,9,10 These studies suggested that the teaching environment had little impact on student perceptions of effective teaching. However, this finding contrasts with what has been found in primary care settings, where the environment has been shown to be an important contributor to learning.11,12,13,14

Our study is the first to demonstrate objectively, in quantitative terms, the relationship between clinical teacher and site of instruction. The performance of a preceptor, as gauged by ratings on the MedEd IQ preceptor scale, is related to the clinical site in which the preceptor practices: slightly less than half (approximately 45%) of the true score differences between preceptors can be attributed to the clinical site. This implies that a preceptor's score should be interpreted in reference to the site in which the ratings took place. Preceptors practicing within a poorly rated site may be more likely to receive lower ratings than preceptors teaching in a highly rated site. It may therefore be necessary to statistically adjust a preceptor's mean score based on results from the site-related subscales to obtain a more accurate, site-independent assessment of the preceptor's performance. Further research is needed to determine the optimal procedure for such an adjustment.
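The adjustment procedure is left open here. Purely as a hypothetical illustration, and not as the authors' method, one simple option is residualization: regress preceptor mean scores on a site composite (for example, the mean of the three site-related subscales, which is an assumption) by ordinary least squares and report each preceptor's residual, re-centered on the overall preceptor mean. `adjust_for_site` and its inputs are illustrative names.

```python
from statistics import fmean

def adjust_for_site(preceptor_means, site_scores):
    """Hypothetical site adjustment by OLS residualization.

    preceptor_means: mean preceptor-scale score for each preceptor.
    site_scores: a site composite score for the site where each preceptor
    was rated (same order). Returns preceptor scores with the linear site
    trend removed, re-centered on the overall preceptor mean.
    """
    if len(preceptor_means) != len(site_scores) or len(preceptor_means) < 2:
        raise ValueError("need two or more paired preceptor and site scores")
    mean_site = fmean(site_scores)
    mean_prec = fmean(preceptor_means)
    sxx = sum((x - mean_site) ** 2 for x in site_scores)
    if sxx == 0:
        raise ValueError("site scores have no variance; nothing to adjust for")
    sxy = sum((x - mean_site) * (y - mean_prec) for x, y in zip(site_scores, preceptor_means))
    slope = sxy / sxx  # OLS slope of preceptor mean on site composite
    # y - slope * (x - mean_site) equals the OLS residual plus mean_prec
    return [y - slope * (x - mean_site) for x, y in zip(site_scores, preceptor_means)]

adjusted = adjust_for_site([5.0, 5.5, 6.5], [4.0, 5.0, 6.0])
```

This design choice removes every linear association with site scores, including any that reflects preceptors who shape their own sites, a trade-off that any adjustment procedure would need to weigh.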

The usefulness of these results depends on the degree to which our sample is representative of the larger population of MedEd IQ users. Several potential sources of bias may have affected the results. For instance, our two random samples of sites were selected to maximize the number of observations in order to produce more stable estimates. In choosing this strategy, however, sample A included a higher proportion of larger clinic sites than sample B, a product of our selection criterion for sample A, which included any site with two or more preceptors. To investigate possible systematic bias, we examined the mean preceptor ratings in the two samples, which were nearly identical (5.141 for sample A and 5.147 for sample B). In addition, sample A contained approximately 70% of the same sites as sample B. Systematic biasing influences therefore seem unlikely. It is possible that the site-by-preceptor interaction term differed somewhat among the roughly 30% of sites unique to sample B, but the magnitude of any such bias is unlikely to be large, given that the difference would be present in only 30% of the observations.

Our choice to maximize the number of observations may also have introduced another potential source of bias. Because both first- and third-year medical students were included in our sample, rating styles may differ somewhat between these two student populations, and the results within the two subpopulations may vary. However, an informal follow-up analysis using only third-year students showed results very similar to those obtained in the combined sample.

These findings also have important implications for the faculty development of clinical preceptors. Enthusiastic clinical teachers are most effective when their clinical sites can adapt to the educational mission for learners. While we suspect that many of our clinicians in private practice may be able to control these variables, some ambulatory offices may not be able to adapt to the learner. Further research is under way to assess the types of learning environments that make clinical instruction more effective.


1. Council on Graduate Medical Education. Physician Education for a Changing Health Care Environment. Thirteenth Report. Rockville, MD: US DHHS HRSA, March 1999.
2. Bordage G, Burack JH, Irby DM, Stritter FT. Education in ambulatory settings: developing valid measures of educational outcomes, and other research priorities. Acad Med. 1998;73:743–50.
3. Shipengrover JA, James PA. Measuring instructional quality in community-oriented medical education: looking into the black box. Med Educ. 1999;33:846–53.
4. James P, Osborne J. A measure of medical instructional quality in ambulatory settings: the MedIQ. Fam Med. 1999;31:263–9.
5. James PA, Kreiter CD, Shipengrover J, Crosson J. Identifying the attributes of instructional quality in ambulatory teaching sites: a validation study of the MedEd IQ. Fam Med. In press.
6. Brennan RL. Generalizability Theory: Statistics for Social Science and Public Policy. New York: Springer-Verlag, 2001.
7. Irby DM. Clinical teacher effectiveness in medicine. J Med Educ. 1978;53:808–15.
8. Irby DM, Ramsey PG, Gillmore GM, Schaad D. Characteristics of effective clinical teachers of ambulatory care medicine. Acad Med. 1991;66:54–5.
9. MacDonald PJ, Bass MJ. Characteristics of highly rated family practice preceptors. J Med Educ. 1983;58:882–93.
10. Usatine R, Nguyen K, Randall J, Irby D. Four exemplary preceptors' strategies for efficient teaching in managed care settings. Acad Med. 1997;72:766–9.
11. Biddle WB, Riesenberg LA, Darcy PA. Medical students' perceptions of desirable characteristics of primary care teaching sites. Fam Med. 1996;28:629–33.
12. Gruppen LD. Implications of cognitive research for ambulatory care education. Acad Med. 1997;72:117–20.
13. Jacobson E, Keough W, Dalton B, Giansiracusa D. A comparison of inpatient and outpatient experiences during an internal medicine clerkship. Am J Med. 1998;104:159–62.
14. Prislin MD, Fitzpatrick C, Munzing T, Oravetz P, Radecki S. Ambulatory primary care medical education in managed care and non-managed care settings. Fam Med. 1996;28:478–83.
© 2002 by the Association of American Medical Colleges