The traditional organization of medical education into distinct clinical rotations, years, and phases (i.e., undergraduate and graduate or postgraduate) runs counter to the longitudinal model of competency-based medical education.1 Providing information about learners’ performance across clinical rotations or phases, often referred to as forward-feeding or educational handover, may help to mitigate educational discontinuity and may benefit learners, programs, and patients.2,3 Learners could benefit from earlier provision of tailored (i.e., individualized) learning opportunities, graded supervision, and targeted feedback from supervisors.2–6 Programs and patients could benefit from faster identification and management of learner weaknesses, which could include failing or even excluding incompetent learners when necessary.2,3,5–7

Despite these promising benefits, adopting educational handover carries potential risks. Providing raters with information about learners’ past performance could inappropriately influence subsequent assessment. Assessing clinical performance necessitates judgment and involves selectively attending to observable aspects of clinical performance, processing this information, and synthesizing it.8 Receiving information about learners’ past performance could affect what raters pay attention to or how they weigh and interpret what they observe. If the information received in the context of educational handover is inaccurate or irrelevant to subsequent assessments (e.g., if poor performance in a previous rotation was due to resolved personal issues or context-specific factors no longer present in the current rotation), then that information could introduce measurement error into subsequent assessment, jeopardize the validity of assessment, and potentially lead to the development of self-fulfilling prophecies (whereby teachers and learners behave in ways that lead to learner performance conforming to inaccurate expectations).4,6,7,9
Educational and social psychologists have examined whether information about previous performance biases subsequent assessment. In educational psychology, a meta-analysis of experimental studies on bias in grading showed that knowledge of poor previous performance did bias subsequent assessment, with a small to medium effect size10; however, this finding is based on only 3 studies. In social psychology, studies have shown that written reports on previous performance can lead to judgment bias, with medium to large effect sizes.11–13 The differences in effect sizes across the 2 psychology disciplines10–13 could be due to differences in the experience levels of raters, as has been tentatively suggested in educational psychology studies.10 Teachers might be expected to be more experienced raters—and, hence, less prone to biases—than participants in social psychology studies (in 2 of the studies, participants were undergraduate college students12,13; the third study provided no details regarding recruitment11). A recent scoping review of the multidisciplinary literature on providing raters with prior performance information indicated that studies consistently showed the existence of an effect, although the magnitude of the effect varied according to multiple factors. Factors included not only whether the information provided was negative or positive (with larger effect sizes for negative information) but also how positive or negative the information was (with a “dose–response effect”).14 Other factors included rater factors (e.g., experience, mindset, need for cognition) and context factors (e.g., cognitive load).14 Regardless of the effect size, the literature in areas outside of health professions education supports the notion that educational handover does entail some risk; however, experimental studies are scarce, particularly in health professions education contexts. 
As implementation of formal educational handover proceeds,15 clarifying the circumstances in which the benefits may outweigh the risks is crucial, which requires empirical data on the existence and potential magnitude of benefits and harms, as well as on the moderating variables that would maximize benefits and minimize harms.
As a step toward developing empirical evidence, we conducted this study to examine potential bias from educational handover on workplace-based assessment scores in medical education. We hypothesized that supervisors presented with handover reports mentioning weaknesses would provide lower assessment scores and more negative comments than those who did not receive learner reports. We also wanted to examine whether educational handover would influence the focus of feedback since targeted feedback based on areas of weakness is one of its potential benefits. We hypothesized that when the handover report mentioned a specific area of weakness, supervisors would provide more comments targeting that area. Finally, we sought to examine the potential role of rater variables in moderating any effects, specifically experience as a rater and rater mindset. Mindsets, or implicit theories, are beliefs about the malleability of human attributes such as intelligence or moral character.16 People with growth mindsets believe that attributes can develop or evolve, whereas people with fixed mindsets believe that such attributes are unchangeable. One study in organizational psychology found that supervisors with a growth mindset were more likely to detect change in employee performance than those with a fixed mindset.17 We hypothesized that supervisors with a fixed mindset would be more likely to be biased by educational handover than those with a growth mindset.
We conducted this mixed-methods, randomized, controlled, experimental study18 in 2018. Participants all viewed the same 2 videos (in the same order) of simulated resident–patient encounters and assessed residents’ performance. We randomized participants into 3 groups that differed based on educational handover condition: (1) no educational handover report (control group), (2) educational handover report indicating weaknesses in medical expertise, and (3) educational handover report indicating weaknesses in communication. The institutional review board (IRB) of the McGill Faculty of Medicine approved this study (IRB# A02-B08-17B).
Participants and recruitment
Using G*Power 3.1 (Düsseldorf, 2014),19 we calculated that a total sample size of 69 clinical supervisors would enable the detection of a moderate effect size of 0.3, with an alpha level of 0.05, and power of 0.8, assuming low correlations between measures of 0.2. Given the relatively large sample size needed, we designed the scenarios to be applicable to the educational context of several generalist-oriented residency training programs. To recruit participants, the principal investigator (V.D.) emailed program directors in family medicine, surgery, internal medicine, emergency medicine, and pediatrics at McGill University, asking them to circulate the invitation email to clinical supervisors in their programs. While the exact population size is difficult to determine given our recruitment approach, program directors reported circulating the email to, collectively, approximately 700 supervisors (although this number includes supervisors who may have been contacted by several program directors). The principal investigator also presented the study at departmental and educational meetings in the relevant departments. To avoid biasing recruitment and cueing potential participants, these oral presentations of the study and invitation emails were intentionally vague about the study purpose; however, the emails and descriptions did mention that the study focused on improving the medical education community’s understanding of workplace-based assessment.
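The reported sample size can be approximately reconstructed from first principles. Under G*Power’s convention for the between-groups effect in a repeated-measures design, Cohen’s f is inflated by the number of repeated measurements m and their assumed correlation ρ before computing power from the noncentral F distribution. The sketch below is an approximation under those assumed conventions, not a reproduction of the authors’ exact G*Power session; small discrepancies from the reported N of 69 are expected.

```python
from scipy.stats import f as f_dist, ncf

def rm_anova_between_power(n_total, k=3, m=2, f_effect=0.3, rho=0.2, alpha=0.05):
    """Approximate power for the between-groups effect of a repeated-measures
    ANOVA, inflating Cohen's f by the number of measurements m and their
    correlation rho (assumed G*Power-style convention)."""
    f_adj_sq = f_effect**2 * m / (1 + (m - 1) * rho)  # adjusted f^2
    lam = n_total * f_adj_sq                           # noncentrality parameter
    df1, df2 = k - 1, n_total - k
    f_crit = f_dist.ppf(1 - alpha, df1, df2)
    return 1 - ncf.cdf(f_crit, df1, df2, lam)

# Smallest total N reaching 80% power under these assumptions
n = 6  # start where df2 = n - k is positive
while rm_anova_between_power(n) < 0.80:
    n += 1
print(n, round(rm_anova_between_power(n), 3))
```

This approximation lands close to the reported total of 69 participants; exact agreement depends on G*Power’s internal conventions for the noncentrality parameter and degrees of freedom.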
Two investigators who are experienced clinical educators (L.H.P. and D.D.) drafted 2 different scripts, each describing a resident taking a history from a patient. The resident and patient were unique to each script; however, both scripts depicted a 17-year-old female patient who might present in either adult or pediatric settings, which enabled us to recruit participants from several disciplines, including pediatrics. We chose clinical presentations (neck lump and acute abdominal pain) that would be relevant to the range of programs selected for participant recruitment. To reduce potential gender effects, the 2 scripts involved (different) male residents. In both scripts, the resident performed at a level considered average for the end of year 1 or beginning of year 2 of residency, and in both scripts, the resident displayed minor weaknesses in medical expertise as well as in communication. A third investigator (B.-A.C.) provided feedback on the draft scripts, leading to revisions. Two clinical educators (from the Department of Medicine and the Department of Pediatrics) who were not involved in the study read the revised scripts and assessed the fictitious residents (providing scores and comments using the assessment instrument described below) to ensure (1) that the scripts portrayed the intended performance level and characteristics and (2) that the scripts were realistic. Their comments were consistent with the intended strengths and weaknesses, and scores were in the intended average range (average score of 5.7/9 for Script 1 [neck lump] and 3.8/9 for Script 2 [abdominal pain]). We therefore proceeded with filming, using the 2 scripts with no further revisions.
Educational handover reports.
In 2 of the 3 conditions, we provided participants with educational handover reports before they viewed each video. The first author (V.D.) drafted the reports and 3 clinical educators on the team (L.H.P., D.D., and B.-A.C.) provided feedback. The reports were in the format of traditional In-Training Assessment Reports (previously referred to as In-Training Evaluation Reports or ITERs). The reports included 2 items for each competency domain (CanMEDS framework20) and provided a 4-point rating scale (1 = Unsatisfactory, 2 = Borderline, 3 = Satisfactory, 4 = Superior) along with a “could not judge” option. This format is similar to the assessment forms used in residency programs at McGill University, so we felt it would be familiar to study participants. Each report also provided comments in a free-text box. In the report, all items were scored as satisfactory, and weaknesses were described only in the free-text comments, which we felt would reflect common practice for minor weaknesses.21 The content of comments differed in each condition, suggesting minor weaknesses either in medical expertise (e.g., synthesizing information, line of questioning) or in communication (e.g., shy, awkward, distant). Other comments were identical for both conditions in which we provided educational handover information. See Supplemental Digital Appendix 1 at http://links.lww.com/ACADMED/A980.
Participants assessed the simulated residents using the mini-Clinical Evaluation Exercise (mini-CEX) form.22 The mini-CEX form has garnered validity evidence as a tool to assess competence from direct observations in the workplace.23,24 As the videos did not portray physical examination or counseling, we did not include those 2 items on the form. This shortened form prompted participants to score each resident’s performance on 5 items (quantitative data) using a 9-point scale (1–3 = Unsatisfactory, 4–6 = Satisfactory, 7–9 = Superior) and to provide narrative comments on the performance (qualitative data). See Supplemental Digital Appendix 2 at http://links.lww.com/ACADMED/A980.
Survey on participant characteristics.
Participants completed a questionnaire, allowing us to gather information on the following: demographic variables (age, gender), clinical and educational variables (specialty, years of experience supervising, years of experience assessing), and mindset. For mindset, we used items based on Dweck and colleagues’ original instrument to measure mindsets regarding intelligence,16 as well as items adapted to measure mindsets regarding clinical reasoning and empathy.25 In a previous pilot study of clinical supervisors from other specialties, the instruments to measure mindsets about clinical reasoning and empathy had high internal consistency, moderate test–retest reliability, and evidence of divergent validity (i.e., measures for intelligence, clinical reasoning, and empathy had low intercorrelations, while measures for empathy and moral character were strongly correlated).25
The recruitment email contained basic information about the study and a link to the online platform, LimeSurvey (LimeSurvey GmbH, Hamburg, www.limesurvey.org). The link randomly provided 1 of 3 further links, each of which led to one version of the experiment, thus creating 3 random groups each with a different experimental condition.
Each of the links provided the same information about the study and a question to indicate consent. As during recruitment, we withheld details (partial deception) about the specific purpose of the study (examining the influence of educational handover) and the randomization. After participants completed the online study, a debriefing page provided a full explanation regarding the study purpose and methods, including information regarding the initial partial deception. Participants could then reaffirm or withdraw their consent immediately, or they could request a delay to consider their response. We asked the participants who wished to receive an honorarium (an Amazon gift card worth 100 Canadian dollars) to provide their email address. We separated the files containing participant email addresses and questionnaire responses.
Narrative data transformation.
We performed content analysis on the narrative comments,18,26 which involved breaking down participants’ comments into units of meaning and coding each unit. In the first step, 2 independent coders (S.T.G. and N.E.P.), blinded to participant condition, began by familiarizing themselves with the participants’ comments on each resident’s performance. The coders then independently reviewed 10 quotations about Video 1, breaking down each quotation into units of meaning and inductively assigning a single code to every unit. They then discussed their coding strategy until they reached consensus. They consulted a third coder (V.D.) for any discrepancies. This iterative process continued in batches until they had coded all the quotations for both videos. Following this, the third coder (V.D.) reviewed all the coded quotations, and the 3 coders discussed the codes until they reached a final consensus. To quantify the qualitative data, we then categorized codes deductively, as positive or negative comments, and as pertaining to medical expertise, communication, or other. Finally, we counted the number of comments in each coding category. We used Atlas.ti software (2018; Scientific Software Development GmbH, Berlin, Germany) to perform the data coding and count the number of comments in each category for each participant.
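The quantification step — tallying coded units by valence and by competency domain, then converting counts to percentages — can be sketched as follows. The category labels and example units are purely illustrative (the study performed this step in Atlas.ti).

```python
from collections import Counter

# Each coded unit of meaning carries a valence and a competency domain
# (illustrative data, not drawn from the study).
coded_units = [
    ("negative", "communication"),
    ("positive", "medical expertise"),
    ("negative", "medical expertise"),
    ("positive", "communication"),
    ("negative", "communication"),
    ("positive", "other"),
]

valence = Counter(v for v, _ in coded_units)   # positive vs negative
domain = Counter(d for _, d in coded_units)    # expertise / communication / other
total = len(coded_units)

pct_negative = 100 * valence["negative"] / total
pct_communication = 100 * domain["communication"] / total
print(f"{pct_negative:.0f}% negative, {pct_communication:.0f}% communication")
```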
We first conducted descriptive analyses of participant characteristics, the mini-CEX item scores for each resident, and the quantified qualitative data for each resident (as described above; i.e., percentage of negative and positive comments and percentage of medical-expertise-related and communication-related comments). Next, we examined differences between participant characteristics in each condition using one-way analyses of variance (ANOVAs) for continuous variables (age, number of years supervising residents, number of years assessing residents), and chi-square analyses for categorical variables (gender, specialty, and mindsets). We considered P < .05 to be statistically significant.
We examined differences in the overall scores of the residents for each video in each condition using a repeated-measures ANOVA, in which the main effect was experimental condition (control, communication weaknesses, or medical expertise weaknesses), and the within-subjects effect was video (Video 1 or Video 2). We compared the percentages of positive and negative comments and the percentages of medical-expertise-related and communication-related comments between experimental conditions using one-way ANOVAs. Due to restrictions in our coding software, we were unable to analyze the narrative data by video. For significant findings, we conducted post hoc Tukey tests.
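The between-condition comparison of comment percentages can be illustrated with a one-way ANOVA on per-participant percentages. The numbers below are fabricated for demonstration only (the study’s actual values appear in Table 2); the three lists stand in for the control, medical expertise, and communication conditions.

```python
from scipy.stats import f_oneway

# Illustrative per-participant percentages of communication-related comments
# in each condition (fabricated numbers; real data are reported in Table 2).
control       = [48, 52, 50, 47, 55, 49, 51]
med_expertise = [46, 50, 49, 44, 53, 48, 47]
communication = [62, 65, 60, 66, 59, 64, 63]

f_stat, p_value = f_oneway(control, med_expertise, communication)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```

With a significant omnibus F, pairwise post hoc comparisons (Tukey tests, as in the study) would then identify which conditions differ.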
We examined the influence of 3 moderating variables—gender, mindset, and experience assessing residents—on overall assessment scores by conducting separate repeated-measures ANOVAs, in which experimental condition and moderating variable were the main effects; the within-subjects effect was Video; and the effect of interest was the interaction between experimental condition and moderating variable. We transformed the continuous variable for experience assessing residents into 3 categories (i.e., less than 5 years, 5–10 years, and more than 10 years).
We conducted all analyses using IBM SPSS for Windows versions 24 and 25 (IBM Corp., Armonk, New York, 2016).
Seventy-three clinical supervisors completed the study. Of these, the 72 who reaffirmed consent after viewing the debriefing information were included in the study. Although we randomly allocated participants to experimental groups, the resulting group sizes were unequal: 21 were in the control group, 21 in the group receiving educational handover indicating weaknesses in medical expertise, and 30 in the group receiving educational handover indicating weaknesses in communication. We detected no differences in demographic characteristics, rater experience, or mindset across the 3 experimental groups. Few participants had fixed mindsets for empathy (n = 12, 17%), and none had fixed mindsets for clinical reasoning. See also Table 1.
Overall mini-CEX scores were in the intended average range (Video 1: 5.1/9, 95% confidence interval or CI [4.8, 5.5]; Video 2: 4.9/9, 95% CI [4.6, 5.2]). Overall mean scores across the 2 videos for each condition were as follows: control 5.2, 95% CI [4.6, 5.7]; medical expertise 5.0, 95% CI [4.4, 5.6]; and communication 4.9, 95% CI [4.5, 5.3]. We detected no effect for educational handover report (F(2, 69) = 0.31, P = .74), no effect for video (F(1, 69) = 2.28, P = .14), and no interaction effect between video and handover condition (F(2, 69) = 1.90, P = .16). See also Table 2.
In the moderating variable analyses, we found no interaction effects for gender, assessment experience, or mindset (see Supplemental Digital Appendix 3 at http://links.lww.com/ACADMED/A980). Because no participant had a fixed mindset for clinical reasoning, we were able to examine only the influence of empathy-related mindset, comparing overall assessment scores in the control group versus the group where the handover report mentioned weaknesses in communication; we found no influence of mindset.
We detected no differences in the percentage of negative comments between conditions (F(2, 60) = 0.33, P = .72; see also Table 2). We did, however, note differences between conditions in the percentage of comments targeting medical expertise (F(2, 60) = 10.17, P < .001) and communication (F(2, 60) = 5.60, P = .01). Participants who received an educational handover report indicating weaknesses in communication provided a higher percentage of comments on communication compared with participants in the control group (63% vs 50%, P = .03) and a lower percentage of comments on medical expertise compared with participants in the control group (27% vs 47%, P = .001). Participants who received an educational handover report indicating weaknesses in medical expertise provided similar percentages of comments as participants in the control group, both on medical expertise (46% vs 47%, P = .98) and on communication (48% vs 50%, P = .95).
Educational handover may reduce educational discontinuity and has potential advantages for learners, programs, and patients,2–7 but this forward-feeding may also carry risks.4,6,7,9 The controversy surrounding educational handover prompted us to examine the influence of educational handover reports on rater scores and narrative comments using an experimental design.
To investigate the potential benefits of educational handover, we examined whether providing raters with information about learners’ previous performance would encourage feedback targeting learners’ described weaknesses. To this end, we analyzed the content of narrative comments contained in the assessment forms across 3 experimental conditions. Raters who had received information about learner weaknesses in medical expertise did not provide more comments on expertise than raters who had not received a handover report; however, those who had received information suggesting weaknesses in communication did provide a higher percentage of comments specifically targeting communication. In a way, providing an educational handover report suggesting weaknesses in communication “biased” rater comments toward the area of described weakness; that is, raters providing more comments in a targeted area could be considered a positive, and intended, bias of rater behavior.
We can only speculate as to why the effect depended on the competency domain. One hypothesis is that raters may be more prone to commenting on medical expertise than other competencies since they attend to it more or, perhaps, because they feel more comfortable commenting on medical expertise. This comfort or proclivity may produce a ceiling effect for comments in the medical expertise domain; however, evidence suggests that raters do frequently provide comments about performance in other competencies.27,28 Participants randomized to the control group, who did not receive any information regarding prior performance, provided comments evenly split between medical expertise and communication. Nonetheless, there may have been more room to increase raters’ attention to and willingness to comment on communication (compared with medical expertise), thereby allowing us to detect an increase in comments related to communication when it was a weakness identified in the educational handover report.
While providing more targeted feedback can be educationally beneficial, we also investigated the potential risks of educational handover. Specifically, we compared the scores that raters attributed to the 2 simulated residents on a modified mini-CEX form across experimental groups. We found no significant differences in assessment scores and no differences in the percentage of negative versus positive narrative comments across the 3 experimental groups. Our findings contradict work done in other domains such as educational and social psychology,10–14 and they provide initial reassurance concerning the influence of educational handover on subsequent assessment.
Notably, however, an experimental study conducted concurrently in health professions education by another research team showed that educational handover did influence subsequent scores.29 These contrary findings highlight the need for further research to determine in what circumstances educational handover results in undesirable effects. Specifically, further work is needed to investigate whether different content or wording in educational handover reports would yield different results. In this study, we designed the educational handover reports to depict average learners with typical weaknesses for their level of training. The reports presented all the rating scale scores at the “satisfactory” level and identified learner weaknesses only in the comments section. Given the preliminary nature of our study, we chose to examine the influence of what we considered to be typical educational handover reports. While typical learners likely represent the majority, the medical education community needs reassurance that assessment is free from unintended bias for all learners, including the minority of struggling or excelling learners. Shaw and colleagues’ study, which focused on more extreme levels of performance, showed that in such situations, educational handover did influence assessment scores.29 Moreover, despite our efforts to write realistic comments, real handover reports might differ in their wording or content (e.g., in terms of including more personal criticism or praise), and this remains an area for future research.
The residents depicted in this study were both Caucasian men; therefore, we cannot determine whether the influence of educational handover would differ across gender or for learners from traditionally stigmatized groups. The psychology literature suggests that biasing effects, including negative social stereotypes, are additive.13 The literature on self-fulfilling prophecies in education also suggests that effects may be compounded for learners from stigmatized groups.30 Learners from stigmatized groups may therefore be at higher risk of bias in assessment. Further studies should examine the influence of negative stereotypes in combination with educational handover reports.
Finally, the influence of positive educational handover reports on subsequent assessments merits scrutiny. Although opponents to educational handover reports frequently raise the issue of bias from reports describing learners as poor performers (and the potential for decreasing subsequent scores, stigmatizing learners, and generating detrimental self-fulfilling prophecies), they seldom raise the issue of bias from reports describing learners as excellent performers. Harms from negative reports could be mitigated in a culture imbued with a growth mindset wherein learners perceived to be struggling could benefit from more teaching without the stigma associated with remediation. However, medical education systems should also be concerned—arguably more so—by the risks of positive reports potentially leading to supervisors assuming learners to be more competent than they are and thus risking the safety of patients, particularly in a system where learner competence can be “checked-off” and never revisited. A scoping review of the literature as well as Shaw and colleagues’ experimental study in health professions education showed a “negativity effect,” suggesting that positive educational handover reports would have less influence on subsequent assessments than negative ones.14,29
Our study has inherent strengths and limitations due to its experimental design. Supervisors who agreed to spend their time participating in an educational study may be more likely to have an interest in education, and/or they may have more formal training or experience in supervision and assessment than those who chose not to participate. We have no information about the characteristics of those who did not respond to the invitation to participate, so we are unable to determine whether participants differed substantially from the rest of our study population. Notably, however, in comparison with a shorter study we conducted at the same institution, which recruited from different clinical specialties, our participants were less likely to have fixed mindsets than others who completed the same questionnaire items (0% vs 8% of participants had fixed mindsets for clinical reasoning, and 17% vs 45% for empathy).25 While this difference could be due to variation across specialties, it could also suggest stronger selection bias. Furthermore, participants, aware that this was a research project, may have assessed the residents in the videos differently than they would actual residents in a live educational context. Although uncommon, some of the participants’ narrative comments were addressed to the research team rather than to the learners or a receiving educational program, suggesting that the study context did influence the behavior of at least a few participants. Still, we did not tell participants before the study that its purpose was to examine the influence of educational handover reports, so they would have had little reason to deliberately avoid being influenced by the handover report to conform to a perceived researcher agenda.
As previously mentioned, our study was limited in scope to include only learners with minor weaknesses and learners who were not from visible minorities or underrepresented groups. We would caution against overgeneralizing our findings and reiterate our call for further work in different contexts.
Finally, our study is limited in that we did not measure the accuracy of assessment scores; rather, we assumed that our control group provided accurate assessments. Under this assumption, any difference from the control group’s assessment scores (“bias”) would represent error, or reduced accuracy.
This study is one of the first to provide empirical evidence on the potential influence of educational handover reports on subsequent rater-based assessment. Although the findings require replication and extension to a more diverse body of learners and a broader spectrum of reported performance levels, educational handover reports may generate more targeted feedback in some competency domains without influencing subsequent assessment scores.
The authors would like to acknowledge the contributions of Dr. Robert Sternszus and Dr. Linda Snell who reviewed the initial scripts, scored resident performances depicted in the scripts, and provided feedback on the scripts; the Steinberg Centre for Simulation and Interactive Learning (SCSIL) at McGill University which supported the development of the 2 videos by providing a simulated medical evaluation room and by dedicating staff both to hire and train the 4 actors and to film and edit the video clips; the department chairs and program directors who facilitated recruitment; and the clinical supervisors who took part.
1. Nousiainen MT, Caverzagie KJ, Ferguson PC, Frank JR; ICBME Collaborators. Implementing competency-based medical education: What changes in curricular structure and processes are needed? Med Teach. 2017;39:594–598.
2. Cleary L. “Forward feeding” about students’ progress: The case for longitudinal, progressive, and shared assessment of medical students. Acad Med. 2008;83:800.
3. Warm EJ, Englander R, Pereira A, Barach P. Improving learner handovers in medical education. Acad Med. 2017;92:927–931.
4. Gold WL, McArdle P, Federman DD. Should medical school faculty see assessments of students made by previous teachers? Acad Med. 2002;77:1096–1100.
5. Cohen GS, Blumberg P. Investigating whether teachers should be given assessments of students made by previous teachers. Acad Med. 1991;66:288–289.
6. Frellsen SL, Baker EA, Papp KK, Durning SJ. Medical school policies regarding struggling medical students during the internal medicine clerkships: Results of a national survey. Acad Med. 2008;83:876–881.
7. Ziring D, Danoff D, Grosseman S, et al. How do medical schools identify and remediate professionalism lapses in medical students? A study of U.S. and Canadian medical schools. Acad Med. 2015;90:913–920.
8. Gauthier G, St-Onge C, Tavares W. Rater cognition: Review and integration of research findings. Med Educ. 2016;50:511–522.
9. Cox SM. “Forward feeding” about students’ progress: Information on struggling medical students should not be shared among clerkship directors or with students’ current teachers. Acad Med. 2008;83:801.
10. Malouff JM, Thorsteinsson EB. Bias in grading: A meta-analysis of experimental research findings. Aust J Educ. 2016;60:245–256.
11. Reilly SP, Smither JW, Warech MA, Reilly RR. The influence of indirect knowledge of previous performance on ratings of present performance: The effects of job familiarity and rater training. J Bus Psychol. 1998;12:421–435.
12. Smither JW, Reilly RR, Buda R. Effect of prior performance information on ratings of present performance: Contrast versus assimilation revisited. J Appl Psychol. 1988;73:487–496.
13. Nieminen LR, Rudolph CW, Baltes BB, Casper CM, Wynne KT, Kirby LC. The combined effect of ratee’s bodyweight and past performance information on performance judgments. J Appl Soc Psychol. 2013;43:527–543.
14. Humphrey-Murto S, LeBlanc A, Touchie C, et al. The influence of prior performance information on ratings of current performance and implications for learner handover: A scoping review. Acad Med. 2019;94:1050–1057.
15. Royal College of Physicians and Surgeons of Canada. Guidelines for educational handover in competence by design (CBD). September 2018. Endorsed by the Royal College Committee on Specialty Education May 4, 2018 (Resolution No. 2018-4; CSE: 2018-05-03). http://www.royalcollege.ca/rcsite/documents/cbd/guidelines-for-educational-handover-in-competence-by-design-e. Accessed May 27, 2020.
16. Dweck CS, Chiu CY, Hong YY. Implicit theories and their role in judgments and reactions: A world from two perspectives. Psychol Inq. 1995;6:267–285.
17. Heslin PA, Latham GP, VandeWalle D. The effect of implicit person theory on performance appraisals. J Appl Psychol. 2005;90:842–856.
18. Creswell JW, Plano Clark VL. Designing and Conducting Mixed Methods Research. 3rd ed. Los Angeles, CA: SAGE Publications; 2017.
19. Faul F, Erdfelder E, Lang AG, Buchner A. G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods. 2007;39:175–191.
20. Frank JR, Snell L, Sherbino J. CanMEDS 2015 Physician Competency Framework. Ottawa: Royal College of Physicians and Surgeons of Canada. http://www.royalcollege.ca/rcsite/documents/canmeds/canmeds-full-framework-e.pdf. Published 2015. Accessed May 27, 2020.
21. Watling CJ, Kenyon CF, Schulz V, Goldszmidt MA, Zibrowski E, Lingard L. An exploration of faculty perspectives on the in-training evaluation of residents. Acad Med. 2010;85:1157–1162.
22. Norcini JJ, Blank LL, Duffy FD, Fortna GS. The mini-CEX: A method for assessing clinical skills. Ann Intern Med. 2003;138:476–481.
23. Pelgrim EA, Kramer AW, Mokkink HG, van den Elsen L, Grol RP, van der Vleuten CP. In-training assessment using direct observation of single-patient encounters: A literature review. Adv Health Sci Educ Theory Pract. 2011;16:131–142.
24. Kogan JR, Holmboe ES, Hauer KE. Tools for direct observation and assessment of clinical skills of medical trainees: A systematic review. JAMA. 2009;302:1316–1326.
25. Pal NE, Young M, Danoff D, et al. Teachers’ mindsets in medical education: A pilot survey of clinical supervisors. Med Teach. 2020;42:291–298.
26. Hsieh HF, Shannon SE. Three approaches to qualitative content analysis. Qual Health Res. 2005;15:1277–1288.
27. Ginsburg S, Gold W, Cavalcanti RB, Kurabi B, McDonald-Blumer H. Competencies “plus”: The nature of written comments on internal medicine residents’ evaluation forms. Acad Med. 2011;86(10 suppl):S30–S34.
28. Jackson JL, Kay C, Jackson WC, Frank M. The quality of written feedback by attendings of internal medicine residents. J Gen Intern Med. 2015;30:973–978.
29. Shaw TW, Wood TJ, Touchie C, Pugh D, Humphrey-Murto S. How biased are you? The effect of prior performance information on attending physician ratings and implications for learner handover. Adv Health Sci Educ Theory Pract. In press.
30. Jussim L, Harber KD. Teacher expectations and self-fulfilling prophecies: Knowns and unknowns, resolved and unresolved controversies. Pers Soc Psychol Rev. 2005;9:131–155.