There has been a recent increase in the use of narrative comments in assessment to document medical trainees’ performance and make high-stakes decisions about their progression. 1–7 The integration of qualitative data in assessments is being facilitated, or even driven, by technological developments 8 and a better understanding of what words can contribute to assessment and how. 9,10 Unfortunately, assessors face challenges when required to provide qualitative feedback about a trainee’s performance. 1,2,6,11–13 In addition, traditional tools of psychometric analysis, such as Cronbach’s alpha, cannot be applied to words. Thus, we need to identify means to promote and evaluate the quality of narrative comments used in assessment.
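As a reminder of why such tools presuppose numeric data (this is the standard formula, not taken from the article): Cronbach’s alpha is defined over item-score variances, so it has no analogue for free-text comments. For k items with score variances \(\sigma^2_{Y_i}\) and total-score variance \(\sigma^2_X\):

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma^2_{Y_i}}{\sigma^2_X}\right)
```

Every term requires a variance computed from numeric scores, which narrative comments do not provide.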
Writing narrative comments may be challenging, and consequently narrative feedback can be lacking in quality. For example, narrative comments can be filled with ambiguities and general statements. 13 Assessors can—intentionally or unintentionally—use semantic nuances to conceal difficult feedback, requiring learners to read between the lines so that they can grasp the intended message of assessors. 14 Faculty members may provide seemingly positive comments that act as a sort of euphemism to hide negative nuances. 11 These nuances create noise in assessment data and make it difficult to adequately interpret the messages being conveyed. 11,13
We posit that providing indicators of high-quality narrative comments to assessors could help them improve the quality of their narrative feedback. Such indicators could also be used by program administrators to provide feedback to faculty members about the quality of their narrative comments. Thus, we explored the breadth of literature, including but not limited to health professions education (HPE) literature, regarding the quality of narrative comments used in assessment. Looking outside of HPE literature offered us the opportunity to be as comprehensive as possible when mapping the literature on the quality of narrative comments used in assessment. We aimed to (1) generate a list of quality indicators for narrative comments, (2) identify recommendations for writing high-quality narrative comments, and (3) document factors that influence the quality of narrative comments used in assessment in higher education.
We opted for a scoping review 15,16 to achieve our purpose of defining quality in narrative feedback by identifying the characteristics of high-quality narrative comments. We followed Arksey and O’Malley’s 17 5-step methodological framework for scoping reviews. Although Levac et al 18 built on the work of Arksey and O’Malley 17 and suggested an optional sixth step of consultation, we opted not to include this step, since guidance regarding the consultation process (i.e., which stakeholders to include, how to include them, etc.) remains somewhat unclear. 19 Moreover, such a step is absent from the PRISMA-ScR. 20 In addition to Arksey and O’Malley’s 17 framework, our scoping review methodology was also informed by the rigor criteria established by Tricco et al. 20
Step 1: Identifying the research question
Our main research question was, “What do we know about the quality of narrative comments in assessment in higher education?” To this main question, we added 2 specific questions: “What are the quality indicators for narrative comments used in assessment in higher education?” and “What are the factors that influence the quality of narrative comments used in assessment in higher education?”
Step 2: Identifying relevant studies
Collaborating with an academic librarian, the first author (M.C.) developed a search strategy under the supervision of the senior author (C.S.-O.), drawing on her extensive knowledge of the measurement and assessment literature. The search strategy was then discussed with another team member (K.O.). A verification process was put in place to test our strategy not for exhaustiveness, but to ensure that it would capture the articles that were deemed essential by our content expert (C.S.-O.). We used an iterative revision process until we identified a search strategy that allowed us to capture the key articles 6,21–23 that had been identified. If a key article was not found through this process, we revised the search strategy and searched again. We included databases from education, psychology, and the social sciences (using the EBSCO interface: ERIC, Education Source, Medline, CINAHL, PsycInfo, and Academic Search Complete). We applied the final search strategy on May 4, 2020 (see Supplemental Digital Appendix 1 at https://links.lww.com/ACADMED/B280). We searched the literature without using a specific date range.
Step 3: Study selection
All titles and abstracts (n = 690) were screened for inclusion/exclusion criteria by 2 team members (M.C. and V.R.D.) using Microsoft Excel (Microsoft Corp., Redmond, Washington). They screened the first 5 abstracts together, then screened articles individually, meeting regularly to compare decisions and to reach consensus. Articles needed to (1) be in the context of higher education (excluding continuing education), (2) include assessor-generated narratives about learner performance, and (3) be directed to the learner (exclusion: program evaluations, physical examination of patients). Only peer-reviewed articles written in English or French were included. We excluded knowledge syntheses to avoid redundancy in our data. Self-assessments and/or peer assessments were excluded when they were not associated with assessors’ assessments. All inclusion and exclusion criteria are available in Supplemental Digital Appendix 2 at https://links.lww.com/ACADMED/B280. After screening titles and abstracts for inclusion/exclusion criteria, 213 articles met our inclusion criteria. After full-text review, only 47 articles were included in our final analysis.
Step 4: Data charting
M.C. and C.S.-O. developed the data extraction grid through an iterative process of revision and discussion with team members, applying the grid to several articles. Two team members (M.C. and V.R.D.) used the final extraction form to document numerical data (i.e., who wrote the narrative comments, what other assessments accompanied the narrative comments, the context of assessment, assessment frequency or timing, delivery format of narrative feedback, and sources for the identification of quality indicators) and qualitative data (factors that influence the quality of narratives, definition and conceptualization of the quality of narrative comments). They started by extracting data together from 4 articles before independently extracting the data from the remaining articles using Dedoose Version 8.0.35 (Social Cultural Research Consultants, LLC, Los Angeles, California). They met each time they had independently extracted data from 5 to 10 more articles to discuss and achieve consensus on the extractions.
To define and conceptualize the quality of narrative comments from the included studies (n = 47), articles were screened to find lists, tables, or figures presenting explicit quality indicators for feedback. Explicit quality indicators were found in 15 articles; 6,24–37 these indicators were used as a starting point to organize data regarding the quality of narrative comments in the remaining articles. The remaining 32 articles included general conceptualizations of quality, but no explicit indicators, although authors sometimes gave writing tips to assessors. We used the quality indicators as an initial coding tree applied to all 47 articles, along with an open-coding approach to identify other indicators. Once we coded all the extracts that met our conceptualization of a quality indicator, we determined that some were observable indicators (i.e., could be used to appraise the quality of narrative comments) while others were more like recommendations (i.e., could inform the writing of quality narratives but could not be used to appraise the quality). We thus decided to create 2 lists (quality indicators and recommendations), which we present below.
Step 5: Collating and reporting results
We conducted numerical and thematic analysis of the data extracted using both Dedoose Version 8.0.35 and Microsoft Excel. Numerical data were analyzed and summarized using descriptive statistics (i.e., frequencies).
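As an illustrative sketch only (not the authors’ actual analysis code), frequency summaries of the kind reported in the Results can be computed in a few lines of Python; the category counts below mirror the assessment-stakes tallies reported later, but the labels and data layout are hypothetical:

```python
from collections import Counter

# Hypothetical charting data: one assessment-stakes label per included article,
# matching the counts reported in the Results (21 + 13 + 5 + 5 + 2 + 1 = 47)
stakes = (["formative"] * 21 + ["nonspecified"] * 13
          + ["both summative and formative"] * 5 + ["summative only"] * 5
          + ["not applicable"] * 2 + ["high stakes"] * 1)

counts = Counter(stakes)
total = len(stakes)  # 47 included articles

# Report each category as "n/total (percent)", the format used in the Results
for category, n in counts.most_common():
    print(f"{category}: {n}/{total} ({100 * n / total:.1f}%)")
```

Running this reproduces proportions such as 21/47 (44.7%) for formative assessments.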
We conducted a thematic analysis of the qualitative data, as per recommendations from Arksey and O’Malley, 17 Levac et al, 18 and Thomas et al, 38 and used Dedoose Version 8.0.35 to manage the data. First, 2 team members (M.C. and V.R.D.) did a thematic analysis of the list of quality indicators identified during the extraction process to create a preliminary coding tree for quality indicators. Then, the same 2 team members (M.C. and V.R.D.) coded the extracts for “factors that influence the quality of narrative comments” and “definition of the quality of narrative comments” and proposed a preliminary coding tree to the other team members. These preliminary coding trees were discussed and revised iteratively; the principal investigator (M.C.) then applied the final consolidated version to all articles. Analysis meetings were arranged between team members (M.C., V.R.D., K.O., and C.S.-O.) to discuss the observed data patterns and identify potential themes. These themes were discussed and refined iteratively until we achieved consensus between team members and agreed upon the interpretation of the results.
The PRISMA flow diagram is presented in Figure 1. After eliminating duplicates and articles in languages other than French or English, 690 articles were included for abstract review. At the inclusion and exclusion phase, 213 articles met our inclusion criteria. After full-article review, 47 articles were determined to address our research questions appropriately. 6,24–37,39–70
Descriptive results for numerical data
The 5 most frequent journals in which the included studies were published were Assessment & Evaluation in Higher Education (7/47, 14.9%), 26,36,39–41,54,66 BMC Medical Education 42,47,62 and Medical Teacher 28,34,67 (3/47 each, 6.4%), then Clinical Teacher 30,32 and The Journal of General Internal Medicine 9,65 (2/47 each, 4.3%).
Of the 47 articles in our archive, 30 included both qualitative and quantitative data (30/47, 63.8%), 6,24,26,28–30,32–34,37,39,40,44–47,49,54–56,58,59,61,62,64–68,70 12 used qualitative data only (12/47, 25.5%), 25,27,31,35,41,43,48,51,57,60,63,69 and 5 contained quantitative data only (5/47, 10.6%). 36,42,50,52,53
Many articles were related to either general medicine or a nondefined medical speciality (14/47, 29.8%), 30,32,42,45,51–53,58,59,61,62,67,69,70 while other articles were in a specific medical speciality (13/47, 27.7%). 6,28,29,34,47–49,55–57,64,65,68 The remaining articles were in multiple fields (5/47, 10.6%), 27,44,46,54,66 education and social sciences (3/47, 6.4%), 25,31,40 sciences (3/47, 6.4%), 24,37,39 nursing (3/47, 6.4%), 33,50,63 languages (2/47, 4.3%), 41,60 a nonspecified field (2/47, 4.3%), 26,35 industrial design (1/47, 2.1%), 43 and business (1/47, 2.1%). 36
Undergraduate medical education studies represented 25.5% (12/47) of included articles. 30,32,42,47,53,55,57–59,64,67,70 Similarly, studies at the undergraduate level (12/47) 24,25,29,33,36,37,39,41,43,49,50,52 and postgraduate medical education level (12/47) 6,28,34,45,48,51,56,61,62,65,68,69 each represented 25.5% of included articles, while studies at the graduate level represented only 4.3% (2/47) of included articles. 27,44 Several articles did not specify the level (6/47, 12.8%), 26,31,35,46,60,66 and others included studies at both undergraduate and graduate levels (3/47, 6.4%). 40,54,63 Most articles tackled formative assessments (21/47, 44.7%), 25,32–34,40,42,45,48,51–53,56,57,60–62,64,65,67,69,70 followed by assessments with nonspecified stakes (13/47, 27.7%), 6,24,26,30,35,37,41,43,44,46,50,54,68 both summative and formative assessments, 27–29,39,66 and summative assessments only (5/47 each, 10.6%). 36,47,49,55,59 There were 2 articles where stakes were not applicable (2/47, 4.3%) 31,63 since there was no concrete assessment, and only 1 article addressed high-stakes assessment (1/47, 2.1%). 58
In the articles reviewed, narrative comments were accompanied by scores (21/47, 44.7%), 6,24,31,34,36,37,39,42,44,46,49,51,55,56,58–60,62,67,68,70 verbal feedback (1/47, 2.1%), 50 or both scores and verbal feedback (5/47, 10.6%). 28,32,45,47,66 For the remaining articles, narrative comments were provided without any additional assessment (20/47, 42.6%). 25–27,29,30,33,35,40,41,43,48,52–54,57,61,63–65,69 Narrative comments were delivered through an electronic platform (e.g., iPad, mobile phone app, online email) in 7 studies (7/47, 14.9%), 24,25,29,30,43,48,51 paper format was used as a delivery format in 4 studies (4/47, 8.5%), 32,34,55,64 and both paper and electronic formats were used in 5 studies (5/47, 10.6%). 27,37,44,46,58 In the remaining articles, delivery format was unspecified (31/47, 66.0%). 6,26,28,31,33,35,36,39–42,45,47,49,50,52–54,56,57,59–63,65–70
In nearly half of the articles, narrative comments were provided exclusively by assessors (20/47, 42.6%), 6,24,27,28,30,37,41,43,45,47,49,50,52,54–57,61,62,67 while in 2 studies narrative comments were provided by both assessors and peers (2/47, 4.3%). 42,69 In the remaining articles (25/47, 53.2%), the person providing the narrative comments was implicitly defined as an assessor. 25,26,29,31–36,39,40,44,46,48,51,53,58–60,63–66,68,70
Narrative assessments were provided most frequently in the context of work-based assessment (which refers to the assessment of performance in a real workplace environment along with relevant feedback) (20/47, 42.6%) 6,28–30,32,34,45,48,49,52,53,55–57,62,64,67–70 compared with written feedback (8/47, 17.0%), 24,27,36,37,39,41,51,60 multiple contexts (5/47, 10.6%), 25,26,47,54,59 and objective structured clinical examinations (1/47, 2.1%). 58 In 13 articles, the context of assessment was not defined (13/47, 27.7%). 31,33,35,40,42–44,46,50,61,63,65,66 Table 1 shows that trainees’ perceptions and scientific literature were the most common sources used to inform quality indicators of narrative comments.
Descriptive results for qualitative data
Our thematic analysis of quality indicators led us to generate a list of 7 indicators that can be used to appraise the quality of narrative comments used for assessment in higher education (Table 2). Furthermore, we identified 12 general recommendations that can inform the writing of high-quality narrative comments (see Table 3).
Factors influencing quality.
In our qualitative thematic analysis, we identified factors that could increase or decrease the quality of narrative comments and categorized them into 3 themes: (1) the learner–assessor relationship and the local feedback culture, (2) the time required for direct observation and to complete the task of providing narrative comments, and (3) the assessors’ abilities and ways to enhance them. Descriptions of these themes and factors are presented below.
The learner–assessor relationship and the local feedback culture.
Learner–assessor relationships greatly influence the content of narrative comments and the level of interaction and discussion during the assessment. 6,27,30,42,45,46 Protecting the learner–assessor relationship can be a barrier to providing constructive recommendations to learners on how to improve their performance, which is one of our quality indicators (see Table 2). 30 Assessors may be reluctant to provide negative comments for fear of damaging their relationship with learners or inducing emotional reactions. To skirt the issue, they may use euphemisms in their narratives. For example, “solid” can be a euphemism for “middle of the road.” A local feedback culture that avoids, dismisses, or downplays negative feedback can unintentionally favor lower-quality narrative comments. 6,26,27,32,42,46,56,58,61,62,69 In such a culture, assessors would tend to provide overall positive comments and neglect critical aspects in their narrative feedback, fearing potential relational consequences of providing learners with negative comments. 30,32–34,42,46,51,56,64,69 Narrative feedback that includes only positive comments, without mentioning weaknesses and recommendations for improvement, is less useful for learners. 71 In such situations, learners may need help in interpreting narrative comments, perhaps through discussing feedback with assessors. 45,48,54
The time required for direct observation and completion of narrative comments.
Time is an important factor in the quality of narrative comments. Not surprisingly, the time devoted to direct observation of learners and to the completion of narrative comments differs between assessors. When assessors are able to take the time, they provide higher-quality narrative comments than when they are rushed. 6,32,33,40,46,53,56,60,63–66,69 Unfortunately, assessors often have to balance several tasks in addition to providing narrative feedback and thus may experience assessment fatigue. 30,63 Faculty members agreed that spending more time with learners would provide more insight into trainees’ actions, which could lead to more specific narrative feedback. 6,32,33,40,46,53,56,60,63–66,69
The assessors’ abilities and ways to enhance them.
Assessors’ experience and abilities can influence the quality of their narrative comments. 24,27,31,33,39,40,43,46,52,56,62,63,65,66,68,69 The lack of assessors’ knowledge about providing narrative feedback appears to hinder the quality of narrative comments. 6,27,33,39,44–46,50,56,63,67 Interestingly, assessors’ knowledge about high-quality narrative feedback and their writing skills can be enhanced through faculty development (FD)—that is, through interventions and/or guidelines that aim to foster their ability to provide high-quality narrative comments for assessment. 6,24,25,28,30,32–34,37,42,44–49,52,55–57,61,62,64,65,68,69 Studies suggest that providing opportunities for assessors to share their experience, compare good practices, and explore frustrations and confusions associated with their complex role would enhance their ability to provide high-quality narrative comments. 24,27,31,33,39,40,43,45,46,52,56,62,63,65,66,68,69
We conducted this scoping review to address the gap in knowledge about indicators that could be used to document and inform the quality of narrative comments for assessment of learners in higher education. We used an inclusive approach, searching the literature from higher education broadly to avoid missing important insights, but the majority of studies included in this analysis came from the HPE literature.
We identified 7 quality indicators (Table 2) that can be used to appraise the quality of narrative comments used in assessment. The list of indicators demonstrates that narrative assessment has the potential to drive learning. 27,33,36,66 Not only does the list of indicators include elements to help assessors write clear and appropriate narrative comments, but it also includes elements that aim to make sure learners can use this feedback to improve their performance 27,33,35,37,40,60,64,66 or to compare their performance with expectations. 27,28,33,36,44,47,53
In addition, we identified 12 recommendations for writing high-quality narrative comments for assessment (Table 3). Not surprisingly, these recommendations are akin to recommendations for providing high-quality feedback more generally, such as those suggested by Ramani et al. 6,24–37,72 It appears that expectations are similar for high-quality written feedback and high-quality verbal feedback. 9,24–37,73 It is our hope that the use of these indicators and recommendations will be helpful to both learners and programs; learners receiving higher-quality information about their performance will be better equipped to make appropriate changes, and program committees will be better positioned to make decisions about learners.
Context is an important determinant of the quality of narrative comments. The learner–assessor relationship can be leveraged to improve the quality of narrative feedback. In fact, learners express a variable level of comfort depending on who their assessors are. In addition, learners judge the credibility of the information they receive about their performance according to their relationship with their assessor. In a survey, learners stated that trust and respect for the teacher are factors that would make them more receptive to feedback. 73,74 When learners build a positive relationship over time with their assessors, there is potential alignment of the assessors’ goals with their own and, as such, they attribute more credibility to the narrative feedback they receive. 6,27,42,45,46,75 Learners’ judgments regarding the credibility of such feedback would certainly influence their learning in a clinical experience. 76
We should also promote a local feedback culture 77 that aligns with principles of assessment for learning (AFL), instead of assessment of learning. 78,79 Assessment should be a continuous act, which guides teaching and learning processes through provision of timely feedback that helps learners to bridge learning gaps and rectify potential shortcomings. 78 This is congruent with Sadler’s 80 perspective of a new “learning culture” that involves engaging learners through appropriate tasks, providing plentiful feedback, and making a commitment to improve learning. AFL is based on giving instant, specific, and clearly articulated feedback after each learning step to prevent delays in correcting learners’ errors. 78 In other words, assessors should reflect more on what to write in their narratives and the way in which they convey a constructive message. 30,32–34,42,46,51,56,64,69 Learners could then better use this feedback to adapt their approach to achieve desired learning outcomes. 81 Stobart 82 identified 3 conditions of effective and useful feedback: (1) the learner needs the feedback, (2) the learner receives the feedback and has time to use it, and (3) the learner is willing and able to use the feedback. The value of high-quality feedback should drive institutions to monitor the quality of narrative comments provided to trainees. 72
Several resources are required to provide high-quality feedback. Time is essential for the provision of high-quality narrative comments. Cook et al 21 proposed that a prolonged duration of assessor–learner engagement is a quality domain for qualitative assessment. Collecting and analyzing qualitative data demands substantial time and energy, as well as various skills. For example, providing insightful narrative feedback obviously takes longer than marking a scale or filling in a checklist. High-quality narrative comments also require that assessors have sufficient observations of learners’ performance and have sufficient time to generate their comments. One assessor may need to make several observations of a learner or may collect observational data in collaboration with other colleagues.
Programs should aim to enhance assessors’ abilities to provide narrative feedback and provide ways to improve these abilities. High-quality narrative feedback requires assessors who possess context and content knowledge. FD workshops and initiatives are a useful mechanism for developing this knowledge. 83,84 For example, it has been shown that prompting assessors to provide specific, behavioral details in their narrative comments offered a potentially high-yield, low-effort FD intervention. 28 A pilot study demonstrated that some assessors improved the specificity of their narrative comments after the addition of a new prompt. 85 Our list of 7 quality indicators could potentially be used in FD to train assessors by having them reflect on the quality of their narrative comments. Individual strategies could also be taught to assessors, such as reflecting on feedback skills, on what went well, on what needs to change, and on what new strategies could be adopted. 72 Educational institutions with a demonstrated need to work on standardizing the quality of narrative feedback should shoulder the responsibility of developing effective FD in this area. 34,45,61,62,68,69
There are some limitations to our scoping review. Since our aim was to map the breadth of the literature on the quality of narrative comments used in assessment in higher education, we did not critically appraise the quality of the articles included in our review. This would need to be done through a systematic review. We did, however, use a rigorous approach based on empirical data, and we reported our methodology according to Tricco and colleagues’ 20 criteria for rigor. Although we included studies published in English and French in our analysis, we did not systematically search for articles in French search engines; therefore, we may have missed some articles written in French. Our review included articles from many fields of study, although databases we chose to search identified mainly articles in HPE. Finally, while we classified articles according to the level of stakes of the assessment described (either formative or summative assessment), we did not analyze the data according to this classification.
The main outcome of this study is a list of quality indicators for narrative comments that can inform those who try to monitor quality of assessment and also those who write narrative feedback in higher education. It is possible that these quality indicators could inform learners and contribute to their understanding of the feedback they receive. By identifying factors that can influence the quality of narrative feedback and a list of recommendations for writing high-quality narrative comments, we hope to increase assessors’ awareness of strategies to favor and pitfalls to avoid. Finally, we hope that providing assessors with such tools and training on how to use them will help them improve the quality of their narrative feedback, thereby enabling them to provide rigorous documentation of trainees’ performance such that reliable decisions about their progression can be made. In the future, validation of these indicators will be important and could include a measure of standardized application.
The team would like to thank Josée Toulouse, academic librarian at the Faculty of Medicine and Health Sciences—Université de Sherbrooke, for her time and her participation in developing the search strategy.
1. Hodges BD, Lingard L. The Question of Competence: Reconsidering Medical Education in the Twenty-First Century. Ithaca, NY: Cornell University Press; 2012.
2. Hodges B. Assessment in the post-psychometric era: Learning to love the subjective and collective. Med Teach. 2013;35:564–568.
3. Frank JR, Snell LS, Cate OT, et al. Competency-based medical education: Theory to practice. Med Teach. 2010;32:638–645.
4. Holmboe ES, Sherbino J, Long DM, Swing SR, Frank JR. The role of assessment in competency-based medical education. Med Teach. 2010;32:676–682.
5. Brutus S. Words versus numbers: A theoretical exploration of giving and receiving narrative comments in performance appraisal. Hum Resour Manag Rev. 2010;20:144–157.
6. Jackson JL, Kay C, Jackson WC, Frank M. The quality of written feedback by attendings of internal medicine residents. J Gen Intern Med. 2015;30:973–978.
7. Salerno SM, Jackson JL, O’Malley PG. Interactive faculty development seminars improve the quality of written feedback in ambulatory teaching. J Gen Intern Med. 2003;18:831–834.
8. Jackson KM, Trochim WM. Concept mapping as an alternative approach for the analysis of open-ended survey responses. Organ Res Methods. 2002;5:307–336.
9. Sulsky LM, Keown JL. Performance appraisal in the changing world of work: Implications for the meaning and measurement of work performance. Can Psychol. 1998;39:52.
10. Daft RL, Wiginton JC. Language and organization. Acad Manage Rev. 1979;4:179–191.
11. Ginsburg S, Regehr G, Lingard L, Eva KW. Reading between the lines: Faculty interpretations of narrative evaluation comments. Med Educ. 2015;49:296–306.
12. Dudek NL, Marks MB, Regehr G. Failure to fail: The perspectives of clinical supervisors. Acad Med. 2005;80(10 suppl):S84–S87.
13. White JS, Sharma N. “Who writes what?” Using written comments in team-based assessment to better understand medical student performance: A mixed-methods study. BMC Med Educ. 2012;12:123.
14. Speer AJ, Solomon DJ, Fincher RM. Grade inflation in internal medicine clerkships: Results of a national survey. Teach Learn Med. 2000;12:112–116.
15. Maggio LA, Larsen K, Thomas A, Costello JA, Artino AR Jr. Scoping reviews in medical education: A scoping review. Med Educ. 2021;55:689–700.
16. Peters MDJ, Marnie C, Tricco AC, et al. Updated methodological guidance for the conduct of scoping reviews. JBI Evid Synth. 2020;18:2119–2126.
17. Arksey H, O’Malley L. Scoping studies: Towards a methodological framework. Int J Soc Res Methodol. 2005;8:19–32.
18. Levac D, Colquhoun H, O’Brien KK. Scoping studies: Advancing the methodology. Implement Sci. 2010;5:69.
19. Sideri S, Papageorgiou SN, Eliades T. Registration in the international prospective register of systematic reviews (PROSPERO) of systematic review protocols was associated with increased review quality. J Clin Epidemiol. 2018;100:103–110.
20. Tricco AC, Lillie E, Zarin W, et al. PRISMA Extension for Scoping Reviews (PRISMA-ScR): Checklist and explanation. Ann Intern Med. 2018;169:467–473.
21. Cook DA, Kuper A, Hatala R, Ginsburg S. When assessment data are words: Validity evidence for qualitative educational assessments. Acad Med. 2016;91:1359–1369.
22. Dudek N, Dojeiji S. Twelve tips for completing quality in-training evaluation reports. Med Teach. 2014;36:1038–1042.
23. Hatala R, Sawatsky AP, Dudek N, Ginsburg S, Cook DA. Using In-Training Evaluation Report (ITER) qualitative comments to assess medical students and residents: A systematic review. Acad Med. 2017;92:868–879.
24. Arts JG, Jaspers M, Brinke DJ. A case study on written comments as a form of feedback in teacher education: So much to gain. Eur J Teach Educ. 2016;39:159–173.
25. Austen L, Malone C. What students want in written feedback: Praise, clarity and precise individual commentary. Pract Res High Educ. 2018;11:47–58.
26. Dawson P, Henderson M, Mahoney P, et al. What makes for effective feedback: Staff and student perspectives. Assess Eval High Educ. 2019;44:25–36.
27. Dunworth K, Sanchez HS. Perceptions of quality in staff-student written feedback in higher education: A case study. Teach High Educ. 2016;21:576–589.
28. Gauthier S, Cavalcanti R, Goguen J, Sibbald M. Deliberate practice as a framework for evaluating feedback in residency training. Med Teach. 2015;37:551–557.
29. Georgoff PE, Shaughness G, Leininger L, et al. Evaluating the performance of the Minute Feedback System: A web-based feedback tool for medical students. Am J Surg. 2018;215:293–297.
30. Hauer KE, Nishimura H, Dubon D, Teherani A, Boscardin C. Competency assessment form to improve feedback. Clin Teach. 2018;15:472–477.
31. McGee IE. Developing mentor teachers to support student teacher candidates. SRATE J. 2019;28:23–30.
32. Nesbitt A, Pitcher A, James L, Sturrock A, Griffin A. Written feedback on supervised learning events. Clin Teach. 2014;11:279–283.
33. Regan PJ. Read between the lines; The emancipatory nature of formative annotative feedback on draft assignments. Syst Pract Action Res. 2010;23:453–466.
34. Renting N, Gans RO, Borleffs JC, Van Der Wal MA, Jaarsma AD, Cohen-Schotanus J. A feedback system in residency to evaluate CanMEDS roles and provide high-quality feedback: Exploring its application. Med Teach. 2016;38:738–745.
35. Stone A. How to give written feedback. Educ Prim Care. 2013;24:473–475.
36. Vardi I. Effectively feeding forward from one written assessment task to the next. Assess Eval High Educ. 2013;38:599–610.
37. Voelkel S, Varga-Atkins T, Mello LV. Students tell us what good written feedback looks like. FEBS Open Bio. 2020;10:692–706.
38. Thomas A, Lubarsky S, Varpio L, Durning SJ, Young ME. Scoping reviews in health professions education: Challenges, considerations and lessons learned about epistemology and methodology. Adv Health Sci Educ Theory Pract. 2020;25:989–1002.
39. Donovan P. Closing the feedback loop: Physics undergraduates’ use of feedback comments on laboratory coursework. Assess Eval High Educ. 2014;39:1017–1029.
40. Ferguson P. Student perceptions of quality feedback in teacher education. Assess Eval High Educ. 2011;36:51–62.
41. Fernandez-Toro M, Truman M, Walker M. Are the principles of effective feedback transferable across disciplines? A comparative study of written assignment feedback in languages and technology. Assess Eval High Educ. 2013;38:816–830.
42. Abraham RM, Singaram VS. Using deliberate practice framework to assess the quality of feedback in undergraduate clinical skills training. BMC Med Educ. 2019;19:105.
43. Funk M, van Diggelen M. Feedback conversations: Creating feedback dialogues with a new textual tool for industrial design student feedback. Int J Web-Based Learn Teach Technol. 2017;12:78–92.
44. Ghazal L, Gul R, Hanzala M, Jessop T, Tharani A. Graduate students’ perceptions of written feedback at a private university in Pakistan. Int J High Educ. 2014;3:13–27.
45. Govaerts M, van de Wiel MW, van der Vleuten CP. Quality of feedback following performance assessments: Does assessor expertise matter? Eur J Train Dev. 2013;37:105–125.
46. Gul R, Tharani A, Lakhani A, Rizvi N, Ali S. Teachers’ perceptions and practices of written feedback in higher education. World J Educ. 2016;6:10–20.
47. Gulbas L, Guerin W, Ryder HF. Does what we write matter? Determining the features of high- and low-quality summative written comments of students on the internal medicine clerkship using pile-sort and consensus analysis: A mixed-methods study. BMC Med Educ. 2016;16:145.
48. Karim AS, Sternbach JM, Bender EM, Zwischenberger JB, Meyerson SL. Quality of operative performance feedback given to thoracic surgery residents using an app-based system. J Surg Educ. 2017;74:e81–e87.
49. Kelly MS, Mooney CJ, Rosati JF, Braun MK, Thompson Stone R. Education research: The narrative evaluation quality instrument: Development of a tool to assess the assessor. Neurology. 2020;94:91–95.
50. Khowaja AA, Gul RB, Lakhani A, Rizvi NF, Saleem F. Practice of written feedback in nursing degree programmes in Karachi: The students’ perspective. J Coll Physicians Surg Pak. 2014;24:241–244.
51. Barrett A, Galvin R, Steinert Y, et al. Profiling postgraduate workplace-based assessment implementation in Ireland: A retrospective cohort study. Springerplus. 2016;5:133.
52. Bartlett M, Crossley J, McKinley R. Improving the quality of written feedback using written feedback. Educ Prim Care. 2017;28:16–22.
53. Lefroy J, Thomas A, Harrison C, et al. Development and face validation of strategies for improving consultation skills. Adv Health Sci Educ Theory Pract. 2014;19:661–685.
54. Lizzio A, Wilson K. Feedback on assessment: Students’ perceptions of quality and effectiveness. Assess Eval High Educ. 2008;33:263–275.
55. Lye PS, Biernat KA, Bragg DS, Simpson DE. A pleasure to work with—An analysis of written comments on student evaluations. Ambul Pediatr. 2001;1:128–131.
56. Marcotte L, Egan R, Soleas E, Dalgarno N, Norris M, Smith C. Assessing the quality of feedback to general internal medicine residents in a competency-based environment. Can Med Educ J. 2019;10:e32–e47.
57. Beck Dallaghan GL, Higgins J, Reinhardt A. Feedback quality using an observation form. J Med Educ Curric Dev. 2018;5:2382120518777768.
58. Munro AJ, Cumming K, Cleland J, Denison AR, Currie GP. Paper versus electronic feedback in high stakes assessment. J R Coll Physicians Edinb. 2018;48:148–152.
59. Newton PM, Wallace MJ, McKimm J. Improved quality and quantity of written feedback is associated with a structured feedback proforma. J Educ Eval Health Prof. 2012;9:10.
60. Nguyen HT, Filipi A. Multiple-draft/multiple-party feedback practices in an EFL tertiary writing course: Teachers’ and students’ perspectives. Int Educ Stud. 2018;11:1–26.
61. Nichols D, Kulaga A, Ross S. Coaching the coaches: Targeted faculty development for teaching. Med Educ. 2013;47:534–535.
62. Pelgrim EA, Kramer AW, Mokkink HG, Van der Vleuten CP. Quality of written narrative feedback and reflection in a modified mini-clinical evaluation exercise: An observational study. BMC Med Educ. 2012;12:97.
63. Price B. Defining quality student feedback in distance learning. J Adv Nurs. 1997;26:154–160.
64. Prystowsky JB, DaRosa DA. A learning prescription permits feedback on feedback. Am J Surg. 2003;185:264–267.
65. Raaum SE, Lappe K, Colbert-Getz JM, Milne CK. Milestone implementation’s impact on narrative comments and perception of feedback for internal medicine residents: A mixed methods study. J Gen Intern Med. 2019;34:929–935.
66. Weaver MR. Do students value feedback? Student perceptions of tutors’ written responses. Assess Eval High Educ. 2006;31:379–394.
67. Braend AM, Gran SF, Frich JC, Lindbaek M. Medical students’ clinical performance in general practice—Triangulating assessments from patients, teachers and students. Med Teach. 2010;32:333–339.
68. Young JQ, McClure M. Fast, easy, and good: Assessing entrustable professional activities in psychiatry residents with a mobile app. Acad Med. 2020;95:1546–1549.
69. Canavan C, Holtman MC, Richmond M, Katsufrakis PJ. The quality of written comments on professional behaviors in a developmental multisource feedback program. Acad Med. 2010;85(10 suppl):S106–S109.
70. Ryan A, McColl GJ, O’Brien R, et al. Tensions in post-examination feedback: Information for learning versus potential for harm. Med Educ. 2017;51:963–973.
71. Delva D, Sargeant J, Miller S, et al. Encouraging residents to seek feedback. Med Teach. 2013;35:e1625–e1631.
72. Ramani S, Krackov SK. Twelve tips for giving feedback effectively in the clinical environment. Med Teach. 2012;34:787–791.
73. Bing-You RG, Greenberg LW, Wiederman BL, Smith CS. A randomized multicenter trial to improve resident teaching with written feedback. Teach Learn Med. 1997;9:10–13.
74. Hesketh EA, Laidlaw JM. Developing the teaching instinct, 1: Feedback. Med Teach. 2002;24:245–248.
75. Bok HG, Teunissen PW, Favier RP, et al. Programmatic assessment of competency-based workplace learning: When theory meets practice. BMC Med Educ. 2013;13:123.
76. Watling C, Driessen E, van der Vleuten CP, Lingard L. Learning from clinical work: The roles of learning cues and credibility judgements. Med Educ. 2012;46:192–200.
77. Ramani S, Konings KD, Ginsburg S, van der Vleuten CPM. Meaningful feedback through a sociocultural lens. Med Teach. 2019;41:1342–1352.
78. Umar A-T, Majeed A. The impact of assessment for learning on students’ achievement in English for specific purposes: A case study of pre-medical students at Khartoum University: Sudan. Engl Lang Teach. 2018;11:15–25.
79. McDowell L, Wakelin D, Montgomery C, King S. Does assessment for learning make a difference? The development of a questionnaire to explore the student response. Assess Eval High Educ. 2011;36:749–765.
80. Sadler DR. Formative assessment: Revisiting the territory. Assess Educ Princ Policy Pract. 1998;5:77–84.
81. van der Kleij FM, Eggen TJHM, Timmers CF, Veldkamp BP. Effects of feedback in a computer-based assessment for learning. Comput Educ. 2012;58:263–272.
82. Stobart G. Testing Times: The Uses and Abuses of Assessment. London, UK: Routledge; 2008.
83. Brukner H. Giving effective feedback to medical students: A workshop for faculty and house staff. Med Teach. 1999;21:161–165.
84. Krackov SK. Expanding the horizon for feedback. Med Teach. 2011;33:873–874.
85. Minor S, Stumbar S, Marquita Samuels M, Sroka J. Family medicine clerkship faculty development: A longitudinal, team endeavour! Presented at: STFM Conference on Medical Education; April 2019; Toronto, ON. https://www.stfm.org/publicationsresearch/publications/educationcolumns/2020/january. Accessed April 26, 2022.