Do Writing and Storytelling Skill Influence Assessment of Reflective Ability in Medical Students' Written Reflections?

Aronson, Louise; Niehaus, Brian; DeVries, Charlie D.; Siegel, Jennifer R.; O'Sullivan, Patricia S.

Section Editor(s): Harrell, Heather MD; Yudkowsky, Rachel MD

doi: 10.1097/ACM.0b013e3181ed3aa7
Complexities in Behavior and Assessment

Background: Increasingly, students are asked to write reflections as part of their medical education, but some question the influence of other factors on the evaluation of these reflections. In this pilot study, the investigators determined whether scores from a validated rubric to measure reflective ability were affected by irrelevant variance resulting from writing or storytelling ability.

Method: Students in clerkships wrote reflections on professionalism. All were given identical prompts, with half receiving additional structured guidelines on reflection. Sixty reflections, 30 from each group, were randomly chosen and scored for reflection, writing, and storytelling by trained raters using validated rubrics.

Results: There was no correlation between reflection and either writing (r = 0.049, P = .35) or storytelling (r = 0.14, P = .13). The guidelines increased reflection, but not writing or storytelling scores.

Conclusions: Reflection is a distinct construct unaffected by learners' writing or storytelling skills. These findings support reflective ability as a distinct skill.

Correspondence: Louise Aronson, MD, MFA, Department of Medicine, Division of Geriatrics, University of California, San Francisco, 3333 California St., Ste. 380, San Francisco, CA 94118; e-mail:

Reflection increasingly has been incorporated into all levels of medical education. Nearly a century ago, the educator John Dewey1 framed the importance of reflection in the learning process. More recently, Schön2 and others3,4 have drawn on the importance of reflection in professions education to create lifelong learners. In medicine, reflection has been shown to produce better student exam scores, improved student performance with standardized patients, increased numbers of residents meeting rotation goals, and improved diagnostic accuracy by medical residents.5–8

In response to this growing body of data, medical trainees are required to complete reflective exercises as part of their learning more than ever before. For these written reflections to become an integral part of medical training and assessment, educators must be able to evaluate with confidence the reflective ability of their learners captured in these activities. Several researchers have reported on methods to review written evidence for reflection from journals or vignettes and have demonstrated that the reflections can be reliably scored.9–12 Investigators also have studied issues related to demonstrating validity of these types of measures.11,13,14

Despite the significant interest in reflection and its rapid dissemination, some educators have expressed concerns about both its implementation and evaluation. Freedman15 questioned whether educators have popularized reflection to the point where its intent and meaning as an educational tool are lost. Students and residents at our institution frequently report to us a belief that good writers and storytellers earn more favorable reflection scores. In discussing our work at a national educational research meeting, Kevin Eva similarly queried whether educators were seduced by good writing or storytelling when scoring reflections (personal communication, 2010). Lunz and Bashook,16 responding to concerns of certifying boards, examined a comparable issue in their study of whether candidates' communication skills interfered with assessment of their medical skills and knowledge during oral examinations. The notion that evaluation of what is presented, whether medical knowledge or reflection, might be influenced by how it is presented is known as construct-irrelevant variance, a situation wherein variance is reliably introduced by a factor irrelevant to the construct under study.17,18 If reflection scoring is influenced by other factors, this finding of construct-irrelevant variance would raise critical questions about our ability to accurately evaluate reflective writings.

The purpose of this pilot study was to determine whether a score from a validated scoring rubric to measure reflective ability was affected by irrelevant variance resulting from writing or storytelling ability.

Method

This was a correlational, psychometric study. Third-year medical students at the University of California, San Francisco, were required to write reflections on professionalism as part of a mandatory course that takes place between clerkships. All responded to the same prompt asking them to “reflect upon your clerkships so far and select an experience that had a significant impact upon your view of yourself as a professional or of the medical profession in general.” With the prompt, all received a written definition of critical reflection, and half also received page-long structured guidelines to critical reflection modeled on the clinical SOAP note and a page of answers to frequently asked questions about reflection.19 We randomly selected 60 reflections from the pool of 163: 30 from students who used the guidelines and 30 from those who did not. Two trained raters scored the reflections for reflective ability, and two different raters with no experience in reflection scored the reflections for writing and storytelling ability. We chose two separate sets of raters so that impressions of writing and storytelling quality could not bias the scoring of reflective ability, and vice versa. This study received approval from our institutional review board.

Reflection scores

We scored the reflections using a previously developed rubric13 which provides scores for reflection ranging from “failure to address assignment” (0) to “identifies lessons learned and draws on experiences and feedback to craft a plan for the future and cites a means of determining the plan's success” (6). The reflections in this study were scored for reflective ability by two trained raters with an interrater reliability (intraclass correlation coefficient [ICC]) of 0.91 between them. Validity evidence for these reflection scores already has been provided in a separate study.14
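
For readers unfamiliar with the statistic, the following is a minimal sketch of how an intraclass correlation between raters can be computed. The article does not state which ICC form was used; the sketch assumes the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1). The function name `icc_2_1` and the example scores are hypothetical and are not study data.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table of scores: one row per essay, one column per rater.

    Assumption: two-way random effects, absolute agreement, single rater.
    """
    n = len(scores)          # number of essays
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for essays (rows), raters (columns), and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical reflection scores (0-6) from two raters on six essays.
example_scores = [[3, 3], [4, 5], [2, 2], [5, 5], [1, 2], [6, 6]]
print(round(icc_2_1(example_scores), 2))
```

Other ICC forms (for example, consistency rather than absolute agreement, or the reliability of the averaged rating) would give different values from the same table, so the choice of form matters when comparing coefficients across studies.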

Writing and storytelling rubric development

We searched PubMed, ERIC, and Google Scholar to find writing and storytelling rubrics but found none which were validated, developed for adult learners, and suitable for one- to two-page essays. Following published recommendations for rubric development,20,21 procedures used by others including the College Entrance Examination Board,22,23 and writing textbooks,24,25 we drafted holistic writing and storytelling rubrics. These were reviewed by three secondary school English teachers with advanced degrees in English literature and/or creative writing, four PhD medical educators, and three medical students to ensure that all key elements of each skill set had been included and that the rubric items were comprehensible to those without training in writing or storytelling. Each rubric contained five subdomains as follows: writing—capitalization and punctuation, grammar and spelling, sentence structure, style and diction, and overall structure and organization; and storytelling—setting, character, plot, detail, and narrative approach. Criteria described essays which were excellent (four points), good (three points), marginal (two points), or poor (one point) for each subdomain.
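
The rubric structure described above can be sketched as a simple tally. This assumes that a rater's total is the plain sum of the five subdomain ratings, which is consistent with the 5 to 20 score range reported in the results but is not spelled out in the text. The function name and the example ratings are hypothetical.

```python
# Subdomain names come from the rubric description; the rest is illustrative.
WRITING_SUBDOMAINS = (
    "capitalization and punctuation",
    "grammar and spelling",
    "sentence structure",
    "style and diction",
    "overall structure and organization",
)
STORYTELLING_SUBDOMAINS = (
    "setting",
    "character",
    "plot",
    "detail",
    "narrative approach",
)


def total_score(ratings, subdomains):
    """Sum one rater's subdomain ratings (each 1-4) into a 5-20 total."""
    missing = [s for s in subdomains if s not in ratings]
    if missing:
        raise ValueError(f"missing ratings for: {missing}")
    for s in subdomains:
        if ratings[s] not in (1, 2, 3, 4):
            raise ValueError(f"rating for {s!r} must be 1, 2, 3, or 4")
    return sum(ratings[s] for s in subdomains)


# Hypothetical essay rated "marginal" (2 points) on every writing subdomain.
example = {s: 2 for s in WRITING_SUBDOMAINS}
print(total_score(example, WRITING_SUBDOMAINS))  # 10
```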

Writing and storytelling rubric validation

Next, we selected 10 one- to two-page essays and stories which showed varying combinations of good and bad writing and storytelling, and the English teachers classified these selections from best to worst for both writing and storytelling. Finally, three fiction writers scored the 10 writings using the two rubrics. The ICC for the three writers was 0.89 for writing and 0.90 for storytelling. The associations between classification of the 10 writings and other scores were analyzed using Spearman correlation because the classification was ordinal. Correlation between classification and average writing score from the rubric was r_s = 0.76 (P = .003) and between classification and average storytelling score was r_s = 0.89 (P < .001), thereby providing evidence of the validity of the scores from both rubrics.
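
As an illustration of the validation step, the following sketch computes a Spearman rank correlation between an ordinal classification and average rubric scores. The data are hypothetical, not the study's 10 writings, and the classification is coded here so that 10 is the best piece (an assumption made only so that agreement yields a positive coefficient). Helper names are illustrative.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: teacher classification (10 = best) and mean rubric score.
teacher_class = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
rubric_mean = [18.3, 16.0, 17.0, 13.7, 14.3, 11.0, 12.0, 9.3, 8.0, 10.0]
print(round(spearman(teacher_class, rubric_mean), 3))  # 0.927
```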

Writing and storytelling scores

The 60 medical student reflective exercises were scored for writing and storytelling by a new set of two raters with backgrounds in both medicine and the humanities. The raters were trained using essays from the validation process and third-year medical student reflections that were not part of the study. Their ICC was 0.75 for writing and 0.76 for storytelling. For this study we averaged their two scores for all measures.

For the analysis of construct-irrelevant variance, we calculated descriptive statistics for the writing, storytelling, and reflection scores and used Pearson correlation coefficients to examine the relationships between reflective ability and both writing and storytelling skills. We conducted a multiple linear regression analysis to examine writing and storytelling scores simultaneously as predictors of the reflection score. We also compared the three scores between students who did and did not receive the structured reflection guidelines using independent-samples t tests.
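
The group comparison can be sketched as follows. The text does not say whether the t test assumed equal variances; this sketch assumes the classic pooled-variance (Student) form. The function name and the scores are hypothetical, and the sketch returns only the t statistic and degrees of freedom, leaving the P value to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Independent-samples t statistic with pooled variance (assumed form).

    Returns (t, degrees_of_freedom).
    """
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # statistics.variance is the sample variance (n - 1 denominator).
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical reflection scores (0-6), not study data.
with_guidelines = [4, 5, 3, 4, 5, 4]
without_guidelines = [3, 2, 3, 4, 2, 3]
t_stat, df = pooled_t(with_guidelines, without_guidelines)
print(round(t_stat, 2), df)
```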

Results

Reflection, writing, and storytelling scores were normally distributed. Descriptive statistics for all 60 reflections are as follows: reflection (score range 0–6) mean 3.16 (SD = 1.23); writing (score range 5–20) mean 10.49 (SD = 2.16); storytelling (score range 5–20) mean 9.03 (SD = 1.80). Overall, the students earned 53% of the reflection points, 52% of writing points, and 45% of storytelling points.
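
The percentages above correspond to each mean divided by the maximum possible score, without adjusting for the minimum of 5 on the writing and storytelling rubrics. A short check, using only the means and maxima reported in this paragraph (the function name is hypothetical):

```python
def percent_of_max(mean_score, max_score):
    """Express a mean score as a percentage of the maximum possible score."""
    return 100 * mean_score / max_score


reported = {
    "reflection": (3.16, 6),
    "writing": (10.49, 20),
    "storytelling": (9.03, 20),
}
for name, (mean_score, max_score) in reported.items():
    print(name, round(percent_of_max(mean_score, max_score)))
# reflection 53
# writing 52
# storytelling 45
```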

There was no correlation between reflective ability score and either writing score (r = 0.049, P = .35) or storytelling score (r = 0.14, P = .13). There was a modest correlation between writing and storytelling scores (r = 0.46, P < .001). In the multiple regression, when considering writing and storytelling simultaneously as predictors of reflection, the scores for writing and storytelling accounted for 2.1% (P = .54) of the variance in reflection scores. Both variables had small and nonsignificant β coefficients, −0.02 (P = .88) for writing and 0.15 (P = .30) for storytelling.
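
The regression results can be cross-checked against the three correlations reported in this paragraph. With two predictors, standardized coefficients and R² follow directly from the pairwise correlations. The sketch below assumes the reported β values are standardized coefficients, which the text does not state explicitly; the function name is hypothetical.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized betas and R^2 for y regressed on two predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1 - r_12 ** 2
    beta1 = (r_y1 - r_y2 * r_12) / denom
    beta2 = (r_y2 - r_y1 * r_12) / denom
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared


# Correlations as reported: reflection-writing, reflection-storytelling,
# and writing-storytelling.
beta_writing, beta_story, r2 = two_predictor_regression(0.049, 0.14, 0.46)
print(round(beta_writing, 2), round(beta_story, 2), round(100 * r2, 1))
# -0.02 0.15 2.0
```

The recomputed coefficients agree with the reported values to two decimals, and R² comes out near 2.0%, close to the reported 2.1%. A gap of that size is what rounding in the published correlations would plausibly produce.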

Table 1 shows the impact of the guidelines on reflection, writing, and storytelling scores. The guidelines significantly increased students' scores for reflective ability, but there was no significant difference between the two groups in writing or storytelling scores.

Table 1

Discussion

When asking learners to write reflections, medical educators are often confronted with concerns on the part of both learners and other faculty members that writing and/or storytelling ability, neither of which is a criterion for being a physician, might affect the evaluation of reflective ability. We found no evidence that writing or storytelling ability introduced construct-irrelevant variance into reflection scores. When we considered writing and storytelling simultaneously, they accounted for virtually none of the variability in reflective ability scores. From these results, it appears we can feel confident that learners' writing and storytelling skills do not affect their reflection scores on written exercises when raters are trained to use the scoring rubric.

The notion that writing and storytelling skills might influence the grading of reflections makes intuitive sense. A well-written essay gives the impression of greater education and intellectual abilities on the part of the writer. If the quality of the writing influenced the scoring of the written reflections, this would amount to a systematic bias. Likewise, storytelling has existed in human societies across cultures and throughout history because of its ability to seduce the listener or reader with a compelling narrative. Although the students in this study demonstrated a range of abilities in both writing and storytelling, our results suggest it is possible to score reflective capability independent of the very real allures of good writing and a good story.

We found that reflective ability can be separated from writing or storytelling skills even in written reflections. Our finding that guidelines providing a structured approach to reflection increase reflective capacity but not writing or storytelling scores reinforces this contention. Just as writing and storytelling abilities do not determine reflection scores, educating learners about the components of effective reflections does not make them better writers or storytellers.

The finding of low average scores on reflection, writing, and storytelling is not unexpected. Reflection is a skill which requires education, feedback, and practice. The medical students in this study were reflective learning novices who received no instruction beyond the reflection guidelines, which were provided to only half the class. Explicit in the instructions to all the students was the goal to use the exercises to assess reflective abilities and professionalism, not writing. As a result, very few of the exercises reached even the writing skill expected from a competent college-level essay, to which the rubric should also be applicable. Similarly, the objective of the exercise was to reflect on and so learn from a clinical experience, not to tell a story from the wards, so the storytelling scores were also quite low. We do not believe that these scores indicate that our students were poor writers or storytellers; rather, the low average scores likely reflect the range of possible scores on the rubrics and the students' effort on this small, ungraded assignment.

This study has limitations. While the reliability between the two writing and storytelling raters was acceptable, more extensive training might have resulted in higher reliability. Also, using a single set of raters might have produced different results. The rubrics provide scores with high reliability when used by expert writers. Our less expert raters performed satisfactorily with a single 1.5-hour session of training.

Although the difference was not statistically significant, guideline use appeared to decrease writing scores. However, this finding may be the result of the lower reliability of the raters and/or a consequence of the study design. Guideline users framed their reflections with a four-part structure, which was removed before scoring to keep the raters blinded to guideline use. This change would lower scores for “overall structure and organization” on the writing rubric.

Mann and colleagues26 conducted a systematic review on reflection and reflective practice in the health professions which discussed multiple studies supporting the claim that we can measure reflective ability through written activities. Our study further advances the arguments in favor of written reflections by demonstrating that judgment can be made about learners' reflective abilities free of influences from their writing and/or storytelling abilities. Feeling comfortable that writing and storytelling are not influencing the measurement of reflection should allow us to move forward in determining how best to educate learners to become better at reflection as they progress through their professional education.


Dr. Aronson was supported by the UCSF Office of Medical Education's Medical Education Research Fellowship.

Other disclosures:


Ethical approval:

This study received approval from the University of California, San Francisco institutional review board.

References

1 Dewey J. How We Think: A Restatement of the Relation of Reflective Thinking to the Educative Process. Boston, Mass: D.C. Heath and Co; 1933.
2 Schön DA. The Reflective Practitioner: How Professionals Think in Action. New York, NY: Basic Books; 1983.
3 Boud D, Keogh R, Walker D. Reflection: Turning Experience Into Learning. London, UK: Kogan Page; 1987.
4 Mezirow J. Transformative Dimensions of Adult Learning. San Francisco, Calif: Jossey-Bass; 1981.
5 Blatt B, Plack M, Maring J, Mintz M, Simmens SJ. Acting on reflection: The effect of reflection on students' clinical performance on a standardized patient examination. J Gen Intern Med. 2007;22:49–54.
6 Mamede S, Schmidt HG, Penaforte JC. Effects of reflective practice on the accuracy of medical diagnoses. Med Educ. 2008;42:468–475.
7 Sobral DT. Medical students' mindset for reflective learning: A revalidation study of the reflection-in-learning scale. Adv Health Sci Educ Theory Pract. 2005;10:303–314.
8 Toy EC, Harms KP, Morris RK Jr, Simmons JR, Kaplan AL, Ownby AR. The effect of monthly resident reflection on achieving rotation goals. Teach Learn Med. 2009;21:15–19.
9 Wong FK, Kember D, Chung LY, Yan L. Assessing the level of student reflection from reflective journals. J Adv Nurs. 1995;22:48–57.
10 Pee B, Woodman T, Fry H, Davenport ES. Appraising and assessing reflection in students' writing on a structured worksheet. Med Educ. 2002;36:575–585.
11 Plack MM, Greenberg L. The reflective practitioner: Reaching for excellence in practice. Pediatrics. 2005;116:1546–1552.
12 Boenink AD, Oderwald AK, De Jonge P, Van Tilburg W, Smal JA. Assessing student reflection in medical practice. The development of an observer-rated instrument: Reliability, validity and initial experiences. Med Educ. 2004;38:368–377.
13 Learman LA, O'Sullivan P. Resident physicians' ability to reflect. Paper presented at: American Educational Research Association Annual Meeting; 2007; Chicago, Ill.
14 Learman LA, Autry AM, O'Sullivan P. Reliability and validity of reflection exercises for obstetrics and gynecology residents. Am J Obstet Gynecol. 2008;198:461.e1–e8.
15 Freedman SG. Upon further reflection. New York Times. August 30, 2006. Available at: Accessed February 3, 2010.
16 Lunz ME, Bashook PG. Relationship between candidate communication ability and oral certification examination scores. Med Educ. 2008;42:1227–1233.
17 Downing S, Haladyna TM. Validity threats: Overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38:327–333.
18 Downing S. Construct-irrelevant variance and flawed test questions: Do multiple-choice item-writing principles make any difference? Acad Med. 2002;77(10 suppl):S103–S104.
19 Aronson L, Robertson P, Lindow J, O'Sullivan P. Guidelines for reflective writing produce higher quality reflections. Paper presented at: Association of American Medical Colleges Annual Meeting; 2009; Boston, Mass.
20 Moskal BM, Leydens JA. Scoring rubric development: Validity and reliability. Pract Assess Res Eval [serial online]. 2000;7(10). Available at: Accessed June 26, 2010.
21 Wiggins G. Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco, Calif: Jossey-Bass Publishers; 1998.
22 Anderson JS, Mohrweis LC. Using rubrics to assess accounting students' writing, oral presentations, and ethics skills. Am J Bus Educ. 2008;1:85–94.
23 Engelhard GE, Myford CM, eds. Monitoring Faculty Consultant Performance in the Advanced Placement English Literature and Composition Program with a Many-Faceted Rasch Model. New York, NY: The College Board; 2003.
24 Burroway J, Weinberg S. Writing Fiction: A Guide to Narrative Craft. New York, NY: Longman; 2002.
25 Zinsser WK. On Writing Well: The Classic Guide to Writing Non-Fiction. New York, NY: HarperCollins; 2006.
26 Mann K, Gordon J, Macleod A. Reflection and reflective practice in health professions education: A systematic review. Adv Health Sci Educ Theory Pract. 2009;14:595–621.
© 2010 Association of American Medical Colleges