The modifications to be introduced in the Medical College Admission Test in 2015 (MCAT2015)1 represent a significant change to a test which has hitherto seemed to successfully help medical school admissions committees identify students with the requisite ability to succeed in medical school. In particular, the decision to remove the Writing Sample will, in our view, considerably narrow the abilities of candidates which the MCAT2015 exam will be able to assess, notwithstanding the introduction of the Psychological, Social, and Biological Foundations of Behavior (FOB) subtest. We argue that, despite the decision to remove the MCAT Writing Sample, a writing task provides important information about applicants’ cognitive abilities and appropriately remains a key component of assessment for many other tertiary admissions tests. Our purpose is not necessarily to critique the changes to the MCAT exam directly but, rather, to use this decision as a prompt to reconsider the value of a writing test for medical admissions purposes.
The Value of a Writing Test for Medical School Admissions
Writing is nature’s way of letting you know how sloppy your thinking is.
—Richard Guindon (cartoon), San Francisco Chronicle, 1989
Writing tests can be understood, designed, and executed in very different ways. At one extreme are the literacy and language tests that focus intentionally (or in effect) on language knowledge and conventions. At the other extreme are writing tests designed to evaluate generative thinking. These latter tests are not focused on literacy or language skills per se but, rather, on the ability of candidates to develop and articulate substantial and significant ideas. Writing tests can also be differentiated in terms of the extent to which they pose specific and constrained tasks at one end of the spectrum, or broad and open tasks at the other.
We use the term “generative thinking” not only to denote the fact that writing is inherently a generative or productive task but also to refer to the open-ended nature (or the potential for open-endedness) of a written task. That is, with the appropriate scoring guides, a written task can allow scorers to evaluate and reward responses of substantially different content which nevertheless display similar underlying levels of thought. Convergent thinking, with its logico-deductive nature in which there exists a unique answer or a limited number of correct answers, is unquestionably important in medicine, as in all professional activities; so too, we argue, is the capacity to think broadly, generate relevant ideas, and articulate these ideas in a coherent and convincing manner. In other words, it is essential for candidates to display higher-order thinking through their writing. We do not suggest that the selected-response format cannot assess higher-order thinking but, rather, that it assesses only the convergent aspect of higher-order thinking. The inability of selected-response items to tap into the capacity for generative thinking is a significant limitation of the format. The written task, by contrast, allows us to broaden the usual scope of admissions testing, and this is one of the key justifications for its use.
Of course, a writing task does not necessarily assess generative or higher-order thinking. Its ability to do so depends not only on the writing prompt or task itself but also on the context and instructions provided with the task and, most important, how its developer conceptualizes the format—specifically, whether it is the quality of thought or the correctness of form and/or content which is emphasized. Similar distinctions have been raised in pedagogical debates about writing instruction, such as the “process versus product” debate2 or the distinction between writing as a “way of knowing” as opposed to a “way of telling.”3 Although not mutually exclusive, such different conceptualizations of writing necessarily lead to different emphases in scoring. If the response to a writing task is valued primarily as a means of communicating information, then the assessment of it will tend to emphasize the correctness of linguistic conventions and/or the accuracy of the content. On the other hand, if writing is considered primarily as a way of conveying ideas, then a piece of writing can be assessed as a representation of the writer’s capacity to generate and express thoughts. In the latter case, an appropriately constructed writing test becomes a proxy measure of the candidate’s capacity for higher-order thinking.
Several high-stakes test batteries explicitly conceptualize their writing tasks primarily as assessments of thinking. The SAT Essay, for example, is described by its developers as a measure of the ability to develop a point of view and support that point of view using reasoning and examples, along with the ability to “follow the conventions of standard written English,” and candidates are explicitly invited to support their position by drawing on their “reading, studies, experience, or observations.”4
We have been closely associated with the writing task on the Graduate Australian Medical School Admissions Test (GAMSAT), which offers very broad and general themes that candidates can respond to in a range of ways. The GAMSAT Information Booklet states, “Candidates are not assessed on the ‘correctness’ of the ideas or attitudes they display.”5 Instead, they are explicitly invited to respond to the writing task in a way that they think will best display their ability to think in a sophisticated way and write effectively. In this way, the GAMSAT Written Communication task aims to use candidates’ writing as a way of tapping into their capacity for higher-order thinking, to complement the information gathered about this capacity from the other multiple-choice questions (MCQs) and more content-focused sections of the test.
In its Analytical Writing Assessment Section, the Graduate Management Admission Test (GMAT) similarly emphasizes the broader cognitive aspects of writing, such as organization, expression, and support.6 The GMAT exam, though, does place greater emphasis on the candidate’s capacity for analytical evaluation of a specific argument,6 suggesting that the scoring criteria for its writing assessment would be less open-ended than those of the previous two examples. The common element in all of these tests, however, is the notion that a writing task provides schools with useful information about candidates’ capacity to generate meaningful thought and express it in a coherent way.
The MCAT Writing Sample and the Decision to Remove It
As with the writing tests on the SAT, the GAMSAT, and the GMAT exams, early descriptions of the MCAT Writing Sample suggest that the original intention was to conceptualize this writing task in a similarly broad way:
By requiring candidates to develop and present ideas in a cohesive manner, the MCAT Writing Sample offers medical school admissions committees evidence of an applicant’s writing and analytical skills. These skills are critical to the preparation of useful medical records and to effective communication with patients and other health professionals. The MCAT Writing Sample provides unique information unavailable from other sections of the examination.7
The most recent versions of the MCAT Writing Sample, however, present quite a specific and constrained task, to judge from the sample material. Candidates are offered a specific statement for consideration, but their responses are subsequently constrained by a template for the construction and direction of those responses8:
In a free society, individuals must be allowed to do as they choose.
Write a unified essay in which you perform the following tasks. Explain what you think the above statement means. Describe a specific situation in which individuals in a free society should not be allowed to do as they choose. Discuss what you think determines when a free society is justified in restricting an individual’s actions.
Such instructions simplify and reduce the challenge for candidates, practically rendering the essay format into a series of short-answer questions. Perhaps such a directed prompt is justified on the basis of “leveling” the nature of the task for all candidates, or perhaps it reflects an attempt to make grading as consistent and reliable as possible. However, the resultant prescriptive nature of the task would seem to significantly constrain the possibilities for stronger candidates to display higher levels of thought, construction, and expression, and therefore would potentially reduce the variation in the responses of candidates. Whatever the original intent, such a task may well end up assessing the literacy and language conventions, or, at best, essentially the “correctness” of the responses as opposed to the candidates’ levels of thinking. Even so, the MCAT Writing Sample has at least provided medical schools with some information about candidates’ capacity to think, albeit in a convergent sense.
The MCAT Web site9 offers the following reasons for the decision to remove the Writing Sample from the exam:
The advisory committee for the review decided to drop the Writing Sample section after carefully considering input from medical school admissions officers who reported that scores on this section of the test are used for only a very small group of applicants (i.e., applicants with low Verbal Reasoning or Writing Sample scores, applicants who have difficulty communicating in their interviews, etc.). Additionally, performance on the Writing Sample section offers admissions committees little additional information about applicants’ preparation for medical school relative to what is offered by undergraduate grades and the other sections of the exam.
The 2013 MCAT Essentials states that the writing task has been removed from this year’s exam “to make room for a Trial Section [the FOB component] … for a future version of the MCAT exam.”10 Although there are elements of pragmatism in these justifications (i.e., the limited use of the Writing Sample by medical schools and the need to create space for the FOB component), essentially all these arguments boil down to validity. Although there may be many reasons for the limited use of the Writing Sample scores by medical schools, ultimately this fact reflects a lack of faith, whether well founded or not, in the validity of the information provided by the Writing Sample. Similarly, the decision to replace the writing task with the FOB component is a direct judgment about the perceived validity of the writing task by the developers of the MCAT exam. For us, this raises further questions about the comparative benefits of removing a generative task to create space for extra selected-response items, even if these items focus on the behavioral and social foundations of medicine. Not surprisingly, given our arguments in favor of a writing task outlined above, we can envisage a strong case for assessing candidates’ understanding of such psychosocially oriented foundations precisely through writing.
Despite earlier studies providing some evidence of an association between the MCAT Writing Sample and performance in medical school and clinical rotations,11,12 a meta-analytic study by Donnon and colleagues13 determined that the Writing Sample showed “low predictive validity” for both medical board licensing exam outcomes and medical school performance measures. Gilbert and colleagues14 had earlier posited that the lack of published data on the predictive validity of the Writing Sample may have explained why the scores were not used more widely by U.S. medical schools; Donnon and colleagues’ meta-analytic study may well have sealed its fate. A subsequent study of the MCAT exam’s predictive validity chose not to include the writing task at all,15 partly on the basis of the result of the study by Donnon and colleagues.
However, a wider view of the research on the predictive validity of writing tasks suggests that the association between a writing task and subsequent performance criteria is complicated. Several factors may distort a possible underlying association, including the nature of the writing task, the way the task is scored (i.e., the criteria and rubrics on which the scoring is based), the spread of the results (or lack thereof), the reporting of the results (mentioned by Callahan and colleagues15 as an additional factor for not including the MCAT Writing Sample in their analyses), and the way the results are used to select applicants. The MCAT Writing Sample, as we outlined above, is a particularly constrained writing task, and if the scoring criteria emphasize the content-related correctness of the response and/or language conventions over the broader substantive quality of the thought and expression, then the lack of association between Writing Sample scores and medical school or profession-related skills would not be particularly surprising.
As a result, associations between writing task scores and criterion variables can show high variability across different institutions and/or domains, a point explicitly recognized by Gilbert and colleagues.14 Research on the GAMSAT exam has shown similar cross-institutional variation across all three components of the test, but particularly on the writing task, whose association with first-year course grades for individual institutions ranged from highly negative to moderately positive,16 a finding readily explained by the different ways in which the Written Communication score is used as part of the admissions process. Similar variability occurs with other writing tests in other contexts, such as the GMAT Analytical Writing Assessment Section.17,18 Such contextual factors suggest that the findings of predictive validity studies are more indicative of how a specific test is used by a specific institution than of the nature and predictive utility of writing assessment in general.
Moreover, designing or justifying test constructs solely or even predominantly on the basis of predictions of future performance is highly problematic. Apart from the statistical difficulties inherent in the process,19 predictive validity is not, as is readily assumed, a satisfactory proxy for the unified concept of validity argued for by theorists such as Messick.20 The last 25 years have seen the development of a holistic concept of validity that has significantly reduced the preeminence traditionally given to predictive validity. An overemphasis on predictive validity limits the construct validity of a test by privileging those components which maximize desirable correlations, despite the potential for common method bias (arising, for example, from a shared multiple-choice format). As Breland and colleagues21 point out, format is a key determinant of the validity of a test construct. We therefore ought not to expect broad, general constructs, such as higher-order thinking expressed through writing, to show levels of correlation similar to those of more narrowly defined, discipline-specific constructs. Rather, their inclusion, where appropriate, needs to be justified by arguments based on construct validity.
From a broader construct validity perspective, therefore, it can be argued that a medical school admissions battery should be a broad and balanced test of cognitive skills,22 and on this basis it is clear that a writing test offers distinctly different information about a candidate from that offered by the other (usually selected-response) sections of the test. We say this even though we would defend the value of the multiple-choice format for assessing higher-order thinking, in the ways increasingly being practiced by medical schools and exemplified by the National Board of Medical Examiners guide to item development.23
The Justification for Writing Tasks
Ultimately, a writing test is a valuable part of any assessment process because it elicits a range of thinking skills that are not readily assessed by the MCQ format. This is true despite long-held concerns about the relatively low reliability or perceived “subjectivity” of direct assessment of writing.24 These are assessment challenges that need to be acknowledged and addressed—through increased sampling (i.e., more than a single writing task), scoring guides, rater training, and other admittedly laborious rater calibration processes. But, as medical educators are increasingly realizing in relation to assessment in general and the OSCE in particular,25 the benefits of the increased construct validity of more authentic or appropriate assessment tasks are usually worth the consequent lower (yet potentially adequate) reliabilities.
The inclusion of a writing test for medical admissions testing therefore does not need to be justified on the basis of the importance of communication in medical practice, or by the notion that high-level literacy is crucial for success in medical school, or because of its contribution to higher prediction coefficients. Writing tasks are justified on the basis that a well-developed (and well-scored) writing test can tap into substantive, generative thinking skills, and this provides a crucial addition to what is traditionally tested, and inherently testable, in an exclusively multiple-choice test. Perhaps the addition of the FOB section will go some way toward improving the breadth of what the MCAT exam will assess, but it cannot reach into the cognitive recesses where a generative written task can.
Other disclosures: Dr. McCurry designed the writing test of the Graduate Australian Medical School Admissions Test (GAMSAT) and has worked on it since 1995. Mr. Chiavaroli worked on the GAMSAT as a research fellow at the Australian Council for Educational Research from 1999 to 2006.
Ethical approval: Not applicable.
Disclaimer: The views expressed in this article are those of the authors and are not those of the Australian Council for Educational Research, the University of Melbourne, or the GAMSAT Consortium.
1. Association of American Medical Colleges. The Preview Guide for MCAT2015. Washington, DC: Association of American Medical Colleges; 2011
2. White EM. Assessing higher-order thinking and communication skills in college graduates through writing. J Gen Educ. 1993;42:105–122
3. McCrimmon JR. Writing as a way of knowing. In: The Promise of English: NCTE 1970 Distinguished Lectures. Urbana, Ill: National Council of Teachers of English; 1970:115–130
4. The College Board. SAT: The essay. http://sat.collegeboard.org/practice/writing/sat-essay. Accessed January 23, 2013
5. Australian Council for Educational Research. GAMSAT (Graduate Australian Medical School Admissions Test) Information Booklet 2013. http://gamsat.acer.edu.au/files/GAMSAT_Info_book_13.pdf. Accessed January 23, 2013
6. Graduate Management Admission Council. Analytical writing assessment section. http://www.mba.com/the-gmat/test-structure-and-overview/analytical-writing-assessment-section.aspx. Accessed January 23, 2013
7. Association of American Medical Colleges. MCAT Interpretive Manual. Washington, DC: Association of American Medical Colleges; 2005
8. Association of American Medical Colleges. Medical College Admission Test (MCAT) 2000 Announcement. Washington, DC: Association of American Medical Colleges; 2000:43
9. Association of American Medical Colleges. MCAT2015 frequently asked questions (FAQs): Verbal Reasoning and the Writing Sample. https://www.aamc.org/students/applying/mcat/mcat2015/faqs/. Accessed January 23, 2013
10. Association of American Medical Colleges. 2013 MCAT Essentials. https://www.aamc.org/students/download/63060/data/mcatessentials.pdf. Accessed January 23, 2013
11. Hojat M, Erdmann JB, Veloski JJ, et al. A validity study of the Writing Sample section of the Medical College Admission Test. Acad Med. 2000;75(10 suppl):S25–S27
12. Julian ER. Validity of the Medical College Admission Test for predicting medical school performance. Acad Med. 2005;80:910–917
13. Donnon T, Paolucci EO, Violato C. The predictive validity of the MCAT for medical school performance and medical board licensing examinations: A meta-analysis of the published research. Acad Med. 2007;82:100–106
14. Gilbert GE, Basco WT Jr, Blue AV, O’Sullivan PS. Predictive validity of the Medical College Admission Test Writing Sample for the United States Medical Licensing Examination Steps 1 and 2. Adv Health Sci Educ Theory Pract. 1997;1:179–196
15. Callahan CA, Hojat M, Veloski J, Erdmann JB, Gonnella JS. The predictive validity of three versions of the MCAT in relation to performance in medical school, residency, and licensing examinations: A longitudinal study of 36 classes of Jefferson Medical College. Acad Med. 2010;85:980–987
16. Coates H. Establishing the criterion validity of the Graduate Medical School Admissions Test (GAMSAT). Med Educ. 2008;42:999–1006
17. Sireci SG, Talento-Miller E. Evaluating the predictive validity of Graduate Management Admission Test scores. Educ Psychol Meas. 2006;66:305–317
18. Hill KL, Hynes GE, Joyce MP, Green JS. GMAT-AWA score as a predictor of success in a managerial communication course. Bus Comm Q. 2011;74:103–118
19. Bell JF. Difficulties in evaluating the predictive validity of selection tests. Res Matters. 2007;3:5–10
20. Messick S. Validity. In: Linn RL, ed. Educational Measurement. 3rd ed. New York, NY: MacMillan Publishing Company; 1989
21. Breland HM, Bridgeman B, Fowles ME. Writing Assessment in Admission to Higher Education: Review and Framework. New York, NY: College Entrance Examination Board; 1999
22. Dienstag JL. The Medical College Admission Test—Toward a new balance. N Engl J Med. 2011;365:1955–1957
23. National Board of Medical Examiners. Constructing Written Test Questions for the Basic and Clinical Sciences. 3rd ed, revised 2002. http://www.nbme.org/publications/index.html. Accessed January 23, 2013
24. Yancey KB. Looking back as we look forward: Historicizing writing assessment. College Comp Comm. 1999;50:483–503
25. Eva KW, Hodges BD. Scylla or Charybdis? Can we navigate between objectification and judgement in assessment? Med Educ. 2012;46:914–919