The letter to the editor by Frendø1 regarding our review, “A Shift on the Horizon: A Systematic Review of Assessment Tools for Plastic Surgery Trainees,”2 raises concerns about how the validity of the presented tools was assessed and offers a broader commentary on the role of Messick’s unified model of validity in surgical education research.3
We thank Frendø for the thoughtful consideration of our work. Our review aimed to identify and evaluate assessment tools being used for technical and nontechnical competencies in plastic surgery. In doing so, our goal was to provide plastic surgery educators with a practical resource that would allow them to make informed choices about which evaluation tools were available for assessing various competencies.
As such, we chose to report the validity terms provided by each of the authors whose work we included. Our results reveal that assessment tools in plastic surgery have generally not incorporated a modern, integrated definition of validity. Rather, validity is typically reported using traditional concepts such as content, criterion, and predictive validity.2,4 Furthermore, there is a general lack of detail provided in the descriptions of validity, which makes it challenging to further categorize validity measures according to a framework such as Messick’s.
Messick defines validity as “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other models of assessment.”3 He suggests that a global assessment of validity should therefore be determined by considering various pieces of evidence. We support the use of this model. However, Messick also argues for the importance of considering other critical aspects of measurement, including reliability.4 Our review found that one-third of included studies reported a traditional definition of validity without first having established acceptable reliability scores.2 Because reliability places an upper limit on validity, we were unable to establish the strength of the reported validity measures, even using the traditional definitions of validity.5 Thus, had we chosen to use Messick’s unitary validity framework instead of the traditional framework, these articles would still have been considered psychometrically weak. As a result, we believe that our conclusions regarding the weak psychometric properties of many of the included studies would have changed very little, if at all, had we used Messick’s framework for validity evaluation.
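The statement that reliability places an upper limit on validity can be illustrated with the attenuation relation from classical test theory. The sketch below is offered only as an illustration: it assumes uncorrelated measurement errors, and the symbols are introduced here for convenience and do not appear in the review or in the cited sources.

```latex
% Symbols are assumptions introduced for illustration, not taken from the review:
% r_{XY}: observed validity coefficient between assessment score X and criterion Y
% r_{XX'}, r_{YY'}: reliability coefficients of X and Y
% \rho_{T_X T_Y}: correlation between the underlying true scores
\[
  r_{XY} = \rho_{T_X T_Y}\,\sqrt{r_{XX'}\, r_{YY'}}
  \quad\Longrightarrow\quad
  \lvert r_{XY}\rvert \le \sqrt{r_{XX'}\, r_{YY'}} \le \sqrt{r_{XX'}}
\]
% Example: if r_{XX'} = 0.49, then |r_{XY}| cannot exceed \sqrt{0.49} = 0.70.
```

Under these assumptions, a tool with unknown or low reliability cannot be shown to have strong criterion-related validity, whichever validity framework is used to organize the evidence.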
Our findings are not unique to plastic surgery. For instance, a review by Borgersen et al. demonstrated that only 33 of 498 included studies (6.6 percent) used Messick’s framework to evaluate surgical assessment tools.6 Of note, only 11 percent of these 33 studies (three or four articles) were specific to plastic or maxillofacial surgery. We posit that Messick’s unified framework for validity may be difficult to conceptualize, and more difficult still to apply, given the limited pragmatic guidance on how to integrate various sources of evidence into an overarching judgment of validity. In fact, Sireci, an expert in validity theory, argues that “the unitary conceptualization of validity…is extremely difficult to describe to lay audiences.”7 Furthermore, he highlights the work of Shepard8 and states that “the unitary conceptualization of validity has done little to provide guidance regarding how to validate the use of tests in specific situations.”7 This position was reiterated at the recent International Conference on Residency Education, where speakers described Kane’s validity framework as more commonly used in medical education, and Messick’s framework as favored by education and psychology researchers.9,10 We argue that the use of unified validity frameworks may be further limited by the process through which specialty-specific evaluation forms are often created. This process typically depends on surgeons and residents, rather than on psychometricians who are well versed in this literature, to develop and evaluate the tools; the former are the intended audience for our review article.
In summary, we support Frendø’s position regarding the need to advance the status quo of surgical assessment literature by imposing rigorous standards. However, we respectfully suggest that the limited reliability scores of the included articles must be addressed before additional evaluation of validity is warranted. We also posit that other validity models, such as Kane’s, may be appropriate in medical education. We support the development and use of pragmatic resources for applying these validity frameworks, such as those put forward by Borgersen et al.6 and Tavares et al.9 Such literature will certainly advance the assessment of validity in the field of surgical evaluation, as recommended by Frendø.
The authors have no financial interest to declare in relation to the content of this communication.
Christine Fahim, Ph.D.
Bloomberg School of Public Health
Johns Hopkins University
Victoria E. McKinnon, M.St.
Michael G. DeGroote School of Medicine
Portia Kalun, M.Sc.
Office of Education Science
Department of Surgery
Mark H. McRae, M.D.
Department of Surgery
Ranil R. Sonnadara, Ph.D.
Office of Education Science
Department of Surgery
Hamilton, Ontario, Canada
1. Frendø M. A shift on the horizon: A systematic review of assessment tools for plastic surgery trainees (Letter). Plast Reconstr Surg. 2019;144:1129e.
2. McKinnon VE, Kalun P, McRae MH, Sonnadara RR, Fahim C. A shift on the horizon: A systematic review of assessment tools for plastic surgery trainees. Plast Reconstr Surg. 2018;142:217e–231e.
3. Messick S. Validity of test interpretation and use. In: Alkin MC, ed. Encyclopedia of Educational Research. 6th ed. New York, NY: Macmillan; 1992:1482–1495.
4. Messick S. Standards of validity and the validity of standards in performance assessment. Educ Meas. 1995;14:5–8.
5. Kane MT. Validating the interpretations and uses of test scores. J Educ Meas. 2013;50:1–73.
6. Borgersen NJ, Naur TMH, Sørensen SMD, et al. Gathering validity evidence for surgical simulation: A systematic review. Ann Surg. 2018;267:1063–1068.
7. Sireci SG. On validity theory and test validation. Educ Res. 2007;36:477–481.
8. Shepard LA. Evaluating test validity. Rev Res Educ. 1993;19:405–450.
9. Tavares W, Brydges R, Myre P, et al. Applying Kane’s validity framework to a simulation based assessment of clinical competence. Adv Health Sci Educ Theory Pract. 2018;23:323–338.
10. Bhanji F, Posner G. KeyLIME session: Best simulation literature. Paper presented at: International Conference on Residency Education Annual Meeting; October 19, 2018; Halifax, Nova Scotia, Canada.