This paper is based on my presentation at the Research in Medical Education (RIME) program “wrap-up” session at the annual meeting of the Association of American Medical Colleges (AAMC), Washington, DC, November 12, 2003.
The purpose of this paper is to describe the state of research in medical education by analyzing the methods used in the papers accepted for presentation at the AAMC's annual 2003 Research in Medical Education (RIME) meeting as published in the October supplement to Academic Medicine. Based upon this analysis, I speculate on what RIME will look like ten years into the future. The paper begins with some observations on the general state of educational research, particularly vis-à-vis the call for more evidence-based, or best-evidence, education.
The overview does not address the remarkable growth of the RIME program over time, as reflected in the addition of an invited “keynote” speaker, a RIME review paper session, and the abstract sessions.
Pangaro1 observed that 65% of the program was devoted to undergraduate medical education, 25% to graduate medical education, and 10% to continuing medical education. Approximately 27% of the 97 papers submitted were selected for presentation.1 This suggests a fairly competitive and rigorous review process, so the selected papers can reasonably be considered representative of the higher-quality, or “better,” research reports in medical education.
Bruce Thompson, in a 1998 invited address at the American Educational Research Association, noted that
There is no question that educational research, whatever its methodological and other limits, has influenced and informed educational practice (cf. Gage, 1985; Travers, 1983). But there seems to be some consensus that … the quality of published studies in education and related disciplines is, unfortunately, not high. ... [Two] Empirical studies of published research involving methodology experts as judges...found that over 40% & over 60%, respectively, of published research was judged by methods experts as being seriously or completely flawed. Of course, it must be acknowledged that even a methodologically flawed study may still contribute something to our understanding of educational phenomena. As Glass (1979) noted, ‘Our research literature in education is not of the highest quality, but I suspect that it is good enough on most topics.’2, p. 12
For the analysis of the selected papers, both qualitative and quantitative studies were examined. Qualitative studies are more exploratory and hypothesis-generating and typically address questions of what, how, and why something works, while quantitative studies are typically more confirmatory and address questions of whether and how much something works.
All 27 papers in the October 2003 supplement to Academic Medicine were reviewed and classified by research method using an a priori coding scheme reflecting (1) whether the paper represented the qualitative or the quantitative research paradigm, (2) the subcategory of research within these two broad paradigms, and (3) the research methods used and reported. Qualitative studies were classified into one of two general subcategories: (a) a descriptive study or (b) a literature review. Quantitative studies were subclassified as (a) an intervention study, (b) a measurement study, or (c) a descriptive study. For each study, I noted whether a research question or a hypothesis was stated; whether statistics, p values, effect size measures, and/or confidence intervals were reported (principally for the quantitative studies); and, for qualitative studies only, whether multiple data sources and/or triangulation (multiple methods) were used and whether more than one investigator collected and analyzed the data.
Four of the 27 papers were classified as qualitative, while 23 were classified as quantitative, representing 15% and 85% of the RIME program, respectively.
Of the four qualitative reports, two were literature reviews, and all four could be considered descriptive in nature. All stated one or more explicit research questions, while none stated hypotheses, consistent with qualitative research being hypothesis-generating rather than hypothesis-testing. Of the two reports that were not literature reviews, one was a purely descriptive report of program dissemination/diffusion, while the other was a qualitative study that used rigorous methods: multiple data sources, more than one investigator to collect and analyze the data (with interrater agreement reported), and triangulation. None of the qualitative reports presented statistical analyses, although one of the literature reviews did report p values.
The majority of the quantitative studies were measurement studies examining the psychometric characteristics (reliability, validity) of assessment tools or methods: 15 of the 23 reports, or 65% of the quantitative studies. Another four (17%) quantitative papers reported evaluations of educational interventions, and four more (17%) were essentially descriptive in nature. All but one of the 23 reports explicitly stated the research question being addressed; in the remaining study the question was not stated but could be inferred. Only seven (30%) of the quantitative papers were hypothesis-driven; five of these explicitly stated the research hypothesis, and in the other two it could be readily inferred. All of the quantitative studies reported statistics, with 15 (65%) reporting p values and 15 (65%) reporting an effect size measure. Effect size measures were almost exclusively (14 of 15) variance-accounted-for indices in measurement or descriptive studies, such as R², r, percent variance, validity or reliability coefficients, and generalizability (g) coefficients. Only one study reported an effect size for the difference in means (in standard deviation units) between students who did and did not participate in an educational intervention. Three studies reported confidence intervals when describing means or ratios, but none reported them for effect size measures stemming from formal statistical tests; these studies relied almost exclusively on p values.
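The proportions reported in this section are simple ratios of counts, and recomputing them is a cheap guard against transcription slips. The Python sketch below does that; it assumes 15 measurement studies, the count consistent with the reported 65% and with the four intervention and four descriptive studies summing to 23. The names `pct` and `summary` are illustrative only.

```python
"""Arithmetic check of the study-classification proportions (illustrative)."""

TOTAL = 27
QUALITATIVE = 4
# Quantitative subcategories. Assumption: 15 measurement studies, the count
# implied by the reported 65% and by the subcategories summing to 23.
QUANTITATIVE = {"measurement": 15, "intervention": 4, "descriptive": 4}


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


n_quant = sum(QUANTITATIVE.values())
assert QUALITATIVE + n_quant == TOTAL  # every paper classified exactly once

summary = {
    "qualitative": pct(QUALITATIVE, TOTAL),
    "quantitative": pct(n_quant, TOTAL),
    # Subcategory shares are relative to the quantitative papers only.
    **{name: pct(n, n_quant) for name, n in QUANTITATIVE.items()},
    "measurement+descriptive": pct(
        QUANTITATIVE["measurement"] + QUANTITATIVE["descriptive"], n_quant
    ),
    "hypothesis_driven": pct(7, n_quant),
    "p_values": pct(15, n_quant),
    "effect_size": pct(15, n_quant),
}

if __name__ == "__main__":
    for label, value in summary.items():
        print(f"{label:>24}: {value}%")
```

The same function reproduces the 83% figure for measurement and descriptive studies combined that is cited in the discussion.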
Discussion and Conclusions
Current Status of Research in Medical Education.
The findings of this content analysis of research reports selected for presentation at the 2003 RIME conference are illuminating and, in several instances, somewhat surprising. The overwhelming majority of reports were quantitative rather than qualitative, which perhaps could be expected; more striking is that the preponderance of quantitative studies (83%) were measurement and descriptive studies, that there were few evaluations of educational interventions, and that only one qualitative study used state-of-the-art, rigorous qualitative methods.
Although the use of statistics is almost ubiquitous and many quantitative studies reported effect size measures, these were almost exclusively estimates of the amount of variance accounted for in measurement studies. Often these were not explicitly reported or commented upon as effect size measures, but merely reported as statistical results. In no instance were confidence intervals used to summarize the effect sizes from statistical tests; instead, investigators relied on traditional p values. Many professional organizations and journals recommend or require that effect size indices be routinely reported.3 For example, the American Psychological Association Task Force on Statistical Inference advised in 1999: “Always provide some effect-size estimate when reporting a p value,”4 and in 2001 the Joint Task Force of Academic Medicine and the GEA-RIME Committee's “Review Criteria for Research Manuscripts” report proposed that “Measures of functional significance, such as effect size or proportion of variance accounted for, accompany hypothesis-testing analyses.”5 Effect size measures fall into two broad categories: standardized difference indices, which are “metric-free” and thus facilitate comparisons across studies (e.g., Cohen's d, a standardized mean difference index representing the difference between intervention and comparison group means in standard deviation units), and variance-accounted-for indices (like r²), which typically can be computed in all quantitative studies. In summary, the methodological quality of research in medical education, as content analyzed in this report, is consistent with the critical observations made by Thompson, Glass, and others about the quality of educational research in general.
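To make the two families of effect size concrete, the Python sketch below computes Cohen's d from group summary statistics, attaches an approximate confidence interval, and converts d to the corresponding variance-accounted-for index r². It is a minimal sketch under stated assumptions: it uses the pooled within-group standard deviation, the common large-sample normal approximation to the variance of d, and the standard d-to-r conversion; the group means, standard deviations, and sizes in the example are hypothetical, not data from any RIME paper.

```python
"""Cohen's d with an approximate CI, and its conversion to r squared."""

from math import sqrt
from statistics import NormalDist


def pooled_sd(s1: float, s2: float, n1: int, n2: int) -> float:
    """Pooled within-group standard deviation."""
    return sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, m2: float, s1: float, s2: float, n1: int, n2: int) -> float:
    """Difference between group means expressed in pooled-SD units."""
    return (m1 - m2) / pooled_sd(s1, s2, n1, n2)


def d_confidence_interval(d: float, n1: int, n2: int, level: float = 0.95):
    """Normal-approximation CI using the large-sample variance of d."""
    se = sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return d - z * se, d + z * se


def d_to_r(d: float, n1: int, n2: int) -> float:
    """Point-biserial r implied by d; the constant a equals 4 for equal groups."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / sqrt(d**2 + a)


if __name__ == "__main__":
    # Hypothetical intervention and comparison group summary statistics.
    m1, s1, n1 = 78.0, 10.0, 50
    m2, s2, n2 = 72.0, 10.0, 50
    d = cohens_d(m1, m2, s1, s2, n1, n2)
    lo, hi = d_confidence_interval(d, n1, n2)
    r = d_to_r(d, n1, n2)
    print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], r^2 = {r**2:.3f}")
```

With these hypothetical numbers, a difference of 0.6 standard deviations corresponds to only about 8% of variance accounted for, which illustrates why the two kinds of index should be reported and interpreted with their metric in mind.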
What Will the Future of the RIME Conference Look Like?
It is always risky to try to predict the future, especially when it is unknown! Notwithstanding, the following is a wish list of what I believe research in medical education will look like in ten years. I predict that the quality of both qualitative and quantitative research reports will improve considerably. Qualitative methods will routinely include the use of multiple sources of information and multiple coders in generating new hypotheses. Literature reviews will be more systematic in nature, explicitly stating inclusion criteria for the studies reviewed and routinely providing evidence tables with summaries of effect sizes. Quantitative studies will be more intervention- and hypothesis-driven, and all studies will routinely report effect size measures along with confidence intervals.
My predictions and hopes are that qualitative studies will routinely use accepted, rigorous methods; that multiple sources and types of information (e.g., triangulation) will be used; that more than one investigator will collect and analyze data and interrater agreement/reliability will routinely be reported; that qualitative studies will be appropriately framed as exploratory and hypothesis-generating in nature; and that results will not be overinterpreted.
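Interrater agreement for two coders who assign categorical codes is often summarized with Cohen's kappa, which corrects the observed proportion of agreement for the agreement expected by chance from each coder's marginal rates. The Python sketch below is one minimal way to compute it; the function name and the example codes are hypothetical, and kappa is only one of several agreement indices a qualitative study might report.

```python
"""Cohen's kappa for two coders assigning one category to each item."""

from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(coder_a: Sequence[Hashable], coder_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement: (p_observed - p_expected) / (1 - p_expected)."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("coders must rate the same, non-empty set of items")
    n = len(coder_a)
    # Observed proportion of items on which the two coders agree.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from the product of each coder's marginal proportions.
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical theme codes for ten interview segments.
    coder_1 = ["A", "A", "B", "B", "C", "A", "B", "C", "C", "A"]
    coder_2 = ["A", "A", "B", "C", "C", "A", "B", "C", "B", "A"]
    print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")
```

Here the coders agree on 8 of 10 segments, but because chance alone would produce 34% agreement given their marginal rates, kappa is noticeably lower than the raw 80%.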
Reviews of the literature will be “systematic”: they will routinely include explicit a priori criteria for including and excluding studies, use multiple independent raters to code the characteristics of each study, summarize the results for each outcome of interest with effect sizes and confidence intervals rather than p values, and use evidence tables to succinctly summarize the characteristics of studies, including a description of the participants, methodology, interventions, outcome measures, and results. Some of these literature reviews will use quantitative syntheses (meta-analyses) to pool the results for similar outcomes across studies.
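Pooling effect sizes across studies, as a meta-analysis does, can be illustrated with the simplest model: a fixed-effect, inverse-variance weighted mean with a confidence interval. The Python sketch below assumes each study contributes a standardized effect size and its sampling variance; the function name and example values are hypothetical. A random-effects model, which adds a between-study variance component, is a common alternative when studies are heterogeneous and is not shown here.

```python
"""Fixed-effect (inverse-variance) pooling of study effect sizes."""

from math import sqrt
from statistics import NormalDist
from typing import Sequence, Tuple


def pool_fixed_effect(
    effects: Sequence[float], variances: Sequence[float], level: float = 0.95
) -> Tuple[float, float, float]:
    """Return (pooled effect, lower CI bound, upper CI bound)."""
    if len(effects) != len(variances) or len(effects) == 0:
        raise ValueError("need one variance per effect size, at least one study")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total_weight
    se = sqrt(1.0 / total_weight)  # standard error of the pooled estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return pooled, pooled - z * se, pooled + z * se


if __name__ == "__main__":
    # Hypothetical standardized mean differences from three studies.
    d_values = [0.40, 0.60, 0.20]
    d_variances = [0.04, 0.02, 0.05]
    est, lo, hi = pool_fixed_effect(d_values, d_variances)
    print(f"pooled d = {est:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

The more precise studies (smaller variances) pull the pooled estimate toward their own results, and the interval for the pooled effect is narrower than that of any single study.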
As mentioned, quantitative studies are currently measurement oriented, with far fewer studies testing the educational efficacy or effectiveness of innovative educational interventions. Hopefully the number and percentage of quantitative studies evaluating interventions will increase significantly in the next ten years. This will become even more important with increasing calls for making education, as well as health care, more evidence based, and for doing more shorter- and longer-term outcomes studies in medical education.6,7 There is clearly a need for more hypothesis-driven research, and it is likely that more complex and, hopefully, informative multivariate analyses will accompany the more common univariate statistical analyses. The trend toward reporting effect sizes and confidence intervals in lieu of p values will continue and hopefully become commonplace.
All measurement studies will clearly state that validity and reliability are characteristics of scores (and not tests or tasks). Again, analyses will be more multivariate and not just univariate, and I predict that there will be some meta-analyses of measurement studies addressing similar questions using validity and reliability generalization techniques.
Hopefully the RIME Conference will show significant improvements in the quality of research in medical education in the foreseeable future, and each of us will contribute to this effort to advance the field. So together I hope that we will move cautiously forward, for even those of us who place our faith in the scientific method and in numbers need to be mindful of the cartoon by Alex Graham in which Fred Basset, working diligently in his lab, remarks, “I thought I had the answer to the meaning of life, but everything canceled out.”