Medical education research seeks to improve the community’s understanding of how to train safe and competent doctors. 1 For example, medical education researchers may identify when and how to include real patients in medical education, what steps and processes help reduce rates of medical error, and what causes burnout among students and other trainees. Medical education research is directly relevant to policy in an increasingly globalized medical training environment, and regulators use evidence from such studies to develop training frameworks and monitoring schemes (e.g., for learner assessment or program evaluation). 2 The importance of medical education requires that concerns over the validity of such research be carefully documented and evaluated. Here, we consider a major threat to the integrity of research in many fields, explain and provide examples of the risk it poses to medical education, and offer practical advice, again with examples, on what medical educators and clinicians can do about it.
The Replication Crisis
Over the past 2 decades, investigators in both the natural and social sciences have become aware that a substantial amount of published research evidence—perhaps more than half—is likely false. 3 This “replication crisis,” as it is often described, is characterized by research findings, including major, widely accepted results, that cannot be reproduced by independent research teams closely following the original methodology. Researchers have explored the replication crisis in many fields. For example, a team reviewing drugs and other treatments targeting cancer was able to replicate the findings of only 11% of “landmark” studies. 4 Further, in biology, reviews of preclinical studies have suggested that only 20%–25% of findings were reproducible, 5,6 and, in psychology, a collaborative team closely replicating the original methodologies was able to reproduce only around 36% of the original studies’ results. 7
If medical education is experiencing a similarly high rate of false but accepted research evidence, the implications for patient safety and policymaking are deeply concerning. Policy derived from false findings in medical education can lead to, among other outcomes, a tendency to pursue impressive-looking innovations that do not actually improve trainee competence and a higher likelihood of exploring novel techniques instead of rigorously evaluating established methods. Such errors ultimately lead to lower quality patient care. Investigators have yet to formally evaluate the likelihood of a replication crisis in medical education. Here we describe why medical education may be vulnerable to such a crisis and explore options to reduce these risks.
Why Medical Education Research May Be Experiencing a Replication Crisis
Experts have critiqued medical education research, citing low funding levels, poorly described methodologies, and a focus on observational research. 8 These weaknesses have prompted efforts to improve rigor, 1 but mostly through individual researchers improving the designs of individual studies. While some of these developments improve replicability, many factors cannot be addressed at the individual study level. 9
Prior research has identified 6 risk factors that predict higher rates of false findings (i.e., published research that appears defensible but which cannot be replicated by others). 3 Here, we discuss the relevance of each of these factors to medical education.
Risk factor 1: Small sample sizes
Research in medical education very commonly uses sample sizes that are small or unrepresentative. 10 Depending on the topic and design, studies with small sample sizes may be so underpowered that the resulting data are impossible to interpret. 11 This limitation can render the entire study immediately useless. 12
Example: A study tests a novel intervention in a small cohort and yields no measurable effect. The researchers conclude the intervention has failed and discontinue it without ever realizing that there may be real—but small—improvements which are undetectable given their sample size.
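The example above can be made concrete with a short simulation. This is a minimal sketch with invented parameters (a true effect of d = 0.2 and 20 participants per group, not figures from any cited study); the z-test is valid here only because the simulated standard deviation is known.

```python
# Minimal sketch (invented numbers, not from any study): a true but small
# improvement (d = 0.2) tested with 20 participants per group. Sigma is
# known to be 1 because we simulated it, so a simple z-test suffices.
import math
import random

random.seed(0)

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def run_study(n_per_group, true_d):
    """Simulate one two-group study; return its two-sided p-value."""
    control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treated = [random.gauss(true_d, 1.0) for _ in range(n_per_group)]
    diff = sum(treated) / n_per_group - sum(control) / n_per_group
    z = diff / math.sqrt(2.0 / n_per_group)  # known sigma = 1 (simulated)
    return 2 * (1 - phi(abs(z)))

# Share of 1,000 simulated studies reaching p < .05 -- i.e., the power.
power = sum(run_study(20, 0.2) < 0.05 for _ in range(1000)) / 1000
print(f"power at n = 20 per group: {power:.2f}")  # far below the 80% convention
```

In other words, a real but small benefit would be missed far more often than detected at this sample size, exactly the scenario in the example.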
Risk factor 2: Small effect sizes
The term “effect size” describes the strength of an association or the influence of an intervention. 11 Based on a substantial body of evidence, interventions in medical education are typically expected to produce either small or medium effects. This feature of medical education research is not inherently problematic—small improvements in outcomes may be very beneficial—but small effects require much larger sample sizes to detect. Medical education studies are often underpowered to detect small effects; that is, the research risks missing genuine improvements. 13 Researchers sometimes fail to prepare for this eventuality and thus recruit too small a sample. 14 The lack of power increases the likelihood of apparently impressive results in small studies that cannot be replicated.
Example: Many researchers—with no knowledge of one another—explore the same understudied variables. Almost all obtain null results and move on to other projects. One study, by chance, appears to produce “significant” findings. This study is disseminated, and the previous contradictory null results go unseen. Due to the typical absence of large effect sizes in medical education, this study is widely discussed as revolutionary.
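A brief simulation (illustrative numbers only, not data from any study) shows how this pattern arises: when many underpowered teams study the same true small effect, the rare “significant” result is also, on average, an overestimate of the effect.

```python
# Illustrative sketch (no real data): 200 independent, underpowered studies
# of the same true small effect (d = 0.2, n = 20 per group). The few that
# cross p < .05 also report inflated effect sizes.
import math
import random

random.seed(1)
TRUE_D = 0.2
SE = math.sqrt(2.0 / 20)  # standard error of the mean difference (sigma = 1)

observed = [random.gauss(TRUE_D, SE) for _ in range(200)]  # one per team
significant = [d for d in observed if abs(d) / SE > 1.96]  # "p < .05"

print(f"{len(significant)} of 200 studies reach significance")
print(f"mean effect among them: {sum(significant) / len(significant):.2f} "
      f"(true effect: {TRUE_D})")
```

If only the significant studies are disseminated, the literature records both a distorted success rate and an exaggerated effect.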
Risk factor 3: Exploratory designs with many plausible associations to be tested
A huge number of variables are relevant to medical education. These include measures of technical and nontechnical skill, measures of well-being and mental health, demographic factors (e.g., age, sex, ethnicity, and disability), and long-term outcome measures such as retention and probity. As the medical education community has not been able to consistently deploy randomized trials, 11 this variety leads to the risk of data dredging, whereby huge numbers of associations are tested 15—some of which will inevitably appear, at first glance, to look meaningful, but fail to replicate.
Example: Researchers collate student activity, including skills attainment, attendance, professionalism, merits (e.g., awards), and learning styles. Making no hypotheses, they correlate variables and find 1 statistically significant association between 2 variables they had never previously considered important. They disseminate this finding without reporting the null results, failing to realize that because they tested so many associations, it was almost inevitable they would find at least some “significant” associations.
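The near-inevitability of spurious hits can be checked directly. The sketch below uses purely random, hypothetical data: 20 unrelated “student variables” for 60 students, correlated pairwise at an uncorrected alpha of .05.

```python
# Sketch with purely random data: 20 unrelated variables measured for 60
# students give 190 pairwise correlations; at an uncorrected alpha of .05,
# several spurious "significant" associations are expected by chance alone.
import itertools
import math
import random

random.seed(2)
N_STUDENTS, N_VARS = 60, 20

data = [[random.gauss(0, 1) for _ in range(N_STUDENTS)] for _ in range(N_VARS)]

def pearson_r(x, y):
    """Pearson correlation of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

# Large-sample approximation: |r| > 1.96 / sqrt(n) corresponds to p < .05.
threshold = 1.96 / math.sqrt(N_STUDENTS)
hits = sum(abs(pearson_r(a, b)) > threshold
           for a, b in itertools.combinations(data, 2))
print(f"{hits} spurious 'significant' correlations out of 190 tests")
```

None of these variables are related by construction, yet uncorrected testing still produces “findings.”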
Risk factor 4: Flexibility in design choices, approaches to analysis, and outcome measures
Medical education researchers consider, when developing their research, a huge range of design choices and defensible outcome measures. 10 Readers can find it difficult to identify which design choices were planned a priori and which were adjusted ad hoc during research because they appeared to produce more meaningful results. A lack of standardized outcome measures and analytical approaches can make ad hoc and post hoc changes to research protocols difficult to identify and monitor.
Example: A study measures students’ performance upon entry into a training program, annually thereafter, and when students exit. Researchers find no change between entry and exit performance, but they do detect changes between entry and second-year performance. They use second-year performance as an outcome measure and ignore exit performance even though they have no theoretical justification for doing so.
Risk factor 5: Conflicts of interest, including financial considerations
While consumers of research often interpret financial conflicts of interest as the presence of funding that can distort research, 3 the absence of funding can also cause problems. Medical education is underfunded, 8 a reality that influences design choices and thus exacerbates the other problems described here. Similarly, practical issues, such as the need to publish articles and secure advancement, may interfere with research quality. 15
Example: A medical school invests heavily in simulation. A junior researcher shows simulation teaching is not improving performance. Although there is no financial conflict of interest (the researcher does not have funding from any simulation company), a tension exists between senior faculty and administrators (who have strongly advocated simulation) and the junior researcher (who has identified no effect). The researcher chooses to prioritize other work, and the study is never disseminated.
Risk factor 6: Very active fields with many competing research teams
In medical education, a single high-quality study can sometimes be regarded as “enough” 11 to show seeming cause and effect. This unexamined enthusiasm can increase pressure to be the first to publish important results. Combined with not only the flexibility of design and outcome measures described above but also the interest in popular topics among journal editors and readers, this eagerness to publish notable results increases the risk of false findings in very popular, competitive fields. Researchers seeking career advancement and funding may be driven to produce novel work in highly populated fields over replicating established evidence. 3
Example: A new researcher is studying a popular topic. Halfway through data collection, an interim analysis shows an important result. The researcher discontinues data collection and seeks immediate publication. Had the researcher completed data collection, they would have found a null result. Future teams, finding null results, struggle to publish, and later work is far less cited.
The literature on factors driving false findings, the existing critiques of medical education research, 8 and the breadth of fields experiencing a replication crisis suggest that it is plausible that medical education is unknowingly experiencing its own replication crisis, but potential solutions are available.
Potential Investigations and Solutions
Researchers in many disciplines have productively investigated replication rates and improved research processes. 14 The strategies they have used can be integrated into medical education research. Here we discuss techniques for individual researchers and then describe how the medical education community, including journals, can improve replicability.
Evaluating prior research and planning new research
When considering the methodological rigor of previous research, investigators should think explicitly about replicability. As explained above, publication processes that emphasize novel findings and large effect sizes can together skew expectations of what researchers will discover in their research. 12 Evaluating the replicability of past research ensures that new work is based on defensible evidence. When carrying out such evaluations, researchers should consider the transparency of previous studies: whether they explain how hypotheses were created and why tests were selected, and whether they report all analyses or only some. 16 While determining the effect sizes in prior research, educators should remember that published effect sizes are only estimates. They may find much smaller or larger effect sizes in their own studies even where past work is valid. 12 This uncertainty complicates decisions on sample size, and researchers may need a larger sample than expected to achieve the necessary power.
Evaluating prior research and explicitly considering factors such as sample size and effect size will help researchers ensure that their studies are effectively testing plausible hypotheses, which can, in turn, generate meaningful information. Forward planning increases the likelihood that a researcher’s methodology will be as rigorous as possible.
Example: Researchers formally check the effect sizes of past research and realize their own study is underpowered. They solve the problem by pooling data from multiple cohorts.
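A simple power calculation illustrates why such checks matter. This sketch uses hypothetical numbers and the standard normal approximation for a two-group mean comparison; the required sample grows sharply as the assumed effect shrinks.

```python
# Sketch (hypothetical numbers): per-group sample size for 80% power at a
# two-sided alpha of .05, using the standard normal approximation for a
# two-group mean comparison.
import math

def n_per_group(d, z_alpha=1.96, z_beta=0.8416):
    """Per-group n to detect standardized effect d (normal approximation)."""
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A published d of 0.5 is only an estimate; the true effect may be smaller.
for d in (0.5, 0.35, 0.2):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Because published effect sizes are themselves estimates, planning against the smaller, more conservative values in this range is safer than taking the published figure at face value.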
Exploratory research (i.e., in which researchers examine many variables and make no prior assumptions about what they will find) can be very valuable. Unfortunately, however, as noted above, exploratory approaches reduce the likelihood of findings being replicable. 17 A very simple approach is for researchers to be clear from the outset whether they intend to apply an exploratory approach or restrict themselves to a confirmatory approach with a few predefined variables. In being transparent, researchers will help readers and future investigators better understand the level of replication-related risk in the work.
In addition to explicitly stating the nature of their work, researchers should discuss how the sample characteristics may influence the results. A huge range of potential participants interact within medical education, and this diversity, including age, ethnicity, nationality, gender, and specialty, can cause apparent differences in outcomes that are due to varying sample characteristics rather than the intervention. The tendency for the range of available samples to influence results is well established, 18 and evaluating the sample helps researchers and the reader understand why findings may or may not align with those of past studies.
Researchers should also think about how participants are allocated to conditions. Researchers in medicine and social science have long recognized the importance of blinding investigators to the allocation of participants. This blinding reduces the risk of inadvertent bias, by which investigators find effects where none actually exist. 19 When a control condition is used, it should be an active control wherein the control group participants receive an intervention similar in size, scope, and resources to the main intervention being tested. 20–22 Active controls reduce the risk that an apparent improvement following an intervention is simply due to something happening to participants. Researchers may falsely attribute the effect to the specific intervention of the experimental condition, whereas, in reality, any form of intervention might produce a similar effect. Considering allocation and active controls will increase the likelihood that detected effects are due to the interventions or variables of interest rather than to irrelevant confounders. Research that accounts for such issues is more likely to be replicable.
Example: A study exploring personality and performance specifies outcome variables in advance, rather than correlating all possible variables. This approach increases the likelihood that detected effects are genuine.
Whether applying an exploratory or confirmatory analysis, the choice of outcome measures is extremely important. Valid outcome measures considered to be important based on a large body of prior research are more helpful than novel or untested tools that are difficult to understand or evaluate. 23 When researchers do apply novel tools, testing widely acknowledged moderator variables based on past research (e.g., prior academic performance, personality) is helpful. 18 Importantly, some outcome measures remain popular despite being problematic, and these measures should be avoided if possible. Most notably, a significant body of evidence has found medical educators and students struggle with self-assessment, so a measure of whether people perceived an intervention to be useful, or whether they liked the intervention, should not be treated as a measure of whether the intervention influenced competence. 24
When reporting results, whether from an established or new tool, researchers should always provide effect sizes: the work can then easily be compared with other studies and the importance of the research will be much clearer. Notably, no measurement tool is error free. Outcome measures will always be estimates of the true values. Researchers should describe possible sources of measurement error and the effect they may have on their study. Explaining measurement error helps clarify the confidence readers can have in the results and reduces the likelihood of readers misunderstanding effect sizes. 25
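One way to report an effect size alongside its uncertainty is to accompany it with a confidence interval. The sketch below uses invented exam scores and a common normal-approximation formula for the standard error of Cohen's d; it is illustrative, not a prescribed method.

```python
# Sketch with invented exam scores: Cohen's d with an approximate 95% CI,
# so readers see that a published effect size is an estimate with real
# uncertainty, not a fixed truth.
import math

def cohens_d_with_ci(group_a, group_b):
    """Cohen's d and an approximate 95% CI (normal approximation)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    va = sum((x - ma) ** 2 for x in group_a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in group_b) / (nb - 1)
    sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    d = (ma - mb) / sd
    se = math.sqrt((na + nb) / (na * nb) + d ** 2 / (2 * (na + nb)))
    return d, d - 1.96 * se, d + 1.96 * se

# Hypothetical post-test scores for two cohorts of 10 trainees each:
intervention = [72, 75, 78, 80, 74, 77, 79, 73, 76, 81]
control = [70, 74, 72, 75, 71, 73, 76, 69, 72, 74]
d, lo, hi = cohens_d_with_ci(intervention, control)
print(f"d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")  # wide interval at n = 10
```

At small sample sizes the interval is wide, which is precisely the information a bare point estimate hides.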
Selecting known measurement tools, examining previously studied variables, reporting effect size, and describing sources of potential error are all steps that will help readers understand the work and increase the likelihood of the findings being replicable. Importantly, these will also help future researchers who try to replicate the work understand the necessary sample sizes and tools required.
Example: Researchers test whether attendance at optional training improves performance. They find a large, positive association between the 2 variables, but after controlling for important moderators (especially prior academic attainment and extracurricular engagement), the effect disappears. The authors disseminate this result as an example of the importance of moderator variables.
The Medical Education Community
The responsibility for improving the replicability of medical education research does not belong to researchers alone; the wider medical education community also has a role in improving replication rates. Collectively, these actions can significantly increase confidence in medical education research findings.
Medical educators should routinely collaborate to replicate major studies
Identifying the scale of a potential replication crisis is an important goal. In other fields, major collaborative efforts have, first, identified a range of key studies that are widely accepted and, then, independently attempted replication using the same underlying methodology; frequently, the replication rates are low. 7 This approach demonstrates the scale of the problem, clarifies which key findings are reliable, and represents a high-profile way of supporting further methodological improvements where necessary. These projects help develop a culture in which cross-institutional collaboration is the norm rather than the exception. 26
Researchers may benefit from adopting a “StudySwap” model 27 through which 2 teams carrying out different research projects pair up and each replicates the work of the other. This method increases the quality and rigor of each project, and teams can both seek and offer data, space, and resources. Another means of increasing replicability is publishing studies on preprint services such as medRxiv (https://www.medrxiv.org/). Such services, which have grown in popularity, publish research on a community website before the work is accepted in an academic journal. This public access enables rapid dissemination and ensures that studies remain available for future reanalysis.
While these changes may challenge the status quo, other fields have substantially improved their approaches to replication on very short timescales. 26 As medical educators can adopt, rather than create, these innovations, the opportunities to improve medical education are feasible and rapid.
Example: Researchers at multiple institutions pool data with the aim of replicating major medical education studies. While each institution is individually too small or underresourced to do this alone, collectively, they publish a range of important findings.
Medical educators should evaluate the dissemination of innovations
Medical education is a practical discipline designed to improve training and patient care. As such, it is worth evaluating whether innovations in the literature spread to new institutions and how the innovations have performed there. Some tools—such as the objective structured clinical examination (OSCE) 28—are now extremely widespread, whereas others remain obscure. Monitoring this spread may help identify high-quality tools or reveal underlying weaknesses in the original work. The methodological approach “realist synthesis,” which aims to explore how the context, mechanisms, and outcomes of innovations apply in practice, may be a useful framework here. 29,30
Example: Researchers retrospectively evaluate a series of major—and apparently successful—innovations disseminated 10 years ago. They review which innovations endured, why, and whether the initially reported success continued. They publish a list of factors that predict long-term utility.
Medical education journals should explore new approaches to publication
In the wake of replication crises in other fields, journals have adjusted their approaches. A common approach is preregistration, whereby a study is publicly registered as being underway before publication is sought or whereby a journal accepts a paper before the results are known. 31,32 While preregistration is sometimes a contentious option due to the perceived practical challenges (e.g., the resources required to maintain a preregistration database and to verify submissions to it are substantial), fields that have only recently adopted this process have reported it to be relatively smooth and straightforward. 32 Journals’ encouraging some form of preregistration, or, minimally, regarding its existence as a positive at review, may help reduce the publication of false findings.
Historically, many academic journals have either failed to invite replications or emphasized the need for work to be original. 33 Some journals have recently produced special issues for disseminating null results to counterbalance this tendency for “interesting” but possibly false findings to be prioritized. 34 Accidentally replicating prior work by being unaware of its existence (for which some medical education researchers have been criticized) 15 is different from a formal, preplanned attempt to replicate previous research. The latter is likely to substantially improve confidence in medical education research even though it is not original. Medical education journals’ adoption of some of the innovations from other fields may not only improve the long-term rigor of the discipline overall but also support attempts by individual medical educators to improve the quality of the output.
Example: A journal crowdsources classic studies that medical educators feel have a substantial ongoing influence on teaching. Researchers register to participate in the collaborative replication of 1 or more of these studies, and the journal guarantees that the results will be published if the resulting article meets the quality standards. Based on precedent, it is unlikely that all the studies will be replicated.
Health professionals should help improve the prestige of replication and high-quality medical education research
All health care providers are connected to medical education research in some way. Traditionally, producing medical education research has been seen as very desirable for professional development and promotion, 1 but a substantial number of research reports are rejected by journals for very basic errors (e.g., a failure to describe the rationale for the research or to report the procedures in adequate detail), which reflects ongoing concerns over quality within the field. 8,15 If providing small contributions to large research projects (e.g., preregistered multisite replication studies) were considered as valuable as serving as first author of small, low-power studies, then those who contribute effectively to medical education research would be properly recognized and overall quality would rise.
Some methodological improvements can be delivered without additional funding (e.g., by transferring effort to more productive areas), but the challenges of operating in an underfunded field must be acknowledged. Medical educators must emphasize the need for funding to generate findings that can improve patient outcomes.
Medical educators should also recognize that the burden of producing novel research falls especially heavily on new researchers. 23 Senior educators (i.e., leaders and administrators) should prioritize support for new medical educators working to produce research that is robust and replicable. Financial support, combined with teaching new medical educators about replication and high-quality research (a 2-pronged approach that has worked well elsewhere 35), will improve the overall rigor of the field.
Finally, the evidence discussed here emphasizes the need for public engagement in medical education research. Medical educators should advocate a robust evidence base that is properly funded, and they should be open about potential limitations in the current knowledge base. Researchers may then be better positioned to develop long-term improvements that will support high-quality medical education research.
Example: Senior staff at a medical school encourage new researchers to prioritize collaborative work, especially when it involves replication. All junior medical educators are expected to contribute in a small way to a large replication attempt, and this work is weighed heavily in considering promotions.
The evidence discussed here suggests that medical education may be experiencing a replication crisis, as are many fields across the natural and social sciences. By adopting strategies shown to successfully increase replicability in other disciplines, medical educators can evaluate the risks to medical education research and, if necessary, develop solutions that are feasible even on short timescales. Doing so may significantly improve the quality and utility of medical education research and so improve the quality of patient care.
1. Ringsted C, Hodges B, Scherpbier A. ‘The research compass’: An introduction to research in medical education: AMEE guide no. 56. Med Teach. 2011;33:695–709.
2. General Medical Council. Outcomes for graduates, plus supplementary guidance. 2018. https://www.gmc-uk.org/education/standards-guidance-and-curricula/standards-and-outcomes/outcomes-for-graduates. Accessed February 17, 2021.
3. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124.
4. Begley CG, Ellis LM. Drug development: Raise standards for preclinical cancer research. Nature. 2012;483:531–533.
5. Ioannidis JPA, Kim BYS, Trounson A. How to design preclinical studies in nanomedicine and cell therapy to maximize the prospects of clinical translation. Nat Biomed Eng. 2018;2:797–809.
6. Prinz F, Schlange T, Asadullah K. Believe it or not: How much can we rely on published data on potential drug targets? Nat Rev Drug Discov. 2011;10:712.
7. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349:aac4716.
8. Todres M, Stephenson A, Jones R. Medical education research remains the poor relation. BMJ. 2007;335:333–335.
9. Makel MC, Plucker JA. Facts are more important than novelty: Replication in the education sciences. Educ Res. 2014;43:304–316.
10. Cook DA, Levinson AJ, Garside S. Method and reporting quality in health professions education research: A systematic review. Med Educ. 2011;45:227–238.
11. Cook DA. Randomized controlled trials and meta-analysis in medical education: What role do they play? Med Teach. 2012;34:468–473.
12. Maxwell SE, Lau MY, Howard GS. Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? Am Psychol. 2015;70:487–498.
13. Woolley TW. A comprehensive power-analytic investigation of research in medical education. J Med Educ. 1983;58:710–715.
14. Ioannidis JP, Greenland S, Hlatky MA, et al. Increasing value and reducing waste in research design, conduct, and analysis. Lancet. 2014;383:166–175.
15. Norman G. Data dredging, salami-slicing, and other successful strategies to ensure rejection: Twelve tips on how to not get your paper published. Adv Health Sci Educ Theory Pract. 2014;19:1–5.
16. Wiggins BJ, Chrisopherson CD. The replication crisis in psychology: An overview for theoretical and philosophical psychology. J Theor Philos Psychol. 2019;39:202–217.
17. Stevens JR. Replicability and reproducibility in comparative psychology. Front Psychol. 2017;8:862.
18. Bardi A, Zentner M. Grand challenges for personality and social psychology: Moving beyond the replication crisis. Front Psychol. 2017;8:2068.
19. Dutilh G, Sarafoglou A, Wagenmakers E-J. Flexible yet fair: Blinding analyses in experimental psychology. Synthese. 2019:1–28.
20. Tackett JL, Brandes CM, King KM, Markon KE. Psychology’s replication crisis and clinical psychological science. Annu Rev Clin Psychol. 2019;15:579–604.
21. Ireland MJ, Clough B, Gill K, Langan F, O’Connor A, Spencer L. A randomized controlled trial of mindfulness to reduce stress and burnout among intern medical practitioners. Med Teach. 2017;39:409–414.
22. Britton WB, Lepp NE, Niles HF, Rocha T, Fisher NE, Gold JS. A randomized controlled pilot trial of classroom-based mindfulness meditation compared to an active control condition in sixth-grade children. J Sch Psychol. 2014;52:263–278.
23. Everett JAC, Earp BD. A tragedy of the (academic) commons: Interpreting the replication crisis in psychology as a social dilemma for early-career researchers. Front Psychol. 2015;6:1152.
24. Eva KW, Regehr G. Self-assessment in the health professions: A reformulation and research agenda. Acad Med. 2005;80(suppl 10):S46–S54.
25. Loken E, Gelman A. Measurement error and the replication crisis. Science. 2017;355:584–585.
26. Moshontz H, Campbell L, Ebersole CR, et al. The psychological science accelerator: Advancing psychology through a distributed collaborative network. Adv Methods Pract Psychol Sci. 2018;1:501–515.
27. Chartier CR, Riegelman A, McCarthy RJ. StudySwap: A platform for interlab replication, collaboration, and resource exchange. Adv Methods Pract Psychol Sci. 2018;1:574–579.
28. Harden RM, Gleeson FA. Assessment of clinical competence using an objective structured clinical examination (OSCE). Med Educ. 1979;13:39–54.
29. Rycroft-Malone J, McCormack B, Hutchinson AM, et al. Realist synthesis: Illustrating the method for implementation research. Implement Sci. 2012;7:33.
30. Sholl S, Ajjawi R, Allbutt H, et al. Balancing health care education and patient care in the UK workplace: A realist synthesis. Med Educ. 2017;51:787–801.
31. Ioannidis JP. Clinical trials: What a waste. BMJ. 2014;349:g7089.
32. Nosek BA, Lindsay DS. Preregistration becoming the norm in psychological science. APS Obs. 2018;31. https://www.psychologicalscience.org/observer/preregistration-becoming-the-norm-in-psychological-science. Accessed March 8, 2021.
33. Martin GN, Clarke RM. Are psychology journals anti-replication? A snapshot of editorial practices. Front Psychol. 2017;8:523.
34. Petty S, Gross RA. Neurology® null hypothesis: A special supplement for negative, inconclusive, or confirmatory studies. Neurology. 2018;91:12–13.
35. Chopik WJ, Bremner RH, Defever AM, Keller VN. How (and whether) to teach undergraduates about the replication crisis in psychological science. Teach Psychol. 2018;45:158–163.