The responsible conduct of research is the foundation of sound scientific practice.1,2 The need to conduct research in a responsible manner is self-evident—if science is to inform our understanding of how the world works, it must be done in an honest, accurate, and unbiased way.3
Irresponsible research behaviors are typically divided into two broad categories: deliberate scientific misconduct (i.e., fabrication, falsification, and plagiarism) and questionable research practices (QRPs).1 Whereas behaviors like data fabrication are clearly unethical, QRPs exploit the ethical shades of gray that color acceptable research practice and “offer considerable latitude for rationalization and self-deception.”4 Consequently, QRPs are more prevalent and, many have argued, more damaging to science and its public reputation than obvious fraud.4–8 Taken together, scientific misconduct and QRPs can waste resources; provide an unfair advantage to some researchers over others; damage the scientific record; and provide a poor example for other researchers, especially trainees.7
Health professions education (HPE) is not immune to the damaging effects of irresponsible research practices. In HPE, QRPs have been defined as poor data management, inappropriate research procedures, insufficient respect for study participants, improper research design, carelessness in observation and analysis, unsuitable authorship or publishing practices, and derelictions in reviewing and editing.7 The need to guard against both deliberate scientific misconduct and QRPs is frequently described in the author instructions for scientific and health professions journals, and such guidelines are often patterned on the recommendations of the Committee on Publication Ethics.9
Commentaries by HPE journal editors have highlighted instances of QRPs in article submissions, including self-plagiarism, “salami slicing” (i.e., inappropriately dividing a single study into multiple papers), and unethical authorship practices.10–12 Additionally, a review of four HPE journals found that 13% of original research articles published in 2013 did not address approval by an ethics review board or stated that it was unnecessary, without further discussion.13 Moreover, a 2017 study of HPE leaders highlighted multiple problematic authorship practices, including honorary authorship and the exclusion of authors who deserved authorship.14 Notably, only about half of the senior researchers surveyed were able to correctly identify the authorship standards used by most medical journals (i.e., the International Committee of Medical Journal Editors [ICMJE] authorship criteria15).
Notwithstanding these examples, scientific misconduct and QRPs have received limited attention in the HPE literature. In a recent article,7 we attempted to raise the community’s awareness of these issues and highlight the need to examine their pervasiveness among HPE researchers. Accordingly, we conducted this study to measure the frequency of self-reported misconduct and QRPs in a diverse, international sample of HPE researchers. In doing so, we hope to continue the conversation about responsible research in HPE, with the ultimate goal of promoting high-quality, ethical research.
To measure the frequency of serious research misconduct and other QRPs, many different approaches have been employed. These include counts of confirmed cases of researcher fraud and paper retractions, as well as research audits by government funders.8 Such methods are limited because they are calculated on the basis of misconduct that has been discovered, and detecting such misconduct is difficult.16 Moreover, distinguishing intentional misconduct from honest mistakes is challenging. Therefore, such approaches significantly underestimate the real frequency of misconduct and QRPs because only researchers know if they have willfully acted in an unethical or questionable manner.
To address these challenges, survey methods have been used to directly ask scientists about their research behaviors.4–6,17,18 Like the measurement of any socially undesirable behavior, assessing irresponsible research practices via self-report likely underestimates the true prevalence or frequency of the behaviors. Nonetheless, when employed appropriately, survey methods can generate reasonable estimates that provide a general sense of the problem’s scope.19,20
We administered an anonymous, cross-sectional survey to determine the frequency of research misconduct and other QRPs in a sample of HPE researchers. The Ethical Review Board Committee of the Netherlands Association for Medical Education approved this study (dossier no. 937).
We developed the survey by adapting several existing tools. The final survey featured a total of 66 items divided into three sections (see Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A584). The first section included 43 items derived from two previously published surveys assessing research misconduct and QRPs in biomedicine.5,6 The items asked respondents to identify how often they had engaged in the particular research practice. The practices spanned the research continuum, from data collection and storage to study reporting, collaboration, and authorship. The items employed a six-point, Likert-type, frequency-response scale: “never,” “once,” “occasionally,” “sometimes,” “frequently,” and “almost always.” Each item also included the response option “not applicable to my work.”
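To illustrate how responses on this scale might be prepared for analysis, the following is a minimal sketch. The numeric coding (1 through 6), the treatment of "not applicable to my work" as missing, and the function names are our assumptions for illustration; they are not taken from the survey documentation.

```python
from typing import Optional

# Response labels quoted in the text, in ascending order of frequency.
FREQUENCY_SCALE = [
    "never",
    "once",
    "occasionally",
    "sometimes",
    "frequently",
    "almost always",
]
NOT_APPLICABLE = "not applicable to my work"


def code_response(label: str) -> Optional[int]:
    """Map a response label to 1..6 (assumed coding); None means not applicable.

    Raises ValueError for a label that is not on the scale.
    """
    normalized = label.strip().lower()
    if normalized == NOT_APPLICABLE:
        return None
    return FREQUENCY_SCALE.index(normalized) + 1


def ever_reported(label: str) -> Optional[bool]:
    """True if the respondent reported the practice at least once."""
    code = code_response(label)
    return None if code is None else code > 1
```

Under this coding, any response other than "never" counts as having reported the practice, and "not applicable" responses are excluded rather than treated as "never."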
We slightly modified the original misconduct and QRP items to improve their clarity and relevance to HPE. For example, the original item “inadequately handled or stored data or (bio)materials” was revised to “inappropriately stored sensitive research data (e.g., data that contains personally identifiable information).” Following these modifications, 19 experienced HPE researchers reviewed the adapted survey items and provided detailed qualitative feedback.21 Ten of the expert reviewers were women, and all reviewers held doctoral degrees (13 PhDs, 4 MD/PhDs, and 2 MDs). On average, the reviewers had published 98.9 journal articles (SD = 66.1) indexed in PubMed. On the basis of the date of their first HPE publication, they had been publishing in the field for an average of 20.7 years (SD = 9.3). Expert reviewers reported their work location as the United States (n = 9), Europe (n = 4), Canada (n = 3), South America (n = 1), Africa (n = 1), and Australia (n = 1), and all but 2 were identified as full professors.
The expert feedback included comments on item relevance, clarity, missing facets, and suggestions for overall survey improvement. On the basis of the expert feedback, we revised the survey again; revisions included wording modifications and the development of several new HPE-specific items. For example, on the basis of several recommendations, we created the following items related to qualitative research methods, among several others: “misrepresented a participant’s words or writings” and “claimed you used a particular qualitative research approach appropriately (e.g., grounded theory) when you knowingly did not.”
The second section of the survey included 9 publication pressure items17 (these data are not reported here), and the final section included 13 demographic items.
Sampling and survey distribution procedures
To create our sample, we used two separate approaches. First, we created a “curated sample” by searching Web of Science for articles in HPE journals published in 2015 and 2016 (see Supplemental Digital Appendix 2, available at http://links.lww.com/ACADMED/A584, for a list of journals). Journals were selected if they focused on general HPE topics and were indexed in Web of Science. To increase international representation, we searched three additional regional databases (Scielo, a database focused on South America; African Journals Online; and Asia Journals Online) for articles in other HPE journals. We did not limit our search to English-language titles but did restrict retrieval to research articles as defined by Web of Science (e.g., original research, reviews). From these articles, we extracted all author e-mail addresses listed in the author information, removing duplicate authors. Authors were considered to be “HPE researchers” if they had published in one of the HPE journals searched. Altogether, this process generated a sample of 1,840 unique HPE researchers. All names and e-mails were entered into Qualtrics, an online survey tool (Qualtrics, Provo, Utah), and the survey was distributed in four waves of e-mail invitations: wave 1 (sent November 13, 2017), wave 2 (sent November 20, 2017), wave 3 (sent November 27, 2017), and wave 4 (sent December 11, 2017).
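The deduplication step can be sketched as follows. How duplicates were actually identified is not described here, so matching on a trimmed, lowercased e-mail address is an assumption, as are the function name and the example records.

```python
from typing import Iterable, List, Tuple


def unique_authors(records: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep the first record for each e-mail address (case-insensitive).

    Each record is a (name, email) pair. Records with an empty e-mail
    address are dropped because they cannot be invited.
    """
    seen = set()
    unique = []
    for name, email in records:
        key = email.strip().lower()
        if key and key not in seen:
            seen.add(key)
            unique.append((name, key))
    return unique


# Hypothetical records for illustration only.
extracted = [
    ("A. Author", "a.author@example.org"),
    ("A. Author", "A.Author@example.org "),  # same address, different casing
    ("B. Author", "b.author@example.org"),
    ("C. Author", ""),                       # no address listed
]
sample = unique_authors(extracted)
```

Matching on e-mail alone would count an author who published under two different addresses twice, so a real workflow would likely need an additional check on author names.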
Next, we collected a “social media sample” by posting anonymous links to the survey on our Twitter and Facebook accounts (posted on December 11, 2017). All survey responses obtained from the social media links were tracked separately from those sent to the curated sample. To help prevent duplicate submissions, respondents in the social media sample were given the option to select “I have already completed this survey” on the informed consent page.
Prior to analysis, we screened the data for accuracy and missing values. Next, we calculated the response rate in the curated sample using response rate definition number 6, as delineated by the American Association for Public Opinion Research.22 Then, to assess potential nonresponse bias in the curated sample, we used wave analysis to calculate a nonresponse bias statistic.23 In wave analysis, late respondents are considered proxies for nonresponders, and their responses are compared with responses from the first wave. In addition, to determine whether or not it was appropriate to combine the curated and social media samples for analysis, we conducted a multivariate analysis of variance (MANOVA) to compare respondents on several demographic characteristics: age, experience doing HPE research, percentage of work time dedicated to HPE research, and number of peer-reviewed publications. Finally, we calculated descriptive statistics for the total sample, with particular emphasis on the frequency of self-reported misconduct and QRPs. All data analyses were conducted using IBM SPSS statistical software, version 24 (IBM Corporation, New York, New York) and Microsoft Excel 2013 (Microsoft Corporation, Redmond, Washington).
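The logic of wave analysis can be sketched in a few lines. The exact computation behind our nonresponse bias statistic is not reproduced here; the sketch below shows two common quantities under stated assumptions: the difference in mean item scores between late and first-wave respondents, and the standard deterministic nonresponse bias expression, in which the nonresponse rate is multiplied by the difference between the respondent mean and the (proxy) nonrespondent mean. The function names and example scores are hypothetical.

```python
from statistics import mean
from typing import Sequence


def wave_mean_difference(first_wave: Sequence[float],
                         late_wave: Sequence[float]) -> float:
    """Mean score of late respondents minus mean score of first-wave respondents.

    Late respondents stand in as proxies for nonrespondents.
    """
    return mean(late_wave) - mean(first_wave)


def nonresponse_bias(respondent_mean: float,
                     proxy_nonrespondent_mean: float,
                     nonresponse_rate: float) -> float:
    """Standard deterministic bias of the respondent mean.

    bias = nonresponse_rate * (respondent_mean - nonrespondent_mean)
    """
    return nonresponse_rate * (respondent_mean - proxy_nonrespondent_mean)


# Hypothetical scores on a 1..6 frequency scale, for illustration only.
first = [1, 2, 1, 2]
late = [2, 2, 2, 2]
difference = wave_mean_difference(first, late)
bias = nonresponse_bias(mean(first), mean(late), nonresponse_rate=0.5)
```

A difference near zero suggests that late respondents, and by extension nonrespondents, answered similarly to early respondents, although the proxy assumption itself cannot be verified from the survey data.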
Of the 1,840 e-mail invitations sent to HPE researchers in the curated sample, 199 were returned as undeliverable, leaving 1,641 potential respondents. Of these, 463 (28.2%) researchers completed at least a portion of the survey (see Supplemental Digital Appendix 1, available at http://links.lww.com/ACADMED/A584, for regional response rates).22 Results from the wave analysis demonstrated a nonresponse bias statistic of 0.36. On a six-point, frequency-response scale, this represents a 6% difference, which is unlikely to have a meaningful effect on practical interpretation of the results.23
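The arithmetic behind these figures can be checked with a short sketch. Two simplifications are assumed: undeliverable invitations are treated as ineligible and removed from the denominator, and the 6% figure is read as the bias statistic divided by the six scale points. The variable names are ours.

```python
# Counts reported in the text.
invitations_sent = 1840
undeliverable = 199
respondents = 463

# Denominator after removing undeliverable invitations (assumed ineligible).
potential_respondents = invitations_sent - undeliverable
response_rate_pct = 100 * respondents / potential_respondents

# Nonresponse bias statistic expressed relative to the six-point scale
# (assumed reading of the 6% figure).
bias_statistic = 0.36
scale_points = 6
relative_difference_pct = 100 * bias_statistic / scale_points

print(potential_respondents)              # 1641
print(round(response_rate_pct, 1))        # 28.2
print(round(relative_difference_pct, 1))  # 6.0
```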
The social media sample yielded an additional 127 responses. Results from the MANOVA comparing respondents in the curated sample with those in the social media sample demonstrated statistically significant differences between the two groups, F(5, 524) = 6.67, P < .001. In particular, post hoc analyses indicated that social media respondents were slightly younger (mean = 40.7 years) and less experienced in HPE research (mean = 7.5 years) than those in the curated sample (mean = 47.4 years and mean = 11.0 years, respectively). That said, the two groups did not differ in terms of percentage of work time spent doing HPE research activities or mean number of peer-reviewed publications. Therefore, because our goal was to explore the frequency of misconduct and QRPs among a diverse, international sample of HPE researchers, we pooled the two samples and analyzed the data together.
Of the 590 respondents in the pooled sample, the mean age was 46 years (SD = 11.6), and there were 305 (51.7%) women, 246 (41.7%) men, and 39 (6.6%) individuals who did not report gender. As indicated in Table 1, the sample consisted of HPE researchers from across the World Health Organization’s six world regions. The majority reported their location as the United States (n = 156; 26.4%), Europe (n = 137; 23.2%), and Canada (n = 90; 15.3%). Respondents’ education, area of study, work context and role, academic rank, and primary research activities are presented in Table 1. In addition, respondents reported the following: years involved in HPE in any capacity (mean = 14.9 years, SD = 9.7), years involved in HPE research (mean = 11.3 years, SD = 8.5), percentage of work time spent conducting HPE research (mean = 27.3%, SD = 23.7%), and total number of peer-reviewed publications (mean = 40.1, SD = 55.0).
Table 2 summarizes the frequency of self-reported misconduct and QRPs, and Figure 1 provides a visual representation of these results for the top 20 most frequently reported practices. As indicated, the most frequently reported behaviors were QRPs related to authorship and study reporting practices, as well as issues around data storage, collection, and interpretation. Additionally, 39 (6.7%) respondents reported misrepresenting a participant’s words, 31 (5.5%) reported using sections of text from another author’s copyrighted material without permission or proper citation, 30 (5.3%) reported inappropriately modifying study results because of pressure from a research advisor or collaborator, 20 (3.4%) reported deleting data before performing analysis without disclosure, and 14 (2.4%) reported fabricating data. Overall, 533 (90.3%) respondents reported at least one irresponsible behavior.
This study examined the frequency of self-reported misconduct and QRPs among HPE researchers, practices that may be detrimental to scientific inquiry.1,2 To our knowledge, this is the first study to explore irresponsible research practices across all phases of HPE research. Taken together, our findings indicate that a substantial proportion of HPE researchers admit to having engaged in a range of irresponsible behaviors, with QRPs reported more frequently than deliberate scientific misconduct. These findings are consistent with the extant literature on research misconduct in other fields4–6,8,17 and are important because such behaviors can waste resources, provide an unfair advantage to some researchers over others, and ultimately impede scientific progress. Therefore, we believe this study raises important concerns about the conduct of HPE research, suggesting that our community may need to take a hard look at its ethical norms and research culture.
In the current survey study, we asked respondents about a range of problematic behaviors. As Fanelli8 noted in his review of research misconduct, QRPs (e.g., honorary authorship or excluding study limitations) are qualitatively different from fabrication and falsification because they do not directly distort the quality of the science, per se. However, the damage done to the scientific enterprise by these “less severe” and more ambiguous practices may be proportionally greater than deliberate misconduct, simply because such practices occur more frequently. For example, 20.1% of respondents reported one or more instances of “salami slicing.” Although some may think of this as a minor offense, the practice fills the literature with more articles than is seemingly necessary.1,11 So, not only does this activity unfairly reward authors and waste resources (e.g., editorial time and journal space), it also can inflate the significance of a given finding, which in turn can distort the outcomes of meta-analyses and other types of systematic reviews.11,24
The interpretation of our findings is limited by the nature of the survey methodology employed and, in particular, by the threat of nonresponse bias (especially considering the sensitive nature of the topic under study).5,6 Therefore, it is reasonable to ask: What do these results really demonstrate about the actual frequency of misconduct and QRPs among HPE researchers? We argue here, as others have previously,5,8,17 that self-reports of irresponsible practices likely underestimate the real frequency of such behaviors. Researchers who have acted unethically are undoubtedly hesitant to reveal such activities in a survey, despite all assurances of anonymity. What is more, the opposite—researchers admitting to unethical or otherwise questionable practices that they did not do—seems unlikely.8 Therefore, we speculate that scientific misconduct and QRPs may be even more widespread in our community than our estimates imply. Nevertheless, rather than establish an absolute prevalence of misconduct and QRPs in HPE, we believe these data are better suited for helping the community understand the nature of the most common practices and beginning to find feasible solutions to improve our research enterprise.
QRPs related to authorship were some of the most frequently reported behaviors in this study, particularly the practice of giving or accepting unwarranted authorship (“honorary authorship”). Honorary authorship is unethical because individuals who have not sufficiently contributed to the work unfairly receive credit as an author and misrepresent their contributions in the scientific literature.25 Our findings corroborate the results of a recent survey of established HPE researchers14 and of several studies conducted in the field of medicine more broadly, suggesting we are not alone in these practices.25–27 Such questionable authorship practices have led journals to require that authors sign an author contribution agreement to verify their explicit authorship roles.28 The effectiveness of such requirements is unknown and could be a fruitful area for additional research.
Of note in this study, the frequency of authors giving honorary authorship and those accepting honorary authorship were not equivalent: 60.6% admitted to adding undeserving authors, whereas only 22.7% admitted to accepting honorary authorship. This mismatch suggests that HPE researchers may not fully understand authorship criteria. It may also be a concrete example of the “Muhammad Ali effect”—the idea that individuals often see themselves as more likely to perform good acts and less likely to perform bad acts than others.29 Regardless of the mechanism, this finding indicates the need for increased author communication and better shared understanding of author roles and responsibilities, such as those set forth by the ICMJE.15
A complete discussion of all the behaviors assessed in this study is outside the scope of this article. Although readers may debate the degree to which some of these behaviors are unethical or otherwise problematic, we should note, as described above, that seemingly minor infractions can have far-reaching negative consequences. For example, by employing data manipulation techniques like “P hacking” (i.e., massaging data or analyses until nonsignificant results become significant)30,31 or taking advantage of other types of “researcher degrees of freedom,”32 scientists can discover “illusory results”33 that actually represent artifacts of their study design and analytic approach, as opposed to legitimate findings that can be replicated.34,35 Some have argued that such behaviors are the result of individual researchers responding to a set of incentives, the most important of which are rewards for the quantity (not the quality) of publications.17,36 If one accepts this thesis, then the most effective way to improve research practices, and the quality of the corresponding science, may be to change incentive structures at the institutional level.37
Limitations and future directions
This study has several important limitations. First, nonresponse bias is a legitimate concern, especially considering the modest response rate (28.2%) in the curated sample. That said, recent research suggests that response rate may be a flawed indicator of response quality and representativeness.38,39 Moreover, notwithstanding its inherent limitations, our wave analysis results suggest that nonresponse bias was limited in our sample. Nonetheless, investigators should build on these initial results by examining misconduct and QRPs among a larger, more global sample of HPE researchers.
A second limitation relates to the innate challenge of assessing complex, context-specific research behaviors with a survey that requires respondents to self-assess their own practices.40,41 Because some of the irresponsible practices on our survey are judgment calls, their evaluation likely requires more detail and nuance than a single survey item can provide.40 So, while the practice of fabricating data is fairly straightforward (and never justified), the same cannot be said for something like inappropriately storing sensitive research data. The latter practice is open to interpretation: What is considered “inappropriate storage” to one researcher might seem innocuous to another. We attempted to reduce this type of ambiguity in our survey items by employing a rigorous expert review process. But, ultimately, “survey self-reports can never fully rule out ambiguities in meaning, limitations in autobiographical memory, or motivated biases.”40 Thus, future work might apply qualitative research methods to address some of these limitations and further unpack the nature of irresponsible research practices in HPE.
Finally, we administered our survey in English and did not ask respondents to focus on a particular time period. These implementation and design choices, particularly the latter decision to use an open-ended time period, could have negatively affected data quality. For example, several respondents noted in their written comments that ethical standards related to human-subjects research had evolved over time. Therefore, respondents reporting a given QRP may have been referring to activities that occurred decades prior, long before more stringent standards of conduct were in place. Future research could address this problem by limiting the time period that respondents are asked to consider.5
Recommendations for practice
As the field of HPE continues to mature, it is essential that we explicitly confront our obligation to conduct research in an ethical and responsible manner. Previously, we suggested a number of recommendations to improve practice,7 and the findings reported here indicate that the time is now to implement these and other policy changes. These institution-level approaches embrace the idea that scientific misconduct and QRPs are not simply the result of individual researchers acting badly. Instead, important contextual factors likely influence researcher behavior, including social norms, power disparities, institutional policies, and academic incentives.7 Examples of initiatives being tried elsewhere include promotion and tenure guidelines that privilege publication quality over quantity,42 study preregistration plans,33 and other open-science practices (e.g., open data and materials sharing).43 All of these approaches require study to determine their efficacy in HPE research and publication.
Cultivating the responsible conduct of research is essential if we are to maintain scientific integrity and engender public confidence in our research.1,2 This study raises important concerns about the conduct of HPE research and presents a somewhat pessimistic picture of our community. We should be clear, though—most HPE scientists surveyed did not report the vast majority of irresponsible research behaviors. They are presumably acting ethically and doing good science. Nevertheless, we believe that reforms are needed. In addition, we recommend future research to monitor research misconduct and QRPs in HPE, and to evaluate the effectiveness of policies designed to improve the integrity of our research enterprise.
The authors wish to sincerely thank all of the health professions education researchers who took the time (and had the courage) to complete the survey.
1. Steneck NH. Fostering integrity in research: Definitions, current knowledge, and future directions. Sci Eng Ethics. 2006;12:53–74.
2. Martinson BC, Anderson MS, de Vries R. Scientists behaving badly. Nature. 2005;435:737–738.
4. John LK, Loewenstein G, Prelec D. Measuring the prevalence of questionable research practices with incentives for truth telling. Psychol Sci. 2012;23:524–532.
5. Tijdink JK, Bouter LM, Veldkamp CLS, van de Ven PM, Wicherts JM, Smulders YM. Personality traits are associated with research misbehavior in Dutch scientists: A cross-sectional study. PLoS One. 2016;11(9):e0163251.
6. Bouter LM, Tijdink J, Axelsen N, Martinson BC, ter Riet G. Ranking major and minor research misbehaviors: Results from a survey among participants of four World Conferences on Research Integrity. Res Integr Peer Rev. 2016;1:17.
7. Maggio LA, Artino AR, Picho K, Driessen EW. Are you sure you want to do that? Fostering the responsible conduct of medical education research. Acad Med. 2018;93:544–549.
8. Fanelli D. How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One. 2009;4:e5738.
10. Brice J, Bligh J, Bordage G, et al. Publishing ethics in medical education journals. Acad Med. 2009;84(10 suppl):S132–S134.
11. Eva KW. How would you like your salami? A guide to slicing. Med Educ. 2017;51:456–457.
12. ten Cate O. Why the ethics of medical education research differs from that of medical research. Med Educ. 2009;43:608–610.
13. Hally E, Walsh K. Research ethics and medical education. Med Teach. 2016;38:105–106.
14. Uijtdehaage S, Mavis B, Durning SJ. Whose paper is it anyway? Authorship criteria according to established scholars in health professions education. Acad Med. 2018;93:1171–1175.
15. International Committee of Medical Journal Editors. Recommendations for the conduct, reporting, editing, and publication of scholarly work in medical journals. http://www.icmje.org/icmje-recommendations.pdf. Published 2017. Accessed July 30, 2018.
16. White C. The COPE Report 2000: Annual Report of the Committee on Publication Ethics. London, UK: BMJ Books; 2000.
17. Tijdink JK, Verbeke R, Smulders YM. Publication pressure and scientific misconduct in medical scientists. J Empir Res Hum Res Ethics. 2014;9:64–71.
18. Anderson MS, Horn AS, Risbey KR, Ronning EA, De Vries R, Martinson BC. What do mentoring and training in the responsible conduct of research have to do with scientists’ misbehavior? Findings from a national survey of NIH-funded scientists. Acad Med. 2007;82:853–860.
19. Schaeffer NC, Dykema J. Questions for surveys: Current trends and future directions. Public Opin Q. 2011;75:909–961.
20. Tourangeau R, Yan T. Sensitive questions in surveys. Psychol Bull. 2007;133:859–883.
21. Artino AR Jr, La Rochelle JS, Dezee KJ, Gehlbach H. Developing questionnaires for educational research: AMEE guide no. 87. Med Teach. 2014;36:463–474.
23. Phillips AW, Reddy S, Durning SJ. Improving response rates and evaluating nonresponse bias in surveys: AMEE guide no. 102. Med Teach. 2016;38:217–228.
24. Supak Smolcić V. Salami publication: Definitions and examples. Biochem Med (Zagreb). 2013;23:237–241.
25. Wislar JS, Flanagin A, Fontanarosa PB, Deangelis CD. Honorary and ghost authorship in high impact biomedical journals: A cross sectional survey. BMJ. 2011;343:d6128.
26. Kornhaber RA, McLean LM, Baber RJ. Ongoing ethical issues concerning authorship in biomedical journals: An integrative review. Int J Nanomed. 2015;10:4837–4846.
27. Vera-Badillo FE, Napoleone M, Krzyzanowska MK, et al. Honorary and ghost authorship in reports of randomised clinical trials in oncology. Eur J Cancer. 2016;66:1–8.
28. Lundberg GD, Flanagin A. New requirements for authors: Signed statements of authorship responsibility and financial disclosure. JAMA. 1989;262:2003–2004.
29. Allison ST, Messick DM, Goethals GR. On being better but not smarter than others: The Muhammad Ali effect. Soc Cogn. 1989;7:275–295.
30. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The extent and consequences of p-hacking in science. PLoS Biol. 2015;13:e1002106.
31. Nuzzo R. Scientific method: Statistical errors. Nature. 2014;506:150–152.
32. Simmons JP, Nelson LD, Simonsohn U. False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol Sci. 2011;22:1359–1366.
33. Gehlbach H, Robinson CD. Mitigating illusory results through preregistration in education. J Res Educ Eff. October 2017:1–20.
34. Picho K, Maggio LA, Artino AR Jr. Science: The slow march of accumulating evidence. Perspect Med Educ. 2016;5:350–353.
35. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2:e124.
36. Horton R. Offline: What is medicine’s 5 sigma? Lancet. 2015;385:1380.
37. Smaldino PE, McElreath R. The natural selection of bad science. R Soc Open Sci. 2016;3:160384.
38. Johnson TP, Wislar JS. Response rates and nonresponse errors in surveys. JAMA. 2012;307:1805–1806.
39. Halbesleben JR, Whitman MV. Evaluating survey quality in health services research: A decision framework for assessing nonresponse bias. Health Serv Res. 2013;48:913–930.
40. Fiedler K, Schwarz N. Questionable research practices revisited. Soc Psychol Personal Sci. 2016;7:45–52.
41. Eva KW, Regehr G. Self-assessment in the health professions: A reformulation and research agenda. Acad Med. 2005;80(10 suppl):S46–S54.
42. Nazim Ali S, Young HC, Ali NM. Determining the quality of publications and research for tenure or promotion decisions. Libr Rev. 1996;45:39–53.
43. Kidwell MC, Lazarević LB, Baranski E, et al. Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLOS Biol. 2016;14:e1002456.
© 2019 by the Association of American Medical Colleges