Heitman, Elizabeth PhD; Olsen, Cara H. DrPH; Anestidou, Lida DVM, PhD; Bulger, Ruth Ellen PhD
Instruction in research integrity has been a feature of U.S. graduate education in the biomedical sciences since the early 1990s. Many programs were developed in response to the National Institutes of Health’s (NIH’s) 1989 policy that required National Research Service Award (NRSA) training grants to include instruction on research integrity1 or the 1992 policy that expanded the requirement to a wider scope of trainees.2 Still others are part of health sciences universities’ general efforts to promote the responsible conduct of research (RCR) in young investigators’ professional development. Undergraduate science programs seldom teach the principles and standards of research in a formal way, and undergraduates’ actual exposure to research practice varies across schools and programs. Although U.S. graduate programs in the biomedical sciences increasingly expect applicants to have research experience, educators likely know little about their new students’ previous exposure to or awareness of ethical norms and standards of practice in this field.
RCR educators and authors of curricular materials often make the paradoxical assumption that new graduate students know enough about scientific research to understand best practices and professional norms, but that they are still open enough to instruction that they will readily adopt behaviors and practice standards introduced in RCR courses. Many of our own students report that they have learned a great deal of new and useful information in our classes, but every year a few comment that they did not need the course because they were already familiar with the subject from their undergraduate education or research experiences. Occasionally, students note that what we teach about certain standards is not what they have observed in practice. And from time to time, students and other faculty question the value of teaching research integrity at all because they believe that adult students’ moral frameworks, beliefs, and behaviors are already established before they receive such instruction.
To be effective, adult education must be based on an understanding of students’ prior knowledge, expectations, and commitments.3 To assess the impact and value of mandatory RCR education at the graduate level, instructors need a clear understanding of graduate students’ knowledge of the standards of good research before students receive required RCR instruction. We developed an objective test of core RCR concepts and standards and administered it to new graduate students at four U.S. academic health science centers, just before they began their institutions’ required RCR instruction. We sought to identify students’ baseline knowledge of RCR, as well as the variables that affect this knowledge.
In the first phase of this study, begun in 2002 and reported elsewhere,4 we analyzed the content of 20 key instructional resources in research integrity in the biomedical sciences, published in the United States between 1984 and 2004. Our analysis of the material presented in these works used the framework of the nine core instructional areas of RCR proposed by the Office of Research Integrity (ORI) of the Department of Health and Human Services in 2000 (data acquisition, management, sharing, and ownership; conflict of interest and commitment; human subjects; animal welfare; research misconduct; publication practices and responsible authorship; mentor/trainee responsibilities; peer review; and collaborative science).5 We identified a wide range of common principles, standards, regulations, and issues in research integrity, which we categorized into the nine core areas. Content that did not fit into one of these categories was assigned to a 10th general category of “other core information.” We defined the accepted core knowledge of RCR as the standards and concepts presented most frequently by the majority of the key educational resources.
Using standard approaches to designing multiple-choice tests, such as those used in National Board-type exams, we constructed 50 objective multiple-choice test questions concerning principles, regulations, and standards of practice drawn from the identified core knowledge of RCR. The number of questions on each core topic reflected the relative emphasis on the topic and the extent of consensus on the core knowledge in each area included in the instructional literature. A five-member expert advisory committee of RCR educators evaluated the content, wording, and balance of the test questions within each core instructional area and assisted with the clarification and revision of some items.
In a concurrent, IRB-approved, second phase of the study, we conducted multidisciplinary focus groups with biomedical sciences graduate students at three health sciences universities with long-standing RCR education programs (see Study sites, below), to identify factors in undergraduate education and research experience that students believed related to their knowledge of RCR and the applicability of graduate RCR education. One focus group was exclusively composed of students with undergraduate degrees from outside of the United States. On the basis of the issues identified by the focus group participants, we constructed 14 multiple-choice questions related to demographic characteristics, previous education, and research experience. We combined these demographic questions with the 50 content questions to create a paper-and-pencil test using computer-scored Scantron answer sheets.
We selected as study sites four competitive health sciences universities with comprehensive institutional research agendas, long-standing RCR requirements for graduate students, and course directors committed to assessment of RCR education. At the request of institutional officials, and to protect the confidentiality of study participants, the study sites are identified here as Schools A, B, C, and D. School A is a public, mid-Atlantic academic medical center that requires its RCR course for all doctoral students in its small biomedical sciences graduate program. School B is a private, mid-Atlantic research university with a small graduate training program in the biomedical sciences that requires its RCR course for all doctoral students. School C is a private, southern research university with an affiliated health sciences center that requires its RCR course for all students in its large graduate program in the biomedical sciences as well as postdoctoral fellows on NRSA training grants. School D is a public, southwestern health sciences university that requires its RCR course for all students in its large graduate program in the biomedical sciences, as well as all postdoctoral fellows on NRSA training grants; this institution also has a highly diverse student body with a large population of international students. As discussed in the Results section below, complications with broadcast e-mail recruitment and test links for School D resulted in this study site being omitted from the final analysis.
Test format and scoring
After IRB approval, we piloted a paper-and-pencil instrument with 50 RCR content questions and 14 demographic questions with 22 new doctoral students at School A just before they began their required RCR course in fall 2003. Because of the unexpectedly long time that participants needed to complete the test, we reduced it to 45 content items and 14 demographic questions, which we piloted with 19 doctoral students at School B immediately before their required RCR course in spring 2004. Ultimately, we shortened the test to 30 content questions and 12 demographic items so that it could be administered in less than an hour. Content questions were included in the final instrument according to three criteria: (1) whether ignorance of the correct answer could lead a researcher into serious regulatory trouble, (2) whether the question addressed standards of excellence in biomedical research, and (3) the relative emphasis on the core area reflected in the instructional literature. Analysis of differences in scores between the 50-, 45-, and 30-question formats at the two pilot test sites (t test, paired samples) confirmed that the 30-question format was representative of the longer tests (t = −1.93, P = 0.060). These 30 content questions were used as the basis for calculating the score (percent correct) for all students in the final study population, including those who participated in the pilot.
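The score used throughout is the percent of the retained content items answered correctly. A minimal sketch of that calculation follows. It assumes that unanswered items count as incorrect (the study does not state how blanks were scored), and the answer key, item names, and responses are hypothetical.

```python
from typing import Mapping, Sequence


def percent_correct(
    responses: Mapping[str, str],
    answer_key: Mapping[str, str],
    scored_items: Sequence[str],
) -> float:
    """Return the percent of scored items answered correctly.

    Assumption: an item missing from `responses` counts as incorrect.
    """
    if not scored_items:
        raise ValueError("scored_items must not be empty")
    n_correct = sum(
        1 for item in scored_items if responses.get(item) == answer_key[item]
    )
    return 100.0 * n_correct / len(scored_items)


# Hypothetical five-item key; the actual instrument retained 30 items.
ANSWER_KEY = {"q1": "a", "q2": "c", "q3": "b", "q4": "d", "q5": "a"}
# "q3" stands in for an item dropped after piloting, so it is not scored.
RETAINED_ITEMS = ["q1", "q2", "q4", "q5"]

# A pilot participant answered all five items but is scored on the retained four.
pilot_student = {"q1": "a", "q2": "c", "q3": "d", "q4": "d", "q5": "b"}
score = percent_correct(pilot_student, ANSWER_KEY, RETAINED_ITEMS)
print(score)  # 75.0
```

Scoring pilot participants on the retained items only is what allows their results to be pooled with those of students who took the shorter final test.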
The 30 content questions and 12 demographic questions were subsequently formatted both for online administration and as a paper-and-pencil test using Scantron forms. We administered the Scantron-format test to new graduate students at Schools A and B, and the online version of the same instrument to the larger populations at Schools C and D, just before they were scheduled to begin their program’s required RCR course in each of two academic years. An interim analysis of data was presented after the first cycle of testing (through 2004) at the 2004 Conference on Research in Research Integrity.6
To describe more clearly the unexpectedly high number of participants in the first cycle who reported prior graduate education, we added a 13th demographic question in the second cycle of testing to ask about prior graduate RCR education. Analysis of variance (ANOVA) of scores across schools and years of testing, including pilot testing, found no significant differences (F(4, 246) = 0.109, P = 0.979), which permitted a combined analysis of the total population of students participating during the two cycles. This report summarizes data from the entire study.
Scantron-format tests were scored by a computer reader against a Scantron answer key prepared by the investigators; online tests were scored against a programmed answer key set up by the test site designer and verified by the investigators. Overall test score (percent correct) was compared between demographic subgroups using the Student t test for independent samples. Stepwise linear regression was used to identify demographic variables that were independently associated with test scores. Hierarchical cluster analysis was used to identify subsets of questions that were missed by the same students, and ANOVA was used to examine variability in percent correct across content areas. Data were analyzed using SAS version 9.1 (SAS Institute, Inc., Cary, NC).
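The subgroup comparisons rest on the Student t test for independent samples. As a rough illustration only (the study's analysis was run in SAS, not Python), the pooled-variance t statistic can be sketched as follows. The function name `student_t` and the score lists are hypothetical and are not study data. Converting the statistic to a P value requires the t distribution, which this standard-library sketch omits.

```python
import math
import statistics
from typing import Sequence, Tuple


def student_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for two independent samples.

    Uses the classic Student form with a pooled variance, which assumes
    the two groups have equal population variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    df = n_a + n_b - 2
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / df
    if pooled_var == 0:
        raise ValueError("scores have zero variance; t is undefined")
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    return mean_diff / std_err, df


# Hypothetical percent-correct scores for two demographic subgroups.
older_students = [60.0, 70.0, 80.0]
younger_students = [50.0, 60.0, 70.0]
t_stat, df = student_t(older_students, younger_students)
print(round(t_stat, 3), df)  # 1.225 4
```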
A total of 300 students from the four study sites completed the test during the two cycles. Difficulties with the broadcast e-mail system and link to the online test for School D corrupted the master list of eligible students and disrupted recruitment at that site. Because assessment of a response rate was not possible without the denominator, and because the 49 tests submitted were a disproportionately small response from the targeted population, we ultimately dropped this school from the final analysis. At Schools A, B, and C, 251 of the 402 students (62%) completed the test; their scores ranged from 26.7% to 83.3%, with a mean score of 59.5% (SD 10.5). Only seven (3%) of the 251 participants scored 80% or higher, which is widely considered the passing grade in graduate education. Table 1 displays the participants’ mean test scores for knowledge of RCR by school and test format.
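The percentages in this paragraph can be checked with simple arithmetic. The sketch below recomputes them from the reported counts. The raw counts of 8 and 25 correct answers are inferred here from the reported score range on a 30-item test and are not stated in the study.

```python
N_ITEMS = 30                 # content questions in the final instrument

total_tests = 300            # tests completed at all four sites
school_d_tests = 49          # tests dropped with School D
analyzed = total_tests - school_d_tests

# Participation at Schools A, B, and C.
participation_pct = round(100 * analyzed / 402)

# Share of analyzed participants scoring 80% or higher.
high_scorer_pct = round(100 * 7 / analyzed)

# Inferred raw counts: 8 and 25 correct answers out of 30 items.
lowest_pct = round(100 * 8 / N_ITEMS, 1)
highest_pct = round(100 * 25 / N_ITEMS, 1)

print(analyzed, participation_pct, high_scorer_pct)  # 251 62 3
print(lowest_pct, highest_pct)                       # 26.7 83.3
```

Because each question is worth one thirtieth of the total, every possible score is a multiple of about 3.3 percentage points.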
Despite the participation of postdoctoral fellows at School C, the populations at the different study sites did not differ significantly with respect to any of the demographic variables collected (chi-square test, P > 0.05 for all tests). Four of the 13 demographic items projected by the focus groups to be important for new students’ knowledge of RCR showed no significant relationship to overall test scores. These factors were undergraduate major, having had an undergraduate course in research methods, having had an undergraduate course in research integrity, and having attended an undergraduate institution with an honor code (Table 2).
Of the remaining nine demographic factors, only country of undergraduate degree and age were significantly associated with higher test scores at the P < 0.05 level, whether considered alone or after adjusting for other factors in the regression model. The 30 students (12%) who had received their undergraduate education outside the United States had a mean overall test score of 52.0%, 8.5 points lower than that of the United States-educated students (P < 0.001), and the 129 students (51%) ages 18 to 24 had a mean overall test score of 57.2%, 4.8 points lower than that of older students (P = .008).
Of the 92 trainees (37%) with prior graduate education, 16 in the second cycle reported that they had had previous graduate-level training in RCR. Their mean score of 67.7% was 10 points higher than that of new students (P = 0.008). These trainees were not compared separately in the linear regression because of the small sample size.
The remaining six demographic items had an unclear relation to test scores, either because of confounding by other variables or marked shifts in the responses between the two cycles of testing. These included respondents having received their undergraduate degree after January 2001; having received a prior graduate-level degree; having done hands-on research before graduate school; having had a research mentor before graduate school; race/ethnicity; and gender. Participants who received their undergraduate degrees before 2001, when the ORI originally planned to implement its now-suspended policy on RCR education,5 had a mean score of 57.5%, 5.3 points higher than participants who had graduated that year or later (P < 0.001). An unexpectedly high proportion (92 of the 210 participants [43.8%] who were asked about prior graduate education) had entered their current graduate programs with a prior graduate degree in the biomedical sciences or medical professions; this group’s mean score was 61.2%, 3.3 points higher than that of students entering a graduate program for the first time (P = 0.025). However, students who received their undergraduate degrees before 2001, as well as those with a prior graduate degree, tended to be older students. After adjusting for age, these associations were no longer statistically significant.
Students who reported having had practical hands-on research experience before graduate school, or who reported having had a research mentor before entering graduate school, had lower overall test scores than other students. This association was significant in the interim analysis after the first cycle of testing but was not significant in the final analysis. Finally, male students’ mean score of 61.5% was 3.1 points higher than that of their female counterparts (P = 0.022) and white, non-Hispanic students’ mean score of 61.1% was 4.8 points higher than that of students of other racial/ethnic backgrounds (P = 0.001), but these associations were not significant after adjusting for other demographic factors.
Test scores were approximately normally distributed across a range of 26.7% (low) to 83.3% (high). In addition to the wide range of scores, there was considerable variability in participants’ knowledge of the specific content areas, defined according to the ORI’s nine core instructional areas (Table 3). There was no apparent clustering by content area (data not shown). Similarly, ANOVA to compare variability in percent correct among content areas with variability among questions within content areas found no significant difference (F(8, 25) = 1.34, P = 0.27). This finding suggests that the depth of new students’ knowledge is not necessarily associated with any specific content area.
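The ANOVA described here compares variability among content areas with variability among questions within each area. A minimal sketch of a one-way F statistic of that kind is given below. It assumes each question is summarized by the percent of participants who answered it correctly. The function name `one_way_f` and the three small groups are hypothetical, and the sketch does not reproduce the study's data or its SAS analysis.

```python
import statistics
from typing import Sequence, Tuple


def one_way_f(groups: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA.

    F is the between-group mean square divided by the within-group
    mean square.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = statistics.mean([x for g in groups for x in g])
    group_means = [statistics.mean(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical percent-correct values for the questions in three content areas.
percent_correct_by_area = [
    [50.0, 60.0, 70.0],
    [55.0, 65.0, 75.0],
    [40.0, 60.0, 80.0],
]
f_stat, df_between, df_within = one_way_f(percent_correct_by_area)
print(round(f_stat, 3), df_between, df_within)  # 0.125 2 6
```

An F value near or below 1, as in this toy example, means that questions differ from one another about as much within a content area as they do across areas.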
The 1989 NIH training grant mandate on RCR instruction1 and the 1992 expanded policy2 were based, in part, on the belief that formal, graduate-level instruction could make a significant difference in new researchers’ knowledge of standards of ethical research and the overall integrity of science. Our findings show that many new graduate students in the biomedical sciences do not know the basic concepts and standards of RCR. Moreover, we found that few of the undergraduate experiences that might be expected to provide such knowledge have any significant impact on graduate students’ knowledge of RCR.
Prior graduate degree and educational experience
We began this study intending to look at the knowledge of new graduate students entering their first postbaccalaureate degree program, before they began their university’s required RCR instruction. The universities where this study was conducted have long-standing RCR education programs that focus on first-year students as part of their basic orientation to graduate science education and the university’s research environment. However, we discovered that many incoming students enrolled in these required RCR courses did not come directly from undergraduate programs; rather, they had prior graduate degrees in the biomedical sciences or health professions.
We chose to study new graduate students to control for the possible effects of general graduate education in the biomedical sciences and RCR instruction that more advanced students might have received, but our findings confirm the implicit claim of the original NIH training grant mandate that standard graduate science education does not provide adequate grounding in the core concepts and standards of RCR. One third of our population of “new” graduate students reported having a prior graduate degree in the biomedical sciences or health professions. Although these “senior” trainees scored significantly higher than the newly entering students, their mean score was still only 61%, well below the 80% that is commonly considered the lowest passing score in graduate education.
Moreover, when we examined the additional demographic question about participants’ prior graduate-level RCR instruction, which we added in the second cycle of testing, we were surprised to find that the mean score of the 16 participants who reported previous RCR training was still only 67.7%. Although a few did achieve scores of 80% or 83.3%, most students’ scores fell below what might be considered an adequate minimum level of knowledge of the basics of RCR. This finding suggests not only that RCR instruction varies in content across institutions, but also that a single RCR course early in a student’s graduate career is probably insufficient to impart even core concepts and standards in a lasting way. These results are a disappointing complement to our finding that neither undergraduate courses in research methods nor undergraduate classes in RCR, research integrity, or research ethics (not including theoretical bioethics) affected students’ knowledge of core concepts and standards.
The weak performance of the more senior trainees who had already received some formal RCR instruction also raises the important question of what should constitute a passing score in any RCR course. Although an 80% score would be a passing grade in most graduate schools, an 80% score on the study’s test meant that the participant missed 6 of 30 questions related either to recognized standards of excellence in science or to key regulatory standards for which noncompliance can carry significant penalties. Given the practical importance of accurate knowledge in basic RCR, it might be reasonable to insist that trainees in an RCR course achieve a perfect score on the final exam to demonstrate mastery of the material that their course faculty considered essential. And, although we cannot know the content of the previous RCR instruction that these 16 trainees had received, their scores suggest that graduate schools should not automatically permit trainees to place out of an institutionally required RCR course simply because they have received prior instruction elsewhere.
Country of undergraduate education
International students constitute a large and vital segment of graduate biomedical sciences trainees across the United States, and their important role in academic research in the United States is increasingly recognized.7 Thirty of the 251 participants in this study had received their undergraduate science education outside the United States. None of the non-U.S. graduates scored at or above the 80% level; only five non-U.S. graduates had scores of 70% or better. The relatively small number of non-U.S. graduates and the diversity of their previous educational experiences did not permit detailed analysis of associated variables.
Many of the international graduates who participated in the study had only just begun their U.S. training programs, and some may have experienced difficulty with the wording used in the test, although all would have previously passed both an English-language proficiency exam and standardized science testing in English for admission. More important, however, we examined students’ baseline knowledge of U.S. standards of RCR, identified in U.S. curricular materials, published in English, and some international students may have answered specific test questions on the basis of their experience of different standard practices in other countries. Differing standards of research practice among various countries may lead to confusion or conflict for international students in a new research environment,8 and it is essential that RCR educators recognize that their international students may have been exposed to different standards in the past and that they know how to explain U.S. standards in that context.
Research experience and mentoring: unclear findings
The graduate students who took part in our focus groups stated emphatically that hands-on research and research mentoring were essential to their understanding of RCR before entering graduate school. We hypothesized that students who reported hands-on research experience and research mentoring before graduate school would have higher overall test scores than those without such experience. Our interim analysis after the first cycle of testing, however, found that the 104 respondents who reported having done hands-on research before graduate school had significantly lower scores (−10.9, P < 0.001) than the 11 who reported no undergraduate research experience. Similarly, in the first cycle of testing, we found that participants who reported having a research mentor before graduate school had significantly lower test scores than those who had not had such a mentor (−5.5, P = 0.005). However, in the final analysis of 251 participants’ results, we found no significant positive or negative effect from either prior hands-on research or research mentoring before graduate school on overall test scores.
Because the first cycle of testing did not ask specifically about levels of prior graduate education—only about prior graduate degrees—the interim analysis combined participants with and without prior graduate degrees, a possible confounder for these questions about undergraduate research and mentoring. Our analysis of data from both testing cycles, in which we accounted for participants’ prior graduate degrees, still showed that participants who reported having had a research mentor before graduate school scored lower than those without a mentor, but the difference of 3.7 points was not statistically significant.
It is increasingly common for doctoral programs to expect and even require prior research experience as a condition for admission. We believe that it is vital to expand the research on the effects of mentoring on research integrity to include the effects of undergraduate research mentoring. Especially in light of the recent work by Martinson, de Vries, Anderson, and colleagues on questionable research practices by academic investigators,9 “normal” misbehavior in everyday research activities,10 and the complex effects of graduate-level mentoring,11 understanding possible differences between undergraduate mentoring in research intensive universities and four-year colleges also seems important. To study the effects of undergraduate research experience on graduate students’ knowledge of the concepts and standards of RCR, it will likely be necessary to study undergraduates directly.
Stratifying by prior graduate degree, older students seemed to do significantly better (or, conversely, younger students did significantly worse) regardless of prior graduate degree.
Even accounting for prior graduate degree and prior graduate RCR education, younger participants (ages 18–24) scored significantly lower than participants ages 25 to 29 (P = 0.013) and participants age 30 and older (P < 0.001). Age may reflect ethical maturity, life experience, or skill at taking multiple-choice tests. Age alone seems unlikely to affect professional knowledge, but subgroups are too small to analyze for other meaningful distinctions.
This study had some methodological limitations. The study protocol required participants to take the test before their respective universities started their RCR programs for the year, leaving only a small window of time for testing. Because of unforeseen difficulties with the broadcast e-mail system used in recruitment for online testing, both cycles of testing at School D experienced extremely low participation. The tests of 49 participants from School D were ultimately dropped from the final analysis, because they constituted a very small percentage of the school’s eligible population of almost 400 graduate students and postdoctoral fellows. If that population had been adequately represented in the study, not only would the overall population have been larger, it would have been more ethnically and racially diverse, and a higher percentage of participants would have had non-U.S. undergraduate degrees.
Effective adult education in RCR requires instructors to identify learners’ expectations, existing levels of knowledge, and openness to change. It also requires clear instructional goals and curricular content on the basis of students’ identified needs.1 Because RCR instruction has largely been driven by the need to meet training grant requirements and been shaped more recently by the ORI’s recommended nine core areas for instruction, RCR educators often focus on what courses “must” cover, with little or no awareness of what their own trainees need or already know.
Baseline knowledge of concepts and standards of responsible research may vary markedly within a single RCR class because new graduate students do not necessarily have similar academic preparation or experience in research. We began our investigation with the hypothesis that new graduate students would vary in their backgrounds and baseline level of knowledge of RCR. We found that trainees enrolled in required RCR education are indeed a highly diverse group of new and continuing graduate students. With respect to their prior knowledge of the nine core areas of RCR, members of a single class may have little in common beyond the fact that their previous undergraduate science education, research experience, and work with research mentors have not provided what we consider adequate instruction in or exposure to the core concepts and standards provided in published curricular materials. New students’ gaps and weaknesses in knowledge cut across the core instructional areas of RCR and are not just apparent in a few clearly identifiable topics. The roles of these three important dimensions of research education—undergraduate science instruction, research experience, and mentoring—in the development of research integrity need much more investigation.
The diversity of knowledge of RCR among members of a single class poses a series of challenges to RCR educators. The simultaneous increase in the number and scope of formal standards of research integrity, larger and more interdisciplinary RCR courses, and a nationwide push to standardize, centralize, and evaluate RCR instruction are likely to encourage faculty to provide one-size-fits-all courses to ensure that everyone is exposed to a comprehensive range of core material. Our findings suggest that most students need comprehensive RCR instruction, but also that standardized training that aims at a middle ground may both underestimate and overestimate the needs of its audience as individuals. However, apart from a few general indications, it seems difficult to predict from trainees’ previous science education just what those needs for instruction might be.
New international trainees who have received their undergraduate science education outside the United States have significantly greater gaps in their knowledge of core U.S. concepts and standards of RCR than their United States-educated counterparts. International research trainees need United States-focused RCR education to help them succeed in their new environment. Although much biomedical research is international, RCR instructors need to learn more about standards of RCR outside the United States and their interpretation in practice. Successful multinational research collaboration may depend on investigators’ ability to recognize, interpret, and work within culturally distinct research practices and their ethical frameworks. Truly global health research requires investigators to negotiate successfully among different conceptions of good research conduct.
More senior trainees beginning a new program likely still have pronounced gaps in their knowledge of core concepts and standards of RCR, even if they have previously received graduate RCR education elsewhere. Graduate programs that provide introductory “inoculation” training in RCR should consider requiring ongoing or advanced-level RCR instruction beyond new students’ orientation or first-year course. As advanced students explore more sophisticated scientific questions, they encounter more complex ethical issues as well. Programs that are satisfied to offer only basic RCR instruction risk overlooking gaps in students’ knowledge that may negatively affect their work as more senior trainees and diminish their ability to mentor less advanced students. As a number of national-level policy reports have recommended,12,13 effective RCR education programs should involve all members of the research team, not simply new students.
Faculty of RCR courses, principal investigators of training grants, and university administrators responsible for implementing RCR instruction should formally evaluate their students’ baseline knowledge in the ORI’s nine core instructional areas and in other issues of importance to the specific training program before the RCR instruction program begins. Such evaluation offers several benefits. Students in RCR classes with diverse enrollments may profit from the different kinds of knowledge their classmates bring to the course. Nonetheless, grouping students for discussion according to their demonstrated level of baseline knowledge can improve the experience for both new and seasoned trainees by permitting them to focus on content and issues that are new to them. Evaluating students’ knowledge both before they begin required RCR instruction and on completion also permits assessment of the RCR education program itself. Testing again at a still later point could measure students’ retention of core knowledge of RCR and evaluate how other aspects of the graduate program might affect that knowledge. Without such evaluation, claims for the benefits of graduate RCR instruction remain difficult to substantiate, and its fuller effects on research integrity remain uncertain.
The authors gratefully acknowledge the contributions of advisory committee members Michael Kalichman, Francis Macrina, Anna Mastroianni, Brian Schrag, and Linda Werling; the Web site design and support provided by Patrick Tay of the Vanderbilt University Office of Biomedical Research Education and Training; and the support of the RCR course directors and administrators at the four health science universities where this study was carried out.
This study was supported by grant #NS044533 from the National Institute of Neurological Disorders and Stroke and the DHHS Office of Research Integrity. An earlier version of this work was presented at the 2006 Meeting of the American Association for the Advancement of Science.