Openness and sharing of scientific data and findings are, or perhaps were, the fundamental ideals of academic science.1 Robert Merton described the communal sharing of data and ideas as a fundamental norm of the academic scientific enterprise. However, in their actual work, scientists in the life sciences and other fields frequently do not or cannot practice complete openness—often for legitimate reasons such as securing academic credit for their research endeavors and as a result of federal mandates to increase scientific secrecy to prevent terrorism, which may particularly compromise the training of noncitizens.2–5
In the life sciences, data from a national survey of academic life scientists showed that 46% of scientists who had asked other academics for access to published information, data, and materials had been denied access.6 More than three quarters (77%) reported that data withholding has detracted somewhat or greatly from the level of communication in science, 73% that data withholding slowed the rate of progress in their field of science, and 63% that data withholding harmed the quality of their relationships with other scientists.
Relationships between industry and academia have further complicated the issue of secrecy, and previous studies have shown that data withholding by faculty was strongly associated with commercialization of their research and increased ties with industry.7 The ethics and legality of secrecy surrounding academic–industry relations have been called into question in the past,8–11 but some studies suggest that these relationships may improve academic productivity.12 At the same time, faculty members' beliefs and practices regarding data sharing and withholding are likely to affect the educational activities conducted in universities and teaching hospitals.13–15 For example, open sharing of information during training has the potential to accelerate the training process and may expose trainees to the most recent scientific thinking and advances within their discipline. Moreover, a survey of academic geneticists and other life scientists found that more than half (56%) of academic scientists reported that data withholding detracted somewhat or greatly from the education their graduate students and postdoctoral students received.6 In addition, secrecy may result from competitive practices that may in turn take a toll on the educational experience and chip away at the ethical training of the next generation of scientists. In one study, 47% of medicine residents reported that colleagues and supervisors had taken credit for their work without acknowledgement.16
Currently, there are no systematic, empirical data regarding trainees' experiences and beliefs regarding data withholding in science. Understanding trainees' attitudes and behaviors is important for a number of reasons. First, exposure to data withholding could result in the next generation of scientists having an incomplete set of skills and techniques with which to address cutting-edge research issues in the future. Second, trainees exposed to data withholding may be more likely to practice withholding in the future. Third, trainees serve as an advance marker for understanding the extent to which openness in data sharing is (or is not) a fundamental norm underlying the social structure of the modern scientific enterprise. Finally, exposure to data withholding may delay trainees' educational progress, resulting in longer periods of training and reduced potential earnings. Exposure to data withholding, however, may also teach trainees key practices that protect their ability to benefit academically and commercially from their research in the future.
Despite the importance of this issue, to our knowledge there have been no studies in the United States of data sharing and withholding among trainees in the life sciences or any other field. The purpose of this study was to provide the first, systematic, national, multidisciplinary data on the nature, extent, and consequences of data sharing and withholding among trainees (doctoral students and postdoctoral fellows) in the life sciences, computer science, and chemical engineering. Our specific aims were
- to estimate the extent to which trainees in the life sciences are exposed to data withholding while in training and the factors that predict such exposure; and
- to measure the impact, if any, of data withholding on trainees' education and research-related activities.
Method
Sample
Our sample of second-year and higher doctoral students and postdoctoral fellows (hereafter referred to as trainees) was constructed using a multistep process. In the first step, we used data from the National Science Foundation to identify the 50 U.S. universities that granted the largest number of doctorates in computer science, chemical engineering, and the life sciences in 2000. In the life sciences, one department or program of biochemistry, cell biology, genetics, or microbiology was randomly selected from each institution. This step yielded 150 departments or programs, with 50 in computer science, 50 in chemical engineering, and 50 in the life sciences. We chose these fields for several reasons. First, each field is primarily research-based and the conduct of empirical research is a required component of doctoral and postdoctoral training programs. Second, external funding for educational programs in these fields derives from the major federal agencies, including the National Science Foundation and the National Institutes of Health, as well as private industry, which suggests that the fields are comparably competitive, thus increasing the potential for data withholding. Third, these fields are among the largest areas of science in terms of the numbers of doctoral degrees granted each year. Fourth, in each of these fields commercial activities among faculty and universities are frequent and often involve trainees. Finally, given the increasing multidisciplinary nature of biomedical research, it is increasingly likely that scientists from these fields will interact in areas such as biomedical engineering, genetics, medical imaging, and bio-informatics. Understanding disciplinary differences in data withholding may facilitate such collaborations.
In the second step, at each selected department or program we requested that the department chair or program head provide the names of all second-year and higher doctoral students and all postdoctoral fellows in their department or program. We also secured the names of trainees from department or program Web sites when available. Of the 150 randomly chosen departments or programs, 115 participated. The primary reason for not participating was the unwillingness or inability of department chairs to release students' names. In some cases, department chairs were too busy to respond; in others, they believed participation to be against university policy. The 115 programs yielded a master list of 6,734 trainees with 38% in computer science, 34% in chemical engineering, and 28% in the life sciences.
In the third step, from the master list we selected a random sample of 2,000 trainees distributed evenly in each of the three fields and proportional to the number of trainees in each institution's department or program. Random sampling by program size and field ensured we had equal numbers of trainees in each field and, at the same time, preserved department or program representativeness within each field. We did this to ensure a self-weighting sample.
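The allocation in this third step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the largest-remainder rounding rule, the function names `allocate_proportional` and `draw_sample`, and the program rosters are all assumptions introduced for the example.

```python
import random

def allocate_proportional(program_sizes, field_quota):
    """Split one field's sample quota across programs in proportion to size.

    Uses largest-remainder rounding (an assumption; the study does not
    state its rounding rule) so the allocations sum exactly to the quota.
    """
    total = sum(program_sizes.values())
    exact = {p: field_quota * n / total for p, n in program_sizes.items()}
    alloc = {p: int(x) for p, x in exact.items()}  # floor of each exact share
    leftover = field_quota - sum(alloc.values())
    # Give the remaining slots to the programs with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda p: exact[p] - alloc[p], reverse=True)
    for p in by_remainder[:leftover]:
        alloc[p] += 1
    return alloc

def draw_sample(rosters, field_quota, seed=0):
    """Draw a simple random sample within each program of one field."""
    rng = random.Random(seed)
    sizes = {p: len(names) for p, names in rosters.items()}
    alloc = allocate_proportional(sizes, field_quota)
    return {p: rng.sample(rosters[p], alloc[p]) for p in rosters}

# Hypothetical rosters for three programs in a single field.
rosters = {
    "program_a": [f"a{i}" for i in range(120)],
    "program_b": [f"b{i}" for i in range(60)],
    "program_c": [f"c{i}" for i in range(20)],
}
sample = draw_sample(rosters, field_quota=50)
```

Because every program contributes the same fraction of its roster, each trainee within a field has roughly the same chance of selection, which is what makes the sample self-weighting within that field.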
Survey instrument design and testing
The design of the survey instrument was informed by two focus-group discussions with trainees from all three fields, ten semistructured interviews, discussions with colleagues, and review of the literature. The survey instrument was pretested using nine cognitive interviews conducted by professional interviewers at The Center for Survey Research (CSR). The final instrument was 11 pages in length and required about 20 minutes to complete. Given the complexity of the subject, a long survey instrument was needed to cover as many different aspects of the topic as possible. Although the instrument was lengthy, we were able to achieve an acceptable response rate (see below). The institutional review boards (IRBs) of Massachusetts General Hospital and the University of Massachusetts at Boston approved the study.
Survey instrument administration
CSR administered the survey instrument via mail. Between January and April 2003, CSR sent an anonymous survey instrument to the departmental address of the trainees along with a letter, a fact sheet, a postage-paid postcard, and a postage-paid return envelope. Because of the sensitive nature of the study, we asked respondents to mail the postcard separately from the anonymous survey instrument. There was no personal identifying information on the instrument; thus, we relied on the postcard to track nonrespondents. This process allowed us to maintain respondents' anonymity in accordance with Massachusetts General Hospital's IRB requirements. Nonrespondents were mailed additional survey instruments and a letter encouraging them to complete the instrument, and were contacted personally by phone or e-mail and invited to participate. Of the 2,000 trainees in our initial sample, 164 were ineligible to participate. Trainees were considered ineligible if they had changed schools, left the country, or were on medical leave from their program.
Key study variables
Our study had four key variables: exposure to withholding, consequences of withholding, competition in the lab/research group, and industry financial support.
Exposure to withholding.
There were three measures of exposure to data withholding. The first measured how often trainees had been denied access to published information, data, materials, or programming that they requested of their current advisor/mentor, faculty members in their department, graduate students or postdoctoral fellows in their lab or group, graduate students or postdoctoral fellows in other labs or research groups, or scientists at another academic institution. The response categories were “never,” “rarely,” “sometimes,” and “often.” Individuals who answered “rarely,” “sometimes,” or “often” were coded as 1, indicating they had been denied access at least once over the course of their training. All other respondents were coded as 0, indicating they had not been denied access during the course of their training.
The second exposure variable measured how often trainees had been denied access to unpublished information, data, materials, and programming they had requested of their current advisor/mentor, faculty members in their department, graduate students or postdoctoral fellows in their lab or group, graduate students or postdoctoral fellows in other labs or research groups, and scientists at another academic institution. The response categories were “never,” “rarely,” “sometimes,” and “often.” Individuals who answered “rarely,” “sometimes,” or “often” were coded as 1, indicating they had been denied access at least once. All other respondents were coded as 0, indicating they had not been denied access during the course of their training.
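The dichotomous coding used for the first two exposure measures can be illustrated with a short sketch. It assumes that answers about the five sources are combined with a logical "any"; the dictionary keys and the function name `code_exposure` are hypothetical and do not come from the survey instrument.

```python
# Response categories that count as having been denied at least once.
DENIED = {"rarely", "sometimes", "often"}

def code_exposure(responses):
    """Return 1 if any source denied the trainee at least once, else 0.

    `responses` maps each source asked about (advisor, department faculty,
    and so on) to one of the four response categories. Unanswered items
    (None) fall under "all other respondents" and contribute 0.
    """
    return int(any(answer in DENIED for answer in responses.values()))

# Hypothetical respondent: denied "rarely" by scientists elsewhere.
respondent = {
    "advisor": "never",
    "department_faculty": "never",
    "own_lab_trainees": "never",
    "other_lab_trainees": None,
    "other_institution": "rarely",
}
exposed = code_exposure(respondent)
```

Under this coding a single "rarely" is enough to count as exposure, so the measure captures whether denial ever occurred rather than how often.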
The third exposure variable measured whether trainees had engaged in data withholding themselves. On the survey instrument we asked trainees to indicate the number of times they had received requests from another academic scientist for scientific information, data, or materials concerning their published research. For trainees who had received at least one such request we asked, “About how many times would you say you have denied giving other academic scientists requested information, data, and materials related to your published research?” Respondents who indicated they had denied one or more requests were coded as 1 and all other respondents were coded as 0.
Consequences of withholding.
On the survey instrument we asked, “Overall, how does data withholding among academic scientists currently affect” each of the following: “the rate of scientific discovery in your lab/research group,” “the level of communication in your lab/research group,” “the quality of the education you receive,” “the progress of your research,” and “the quality of your relationships with other academic scientists.” The response categories were “no effect,” “detracts somewhat,” and “detracts greatly.” Responses to these questions were coded into dichotomous variables with “no effect” coded as 0 and “detracts somewhat” or “detracts greatly” coded as 1. Nonrespondents were coded as missing.
In addition we asked, “As a result of another scientist's failure to share scientific information, data and materials or programming have you ever” experienced any of the following: “had a publication significantly delayed,” “been unable to publish the results of your research,” “been unable to confirm others' published research,” “abandoned a promising line of research,” and “experienced a delay of more than six months in your research.” The response categories were “yes” and “no.” Nonrespondents were coded as missing.
Competition in lab/research group.
We measured trainees' perceptions of the amount of competition in their research groups by asking, “How would you characterize the overall level of competition for recognition or scientific priority in your lab/research group?” The response categories were “very competitive,” “moderately competitive,” “not very competitive,” and “not at all competitive.” Responses were grouped into a dichotomous variable with “very competitive” and “moderately competitive” coded as 1 representing competitive labs, and “not very competitive” and “not at all competitive” coded as 0.
Industry support.
We measured whether trainees were supported by industry funding with the following item: “Does any of your funding/support, whether for salary, research grants, training grants, scholarship support or research related gifts come from industry sources?” The response categories were “yes,” “no,” and “don't know.” “Yes” responses were coded as 1, “no” responses were coded as 0, and “don't know” responses were coded as missing.
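The recoding of the competition and industry support items, including the treatment of missing values, can be sketched as below. The names `COMPETITION`, `INDUSTRY`, `code_item`, and `proportion` are illustrative, and treating unanswered competition items as missing is an assumption, since the text specifies missing-value handling only for the industry item.

```python
# Raw answer -> dichotomous code; None marks a missing value.
COMPETITION = {
    "very competitive": 1,
    "moderately competitive": 1,
    "not very competitive": 0,
    "not at all competitive": 0,
}
INDUSTRY = {"yes": 1, "no": 0, "don't know": None}

def code_item(answer, mapping):
    """Map a raw answer to 0 or 1; unmapped or absent answers become None."""
    return mapping.get(answer)

def proportion(codes):
    """Share of 1s among non-missing codes, or None if all are missing."""
    valid = [c for c in codes if c is not None]
    return sum(valid) / len(valid) if valid else None

# Hypothetical answers from four respondents to the industry item.
answers = ["yes", "no", "don't know", "no"]
codes = [code_item(a, INDUSTRY) for a in answers]
share_supported = proportion(codes)
```

Dropping the "don't know" answers from the denominator, rather than counting them as "no", keeps the estimated share of industry-supported trainees from being biased downward.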
Analysis
We analyzed the data using standard statistical techniques. For bivariate analyses involving dichotomous variables, differences in proportions were tested using chi-square analysis. For multivariate analyses involving dichotomous outcome variables (exposure measures and consequences measures), we used logistic regression to statistically control for the effects of gender, race, industry support, and amount of competition in the research group. Based on our previous research among faculty, we hypothesized that gender and ethnicity would be related to data sharing and withholding. Previous research by Louis and Anderson found that both of these characteristics were related to reported exposure to scientific misconduct.15 Finally, in the social sciences it is the usual practice to include gender and ethnicity as control variables, so excluding them would have raised concerns on the part of some readers.
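For readers less familiar with the bivariate test, the following sketch computes a Pearson chi-square statistic for a 2x2 table using only the standard library. The counts are hypothetical, the function name `chi_square_2x2` is illustrative, and the absence of a continuity correction is an assumption, since the text does not say which variant was used.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows are groups, columns are outcome yes/no:
        group 1: a yes, b no
        group 2: c yes, d no
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: 30 of 100 denied in one group vs 20 of 100 in another.
stat, p = chi_square_2x2(30, 70, 20, 80)
```

The multivariate analyses are a different technique: logistic regression estimates each association while holding the other covariates constant, which a single 2x2 table cannot do.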
Results
Characteristics of respondents
Of the 1,836 trainees who were eligible to participate, 1,077 responded (58.7%). Response rates varied by strata, with 382 (62.7%) in chemical engineering, 297 (47.3%) in computer science, and 398 (66.4%) in the life sciences.
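The overall response rate follows directly from the counts reported in the text: 2,000 sampled, 164 ineligible, and 1,077 respondents. The helper below is an illustrative sketch (the function name is not from the study); stratum-level denominators are not reported, so only the overall rate is reproduced.

```python
def response_rate(respondents, sampled, ineligible):
    """Percentage of eligible sample members who responded."""
    eligible = sampled - ineligible
    return 100 * respondents / eligible

rate = response_rate(respondents=1077, sampled=2000, ineligible=164)
print(f"{rate:.1f}%")  # 58.7%
```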
Table 1 shows respondents' characteristics. Approximately one third (31.2%) of our respondents were women; 22.4% of chemical engineers and computer scientists and 46.3% of life scientists were women. Most of our respondents (81.9%) were doctoral students; the rest (18.1%) were postdoctoral fellows. The vast majority (92.9%) of chemical engineers and computer scientists were doctoral students, compared to 63.2% of life scientists. Half (50.7%) of our respondents were white, 39.5% were Asian, and the rest (9.9%) identified themselves as belonging to other racial/ethnic groups (e.g., African American, American Indian).
Table 1: Characteristics of Second-Year and Higher Doctoral Students and Post-Doctoral Fellows at 50 U.S. Universities, From a Study on Data Sharing and Withholding, 2003
More than a third of trainees (36.5%) reported that some of their funding support came from industry sources. However, there were disciplinary differences in industry support. For example, half (51.1%) of the computer scientists and chemical engineers had industry support, but only 11.7% of the life scientists were supported by industry. About three in five respondents (59.9%) rated the amount of competition for recognition or scientific priority within their own labs or research groups as high.
Data withholding/bivariate analyses
Overall, about one quarter of trainees reported that they had asked and been denied access to information, data, materials, or programming associated with published (23.0%) or unpublished (20.6%) research during their training (see Table 2). Exposure to data withholding was significantly more common among postdoctoral fellows than among doctoral students for published information (31.6% versus 21.0%, p = .002) and for unpublished information (33.7% versus 17.8%, p < .0001). Trainees in high competition labs were significantly more likely than those in low competition labs to have been denied access to unpublished information (23.0% versus 17.4%, p = .03). In terms of field, life scientists were more likely to have been denied access to published information, data, or materials than were computer scientists and chemical engineers (27.1% versus 20.5%, p = .02).
Table 2: Associations Between Characteristics of Second-Year and Higher Doctoral Students and Post-Doctoral Fellows at 50 U.S. Universities and Data Withholding/Bivariate Analyses, 2003
Overall, 7.9% of trainees reported they had denied another academic scientist's request(s) for scientific information related to their own published research. A significantly larger percentage of trainees with industry support reported engaging in data withholding compared to those without industry support (10.6% versus 6.6%, p < .05). We found similar results for those in high competition research groups compared to those in low competition research groups (9.6% versus 5.4%, p = .02). Trainees in the life sciences were less likely to report denying a request for information than computer scientists and chemical engineers (5.6% versus 9.3%, p = .04).
Predictors of data withholding/ multivariate analyses
After controlling for other factors, we found that trainees with industry support were significantly more likely than those without industry support to have been denied access to both published (odds ratio [OR] = 1.45, 95% confidence interval [CI] = 1.01–2.09) and unpublished (OR = 1.49, 95% CI = 1.03–2.15) information, data, and materials (see Table 3). We found similar results when comparing life scientists to computer scientists and chemical engineers (OR = 1.78, 95% CI = 1.23–2.58 for published and OR = 1.65, 95% CI = 1.13–2.42 for unpublished). Trainees in high-competition research groups were 1.7 times more likely than those in low-competition research groups to have been denied access to unpublished information (95% CI = 1.19–2.44). In addition, we found that a significant predictor of being denied access to published information, data, and materials was the race/ethnicity of the trainee making the request. Compared to whites, trainees who classified themselves as Asian were significantly less likely to be denied access (OR = .67, 95% CI = .46–.96).
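The odds ratios and confidence intervals above are related to the underlying logistic regression coefficients by exponentiation. The sketch below shows that relationship under the usual Wald assumption; the function names are illustrative, and the standard error is back-calculated from the rounded interval reported for industry support and published information, so it is only approximate and is not a figure from the study.

```python
import math

def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic coefficient and its standard error to OR and 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Back out the standard error of log(OR) from a reported Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported: OR = 1.45, 95% CI = 1.01-2.09 (industry support, published data).
se = se_from_ci(1.01, 2.09)  # approximate, because the reported CI is rounded
or_, lo, hi = odds_ratio_ci(math.log(1.45), se)
```

An interval that excludes 1, as this one does, corresponds to a coefficient that differs significantly from zero at the .05 level.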
Table 3: Association Between Characteristics of Second-Year and Higher Doctoral Students and Post-Doctoral Fellows at 50 U.S. Universities and Data Withholding/Multivariate Analyses, 2003
The only variable that was a significant predictor of withholding information from others was trainees' perceptions of the level of competition in their research group. Trainees in high-competition research groups were also almost twice as likely as trainees in low-competition research groups to report having denied another's request for information, data, or materials (OR = 1.81, 95% CI = 1.02–3.22).
Consequences of withholding
Five hundred thirty-three respondents (50.8%) reported that withholding had had a negative effect on the progress of their own research, 508 (48.5%) on the rate of discovery in their own lab or group, 472 (45.0%) on the quality of their relationships with other academic scientists, 346 (33.0%) on the quality of the education they receive, and 299 (28.5%) on the level of communication in their lab or group. In addition, 125 respondents (11.8%) reported that their research had been delayed by more than six months because another academic was unwilling to share data with them.
Table 4 shows the extent to which trainees believed data withholding had had a negative effect on these aspects of training, comparing those exposed to withholding with those not exposed while statistically controlling for the effects of gender, race, industry support, and amount of competition in the research group. On every measure, trainees who had been denied access to information, data, or materials (whether published or unpublished) were significantly more likely to report data withholding had had a negative effect (all p < .05). The largest negative effects of being denied access to information, data, and materials for trainees were effects on the progress of respondents' own research and on the quality of their relationships with other academic scientists (all p < .05).
Table 4: Multivariate Analyses of the Association between Data Withholding and the Educational Experiences of Second-Year and Higher Doctoral Students and Post-Doctoral Fellows at 50 U.S. Universities, 2003
Those who had denied others' requests were significantly more likely to report negative effects on communication within their research group (OR = 1.74, 95% CI = 1.02–2.97), on the quality of the education they receive (OR = 1.75, 95% CI = 1.04–2.93), and on the quality of their academic relationships (OR = 1.90, 95% CI = 1.14–3.16).
We found similar results regarding the impact of data withholding on trainees' research activities (see Table 5). On every measure, after we controlled for the impact of gender, race, industry support, and amount of competition in the research group, trainees who had been denied access to published information, data, and materials were significantly more likely than trainees who had not been denied access to have been unable to confirm others' published research (OR = 5.28, CI = 3.63–7.69), to have experienced a delay of more than six months in their research (OR = 6.14, CI = 3.98–9.48), to have abandoned a promising line of research (OR = 4.87, CI = 3.12–7.62), and to have had a publication significantly delayed (OR = 3.76, CI = 2.33–6.07). We found similar significant results for those who had been denied access to unpublished information, data, and materials.
Table 5: Multivariate Analyses of the Association between Data Withholding and the Training Progress of Second-Year and Higher Doctoral Students and Post-Doctoral Fellows at 50 U.S. Universities, 2003
Discussion
To our knowledge, this study provides the first empirical data regarding data withholding among trainees in the sciences. Our findings suggest that a small minority of trainees (8%) reported denying other academic scientists' requests for information, data, or materials. At the same time, the quantitative impact of withholding was substantially larger, with approximately a quarter having requested but been denied direct access to either unpublished (21%) or published (23%) information, data, materials, or programming while in training. It may be that trainees are unwilling to admit to withholding information from others—a practice that may be seen as negative by the scientific community—thus, our results may underestimate the percentage who engage in this behavior. However, it is more likely that few trainees have access to or are in a position to decide about providing information, data, or materials that are desired by other members of the scientific community.
When withholding occurs, it has important negative consequences for a large percentage of trainees in science. For example, almost half of respondents reported that withholding had had a negative effect on the quality of their relationships with other academic scientists; a third reported a negative effect on the quality of the education they had received; and a quarter reported a negative impact on the level of communication in their lab or research group. These social consequences, which were significantly more likely among those exposed to data withholding, may contribute to a culture of mistrust and professional isolation both within and between groups in the scientific community to the extent that those who have experienced the personal and professional consequences of denial will see this as a normative pattern to be repeated. While it is unclear from our study what, if any, long-term impact such exposure to data withholding might have, it is clear that in the short-term data withholding has negative effects on a substantial portion of trainees and should be addressed.
Trainees who had been denied access to published or unpublished information, data, or materials reported experiencing negative effects on various aspects of their research including abandoning a promising line of research, being unable to confirm others' published research, having a publication significantly delayed, experiencing a research delay of more than six months, or being unable to publish the results of their research. These findings suggest that there may be a role for faculty in assisting trainees to gain access to scientific information, data, and materials from others and, when such access is not secured, in developing mechanisms to buffer trainees from these negative effects early in their research careers. At the least, trainees should be aware of the potential negative effects withholding can have on their research activities and training.
From a policy and management perspective, the question is what factors significantly increase the likelihood that trainees will be exposed to data withholding. The first factor is the level of competition for recognition and scientific priority within the research group or lab. Trainees in labs characterized as highly competitive were significantly more likely to deny others' requests and to have had their own requests denied than were trainees in low-competition labs. At the same time, we found that trainees who had denied others' requests or who had had their own requests denied were significantly more likely to report that data withholding had had a negative effect on the communication in their research group, their relationships with colleagues, and the quality of their scientific education. This finding suggests that secretive practices exist within the primary organizational entity in the life sciences and carry an increased risk to scientific productivity, educational effectiveness, and individual relationships among scientists. It is possible that reducing the amount of competition for recognition or scientific priority in labs or research groups may improve trainees' access to scientific resources, which in turn may reduce the negative effects of withholding on scientific training. Clearly, the burden for limiting competition within labs or groups would fall disproportionately on their senior members, such as faculty. However, additional research is needed to better understand the unintended consequences of reducing the amount of competition in research labs/groups on the overall level of progress of the research activities.
The second factor associated with a significantly increased likelihood of being denied access to scientific resources is having research support from industry. It may be that scientists are less willing to share with trainees who have industry support out of fear that the results will be used for commercial gain or shared with the company sponsor. It is also possible that trainees with industry support are working in subfields of the life sciences that are known to be more secretive, such as genetics.3 Regardless of the reason, our data suggest that being supported by industry funding limits trainees' access to scientific resources in the academic community and may provide a rationale for limiting, or at least redefining, the role of industry in scientific education in universities. However, reducing industry funding for academic research may have unintended effects if industry funding cannot be replaced with funding from other sources, such as the government or nonprofit foundations. This may be especially true for trainees in computer science and chemical engineering in which more than half of trainees reported receiving industry support for their education.
Third, trainees in the life sciences were significantly more likely to be denied access to both published and unpublished scientific resources than were trainees in chemical engineering or computer science, even after controlling for the effects of industry support, the amount of scientific competition, and the characteristics of the trainee. This finding suggests that strong disciplinary differences in data withholding may be in play. For example, the overall structure and length of training in the life sciences may create greater opportunities for exposure to data withholding for life science trainees. It may also be that data withholding is more accepted and practiced among faculty in the life sciences than in the other scientific disciplines. While additional research is needed to better understand the causes of the disciplinary differences we observed, targeting interventions aimed at reducing data withholding, such as courses in research ethics, toward the life sciences may be warranted given the greater level of exposure of life sciences trainees to data withholding.
It is important to recognize that scientific education occurs in the milieu of the research enterprise in which certain types of data withholding, especially before publication, are necessary and legitimate to secure academic credit and certain forms of commercial benefits from research. Thus, reducing prepublication data withholding is likely to be difficult or perhaps inadvisable in the current scientific environment. However, withholding following publication is less justified in terms of securing credit or capturing commercial value, and thus, is an area in which improving trainees' access to published information, data, and materials may be most amenable to interventions by funding agencies and universities, as well as clear professional association standards.
Several limitations of this study should be considered. First, the results may not necessarily apply to fields such as business, the social sciences, or other fields within engineering, such as civil or mechanical engineering, given the major differences in the structures of these fields of science compared to the fields we studied. Second, given the cross-sectional nature of our study, we were unable to determine how trainees' exposure to data withholding has changed in recent years. Third, we examined only one form of data withholding (denial of requests). Other forms of secrecy such as significant delays in publishing results and failure to talk about research with others may affect trainees' scientific education as well. Additional research should explore trainees' experiences with these other forms of data withholding. Fourth, we recognize that issues of data sharing and withholding among scientists are complex and often highly situation specific. Case studies and observational studies are more appropriate than the survey method we used to explore the key contextual issues related to data withholding. Fifth, given the small number of postdoctoral fellows in computer science and chemical engineering, our results regarding scientific field may be an artifact of the higher percentage of postdoctoral fellows in the life sciences. In order to address this concern, we reanalyzed the multivariate data and included a variable for educational status along with all of the other variables in the model. The results for scientific field were similar but not identical to those reported here. The odds ratio for being denied access to published information, data, and materials for the life sciences compared to the non-life sciences was 1.6 (95% CI = 1.08–2.36). In other words, for published research, the effect of field remained significant when the percentage of postdoctoral fellows was taken into consideration. However, the odds ratio for being denied access to unpublished research was no longer significant when educational status was added to the model (OR = 1.30, 95% CI = 0.9–1.96). Finally, it is possible that some of trainees' exposure to data withholding may be the result of federal policies and procedures that encourage secrecy in science to protect national security. Unfortunately, we did not design our study to address this issue.
The future of the life science enterprise is vested in the next generation of men and women currently in training. This study is the first national study to show that data withholding has significant negative effects on trainees. The life sciences as a field, more so than chemical engineering or computer science, will likely have to wrestle with this issue among the next generation of scientists. Failure to do so could result in needless delays in research, inefficient training programs, and a culture of withholding among life scientists in the future.
Acknowledgments
The authors would like to acknowledge the contributions of Sanford Chodosh, MD, at the Boston University School of Medicine, Sandra Feibelmann, MPH, and Gregory Koski, MD, PhD, MBA, of the Institute for Health Policy and Brian Clarridge, PhD, and Dragana Bolcic-Jankovic at the Center for Survey Research at the University of Massachusetts.
This study was funded by a grant from the Office of Research Integrity at the U.S. Department of Health and Human Services.