Blumenthal, David MD, MPP; Campbell, Eric G. PhD; Gokhale, Manjusha MA; Yucel, Recai PhD; Clarridge, Brian PhD; Hilgartner, Stephen PhD; Holtzman, Neil A. MD
Secrecy is a fact of life in academic science. Although policymakers and many investigators widely support the ideal of open and free exchange of information, departures from this ideal are common, at least in the life sciences, the only field for which quantitative information is available.1–7
The process of information exchange in science is complex and variegated, and many forms of data withholding may affect the rate and direction of scientific progress. Past research on data withholding in biomedical research has shed light on the prevalence, causes, and consequences of data withholding (defining data to include the full range of research results, techniques, and materials useful in future investigations).3,8–10 As expected, the causes often reflect well-known characteristics of the scientific process, such as investigators' desire to protect their own and their mentees' scientific priority.11 In other cases, influences on data sharing stem from recent changes in the U.S. research system. These factors include the strictures of commercial funders,12 and the material and financial costs of responding to requests for biomaterials.2,4 Funding agencies, including the National Institutes of Health (NIH) and the National Human Genome Research Institute, have developed policies to encourage data sharing by their grantees,13 in part by addressing underlying factors, such as the associated financial costs.14 In addition, previous research suggested that geneticists were more likely to engage in data withholding than nongeneticists.1,2
The better our understanding of the phenomenon of data withholding, the more successful efforts to encourage data sharing are likely to be. In this regard, much remains to be learned. Our past research from the data set on which this article is based has focused predominantly on one form of data withholding: denial of requests for information about published research results.4 While this is arguably one of the most noteworthy forms of data withholding, affecting the ability to replicate and build on the peer-reviewed published work,4 other, more common forms of data withholding may similarly frustrate the progress of science and disrupt trust and collegiality among investigators. This study was intended to extend our previous research on this topic.4
In this article, we address significant gaps in the previous literature on secrecy in science by, among other things, documenting the prevalence of a wide range of data-withholding behaviors and exploring how training experiences affect them. Drawing on data from a survey conducted in 2000 of more than 1,800 life scientists in the 100 U.S. universities that received the most NIH funding, we examine two main questions:
* How likely are life scientists to engage in 13 different data-withholding behaviors?
* What are the predictors of these varied forms of data withholding?
A sample of 3,000 life scientists was selected in a multistep process. Using lists from the NIH, we identified the 100 U.S. educational institutions that received the most funding from the NIH in 1998. We selected all departments and/or programs in genetics and human genetics for the schools with the most grant funding, and then we randomly selected up to three other clinical (medicine, pathology, psychiatry, pediatrics, surgery) and nonclinical (biochemistry, microbiology, pharmacology, physiology, anatomy) life science departments and programs from among those that received the largest number of NIH grants in 1998.
Using information from diverse sources, such as the Association of American Medical Colleges' Faculty Roster, Peterson's Guide to Graduate Programs in the Biological Sciences,15 school Web sites, and direct contact with departments, we identified all full-time faculty at the rank of assistant, associate, and full professor in each selected department and program and sampled from among them. We excluded part-time faculty, lecturers and instructors, and any clinical faculty who did not have at least one published article in the National Library of Medicine's Medline database in the three years preceding the study.
Finally, because of a special interest in genetics generally and human genetics specifically, we included all faculty members who had been principal investigators on at least one research grant from the Human Genome Project administered by the National Human Genome Research Institute and the U.S. Department of Energy in the five years preceding the study (excluding those who only received grants from the Ethical, Legal, and Social Implications of Human Genetics Research program).
Our final stratified sample had 3,000 faculty members and included all grantees of the Human Genome Project (no. = 219), all faculty members in departments of genetics or human genetics (no. = 1,547), and randomly selected individuals in nonclinical (no. = 617) and clinical (no. = 617) departments.
Survey instrument design and administration
The design of the survey instrument was informed by two focus-group discussions, 20 semistructured interviews with knowledgeable biomedical researchers, discussions with colleagues, and a review of the literature. The survey instrument was pretested using cognitive interviews conducted by professional interviewers at the Center for Survey Research (CSR) of the University of Massachusetts in Boston.
The CSR administered the survey instrument by mail between March and July 2000. Participants were sent a letter, a fact sheet describing the study, a survey instrument, and a postage-paid postcard. They were asked to complete the survey instrument and mail the postcard separately from the completed instrument to the CSR. We were able to track nonrespondents via the postcard while assuring respondents' complete anonymity, since the survey instrument had no identifying information. Nonrespondents were mailed letters encouraging their participation along with additional survey instruments and were then contacted by telephone and encouraged to participate. Of the potential 3,000 participants, four were in the sample twice, seven were deceased, and 96 were ineligible because they were retired, out of the country, not located at the sampled institution, or lacked a faculty appointment. Thus, 2,893 faculty members were eligible to participate.
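As a simple consistency check, the stratum sizes given for the final sample and the exclusions listed above can be tallied in a few lines of Python. The counts come from the text; the dictionary keys and variable names are illustrative labels of our own.

```python
# Stratum sizes reported for the final stratified sample.
strata = {
    "Human Genome Project grantees": 219,
    "genetics or human genetics departments": 1_547,
    "nonclinical departments": 617,
    "clinical departments": 617,
}

# Cases removed after fielding, as reported in the text.
duplicates = 4    # individuals who appeared in the sample twice
deceased = 7
ineligible = 96   # retired, abroad, not at sampled institution, or no faculty appointment

sampled = sum(strata.values())
eligible = sampled - duplicates - deceased - ineligible

print(f"sampled={sampled}, eligible={eligible}")  # sampled=3000, eligible=2893
```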
Dependent variables and measures
We asked participants whether they had engaged in 13 specific types of data-withholding behaviors. To make subsequent analyses manageable, we grouped these behaviors into two categories: withholding in verbal exchanges about unpublished research (verbal withholding) and withholding in and around the publishing process (publishing withholding). Our rationale for these groupings was both conceptual and empirical. Conceptually, verbal communication about unpublished information and publishing in the peer-reviewed literature differ in fundamental ways. Publishing is critical in securing scientific credit and is vital to promotion, future funding, and historical recognition. Verbal communication of unpublished results also has benefits, providing informal peer review, laying the groundwork for collaboration, and allowing exchange of ideas. However, verbal sharing never provides closure the way publishing does. Therefore, we hypothesized that withholding behaviors related to publishing would differ from withholding behaviors related to verbal sharing of unpublished results, and that these two phenomena would be influenced differently by independent variables. Nevertheless, we recognize that in the complex phenomenon of scientific exchange, verbal and publishing behaviors are not entirely distinct. Both have risks and benefits, and these must be constantly balanced by investigators. Therefore, to empirically test the usefulness of grouping variables related to publishing and verbal communication of unpublished results, we conducted a factor analysis, a common statistical procedure used to identify and represent the nature of the relationships between variables.16,17 Although not presented here, the results of this analysis confirmed both the groupings of variables and the existence of two basic factors corresponding to verbal withholding and publishing withholding.
We asked, “In the last three years, how often have you intentionally withheld information (other than for lack of time) about your unpublished research?” Participants were asked about the following specific settings: “conversations with students or postdoctoral fellows in your department,” “conversations with other academic scientists,” “seminars in your department,” “seminars at other academic institutions,” and “formal presentations at professional meetings.” Response categories were never, rarely, usually, always, or does not apply. Respondents who answered “usually” or “always” to any of the above questions were classified as having engaged in verbal withholding.
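The classification rule for verbal withholding is a simple "any setting" test, which can be sketched as follows. The setting keys are hypothetical labels for the five settings quoted above, and treating a missing answer as non-withholding is our assumption, since the article does not describe how missing data were handled.

```python
from typing import Mapping

# Hypothetical keys for the five settings named in the question.
VERBAL_SETTINGS = (
    "conversations_with_department_trainees",
    "conversations_with_other_academic_scientists",
    "seminars_in_department",
    "seminars_at_other_institutions",
    "presentations_at_professional_meetings",
)

# "never", "rarely", and "does not apply" do not count as withholding.
WITHHOLDING_RESPONSES = frozenset({"usually", "always"})


def verbal_withholding(responses: Mapping[str, str]) -> bool:
    """Return True if any setting was answered "usually" or "always".

    Settings absent from `responses` are treated as non-withholding
    (an assumption; the article does not describe missing-data handling).
    """
    return any(
        responses.get(setting, "").strip().lower() in WITHHOLDING_RESPONSES
        for setting in VERBAL_SETTINGS
    )


example = {
    "seminars_in_department": "rarely",
    "presentations_at_professional_meetings": "Usually",
}
print(verbal_withholding(example))  # True
```

A single "usually" or "always" in any one setting is sufficient, so the measure captures whether a respondent withheld at all, not how often.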
We asked participants, “In the last three years” have you “had research at your university which resulted in trade secrets (information kept secret to protect its proprietary value)”; “omitted pertinent information from a manuscript submitted for publication” to “protect your scientific lead” or “protect its commercial value”; “delayed publication for more than six months” to “honor an agreement with a collaborator,” “protect the priority of a graduate student, postdoctoral fellow, or junior faculty member,” “protect your scientific lead,” “meet the requirements of a nonindustrial sponsor,” “meet the requirements of an industrial sponsor,” “allow time for a patent application?” Response categories were yes or no. Respondents who answered “yes” to any of these questions were classified as engaging in publishing withholding. Though participation in trade secrecy would seem logically to involve both verbal withholding of unpublished results and publishing withholding, our factor analyses found this behavior to load strongly on the publishing factor and not on the verbal factor. Therefore, we included it in publishing withholding.
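The publishing-withholding rule can be sketched the same way: a respondent is classified as withholding if any yes/no item was answered "yes," with trade secrecy counted on the publishing side as the factor analysis indicated. The item keys below are hypothetical shorthand for the questions quoted above.

```python
from typing import Mapping

# Hypothetical shorthand for the yes/no items quoted in the text.
PUBLISHING_ITEMS = (
    "trade_secrets",
    "omitted_to_protect_lead",
    "omitted_to_protect_commercial_value",
    "delayed_for_collaborator_agreement",
    "delayed_for_trainee_priority",
    "delayed_to_protect_lead",
    "delayed_for_nonindustrial_sponsor",
    "delayed_for_industrial_sponsor",
    "delayed_for_patent_application",
)


def publishing_withholding(answers: Mapping[str, bool]) -> bool:
    """Return True if the respondent answered "yes" to any publishing item.

    Trade secrecy is included because, per the factor analysis described
    in the text, it loaded on the publishing factor.
    """
    unknown = set(answers) - set(PUBLISHING_ITEMS)
    if unknown:
        raise ValueError(f"unrecognized items: {sorted(unknown)}")
    return any(answers.get(item, False) for item in PUBLISHING_ITEMS)


print(publishing_withholding({"trade_secrets": True}))  # True
```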
Independent variables and measures
We collected data on a range of explanatory and control variables that were chosen based on past research and results of focus groups and interviews.2,9 The independent variables are grouped into three categories: personal characteristics, research characteristics, and data-sharing experiences.
We included three variables representing personal characteristics: gender (male/female); whether the participant trained in a graduate program in the United States (yes/no); and academic rank (full professor, associate professor, assistant professor).
We used seven variables representing various research characteristics.
Field of research (genetics versus other life sciences).
Respondents identified themselves as geneticists by responding “yes” to the following question, “Do you consider yourself a genetics researcher? By genetics researcher we mean someone whose research involves any of the following: (1) identification of genomes, genes, or gene products in any organism; (2) study of the structure, function, or regulation of genes or genomes; (3) comparison of genes and genomes between species or populations.” Respondents who answered “no” were classified as “other life scientists” (OLS).
The survey asked, “In the last three years have you conducted research that involved living humans as research subjects?” (yes/no). Respondents answering “yes” were classified as human subjects researchers.
Participants were asked, “Approximately how many articles have you published in refereed journals in the last three years?” We grouped respondents into two categories: those with zero to six publications (low publication) and those with seven or more publications (high publication).
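The publication cut point separates six articles from seven. A minimal sketch, assuming the self-reported count is a nonnegative whole number (the function name is our own):

```python
def publication_group(n_articles: int) -> str:
    """Map a three-year refereed-article count to the two groups used here."""
    if n_articles < 0:
        raise ValueError("article count cannot be negative")
    return "high publication" if n_articles >= 7 else "low publication"


print([publication_group(n) for n in (0, 6, 7, 15)])
# ['low publication', 'low publication', 'high publication', 'high publication']
```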
Perception of competitiveness of field.
We asked, “How would you characterize the overall level of competition for recognition or scientific priority in your specific area of research?” (very competitive, moderately competitive, not very competitive, not at all competitive). Responses were transformed into a dichotomous variable, with “very competitive” responses forming one group and all other responses collapsed into a single category representing less competitive fields.
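The dichotomization of the four-point competitiveness item can be sketched as follows; rejecting unexpected response strings is our design choice, not something the article describes.

```python
COMPETITION_RESPONSES = (
    "very competitive",
    "moderately competitive",
    "not very competitive",
    "not at all competitive",
)


def very_competitive(response: str) -> bool:
    """Collapse the four-point item into "very competitive" versus all others."""
    answer = response.strip().lower()
    if answer not in COMPETITION_RESPONSES:
        raise ValueError(f"unexpected response: {response!r}")
    return answer == "very competitive"


print(very_competitive("Very competitive"), very_competitive("moderately competitive"))
# True False
```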
Industry research support.
We asked participants to “indicate which of the following you are currently receiving or have received in the last three years. (Please indicate only those received from a company whose work is related to your area(s) of interest).” Participants were asked to specify: “research grants or contracts,” “funds from industry to support students or postdoctoral fellows,” or “gift(s), independent of a grant or contract, that support(s) your research” (yes/no). Respondents who answered “yes” to any of the items on the list were classified as “receiving industry research support.”
Other industry involvement.
We asked participants if they “currently hold or have held in the past three years” a variety of industry roles, including: “member of a board of directors” or “consultant for pay” (yes/no). We asked also whether they were “currently receiving or have received in the last three years” returns from commercialization of research, including “royalties from licenses,” “equity in a firm in return for your professional services or intellectual property” (yes/no). Respondents who answered “yes” to any of the above relationships were considered to have “other industry involvement.”
Participants were asked, “… in the last three years has the research you do at your university resulted in: ‘patents applied for,’ ‘patents issued,’ ‘patents licensed,’ ‘a product under regulatory review,’ ‘a product on the market,’ ‘a start-up company’?” (yes/no). Respondents who answered “yes” to any of the above were classified as having “commercial activities.”
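The three industry-related variables all follow the same "any yes" rule over different item lists, which the sketch below makes explicit. The item keys are hypothetical; because the text introduces the industry roles and commercialization returns with "including," the `OTHER_INDUSTRY_INVOLVEMENT` tuple may be incomplete.

```python
from typing import Mapping

# Hypothetical item keys grouped as described in the text.
INDUSTRY_RESEARCH_SUPPORT = (
    "industry_grant_or_contract",
    "industry_funds_for_trainees",
    "industry_research_gift",
)
# Listed in the text as examples ("including"), so possibly incomplete.
OTHER_INDUSTRY_INVOLVEMENT = (
    "board_of_directors",
    "paid_consultant",
    "license_royalties",
    "equity_for_services_or_ip",
)
COMMERCIAL_ACTIVITIES = (
    "patent_applied_for",
    "patent_issued",
    "patent_licensed",
    "product_under_regulatory_review",
    "product_on_market",
    "start_up_company",
)


def any_yes(answers: Mapping[str, bool], items: tuple[str, ...]) -> bool:
    """True if any of the listed items was answered "yes"."""
    return any(answers.get(item, False) for item in items)


def industry_flags(answers: Mapping[str, bool]) -> dict[str, bool]:
    """Derive the three industry-related indicators from one respondent's answers."""
    return {
        "industry_research_support": any_yes(answers, INDUSTRY_RESEARCH_SUPPORT),
        "other_industry_involvement": any_yes(answers, OTHER_INDUSTRY_INVOLVEMENT),
        "commercial_activities": any_yes(answers, COMMERCIAL_ACTIVITIES),
    }


print(industry_flags({"paid_consultant": True, "patent_issued": True}))
# {'industry_research_support': False, 'other_industry_involvement': True, 'commercial_activities': True}
```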
Experiences with sharing.
We used three variables measuring experiences regarding sharing of data.
Sharing discouraged in training.
We asked participants, “When you were in your pre- and postdoctoral training, how willing were your lab directors/mentors to share their research information, data, or materials with other academic scientists?” (very willing, somewhat willing, not very willing, not at all willing). We also asked, “At any time during your pre- or postdoctoral training, were you ever discouraged by a lab director/mentor from sharing your information, data, or materials with others?” (yes/no). Respondents who indicated that their lab directors/mentors were “not very willing” or “not at all willing” to share, as well as respondents who reported that a pre- or postdoctoral lab director/mentor had discouraged them from sharing, were characterized as having “sharing discouraged in training.”
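Either condition described above is sufficient on its own, so the variable is a logical OR of the two items. A minimal sketch, with a function name of our own choosing:

```python
UNWILLING = frozenset({"not very willing", "not at all willing"})


def sharing_discouraged_in_training(mentor_willingness: str, ever_discouraged: bool) -> bool:
    """True if mentors were unwilling to share OR the respondent was ever discouraged."""
    return mentor_willingness.strip().lower() in UNWILLING or ever_discouraged


print(sharing_discouraged_in_training("somewhat willing", True))  # True
```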
Participants were asked, “During your pre- or postdoctoral training were you ever given formal instruction on issues related to data sharing?” (yes/no). Respondents who answered “yes” were characterized as having formal instruction.
We asked participants, “As a result of sharing your information, data, or materials with another academic scientist after publication have you ever”: “been scooped by another scientist,” “compromised the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish,” “been unable to benefit commercially from your results,” “opened a new line of research,” “performed research that would otherwise not have been possible,” “formed collaborations that led to publications,” or “formed collaborations that led to grants?” (yes/no). Respondents who answered “yes” to any of “been scooped,” “compromised a junior staff member,” or “been unable to benefit commercially from your results” were classified as experiencing a negative sharing outcome. Respondents who answered “yes” to any of “opened a new line of research,” “performed research that would otherwise not have been possible,” “formed collaborations that led to publications,” or “formed collaborations that led to grants” were classified as experiencing a positive sharing outcome. Respondents who answered “yes” to any of the negative outcome questions and “no” to all of the positive outcome questions were classified as having a “negative outcome only.” Respondents who answered “yes” to any of the negative outcome questions and “yes” to any of the positive outcome questions were classified as having a “mixed outcome.” Those who answered “yes” to any of the positive outcome questions and “no” to all of the negative outcome questions were classified as having a “positive sharing outcome only.”
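The three outcome categories follow from two "any yes" tests, one over the negative items and one over the positive items. The sketch below uses hypothetical item keys; the text does not name a category for respondents who reported neither kind of outcome, so the residual label returned for that case is our own placeholder.

```python
from typing import Mapping

NEGATIVE_OUTCOMES = (
    "scooped",
    "compromised_junior_member",
    "unable_to_benefit_commercially",
)
POSITIVE_OUTCOMES = (
    "opened_new_line_of_research",
    "research_otherwise_impossible",
    "collaborations_leading_to_publications",
    "collaborations_leading_to_grants",
)


def sharing_outcome(answers: Mapping[str, bool]) -> str:
    """Classify a respondent's post-publication sharing experience."""
    negative = any(answers.get(item, False) for item in NEGATIVE_OUTCOMES)
    positive = any(answers.get(item, False) for item in POSITIVE_OUTCOMES)
    if negative and positive:
        return "mixed outcome"
    if negative:
        return "negative outcome only"
    if positive:
        return "positive sharing outcome only"
    # Hypothetical label: the article does not name this residual category.
    return "no reported outcome"


print(sharing_outcome({"scooped": True, "collaborations_leading_to_grants": True}))
# mixed outcome
```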
We performed analyses of the aggregated dependent variables (verbal withholding and publishing withholding) using standard statistical procedures to generate percentages and their associated chi-squared values. All analyses were weighted to adjust for differences due to the likelihood of being sampled and for differences in nonresponse rates within survey strata. All analyses were conducted using SAS statistical software (SAS Institute, Inc., Cary, NC) and SUDAAN (RTI International, Research Triangle Park, NC), a statistical software package that correctly computes the standard errors when determining statistical significance for survey data derived from complex sampling methods.
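The weighted percentages described above are, at their core, weight-adjusted shares. The sketch below shows only that point estimate, assuming each respondent already carries a single analysis weight combining selection probability and the within-stratum nonresponse adjustment (the article does not report the weights themselves). The design-based standard errors and chi-squared tests that SUDAAN provides are not reproduced here.

```python
from typing import Sequence


def weighted_percentage(flags: Sequence[bool], weights: Sequence[float]) -> float:
    """Weighted share (in %) of respondents whose flag is True.

    Point estimate only; design-based variance estimation is not attempted.
    """
    if len(flags) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return 100.0 * sum(w for f, w in zip(flags, weights) if f) / total


# Illustrative values only, not survey data.
print(weighted_percentage([True, False, True], [2.0, 1.0, 1.0]))  # 75.0
```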
To examine relationships between independent and dependent variables, we conducted multivariate logistic regressions controlling for the personal characteristics, research characteristics, and sharing experiences variables described above. To avoid multicollinearity among the predictor variables, we excluded a number of other independent variables from the final multivariate model because they were correlated (between .3 and .8) with other independent variables.
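Screening predictors for collinearity before fitting a model can be sketched as a pairwise correlation check. This is an illustration under our own assumptions: the article does not state which correlation measure was used or whether it was weighted, so the sketch computes unweighted Pearson correlations (which equal the phi coefficient for two 0/1 variables) and flags pairs at or above |r| = 0.3. Deciding which member of a flagged pair to drop remains a substantive judgment the code does not make.

```python
from itertools import combinations
from math import sqrt
from typing import Mapping, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; for two 0/1 variables this equals the phi coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def flag_correlated_pairs(columns: Mapping[str, Sequence[float]], threshold: float = 0.3):
    """List predictor pairs whose |r| meets or exceeds the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson_r(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Illustrative 0/1 predictors, not survey data.
predictors = {
    "x": [1, 0, 1, 0],
    "y": [1, 0, 1, 0],
    "z": [1, 1, 0, 0],
}
print(flag_correlated_pairs(predictors))  # [('x', 'y', 1.0)]
```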
We report results for geneticists and OLS separately for several reasons. First, although our sample design resulted in representative groups of geneticists and of OLS from the 100 sampled institutions, we could not be sure that combining these groups would result in information that is generalizable to the life science faculties as a whole. Second, when we analyzed data separately for geneticists and OLS, differences emerged in relationships between independent and dependent variables for these populations.
Characteristics of respondents
Of the 2,893 faculty members who were eligible to participate, 1,849 responded (64%). We interviewed 256 nonrespondents by telephone. We found nonrespondents were significantly less likely to be geneticists than were respondents and significantly more likely than respondents to be full professors and to receive a high number of requests for information, data, and materials related to their published research.
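The reported response rate follows directly from the counts above. The sketch also computes the share of nonrespondents reached by telephone, assuming that every eligible faculty member who did not return the instrument counts as a nonrespondent; that share is our derivation, not a figure reported in the text.

```python
eligible = 2_893
respondents = 1_849
nonrespondents = eligible - respondents   # 1,044
followed_up_by_phone = 256

response_rate = respondents / eligible
followup_share = followed_up_by_phone / nonrespondents

print(f"response rate: {response_rate:.1%}")                 # response rate: 63.9%
print(f"nonrespondents interviewed: {followup_share:.1%}")   # nonrespondents interviewed: 24.5%
```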
Table 1 provides characteristics of our respondents (n = 1,849) for the entire sample and for geneticists and OLS separately. Both geneticists and OLS tended to be senior in rank, male, and trained in the United States. While geneticists were significantly less likely than OLS to perform human subjects research, they were significantly more likely to have had seven or more publications in the preceding three years, to perceive their field as competitive, to have had other industry involvement, and to have engaged in commercial activities. Roughly equal percentages of geneticists and OLS reported that sharing had been discouraged during their training, and they reported no significant differences in occurrence of positive, negative, and mixed outcomes of sharing. Finally, geneticists were more likely than OLS to receive formal instruction in data sharing.
Prevalence of data withholding
Table 2 shows the proportion of geneticists and OLS who reported engaging in each of the 13 specific forms of data withholding defined above, the two overarching categories defined previously, and any of the forms and categories.
Forty-four percent of geneticists and 32% of OLS had engaged in at least one form of data withholding. The relative frequencies of categories and individual behaviors were very similar in the two groups. Publishing withholding was the more common of the two categories, reported by 35% of geneticists and 25% of OLS. Among individual behaviors, verbal withholding in formal presentations at professional meetings and in seminars at other academic institutions was prevalent, as was omission of information from manuscripts submitted for publication to protect a scientific lead. Trade secrecy was reported by 12% of geneticists and by 9% of OLS.
In bivariate analysis, all independent variables except “Trained in the United States” were significantly associated with at least one form of data withholding among either geneticists or OLS (see Table 3). Male gender was associated with both types of withholding among geneticists, but not among OLS. OLS conducting research with living human subjects were less than half as likely as colleagues not performing such work to have engaged in verbal withholding. The number of articles published in the last three years was significantly associated with all forms of withholding among OLS, but not among geneticists. Perceived competitiveness of the investigator's field was associated with publishing withholding in both groups, as was industry research support, which was not associated with verbal withholding in either group. Other industry involvement was significantly associated with greater verbal withholding and greater publishing withholding among both geneticists and OLS. Commercial activities were associated with verbal withholding among geneticists and publishing withholding among OLS. Discouragement of sharing during training was also significantly associated with both verbal and publishing withholding. Surprisingly, among geneticists, formal instruction in data sharing was associated with increases in both verbal withholding and publishing withholding. Finally, respondents in both groups with positive sharing outcomes were less likely to engage in verbal or publishing withholding.
Table 4 shows the results from the multivariate logistic regression analyses predicting the likelihood of verbal and publishing withholding for geneticists and OLS. The model tested included all variables in Table 3. Among personal characteristics, male gender was consistently associated with increased likelihood of withholding but results reached statistical significance only for verbal withholding and publishing withholding among geneticists.
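For readers less familiar with the method, a multivariate logistic regression of this kind models the log-odds of withholding as a linear function of the predictors. The notation below is ours, not the authors': W_i indicates that respondent i reported a given form of withholding (verbal or publishing), presumably modeled separately for geneticists and OLS, and x_{ik} are the predictors listed in Table 3. The survey weights and design-based standard errors used in the analysis are not represented in this sketch.

```latex
\ln\frac{\Pr(W_i = 1)}{1 - \Pr(W_i = 1)} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik},
\qquad
\mathrm{OR}_k = e^{\beta_k}
```

An odds ratio above 1 for predictor k indicates higher odds of withholding, and one below 1 indicates lower odds, holding the other predictors constant.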
Several research characteristics were significantly associated with one or more forms of withholding in one or both groups. Verbal withholding was less common among OLS conducting human subjects research, and lower academic productivity was similarly associated with less verbal withholding among OLS. Where significant, industry relationships were associated with increased likelihood of withholding, although the pattern of association varied by type of relationship and field of investigation. Receipt of industry research support was significantly associated with publishing withholding among OLS. Other industry involvement was significantly associated with publishing withholding among geneticists and approached significance among OLS. Commercial activities were significantly associated only with verbal withholding among geneticists. Perceived competitiveness of field was related only to publishing withholding among geneticists.
Among experiences with sharing, discouragement of sharing during training was generally associated with withholding, reaching statistical significance for verbal and publishing withholding among geneticists. Formal instruction in data sharing also tended to be associated with withholding, with significant associations detected for verbal and publishing withholding among geneticists. Compared to mixed outcomes of sharing, having only negative experiences was associated with increased verbal withholding among geneticists, but reduced publishing withholding among both geneticists and OLS. Experience with positive outcomes was generally associated with reduced withholding.
In this article, we provide the first quantitative review of the prevalence of and influences upon a wide variety of data-withholding behaviors in science. Data from this survey of 1,849 life scientists at the United States' leading universities show that data withholding is relatively common and takes multiple forms, that a variety of characteristics of investigators and their fields may influence data-withholding behaviors, and that these influences vary by field of investigation and withholding behavior.
Academic scientists are more likely to withhold data from colleagues in association with publishing than in verbal communications regarding unpublished work. Several factors could account for this pattern. First, in a world of multiauthored papers and increasing numbers of professional journals, more investigators may participate in publishing than in making formal presentations, thus creating greater opportunities to engage in publishing withholding than in verbal withholding. Second, qualitative research on publication practices suggests that strategic incentives may often favor publishing withholding; for example, scientists engaging in “races” sometimes delay publication or publish partial results to avoid aiding competitors.7 Third, anecdotal evidence suggests that agreements with industrial sponsors and collaborators often explicitly require delays in submission of manuscripts to allow prepublication review; in contrast, such explicit requirements may not be present for some forms of verbal communication. Fourth, scientists may perceive discussing unpublished findings as less risky than publishing them and may thus be less likely to censor such discussions.
Our data suggest that close personal relationships among scientists may reduce the incidence of data withholding, especially in verbal communications about unpublished research. We found that rates of verbal withholding seemed to increase as the audience became less familiar to investigators, for example, outside the confines of their laboratories and departments. This finding suggests that perceived social distance between investigators and their actual or potential audience may be an important influence on scientists' inclination to withhold information. Where the audience is small and familiar, an investigator may feel that even sensitive information can be discussed in comparative safety. When the audience is large and unknown, the investigator may perceive data sharing as riskier and may therefore be more likely to withhold data.
Our findings regarding gender and data withholding were somewhat intriguing. Overall, among geneticists, males were more likely than their female colleagues to engage in data withholding, even after controlling for the effects of all the other variables under consideration. This finding is consistent with a recent study suggesting that in collaborative settings females are more egalitarian and perhaps more collaborative than their male counterparts.18 Unfortunately, the data from our study provide no further insight into the basis of this finding, which may ultimately be explained by factors related to the professional socialization of female scientists in genetics, the overall distribution of females among the subfields of genetics, and/or the existence of other practices and behaviors that may selectively affect the sharing behavior of females in the field. Clearly, more research on the relationship between gender and data-withholding behaviors is warranted.
Several research characteristics were related to data withholding in our study, although with less consistency and power than might have been expected. Particularly surprising was the comparatively modest influence of perceived competitiveness of field on tendency to withhold (significantly affecting only publishing withholding among geneticists). It appears that competitiveness is a less powerful influence on sharing than is commonly supposed once other attributes of investigators' backgrounds, experiences, and fields are accounted for. Competitiveness may also matter more when comparing withholding among subfields of genetics (e.g., human genetics versus yeast genetics) than when comparing genetics with the other life sciences. Alternatively, researchers' reports of perceived competitiveness may simply not capture the nature and extent of competitive pressures adequately. Regardless of the explanation, this finding suggests at a minimum that influences other than the competitiveness of the field may more powerfully affect data withholding. Here again, further research is needed to confirm this counterintuitive finding.
Among all of our variables, having other relationships with industry, such as serving as a consultant, serving on an industry board, or owning equity, was associated with an increased likelihood of engaging in data withholding for geneticists and OLS. To our knowledge, ours is the first study to document that relationships with industry beyond those involving funding for university-based research also influence the withholding practices of involved faculty.
These findings along with our previously published research with respect to the influence of training experiences on later behavior of life scientists suggest that efforts to increase data sharing will depend importantly on changing how scientists learn to share and withhold data.19 The importance of this experience is further emphasized by the fact that a large percentage of our respondents—42% of geneticists and 38% of OLS (see Table 1)—reported being discouraged during training from sharing data. We had expected that formal training regarding data sharing would reduce the frequency of withholding, but found that exposure to such training had little apparent effect, except, paradoxically, to increase the likelihood of withholding among geneticists. One potential explanation for this latter finding is that lab directors might have provided formal training selectively to individuals who showed a predilection not to share in the first place. In addition, formal instruction might be more likely to occur in subfields of science, institutions, or specific labs in which a high baseline level of withholding draws attention to this problem. A third alternative explanation is that trainees are instructed when and how to engage in data withholding, particularly in situations where agreements with sponsors require protection of confidential material. While our data did not allow us to test these alternative explanations, it is clear that experiences while in training are important in future scientific behavior, and that universities interested in the openness of science in future generations must closely examine how their scientists in training are being mentored regarding data sharing and withholding practices.
At the same time we found it reassuring that a large percentage of scientists reported only positive sharing outcomes (43% of geneticists and 52% of OLS), that purely positive experiences were more common than purely negative experiences (17% for both groups), and that those with only positive experiences shared more data than those with only negative experiences. Overall these findings suggest data sharing is perceived as more beneficial to academic scientists than data withholding. However, we cannot explain why investigators with only negative experiences were less likely to withhold through publication than those with mixed outcomes. It may be that their negative experiences tended to be associated with verbal exchanges rather than publication, and that they reacted by selectively curbing verbal interactions, while trying to assure that they received full credit for published work. It may also be that the causal relationships that exist between sharing experiences and withholding behaviors are indirectly regulated by other variables not captured in our study such as the nature and extent of the relationship(s) between the parties in which the negative sharing experience(s) occurred, and thus are not transferable to all withholding behaviors.
This study had several limitations. Like all survey data, ours were subject to biases resulting from nonresponse and underreporting of behaviors that may be viewed as inappropriate. Our brief survey of nonrespondents suggests that those completing the survey instrument may have somewhat overrepresented junior and less productive faculty. Since junior and less productive faculty likely receive fewer requests for data sharing, they may have fewer opportunities to engage in data withholding. The overrepresentation of junior and less productive faculty could thus add to any general tendency for our data to underestimate the frequency of withholding behaviors. Our survey was focused on research-intensive universities. Data-withholding behaviors may display different prevalence and be influenced by different factors among faculty in less research-intensive universities. Finally, while our sample included almost the entire universe of U.S. geneticists, it included only a small sample of OLS. Different findings might emerge if groups of life scientists other than geneticists were studied.
While it is impossible to predict the future, the results of our study suggest that if current trends continue, data withholding is likely to remain prevalent in academic science. One of the main obstacles is the growing commercialization of U.S. universities.20 Consistent with the adverse effect of commercialization on data sharing was our finding that both geneticists and OLS who had “other industry involvement” (which included industry roles such as board membership or paid consulting and returns from commercialization such as royalties from licenses or equity in a firm in return for professional services or intellectual property) were significantly more likely to demonstrate both verbal and publishing withholding in bivariate analysis. This concern may be especially pertinent in genetics, where the number of U.S. patents for human genetic tests held by universities increased from 17 in 1991–1994 to 140 in 2001–2002. In 2002, more patents on genetic tests were held by universities or colleges than by companies.21 To the extent that commercialization of academic science and relationships with industry increase in the future, it is likely that the amount of data withholding will increase as well. If private and public institutions wish to alter this trend, then further research to identify the influences on data sharing and withholding will be essential.
This work was supported by a grant from the National Human Genome Research Institute (1-RO1-H601789).
The authors would like to acknowledge the contributions of Dr. Karen Seashore Louis and Dr. Melissa S. Anderson from the University of Minnesota.