The Research Conundrum: Where Does Data Monopoly End and Sharing Begin?

Fromer, Margot J.

doi: 10.1097/01.COT.0000292680.67766.54
National Academy of Sciences Workshop

WASHINGTON, DC—“If we are going to establish an intellectual framework for standards of access to data, we must first acknowledge that science is a cumulative effort—that is, amassing data that is based on others' work.” So said Eric S. Lander, PhD, speaking here in his keynote address at a recent day-long workshop, “Community Standards for Sharing Publication-Related Data and Materials,” sponsored by the Board on Life Sciences of the National Academy of Sciences, held at the Academy's headquarters.

Dr. Lander, Founder and Director of the Whitehead Institute Center for Genome Research, Professor of Biology at Massachusetts Institute of Technology, and one of the leaders of the Human Genome Project, went on to describe data sharing as a social bargain. “If we want to maximize the total social product, we ought to create a limited monopoly for full disclosure of information,” he said.

Patent laws require submission of full and complete data, enough that others can replicate the work, he explained. Peer-reviewed scientific journals, while not necessarily requiring every last bit of data for publication, ask for enough to enable reviewers to adequately evaluate the quality of the work.

Dr. Lander acknowledged that while it is certainly possible to erect restrictions on full access to data, it is not a good idea, and data should never be restricted after publication.

“If you don't want others to have access to your data, don't publish it,” he said. “It's as simple as that.”

There was some grumbling in the audience after this flat-out statement, but Dr. Lander went on to justify his opinion by describing the unintended consequences of restricting such access.

“Restrictions on disclosure would decrease the flow of information and will create divisiveness in the scientific community,” he said. “Moreover, it would ‘balkanize’ knowledge and result in an inability to create combinations of knowledge that are among the underpinnings of scientific discovery.”

There are practical problems as well, he added. “If you're going to restrict data, which authors would be sanctioned? Would you refuse to divulge information to academics, to commercial enterprises? Making these decisions would create a whole new, and unnecessary, layer of review.”

Major Issues Involved

The major issues that form the background to the problem of access to scientific data are data withholding in academia—in the field of genetics in particular—and unprotected databases.

Free and open sharing of scientific information, which is vital to replication of published results and the advancement of science, is often breached, speakers said.

Survey of Geneticists

According to a survey conducted by a team led by Eric G. Campbell, PhD, of the Institute for Health Policy in Boston and published earlier this year in the Journal of the American Medical Association (2002;287:473–480), 47 percent of the geneticists surveyed (out of 1,849 respondents) said that at least one of their requests to fellow faculty for additional information, data, or materials had been denied in the preceding three years. Ten percent of all postpublication requests for additional information were denied. And of those who had been denied access to data, 28 percent said they had been unable to confirm published research.

Among geneticists who said they had intentionally withheld data from their peers, 80 percent said that it required too much effort to comply with the request; 64 percent reported that they were protecting the ability of a graduate student, postdoctoral fellow, or junior faculty member to publish; and 53 percent said they were protecting their own ability to publish.

Thirty-five percent of the respondents said that data sharing had decreased during the last decade, and 14 percent said it had increased.

In a paper written for a National Academy of Sciences panel on scientific responsibility convened in 1993, Robert A. Weinberg, PhD, who discovered the first human oncogene and the first tumor-suppressor gene, commented that secrecy appears to be more common in genetics than in other areas, for two possible reasons.

First, academic geneticists publish more, teach more, and serve in more leadership roles than do those in other biomedical specialties, he said. Therefore, sharing and withholding data may have a particularly strong effect on university policy.

Second, Dr. Weinberg wrote, understanding the role of genetics in human disease is believed to be important to the future of medicine, and the progress made in mapping and sequencing the human genome is a major step toward scientific breakthroughs. The rate of progress in developing gene-based diagnostics and preventive and therapeutic technology depends to a great extent on the free flow of the results of genetic investigation.

Unprotected Databases

Also discussed at the workshop was the fact that databases have become an integral part of scientific research, but in the United States, they are almost completely unprotected under intellectual property laws.

Creating a reliable database is an extremely expensive and time-consuming endeavor that requires considerable personnel and resources. It must be constantly updated and verified, as well as presented in a user-friendly way. And the whole thing is open and free to anyone who wants to use it.

The situation is different in Europe. In 1996, the European Union (EU) adopted a database directive that grants sui generis protection to databases created there and in other countries with similar protection. This placed the US at a disadvantage, so to protect itself against undue competition and piracy, the database industry submitted a draft treaty proposal to the World Intellectual Property Organization.

At the same time, legislation was introduced in Congress to create protections similar to those enjoyed by EU countries. The industry has lobbied hard, but to date, neither proposal has been adopted.

To make matters worse, the EU directive includes a reciprocity provision that denies protection to databases produced in non-EU countries that do not offer comparable protection. Therefore, US databases are highly vulnerable to foreign competition and piracy—as they are in the US.

In order to provide substance to somewhat ephemeral issues, Thomas Cech, PhD, President of Howard Hughes Medical Institute, moderated a discussion of three hypothetical situations. Panelists and the audience were asked to react and describe how the situations should be handled.

The first involved a hypothetical well-known senior investigator who publishes a paper with others in his laboratory about generating knock-out mice and characterizing them with a polyclonal antibody. The mice reproduce poorly. The investigator plans to use the antibody for further experiments, but he has only a limited amount of reagent.

After receiving requests for materials associated with the paper, he suggests that he is willing to distribute the reagent as part of a collaboration that would include coauthorship with some of the requesters. To complicate the situation, a young female investigator has also requested the materials several times but has not received a response. She believes that the journal in which the work was published will not sanction the senior and well-known investigator and wonders if she should complain to the agency that funded his work. On the other hand, she does not want to damage her career.

The panelists for this discussion:

  • Maria C. Freire, PhD, CEO of Global Alliance for TB Drug Development and formerly Director of the NIH Office of Technology Transfer.
  • Michael Hayden, PhD, Professor of Medical Genetics at the University of British Columbia and Director of the Center for Molecular Medicine and Therapeutics in Vancouver.
  • Ira Mellman, PhD, Professor of Cell Biology and Immunobiology at Yale Medical School and Chairman of the Department of Cell Biology.
  • Elizabeth F. Neufeld, PhD, Professor and Chair of the Department of Biological Chemistry at UCLA School of Medicine.

They were asked to consider:

  • The pressures faced by the young scientist.
  • What should be considered a reasonable turnaround time for requests for data or reagents?
  • Under what circumstances is it fair for a senior investigator to request collaboration in exchange for published material?

Dr. Freire said there were three types of issues: practical, legal-regulatory, and the ethos of the situation. She noted that there are indeed times when materials should not be transferred—for instance, when they are clearly or potentially dangerous.

“I remember that when I was at NIH we had a request from Iraq, which posed sufficient problems that we denied it,” she said. “I think you ask the requester what he or she intends to do with the material or reagent.”

She was very clear about her opinion of published information. “If it's out there, it should be given away. It's very simple, and you can't have it both ways. You can't publish and still expect to maintain secrecy.”

Dr. Mellman agreed and went a step further: “Everything generated should be made freely available,” he said. “Of course, it's not as easy as it seems. Maybe there should be a waiting period of six months to a year between publication and sharing the material.”

Dr. Hayden opined that scientists should share their materials and reagents, but they should be appropriately compensated for the cost of providing them, especially when mice are involved—and the journal article should stipulate this.

He also said he believed that almost all scientists are reasonable about sharing data and materials and that such behavior frequently leads to collaborative and other types of positive relationships.

Regarding the request for collaboration by the senior scientist in the hypothetical situation, Dr. Freire said, “The collaboration should be genuine—no blocking of publication and no reward of authorship if actual work wasn't done on the published results.”

Dr. Neufeld noted that the young investigator might have been ignored because she was a woman, and several women in the audience nodded in agreement.

Scenario #2: Primary Brain Imaging Data

The next scenario, discussed in a break-out group moderated by Mary Waltham, an independent consultant who is a member of the Workshop planning committee and the former President and Publisher of Nature, involved a short article submitted to a hypothetical journal called Neural Hieroglyphica.

The article was an analysis of functional and physical changes in the brains of schizophrenics shown in the results of several functional magnetic resonance imaging (fMRI) procedures. (Functional MRI studies are based on differences between several images, and usually only the “difference” image is published.) One of the reviewers asked the journal editor to obtain the complete set of fMRI data from the authors in order to check the findings. The authors refused.

After the article was published, a colleague in an institution without fMRI facilities asked to see the primary (unpublished) data that the authors had used to construct the images on which measurements were taken. The published data consisted of a summary of quantitative measurements.

The colleague found that the article's conclusions did not fit with his anatomical studies of deceased schizophrenics. The authors claimed that the data would be used for a follow-up paper and refused to share the data—at least for the time being.

At the workshop, panel members were asked to discuss the following:

  • What should journal editors do when authors refuse to supply all primary data to reviewers? Some participants thought it depends on the nature of the data and the need for legitimate scholarly review. Others were adamant in thinking that authors have a clear obligation to show the data to reviewers so the information can be adequately evaluated. There was also some disagreement about whether original data should always be available to a journal editor, who the group thought should exercise good judgment regarding the uses to which the information is put.
  • To what extent is “data mining” a privilege of the team that originally collects the primary data? The general consensus was that before publication, the choice belongs to the authors. They may share the data, but they may also attach conditions, which need to be respected by those receiving the data. After publication, primary data are fair game for anyone.
  • To what extent should primary data involving expensive facilities and human or higher animal subjects be made available for evaluation and alternative interpretation—and within what time limits and under what constraints? Again, there was consensus: After publication, the data should be made freely available. The only constraint is the guarantee of anonymity for human subjects. One participant commented that all data can be interpreted in more than one way. Someone else noted that if there is too much data, potential reviewers might refuse to serve, because the task is too onerous and time-consuming.
  • Are the cost and time it takes to collect materials a factor? If so, how? The participants agreed that cost and time are immaterial as long as the data are published.
  • Is the source of funding (public or private) a factor? This is relevant only to ownership of results and assignment of property rights, not to the availability of data once published.

Scenario #3: The Virtual Heart

In the third scenario, a hypothetical company called Cardiomics announced its proprietary Virtual Heart, which includes models of the heart that reflect various stages of cardiovascular disease, genetic disorders, and the consequences of infarction. It also contains information about how the heart would behave under a variety of nutritional, genetic, and pharmaceutical circumstances.

The mythical Virtual Heart incorporates extensive experimental data collected by Cardiomics, which is now supposedly trying to publicize its product. As part of this effort, the company has submitted two papers on Virtual Heart to high-profile, for-profit journals.

The proprietary Cardiomics database is 500 terabytes of Oracle-relational database tables, including data on genetic polymorphism profiles in families with heart disease, MRI images, and EKGs. The Virtual Heart program itself is 500,000 lines of code.

The first hypothetical paper gives an overview of the entire Virtual Heart system, including the software and database. The central point is that the heart is useful for cardiac experimentation and diagnosis, but neither the database nor the software is available from Cardiomics. They are closely held proprietary assets.

The second paper describes a specific result in which the Virtual Heart system is used to predict that thrombospondin variants are probably associated with early heart attacks. This computational prediction is validated by experimental results that are fully described in the paper.

The panelists for this discussion:

  • Barbara Cohen, PhD, Executive Editor of the Journal of Clinical Investigation.
  • Ari Patrinos, PhD, Associate Director of the Office of Biological and Environmental Research in the Department of Energy.
  • James A. Wells, PhD, President and Chief Scientific Officer of Sunesis Pharmaceuticals.
  • Robert H. Waterston, MD, PhD, the James S. McDonnell Professor of Genetics at Washington University School of Medicine and Head of the Department of Genetics Developmental Biology and Molecular Genetics Program.

The panelists agreed that the first paper sounded more like advertisement than science, and even if it were to be accepted for publication, it could not stand on its own unless accompanied by the second paper, which appeared to have real scientific value.

“Broad, feel-good statements do not benefit science,” said Dr. Patrinos, who added that he worried about the creeping commercialization of science. “I have no idea how the relationship between the public and private sectors will evolve, but it seems to me that the role of big public funders of private research will diminish.”

Regarding submission of scientific data as a requirement for publication, Dr. Waterston maintained that it is the reviewers' responsibility to determine if the science is adequate, which they cannot do without access to the data.

Dr. Wells said, “This discovery is so inspirational and would be so useful that there might be real value in requiring only partial details of the scientific data before publication—if the full data were provided within a certain amount of time—a couple of years, for example.”

The panelists, as well as members of the audience, concurred that enough scientific data must be made available to the journal editor and reviewers so that the conclusions claimed can be verified. Every last bit of data does not have to be submitted, they said, although there should be some type of plan for eventual dissemination.

© 2002 Lippincott Williams & Wilkins, Inc.