‘You can have data without information, but you cannot have information without data.’
(Daniel Keys Moran, US computer programmer and science fiction writer)
On 30th November 2011, Peter Gøtzsche from the Nordic Cochrane Centre in Copenhagen addressed the European Parliament in Brussels during the conference ‘Horizon 2020: Investing in the Common Good’. The conference set out to examine what it means to treat knowledge as a public good in policy-making and how this might influence future European Union funding schemes for research and innovation. His talk focused on the moral obligation to provide free access to all anonymised raw patient data for clinical research, and on the benefits to society of doing so.1
In recent years, some major clinical research funders (http://www.gatesfoundation.org/what-we-do; http://grants.NIH.gov/grants/guide/notice-files/NOT-OD-03-032.html; http://www.mrc.ac.uk/research/research-policy-ethics/data-sharing/policy) and biomedical journals2,3 (http://www.nature.com/authors/policies/availability.html; http://www.trialsjournal.com/about#trials) have adopted policies supporting or even insisting on the sharing of clinical trial data. This is intended to increase the reproducibility and reliability of results and to maximise return on investment in research, with the eventual aim of improving healthcare.4
The question of whether research data should be shared has been the subject of intense debate for several years; one example is the recent controversy surrounding the crystalloid versus hydroxyethyl starch trial, published in the New England Journal of Medicine in 2012.5 The clinician or researcher is left with the dilemma of whether or not research data should be shared with the scientific community.
Why should research data be shared?
Judgements about the efficacy and safety of medical treatment rely on evidence from well-conducted (multicentre) randomised controlled trials (RCTs). Given the time, effort and expense required to conduct such trials, it is unlikely that other researchers will independently repeat a trial. As a consequence, the results produced and published by the original research team are today often accepted as the ultimate truth, and there is little opportunity for reanalysis.6 However, the assumption that the interpretation of a clinical trial is straightforward and that another group would come to the same conclusion is increasingly challenged.7
There is more than one way in which the interpretation of data can be influenced. A first concern is selective reporting, in which investigators choose to release only part of their data to the public while keeping the remainder inaccessible.8–10 Indeed, most often the individuals who design and conduct the (multicentre) RCTs are the only ones to have access to the raw data. Those who own the data – scientists or industry – usually decide which data will be published, when and where. This deprives the research community and clinicians of possible findings that are not disseminated.11 As long ago as 2004, a comparison of published articles with trial protocols indicated that reporting of trial outcomes is frequently inadequate and biased in favour of statistical significance. Moreover, the primary outcomes reported in the published studies often appeared to be inconsistent with those specified in the trial protocols8 or with what was reported to the US Food and Drug Administration (FDA).12 Important discrepancies were also observed between what was reported in ClinicalTrials.gov and what was reported in the published study.13,14
A second concern is the observation that there may be a significant publication delay or even no publication at all in certain cases. According to observations by Ross et al.,15 less than 50% of trials were published within 2 years of completion of the study. Similarly, only 46% of clinical trials funded by the National Institutes of Health were published within 30 months after completion of the trial16 and less than 50% of the new drug trials submitted to the FDA were published within 5 years after approval of the drug.17 Although negative trials were less likely to be published, some 34% of the positive trials also remained unpublished after 5 years. The question now is whether these missing data affect conclusions and recommendations with regard to medical treatments. This issue was addressed by Hart et al.18 who investigated the effect of unpublished data on the results of meta-analyses of drug trials. They observed that inclusion of unpublished FDA trial data caused 46% of the summary estimates from meta-analyses to show lower efficacy of the drug, 7% to show identical efficacy, and 46% to show greater efficacy. Of note, the summary estimate of the single harm outcome showed the drug to be more harmful after inclusion of unpublished FDA trial data. This raises the suspicion that when data are missing they are unlikely to be missing by chance.19
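The mechanism behind this observation can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling, one common way of combining trial results in a meta-analysis. It is not the method used by Hart et al., and every number below is hypothetical and chosen purely for illustration; the function name `pooled_estimate` and the effect sizes (log odds ratios, where negative values favour the drug) are assumptions, not data from any real trial.

```python
import math


def pooled_estimate(trials):
    """Fixed-effect inverse-variance pooled estimate.

    trials: list of (effect, standard_error) pairs, e.g. log odds ratios.
    Returns (pooled_effect, pooled_standard_error).
    """
    # Each trial is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for _, se in trials]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, trials)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


# HYPOTHETICAL log odds ratios and standard errors; not real trial data.
published = [(-0.40, 0.20), (-0.35, 0.25), (-0.50, 0.30)]
unpublished = [(0.05, 0.20), (-0.05, 0.25)]

est_pub, se_pub = pooled_estimate(published)
est_all, se_all = pooled_estimate(published + unpublished)

print(f"Published trials only: {est_pub:.3f} (SE {se_pub:.3f})")
print(f"All trials:            {est_all:.3f} (SE {se_all:.3f})")
```

In this invented example, adding the two unpublished trials moves the pooled estimate towards zero, that is, towards a smaller apparent benefit, while also narrowing its standard error. With different hypothetical inputs the shift could equally go in the other direction, which is consistent with the mixed pattern reported above.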
Third, reanalysis of data may lead to conclusions other than those originally published. In 2014, Ebrahim et al.20 reported on a literature search of previously published reanalyses of RCTs between 1966 and March 2014. A first finding was how infrequently (37 reports) data reanalysis was undertaken – or at least reported – in medical research. Of these, only 16% were conducted by entirely independent authors. Interestingly, 35% of these reanalyses led to interpretations that differed from those of the original article.
Taken together, the facts and arguments presented strongly suggest that patient care may benefit considerably from open access to trial data for additional analysis or reanalysis.
Why should research data not be shared?
Not all concerned parties are in favour of implementing open research data access, and they have their reasons.
A principal concern of investigators is that they have invested a substantial amount of time, effort and money in designing the trial and collecting the data. It is understandable if they show reluctance to see their work and energy automatically provided for the benefit of other researchers. However, there are also other, less personal, concerns.
First, there is a risk of inappropriate dredging of data sets, resulting in spurious findings. Indeed, while the primary analysis is usually straightforward, the same data may be used to analyse outcomes or questions for which the trial was not initially designed. Although methods such as propensity score or instrumental variable techniques may allow for comparison of patient groups that were not randomly allocated, evidence indicates that such methods were applied optimally in less than 5% of cases, and that the related deficiencies were not detected by the regular peer review process.21 The concern is that inaccurate analyses of data in the public domain may lead to incorrect conclusions, with the potential to adversely affect treatment and cause subsequent harm.22 An additional concern is that the flaws and limitations of the experimental protocol, which were well recognised by the original investigators, may go unrecognised in subsequent analyses. Again, this may lead to erroneous conclusions.22
Second, requests for open access to research data may come from nonexperts, academic competitors or parties with a commercial interest in a reanalysis or alternative analysis of the data. This raises the question of who should be allowed to access the data; should access be open or should some registration process be used, and if so, who would be responsible for reviewing these applications?11 Also, if reanalysis of data is planned, should it be left to the individual who makes the request or should it be performed by independent investigators? Finally, who should fund these secondary analyses? Substantial resources are needed to collate and annotate trial data so that future investigators might be able to analyse them.
Third, where should the data be stored to allow access by others? Should this be the responsibility of journals or should this depend on organisational initiatives such as the National Institutes of Health or the Yale University Open Data Access Project?
Finally, there is the risk of loss of patient confidentiality when research data are publicly shared. However, this can easily be prevented by legal restrictions and by removing identifiers from databases.6
How to move forward?
There is an increasing call for open science and information exchange through data sharing, not only from health authorities but also from research funders, major journals and individual researchers. This is true not only for research funded from public resources but also for industry-sponsored trials. Although the need for such open access to research data seems rather evident, its general acceptance today is still hampered by the absence of a well-defined and legislated structure to implement such an initiative. Another major problem is the source of funding: should it rely on public funding, on industry and other grants, or on a combination of both?
The successful implementation of open access to research data will require the development of a strict protocol that sets out the process to be followed. Several criteria have been suggested for the conduct of reanalyses.23 Protocols for reanalysis should be as formal and specific as any detailed study protocol, and should be registered in the same way as the original trial. Results of these reanalyses should be shared publicly through registration sites, presentations at scientific meetings and peer-reviewed publications. Needless to say, appropriate etiquette implies sharing of these results with the original investigators prior to public dissemination.
It is one thing to identify what needs to be done, another to develop the infrastructure needed to make the ideas a reality, and yet another to provide the funding. Open access also requires a change in the mindset of all stakeholders, including industry and individual research groups, who should understand that data collected from patients and used to improve patient health are not private property.
Acknowledgements relating to this article
Assistance with the Editorial: none.
Financial support and sponsorship: none.
Conflicts of interest: none.
Comment from the editor: CMS is chair and SDH past-chair of the ESA Scientific Committee. Both are Associate Editors of the European Journal of Anaesthesiology. This article was checked and accepted by the Editors, but was not sent for external peer-review.
1. Gøtzsche PC. Strengthening and opening up health research by sharing our raw data. Circ Cardiovasc Qual Outcomes
2. Laine C, Goodman SN, Griswold ME, et al. Reproducible research: moving toward research the public can really trust. Ann Intern Med
3. Groves T. BMJ policy on data sharing. BMJ
4. Rathi VK, Strait KM, Gross CP, et al. Predictors of clinical data sharing: exploratory analysis of a cross-sectional survey. Trials
5. Doshi P. Data too important to share: do those who control the data control the message? BMJ
6. Krumholz HM, Peterson ED. Open access to clinical trials data. JAMA
7. Gøtzsche PC. Why we need easy access to all data from all clinical trials and how to accomplish it. Trials
8. Chan A-W, Hrobjartsson A, Haahr MT, et al. Empirical evidence for selective reporting of outcomes in randomized trials. Comparison of protocols to published articles. JAMA
9. Ross JS, Gross CP, Krumholz HM. Promoting transparency in pharmaceutical industry-sponsored research. Am J Public Health
10. Wager E, Elia N. Why should clinical trials be registered? Eur J Anaesthesiol
11. Ross JS, Lehman R, Gross CP. The importance of clinical data sharing. Toward more open science. Circ Cardiovasc Qual Outcomes
12. Rising K, Bacchetti P, Bero L. Reporting bias in drug trials submitted to the Food and Drug Administration: review of publication and presentation. PLoS Med
13. Hartung DM, Zarin DA, Guise J-M, et al. Reporting discrepancies between the ClinicalTrials.gov results database and peer-reviewed publications. Ann Intern Med
14. Becker JE, Krumholz HM, Ben-Josef G, et al. Reporting of results in ClinicalTrials.gov and high-impact journals. JAMA
15. Ross JS, Mulvey GK, Hines EM, et al. Trial publication after registration in ClinicalTrials.gov: a cross-sectional analysis. PLoS Med
16. Ross JS, Tse T, Zarin DA, et al. Publication of NIH funded trials registered in ClinicalTrials.gov: cross sectional analysis. BMJ
17. Lee K, Bacchetti P, Sim I. Publication of clinical trials supporting successful new drug applications: a literature analysis. PLoS Med
18. Hart B, Lundh A, Bero L. Effect of reporting bias on meta-analyses of drug trials: reanalysis of meta-analyses. BMJ
19. Krumholz HM. Open science and data sharing in clinical research. Basing informed decisions on the totality of the evidence. Circ Cardiovasc Qual Outcomes
20. Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA
21. Austin PC. Report card on propensity-score matching in the cardiology literature from 2004 to 2006: a systematic review. Circ Cardiovasc Qual Outcomes
22. Spertus JA. The double-edged sword of open access to research data. Circ Cardiovasc Qual Outcomes
23. Christakis DA, Zimmerman FJ. Rethinking reanalysis. JAMA