The recent publication in EPIDEMIOLOGY of a graph about semen quality over time  - data that were somehow buried in a governmental report in Denmark - again raises the much-debated point of public access to data [2, 3, 4].
The mere fact of questioning a policy of public access to data, seems like being ‘against motherhood and world peace’. Isn’t it true that “Science is about debates on findings,” “Science serves people, and people (taxpayers) paid for it,” and “Expensive research data should become available to others”? Yet, the issues are more complex than the simple idea that ultimately we will all benefit from open access to data.
Firstly, what is meant by ‘data’? The original unprocessed MRI scans, blood, tissue, questionnaires? Or the processed data – determinations on blood, coded questionnaires? The cleaned data - with the possibility that the authors already have ‘massaged’ inconveniences? The analysis files – in which the authors have extensively repartitioned and recoded the data (another round of subjective choices)? Data should be without personal identifiers – of course – but in our digital age people can be identified by combinations of seemingly innocent bits of information. And, finally, should all discarded analyses, or discarded data, also become publicly available – to check what the authors ‘threw way’ and whether their action was ‘legitimate’?
Secondly, to what extent is the public as the taxpayer, or any organization that pays for the research, really the full owner of the data? Data exist because of ideas about how to collect and organize them. There is intellectual content, not just by the researchers, but also by their research surroundings, their departments, universities, and governmental organizations that make research intellectually possible. Data in themselves are not science. Giving your data to someone else is not an act of scientific communication. Science exists in reducing data according to a vision - some of which may develop during data analysis. Should researchers not have a grace period for the data they collected, or perhaps two: first a period in which they are the sole analysts, and then a period in which they share data only on conditions?
Thirdly, how protective can a researcher remain about her data? Should a researcher have the right to deny access to her data to particular other parties? Richard Smith, the former editor of the BMJ, stated in his blog that denying access is a wrong strategy – why fear open debate, it will only lead to better analyses? In his opinion, one should not deny data access even to the Tobacco Industry .
Reality is different: researchers know that when a party with huge financial interests wants access to data, there are three scenarios.
Scenario 1: they search and find some error somewhere in the data. This is always possible –no data are error-proof. The financially interested party will start a huge spin-doctoring campaign, proclaiming loudly in the media that the data are terrible. Remember the discussions on the climate reports?
Scenario 2: another analyst is hired by the interested party, and comes to the opposite conclusion. This is published with a lot of brouhaha. The original researcher writes a polite letter to the editor, explaining why the reanalysis was wrong. The hired analyst retorts by stating that it is the original analysis which was in error. Soon, only the handful of people who really know the data can still follow the argument. That is the signal for a new wave of spin-doctoring, in which medical doctors give industry-paid lectures stating that “even the experts do not know any more; we poor consumers should use common sense; most likely, nothing is the matter”. I witnessed this scenario in a controversy on adverse effects of oral contraceptives. A class action suit was deemed unacceptable by a UK court because, in a meta-analysis in which two competing analyses of the same data were entered (!!), the relative risk was 1.7. This number fell short of the magical 2.0, which is wrongly held by many courts as proof that there is ‘more than 50% chance’ that the product caused the adverse effect . Without studies and reanalyses directly sponsored by the industry, the overall relative risk was well over 2.0 . This was money well spent by the companies!
Scenario 1 and 2 have a name: “Doubt is our product” as it was originally coined by the tobacco industry: it is not necessary to prove that the research incriminating your product is wrong – nor that the company is right – it suffices to sow doubt. 
Scenario 3 is that the financially interested party subpoenas the researcher to testify over all parts of allegedly questionable aspects of the data in court. Detail upon detail is demanded. The researchers lose months (if not years) of research and their personal life. That scenario was played out against epidemiologists who did not find particular adverse effects of silicone breast implants . It is recently feared again as the next strategy by the tobacco industry in the UK .
Advocates of making data publicly available seem to live in an ideal dream world, in which for every Professor A whose PhD students always publish A, there exists a Professor B whose PhD students publish B. Such schools of thought combat each other scientifically with more or less equal weapons. Other scientists watch this contest and make up their mind as to who has the strongest arguments and data. This type of ‘normal science’ disappears when strong financial incentives exist. Then the weapons are no longer scientific publications, but public relations agents and lawyers. Of course, also in ‘normal science’, there are rivalries that can be strong. It happens that researchers do not want to share their complete data, or only part of the data under conditions. Often this is for the very simple reason that some sources of data, like blood samples, are finite.
Calls for making data publicly available need to take into account these scenarios. Some people hope that open information in the long run provides the ‘real’ truth. But in a shorter timescale, open information may also allow mischief by special interests, with plentiful resources, that are ruthless in their attempts to shape public policy. It seems difficult to ‘experiment’, i.e. to try open access to data for some time and then turn it back when the drawbacks seem too great.
An intermediary solution might be much more easy to implement. Tim Lash and I, following ideas of others, have proposed to make public registries of existing data . This would make it possible to start negotiating with the owners of the data about possible re-use. Such a registry might also facilitate the use of data in ways that were not originally planned. If controversy and distrust complicates the picture, trusted third parties can be sought to organize a reanalysis, with public input possible – a strategy recently proposed by a medical device maker .
In short, public access to data is much more complex than the proclamation of some principles that look so wonderfully scientific that nobody can argue against them.
Commentaries about this topic are greatly welcome. They can be published a full guest blog of about 450 words maximum. Please mail to Epidemiologyblog@gmail.com
Note: an earlier version of this blog was published as an opinion piece in the Dutch language newspaper NRC-Handelsblad in the Netherlands on 12 October 2011
© Jan P Vandenbroucke, 2011