Observational research is immensely successful. Our knowledge about the transmission of infectious diseases, from local outbreaks of food poisoning to the global spread of acquired immunodeficiency syndrome, derives mainly from observational research. All human genetic knowledge is observational, from initial observations of clustering of diseases in families up to molecular linkage analysis. The effects of environmental scourges, including smoking, exposure to lead in paint or gasoline, or occupational exposures such as asbestos, have been analyzed in observational research. Almost all knowledge on adverse effects of medical interventions derives from observational research. All description of diseases, ie, their definition and their subdivisions, is based on observational research. Most diagnostic and all prognostic research is observational.
Paradoxically, observational research is regarded as somehow inferior to randomized trials, which are responsible for a much smaller fraction of useful medical knowledge. Randomized trials mostly have the luxury of starting with relatively high priors: 50–50 under equipoise. By that fact alone, positive findings from trials will lead to a strong posterior probability. In contrast, the hypotheses investigated in observational research, eg, about the causation of diseases, are much more tentative.1 Even if design and execution were to be flawless, a positive finding still would leave a causal or preventive hypothesis with lesser posterior probability of truth, and the hypothesis might more often falter upon replication. Randomized trials enjoy a second luxury: the formal aspects of their design and execution are highly codified, in contrast to observational research. For example, the choice of a control group in case-control studies rests on general principles that have to be reinterpreted and reapplied each time, which necessitates renewed critical reflection by authors as well as readers.
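The point about prior probabilities made in the paragraph above can be illustrated with a small calculation. The sketch below applies Bayes' rule to a single "positive" study. The power (0.8), the false-positive rate (0.05), and the 5% prior chosen for a tentative observational hypothesis are illustrative assumptions, not figures from this commentary; only the 50–50 prior under equipoise comes from the text. The function name is likewise hypothetical.

```python
def posterior_given_positive(prior, power=0.8, alpha=0.05):
    """Probability that a hypothesis is true after one positive study.

    prior: probability that the hypothesis is true before the study.
    power: assumed chance of a positive finding if the hypothesis is true.
    alpha: assumed chance of a positive finding if the hypothesis is false.
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must lie between 0 and 1")
    true_positive = prior * power
    false_positive = (1.0 - prior) * alpha
    total_positive = true_positive + false_positive
    if total_positive == 0.0:
        raise ValueError("a positive finding is impossible under these inputs")
    return true_positive / total_positive


# Trial under equipoise (prior from the text) versus a tentative
# observational hypothesis (prior of 0.05 is an assumption for illustration).
trial = posterior_given_positive(0.50)
observational = posterior_given_positive(0.05)

print(f"Posterior after a positive trial:               {trial:.2f}")
print(f"Posterior after a positive observational study: {observational:.2f}")
```

Under these assumed inputs the posterior is about 0.94 for the trial and about 0.46 for the observational hypothesis, even though both studies are treated as equally well designed and executed. The difference comes from the prior alone.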
Within the world of randomized controlled trials (RCTs), the CONSORT guidelines for reporting trials have been successful: they have improved the quality of reporting.2,3 However, I was wary when I heard about a group wanting to make a similar set of recommendations for observational research. Guidelines might be fit for highly codified evaluation, but what about etiologic research?
I had been educated in a tradition in which a researcher each time starts thinking from scratch, by defining exactly what the problem is, what the aim of a study is, and then derives from first principles how to go about achieving that aim. Once you know how first principles translate into a piece of research, you also know which are the most important things to report. At the same time, you will know what to look for in papers on the same topic by others. So, why bother about guidelines? In the progress of science “anything goes”: if someone derives an argument that strikes colleagues as credible—even an argument resulting from a study design that was never used before—it becomes acceptable. Why stifle scientific creativity by guidelines?
I had a rather skeptical view of what I feared might lead to “cookbook” science. The first meeting of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) group in Bristol, UK in 2004 was enlightening.4 It became clear that the initiative was not about “how to do observational research,” but “how to report observational research.” The initiative started from the point of view of persons who want to evaluate the literature, and who find that all too often elementary information is lacking in published papers. This focus was reassuring and looked worthwhile. Hence, my participation in the STROBE writing group was to help create what we preferred to call “recommendations” instead of “guidelines.”
Areas of Tension
The process of drafting the recommendations is described in the paper presenting the STROBE Statement,4 which is being published simultaneously in several general medical and epidemiologic journals. The recommendations can be understood only alongside the longer Explanation and Elaboration (E&E) paper that is jointly published by Epidemiology and PLoS Medicine. Each paper influenced the other. While the original checklist was based on the discussions at Bristol, UK in 2004, and was first completed by the coordination group, we kept tinkering with it while writing the E&E paper; upon writing the explanation, we often realized that the checklist did not express sufficiently clearly what we meant to say.
The process was challenging. During the project, we had several areas of tension, with strong initial disagreement about what should be reported, what should not be reported, and why. Particularly difficult were topics such as subgroups, sample size, and interaction. However, while proceeding, we discovered the source of these tensions. Some members of the STROBE writing group had their experience mainly in etiologic observational research, others mainly in evaluative research such as RCTs, and still others in systematic reviews and meta-analysis. Thus, without being explicitly aware of it ourselves, the participants had different mindsets: the mindset of the discovery type of science versus that of an evaluative type of science. The former were used to situations in which data from a case-control study are used secondarily for exploring possible new leads on causes of disease; the latter were used to conducting evaluation studies under strict protocols that permit no deviation. That difference translated at first into endless debates about how to think about reporting subgroup analysis (quasi-forbidden when not preplanned in RCTs, universally indulged in during etiologic research) and the reporting of predefined sample size (necessary for the credibility of RCTs, only vaguely looked at in secondary use of existing data). The recognition of this tension led us to write an introduction to the E&E paper about the diverse purposes of observational research, ranging from discovery and first confirmation of new ideas, to evaluation under strict protocols. Going through this discussion was extremely useful, as it suddenly made things fall into place. The diverse research backgrounds and experience of the group meant that we had to come to a common view about reporting, which may make the recommendations more useful for a wider range of researchers.
Another area of tension was the intended audience. While the “E” in the STROBE acronym stands for epidemiology, we gradually realized that we were writing for a much larger research community: for anybody who would employ one of the major epidemiologic research designs, in whatever type of journal, and who would not necessarily have a professional epidemiologic or statistical background. This led us to explain some principles of epidemiology in the E&E paper, with ample reference to introductory and advanced textbooks. The E&E paper in places comes close to a “mini-textbook.” The reaction to this feature by readers of interim versions was diverse. Many applauded, and mentioned that they might use the E&E as a guide for teaching their doctoral students about writing and evaluating papers; others, in particular professional academic epidemiologists, wondered what the need was for such basic explanations.
A final area of tension in STROBE, as with similar guidelines, is the thin line between the idea of good reporting and the idea of good science. Some nudging in the direction of what is generally accepted as better research or better analytic practice is unavoidable in recommendations about reporting. However, we soon learned that whenever our internal debates became particularly heated, it was a sign that we had crossed the line: we needed to retreat, emphasizing good reporting rather than prescribing how research should be done. The basic idea remained that it is more important to know what was actually done by the authors than to try to apply a straitjacket of ideas about study design and analysis. The latter might even be counterproductive, as authors might start to pay lip service to the recommendations in order to have their paper published.
These discussions forced us to the limits of our knowledge and wits. When one member of the writing group proposed something as being important, others might strongly disagree, and the one proposing had to explain by examples, by theory and, if possible, by empirical argument. We tended to use the classics of epidemiology to support our arguments, and we learned about important methodologic papers from each other. This led us to revisit those classics to see what was really written—and what we found was sometimes different from what we recalled. Very useful was the feedback on interim drafts from the dozens of persons listed in the acknowledgments of the STROBE papers; they kept us down to earth.
The STROBE recommendations will attract criticism. One of the difficulties of publishing guidelines is that they need to be broadly general; thus it is always possible to find exceptions, details omitted, etc. We have tried to assemble as many diverse sources of opinion and research experiences as possible in preparing these recommendations. We have attempted to express commonly-held views by knowledgeable practitioners and writers of leading textbooks, and we have not tried to be “avant-garde,” even if we knew about budding improvements to current methodology. As emphasized in the 2 papers, the guidelines are a first version only, and they will certainly improve by being used and criticized.
Closer to Truth?
The emphasis on good reporting has a particular role in observational research. When findings from observational studies are challenged, that challenge is most likely about a potential weakness in terms of bias and confounding. “Replicating” observational research is most of the time not pure replication, in the sense of simply obtaining more numbers, but involves the new design of studies, or new analyses of existing data, to attempt to overcome the alleged weaknesses of previous studies. Sifting out the likely direction and magnitude of an association—nil, slight or large—can be a long process with many detours. However, to be able to discuss a piece of research, and to know which aspects might be rightfully challenged, it is necessary to have clear reporting. Having started out as a skeptic fearing “cookbook” science, I came to see the importance of having a cogent document about reporting.
The usefulness of STROBE came home to me directly when the doctoral students in my department started asking me questions about the preliminary drafts of the STROBE recommendations. They had learned about these drafts from the “instructions for authors” of certain journals that had referred to the early Web-based versions. I discovered that these fledgling researchers considered even the meager 20-odd items (without the E&E) as an important guide to help them write their papers.
To the expert epidemiologist, STROBE might still provide a good check of whether all necessary information is given. The expert with the attitude of “going back to first principles” might rightly emphasize in the report the aspects of the study that are crucial to getting the point across. That expert might treat these in great detail, but might treat other aspects too briefly to be useful to others. STROBE might help to restore the balance.
What about readers or evaluators such as editors or reviewers? When authors adhere to STROBE, they may want to make a statement about an attempt at quality of reporting. However, this does not say anything about quality of research, and certainly not about its truthfulness, in the sense that the findings are reliable and likely to be replicated. Actually, several of the examples that we used in the E&E paper came from studies that ended with disastrously wrong messages because of serious flaws, but for which the reporting of some part of the study was very clearly written. For that reason, and that reason alone, they were considered to be good examples. Conversely, it is quite possible for authors of excellent and innovative research to emphasize only the new aspects and fail in their reporting to follow the STROBE guidelines.
Still, an improvement in the quality of reporting might lead to a higher quality of discussion and replication of observational research. Science makes progress because of open discussion about strengths and weaknesses of published research. STROBE might contribute, not to more truth, but to better discussions.
ABOUT THE AUTHOR
JAN P. VANDENBROUCKE is Professor of Clinical Epidemiology at Leiden University Medical Center in The Netherlands, where he has been involved in the conduct of observational studies for the past 20 years. His main interest is the application of epidemiologic methods to problems of etiology and pathogenesis that are researched in secondary or tertiary care facilities and academic medical centers. He is an author of STROBE.
I thank Charles Poole for constructive comments on an earlier version of this manuscript.