The Making of STROBE

Vandenbroucke, Jan P.

doi: 10.1097/EDE.0b013e318157725d

This commentary gives a personal view of drafting the “Strengthening the Reporting of Observational Studies in Epidemiology” (STROBE) statement, by one of its authors. My initial wariness about guidelines for observational research was overcome by focusing on clarity of reporting, rather than on how research should be done. Areas of tension that arose when drafting STROBE include the problem of finding common ground among researchers with different research backgrounds, questions of the intended audience (professional epidemiologists or statisticians vs. all researchers who use epidemiologic study designs), and the fine line between encouraging clarity of reporting vs. prescribing how to do research. STROBE is not an instrument to evaluate the quality of research: research can be reported clearly or not, irrespective of its intrinsic quality. However, the ultimate benefit of STROBE might be that more comprehensive reporting allows for better discussions about published observational research, which may lead to better decisions about what new analyses or new studies are needed to solve a problem.

From the Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands.

Editors' note: Related articles appear on pages 789, 791, 792, 794, 800, and 805.

Correspondence: Jan P. Vandenbroucke, Department of Clinical Epidemiology, Leiden University Medical Center, 1-C9-P, PO Box 9600, 2300 RC Leiden, The Netherlands. E-mail:

Observational research is immensely successful. Our knowledge about the transmission of infectious diseases, from local outbreaks of food poisoning to the global spread of the human immunodeficiency virus, derives mainly from observational research. All human genetic knowledge is observational—from initial observations of clustering of diseases in families up to molecular linkage analysis. The effects of environmental scourges, including smoking, lead exposure in paint or gasoline, or occupationally induced exposures such as asbestos, have been analyzed in observational research. Almost all knowledge on adverse effects of medical interventions derives from observational research. All description of diseases, ie, their definition and their subdivisions, is based on observational research. Most diagnostic and all prognostic research is observational.

Paradoxically, observational research is regarded as somehow inferior relative to randomized trials, which are responsible for a much smaller fraction of useful medical knowledge. Randomized trials mostly have the luxury of starting with relatively high priors: 50–50 under equipoise. By that fact alone, positive findings from trials will lead to a strong posterior probability. In contrast, the hypotheses investigated in observational research, eg, about the causation of diseases, are much more tentative.1 Even if design and execution were to be flawless, a positive finding still would leave a causal or preventive hypothesis with lesser posterior probability of truth, and the hypothesis might more often falter upon replication. Randomized trials enjoy a second luxury: the formal aspects of their design and execution are highly codified, in contrast to observational research. For example, the choice of a control group in case-control studies rests on general principles that have to be reinterpreted and reapplied each time—which each time necessitates renewed critical reflection, of authors as well as readers.


Cookbook Science?

Within the world of randomized controlled trials (RCTs), the CONSORT guidelines for reporting trials have been successful: they have improved the quality of reporting.2,3 However, I was wary when I heard about a group wanting to develop similar recommendations for observational research. Guidelines might be fit for highly codified evaluation, but what about etiologic research?

I had been educated in a tradition in which a researcher each time starts thinking from scratch, by defining exactly what the problem is, what the aim of a study is, and then derives from first principles how to go about achieving that aim. Once you know how first principles translate into a piece of research, you also know which are the most important things to report. At the same time, you will know what to look for in papers on the same topic by others. So, why bother about guidelines? In the progress of science “anything goes”: if someone derives an argument that strikes colleagues as credible—even an argument resulting from a study design that was never used before—it becomes acceptable. Why stifle scientific creativity by guidelines?

I had a rather skeptical view of what I feared might lead to “cookbook” science. The first meeting of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) group in Bristol, UK in 2004 was enlightening.4 It became clear that the initiative was not about “how to do observational research,” but “how to report observational research.” The initiative started from the point of view of persons who want to evaluate the literature, and who find that all too often elementary information is lacking in published papers. This focus was reassuring and looked worthwhile. Hence, my participation in the STROBE writing group was to help create what we preferred to call “recommendations” instead of “guidelines.”


Areas of Tension

The process of drafting the recommendations is described in the STROBE Statement paper,4 which is being published simultaneously in several general medical and epidemiologic journals. The recommendations can be understood only alongside the longer Explanation and Elaboration (E&E) paper,5 which is jointly published by Epidemiology and PLoS Medicine. Each paper influenced the other. While the original checklist was based on the discussions at Bristol, UK in 2004, and was first completed by the coordination group, we kept tinkering with it while writing the E&E paper; upon writing the explanation, we often realized that the checklist did not express sufficiently clearly what we meant to say.

The process was challenging. During the project, we had several areas of tension, with strong initial disagreement about what should be reported, what should not be reported, and why. Particularly difficult were topics such as subgroups, sample size, and interaction. However, while proceeding, we discovered the source of these tensions. Some members of the STROBE writing group had their experience mainly in etiologic observational research; others mainly in evaluative research such as RCTs; still others in systematic reviews and meta-analyses. Thus, without being explicitly aware of it ourselves, the participants had different mindsets: the mindset of a discovery type of science versus that of an evaluative type of science. The former were used to situations in which data from a case-control study are used secondarily for exploring possible new leads on causes of disease; the latter were used to conducting evaluation studies under strict protocols that permit no deviation. That difference translated at first into endless debates about how to think about the reporting of subgroup analyses (quasi-forbidden when not preplanned in RCTs, universally indulged in during etiologic research) and the reporting of predefined sample size (necessary for the credibility of RCTs, only vaguely considered in secondary use of existing data). The recognition of this tension led us to write an introduction to the E&E paper about the diverse purposes of observational research, ranging from discovery and first confirmation of new ideas, to evaluation under strict protocols. Going through this discussion was extremely useful, as it suddenly made things fall into place. The diverse research backgrounds and experience of the group meant that we had to come to a common view about reporting, which may make the recommendations more useful for a wider range of researchers.

Another area of tension was the intended audience. While the “E” in the STROBE acronym stands for epidemiology, we gradually realized that we were writing for a much larger research community: for anybody who would employ one of the major epidemiologic research designs, in whatever type of journal, and who would not necessarily have a professional epidemiologic or statistical background. This led us to explain some principles of epidemiology in the E&E paper, with ample reference to introductory and advanced textbooks. The E&E paper in places comes close to a “mini-textbook.” The reaction to this feature by readers of interim versions was diverse. Many applauded, and mentioned that they might use the E&E as a guide for teaching their doctoral students about writing and evaluating papers; others, in particular professional academic epidemiologists, wondered what the need was for such basic explanations.

A final area of tension in STROBE, as with similar guidelines, is the thin line between the idea of good reporting versus the idea of good science. Some nudging in the direction of what is generally accepted as better research or better analytic practice is unavoidable in recommendations about reporting. However, we soon learned that whenever our internal debates became particularly heated, it was a sign that we had crossed the line: we needed to retreat, emphasizing good reporting rather than prescribing how research should be done. The basic idea remained that it is more important to know what was actually done by the authors than to try to apply a straitjacket of ideas about study design and analysis. The latter might even be counterproductive, as authors might start to pay lip service to the recommendations in order to have their paper published.

These discussions forced us to the limits of our knowledge and wits. When one member of the writing group proposed something as being important, others might strongly disagree, and the one proposing had to explain by examples, by theory and, if possible, by empirical argument. We tended to use the classics of epidemiology to support our arguments, and we learned about important methodologic papers from each other. This led us to revisit those classics to see what was really written—and what we found was sometimes different from what we recalled. Very useful was the feedback on interim drafts from the dozens of persons listed in the acknowledgments of the STROBE papers; they kept us down to earth.

The STROBE recommendations will attract criticism. One of the difficulties of publishing guidelines is that they must be broadly applicable; thus it is always possible to find exceptions, omitted details, and the like. We have tried to assemble as many diverse sources of opinion and research experience as possible in preparing these recommendations. We have attempted to express commonly held views of knowledgeable practitioners and writers of leading textbooks, and we have not tried to be “avant-garde,” even when we knew about budding improvements to current methodology. As emphasized in the 2 papers, the guidelines are a first version only, and they will certainly improve by being used and criticized.


Closer to Truth?

The emphasis on good reporting has a particular role in observational research. When findings from observational studies are challenged, that challenge is most likely about a potential weakness in terms of bias and confounding. “Replicating” observational research is most of the time not pure replication, in the sense of simply obtaining more numbers, but involves the new design of studies, or new analyses of existing data, to attempt to overcome the alleged weaknesses of previous studies. Sifting out the likely direction and magnitude of an association—nil, slight or large—can be a long process with many detours. However, to be able to discuss a piece of research, and to know which aspects might be rightfully challenged, it is necessary to have clear reporting. Having started out as a skeptic fearing “cookbook” science, I came to see the importance of having a cogent document about reporting.

The usefulness of STROBE came home to me directly when the doctoral students in my department started asking me questions about the preliminary drafts of the STROBE recommendations. They had learned about these drafts from the “instructions for authors” of certain journals that had referred to the early Web-based versions. I discovered that these fledgling researchers considered even the meager 20-odd items (without the E&E) to be an important guide to help them write their papers.

To the expert epidemiologist, STROBE might still provide a good check of whether all necessary information is given. The expert with the attitude of “going back to first principles” might rightly emphasize in the report those aspects of the study that are crucial to getting the point across. Such an expert might treat these aspects in great detail, but might treat others just too briefly to be useful to readers. STROBE might help to restore the balance.

What about readers or evaluators such as editors or reviewers? When authors adhere to STROBE, they signal an attempt at good reporting. However, this says nothing about the quality of the research, and certainly not about its truthfulness, in the sense that the findings are reliable and likely to be replicated. Indeed, several of the examples that we used in the E&E paper came from studies that ended with disastrously wrong messages because of serious flaws, but in which the reporting of some part of the study was very clearly written. For that reason, and that reason alone, they were considered to be good examples. Conversely, it is quite possible for authors of excellent and innovative research to emphasize only the new aspects and fail in their reporting to follow the STROBE guidelines.

Still, an improvement in the quality of reporting might lead to a higher quality of discussion and replication of observational research. Science makes progress because of open discussion about strengths and weaknesses of published research. STROBE might contribute, not toward more truth, but to better discussions.



JAN P. VANDENBROUCKE is Professor of Clinical Epidemiology at Leiden University Medical Center in The Netherlands, where he has been involved in the conduct of observational studies for the past 20 years. His main interest is the application of epidemiologic methods to problems of etiology and pathogenesis that are researched in secondary or tertiary care facilities and academic medical centers. He is an author of STROBE.



I thank Charles Poole for constructive comments on an earlier version of this manuscript.



1. Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2:e124.
2. Plint AC, Moher D, Morrison A, et al. Does the CONSORT checklist improve the quality of reports of randomized controlled trials? A systematic review. Med J Aust. 2006;185:263–267.
3. Egger M, Juni P, Bartlett C. Value of flow diagrams in reports of randomized controlled trials. JAMA. 2001;285:1996–1999.
4. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Epidemiology. 2007;18:800–804.
5. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Epidemiology. 2007;18:805–835.
© 2007 Lippincott Williams & Wilkins, Inc.