After floundering for endless time in uncharted territory, epidemiologists today are awash in helpful guidelines. Some of these guidelines advise on conducting research, some on reporting it, some on grading it, some on interpreting it, some on acting ethically with regard to it, and some on combinations of these activities. Many guidelines have been issued by professional societies, and others by concerned professionals who band together in spontaneous altruism to share their collective wisdom.
Guidelines are one of several ways to affect the behavior of epidemiologists. Other approaches, beginning with the most coercive, include government regulations, requirements to obtain funding, editorial policies, textbooks, articles, and, on the gentle end of the spectrum, simply setting a good example in one's own work. Guidelines usually lie in the middle of this hierarchy, depending on how seriously the professional community regards them and the degree to which they are adopted, formally or informally, by regulatory agencies, funding sources, and journals.
The contents of guidelines range from the pithy, broad, and platitudinous to the wordy, encyclopedic, and micromanaging. Many recommendations in guidelines are helpful, but others may be trivial, out of date, justifiably controversial, or just plain wrong. Here are a few examples (which we leave to the reader to characterize):
1. Guidelines for Good Pharmacoepidemiology Practices (GPP). “Sufficient resources, e.g., office space, relevant equipment, and office/professional supplies, shall be available to ensure timely and proper completion of all studies.”1
2. Guidelines on Studies in Environmental Epidemiology. “Case-control studies should involve random sampling from the population at risk.”2
3. Consolidated Standards of Reporting Trials (CONSORT). “A report of a randomized controlled trial (RCT) should convey to the reader, in a transparent manner, why the study was undertaken.”3
4. Good Epidemiological Practice (GEP)—Proper Conduct in Epidemiologic Research. “If the researchers themselves have no confidence in their results, the results should probably remain undisseminated.”4
5. American College of Epidemiology Ethics Guidelines. “[M]aintain honesty and impartiality in the design, conduct, interpretation, and reporting of research.”5
6. Meta-analysis of Observational Studies in Epidemiology (MOOSE). “In cases when heterogeneity of outcomes is particularly problematic, a single summary measure may well be inappropriate.”6
Insofar as guidelines distill important and correct principles into short summaries, they can be valuable. Nevertheless, they represent no more than the consensus opinion of a particular group of writers at a specific point in time. Guidelines intended only to lay out a bare minimum of advice can encourage mediocrity if they are perceived as describing all that is necessary for research. When they enshrine arguable or outmoded tenets, or convert the conduct or reporting of science into a recipe, guidelines retard progress by hamstringing scientists who do not need them.
It was from this ambivalent perspective on guidelines in general that we responded to the development of the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines. As epidemiologists who have long lived with the dread that someday guidelines would emerge riddled with ill-advised strictures we would have to follow, or spend inordinate time explaining why we chose not to, we greeted this particular guideline initiative with independent sighs of relief. Even in the state they were in when we first viewed them, these particular guidelines were a first-rate compendium of state-of-the-art principles for epidemiologic research. The explanatory papers written to accompany the guidelines, and published in this issue of the journal,7,8 discuss the relevant issues behind the STROBE guidelines in admirable detail. We hope future references to the STROBE guidelines will include not only the checklist itself but these explanatory documents as well.
But consensus invariably entails compromise. Hence, it would be easy, even for those most intimately involved in developing these guidelines, to object to specific elements of lesser or greater importance. For example, the guidelines urge scientists to “consider translating estimates of relative risk into absolute risk for a meaningful time period.” Surely, “risk difference” or “absolute change in risk” was meant instead of just “absolute risk.” If not clearly stated, at least the intent is correct, and much more insightful than one might hope for in a set of broad guidelines. But consider another example: In the longer explanatory document, STROBE advises that authors should “not bother readers with post hoc justifications for study size or retrospective power calculations.”8 This advice is considered sound in some statistical circles,9 but it runs contrary to other guidance and practice, at least for those occasions when the null hypothesis is not rejected.10–12 The STROBE advice not to discuss power post hoc comes in a section explaining a checklist item that urges epidemiologists to “[e]xplain how the study size was arrived at.” The researcher thus faces a conundrum of explaining how the study size was determined after a study is completed, but without offering post hoc justifications of study size.
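To make concrete the distinction drawn above between "absolute risk" and "risk difference," here is a minimal sketch. It assumes a simple multiplicative relation between a baseline risk over some fixed period and a relative risk; the function name and the numbers are hypothetical and are not taken from the STROBE documents.

```python
def risk_difference(baseline_risk: float, relative_risk: float) -> float:
    """Absolute change in risk implied by a relative risk at a given baseline risk.

    Both risks refer to the same (assumed) time period.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must lie between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    exposed_risk = baseline_risk * relative_risk
    if exposed_risk > 1.0:
        raise ValueError("implied risk among the exposed exceeds 1")
    return exposed_risk - baseline_risk


if __name__ == "__main__":
    # Hypothetical illustration: the same relative risk of 2.0
    # applied to a rare outcome and to a common outcome.
    for baseline in (0.001, 0.10):
        rd = risk_difference(baseline, 2.0)
        print(f"baseline risk {baseline:.3f}, RR 2.0 -> risk difference {rd:.3f}")
```

The same relative risk corresponds to very different absolute changes in risk depending on the baseline, which is presumably why the translation is worth reporting.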
Despite an admirable emphasis on reporting point estimates and confidence intervals rather than conducting tests of statistical significance, the foregoing example illustrates how the STROBE guidelines nevertheless incorporate a reliance on statistical significance testing into the canon of reporting. They do so through numerous examples, if not by dictum. We admit to some possible hypersensitivity on this point. As the previous editor of this journal, one of us (K.J.R.) introduced a policy strongly urging authors to adopt estimation over testing in their submissions,13 and the other has strongly supported the principles behind that policy.14
A careful reader of the STROBE document could easily infer that significance testing is, if not preferable to estimation, at least desirable when one is estimating. Thus, to illustrate a good way to report interaction analyses, the longer STROBE explanatory document8 cites a statement that “[s]ex differences in susceptibility to the 3 lifestyle-related risk factors studied were explored by testing for biologic interaction according to Rothman.”15 The measure of interaction (RERI) cited was indeed proposed by one of us,16 but as a way to quantify biologic interaction, emphatically not as a way to test for it. After years spent explaining the drawbacks of relying on statistical tests and emphasizing the advantages of estimation, it was deflating to be cited in STROBE, albeit indirectly, as recommending a statistical test.
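For readers unfamiliar with the measure, RERI (the relative excess risk due to interaction) is commonly written as RR11 - RR10 - RR01 + 1, where RR11 is the relative risk for joint exposure to both factors and RR10 and RR01 are the relative risks for exposure to each factor alone, all relative to exposure to neither. The sketch below computes it as an estimate, not a test; the function name and the input values are hypothetical, and a real analysis would also report an interval estimate, which is omitted here.

```python
def reri(rr_both: float, rr_first_only: float, rr_second_only: float) -> float:
    """Relative excess risk due to interaction, as a point estimate.

    All relative risks share the same reference group (exposed to neither factor).
    A value of 0 corresponds to exact additivity of the two effects.
    """
    for rr in (rr_both, rr_first_only, rr_second_only):
        if rr < 0.0:
            raise ValueError("relative risks must be non-negative")
    return rr_both - rr_first_only - rr_second_only + 1.0


if __name__ == "__main__":
    # Hypothetical relative risks, chosen only for illustration.
    estimate = reri(rr_both=5.0, rr_first_only=2.0, rr_second_only=3.0)
    print(f"RERI point estimate: {estimate:.2f}")
```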
We note that difficulties are inevitable in any attempt to accomplish a task as ambitious as the provision of guidance for designing, conducting, and reporting observational studies of all kinds. The potential for such difficulties is multiplied when guidelines, and lengthy documents explaining them, are written by committees.
Notwithstanding the stimulating discussion in the explanatory document, the core of the STROBE guidelines is the checklist.17 We have always abhorred the thought of conducting or reporting science by checklist.17 Most worrisome is the ease with which journals and other decision-making authorities will be tempted to turn checklists into requirements—the template for all your future publications. We hope that journals resist this regimentation. Nevertheless, understanding that it may happen, as it did with the CONSORT guidelines,18 we suggest that guidelines should be formulated more as suggestions than as rigid rules.
Indeed, many of the current limitations of the STROBE guidelines, including all the ones we have mentioned, could be easily fixed in an evolving process of discussion and revision. As it stands, the checklist itself, in our judgment, is as close to completely benign as such a list can be, given the danger that it might be widely adopted as a publishing standard. This is high praise from skeptics such as us, who hold in low regard the potential for good to come from scientific guidelines, and who worry more about their potential for harm.
The central problem we see is that guidelines can promote ossification—just the thing to avoid in science. One way to deal with this problem would be to encourage the issuance of competing guidelines, premised on the theory that competition among writers of guidelines would promote excellence. This route, however, seems wasteful of collective talent. The better route is to deal with what we have. To have lasting influence, the STROBE guidelines will need tending. If there had been a STROBE document in 1960, 1970, 1980, or even 1990, and had it not been revised, today either we would be hampered by it or it would be largely forgotten.
Hence, our suggestion is to create ongoing processes of discussion and revision of the guidelines, and to incorporate expiration dates into STROBE, and other scientific guidelines, when they are published. For example, the present STROBE guidelines, if not revised, might be scheduled to expire on 31 December 2010. Upon revision by a suitably open process, the expiration date could be extended. Otherwise, after 2010, science could simply move along and the current STROBE guidelines could take their place among the archives of old guidelines.
ABOUT THE AUTHORS
KENNETH J. ROTHMAN is Vice President for Epidemiology Research at RTI Health Solutions, Research Triangle Park, NC, and Professor of Epidemiology at the Boston University School of Public Health and School of Medicine. He was the founding editor of Epidemiology. CHARLES POOLE is Associate Professor of Epidemiology at the University of North Carolina.
REFERENCES
6. Stroup DF, Berlin JA, Morton SC, et al. for the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) Group. JAMA
7. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Epidemiology
8. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Epidemiology
9. Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Amer Statist
10. Glantz SA. Primer of Biostatistics. 5th ed. New York: McGraw-Hill; 2001;164–198.
11. Dawson-Saunders B, Trapp RG. Basic and Clinical Biostatistics. Norwalk, CT: Appleton & Lange; 1994;96.
12. Freiman JA, Chalmers TC, Smith H Jr, et al. The importance of beta, the type II error and sample size in the design and interpretation of the randomized controlled trial: survey of 71 “negative” trials. N Engl J Med
13. Lang J, Rothman KJ, Cann CI. That confounded P-value [Editorial]. Epidemiology
14. Poole C. Low P-values or narrow confidence intervals: which are more durable? Epidemiology
15. Hallan S, de Mutsert R, Carlsen S, et al. Obesity, smoking, and physical inactivity as risk factors for CKD: are men more vulnerable? Am J Kidney Dis
16. Rothman KJ. Modern Epidemiology. 1st ed. Boston: Little Brown; 1986.
17. Lanes SF, Poole C. “Truth in Packaging?” The unwrapping of epidemiologic research. J Occup Med
18. Altman DG. Better reporting of randomised controlled trials: the CONSORT statement. BMJ