The STROBE initiative is an excellent approach to improving observational epidemiologic studies. Our concerns include: 1) the need for further emphasis on presenting a clear definition of the hypothesis, its biologic rationale, and its implications for the health of the public; 2) correction of the glaring omission in the STROBE guidelines of the necessity to consider the incubation periods for risk factors and diseases and to review other biologically relevant issues that often have a major impact on the plausibility of the observed association; 3) the essential importance of guidance about a careful definition of host factors, including a clear statement of results specific to race, sex and ethnicity rather than merely stating: “The interaction term was not significant”; 4) the importance of specifying that all studies should present the actual rates or numbers of events in relation to the size of the population, including the actual numbers for each independent variable in a multiple regression analysis, rather than solely presenting a hazard ratio; and 5) the need to restrict the P value only to those hypotheses that were generated prior to the data analysis, limiting retrospective analyses to point estimates and confidence limits.
From the University of Pittsburgh, Pittsburgh, Pennsylvania.
Correspondence: Bernard D. Goldstein, A710 Crabtree Hall, 130 DeSoto Street, Pittsburgh, PA 15261. E-mail: bdgold@pitt.
The STROBE guidelines (1) and the detailed paper (2) are an excellent approach for improving the presentation and analysis of epidemiologic studies. There are, however, several concerns. First, a major problem with epidemiologic (observational) studies is the failure to carefully specify the hypotheses and the rationale for the hypotheses in the introduction to the study. The STROBE guidelines note that the hypotheses should be stated. They do not emphasize nearly enough the importance of clearly defining the hypotheses in the introduction, specifying the biologic rationale for the hypotheses if it exists, and stating the implications of the hypotheses for future scientific endeavors or for the “public's health.”
The second and perhaps most glaring omission from the STROBE report is any discussion of the incubation period of the risk factors and disease of interest. There is a great tendency to report an association between a variable measured in the recent past and an outcome even when it is well known that the disease process that follows the initial lesion (eg, carcinogenesis, atherosclerosis, cognitive changes leading to dementia) develops over a relatively long period of time. Thus, many of the researchers’ interpretations of the associations are biologically implausible, and the associations are most likely due to reverse causality, ie, the disease or the evolving disease causes changes in the observed independent variable. It is therefore extremely important to define the temporal relationship between the exposure and the outcome. This lack of information about incubation period is part of a general weakness of the STROBE approach in considering the biology underlying cause-and-effect relationships. For example, the classic Bradford Hill requirement to consider biologic plausibility is not mentioned in the STROBE Commentary or checklist (1), and is restricted to one paragraph in the full STROBE document (2). To convey the meaning of the findings, reports of observational epidemiologic studies must contain the information needed to integrate the findings within the broader scientific base. The STROBE guidelines for the discussion section should require reflection about the relevance of the epidemiologic finding to observations in laboratory animals or to the understanding of scientific mechanisms, including pertinent psychologic or sociological theory when warranted. The evidence to be considered in the discussion should be presented in an evenhanded manner, and should include information for and against the biologic plausibility of the findings.
Third, little attention is given to the importance of clearly defining the host characteristics in the study. Epidemiologic studies are studies of the host, the agent, and the environment. It is important that the host characteristics be defined not only as being men or women or blacks or whites, but by socioeconomic class, ethnic origin and, most likely in the future, genotypic characteristics. It is also important that the study design state whether there was a plan to compare the results in various ethnic or racial groups and, if not, why not. The National Institutes of Health has spent an extraordinary amount of money, time and effort in creating a series of regulations to increase the participation of women and minorities in studies. Yet there is a continued paucity of reporting of results specific to race, ethnicity or sex. We do not think that the most common explanation for the lack of such detail, that the interaction term was not significant, is acceptable. Journals should require that both race- and sex-specific data be presented or else that an explanation for their omission be provided.
The fourth problem not explicitly addressed in the STROBE guidelines is the tendency to present hazard ratios (HRs) without presenting the rates or even the number of events (the numerator) and the size of the population (the denominator). A large HR can be due either to a high rate in the numerator or to a very low rate in the denominator. Random variation in the denominator sometimes results in HRs that are quite different across studies but have nothing to do with the effects of the independent variable of interest on the outcome. Thus, all studies should be required to present the actual number of events and the absolute rates, as well as HRs and relative risks. Similarly, multiple regression analyses should not be presented without also providing information about the number of individuals and events related to each of the independent variables. The associations in multiple regression analyses may, in part, be related to differences in the variability of the measurements of the independent variables. Some statement should therefore also be included about the ability to measure the independent variables (ie, both the within-individual and the between-individual variability of the measurement) and how that may affect the interpretation of the results.
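The point about ratios and absolute rates can be illustrated with a minimal sketch. All counts below are hypothetical, the function names are ours, and a crude rate ratio is used as a simple stand-in for a hazard ratio (which would normally come from a survival model). The sketch shows two invented studies with the same ratio but very different absolute rates, and how a single event more or less in a small comparison group moves the ratio.

```python
def rate_per_1000(events: int, person_years: float) -> float:
    """Absolute event rate per 1,000 person-years."""
    return 1000.0 * events / person_years


def summarize(exposed_events, exposed_py, unexposed_events, unexposed_py):
    """Return the absolute rates, their ratio, and their difference."""
    exposed_rate = rate_per_1000(exposed_events, exposed_py)
    unexposed_rate = rate_per_1000(unexposed_events, unexposed_py)
    return {
        "exposed_rate": exposed_rate,
        "unexposed_rate": unexposed_rate,
        "rate_ratio": exposed_rate / unexposed_rate,
        "rate_difference": exposed_rate - unexposed_rate,
    }


# Hypothetical study A: rare outcome, only 2 events in the unexposed group.
study_a = summarize(exposed_events=6, exposed_py=10_000,
                    unexposed_events=2, unexposed_py=10_000)

# Hypothetical study B: common outcome, same ratio, 100-fold higher rates.
study_b = summarize(exposed_events=600, exposed_py=10_000,
                    unexposed_events=200, unexposed_py=10_000)

for name, s in (("A", study_a), ("B", study_b)):
    print(f"Study {name}: ratio {s['rate_ratio']:.1f}, "
          f"rates {s['exposed_rate']:.1f} vs {s['unexposed_rate']:.1f} "
          f"per 1,000 person-years, "
          f"difference {s['rate_difference']:.1f}")

# Sensitivity of study A's ratio to one event in the comparison group.
for unexposed_events in (1, 2, 3):
    s = summarize(6, 10_000, unexposed_events, 10_000)
    print(f"Unexposed events = {unexposed_events}: "
          f"ratio {s['rate_ratio']:.1f}")
```

In this invented example a reader given only the ratio could not distinguish study A from study B, and could not see that study A's ratio rests on two events in the comparison group.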
Fifth, the statistical testing in a paper, ie, the P values, should be restricted to the primary hypothesis and any subgroup hypotheses that were generated prior to any of the data analysis. Investigators submitting manuscripts to journals should be required to state that the analysis plan and the primary and subgroup hypotheses for which P values are reported were specified prior to reviewing the data. P values generated from data that have already been observed are essentially irrelevant since the observed effect is the effect; no probability of the effect can be considered. The so-called “peppered P value syndrome” is one of the biggest problems in observational epidemiologic studies because it leads to associations that cannot be substantiated. On the other hand, it is perfectly acceptable to do further analysis beyond the original hypothesis. Point estimates and confidence limits can be generated, but there should be no P values associated with such analyses.
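One way to see why unplanned P values accumulate spurious associations is a short calculation. The sketch below is only illustrative and rests on two simplifying assumptions of ours: every tested association is truly null, and the tests are independent. Under those assumptions, the probability that at least one of k tests falls below a threshold alpha is 1 - (1 - alpha)^k. The function name is hypothetical.

```python
def prob_at_least_one_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests independent tests of true null
    hypotheses yields P < alpha (simplifying assumptions, see lead-in)."""
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 20, 50):
    p = prob_at_least_one_false_positive(k)
    print(f"{k:>2} unplanned tests: "
          f"P(at least one 'significant' result) = {p:.2f}")
```

Under these assumptions, 20 unplanned comparisons already give roughly a 64% chance of at least one nominally significant finding, even when no real association exists.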
Journal editors should make sure that the reviewers of articles do not base their recommendation as to acceptability for publication on the size of the HRs but rather on the quality of the study and the specific hypothesis tested. If most of the research was original, most of the study results should be null and only about 5% of studies should have positive results by chance alone at the conventional 0.05 significance level. Given the fact that a priori biologic knowledge led to the study, the percentage of positive results should be somewhat higher. However, if most of the study results are positive then the studies cannot be generating new information but rather are only reflecting prior observations.
Finally, we note that the STROBE initiative was developed under European auspices and that the majority of the epidemiologists involved are from Europe. These fortunate individuals appear not to be fully aware of a problem with observational studies in the much more litigious environment of the United States, although greater familiarity with litigation may occur as the European Union gives standing to citizens of European Union countries similar to that available in the United States. Journal editors need to be more suspicious about observational epidemiologic studies aimed specifically at toxic tort or regulatory issues by either side of the issue.
ABOUT THE AUTHORS
LEWIS KULLER is a University Professor of Public Health, Professor of Epidemiology and former chair of the Department of Epidemiology at the University of Pittsburgh Graduate School of Public Health. His primary research interests are the epidemiology and prevention of cardiovascular diseases, especially among women and older individuals. BERNARD GOLDSTEIN is Professor of Environmental and Occupational Health and former dean at the University of Pittsburgh Graduate School of Public Health; he is also the former head of research and development at the US Environmental Protection Agency. His major interests concern the interface between environmental health science and policy.
1. von Elm E, Altman DG, Egger M, et al. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: guidelines for reporting observational studies. Epidemiology
2. Vandenbroucke JP, von Elm E, Altman DG, et al. Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): explanation and elaboration. Epidemiology