Bateson, Thomas F.
From the National Center for Environmental Assessment, US Environmental Protection Agency, Washington, DC.
Supported by US Environmental Protection Agency.
The views expressed in this article are those of the authors and do not necessarily reflect the views or policies of the US Environmental Protection Agency.
Correspondence: Thomas F. Bateson, National Center for Environmental Assessment, US Environmental Protection Agency, 1200 Pennsylvania Ave, NW, Mail Code: 8623P, Washington, DC 20460. E-mail: Bateson.Thomas@epa.gov.
In this issue of Epidemiology, Checkley et al1 introduce what may be a new analytic tool for epidemiologists: gamma regression for analyzing the determinants of the waiting times between successive events in a sparse time-series. This model is well known in survival analysis and in econometrics, but a recent search of the PubMed database yielded only 4 hits for (epidemiology AND “gamma regression”).2–5 Gamma regression is typically used in survival analyses to explain a distribution of waiting times (or counts) with a long tail. It allows the model to fit a distinct parameter for the variance of the counts that need not be tied to the mean. In the present analysis, the waiting times are the discrete counts of days between successive hospital admissions for Kawasaki disease. Although this conceptualization focuses on the times between events, the same time-series can also be assessed for the determinants of the daily count of events using Poisson regression. Such models are routinely used in air pollution epidemiology to assess the impact of pollutants and meteorological conditions on the risks of morbidity or mortality based on the number of daily adverse events in a time-series. The authors' juxtaposition of a gamma regression analysis of the waiting times between sparse events with a corresponding Poisson regression analysis of the daily event counts provides an opportunity to consider the potential advantages of each method.
Summary of Checkley et al
In their article, Checkley et al1 used gamma regression to identify meteorological risk factors associated with changes in the waiting times between successive initial hospital admissions for Kawasaki disease at a large hospital in Chicago over an 18-year period. They reported that unseasonably warm weather shortened the time between events and concluded that weather affects the occurrence of Kawasaki disease, a result they found to be consistent with an infectious trigger.
Inside the Epidemiologist's Toolbox
Presented with a time-series of sparse events, different investigators will undoubtedly reach into their methodologic toolboxes in search of familiar tools. The choice will depend on how the hypothesis of environmental triggers of disease occurrence is operationalized. One investigator may focus on the determinants of the length of time between rare events to identify clues to the etiology, whereas another may focus on the environmental conditions preceding relatively high daily case counts. Although seasoned practitioners of survival analysis might apply gamma regression methods to model the interevent waiting times, many epidemiologists would be inclined to apply Poisson time-series methods to such data. Checkley et al1 correctly stated that “the reciprocal of (the) mean time between admissions is equivalent to the daily rate of events.” Both methods should provide inference regarding the environmental triggers of Kawasaki disease. In many instances, there may be several suitable tools for solving a problem. We should remind ourselves of the adage that when holding a hammer, everything looks like a nail.
A Poisson process can be thought of as a sequence of independent and identically exponentially distributed waiting times between events.6 When the waiting times between events are more variable than exponential waiting times with the same mean (that is, when their coefficient of variation is greater than 1), the counts of events per interval will be over-dispersed relative to the Poisson. Conversely, when the waiting times are less variable than the exponential (a coefficient of variation less than 1), the counts of events per interval will be under-dispersed.
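The link between the variability of the waiting times and the dispersion of the counts can be checked with a small simulation. The sketch below is illustrative only and is not part of the analysis by Checkley et al: the function name, the gamma shapes, the window length, and the seed are all assumptions, and the mean waiting time is scaled to 1 so that the variance-to-mean ratio of the waiting times equals their squared coefficient of variation.

```python
import random
import statistics

def count_dispersion(shape, mean_wait=1.0, window=20.0, n_windows=4000, seed=0):
    """Variance-to-mean ratio of event counts per window when the waiting
    times are gamma distributed with the given shape and mean (assumed setup)."""
    rng = random.Random(seed)
    scale = mean_wait / shape          # gamma mean = shape * scale
    counts = []
    for _ in range(n_windows):
        elapsed = rng.gammavariate(shape, scale)
        events = 0
        while elapsed <= window:       # count events falling inside the window
            events += 1
            elapsed += rng.gammavariate(shape, scale)
        counts.append(events)
    return statistics.variance(counts) / statistics.mean(counts)

# shape < 1: waiting times more variable than exponential
over = count_dispersion(shape=0.5)
# shape = 1: exponential waiting times, i.e., a Poisson process
poisson_like = count_dispersion(shape=1.0)
# shape > 1: waiting times less variable than exponential
under = count_dispersion(shape=4.0)

print(f"shape 0.5: {over:.2f}  shape 1: {poisson_like:.2f}  shape 4: {under:.2f}")
```

Under these assumptions the ratio should fall above 1 for the first case, near 1 for the second, and below 1 for the third, which matches the over- and under-dispersion described above.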
It should be recognized that the Poisson and gamma distributions are inherently related: the Poisson nests within the more flexible gamma count model, and the two are equivalent when there is neither over- nor under-dispersion of counts.7 A generalized model replaces the exponential distribution with a less restrictive non-negative distribution such as the Weibull, gamma, or log-normal. In its favor, the gamma distribution nests the exponential distribution, allows for a monotone, nonconstant hazard, and has the reproductive property that sums of independent gamma-distributed variables with a common scale are also gamma distributed.7 Winkelmann7 claims that this generalized gamma regression model provides a count-data model of substantially higher flexibility than the Poisson model (here, a model of the intervals between hospital admissions) at the cost of a single additional parameter, and that it provides an interpretation of over- and under-dispersion in terms of an underlying sequence of waiting times.
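Two of these properties are easy to verify numerically. The following sketch, using only the Python standard library, checks that a gamma density with shape 1 coincides with the exponential density and that the sum of two independent gamma variables sharing a scale has the mean and variance of a gamma with the summed shape. The helper names, parameter values, and seed are illustrative assumptions, and the common scale is a condition of the reproductive property.

```python
import math
import random
import statistics

def gamma_pdf(x, shape, scale):
    """Gamma density with shape k and scale theta (mean = k * theta)."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def exponential_pdf(x, mean):
    """Exponential density parameterized by its mean."""
    return math.exp(-x / mean) / mean

# Nesting: with shape 1 the gamma density reduces to the exponential density.
mean_wait = 9.4  # days; illustrative value
max_gap = max(
    abs(gamma_pdf(x, 1.0, mean_wait) - exponential_pdf(x, mean_wait))
    for x in (0.5, 1.0, 5.0, 10.0, 30.0)
)

# Reproductive property: Gamma(1.5, scale) + Gamma(2.5, scale) behaves like
# Gamma(4, scale), whose mean is 4 * scale and variance is 4 * scale**2.
rng = random.Random(1)
scale = 2.0
sums = [rng.gammavariate(1.5, scale) + rng.gammavariate(2.5, scale) for _ in range(50000)]
sum_mean = statistics.mean(sums)      # theoretical value: 8
sum_var = statistics.variance(sums)   # theoretical value: 16

print(f"max density gap: {max_gap:.2e}, mean: {sum_mean:.2f}, variance: {sum_var:.2f}")
```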
Gamma Regression of Waiting Times Versus Poisson Regression of Daily Event Counts
A distinctive feature of the time-series of incident hospital admissions for Kawasaki disease is the infrequency of events, which produces a very sparse time-series with no event on most of the days. Across the 18-year study period, Checkley et al1 reported that 723 admissions were observed and 700 were included in the analysis. Across this 6574-day period, the mean waiting period was 9.40 days between events (6574/699) and the mean daily event rate was approximately 0.106 (700/6574), which has an approximate variance of 0.109. The results of the gamma regression of the waiting times were compared with the Poisson regression results of the daily case counts in the Appendix of Checkley et al.1
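The arithmetic behind these summary figures can be reproduced directly. The short sketch below uses only the numbers quoted above (700 analyzed admissions, 6574 days, and a reported variance of 0.109); the variable names are illustrative.

```python
n_events = 700            # admissions included in the analysis
n_days = 6574             # days in the 18-year study period
reported_variance = 0.109 # approximate variance of the daily counts

# 700 events define 699 intervals between successive events.
n_intervals = n_events - 1
mean_wait = n_days / n_intervals   # about 9.40 days
daily_rate = n_events / n_days     # about 0.106 events per day

# The reciprocal of the mean waiting time approximates the daily rate.
reciprocal_wait = 1 / mean_wait

# A Poisson count has variance equal to its mean, so a ratio near 1
# indicates little over- or under-dispersion in the daily counts.
dispersion_ratio = reported_variance / daily_rate  # about 1.02

print(f"mean wait {mean_wait:.2f} d, rate {daily_rate:.3f}/d, "
      f"1/wait {reciprocal_wait:.3f}, dispersion {dispersion_ratio:.2f}")
```

The small gap between the daily rate and the reciprocal of the mean waiting time arises only because 700 events bound 699 intervals.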
Eight shared regression coefficients and their variances are comparable (Checkley et al,1 Appendix). In each instance, the coefficients are approximately the inverse of each other, as we would expect given the reciprocal nature of the 2 underlying outcome parameterizations. The single comparison that deviated most from this pattern was for the coefficient specific to the year 1998, which had been singled out due to the short waiting times between events (and the higher event rate). This was likely the result of the gamma regression model allowing for 1 over-dispersion parameter for the entire 18-year study period and another specific to 1998, which may have made the gamma regression more accommodating and thereby generated a better model fit for that year. In fact, although both models yielded estimates that were strikingly aligned in terms of the inference that could be drawn from them, the gamma regression results were more precise in almost every instance, including the covariances associated with the interaction terms. Because one of the goals of this gamma regression analysis was to assess the potential differences in the effects of temperature in 1998 versus the other years, it is of particular importance to have more precise effect estimates, as the power to evaluate interaction effects is typically low. The results in Table 1 of Checkley et al1 show narrower 95% CIs for the gamma regression of the waiting times than for the Poisson regression of the daily counts (following conversion to account for reciprocal effects).
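The reciprocal relationship, and the conversion needed before comparing confidence intervals, can be sketched as follows. This assumes that both models use a log link, which is an assumption made here for illustration and not a detail confirmed above. The coefficient and interval are hypothetical values, not results from Checkley et al.

```python
import math

def rate_ratio_from_wait_ratio(wait_ratio, ci_low, ci_high):
    """Convert a waiting-time ratio and its CI to the daily-rate scale.

    Taking reciprocals reverses the order, so the upper bound on the
    waiting-time scale becomes the lower bound on the rate scale.
    """
    return 1.0 / wait_ratio, 1.0 / ci_high, 1.0 / ci_low

# Hypothetical coefficient on the log waiting-time scale (assumed log link).
beta_wait = -0.20
wait_ratio = math.exp(beta_wait)   # waiting times shrink by this factor

# Because log(rate) = -log(mean waiting time), the sign flips on the rate scale.
beta_rate = -beta_wait
rate_ratio = math.exp(beta_rate)   # daily rate grows by the reciprocal factor

# Hypothetical waiting-time ratio with its 95% CI, expressed as a rate ratio.
converted = rate_ratio_from_wait_ratio(0.80, 0.70, 0.92)

print(f"wait ratio {wait_ratio:.3f}, rate ratio {rate_ratio:.3f}")
print("converted estimate and CI: " + ", ".join(f"{v:.3f}" for v in converted))
```

Under this assumption, a covariate that shortens the waiting time by a given factor raises the daily rate by the reciprocal of that factor, so interval widths are best compared on a common scale.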
It seems possible that the greater variability inherent in the waiting times between very sparse events compared with the low variability in the daily event counts of the Poisson time series might explain the corresponding difference in statistical power. If so, this would recommend the gamma regression model over the Poisson regression model in similar instances involving very sparse time-series. Of course, when daily time-series are not sparse, such as those commonly evaluated in air pollution epidemiology, the waiting times between event days are all 1 day, and the analysis is further complicated by ties in the waiting times when several events occur on the same day. In these instances, the Poisson time-series regression approach to analyzing daily event counts is the appropriate choice.
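The contrast between sparse and dense series becomes concrete when a daily count series is converted to waiting times. The sketch below is a minimal illustration with made-up count series; the function name and the data are hypothetical.

```python
def waiting_times(daily_counts):
    """Days between successive events, given the number of events on each day.

    Events on the same day are listed once each, so they produce waiting
    times of 0 (ties).
    """
    event_days = [day for day, n in enumerate(daily_counts) for _ in range(n)]
    return [later - earlier for earlier, later in zip(event_days, event_days[1:])]

# Sparse series: most days have no event, so the waiting times vary.
sparse_counts = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
sparse_waits = waiting_times(sparse_counts)

# Dense series: events occur every day, often several per day.
dense_counts = [2, 1, 3, 1, 2]
dense_waits = waiting_times(dense_counts)

print("sparse:", sparse_waits)
print("dense: ", dense_waits)
```

In the dense series every waiting time is 0 or 1 day, leaving little variation for a waiting-time model to explain, whereas the daily counts still vary.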
When events in time are indeed sparse, there are additional factors that may influence an investigator's choice of analytic method. Checkley et al1 were also interested in how individuals' personal characteristics might influence the waiting times between events, and they were able to incorporate these individual-level data into their gamma regression analysis. This too is possible in a Poisson time-series approach when there is no more than 1 event per day. Both methods require some accommodation to look at individual-level data when there are multiple events per day. Checkley et al1 controlled for seasonality using indicator variables for 3-month-long “seasons.” Although this may be sufficient in this particular instance, Poisson regression methods for the analysis of daily event counts offer a rich set of parametric and nonparametric methods for potentially more powerful control of seasonality.
One last distinction between the 2 forms of analysis is that although the gamma regression approach focuses on the determinants of the waiting times, the Poisson approach allows for the assessment of day-specific lagged effects that precede the observed adverse events. The latter approach allows the investigator to estimate the distinct timing of potential triggers, which might be relevant in identifying an infectious trigger of Kawasaki disease as it might provide additional information on the incubation period or latency period from infection to hospital admission.
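Day-specific lagged effects are usually assessed by giving each day its own set of lagged exposure values as covariates in the daily-count model. The sketch below shows one simple way such covariates could be assembled; the function name, the maximum lag, and the temperature values are hypothetical and are not drawn from Checkley et al.

```python
def lagged_exposures(series, max_lag):
    """Build one row per day holding the exposure at lags 0..max_lag.

    The first max_lag days are dropped because their earlier exposures
    are not observed.
    """
    rows = []
    for day in range(max_lag, len(series)):
        rows.append([series[day - lag] for lag in range(max_lag + 1)])
    return rows

# Hypothetical daily temperatures (degrees Celsius) for 6 consecutive days.
temperatures = [10, 12, 15, 11, 9, 14]

# Each row: [same-day value, value 1 day earlier, value 2 days earlier].
design = lagged_exposures(temperatures, max_lag=2)

for day, row in enumerate(design, start=2):
    print(f"day {day}: {row}")
```

Each column could then enter a regression of the daily counts as a separate term, so that the lag with the strongest association hints at the interval between trigger and admission.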
Both methods have their merits for the analysis of sparse time-series and can offer 2 views of the same underlying disease process. The comparison is an important reminder that multiple methodologic tools are available for any epidemiologic analysis.
ABOUT THE AUTHOR
THOMAS BATESON is an environmental epidemiologist with USEPA's National Center for Environmental Assessment in Washington, DC. He aided methodologic development of case-crossover studies of air pollution effects as an alternative to Poisson time-series by showing that time-varying confounding can be controlled by design rather than modeling. His current work focuses on epidemiologically based risk assessments of asbestos, formaldehyde, and other chemicals.
1. Checkley W, Guzman-Cottrill J, Epstein L, Innocentini N, Patz J, Shulman S. Short-Term Weather Variability in Chicago and Hospitalizations for Kawasaki Disease. Epidemiology. 2009;20:194–201.
2. Pagano E, Petrinco M, Desideri A, Bigi R, Merletti F, Gregori D. Survival models for cost data: the forgotten additive approach. Stat Med. 2008;27:3585–3597.
3. Keenan SP, Dodek P, Martin C, Priestap F, Norena M, Wong H. Variation in length of intensive care unit stay after cardiac arrest: where you are is as important as who you are. Crit Care Med. 2007;35:836–841.
4. Cooper NJ, Lambert PC, Abrams KR, Sutton AJ. Predicting costs over time using Bayesian Markov chain Monte Carlo methods: an application to early inflammatory polyarthritis. Health Econ. 2007;16:37–56.
5. Checkley W, Gilman RH, Black RE, et al. Effects of nutritional status on diarrhea in Peruvian children. J Pediatr. 2002;140:210–218.
6. Cox DR. Renewal Theory. New York: John Wiley; 1962.
7. Winkelmann R. Econometric Analysis of Count Data. Berlin, Germany: Springer-Verlag; 2008.
© 2009 Lippincott Williams & Wilkins, Inc.