Secondary Logo

Journal Logo

Underestimation of Standard Errors in Multi-site Time Series Studies

Daniels, Michael J.*; Dominici, Francesca; Zeger, Scott

doi: 10.1097/01.ede.0000092721.00997.f7
Original Article

Multi-site time series studies of the association of air pollution with mortality and morbidity have figured prominently in the literature as comprehensive approaches for estimating short-term effects of air pollution on health. Hierarchical models are generally used to combine site-specific information and to estimate pooled air pollution effects while taking into account both within-site statistical uncertainty and across-site heterogeneity.

Within a site, characteristics of time series data of air pollution and health (small pollution effects, missing data, and highly correlated predictors) make the modeling of all sources of uncertainty challenging. One potential consequence is underestimation of the statistical variance of the site-specific effects to be combined.

In this paper, we investigate the impact of variance underestimation on the pooled relative rate estimate. We focused on two-stage normal-normal hierarchical models and on underestimation of the statistical variance at the first stage. By mathematical considerations and simulation studies, we found that variance underestimation did not affect the pooled estimate substantially. However, the pooled estimate was somewhat sensitive to variance underestimation when the number of sites was small and underestimation was severe. These simulation results are applicable to any two-stage normal-normal hierarchical model for combining information of site-specific results (including meta-analyses), and they can easily be extended to more general hierarchical formulations.

We also examined the impact of variance underestimation on the national average relative rate estimate from the National Morbidity, Mortality and Air Pollution Study. We found that variance underestimation as large as 40% had little effect on the national average.

From the *Department of Statistics, University of Florida, Gainesville, Florida; †Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland.

Submitted 4 December 2002; final version accepted 23 July 2003.

Research described in this article was partially supported by a contract and grant from Health Effects Institute (HEI), an organization jointly funded by the Environmental Protection Agency (EPA R824835) and automotive manufacturers. Funding for Francesca Dominici was provided by a grant from the Health Effects Institute (Walter A. Rosenblith New Investigator Award). Research described within was also supported by a grant from the National Institute of Environmental Health Sciences for the Johns Hopkins Center in Urban Environmental Health (P30ES03819–12). Funding for Michael Daniels was provided by a grant from NIH (CA-85295).

Correspondence: Michael J. Daniels, 207 Griffin Floyd Hall, Department of Statistics, University of Florida, Gainesville, FL 32611. E-mail: mdaniels*

In multi-site time series studies of the association of air pollution with mortality and morbidity,1,2 site-specific time series data are assembled under a common framework and analyzed with a uniform analytic approach. Hierarchical modeling is an unified approach for combining evidence across studies, quantifying the sources of variability, and assessing effect modification. Because of the development of computational tools that facilitate their implementation,3,4 hierarchical models have been recently applied to analysis of multi-site time series data.1,5–9 (Dominici10 presents a more detailed discussion on the use of hierarchical models in multi-site time series studies of air pollution and health.)

Hierarchical models11 for analyses of multi-site time series studies of air pollution and mortality have a multi-stage structure. At the first stage, the association between air pollution and health is described using a site-specific regression model7,12,13 which takes into account potential confounding factors such as trend, season, and climate. Generalized Additive Models (GAM)14 with nonparametric adjustment for confounding factors for the site (eg, smoothing splines) or Generalized Linear Models (GLM)15 with regression splines (eg, natural cubic splines) are generally used to estimate site-specific relative rates βˆc and their sampling variances vc. Here βˆc denotes the percentage increase in mortality/morbidity per unit increase in level of the air pollutant, and vc denotes the statistical uncertainty in βˆc; vc depends on the number of days with available air pollution data, adverse health events, and correlated time varying confounders. At the second stage, the information from multiple sites is combined by assuming that the true city-specific relative rates (βc) have a common mean α (also called the pooled relative rate) and variance τ2; this variance represents the variability across sites of the true relative rates (also called the heterogeneity parameter). Fixed or random effects, empirical Bayesian, or fully Bayesian models16–20 are used to estimate α and τ2.

The nature and characteristics of time series studies in air pollution and health make it difficult to estimate health risk while taking into account all sources of uncertainty. First, the variability in the mortality or morbidity time series explained by air pollution is an order of magnitude lower than the variability in the mortality time series explained by weather, trend, and seasonality. Consequently, the estimates of air pollution effects are sensitive to the method of adjustment for confounding factors. Second, to control adequately for confounding, several highly correlated predictors are included in the site-specific regression model. This can make variance estimation unstable, and slow the convergence of fitting algorithms such as the backfitting algorithm in GAM.21 Third, because the confounding effects of climate and season are not linear and vary slowly, these need to be modeled using smooth functions such as smoothing splines or regression splines.22 Nonlinear modeling increases both the number of nuisance parameters and the computational complexity. In all these cases, a sound and robust assessment of the statistical uncertainty of βc can be hard to obtain, and there is likely to be underestimation of variance due to model mis-specification.

Problems inherent in standard error estimation of air pollution effects have recently been pointed out in the literature. For example, Ramsey et al23 reported that the inability of the GAM to take full account of the correlation among nonlinear confounders can lead to underestimation of the standard error of relative rate estimates (see also Chambers and Hastie24(pp 303–304) and commentaries by Lumley and Sheppard25 and Samet et al26).

Recently Dominici, McDermott, and Hastie27 have developed gam.exact, an extended version of the S-plus function gam used to fit Generalized Additive Models. The S-plus function gam.exact calculates an asymptotically exact covariance matrix of the air pollution effect estimates by properly taking into account the correlation among the nonlinear predictors.

The reanalyses of the National Morbidity, Mortality, and Air Pollution Study (NMMAPS)28 empirically confirmed the theoretical results of Ramsey et al23 These reanalyses showed that the degree of bias in the standard errors was proportional to the size of the standard errors (a form of multiplicative bias). Variances more robust than the ones obtained from GAM software can be obtained by using standard statistical theory29 or by bootstrap30 or GEE methods.31 However, in time series studies of air pollution and health, such methods might be computationally expensive and “off-the-shelf” statistical software is not always available.

In this paper, we investigate the sensitivity of the pooled estimate αˆ with respect to underestimation of the city-specific statistical variances vc. We focus on underestimation (as opposed to overestimation) for 3 reasons. First, underestimation of vc is a much more serious problem than overestimation because it leads to less conservative conclusions about the statistical significance of a site-specific association between air pollution and health. Second, underestimation of vc is more common than overestimation, as the former generally reflects a failure to take into account 1 or more sources of uncertainty. Third, underestimation of vc has been identified as a limitation of the statistical software for the implementation of GAM.23,24 However, this limitation has recently been overcome.

We show that the pooled estimate is unaffected by underestimation of vc when the bias is additive or when the bias is multiplicative and the statistical variances are equal across cities. Then, by a simulation study, we investigate the case of multiplicative bias and unequal statistical variances. Here we define 33 = 27 scenarios which are a combination of number of cities, magnitude of the bias, and amount of heterogeneity. We then identify under which scenarios the underestimation of vc affects the estimate of the pooled relative rate estimate â. We also investigate the impact of variance underestimation on the national average relative rate estimate from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS).

Back to Top | Article Outline


We considered the following 2 stage normal-normal hierarchical model11

where C is the total number of sites, and N(a, b) denotes the normal distribution with mean a and variance b.

We estimated α and its standard error SE(â) by using an empirical Bayes approach.18 More specifically, we first computed the restricted maximum likelihood estimate τˆ2, and then we calculated α and SE (â) conditional on τˆ2. Details are in the appendix. The empirical Bayes estimate of α and its standard error are defined below:

where wc = vc + τˆ2. Let vc be the “true” statistical variance, and let vc* be the “estimated” statistical variance under a miss-specified model. We assumed vc* < vc, and we considered 3 cases of underestimation of vc:

  1. additive bias: vc* = vcb, for arbitrary vc and b > 0;
  2. multiplicative bias, equal variances: vc* = k × vc, vcv; v for arbitrary v and 0 < k < 1;
  3. multiplicative bias, unequal variances: vc* = k × vc, for arbitrary vc and 0 < k < 1.

In the first and second case, underestimation of the variance does not affect the pooled estimate of α. The mathematical proof is given in the appendix.

In the third case, we explored the impact of multiplicative bias on the estimation of α by a simulation study. We considered 33 = 27 scenarios consisting of 3 components:

  • underestimation of 50%, 30%, and 10% which corresponds to k = 0.5, 0.7, 0.9;
  • number of sites: C = 15, 20, 90;
  • amount of heterogeneity: τ = 0.05, 0.5, 1 corresponding, respectively, to small, medium, and large between-city standard deviations.

For each scenario, we generated 250 replicates of βˆ = (βˆ1,…, βˆC) from model(1). More specifically, we chose values for α, τ2, and vc and we sequentially generated βˆc from N(α, τ2), and then βˆc from Nc, vc). For C = 90, we set vc equal to the estimates obtained from the NMMAPS reanalysis.28 For C = 15 and C = 20 we took a random sample from the 90 vc NMMAPS estimates. We also set α equal to 0.21 (the pooled NMMAPS estimate for total mortality at lag 128). In summary, each scenario (biased vc*, sample size C, and amount of heterogeneity τ), led to 250 simulated values of βˆ.

Back to Top | Article Outline


For each replicate βˆ, and for each scenario, we calculated the empirical Bayes estimates of τ2 and α, using both vc and vc*, by fitting model (1) and using equation (2). This led to 2 sets of 250 estimates of α, here denoted by â(vc) and â(vc*). Figure 1 shows boxplots of the 250 standardized differences between the 2 α estimates defined as (â(vc) − â(vc*))/SD, where SD is the standard deviation of the 250 estimates of â(vc). Thus, a standardized difference larger than 2 suggests high sensitivity of the pooled estimate to the use of the biased vc* instead of “true” vc.



For all scenarios, the 95% confidence intervals of the standardized differences were always within 2 standard deviations of 0, suggesting that underestimation of vc did not affect â substantially. In 8 scenarios, the distributions of the standardized differences showed more variability, with their maximum absolute differences larger than 2 standard deviations. Seven of those scenarios corresponded to extreme underestimation of 50%. The other scenario was characterized by small sample size, small heterogeneity, and a bias of 30%.

We also investigated the impact of variance underestimation on the NMMAPS national average relative rate estimate. We recalculated the national average relative rate of mortality for a 10 unit increase in PM10 by varying the underestimation parameter k from 0.1 (90% variance underestimation) to 1 (no variance underestimation). Figure 2 shows the NMMAPS national average and heterogeneity estimates as function of the underestimation parameter k. Note that for values of k increasing from 0.1 to 0.6 (variance underestimation from 90 to 40%), τˆ decreased; also, â moved from the un-weighted average of the βˆc(equal to 0.28%; 95% posterior interval equal to .05 – .51) toward the weighted average of the βˆc defined in equation (2) (equal to 0.21%; 95% posterior interval equal to .10 – .31). The latter occurred because when k was very small (large variance underestimation), τˆ2 was large relative to vc* = kvc, and roughly the same weight was assigned to all the city-specific estimates. For values of k increasing from 0.6 to 1 (variance underestimation from 40% to 0%), â was constant because τˆ2 = 0.



In summary, little or no effect was observed when the variance underestimation was less than 40% (k ≥ 0.6) (national average estimate equal to 0.21%). When underestimation was larger than 40% (k < 0.6), â gradually increased with the degree of underestimation toward the un-weighted pooled estimate which is equal to 0.28%. However, for all values of k, the t-ratio âSE(â) remained larger than 2 indicating “statistical significance” of the national average relative rate estimate.

Back to Top | Article Outline


The results of this paper indicate that, in multisite time series studies of air pollution and health, underestimation of the statistical variances vc did not affect the estimate of the pooled effect α substantially. Some sensitivity was observed when the number of sites was small (less than 20), the between-city variability was close to zero, and the underestimation was larger than 40%. In addition, our results suggest that variance underestimation of as much as 40% in the NMMAPS studies would not change the conclusions.

The mathematical results reported in this paper are generalizable to any two-stage normal-normal hierarchical model, and therefore can be applied to any meta-analysis of study results or clinical trials that uses such modeling approach. The simulation results reported in this paper can also be applied to any meta-analysis when the size of the overall parameter α with respect to the heterogeneity parameter τ and the city specific standard deviations, vc, is similar to the ones considered here. However, even in a meta-analysis with very different numerical values for α, τ, and vc, our simulation studies can be easily modified to reflect these other situations.

Our underestimation parameter k, here specified a priori, may depend upon site-specific characteristics. In each city, data characteristics may lead to different autocorrelation structures of the residuals, and therefore to different degrees of adjustment for confounding factors. Therefore, variance underestimation attributable to model mis-specification might be more or less severe in the different cities, leading to different values of k. Our simulation study considers both minor (10%) and major (50%) underestimations of standard errors and therefore should encompass the more severe cases of model mis-specification.

The robustness of air pollution effect estimates to variance underestimation applies only to pooled-effect α. Empirical Bayes estimates of the site-specific relative rates, here denoted by β˜c, are affected by the underestimation of the vc. To see this, consider their definition:

where Sc = vc−1/(τˆ−2 + vc−1) is the shrinkage factor. Note that Sc is not invariant to underestimation of the vc. In this case, underestimation of vc leads to overestimation of Sc and therefore leads to β˜c which rely too heavily on βˆc. Thus, variance underestimation leads to an overestimation of the heterogeneity of the air pollution effects, and therefore to under-shrinkage of the city-specific empirical Bayes estimates toward their overall mean â.

Unfortunately, because the true statistical variances are unknown, the distinction between additive and multiplicative bias is not straightforward. One possible approach is to specify 2 alternative but comparable site-specific regression models and to compare the estimates of vc under the 2 models. For example, Dominici et al27 estimated βˆc and vc under a GLM with natural cubic splines and under a GAM with smoothing splines; they then compared the vc under these 2 modeling approaches. In time series studies of air pollution and health, a more robust estimate of the statistical variance under a GAM can be obtained by using gam.exact.27 In other situations, robust variance estimates can be obtained by using asymptotic theory29 or bootstrap30 or sandwich estimates.31 Robust and model-based variance estimates can then be compared for a small number of sites to gain insight on the form of bias.

The results in this paper are presented using an empirical Bayesian approach to estimation, in which the Bayesian estimate of α is obtained by plugging a point estimate of τ2 into Equation (2). However our results apply also if a fully Bayesian version of the above model is fit with noninformative priors on τ2 and α.

Finally, the results of this paper apply under the assumption that the normal approximation to the likelihood function at the first stage of the hierarchical model is appropriate. Asymptotically, this approximation has an accuracy proportional to the number of days with available data in each city.32 In time series studies of air pollution and health the asymptotic normal approximation is generally accurate; however, additional work is needed to extend such results to distributions other than the normal and to examine the sensitivity of inferences if the normal approximation is not accurate.

Back to Top | Article Outline


Details on the estimation of τ2 and α. The empirical Bayes estimate of α was obtained by first computing the restricted maximum likelihood estimate of τ2, which was then plugged into equation (2). The restricted maximum likelihood estimate of τ2 was obtained by maximizing the following likelihood function

where â2) = [Σcβˆcwc2)−1]/[Σcwc2)−1] and wc2) = vc2. This maximization can be done iteratively using the E-M algorithm.18

Mathematical arguments concerning the robustness of the pooled estimate to variance underestimation under cases 1 and 2. Under scenario 1, we start by showing that additive bias does not affect estimation of α for arbitrary vc. Let τ2* be the maximum likelihood estimate (mle) of τ2 obtained by maximizing the likelihood (3) with vc* = vcb instead of vc. Note that estimation of α as defined in equation (2) depends on vc only through vc + τ2. Therefore, we simply need to show that the estimate for τ2* conditional on vc* is b units more than the estimate for τ2 conditional on vc. In this case, vc + τˆ2 = vc* + τˆ2* and therefore â(τˆ2) is unaffected by underestimation of vc. We define τ2* = τ2 + b and we maximize the likelihood (3) (with vc replaced by vc*) with respect to τ2*. By the invariance property of the maximum likelihood estimates, if τˆ2* is the mle then τˆ2 is also the mle. Therefore, vc + τˆ2 = (vcb) + (τˆ2 + b) = vc* + τˆ2*, τˆ2* = τˆ2 + b.

Under scenario 2, we assume multiplicative bias with equal variances vc = v. Here vc* = k × vc can be rewritten as vc* = vc − (1 − k) × vc. Because vc = v, then (1 − k) × vc = (1 − k) × v = b and therefore vc* = vcb and so the results above for additive bias apply here also.

Back to Top | Article Outline


The authors thank Jonathan M. Samet for comments.

Back to Top | Article Outline


1.Katsouyanni K, Touloumi G, Spix C, et al. Short term effects of ambient sulphur dioxide and particulate matter on mortality in 12 european cities: results from time series data from the APHEA project. Brit Med J. 1997;314:1658–1663.
2.Samet JM, Zeger SL, Dominici F, Dockery D, Schwartz J. The National Morbidity, Mortality, and Air Pollution Study Part I: Methods and Methodological Issues. Cambridge, MA: Health Effects Institute, 2000.
3.Thomas A, Spiegelhalter DJ, Gilks WR. Bugs: A program to perform Bayesian inference using Gibbs sampling. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds. Bayesian Statistics 4. Proceedings of the Fourth Valencia International Meeting. Oxford: Clarendon Press; 1992;837–842.
4.Gilks WR, Richardson S, Spiegelhalter DJ, eds. Markov Chain Monte Carlo in Practice. London: Chapman & Hall, 1996.
5.Burnett R, Krewski D. Air pollution effects of hospital admission rates: A random effects modelling approach. Can J Stat. 1994;22:441–458.
6.Roemer W, Hoek G, Brunekreef B, Kalandidi A, Pekkanen J. Daily variations in air pollution and respiratory health in a multicenter study: the peace project. Eur Respir J. 1998;12:1354–1361.
7.Dominici F, Samet JM, Zeger SL. Combining evidence on air pollution and daily mortality from the twenty largest US cities: A hierarchical modeling strategy (with discussion). J R Stat Soc [Ser A]. 2000;163:263–302.
8.Zanobetti A, Schwartz J, Dockery D. Airborne particles are a risk factor for hospital admissions for heart and lung disease. Environ Health Perspect. 2000;108:1071–1077.
9.Schwartz J. Assessing confounding, effect modification, and thresholds in the associations between ambient particles and daily deaths. Environ Health Perspect. 2000;108:563–568.
10.Dominici F. Air pollution and health: what can we learn from an hierarchical approach? [Invited commentary]. Am J Epidemiol. 2002;1:11–15.
11.Lindley DV, Smith AFM. Bayes estimates for the linear model (with discussion). J R Stat Soc [Ser B]. 1972;34:1–41.
12.Samet JM, Zeger SL, Berhane K. The Association of Mortality and Particulate Air Pollution. Cambridge, MA: Health Effects Institute, 1995.
13.Kelsall J, Samet JM, Zeger SL. Air pollution, and mortality in Philadelphia, 1974–1988. Am J Epidemiol. 1997;146:750–762.
14.Hastie TJ, Tibshirani RJ. Generalized Additive Models. New York: Chapman & Hall, 1990.
15.McCullagh P, Nelder JA. Generalized Linear Models (Second Edition). New York: Chapman & Hall, 1989.
16.Hedges LV, Olkin I. Statistical Methods for Meta-analysis. Orlando, FL: Academic Press, 1985.
17.DerSimonian R, Laird N. Meta-analysis in clinical trials. Control Clin Trials. 1986;7:177–188.
18.Carlin BP, Louis TA. Bayes and Empirical Bayes Methods for Data Analysis. New York: Chapman & Hall, 1996.
19.Morris CN, Normand SL. Hierarchical models for combining information and for meta-analysis. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds. Bayesian Statistics 4. Oxford: Oxford University Press; 2003;321–344.
20.Gelman A, Carlin J, Stern H, Rubin D. Bayesian Data Analysis. London: Chapman & Hall, 1995.
21.Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models. Ann Stat. 1989;17:453–455.
22.Green PJ, Silverman BW. Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach. London: Chapman & Hall, 1994.
23.Ramsay T, Burnett R, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology. 2003;14:18–23.
24.Chambers J, Hastie T. Statistical Models in S. London: Chapman and Hall, 1992.
25.Lumley T, Sheppard L. Time series analyses of air pollution and health: Straining at gnats and swallowing camels? Epidemiology. 2003;14:13–14.
26.Samet J, Dominici F, McDermott A, Zeger S. New problems for an old design: Time series analyses of air pollution and health. Epidemiology. 2003;14:11–12.
27.Dominici F, McDermott A, Hastie T. Improved semi-parametric time series models of air pollution and mortality. Johns Hopkins University, Dept. of Biostatistics, Technical Report, 2003.
28.Dominici F, McDermott A, Zeger SL, Samet JM. Airborne particulate matter and mortality: Time-scale effects in four us cities. Am J Epidemiol. 2003;157:1055–1073.
29.van der Vaart AW. Asymptotic Statistics. New York: Cambridge University Press, 1998.
30.Efron B, Tibshirani RJ. An Introduction to the Bootstrap. New York: Chapman & Hall, 1993.
31.Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
32.Daniels M, Kass R. A note on first-stage approximation in two-stage hierarchical models. Sankhya Series B. 1998;60:19–30.
© 2004 Lippincott Williams & Wilkins, Inc.