In multi-site time series studies of the association of air pollution with mortality and morbidity,^{1,2} site-specific time series data are assembled under a common framework and analyzed with a uniform analytic approach. Hierarchical modeling is an unified approach for combining evidence across studies, quantifying the sources of variability, and assessing effect modification. Because of the development of computational tools that facilitate their implementation,^{3,4} hierarchical models have been recently applied to analysis of multi-site time series data.^{1,5–9} (Dominici^{10} presents a more detailed discussion on the use of hierarchical models in multi-site time series studies of air pollution and health.)

Hierarchical models^{11} for analyses of multi-site time series studies of air pollution and mortality have a multi-stage structure. At the first stage, the association between air pollution and health is described using a site-specific regression model^{7,12,13} which takes into account potential confounding factors such as trend, season, and climate. Generalized Additive Models (GAM)^{14} with nonparametric adjustment for confounding factors for the site (eg, smoothing splines) or Generalized Linear Models (GLM)^{15} with regression splines (eg, natural cubic splines) are generally used to estimate site-specific relative rates βˆ_{c} and their sampling variances *v*_{c}. Here βˆ_{c} denotes the percentage increase in mortality/morbidity per unit increase in level of the air pollutant, and *v*_{c} denotes the statistical uncertainty in βˆ_{c}; *v*_{c} depends on the number of days with available air pollution data, adverse health events, and correlated time varying confounders. At the second stage, the information from multiple sites is combined by assuming that the true city-specific relative rates (β_{c}) have a common mean α (also called the pooled relative rate) and variance τ^{2}; this variance represents the variability across sites of the true relative rates (also called the heterogeneity parameter). Fixed or random effects, empirical Bayesian, or fully Bayesian models^{16–20} are used to estimate α and τ^{2}.

The nature and characteristics of time series studies in air pollution and health make it difficult to estimate health risk while taking into account all sources of uncertainty. First, the variability in the mortality or morbidity time series explained by air pollution is an order of magnitude lower than the variability in the mortality time series explained by weather, trend, and seasonality. Consequently, the estimates of air pollution effects are sensitive to the method of adjustment for confounding factors. Second, to control adequately for confounding, several highly correlated predictors are included in the site-specific regression model. This can make variance estimation unstable, and slow the convergence of fitting algorithms such as the backfitting algorithm in GAM.^{21} Third, because the confounding effects of climate and season are not linear and vary slowly, these need to be modeled using smooth functions such as smoothing splines or regression splines.^{22} Nonlinear modeling increases both the number of nuisance parameters and the computational complexity. In all these cases, a sound and robust assessment of the statistical uncertainty of β_{c} can be hard to obtain, and there is likely to be underestimation of variance due to model mis-specification.

Problems inherent in standard error estimation of air pollution effects have recently been pointed out in the literature. For example, Ramsey et al^{23} reported that the inability of the GAM to take full account of the correlation among nonlinear confounders can lead to underestimation of the standard error of relative rate estimates (see also Chambers and Hastie^{24}^{(pp 303–304)} and commentaries by Lumley and Sheppard^{25} and Samet et al^{26}).

Recently Dominici, McDermott, and Hastie^{27} have developed gam.exact, an extended version of the S-plus function gam used to fit Generalized Additive Models. The S-plus function gam.exact calculates an asymptotically exact covariance matrix of the air pollution effect estimates by properly taking into account the correlation among the nonlinear predictors.

The reanalyses of the National Morbidity, Mortality, and Air Pollution Study (NMMAPS)^{28} empirically confirmed the theoretical results of Ramsey et al^{23} These reanalyses showed that the degree of bias in the standard errors was proportional to the size of the standard errors (a form of multiplicative bias). Variances more robust than the ones obtained from GAM software can be obtained by using standard statistical theory^{29} or by bootstrap^{30} or GEE methods.^{31} However, in time series studies of air pollution and health, such methods might be computationally expensive and “off-the-shelf” statistical software is not always available.

In this paper, we investigate the sensitivity of the pooled estimate αˆ with respect to underestimation of the city-specific statistical variances *v*_{c}. We focus on underestimation (as opposed to overestimation) for 3 reasons. First, underestimation of *v*_{c} is a much more serious problem than overestimation because it leads to less conservative conclusions about the statistical significance of a site-specific association between air pollution and health. Second, underestimation of *v*_{c} is more common than overestimation, as the former generally reflects a failure to take into account 1 or more sources of uncertainty. Third, underestimation of *v*_{c} has been identified as a limitation of the statistical software for the implementation of GAM.^{23,24} However, this limitation has recently been overcome.

We show that the pooled estimate is unaffected by underestimation of *v*_{c} when the bias is additive or when the bias is multiplicative and the statistical variances are equal across cities. Then, by a simulation study, we investigate the case of multiplicative bias and unequal statistical variances. Here we define 3^{3} = 27 scenarios which are a combination of number of cities, magnitude of the bias, and amount of heterogeneity. We then identify under which scenarios the underestimation of *v*_{c} affects the estimate of the pooled relative rate estimate â. We also investigate the impact of variance underestimation on the national average relative rate estimate from the National Morbidity, Mortality, and Air Pollution Study (NMMAPS).

## METHODS

We considered the following 2 stage normal-normal hierarchical model^{11}

where *C* is the total number of sites, and *N*(*a*, *b*) denotes the normal distribution with mean *a* and variance *b*.

We estimated α and its standard error *SE*(â) by using an empirical Bayes approach.^{18} More specifically, we first computed the restricted maximum likelihood estimate τˆ^{2}, and then we calculated α and *SE* (â) conditional on τˆ^{2}. Details are in the appendix. The empirical Bayes estimate of α and its standard error are defined below:

where *w*_{c} = *v*_{c} + τˆ^{2}. Let *v*_{c} be the “true” statistical variance, and let vc* be the “estimated” statistical variance under a miss-specified model. We assumed vc* < *v*_{c}, and we considered 3 cases of underestimation of *v*_{c}:

- additive bias: vc* =
*v*_{c} − *b*, for arbitrary *v*_{c} and *b* > 0;
- multiplicative bias, equal variances: vc* =
*k* × *v*_{c}, *v*_{c} ≡ *v*; *v* for arbitrary *v* and 0 < *k* < 1;
- multiplicative bias, unequal variances: vc* =
*k* × *v*_{c}, for arbitrary *v*_{c} and 0 < *k* < 1.

In the first and second case, underestimation of the variance does not affect the pooled estimate of α. The mathematical proof is given in the appendix.

In the third case, we explored the impact of multiplicative bias on the estimation of α by a simulation study. We considered 3^{3} = 27 scenarios consisting of 3 components:

- underestimation of 50%, 30%, and 10% which corresponds to
*k* = 0.5, 0.7, 0.9;
- number of sites:
*C* = 15, 20, 90;
- amount of heterogeneity: τ = 0.05, 0.5, 1 corresponding, respectively, to small, medium, and large between-city standard deviations.

For each scenario, we generated 250 replicates of βˆ = (βˆ_{1},…, βˆ_{C}) from model(1). More specifically, we chose values for α, τ^{2}, and *v*_{c} and we sequentially generated βˆ_{c} from *N*(α, τ^{2}), and then βˆ_{c} from *N*(β_{c}, *v*_{c}). For *C* = 90, we set *v*_{c} equal to the estimates obtained from the NMMAPS reanalysis.^{28} For *C* = 15 and *C* = 20 we took a random sample from the 90 *v*_{c} NMMAPS estimates. We also set α equal to 0.21 (the pooled NMMAPS estimate for total mortality at lag 1^{28}). In summary, each scenario (biased vc*, sample size *C*, and amount of heterogeneity τ), led to 250 simulated values of βˆ.

## RESULTS

For each replicate βˆ, and for each scenario, we calculated the empirical Bayes estimates of τ^{2} and α, using both *v*_{c} and vc*, by fitting model (1) and using equation (2). This led to 2 sets of 250 estimates of α, here denoted by â(*v*_{c}) and â(vc*). Figure 1 shows boxplots of the 250 standardized differences between the 2 α estimates defined as (â(*v*_{c}) − â(vc*))/*SD*, where *SD* is the standard deviation of the 250 estimates of â(*v*_{c}). Thus, a standardized difference larger than 2 suggests high sensitivity of the pooled estimate to the use of the biased vc* instead of “true” *v*_{c}.

For all scenarios, the 95% confidence intervals of the standardized differences were always within 2 standard deviations of 0, suggesting that underestimation of *v*_{c} did not affect â substantially. In 8 scenarios, the distributions of the standardized differences showed more variability, with their maximum absolute differences larger than 2 standard deviations. Seven of those scenarios corresponded to extreme underestimation of 50%. The other scenario was characterized by small sample size, small heterogeneity, and a bias of 30%.

We also investigated the impact of variance underestimation on the NMMAPS national average relative rate estimate. We recalculated the national average relative rate of mortality for a 10 unit increase in *PM*_{10} by varying the underestimation parameter *k* from 0.1 (90% variance underestimation) to 1 (no variance underestimation). Figure 2 shows the NMMAPS national average and heterogeneity estimates as function of the underestimation parameter *k*. Note that for values of *k* increasing from 0.1 to 0.6 (variance underestimation from 90 to 40%), τˆ decreased; also, â moved from the un-weighted average of the βˆ_{c}(equal to 0.28%; 95% posterior interval equal to .05 – .51) toward the weighted average of the βˆ_{c} defined in equation (2) (equal to 0.21%; 95% posterior interval equal to .10 – .31). The latter occurred because when *k* was very small (large variance underestimation), τˆ^{2} was large relative to vc* = *kv*_{c}, and roughly the same weight was assigned to all the city-specific estimates. For values of *k* increasing from 0.6 to 1 (variance underestimation from 40% to 0%), â was constant because τˆ^{2} = 0.

In summary, little or no effect was observed when the variance underestimation was less than 40% (*k* ≥ 0.6) (national average estimate equal to 0.21%). When underestimation was larger than 40% (*k* < 0.6), â gradually increased with the degree of underestimation toward the un-weighted pooled estimate which is equal to 0.28%. However, for all values of *k*, the t-ratio â*SE*(â) remained larger than 2 indicating “statistical significance” of the national average relative rate estimate.

## DISCUSSION

The results of this paper indicate that, in multisite time series studies of air pollution and health, underestimation of the statistical variances *v*_{c} did not affect the estimate of the pooled effect α substantially. Some sensitivity was observed when the number of sites was small (less than 20), the between-city variability was close to zero, and the underestimation was larger than 40%. In addition, our results suggest that variance underestimation of as much as 40% in the NMMAPS studies would not change the conclusions.

The mathematical results reported in this paper are generalizable to any two-stage normal-normal hierarchical model, and therefore can be applied to any meta-analysis of study results or clinical trials that uses such modeling approach. The simulation results reported in this paper can also be applied to any meta-analysis when the size of the overall parameter α with respect to the heterogeneity parameter τ and the city specific standard deviations, *v*_{c}, is similar to the ones considered here. However, even in a meta-analysis with very different numerical values for α, τ, and *v*_{c}, our simulation studies can be easily modified to reflect these other situations.

Our underestimation parameter *k*, here specified a priori, may depend upon site-specific characteristics. In each city, data characteristics may lead to different autocorrelation structures of the residuals, and therefore to different degrees of adjustment for confounding factors. Therefore, variance underestimation attributable to model mis-specification might be more or less severe in the different cities, leading to different values of *k*. Our simulation study considers both minor (10%) and major (50%) underestimations of standard errors and therefore should encompass the more severe cases of model mis-specification.

The robustness of air pollution effect estimates to variance underestimation applies only to pooled-effect α. Empirical Bayes estimates of the site-specific relative rates, here denoted by β˜_{c}, are affected by the underestimation of the *v*_{c}. To see this, consider their definition:

where *S*_{c} = *v*_{c}^{−1}/(τˆ^{−2} + v_{c}^{−1}) is the shrinkage factor. Note that *S*_{c} is not invariant to underestimation of the *v*_{c}. In this case, underestimation of *v*_{c} leads to overestimation of *S*_{c} and therefore leads to β˜_{c} which rely too heavily on βˆ_{c}. Thus, variance underestimation leads to an overestimation of the heterogeneity of the air pollution effects, and therefore to under-shrinkage of the city-specific empirical Bayes estimates toward their overall mean â.

Unfortunately, because the true statistical variances are unknown, the distinction between additive and multiplicative bias is not straightforward. One possible approach is to specify 2 alternative but comparable site-specific regression models and to compare the estimates of *v*_{c} under the 2 models. For example, Dominici et al^{27} estimated βˆ_{c} and *v*_{c} under a GLM with natural cubic splines and under a GAM with smoothing splines; they then compared the *v*_{c} under these 2 modeling approaches. In time series studies of air pollution and health, a more robust estimate of the statistical variance under a GAM can be obtained by using gam.exact.^{27} In other situations, robust variance estimates can be obtained by using asymptotic theory^{29} or bootstrap^{30} or sandwich estimates.^{31} Robust and model-based variance estimates can then be compared for a small number of sites to gain insight on the form of bias.

The results in this paper are presented using an empirical Bayesian approach to estimation, in which the Bayesian estimate of α is obtained by plugging a point estimate of τ^{2} into Equation (2). However our results apply also if a fully Bayesian version of the above model is fit with noninformative priors on τ^{2} and α.

Finally, the results of this paper apply under the assumption that the normal approximation to the likelihood function at the first stage of the hierarchical model is appropriate. Asymptotically, this approximation has an accuracy proportional to the number of days with available data in each city.^{32} In time series studies of air pollution and health the asymptotic normal approximation is generally accurate; however, additional work is needed to extend such results to distributions other than the normal and to examine the sensitivity of inferences if the normal approximation is not accurate.

### Appendix

**Details on the estimation of τ**^{2} and α**.** The empirical Bayes estimate of α was obtained by first computing the restricted maximum likelihood estimate of τ^{2}, which was then plugged into equation (2). The restricted maximum likelihood estimate of τ^{2} was obtained by maximizing the following likelihood function

where *â*(τ^{2}) = [Σ_{c}βˆ_{c}w_{c}(τ^{2})^{−1}]/[Σ_{c}w_{c}(τ^{2})^{−1}] and *w*_{c}(τ^{2}) = v_{c}+τ^{2}. This maximization can be done iteratively using the E-M algorithm.^{18}

**Mathematical arguments concerning the robustness of the pooled estimate to variance underestimation under cases 1 and 2.** Under scenario 1, we start by showing that additive bias does not affect estimation of α for arbitrary *v*_{c}. Let τ^{2}* be the maximum likelihood estimate (mle) of τ^{2} obtained by maximizing the likelihood (3) with vc* = *v*_{c} − *b* instead of *v*_{c}. Note that estimation of α as defined in equation (2) depends on *v*_{c} only through *v*_{c} + τ^{2}. Therefore, we simply need to show that the estimate for τ^{2}* conditional on vc* is *b* units more than the estimate for τ^{2} conditional on *v*_{c}. In this case, *v*_{c} + τˆ^{2} = vc* + τˆ^{2}* and therefore â(τˆ^{2}) is unaffected by underestimation of *v*_{c}. We define τ^{2}* = τ^{2} + *b* and we maximize the likelihood (3) (with *v*_{c} replaced by vc*) with respect to τ^{2}*. By the invariance property of the maximum likelihood estimates, if τˆ^{2}* is the mle then τˆ^{2} is also the mle. Therefore, *v*_{c} + τˆ^{2} = (*v*_{c} − *b*) + (τˆ^{2} + *b*) = vc* + τˆ^{2}*, τˆ^{2}* = τˆ^{2} + *b*.

Under scenario 2, we assume multiplicative bias with equal variances *v*_{c} = *v*. Here vc* = *k* × *v*_{c} can be rewritten as vc* = *v*_{c} − (1 − *k*) × *v*_{c}. Because *v*_{c} = *v*, then (1 − *k*) × *v*_{c} = (1 − *k*) × *v* = *b* and therefore vc* = *v*_{c} − *b* and so the results above for additive bias apply here also.

## ACKNOWLEDGMENTS

The authors thank Jonathan M. Samet for comments.

## REFERENCES

1.Katsouyanni K, Touloumi G, Spix C, et al. Short term effects of ambient sulphur dioxide and particulate matter on mortality in 12 european cities: results from time series data from the APHEA project.

*Brit Med J*. 1997;314:1658–1663.

2.Samet JM, Zeger SL, Dominici F, Dockery D, Schwartz J.

*The National Morbidity, Mortality, and Air Pollution Study Part I: Methods and Methodological Issues*. Cambridge, MA: Health Effects Institute, 2000.

3.Thomas A, Spiegelhalter DJ, Gilks WR. Bugs: A program to perform Bayesian inference using Gibbs sampling. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds. Bayesian Statistics 4. Proceedings of the Fourth Valencia International Meeting. Oxford: Clarendon Press; 1992;837–842.

4.Gilks WR, Richardson S, Spiegelhalter DJ, eds.

*Markov Chain Monte Carlo in Practice*. London: Chapman & Hall, 1996.

5.Burnett R, Krewski D. Air pollution effects of hospital admission rates: A random effects modelling approach.

*Can J Stat*. 1994;22:441–458.

6.Roemer W, Hoek G, Brunekreef B, Kalandidi A, Pekkanen J. Daily variations in air pollution and respiratory health in a multicenter study: the peace project.

*Eur Respir J*. 1998;12:1354–1361.

7.Dominici F, Samet JM, Zeger SL. Combining evidence on air pollution and daily mortality from the twenty largest US cities: A hierarchical modeling strategy (with discussion).

*J R Stat Soc [Ser A]*. 2000;163:263–302.

8.Zanobetti A, Schwartz J, Dockery D. Airborne particles are a risk factor for hospital admissions for heart and lung disease.

*Environ Health Perspect*. 2000;108:1071–1077.

9.Schwartz J. Assessing confounding, effect modification, and thresholds in the associations between ambient particles and daily deaths.

*Environ Health Perspect*. 2000;108:563–568.

10.Dominici F. Air pollution and health: what can we learn from an hierarchical approach? [Invited commentary].

*Am J Epidemiol*. 2002;1:11–15.

11.Lindley DV, Smith AFM. Bayes estimates for the linear model (with discussion).

*J R Stat Soc [Ser B]*. 1972;34:1–41.

12.Samet JM, Zeger SL, Berhane K.

*The Association of Mortality and Particulate Air Pollution*. Cambridge, MA: Health Effects Institute, 1995.

13.Kelsall J, Samet JM, Zeger SL. Air pollution, and mortality in Philadelphia, 1974–1988.

*Am J Epidemiol*. 1997;146:750–762.

14.Hastie TJ, Tibshirani RJ.

*Generalized Additive Models*. New York: Chapman & Hall, 1990.

15.McCullagh P, Nelder JA.

*Generalized Linear Models (Second Edition)*. New York: Chapman & Hall, 1989.

16.Hedges LV, Olkin I.

*Statistical Methods for Meta-analysis*. Orlando, FL: Academic Press, 1985.

17.DerSimonian R, Laird N. Meta-analysis in clinical trials.

*Control Clin Trials*. 1986;7:177–188.

18.Carlin BP, Louis TA.

*Bayes and Empirical Bayes Methods for Data Analysis*. New York: Chapman & Hall, 1996.

19.Morris CN, Normand SL. Hierarchical models for combining information and for meta-analysis. In: Bernardo JM, Berger JO, Dawid AP, Smith AFM, eds.

*Bayesian Statistics 4*. Oxford: Oxford University Press; 2003;321–344.

20.Gelman A, Carlin J, Stern H, Rubin D.

*Bayesian Data Analysis*. London: Chapman & Hall, 1995.

21.Buja A, Hastie T, Tibshirani R. Linear smoothers and additive models.

*Ann Stat*. 1989;17:453–455.

22.Green PJ, Silverman BW.

*Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach*. London: Chapman & Hall, 1994.

23.Ramsay T, Burnett R, Krewski D. The effect of concurvity in generalized additive models linking mortality to ambient particulate matter.

*Epidemiology*. 2003;14:18–23.

24.Chambers J, Hastie T.

*Statistical Models in S*. London: Chapman and Hall, 1992.

25.Lumley T, Sheppard L. Time series analyses of air pollution and health: Straining at gnats and swallowing camels?

*Epidemiology*. 2003;14:13–14.

26.Samet J, Dominici F, McDermott A, Zeger S. New problems for an old design: Time series analyses of air pollution and health.

*Epidemiology*. 2003;14:11–12.

27.Dominici F, McDermott A, Hastie T. Improved semi-parametric time series models of air pollution and mortality. Johns Hopkins University, Dept. of Biostatistics,

*Technical Report*, 2003.

28.Dominici F, McDermott A, Zeger SL, Samet JM. Airborne particulate matter and mortality: Time-scale effects in four us cities.

*Am J Epidemiol*. 2003;157:1055–1073.

29.van der Vaart AW.

*Asymptotic Statistics.* New York: Cambridge University Press, 1998.

30.Efron B, Tibshirani RJ.

*An Introduction to the Bootstrap*. New York: Chapman & Hall, 1993.

31.Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models.

*Biometrika*. 1986;73:13–22.

32.Daniels M, Kass R. A note on first-stage approximation in two-stage hierarchical models.

*Sankhya Series B*. 1998;60:19–30.