In Eq 1, coefficient β2 takes the value 0.1 in scenarios with confounding due to unobserved variables and 0 in scenarios without such confounding. To assess the effect of PM10 autocorrelation on the estimates, we created a new time series without autocorrelation by randomly permuting the original PM10 series (that is, randomly reordering the observed values).
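As a minimal illustration of why random permutation removes autocorrelation while preserving the distribution of the exposure, the Python sketch below permutes a series and compares lag-1 autocorrelations before and after. The AR(1) generator (`simulate_ar1`, with a hypothetical mean, standard deviation and φ = 0.8) is only a stand-in for the real Barcelona PM10 data, and all function names are illustrative.

```python
import random

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def simulate_ar1(n, phi, rng, mu=40.0, sd=10.0):
    """Hypothetical stand-in for a daily PM10 series: a stationary AR(1) process."""
    dev = [rng.gauss(0.0, sd)]
    innov_sd = sd * (1 - phi ** 2) ** 0.5  # keeps the marginal SD equal to sd
    for _ in range(n - 1):
        dev.append(phi * dev[-1] + rng.gauss(0.0, innov_sd))
    return [mu + v for v in dev]

def permute_series(x, rng):
    """Random reassortment of the observed values: same values, no serial dependence."""
    shuffled = list(x)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(1)
pm10 = simulate_ar1(2000, phi=0.8, rng=rng)
pm10_perm = permute_series(pm10, rng)
print(round(lag1_autocorrelation(pm10), 2), round(lag1_autocorrelation(pm10_perm), 2))
```

The permuted series contains exactly the same daily values, so only the serial structure changes; this is what allows the effect of autocorrelation to be isolated.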
On the basis of the time series generated by Eq 1, data were structured according to several designs. First, we used a symmetric case-crossover design with 2 control days, one 7 days before and the other 7 days after the event,8 excluding subjects whose events fell in the first or last 7 days of the series to avoid selection bias.12 Second, we employed a semisymmetric case-crossover design,13 in which a single control day was chosen at random 7 days before or after the event (Table 1). When conditional logistic regression was applied to this design, weighting was performed to avoid the influence of subjects whose events occurred near the beginning or end of the observation period, for whom only one of the 2 potential control times is observable.13 In the semisymmetric design, a stratum is formed for each case by choosing a control time, with equal probability, before or after the referent time. This design calls for a modified likelihood function that yields a weighted conditional logistic regression,13 but this procedure is difficult to implement with standard software. We therefore reproduced the effect of weighting by means of sampling, so that standard conditional logistic regression could be used: where a case in a given stratum had a weight of 0.5, a second control was added on the same day as the case; and where a control in a given stratum had a weight of 0.5, a second case was added on the same day as the first case.
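The referent selection for the symmetric and semisymmetric designs can be sketched in Python as follows. Days are indexed 0, 1, ..., n_days − 1; the function names are hypothetical. One assumption is ours and is not taken from the text: when only one of the two candidate days lies inside the series, the semisymmetric sketch simply uses that day. The sampling-based weighting described above is not reproduced here.

```python
import random

LAG = 7  # control days lie 7 days before and/or after the event day

def symmetric_referents(event_day, n_days, lag=LAG):
    """Symmetric design: controls at t-7 and t+7.
    Events in the first or last `lag` days of the series are excluded (returns None)."""
    if event_day < lag or event_day > n_days - 1 - lag:
        return None
    return [event_day - lag, event_day + lag]

def semisymmetric_referent(event_day, n_days, rng, lag=LAG):
    """Semisymmetric design: one control chosen with equal probability at t-7 or t+7.
    Assumption: if only one candidate lies inside the series, that day is used."""
    candidates = [d for d in (event_day - lag, event_day + lag) if 0 <= d < n_days]
    return rng.choice(candidates)

rng = random.Random(2024)
print(symmetric_referents(50, 365), semisymmetric_referent(50, 365, rng))
```

For an event on day 50 of a 365-day series, the symmetric design returns days 43 and 57, whereas the semisymmetric design returns one of them at random.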
Third, we applied a time-stratified case-crossover design.5,6 Calendar months were taken as strata, and the controls for each case day were all other days in the same stratum (month) falling on the same day of the week as the referent day. For example, if the case day was Saturday 9 December 1995, the control days were the remaining Saturdays in December 1995.
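A minimal Python sketch of the time-stratified referent rule (same calendar month, same weekday, excluding the case day itself) reproduces the example above. The function name is illustrative; only the standard `calendar` and `datetime` modules are used.

```python
import calendar
import datetime

def time_stratified_referents(case_day):
    """Controls: all other days in the same calendar month with the same weekday."""
    _, days_in_month = calendar.monthrange(case_day.year, case_day.month)
    referents = []
    for d in range(1, days_in_month + 1):
        day = datetime.date(case_day.year, case_day.month, d)
        if d != case_day.day and day.weekday() == case_day.weekday():
            referents.append(day)
    return referents

refs = time_stratified_referents(datetime.date(1995, 12, 9))
print([d.day for d in refs])  # [2, 16, 23, 30]
```

Because the strata are fixed calendar months rather than windows centered on the event, the number of controls (3 or 4 here) depends on how the weekdays fall within the month.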
Fourth, we used a full-symmetric case-crossover design, in which the 7 days before and the 7 days after the event were chosen as the control period (Figure available with the electronic version of this article). Finally, we examined a full semisymmetric case-crossover design, in which a control period of 14 days, either before or after the event, was chosen at random. A 14-day control period was used so that statistical power would be comparable with that of the full-symmetric design. The S-Plus syntax for implementing these sampling schemes is available from the corresponding author on request.
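The two "full" designs differ only in how the 14 control days are laid out around the event, as the following Python sketch shows. It assumes, for simplicity, that the series is long enough on both sides of the event (edge handling is omitted); the function names are hypothetical.

```python
import random

def full_symmetric_referents(event_day, lag=7):
    """All 7 days before and all 7 days after the event (14 controls)."""
    return [event_day + k for k in range(-lag, lag + 1) if k != 0]

def full_semisymmetric_referents(event_day, rng, window=14):
    """A 14-day control period chosen with equal probability before or after the event.
    Assumes the series extends at least `window` days on both sides of the event."""
    if rng.random() < 0.5:
        return [event_day - k for k in range(window, 0, -1)]  # days t-14 .. t-1
    return [event_day + k for k in range(1, window + 1)]      # days t+1 .. t+14

rng = random.Random(11)
print(full_symmetric_referents(100))
print(full_semisymmetric_referents(100, rng))
```

In both cases each subject contributes 15 days of follow-up (the event day plus 14 controls), which is why the two designs have comparable statistical power.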
Data were analyzed using several methods. First, we conducted an ecological time-series analysis using generalized additive models with Poisson response, with smoothing splines (7 degrees of freedom per year) as smoothers.6 For model estimation, the convergence criteria were modified as proposed by Dominici et al,6 and standard errors were calculated with the gam.exact function.22
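To make the Poisson time-series approach concrete, the Python sketch below fits a log-linear Poisson model by iteratively reweighted least squares (IRLS) to simulated daily counts. It is not the authors' GAM: the smoothing splines are replaced by annual sine/cosine terms and a linear trend, the data (about 22 events/day, a seasonal confounder correlated with PM10, and a hypothetical coefficient of 0.002) are invented for illustration, and only model-based standard errors are computed (not gam.exact).

```python
import numpy as np

def fit_poisson_irls(X, y, max_iter=100, tol=1e-10):
    """Poisson log-linear regression by iteratively reweighted least squares.
    Returns coefficients and model-based standard errors."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # X'W with W = diag(mu)
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    mu = np.exp(X @ beta)
    cov = np.linalg.inv((X.T * mu) @ X)  # inverse Fisher information
    return beta, np.sqrt(np.diag(cov))

rng = np.random.default_rng(7)
n = 3 * 365
t = np.arange(n)
season = np.sin(2 * np.pi * t / 365.25)          # hypothetical seasonal confounder
pm10 = 40 + 10 * season + rng.normal(0, 15, n)   # exposure correlated with the confounder
y = rng.poisson(np.exp(np.log(22) + 0.002 * pm10 + 0.1 * season))

# Fourier terms plus a linear trend stand in for the smoothing splines of time
X = np.column_stack([
    np.ones(n), pm10,
    np.sin(2 * np.pi * t / 365.25), np.cos(2 * np.pi * t / 365.25),
    t / n,
])
beta, se = fit_poisson_irls(X, y)
print(beta[1], se[1])
```

Here confounding by season is controlled through functions of time included in the model; the case-crossover designs described above achieve the same control by design instead.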
Data structured according to the case-crossover designs were analyzed as matched case–control studies, using exact conditional logistic regression23 generalized to more than one control period, without weighting. We also analyzed these data with a proportional hazards model allowing for recurrent events and time-dependent explanatory variables, using the Andersen–Gill approach to proportional hazards regression.24–26 This approach treats every observation for every individual as an independent unit of analysis, conditional on the explanatory variables. Under these conditions there is no guarantee that the estimates are efficient; consequently, standard errors were estimated robustly using grouped jackknife estimates.26,27
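The conditional likelihood underlying the matched case–control analysis can be sketched in Python for a single exposure and matched sets of one case and any number of controls. Note that this is the usual (asymptotic) conditional logistic regression maximized by Newton–Raphson, not the exact method cited above; the function name and data layout are hypothetical.

```python
import math

def conditional_logit_1d(strata, max_iter=50, tol=1e-12):
    """Conditional logistic regression for one exposure, one case per matched set.
    Each stratum is (case_exposure, [control_exposures, ...]).
    Maximizes l(b) = sum[b * x_case - log(sum_j exp(b * x_j))] by Newton-Raphson,
    where j runs over the case and its controls."""
    b = 0.0
    info = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for x_case, controls in strata:
            xs = [x_case] + list(controls)
            w = [math.exp(b * x) for x in xs]
            s = sum(w)
            mean_x = sum(wi * x for wi, x in zip(w, xs)) / s
            var_x = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs)) / s
            score += x_case - mean_x   # first derivative contribution
            info += var_x              # minus second derivative contribution
        step = score / info
        b += step
        if abs(step) < tol:
            break
    # standard error from the information evaluated at the last iterate
    return b, 1.0 / math.sqrt(info)

# 1:1 pairs with binary exposure: 6 pairs case-exposed only, 3 pairs control-exposed only,
# plus concordant pairs, which carry no information
strata = [(1, [0])] * 6 + [(0, [1])] * 3 + [(1, [1])] * 5 + [(0, [0])] * 4
print(conditional_logit_1d(strata))
```

For matched pairs with a binary exposure the estimate reduces to the familiar log of the ratio of discordant pairs, here log(6/3) = log 2, with standard error √(1/6 + 1/3).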
Generalized linear mixed models (GLMMs),28–30 an extension of generalized linear models,31 were also applied to allow for additional sources of variability arising from unobservable random effects. GLMMs describe the relationship between a response variable and covariates in data grouped by one or more factors (in case-crossover studies, the groups are the individuals) and are therefore well suited to analyzing longitudinal, repeated-measures or multilevel data. The GLMMs were estimated using the penalized pseudo-likelihood method.28 Although the parameter estimators are consistent and efficient,29 the estimated variance of the fixed effects may be slightly biased.30
We constructed a binary-response GLMM including the intercept, exposure and individual trend as fixed effects, and the intercept as a random effect. Individual trend was defined as the calendar time of follow-up for each individual, taking the value 1 on the individual's first day of follow-up, 2 on the second, and so on up to 15. Including the intercept as a random effect allows baseline heterogeneity among subjects to be controlled for and, in a case-crossover design of acute effects of environmental exposures, could also allow trend and seasonality to be controlled for. Individual trend was included as a fixed effect to remove any possible confounding due to exposure trend. Similarly, to remove the influence of exposure autocorrelation, we evaluated the inclusion of a first-order autoregressive (AR(1)) correlation structure.
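The longitudinal layout implied by this model can be sketched in Python: each subject contributes one row per day of follow-up, with a binary response (1 on the event day), the exposure on that day, and the individual trend running from 1 to 15. This is only a data-preparation sketch under our own naming (`longitudinal_records` and the field names are hypothetical); the model itself was fitted with glmmPQL in S-Plus and is not reproduced here.

```python
def longitudinal_records(subject_id, event_day, control_days, exposure):
    """One row per day of follow-up (event day plus its control days), in calendar order.
    'trend' is the individual trend: 1 on the first day of follow-up, then 2, 3, ...
    'y' is 1 on the event day and 0 on control days."""
    days = sorted([event_day] + list(control_days))
    return [
        {"id": subject_id, "day": d, "trend": k, "y": int(d == event_day), "pm10": exposure[d]}
        for k, d in enumerate(days, start=1)
    ]

exposure = [float(d) for d in range(40)]            # hypothetical daily PM10 values, indexed by day
controls = [d for d in range(13, 28) if d != 20]    # full-symmetric referents for an event on day 20
rows = longitudinal_records(1, 20, controls, exposure)
for row in rows[6:9]:
    print(row)
```

Stacking these rows over all subjects gives the data for a binary GLMM with fixed effects for the intercept, pm10 and trend, and a random intercept grouped by id.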
Using the S-Plus 6.0 statistical software package (Insightful, Seattle, WA), we generated and analyzed 1000 replicates for each scenario. To ensure that the results of the different scenarios were comparable, we used the same set of random seeds (the replicate number, ie, 1 to 1000) in all scenarios. The glmmPQL function was obtained from Venables and Ripley's MASS library (4th ed.).32
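Seeding each replicate with its replicate number is a form of common random numbers: for a given replicate, every scenario sees identical pseudo-random draws, so differences between scenarios reflect the scenario settings rather than simulation noise. The Python sketch below illustrates the idea with two hypothetical one-line scenarios that differ only in a coefficient of 0.1 versus 0; it uses Python's generator, not the S-Plus one.

```python
import random

def run_scenario(scenario, n_replicates=1000):
    """Run one simulation scenario, seeding replicate r with r (1..n_replicates),
    so that every scenario sees the same pseudo-random draws for a given replicate."""
    results = []
    for r in range(1, n_replicates + 1):
        rng = random.Random(r)  # seed = replicate number, identical across scenarios
        results.append(scenario(rng))
    return results

# Two hypothetical scenarios that differ only in the confounder coefficient (0.1 vs 0)
with_conf = run_scenario(lambda rng: rng.gauss(0, 1) + 0.1, n_replicates=5)
without_conf = run_scenario(lambda rng: rng.gauss(0, 1) + 0.0, n_replicates=5)
print([round(a - b, 10) for a, b in zip(with_conf, without_conf)])
```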
Expression of Results
For each simulation we calculated: (1) the percentage mean bias relative to the true value of the coefficient, (Ax − β) · 100/β; (2) the standard deviation of the coefficients, which allowed us to approximate the dispersion and instability of the estimates of Ax; (3) the percentage coverage, ie, the percentage of replicates whose 95% confidence interval (CI) includes the true value of the coefficient. This measure complements bias: even when mean bias is low, coverage may be small if the coefficients are estimated very imprecisely (large dispersion). Coverage also indicates whether the standard errors of the coefficients have been estimated correctly; ideally, it should be equal to or greater than the nominal 95%; and (4) the percentage of replicates that rejected the null hypothesis. We complemented coverage with this indicator of statistical power because a very high coverage can be attained with overestimated standard errors, which produce wide 95% confidence intervals and consequently low statistical power. As with coverage, the best model is the one with the greatest statistical power.
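The four summary measures can be computed from the replicate estimates and their standard errors as in the Python sketch below. It assumes Wald-type 95% intervals (estimate ± 1.96 SE) and a two-sided Wald test of β = 0; the function name is illustrative. Because percentage bias divides by β, it is undefined when the true coefficient is 0, and the sketch returns None in that case.

```python
import statistics

def summarize_replicates(estimates, std_errors, beta_true, z=1.96):
    """Summary measures over simulation replicates: percentage mean bias,
    SD of the estimates, percentage coverage of 95% CIs, and percentage of
    replicates rejecting H0: beta = 0 (statistical power)."""
    mean_est = statistics.fmean(estimates)
    bias_pct = None if beta_true == 0 else (mean_est - beta_true) * 100 / beta_true
    covered = sum(
        1 for b, se in zip(estimates, std_errors) if b - z * se <= beta_true <= b + z * se
    )
    rejected = sum(1 for b, se in zip(estimates, std_errors) if abs(b / se) > z)
    n = len(estimates)
    return {
        "bias_pct": bias_pct,
        "sd": statistics.stdev(estimates),
        "coverage_pct": 100 * covered / n,
        "power_pct": 100 * rejected / n,
    }

print(summarize_replicates([0.002, 0.003, 0.001, 0.004], [0.0005] * 4, beta_true=0.002))
```

In this toy example every replicate rejects the null hypothesis, yet only one interval in four covers the true value, which is why coverage and power are reported together.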
Figures 1 and 2 show the simulation conditions. Figure 1A shows the PM10 series used as the independent variable to generate the replicates, and Figure 1B the confounding variable, with trend and seasonality. Figure 2A plots the short-lag autocorrelations of PM10 (lags of 1–30 days) and Figure 2B the long-lag autocorrelations (lags of 28–364 days, at 28-day intervals). In Barcelona, PM10 showed a statistically significant negative trend (P < 0.001) of −5.96 × 10−3. The confounding variable had a correlation of 0.3670 with PM10.
Table 1 shows that case-crossover designs (symmetric, semisymmetric or time-stratified) analyzed with conditional logistic regression achieved coverage of around 95%, though with a statistical efficiency that in some cases was less than half that of time-series studies with Poisson regression (23.5% vs 61.4%). Biases in the symmetric and time-stratified designs were larger than in the semisymmetric design and appeared somewhat sensitive to seasonal confounding. When case-crossover designs were analyzed with survival models, they did not appear to offer any advantage over matched case–control models (data not shown). By contrast, analyzing the full-symmetric or full semisymmetric design as longitudinal data with GLMM generally yielded high coverage, statistical power much higher than that of conditional logistic regression and of time-series studies with Poisson regression, and low biases; for example, when the mean number of events was high (22 events/day), full semisymmetric case-crossover designs analyzed with GLMM yielded a bias of 7.9%. Furthermore, when the mean daily number of events was low (2/day), full semisymmetric case-crossover designs analyzed as longitudinal studies were clearly superior to all the other models in terms of bias, coverage and statistical efficiency.
Table 2 (available with the electronic version of this article) shows the influence of effect magnitude (coefficients from 0 to 0.005) and exposure autocorrelation on the estimates of the different models. Semisymmetric case-crossover designs and time-series studies with Poisson regression appeared to be least affected by the magnitude of the pollutant effect. In general, case-crossover designs tended to show bias at low coefficient values, except for the semisymmetric design. Whereas full semisymmetric case-crossover designs analyzed as longitudinal data showed moderate biases and optimal coverage for coefficient values below 0.003, bias increased and coverage decreased for larger coefficients. When the exposure series without autocorrelation was used, bias decreased across all case-crossover designs; the effect was clearest in case-crossover designs analyzed as longitudinal data, in which bias disappeared for large coefficients.
The results of this simulation study show that longitudinal approaches applied to case-crossover designs may be useful for analyzing the acute effects of environmental exposures. Specifically, when the full semisymmetric case-crossover design is analyzed with generalized linear mixed models in scenarios with pollutant effect magnitudes similar to those reported in the literature,33 it displays optimal confidence-interval coverage and high statistical efficiency (much higher than that of case-crossover designs analyzed with conditional logistic regression). In scenarios with magnitudes greater than the real values, however, the effect is underestimated and coverage decreases, possibly because of the autocorrelation present in the exposure variable.
The greatest advantage of case-crossover designs over time-series studies with Poisson regression is that functions of time are not needed to control for the effect of unmeasured confounding variables. Instead, any possible confounders, known and unknown, are controlled for automatically by design. This in turn avoids having to choose degrees of freedom for smoothing functions and does not raise problems of concurvity. Bateson and Schwartz8 note that a price must be paid if case-crossover studies analyzed with conditional logistic regression are to control for trend and seasonality by design and also to investigate potential effect modifiers at the individual level: the relative efficiency of semisymmetric case-crossover designs analyzed with conditional logistic regression is only 66% of that of time-series studies with Poisson regression.8 In our simulations, in which underestimation of standard errors due to concurvity had already been eliminated, time-series studies with Poisson regression were up to twice as efficient as case-crossover designs analyzed with conditional logistic regression. Case-crossover designs analyzed as longitudinal data, however, can be even more efficient than time-series studies with Poisson regression; analyzing case-crossover data in this way would therefore secure these advantages without paying the price in efficiency.
The principal limitation of the case-crossover design analyzed with generalized linear mixed models is the model's sensitivity to autocorrelation in the pollutant variable. While the effect of autocorrelation on estimates from case-crossover designs analyzed with conditional logistic regression has been studied in depth in previous articles,5,34 this effect is not yet known when such designs are analyzed as longitudinal studies. Our data indicate that longitudinal case-crossover designs are more sensitive to autocorrelation than case-crossover designs with conditional logistic regression, and that this sensitivity increases with the magnitude of the pollutant effect. For coefficients below 0.003 the effect appears small, but it rises sharply thereafter, resulting in low coverage. New studies are needed to identify the control-selection techniques best suited to reducing the effect of autocorrelation when case-crossover designs are analyzed as longitudinal data, as has already been done for case-crossover designs analyzed with conditional logistic regression.5,34
Treating case-crossover studies as longitudinal data and analyzing them with mixed models would offer an additional advantage, namely multilevel analysis, which allows many cities to be analyzed jointly and thus removes the need for meta-analysis,35 pooled analysis36 or 2-stage hierarchical normal models.37 In these multilevel case-crossover models, one level could be the city and another the individual, and intercity heterogeneity could be studied by including it as a random effect. This would enable assessment of: (1) overall dose-response relationships for all the cities as a whole; (2) the forms of possible interaction between climatological variables and pollutants; and (3) the interactions between individual characteristics, on the one hand, and climatological variables and pollutants, on the other, for all cities as a whole. Mixed models could also be extended to geographical analyses, treating areas within cities as intermediate levels of analysis and thereby identifying the areas at highest risk.
Once the influence of autocorrelation on estimates of very marked effects has been controlled for, case-crossover designs analyzed as longitudinal studies could prove a very good alternative to time-series studies with Poisson regression, since they may offer a considerable number of additional advantages. The effects of pollutants and their dose-response relationships could then be estimated specifically according to the characteristics of individual subjects (eg, previous disease history) or geographic location (specific areas of cities), with optimal statistical power. This could in turn open the way to a new type of public health guideline on the health effects of air pollution, targeted at specific population groups or areas of cities.
We thank William Navidi for his advice and guidance on semisymmetric case-crossover design and Michael Benedict for his comments on the translation.
1.Sunyer J, Basagana X, Belmonte J, et al. Effect of nitrogen dioxide and ozone on the risk of dying in patients with severe asthma. Thorax
2.Sunyer J, Basagana X. Particles, and not gases, are associated with the risk of death in patients with chronic obstructive pulmonary disease. Int J Epidemiol
3.Kwon HJ, Cho SH, Nyberg F, et al. Effects of ambient air pollution on daily mortality in a cohort of patients with congestive heart failure. Epidemiology
4.Peters A, Dockery DW, Muller JE, et al. Increased particulate air pollution and the triggering of myocardial infarction. Circulation
5.Levy D, Sheppard L, Checkoway H, et al. A case-crossover analysis of particulate matter air pollution and out-of-hospital primary cardiac arrest. Epidemiology
6.Maclure M. The case-crossover design: a method for studying transient effects on the risk of acute events. Am J Epidemiol
7.Navidi W. Bidirectional case-crossover designs for exposures with time trends. Biometrics
8.Bateson TF, Schwartz J. Control for seasonal variation and time trend in case-crossover studies of acute effects of environmental exposures. Epidemiology
9.Mittleman MA, Maclure M, Robins JM. Control sampling strategies for case-crossover studies: an assessment of relative efficiency. Am J Epidemiol
10.Marshall RJ, Jackson RT. Analysis of case-crossover designs. Stat Med
11.Greenland S. Confounding and exposure trends in case-crossover and case-time-control designs. Epidemiology
12.Bateson TF, Schwartz J. Selection bias and confounding in case-crossover analyses of environmental time-series data. Epidemiology
13.Navidi W, Weinhandl E. Risk set sampling for case-crossover designs. Epidemiology
14.Borgan O, Langholz B, Samuelsen SO, et al. Exposure stratified case-cohort designs. Lifetime Data Anal
15.Lumley T, Levy D. Bias in the case-crossover design: implications for studies of air pollution. Environmetrics
16.Greenland S. A unified approach to the analysis of case-distribution (case-only) studies. Stat Med
17.Vines SK, Farrington CP. Within-subject exposure dependency in case-crossover studies. Stat Med
18.Diggle PJ, Liang KY, Zeger SL. Analysis of Longitudinal Data. Oxford: Oxford University Press; 1994.
19.Verghese J, Lipton RB, Katz MJ, et al. Leisure activities and the risk of dementia in the elderly. N Engl J Med
20.Canfield RL, Henderson CR Jr., Cory-Slechta DA, et al. Intellectual impairment in children with blood lead concentrations below 10 microg per deciliter. N Engl J Med
21.Figueiras A, Sastre I, Tato F, et al. One-to-one versus group sessions to improve prescription in primary care: a pragmatic randomized controlled trial. Med Care
23.Mehta CR, Patel NR. Exact logistic regression: theory and examples. Stat Med
24.Andersen PK, Gill RD. Cox's regression model for counting processes: a large sample study. Ann Stat
25.Barceló MA. Modelos de fragilidad en el análisis multivariante de supervivencia. PhD Dissertation. Girona: Servei de Publicacions, Universitat de Girona; 2001.
26.Therneau TM, Hamilton SA. rhDNase as an example of recurrent event analysis. Stat Med
27.Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc
28.Breslow NE, Clayton DG. Approximate inference in Generalized Linear Mixed Models. J Am Stat Assoc
29.Wolfinger R, O'Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J Stat Comput Simul
30.Saez M. El problema de las medidas repetidas. Análisis longitudinal en epidemiología. Gac Sanit
31.McCullagh P, Nelder JA. Generalized Linear Models. London: Chapman and Hall; 1989.
32.Venables WN, Ripley BD. Modern Applied Statistics with S-PLUS. New York: Springer; 1999.
33.Dominici F, Zeger SL, Samet JM. Combining evidence on air pollution and daily mortality from the largest 20 US cities: a hierarchical modeling strategy. J R Stat Soc A
34.Checkoway H, Levy D, Sheppard L, et al. A case-crossover analysis of fine particulate matter air pollution and out-of-hospital sudden cardiac arrest. Health Effects Institute Res Report
35.Saez M, Ballester F, Barcelo MA, et al. A combined analysis of the short-term effects of photochemical air pollutants on mortality within the EMECAM project. Environ Health Perspect
36.Saez M, Figueiras A, Ballester F, et al. Comparing meta-analysis and ecological-longitudinal analysis in time-series studies. A case study of the effects of air pollution on mortality in three Spanish cities. J Epidemiol Community Health
37.Dominici F, McDermott A, Zeger SL, et al. On the use of generalized additive models in time-series studies of air pollution and health. Am J Epidemiol
© 2005 Lippincott Williams & Wilkins, Inc.