What this study adds
The main contribution of this work is to use nonlinear parametric functions in the case-crossover methods. As a criterion to assess the goodness of fit, the Akaike information criterion value is used. The constructed models are more accurate than in the standard approach, where a linear association is assumed.
The health effects of short-term exposure to air pollution are very important aspects of environmental epidemiology. The effects, often measured as emergency department visits or deaths, provide evidence for relationships between exposure and acute health problems. Among various statistical methods used to assess the associations between exposure to ambient air pollution and health outcomes, the case-crossover (CC) technique is very popular. This methodology was presented in 1991,1 and since then, it has been well accepted and widely used. The CC method is an expansion of the case–control method, based mainly on comparing exposures of the same subject in different time frames or periods, usually expressed as different days. This method allows for the generation of unbiased conditional logistic regression estimates when applied to analyses of rare, acute health effects presumably correlated with short-term transient factors.2 It is also well suited for studying lagged effects when short time intervals separate an increase in exposure. Finally, the method allows for individuals to serve as their own controls, which assures compensating by design for the time-invariant factors locally. This technique is applied here as a basis to construct the models.
Time-stratified design is used mainly to define controls in the applied CC method.3,4 Here we are using control days as every seventh day in the same month and year as the case. This technique results in three or four controls for each considered individual health event, and control days are chosen with alternative selection configurations, pre- and post-event. Consider the following two examples. For a health event on Monday, October 2, 2017, the days October 9, 16, 23, and 30 are control periods—in this situation we have 4 controls, and they are all after the event day. For a health event on Wednesday, October 11, 2017, the days October 4, 18, and 25 are control periods. In this situation, we have only 3 controls; one before and two after the event day. As in the standard CC method, in the presented approach, a time window is also 1 month. The standard CC method gives the risk related to exposure in the form of exp(β × concentration), where β is the coefficient estimated by the model. The methodology proposed here assumes flexible forms of the expression for the risk.
A main goal of this work is to present some methodological techniques related to short-term ambient air pollution exposure and associated health outcomes. This article further develops a methodology previously applied to cohort studies of long-term exposure. It applies the methodology to realize nonlinear transformations of the pollutant concentrations. As a primary statistical method, the flexible alternative of the CC methodology is used. The current study aimed to present the combination of two approaches, conditional Poisson models and a parametric nonlinear concentration–response function. The described methodology is tested using four databases, of which two are well known and publicly available.
The method presented here is based on the CC methodology. The CC method can be replaced by other, alternative statistical techniques, as proposed by some authors.5–7 The used technique may be classified as a hybrid of the CC approach. In our study, we are using conditional Poisson models considered on the stratum of the following form: <year: month: day of week>.5 Thus, we have a 1-month time window and day-of-week (dow) factors that are adjusted in the constructed models by the design. Applying such a technique, we consider daily counts rather than daily individual events as in the standard CC method. In general, other statistical models can be used here in this stage.5–7 We need only to be able to retrieve the value of the Akaike information criterion (AIC)8 or other measurements of the quality of the approximations. In summary, the standard CC method realizes the model log(RR) = Concentration + Covariate; the presented methodology log(RR) = g(Concentration) + Covariates, where RR is the relative risk and the function g(z) has a special form, described below.
In the work9 related to a longitudinal cohort study in Canada, the model with a concentration-dependent coefficient risk function was introduced. In such a methodology, β(z) = β × ω(z|μ,τ) represents the risk coefficient that varies with concentration. Variations on an assumed sigmoidal shape of concentration–response are controlled by the logistic weighting function of the following form:
where μ (mu) is a location parameter and r is the range of z (concentration). The scalar τ (tau) controls the curvature of the weighting function. Larger values of τ produce shapes with less curvature. The weighting function is almost linear for τ > 0.5. Nasari et al.9 present the algorithm which searches for the best fitted model (using the maximum likelihood criterion) among the models with two forms of the function f:f(z) = z and f(z) = log(z) and estimated weighting functions. The approximation to the log of risk now has the form g(z) = f(z) × β(z). The algorithm tries two variants of the weighting function with τ = 0.1 and τ = 0.2. The published algorithm9 searches on predefined grids of the percentiles of z and, therefore, determines optimal value of the location parameter μ (mu) with an approximate accuracy.
Here, in our approach, we are considering the nonlinear function AIC (mu, tau). In some situations, we use also the alternative function AIC (mu) with fixed tau (0.1 or 0.2), as in some cases the algorithm gives a better selection of the model. The function AIC is subject to minimization with respect to its parameters. We submit to minimization of the AIC with a given initial value (mu0, tau0). To retrieve the AIC value for the next step, the iterative process needs to fit the model with the new values (mu, tau). Finally, the best fit is determined in the considered class of the models. The coefficient (Beta) and its standard error (SE) are also estimated. In addition, we also check the AIC values for the original statistical models, without the logistic weighting function, that is, for a linear (z, identity) and a logarithmic (log(z)). It allows us to evaluate the obtained models.
In this work, for illustrative purposes, the following four databases are used to present the proposed methodology. Two of the presented databases are publicly available. In all cases, temperature and relative humidity in the constructed statistical models are represented in the form of natural splines. The exposure is considered on the same day as the health event (lag 0).
- (a) Milan mortality: The Milan mortality database has records of daily mortality for 3652 consecutive days (January 1, 1980, to December 30, 1989, Milan, Italy). The data are publicly available from the package SemiPar in R.10 Here we consider total number of deaths as the health response and exposure as total suspended particulates in ambient air.
- (b) Chicago mortality: This data set contains mortality (all causes, cardiovascular disease (CVD), respiratory), weather (temperature, relative humidity) for the city of Chicago, USA, for the period 1987 to 2000.11 These data are publicly and freely available in the form of data frame.11 Here, daily cardiovascular mortality counts (CVD) are considered as health outcomes. For exposure, median daily ambient ozone concentration on the same day is defined.
- (c) Calgary mortality: The data contain daily mortality counts for all nonaccidental deaths, cardiopulmonary causes (CP), and non-CP in Calgary, Canada, for the period January 1, 1984, to December 31, 2007 (in total 8766 days).12 Here we used CP mortality (41,944 deaths) as health response and daily ozone 8 hour maximum as an exposure.
- (d) Edmonton emergency department visits: Emergency department (ED) visits in Edmonton, Canada, for the period April 1, 1992, to March 31, 2002 were used. The data consist of almost 3 million coded visits recorded by trained medical nosologists using The International Classification of Diseases, 9th Revision codes.13 Here we consider the ED visits for nondependent abuse of drugs (The International Classification of Diseases, 9th Revision: 305) as health outcomes. The data include 21,613 such diagnosed ED visits, of which 13,780 cases were males over a 10-year period. For exposure, daily mean ambient levels of nitrogen dioxide (NO2) were used. Here, the exposure parameters were lagged by 1 day.
In the Supplemental Content; http://links.lww.com/EE/A6 is provided another example of the freely available data (10 years mortality in London, UK) and the corresponding codes in R. (Two data sets are available in the R packages. The data used in the Supplemental Content; http://links.lww.com/EE/A6 are available (a link is provided) and the codes are listed.)
The results for the four databases (a–d) are summarized in Table 1 and illustrated in Figure 1. In Table 1, the estimated values for the coefficients (Beta) and their respective standard errors (SE) are presented. As we are using the logistic weighting function, two additional parameters are provided, the location value mu and the scale value tau. In addition, the form of transformation (z or log(z)) which was applied to obtain the best fit is also listed. In some cases, the minimization failed for the function AIC (mu, tau) with two variables. The value of the obtained AIC (tau, mu) was larger than in the case of minimization of the function AIC(mu), for fixed tau (0.1 or 0.2).
Figure 1 shows different shapes obtained for the health outcomes considered. The illustration on panel (a) suggests that the association of mortality with total suspended particulates exposure does have a short threshold, and the association is significant even for low concentrations. The AIC value for the standard CC method was 24,109.12 and for the method with logistic weighting function was 24,105.45. The opposite was observed for exposure to ozone and its association with CVD mortality (panel b) in Chicago. For a large range of the exposure concentrations, the curve is flat. The health effect of ozone exposure occurs for concentrations above 30 ppb. For this model, the estimated AIC was 36,819.29; for the model using the standard CC approach, the AIC value was 36,893.82 in the case of z and 36,898.64 in the case of log(z).
As the results for Milan and Chicago can be easily assessed as the corresponding data are available, on the panels (a) and (b) are also shown (blue) the estimations generated by the standard CC method (f(z) = z, identity). For both methods, CC and the presented (red), the estimated 95% confidence interval is also shown.
In Calgary (panel c), the association of CP mortality with ozone exposure is positive but not statistically significant. The relative risk slowly and steadily rises with concentration. The association of nitrogen dioxide (NO2) exposure with the ED visits for nondependent abuse of drugs (panel d) is positive and statistically significant. The relative risk is low for the concentrations below 40 ppb, but it increases with higher concentrations. For example, for 60 ppb, the RR is about 1.402. In Edmonton, the effect associated with exposure to nitrogen dioxide appears to be the threshold; the observed effect below a concentration of 30 ppb is low.
A relative risk is a function of a vector of unknown parameters β. Here, simulations of a large number of realizations of β(z) = ω(z|μ,τ) is applied. Variation of betas depends on the parameters (mu and tau) of the logistic weighting function and the used function f(z). We consider only (one) optimal model and simulated 1000 values of the betas based on its standard error. The obtained results are presented in Figure S1 in the Supplemental Content; http://links.lww.com/EE/A6. We observe relatively stable behavior of the generated models.
The technique presented here is relatively simple and practical, in the sense that it can be easily constructed using existing software realized as available procedures. The two publicly available data sets presented in this article allow readers to repeat the analysis and to verify/confirm the presented results.
Panels (a) and (b) on Figure 1 show interesting phenomenon. For Milan data, the approach proposed here gives large RR values for low concentrations than the standard CC method. The linear model (CC method) generates low estimation for the same range of concentrations. The opposite is true for the Chicago data. Such differences may have impact on policy. Majority of the published results represent the estimations based on the standard CC methods (blue lines on Figure 1).
As the AIC values indicate, the applied models give a better fit than does traditional CC methodology, as we also run the models with only f(z) = z and f(z) = log(z), that is, without the logistic weighting function. The provided examples support the use of the models: log(RR) = g(z) + covariates, rather than log(RR) = z + Covariates. The estimated relative risk has the form RR = exp(β(z) × g(z)), and this function may have various shapes. It contrasts with RR = exp(β × z) realized in the standard approach. It indicates that linearity of effects is an assumption of the models and that it should always be checked. The results from two different approaches, linear and nonlinear, may have different qualifications of the associations. There are other statistical approaches to assess the nonlinearity. Among them are the models based on splines. One advantage of the approach presented is that it results with the models in a parametric form. This allows other researchers to easily reproduce the concentration–response curve for the estimated value of the parameters. This is more difficult with splines. Also, splines can go down with increasing concentration, while in the approach presented, the generated regression is monotonic. The approach presented here has the following property: if the estimated parameters are positive, then relative risk is monotonic nondecreasing function of the concentration levels. For two concentration levels, the estimated RR value for larger concentration will be at least the same as for lower.
A reasonable approach is to verify the models constructed with minimization of the AIC only with the location parameter mu. In such a case, it is suggested to run separately with two curvature parameters tau = 0.1 and tau = 0.2 and estimate only the parameter mu.
The estimated shapes may have various forms. As we see in the case of the Chicago mortality data, the estimated curve suggests the existence of a threshold. It should be noted that the standard CC method fits the log-linear models, and as such, it cannot detect the potential threshold.
Conflicts of interest statement
The author declares no conflicts of interest with regard to the content of this report.
The author works at Health Canada and this work is the result of his professional activity.
1. Maclure M. The case-crossover
design: a method for studying transient effects on the risk of acute events. Am J Epidemiol 1991; 133144–153
2. Wang SV, Coull BA, Schwartz J, Mittleman MA, Wellenius GA. Potential for bias in case-crossover
studies with shared exposures analyzed using SAS. Am J Epidemiol 2011; 174118–124
3. Janes H, Sheppard L, Lumley T. Case-crossover
analyses of air pollution
exposure data. Referent selection strategies and their implications for bias. Epidemiology 2005; 16717–726
4. Szyszkowicz M, Tremblay N. Case-crossover
design: air pollution
and health outcomes. Int J Occup Med Environ Health 2011; 24249–55
5. Armstrong BG, Gasparrini A, Tobias A. Conditional Poisson
models: a flexible alternative to conditional logistic case cross-over analysis. BMC Med Res Methodol 2014; 14122
6. Szyszkowicz M. Use of generalized linear mixed models to examine the association between air pollution
and health outcomes. Int J Occup Med Environ Health 2006; 19224–227
7. Szyszkowicz M, Burr WS. The use of chained two-point clusters for the examination of associations of air pollution
with health conditions. Int J Occup Med Environ Health
8. Akaike H. Petrov BN, Csaki F. Information theory as an extension of the maximum likelihood principle. Second International Symposium on Information Theory 1973. BudapestAkademiai Kiado267–281
9. Nasari MM, Szyszkowicz M, Chen H, Crouse D, Turner MC, Jerrett M, Pope CA 3rd, Hubbell B, Fann N, Cohen A, Gapstur SM, Diver WR, Stieb D, Forouzanfar MH, Kim SY, Olives C, Krewski D, Burnett RT. A class of non-linear exposure-response models suitable for health impact assessment applicable to large cohort studies of ambient air pollution
. Air Qual Atmos Health 2016; 9961–972
10. Wand M. Ruppert D, Wand MP, Carroll RJ. The package “SemiPar” in the R library. In the package milan.mortaity data frame. Functions for semiparametric regression analysis, to complement the book. Semiparametric Regression 2003. Cambridge, UKCambridge University PressAvailable at: http://www.use-r.com/SPmanu.pdf
11. Wood S. Wood S. The gamair package. Generalized Additive Models: An Introduction With R 2006. Boca Raton, Florida, CRC;
12. Szyszkowicz M. Cardiopulmonary mortality
and temperature. Environ Polln Protect 2017; 23238
13. Szyszkowicz M, Rowe B. Respiratory health conditions and ambient ozone: a case-crossover
study. Insights in Chest Dis 2015; 19
Air pollution; Case-crossover; Emergency department visit; Mortality; Poisson
Supplemental Digital Content
Copyright © 2018 The Authors. Published by Wolters Kluwer Health, Inc.