**ArticlePlus**

Click on the links below to access all the ArticlePlus for this article.

Please note that ArticlePlus files may launch a viewer application outside of your web browser.

* http://links.lww.com/EDE/A112

* http://links.lww.com/EDE/A113

Over recent decades, several epidemiologic time-series studies have reported adverse health effects of air pollution, even at historically low levels of air pollution.^{1–6} There has been concern that the air pollution and health associations may be confounded by other time-varying risk factors^{7,8} that are uncontrolled or inappropriately controlled.

Time series data of health events show sudden changes in the number of events attributable to a wide range of social factors (eg, day of week and school holidays) and environmental factors (eg, aeroallergens, epidemics).^{9} Although such discrete episodes in time may be less obvious in mortality series, they are quite obvious in hospital admission series. It has been shown that epidemics of influenza, in particular influenza A, are associated with increased mortality or morbidity.^{10–12} In health events series, influenza epidemics appear as outliers in the regression analysis of counts of health events on predictor factors such as air pollution or weather variables. Inappropriate modeling could bias the air pollution health results if the timing of those epidemics was correlated with air pollution.

Possible sources of influenza data are sentinel monitoring of influenza visits, pneumonia hospital admissions, and respiratory mortality. Whichever information is available, it is better to identify specific epidemic periods to be modeled instead of the daily number of counts. A single-indicator variable approach assumes the same increase in daily events in each epidemic in each year, which is a fairly crude approach.

In most of the reported results about the short-term effects of air pollution on health, influenza epidemics were not modeled at all,^{4,13} or they were controlled using an indicator variable.^{6,9} Given that in most locations data on daily counts of influenza epidemics are not available, a method to adequately control for influenza epidemics in the absence of such data is needed.

Recently, in air pollution epidemiology, sophisticated models, including generalized additive (GAM) Poisson autoregression models have been proposed to analyze time series of health events and to investigate short-term effects of air pollution on health.^{4,6,9} Such models use nonparametric smoothing methods to remove long-term trends and seasonal variations from the data. Despite their flexibility in modeling local variations, a global model may not be appropriate for adjusting for sudden events such as influenza epidemics. It is difficult to control for influenza epidemics using a global model without risking overfitting in the nonepidemic periods because even with nonparametric smoothing, the same span is used to model the entire season.

Within the Air Pollution and Health–A European Approach (APHEA)-2 project,^{6,9} we conducted an analysis to investigate the effect of influenza epidemics on the association between mortality and air pollution. More specifically, we analyzed data on influenza epidemics from 7 cities throughout Europe; 10 methods to control for influenza epidemics, plus analyses without control, were compared.

#### METHODS

##### Data

The APHEA-2 project is a multicenter study that includes 30 cities across Europe and associated regions (eg, Istanbul and Tel Aviv) and aims to investigate the short-term health effects of air pollution.^{6} Seven cities were selected for the present analysis based on their geographical dispersion and the availability of influenza counts data: London, Madrid, Stockholm, Budapest, Paris, Lyon, and Zurich. Daily counts of all-cause mortality, excluding deaths from external causes (International Classification of Disease [ICD-9] codes 800 and higher), and of mortality from cardiovascular diseases (ICD-9 codes 390–459) were considered as health outcomes. We used daily measurements of particulate matter less than 10 μm in aerodynamic diameter (PM_{10}) as the air pollution indicator.

PM_{10} measurements were provided by the monitoring networks established in each town. A monitor was included if certain completeness criteria were fulfilled. We constructed a series consisting of the arithmetic mean of daily values of all available monitoring stations. We used time series of daily temperature (degrees Celsius, daily mean) and relative humidity (%) to control for the potential confounding effects of weather.^{6,9}

##### Analytic Approaches

For the analysis, we used the APHEA-2 methodology, which has been described in detail elsewhere.^{9} Briefly, in each city, we fitted a generalized additive Poisson regression model, also allowing for overdispersion, to model the logarithm of the expected number of daily deaths as a sum of smooth functions of the predictor variables. In particular, LOESS, a moving regression smoother, was used to control for long-term and seasonal terms and weather effects. It has been recently reported that the default convergence criteria in the widely used S-PLUS software *gam* function for implementation of the GAM^{14} do not assure convergence and can provide biased estimates of both coefficients and their associated standard errors.^{15} To avoid bias in the coefficients, we analyzed all data using the more stringent convergence criteria recently proposed.^{15} However, bias in the standard errors is likely to remain.^{16} As a sensitivity analysis, all models were refitted after replacing GAM using LOESS smoothers with Poisson regression models using penalized splines^{17,18} for smoothing. We controlled for day-of-week and holiday effects by using indicator variables. We applied autoregressive Poisson models if necessary and decided a priori to use the average pollutant concentrations of lags 0 and 1 as the exposure variable. This decision was based on previous studies having shown those lags to be the most relevant.^{19} It has been shown that log-transformed air pollutant concentrations give the best fit in settings with high levels of air pollution. There is evidence that 150 μg/m^{3} is the point where the slope starts to change for particle indicators.^{20} To facilitate the comparison of the different methods to control for influenza, we introduced PM_{10} into the model as linear after restricting the analysis to days with PM_{10} concentrations less than 150 μg/m^{3}.

##### Influenza Data

To control for influenza, we obtained data on emergency hospital admissions from London, Stockholm, and Budapest, whereas for Paris, Madrid, Lyon, and Zurich, we used data on influenza cases recorded by the city sentinel system, which includes a certain sample (approximately 20%) of the general practitioners. In London, Madrid, Stockholm and Zurich, influenza data consisted of continuous series of counts covering the whole study period whereas in Paris, Budapest, and Lyon influenza counts were provided only during influenza epidemic periods. In the former case, influenza epidemic periods were defined as those periods for which influenza cases exceeded the 90th percentile of its city-specific distribution. In the latter case influenza epidemic periods were defined according to certain criteria, including influenza virus detection and counts of health indicators such as counts of respiratory infections.^{21,22} Over all cities, the outbreaks were obvious in the plots (see, for example, Fig. 1). It should be noted that the London influenza data refer to the entire country of England (population 48.7 million). When independent influenza data are not available, an alternative way to control for influenza epidemics is to look at respiratory mortality, which is available everywhere.

In this work, we compared 10 methods to control for influenza; 5 of them are based on daily influenza data and can only be applied if such data are available, and the other 5 methods are based on the daily number of respiratory deaths. For comparison, we also present results from the model without any adjustment for influenza. The methods that are dependent on influenza data availability were:

1. Inclusion of influenza data in the model. For consistency of the method in all cities, influenza series included counts of the corresponding indicator (ie, hospital admissions or reported cases by general practitioners) when an epidemic existed and 0 otherwise.

2. A single indicator variable for all influenza epidemics.

3. Multiple indicator variables, 1 for each epidemic period.

4. In each city, indicator variables representing the ranges of influenza counts. The number of dummy variables was either 3 or 4, based on the number of epidemics and the variability of counts within each epidemic. They were defined after exploring the original influenza series. In London, for example, the 4 dummy variables represented the ranges of hospital admissions: ≤200,000, 200,001–300,000, 300,001–400,000, and >400,000 (Fig. 1).

5. Inclusion into the model of a separate smooth function (LOESS terms or p-splines in the sensitivity analysis) of time during the epidemic periods.The methods that depend on respiratory mortality data were:

6. An indicator variable taking the value of 1 when the 7-day moving average of the respiratory mortality was greater than the 90th percentile of its city-specific distribution. This method was applied in the APHEA-2 project.^{9}For the other 4 methods, influenza epidemic periods were estimated based on respiratory mortality series. We defined epidemic periods as follows. Similar to the APHEA-2 method, all days on which the 7-day moving average of the respiratory mortality was greater than the 90th percentile of its city-specific distribution were defined as “epidemic days.” We considered periods with 10 or more neighboring “epidemic days” as epidemic periods. We ignored isolated nonepidemic days within an epidemic period and counted them as part of the epidemic. A sequence of at least 4 consecutive nonepidemic days was required to count a subsequent run of epidemic days as a new epidemic period. For these estimated epidemic periods, methods analogous to methods 2–5 above were applied. Hence the methods were:

7. A single indicator variable for all estimated epidemic periods;

8. Multiple indicators, one for each estimated epidemic;

9. In correspondence with method 4, indicator variables for the 90th, 95th, and 98th percentiles of the 7-day moving average of respiratory mortality for the estimated epidemic periods were included in the model; and

10. In correspondence with method 5, we introduced into the model a separate smooth function of time during the estimated epidemic periods.

Because the comparison of all the above-mentioned methods in each city separately would lead to confusing results, we compared estimates from each method pooled over all cities. We used inverse variance weighting to obtain pooled estimates and, to compare the different methods, we used the sum over all cities of the corrected for overdispersion Akaike's Information Criterion (AIC) for each method.

#### RESULTS

Table 1 shows the summary statistics for total and cardiovascular mortality, PM_{10} concentrations and temperature for each city, as well as the 90th percentile for respiratory mortality. Figure 2 shows total mortality and influenza epidemics as defined from the reported influenza data and from the respiratory mortality series as well as the single estimated influenza days (APHEA-2 method) for Paris and Madrid. In both cities, the duration of the epidemics varied substantially in length, ranging from 21 to 49 days in Madrid and from 20 to 63 in Paris.

Table 1 Image Tools |
Figure 2 Image Tools |

We observed similar variation in all cities included in the present analysis (Table 2). The impact of influenza epidemics on total mortality also varied substantially. In Madrid, the number of deaths increased on average by 8.8% at the peak of epidemics, whereas in some cases, the increase in mortality was higher than 10%. Similarly, in Paris the number of deaths increased on average by 5.8% at the peak of epidemics, whereas among the rest of the cities the average increase ranged from 5% to 10%.

The degree of overlap between reported and estimated epidemics varied substantially among cities (Fig. 2 and Table 2). Generally, there was a time shift (approximately a week later) in the beginning of the epidemics as determined by the respiratory mortality series compared with reported epidemics (Table 2). As can be seen in Figure 1, we observed a similar delay in daily mortality peaks compared with influenza peaks. A few of the reported epidemics were not picked up by the respiratory mortality proxy and vice versa. Epidemic days, as defined by the APHEA-2 method, had the drawback of defining single days with relatively high respiratory mortality as epidemics. There was no pattern regarding the levels of PM_{10} during epidemic and nonepidemic days among cities, being higher in 3, lower in 2, and with no substantial difference in the rest of the cities during the epidemic periods. Reported epidemics were moderately correlated with daily mortality (unadjusted correlation coefficients ranged from 0.20 to 0.40) and weakly with PM_{10} levels (range, −0.24 to 0.12).

Table 3 (and Fig. 3, available with the electronic version of this article) shows the results, pooled over all cities, from the final Poisson regression models regarding PM_{10} effects on total and cardiovascular mortality, with and without controlling for influenza epidemics. Generally, controlling for influenza resulted in an increase in the effect size estimate for PM_{10}. This increase was more pronounced for the methods derived from the respiratory mortality data. Among the methods that depended on the availability of influenza data, the one with the largest increase in the PM_{10} effect estimate was the method that used a separate smooth function of time during the epidemic periods (39% increase compared with no control for total mortality and 20% increase for cardiovascular mortality). In this method, the number of degrees of freedom used in the epidemic period time smooths ranged from 1 (in Zurich) to 21.8 (in Paris). However, for the rest of the methods the change in the PM_{10} effect estimate ranged from −1.4% to 12.4% for total mortality and from 1.30% to 7.4% for cardiovascular mortality.

Among the methods that do not depend on the availability of influenza data, compared with no control, the increase in PM_{10} effect estimate ranged from 24.8% to 50.8% for total mortality and from 14.4% to 27.3% for cardiovascular mortality. It is worth noting that the method used in the APHEA-2 project showed the smallest increase (24.9% and 14.4% for total and cardiovascular mortality, respectively). In an effort to further explore the observed negative confounding effect of influenza epidemics on the PM_{10}-mortality relationship, we investigated the city-specific partial autocorrelation function of the final models’ residuals under the various methods to control for influenza epidemics. In most cases, a negative autocorrelation not present in the original data was induced. This may be an indication of overfitting because the particular choice of modeling led to a change in the structure of the data.

The methods with the best fit as evaluated by the AIC were those derived from the respiratory mortality data (Table 3). Among these, the APHEA-2 method was the fourth-best method for total mortality and the third best for cardiovascular mortality, respectively. However, over all these methods, AIC values were fairly close to each other. The method without control for influenza gave the worst fit for both total and cardiovascular mortality. Among the methods that depend on the availability of influenza data, the one with the smallest AIC was the one that used a separate smooth function for epidemic periods. It should be noted that, concerning PM_{10} effect estimate, this method gave similar results with the methods derived from respiratory mortality data. City-specific results, however, were heterogeneous, with increased PM_{10} effect estimate after influenza control in some cities and decreased in others. There was a tendency for the adjusted PM_{10} effect estimate to increase in cities where PM_{10} concentrations peaked during winter periods. However, it should be noted that AIC evaluates the overall fit of the model and may not be the most appropriate criterion to evaluate potential biases induced in the air pollution-health effect estimates. For example, AIC had smaller values for the methods derived from respiratory mortality series because total and cardiovascular data had higher correlation with respiratory mortality rather than with influenza counts.

Replacing LOESS smoothers with p-splines resulted in a decrease in the effect size estimate for PM_{10}. The reduction was more pronounced for total than for cardiovascular mortality (% decrease ranged from 12% to 30% in the former case and from 1% to 5% in the latter). Additionally, associated standard errors were increased by approximately 13% in most cases. However, regarding influenza confounding, results were broadly similar to those of the main analysis. That is, PM_{10} effect sizes were not much changed and generally were slightly higher after controlling for influenza. (Table 3b with the corresponding results is available with the electronic version of this article.)

#### DISCUSSION

This work focused on determining the effect of various methods to control for influenza epidemics on the association between PM_{10} and mortality (total and cardiovascular). Several studies have consistently shown the adverse health effects of PM_{10}.^{1–6} However, as more evidence is accumulated, there is increased concern for the effects of potential unknown or uncontrolled confounders.^{7,8} One possible confounder is influenza epidemics because it is well known that epidemic periods are strongly and positively associated with both mortality and morbidity.^{10–12,22–25} It is not clear, however, whether influenza epidemic periods are associated with air pollution concentrations in different locations. Although influenza epidemics occur in winter, which is correlated with air pollution in many cities, all of the models already control for season. It is unclear why the specific time within a winter that an epidemic occurs in a particular city should have much to do with air pollution levels. One major problem is the lack of data on influenza counts. Several morbidity and mortality indicators have been evaluated as proxies for influenza A (the most virulent agent) epidemics, but a consensus on the best indicator has not been reached.^{21,26}

In several studies a simple indicator variable to indicate influenza epidemics has been used to control for potential confounding effects on the air pollution health outcome relationship.^{6,27–31} This method has been criticized because it implies that all influenza epidemics have the same impact on health. It has been shown, however, that there is heterogeneity in the influenza epidemic effect on health,^{32} which also was confirmed in our data. It has been proposed that this heterogeneity could be explained by the difference in the pathogens (or their virulence) that are responsible for the outcomes in each specific period.

Braga et al^{32} investigated whether respiratory epidemics confound the relationship between PM_{10} and daily deaths in 5 U.S. cities. They used data on pneumonia-related hospital admissions as an indicator for influenza and fitted a cubic polynomial function for each epidemic period (that is, allowing for a separate behavior of each epidemic) to control for influenza epidemics. Their results showed that the overall PM_{10} effect was only slightly (approximately 8%) reduced after controlling for influenza, leading to the conclusion that the association between air pollution and daily deaths is robust. Tobias et al^{33} have reported that, although controlling for influenza cases resulted in a 10% decrease in the black smoke effect estimate on total mortality, the black smoke effect remained significant. In another study^{34} on the relationship between air pollution and emergency visits in Barcelona, it was found that, compared with modeling epidemics using a single indicator, multiple indicators for different epidemics resulted in increased air pollution coefficients. However, compared with no control at all, controlling for epidemics decreased the regression coefficients for black smoke and SO_{2} and increased the regression coefficients for NO_{2} and O_{3}. It is worth noting that the first 2 pollutants peaked in the winter whereas the latter 2 peaked in the summer.

In our study, our use of various methods to control for influenza epidemics resulted in relatively small changes in the effect of PM_{10} on total or cardiovascular mortality regression coefficients compared with no control (% change −2% to 5%). The only exception was when a separate smooth function for epidemics was used to control for influenza, in which case the PM_{10} effect estimate was increased by about 39% for total mortality and by about 20% for cardiovascular mortality. Among the methods depending on influenza data availability, this method gave the best fit, as evaluated by the AIC. Instead, when epidemics were modeled using respiratory mortality as an indicator for influenza, PM_{10} effect estimate was increased compared with the estimate from the model without any control (range of increase, 14.8% to 37.4% for total and 11.3% to 25.5% for cardiovascular mortality). Compared with no control, as the number of the degrees of freedom used to model influenza epidemics increased, the percentage of change in the PM_{10} estimate also increased. Despite this variability in the estimates under the various control methods, all estimates are consistent with the estimate under no control, in the sense that all fall within the corresponding ranges of the 95% confidence intervals of the estimates. Additionally, the estimate from the APHEA-2 method lies in the middle of the range of all estimates. The estimated increase in daily counts of death per 10 μg/m^{3} increase in PM_{10} ranged from 0.45% (95% confidence interval = 0.26–0.69%) to 0.67% (0.46–0.89%), whereas the corresponding figures for cardiovascular mortality were 0.85% (0.53–1.18%) and 1.06% (0.74–1.39%).

It is worth noting that the methods we derived from respiratory mortality series had better fit (as evaluated by the AIC) than the methods derived from influenza counts. Although AIC is a criterion for the overall goodness of fit, it may not be appropriate to evaluate the optimal method to control for influenza epidemics.^{35} In our study PM_{10} concentration peaks occurred during the winter in some cities, whereas in others there was no obvious seasonality in the PM_{10} series. We observed similar variation in the PM_{10} levels during epidemic and nonepidemic days. Overall, the correlation between PM_{10} levels and influenza counts, unadjusted for seasonality and weather factors, was weak, indicating little or no confounding. Several studies have used respiratory mortality to identify influenza cases.^{3,6,28} There is some question on to whether it is proper to incorporate mortality data on both the left- and the right-hand sides of the regression equations. It also has been shown that respiratory mortality series react late in identifying the initiation of influenza epidemics.^{27} This delay also has been confirmed in our data. However, the effect of an influenza epidemic on mortality displays a lag (in our data, this was about a week) and it appears more appropriate to control influenza mortality rather than influenza morbidity, when the outcome is a mortality outcome. Finally, influenza is not the only respiratory pathogen associated with increased mortality. It is possible that using the respiratory mortality series predicts mortality better than the influenza-based approaches because total respiratory mortality captures the effects of those other pathogens as well.

Definition of epidemic periods was based on certain cutoff points (ie, 90%) of the city-specific distributions of the daily counts. Changing the cut-off points (for example from 90% to 95% or 98%) is expected to affect results. However, a sensitivity analysis in which influenza epidemics were defined based on the 95th rather than 90th percentile of the city-specific respiratory mortality distribution showed that results were only marginally affected. Replacing LOESS smoothers with p-splines did not change the main conclusions regarding potential influenza confounding effects.

In summary, we have shown that the association between air pollution and daily number of deaths remains and even increases after controlling for influenza, regardless of the method used for control. Simple methods, such as 1 or multiple indicators for influenza epidemics, are relatively easy and effective ways to control or to evaluate potential confounding effect of influenza epidemics. Respiratory mortality data may be used instead of influenza cases data if the latter are not available; respiratory mortality data may be more appropriate way to control when the outcome studied is a mortality outcome. In general, sensitivity analysis with and without influenza control could give a reasonable range for air pollution effect estimates, although none of the methods investigated in this work suggested appreciable confounding by influenza epidemics.

#### ACKNOWLEDGMENTS

A list of collaborative group members is available with the electronic version of this article.