What this study adds
This study illustrates a two-stage self-organizing map—Poisson regression approach that supports investigating associations between observed types of ambient air pollutant mixtures and health.
Outdoor air pollution consists of a mixture of pollutants rather than a single pollutant. Despite this reality, most evidence on the health risks associated with outdoor air pollution, as well as current regulatory strategies to reduce exposure, has largely been based on approaches that define air pollution using a single pollutant such as ozone or particulate matter.1 While this has generally served us well,2,3 health scientists, public health professionals, and regulators share a concern that understanding the effects of air pollution can continue to improve if new approaches are further explored.1,4,5
One such area is to begin including multipollutant or mixtures-based methodologies into air pollution studies. These directions are motivated by the hope that improved study of this topic will lead to breakthroughs in understanding of environment–disease relationships and increase opportunities for prevention.1,4,6 However, study has proven difficult for many reasons, including problematic designs, highly correlated exposures, complex measurement errors, and differing biological responses, all of which can contribute to making the study of pollutant mixtures a highly complex task.1,4,6,7 In an effort to overcome such challenges, the development of new methods, particularly statistical methods, has become a priority research area in environmental health.1,4
In response, several statistical approaches have emerged; these include classic linear regression, classification and prediction, exposure–response surface estimation, variable selection, and variable shrinkage methodologies.8 Each explores the problem in a unique way, resulting in different benefits, drawbacks, and conclusions that can be taken away, and thus, it has been recommended that investigators choose a method based on the study objectives.8 For example, a study may seek to identify which pollutants to include in the mix, identify a hidden pattern within the mix, or estimate the health effect of the mix, or all three, and thus, it should not be unexpected that a number of tools may be required. Thus, to better understand which tools are needed and which questions can be addressed, there is an increasing demand for new studies.1,6,9
Here, we present findings from an acute health effects study of air pollution for Columbia, South Carolina, during the years 2002 to 2013. The objective of the study was to assess associations between ambient air pollution mixtures and select cardiorespiratory outcomes. We seek to answer the following research questions:
- What types of day-level pollutant combinations were observed in Columbia, South Carolina, and how often did they occur?
- Were certain pollutant combinations more strongly associated with cardiorespiratory outcomes?
We address these objectives using a two-stage approach that involved applying a self-organizing map (SOM) to create a collection of categories we define as multipollutant day types (MDTs) and linking them to specific cardiorespiratory outcomes using a standard time-series regression model.10,11 We note that our approach builds upon previous studies of Atlanta that explored development of MDTs for characterizing air quality mixtures and for their application in estimating associations between complex mixtures and asthma morbidity.12,13 Here we expand upon previous work by examining associations for a larger number of health outcomes and through application in a new study area.
Health outcome data
We obtained all emergency department (ED) visits and hospital admissions (HA) for health outcomes in South Carolina defined with primary International Classification of Diseases Ninth Revision diagnoses of asthma (493, 786.07), upper respiratory infections (URIs: 460–465, 466.0, 477), congestive heart failure (CHF: 428), and ischemic heart disease (IHD: 410–414) from the South Carolina Revenue and Fiscal Affairs Office of Health Statistics. Daily counts of individual events were then summed for all residential ZIP Code Tabulation Areas within the Core-Based Statistical Area of Columbia, South Carolina (Figure 1), during the years 2002 to 2013.
Daily exposures were based on stationary air monitoring data obtained for Columbia, South Carolina, from EPA’s State or Local Air Monitoring Stations (n = 3) and NCore (n = 1) sites operated by the South Carolina Department of Health and Environmental Control (Figure 1). Day-level pollutant measures included 1-hour maximum nitrogen dioxide (NO2) in parts per billion (ppb), 8-hour maximum ozone (O3) in ppb, 1-hour maximum sulfur dioxide (SO2) in ppb, and 24-hour average particulate matter <2.5 μm (PM2.5) in μg/m3. Daily meteorological conditions were obtained from land surface observations collected at the Columbia Metropolitan Airport ground station (GHCND: USW00013883).
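The day-level metrics named above can be illustrated with a minimal Python sketch that derives them from one day of hourly readings. The hourly values are hypothetical, and the 8-hour maximum here only considers windows that fall entirely within the day; regulatory definitions include further rules (for example on data completeness) that this sketch does not attempt to reproduce.

```python
from statistics import mean

def daily_max_1h(hourly):
    """Largest single hourly value in one day's series."""
    return max(hourly)

def daily_max_8h(hourly, window=8):
    """Largest mean over any run of `window` consecutive hours within the day."""
    means = [mean(hourly[i:i + window]) for i in range(len(hourly) - window + 1)]
    return max(means)

def daily_mean_24h(hourly):
    """Average of all hourly values in the day."""
    return mean(hourly)

# Hypothetical hourly ozone values (ppb) for one day, midnight to 11 pm.
o3 = [18, 16, 15, 14, 14, 15, 19, 25, 32, 40, 47, 52,
      56, 58, 57, 54, 49, 42, 35, 30, 26, 23, 21, 19]
print(daily_max_1h(o3), round(daily_max_8h(o3), 1), round(daily_mean_24h(o3), 1))
```

The same three functions would be applied pollutant by pollutant and day by day to build the day-level exposure table.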
To determine the main types of day-level pollutant combinations observed and how often they occurred in our air pollution data, we applied an approach known as the SOM as it has been shown to be beneficial in other similar studies.10,12,13 In brief, SOM is a statistical learning algorithm that not only discovers subgroups in data with similar attributes but also produces a visualization, the “map,” that spatially organizes group profiles based on the similarity of their attributes.10 Figure 2 provides a hypothetical example of a map produced by SOM. Conceptually, the approach is similar to other classification and clustering techniques (such as k-means, classification and regression trees, Bayesian profile regression) as a group profile is the basic unit of inference for describing patterns observed among the attributes of multiple air pollutants.8,14,15 Here, our aim is to discover subgroups of days that exhibit similar patterns in day-level measures of air pollution: categories which we define as MDTs.12,13 As such, we apply SOM as an unsupervised learning tool to identify MDTs based on empirically derived patterns without the influence of an outcome variable to predict.16,17 This involved the following steps: data preparation; identification of a k* that specifies a suitable number of MDTs to target; profile visualization; and development of a categorical exposure variable.
To prepare the data, we first selected training days with complete observations (n = 3892) for all pollutants and then standardized the data measured in different units to have a mean of zero and a standard deviation of one, as we wanted pollutants to have equal influence in profile development.13,16 Next, we sought to identify an acceptable k* within the range of k = 2, …, 20 using the following steps: (1) we searched for an obvious k* graphically16; (2) we assessed the relationship between exposure classification error and each k; and (3) we assessed the relationship between sample size and each k. Visualization involved applying principal components analysis (PCA) to the pollutant data, and subsequently, plotting the first two components on a biplot to identify if k* could be determined using obvious patterns of correlation among the pollutants that explained the most variation. Next, we examined the exposure classification error as a function of k by assessing the goodness of fit across each k application. This involved using k MDTs as a categorical predictor, fitting regression models for each pollutant, and examining the resulting adjusted R2. Finally, we assessed sample size distributions for MDTs within each k application to get a clearer picture of potential statistical power.12 We then used this information collectively to determine k*. Profile visualization was achieved by plotting MDTs on the map using mean centered profile bars set to a percentage scale as we desired to compare resulting MDTs to the overall average pollutant conditions. Finally, k* MDTs were used to construct a categorical exposure variable by assigning each day during the study period to its most similar MDT profile and setting the MDT with the lowest pollution values as the referent. All SOM analyses were conducted using functions available in the kohonen and class packages within the R Project for Statistical Computing version 3.4.3 (https://www.r-project.org/).
For more detail on SOM implementation in the context of this study, please refer to Pearce et al.13
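The data preparation and day-assignment steps described above can be sketched in a few lines of Python. This is an illustrative, standard-library implementation of a generic online SOM, not the kohonen package used in the study; the simulated pollutant values, the learning-rate and neighbourhood schedules, and the function names are all assumptions made for the example.

```python
import math
import random

def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def best_unit(x, codebook):
    """Index of the map unit whose profile is closest to x (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, w)) for w in codebook]
    return dists.index(min(dists))

def train_som(data, n_rows=3, n_cols=2, epochs=50, seed=1):
    """Online SOM on a rectangular grid with a shrinking Gaussian neighbourhood."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    codebook = [list(x) for x in rng.sample(data, len(grid))]  # start from observed days
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):          # shuffled pass over the days
            frac = step / total
            alpha = 0.05 * (1 - frac) + 0.01 * frac    # learning rate decays
            radius = max(1.5 * (1 - frac), 0.3)        # neighbourhood shrinks
            win = grid[best_unit(x, codebook)]
            for j, (r, c) in enumerate(grid):
                d2 = (r - win[0]) ** 2 + (c - win[1]) ** 2
                h = math.exp(-d2 / (2 * radius ** 2))  # pull neighbours toward x
                codebook[j] = [w + alpha * h * (a - w) for w, a in zip(codebook[j], x)]
            step += 1
    return codebook

rng = random.Random(0)
# Hypothetical daily values: NO2 (ppb), O3 (ppb), SO2 (ppb), PM2.5 (ug/m3).
days = [[rng.gauss(20, 8), rng.gauss(40, 12), rng.gauss(5, 4), rng.gauss(12, 5)]
        for _ in range(300)]
z = standardize(days)
codebook = train_som(z)                          # six profiles on a 3 x 2 map
mdt = [best_unit(x, codebook) + 1 for x in z]    # day type label 1..6 for each day
```

Each row of `codebook` plays the role of one multipollutant profile in standardized units, and `mdt` is the categorical exposure variable obtained by assigning every day to its most similar profile.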
To assess if certain pollutant combinations were more strongly associated with cardiorespiratory health, we applied overdispersed Poisson time-series regression models to each outcome separately to assess associations with MDTs using a low pollution referent group for a priori selected exposure lags of 0, 1, 2, and 3 days. Models were fit for each outcome-lag using a dependent variable defined as daily counts of events for the outcome of interest (e.g., asthma ED visit) and MDT assignments for the specified lag period. Each model included covariates to control for confounding by long-term trends and seasonality, day-of-the-week, average temperature, relative humidity (RH), and sea-level pressure (SLP). The general structure is
where E(Y_t) is the expected number of events for outcome Y on day t, assumed to follow an overdispersed Poisson distribution; β0 is the model intercept; ns(time_t) is a natural cubic spline for day of study on day t; ns(DOW_t) is a natural cubic spline for day of the week on day t, with 4 df; ns(temperature_t) is a natural cubic spline of the 3-day moving average of temperature on day t and the preceding 2 days (t − 1, t − 2), with 4 df; ns(relative humidity_t) is a natural cubic spline of the 3-day moving average of mean relative humidity for day t, with 4 df; and ns(sea-level pressure_t) is a natural cubic spline of the 3-day moving average of SLP for day t, with 4 df. Finally, our exposure metric is a categorical variable with k* levels that indicate the assignment of MDT c on day t for lag model l = 0, 1, 2, 3.12 A referent MDT level was specified using the profile that captured the days with the lowest overall pollution.
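The term definitions above are consistent with a log-linear model of the following general form. This display is a reconstruction from those definitions rather than a copy of the original equation: the log link, the symbol MDT_{c,t−l} for the exposure indicators, the coefficient names β_c, and the dispersion parameter φ are assumptions.

```latex
\log E(Y_t) = \beta_0
  + \sum_{c \neq \mathrm{ref}} \beta_c \, \mathrm{MDT}_{c,\,t-l}
  + ns(\mathrm{time}_t) + ns(\mathrm{DOW}_t)
  + ns(\mathrm{temperature}_t)
  + ns(\mathrm{relative\ humidity}_t)
  + ns(\mathrm{sea\text{-}level\ pressure}_t),
\qquad \mathrm{Var}(Y_t) = \phi \, E(Y_t)
```

Under this form, the rate ratio for MDT c relative to the referent day type at lag l would be exp(β_c).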
We reported our risk estimates as rate ratios (RR) along with their 95% confidence intervals (CIs). For comparison, we also fit single pollutant models to determine if our MDTs aligned with more conventional analyses that fit a separate regression model for each pollutant. All analyses were conducted using the stats and splines packages within the R Project for Statistical Computing version 3.4.3 (https://www.r-project.org/).
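As an illustration of how a rate ratio and its confidence interval follow from a fitted log-rate coefficient under overdispersion, the sketch below estimates a Pearson dispersion factor and uses it to widen the standard error. It assumes a quasi-Poisson style adjustment and a normal approximation for the interval; the counts, fitted means, coefficient, and standard error are hypothetical and are not results from this study.

```python
import math

def pearson_dispersion(y, mu, n_params):
    """Quasi-Poisson dispersion: Pearson chi-square over residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

def rate_ratio(beta, se, dispersion=1.0, z=1.96):
    """Rate ratio and 95% CI from a log-rate coefficient, with SE scaled by sqrt(dispersion)."""
    se_adj = se * math.sqrt(dispersion)
    return math.exp(beta), math.exp(beta - z * se_adj), math.exp(beta + z * se_adj)

# Hypothetical observed counts and fitted means for six days from a two-parameter model.
y = [12, 7, 15, 9, 20, 11]
mu = [10.0, 9.0, 12.0, 10.0, 16.0, 12.0]
phi = pearson_dispersion(y, mu, n_params=2)

# Hypothetical coefficient for one MDT versus the referent, with its Poisson-based SE.
rr, lo, hi = rate_ratio(beta=0.03, se=0.012, dispersion=phi)
print(f"dispersion={phi:.2f}  RR={rr:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

A rate ratio of 1.03 corresponds to a 3% higher event rate than on referent days, which is the scale on which the associations in this article are reported.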
A total of 2,192,170 cardiorespiratory events occurred among residents of Columbia between January 1, 2002, and December 31, 2013 (Table 1). Of these, 1,700,823 (78%) occurred as ED visits for either asthma or URI and 491,347 (22%) were diagnosed as HA for either CHF or IHD. The average number of daily events for all outcomes was 499, with URIs being the most numerous, followed by IHD, asthma, and CHF, respectively. Generally speaking, events affecting the same bodily system were moderately correlated over time (respiratory r = 0.57; cardiovascular r = 0.53); however, outcomes across systems did not exhibit much correlation (r < 0.3).
Air pollution data summaries indicated that relatively modest-to-low pollution levels were experienced during our study period (Table 1). The largest day-to-day variability (as measured by the coefficient of variation) was exhibited by SO2 (105%), followed by NO2 (60%), PM2.5 (48%), and O3 (34%). Correlation between PM2.5 and O3 was moderate (r = 0.48), but correlation was generally weak between the other pollutants (r < 0.3).
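For readers unfamiliar with the variability measure used above, the coefficient of variation expresses the standard deviation as a percentage of the mean, which allows pollutants measured in different units to be compared. A minimal sketch with hypothetical values, assuming the sample standard deviation is used:

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100 * stdev(values) / mean(values)

# Hypothetical daily SO2 maxima (ppb): a low mean with occasional spikes gives a large CV.
so2 = [1, 2, 1, 3, 14, 2, 1, 9, 2, 1]
print(round(coefficient_of_variation(so2), 1))
```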
Results from the PCA revealed no obvious k* (Figure 3A); however, additional results found that a k* = 6 produced MDTs that explained over 50% of the daily variation in the pollutant data (Figure 3B) and provided sample sizes greater than 5% (Figure 3C). As such, we determined that six MDTs provided a suitable categorization of the days for our study and thus applied a 3 × 2 SOM to create the final map of multipollutant profiles (Figure 4).
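The two criteria used to settle on k* can be sketched as follows: an adjusted R2 from regressing one pollutant on the day-type label treated as a categorical predictor, and the percentage of days falling in each group. The function names and the toy data are assumptions for illustration; with a single categorical predictor, the fitted values are simply the group means.

```python
from collections import Counter, defaultdict

def adjusted_r2(values, labels):
    """Adjusted R-squared from regressing one pollutant on the group label (categorical)."""
    n, k = len(values), len(set(labels))
    grand = sum(values) / n
    groups = defaultdict(list)
    for v, g in zip(values, labels):
        groups[g].append(v)
    ss_total = sum((v - grand) ** 2 for v in values)
    # Residuals are deviations from each group's own mean.
    ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    return 1 - (ss_within / (n - k)) / (ss_total / (n - 1))

def group_shares(labels):
    """Percentage of days assigned to each group."""
    counts = Counter(labels)
    return {g: 100 * c / len(labels) for g, c in counts.items()}

# Hypothetical standardized ozone values and day-type labels for six days.
ozone = [1, 2, 3, 4, 5, 6]
labels = [1, 1, 1, 2, 2, 2]
print(round(adjusted_r2(ozone, labels), 3), group_shares(labels))
```

Repeating this for each pollutant and each candidate k gives the kind of trade-off described above: the fit improves as k grows, while the smallest groups shrink.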
These results reveal that the most common day type, MDT 6, occurred on 28.8% of days when all pollutant levels were below average (Figure 4F). These “clean” days were well distributed across seasons and experienced broad temperatures, high wind speeds, lower SLP, and higher relative humidity suggestive of precipitation (Figure 5). Given these characteristics, we designated MDT 6 as the referent group for subsequent analysis.
The second most common day type, MDT 3, occurred on 25.7% of days and captured conditions when all pollutants were below average with the exception of O3 (Figure 4C). These days occurred primarily in the warm season and were accompanied with elevated temperatures, moderate winds, and somewhat broad SLP and RH (Figure 5).
The third most common day type, MDT 5, revealed that 17% of days experienced modest increases in NO2 and SO2 and below average O3. These days occurred primarily during the cool season and were accompanied by colder temperatures, low wind speeds, and higher SLPs that suggest periods of atmospheric stability (Figure 5).
Moving to MDT 4, we find cool, stable, dry days (10.8%) dominated by above average NO2 with above average levels for O3 and SO2 (Figure 4D). Moving to MDT 2, we find modestly frequent days (10.8%) that experienced above average levels for all pollutants, particularly O3 and PM2.5, and were accompanied by hot, stable, and dry weather. With MDT 1, we find the least frequent profile (7%) capturing days that exhibited relatively high levels of SO2 accompanied by modest increases in all other pollutants (Figure 4A).
Overall, the MDTs capture a broad range of observed multipollutant combinations, with profiles nearer to the bottom right representing more “typical” air quality days dominated by relatively low levels and profiles nearer to the upper left reflecting “rare” events dominated by relative air pollution exposure extremes. Evaluation of daily transitions (dt) suggests profile assignments changed every one to two days, with more frequent profiles having stronger persistence (e.g., MDT 6 dt = 2.5).
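One way to quantify persistence of day types is the average number of consecutive days spent in a day type before the assignment changes. The text does not define dt precisely, so treating it as a mean run length is an assumption, and the sequence below is hypothetical.

```python
from itertools import groupby

def mean_run_length(labels):
    """Average number of consecutive days spent in each label before a change."""
    runs = {}
    for label, run in groupby(labels):
        runs.setdefault(label, []).append(len(list(run)))
    return {label: sum(r) / len(r) for label, r in runs.items()}

# Hypothetical sequence of daily MDT assignments.
mdt_series = [6, 6, 6, 3, 3, 6, 6, 5, 4, 6, 6, 6, 2, 3, 3]
print(mean_run_length(mdt_series))
```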
Using MDT 6 as a referent, our statistical analysis identified multiple significant positive associations between MDTs and cardiorespiratory outcomes within a four-day exposure window (Figure 6). Broadly, RRs reveal associations of ≈2–5% in three outcomes occurred following exposure to several of our MDTs. More specifically, results for asthma reveal significant positive associations with MDT 3 (lag 2 and 3) and MDT 4 (lag 3) and marginally (P value < 0.1) positive associations with MDT 2 (lag 3) and MDT 5 (lag 3). For URIs, we found significant positive associations with MDT 3 (lag 0, 1, and 2) and marginal positive associations with MDT 1 (lag 0), MDT 2 (lag 1, 2), and MDT 4 (lag 3). Marginally positive associations for CHF were shown for MDT 4 (lag 0, 3). For IHD, significant positive associations were identified for MDT 1 (lag 0), MDT 2 (lag 0), MDT 4 (lag 0), and MDT 5 (lag 0) with corresponding marginal associations for MDT 1 and MDT 2 at lag 1. Overall, these findings reveal significant positive associations between days having a variety of multipollutant characteristics and cardiorespiratory outcomes when compared to relatively clean days, in Columbia, South Carolina.
For comparison, we fitted conventional single-pollutant models and identified significant positive associations between air pollutants and asthma, URIs, CHF, and IHD, with overall risk ratios ranging from 1 to 3% within three days of an interquartile range (IQR) increase in air pollution concentrations (Figure 7). For asthma, we found a significant positive association with NO2 (lag 3), O3 (lag 2, 3), and PM2.5 (lag 2, 3), with the largest RR of 1.8% (95% CI = 1.1%, 2.2%) for O3 on lag 2. For URIs, we found significant positive associations with O3 (lags 1, 2, 3, and 3-day moving average [MA]) and PM2.5 (lags 1, 2, 3, and 3MA), with the largest RR of 2.2% (1.8%, 4.1%) for a 3-day moving average of O3. For CHF, we found a significant positive association with NO2 (lag 0, 3), with the largest RR of 1.9% on lag 0. Finally, for IHD, we found a significant positive association for NO2 (lag 0; 3MA) and PM2.5 (lag 0), with the largest risk of 3% (2%, 4%) for a lag 0 of NO2. Overall, the alignment between multipollutant and single pollutant findings strengthens evidence of an association between ambient air pollution and adverse cardiorespiratory outcomes in Columbia, South Carolina.
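Single-pollutant estimates expressed per IQR increase are obtained by rescaling a per-unit log-rate coefficient by the width of the IQR. The sketch below shows that arithmetic; the concentrations and the coefficient are hypothetical, and the quartile method is the default of Python's statistics.quantiles, which may differ slightly from the method used in the study.

```python
import math
from statistics import quantiles

def iqr(values):
    """Interquartile range (third quartile minus first quartile)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def percent_change_per_iqr(beta_per_unit, iqr_width):
    """Percent change in the event rate for an IQR increase in the pollutant."""
    return 100 * (math.exp(beta_per_unit * iqr_width) - 1)

# Hypothetical daily 8-hour maximum ozone values (ppb) and a per-ppb coefficient.
o3_daily = [22, 28, 31, 35, 38, 41, 44, 47, 52, 58, 63]
beta_o3 = 0.0009
print(round(percent_change_per_iqr(beta_o3, iqr(o3_daily)), 2))
```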
In this acute health effects study of air pollution in Columbia, South Carolina, we found positive associations between multiple cardiorespiratory outcomes and short-term exposure to air pollution defined using both multipollutant and single pollutant approaches. Multipollutant exposures were defined using MDTs that describe observed day-level combinations among four pollutants during 2002–2013 (Figure 4). Resulting MDTs captured a broad range of daily conditions, ranging from relatively common low pollution days (MDT 6, 29%), to less common days with high levels of multiple pollutants (MDT 2, 11%), to even rarer days dominated by single pollutant extremes (MDT 1, 7%). Subsequent statistical analyses identified clear associations between higher levels of pollution and adverse health when comparisons were made to clean days (MDT 6; Figure 6). More specifically, we found that the occurrence of warmer, dry days with elevated pollution levels (MDTs 2 and 3) was associated with increasing risk for asthma, URIs, and IHD. We also found that cooler days with higher primary source pollution (MDTs 4 and 5) were associated with asthma, CHF, and IHD. Single pollutant extremes (MDT 1) were only found to be significant for IHD. Single pollutant results generally supported these findings (Figure 7). Collectively, these findings establish a positive association between air pollution and cardiorespiratory outcomes in our study population, which adds further evidence of health effects of complex environmental mixtures.
As with any study, there are limitations of the current work. Broadly, we aimed to identify health associations with complex, multifactorial exposures using a time-series study in which aggregated health outcome and exposure data were compared across a large geographic domain (Figure 1). We chose this approach for its relative ease and low cost; however, common exposure to air pollution, nonspecific outcomes, and estimated risks that tend to be low add to challenges of using this study design to detect subtle health effects (e.g., relative risks often less than 1.10).6 Nevertheless, such designs have proven quite useful in studies of air pollution health effects,18 and thus, we conducted this study using a careful design that assessed associations based on an exposure metric that maximized contrasts across multiple exposures in the study setting by grouping days under common MDTs. As such, our metric likely suffered from nondifferential misclassification of exposure, and thus, it is highly likely that this led to wider CIs (i.e., bias towards the null).19 We attempted to mitigate this issue by making sample size an important aspect of our groupings but note that this was a particularly difficult challenge, as larger sample sizes tend to decrease the number of groupings—a problem that could lead to an important mixture being lost under groupings that are too broadly defined. This is an important concern worth noting as exposure characterization errors are inherent with dimension reduction techniques such as the SOM.13
Another concern is confounding in the context of our multipollutant exposure. We have attempted to use traditional modeling approaches to control for potential confounding (i.e., including covariates for long-term and seasonal trends, day of the week, and weather) but note that, like other studies seeking to examine the relationship between multipollutant exposures and health effects,20 grouping days makes it difficult to confirm that the multipollutant effect is not also encompassing effects of other factors correlated with those day types. An obvious concern here is the weather, as certain mixtures may only form under certain meteorological conditions. Finally, we note another challenge for our study is the difficulty in making a direct comparison between our categorically modeled exposure (i.e., multipollutant metric) and our continuously modeled exposure (i.e., single pollutant metric) as different modeling assumptions are made, and thus, it is difficult to evaluate if one metric more effectively captured risk than the other.20
Despite these limitations, our study has many strengths that suggest it can support improving understanding of the health risk attributed to air pollution mixtures. First, we improve knowledge of complex exposure to mixtures by conducting a novel exposure assessment that resulted in identification of a collection of observed MDTs (Figure 4). This is an important contribution as it helps understand the nature and magnitude of pollutant combinations observed in our study area and narrows the field for future study.
Second, we improve understanding of potential health risk by establishing that certain air pollutant combinations were more strongly associated with adverse events across multiple health outcomes (Figure 6). We note that we did not find a “perfect storm” that was most harmful but rather a range of RRs for day types that suggest adverse health was associated with increasing concentrations for multiple pollutants when compared to low-pollution days. This is a finding that agrees well with previous work in Atlanta.12 In particular, we found that days in the warmer months with pollutant increases (MDT 2 and MDT 3) were more strongly associated with our respiratory outcomes and that cooler days with increasing pollution (MDT 4 and MDT 5) were more strongly associated with our cardiovascular outcomes. Single pollutant findings were similar as they also illustrated outcomes associated positively with several air pollutants. These findings are important as they support a growing body of evidence towards the use of multipollutant exposure profiles to better understand exposure and health effects of complex air pollution mixtures.12,19–24
As others have noted, additional research on complex exposure–outcome associations is expected to assist in reducing the burden of air pollution.1,4,25 This could be achieved by integrating multipollutant strategies into primary, secondary, and tertiary prevention of air pollution–related disease. For example, primary prevention strategies could be enhanced to include MDT surveillance, regulation, and warning systems. Secondary prevention could involve enhancing patient education and medical management to reduce exposure on “risky” types of days and tertiary prevention could include improved access to care on such days.
In closing, this work established associations between short-term exposure to types of days defined by their multipollutant profiles and select cardiorespiratory outcomes in a population that generally experiences modest-to-low pollution. These are important findings that enhance our understanding of the health risks associated with complex exposures and support hypothesis generation for future study. This was achieved through a novel two-stage framework that involved, first, applying an SOM to identify a manageable number of environmentally relevant pollutant combinations, and second, fitting a standard time-series model to estimate associations with health. We note that we continue to find this SOM-based approach attractive as it is highly flexible and can be broadly applied to future studies seeking to identify relevant types of environmental mixtures and link them with health.
Research reported in this publication was supported by the National Institute of Environmental Health Sciences of the National Institutes of Health under Award number K99/R00ES023475 and funding from the Department of Public Health Sciences at the Medical University of South Carolina (MUSC). The content is solely the responsibility of the authors and does not necessarily represent the official views of NIEHS, NIH, or MUSC. We also thank Anda Olsen for reviewing the article. Finally, the authors are indebted to the Environmental Epidemiology editors and anonymous peer reviewers, whose comments and suggestions significantly enhanced the final version of this article.
Conflict of interest statement
The authors declare that they have no conflicts of interest with regard to the content of this report.
1. Dominici F, Peng RD, Barr CD, Bell ML. Protecting human health from air pollution: shifting from a single-pollutant to a multi-pollutant approach. Epidemiology. 2010;21(2):187.
2. Pope CA, Dockery DW, Schwartz J. Review of epidemiological evidence of health effects of particulate air pollution. Inhal Toxicol. 1995;7(1):1–18.
3. Brunekreef B, Holgate ST. Air pollution and health. Lancet. 2002;360(9341):1233–1242.
4. Carlin DJ, Rider CV, Woychik R, Birnbaum LS. Unraveling the health effects of environmental mixtures: an NIEHS priority. Environ Health Perspect. 2013;121(1):A6–A8.
5. Kelly FJ, Fussell JC. Air pollution and public health: emerging hazards and improved understanding of risk. Environ Geochem Health. 2015;37(4):631–649.
6. Dockery DW. Epidemiologic study design for investigating respiratory health effects of complex air pollution mixtures. Environ Health Perspect. 1993;101(Suppl 4):187.
7. Tolbert PE, Klein M, Peel JL, Sarnat SE, Sarnat JA. Multipollutant modeling issues in a study of ambient air quality and emergency department visits in Atlanta. J Exposure Sci Environ Epidemiol. 2007;17(S2):S29.
8. Taylor KW, Joubert BR, Braun JM, et al. Statistical approaches for assessing health effects of environmental chemical mixtures in epidemiology: lessons from an innovative workshop. Environ Health Perspect. 2016;124(12):A227.
9. Sexton K, Hattis D. Assessing cumulative health risks from exposure to environmental mixtures: three fundamental questions. Environ Health Perspect. 2007;115(5):825.
10. Kohonen T. The self-organizing map. Neurocomputing. 1998;21(1–3):1–6.
11. Dominici F, Sheppard L, Clyde M. Health effects of air pollution: a statistical review. Int Stat Rev. 2003;71(2):243–276.
12. Pearce JL, Waller LA, Mulholland JA, et al. Exploring associations between multipollutant day types and asthma morbidity: epidemiologic applications of self-organizing map ambient air quality classifications. Environ Health. 2015;14(1):55.
13. Pearce JL, Waller LA, Chang HH, et al. Using self-organizing maps to develop ambient air quality classifications: a time series example. Environ Health. 2014;13(1):56.
14. Davalos AD, Luben TJ, Herring AH, Sacks JD. Current approaches used in epidemiologic studies to examine short-term multipollutant air pollution exposures. Ann Epidemiol. 2017;27(2):145–153.e1.
15. Oakes M, Baxter L, Long TC. Evaluating the application of multipollutant exposure metrics in air pollution health studies. Environ Int. 2014;69:90–99.
16. Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. Vol 344. Hoboken, New Jersey: John Wiley & Sons; 2009.
17. Friedman J, Hastie T, Tibshirani R. The Elements of Statistical Learning. 2nd ed. New York: Springer Series in Statistics; 2009.
18. Kim K-H, Kabir E, Kabir S. A review on the human health impact of airborne particulate matter. Environ Int. 2015;74:136–143.
19. Ljungman PL, Wilker EH, Rice MB, et al. The impact of multi-pollutant clusters on the association between fine particulate air pollution and microvascular function. Epidemiology. 2016;27(2):194.
20. Gass K, Klein M, Sarnat SE, et al. Associations between ambient air pollutant mixtures and pediatric asthma emergency department visits in three cities: a classification and regression tree approach. Environ Health. 2015;14(1):58.
21. Austin E, Coull BA, Zanobetti A, Koutrakis P. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition. Environ Int. 2013;59:244–254.
22. Coker E, Liverani S, Ghosh JK, et al. Multi-pollutant exposure profiles associated with term low birth weight in Los Angeles County. Environ Int. 2016;91:1–13.
23. Coker E, Liverani S, Su JG, Molitor J. Multi-pollutant modeling through examination of susceptible subpopulations using profile regression. Curr Environ Health Rep. 2018;5(1):59–69.
24. Keller JP, Drton M, Larson T, Kaufman JD, Sandler DP, Szpiro AA. Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts. Ann Appl Stat. 2017;11(1):93.
25. West JJ, Cohen A, Dentener F, et al. What We Breathe Impacts Our Health: Improving Understanding of the Link Between Air Pollution and Health. ACS Publications; 2016.
Keywords: Asthma; Cardiovascular; Columbia; Kohonen; Mixtures; Multipollutant; Pulmonary; Self-organizing maps
Copyright © 2018 The Authors. Published by Wolters Kluwer Health, Inc.