Skip to main content
Help us understand how you use our websites. Take part in our 30 minute study now.

Associations between ambient air pollutant mixtures and pediatric asthma emergency department visits in three cities: a classification and regression tree approach



Characterizing multipollutant health effects is challenging. We use classification and regression trees to identify multipollutant joint effects associated with pediatric asthma exacerbations and compare these results with those from a multipollutant regression model with continuous joint effects.


We investigate the joint effects of ozone, NO2 and PM2.5 on emergency department visits for pediatric asthma in Atlanta (1999–2009), Dallas (2006–2009) and St. Louis (2001–2007). Daily concentrations of each pollutant were categorized into four levels, resulting in 64 different combinations or “Day-Types” that can occur. Days when all pollutants were in the lowest level were withheld as the reference group. Separate regression trees were grown for each city, with partitioning based on Day-Type in a model with control for confounding. Day-Types that appeared together in the same terminal node in all three trees were considered to be mixtures of potential interest and were included as indicator variables in a three-city Poisson generalized linear model with confounding control and rate ratios calculated relative to the reference group. For comparison, we estimated analogous joint effects from a multipollutant Poisson model that included terms for each pollutant, with concentrations modeled continuously.

Results and discussion

No single mixture emerged as the most harmful. Instead, the rate ratios for the mixtures suggest that all three pollutants drive the health association, and that the rate plateaus in the mixtures with the highest concentrations. In contrast, the results from the comparison model are dominated by an association with ozone and suggest that the rate increases with concentration.


The use of classification and regression trees to identify joint effects may lead to different conclusions than multipollutant models with continuous joint effects and may serve as a complementary approach for understanding health effects of multipollutant mixtures.

Peer Review reports


Humans breathe a mixture of different air pollutants. Characterization of these multipollutant mixtures in relation to health effects has been addressed by air pollution research groups for decades. Historically, a common epidemiological approach for addressing mixtures has been through single pollutant models, in which a single pollutant effect is viewed as a surrogate for a particular air pollution mixture. Other approaches for investigating pollutant mixtures include composite metrics such as air quality indices [1], hierarchical and k-means clustering [2, 3], supervised principal components analysis [4], groupings based on chemical properties of pollutants [5], source apportionment [6], and exposure to specific mixtures, such as traffic-related pollutants [7].

Two reviews of statistical approaches for multipollutant research were recently published [8, 9]. In both reviews, classification and regression trees (C&RT), a supervised recursive partitioning approach, was cited as a method for handling multipollutant exposures; however, there have been few applications of C&RT in assessing the health effects of ambient air pollution exposure [10, 11]. In a recent paper we showed how C&RT can be adapted for epidemiologic research, particularly to control for confounding, and become a useful tool for generating hypotheses about multipollutant joint effects [12].

C&RT groups days according to their multipollutant profiles. This can help to elucidate patterns of meteorology, seasonality, and emission sources that cause certain pollutants to covary more strongly than others. From a health perspective, grouping days can enable identification of day types that are more harmful to human health and help to improve risk prediction systems. From a regulatory perspective, identifying the most harmful multipollutant joint effects can lead to more targeted regulation.

In this analysis, we use C&RT to identify multipollutant mixtures associated with pediatric asthma emergency department (ED) visits and calculate the joint effects associated with each mixture in a time series framework. Our analysis focuses on three criteria pollutants, ozone (O3), nitrogen dioxide (NO2), and particulate matter less than 2.5 microns in diameter (PM2.5), shown to have strong associations with asthma/wheeze in previous analyses [13, 14]. We conduct three separate C&RT analyses using time series data from Atlanta, Dallas and St. Louis. To increase statistical power we combined C&RT results across three cities. Although methods exist for combining trees generated from random subsets of a single data source (e.g. bagging and random forests), similarly well-established methods do not exist for combining trees generated from different data sources. In this paper we present an approach for combining C&RT results across cities to identify between-city similarities. A drawback of combining results from different cities is that city-specific heterogeneity will be masked. This concern is not unique to our study but rather is a generic epidemiologic concern that should be considered whenever estimates are pooled across strata, and in air pollution epidemiology there is a precedent of combining data across different cities, starting with the National Morbidity, Mortality and Air Pollution Study [15].

Finally, because C&RT is seldom used in multipollutant research, we compare the results of this three-city C&RT approach with the analogous joint effects estimated from a multipollutant regression model with each pollutant concentration modeled continuously [16].



Emergency Department visit data

Computerized billing records for ED visits to acute care hospitals in each city were obtained as follows: for 20-county Atlanta, from individual hospitals and the Georgia Hospital Association for an 11-year study period (1/1/1999 – 12/31/2009); for 12-county Dallas, from the Dallas-Fort Worth Hospital Council Foundation for a 3.5-year study period (1/1/2006 – 8/31/2009); and for 16-county St. Louis, from the Missouri Hospital Association for a 6.5-year study period (1/1/2001 – 6/27/2007). Relevant data elements included a patient identifier, admission date, patient age, primary and secondary International Classification of Diseases 9th Revision (ICD-9) diagnosis codes, and ZIP code of patient residence. Data were used in accordance with the individual hospital and/or hospital association data use agreements; this study was also approved by the Emory University Institutional Review Board. Visits by patients living in ZIP codes outside the city-specific study areas were excluded.

The individual-level data were restricted to visits by pediatric patients ages 2–18 years and aggregated to daily counts of asthma and/or wheeze, identified as any ICD-9 code of 493 and/or 786.07, resulting in 271,725 visits in Atlanta, 116,212 in Dallas, and 100,471 in St. Louis.

Air pollution data

Daily population-weighted average concentrations of ambient O3 (8-hr max), NO2 (1-hr max) and PM2.5 (24-h average) were calculated using measurements from stationary monitors in each of the three cities [17]. The averaging times for each pollutant were selected to coincide with those used for the National Ambient Air Quality Standards.

Statistical methods

Base model

The base model used to analyze each city-specific time series was a Poisson generalized linear model using a framework equivalent to the conditional logistic case-crossover model [18]. This is the same model used by Gass et al. [12]. Time trends were controlled by matching on weekday, month and year, and meteorology was controlled with cubic terms for the three-day moving average of: maximum temperature, maximum temperature interacted with season, and dew point. A spline for day-of-year with two knots was included to provide additional control for seasonal trends. This base model was used to generate the city-specific C&RT trees, estimate the joint effects for the mixtures identified by C&RT, and estimate the joint effects from the comparison multipollutant model.

Classification and regression trees: an overview

C&RT is a non-parametric regression approach that represents a supervised form of hierarchical clustering in which the observations are sequentially split into dichotomous groups, such that each resulting group contains increasingly similar responses for the outcome [19, 20]. Every tree starts with a “root node” that contains the observations from which the tree will be grown. The observations are then partitioned into two “child nodes” based on the value of an independent predictor variable. The resulting child nodes each contain a subset of the original observations. Child nodes may be further partitioned, again based on the value of an independent predictor variable. This process continues until a set of partitioning criteria are no longer met, resulting in terminal nodes. The collection of terminal nodes forms a complete partition of the observations in the root node. Each terminal node can be viewed as a unique mixture, defined by the path of partitions, or splits, leading from the root node to that particular terminal node. Previously we described how C&RT was used in a single-city analysis [12]; next we describe its application in the three-city analysis. Figure  1 summarizes this approach.

Fig. 1

Flow diagram outlining three-city C&RT approach. The table in the upper right contains the concentration cutoffs used to categorize each pollutant into four levels and the frequency at which each level occurred by city

Three-city C&RT approach

Step 1: Categorize pollutants into levels: The first step in our three-city C&RT comparison was to specify concentration cut-points for O 3 , NO 2 , and PM 2.5 that were common across all three cities. Each of the three pollutants was divided into four levels, resulting in 43 or 64 possible daily pollution profiles (aka “Day-Types”). Throughout the paper, the nomenclature used to describe these Day-Types is “O 3 -level/ NO 2 -level/ PM 2.5 -level”. For example, Day-Type “2/2/4” refers to days where both O 3 and NO 2 concentrations (3-day moving average) are in the 2nd level and the PM 2.5 concentration (3-day moving average) is in the 4th level (with levels 1 – 4 ranging from lowest to highest concentration).

Step 2: Define a common referent group: To enable estimation of joint effects, we defined a referent group corresponding to days when all three pollutants were in the lowest level (i.e., days designated as Day-Type 1/1/1). It was decided a priori that the referent group should include at least 100 days in each city. To satisfy this requirement, the referent group was defined as: O 3  ≤ 35 ppb, NO 2  ≤ 21 ppb, PM 2.5  ≤ 11ug/m3, which corresponds to roughly the 40th percentile of the overall distribution for the 3-day moving average of each pollutant, from the three cities combined. The remaining level cut-points were defined at approximately the 60th and 80th percentiles (Fig.  1 ). We chose to use absolute cut-points, rather than city-specific relative cut-points, so improve generalizability of the results. This approach is consistent with the setting of uniform air quality standards across cities or regions.

Step 3: Generate city-specific regression trees: For each city, the referent days were withheld while the remaining days formed the root node in the C&RT algorithm. We considered the nine possible ways these days could be partitioned based on the three pollutant concentration levels. This was accomplished by creating three mutually exclusive indicator variables for each pollutant representing the different comparisons: level 1 vs. 2–4, levels 1 and 2 vs. 3 and 4, and levels 1–3 vs. 4. Each of these nine indicators was considered one-at-a-time in the base model that controlled for confounding (described earlier). The selected indicator was the one that resulted in the smallest P-value for the null hypothesis that the beta for that indicator was 0. The days were then partitioned according to pollution levels defined by that indicator. For example, if the indicator representing NO2 level 1–3 vs. 4 had the lowest P-value, the data were partitioned into two groups: one consisting of all days with NO2 concentration in levels 1–3 and the other with days with NO2 concentration in level 4. Partitioning according to levels of a single pollutant continued until one of three stopping criteria were met: there were no remaining ways to partition the days, the remaining splits were not significant at a pre-specified level of alpha (α = 0.15), or the minimum number of days for each node (n = 60) was not met. In this case partitioning stopped and the node became a terminal node. Note that the confounding variables were included in the model but not used in the splitting.

Step 4: Identify pollution mixtures of interest: We grew three separate trees using the same algorithm and splitting indicator definitions for Atlanta, Dallas, and St. Louis. For each city, the C&RT algorithm partitioned the 63 Day-Types (all but the withheld referent group) into terminal nodes according to their association with the outcome. Rather than comparing the shapes of the trees or the ordering of the splits we focused on the mixtures of air pollutants encompassed in the terminal nodes that were common across cities. For example, suppose Day-Types 2/3/4 and 3/3/4 appear together in a terminal node in all three cities, then we say this pair of Day-Types constitutes a “mixture” in which we are potentially interested. This approach was used to identify all mixtures encompassed in the terminal nodes across the three city trees.

Step 5: Estimate joint effects of mixtures: We then estimated the joint effects of these C&RT-generated mixtures using the base model with indicators for each mixture. Rate ratios for each mixture were estimated using the previously withheld referent group (Day-Type 1/1/1) as the comparison in a three-city model with city-specific effects for all covariates (excluding the mixture indicator variables). Day-Types that did not fall into one of the 17 mixtures were represented with a single indicator variable in the model; rate ratios for these Day-Types are not presented in the Results.

Comparison model: multipollutant regression with pollutants modeled continuously

Finally, we compared our findings from the C&RT-identified mixtures to those obtained from a multipollutant regression model with pollutants modeled continuously. This comparison model consisted of the base model described above with the inclusion of a linear term (β*pollutant) for each of the three pollutants, modeled using the continuous 3-day moving average concentrations. A three-city model was run that included city indicator variables and product terms between these indicators and each other model covariate (excluding the pollutant concentrations). To compare the joint effects of this model with those of the C&RT-identified mixtures, we specified pollutant concentrations that were most commensurable to the pollutant concentrations of each mixture as well as the referent group (Day-Type 1/1/1). Specifically, we used the comparison model to estimate the joint effects for a change in concentration from the mean of the referent group to the mean of each C&RT-identified mixture, for each pollutant. This was accomplished by multiplying the coefficient for the linear effect of each pollutant by the difference in the pollutant-specific mixture mean and referent group mean. These products were summed and exponentiated to get the rate ratio for the joint effect. Standard errors for the joint effects were calculated using the variance-covariance matrix. As a sensitivity analysis, we also calculated the joint effects using the contrast represented by the change from the median concentration in the referent group to the median concentration in each C&RT-identified mixture. Further sensitivity analyses were conducted considering alternative comparison models, including a model with all first- and second-order multiplicative terms for the interactions between the pollutants as well as a model with both linear and quadratic terms for each pollutant. As before, joint effects were calculated for the differences in concentration between the mixture mean and the referent group mean. Analyses were performed using SAS® v9.3 (Statistical Analysis System; North Carolina).


After excluding days with missing air pollution levels, 4012 observations remained for analysis from Atlanta, 2354 days from St. Louis and 1337 days from Dallas. The referent group, identified as days where all pollutants were in the lowest level, contained 606 days (15 %) for Atlanta, 115 days (5 %) for St. Louis, and 121 days (9 %) for Dallas (Fig.1). A more extensive description of the referent group is provided in Table 1, including the monthly distribution, percent of days with precipitation, and average wind speed. Table 2 contains the frequency of each of the 64 Day-Types (the joint distribution of the three pollutants parameterized as ordinal variables) in each city and the correlations between each of the pollutants, by city, are shown in Table 3. All Day-Types occurred at least once in Atlanta; in St. Louis there was one Day-Type that never occurred, and in Dallas there were five Day-Types that never occurred.

Table 1 Distribution of referent and non-referenta days by city
Table 2 Frequency at which each Day-Type (n = 64) occurred by city, as well as the terminal node designation from the city-specific trees
Table 3 Spearman correlations between O3, NO2, and PM2.5 within each city

The C&RT algorithm was run separately for each city, generating regression trees with seven, six, and seven terminal nodes in Atlanta, Dallas, and St. Louis, respectively (Figs. 2, 3, 4). Comparing terminal nodes across the three cities, there were 17 pollution mixtures of two or more Day-Types that occurred together in the same terminal node in all three cities (Table 4). The numeric labeling of the mixtures was arbitrary; we ordered the mixtures according to the level of O3, followed by NO2. Of the 7709 days from the three cities combined, 842 were in the referent group (Day-Type 1/1/1) and 5446 were in one of the 17 mixtures. Table 4 contains the number of days in each mixture by city, as well as the mean concentrations of O3, NO2, and PM2.5. There were 10 Day-Types, corresponding to 1421 days from Atlanta, St. Louis and Dallas combined, that did not appear in any of the 17 mixtures, meaning these Day-Types did not appear together in terminal nodes with other Day-Types consistently across the cities.

Fig. 2

Regression tree illustrating multipollutant joint effects on pediatric asthma emergency department visits in Atlanta (1999 – 2009). Internal nodes are designated with an oval and numbered such that each node, n, produces two child nodes numbers 2n and 2n + 1. The branches of the tree are labeled according to the level of the pollutant used to partition the tree. For each partition, the branch with the more harmful association is bolded. Terminal nodes are numbered 1A-7A (A for Atlanta). The pie graphs at each terminal node are a graphical representation of the Day-Types that fall into each terminal node. Each pie graph has 12 wedges, four representing each level (L1-L4) of O3 (shades of purple), four representing each level of NO2 (shades of gold), and four representing each level of PM2.5 (shades of blue). Pie wedges are colored if a pollutant level is classified into that terminal node and left white if the pollutant level is absent from the terminal node. Day-Types present in the terminal node can be identified by finding every combination of one O3 wedge (purple), one NO2 wedge (gold) and one PM2.5 wedge (blue). For example terminal node 7A contains 4 Day-Types: O3 level 2, NO2 level 3 and PM2.5 levels 1–4 (2/3/1, 2/3/2, 2/3/3, 2/3/4)

Fig. 3

Regression tree illustrating multipollutant joint effects on pediatric asthma emergency department visits in Dallas (2006 –2009). Internal nodes are designated with an oval and numbered such that each node, n, produces two child nodes numbers 2n and 2n + 1. The branches of the tree are labeled according to the level of the pollutant used to partition the tree. For each partition, the branch with the more harmful association is bolded. Terminal nodes are numbered 1D-6D (D for Dallas). The pie graphs at each terminal node are a graphical representation of the Day-Types that fall into each terminal node. Each pie graph has 12 wedges, four representing each level (L1-L4) of O3 (shades of purple), four representing each level of NO2 (shades of gold), and four representing each level of PM2.5 (shades of blue). Pie wedges are colored if a pollutant level is classified into that terminal node and left white if the pollutant level is absent from the terminal node. Day-Types present in the terminal node can be identified by finding every combination of one O3 wedge (purple), one NO2 wedge (gold) and one PM2.5 wedge (blue). For example terminal node 6D contains 2 Day-Types: O3 level 3, NO2 levels 3 and 4, and PM25 level 1 (3/3/1, 3/4/1)

Fig. 4

Regression tree illustrating multipollutant joint effects on pediatric asthma emergency department visits in St. Louis (2001–2007). Internal nodes are designated with an oval and numbered such that each node, n, produces two child nodes numbers 2n and 2n + 1. The branches of the tree are labeled according to the level of the pollutant used to partition the tree. For each partition, the branch with the more harmful association is bolded. Terminal nodes are numbered 1S-7S (S for St. Louis). The pie graphs at each terminal node are a graphical representation of the Day-Types that fall into each terminal node. Each pie graph has 12 wedges, four representing each level (L1-L4) of O3 (shades of purple), four representing each level of NO2 (shades of gold), and four representing each level of PM2.5 (shades of blue). Pie wedges are colored if a pollutant level is classified into that terminal node and left white if the pollutant level is absent from the terminal node. Day-Types present in the terminal node can be identified by finding every combination of one O3 wedge (purple), one NO2 wedge (gold) and one PM2.5 wedge (blue)

Table 4 Mixtures described by Day-Type, number of days and mean concentration

The rate ratios (RR) for the 17 mixtures, modeled as indicator variables in the base case-crossover model with all three cities combined, are shown in Table 5 (2nd column). Nearly every RR, with the exception of the RRs for mixtures 1 and 9, was suggestive of a harmful association with pediatric asthma. Mixtures 10 and 13 had the largest effect sizes (RR: 1.07, 95 % CI: 1.03, 1.12; and RR: 1.06, 95 % CI: 1.02, 1.09).

Table 5 Rate ratios for the multipollutant joint effects of the C&RT-identified mixtures and comparison model

The RR results of the C&RT-identified mixtures are presented alongside the RRs for the joint effects calculated from the comparison model in Table 5. The RRs shown for the comparison model are for a concentration change from the referent mean to the mixture mean for each of the three pollutants. A Wald test of the statistical significance for the exposure terms, considered simultaneously, was significant for the C&RT model as well as the comparison model (p = 0.006 and p < 0.001, respectively).

The RRs from the comparison model have relatively good agreement with the C&RT-identified results for the mixtures with lower mean concentrations. At higher pollution levels, joint effects calculated from the comparison model suggest increasing risk with concentration, while the C&RT-identified results suggest risk plateaus at the highest concentration mixtures. For example, the RR for mixture 17, which contains the Day-Type when all pollutants are at their highest level (4/4/4), is 1.05 in the C&RT model (95 % CI: 1.01, 1.09) vs. an RR of 1.10 in the comparison model (95 % CI: 1.06, 1.14). To highlight this further, the 95 % confidence levels for the joint effects of mixtures 1, 9, 14, 15 and 17 calculated using the comparison model do not contain the point estimate from the corresponding joint effects from the C&RT approach, whereas for the other mixtures there was some overlap.

The sensitivity analysis using the mixture medians (as opposed to mean) to calculate the joint effects for the comparison model yielded similar estimates (Table 6). Joint effects from the two alternative comparison models (the multipollutant model with interaction terms and multipollutant model with quadratic effects) using the mixture means are included in Table 6. For the multipollutant model with interaction terms, the confidence intervals were large, while for the quadratic model, a test of significance for the exposure terms beyond the linear effects was non-significant (p = 0.63) and suggests that the multipollutant effect does not have a quadratic form. In both sensitivity analysis models, the point estimates of the joint effects for each mixture were nearly identical to the comparison model with only linear effects.

Table 6 Sensitivity analysis results for linear term models using Mixture median and Mixture mean with interactions and quadratic terms


In this paper we used classification and regression trees, a non-parametric recursive partitioning approach, to identify multipollutant joint effects associated with pediatric asthma in Atlanta, Dallas and St. Louis. As evidenced by our sensitivity analyses, it is difficult to identify complex interactions of two, three or four pollutants using conventional regression models due to power limitations [21]. A known advantage of C&RT is that it can be used to detect complex and multiple interactions between covariates [8, 9]. We have previously shown that with few modifications, C&RT can be used to detect interactions between pollutant concentrations while simultaneously controlling for temporal and meteorological confounding [12].

A key finding of this analysis is that no single C&RT-identified mixture emerged as being substantially more harmful than the rest. All mixtures, with the exception of mixtures 1 and 9, had a harmful association relative to days when all pollutants were in the lowest level. There were seven mixtures with statistically significant associations, all of which were similar in magnitude. There was no apparent pollutant-specific pattern to the RRs of the C&RT-identified mixtures, suggesting that no single pollutant was driving the associations. Instead the results of the C&RT approach suggest that higher levels of any of the three pollutants (O3, NO2, and PM2.5) are more harmful and that the rates appear to plateau at higher concentrations. For example mixture 17, which contains the highest mean concentrations of O3 and NO2 and the second highest PM2.5 concentration, has an RR of 1.05, while mixture 13, which is characterized by more moderate concentrations of the three pollutants, has an RR of 1.06. While this lack of a synergistic --or even multiplicative-- response is surprising, it is not unprecedented. In a review of the literature, Mauderly and Samet found that 22 out of 36 laboratory studies failed to demonstrate a synergistic response [22]. A hypothesis that could explain the C&RT finding that risk plateaus on the highest pollution days is if asthmatic children change their behavior and limit exposure. A cross-sectional study by Wen et al. lends some support to this theory; the study found that asthmatic adults had a greater odds of modifying their outdoor activity compared with non-asthmatics on days with media alerts due to a high air quality index [23].

This C&RT approach for identifying and evaluating multipollutant joint effects yielded different results than the multipollutant comparison model with continuous terms for the three pollutants. This latter multipollutant regression model was selected for comparison with the C&RT approach because it is an extension of the traditional single-pollutant models and likely represents a characterization of joint effects for which researchers are most familiar, although only a few studies have actually used such models to estimate a combined multipollutant joint effect [16, 2426]. It should be emphasized that the comparison model was not expected to generate equivalent results as the C&RT model, but rather to serve as a comparison for the C&RT approach.

Examination of the differences between the C&RT and comparison model results suggests that the two approaches for modeling multipollutant exposures lead to different conclusions regarding the roles of individual pollutants. In the comparison model, joint effects are driven by O3 concentration. This is demonstrated by the increasing RRs from top to bottom in Table 5, an artifact of assigning arbitrary labels to the mixtures based on increasing O3 concentration. Conversely, one does not see the same pattern of increasing RRs in the C&RT-identified mixtures, again suggesting that no single pollutant is driving these results. With respect to PM2.5, results from the comparison model with continuous terms suggest that PM2.5 has a null association (overall RR 1.00 for an IQR increase in PM2.5, holding the other two pollutants constant), while the C&RT approach suggests that PM2.5 drives some of the mixture associations. This is perhaps best highlighted by comparing the RRs for mixtures 1 and 2, which have very similar levels of O3 and NO2 but differ in the concentration of PM2.5. Using the C&RT approach the RR for mixture 1, which has the lower PM2.5 mean concentration, is 0.99 (95 % CI: 0.97, 1.02) while the RR for mixture 2 is 1.04 (95 % CI: 0.99, 1.09) (Table 5). In contrast, the RRs for mixtures 1 and 2 using the multipollutant comparison model are both 1.01 (with 95 % CIs of 1.00, 1.02 and 1.00, 1.03, respectively).

Although the joint effects of the C&RT and comparison models differ (i.e., in the C&RT model pollution is modeled categorically whereas in the comparison model pollution is modeled continuously), the differences implied by the results are striking and merit further attention. One possibility is that C&RT identified joint effects are driven by PM2.5 constituents. For example, the comparison model treats all specific PM2.5 concentrations equally; a high PM2.5 day that has high levels of elemental carbon is not distinguished from a high day that has a greater proportion of sulfates. Conversely the C&RT model has the potential to distinguish mixtures of PM2.5 constituents through the interactions detected by subsequent partitioning. By partitioning on a pollutant (e.g., NO2) that is correlated with certain PM2.5 components, C&RT has the ability to differentiate PM2.5 mixtures through their correlation with other independent variables in the model. While the multipollutant comparison model with the inclusion of first- and second-order interaction terms could discriminate between some PM2.5 mixtures, its discriminatory power would be limited to a linear effect for each of the interaction terms. As such it could not, for example, identify the same complex interactions as seen in the St. Louis C&RT tree through nodes 1-2-4-9 (Fig.4).

By binning days, the C&RT model may be able to account for unmeasured confounding that is non-smooth (i.e., that varies with terminal node classification, not pollution). For example, people may modify their behavior under certain types of days in a way that affects ED visits for asthma. As a result, it is possible that the point estimates for the mixture results are measuring not only the multipollutant effect but also the effects of other factors that are correlated with those Day-Types. While this could be a disadvantage if one intends to use the point estimates to conduct risk assessment, it could be beneficial if the interest is in identifying types of days that are most harmful for a particular health outcome and could lead to a more targeted warning system.

When considering the C&RT approach, if two or more Day-Types appear together in a terminal node, this is either indicative of homogeneity of effect or lack of power to detect any further effect (i.e., drive any further partitioning). It is likely that one or more of the 17 mixtures were formed as a result of insufficient power to further partition the terminal node in a given city. This would result in an “artificial group”, that is a mixture in which the Day-Types do not have a similar association with the outcome.

One downside to presenting the combined mixture RRs in this analysis is that any heterogeneity across the cities will be masked. Some between-city heterogeneity is to be expected due to tangible and intangible city differences, including socio-economic status, air conditioning use, climate acclimation and behavior patterns that are likely to modify the estimated health associations [27]. Nonetheless, it seems likely that there exist some ambient pollution mixtures that are universally harmful, and by combining cities our aim was to overcome some of the power limitations mentioned above.


C&RT can be used to investigate multipollutant joint effects and may lead to different conclusions than multipollutant models with continuous terms. In particular, the results from this study suggest C&RT and comparison models lead to different joint effects of O3, NO2 and PM2.5 when concentrations are high and that the apparent role of PM2.5 differs according to the model used. Furthermore we have shown how C&RT models can be used to identify types of days that are particularly harmful to health, which can help to improve warning systems and lead to more targeted regulation.

Understanding the potential risk air pollution mixtures pose to human health is a complex and challenging undertaking that has only just begun. It is important to emphasize that we do not know which, if either, of the models is more correct. Different assumptions are made when exposure is modeled categorically vs. continuously, making direct comparisons of results difficult. Nevertheless, exploring alternative models can be a useful way to generate new ideas and perhaps gain greater insight into air pollution mixtures.



Confidence interval


Classification and regression trees


Emergency Department


Nitrogen dioxide




Particulate Matter (<2.5 μg)


Rate ratio


  1. 1.

    Stieb DM, Burnett RT, Smith-Doiron M, Brion O, Shin HH, Economou V. A new multipollutant, no-threshold air quality health index based on short-term associations observed in daily time-series analyses. J Air Waste Manag Assoc. 2008;58:435–50.

    CAS  Article  Google Scholar 

  2. 2.

    Austin E, Coull BA, Zanobetti A, Koutrakis P. A framework to spatially cluster air pollution monitoring sites in US based on the PM2.5 composition. Env Int. 2013;59:244–54.

    CAS  Article  Google Scholar 

  3. 3.

    Qian Z, Chapman RS, Hu W, Wei F, Korn LR, Zhang JJ. Using air pollution based community clusters to explore air pollution health effects in children. Env Int. 2004;30:611–20.

    CAS  Article  Google Scholar 

  4. 4.

    Roberts S, Martin MA. Using supervised principal components analysis to assess multiple pollutant effects. Env Heal Perspect. 2006;114:1877–82.

    Google Scholar 

  5. 5.

    Suh HH, Zanobetti A, Schwartz J, Coull BA. Chemical properties of air pollutants and cause-specific hospital admissions among the elderly in Atlanta, Georgia. Env Heal Perspect. 2011;119:1421–8.

    Article  Google Scholar 

  6. 6.

    Sarnat JA, Marmur A, Klein M, Kim E, Russell AG, Sarnat SE, et al. Fine particle sources and cardiorespiratory morbidity: an application of chemical mass balance and factor analytical source-apportionment methods. Env Heal Perspect. 2008;116:459–66.

    Google Scholar 

  7. 7.

    McConnell R, Islam T, Shankardass K, Jerrett M, Lurmann F, Gilliland F, et al. Childhood incident asthma and traffic-related air pollution at home and school. Env Heal Perspect. 2010;118:1021–6.

    CAS  Article  Google Scholar 

  8. 8.

    Billionnet C, Sherrill D, Annesi-Maesano I, Study G. Estimating the health effects of exposure to multi-pollutant mixture. Ann Epidemiol. 2012;22:126–41.

    Article  Google Scholar 

  9. 9.

    Sun Z, Tao Y, Li S, Ferguson KK, Meeker JD, Park SK, et al. Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons. Env Heal. 2013;12:85.

    Article  Google Scholar 

  10. 10.

    Hu W, Mengersen K, McMichael A, Tong S. Temperature, air pollution and total mortality during summers in Sydney, 1994–2004. Int J Biometeorol. 2008;52:689–96.

    Article  Google Scholar 

  11. 11.

    Oiamo TH, Luginaah IN. Extricating sex and gender in air pollution research: a community-based study on cardinal symptoms of exposure. Int J Env Res Public Heal. 2013;10:3801–17.

    Article  Google Scholar 

  12. 12.

    Gass K, Klein M, Chang HH, Flanders WD, Strickland MJ. Classification and regression trees for epidemiologic research: an air pollution example. Environ Heal. 2014;13:17.

    Article  Google Scholar 

  13. 13.

    Strickland MJ, Darrow LA, Klein M, Flanders WD, Sarnat JA, Waller LA, et al. Short-term associations between ambient air pollutants and pediatric asthma emergency department visits. Am J Respir Crit Care Med. 2010;182:307–16.

    Article  Google Scholar 

  14. 14.

    Winquist A, Klein M, Tolbert P, Flanders WD, Hess J, Sarnat SE. Comparison of emergency department and hospital admissions data for air pollution time-series studies. Env Heal. 2012;11:70.

    CAS  Article  Google Scholar 

  15. 15.

    Samet JM, Dominici F, Zeger SL, Schwartz J, Dockery DW. The national morbidity, mortality, and air pollution study part I: methods and methodologic issues. Cambridge, MA: Health Effects Institute; 2000

  16. 16.

    Winquist A, Kirrane E, Klein M, Strickland M, Darrow LA, Sarnat SE, et al. Joint effects of ambient air pollutants on pediatric asthma emergency department visits in atlanta, 1998–2004. Epidemiology. 2014;25:666–73.

    Article  Google Scholar 

  17. 17.

    Ivy D, Mulholland JA, Russell AG. Development of ambient air quality population-weighted metrics for use in time-series health studies. J Air Waste Manage Assoc. 2008;58:711–20.

    CAS  Article  Google Scholar 

  18. 18.

    Lu Y, Zeger SL. On the equivalence of case-crossover and time series methods in environmental epidemiology. Biostatistics. 2007;8:337–44.

    Article  Google Scholar 

  19. 19.

    Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and Regression Trees. Belmont: Wadsworth International Group; 1984.

    Google Scholar 

  20. 20.

    Strobl C, Boulesteix AL, Zeileis A, Hothorn T. Bias in random forest variable importance measures: illustrations, sources and a solution. BMC Bioinformatics. 2007;8:25.

    Article  Google Scholar 

  21. 21.

    Greenland S. Tests for interaction in epidemiologic studies: a review and a study of power. Stat Med. 1983;2:243–51.

    CAS  Article  Google Scholar 

  22. 22.

    Mauderly JL, Samet JM. Is there evidence for synergy among air pollutants in causing health effects? Env Heal Perspect. 2009;117:1–6.

    CAS  Article  Google Scholar 

  23. 23.

    Wen XJ, Balluz L, Mokdad A. Association between media alerts of air quality index and change of outdoor activity among adult asthma in six states, BRFSS, 2005. J Community Heal. 2009;34:40–6.

    Article  Google Scholar 

  24. 24.

    Jerrett M, Burnett RT, Beckerman BS, Turner MC, Krewski D, Thurston G, et al. Spatial analysis of air pollution and mortality in California. Am J Respir Crit Care Med. 2013;188:593–9.

    Article  Google Scholar 

  25. 25.

    Roberts S, Martin MA. Investigating the mixture of air pollutants associated with adverse health outcomes. Atmos Environ. 2006;40:984–91.

    CAS  Article  Google Scholar 

  26. 26.

    Schildcrout JS, Sheppard L, Lumley T, Slaughter JC, Koenig JQ, Shapiro GG. Ambient air pollution and asthma exacerbations in children: an eight-city analysis. Am J Epidemiol. 2006;164:505–17.

    Article  Google Scholar 

  27. 27.

    Katsouyanni K, Touloumi G, Samoli E, Gryparis A, Le Tertre A, Monopolis Y, et al. Confounding and effect modification in the short-term effects of ambient particles on total mortality: results from 29 European cities within the APHEA2 project. Epidemiology. 2001;12:521–31.

    CAS  Article  Google Scholar 

Download references


This publication was made possible by US EPA grant R834799. This publication’s contents are solely the responsibility of the grantee and do not necessarily represent the official views of the US EPA. Further, US EPA does not endorse the purchase of any commercial products or services mentioned in the publication. This publication was also made possible by grants to Emory University from the Electric Power Research Institute (EP-P25912/C12525, EP-P34975/C15892, and EP-P45572/C19698). This publication is based in part upon information obtained through the Dallas Fort Worth Hospital Council Foundation Information and Quality Services Center’s collaborative hospital data initiative. The Foundation is grateful for the support of its participating hospitals.

Author information



Corresponding author

Correspondence to Katherine Gass.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

KG conceived of the methodology, designed the algorithm, performed the analysis and drafted the manuscript. MK aided in the conception of the methodology, algorithm development and drafting of the manuscript. SES, AW, and LAD aided in the acquisition and preparation of the three-city ED data and contributed to the interpretation of results. WDF provided statistical and conceptual guidance in the design of the algorithm and analysis. HHC provided statistical guidance in the design of the algorithm and analysis. JAM aided in the preparation and calculation of the population-weighted average exposure data and interpretation of the mixtures. PET oversaw the acquisition of the ED data and contributed to the drafting and interpretation of results. MJS oversaw the conception of the methodology, designing of the algorithm, analysis and drafting of the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Gass, K., Klein, M., Sarnat, S.E. et al. Associations between ambient air pollutant mixtures and pediatric asthma emergency department visits in three cities: a classification and regression tree approach. Environ Health 14, 58 (2015).

Download citation


  • Air pollution
  • Classification and regression trees
  • Multicity
  • Multipollutant
  • Mixtures
  • NO2
  • Ozone
  • Pediatric asthma
  • PM2.5