Our study was conducted under a protocol approved by the Harvard School of Public Health Human Subjects Committee. Study data did not include individual identifiers and thus consent was not obtained from individuals.
Cause-specific hospital admissions data
Data on daily emergency hospital admissions were obtained from billing claims of Medicare enrollees for Atlanta, GA (2006–2009), Birmingham, AL (2006–2009) and Dallas, TX (2006–2007). Data for hospitals within Clayton, Cobb, DeKalb, Fulton and Gwinnett counties in Atlanta, Jefferson and Shelby counties in Birmingham, and Dallas County in Dallas were included in the analyses. Only admissions that occurred through the emergency department were included, as scheduled admissions are unlikely to be related to short-term air pollution exposures.
Each billing claim contains information on the date of hospitalization, age, county of residence and primary diagnosis. Using codes from the International Classification of Diseases, 9th Revision (ICD-9; Centers for Disease Control and Prevention 2008), we considered hospital admissions for all CVD conditions (codes 390–429), for all respiratory outcomes (codes 460–519), and for specific CVD or respiratory conditions: congestive heart failure (CHF; code 428), myocardial infarction (MI; code 410), ischemic heart disease (IHD; codes 410–414), chronic obstructive pulmonary disease (COPD; codes 490–492, 494–496) and pneumonia (codes 480–487). Outcomes were selected based on findings from previous air pollution health studies [12, 19, 20].
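As an illustration of the outcome definitions above, the following minimal Python sketch maps a primary diagnosis code to the outcome groups listed in the text. The names `OUTCOME_RANGES` and `outcomes_for_code` are hypothetical, and the assumption that the first three characters of the code give the ICD-9 category is ours; the study itself was analyzed in R and its actual code is not shown in the paper.

```python
# Illustrative only: ICD-9 category ranges as listed in the text.
OUTCOME_RANGES = {
    "all_cvd": [(390, 429)],
    "all_respiratory": [(460, 519)],
    "chf": [(428, 428)],
    "mi": [(410, 410)],
    "ihd": [(410, 414)],
    "copd": [(490, 492), (494, 496)],
    "pneumonia": [(480, 487)],
}

def outcomes_for_code(icd9_code):
    """Return the set of outcome groups that contain this diagnosis code."""
    # Assumption: codes arrive as strings such as "428.0" or "4280", and the
    # first three characters identify the 3-digit ICD-9 category.
    prefix = icd9_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        return set()  # e.g. V or E codes, which are outside the studied ranges
    category = int(prefix)
    return {
        name
        for name, ranges in OUTCOME_RANGES.items()
        if any(lo <= category <= hi for lo, hi in ranges)
    }
```

Because the groups overlap by design, a single admission can count toward several outcomes: a code in category 410, for example, falls under all CVD, IHD and MI.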
Air pollution and meteorologic data
In each city, we obtained daily data for ∼120 non-polar compounds (by thermal desorption GC/MS), OC (by IMPROVE protocol thermal optical reflectance) and PM2.5 (by 24-hr integrated Federal Reference Methods), all measured as part of the ARIES and Texas ARIES studies. The analytical methods are well-accepted and have been previously published [21–23]. Data on temperature and dew point were obtained from Atlanta and Birmingham monitoring sites and the Dallas Fort Worth International Airport.
All statistical analyses were conducted using the R Statistical Software, version 2.14.1 (Foundation for Statistical Computing, Vienna, Austria).
We characterized OC particle concentrations using time-series plots, histograms and summary statistics. We further assessed seasonal differences in concentrations, with October–March as the cold period and April–September as the warm period.
OC species were included in further analyses if ≥50% of their observations were above their limit of detection (LOD), ≥75% of the observations were non-missing, and their IQR/median ratio was above 0.30, in all cities. The IQR/median ratio was used instead of the coefficient of variation, given its lower vulnerability to extreme observations. Pollutants with IQR/median ratios ≤0.30 were excluded, as they were not sufficiently variable to allow effect estimation with sufficient power.
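The three inclusion criteria can be sketched as follows in Python. This is a hedged illustration, not the authors' code: the function names are hypothetical, missing days are represented by `None`, the share above the LOD is computed among non-missing observations, and the quartiles use the "inclusive" method of the standard library. None of these details is specified in the text.

```python
from statistics import median, quantiles

def is_eligible(values, lod, min_above_lod=0.50, min_non_missing=0.75,
                min_ratio=0.30):
    """Check one species in one city against the three inclusion criteria."""
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        return False  # not enough data to compute quartiles
    frac_non_missing = len(observed) / len(values)
    # Assumption: the LOD share is taken over non-missing observations.
    frac_above_lod = sum(v > lod for v in observed) / len(observed)
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    med = median(observed)
    ratio = (q3 - q1) / med if med > 0 else 0.0
    return (frac_non_missing >= min_non_missing
            and frac_above_lod >= min_above_lod
            and ratio > min_ratio)

def eligible_in_all_cities(values_by_city, lod):
    """A species is kept only if it passes the criteria in every city."""
    return all(is_eligible(v, lod) for v in values_by_city.values())
```

Using the IQR/median ratio rather than the coefficient of variation means that a single extreme day has little effect on whether a species is retained.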
Characterization of primary organic compounds
Primary OC compounds were classified by their chemical structures, as these govern their properties, reactivity and behavior. We categorized OC into six chemical groups: PAHs, n-alkanes, hopanes, steranes, iso-/anteiso-alkanes and cyclohexanes (Additional file 1: Table A-1).
Although not used in the health analyses, the seasonal variability of the organic constituents was also examined to provide insight into their potential sources. The n-alkanes and hopanes were further classified by their sources using well-accepted methods. For alkanes, we estimated the city-specific monthly Carbon Preference Index (CPI), defined as the ratio of odd- to even-numbered carbon species, to assess the relative importance of anthropogenic and biogenic sources. For our analyses, CPI values greater than 2 indicated plants and other biogenic sources as the primary n-alkane source, while values near 1 were consistent with anthropogenic sources [25, 26]. To identify anthropogenic sources further [3, 4, 6, 10], we also conducted exploratory factor analyses in each city, with the number of factors determined based on (a) identified factors having ≥3 species with a correlation ≥0.30 and (b) a solution that explained ≥90% of the species common variance.
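A minimal sketch of the CPI calculation is given below. It assumes the simplest form of the index, the summed concentration of odd-carbon n-alkanes divided by the summed concentration of even-carbon n-alkanes; the carbon-number range and the exact formula used in the study are not given in the text, and the function name is hypothetical.

```python
def carbon_preference_index(conc_by_carbon_number):
    """Ratio of odd- to even-carbon n-alkane mass.

    conc_by_carbon_number maps a carbon number (e.g. 25 for C25) to the
    monthly mean concentration of that n-alkane.
    """
    odd = sum(c for n, c in conc_by_carbon_number.items() if n % 2 == 1)
    even = sum(c for n, c in conc_by_carbon_number.items() if n % 2 == 0)
    if even == 0:
        raise ValueError("no even-carbon n-alkane mass; CPI is undefined")
    return odd / even
```

For example, a month in which the odd-carbon species carry three times the mass of the even-carbon species gives a CPI of 3, which under the criteria above would point to a predominantly biogenic source.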
We classified hopane sources in each city using the moretane ratio, with higher ratios indicating greater maturity of the hopanes. This ratio is based on the fact that with increasing thermal maturity, unstable hopanes with hydrogen atoms at the ββ-position are transformed to moretanes (βα-hopanes) and further to more stable αβ-hopanes. Ratios greater than 0.9 indicated that hopanes originated from crude oil, near 0.1 from lignite coal smoke, and 0.4–0.6 from cleaner coals.
To assess the effect of primary OC compounds on hospital admissions we used a 2-stage hierarchical regression modeling approach, as has been used in studies of dietary exposures and breast cancer. Hierarchical approaches have been widely used in air pollution epidemiology to combine health effects across cities[15, 29, 30], and more recently across multiple pollutants, such as associations between chemical properties of multiple air pollutants and hospital admissions.
In the first stage we fit a case-crossover model to the data from all cities. In a case-crossover design, each case serves as his or her own control, which eliminates confounding by personal characteristics that do not change over time. The effect of the exposure on the outcome is then assessed by comparing exposures on the day the case occurred with exposures on days when the subject did not have the outcome (control days). We modified the time-stratified approach proposed by Lumley and Levy, employing an ignorable and localizable design: control days were chosen bidirectionally for subjects within the same city, in the same year and month as the emergency hospital admission, but leaving 3 days between each control day instead of also matching on day of week. Doing so increased the number of control days and hence the power to detect effects, while the 3-d buffer avoided control days too close to the exposure period, which may lead to confounding due to serial correlation. Choosing control days close in time to the admission, furthermore, limits confounding by seasonality and long-term trends.
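The control-day selection can be sketched as follows. This is one possible reading rather than the authors' implementation: "leaving 3 days between each control day" is interpreted here as stepping 4 days at a time away from the admission day, in both directions, within the same calendar month. The function name and the `spacing` parameter are hypothetical.

```python
import calendar
from datetime import date

def control_days(case_day, spacing=4):
    """Candidate control days in the same year and month as case_day.

    Assumption: control days lie at multiples of `spacing` days before and
    after the admission day, so that 3 days separate consecutive days when
    spacing is 4.
    """
    last_dom = calendar.monthrange(case_day.year, case_day.month)[1]
    days = []
    for dom in range(1, last_dom + 1):
        offset = dom - case_day.day
        if offset != 0 and offset % spacing == 0:
            days.append(date(case_day.year, case_day.month, dom))
    return days
```

Under this reading, an admission on 15 June 2007 would be compared with 3, 7, 11, 19, 23 and 27 June, i.e. six control days, whereas matching on day of week within the month would give three or four.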
We ran separate conditional logistic regressions for each cause-specific hospital admission outcome, simultaneously including all eligible primary organic compounds, with their concentrations scaled by their IQR, and adjusting linearly for same day temperature, same day dew point, 1- to 3-day averaged temperature, and day of week. We additionally adjusted for PM2.5, as it has been associated with the health outcomes and is differentially correlated with the pollutants included in the model as well as with other pollutants not included in the model that could act as confounders.
In the second stage we used a multivariate weighted regression model using the coefficient estimates from the first stage as the dependent variables and the variance-covariance matrix of these coefficients as weights. Let k denote the number of primary organic compounds in the model and g denote the number of pollutant groups, according to their chemical structure:

$$\hat{\beta} = Z\pi + \delta$$

where $\hat{\beta}$ (k × 1) are the coefficient estimates from the first stage, $Z$ (k × g) contains the chemical structure groups (0/1 dummy variables), $\pi$ (g × 1) contains the effect estimates of interest, i.e. the coefficients representing pollutant group effects on the outcome, with each individual coefficient representing the average log rate ratio associated with an IQR increase in that pollutant class, and $\delta$ (k × 1) are independent random variables with zero mean and pre-specified variance $\tau^2$, with k = 58 and g = 5. Since the second stage includes all chemical structure groups to which the organic compounds we used belong, the existence of any residual associations seems unlikely and we therefore set $\tau^2$ to a modest value, i.e. $\tau^2 = 2.6 \times 10^{-5}$. Because $100 \times [\exp(2 \times 1.96 \times \tau) - 1] \approx 2$ if $\tau = 0.0051$, our selected value for $\tau^2$ corresponds to expectations that 95% of the % changes would fall within a 2-fold range.
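To make the second stage concrete, the sketch below shows a deliberately simplified special case. It assumes that the first-stage covariance matrix is diagonal and that each compound belongs to exactly one chemical group. Under those assumptions, which the study itself does not make (it uses the full variance-covariance matrix), the weighted regression on 0/1 group indicators reduces to an inverse-variance weighted mean within each group, with weights 1/(v + τ²). All names are hypothetical.

```python
from math import exp, sqrt

def group_effects(beta_hat, variances, groups, tau2=2.6e-5):
    """Weighted mean of first-stage coefficients within each chemical group.

    Simplification (not the paper's full model): first-stage covariances
    between compounds are ignored, so each coefficient gets the weight
    1 / (its first-stage variance + tau2).
    Returns {group: (log rate ratio, standard error)}.
    """
    sums = {}
    for b, v, g in zip(beta_hat, variances, groups):
        w = 1.0 / (v + tau2)
        sw, swb = sums.get(g, (0.0, 0.0))
        sums[g] = (sw + w, swb + w * b)
    return {g: (swb / sw, sqrt(1.0 / sw)) for g, (sw, swb) in sums.items()}

def percent_change(log_rate_ratio):
    """Convert a log rate ratio per IQR increase into a % change."""
    return 100.0 * (exp(log_rate_ratio) - 1.0)
```

A larger τ² flattens the weights, so that imprecisely estimated compounds contribute almost as much as precisely estimated ones; τ² = 0 lets the first-stage variances alone determine the weights.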
We examined associations between cause-specific emergency hospital admissions and weekly (7-d) exposures to primary organic compounds, with exposure windows chosen based on previous literature. We also examined same day exposures and 2-, 4- and 6-day moving averages. We considered effects statistically significant when their 95% confidence intervals did not include 0.
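The exposure windows can be illustrated with a short moving-average helper. The sketch assumes that each window ends on the day of admission and includes it, and that a window containing any missing day yields a missing value; neither detail is stated in the text, and the function name is hypothetical.

```python
def moving_average(series, window):
    """Trailing mean over `window` days, ending on and including each day.

    series is a list of daily concentrations with None for missing days.
    Days without a complete window of non-missing values return None.
    """
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out
```

With `window=7` this gives the weekly exposure; `window=1` returns the same day concentration unchanged.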
We examined potential multicollinearity among the groups using the eigenvalues of the variance-covariance matrix of the second stage effects. Based on this examination, sterane effects were found to be highly correlated with hopane (r = -0.69), n-alkane (r = 0.78) and cyclohexane (r = -0.88) effects and were thus excluded from further analysis.
We conducted a series of sensitivity analyses to assess the robustness of our results.
Limit of detection
We ran two-stage models for total CVD and respiratory emergency hospital admissions, including species with at least 75% of observations above the limit of detection, to examine sensitivity of our results to LOD exclusion criteria. For this analysis a total of 40 species were included, as compared to 58 in the main analysis: 14 n-alkanes, 2 PAHs, 4 iso-/anteiso-alkanes, with the same number of cyclohexanes (2) and hopanes (18) as in the main analysis.
Effect estimate stability
We assessed the stability of the effect estimates (as % change in group effects and width of confidence intervals) by excluding individual chemical groups from the analysis one-by-one and assessing change in the results.
We also assessed the sensitivity of our results to the inclusion of PM2.5 in the health models, by repeating analyses omitting PM2.5.
Sensitivity of our results to the choice of $\tau^2$
We assessed the dependence of our results for total CVD and total respiratory admissions on our pre-specified value of $\tau^2$ by exploring different values. Specifically, we also examined $\tau^2 = 0$, allowing the variability of the second stage effects to depend only on the variance-covariance matrix of the first stage coefficients, and $\tau^2 = 0.0001$, corresponding to expectations that 95% of the % changes would fall within a 4-fold range.
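The arithmetic behind these choices can be checked with a few lines of Python, using the expression 100 × [exp(2 × 1.96 × τ) − 1] given for the main analysis. The function name is hypothetical.

```python
from math import exp, sqrt

def implied_range(tau2):
    """Evaluate 100 * [exp(2 * 1.96 * tau) - 1] for a second-stage variance."""
    tau = sqrt(tau2)
    return 100.0 * (exp(2 * 1.96 * tau) - 1.0)

# The three values considered in the main and sensitivity analyses.
for tau2 in (0.0, 2.6e-5, 1e-4):
    print(f"tau^2 = {tau2:g}: {implied_range(tau2):.2f}")
```

The expression evaluates to 0 for a variance of 0, to about 2 for 2.6 × 10⁻⁵ and to about 4 for 0.0001, matching the values quoted for the main and sensitivity analyses.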
Given observed associations between extreme temperatures and adverse health [37, 38], we examined whether our findings were affected by extreme temperatures. We did so by excluding days with temperatures above the 99th or below the 1st percentile of daily temperatures from our health models.