Emulating causal dose-response relations between air pollutants and mortality in the Medicare population

Wei, Yaguang; Yazdi, Mahdieh Danesh; Di, Qian; Requia, Weeberb J.; Dominici, Francesca; Zanobetti, Antonella; Schwartz, Joel

doi:10.1186/s12940-021-00742-x

Research
Open access
Published: 06 May 2021

Yaguang Wei ORCID: orcid.org/0000-0002-6796-7510¹,
Mahdieh Danesh Yazdi¹,
Qian Di²,
Weeberb J. Requia³,
Francesca Dominici⁴,
Antonella Zanobetti¹ &
…
Joel Schwartz¹,⁵

Environmental Health volume 20, Article number: 53 (2021)

Abstract

Background

Fine particulate matter (PM_2.5), ozone (O₃), and nitrogen dioxide (NO₂) are major air pollutants that pose considerable threats to human health. However, what has been mostly missing in air pollution epidemiology is causal dose-response (D-R) relations between those exposures and mortality. Such causal D-R relations can provide profound implications in predicting health impact at a target level of air pollution concentration.

Methods

Using the national Medicare cohort during 2000–2016, we simultaneously emulated causal D-R relations between chronic exposures to fine particulate matter (PM_2.5), ozone (O₃), and nitrogen dioxide (NO₂) and all-cause mortality. To relax the contentious assumptions of inverse probability weighting for continuous exposures, including the distributional form of the exposure and homoscedasticity, we proposed a decile binning approach which divided each exposure into ten equal-sized groups by deciles, treated the lowest decile group as reference, and estimated the effects for the other groups. Binning continuous exposures also makes the inverse probability weights robust against outliers.

Results

Assuming the causal framework was valid, we found that higher levels of PM_2.5, O₃, and NO₂ were causally associated with greater risk of mortality and that PM_2.5 posed the greatest risk. For PM_2.5, the relative risk (RR) of mortality monotonically increased from the 2nd (RR, 1.022; 95% confidence interval [CI], 1.018–1.025) to the 10th decile group (RR, 1.207; 95% CI, 1.203–1.210); for O₃, the RR increased from the 2nd (RR, 1.050; 95% CI, 1.047–1.053) to the 9th decile group (RR, 1.107; 95% CI, 1.104–1.110); for NO₂, the D-R curve wiggled at low levels and started rising from the 6th (RR, 1.005; 95% CI, 1.002–1.018) to the highest decile group (RR, 1.024; 95% CI, 1.021–1.027).

Conclusions

This study provided more robust evidence of the causal relations between air pollution exposures and mortality. The emulated causal D-R relations provided significant implications for reviewing the national air quality standards, as they inferred the number of potential early deaths prevented if air pollutants were reduced to specific levels; for example, lowering each air pollutant concentration from the 70th to 60th percentiles would prevent 65,935 early deaths per year.

Introduction

Fine particulate matter (PM_2.5), ozone (O₃), and nitrogen dioxide (NO₂) are major air pollutants that pose considerable threats to human health [1, 2]. Starting in the 1990s, a large literature of epidemiological research has reported associations between chronic air pollution exposures and mortality, with PM_2.5 and O₃ being the most extensively studied components [3,4,5,6,7,8,9]. Chronic exposure to NO₂ has also been associated with mortality, although the evidence is relatively scarce [10, 11]. These findings provide important implications for understanding the health burden attributable to poor air quality. In the United States, it is estimated that each 1 μg·m^−3 increase in PM_2.5 concentration is associated with over 30,000 deaths each year, equivalent to a loss of 0.13–0.15 years in national life expectancy [12].

The primary objective of epidemiology is to identify a causal connection between exposure and health outcome, thereby informing decisions on policy interventions [13]. For example, the United States Environmental Protection Agency (US EPA) reviews the National Ambient Air Quality Standards (NAAQS) periodically based on the cause-effect relationship that can be inferred from the best available science [14]. However, as observational studies, many air pollution epidemiological investigations, by nature, have been associational rather than causal [15]. Although a growing literature has examined the long-term effect of PM_2.5 on mortality using the formal causal modeling techniques, there is so far little evidence for O₃ and NO₂ [16,17,18]. Indeed, O₃ and NO₂ have received less attention than they deserve; so far there is no standard for long-term O₃ concentrations (only daily) and the standard for annual NO₂ concentrations has remained the same for decades [19].

What has been mostly missing in air pollution epidemiology is the specific shapes of causal dose-response (D-R) relations between air pollution exposures and risk of mortality. Such causal D-R relations can provide profound implications in predicting the health impact at a target level of air pollution concentration [20]. Recently, a study of PM₁₀ that explicitly used a formal causal modeling approach to estimate the D-R relationship found a higher mortality risk at low to moderate air pollution levels [21]. However, to date no such studies have been done for PM_2.5, O₃ or NO₂. Specifying the causal D-R relationship, especially at very low levels, is critically important in measuring the risk of mortality induced directly by the change of air pollution level, thus supporting the potential revision of NAAQS in the US and, globally, the World Health Organization air quality guidelines [19, 22].

The present study analyzed 74 million Medicare beneficiaries in the contiguous US with 637 million person-years of follow-up from 2000 to 2016, which covers more than 95% of elders aged 65 years and older in the US who are considered to be most susceptible to air pollution [23]. The Medicare population also accounts for two-thirds of total mortality, allowing us to analyze most deaths induced by air pollution [1, 6]. By linking the annual averages of ambient PM_2.5 and NO₂ concentrations as well as the warm-season (April–September) average of ambient O₃ to the ZIP Codes of beneficiaries’ residence, we were able to have proxy measures of chronic exposures for each individual [24]. We proposed a decile binning approach which divided each exposure by deciles and predicted the inverse probability of being assigned to the observed group for each observation, adjusting for the other two concurrent exposures, personal characteristics, meteorological, socioeconomic, behavioral, and medical access variables, and long-term time trend. If propensity score models were correctly specified, we had constructed a valid counterfactual framework and thus estimated the causal D-R relations between chronic exposures to PM_2.5, O₃, and NO₂ and the risk of all-cause mortality.

Methods

Mortality data

We obtained Medicare enrollment records for beneficiaries aged 65 years and above residing in the contiguous US between 2000 and 2016 from the Centers for Medicare and Medicaid Services, with all-cause mortality as the study outcome. For each beneficiary, we extracted their demographic information (sex, race, age at initial enrollment), Medicaid eligibility, ZIP Code of residence, year of initial enrollment, and year of death if it occurred during the study period. We constructed an open cohort with person-years of follow-up in which each beneficiary was followed each year from the study entry until the end of study, drop out of the cohort, or death, whichever occurred earliest. Note that the same data format has been used to fit time-varying Cox proportional hazard models [6].

Exposure assessment

The daily concentrations of ambient PM_2.5, O₃, and NO₂ at 1 km × 1 km grid cells across the contiguous US were predicted and validated using hybrid models that ensembled predictions from random forest, gradient boosting, and neural network. Multiple predictor variables were incorporated in the predictions, including ground monitoring data, satellite data, meteorological conditions, land-use variables, and chemical transport model simulations, etc., with details published elsewhere [25,26,27].

These high-resolution predictions at 1 km × 1 km grid cells allowed us to estimate ZIP Code-level exposure levels with a high degree of accuracy, with annual R² on held-out monitors of 0.89 for PM_2.5, 0.86 for O₃, and 0.84 for NO₂. There are two major types of ZIP Codes in the US: standard ZIP Codes and PO Boxes. Because a standard ZIP Code represents a delivery area, we used the polygon layer generated by Environmental Systems Research Institute (Esri) [28], and estimated the ZIP Code’s daily concentrations by averaging the predictions at grid cells whose centroid points were inside the polygon of that ZIP Code. For a PO Box, because it is used only for a given facility and therefore can be represented by a single point, we estimated its daily concentrations by linking it to the nearest grid cell.
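The two assignment rules described above can be illustrated with a minimal sketch. It is not the authors' GIS workflow: it assumes planar coordinates, a simple (non-self-intersecting) polygon, and hypothetical names (`point_in_polygon`, `standard_zip_concentration`, `po_box_concentration`) and toy values chosen only for illustration.

```python
from statistics import mean


def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order (assumed simple and planar).
    """
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def standard_zip_concentration(polygon, cells):
    """Average the predictions of grid cells whose centroid is inside the polygon.

    cells is a list of (centroid_x, centroid_y, predicted_concentration).
    Returns None when no centroid falls inside the polygon.
    """
    inside = [v for (cx, cy, v) in cells if point_in_polygon(cx, cy, polygon)]
    return mean(inside) if inside else None


def po_box_concentration(point, cells):
    """Return the prediction of the grid cell whose centroid is nearest to the point."""
    px, py = point
    nearest = min(cells, key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
    return nearest[2]


# Hypothetical toy grid (centroid x, centroid y, daily prediction).
cells = [(0.5, 0.5, 8.0), (1.5, 0.5, 10.0), (2.5, 0.5, 12.0), (0.5, 1.5, 9.0)]
zip_polygon = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

zip_value = standard_zip_concentration(zip_polygon, cells)
po_value = po_box_concentration((2.4, 0.6), cells)
```

In this toy example only the first two centroids fall inside the rectangle, so the standard ZIP Code receives their average, while the PO Box point takes the value of its closest cell. A production workflow would use projected coordinates and a GIS library rather than this hand-written test.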

The exposures of interest were assessed based on the ZIP Code-level estimates. For PM_2.5 and NO₂, we defined their chronic exposures as annual average concentrations. For O₃, following previous literature [5, 6], we defined the chronic exposure as the average concentration during warm season (April–September) of the year. We assigned the chronic exposures to PM_2.5, O₃, and NO₂ to each person-year based on that person’s ZIP Code of residence and calendar year.
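As a rough illustration of these exposure definitions, the sketch below computes an annual mean and a warm-season (April–September) mean from one ZIP Code's daily series. The function names and the toy daily values are hypothetical and are not taken from the study.

```python
from datetime import date, timedelta
from statistics import mean


def annual_mean(daily):
    """Mean of all daily concentrations; daily maps datetime.date -> value for one year."""
    return mean(daily.values())


def warm_season_mean(daily):
    """Mean of the daily concentrations falling in April through September."""
    return mean(v for d, v in daily.items() if 4 <= d.month <= 9)


# Hypothetical daily O3 series for 2016: 45 ppb in warm months, 30 ppb otherwise.
start = date(2016, 1, 1)
daily_o3 = {}
for k in range(366):  # 2016 is a leap year
    day = start + timedelta(days=k)
    daily_o3[day] = 45.0 if 4 <= day.month <= 9 else 30.0

o3_warm = warm_season_mean(daily_o3)
o3_annual = annual_mean(daily_o3)
```

Under the definitions in the text, `warm_season_mean` would apply to O₃ and `annual_mean` to PM_2.5 and NO₂, with the result attached to every person-year in that ZIP Code and calendar year.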

Covariate information

Meteorological variables including daily air temperature and humidity at 2 m above the ground were extracted from Phase 2 of the North American Land Data Assimilation System, with 12 km × 12 km resolution across the continental US [29]. The average temperatures during warm (April–September) and cold seasons (January–March plus October–December) of each year were calculated from the daily data because both exceedingly low and high temperatures were physically stressful and were also associated with air pollution levels [30, 31]. Annual average humidity was calculated from the daily data on a yearly basis. ZIP Code Tabulation Areas (ZCTA)-level socioeconomic variables, including the percentage of Blacks, percentage of Hispanics, median household income, median value of owner occupied housing, percentage of Americans aged 65 and older living below the poverty threshold, percentage of Americans with less than high school education, percentage of owner occupied housing units, and population density, etc., were obtained from 2000 and 2010 US Census and the American Community Survey [32]. These variables were linearly extrapolated by year to account for the time varying nature of socioeconomic status. County-level behavioral variables, including body mass index (BMI) and percentage of ever smokers for each year, were obtained from the Behavioral Risk Factor Surveillance System [33]. From the Dartmouth Atlas of Health Care [34], we obtained percentage of Medicare participants who had a hemoglobin A1c test, a low-density lipoprotein cholesterol (LDLC) test, a mammogram, and an eye exam to a primary care physician for each year in each hospital catchment area in the US and assigned it to all ZCTAs in that area. We also computed the distance from each ZIP Code centroid to the nearest hospital. These variables were linked to each person-year by ZIP Code of residence and calendar year. Summary statistics of the covariates are provided in Section 5 of Supplementary Information.
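The year-by-year linear extrapolation of census variables mentioned above can be sketched as follows. This assumes a straight line through two anchor years; the function name `linear_by_year` and the income values are hypothetical, and the study's exact procedure (including its use of the American Community Survey) may differ.

```python
def linear_by_year(year, year_a, value_a, year_b, value_b):
    """Value on the straight line through (year_a, value_a) and (year_b, value_b).

    Years between the anchors are interpolated; years outside are extrapolated.
    """
    slope = (value_b - value_a) / (year_b - year_a)
    return value_a + slope * (year - year_a)


# Hypothetical median household income for one ZCTA in the two census years.
income_2005 = linear_by_year(2005, 2000, 40000.0, 2010, 50000.0)
income_2016 = linear_by_year(2016, 2000, 40000.0, 2010, 50000.0)
```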

A decile binning approach to emulate causal D-R relations

To emulate the causal D-R relationship, we need a counterfactual framework. For a binary exposure, the causal estimate in a population of interest comes from the difference between the counterfactual outcome under which all the members of the population had been exposed versus the counterfactual outcome had they not been exposed, thus no confounding occurs [35]. In randomized experiments, counterfactuals are constructed by randomly assigning individuals to treatment groups to ensure that exposure is independent of all potential confounders. In observational studies, however, the exposure assignment is not random but instead is considered to be influenced by subject characteristics, and causal methods seek ways to approximate counterfactuals with reference to the observed population [36]. Inverse probability weighting (IPW), for example, is a formal causal modeling technique and is increasingly being used in observational studies [37, 38]. For a binary exposure, it uses quasi-experimental design to construct a “pseudo-population” by weighting the population by the inverse probability of the observed exposure given all measured confounders. The “pseudo-population” is then used to estimate the exposure effect. If the systematic difference of characteristics among the exposed and unexposed is adequately adjusted so that the two groups are comparable with respect to any confounders, a causal conclusion is warranted [35, 39].

But air pollution exposure is continuous in nature. Estimating the inverse probability weights in the continuous setting is challenging as it needs to 1) correctly specify the distributional form of exposure, 2) deal with non-constant variance (heteroscedasticity), and 3) avoid excessively large or small weights for outliers that are more likely to occur [40, 41]. For these reasons, in this section we proposed a decile binning approach to emulate the causal D-R relations between chronic air pollution exposures and mortality by dividing each exposure into ten equal-sized groups by deciles, treating the lowest decile group (i.e., 10% of the study population with the lowest exposure levels) as the reference, and estimating the effects for the other groups compared to the reference. This relaxes the strong assumptions of distribution form and homoscedasticity for continuous exposure by relying solely on deciles. In addition, binning data makes the inverse probability weights robust against outliers [42].
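A minimal sketch of the binning step is given below. It assigns groups by rank so that the ten groups are as equal-sized as possible; this is one plausible reading of "equal-sized groups by deciles", not necessarily the authors' implementation, and the function name `decile_groups` is hypothetical. Tied values may be split across adjacent groups under this rank-based rule.

```python
def decile_groups(values, n_groups=10):
    """Assign each value a group label 1..n_groups by rank.

    Group 1 holds the lowest values and serves as the reference group.
    """
    n = len(values)
    # Indices of the observations sorted from lowest to highest exposure.
    order = sorted(range(n), key=lambda k: values[k])
    groups = [0] * n
    for rank, k in enumerate(order):
        groups[k] = rank * n_groups // n + 1
    return groups


# Hypothetical exposure values: 100 descending concentrations.
exposure = [float(v) for v in range(100, 0, -1)]
groups = decile_groups(exposure)
```

Passing `n_groups=14` gives the kind of finer split used for a sensitivity analysis.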

We had a dataset with person-year representations of follow-up which allowed for time-varying exposures and covariates. To reduce the computational burden, first we aggregated the person-years with the same sex, race, age, Medicaid eligibility, living in the same ZIP Code of residence and in the same year. We treated them as a single record because those person-years had identical values for all exposures and covariates and thus could be treated interchangeably in the analysis. As a result, we retained all the information yet significantly reduced the size of the data in which each observation represented a stratum of combination of sex, race, age, and Medicaid eligibility per ZIP Code of residence per year. Numbers of deaths and person-years were cumulated for each stratum.
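The aggregation described above can be sketched as a simple group-and-count step. The record layout and field names (`sex`, `race`, `age`, `medicaid`, `zip`, `year`, `died`) are assumptions made for illustration; the toy records are not real data.

```python
from collections import defaultdict


def aggregate_person_years(records):
    """Collapse person-year records into strata with summed person-years and deaths.

    Each record is a dict with keys sex, race, age, medicaid, zip, year, died (0 or 1).
    """
    strata = defaultdict(lambda: {"person_years": 0, "deaths": 0})
    for r in records:
        key = (r["sex"], r["race"], r["age"], r["medicaid"], r["zip"], r["year"])
        strata[key]["person_years"] += 1
        strata[key]["deaths"] += r["died"]
    return dict(strata)


# Hypothetical person-years: the first two share every stratum variable.
records = [
    {"sex": "F", "race": "white", "age": 70, "medicaid": 0, "zip": "02115", "year": 2010, "died": 0},
    {"sex": "F", "race": "white", "age": 70, "medicaid": 0, "zip": "02115", "year": 2010, "died": 1},
    {"sex": "M", "race": "white", "age": 70, "medicaid": 0, "zip": "02115", "year": 2010, "died": 0},
]
strata = aggregate_person_years(records)
```

Because exposures and covariates are identical within a stratum, the counts of person-years and deaths carry all the information the later weighting and regression steps need.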

For each exposure, the analysis under the counterfactual framework was composed of two stages: a design stage where a randomized “pseudo-population” was constructed by weighting the observed population by the inverse probability of exposure given all measured confounders, and an analysis stage where the treatment effect was estimated among the constructed “pseudo-population” [43]. In the first stage, we binned the exposure into ten equal-sized categories based on deciles. The stabilized inverse probability weight (sw_ij) for stratum j in exposure category i was defined as:

$$ sw_{ij}=\frac{P\left(X\in i\right)}{\operatorname{expit}\left(g\left(X_{ij};n_{ij}\mid \boldsymbol{C}\right)\right)} $$

where P(X ∈ i) denotes the probability of any observed exposure X being in group i, which equals 0.1; expit(∙) denotes the inverse logit function, where $ \operatorname{expit}(x)=\frac{\exp(x)}{1+\exp(x)} $; and g(∙) denotes the gradient boosting machine (GBM) model with logistic loss function for predicting the probability of the observed categorized exposure X_ij given the set of confounders C, weighted by n_ij, the number of person-years aggregated in the stratum. The use of GBM for estimating the probability of observed exposure has demonstrated better predictive accuracy than conventional logistic regression, as it captures nonlinearity and interactions of confounders and is unaffected by the potential autocorrelation [44, 45]. The confounder set C includes the other two concurrent exposures, calendar year, the individual-level variables (sex, race, 5-year age group, and Medicaid eligibility), and the area-level meteorological, socioeconomic, behavioral, and medical access variables as detailed in the previous section. The numerator, P(X ∈ i), is used to stabilize the variability of weights to avoid excessively upweighting or downweighting observations [40].
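To make the weight definition concrete, the sketch below computes a stabilized weight from a model score on the logit scale. Fitting the GBM itself is not shown; `score` stands in for the fitted g(∙) output for the stratum's observed exposure group, and the function names are hypothetical.

```python
import math


def expit(x):
    """Inverse logit, written to avoid overflow for large negative or positive x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def stabilized_weight(score, n_groups=10):
    """Stabilized inverse probability weight: P(X in i) / expit(score).

    score is the assumed logit-scale model output for the observed exposure group;
    with equal-sized groups the numerator P(X in i) is 1 / n_groups.
    """
    return (1.0 / n_groups) / expit(score)


# A stratum whose predicted probability of its observed decile is 0.5
# is down-weighted to 0.2; a predicted probability of 0.1 gives a weight of 1.
w_likely = stabilized_weight(0.0)
w_typical = stabilized_weight(math.log(0.1 / 0.9))
```

A weight near 1 means the confounders carry no information about the exposure group beyond its marginal probability, which is why stabilized weights tend to be less variable than unstabilized ones.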

In the second stage of the analysis, for each exposure, we fitted a log-linear regression relating the number of deaths to the factored exposure category, weighted by the stabilized inverse probability weights estimated from the first stage. We used a quasi-Poisson model to account for overdispersion in the number of deaths, and included an offset for the number of person-years to account for the different population size at risk in each stratum. As a result, we obtained the marginal effect of each decile group on mortality. If the model for estimating the stabilized inverse probability weight is correctly specified, we have achieved an unbiased estimator of the causal effect for each group [41].
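When the exposure category is the only predictor and the offset is the log of person-years, the weighted Poisson point estimate for each group reduces to a weighted death rate, so the relative risks can be sketched without a regression library. This is an illustration of the point estimates only, under those assumptions; it does not reproduce the quasi-Poisson standard errors or confidence intervals, and the names and toy numbers are hypothetical.

```python
def weighted_relative_risks(strata, reference_group=1):
    """Relative risk of each exposure group versus the reference group.

    strata is a list of dicts with keys group, deaths, person_years, weight.
    Each group's rate is sum(weight * deaths) / sum(weight * person_years),
    which is the weighted Poisson estimate when group is the only predictor.
    """
    deaths = {}
    pyears = {}
    for s in strata:
        g = s["group"]
        deaths[g] = deaths.get(g, 0.0) + s["weight"] * s["deaths"]
        pyears[g] = pyears.get(g, 0.0) + s["weight"] * s["person_years"]
    rates = {g: deaths[g] / pyears[g] for g in deaths}
    ref_rate = rates[reference_group]
    return {g: rates[g] / ref_rate for g in rates}


# Hypothetical strata with stabilized weights already attached.
toy_strata = [
    {"group": 1, "deaths": 40, "person_years": 1000, "weight": 1.0},
    {"group": 1, "deaths": 10, "person_years": 250, "weight": 2.0},
    {"group": 2, "deaths": 50, "person_years": 1000, "weight": 1.0},
]
rr = weighted_relative_risks(toy_strata)
```

Quasi-Poisson and Poisson models share the same point estimates; the quasi-Poisson dispersion parameter only rescales the standard errors, which is why this shortcut applies to the RRs but not to their intervals.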

The results are expressed as the relative risks (RR) of mortality for higher decile groups against the lowest-decile group (reference). The number of early deaths avoided by lowering the air pollutant concentration from the level of a higher decile group to that of a lower decile group can be calculated as $ N\alpha_0\left(\frac{RR_{high}-1}{RR_{high}}-\frac{RR_{low}-1}{RR_{low}}\right) $, where N is the annual average number of person-years, α₀ is the baseline annual mortality rate, and RR_high and RR_low are the relative risks of mortality for the higher decile group and the lower decile group, respectively. More details are provided in Section 4 of Supplementary Information.
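The avoided-deaths formula can be written as a short function. The RR values in the example are the PM_2.5 estimates reported for the 10th and 2nd decile groups; the values of N and α₀ are placeholders chosen for illustration, not the study's inputs, so the result is not a study estimate.

```python
def deaths_avoided(n_person_years, baseline_rate, rr_high, rr_low):
    """Early deaths avoided per year when moving from the higher to the lower group.

    Implements N * alpha_0 * ((RR_high - 1) / RR_high - (RR_low - 1) / RR_low).
    """
    fraction_high = (rr_high - 1.0) / rr_high
    fraction_low = (rr_low - 1.0) / rr_low
    return n_person_years * baseline_rate * (fraction_high - fraction_low)


# Placeholder N and alpha_0 (assumptions), with reported PM2.5 RRs for the
# 10th (1.207) and 2nd (1.022) decile groups.
example = deaths_avoided(1_000_000, 0.05, rr_high=1.207, rr_low=1.022)
```

Each bracketed term is the fraction of deaths in a group attributable to exposure above the reference level, so their difference is the share of deaths removed by the shift.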

We assessed the robustness of the causal dose-response relations between the chronic air pollution exposures and mortality risk by conducting a sensitivity analysis in which each exposure was split into 14 bins.

Results

The demographic characteristics of the national Medicare cohort during 2000–2016 are summarized in Table 1. The cohort included 74,537,533 Medicare beneficiaries with a total of 637,207,589 person-years of follow-up. The average follow-up time was 8.5 years. Among them, 30,209,831 deaths occurred, accounting for 40.5% of the population. The cohort comprised more females (55.4%), mostly whites (84.0%), and mostly beneficiaries aged 65–74 years when entering the cohort (78.4%). Over 13 million beneficiaries were ever enrolled in the Medicaid program, accounting for 18.5% of the population.

Table 1 Demographic characteristics of Medicare cohort, 2000–2016

Maps of the contiguous US with annual PM_2.5, warm-season (April–September) O₃, and annual NO₂ concentrations at ZIP Codes of the Medicare beneficiaries’ residence in 2016 are presented in Fig. 1. The PM_2.5 concentration was higher in most central and eastern states and the Central Valley of California, and was lower in the northeast US and mountainous region. The warm-season O₃ concentration was highest in the mountainous region and California. The NO₂ concentration was higher in populous cities and along major highways. Over the years 2000–2016, the annual PM_2.5 concentration at ZIP Codes averaged 9.85 μg·m^−3, the warm-season O₃ averaged 39.34 ppb, and the annual NO₂ averaged 17.30 ppb (Table 2).

Table 2 Summary statistics for annual PM_2.5, warm-season O₃, and annual NO₂ concentrations, 2000–2016

Figure 2 presents the estimated causal D-R relations between chronic exposures to PM_2.5, O₃, NO₂ and the RR of mortality. The exposure concentration corresponding to each effect estimate represents the average concentration within the decile group. The dose-response relationship between chronic exposure to PM_2.5 and mortality was monotonic and approximately linear, with higher concentration levels associated with greater risk of mortality. Specifically, the RRs of mortality associated with chronic exposure to PM_2.5 ranged from 1.022 [95% confidence interval (CI), 1.018–1.025] at 6.60 μg·m^−3 (the 2nd decile group) to 1.207 (95% CI, 1.203–1.210) at 15.47 μg·m^−3 (the 10th decile group). For O₃, the risk of mortality monotonically increased from the 2nd (RR, 1.050; 95% CI, 1.047–1.053) to the 9th decile group (RR, 1.107; 95% CI, 1.104–1.110), and dropped at the highest decile group (RR, 1.044; 95% CI, 1.041–1.048). For NO₂, the dose-response curve wiggled at low levels and started rising from the 6th decile group (RR, 1.005; 95% CI, 1.002–1.018) to the highest decile group (RR, 1.024; 95% CI, 1.021–1.027). Importantly, the risk of mortality associated with chronic PM_2.5 exposure was substantially larger than those with O₃ and NO₂; the highest RR for PM_2.5 was greater than those for O₃ and NO₂. The entire dose-response relationship for NO₂ occurred at concentrations below the national standard of 53 ppb, and most of the PM_2.5 relationship was also below the standard of 12 μg·m^−3 [19]. There is no long-term standard for O₃. All the numerical results are provided in Section 1 of Supplementary Information.

The dose-response relations remained robust after splitting each exposure into 14 bins, with details provided in Section 2 and Section 3 of Supplementary Information.

Discussion

We proposed a decile binning approach to simultaneously emulate the D-R relations between chronic exposures to major air pollutants and mortality in a general and susceptible older population. Assuming that the IPW models were correctly specified and the counterfactual framework was valid, the D-R curves revealed that in general, higher levels of PM_2.5, O₃, and NO₂ were causally associated with a greater risk of mortality. Compared with previous associational D-R curves [3, 5,6,7], the causal D-R curves essentially infer the number of potential lives saved if air pollution concentrations were reduced to target levels. For example, lowering each air pollutant concentration from the 70th to 60th percentiles would prevent 65,935 early deaths among elders per year (Section 4 of Supplementary Information), and this is a substantial public health benefit.

A major advance of the present study is that we simultaneously evaluated PM_2.5, O₃, and NO₂, which allowed us to mutually adjust their confounding and also to directly compare their health impacts. We found that PM_2.5 had a substantially larger effect on mortality than O₃ and NO₂. The finding confirmed previously published results suggesting that PM_2.5 is the most deadly air pollutant and that chronic exposure to PM_2.5 is of greater public health concern [18]. The increasing patterns of the D-R relations for PM_2.5 and NO₂ at levels below the current NAAQS suggest the necessity of more stringent national air quality standards for the protection of public health. Currently the NAAQS lack regulation for long-term O₃, and clearly the daily standard has not reduced the warm-season average to concentrations with no mortality association [46]. Our results support the argument for establishing a warm-season O₃ standard. The lower risk of mortality for the highest decile group for O₃ may suggest that the O₃ effect was represented by traffic exhausts such as nitrogen oxides and volatile organic compounds, as they play important roles in O₃ actions and are highly reactive at extreme levels. However, further investigations are needed to address this question.

The causal conclusions of this study depend on the key assumption of correct IPW model specification. The validity of this assumption is not testable and relies on outside information. To minimize confounding bias, we adjusted for any known possible confounders such as concurrent air pollutants, Medicaid eligibility (proxy for individual’s low socioeconomic status), and seasonal temperatures and humidity (important physical stressors and determinants of air pollution [30]), etc. We also adjusted for area-level confounders of socioeconomic status, ethnicity, smoking status, obesity, population density, access to medical care, and calendar year (to capture unmeasured confounders that had a temporal scale of variation). In predicting the inverse probability of being assigned to the observed exposure decile given the set of confounders, GBM adaptively captured any nonlinearity and interactions and was unaffected by the potential autocorrelation [44]. Although residual confounding can never be ruled out, the consistent dose-response relationships for PM_2.5 and O₃ obtained across different study designs and populations provide some reassurance that our causal estimates are not substantially biased [3, 5,6,7].

As we have noted, the proposed decile binning approach relaxed assumptions on data distribution and homoscedasticity when constructing inverse probability weights for continuous exposures. Ambient air pollution concentrations usually follow a heteroscedastic distribution possibly with long tails, which results in excessively upweighted observations. To fix this issue, Naimi et al. proposed a quantile binning approach in which they estimated weights for the binned exposure and then treated those bins as continuous and linear, and found that it outperformed other IPW estimators with various parametric forms of the exposure distribution [42]. Building on their idea, our approach further relaxed the linearity assumption by treating the bins as categories and comparing the effect of each bin to the reference group. If the assumption of correct IPW model specification holds, the estimated effect of each bin is an asymptotically unbiased estimator of the true causal effect [41]. Further, the estimands of interest, the marginal effects, do not depend upon the distributions of confounders and have arguably greater public health relevance because many confounders might not be measurable at decision time. The marginal effect estimates are also more useful when depicting the dose-response relationship for the purpose of understanding the total effect [20].

Assigning ambient air pollution concentrations at ZIP Codes as a marker of individual exposure levels may result in measurement error. Although measuring more personal exposures can overcome the limitation, it also introduces confounders that are difficult to control, such as personal behaviors, which may affect personal exposure measurement directly but not affect ambient air pollution estimation. In addition, personal exposure measurements can be compromised by the study outcome and thus are also more vulnerable to reverse causation [24]. For example, patients who die from chronic obstructive pulmonary disease (COPD) generally spend less time outdoors [47]; because ambient air pollutants are filtered by the building envelope and deposit on indoor surfaces, there are lower concentrations of those ambient pollutants indoors [48]. Hence, those patients have lower levels of personal exposures. By contrast, under the null assumption, COPD mortality is not associated with ambient concentration predictions, which are more of a proxy exposure measure than personal measurements. In epidemiological studies, ideally the measure of exposure should be as accurate as possible. In practice, however, this is usually not possible and the issue is to choose an appropriate exposure metric that balances the biological relevance, interpretability, and implications for public health policy. While using a proxy measure for air pollution exposure increases measurement error, it also brings important advantages for causal inference.

Some limitations must be acknowledged. First, we were not able to examine cause-specific mortality, which is not available in the Medicare data. Further studies investigating which specific causes are driving the deaths would provide a valuable addition. Second, spatial confounding inherent to proximity-based air pollution measurements could still be present given that ZIP Code was the finest geographical unit we could use to link air pollution levels with each beneficiary. Third, restricted by available data sources, we could not adjust for individual behavior and medical history because such information was not available in the Medicare enrollment data, which may contribute to residual confounding. Fourth, although air pollution levels were estimated from models with excellent out-of-sample prediction ability, the estimates are not perfect, and the resulting measurement error may attenuate effect estimates [24].

Conclusions

In summary, this study simultaneously emulated D-R curves between chronic exposures to PM_2.5, O₃, NO₂ and all-cause mortality among the national Medicare cohort during 2000–2016. We proposed a decile binning approach to relax the contentious assumptions of conventional IPW estimators, which yielded more robust causal evidence on adverse effects of air pollution exposure on mortality. Assuming that the IPW models were correctly specified, the estimated D-R curves reveal that in general, higher levels of chronic PM_2.5, O₃, and NO₂ exposures were causally associated with a greater risk of mortality, even at levels below the national standards. Among the three pollutants, PM_2.5 posed the greatest public health concern. The estimated D-R relations have particularly significant implications for the US EPA's review of the NAAQS, as the causal D-R curves essentially infer the number of potential lives saved if air pollution concentrations were reduced to specific levels. For example, lowering each air pollutant concentration from the 70th to 60th percentiles would prevent 65,935 early deaths among elders per year.

Availability of data and materials

The exposure data are available from the corresponding author on reasonable request. The Medicare data are available upon request to the Centers for Medicare and Medicaid Services. The other data are publicly available, with sources described in the manuscript.

Abbreviations

PM_2.5:: Ambient fine particulate matter
O₃:: Ozone
NO₂:: Nitrogen dioxide
US EPA:: United States Environmental Protection Agency
NAAQS:: National Ambient Air Quality Standards
D-R:: Dose-response
IPW:: Inverse probability weighting
ZCTA:: ZIP Code Tabulation Areas
GBM:: Gradient boosting machine
ppb:: Parts per billion
COPD:: Chronic obstructive pulmonary disease

References

Schraufnagel DE, Balmes JR, Cowl CT, De Matteis S, Jung SH, Mortimer K, et al. Air pollution and noncommunicable diseases: a review by the forum of international respiratory Societies’ environmental committee, part 2: air pollution and organ systems. Chest. 2019;155(2):417–26. https://doi.org/10.1016/j.chest.2018.10.041.
Wei Y, Wang Y, Di Q, Choirat C, Wang Y, Koutrakis P, et al. Short term exposure to fine particulate matter and hospital admission risks and costs in the Medicare population: time stratified, case crossover study. BMJ. 2019;367:l6258.
Dockery DW, Pope CA, Xu X, Spengler JD, Ware JH, Fay ME, et al. An association between air pollution and mortality in six U.S. cities. N Engl J Med. 1993;329(24):1753–9. https://doi.org/10.1056/NEJM199312093292401.
Brook RD, Rajagopalan S, Pope CA 3rd, Brook JR, Bhatnagar A, Diez-Roux AV, et al. Particulate matter air pollution and cardiovascular disease: an update to the scientific statement from the American Heart Association. Circulation. 2010;121(21):2331–78. https://doi.org/10.1161/CIR.0b013e3181dbece1.
Jerrett M, Burnett RT, Pope CA 3rd, Ito K, Thurston G, Krewski D, et al. Long-term ozone exposure and mortality. N Engl J Med. 2009;360(11):1085–95. https://doi.org/10.1056/NEJMoa0803894.
Di Q, Dominici F, Schwartz JD. Air pollution and mortality in the Medicare population. N Engl J Med. 2017;377(15):1498–9. https://doi.org/10.1056/NEJMc1709849.
Burnett R, Chen H, Szyszkowicz M, Fann N, Hubbell B, Pope CA 3rd, et al. Global estimates of mortality associated with long-term exposure to outdoor fine particulate matter. Proc Natl Acad Sci U S A. 2018;115(38):9592–7. https://doi.org/10.1073/pnas.1803222115.
Bowe B, Xie Y, Yan Y, Al-Aly Z. Burden of Cause-Specific Mortality Associated With PM2.5 Air Pollution in the United States. JAMA Netw Open. 2019;2(11):e1915834.
Turner MC, Jerrett M, Pope CA 3rd, Krewski D, Gapstur SM, Diver WR, et al. Long-term ozone exposure and mortality in a large prospective study. Am J Respir Crit Care Med. 2016;193(10):1134–42. https://doi.org/10.1164/rccm.201508-1633OC.
Faustini A, Rapp R, Forastiere F. Nitrogen dioxide and mortality: review and meta-analysis of long-term studies. Eur Respir J. 2014;44(3):744–53. https://doi.org/10.1183/09031936.00114713.
Eum KD, Kazemiparkouhi F, Wang B, Manjourides J, Pun V, Pavlu V, et al. Long-term NO2 exposures and cause-specific mortality in American older adults. Environ Int. 2019;124:10–5. https://doi.org/10.1016/j.envint.2018.12.060.
Bennett JE, Tamura-Wicks H, Parks RM, Burnett RT, Pope CA 3rd, Bechle MJ, et al. Particulate matter air pollution and national and county life expectancy loss in the USA: a spatiotemporal analysis. PLoS Med. 2019;16(7):e1002856. https://doi.org/10.1371/journal.pmed.1002856.
Lilienfeld DE. Definitions of epidemiology. Am J Epidemiol. 1978;107(2):87–90. https://doi.org/10.1093/oxfordjournals.aje.a112521.
Owens EO, Patel MM, Kirrane E, Long TC, Brown J, Cote I, et al. Framework for assessing causality of air pollution-related health effects for reviews of the National Ambient Air Quality Standards. Regul Toxicol Pharmacol. 2017;88:332–7. https://doi.org/10.1016/j.yrtph.2017.05.014.
Hill AB. The environment and disease: association or causation? 1965. J R Soc Med. 2015;108(1):32–7. https://doi.org/10.1177/0141076814562718.
Wang Y, Kloog I, Coull BA, Kosheleva A, Zanobetti A, Schwartz JD. Estimating causal effects of Long-term PM2.5 exposure on mortality in New Jersey. Environ Health Perspect. 2016;124(8):1182–8. https://doi.org/10.1289/ehp.1409671.
Schwartz J, Bind MA, Koutrakis P. Estimating causal effects of local air pollution on daily deaths: effect of low levels. Environ Health Perspect. 2017;125(1):23–9. https://doi.org/10.1289/EHP232.
Wei Y, Wang Y, Wu X, Di Q, Shi L, Koutrakis P, et al. Causal effects of air pollution on mortality in Massachusetts. Am J Epidemiol. 2020;189(11):1316–23. https://doi.org/10.1093/aje/kwaa098.
U.S. EPA. 40 CFR Part 50. National ambient air quality standards for particulate matter: Final rule. Fed Regist. 1997;62(138):38652–460.
Cox LA. Do causal concentration-response functions exist? A critical review of associational and causal relations between fine particulate matter and mortality. Crit Rev Toxicol. 2017;47(7):609–37. https://doi.org/10.1080/10408444.2017.1311838.
Forastiere L, Carugno M, Baccini M. Assessing short-term impact of PM10 on mortality using a semiparametric generalized propensity score approach. Environ Health. 2020;19(1):46. https://doi.org/10.1186/s12940-020-00599-6.
WHO. WHO Expert Consultation: Available Evidence for the Future Update of the WHO Global Air Quality Guidelines (AQGs). Geneva: WHO; 2016.
Howden L, Meyer J. Age and sex composition: 2010 Census briefs. U.S. Census Bureau; 2011.
Weisskopf MG, Webster TF. Trade-offs of Personal Versus More Proxy Exposure Measures in Environmental Epidemiology. Epidemiology (Cambridge, Mass). 2017;28(5):635–43.
Di Q, Amini H, Shi L, Kloog I, Silvern R, Kelly J, et al. An ensemble-based model of PM2.5 concentration across the contiguous United States with high spatiotemporal resolution. Environ Int. 2019;130:104909.
Requia WJ, Di Q, Silvern R, Kelly JT, Koutrakis P, Mickley LJ, Sulprizio MP, Amini H, Shi L, Schwartz J. An Ensemble Learning Approach for Estimating High Spatiotemporal Resolution of Ground-Level Ozone in the Contiguous United States. Environ Sci Technol. 2020;54(18):11037–47. https://doi.org/10.1021/acs.est.0c01791.
Di Q, Amini H, Shi L, Kloog I, Silvern RF, Kelly JT, et al. Assessing NO2 concentration and model uncertainty with high spatiotemporal resolution across the contiguous United States using ensemble model averaging. Environ Sci Technol. 2019;54(3):1372–84.
Environmental Systems Research Institute (Esri). Esri Data & Maps 10. Redlands: An Esri White Paper; 2010.
Mitchell KE, et al. The multi-institution North American Land Data Assimilation System (NLDAS): Utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system. J Geophys Res. 2004;109:D07S90. https://doi.org/10.1029/2003JD003823.
Barreca AI. Climate change, humidity, and mortality in the United States. J Environ Econ Manag. 2012;63(1):19–34. https://doi.org/10.1016/j.jeem.2011.07.004.
Shi L, Kloog I, Zanobetti A, Liu P, Schwartz JD. Impacts of temperature and its variability on mortality in New England. Nat Clim Chang. 2015;5(11):988–91. https://doi.org/10.1038/nclimate2704.
National Research Council. Using the American community survey: benefits and challenges. Washington, DC: The National Academies Press; 2007.
CDC. Behavioral Risk Factor Surveillance System Survey Questionnaire. Atlanta: U.S. Department of Health and Human Services, Centers for Disease Control and Prevention; 2004.
Cronenwett JL, Birkmeyer JD. The Dartmouth atlas of vascular health care. Cardiovasc Surg. 2000;8(6):409–10.
Robins JM, Hernan MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology (Cambridge, Mass). 2000;11(5):550–60.
Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):41–55. https://doi.org/10.1093/biomet/70.1.41.
Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol. 2008;168(6):656–64. https://doi.org/10.1093/aje/kwn164.
Hernan MA, Brumback B, Robins JM. Marginal structural models to estimate the causal effect of zidovudine on the survival of HIV-positive men. Epidemiology (Cambridge, Mass). 2000;11(5):561–70.
Hirano K, Imbens GW. The Propensity Score with Continuous Treatments. In: Gelman A, Meng XL, editors. Applied Bayesian Modeling and Causal Inference from Incomplete-Data. Hoboken: John Wiley & Sons, Ltd; 2004. p. 73–84.
Robins J. Marginal structural models versus structural nested models as tools for causal inference. Stat Models Epidemiol Environment Clin Trials. 2000:95–133. https://doi.org/10.1007/978-1-4612-1284-3_2.
Hernan MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60(7):578–86. https://doi.org/10.1136/jech.2004.029496.
Naimi AI, Moodie EE, Auger N, Kaufman JS. Constructing inverse probability weights for continuous exposures: a comparison of methods. Epidemiology (Cambridge, Mass). 2014;25(2):292–9.
Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;2(3):808–40.
McCaffrey DF, Ridgeway G, Morral AR. Propensity score estimation with boosted regression for evaluating causal effects in observational studies. Psychol Methods. 2004;9(4):403–25. https://doi.org/10.1037/1082-989X.9.4.403.
Friedman J. Greedy function approximation: a gradient boosting machine. Ann Stat. 2001;29(5):1189–232.
Di Q, Dai L, Wang Y, Zanobetti A, Choirat C, Schwartz JD, et al. Association of Short-term Exposure to air pollution with mortality in older adults. JAMA. 2017;318(24):2446–56. https://doi.org/10.1001/jama.2017.17923.
Donaldson GC, Wilkinson TM, Hurst JR, Perera WR, Wedzicha JA. Exacerbations and time spent outdoors in chronic obstructive pulmonary disease. Am J Respir Crit Care Med. 2005;171(5):446–52. https://doi.org/10.1164/rccm.200408-1054OC.
Leung DYC. Outdoor-indoor air pollution in urban environment: challenges and opportunity. Front Environ Sci. 2015;2:69. https://doi.org/10.3389/fenvs.2014.00069.


Acknowledgements

Not applicable.

Funding

This publication was made possible by the United States Environmental Protection Agency (US EPA) grants RD-8358720 and RD-83587201-0. Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the US EPA. Further, the US EPA does not endorse the purchase of any commercial products or services mentioned in the publication. This publication was also made possible by National Institutes of Health (NIH) grants ES-000002, R01 ES024332–01, R01 MD012769, R01 ES028033, R21 ES024012, 1R01AG060232-01A1, 1R01ES030616, and 1R01AG066793-01R01, by Health Effects Institute (HEI) grant 4953-RFA14-3/16-4, by Alfred P. Sloan Foundation grant G-2020-13946, and by the Harvard University Climate Change Solutions Fund.

Author information

Authors and Affiliations

Department of Environmental Health, Harvard T. H. Chan School of Public Health, Landmark Center 4th West, 401 Park Drive, Boston, MA, 02215, USA
Yaguang Wei, Mahdieh Danesh Yazdi, Antonella Zanobetti & Joel Schwartz
Vanke School of Public Health, Tsinghua University, Beijing, China
Qian Di
School of Public Policy and Government, Fundação Getúlio Vargas, Brasília, Distrito Federal, Brazil
Weeberb J. Requia
Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
Francesca Dominici
Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA, USA
Joel Schwartz

Authors

Yaguang Wei
Mahdieh Danesh Yazdi
Qian Di
Weeberb J. Requia
Francesca Dominici
Antonella Zanobetti
Joel Schwartz

Contributions

Y.W. and J.S. designed the research and performed the analysis; M.D.Y., Q.D., W.J.R., F.D., and A.Z. prepared the data; and Y.W. and J.S. wrote the paper. All authors helped interpret the results and provided comments. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yaguang Wei.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the institutional review board at the Harvard T.H. Chan School of Public Health and was exempt from informed consent requirements as a study of previously collected administrative data.

Consent for publication

Not applicable.

Competing interests

Dr. Joel Schwartz serves as an expert witness for the United States Department of Justice in a case involving a Clean Air Act violation.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Wei, Y., Yazdi, M.D., Di, Q. et al. Emulating causal dose-response relations between air pollutants and mortality in the Medicare population. Environ Health 20, 53 (2021). https://doi.org/10.1186/s12940-021-00742-x


Received: 28 December 2020
Accepted: 30 April 2021
Published: 06 May 2021
DOI: https://doi.org/10.1186/s12940-021-00742-x
