Study population
The cohort for this study was constructed using publicly-available National Health Interview Survey (NHIS) data from 1987 to 2014, linked with restricted-use geographic information and mortality follow-up through 2015. The sample was limited to NHIS respondents aged 18–84 at the time of survey for whom information was available regarding age, sex, race-ethnicity, income, education, marital status, smoking status, BMI, census tract, ambient air pollution, survey date, mortality status at the end of 2015, and date of death (if deceased at the end of 2015).
The NHIS is a household survey administered annually by the National Center for Health Statistics (NCHS) and designed to be representative of the civilian noninstitutionalized U.S. population [28]. Survey data were linked with the National Death Index for mortality follow-up through 2015 [29]. The construction of this cohort has been described in a previous study [25], where it was referred to as a “subcohort” of a larger NHIS cohort. This cohort, rather than the larger “full cohort” of the prior study, was chosen for the present analysis because it included information for smoking status and BMI. The NHIS design was altered periodically over the sample period, so some variables required harmonization. Data linkage was performed with permission and assistance from the NCHS. Further details on construction, harmonization, and data linkage for the NHIS cohort are documented elsewhere [25].
Air pollution data
Air pollution exposures were assigned to individuals based on their census tract of residence at the time of survey, using year-2000 Census tracts for individuals surveyed from 1987 to 2010 and year-2010 Census tracts for individuals surveyed from 2011 to 2014. Annual-average estimates of ambient air pollution were calculated for criteria pollutants (PM2.5, PM10, SO2, NO2, O3, and CO) using estimates from the v1 empirical models of Kim et al., 2018 [30], available at www.caces.us. These models employed regulatory monitoring and land-use data, and pollution estimates were calculated starting with the first year for which nationwide monitoring data were available for that pollutant (1979 for SO2, NO2, and O3; 1988 for PM10; 1990 for CO; and 1999 for PM2.5). In the case of O3, annual values are the mean for May through September of the daily maximum eight-hour moving average. O3 monitoring is not widely and routinely conducted from October through April since these months typically experience very low O3 concentrations. Estimates for each pollutant-year through 2015 were generated at the census-block level using year-2010 Census block centroids. Tract-level estimates for year-2000 Census tracts and year-2010 Census tracts were estimated by mapping year-2010 Census blocks to census tracts and then calculating a population-weighted average of the census blocks within a census tract. PM2.5 exposures prior to 1999 were estimated by multiplying a census tract’s PM10 value with the census tract’s mean PM2.5:PM10 ratio from 1999 to 2003, as explained elsewhere [25]. Values for PM2.5–10 were calculated by subtracting PM2.5 from PM10.
In addition, spatially-decomposed PM2.5 data were generated following an approach described elsewhere [26]. Briefly, a census block’s total ambient PM2.5 was decomposed into four components, depending on the spatial variance in PM2.5 surrounding the census block. Estimating spatial decompositions involved finding and subtracting the minimum PM2.5 values within circular buffers around each census block. First, the minimum PM2.5 for census block centroids within a 100 km radius of a given census block centroid was found, and this minimum was designated as regional (> 100 km) PM2.5. After subtracting regional PM2.5, the minimum PM2.5 within a 10 km radius of the census block centroid was found, and this value was designated as mid-range (10–100 km) PM2.5. Next, the minimum value within 1 km of the block centroid was similarly used to calculate neighborhood (1–10 km) PM2.5 by subtracting regional and mid-range PM2.5. Finally, the residual PM2.5 that remained after subtracting regional, mid-range, and neighborhood PM2.5 was called local (< 1 km) PM2.5. The process was repeated for each year-2010 Census block and for each year from 2000 through 2015. Values for census tracts were calculated using population-weighted averages of year-2010 Census blocks.
Statistical analyses
Statistical analyses were performed at the NCHS Research Data Center in Hyattsville, MD, using SAS (version 9.3; SAS Institute). Survival analyses were performed for all-cause and cardiopulmonary mortality, with cardiopulmonary mortality defined as mortality due to cardiovascular disease (ICD-10 codes: I00-I09, I11, I13, I20-I51), cerebrovascular disease (I60-I69), chronic lower respiratory disease (J40-J47), and influenza or pneumonia (J09-J18). Mortality hazard ratios (HRs) and 95% confidence intervals (CIs) were estimated using two versions of the Cox proportional hazards (PH) model. The first PH model, referred to as the basic PH model, controlled for age, sex, and race-ethnicity by allowing each combination of age (in one-year increments), sex, and race-ethnicity (Hispanic, non-Hispanic black, non-Hispanic white, other or unknown) its own baseline hazard function using the STRATA statement of the PHREG procedure in SAS. The second PH model, referred to as the complex PH model, controlled for age group (18–24 years and subsequent five-year age groups), sex, and race-ethnicity by including an indicator variable for each interaction of age group, sex, and race-ethnicity. The complex PH model was estimated using the SURVEYPHREG procedure in SAS, adjusting for the NHIS complex survey design, using reported survey stratum, primary sampling unit, and sample weight from mortality follow-up files [28].
Both PH models controlled for covariates by including indicator variables for each value of marital status (never married, married, separated, divorced, widowed), inflation-adjusted household income ($0–35,000; $35,000-50,000; $50,000-75,000; >$75,000), education (<high school graduate, high school graduate, some college, college graduate, >college graduate), smoking status (current, former, never), BMI (< 20, 20–25, 25–30, 30–35, > 35), U.S. Census region, urban versus rural designation, and survey year. Survival time was the number of days between survey and death. For all-cause mortality, censored survival time was the number of days between survey and mortality follow-up (31 Dec 2015). In models that considered cardiopulmonary mortality, censored survival time was the number of days between survey and mortality follow-up, or the number of days between survey and non-cardiopulmonary mortality. Pollution values were included as continuous variables in the regressions.
In models using criteria pollutants (PM2.5, PM2.5–10, SO2, NO2, O3, and CO), regressions included one, two, or six pollutants, and were estimated for both all-cause and cardiopulmonary mortality. One- and two-pollutant regression models used the basic PH model. For six-pollutant regression models, both the basic and complex PH models were employed to examine whether results were sensitive to adjusting for the NHIS complex survey design. Basic PH models were also used to estimate the associations between spatial decompositions of PM2.5 and risk of all-cause and cardiopulmonary mortality. Regressions were performed for each of the four decompositions individually and for models that included all four decompositions.
For temporally-decomposed analyses, the NHIS cohort was decomposed into 24 yearly cohorts (1992–2015), beginning in 1992 to allow up to a five-year lagged pollution-exposure window. An individual in the NHIS cohort was included in a particular year’s cohort if she was alive on 1 Jan and was surveyed by 31 Dec of that year. For example, the 1992 cohort included those surveyed before 1992 and alive on 1 Jan 1992. It also included those who were surveyed in 1992. For those who died in 1992, survival time was the number of days between 1 Jan 1992 and date of death (for individuals surveyed before 1992), or the number of days between survey date and date of death (for individuals surveyed in 1992). For those who did not die in 1992, censored survival time was the number of days between 1 Jan and 31 Dec (for individuals surveyed before 1992), or the number of days between survey date and 31 Dec (for individuals surveyed in 1992). Analogous cohorts were constructed for each year from 1993 to 2015. The construction of these yearly cohorts is illustrated in Additional file 1: Figure S1.
Complex PH regressions were performed for all-cause and cardiopulmonary mortality for each of the 24 temporally-decomposed cohorts. In each cohort, individuals were assigned a two-year (cohort year and previous year) and five-year (cohort year and four previous years) average of ambient PM2.5 using their census tract of residence at time of survey. In addition, age was adjusted to age in cohort year. Other covariates were not updated between cohorts. Meta-analytic fixed-effect estimates of the HR associated with a 10 μg/m3 increase in mean ambient PM2.5 were calculated for all-cause and cardiopulmonary mortality using estimates generated by the 24 yearly cohorts (Comprehensive Meta Analysis Ver. 3 Biostat Englewood, NJ).