Study population
In Taiwan, over 99.5% of the people have enrolled in the National Health Insurance (NHI) program [19]. The computerized claims data of the NHI program comprise of medical records of all 23 million beneficiaries, and are managed and stored in the Health and Welfare Data Science Center, Ministry of Health and Welfare of Taiwan. The database consists of files of inpatient and outpatient cares, prescription drugs, and the Registry for Beneficiaries, between which linkages are conducted through encrypted personal identifications. Together, these files provided individual-level information on disease diagnoses, surgical procedures, medications, and demographics.
Using the claims data of NHI, we conducted a retrospective cohort study, in which a cohort of 4.5 million people was randomly selected from all NHI beneficiaries in 2011. For each cohort member, the index date was defined as the first date of enrollment for this study; if the first date was before January 1, 2011, we set the index date as January 1, 2011. Cohort members who were aged <20 years at the index date (n = 989,853), previously diagnosed with stroke (n = 314,331), or had invalid or missing data on survival status (n = 40) were excluded. Study subjects were considered as having a history of stroke if they had an outpatient or inpatient claim with a diagnosis of stroke before index date (International Classification of Diseases, 9th Revision, Clinical Modification [ICD-9-CM] codes 430.xx-438.xx). We also excluded subjects residing in offshore islands because the income data were unavailable (n = 5723), and those residing in administrative subdivisions without air quality monitoring station (n = 1,827,769). A total of 1,362,284 adults were included in our analysis.
In Taiwan, there are 22 administrative divisions in terms of special municipality, county and provincial city directed by the central government. The 22 divisions are further divided into 368 administrative subdivisions for which the official terms were city, township, or district. The study population, monitoring stations and the administrative subdivisions that were included and excluded from our data analysis are shown in Fig. S1 in the Supplementary Material in the Additional file 1. In Taiwan, nearly 80% of the total 76 air quality monitoring stations are set up in cities and counties with high population density (general stations). The remaining stations are installed in the metropolitan areas with high traffic volume (traffic stations), industrial areas (industrial stations), the national parks (national park stations), and the areas with less pollution (background stations and other stations for specific purpose) [20]. The study subjects included in our analysis were residents of areas with air quality monitoring stations. Therefore, they were more likely to live in more urbanized areas with higher income (Table SI). However, distributions of age, sex, and comorbidities were similar between residents included and excluded in this study (Table SI). The Research Ethics Committee at China Medical University and Hospital approved the study protocol (CRREC-107-021(CR-3)).
Assessment of long-term PM2.5 exposure
The Environmental Protection Administration of Taiwan has established air quality monitoring stations since 1990s throughout the whole island to measure hourly air pollutants, including sulfur dioxide (SO2, ppb), particulate matter with an aerodynamic diameter ≤ 10 μm (PM10, μg/m3), carbon monoxide (CO, ppm), nitrogen oxides (NOx, ppb), ozone (O3, ppb), and nitrogen dioxide (NO2, ppb). As of the end of 2005, there were 76 monitoring stations, and these stations have started monitoring hourly measures of PM2.5 since 2005 [20]. Previous studies have shown that the stroke risk associated with PM2.5 exposure in recent 1- to 5-year was stronger than the exposure in earlier period (i.e., 5 years before baseline) [21, 22]. Therefore, we used PM2.5 measured from 2010 to 2015 at the administrative subdivisions (i.e., city, township, or district) as the proxy for individual exposure because the address information in the NHI claims data is limited to the subdivision level for the protection of patient privacy. We calculated annual average PM2.5 concentrations using the hourly data by administrative subdivision with at least one monitoring station, and assigned the PM2.5 levels to study subjects by the administrative subdivision they lived in. We used only the measurements collected from stations available in the respective administrative subdivision. In the area with more than one monitoring stations, we used the mean level of all stations. Measures of PM2.5 were available for all study subjects in 2010, and the percentage of people with missing values ranged from 0.6 to 9.2% from 2011–2015 (Table S II). Subjects with missing values were excluded from the regression analysis in which PM2.5 was considered time-varying exposures. We also inspected missing values among monitoring stations. Of all stations from 2010 to 2015, only 1.0% station-years (n = 4) had missing values for more than 20% of a year (i.e., missing data on daily exposure measurement for more than 73 of the 365 days), and 1.6% station-years (n = 7) had missing values for more than 10% of a year (i.e., 36 of the 365 days).
The potential confounders and comorbidity
We identified the potential confounders based on literature review. Data on demographics and socioeconomic status that have been associated with the stroke risk [23], available in the claims data for this study included age, sex, urbanization level of the residential area, and per capita disposable income. The income level was estimated using per Capita Disposable Income reported by Directorate General of Budget, Accounting and Statistics, Executive Yuan, Taiwan, on the yearly basis [24]. We used the classification scheme developed by Liu et al. to determine the urbanization level of the administrative subdivision where a study subject registered as residence [25]. In brief, all cities, townships, and districts in Taiwan were classified into 7 urbanization levels according to a score computed based on population density (people/km2), proportion of people with a college or higher level of education, proportion of elderly and agricultural population, and the number of physicians per 100,000 people. Level 1 is the most urbanized. Levels 4 to 7 were combined because of the small number of subjects in these categories.
In addition to PM2.5, data of NO2, NOx, and SO2 were also used in this study, as several studies have associated these air pollutants with increased risks of stroke and cardiovascular disease [26,27,28,29,30,31]. Long-term exposure to ambient temperature also has been linked to increased hospitalization for ischemic stroke and cardiovascular risk [18, 32]. Therefore, we considered meteorological conditions as potential risk factors associated with stroke, focusing on temperature. Data on all air pollutants, temperature and relative humidity were obtained from the same air quality monitoring stations where the PM2.5 concentrations were collected (Fig. S1). For all the air pollutants, data from all 74 monitoring stations were used, excluding 2 stations in offshore islands (Fig. S1). Temperature and relative humidity were unavailable at 2 stations in 2010, resulting missing values for 38,583 subjects (or 2.8% of all study subjects) (Table S II). Subjects included in our main exposure (PM2.5) analysis and those with temperature data were similar in distributions of socio-demographic characteristics and comorbidities at baseline (number of subjects, 1,362,284 and 1,323,701, respectively, Table S III). The data processing procedure for all environmental variables including temperature was similar to that for PM2.5. Annual mean levels of these environmental variables measured during 2010–2015 were assigned to the study subjects based on their registered residence, by the city/township /district level.
From the claims data, we identified several comorbidities known as risk factors of stroke, including diabetes mellitus (ICD-9-CM code 250), hypertension (ICD-9-CM codes 401–405), hyperlipidemia (ICD-9-CM code 272), coronary artery disease (ICD-9-CM codes 410–414), chronic obstructive pulmonary disease (ICD-9-CM codes 491, 492, 496) and atrial fibrillation (ICD-9-CM code 427.3). ICD-10 version was not applicable in this study because in Taiwan, ICD-9 version had been adapted until the end of 2015. A comorbid condition was defined by at least one inpatient or two outpatient claims for the disease diagnosed within two years before the index date. Evidence has shown that the pathophysiological mechanisms linking long-term PM2.5 exposure and cardiovascular health involved inflammation, atherosclerosis, and changes in vascular functions [1]. Therefore, the comorbidities were not adjusted for in the regression analysis, as they might more likely be the mediators rather than confounders. However, we performed stratified analyses by these comorbidities to observe whether the associations were homogenous in the patient subgroups.
Health outcomes
Study subjects were followed up to observe the occurrence of stroke, which was considered to be present if study subjects had a hospital admission with a primary discharge diagnosis of stroke (ICD-9-CM code 431, 432, 433, 434, 436). Follow-up person-years were calculated for each subject from the index date until the diagnosis of stroke, or until censored because of withdrawal from the insurance coverage or death, or December 31, 2016. We assessed survival status and date of death from the Registry for Cause of Death, and classified cases of hemorrhagic stroke (ICD-9-CM codes 431–432) and ischemic stroke (ICD-9-CM codes 433, 434, 436)。.
Statistical analysis
Baseline characteristics of study subjects were described using mean (standard deviations, SD) for continuous variables and the number of subjects (percentages) for categorical variables. We presented distributions of annual mean levels of temperature, humidity and air pollutants from 2010 to 2015, and listed the missing values, mean, median, quartile 1, quartile 3, and minimum and maximum values. Cox proportional hazards models were applied to evaluate the association between long-term PM2.5 exposure and risks of all strokes, ischemic stroke, and hemorrhagic stroke. We performed two sets of analyses, using different time windows of exposures: mean PM2.5 level in 2010, and yearly mean levels of PM2.5 from 2010 to 2015 treated as a time-varying exposure in the Cox models. To take into account residence changes in the time-varying Cox proportional hazard models, we updated the registered residence locations yearly from 2010 to the end of follow-up. The yearly mean level of PM2.5 was assigned according to the updated location of residence for each individual. In each set of analyses, hazard ratios (HRs) and 95% confidence intervals (CIs) were reported per interquartile range (IQR) increase in PM2.5 (single-exposure model). We performed two models with increasing level for adjustment. Model 1 was adjusted for age (in continuous scale, years) and sex (men, women). In Model 2, our main model, we additionally adjusted for income level (<=272,470, 272,471-273,351, 273,351-311,566, >311,566, New Taiwan dollars per year) and the urbanization level (levels 1, 2, 3, 4–7). To explore the effect of other environmental exposures, the single-exposure model was repeated by substituting PM2.5 with NO2, NOx, SO2, O3, CO, temperature or relative humidity. To assess if the association between PM2.5 and risk of stroke changes after controlling for other environmental exposures and vice versa, we performed two-exposure models to assess environmental variables significantly associated with developing stroke in the single-exposure models. In the two-exposure models, we included NO2 but not NOx, as evidence has shown a consistent association between NO2 and risk of stroke [2, 31]. In all the single- and two-exposure models, the environmental exposure variables were entered as linear terms (per IQR increase). The proportional hazards assumption was examined by including product terms between the environmental variables and a function of follow-up time, and the results showed no violation of the assumption. To assess correlations between environmental exposures in two-exposure models, we calculated Spearman correlation coefficients yearly from 2010 to 2015.
We evaluated the exposure-response relationship between PM2.5 levels and incidence of all stroke, ischemic stroke and hemorrhagic stroke using restricted cubic splines with three knots located at the 10th, 50th, and 90th percentiles of the distributions of PM2.5 (R package rms and survival) in Cox regression analysis [33, 34]. The first model, our main model, was adjusted for all socio-demographic variables including age, sex, income level and urbanization level. We then performed 5 models based on the main model, additionally adjusting for NO2, SO2, O3, CO, or temperature that were significantly associated with stroke in single-exposure models, to observe the shape of the association between PM2.5 and stroke considering other environmental exposures. In addition, we performed a sensitivity analysis, in which natural cubic splines with two degrees of freedom were fitted to evaluate the consistency of shapes of dose-response relationships.
Stratified analyses were performed to assess the potential effect modification by comorbidities. The interaction terms were tested using likelihood ratio test. We performed two sensitivity analyses to assess the impact of misclassification of residence because a portion of study subjects may not reside at their registered residence. First, we used the estimated current residence instead of registered residence in the Cox regression analysis. The current residence was estimated using the algorithm developed by Ku et al., based on sequentially considering several locations such as locations of outpatient clinic visit for low respiratory tract infections, locations of primary physician clinics visit, and geographical proximity between registered residence and the locations of access to primary healthcare [35]. Second, we performed data analysis excluding people whose location of registered residence was inconsistent with that of their NHI registration. In Taiwan, areas where people register in the NHI program are usually locations of their residences or employments. Data processing and statistical analyses were performed using SAS Version 9.4 (SAS Institute, Inc., Cary, NC, USA) and R version 4.1.0. A two-tailed p value <0.05 was considered statistically significant.