Study population
Data were used from a cross-sectional questionnaire based study in school children in a small town in Chile, which has one open-pit gold and one open-pit copper mine. Children and their parents or legal guardians provided written informed consent. The study included school children from 1st to 6th grade attending the two major local elementary schools. These two schools were the only ones in the community offering complete elementary education, and approximately 84 % of children in grades 1st to 6th were registered in these schools. Parental questionnaires with instructions were mailed to children’s homes. Further details about the study design are reported in Ohlander et al. [23].
Data
The original study included 288 children (out of 418 children invited to participate, response rate: 69 %). Data on outcome variables and potential confounders were ascertained using a validated questionnaire in Spanish [24]. We groped the variables into three categories:
Child health outcomes
Asthma: a child was considered to have asthma, if the child was reported to ever have had asthma confirmed by a medical doctor or if the child had taken asthma medication during the last 12 months prior to the survey. Asthma medication was considered, because patients in Chile are often not aware of their asthma diagnosis (personal communication).
Rhinoconjunctivitis: positive report of one or more of the following nasal symptoms during the past 12 months prior to the survey: sneezing, itching, nasal congestion or rhinitis in conjunction with itchy, red, and watery eyes.
Characteristics of the participants
Sex (female vs. male), age (6–7 years vs. 8–9 years, 10–11 years and 12 years or more), parental history of atopic disease defined as reported history of asthma, rhinitis or eczema in the child’s biological mother or father (no vs. yes). Indicators of socioeconomic status in this setting were whether the mother (no vs. yes) or father (no vs. yes) of the participant were working, and whether the child was living with both parents (no vs. yes).
Other sources of air pollution
The following variables were considered as additional sources of exposure: current cigarette exposure at home (no vs. yes), type of heating used at home (other vs. coal and gas), pavement of the nearest street (no vs. yes) (“no” indicates the nearest street was mainly a dirt road), the child’s preferred play area (inside vs outside), and time spent at home (less than 3 h/day vs. 3–6 h/day, more than 6 h/day).
Exposure assessment
In the absence of ambient measurements of emission or of personal exposures, we used the proximity to the open pit mines as proxy of exposure [25, 26]. We used the residential proximity to the gold and the copper mines, and the average distance to both. During the fieldwork, the geographical coordinates of each place of residence was obtained using global positioning system (GPS). The locations of the main extraction places were geocoded using official information given by the companies. The Harvesine formula for distance was used to account for the curvature of the earth [27]. The lower quartile of distance to each mine separately and of the average distance was selected a-priori as cut-off point of the exposure categories.
Statistical analysis
To assess associations between mining exposure and health outcomes, we conducted a three-step analysis 1) Logistic regressions were fitted for the two outcomes using the dichotomized proximity variable; 2) Bayesian structured additive regressions were used to estimate a possible non-linear effect on the respiratory diseases related to the proximity to the mines; results of this part were used to calibrate the subsequent modeling step; 3) Bayesian models with a parametric distance function were used to estimate associations between distance to the mines and the outcomes, and to identify a minimum distance to the point sources beyond which the health impact is considered non statistically significant. To adjust for possible regional effects, which are not captured by distance to the mine or the individual’s characteristics, we included an additional spatial effect. All analyses where done using the statistical software R [28].
Logistic regression
We estimated associations between distance from the mines (i.e., gold mine, copper mine and average distance to both mines) and the respiratory diseases using logistic regression models adjusting for potential confounders. Distance was dichotomized at the first quartile, i.e., the quartile closest to the point sources was categorized as 1 for children living in the closest quartile, 0 otherwise. A variable was considered a potential confounder if it was associated with the outcome in the crude models. All subsequent analyses were adjusted for these variables. To assess the potential influence of item non-response, we performed sensitivity analyses employing multiple imputation (MI) under the assumption of missing at random. Seven data sets were generated with the MI procedure and combined to create pooled estimates using Rubin’s rules [29]. We then compared the pooled with the complete case estimates. The MI process and model estimation were done using the R libraries Amelia [30] and Zelig [31], respectively.
Bayesian structured additive regression
We investigated non-linear associations between air pollution exposure proxies (i.e., the distance variable) and respiratory diseases using Bayesian generalized structured additive regression (STAR) [32] from a Bayesian perspective. The model has the equation: logit(π
i
) = β
0 + f(d
ik
) + Xβ, where f(d
ik
) is an unknown smooth function of the distance d
ik
of child i to mine k. X comprises additional confounders with effects β. We used Bayesian P-splines f(d
ik
). We assumed independent diffuse priors for the estimated effects of the potential confounders β, the intercept β
0 and the P-splines [33].
The Markov chain Monte Carlo methods was implemented through the software BayesX [34] using 70,000 iterations including a burn-in phase of 15,000 iterations. Two chains were generated with different starting values. Finally, convergence was evaluated with trace plots and the convergence statistics provided by the software BayesX.
We used the results of the Bayesian Structured Additive Regression to calibrate the prior distributions for the Bayesian models with parametric distance functions. This approach is preferable when little or nothing is known about the shape of the relationship between the proximity to the point sources of pollution and the disease [35]. Then, STAR models could be seen as a calibration step for models with a parametric form for the distance.
Bayesian models with a parametric distance function
We fitted Bayesian models with a parametric distance function proposed by [25, 36, 37] and used the Bayesian perspective suggested by [26], and extended these models for multiple diseases with one point sources and a spatial effect. These models have the function [38],
$$ logit\left({\pi}_i(s)\right)={\beta}_0+ f\left({d}_{i k}\left|{\alpha}_k,{\phi}_k,{\delta}_k\right.\right)+\mathrm{X}\upbeta +\mathrm{S}\left(\mathrm{s}\right) $$
(1)
where f(.) is the parametric function of the distance proposed in [39] as
$$ f\left({d}_{ik}\left|{\alpha}_k,{\phi}_k,{\delta}_k\right.\right)=\left\{\begin{array}{rr}\hfill 1+{\alpha}_k,& \hfill if\ {d}_{ik}\le\ {\delta}_k\\ {}\hfill 1+{\alpha}_k \exp \left\{{\left(\frac{d_{ik}-{\delta}_k}{\phi_k}\right)}^2\right\},& \hfill if\ {d}_{ik}>{\delta}_k\end{array}\right. $$
(2)
Here 1 + α
k
reflecting the excess of odds ratio at the mine, δ
k
represents the radius of the plateau around the mine, ϕ
k
represents the rate at which the risk decreases with each additional kilometer of distance to the edge of the plateau. An additional spatial effect S(s) corrects for any regional differences in the risk of the disease, which could not be explained by the potential confounders or by the proximity to the mines. Model construction was performed in a hierarchical procedure:
-
Model 0: logit(π
i
) = β
0 + Xβ, including only potential confounders
-
Model 1: logit(π
i
(s)) = β
0 + Xβ + S(s), including a spatial variation.
-
Model 2: logit(π
i
(s)) = β
0 + f(d
ik
| α
k
, ϕ
k
, δ
k
) + Xβ, including the parametric proximity function.
-
Model 3: logit(π
i
(s)) = β
0 + f(d
ik
|α
k
, ϕ
k
, δ
k
) + Xβ + S(s), including an additional spatial effect to explore if there is a remaining spatial variation which could not be explained by the proximity or the confounders.
The models were compared using the deviance information criterion (DIC) [40] where lower values indicate an improvement of the fit. These models were estimated using the software JAGS [41] called from R.