Assessing short-term impact of PM10 on mortality using a semiparametric generalized propensity score approach

Background The shape of the exposure-response curve describing the effects of air pollution on population health has crucial regulatory implications, and it is important in assessing causal impacts of hypothetical policies of air pollution reduction. Methods After having reformulated the problem of assessing the short-term impact of air pollution on health within the potential outcome approach to causal inference, we developed a method based on the generalized propensity score (GPS) to estimate the average dose-response function (aDRF) and quantify attributable deaths under different counterfactual scenarios of air pollution reduction. We applied the proposed approach to assess the impact of airborne particles with a diameter less than or equal to 10 μm (PM10) on deaths from natural, cardiovascular and respiratory causes in the city of Milan, Italy (2003-2006). Results As opposed to what is commonly assumed, the estimated aDRFs were not linear, being steeper for low-moderate values of exposure. In the case of natural mortality, the curve became flatter for higher levels; this behavior was less pronounced for cause-specific mortality. The effect was larger in days characterized by higher temperature. According to the curves, we estimated that a hypothetical intervention able to set the daily exposure levels exceeding 40 μg/m3 to exactly 40 would have avoided 1157 deaths (90%CI: 689, 1645) in the whole study period, 312 of which for respiratory causes and 771 for cardiovascular causes. These impacts were higher than those obtained previously from regression-based methods. Conclusion This novel method based on the GPS allowed estimating the average dose-response function and calculating attributable deaths, without requiring strong assumptions about the shape of the relationship. Its potential as a tool for investigating effect modification by temperature and its use in other environmental epidemiology contexts deserve further investigation.


Background
In investigating short-term effects and impacts of air pollution on population mortality and morbidity, the exposure-response relationship is frequently assumed to be linear on a logarithmic scale, supporting the idea that an increase of the air pollution level yields the same percent increase in the occurrence of health events at any exposure level. This choice is in line with part of the epidemiological literature which does not report evidences of strong deviations from log-linearity within the ranges of exposure levels observed in urban areas [1][2][3][4][5][6][7][8][9][10]. However, recent works suggest a possible violation of the log-linearity assumption, especially in contexts where the range of exposure values was sufficiently wide to allow exploring the curve also for high air pollutant levels. For example, in a study aimed at investigating the relationship between the concentration of atmospheric particulate matter having diameter less than or equal to 2.5 μm (PM 2.5 ) and mortality in 272 Chinese cities, a leveling off in the exposure-response curves at high concentrations was reported in some areas of the Country [11]. Another Chinese study found a non log-linear increase in the occurrence of respiratory diseases for daily PM 2.5 levels above 50 μg/m 3 [12]. Evidences of violation of the loglinearity assumption were found also in Li et al. [13] for the relationship between PM 2.5 and mortality and in Zu et al. [14] for the relationship between ozone and hospitalizations from asthma. Finally, in the recent study conducted by Liu and colleagues on 652 cities around the world, the estimated pooled exposure-response functions between daily average levels of PM 10 and PM 2.5 and all-cause mortality showed no thresholds and seemed to flatten at high air pollutants concentrations [15].
The shape of the exposure-response relationship has important regulatory implications. As a matter of fact, in the absence of a safe threshold below which exposure has no effect on mortality (which is the case of a log-linear exposure-response relationship), any reduction of the air pollutant concentrations is expected to bring to a reduction of the mortality/morbidity burden attributable to the exposure, even when air pollution levels are already under the actual regulatory limits: as such, this suggests the opportunity of policies even in areas characterized by lowmoderate levels of air pollution. On the other hand, should there be a safe threshold, regulations reducing concentrations already below such a threshold would be considered to have a small impact on public health [10]. This is also why the shape of the exposure-response relationship, as related to the existence of a safe threshold for air pollutant concentration, has been a matter of debate both within the scientific community and across regulatory agencies [16,17].
A second aspect related to log-linearity concerns the quantification of mortality and disease burdens in the population. In fact, it was shown that even in case of mild violation of log-linearity, assuming log-linearity could lead to under-or over-estimate the impact of air pollution, intended as number of health events attributable to the exposure [6]. Despite this, it is still common practice to estimate impacts assuming a log-linear effect of the exposure (see for example [15]). Gasparrini and Leone [18] did propose a method to estimate the fraction of health events attributable to extreme ambient temperatures from a flexible exposure-response curve. However, their approach, although potentially applicable to assess also the short term impact of air pollution, is not yet common practice in this field.
Up to now, the studies that explored the violation of the log-linearity assumption in the context of the short-term effects of air pollution usually described the exposureresponse function by including flexible terms in the Poisson or quasi-Poisson regression model specified on the daily number of health events, for example a regression spline or a penalized spline defined on the daily air pollutant concentrations [7,12,14,15,19,20]. Alternative approaches to regression have never been adopted.
The potential outcomes approach to causal inference [21,22] was recently considered a useful tool in the policy debate about air pollution regulatory interventions [23], as it encourages to think in terms of causes and effects, within a formal mathematical framework. The causal interpretation of the estimated associations is particularly relevant when the effect estimates are used to calculate the absolute mortality and disease burden related to the air pollution exposure. Some studies adopted causal inference methods in the analysis of short-term effects of air pollution on population health, but none of these explored the shape of the exposure-response function [24,25].
The generalized propensity score (GPS) method allows a flexible modeling of the exposure-response function within a potential outcomes approach to causal inference. The GPS was introduced by Hirano and Imbens [26] as a generalization of the propensity score (PS) -used in the case of a binary treatment -to the case of discrete treatments, continuous treatments and arbitrary treatment regimes [27][28][29][30]. As opposed to standard regression adjustment, the GPS-based adjustment allows adjusting for the potentially non-linear effect of a large number of covariates and it results in a flexible way to explore effect modification with every characteristic.
In the present paper, we proposed to use a semiparametric GPS method for covariate adjustment to estimate the exposure-response function describing the short-term effect of the air pollutant exposure on the health outcome, in the context of epidemiological time series analyses. In particular, we built on the semiparametric GPS proposed by Bia and colleagues [31], replacing the normal model for the outcome with a quasi-Poisson model, more suitable for count data, and including spline terms to account for non-linearities. In addition, we implemented a novel procedure for the estimation of the attributable number of events according to the estimated exposure-response curve, assuming different counterfactual scenarios.
In this paper, we also defined ad hoc estimands to investigate the role of temperature in modifying the effect of air pollution. As a matter of facts, a major point in the study of the short-term effects of air pollution on health is the possible synergic effect between exposure to air pollution and ambient temperature. This issue is particularly relevant if we consider that, according to the climate change projections, number and intensity of extreme weather days are expected to increase in the future [32]. In the literature, there is some evidence that the association between air pollution and mortality is stronger during summer or in days with high temperatures [33][34][35][36], but further investigation is needed.
We applied the proposed approach to the estimation of short-term effects and impacts of particulate matter with a diameter less than or equal to 10 μm (PM 10 ) on mortality from all natural, respiratory, and cardiovascular causes in the city of Milan, Italy for the period 2003-2006. This data set has already been analyzed according to the standard method based on Poisson regression [37,38] and, more recently, using propensity score-based matching with a dichotomous version of the exposure (high levels versus low levels of PM 10 ) [25]. This allowed us to compare our results to the findings of these previous studies.

Data
We considered data for the city of Milan (1,299,633 inhabitants in 2007) for the years 2003-2006. Milan is located in an area where unfavorable geographical and climate conditions induce frequent phenomena of thermal inversion, exacerbating the air pollution mainly due to traffic and, to a minor extent, to non-industrial combustion [39]. The daily time series of PM 10 levels, temperature and humidity were obtained by averaging data over the available air quality monitoring stations of the Regional Environmental Protection Agency (ARPA Lombardia). Daily mortality data were obtained from the Regional Mortality Register. We focused on the deaths of the resident population occurred within the city area. We considered daily mortality from all natural (International Classification of Diseases, Ninth Revision (ICD-9) codes below 800), cardiovascular (ICD-9: 390-459) and respiratory (ICD-9: 460-519) causes.

Notation and potential outcomes
Let i = 1, . . . , N be the indicator of the day, which will be also referred to as the unit. Let Z i ∈ Z be the exposure level in day i, defined as the average level of PM 10 in the current day i and in the previous one i − 1 (lag 0-1 exposure), and let Y i ∈ Y be the number of deaths in day i. Finally, let X i ∈ X be a vector of K covariates for day i, which includes meteorological variables (ambient temperature and humidity), seasonality terms, holidays and influenza epidemics indicators.
According to the potential outcome framework [21,22], we denote by Y i (z) the potential number of deaths in day i if z were the exposure level in that day. For each day a collection of potential outcomes is defined, one for each possible level of exposure z, but we only observe the one corresponding to the actual exposure of that day, Z i , being Y i (Z i ) = Y i . Thus, our effort is to define a procedure that allows us to extrapolate information on the unobserved potential outcomes across days with similar baseline covariates.
Note that potential outcomes of the form Y i (z), with z ∈ Z are well-defined only under the Stable Unit Treatment Value Assumption (SUTVA) [22], which rules out the presence of interference between units, i.e., the possibility that the potential outcomes of one day are affected by the level of exposure in previous days. In this paper, we considered the lag 0-1 exposure not only to allow comparison with previous results, but also to prevent interference across days as required by SUTVA. In fact, using the lag 0-1 exposure instead of the current PM 10 level makes the no-interference assumption more plausible [25]. Enlarging the window of the moving average for the exposure definition would have made the no-interference assumption even more plausible, but at the price of a lower variability of the exposures and of a reduced possibility of detecting an impact.

Causal estimands
In our analysis we focused on different causal estimands. We defined the average dose-response function (aDRF) as: which represents the average daily number of deaths we would have observed fixing the exposure to z in each day during the study period. We estimated this quantity precisely to capture the average causal relationship between the exposure level and mortality rate in the city of Milan during the years 2003-2006. We also defined conditional average dose-response functions of the type: where N(X ) is the number of days with covariate levels included in the subset X of X . We used this kind of estimand to investigate how the relationship between exposure and outcome changed according to the levels of air temperature. In addition to the average dose-response functions, we also focused on the absolute number of deaths attributable to observed levels of exposure exceeding a pre-specified counterfactual threshold z . In the epidemiological literature, this quantity is usually referred to as attributable deaths (AD) and can be expressed as: In other words, AD(z ) compares the observed mortality in days with Z i > z to the mortality that we would have observed had the exposure been set exactly equal to z .
As an alternative to AD(z ), we quantified the number of attributable deaths with reference to a more complex, but realistic, counterfactual scenario, reflecting a hypothetical intervention that, instead of fixing the exposure to a specific value z , would replace the observed exposures when Z i > z with exposure values drawn from an observed or hypothetical distribution p (z). This new causal estimand, referred to as distributional attributable deaths (DAD), takes the following form: In our analysis, we defined p (z) as the actual distribution of the exposure in the days where Z i ≤ z , that is, p (z) = p(z|Z i ≤ z ). For simplicity, we denoted the corresponding estimand by DAD(z ).
Under the assumption that all relevant confounders have been measured, also known as the unconfoundedness assumption or the no-unmeasured confounders assumption 1 , the aDRF and the other estimands presented in the previous section were identifiable and could be estimated without bias conditioning on covariates. In our analysis, the validity of the unconfoundedness assumption, which cannot be tested on data, relied on the fact that the selection of the covariates X i was based on a substantive a priori knowledge of the phenomenon derived from the literature on the analysis of short term effects of air pollution, from the seminal papers [40][41][42] to the very recent work by Liu et al. [15].

Generalized propensity score
According to Hirano and Imbens [26], the GPS, denoted by r(z; x), is defined as the conditional density of the exposure Z given the covariates X: 1 The weak unconfoundedness assumption states the conditional independence between exposure and potential outcomes: Similarly to the PS, the GPS is a balancing score, meaning that within strata of units with the same value of r(z; x), the probability that Z i is equal to z does not depend on the covariates value [43]. The unconfoundedness assumption, coupled with the balancing property of the GPS, implies that assignment to treatment is unconfounded given the GPS 2 . In practice, this means that any bias given by differences in the distribution of covariates across groups with different exposure levels can be removed by adjusting for the GPS [26].
In this paper, we used the GPS to obtain estimates of the causal estimands described in the previous section. For this purpose, first we parametrically modeled and estimated the GPS through a model for the exposure, then we specified and estimated a model for the outcome given the GPS and the exposure, and we used it to obtain estimates of the causal quantities of interest.

Model for the exposure
We assumed a parametric log-normal model for the exposure Z i : where α 0 and α X were unknown regression coefficients and σ Z was the unknown variance. In particular, we included in the model the following regressors: indicators of day of week, holiday and influenza epidemics, a linear regression spline with 5 degrees of freedom per year and equally spaced knots on the calendar day to account for seasonality and long-term trend, a cubic regression spline for temperature at lag 0-3 with 5 degrees of freedom and knots at the quantiles, linear and quadratic terms for relative humidity, and an indicator of the July-August period to account for the reduction of the population present in the city during the summer season [15,37]. According to model 6, the GPS for day i at the level of exposure z was thus defined as: In the same way, we defined the actual GPS for day i, i.e. the GPS evaluated at the level of exposure actually observed, as: It is worth noticing that different specifications were possible for the model 6 and some effort was needed to find the most appropriate one. A key criterion driving the specification of the model consists in checking the balancing property of the estimated GPS. In order to check the balancing property of the GPS, we applied the method described in Hirano and Imbens [26]. After creating four classes of exposure, , we applied the following procedure for each covariate X i . First, we checked the independence of each covariate and each exposure class indicator 1{Z i ∈ C k } by calculating the marginal t statistics for the mean difference X i between the class C k and all the others as a whole. Then, we calculated for each day the GPS at the median of each treatment class (M k , k = 1, . . . , 4) and we checked the independence of each covariate X i and each exposure class indicator 1{Z i ∈ C k }, conditional on r(M k ; X i ). In order to do this, given the class C k , we defined four blocks according to the quartiles of r(M k , X i ) and, for each block, we calculated the mean difference of X i between days belonging to the class C k and days belonging to the other three classes, as well as the corresponding standard error. The four differences obtained, one for each GPS block, were then combined in a weighted mean, with weights proportional to the number of observations in each block. A t statistics was thus calculated as the ratio of the combined difference and its standard error. From the descriptive comparison between these GPS-adjusted t statistics and marginal t statistics (four comparisons for each variable, one for each class C k ), we were able to evaluate if adjusting for GPS improved the balance.

Model for the outcome
We specified a quasi-Poisson model for the outcome given the exposure and the GPS: where s (z, r(z; X i )) was a penalized bivariate splines, with radial basis functions defined on a large number of knots 3 . The smoothing parameter was automatically selected by Generalized Cross Validation [44].

Estimation procedure
The estimation procedure consisted in the following steps: 1 We estimated the parameters of the exposure model, α 0 ,ˆ α X andσ 2 Z ; 3 It is worth noticing that, for a fixed value of the GPS, the regression function of Y given the exposure "does not have a causal interpretation" [26]. This means that the regression model does not directly result in the causal aDRF, which instead must be derived using the procedure described in the "Estimation procedure" section 2 We used them to predict for each day i the actual GPS, R i 4 ; 3 We used the observed data (Y i , Z i ) and R i to estimate the outcome model; 4 For each exposure level z and for each unit i, we predicted the potential outcome Y i (z) using the following steps: a We predicted the GPS at that level of exposure z,r(z; X i ) 5 ; b We used the estimated outcome model and r(z; X i ) to predict the potential outcome at z, Y i (z). 5 We used the predicted potential outcomes to estimate the quantities of interest. In particular, we estimated the average and the conditional dose-response functions by averaging the potential outcomes over all units and over all units with X i included in each subset X of interest While AD(z ) was estimated by the following sum over all units with Z i > z a more complex procedure was needed for DAD(z ). For the estimation of DAD(z ) we repeated the following steps M times: a We drew a value z m from the distribution and we finally calculated the average Standard errors and 90% confidence intervals of the causal estimands were estimated with a bootstrap method [26,45]. Bootstrap samples were obtained by using an independent resampling strategy with replacement. For each bootstrap sample the parameters of the exposure and outcome models were estimated and causal estimands were computed using the above estimation procedure.

Results
Summary statistics of daily counts of deaths, PM 10 concentrations and temperature levels are reported in Table 1.
In order to check the balancing property of the GPS we focused on the following variables: average daily temperature at lag 0-3, extreme temperature indicator (days with average temperature at lag 0-3 exceeding the 95 th percentile calculated over the study period), humidity at lag 0, and indicators for season (summer/winter), weekends, holidays and influenza epidemics. We found that, after the adjustment for the GPS calculated according to the model specification described in the Model for the exposure section, the t statistics comparing the average of each covariate between classes of exposure reduced in respect to the marginal t statistics, in particular when the initial imbalance between groups was strong ( Table 2 ). This was indicative that the balancing property was satisfied. More complex models for GPS, involving splines with a larger number of knots, a spline for humidity and interactions terms, did not bring to further improvement of the balance (results not reported).
In Fig. 1 we report the estimated aDRF for natural-cause mortality, which describes how the average level of mortality, defined as in equation 1, changes according to the PM 10 level at lag 0-1. The relationship was increasing and nonlinear, steeper at low concentrations and flat above 50-60 μg/m 3 . The aDRFs estimated for respiratory and cardiovascular mortality had a similar shape (Figs. 2 and  3). However, the tendency of these curves to flatten was less pronounced and, when present, arose at even higher PM 10 concentrations.
Especially for natural-cause and cardiovascular mortality, the shape of the relationship was unexpectedly wiggly at low levels of exposure. This was probably due to an excessive local dependence of the curve on the knots location of the bivariate spline in equation 9, related to the fact that information under 20 μg/m 3 was quite poor, relying on only 6% of days.
For natural-cause mortality, we also estimated the aDRF within three different categories of days, defined according to daily average temperature levels (at lag 0-3): low (up to 10 • C), medium (10/23 • C), high (over 23 • C) (Fig. 4 ). As expected, background mortality was higher in days characterized by lower temperatures (blue dots) and lower in days with higher temperatures (red dots); however, the slope of the exposure-response association appeared to be slightly steeper in days belonging to the higher temperature category, suggesting a possible synergic effect between PM 10 exposure and heat.
After estimating the aDRFs, we estimated the causal estimands introduced in the Causal estimands section. For the three outcomes of interest, we estimated AD (20), AD (40), DAD (40) and DAD(50) ( Table 3). The choice of the threshold z of the countefactual scenarios followed internationally recognized standards, in details:

-20 μg/m 3 is the World Health Orgaization (WHO)
Air Quality Guideline threshold [46] for PM 10 annual average; -40 μg/m 3 is the European Union (EU) limit for PM 10 annual average [47]; -50 μg/m 3 is the EU limit for PM 10 daily average, not to be exceeded for more than 35 days per year [47].
Exceeding the threshold of 20 μg/m 3 has been responsible of the largest impact. In particular, an hypothetical intervention able to set the air pollution level in all days with exposure above 20 μg/m 3 to exactly 20 μg/m 3 (AD(20)), would have avoided 2537 deaths (90% Confidence Interval (CI): 1273, 3655), 577 (90% CI: 226, 900) of which from respiratory causes and 1000 (90% CI: 464, 1488) from cardiovascular diseases. The lowest impact was estimated for an intervention able to set PM 10 concentration levels higher than 40 μg/m 3 to exactly 40 μg/m 3 (AD(40)): 1157 (90% CI: 689, 1645) attributable deaths, 312 (90% CI: 210, 418) for respiratory causes and 771 (90% CI: 580, 961) for cardiovascular causes. The fact that attributable deaths for cardiovascular and respiratory causes represented a variable proportion of the total impact was directly related to the shape of the exposureresponse functions.
With DAD(p 40 ), we quantified the impact of a hypothetical intervention able to set the air pollution levels in all days with exposure above 40 μg/m 3 to values belonging to the PM 10 distribution observed in days with exposure below 40 μg/m 3 . As expected, DAD(40) was larger  Marginal and GPS-adjusted t statistics for the mean difference between one class of exposure and the others as a whole, for a set of selected covariates than AD (40), because it also picked counterfactual levels of the exposure far below the chosen threshold of 40 μg/m 3 . For the same reason, also DAD (50), with almost 1500 attributable deaths, returned an impact larger than AD (40), despite the chosen threshold was higher. In both DAD scenarios, attributable cardiovascular deaths were about 55% of all natural attributable deaths, while respiratory mortality represented about 22% of the total.

Discussion
In this paper, we investigated the shape of the exposureresponse function describing the short-term effect of PM 10 on mortality, by using a semi-parametric procedure based on the GPS [31]. To the best of our knowledge, this is the first time that the GPS approach is applied in the analysis of short-term effects of air pollution. A recent work by Wu and colleagues [48] did propose a somewhat similar approach, but applied it to the analysis of longterm effects of air pollution; in addition, they did not quantify the mortality burden of air pollution in terms of attributable health events. On the contrary, we developed a novel procedure for the estimation of the attributable deaths based on the estimated curve, considering alternative counterfactual scenarios. As opposed to regression-based methods, including those based on splines (e.g., Gasparrini and Leone [18]), the GPS-based method we propose allows adjusting for all possible covariates including non-linearities and interactions with the exposure -the same being more difficult in a regression model -and is less prone to incur into issues of overfitting, multicollinearity and variance inflation.
Attributable events calculation within a causal inference framework is not new in the analysis of the short-term effects of air pollution on health. In order to estimate the attributable deaths, Baccini et al. [25] implemented a PS  matching on the same data set used in the present paper, after dichotomizing the exposure according to a threshold of 40 μg/m 3 . In that work, the level of mortality observed in days with exposure values exceeding the threshold was contrasted to the level of mortality we would have observed in those days had the levels of exposure been set to a value below the threshold. Differently, the approach proposed in the present paper does not require the definition of a binary version of the exposure, but allows to treat the air pollutant concentration as a continuous variable. Moreover, the number of attributable deaths is estimated following the shape of the (potentially nonlinear) average exposure-response curve.
In order to quantify the number of attributable events we proposed two estimands, AD and DAD. Both of them can be used to measure the impact on mortality of hypothetical interventions able to reduce the air pollutant levels (assuming that the effects of these hypothetical interventions occur only through the reduction of the air pollutant levels). Fixing a threshold for the daily exposure, AD compares the actual level of mortality with the level of mortality we would observe if the daily exposures exceeding the threshold were exactly equal to the threshold. On the contrary, DAD compares the actual level of mortality with the level of mortality we would observe if we were able to intervene on the days with daily exposures exceeding the threshold and set instead the level exposure to a value below the threshold, according to a hypothetical distribution -for example, the observed distribution of the exposures lower than the threshold.  It is worth noting that, while the aDRF averages the potential outcomes over all days in the study period, AD(z ) and DAD(z ) are sums defined only over the "treated" days, i.e. the days characterized by exposure levels higher than the threshold z . In this sense, these quantities are somewhat related to the so-called average causal effect on the treated (ATT) [22].
Our analysis found that the estimated aDRFs were not linear. This was true also after a log transformation (not reported), thus indicating that the usual log-linearity assumption was, in this case, not valid. Moreover, being the estimated aDRFs steeper for low values of the exposure, our results seem to indicate that a "safe" threshold does not exist. As a consequence, any reduction of the PM 10 levels, even when they are already under the actual regulatory limits, is expected to lead to a decrease in the health burden attributable to the exposure. The idea that there is no threshold, acknowledged also by governmental and international agencies [17,46,49], is supported by the observation that the population is a mixture of biologically different individuals with various levels of susceptibility, and, as a consequence, no threshold below which nobody experiences the health effect can be detected, especially in the presence of frail (and potentially hypersusceptible) subjects (e.g. children, elderly people, etc.). Further considering the shape of the aDRF, while the presence of a plateau for high concentration values (as for naturalcause mortality) would suggest that small reductions of PM 10 are ineffective in highly polluted days, the absence of a clear plateau (as for respiratory and cardiovascular mortality) suggests that any measure aimed to produce a reduction (even small) of air pollutant concentration levels can prevent deaths, regardless the initial level of exposure.
We observed evidence of a stronger effect of PM 10 on natural-cause mortality in days characterized by higher temperature. This finding seems consistent with a previous investigation conducted on the entire territory of AD (20): deaths attributable to daily exposure levels above 20 μg/m 3 , setting the counterfactual exactly to 20 μg/m 3 . AD (40): deaths attributable to daily exposure levels above 40 μg/m 3 , setting the counterfactual exactly to 40 μg/m 3 . DAD (40): deaths attributable to daily exposure levels above 40 μg/m 3 , sampling the counterfactuals from the exposure distribution below 40 μg/m 3 . DAD(50): deaths attributable to daily exposure levels above 50 μg/m 3 , sampling the counterfactuals from the exposure distribution below 50 μg/m 3  [37], deaths attributable to exceeding the limit of 40 μg/m 3 for the annual average level of exposure, calculated from a regression approach the Lombardy region (for which Milan is the capital city), where higher effects of PM10 exposure on mortality were observed during summer [35]. In this paper, aimed at developing a new approach to estimate the dose-response curve and novel estimands for impact assessment, we focused on data on only one city.
In the case of studies on multiple locations -which are common in this field -it is possible to use frequentist or Bayesian multivariate meta-analysis methods to combine the derivatives of the estimated aDRFs arising from several locations/cities [50]. We leave this extension to future work.

Comparison with previous results
The impacts estimated using the proposed approach partly confirmed the results reported in previous papers, obtained on the same data set with different methods. In Table 4 we report: • the impact of the annual average PM 10 level exceeding 40 μg/m 3 , calculated using the estimated coefficient from a Poisson regression, thus assuming the loglinearity of the effect [37] (Regression 40 in Table 4); • the impact of daily PM 10 levels above vs. below 40 μg/m 3 , estimated using a PS matching, after dichotomizing the exposure at 40 μg/m 3 [25]. In this case, mortality in each "treated" day was compared to mortality in a matched day with similar characteristics (similar PS) but exposure under 40 μg/m 3 (PS matching 40 in Table 4); • our current findings AD (40) and DAD (40).
These estimates deserve some comments. First of all, the number of AD for natural causes estimated through the standard regression approach is far smaller than the other ones reported. This is due in part to the fact that, in the regression approach, the observed mortality was compared to the mortality that we would have observed during the study period, had the annual average level of exposure been set to 40 μg/m 36 . This is different from 6 The number of attributable deaths due exceeding the threshold T for the annual average level of the air pollutant was calculated as setting each counterfactual daily level of exposure above 40 μg/m 3 to exactly 40 μg/m 3 or to values below 40 μg/m 3 , as in the present paper and in the PS matching approach [25]. In addition, we cannot exclude that the low estimated impacts can partly originate from the incorrect assumption of log-linearity on the exposure-response function.
Secondly, the estimated impacts using the GPS approach, especially AD (40), are consistent with the results obtained using PS matching. This is indicative of the substantial reproducibility of the results when applying procedures that, like these, do not force the loglinearity of the effect. At the same time, it also suggests that the simpler PS approach based on a binary version of the exposure can be considered a good "approximation" of the more complex method based on the GPS. However, the GPS approach, which estimates the attributable events according to a curve, returned more precise estimates than those obtained through PS matching in all the investigated outcomes, as shown by the narrower confidence intervals.
Finally, it is worth noticing that the impact DAD(40) estimated here is larger than the impact based on PS matching, even if both estimands seem to answer to the same research question: what would be the impact of a hypothetical intervention able to set daily exposure levels under 40 μg/m 3 ? The reason of this difference is that, as a matter of facts, the two estimands do not consider exactly the same counterfactual scenario. In fact, according to definition 4, DAD (40) and the PS matching estimand share the same threshold z = 40, but set different p (z): for DAD (40) p (z) is the observed distribution of the daily exposures for all days with observed levels under the threshold, whereas for PS matching it is the distribution of the daily exposures among the matched controls [25]. Since in PS matching the control days with a level of exposure below 40 μg/m 3 are matched to the "treated" y − y/exp(b(x − T))I(x > T),where y is the observed number of deaths during the study period, x is the annual average level of the air pollutant, b is the estimated regression coefficient expressing the air pollutant effect and I(x > T) is an indicator function which is equal to 1 if x > T and 0 elsewhere [37]. ones according to similarity in the observed characteristics, it is likely that the matched controls are those with a level of exposure closer to the threshold. For this reason, the impact estimated using PS matching was necessarily lower than DAD (40) and close to AD (40). This comparison highlights the need of providing a clear definition of the counterfactual scenario that is chosen to assess the impact.

Study limitations
Our study has some limitations. First of all, for low exposures, when the information is quite poor, a higher degree of smoothing for the bivariate spline of the outcome models would be needed, in order to reduce the sensitivity of the aDRFs to knots location. Adaptive splines could be used to reduce this problem [51,52]. Second, the bootstrap procedure that we implemented in this analysis assumed independence between units. More complex approaches which account for the autocorrelation in the mortality time series could be used, such as block bootstrap or residual bootstrap methods [53]. Third, checking the balancing property of the GPS is crucial and not straightforward; we implemented a GPS blocking-based approach which employed t statistics to quantify unbalance [26]. As an alternative, standardized differences could be used as recently proposed [54].
It also seems appropriate to make a consideration regarding the positivity assumption. In this context, the positivity assumption would state that, for all possible strata defined by the covariates, it is possible to observe any level of exposure. As a matter of fact, even if in principle our approach does not require the validity of this assumption, its strong violation could challenge estimation, due to the presence of regions of the confounder space where inference would rely on extrapolation. A widely used method to address the violation of the positivity assumption consists in trimming the sample, which means excluding from the analysis the classes of units with limited variability in Z i [55]. Although a slight evidence of violation of the positivity assumption arose in our data, we decided to perform the analysis on the entire data set anyway, without any kind of trimming. This choice was motivated by the need of quantifying the overall burden of mortality attributable to air pollution during the study period, without excluding any "treated" day.
Finally, formal statistical tests on the shape of the aDRF have not been developed within the GPS approach. As a consequence, the non-linearity of the aDRF could not be formally tested and our results on effect modification by temperature, although suggestive of a larger effect during the warm season, should be interpreted in a descriptive sense.

Conclusion
We adopted a method based on the GPS to obtain a semiparametric estimate of the average dose-response function describing the short-term effect of airborne particles on mortality and we defined novel estimands to assess the impact in terms of attributable deaths, according to the estimated curve. The estimands allow the specification of different counterfactual scenarios defined on the distribution of the daily air pollutant levels. This approach can be easily extended to other environmental epidemiology contexts. Its potential as a tool to investigate effect modification by temperature or other relevant factors deserves investigation.