Impact of exposure measurement error in air pollution epidemiology: effect of error type in time-series studies

Background Two distinctly different types of measurement error are Berkson and classical. Impacts of measurement error in epidemiologic studies of ambient air pollution are expected to depend on error type. We characterize measurement error due to instrument imprecision and spatial variability as multiplicative (i.e. additive on the log scale) and model it over a range of error types to assess impacts on risk ratio estimates both on a per measurement unit basis and on a per interquartile range (IQR) basis in a time-series study in Atlanta. Methods Daily measures of twelve ambient air pollutants were analyzed: NO2, NOx, O3, SO2, CO, PM10 mass, PM2.5 mass, and PM2.5 components sulfate, nitrate, ammonium, elemental carbon and organic carbon. Semivariogram analysis was applied to assess spatial variability. Error due to this spatial variability was added to a reference pollutant time-series on the log scale using Monte Carlo simulations. Each of these time-series was exponentiated and introduced to a Poisson generalized linear model of cardiovascular disease emergency department visits. Results Measurement error resulted in reduced statistical significance for the risk ratio estimates for all amounts (corresponding to different pollutants) and types of error. When modelled as classical-type error, risk ratios were attenuated, particularly for primary air pollutants, with average attenuation in risk ratios on a per unit of measurement basis ranging from 18% to 92% and on an IQR basis ranging from 18% to 86%. When modelled as Berkson-type error, risk ratios per unit of measurement were biased away from the null hypothesis by 2% to 31%, whereas risk ratios per IQR were attenuated (i.e. biased toward the null) by 5% to 34%. For CO modelled error amount, a range of error types were simulated and effects on risk ratio bias and significance were observed. Conclusions For multiplicative error, both the amount and type of measurement error impact health effect estimates in air pollution epidemiology. By modelling instrument imprecision and spatial variability as different error types, we estimate direction and magnitude of the effects of error over a range of error types.


Background
The issue of measurement error is unavoidable in epidemiologic studies of air pollution [1]. Although methods for dealing with this measurement error have been proposed [2,3] and applied to air pollution epidemiology specifically [4,5], the issue remains a central concern in the field [6]. Because large-scale time-series studies often use single central monitoring sites to characterize community exposure to ambient concentrations [7], uncertainties arise regarding the extent to which these monitors are representative of exposure. Zeger et al. [8] identify three components of measurement error: (1) the difference between individual exposures and average personal exposure, (2) the difference between average personal exposure and ambient levels, and (3) the difference between measured and true ambient concentrations. While the former two components of error can have a sizeable impact on epidemiologic findings that address etiologic questions of health effects and personal exposure, it is the third component that is particularly * Correspondence: james.mulholland@ce.gatech.edu 1 School of Civil and Environmental Engineering, Georgia Institute of Technology, 311 Ferst Drive, Atlanta, Georgia 30332-0512, USA Full list of author information is available at the end of the article relevant in time-series studies that address questions of the health benefits of ambient regulation [9].
Prior studies have suggested that the impact of measurement error on time-series health studies differs depending upon the type of error introduced [8, 10,11]. Two distinctly different types of error have been identified. One type is classical error, in which measurements, Z t , vary randomly about true concentrations, Z * t ; this can be considered the case for instrument error associated with ambient monitors. That is, instrument error is independent of the true ambient level, such that E[Z t |Z * t ] = Z * t . Moreover, the variation in the measurements, Z t , is expected to be greater than the variation in the true values, Z * t . Therefore, classical error is expected to attenuate the effect estimate in time-series epidemiologic studies. In contrast, under a Berkson error framework, the true ambient, Z * t , varies randomly about the measurement, Z t . This might be the case, for example, of a measured population average over the study area with true individual ambient levels varying randomly about this population average measurement. In this case, measurement error is independent of the measured population average ambient; that is, E[Z * t |Z t ] = Z t . Furthermore, the measurement, Z t , is less variable than the true ambient level, Z * t . A purely Berkson error is expected to yield an unbiased effect estimate, provided that the true dose-response is linear [3].
Several studies have investigated the impact of error type on regression models. The simultaneous impact of classical and Berkson errors in a parametric regression estimating radon exposure has been investigated [12] and error type has been assessed in a semiparametric Bayesian setting looking at exposure to radiation from nuclear testing [13,14]; however, no study to date has comprehensively assessed the impact of error type across multiple pollutants for instrument imprecision and spatial variability in a time-series context.
Error type depends on the relationship between the distribution of measurements and the distribution of true values. Because true relevant exposure in environmental epidemiologic studies is not known exactly, determination of error type is challenging; thus, here we examine the impact of error modelled as two distinctly different types: classical and Berkson. First, we examine monitor data to assess whether error is better modelled on a logged or unlogged basis. Typically, researchers investigating error type have added error on an unlogged basis (e.g. [8,11]); however, air pollution data are more often lognormal due to atmospheric dynamics and concentration levels that are never less than zero. It is plausible that true ambient exposures are distributed lognormally about a population average as well; therefore, measurement error may be best described as additive error on the log scale. We investigate the combined error from two sources that have been previously identified as relevant in time-series studies: (1) instrument precision error and (2) error due to spatial variability [9]. We limit our scope to ambient levels of pollutants measured in accordance with regulatory specifications, disregarding spatial microscale variability, such as near roadway concentrations, as well as temporal microscale variability, such as that associated with meteorological events on sub-hour time scales. Here, building on a previously developed model for the amount of error associated with selected ambient air pollutants [15], we quantitatively assess the effect of error type on the impacts of measurement error on epidemiologic results from an ongoing study of air pollution and emergency department visits in Atlanta.

Air Pollutant Data
Daily metrics of 12 ambient air pollutants were studied:  [16], including the Atlanta EPA supersite at Jefferson Street [17]; and the Assessment of Spatial Aerosol Composition in Atlanta (ASACA) network [18]. Locations of the monitoring sites are shown in Figure 1.
To assess error due to instrument imprecision and spatial variability of ambient concentrations, 1999-2004 datasets were used for the 12 pollutants with data completeness for this time period (2,192 days) ranging from 82% to 97%. Data from collocated instruments were used to characterize instrument precision error. Measurement methods and data quality are discussed in detail in our prior work [15]. Distributions of all air pollutant measures more closely approximate lognormal distributions than normal distributions ( [19], see Additional file 1, Table S1); therefore, additive error was characterized and modeled on a log concentration basis so that simulations with error added to a base case time-series would also have lognormal distributions.

Measurement Error Model
The measurement error model description here highlights differences from our previous work in which error type effects were not addressed [15]. In this study, a time-series of observed data was taken to be the "true" time-series, Z * t , serving as a base case. Classical-like or Berkson-like error was added to this base case to produce a simulated time-series, Z t , that represents a population-weighted average ambient time-series. Here, the asterisk refers to a true value (i.e. without error) as opposed to a value that contains error (i.e. the simulated values in this study). The choice of which pollutant to use for the true, or base case, time-series is arbitrary, as long as an association with a health endpoint has been observed with that pollutant. To develop simulated datasets with modeled instrument and spatial error added, the following steps were taken. Base case time-series data were normalized as follows.
Here, χ * t is the normalized log concentration on day t and μ InZ* and s InZ* are the mean and standard deviation, respectively, of the log concentrations over all days t; thus, the mean and standard deviation of χ * t are 0 and 1, respectively. Error in χ * t was modeled as multiplicative (i.e. additive on a log scale) as follows.
Here, ε ct is the modeled error in χ * t for day t, N t is a random number with distribution~N(0,1) and s err is the standard deviation of error added, a parameter derived from the population-weighted semivariance to capture the amount of error present for each pollutant, as described in the next subsection. Short-term temporal autocorrelation observed in the differences between measurements was modeled using a three-day running average of random numbers for N t [15].
To provide simulations of monitor data with error added (Z t ), the modeled error was added to normalized data and then the normalized data with error added were denormalized in two ways: one to simulate classical-like error (i.e. classical error on a log concentration basis, referred to here as type C error) and the other to simulate Berkson-like error (i.e. Berkson error on a log concentration basis, referred to here as type B error). Simulations with type C error are generated by eq. 3.
Here, c t is the standardized simulated time-series (on the log scale) with type C error added and normal distribution ∼ N 0, 1 + σ err 2 . In this case of type C error, ε ct and χ * t are independent (i.e. E[R(ε χ t , χ * t )] = 0). For type B error, ε ct and c t are independent (i.e. E[R(ε ct , c t )] = 0) and χ * t = χ t + ε χ t . It can be shown (see Additional file 2, eqs. S1-S6) that simulations with type B error can be generated from the true time-series by eq. 4. type B error : Here, c t is the standardized simulated time-series (on the log scale) with type B error added and normal distri- . After the standardized simulated time-series is generated by either eq. 3 or eq. 4, the simulations are denormalized by eq. 5.
For both error types, the simulated time-series (Z t ) and true time-series (Z * t ) have the same log means (μ InZ = μ InZ* ). For classical-like error (type C), the log standard deviation is greater for the simulated time-series than the true time-series (s InZ >s InZ* ) because the simulated values are scattered about the true values. For Berkson-like error (type B), the log standard deviation is less for the simulated time-series than the true time-series (s InZ <s InZ* ) because the true values are scattered about the simulated values.

Semivariogram Analysis
To quantify the amount of error (i.e. s err ) due to instrument imprecision and spatial variability to add to the simulated time-series for each pollutant (eq. 2), we made use of the geostatistical tool of the semivariogram, which provides information on spatial autocorrelation of data and has proved useful in air pollution applications [20,21]. Here, the semivariance of the differences between normalized observations (c k and c l ) at two locations (k and l) located a distance h apart is normalized by the temporal variance (variation over the timeseries of observations) of the average of two normalized observations to yield a scaled semivariance, g'. It can be shown that this scaled semivariance (i.e. the semivariance of normalized values) is related to the Pearson correlation coefficient (R) between normalized observations from two monitors as follows [21].
Thus, g' represents the spatial semivariance scaled to a quantity indicative of the range of exposures over which health risk is being assessed; it is unitless and allows for comparison across pollutants. A scaled semivariance value of 0 corresponds to perfectly correlated observations (R = 1) and a value of 1 corresponds to perfectly uncorrelated observations (R = 0).
Correlations between observations from all pairs of monitors measuring the same pollutant during 1999-2004 were calculated on a log concentration basis. Assuming the spatial variation of air pollutants to be isotropic, scaled semivariograms were constructed and modeled as a function of the distance between observations, h, using a sill of 1, nugget values derived from collocated measurement time-series described in previous work, and least squares regression to determine the range [15]. The estimate from the semivariogram function for each of the 660 Census tracts was weighted by the population in that tract (estimates from 2000 Census data) to derive an overall population-weighted average for each pollutant; thus, the populationweighted semivariance includes impacts of both instrument imprecision and spatial variability and represents the population-weighted average semivariance between all residences in the study area.
Here, γ is the population-weighted average scaled semivariance on a log scale, p total is the total population of the study area, p i,j is the sum of population in census tracts i and j, and γ i,j is the value of the semivariance function at the distance between centroids of census tracts i and j. For within-tract resident pairs, an average distance between residences was applied. Semivariograms for each of the twelve pollutants studied have been shown previously [15] and population-weighted semivariances are in Table 1. The population-weighted semivariance is related to the population-weighted correlation coefficient as follows.
Model parameter s err (eq. 2) is defined to provide simulations with an amount of error such that R where R is obtained from semivariogram analysis (eqs. [6][7][8]. The correlation between the true ambient time-series and a time-series with error added, i.e. R(ln Z, ln Z*), is the square root of the correlation between any two time-series, i.e. R(ln Z 1 , ln Z 2 ), where each is derived by adding the same amount of error to the true ambient time-series. Since the standard deviation of c t depends on s err , the standard deviation of the simulated time-series relative to that of the true time-series (s InZ /s InZ* ) depends on R as well. The following analytical relationships for s err and s InZ /s InZ* were derived (see Additional file 2, eqs. S7-S10).
Values of s err and s InZ /s InZ* used here can be found in Table 1.
Sets of 1000 simulated time-series with instrument and spatial error added for each pollutant for the scenarios of C and B error types were produced for the sixyear period 1999-2004. In addition, simulations of CO measurement error only were generated for a range of error types with s InZ /s InZ* values between error types C and B. Scatterplots demonstrate that C and B error types defined on a log basis (i.e. InZ -InZ*) are independent of InZ* and InZ, respectively (see Additional file 3, Figure S1).

Epidemiologic Model
Relationships between daily measures of ambient air pollution and daily counts of emergency department (ED) visits for cardiovascular disease (CVD, including ischemic heart disease, dysrhythmia, congestive heart failure, and peripheral/cerebrovascular disease) were assessed using methods described elsewhere [22] and briefly summarized here. There were 166,950 ED visits for CVD in the 20-county metropolitan Atlanta area during 1999-2004. Lag 0 associations between daily pollutant concentration and the daily count of ED visits were assessed using Poisson generalized linear models that were scaled to accounted for overdispersion. The general form of the epidemiologic model is where Y t is the count of emergency department visits, Z t is the mismeasured pollutant concentration, and confounders t is the vector of potential confounders on day t. The specific potential confounders included in the model were indicator variables for day-of-week, season, and when a hospital entered or left the study; cubic terms for maximum temperature and dew point; and a cubic spline with monthly knots for day of follow-up. Poisson regression yields a as the intercept, b as the log of the rate ratio associated with a unit change in pollutant concentration, and g as the vector of regression coefficients for the suspected confounders included in the model. The risk ratios (RR) per unit of measurement change and per interquartile range (IQR) change in pollutant concentration (Z) are given by eq. 12 and eq. 13, respectively.

RR per unit
RR per IQR = e IQR×β (13) Using data from the central monitor, preliminary epidemiologic assessments were performed for all air pollutants and ED visits for CVD. Consistent with previous findings [22], significant positive associations were found for several traffic-related pollutants, including NO x , CO and EC. For the measurement error analysis described here, we used 1-hr maximum CO data as our base case, representing in our analysis a true time-series and the measured risk ratio the true association. In this way, the exposure and health outcome values that we chose to represent true time-series have distributional characteristics expected of ambient air pollution and ED visit data. Simulations with measurement error added to the base case were used to evaluate the impact of measurement error on the epidemiologic analyses. A Monte Carlo approach was used to assess uncertainty. As already described, the relationship between this base case timeseries and a simulated time-series is that expected of the average relationship between the true ambient time-series for all people and a population-weighted average time-series based on measurements in terms of error amount, with different error types evaluated. A percent attenuation in risk ratio (toward the null hypothesis of 1) is calculated as follows, with RR* representing the true risk ratio (obtained from the base case Poisson regression) and RR representing the risk ratio obtained using simulated population-weighted time-series.

Distribution of Measurement Error Simulations
Analysis of the distributions of correlation coefficients between the true log concentrations (i.e. the base case) and the simulated log concentrations, R(InZ, InZ*), for 1000 simulations for each pollutant and each error type demonstrates that the simulations contain on average the desired amounts and types of error (Figure 2, see Additional file 4, Figure S2 for distribution of error type results). Wider distributions were observed for more spatially heterogeneous pollutants.

Impact of Error on Health Risk Assessment
For the base case of 1-hour maximum CO measurements and CVD outcomes, a RR per ppm of 1.0139 was observed, with a 95% confidence interval (CI) of 1.0078-1.0201 and a p-value of 0.000009. With an IQR of 1.00 ppm, the RR per IQR and corresponding CI are the same as those on a per unit of measurement basis for our base case. For epidemiologic models using the timeseries with simulated error added, the RR and CI results are not the same on a per measurement unit basis and a per IQR basis because the IQR of the simulated values is not 1. As expected, the simulated time-series with error type C has a greater IQR than the base case since this error is scattered about the true values, and the simulated time-series with error type B has a lower IQR than the base case since this error is scattered about the simulated values. Results of 1000 epidemiologic models for each of 12 air pollutants and two error scenarios are summarized in Table 2. The reported p-values represent those calculated from average z-score statistics and 95% confidence intervals were calculated using the asymptotic standard error estimates obtained from the regression model. When instrument imprecision and spatial variability error were added as error type C, the average IQR of simulated time-series was greater than the IQR of the base case for all pollutants; for error type B, the average IQR of simulated time-series was less than the IQR of the base case for all pollutants. As expected, adding error to the base case resulted in a reduction of significance (i.e. a higher p-value) for both error types, as shown graphically in Figure 3. The greater the amount of error (i.e. the greater the population-weighted semivariance), the greater the reduction in significance observed. Primary pollutants (SO 2 , NO 2 /NO x , CO, and EC) had more error than secondary pollutants and those of mixed origin (O 3 , SO 4 , NO 3 , NH 4 , PM 2.5 , OC, and PM 10 ) due to greater spatial variability. Regarding error type, there was a greater reduction of statistical significance when error type was modeled as type C than when error type was modeled as type B. For NO 2 and SO 2 , which have the largest amount of measurement error, there was a loss of significance (p-value > 0.05) when error was modeled as error type C.
Risk ratio results for the two error types are plotted in Figure 4 on a percent attenuation basis. RR per unit of measurement decreased, and attenuation increased, with increasing error added (i.e. increasing population-weighted semivariance) when the error was of type C. However, RR per unit increased, with increasing bias away from the null, with increasing error added when error was of type B. For NO 2 and SO 2 , which had the most measurement error, the attenuation was 92% when modeled as error type C and biased away from the null by 31% when modeled as error type B. On a per IQR basis, variation in the RR estimates between error types was much less dramatic. Both error types C and B led to lower RR estimates (i.e. bias towards the null). For NO 2 and SO 2 , which again had the most measurement error, the attenuation was 86% when modeled as type C and 34% when modeled as type B error. For error type B there was a wider distribution of results than for type C error.
To assess a range of error types, simulations were generated with values of s InZ /s InZ* ranging from that of error type C to that of type B (eq. 10) for the case of an amount of error representative of CO (γ = 0.411). Epidemiologic model results for RR attenuation are shown in Figure 5. On a per unit of measurement (ppm) basis, RR attenuation increased from -24% (i.e. a bias away from the null) for type B error to 85% for type C error. On a per IQR basis, RR attenuation increased from 28% for type B error to 85% for type C error. It is interesting to note that for s InZ /s InZ* the error (Z -Z*) is independent of Z (i.e. R(Z -Z*, Z) = 0) and the RR per unit attenuation is 0. This is the expected result when error is the Berkson type on an unlogged basis.

Discussion
The results demonstrate that error type affects the reduction in significance as well as the RR estimate in the epidemiologic analysis. Moreover, the results demonstrate a profound effect of error type on the RR estimate per unit of measurement. The RR per unit of measurement estimate is increased by the presence of type B error; that is, there is a bias away from the null. To better understand these results, we estimate the attenuation in the effect estimator β (eq. 11) in the absence of confounders from the first-order linear regression coefficient (m) of error (Z-Z*) versus Z as follows.
For RR estimates near 1 (i.e. β values near 0) as is the case in this study, the predicted attenuation in RR is approximately given as follows.
RR per unit attenuation ≈ m (16) Epidemiologic model results are compared with the predictions of eq. 16 and eq. 17 for all pollutants and both error types ( Figure 6). The degree to which the epidemiologic results differ from these predictions likely indicates the degree to which confounding variables are affecting results. As shown by the 1:1 line in Figure 6, there is strong agreement between the attenuation predicted by analysis of the error model results (i.e. m and IQR) and that obtained from the epidemiologic model.
In this study, in which quantification of error is based on the variability between monitors, error due to spatial variation is much greater than error due to instrument imprecision, particularly for primary air pollutants [15]. Conceptually, therefore, we speculate that this error is more likely of the Berkson type, with true values varying randomly about a population-weighted average represented by the base case. If spatial error is best described by the Berkson-like type defined on a log basis (our error type B) and the mean of the measurements is the same mean as the true values, we estimate there to be a  24% to 34% attenuation in RR per IQR estimates ( Figure  4, right panel), and a 19% to 31% bias away from the null in RR estimates on a per unit of measurement basis (Figure 4, left panel), for the primary pollutants studied (SO 2 , NO 2 /NO x , CO, and EC) when using a populationweighted average as the exposure metric. For the secondary pollutants and pollutants of mixed origin (O 3 , SO 4 , NO 3 , NH 4 , PM 2.5 , OC, and PM 10 ), we estimate a 5% to 15% attenuation in RR per IQR estimates and a 2% to 9% bias away from the null in RR estimates on a per unit of measurement basis. We are currently investigating different methods for estimating actual error type based on simulated pollutant fields trained to have all of the characteristics, including the pattern of spatial autocorrelation, expected of true pollutant fields.
This study addresses error between measured and true ambient concentrations. Our results are consistent with previous finding that suggest that Berkson error, as defined on an unlogged scale (additive), produces no bias in the effect estimate [8,11] as shown in Figure  5; however, Berkson-like error defined on a log basis (multiplicative) can lead to risk ratio estimates per unit increase that are biased away from the null (although with a reduction in significance). Thus, the direction and magnitude of the bias are functions of error type. With the multiplicative error structure used here in conjunction with a linear dose response, large "true" values of air pollution would likely be underestimated, resulting in an overestimation of pollution health effects. We have shown how multiple air pollution measurements over space can be used to quantify the amount of error and provide a strategy for evaluating impacts of different types of this error. The results suggest that estimating impacts of measurement error on health risk assessment are particularly important when comparing results across primary and secondary pollutants as the corresponding error will vary widely in both amount and type depending on the degree of spatial variability. These results are suggestive of error impacts one would have from time-series studies in which a single measure, such as the populationweighted average, is used to characterize an urban or regional population exposure. The methodology used here can be applied to other study areas to quantify this type of measurement error and quantify its impacts on health risk estimates. Figure 6 Attenuation in the risk ratio per unit of measurement (left panel) and per IQR (right panel) due to the introduction of measurement error, modeled both as type B and type C error. Ranges denote standard deviations for 1000 simulations. One-to-one line is also shown.