Correcting for the influence of sampling conditions on biomarkers of exposure to phenols and phthalates: a 2-step standardization method based on regression residuals

Background: Environmental epidemiology and biomonitoring studies typically rely on biological samples to assay the concentration of non-persistent exposure biomarkers. Between-participant variations in the sampling conditions of these biological samples constitute a potential source of exposure misclassification. Few studies have attempted to correct biomarker levels for this error. We aimed to assess the influence of sampling conditions on urinary concentrations of biomarkers of select phenols and phthalates, two widely produced families of chemicals, and to standardize biomarker concentrations for sampling conditions.

Methods: Urine samples were collected between 2002 and 2006 from 287 pregnant women from the Eden and Pélagie cohorts, and the levels of phthalate and phenol metabolites were assayed in these samples. We applied a 2-step standardization method based on regression residuals. In the first step, the influence of sampling conditions (including sampling hour and duration of storage before freezing) and of creatinine levels on biomarker concentrations was characterized using adjusted linear regression models. In the second step, the model estimates were used to remove the variability in biomarker concentrations due to sampling conditions and to standardize concentrations as if all samples had been collected under the same conditions (e.g., same hour of urine collection).

Results: Sampling hour was associated with concentrations of several exposure biomarkers. After standardization for sampling conditions, median concentrations changed by between −38% (2,5-dichlorophenol) and +80% (a metabolite of diisodecyl phthalate). However, at the individual level, standardized biomarker levels were strongly correlated (correlation coefficients above 0.80) with unstandardized measures.

Conclusions: Sampling conditions, such as sampling hour, should be systematically collected in biomarker-based studies, in particular when the biomarker half-life is short. The 2-step standardization method based on regression residuals that we propose to limit the impact of heterogeneity in sampling conditions could be further tested in studies describing levels of biomarkers or their influence on health.


Statistical appendix
To standardize the biomarker concentration for sampling conditions (including hour of sampling and delay between urine collection and freezing), we subtract from the observed biomarker concentration a value that depends on how much the sampling conditions for subject i differ from the standard sampling conditions, i.e. those that would have been observed for the whole population under ideal conditions. This 2-step standardization method based on regression residuals is described below.
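For each biomarker, we first fit a measurement model (Eq. A.1), corresponding to a linear regression model of the ln-transformed biomarker concentration, ln([Conc]) = Y, including as covariates all sampling conditions and potential confounders:

$$(Y_i)_{\text{measured}} = \alpha + \sum_j \beta_{\text{samp cond } j}\, X_{ij} + \sum_k \gamma_k\, Z_{ik} + \varepsilon_i \qquad \text{(A.1)}$$

where $(Y_i)_{\text{measured}}$ is the measured (ln-transformed) concentration in subject i, $\alpha$ is the intercept, $\beta_{\text{samp cond } j}$ is the regression parameter quantifying the effect of sampling condition $X_j$ on the biomarker's concentration, $X_{ij}$ is the value of $X_j$ for subject i, $Z_k$ correspond to the potential confounders (age, socio-economic status, smoking, etc.) with regression parameters $\gamma_k$, and $\varepsilon_i$ is the residual.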
The model's residuals ($\varepsilon_i$) correspond to the variability in $Y_i$ not explained by sampling characteristics and potential confounders; $\varepsilon_i$ therefore includes the 'informative' part of the biomarker levels, in particular that due to variations in exposure.
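In a second step, the standardized concentration $(Y_i)_{\text{standardized}}$ is estimated as:

$$(Y_i)_{\text{standardized}} = (Y_i)_{\text{measured}} - \sum_j \hat{\beta}_{\text{samp cond } j}\, \left(X_{ij} - X_j^{\text{std}}\right) \qquad \text{(A.2)}$$

where $\hat{\beta}_{\text{samp cond } j}$ is the estimate of the effect of sampling condition j obtained from the measurement model (Eq. A.1), and $X_j^{\text{std}}$ corresponds to the standard value for sampling condition j (e.g., 7:30 AM, in the case of sampling hour), i.e. to the sampling condition that would have been observed if the sampling protocol had been strictly followed for all study participants.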
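The two steps can be sketched in Python as follows. This is a minimal illustration on simulated data, not the authors' code: the helper names (`ols`, `standardize`), the simulated effect sizes, the choice of age as the only confounder and the standard storage delay of 2 hours are all assumptions made for the example; only the standard sampling hour of 7:30 AM comes from the text.

```python
import random

def ols(X, y):
    """Least-squares coefficients via the normal equations (Gauss-Jordan elimination)."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)]
         + [sum(r[a] * yi for r, yi in zip(X, y))] for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [u - f * v for u, v in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def standardize(y, x_samp, beta_samp, x_std):
    """Step 2 (Eq. A.2): subtract beta_j * (x_ij - x_std_j) for each sampling condition j."""
    return [yi - sum(b * (x - s) for b, x, s in zip(beta_samp, xi, x_std))
            for yi, xi in zip(y, x_samp)]

# Simulated data; every numeric value below is hypothetical.
random.seed(0)
n = 287
hour = [random.uniform(6.0, 20.0) for _ in range(n)]    # sampling hour
delay = [random.uniform(0.0, 24.0) for _ in range(n)]   # hours between collection and freezing
age = [random.uniform(20.0, 40.0) for _ in range(n)]    # a single potential confounder
y = [1.0 - 0.05 * h + 0.01 * d + 0.02 * a + random.gauss(0.0, 0.8)
     for h, d, a in zip(hour, delay, age)]              # ln-transformed concentration

# Step 1: measurement model (Eq. A.1), columns = intercept, hour, delay, age.
design = [[1.0, h, d, a] for h, d, a in zip(hour, delay, age)]
coef = ols(design, y)
beta_samp = coef[1:3]            # estimated effects of the two sampling conditions

# Step 2: standardize to 7:30 AM and an assumed 2-hour storage delay.
x_std = [7.5, 2.0]
y_std = standardize(y, list(zip(hour, delay)), beta_samp, x_std)

# Check: refitting the same model on the standardized values leaves
# no remaining association with the sampling conditions.
coef_after = ols(design, y_std)
```

In this sketch the confounder coefficient is unchanged by the standardization, because only the terms involving the sampling conditions are subtracted from the measured values.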
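Equation (A.2) can be justified in the following way. We start by writing that the expected ln-transformed biomarker concentration for subject i if sampling conditions correspond to the standard ones (and if potential confounders Z have the values corresponding to those observed for each subject i) is:

$$(Y_i)_{\text{standardized}} = \alpha + \sum_j \beta_{\text{samp cond } j}\, X_j^{\text{std}} + \sum_k \gamma_k\, Z_{ik} + \varepsilon_i \qquad \text{(A.3)}$$

with $\alpha$ the intercept and $\gamma_k$ the regression parameters of the confounders $Z_k$ in the measurement model. Equivalently to (Eq. A.1), one can write:

$$\varepsilon_i = (Y_i)_{\text{measured}} - \alpha - \sum_j \beta_{\text{samp cond } j}\, X_{ij} - \sum_k \gamma_k\, Z_{ik} \qquad \text{(A.4)}$$

We now assume that the residuals of equations (A.1) and (A.3) are identical. This will be the case if these residuals are uncorrelated with the covariates X and potential confounders Z, and if the effect measure of X on Y is not modified by Z. With these assumptions, we now replace $\varepsilon_i$ in Eq. A.3 by its expression in Eq. A.4; the intercept and confounder terms cancel, which gives:

$$(Y_i)_{\text{standardized}} = (Y_i)_{\text{measured}} - \sum_j \beta_{\text{samp cond } j}\, \left(X_{ij} - X_j^{\text{std}}\right) \qquad \text{(A.5)}$$

which corresponds to equation A.2 once the regression parameters are replaced by their estimates from the measurement model.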
One can note that the standardized biomarker concentration corresponds to the measured one for subjects for whom samples were collected according to the standard sampling conditions ($X_i = X^{\text{std}}$), and that the corrective factor applied to the measured concentration becomes larger (in absolute value) as $X_i$ moves away from $X^{\text{std}}$, which corresponds to what one would intuitively expect.
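As a small numerical illustration of this behaviour, the snippet below computes the term subtracted from the measured ln-transformed concentration for a single sampling condition. The function name `ln_correction` and the coefficient of −0.05 per hour are hypothetical values chosen for the example; the standard hour of 7:30 AM is taken from the text.

```python
def ln_correction(x, x_std, beta):
    """Term subtracted from the measured ln-concentration: beta * (x - x_std)."""
    return beta * (x - x_std)

beta_hour = -0.05   # hypothetical effect of sampling hour on ln-concentration
x_std = 7.5         # standard sampling hour (7:30 AM)

# The correction is zero at the standard hour and grows in absolute value
# as the sampling hour moves away from it.
for hour in (7.5, 9.5, 12.5, 17.5):
    print(hour, ln_correction(hour, x_std, beta_hour))
```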
In a third step (not presented in this article), one can use the standardized biomarker concentration to assess the relation between biomarker levels and specific health outcomes assessed in the same population. In doing so, several issues need to be kept in mind: 1) The approach outlined here yields an estimate of the standardized ln-transformed concentration; caution is required if one wishes to work on the concentration itself instead of the ln-transformed concentration; in theory, the estimated standardized (untransformed) concentration may not correspond to the exponential of the estimated standardized ln-transformed concentration (see e.g., [35], p.159-160); 2) The estimated standardized ln-transformed concentrations will have an additional source of variability, corresponding to the variability in the estimated regression coefficients corresponding to the effect of sampling conditions in the measurement model Eq. A.1; this variability will also vary with the observed value of the sampling conditions X. Regression models in which the standardized concentrations are used as covariates should take this change in variance into account.
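Both issues can be illustrated with a short Python sketch. It rests on assumptions that are not part of the original analysis: the ln-transformed values are simulated as normally distributed with a standard deviation of 1, a single sampling condition is considered, and the standard error passed to the hypothetical helper `correction_se` is an arbitrary value. The first part shows one common reason why exponentiating a quantity estimated on the ln scale does not recover its counterpart on the concentration scale; the second shows how the uncertainty of the correction term grows with the distance from the standard condition.

```python
import math
import random

random.seed(1)

# Issue 1: back-transformation. Hypothetical standardized ln-concentrations.
ln_conc = [random.gauss(0.0, 1.0) for _ in range(50000)]
mean_ln = sum(ln_conc) / len(ln_conc)
exp_of_mean = math.exp(mean_ln)                                  # exponential of the mean ln value
mean_of_exp = sum(math.exp(v) for v in ln_conc) / len(ln_conc)   # mean on the concentration scale
# For normally distributed ln values this ratio is close to exp(sigma^2 / 2), not 1.
ratio = mean_of_exp / exp_of_mean

# Issue 2: variability added by the estimated coefficient.
def correction_se(x, x_std, se_beta):
    """Standard error of beta_hat * (x - x_std) for one sampling condition,
    treating x and x_std as fixed, so that it equals |x - x_std| * SE(beta_hat)."""
    return abs(x - x_std) * se_beta

se_beta_hour = 0.01   # hypothetical standard error of the sampling-hour coefficient
for hour in (7.5, 9.5, 12.5, 17.5):
    print(hour, correction_se(hour, 7.5, se_beta_hour))
```

With several sampling conditions, the variance of the correction term would also involve the covariances between the estimated coefficients, which this single-condition sketch ignores.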