### PROGRESS cohort description

Participants included in this analysis are mother and infant pairs enrolled in the *Programming Research in Obesity, Growth Environment and Social Stress* (PROGRESS) birth cohort study. Nine hundred forty eight pregnant women were recruited through the Mexican Social Security System (IMSS) from 2007 to 2011 and delivered a live infant. Women were eligible for the study if they were less than 20 weeks gestation, at least 18 years of age and planned to reside in Mexico City for the next 3 years. Women were excluded if they had a history of heart or kidney disease, consumed alcohol on a daily basis, used steroids or anti-epilepsy drugs or had a multiple gestation pregnancy. A more detailed description of the recruitment process can be found in detail elsewhere [19, 20]. The PROGRESS study protocols were approved by the institutional review boards (IRBs) of: the Icahn School of Medicine at Mount Sinai (IRB protocol number: 12–00751), Brigham and Women’s Hospital (IRB protocol number: 2006-P-001792) and the Mexican National Institute of Public Health (IRB protocol number: 560). Data from 393 mother-child pairs with an RVP assessment at 72 months of age, with complete data on blood metals and covariates, were selected for this work.

### Blood metal measurements

Prenatal metal exposure was determined using maternal blood samples drawn during the second trimester (between the 14th and 26th week of pregnancy) and third trimester (after the 26th week of pregnancy). Vacutainer tubes (Becton-Dickinson and Company, Franklin Lakes, New Jersey) were used to collect the samples, which were analyzed at the Lautenberg Laboratory for Environmental Health Sciences at Mount Sinai School of Medicine. Metal measurements were made using the Agilent 8800 Series triple-quadrupole inductively coupled plasma mass spectrometry coupled with high-performance liquid chromatography and a direct mercury analyzer.

### Rapid visual response

The measure of neurobehavior used in this study was the RVP, a test of executive function developed by Cambridge Neuropsychological Test Automated Battery (CANTAB, http://www.cambridgecognition.com/cantab). The test consists of a sequence of individual digits being presented to the participant who has to push a response pad when they detect a target sequence. The RVP task shares similarities with the Conner’s Continuous Performance Task (CCPT) [21], a test where decreases in performance have previously been linked to toxicant metal exposure [22,23,24]. The RVP measures several aspects of executive function, including: impulsivity, attention, working memory and processing speed. Two metrics of performance were used for this task: the overall percentage of responses that were correct (accuracy) and the median delay to a correct response (latency) to a correct response. Study personnel administering the test used a standardized protocol and were blind to the child’s level of metal exposure. Information on covariates includes: child sex, child age at RVP assessment, socio-economic status (SES), maternal education and maternal age at recruitment. The covariates were selected a priori. The SES index was determined using the 1994 Mexican Association of Intelligence Agencies Market and Opinion, which classified families into 6 levels based upon 13 questions regarding the characteristics of the household [25]. These 6 levels were collapsed into three SES tiers: low, medium and high.

### Weighted Quantile sum (WQS) regression

WQS was specifically developed to assess the joint effect of a combination of chemicals measured as a mixture. Applications have included metals and organics [26, 27]. Consider a set of *M* correlated metal concentrations at a single time point that are grouped into quantiles, *q*_{m}, and an outcome of interest *y*. The general form of the weighted index model is

$$ E\left[y\right]\approx {\beta}_0+{\beta}_1\left(\sum \limits_{m=1}^M{w}_m{q}_m\right)+{\boldsymbol{z}}^{\mathrm{T}}\boldsymbol{\phi}, $$

(1)

where *β*_{0} is the intercept, *β*_{1} is the regression coefficient for the weighted sum of the quantiled metal concentrations and is constrained to be either nonpositive or nonnegative, *z* = [*z*_{1}, …, *z*_{C}] is the set of covariates, *ϕ* are the regression coefficients corresponding to *z*, (∙)^{T} is the matrix transpose and *w*_{m} is the weight of the *m* th metal concentration, *q*_{m}, which is constrained such that 0 ≤ *w*_{m} ≤ 1 and \( {\sum}_{m=1}^M{w}_m=1 \) [14].

In order to estimate generalizable *w*_{m}, the data are first split into training and validation datasets; then *B* datasets are constructed using bootstrap samples drawn from the training dataset. For the results shown in this paper, the number of bootstrap datasets was 1000, the size of the training dataset is 40% of the subjects (approximately 157) and the size of the validation dataset is 60% of the subjects (approximately 236). All reported significances are those from the validation dataset. The proportions of the training/testing datasets are based upon those in [14]. Index weights are estimated for each bootstrap sample using maximum likelihood estimation in (1) and constrained such that the weights each must be both nonnegative and sum to one [14]. This means that the metals that have stronger associations with the outcome of interest for that particular bootstrap sample will have higher corresponding weights and those with less strong associations will have lower weights. The WQS index, *W*, is computed as:

$$ W=\sum \limits_{m=1}^M{\overline{w}}_m{q}_m, $$

(2)

where \( {\overline{w}}_m=\frac{1}{B}\sum \limits_{b=1}^B{w}_{m,b}s\left({\hat{\beta}}_{1,b}\right) \), with *w*_{m, b} being the *m* th metal from the *b* th bootstrapped subsample and \( s\left({\hat{\beta}}_{1,b}\right) \) is a measure of the significance of \( {\hat{\beta}}_{1,b} \), or a measure of signal strength, in the *b* th subsample [14]. For this study, we use the value of the chi-square goodness-of-fit test as the measure of significance. Thus, combinations of metals with the strongest associations with the outcome will have higher relative values compared with combinations with lower associations. The overall significance of the WQS index can be determined by assessing the significance of the regression coefficient corresponding to *W*, *β*_{WQS}, estimated from the following regression expression.

$$ E\left[{y}_v\right]\approx {\beta}_0+{\beta}_{\mathrm{WQS}}W+{\boldsymbol{z}}^{\mathrm{T}}\phi, $$

(3)

where *y*_{v} is the outcome data in the validation set [14]. All analyses in this study are adjusted for child age, child gender, SES, age of the mother at recruitment and maternal education. In order to test for a nonlinear relationship between the outcome and the derived WQS index, quadratic models of the following form are also assessed.

$$ E\left[{y}_v\right]\approx {\beta}_0+{\beta}_{{\mathrm{WQS}}_L}W+{\beta}_{{\mathrm{WQS}}_Q}{W}^2+{\boldsymbol{z}}^{\mathrm{T}}\phi . $$

(4)

### JWQS

JWQS extends WQS to the analysis of correlated exposures across time and enables the determination of the relative importance of exposures across time, thus providing information on critical windows for each exposure. Mechanistically, this is done by viewing the metal concentrations at different time points as exposures in a larger “exposure space” and performing WQS regression on this larger set of exposures. Thus, we use the term “joint WQS” analysis across both metals and time. This approach views time agnostically, meaning that there are no implicit constraints on the relationships between metal concentrations across time points as there are in a distributed lag model framework.

The JWQS framework for *M* metal concentrations at *T* time points is similar to the original WQS framework, with the following general model.

$$ E\left[y\right]\approx {\beta}_0+{\beta}_1\left(\sum \limits_{i=1}^{T\times M}{w}_i{q}_i\right)+{\boldsymbol{z}}^{\mathrm{T}}\phi, $$

(5)

where the weights, *w*_{i}, are indexed across all combinations of metals and time points. Estimation of the indices and the corresponding regression coefficient proceeds as in the previous section. Note that, since the sum of the weights is equal to one, it is possible to determine the relative importance of the metals across time, by adding together their corresponding weights across all time points. Similarly, it is also possible to estimate the relative importance of the time points themselves with respect to the metal exposure, by adding together all weights corresponding to the metals at each time point.

### MWQS

The MWQS approach incorporates the temporal nature of metal mixtures using a two-stage, hierarchical approach. In the first part of the analysis, a WQS model is determined on the *M* metals at each time point, producing weights, *f*_{m, t}, *m* = 1, …, *M*, *t* = 1…, *T*. Using these weights, a “total body burden index” [14], *v*_{t}, for each time point, representing the total impact of all metals together on *y*, is calculated as

$$ {v}_t=\sum \limits_{m=1}^M{f}_{m,t}{q}_{m,t}. $$

(6)

Then, these WQS (total body burden) indices, derived for each time point, are grouped into quantiles, *p*_{t}. Next, the final WQS model is estimated where the exposure are the quantiled total body burden indices for each time point, *p*_{t}.

Due to the fact that the exposures in the final WQS analysis consist of the total body burden indices for each time point, the weights, *z*_{t}, generated from this analysis determine the relative importance of each time point to the outcome *y*. Since these weights sum to one, they can be used to determine the relative importance of the *m* th metal at the *t* th time point, *w*_{m, t}, with respect to each other as

$$ {w}_{m,t}={z}_t{f}_{m,t}. $$

(7)

This means that the relative importance of each metal across time points can be determined in a similar manner to JWQS by adding together the weights, *w*_{m, t}, corresponding to the same metal across time points.

### Simulation setup

We use simulated datasets to demonstrate the application of the two proposed techniques and to compare their performances using a known ground truth. The simulated data are designed such that they have similar properties to PROGRESS data. This means that we modeled six exposures across two time points, i.e., *X*_{m, t}, *m* = 1, …. , 6 and *t* = 1, 2. These exposures are generated according to

$$ \boldsymbol{X}=\left[{X}_{1,1}\dots {X}_{1,6}{X}_{2,1}\dots {X}_{6,2}\right]\sim N\left(\mathbf{0},\boldsymbol{C}\right), $$

(8)

where **0** ∈ *ℝ*^{12} and *C* is the estimated Pearson correlation matrix of the metals across time in the PROGRESS data. We then generate an artificial outcome as

$$ y=\boldsymbol{X}\mathbf{B} +\upvarepsilon, $$

(9)

where **Β** is a set of regression coefficients and *ε* is independent and identically distributed white Gaussian noise with zero mean and unit variance. The number of samples is equal to 393, the number of subjects in our study. We consider two scenarios, the first is when **Β** = [0, 0.1,0.2,0, 0, 0, 0,0.15,0.15,0, 0, 0]^{T}, representing roughly equal weight between the two time points. The second is when **Β** = [0, 0.2,0.2,0, 0, 0, 0, 0, 0.15,0, 0, 0]^{T} and represents less equal weight between the two time points. Note that based upon [8], the first six elements of **Β** correspond to the effects of the exposures from the first time point and the second six elements correspond to the effects of the exposures from the second time point. These effects are similar to those observed in the PROGRESS data.

In these simulations, we generate 100 independent datasets and apply both JWQS and MWQS. We measure their performance using three metrics: accuracy, sensitivity and specificity. The WQS model designates an exposure as a significant contributor if the weight corresponding to that exposure is greater than 1 divided by the total number of exposures across time points, i.e., 1/12. We define accuracy as fraction of total weights, 12, where a modeled contributor is declared to be a contributor and a modeled non-contributor was declared to be a non-contributor. For example, the second, third, eighth and ninth exposures in scenario 1 are modeled contributors and the second, third and ninth exposures in scenario 2 are modeled contributors. We define sensitivity as the fraction of total modeled contributors that were declared to be contributors. Similarly, we define specificity as the fraction of total modeled non-contributors that were declared to be non-contributors. We report the results below.

### Data screening and analysis

The exposure data were grouped into deciles. Then, the data were split into a training set of 40% (approximately 157 subjects depending on the seed) and a testing set of 60% (approximately 236 subjects depending on the seed). All reported statistics and test-fit criteria are from the testing set. For both JWQS and MWQS, three different seeds for the assignment of subjects as being a part of the training set or validation set were used and the results of the models with the median Bayesian information criterion (BIC) were used. Linear models were compared with quadratic models and the model with the lower BIC value was used. In the derivation of the indices, the models were constrained to be either nonnegative or nonpositive. Models where the constraints were reversed were found to not be significant at a threshold of *p* < 0.1, so are not reported. Both outcome variables were log-transformed and treated as Gaussian distributed in the analyses. All analyses were performed using SAS 9.4.