The E3N cohort
The E3N study is a large ongoing French prospective cohort of women, set up in France in 1990. The study was approved by the French National Commission for Data Protection and Privacy (ClinicalTrials.gov identifier: NCT03285230); all participants gave written informed consent. The detailed protocol has been described elsewhere [42]. Briefly, 98 995 women born between 1925 and 1950, were included from the French national health insurance plan for people working for the national education system, the Mutuelle Générale de l'Education Nationale (MGEN). Women were enrolled in the cohort through a self-administered questionnaire, and were followed by self-administered questionnaires every two years. Detailed cancer risk factor data were collected through questionnaires at different time points during follow-up, including reproductive history, use of hormonal treatments, anthropometric characteristics, smoking habits, alcohol consumption, diet, and physical activity. The average follow-up rate per questionnaire cycle has been of 83%, and to date, the total loss to follow-up since 1990 has been < 3%.
Between 1994 and 1999, E3N participants were invited to donate blood, resulting in the collection of blood samples from approximately 25 000 participants. Each sample was separated into 28 aliquots (i.e. plasma, serum, buffy-coat, leukocytes, and erythrocytes) that were stored in plastic straws in liquid nitrogen containers (− 196 °C) in a biobank.
Breast cancer cases were identified through self-reporting in the questionnaires, from the MGEN files, or through information from death certificates. Deaths were reported by family members and by searches in the MGEN files and causes of death were obtained from the National Death Index. Pathology reports were obtained for the 93% of incident cases. We also considered cases for which pathology reports have not been obtained, because the proportion of false-positive self-reports was low in our study population (< 5%). Cases were identified up to 2013, which was therefore used as the date of end of follow-up in statistical analyses.
The nested case–control study on breast cancer
As previously described elsewhere [18], we identified 281 breast cancer cases for which at least three aliquots of serum and six aliquots of plasma were available in the biobank. From these, we excluded all cases who had not completed the dietary questionnaire in 1993 (n = 27) or who were diagnosed before the blood sampling and/or before the dietary questionnaire (n = 11). Cases of Paget’s disease and benign breast disease were also excluded (n = 3). Finally, 240 incident breast cancer cases were available. Due to budget constraints, among those, 194 incident postmenopausal breast cancer cases were randomly selected and included in the study. For each case, one control was sampled from women who were free of breast cancer at the time of diagnosis of the corresponding case (density sampling method). Controls were matched to cases by age (± 2 years), menopausal status at blood collection (premenopausal or postmenopausal), body mass index (BMI) at blood collection (< 25 or ≥ 25 kg/m2) and year of blood collection.
Measurement of biomarkers of exposure
Laboratory analysis of BFR and PFAS have been detailed elsewhere [18, 23]. Briefly, measurements of PFAS in serum were based on a preliminary alkaline digestion followed by a two-stage Solid Phase Extraction purification using polymeric Oasis HLB and graphitized carbon (ENVI-Carb) cartridges, before liquid chromatography coupled to tandem mass spectrometry (LC–MS/MS) measurement. Analysis of BFR involved a liquid/liquid extraction with pentane followed by determinations with gas chromatography (Agilent 7890A) coupled to high-resolution mass spectrometry (GC-HRMS). The quantification was conducted using the isotopic dilution method with 13C labeled analogous as internal standards. All the analyses have been conducted in an ISO 17025:2005 accredited laboratory. Serum and plasma lipids were determined enzymatically (Biolabo, Maizy, France). The following 7 BFR and 18 PFAS were measured: 2,2',4,4',5,5'-hexabromobiphenyl (PBB-153), 2,2',4,4',6-pentabromodiphenyl ether (PBDE-100), 2,2',4,4',5,5'-hexabromodiphenyl ether (PBDE-153), 2,2',4,4',5,6'-hexabromodiphenyl ether (PBDE-154), 2,4,4'-tribromodiphenyl ether (PBDE-28), 2,2',4,4'-tetrabromodiphenyl ether (PBDE-47), 2,2',4,4',5-pentabromodiphenyl ether (PBDE-99), perfluorobutane sulfonic acid (PBFS), perfluorodecane sulfonic acid (PFDS), perfluorobutanoic acid (PFBA), perfluoroalkyl phosphonic acid (PFPA), perfluorohexane sulfonic acid (PFHxA), perfluorododecanoic acid (PFDoA), perfluorohexane sulfonic acid (PFHxS), perfluoroheptane sulfonic acid (PFHpS), perfluorooctane sulfonic acid (PFOS), perfluorooctane sulfonamide (PFOSA), N-Methyl perfluorooctane sulfonamidoacetic acid (N_MeFOSAA), N-Ethyl perfluorooctane sulfonamidoacetic acid (N_EtFOSAA), perfluoroheptanoic acid (PFHpA), perfluorooctanoic acid (PFOA), perfluorononanoic acid (PFNA), perfluorodecanoic acid (PFDA) and perfluoroundecanoic acid (PFUnA).
Covariates
Logistic regression and BKMR models were adjusted for total plasma lipid content by the addition of a separate term in the model (ng/L, continuous), as recommended by Schisterman et al. [43]. The models were further adjusted for smoking status (never vs. ever), physical activity measured in metabolic equivalent tasks (MET)-hour/week (continuous), education (≤ 12 years, 12 to 14 years, > 14 years), personal history of benign breast disease (no vs. yes), family history of breast cancer (none, in first-degree relatives, in extended relatives), parity and age at first full-term pregnancy (FFTP) (no children, 1 or 2 children and < 30 years old at FFTP, ≥ 3 children and < 30 years old at FFTP, ≥ 30 years old at FFTP), total breastfeeding duration (never, ≤ 6 or > 6 months), age at menarche (years, continuous), current use of menopausal hormone therapy (yes, no), use of oral contraceptives (ever vs. never), age at menopause (menopause before age 51, menopause at age 51 or later). For the variables measured in different questionnaires, we took the value declared in the last questionnaire completed before the date of diagnosis of the case; for controls, the date of diagnosis of the matched case was used. Since BKMR and non-conditional logistic regression models do not account for case–control matching, these models were additionally adjusted for the matching criteria: age at blood draw (years, continuous), BMI at blood draw (kg/m2, continuous), and year of blood draw (continuous), except for menopausal status at blood draw, in order to avoid over-adjustment due to the a priori inclusion of age at menopause in the model.
The selection of confounders was done a priori, based on the known breast cancer risk factors available in the E3N dataset that are potentially associated with the exposures considered in the present study.
In our study population, missing values were < 5% for all variables and were imputed to the median (continuous variables) or modal value (categorical variables).
Statistical analyses
Substances for which more than 25% of the values were below the limit of detection (LOD) were eliminated from the analysis (namely PBFS, PFDS, PFBA, PFPA, PFHxA, PFDoA). For those substances which had 25% or less of the values below the LOD, these last were imputed to ½ of the LOD.
Supplementary material Table 1 presents the percentage of values below the limit of detection for each substance measured in the present study. Finally, blood levels of 7 BFR (PBB-153, PBDE-100, 153, 154, 28, 47, 99) and 11 PFAS (PFHxS, PFHpS, PFOS, PFOSA, N_MeFOSAA, N_EtFOSAA, PFHpA, PFOA, PFNA, PFDA, PFUnA) were included in the models.
Demographic characteristics of the study participants were reported using means and standard deviations or counts with percentages. Univariate conditional logistic regression models were performed to compare concentrations of PFAS and BFR between cases and controls. Exposure to substances were log-transformed for the analyses. Correlations between log-transformed PFAS and BFR concentrations were assed using Pearson’s correlation tests.
Approach 1 – Principal Component Regression (PCR)
A PCA with varimax rotation was performed on the matrix of log-transformed biomarkers, to identify a reduced number of uncorrelated components representing the exposure to substances. The number of retained components was chosen using several indicators: individual and cumulated explained variance, and interpretability and coherence of identified components [40]. Then, multiple logistic regression models were fitted, with the identified components scores as main exposure variables (continuous and categorized in quintiles groups based on the adherence distribution to the different components). In order to verify if the same patterns of exposure were observed among cases and controls, PCA was also performed separately among these two groups.
In order to investigate the hypothesis of a differential relationship between exposure to PFAS and BFR and breast cancer risk based on the tumor expression of estrogen receptors (ER– vs. ER +), the components scores previously identified were also included in two separate models: the first model included ER + cases and all controls, while the second model included ER– cases and all controls. Cases for which information on the estrogen receptors’ expression was missing were excluded from the analysis.
All these models were adjusted for the covariates described above, including matching criteria.
As a sensitivity analysis, conditional logistic regression models accounting for the matching of cases and controls were run on the overall study population, but not on specific ER + and ER- subpopulations. These models were not adjusted for the matching criteria.
Approach 2 – Bayesian Kernel Machine Regression (BKMR)
BKMR was proposed as a new approach to assess the effect of exposure to chemical mixtures on health [34]. An R package (‘bkmr’) exists for this purpose, with the possibility of adapting the model to binary outcomes, like breast cancer [35]. In the present study, the hierarchical variable selection mode was used. There are two possible levels of variable selection: a group selection, corresponding to BFR on the one hand, and PFAS on the other hand, and an individual substance selection within each group.
The model for binary outcome using the Probit link function with hierarchical variable selection is as follows:
For each subject i = 1,…,n:
$${\Phi }^{-1}(\mathrm{P}({\mathrm{Y}}_{\mathrm{i}}=1)) =\mathrm{ h}[({\mathrm{Group}}_{1} = ({\mathrm{z}}_{\mathrm{i}1},\dots ,{\mathrm{z}}_{\mathrm{iM}}) , {\mathrm{Group}}_{2} = ({\mathrm{v}}_{\mathrm{i}1},\dots ,{\mathrm{v}}_{\mathrm{iN}})] +{\mathbf{x}}_{\mathrm{i}}^{\mathrm{T}}{\varvec{\upbeta}}+ {\upvarepsilon }_{\mathrm{i}}$$
where:
Φ−1 = probit link function;
Yi = binary outcome (0/1), here breast cancer;
zi1, …,ziM, = exposure variables of group 1, here 11 PFAS (ng / mL of serum) as continuous, log-transformed and centered variables (i.e. subtraction of the mean);
vi1,…,viN = exposure variables of group 2, here 7 BFR (ng / L of plasma) as continuous, log-transformed and centered variables (i.e. subtraction of the mean);
h = flexible function of exposure variables, specified using a kernel function (exposure–response function);
xi = vector of covariates (see the list below);
β = vector of the corresponding coefficients;
εi = residuals.
First, the BKMR model with a hierarchical selection of variables provides two types of posterior inclusion probabilities (PIPs): the PIPs of each of the two groups (BFR and PFAS), and the conditional PIPs of each substance within groups. The PIPs are indicators of the contribution of each group or substance in relation to the outcome.
Secondly, the BKMR method estimates a univariate exposure–response function for each substance. This function consists of a section of the function h quantifying the relationship between a given substance and the outcome, while all other exposure variables are set at their median value. In the specific case of a binary outcome, these sections of h can be interpreted as the relationship between the exposure variable and an underlying continuous latent outcome, such as a biomarker of health status underlying the binary outcome [35].
Then, the BKMR method provides bivariable exposure–response functions, which are estimates of the relationship between exposure to a given substance and the outcome while one other substance is fixed at predefined percentiles (20th, 50th, and 80th) and all the others are fixed at their median value. This approach allows to highlight interactions between pairs of substances: an alteration in the dose–response curve of one substance at a different percentile of another substance suggests an interaction, while parallel lines indicate no interaction. With the hierarchical variable selection mode, only interactions between substances not belonging to the same group can be identified.
Finally, the cumulative effect of the overall exposure to the substances is provided by calculating the differences between the estimated value of h when all substances are fixed at a predefined percentile, compared to the estimated value of h when all substances are fixed at the 50th percentile, used as reference.
In the present study the number of iterations of the Markov Chain Monte Carlo sampler was set at 100,000 with non-informative default priors defined by the package.
As for approach 1, the model was performed first in the overall study population, and then separately for ER + cases and all controls, and for ER– cases and all controls, as described above.
As a sensitivity analysis, based on the results obtained from Pearson’s correlation tests and PCA, BKMR was also performed with a hierarchical selection including three groups: PFAS, PBDE, and, separately, PBB. All other parameters were kept identical to the main model.
The threshold for statistical significance was set at 5% and all statistical tests were two-sided. Statistical analyses were performed using SAS (version 9.4, SAS Institute) and R (version 4.0.3).