Study population
From the ongoing population-based prospective ENVIRONmental influence ON AGEing in early life (ENVIRONAGE) birth cohort [23], 199 mother-newborn (all singleton) pairs that were recruited between July 2014 and June 2015 were included. The study has been approved by the ethical committees of Hasselt University and East-Limburg Hospital and was conducted according to the ethical principles in Helsinki declaration [24]. Owing to missing exposure measurements (n = 2) and unavailable covariate information (n = 8), the final sample size in model fitting was 189.
The ENVIRONAGE birth cohort recruits mothers with a singleton newborn at arrival for delivery at the East-Limburg Hospital. At the first antennal visit (weeks 7–9 of pregnancy), maternal body mass index (BMI) was determined and the date of conception was estimated on the basis of the first day of the mother’s last menstrual period combined with the first ultrasonographic examination. We collected detailed information as newborn’s sex, gestational age, birth date, maternal age and parity from medical records. Ethnicity of a newborn was classified as European if at least 2 grandparents were Europeans and classified as non-European otherwise. Educational level of the mothers was coded “low” if they did not obtain a high school diploma, as “middle” if they obtained a high school diploma, and as “high” if they obtained a college or university degree. Maternal smoking status was categorized as “never smoked”, “former smoker” (when having quit smoking before pregnancy), or “smoker” (in case of continuing smoking during pregnancy). The presence of pregnancy complications was obtained from medical records on gestational diabetes, hypertension, hyperthyroidism or hypothyroidism, infectious disease, preclampsia, vaginal bleeding, phenylketonuria and allergy or asthma. Birth weight and length of newborns were also collected, based on which the Ponderal index (PI), an indicator of fetal growth status, was calculated according to Rohrer’s formulas: PI = 100 × Birth weight (in grams) / [Birth length (in centimeter)]3 [25].
PM2.5 exposure assessment
Daily PM2.5 concentration (μg/m3) measurements were obtained from the Belgian Interregional Environment Agency. Residential PM2.5 concentrations of the mothers during pregnancy was estimated for each mother’s address by the combination of land-cover data from satellite images and pollutant data from official fixed monitoring stations, processed by a model chain of spatial-temporal interpolation [23] and a dispersion model [26], providing high-resolution daily exposure values (on 100 m grids). Address changes during pregnancy were considered. In the Flemish region, this interpolation model predicts 80% of the spatial and temporal variance (based on R2) [27], and was validated with measurements of internal exposure in urine [10] and for gestational exposure with placental carbon load [11]. Based on the daily residential PM2.5 concentrations, we calculated the weekly mean PM2.5 concentrations of gestational week 1 to week 40 for each mother, with week 1 starting from the estimated date of conception. In case that gestational age was less than 40 weeks, the exposure values after delivery until week 40 were set to zero.
DNA methylation measurement
Details on cord blood sample collection are provided in Method S1, Additional file 1. DNA methylation was measured at the International Agency for Research on Cancer of Lyon in France. DNA extracted from buffy coat of cord blood samples collected at birth were used to determine the epigenome-wide DNA methylation profile through hybridization to the Illumina Infinium HumanMethylation450K BeadChip arrays [28]. The methylated (M) and unmethylated (U) signal intensities were detected and processed in R using the minfi package [29] to calculate the beta-value indicating the methylation level at each CpG site: β = M/(M + U + α), with α = 100 an offset value used to stabilize the calculation when both M and U are small [30]. The resulting beta-values, ranging from 0 (unmethylated) to 1 (fully methylated), were exported for quality control and pre-processing. The methylation data was filtered from cross-reactive probes and low-quality probes (probes with bead counts lower than 3 in at least 5% samples). Data quality was further evaluated by checking the distribution of the methylated and unmethylated signals. Sample outliers and gender mismatches were removed based on multidimensional scaling plots and the results of unsupervised clustering. Samples which had more than 1% of the probes with detection p-value > 0.05 were removed. The remaining DNA methylation data (485,577 probes) underwent functional normalization [31] using the minfi package. Each CpG locus was trimmed for potential outliers based on the range [Q1–3 × IQR, Q3 + 3 × IQR], with Q1 and Q3 the first and the third quartiles respectively, and IQR the inter-quartile range. Beta-values identified as outliers were replaced by missing values. Additionally, 40,590 probes targeting non-specific CpGs, 15,702 probes with missing values in over 20% of the samples, and 11,648 probes located on X or Y chromosomes were excluded. The remaining probes were examined if they contained single nucleotide polymorphisms located within 10 bps (SNP < 10 bps) of the target CpG, as suggested by Illumina [32], and the probe filtering for SNP < 10 bps was restricted to those found in newborns [33]. The CpG loci with their UCSC reference gene name referring to IGF2 or H19 were selected for the present study. In total, 145 CpGs mapping to IGF2 gene or its adjacent region including IGF2-INS and 62 CpG loci mapping to H19 were selected. Imputation methods were not applied because they tend to modify the correlation structure [34] and the amount of missing values was not substantial (Method S2, Additional file 1). We therefore excluded the CpG loci with incomplete records. 109 CpG loci of IGF2 and 53 of H19 remained for further analysis.
Statistical modelling
The beta-values and correlation structure of the CpGs mapping to each one gene was visualized using R package ComplexHeatmap [35]. In order to address the inter-correlation of the CpG loci [36], reduce the number of independent hypothesis tests and remove less relevant CpG loci, we performed factor analysis on the CpG loci mapping to IGF2 and H19, respectively. Factor analysis was performed based on the correlation matrices using iterative principal component factoring and varimax rotation, maximizing the sum of the squared loadings. Parallel analysis [37, 38] was used to decide the number of common factors to extract. The extracted factors were orthogonal to each other and the standardized factor scores were used as integrated measures for the actual methylation beta-values of the CpG loci and each entered a regression model as the response variable. CpG loci were selected as relevant to a factor if the absolute value of their factor loadings were larger than 0.45, which was visualized in chord diagrams using R package circlize [39]. Since the methylation array was performed in batches with 2 × 6 samples per each array chip, batch effects existed due to sample plate and sentrix positions (row- and column-coordinates). At least one of these grouping categories were introduced into the analysis as random effects, based on test results from the exact likelihood ratio test (LRT) for the presence of random effects for each of the IGF2 and H19 factors (Method S3, Additional file 1).
The models were adjusted for covariates chosen a priori based on previous studies [40, 41]. Those were parameters characterizing the mother and newborns: maternal age, pre-pregnancy BMI, educational level, smoking status, parity and presence of any pregnancy complication; newborn’s sex, gestational age, birth date, birth season and ethnicity. All categorical variables were contrast-coded, and all continuous variables except birth date were centered around the mean. The date of birth was calculated as the time difference in days between the actual birth date of each newborn and the first birth date in this data set. An interaction between newborn’s sex and PM2.5 exposure was included in the models based on previous evidence of sex-specific differences on molecular level in response to prenatal environmental exposures [42, 43]. Whether newborn’s sex was an effect modifier was assessed by performing a likelihood ratio test (LRT) on the interaction term. Afterwards, for each sex-group the change of factor scores for a 5 μg/m3 increment in PM2.5 concentration was estimated at each gestational week using distributed lag nonlinear models (DLNMs) proposed by Gasparrini et al. [44]. The DLNMs provide a flexible method to model the level of exposures while adjusting for lagged exposure values and thereby allows the identification of vulnerable exposure windows, which in turn provides hints on mechanisms through which exposure acts on fetal health [41, 45]. The exposure-response relationship and lag-response relationship are simultaneously involved in one model, via the construction of a cross-basis combining two basis-functions corresponding to exposure structure and lag structure, respectively. We assumed the exposure-response relationship to be linear and specified for the lag structure a natural cubic spline with 3 inner knots equally spaced along the original lag scale (week 1 to week 40) based on a previous study [41]. The total degree of freedom (DF) of the cross-basis was 5. The association between the factors and prenatal PM2.5 exposure was estimated for each gestational week. Based on the same DLNM models, the cumulative association over the entire pregnancy as well as for each trimester was calculated as the incremental cumulative predicted associations from gestational week 1 to week 40, from gestational week 1 to week 13, from gestational week 14 to 26, and from gestational week 27 to 40, respectively.
Since the maximum likelihood (ML) or restricted maximum likelihood (REML) estimators used in mixed model estimation are not robust to outliers, we applied a robust estimator using a smoothed Huber ψ-function and squared robustness weights. This method allows controlling the robustness on single observation level as well as on the group level. For the main analysis we had the estimator’s efficiency fixed at 95% relative to REML. This was done by setting the tuning parameter k for both fixed and random effects at 1.345 [46]. When likelihood needed to be calculated for model selection or LRT, the ML- or REML-estimator was used instead of the robust estimator.
In sensitivity analyses, (1) we excluded all pre-term birth observations to lower the potential influence of the missing PM2.5 measurements entering the model as zeros; (2) the flexibility of the natural cubic spline function for modelling lag structure was varied so that the total DF of cross-basis was compared between DF = 5 and DF = 7 or DF = 9; in parallel, an unconstrained DLM was fitted where all 40 weekly mean exposures entered the model separately; (3) the choice of robustness and estimation efficiency of the estimator was assessed, by comparing the main results (k = 1.345) to a non-robust REML estimator (k = ∞) and to a more robust but less efficient estimator (k = 1.69).
In order to address the functionality of DNA methylation in these two genes, we correlated the factor scores to cord blood DNA transcriptome. The detailed procedure of profiling the transcriptome, quality control, normalization and preprocessing is provided in Method S4, Additional file 1. In total, pairwise Pearson correlation was calculated between each factor and 29,164 transcripts. Based on the ascendingly ordered p-values of the correlation tests which were smaller than 0.05, the first 100 transcripts were used to perform the overrepresentation analysis (ORA) using the R package ReactomePA [47]. A pathway was considered significantly enriched if the p-value was smaller than 0.01 and q-value was smaller than 0.05 with at least 3 genes included in the functional set of a pathway. In addition, we assessed the association between the methylation level and fetal growth by regressing newborn’s birth weight or PI on the factor scores. The birth weight as well as PI were surrogates of fetal growth which have been reported to associate with changes in expression level of growth- and development-related genes [48]. The regression models were adjusted for maternal age, pre-pregnancy BMI, educational level, smoking status, parity, presence of pregnancy complications, newborn’s sex, gestational age, birth date, birth season and ethnicity. We assumed causality for the association among PM2.5 exposure, DNA methylation and fetal growth, and performed mediation analysis [49] to assess whether the change in IGF2/H19 DNA methylation mediates the association between prenatal PM2.5 exposure and fetal growth.
Data analyses were conducted in R (version 3.6.0) and SAS 9.4 (SAS Institute, Cary NC). Based on the number of extracted factors per gene, the family wise error rate (FWER) was controlled below 0.05 using the Sidak correction.