For the present study, carried out in collaboration with the Grassy Narrows First Nation community, we merged information from the GN-CHA survey and two government biomarker monitoring programs (umbilical cord Hg and blood and/or hair Hg) to examine the associations between current reported symptoms from the GN-CHA survey and past Hg exposure, using cluster analyses, structural equation models (SEM) and longitudinal mixed effects models (LMEM) (Fig. 1).
GN-CHA survey
The GN-CHA survey for adults (18 years old and over), initiated in 2015, included 266 questions that covered many aspects of their life and health: demographics, education, generational attendance of residential school, work and income, food consumption, health status, diabetes care, wellness and mental health, injuries, disability, health care access, physical activity, smoking, drinking and drug use, community wellness and traditional culture. Most questions were taken from the First Nation Regional Health Survey 2008/2010 (FNRHS 2008/2010) [27], which provided a basis for comparison with other First Nation communities in Canada. Specific questions were added about fish consumption at different periods of one’s life, as well as illnesses and symptoms consistent with Hg poisoning. The questionnaire was pre-tested with 10 community members in July 2016. A house-to-house survey design was used, with the questionnaire on a web-based platform.
Survey administration was supervised by two field coordinators from Grassy Narrows; nine local interviewers went from house-to-house between December 2016 and March 2017. A total of 213 houses were identified on the reserve; 83.6% of houses were surveyed. Persons living off-reserve were recruited using convenience sampling. The results of the survey were presented to and discussed with community members during several small group and community meetings. The final report was approved by Chief and Council and a summary was made public on May 24, 2018.
A total of 391 Registered Grassy Narrows First Nation members (Band members) participated in the GN-CHA survey. At the time of the survey, 303 were living on-reserve and 88 off-reserve.
For the present study, we used the following variables from the GN-CHA: Demographics: age, sex, living on/off reserve, schooling, generational attendance of residential school in the family. Work: currently working (yes/no), currently looking for work (yes/no), reasons for not working (disability or illness/other). Food security: struggle to pay for food once a month or more in the last 6 months. Fish consumption: childhood fish consumption at 10 years of age (5 categories grouped into once a month or less/at least several times a month); walleye consumption over the past year (not at all/a few times/often). Lifestyle: current smoking (yes/no) and current alcohol consumption (heavy drinking (yes/no), defined in the FNRHS 2008/2010 [27] as 5 drinks in one drinking occasion at least once/month in the past 12 months). Health status: obesity (yes/no) was categorized using Body Mass Index ≥30 kg/m2 and at least one reported diagnosed chronic health conditions (yes/no), as listed in the FNRHS 2008/2010 (allergies, arthritis, asthma, cancer, chronic back pain, chronic bronchitis, diabetes, emphysema, heart disease, hepatitis, high blood pressure, liver disease, osteoporosis, rheumatism, stomach or intestinal problems, thyroid problems, and tuberculosis) and at least one diagnosed nervous system disorder (blindness, epilepsy, Bell’s palsy, cerebral palsy, muscular dystrophy, Kennedy’s disease, Parkinson’s diseases, Alzheimer’ disease, senile dementia, psychological/nervous disorders, cognitive/mental disorders, Attention deficit hyperactivity disorders (ADHD), learning disability).
The GN-CHA included 59 self-reported symptoms, rated on a 5-point Likert rating scale (“Never”, “Rarely”, “From time to time”, “Very often”, “All the time”). Higher scores on the Likert rating scale indicate greater frequency of symptoms.
Historic Hg biomarker data
At the request of Chief and Council of Grassy Narrows First Nation, the authors obtained from the First Nations and Inuit Health Branch (FNIHB) of the Ministry of Indigenous Services Canada and the Ontario Ministry of Health and Long-term Care, archived Band members’ historic biomarker data (hair and/or blood Hg concentrations), gathered between 1970 and 1997 and umbilical cord Hg data, collected between 1970 and 1992 at the hospital where Grassy Narrows women gave birth (Fig. 1).
Blood and hair samples were taken as part of a monitoring program of the Medical Research Branch of Health Canada [4]. The objective of the program was to identify persons whose Hg biomarker concentrations surpassed the guidelines of the time. There was no sampling strategy. The program primarily targeted fishing guides and their families, but anyone could volunteer and provide a sample [4]. Some years. The sampling period was longer than others and there was no consistency in the month that sampling was carried out.
Samples were analysed for total Hg at Health Canada laboratories [4, 28, 29]. According to Farant et al. 1981 [28], in the initial years, sample analyses were performed according to the Magos method [29] and later, a more efficient and less time-consuming method used an improved cold-vapor atomic absorption technique. The two methods were highly correlated for blood (r = 0.98) and hair (r = 0.97) [28].
Past Hg exposure databases
We created a past Hg exposure database, which included umbilical cord Hg concentrations for 211 newborns from Grassy Narrows, born between 1970 and 1992 (Fig. 2a) and blood/hair monitoring data, collected between 1970 and 1997, from persons born between 1884 and 1992. For the latter, we used the highest measure of equivalent hair Hg concentration for each year sampled; the details are described elsewhere [6]. A total of 3603 year-based equivalent hair measurements were available over the sampling period for 657 persons (Fig. 2b),
For the present study, we selected GN-CHA participants with at least one biomarker measurement: either an umbilical cord Hg measurement (n = 99) and/or at least 1 year-based hair Hg concentration (n = 208; 1018 measurements) for a total of 242 persons. The number of year-based hair Hg measurements for each person varied between 1 and 23; 75 persons (36.1%) had 5 or more measurements, while 32 (15.4%) had 10 or more. For childhood Hg exposure, we used the mean hair Hg concentration between 5 and 15 years of age (n = 137 (56.6%)). Childhood Hg concentrations were significantly higher for persons who reported childhood fish consumption frequency at least several times a month compared to those who ate less (Additional file 1, Fig. 1). Moreover, those who ate fish at least several times a month and had attended a residential school (outside of the community) had lower Hair Hg at 10 years of age (3.8 μg/g [1.7–5.9] (n = 16) vs. 5.8 μg/g [4.2–7.3] (n = 30)).
Estimated past Hg exposures
For participants who did not have an umbilical cord Hg measurement, estimated Hg values were predicted from linear regression models using the 211 umbilical cord data (Fig. 2a) on the year of birth for the three periods of Hg exposure (1970–1976, 1977–1987 and 1988–1992). These estimates were then adjusted on where the person’s mother had spent her pregnancy and/or gave birth. For those who were born or whose mother spent her pregnancy in the Wabigoon-English River region between 1970 and 1992, the mean measured umbilical cord blood Hg for their year of birth was used. For those born prior to the Hg discharge in 1962, we attributed 1.0 μg/L for cord blood Hg, while for those born between 1962 and 1970, mean cord blood Hg in 1970 was used (65.7 μg/L), and for those born after 1992, mean cord blood Hg from 1992 was applied (4.48 μg/L). Finally, for individuals, whose mother spent her pregnancy and delivered elsewhere, cord blood Hg was set at 1.0 μg/L.
For study participants, who did not have childhood hair Hg values, we derived estimates from the yearly-hair Hg database (n = 657), using the overall mean hair Hg for the year in which they turned 10 years of age. For participants, who were ≥ 10 years old in 1962 (beginning of the discharge) childhood hair Hg was set at 0.1 μg/g. For those who were 10 years of age between 1963 and 1970, the mean measured hair Hg in children in 1970 was used. Since there were few Hg data after 1990 and there appears to be a plateau over this last period (Additional file 1, Fig. 1), we attributed 0.81 μg/g for those who were 10 years of age after 1990. For all those who were 10 years of age after 1962, estimates were then adjusted on reported childhood fish consumption and having attended a residential school. We were unable to estimate childhood hair Hg exposure for 13 persons (5.4%) of the 242 study participants, who were missing hair Hg data between 5 and 15 years old and who did not answer the question about childhood fish consumption in the GN-CHA survey.
To validate the estimated Hg measures, we tested the correlations between measured and estimated values for both umbilical cord and childhood hair Hg, using non-parametric tests (Spearman).
Symptom clustering
Self-reported symptoms from the GN-CHA survey (n = 391) were clustered based on their Likert scale, using a simultaneous two-step multivariate segmentation approach [30,31,32]. One is agglomerative and aims at maximising a homogeneity criterion by successively aggregating the variables into clusters using a hierarchical ascendant clustering algorithm. The other is representative and creates a composite variable from a weighted linear combination of symptoms (synthetic variable that can be read as a gradient) for each cluster using a mixed factorial approach (PCAMIX method) [30, 32, 33]. For each composite variable (cluster), all participants received a score based on his/her symptom frequency.
A bootstrap approach was used for maximizing the homogeneity criterion within clusters and determining the suitable number of clusters. The desired number of clusters (K) was determined from the analysis of aggregation levels, stability of the partitions via bootstrapped mean-adjusted Rand Index and boxplots. The mean-adjusted Rand index was based on the generation of 60 bootstrap samples. Each cluster was carefully inspected based on its proportion of variation explained (a minimum of 50% was mandatory) and on degree of “closeness” among the symptoms being clustered (a minimum of 0.6 squared correlation of the variable with its composite variable was required). Some symptoms add noise while providing little or no information in identifying the underlying pattern inherent to the cluster [34]. Finally, we tested for the “driving” symptoms that best reflect the cluster. We validated the cluster results, using the VARCLUS function of the SAS computer application (JMP Professional 15.0 software).
A confirmatory factor analysis (CFA) was conducted to evaluate the construct validity of symptom clustering using a series of fit parameters [35,36,37,38,39], using the following indices: Chi-square divided by the degrees of freedom (χ2/df), Standardized Root Mean Square Residual (SRMR), Root Mean Square Error of Approximation (RMSEA), Comparative Fit Index (CFI) and Tucker Lewis Index (TLI). Internal consistency (scale reliability) for each cluster was tested using the Cronbach alpha [35, 40,41,42]. Construct validity for each cluster was also confirmed by the neurologist (T. S. Lena).
Structural equation models
SEM is a comprehensive methodology for representing and simultaneously testing a network of complex relationships. This technique can incorporate observed (measured) and unobserved variables (latent), while traditional techniques only allow for measured variables. The latent variable is an unobservable multidimensional construct that cannot be directly measured [43, 44]. SEM allows for data from multiple exposures and multiple outcomes to be considered simultaneously [45]. It is based on an interrelated system of linear regression equations that extend the possibility of relations between observed and latent variables, as well as direct and indirect (mediation) relations and allows for flexibility in modeling covariance structures [19, 39, 46, 47]. Finally, SEM provides path diagrams that are useful to understand and interpret the contribution of the various variables. The standardized factor loadings allow for pathway comparison.
In the SEM, we used the latent variable for symptom clusters rather than the composite variable because it better reflects the structural concept of co-occurring symptoms. A latent variable was likewise used for Hg exposure at different times (prenatal, childhood and having had ≥5 μg/g hair Hg at least once over the monitoring period). Each cluster was analysed using two SEM, the first with the latent Hg exposure variable and the second with the Hg exposure variables at the three different times. We tested possible direct and indirect (mediation) associations between Hg exposure variables and the symptom clusters. Age, sex, year of sampling, residence, as well a series of variables representing socioeconomic status (income, struggle to pay for food), drinking, smoking, obesity and medical history (diabetes, hypertension, heart disease, stroke, hypercholesteremia, separately and/or grouped into the variable “at least one diagnosed chronic health condition”), were tested a priori as covariates. We further tested whether sex modified the relation between past Hg exposures and symptom clusters (moderation). Moderated mediation was also tested with age and sex.
Variables with a skewed distribution were log (base 10) transformed. Because latent variables, Hg exposures and covariates, are all on different scales, we used standardized path coefficients to test the magnitude of relations. Standardized coefficients indicate the expected amount of change in standard deviation (SD) in the dependent latent symptoms cluster for an increase of one SD in the predictor, while other predictors (covariates) are kept constant. The adequacy of model fit to the data was determined using multiple indices: χ2/df, CFI, TLI, RMSEA, SRMR and Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian information criterion (BIC) [35,36,37,38,39]. Diagonally weighted least squares (DWLS) adjustments in SEM were used to take into account the ordinal nature of symptom frequency outcomes. To improve the reliability of the latent construct’s scale or the models, we examined residual variances between indicators. Modification indices (MI) were used to assess the addition or deletion of variables and/or associations to improve the goodness of fit of the SEM. The sequence of initial fit, MI and refitting were repeated until the best fit SEM was attained.
To ensure that the database of 242 participants was sufficient to run SEM, two types of power analyses were performed: i) to detect model misspecification and ii) to detect target effect (influence of Hg exposure on the symptom cluster) [48]. A-priori, post-hoc, and compromise power-analyses were run simultaneously to detect model misspecification based on Chi-square likelihood-ratio and RMSEA tests of close and not-close fit, [49,50,51,52]. The power for detecting specific target effect was determined with simulated data in SEM [48].
For the SEM, validated prenatal and childhood Hg exposure estimates, described above, were used when there was no measurement, while for GN-CHA variables, missing data were imputed using Multivariate Imputation by Chained Equations (MICE).
To support the SEM, we ran a series of Directed Acyclic Graphs (DAG) for sensitivity analyses. DAG is a visual representation of causal assumptions that makes underlying relations explicit. It is used to detect overt bias, notably through backdoor paths that identify the presence of confounding [53, 54]. Modifications to DAG-implied adjustment sets include conditioning confounding and variables with ambiguous causing roles [55, 56].
Longitudinal mixed effects models
LMEMs were performed using direct measurements of longitudinal hair Hg with respect to the composite variable for each cluster. We chose mixed effects modeling (MEM) as it is robust to missing data and irregularly spaced sampling; it also handles both time-invariant and time-varying covariates [57]. MEM enables multilevel modeling and partitioning of the covariance structure (random effects). We tested a priori the following covariates (age, sex, variables of lifestyle and medical history, age at sampling and sampling season). The normality of residuals was tested using a q-q plot. The most appropriate model was selected using the Akaike Information Criterion (AIC), the Baysian Information Criteria (BIC) and the likelihood ratio (LR) test at p < = 0.05.
For the longitudinal approach, LMEM requires repeated measures. Because of the absence of a biomarker sampling strategy, over half (56%) of the 208 persons with hair Hg measurements had 3 repeated measurements or less. Despite the flexibility of MEM with missing data, we preferred not to impute Hg levels for people whose diet and environmental context varied over time, particularly when living off-reserve. For LMEM, we only included persons with at least 10 hair Hg measurements (n = 32 (17 men and 15 women)), with a total of 447 hair Hg measurements, covering a period from 10 to 23 years (median 21 years).
To ensure that there were sufficient observations for the analyses, we estimated the minimal required sample size, based on formulas from Hedeker and co-authors [58] and direct calculations using the G*power software [59,60,61]. Since one centimeter of a hair sample represents an accumulation of Hg during approximately 1 month [62], we used a low correlation of repeated measures between yearly-based samples (rho 0.1 and 0.2). Because the effect size was unknown, 0.25 was chosen. Power analyses were set at 80%, with a two-tailed 5% hypothesis test, 10 time points and up to four fixed factors; the minimum number of participants required was between 29 and 42.
Threshold of statistical significance in all analyses was set at p ≤ 0.05.
Database management and descriptive statistical analyses were performed using JMP Professional 15.0 (Statistical Analysis Hardware (SAS Institute). LMEM were conducted with Stata 16 software (StataCorp. 2019. Stata Statistical Software: Release 16.0. College Station, TX: Stata Corporation). All other analyses were computed using the R statistical software version 3.6.1. (R Core Team, 2016). We used the R package ClustOfVar [31] for clustering analyses and the MICE (Multiple Imputation by Chained Equations) package for multiple imputation of missing data in CFA and SEM. PairedData and dplyr packages were used for paired-t tests. CFA and SEM models were computed with the R package Lavaan [39]. The dagitty and ggdag R packages were used for analysing SEM and DAG. A-priori, post-hoc, and compromise power-analyses were computed using the semPower package. To assess power for detecting a target effect in SEM we used an online tool: yilinandrewang.shinyapps.io/pwrSEM/ [63].