Gene-environment interaction effects on lung function- a genome-wide association study within the Framingham heart study

Background Previous studies in occupational exposure and lung function have focused only on the main effect of occupational exposure or genetics on lung function. Some disease-susceptible genes may be missed due to their low marginal effects, despite potential involvement in the disease process through interactions with the environment. Through comprehensive genome-wide gene-environment interaction studies, we can uncover these susceptibility genes. Our objective in this study was to explore gene by occupational exposure interaction effects on lung function using both the individual SNPs approach and the genetic network approach. Methods The study population comprised the Offspring Cohort and the Third Generation from the Framingham Heart Study. We used forced expiratory volume in one second (FEV1) and ratio of FEV1 to forced vital capacity (FVC) as outcomes. Occupational exposures were classified using a population-specific job exposure matrix. We performed genome-wide gene-environment interaction analysis, using the Affymetrix 550 K mapping array for genotyping. A linear regression-based generalized estimating equation was applied to account for within-family relatedness. Network analysis was conducted using results from single-nucleotide polymorphism (SNP)-level analyses and from gene expression study results. Results There were 4,785 participants in total. SNP-level analysis and network analysis identified SNP rs9931086 (Pinteraction =1.16 × 10-7) in gene SLC38A8, which may significantly modify the effects of occupational exposure on FEV1. Genes identified from the network analysis included CTLA-4, HDAC, and PPAR-alpha. Conclusions Our study implies that SNP rs9931086 in SLC38A8 and genes CTLA-4, HDAC, and PPAR-alpha, which are related to inflammatory processes, may modify the effect of occupational exposure on lung function.


Background
Chronic obstructive pulmonary disease (COPD) is defined by decreased lung function, commonly measured by forced expiratory volume in one second (FEV 1 ) and the ratio of FEV 1 to forced vital capacity (FVC). Cigarette smoking is the most important environmental risk factor for COPD. However, COPD development in smoking populations is not universal. In a study conducted by the American Thoracic Society [1], approximately 15% of COPD may be attributable to workplace exposures; a similar estimate was reported in a more recent study [2]. Indeed, an association between COPD or poor lung function and occupational exposure to dust, gas, and fumes has been reported in both industry-based studies [3][4][5][6] and community-based [7][8][9][10][11] cross-sectional studies. In addition, mounting evidence indicates that genetic factors contribute to COPD. In fact, the disease aggregates in families [12], and genome-wide association studies (GWAS) [13][14][15][16][17] have identified single nucleotide polymorphisms (SNPs) that are associated with COPD or lung function.
To date, few study has comprehensively examined combined genetic and environmental effects on lung function. GWAS on lung function have considered genetic effects alone; a recent review [18] showed that most GWAS findings do not replicate consistently, which may imply a role played by environmental exposures and gene-environmental interactions. Studies of occupational exposure have only investigated the main effect of exposure on lung function. Therefore, some disease susceptibility genes may be missed due to low marginal effects, despite being involved in the disease process through interactions with environmental exposures [19]. Further, the majority of gene-environment interaction studies of lung function or lung function has focused on gene by smoking interaction [20][21][22][23]. However, no reports describe gene by occupational exposure interactions. In addition, most researchers investigated SNPs or genes through either in silico statistical methods or by gene expression experiments, but did not integrate these results. Thus, an integrated, genome-wide gene-environment interaction study can more readily identify disease susceptibility genes when environmental factors may also be contributing.
The purpose of this study was to investigate gene by occupational exposure interactions on FEV 1 and FEV 1 /FVC thorough SNP-level analysis and network analysis. Specifically, we first performed a GWAS using data from the Framingham Heart Study (FHS) to identify SNPs that interact with occupational exposure using a population-specific job exposure matrix (JEM) to affect (FEV 1 ) and ratio of FEV 1 to FVC. We next combined our GWAS SNP results with gene expression results to build a network of biological processes that are driven by networks and not by individual genes. The findings from these analyses uncovered several SNPs/Genes that may offer avenues for future functional studies of genes contributing to lung function changes following occupational exposures.

Study population
Our study population derives from the FHS [24], which includes only Caucasians. This study has recruited participants since 1948; there have been three generations of participants: the Original Cohort, their Offspring, and the Third Generation. Spirometry measurements, detailed medical history, physical examinations, and laboratory tests were done approximately every two years. We used the 4,785 participants with complete spirometry phenotypes, occupational information, genotypes, and related covariates from the Offspring Cohort and the Third Generation Cohort.

Ethics statement
Written informed consents were provided by all participants. Protocols were approved by local institutional review boards.

Spirometry phenotypes and covariates
Spirometry from participant lung function Exam 8 and the Third Generation Exam were used in our study. We used the FEV 1 and FEV 1 ratio (FEV 1 /FVC) as continuous outcomes. Age, gender, height (inch), pack-years, and smoking status were used as covariates in our analysis.
Smoking status (never, former, and current smokers) was coded as dummy variable.

Genotyping and quality control
Genotyping for 500,568 SNPs was conducted with approximately 550 K SNPs using the Affymetrix 500 K mapping array plus Affymetrix 50 K supplemental array in 9,237 subjects from the three generations of participants. We used 4,785 subjects from two generations in our study. A quality test was conducted using the PLINK software (version1.06, http://pngu.mgh.harvard.edu/~purcell/plink/). A total of 499 individuals with genotyping call-rate <95% were deleted, and the genotyping rate in the remaining individuals was 98.6%. We conducted the Hardy-Weinberg test for all SNPs, and found 19,546 SNPs had a p-value <1 × 10 -6 . These SNPs showed an obvious deviation from the 45-degree line of a QQ plot and were excluded from our analysis. A total of 34,110 SNPs had a per-SNP missing rate >5% among all subjects and were excluded. We also excluded 146,203 SNPs with minor allele frequency lower than 5% in our study subjects. After filtering, 300,709 SNPs remained for analysis.

Occupational exposure
We used a job exposure matrix (JEM) for occupational exposure assessment. Occupational exposure was classified as high versus low likelihood for dust exposure (coded as 1 and 0) according to their job categories based on a questionnaire in Offspring Exam 8 and the Third Generation Exam ( Table 1). The questionnaire asked, "Using the occupation coding sheet choose the code that best describes your occupation". There are 29 job categories (exclude retired) on the FHS occupation coding sheet, and we classified 4 of them as highly likely dust exposure including factory/assembly/mechanic, skilled labor, general labor, and heavy labor, as modified from UCSF COPD JEM (January, 2009 revision) [25].

Statistical analysis SNP-level analysis
The family-based cohort was analyzed using linear regression-based generalized estimating equations implemented in the GWAF package for R [26] to account for within-family relatedness. The within-pedigree correlation matrix was modeled using an exchangeable working correlation matrix. We analyzed each SNP separately using FEV 1 , and FEV 1 /FVC as outcomes. For each SNP, we included age, gender, height (inch), pack-years, smoking status, occupational exposure status, and tested for the main SNP effect. We next fit an interaction model by adding an SNP by occupational exposure status interaction to the model, and tested for the SNP-exposure interaction.

Network analysis
To identify important networks among the genes that interact with occupational exposure, we used Metacore software (GeneGo, St Joseph, MI, USA) tools to build networks using the gene lists. One gene list was obtained from a previous study [27], in which human airway epithelial cells obtained from 6 normal individuals were exposed to coarse, fine, or ultrafine particulate matter for 6 and 24 hours before gene expression was assessed. The authors reported 71 unique genes altered by the particulate matter (PM). To the best of our knowledge, no other studies have focused on the effects of dust exposure or PM on gene expression in human airway cells. Another list of genes was generated from our results. We chose SNPs whose interactions with occupational exposure had p-values smaller than 10 -3 (around top 0.1% of SNPs), and mapped these SNPs to genes to generate a gene list. Gene annotation was performed using the gene prediction track "RefSeqGenes" in the UCSC browser (http://genome.ucsc.edu).

Results
This study comprised 4,785 participants, with 1,247 participants from the Offspring Cohort and 3,538 participants from the Third Generation. Among them, 4,238 participants were classified in the less likely dust exposure group and 547 participants in the highly likely dust exposure group. Table 2 summarizes the characteristics for participants by both exposure group and cohort. Mean FEV 1 was higher in the group with highly likely dust exposure, but the mean FEV 1 /FVC was similar in the two groups overall. However, the proportion of COPD cases (defined by FEV 1 /FVC less than 70%) was higher in the group with highly likely dust exposure. Lung function was higher and the proportion of COPD cases was lower in the Third Generation as compared to the Offspring Generation, most likely attributable to the younger age of participants in the Third Generation.

SNP-level analysis
The genome-wide analysis for 300,709 SNPs (Bonferroni correctionα = 1.66 × 10 -7 ) detected a significant interaction effect of SNP rs9931086 in gene SLC38A8 on chromosome 16q23.3 and occupational exposure on FEV 1 (P interaction = 1.16 × 10 -7 ). Further, gene-environment interaction effects of SNPs rs3889785 and rs4234966 in the MARCH1 gene and SNPs rs17508671, rs17430621, and rs17508706 in the ZNF804A gene and occupational exposure on FEV 1 also had small P values (all P interaction < 3.04 × 10 -5 ). The genomic inflation factor value of 1.045 calculated from the QQ plot suggested the p-values were appropriately distributed. The results for the top 30 SNPs with strongest occupational exposure interaction effects on FEV 1 are shown in Table 3 and the Manhattan plot is shown in Figure S1 (see Additional file 1).
For gene by occupational exposure interaction effects on FEV 1 /FVC, although no SNP reached the Bonferroni corrected significance level (α = 1.66 × 10 -7 ), SNPs rs7297210, rs6486961, rs7297431, and rs1875467 in gene AEBP2 and SNPs rs10751811, rs2173524, and rs11595576 in ADARB2 had small P values (all P interaction < 4.32 × 10 -5 ). In addition, SNP rs7314308 had small P values for both the main genetic effect (P SNP = 5.43 × 10 -5 ) and the gene by occupational exposure effect (P interaction = 1.20 × 10 -5 ). The genomic inflation factor value of 1.043 suggested the p-values were appropriately distributed. The results for the top 30 SNPs with strongest occupational exposure interaction effects on FEV 1 /FVC are shown in Table 4 and the Manhattan plot is shown in Figure S2 (see Additional file 1: Figure S2).

Networks analysis
With a cut-off p-value of 10 -3 , we identified from our GWAS data 450 SNPs in 142 genes that interacted with occupational exposure to affect FEV 1 and 360 SNPs in 116 genes that interacted with occupational exposure to affect FEV 1 /FVC. From our gene list, we found three network clusters consisting of 3, 6, and 13 genes. Integrating the gene list obtained from the gene expression study after exposure to PM, we found a network cluster consisting of 55 genes; the three clusters identified from our gene list are included in this big cluster (Figure 1). The genes obtained from the gene expression results linked all the network clusters together. Some key genes that interacted with occupational exposure in the network were HDAC, CTLA-4, PPAR-alpha, and OPRM1. These genes play important roles in connecting different network clusters into a big network cluster. In addition, we found that all network clusters were centered by the network cluster built from those genes identified from the expression study after exposure to PM.

Discussion
To the best of our knowledge, this paper is the first study assessing gene by occupational exposure interactions on lung function using a community-based population. Our results from SNP-level analysis suggest that SNP rs9931086 in gene SLC38A8 may significantly modify the effect of occupational exposure on FEV 1 . The results of network analysis suggest that genes HDAC, PPAR-alpha, and CTLA-4 play important roles in the gene by occupational exposure interaction network.
The novel locus, SNP rs9931086, in gene SLC38A8 had a significant gene by occupational exposure interaction effect on FEV 1 , although its genetic marginal effect was not statistically significant. For each C allele increase, the FEV 1 decreased by 214 mL. We did not identify this gene through the network analysis; this may result, in part, from missing some genes that connect SLC38A8 to the main gene network. For other top SNPs in our study, SNP rs6941466 in SMOC2 (p = 3.21 × 10 -5 ) was associated with FEV 1 /FVC through gene by occupational exposure interaction. SMOC2 has been previously reported in both a gene main effect study [28] and a gene by smoking interaction study [23]. Another top SNP, rs1289714 (p = 5.25 × 10 -4 ), in HHIP had a gene by occupational exposure effect modification on FEV 1 /FVC. The HHIP region was previously highlighted in the GWAS of FEV 1 /FVC focus on gene main effect using FHS data. The recent gene by smoking interaction on lung function study [23] discovered three gene regions, DNER, HLA-DQB1/HLA-DQA2, and KCNJ2/ SOX9, with significant SNPs. However, the p-values for the SNPs in these genes were not statistically significant in our study.
The genes we identified through the network analysis are notable. Histone deacetylases (HDAC) are enzymes that regulate inflammatory gene expression. Several studies have shown the correlation between HDAC activity and COPD disease severity, measured by FEV 1 and FEV 1 /FVC, and that HDAC activity is reduced in COPD patients [29][30][31][32][33]. Our results are consistent with a recent large-scale GWAS that identified an HDAC4 association with lung function [34], which was also found in the gene by smoking interaction study [23].
Another inflammation-related gene, PPAR-alpha, is an isotype of the peroxisome proliferator-activated receptors, which can regulate the induction of inflammatory response [35]. One study showed that COPD patients had higher percentages of PPAR-alpha-positive alveolar macrophages and cells in their alveolar wall in COPD patients [36]. In addition, PPAR-alpha agonists may enhance the inducible isoform of nitric oxide synthase (iNOS) [37], and the expression of iNOS was found to increase in patients with severe COPD [38]. CTLA-4, the cytotoxic T-lymphocyte antigen 4, plays a role in downregulation of T-cell activation [39]. Further, the amount of T-cells correlates with the severity of airway obstruction and alveolar damage [40][41][42][43]. Genomewide linkage analysis of the Boston Early-Onset COPD Study also showed a significant peak for airflow limitation on chromosome 2q, in the region of genes CTLA-4 and ICOS [44,45]. CTLA-4 gene polymorphisms are also associated with chronic bronchitis or COPD [46,47].
In the network analysis, the cluster network built by results of the gene expression experiment after PM exposure was associated with cell cycle regulation. This network of cell cycle regulation connected genes HDAC, PPAR-alpha, CTLA-4 and may imply the possible underlying biological mechanism. Thus, the findings from SNP-level analyses and network analyses complement each other well. The genes found in our study are associated with inflammatory pathways. Genetic variation in the ability to mount an inflammatory response may explain varying individual responses to occupational exposure. Importantly, we observed two genes through SNPlevel analysis (ZNF804A) and network analysis (OPRM1) that interacted with occupational exposure to affect lung function. Although their functions remain to be uncovered, these two genes are significantly associated with many psychiatric diseases (e.g., schizophrenia, alcoholism, and tobacco use disorder) in several studies [48][49][50][51][52][53]. The comorbidity of abnormal pulmonary function or COPD and psychiatric diseases has been reported in many studies [54,55]. These psychiatric diseases might share some common pathway with COPD. However, we must exercise caution in interpreting these findings because smoking prevalence is higher among psychiatric patients [56]; nonetheless, another study controlling for smoking found schizophrenia to be an independent risk factor for COPD [57].
It should be noted that this study has several limitations. The first limitation is the lack of replication in population studies that use similar methods as the FHS, a community-based study with both occupational information and genome-wide genotyping. Therefore, our results are more exploratory in nature. Further, SNPcoverage was low for certain genes, which may result in missing several genes that might interact with occupational exposures. In addition, the mean FEV 1 is higher in the group with highly likely dust exposure. This may be explained by gender distribution, since the proportion of males was higher in the group with high likelihood of dust exposure. Another potential expla- nation is the healthy worker effect confounding bias [58][59][60]. Participants in the highly likely dust exposure group were likely those in better health since most dusty jobs (e.g. heavy labor) require workers to be in better health. This bias may result in underestimation of the occupational exposure effect and reduce the effect size of our findings. Finally, our occupational exposure status was classified based on a JEM having less detailed occupational information than the UCSF COPD JEM; thus, our estimates may not reflect the true frequency. The occupational information in our study derives from a cross-sectional questionnaire, which may not reflect the longest-held job (e.g., participants may have worked in dusty job for 10 years but had switched to a non-dusty job by the time of the survey). These measurement errors might result in an underestimation of the gene by occupational exposure interaction association.
Our study indicates that integrating the results of a gene expression experiment into the GWAS of geneenvironment interaction is informative for exploring novel genes. Future studies should increase the SNP coverage and measure occupational exposure more precisely by collecting more detail occupational information including the duration of each job and detail job titles. Moreover, future research could focus on examining the roles of CTLA-4, HDAC, and PPAR-alpha in the inflammatory airway process. Figure 1 Network for genes interacting with occupational exposure to affect lung function. Network analysis revealed network clusters of genes with occupational exposure interaction effects on lung function (circled) that are connected by the network cluster built from genes affected by exposure to PM [27]. Genes modifying the occupational exposure effect on lung function are indicated by circles (FEV 1 ) and rectangles (FEV 1 /FVC). All other symbols are explained in the Metacore legend (http://ftp.genego.com/files/MC_legend.pdf).