This article has Open Peer Review reports available.
Pooling samples for “top-down” molecular exposomics research: the methodology
© Shen et al.; licensee BioMed Central Ltd. 2014
Received: 17 June 2013
Accepted: 28 January 2014
Published: 13 February 2014
Exposomics is the cutting-edge concept of screening the environmental risk factors for disease. In the novel “top-down” approach, we estimate the molecular exposome by measuring all body fluid analytes in a case-control study. However, to detect diverse pollutants, a sufficient sample size and multiple analytical methods are required. This may lead to dramatically increased costs and research workload.
To help reduce complexity, we suggest a sample pooling strategy along with a scheme for combining both general unknown or multi-targeted screening with targeted analysis. The sample pooling method was tested using computer simulations.
Through comprehensive analysis of pooled samples, it is possible to identify environmental risk factors. Factors are initially screened in the pooled case and control population samples, then in randomly grouped and pooled case and control subpopulation samples. For the sample grouping, five or more pools are suggested, with 30 individuals per pool.
This study suggests that sample pooling is a useful strategy for exposomics research, which provides a hypothesis-free method for pollutant risk screening.
Keywords: Exposome; Exposure biomarker; Metabolome; Effect biomarker; Case-control study; Environmental pollutants; Pooled sample
An estimated 70 to 90 percent of risk factors associated with chronic diseases are thought to arise from individuals’ exposure to environmental hazards [1–3]. This strong association has raised interest in analyzing the ubiquitous environmental risk factors that collectively constitute the “exposome”, a concept first introduced by Wild and later advocated by Rappaport [4, 7]. There is particular interest in analyzing the role of chemical pollutants in epidemiological studies. Chemical pollutant exposomics thus provides an approach to systematically assess pollutant risk for defined health outcomes in a given population.
In a so-called “top-down” strategy [4, 7], all blood analytes, including both small endogenous and exogenous molecules, can be clustered into one of three groups: environmental chemical pollutants and their metabolic residuals (i.e., the chemical pollutant exposome); the blood metabolome response to chemical pollutant exposure; and the response to other environmental risk factors such as noise or radiation. The goal of “top-down” exposomics is to identify biological analytes (i.e., biomarkers) relevant for a defined outcome or disease. Both environmental pollutants and the endogenous metabolome need to be considered, although in principle some analytes may act as confounding factors for one another (for example, environmental pollutants and the metabolomic responses they induce).
The methodology discussed in this study focuses mainly on the broad analysis of environmental pollutants and their metabolites in case-control samples. We propose that the chemical pollutant exposome may be screened in a similar way as in disease-oriented metabolomics analysis. Furthermore, such screening may aid in the development of systematic biological models for assessing and illustrating the risk and metabolic impact associated with the exogenous exposome. Lastly, we show that exposure to arsenic in males is linked to oligozoospermia via analysis of pertinent metabolomic biomarkers.
Quantitative assessment of the risks associated with pollutant exposure [7, 11] can be challenging. For instance, molecular epidemiology usually requires large sample numbers in order to verify hypothesized relationships. Furthermore, bio-monitoring of environmental pollutants is sometimes hampered by their diverse nature, which necessitates a set of integrated methods [12, 13]. In addition, limited sample volume makes multiple measurements on individual samples difficult or impossible. Taken together, the requirement for a large sample size and multiple analyses of each sample proves costly and labor intensive when working with human data. To help overcome these challenges, we propose a sample pooling strategy for molecular exposomics, in which fewer but larger-volume pooled samples are analyzed, at the cost of losing the signatures of individual observations in the case-control study.
Chemical oriented exposomics requires an integrated sample matrix
Persistent organic pollutants (POPs), which tend to accumulate in fatty tissues and redistribute among the other parts of the body via blood as a transport vector. These are usually measured by GC-MS (or HRGC-HRMS) with the exception of some perfluorinated compounds.
Readily degradable compounds, which can readily transform to their metabolites through phase I reactions and further on partly conjugate with endogenous molecules (e.g., glucuronate, sulfate, mercapturic acid ester, acetyl ester, and so on) through phase II reactions to be easily discharged in urine, bile and sweat. These are normally measured using LC-MS techniques.
Accumulated inorganic pollutants, which may be deposited in the kidney (such as cadmium) or bone (such as lead), and which are commonly measured by LC-ICP-MS.
Non-accumulating inorganic pollutants such as arsenic, commonly measured by LC-ICP-MS.
Table 1. Samples and integration of analytical methods for chemical oriented instrumental measurement. [Table body not recovered; surviving cell entries: protein conjugated form; metal or metalloid.]
Pollutants that are more efficiently metabolized accumulate in urine, and urinary bio-monitoring provides a suitable approach for assessing their internal exposure doses. However, blood samples are still usually preferred for monitoring most persistent pollutants. Although many other types of human samples can be used to ascertain pollutant residues [12, 13], blood and urine are the two most viable because of the sampling difficulty, analyte enrichment, and sample preparation complexity that arise in the bio-monitoring of other tissue sample types.
To systematically assess the chemical pollutant exposome, it is necessary to employ a general, untargeted screening analysis. This can be followed by a targeted analysis based on initial findings from the pre-screening. For example, a screening analysis for pollutants could use approaches such as adductomics for characterizing electrophilic chemicals, whereas multi-targeted pollutant analysis could run on priority lists suggested by the US EPA or the European Commission. Since untargeted screening analysis usually has lower sensitivity than commonly used targeted analysis, it is necessary to use pollutant-enriched matrices in order to trace environmental contamination levels. Although blood is suitable for the measurement of a diverse range of pollutants [12, 13], pollutant metabolites enriched in urine are usually less abundant in blood, rendering urine complementary to blood for comprehensive exposome analysis.
Chemical-orientated exposomics requires an integrated analytical approach
Complementary techniques are also required for screening the diverse chemicals in urine and blood (Table 1). Usually, investigations of the chemical exposome are based on different types of mass spectrometers [17–19]. For targeted and/or untargeted detection of various trace level compounds, high throughput and sensitive mass spectrometry (MS) techniques, such as high resolution MS (HRMS), time-of-flight MS (TOF-MS), and Orbitrap MS have been employed. At present, the coupling of different chromatographic separations with various MS instruments and ionization techniques provides the most sensitive and specific platform for coping with the wide variety of molecules present in human tissues. For example, in a common configuration, gas chromatography (GC) is used for the separation of thermally stable, volatile and less polar molecules, while liquid chromatography (LC) is used for the separation of thermally labile, non-volatile, more polar chemicals. Lastly, at least three different ionization sources are typically required: electron impact (EI) for volatile or semi-volatile organic pollutants, electrospray ionization (ESI) for readily ionizable water-soluble polar pollutants, and inductively coupled plasma (ICP) for inorganic pollutants.
To analyze different types of chemicals, multiple analyses of the same samples are also generally required. For instance, POPs such as PCBs and PBDEs can be analyzed by HRGC-HRMS in at least 50 μL of blood without a complex sample cleanup procedure, whereas another 50 μL of blood might be investigated for perfluorinated compounds by LC-ESI-MS. One integration scheme for dealing with systemic measurement is proposed in Table 1. By applying well-developed omics approaches to the task of biomarker mining [22, 23], it is possible to extract risk factors from exposome data.
As discussed earlier, measuring individual samples in an exposomic study of a large population not only increases the analytical workload, but can also rapidly become prohibitively expensive. Additionally, to profile and quantify the chemical exposome in a case-control study, large sample volumes are required from participants (typically blood, urine or both) in order to facilitate multiple measurements. In a typical bio-monitoring study, 2-5 mL of blood and 2-10 mL of spot urine are collected; however, such small sample volumes may not be sufficient to screen for all pollutants. Furthermore, collection of larger sample volumes may lead to decreased study participation. The sample pooling strategy suggested in this study is designed to help mitigate these problems.
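The pairing of pollutant class, separation and ionization source described above can be written down as a small lookup. The following Python sketch is only an illustration: the class names, the `Platform` type and the `plan_analyses` helper are hypothetical, and the preferred matrix recorded for each class is a simplifying assumption rather than a rule stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    separation: str  # chromatographic separation (GC or LC)
    ionization: str  # ionization source (EI, ESI or ICP)
    matrix: str      # preferred sample matrix (illustrative assumption)


# Hypothetical lookup table summarizing the configuration described in the text.
PLATFORMS = {
    "persistent_organic": Platform("GC", "EI", "blood"),
    "polar_degradable": Platform("LC", "ESI", "urine"),
    "inorganic": Platform("LC", "ICP", "blood or urine"),
}


def plan_analyses(pollutant_classes):
    """Return the distinct platforms needed to cover the requested classes."""
    plan = []
    for cls in pollutant_classes:
        platform = PLATFORMS[cls]
        if platform not in plan:
            plan.append(platform)
    return plan


if __name__ == "__main__":
    for p in plan_analyses(["persistent_organic", "inorganic", "inorganic"]):
        print(f"{p.separation}-{p.ionization}-MS on {p.matrix}")
```

Each distinct platform in the returned plan corresponds to one additional measurement of the same sample, which is why the number of classes screened drives the sample volume required.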
Preliminary exposome analysis using pooled population samples
For disease-oriented exposome analysis, we suggest pooling samples separately for case and control populations. Equal fractions of individual samples are mixed from each population separately, and afterwards qualitative and semi-quantitative screening of the composite samples can be conducted. To adjust for potentially confounding factors such as sex, age or race, we also advocate sample stratification before pooling. Sample pooling has two advantages: (1) sample volumes are large enough for multiple measurements, and (2) the analytical workload can be greatly reduced. Large sample volumes allow measurements to be repeated 6-10 times, which provides the power to demonstrate the statistical significance of the analytical variation of the mean value [25–27]. In addition, large sample volumes are suitable for measuring a wide range of analytes using different methodologies.
However, while pooled samples facilitate the robust analysis of diverse chemical analytes, only the population means (μ_case and μ_control) are available following such analysis [25–28]. In other words, of the two sources of variation, only the between-measurement analytical variation is available, not the between-subject biological variation. The analyte distributions in the case and control samples are therefore forfeited, including the variances (σ²_case and σ²_control), which compromises the comparison between case and control. Although less straightforward than for unpooled samples, one can nonetheless estimate the analytical variation by performing at least 6 repeated measurements of the pooled blood or urine samples. Importantly, using pooled samples, it is still possible to compute the population mean ratio (μ_case / μ_control), or fold change (FC), between case and control, a common measure for establishing differences in analyte concentration. Furthermore, as with gene microarray analysis and metabolomics analysis, FCs can be used for establishing references or cut-offs. Because the pooled samples are defined by disease status, the preliminary pollutant screening may directly link the chemicals to their risks.
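As a minimal sketch of how stratified, equal-fraction pooling might be planned, the Python code below spreads each stratum evenly across a fixed number of pools and estimates how many assay runs a pool supports. The function names, the subject record layout and the 100 μL aliquot per individual are assumptions made for illustration; only the 50 μL per assay figure echoes the volumes mentioned earlier.

```python
import random
from collections import defaultdict


def stratified_pools(subjects, n_pools, key, seed=0):
    """Randomly assign subjects to pools so that each stratum is spread evenly.

    subjects: list of records; key: function returning a sortable stratum label.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for subject in subjects:
        strata[key(subject)].append(subject)
    pools = [[] for _ in range(n_pools)]
    offset = 0  # continue the round-robin across strata to keep pool sizes balanced
    for label in sorted(strata):
        members = strata[label]
        rng.shuffle(members)
        for i, subject in enumerate(members):
            pools[(offset + i) % n_pools].append(subject)
        offset += len(members)
    return pools


def pool_budget(pool_size, aliquot_ul, assay_ul):
    """Return the pooled volume (in uL) and the number of assay runs it supports."""
    total = pool_size * aliquot_ul
    return total, total // assay_ul


if __name__ == "__main__":
    # Hypothetical case population: 150 subjects, stratified by sex.
    cases = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(150)]
    pools = stratified_pools(cases, n_pools=5, key=lambda s: s["sex"])
    print([len(p) for p in pools])
    print(pool_budget(pool_size=30, aliquot_ul=100, assay_ul=50))
```

The same routine would be run separately on the case and control populations, so that pools never mix disease status.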
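The quantities that remain available from pooled samples can be computed in a few lines. The sketch below, in Python, takes repeated measurements of one pooled case sample and one pooled control sample and reports each mean, its analytical coefficient of variation, and the fold change. The replicate values are invented purely for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev


def summarize_pool(replicates):
    """Mean and analytical coefficient of variation of repeated measurements."""
    m = mean(replicates)
    cv = stdev(replicates) / m
    return m, cv


def fold_change(case_replicates, control_replicates):
    """Ratio of the pooled case mean to the pooled control mean."""
    return mean(case_replicates) / mean(control_replicates)


if __name__ == "__main__":
    # Invented replicate concentrations (arbitrary units), six runs per pool.
    case = [10.2, 9.8, 10.1, 9.9, 10.0, 10.0]
    control = [5.1, 4.9, 5.0, 5.0, 4.9, 5.1]
    print(summarize_pool(case))
    print(summarize_pool(control))
    print(fold_change(case, control))  # close to 2 for these invented values
```

Note that the coefficient of variation here reflects only measurement repeatability; it carries no information about the between-subject variation that pooling removes.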
Primary exposome analysis using pooled subpopulation samples
The inter-subpopulation variance analysis of the means of pooled subpopulation samples could augment disease-oriented analyte screening. It stands to reason that if there is a significant difference in an analyte X between case and control populations, there should also be a significant difference between the constituent subpopulations.
Since subpopulation means and their inter-subpopulation variances can be measured in a cost-efficient manner, the sample pooling technique has been proposed for contaminant biomonitoring. Although statistically significant differences in sample means between case and control can be calculated using pooled samples, the statistical sensitivity to detect differences may still decrease because individual variation is lost. Pooled data therefore have lower sensitivity than individual data as a tradeoff. However, as the number of pools is increased, the sensitivity of case-control differences calculated using subpopulation means approaches the sensitivity calculated using individual data. Ultimately, it is a matter of finding an adequate balance between the number of sample pools and the required precision. To investigate this interdependency more thoroughly, we conducted simulations using data derived from a mathematical model (artificial data), and also applied the methods to data obtained from the measurement of urinary arsenic.
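The balance between the number of pools and sensitivity can be explored with a small simulation. The Python sketch below is not the simulation used in this study: it assumes log-normally distributed individual concentrations with a log-scale standard deviation of 0.5, treats each pool measurement as the arithmetic mean of its members, and compares case and control pool means with an exact permutation test, which is a substitute chosen here because it needs only the standard library. All function names and default parameters are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean


def simulate_pool_means(n_pools, pool_size, median, sigma, rng):
    """Draw log-normal individuals and return the arithmetic mean of each pool."""
    mu = math.log(median)
    return [
        mean(rng.lognormvariate(mu, sigma) for _ in range(pool_size))
        for _ in range(n_pools)
    ]


def exact_permutation_p(case, control):
    """Two-sided exact permutation p-value for a difference in group means."""
    values = case + control
    n_case, n_all = len(case), len(values)
    total = sum(values)
    observed = abs(mean(case) - mean(control))
    hits = count = 0
    for idx in combinations(range(n_all), n_case):
        s = sum(values[i] for i in idx)
        diff = abs(s / n_case - (total - s) / (n_all - n_case))
        hits += diff >= observed - 1e-12  # tolerance for floating-point rounding
        count += 1
    return hits / count


def detection_rate(fold_change, n_pools=5, pool_size=30, sigma=0.5,
                   trials=200, alpha=0.05, seed=1):
    """Fraction of simulated studies in which the pool means differ at level alpha."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        control = simulate_pool_means(n_pools, pool_size, 1.0, sigma, rng)
        case = simulate_pool_means(n_pools, pool_size, fold_change, sigma, rng)
        detected += exact_permutation_p(case, control) < alpha
    return detected / trials


if __name__ == "__main__":
    for fc in (1.0, 1.2, 1.5, 2.0):
        print(fc, detection_rate(fc))
```

One property of this particular test is worth noting: with three pools per group there are only 20 possible relabelings, so the two-sided p-value can never fall below 0.1, and `detection_rate(2.0, n_pools=3)` returns 0.0 regardless of the true fold change. Under these assumptions, a minimum number of pools is needed before any difference can reach conventional significance.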
Simulation sample pooling strategy for case-control comparison: artificial data
Simulation sample pooling strategy for case-control comparison: measured data
Table 2. Descriptive statistics of arsenic species in the case-1, case-2, and control subjects. [Table body not recovered; surviving column headers: arsenic in case-1 (n=157), arsenic in case-2 (n=140), arsenic in control (n=151); surviving row labels: arsenite (AsIII), arsenate (AsV).]
In contrast to the artificial log-normally distributed data (Figures 1 and 2), pooled subpopulation means are not sensitive enough to detect minor but significant differences in arsenic between case and control when FC ≤ 1.5 (Table 2). This may be because real data distributions are rarely exactly log-normal. However, even if only the subpopulation means are measured, they are still sensitive enough to reveal health risk factors, as shown for AsV and total arsenic with FC ≥ 2, which have been linked to male infertility risk in a dose-dependent manner.
Schematic methodology for exposome analysis
In a traditional molecular epidemiology study (i.e., case-control design), monitoring of individual samples without pooling may prove a sufficient approach. However, in molecular exposomics, a broad-spectrum strategy for hypothesis-free pollutant risk screening, sample pooling is a valuable tool for reducing costs, workload and the need for large individual sample volumes. In this context, the present methodology provides a viable alternative, starting with screening analysis in pooled population samples, followed by randomized grouping of subpopulation samples. Although it has some disadvantages, such as the loss of individual information, it is nonetheless a useful tool. We suggest that approximately five pools and 30 samples per pool from the case and control populations may be sufficient to investigate the disease risk of chemical pollution. However, more sample pools and more samples per pool will generally improve the sensitivity and precision of the comparison.
We are grateful to Justin Feigelman for help with the style and language of our paper. This work was financially supported by the Chinese Academy of Sciences (CAS) Hundred Talent Program 2010 for Human Exposure to Environmental Pollutant and Health Effect, the NSFC 2011 research foundation (21177123), and the CAS/SAFEA International Partnership Program for Creative Research Teams (KZCX2-YW-T08).
- Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, Pukkala E, Skytthe A, Hemminki K: Environmental and heritable factors in the causation of cancer - analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med. 2000, 343 (2): 78-85. 10.1056/NEJM200007133430201.
- Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, Manolio TA: Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci. 2009, 106 (23): 9362-9367. 10.1073/pnas.0903103106.
- Willett WC: Balancing life-style and genomics research for disease prevention. Science. 2002, 296 (5568): 695-698. 10.1126/science.1071055.
- Rappaport SM: Implications of the exposome for exposure science. J Expo Sci Environ Epidemiol. 2011, 21 (1): 5-9. 10.1038/jes.2010.50.
- Callaway E: Daily dose of toxics to be tracked. Nature. 2012, 491 (7426): 647-647. 10.1038/491647a.
- Wild CP: Complementing the genome with an “exposome”: the outstanding challenge of environmental exposure measurement in molecular epidemiology. Canc Epidemiol Biomarkers Prev. 2005, 14 (8): 1847-1850. 10.1158/1055-9965.EPI-05-0456.
- Rappaport SM, Smith MT: Environment and disease risks. Science. 2010, 330 (6003): 460-461. 10.1126/science.1192603.
- Zou W, She J, Tolstikov VV: A comprehensive workflow of mass spectrometry-based untargeted metabolomics in cancer metabolic biomarker discovery using human plasma and urine. Metabolites. 2013, 3 (3): 787-819. 10.3390/metabo3030787.
- Krysiak-Baltyn K, Toppari J, Skakkebaek N, Jensen TS, Virtanen H, Schramm KW, Shen H, Vartiainen T, Kiviranta H, Taboureau O: Association between chemical pattern in breast milk and congenital cryptorchidism: modelling of complex human exposures. Int J Androl. 2012, 35 (3): 294-302. 10.1111/j.1365-2605.2012.01268.x.
- Shen H, Xu W, Zhang J, Chen M, Martin FL, Xia Y, Liu L, Dong S, Zhu Y: Urinary metabolic biomarkers link oxidative stress indicators associated with general arsenic exposure to male infertility in a Han Chinese population. Environ Sci Tech. 2013, 47: 8843-8851.
- Schramm K-W, Wang J, Bi Y, Temoka C, Pfister G, Henkelmann B, Scherb H: Chemical-and effect-oriented exposomics: Three Gorges Reservoir (TGR). Environ Sci Pollut Res. 2012, 1-6.
- Barr DB, Wang RY, Needham LL: Biologic monitoring of exposure to environmental chemicals throughout the life stages: requirements and issues for consideration for the National Children’s Study. Environ Health Perspect. 2005, 113 (8): 1083-1091. 10.1289/ehp.7617.
- Smolders R, Schramm K-W, Stenius U, Grellier J, Kahn A, Trnovec T, Sram R, Schoeters G: A review on the practical application of human biomonitoring in integrated environmental health impact assessment. J Toxicol Environ Health B. 2009, 12 (2): 107-123. 10.1080/15287390802706397.
- Rappaport SM, Li H, Grigoryan H, Funk WE, Williams ER: Adductomics: characterizing exposures to reactive electrophiles. Toxicol Lett. 2012, 213 (1): 83-90. 10.1016/j.toxlet.2011.04.002.
- Priority pollutants. http://water.epa.gov/scitech/methods/cwa/pollutants.cfm
- Endocrine disruptors. http://ec.europa.eu/environment/chemicals/endocrine/index_en.htm
- Mounicou S, Szpunar J, Lobinski R: Metallomics: the concept and methodology. Chem Soc Rev. 2009, 38 (4): 1119-1138. 10.1039/b713633c.
- Rubino FM, Pitton M, Di Fabio D, Colombi A: Toward an “omic” physiopathology of reactive chemicals: thirty years of mass spectrometric study of the protein adducts with endogenous and xenobiotic compounds. Mass Spectrom Rev. 2009, 28 (5): 725-784. 10.1002/mas.20207.
- Liebler DC: Proteomic approaches to characterize protein modifications: new tools to study the effects of environmental exposures. Environ Health Perspect. 2002, 110 (Suppl 1): 3-9.
- Lu D, Wang D, Ip HSS, Barley F, Ramage R, She J: Measurements of polybrominated diphenyl ethers and polychlorinated biphenyls in a single drop of blood. J Chromatogr B. 2012, 891: 36-43.
- Ma W, Kannan K, Wu Q, Bell EM, Druschel CM, Caggana M, Aldous KM: Analysis of polyfluoroalkyl substances and bisphenol A in dried blood spots by liquid chromatography tandem mass spectrometry. Anal Bioanal Chem. 2013, 405 (12): 4127-4138. 10.1007/s00216-013-6787-3.
- Nicholson JK, Lindon JC, Holmes E: ‘Metabonomics’: understanding the metabolic responses of living systems to pathophysiological stimuli via multivariate statistical analysis of biological NMR spectroscopic data. Xenobiotica. 1999, 29 (11): 1181-1189. 10.1080/004982599238047.
- Peng S, Yan L, Zhang J, Wang Z, Tian M, Shen H: An integrated metabonomics and transcriptomics approach to understanding metabolic pathway disturbance induced by perfluorooctanoic acid. J Pharm Biomed Anal. 2013, 86: 56-64.
- Li H, Grigoryan H, Funk WE, Lu SS, Rose S, Williams ER, Rappaport SM: Profiling Cys34 adducts of human serum albumin by fixed-step selected reaction monitoring. Mol Cell Proteomics. 2011, 10 (3): M110.004606. 10.1074/mcp.M110.004606.
- Wahrendorf J, Hanck A, Munoz N, Vuilleumier J, Walker A, Hoover JJ: Vitamin measurements in pooled blood samples. Am J Epidemiol. 1986, 123 (3): 544-550.
- Caudill SP: Important issues related to using pooled samples for environmental chemical biomonitoring. Stat Med. 2011, 30 (5): 515-521. 10.1002/sim.3885.
- Schisterman EF, Perkins NJ, Liu A, Bondell H: Optimal cut-point and its corresponding Youden Index to discriminate individuals using pooled blood samples. Epidemiology. 2005, 16 (1): 73-81. 10.1097/01.ede.0000147512.81966.ba.
- Caudill SP: Characterizing populations of individuals using pooled samples. J Expo Sci Environ Epidemiol. 2010, 20 (1): 29-37. 10.1038/jes.2008.72.
- Voigt K, Bruggemann R, Scherb H, Cok I, Mazmanci B, Mazmanci MA, Turgut C, Schramm K-W: Evaluation of organochlorine pesticides in breast milk samples in Turkey applying features of the partial order technique. Int J Environ Health Res. 2013, 23 (3): 226-246. 10.1080/09603123.2012.717915.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.