Geographic risk modeling of childhood cancer relative to county-level crops, hazardous air pollutants and population density characteristics in Texas

Background Childhood cancer has been linked to a variety of environmental factors, including agricultural activities, industrial pollutants and population mixing, but etiologic studies have often been inconclusive or inconsistent when considering specific cancer types. More specific exposure assessments are needed. It would be helpful to optimize future studies to incorporate knowledge of high-risk locations or geographic risk patterns. The objective of this study was to evaluate potential geographic risk patterns in Texas, accounting for the possibility that multiple cancers may have similar geographic risk patterns. Methods A spatio-temporal risk modeling approach was used, whereby 19 childhood cancer types were modeled as potentially correlated within county-years. The standardized morbidity ratios were modeled as functions of intensive crop production, intensive release of hazardous air pollutants, population density, and rapid population growth. Results There was supportive evidence for elevated risks for germ cell tumors and "other" gliomas in areas of intense cropping and for hepatic tumors in areas of intense release of hazardous air pollutants. The risk for Hodgkin lymphoma appeared to be reduced in areas of rapidly growing population. Four cancer histotypes, "other" leukemias, Central Nervous System (CNS) embryonal tumors, CNS "other" gliomas and hepatic tumors, had a greater than 95% likelihood of elevated risk in at least one county. Conclusion The Bayesian implementation of the Multivariate Conditional Autoregressive model provided a flexible approach to the spatial modeling of multiple childhood cancer histotypes. The current study identified geographic factors supporting more focused studies of germ cell tumors and "other" gliomas in areas of intense cropping, hepatic cancer near Hazardous Air Pollutant (HAP) release facilities and specific locations with increased risks for CNS embryonal tumors and for "other" leukemias.
Further study should be performed to evaluate potentially lower risk for Hodgkin lymphoma and malignant bone tumors in counties with rapidly growing population.


Background
Childhood cancer has been linked to a variety of environmental factors, including agricultural activities, industrial pollutants and population mixing, but etiologic studies have often been inconclusive or inconsistent when considering specific cancer types. More specific exposure assessments are needed. It would be helpful to optimize future studies to incorporate knowledge of high-risk locations or geographic risk patterns. Bayesian methods have come to predominate in disease mapping applications [1]. This emergence has been largely attributed to advances in computer hardware that have enabled Markov Chain Monte Carlo (MCMC) implementations of relatively complex Bayesian models [2], and recently developed software has made these techniques readily available to health researchers [3]. One potential advantage of performing risk estimation within a Bayesian framework is that inference is based on the posterior certainty of parameters or risks and, in a hierarchical Bayes approach, the risk can apply to a lower organizational unit such as the individual [1]. Thus, the risk estimate would apply to an individual considering alternative living locations.
Pesticide exposure has long been implicated as a cause of childhood cancer and has been the focus of multiple studies; however, an unambiguous mechanistic cause-and-effect relationship has not been demonstrated [4]. Some studies evaluating pesticide exposure used cropping intensity as an exposure surrogate and implicated farm or rural living as a positive risk factor [5]. These and other geographic studies have concentrated on geopolitical boundaries or buffers around point sources and have produced inconsistent results when each individual cancer type is compared among studies [6][7][8][9][10]. Even if an association were consistent, rural communities differ from urban communities in many ways, including population density characteristics and the extent of industrial pollution. Further research should focus on high-risk areas to evaluate specific exposures and specific cancer types.
Hazardous air pollutants (HAP) have been linked to increased cancer risks for individuals living close to major point-source HAP releases. For example, childhood cancers and leukemias in Great Britain exhibited geographical clustering of birth places close to environmental hazards that included large-scale combustion processes, processes using volatile organic compounds and waste incineration [11][12][13]. When area-source HAP were modeled at the census tract level, the modeled values were related to leukemia rates in California [14]. Automobile exhaust is an area-source HAP that has received considerable scrutiny as a potential cause of childhood cancer. These studies have shown conflicting results, and a critical review concluded that the weight of the epidemiological evidence indicates no increased risk of childhood cancer associated with exposure to traffic-related residential air pollution [15]. If a surrogate exposure, such as proximity to releases, is related to a rare disease such as childhood cancer, then investigation should focus on the higher-risk locations.
Infectious causes of childhood cancer have been proposed, and population characteristics of stability or mixing have been proposed and evaluated as risk factors [16]. An Ohio study examined the geographic distribution of childhood leukemias relative to population density, population growth, and rural/urban locale. It found higher rates of acute lymphocytic leukemia among the counties with the most rapid population growth and reduced risk of acute myeloid leukemia in the most urbanized counties. The authors reasoned that the findings supported population mixing as a cause of some childhood cancers [17]. However, risks attributed to mixing at the population level must ultimately be estimated and communicated at the individual level. The risks to an individual of moving, or of being exposed to movers, should be separated and estimated in a more focused study.
The three types of proposed causal factors (cropping, HAP release and population density characteristics) are especially likely to be confounded in Texas, where the spatial relationships among agricultural activity, industrial locations and population characteristics are complex. The objective of this study was to perform Bayesian geographic risk modeling of childhood cancer, accounting for potential correlations among histotypes. Geographic patterns were assessed relative to county-level cropping intensity, intensive industrial releases of HAP, and population density and growth. The goal of the study was to estimate the risk to an individual child based on specific characteristics of the mother's living location at the time of the child's birth. Once higher-risk locations are identified and characterized, more specific personal risk models can be developed.
Methods

Cancer diagnoses were grouped into 19 groups based on the most recent International Classification of Childhood Cancer, Third Edition (ICCC-3) [18]. Some very rare cancer types were pooled as follows: ICCC-3 subgroups Ic, Id and Ie were pooled and named "other leukemias"; subgroups IIb, IIc, IId and IIe were pooled into a single group labeled "non-Hodgkin lymphoma"; and subgroups IIIe and IIIf were pooled into a group called "other CNS tumors." The database provided records for 3,718 cancer cases distributed among the 19 histotype groups and 3,805,745 total births.
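As an illustration only, the pooling rule for very rare cancer types can be written as a lookup table. This minimal Python sketch assumes that each case record carries an ICCC-3 subgroup code such as "Ic"; the names `POOLED_GROUPS`, `analysis_group` and `count_by_group` are hypothetical and not part of the study's software, and codes that are not listed are simply returned unchanged rather than mapped onto the full 19-group scheme.

```python
# Hypothetical sketch: ICCC-3 subgroups pooled into the three combined groups
# described in the text. Codes not listed are returned unchanged; this does not
# attempt to reproduce the full 19-group scheme.
POOLED_GROUPS = {
    "Ic": "other leukemias",
    "Id": "other leukemias",
    "Ie": "other leukemias",
    "IIb": "non-Hodgkin lymphoma",
    "IIc": "non-Hodgkin lymphoma",
    "IId": "non-Hodgkin lymphoma",
    "IIe": "non-Hodgkin lymphoma",
    "IIIe": "other CNS tumors",
    "IIIf": "other CNS tumors",
}


def analysis_group(subgroup):
    """Return the analysis group for an ICCC-3 subgroup code such as 'Ic'."""
    return POOLED_GROUPS.get(subgroup, subgroup)


def count_by_group(subgroups):
    """Tally case records, given as ICCC-3 subgroup codes, by analysis group."""
    counts = {}
    for code in subgroups:
        group = analysis_group(code)
        counts[group] = counts.get(group, 0) + 1
    return counts


print(count_by_group(["Ia", "Ic", "Ie", "IIb", "IIIf", "IIIc"]))
```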

County-level agronomy practices
To evaluate annual crop production, data were retrieved from the Texas Almanac Characterization Tool Version 2.0.4 (Blackland Research and Extension Center, Texas Agricultural Experiment Station, Texas A&M University System, 720 East Blackland Road, Temple, TX, USA). By acreage, there are four major crops in Texas: corn, cotton, wheat, and sorghum. When the combined total acreage planted in these crops exceeded 20% of the county's total area, the county-year was classified as intensive cropping. The definition was chosen to identify the highest-production locations while maintaining an adequate number of high-production county-years for estimation stability.
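The cropping rule can be expressed as a simple indicator function. The Python sketch below is hypothetical: the function name, the dictionary layout and the use of acres for both planted area and county area are assumptions, and the crop list follows the four crops (corn, sorghum, cotton and wheat) named in the Results and in Figure 3.

```python
# Hypothetical sketch; crop names and acre units are illustrative assumptions.
MAJOR_CROPS = ("corn", "sorghum", "cotton", "wheat")
CROPPED_SHARE_THRESHOLD = 0.20  # more than 20% of county area -> intensive cropping


def intensive_cropping(planted_acres, county_area_acres):
    """Return 1 (CROPS indicator) if the four major crops cover > 20% of the county."""
    total_planted = sum(planted_acres.get(crop, 0.0) for crop in MAJOR_CROPS)
    return int(total_planted / county_area_acres > CROPPED_SHARE_THRESHOLD)


# 130,000 of 600,000 acres (about 21.7%) planted in the four crops
print(intensive_cropping({"corn": 50_000, "cotton": 80_000}, 600_000))  # 1
```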

County-level HAP
Hazardous air pollutants are substances that are known to be carcinogenic or to cause other serious health problems. The Environmental Protection Agency (EPA) currently identifies and records the release of 188 HAP. Data on Texas industries with air emissions of chemicals were obtained from the Toxics Release Inventory (TRI) program, a publicly available database of toxic chemical releases. This inventory was established under the Emergency Planning and Community Right-to-Know Act of 1986 (EPCRA) and expanded by the Pollution Prevention Act of 1990. The EPA compiles the TRI data each year and makes them available through several data access tools, including TRI Explorer and Envirofacts. The data are available either as county emission summaries (county-level) or as facility-specific emissions (point-source). Releases from four industries were retrieved: petroleum refining and related industries (Standard Industrial Classification (SIC) Major Group 29), primary metal industries (SIC Major Group 33), chemical industries (SIC Major Group 28) and plastics production (SIC Major Group 30). The total releases were summed to identify high-release county-years. For year-to-year consistency, the 1988 core chemical list was used. A county-year in which more than 100 tonnes of toxic substances were released was classified as high-intensity HAP release. This definition identified the highest-release county-years while maintaining enough intensive-release county-years for estimation stability.
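A minimal sketch of how the high-release indicator might be computed from TRI records, assuming the records for one county-year have already been restricted to the 1988 core chemicals and converted to tonnes. The tuple layout `(sic_major_group, tonnes)` and the function name are illustrative assumptions, not the TRI Explorer or Envirofacts data format.

```python
# Hypothetical sketch; assumes releases are already limited to the 1988 core
# chemicals and expressed in tonnes for a single county-year.
SELECTED_SIC_MAJOR_GROUPS = {28, 29, 30, 33}
HIGH_RELEASE_TONNES = 100.0


def high_hap_release(releases):
    """Return 1 (HAPS indicator) if the selected industries released > 100 tonnes.

    releases: iterable of (sic_major_group, tonnes) pairs for one county-year.
    """
    total = sum(tonnes for sic, tonnes in releases if sic in SELECTED_SIC_MAJOR_GROUPS)
    return int(total > HIGH_RELEASE_TONNES)


# 60 + 45 = 105 tonnes from SIC 29 and 28; the SIC 20 release is not counted
print(high_hap_release([(29, 60.0), (28, 45.0), (20, 500.0)]))  # 1
```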

County-level population density
Counties were classified using population estimates from the U.S. Census Bureau; the same source provided estimates for intercensal years. County-years with populations of more than one million were classified as metropolitan, and county-years with more than 50,000 residents were classified as urban; these are standard definitions used by the U.S. Census Bureau. County-years with population growth of more than 1% from the previous year were classified as rapid growth. This definition was chosen to be comparable to a recent study that evaluated a similar growth rate [17].
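The population classifications can be sketched the same way. This hypothetical Python function assumes that the metropolitan and urban indicators are mutually exclusive (a county-year with more than one million residents is coded METRO but not URBAN, with rural county-years as the reference); the text does not state whether the two categories overlap.

```python
METRO_MIN = 1_000_000   # more than one million residents
URBAN_MIN = 50_000      # more than 50,000 residents
GROWTH_MIN = 0.01       # more than 1% growth from the previous year


def population_indicators(population, previous_population):
    """Return METRO, URBAN and GROWTH indicators for one county-year (hypothetical).

    Assumption: METRO and URBAN are mutually exclusive, so a county-year above
    one million residents is coded METRO only; rural county-years are 0 for both.
    """
    metro = population > METRO_MIN
    urban = population > URBAN_MIN and not metro
    growth = (population - previous_population) / previous_population > GROWTH_MIN
    return {"METRO": int(metro), "URBAN": int(urban), "GROWTH": int(growth)}


print(population_indicators(60_000, 59_800))  # {'METRO': 0, 'URBAN': 1, 'GROWTH': 0}
```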

Disease Modeling
The hierarchical modeling approach followed a general framework. The observed counts $Y_{kij}$ of childhood cancer histotype $k$ in county $i$ and year $j$ were assumed to follow independent Poisson distributions conditional on an unknown mean $E_{kij}\exp(u_{kij})$, that is, $Y_{kij} \sim \mathrm{Poisson}\left(E_{kij}\exp(u_{kij})\right)$. The expected count $E_{kij}$ for histotype $k$ in county $i$ and year $j$ was obtained by internal standardization within the dataset, accounting for race, so that for each histotype the sum of expected cases exactly equaled the sum of observed cases. Race was defined as the mother's race as recorded in one of four classes on the birth record: white, black, Hispanic and other. Year was defined as the calendar year of birth, 1990 to 2002 inclusive. Hence $\exp(u_{kij})$ is the standardized morbidity ratio (SMR). County-years with $\exp(u_{kij}) > 1$ had a greater number of observed cancer cases than expected, and the reverse held for county-years with $\exp(u_{kij}) < 1$. The log-SMR $u_{kij}$ was modeled linearly for $k = 1, \ldots, 19$ histotypes, $i = 1, \ldots, 254$ counties and $j = 1, \ldots, 13$ years as
$$u_{kij} = \alpha_k + S_{ki} + \beta_{1k}\,\mathrm{HAPS}_{ij} + \beta_{2k}\,\mathrm{CROPS}_{ij} + \beta_{3k}\,\mathrm{METRO}_{ij} + \beta_{4k}\,\mathrm{URBAN}_{ij} + \beta_{5k}\,\mathrm{GROWTH}_{ij}.$$
The $\alpha_k$ represent the histotype-specific intercept terms for the baseline log-SMR across all counties and were assigned 19 independent flat priors. The $S_{ki}$ represent the county- and histotype-specific log-SMR due to unmeasured or random county effects. The indicator variables $\mathrm{HAPS}_{ij}$, $\mathrm{CROPS}_{ij}$, $\mathrm{METRO}_{ij}$, $\mathrm{URBAN}_{ij}$ and $\mathrm{GROWTH}_{ij}$ were derived from the data as previously described for high-intensity HAP release, high crop production, metropolitan, urban, and rapid population growth county-years, respectively. The $\beta$ coefficients represent the log-relative risks for the county characteristics and were assigned non-informative normal prior distributions.
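Internal standardization by race can be illustrated with a short sketch. Under the assumption that the expected count is the sum, over race strata, of births in the county-year multiplied by the race-specific case rate pooled over all counties and years, expected and observed totals agree exactly for each histotype, as the text requires. The data layout and the function name are hypothetical; the study's exact standardization procedure is not given beyond this description.

```python
from collections import defaultdict


def expected_counts(births, cases):
    """Race-standardized expected counts for one histotype (hypothetical sketch).

    births: {(county, year, race): number of births}
    cases:  {(county, year, race): observed cases of the histotype}
    Returns {(county, year): expected count E}.
    """
    births_by_race = defaultdict(float)
    cases_by_race = defaultdict(float)
    for key, n_births in births.items():
        race = key[2]
        births_by_race[race] += n_births
        cases_by_race[race] += cases.get(key, 0)

    # Race-specific case rates pooled over all counties and years (internal standard)
    rate = {
        race: cases_by_race[race] / births_by_race[race]
        for race in births_by_race
        if births_by_race[race] > 0
    }

    expected = defaultdict(float)
    for (county, year, race), n_births in births.items():
        expected[(county, year)] += n_births * rate.get(race, 0.0)
    return dict(expected)


births = {
    ("A", 1990, "white"): 100, ("A", 1990, "black"): 50,
    ("B", 1990, "white"): 300, ("B", 1990, "black"): 50,
}
cases = {("A", 1990, "white"): 1, ("B", 1990, "white"): 3, ("B", 1990, "black"): 2}
E = expected_counts(births, cases)
# The sum of expected counts equals the sum of observed cases
print(E, sum(E.values()), sum(cases.values()))
```

A crude county-year SMR would then be the observed count divided by $E$; the model instead estimates $\exp(u_{kij})$ with smoothing.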

Disease Mapping
The risk modeling was extended to derive overall spatial estimates for the 254 Texas counties from the 3,302 county-years in the model previously described. Some of the geographic risk factors changed within a county from year to year. To evaluate each county's overall risk, the mean of each risk-factor indicator over the 13 years was calculated and used to estimate the county's overall risk attributable to the measured factors. The spatial model also adjusted risks for spatial associations and histotype correlations through the multivariate conditional autoregressive (MCAR) structure, estimated fully conditionally on all factors in the Disease Model described previously. The quantity mapped was the posterior probability that the SMR was greater than one [19]. This quantity depends on both the magnitude and the precision of the SMR estimate and was chosen to support the objective of focusing further research on high-risk combinations of location and histotype. Establishing the probability of an increased risk is generally considered the first step in investigating a possible cluster, and it served the objective of identifying the locations with the highest likelihood of elevated risk for further geographically focused studies. Spatial estimates were plotted using commercially available GIS software (ArcView® GIS 3.2, Environmental Systems Research Institute, Inc., Redlands, CA).
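The mapped quantity, the posterior probability that a county's SMR exceeds one, can be estimated directly from MCMC draws, because SMR = exp(u) > 1 exactly when u > 0. The hypothetical sketch below averages each indicator over a county's years, as described in the paragraph on disease mapping, and ignores any temporal effect; the draw layout `(alpha, betas, spatial_effect)` is an assumption for illustration, not the WinBUGS output format.

```python
def county_exceedance_probability(draws, indicator_years):
    """Posterior probability that a county's SMR exceeds one (hypothetical sketch).

    draws: list of (alpha, betas, spatial_effect) tuples, one per retained MCMC
        sample, where betas holds the five indicator coefficients for the histotype
    indicator_years: per-year indicator vectors for one county, each ordered as
        [HAPS, CROPS, METRO, URBAN, GROWTH]
    """
    n_years = len(indicator_years)
    # Mean of each indicator over the county's years (share of years "at risk")
    mean_x = [sum(col) / n_years for col in zip(*indicator_years)]

    exceed = 0
    for alpha, betas, spatial_effect in draws:
        u = alpha + spatial_effect + sum(b * x for b, x in zip(betas, mean_x))
        if u > 0:  # SMR = exp(u) > 1 exactly when the log-SMR u > 0
            exceed += 1
    return exceed / len(draws)


years = [[1, 0, 0, 0, 1], [0, 0, 0, 0, 1]]  # two illustrative years for one county
draws = [
    (0.0, [0.4, 0.0, 0.0, 0.0, -0.1], 0.0),
    (-0.3, [0.4, 0.0, 0.0, 0.0, -0.1], 0.0),
    (0.1, [0.0] * 5, 0.0),
    (0.0, [0.0] * 5, -0.05),
]
print(county_exceedance_probability(draws, years))  # 0.5
```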

All modeling
All models employed Bayesian inference with vague or flexible priors and an MCMC implementation. The MCMC implementation was performed using WinBUGS version 1.4.3 [3] and GeoBUGS version 1.2 [20]. The initial 1,000 iterations were discarded as burn-in to allow for convergence, and every 100th of the following 100,000 iterations was retained for the posterior distribution. The Bayesian estimate was taken as the posterior median of each parameter, and the 95% credible set was obtained from the 2.5% and 97.5% quantiles of the posterior distribution.
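The burn-in, thinning and summary steps can be illustrated with a small Python sketch operating on a single stored chain for one parameter. It is not WinBUGS code; the function name is hypothetical, and the 95% credible set is taken as the equal-tailed interval between the 2.5% and 97.5% posterior quantiles.

```python
import statistics


def summarize_chain(chain, burn_in=1_000, thin=100):
    """Drop burn-in, thin, and summarize one parameter's chain (hypothetical sketch).

    Returns (posterior median, 2.5% quantile, 97.5% quantile, number of draws kept).
    """
    kept = chain[burn_in::thin]  # every 100th draw after the burn-in period
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%; first and last bound a 95% set
    cuts = statistics.quantiles(kept, n=40, method="inclusive")
    return statistics.median(kept), cuts[0], cuts[-1], len(kept)


chain = [float(i) for i in range(101_000)]  # stand-in for 101,000 stored iterations
median, lower, upper, n_kept = summarize_chain(chain)
print(n_kept, median)  # 1000 50950.0
```

With 101,000 iterations, discarding 1,000 and keeping every 100th of the remaining 100,000 leaves 1,000 draws per chain for the summaries.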
Convergence to the posterior distribution was checked by observing the convergence of two chains started from widely different initial values for the random-effects precision parameters.
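The text describes checking convergence by observing two chains started from different values. One common formal complement, not necessarily used in this study, is the Gelman-Rubin potential scale reduction factor (R-hat), which compares between-chain and within-chain variance; values near 1 suggest the chains have mixed. The following is a minimal Python sketch of the basic (non-split) version; the function name is hypothetical.

```python
import math
import statistics


def potential_scale_reduction(*chains):
    """Basic Gelman-Rubin R-hat for two or more equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    grand_mean = statistics.fmean(chain_means)
    # Between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in chain_means)
    w = statistics.fmean(statistics.variance(c) for c in chains)
    # Pooled estimate of the posterior variance, then its ratio to W
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)


mixed_a = [0.0, 1.0] * 50
mixed_b = [1.0, 0.0] * 50
stuck = [10.0, 11.0] * 50
print(potential_scale_reduction(mixed_a, mixed_b))  # slightly below 1: chains agree
print(potential_scale_reduction(mixed_a, stuck))    # far above 1: not converged
```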

Results
Two hundred fifty-four counties were modeled for 13 years, providing 3,302 county-years. The majority of county-years (79.1%) were classified as rural, with a population of less than 50,000. In every year of the study there were exactly four metropolitan counties with more than one million residents: Bexar, Dallas, Harris and Tarrant Counties. Annual population change varied widely; both losses of more than 1% and growth of more than 4% were common. Growth of more than 1% occurred in 41.7% of the county-years (Figure 1). HAP releases were commonly less than 50 tonnes per county-year, but some very high releases were recorded, and 15.8% of the county-years had more than 100 tonnes of release (Figure 2). Most county-years had less than 10% of the county area planted in corn, sorghum, cotton and wheat; however, some county-years exceeded 50%, and 20.1% of the county-years had more than 20% of the county area planted in these four crops (Figure 3).
Children born on January 1, 1990 were followed for 13 years, and children born on January 1, 2002 for one year. The counts of incident cases by histotype and year are listed in Table 1. Independent random walk priors were used to allow autoregressive temporal smoothing for each histotype. Temporal trends were readily identifiable and varied considerably among histotypes. The two cancers with the greatest decrease in risk over the study period were malignant bone tumors (e.g., osteosarcoma) and Hodgkin lymphoma. Two cancers with relatively steady risk over the study period were acute myeloid leukemia (AML) and "other" leukemias. The temporal smoothing parameters used in the study are presented in Figure 4.
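The text does not specify the order of the random walk. Assuming a first-order random walk (RW1), in which successive yearly effects differ by independent normal increments, the log prior density up to an additive constant is sketched below; sequences that change smoothly from year to year receive higher prior density, which is what produces the temporal smoothing. The function name and arguments are hypothetical.

```python
import math


def rw1_log_prior(effects, precision):
    """Log density, up to an additive constant, of a first-order random walk prior.

    Successive differences effects[j] - effects[j - 1] are treated as independent
    Normal(0, 1 / precision). The prior is improper: adding a constant to every
    effect leaves it unchanged, so only the n - 1 differences contribute.
    """
    diffs = [later - earlier for earlier, later in zip(effects, effects[1:])]
    return 0.5 * len(diffs) * math.log(precision) - 0.5 * precision * sum(d * d for d in diffs)


smooth = [0.0, 0.1, 0.2, 0.3]
jagged = [0.0, 0.3, -0.2, 0.3]
# The smoother trajectory has higher prior density at the same precision
print(rw1_log_prior(smooth, 10.0) > rw1_log_prior(jagged, 10.0))  # True
```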
Across the 95 combinations of five geographic risk indicators and 19 cancer types, no SMR had a 95% credible set lying entirely above one. Hodgkin lymphoma appeared to occur with reduced risk in rapidly growing counties, with more than 90% of the posterior distribution of the SMR below one. There was support for an increased risk of hepatic tumors associated with high-release HAP locations and of germ cell tumors and "other" gliomas among county-years with intensive cropping (Table 2).
Figure 1. Frequency distribution of county-year population growth rates.
Risk maps identified counties for which the posterior likelihood of elevated SMR was greater than 95% for four cancers: "other" leukemias in Hidalgo County (Figure 5), CNS embryonal tumors in Ector County (Figure 6), CNS "other" gliomas in Parker, Tarrant and Harris Counties (Figure 7) and hepatic tumors in Parker, Tarrant and Smith Counties (Figure 8). Ten of 19 cancer histotypes had greater than 90% posterior probability of SMR greater than one for at least one county. The maps also showed spatial correlation among areas of elevated risk.
The correlations among histotypes within county-years in the final model were generally near zero, ranging from -0.35 to 0.32.

Discussion
The investigation reported here estimated a child's personal risk of developing cancer, defined by the mother's living location at the time of birth. Tumors with incidence peaks in infancy were of special interest because they are more likely to have had causal exposures during the prenatal period. Several childhood cancers are known to have incidence peaks early in infancy, including neuroblastoma and other peripheral nervous cell tumors, retinoblastoma, renal tumors and hepatic tumors. Acute lymphocytic leukemia has a peak in infancy that is prominent among white children but less evident among black children. Some histotypes have one peak in infancy and another later in childhood, including "other" leukemias and germ cell tumors, trophoblastic tumors and neoplasms of gonads [21]. Cancers with known incidence peaks in infancy showed temporal trends with a relatively slow decrease in incidence for birth years 1990 to 2002. In contrast, the observed risk for cancers with incidence peaks in the teenage years, Hodgkin lymphoma and malignant bone tumors [21], declined markedly over birth years 1990 to 2002. The temporal trends observed in the current study can be attributed to the latency period of the cancers and the variable length of follow-up, which was shorter for later birth cohorts. Although the primary exposure period of interest in the current study was the prenatal period, there is also interest in other critical periods of exposure, including earlier in gestation and the neonatal period. It may also be that many environmental exposures act not as tumor initiators but as tumor promoters, so that exposures closer to diagnosis are also of interest. These issues were not addressed in the current study. Risk estimates were computed under a Bayesian paradigm that retains the sources of uncertainty in the risk estimates.
The county-level parameters were used as potential indicators of high-risk locations for further study and were selected because of conflicting evidence regarding their possible role as causes of childhood cancer. In general, the association between exposure and risk is not expected to be linear for these geographic factors. The current analysis evaluated the risk at the extreme values of these potential indicators as observed in Texas. Cut-points were set at high values that still classified an adequate number of county-years (i.e., 15-20%) as "at risk." Even though Texas is considered an agricultural state, relatively few county-years had more than 20% of the land area in intensive crop production. Studies in other locations may be able to evaluate a much higher cut-point. In contrast, the current study evaluated a very high cut-point of 100 tonnes of HAP. The population cut-points for metropolitan and urban are commonly used by the U.S. Census Bureau to classify counties. The identified factors could be related to many unknown potential causes, so the potential for confounding limits causal inference. The objective of this study was to use county characteristics to focus further study. Once high-risk counties and their characteristics are identified, studies more specific to identifying environmental causes will become feasible.
Figure 3. Frequency distribution of county-year cropping intensity for total corn, sorghum, wheat and cotton.

Study-specific temporal effects
The precision of geographic risk estimates has been especially problematic in the study of childhood cancer. It has been proposed that broader geographic regions could increase the precision of areal risk estimates for rare diseases [22]. However, aggregating areal units reduces the resolution of the GIS risk layer and alters its relationship with a GIS exposure layer. Aggregation problems can also result from combining areal units that actually differ greatly in risk. These two problems of using broader geographic regions, resolution and aggregation, are collectively known as the modifiable areal unit problem (MAUP). Combining counts from neighboring areas can be effective if the neighbors are very similar, but the pooling would lead to non-differential misclassification of risk if neighboring areal risks are dissimilar. Hierarchical approaches have been proposed to estimate the extent of correlation among neighboring locations and then adjust the risk estimates accordingly. The justification for Bayesian hierarchical modeling with vague priors is that the data likelihood determines the extent of this pooling.
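The way a hierarchical spatial prior pools neighboring information can be seen in the full conditional distribution of the intrinsic conditional autoregressive (CAR) model, the univariate building block that the MCAR model extends with a covariance matrix across histotypes. Assuming unit weights between adjacent counties, each county's effect given all the others is normal, centered on the mean of its neighbors' effects, with variance inversely proportional to the number of neighbors and to the precision parameter. The sketch is illustrative, uses hypothetical names, and is not the GeoBUGS implementation.

```python
def icar_full_conditional(i, effects, neighbors, precision):
    """Mean and variance of S_i given all other S_j under an intrinsic CAR prior.

    Assumes unit weights: neighbors[i] lists the counties adjacent to county i.
    """
    adjacent = neighbors[i]
    n_adj = len(adjacent)
    mean = sum(effects[j] for j in adjacent) / n_adj  # pulled toward the neighbors
    variance = 1.0 / (precision * n_adj)               # more neighbors, less spread
    return mean, variance


# Three hypothetical counties: county 0 borders counties 1 and 2
neighbors = {0: [1, 2], 1: [0], 2: [0]}
effects = {0: 0.0, 1: 0.4, 2: -0.2}
print(icar_full_conditional(0, effects, neighbors, precision=2.0))
```

In the full model this prior is combined with the Poisson likelihood, so a county with many observed cases moves away from its neighbors' mean while a sparsely populated county is smoothed toward it.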
More specific causal studies should involve geographic risk modeling at a finer geographic scale. The current study had geocoordinates available for individual births, so it was theoretically possible to plot a continuous risk surface with a Bayesian geostatistical approach [23], or more traditional approaches to cluster identification could have been used [24]. However, the geographic factors in the current study were available only at the county level, and disaggregating the exposure to smaller geographic units, such as census tracts, could have led to ecologic bias. TRI releases, by contrast, are available for point sources at specific geocoordinates, so detailed risk mapping in proximity to these sites is possible and should be the subject of further investigation. The current study identified locations for which this approach would most likely be rewarding.
Figure 5. Spatial risks for "other" leukemias by county.
Environmental Health 2008, 7:45 http://www.ehjournal.net/content/7/1/45
In the posterior distribution, correlations among histotype pairs were small but ranged from moderately negative to moderately positive. All non-zero correlations contribute to increased precision. The correlations were estimated fully conditionally on the geographic factors and were much smaller than in a previous study that did not identify attributes of specific locations [25]. As cancer risk modeling proceeds with more precisely defined geographic risk factors, the correlation among histotypes will eventually become attributable to specific geographic factors. The justification for a Bayesian approach with non-informative priors for spatial correlations among histotypes is that the approach can be used to increase comparability among studies. At present, the literature reveals a variety of ad hoc approaches to grouping and parsing childhood cancer histotypes. Previous epidemiologic studies have often used broad case definitions and frequently pooled data from multiple childhood cancer histotypes. The appropriateness of this pooling is largely unknown. Pooling cancer types with disparate causes leads to non-differential misclassification and usually increases the likelihood of a null finding. Failing to pool cancer types with common causes leads to an unnecessary loss of precision. Specifying a flexible prior for the covariance matrix in a Bayesian approach can preserve this uncertainty or update it based on the data likelihood. Under Bayesian modeling, if two diseases are poorly correlated, the outcomes remain relatively uncorrelated in the posterior distribution, and the risk estimates are similar to estimates calculated independently for each histotype.
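Why any non-zero correlation, positive or negative, can add precision is easiest to see in the bivariate normal case: the conditional variance of one effect given the other is the marginal variance multiplied by (1 - ρ²), which is smaller than the marginal variance whenever ρ ≠ 0. The Python sketch below evaluates this for the endpoints of the correlation range reported in the Results (-0.35 and 0.32); it is a simplified two-variable illustration with a hypothetical function name, not the full MCAR posterior.

```python
def conditional_variance(marginal_variance, rho):
    """Var(X1 | X2) for a bivariate normal pair with correlation rho."""
    return marginal_variance * (1.0 - rho ** 2)


for rho in (-0.35, 0.0, 0.32):
    # Any non-zero rho, of either sign, shrinks the variance below the marginal value
    print(rho, round(conditional_variance(1.0, rho), 4))
```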
The current study supports further studies of germ cell tumors and "other" gliomas in areas with intensive cropping. Several studies have linked georeferenced disease counts to cropping patterns used as a surrogate for pesticide exposure [7][8][9][10][26]. These studies varied widely in how cropping patterns were defined as exposure and how childhood cancers, as a group of outcomes, were pooled or parsed. However, when the risks of specific cancer types are compared subjectively across studies, the cumulative evidence supports the null finding. For the vast majority of childhood cancer types, the current study goes beyond a frequentist null conclusion by demonstrating SMRs that were close to one with narrow 95% credible sets.
Figure 6. Spatial risks for CNS embryonal tumors by county.
The current study supports the study of childhood hepatic cancer in areas of intense HAP release. The SMR for hepatic tumors was 1.87 (95% credible set 0.95, 3.98) for county-years with more than 100 tonnes of HAP releases. Studies evaluating air pollution as a cause of childhood cancer have been inconsistent across a variety of cancer types [27]. The critical review showed that several studies evaluating multiple cancer types and groupings found one or more histotypes at increased risk, while other studies found other histotypes at risk [27]. When individual cancer types are compared across studies, the cumulative evidence seems to support the null. Leukemia may be the exception, with some indication of increased risk across multiple studies of air pollution [28]. For cancer types other than hepatic cancer, the current study provides SMR estimates that center on no excess risk and have narrow credible sets, providing inductive support for the frequentist null results. Implicating area-source HAP concentrations in childhood cancer has been, and will continue to be, difficult. It has been argued that more definitive prospective studies should use biomarkers to study the risks of prenatal exposures [29][30][31]. Two recent studies illustrate the utility of biomarkers for defining the complex causal relationships between fetal exposure to air pollution and adverse outcomes [30,31]. Such an approach may be useful for studying childhood hepatic cancer around major Texas industrial facilities.
The current study supports the investigation of Hodgkin lymphoma and malignant bone tumors in areas of rapid population growth. Hodgkin lymphoma is thought to be partly attributable to Epstein-Barr virus, but genetic and environmental factors also contribute [32,33]. Low socioeconomic status increases the risk of Hodgkin lymphoma [21], and it is possible that the lower risk observed in areas of rapid population growth in Texas reflects a higher socioeconomic status among residents of those areas. Malignant bone tumors, of which osteosarcoma is the most common [34], had a high probability of increased risk in counties with rapidly growing populations. Both Hodgkin lymphoma and osteosarcoma are considered more common in teenagers, and the current study did not include any incident cases among teenagers. The risks seen for these two cancers in rapidly growing counties should receive more study.
Figure 7. Spatial risks for CNS "other" gliomas by county.
Infectious causes and population mixing have been proposed as causes of childhood cancer [35]. The theory is that densely populated regions have high levels of herd immunity, whereas individuals in populations undergoing constant mixing are at increased risk. The purpose of the current study was to evaluate the use of population characteristics for focusing further study. One study [17] found excess risk when population growth exceeded 10% over an eleven-year period, which motivated our definition of rapid growth as more than 1% per year. The population mixing theory does not separate the risk for those moving into a region from the risk for those already residing there, and thus supports only population-level inference. For an individual deciding to move, the risk could have three components. First, there could be a geographic risk associated with the previous living location. Second, there could be a new geographic risk at the new living location. Third, there could be a risk associated with being a mover. A full evaluation of these risks would be complex and would require hierarchical modeling if the objective included estimating risks interpretable at the level of the individual mover. The current study found median SMRs for measures of population density and population growth to be very near one, with narrow 95% credible sets, for most childhood cancer types.
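The correspondence between the two growth definitions can be checked with a compound-growth calculation: growth of 10% over eleven years corresponds to a constant annual rate of (1.10)^(1/11) - 1, or about 0.87% per year, so the 1% per year cut-off is of similar magnitude and slightly stricter on an annual basis. The function name below is hypothetical.

```python
def annualized_rate(total_growth, years):
    """Constant annual growth rate that compounds to total_growth over `years` years."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


rate = annualized_rate(0.10, 11)  # 10% growth over an eleven-year period
print(f"{rate:.2%}")  # 0.87%
```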
The spatial model identified counties with a greater than 95% posterior likelihood of elevated SMR for specific childhood cancer histotypes. Hidalgo County had a high likelihood of increased SMR for atypical or "other" leukemias. Hidalgo County is a rapidly growing urban county on the Mexican border with a predominantly Hispanic population. Ector County had a high likelihood of elevated SMR for CNS embryonal tumors. Ector County is an urban county with a population divided relatively evenly between Hispanics and non-Hispanic whites. Three counties had a high posterior likelihood of elevated SMR for CNS "other" gliomas: Parker and Tarrant Counties, both part of the Dallas/Fort Worth metropolitan area, and Harris County, which contains most of the Houston metropolitan area. Both metropolitan areas are growing rapidly and have considerable industrial development. Three counties had a high likelihood of elevated SMR for hepatic tumors: Parker and Tarrant Counties in the Dallas/Fort Worth metropolitan area, and Smith County. Smith County is an urban county that contains the city of Tyler and its metropolitan area. The risks estimated for these counties included both the portion of the risk related to the evaluated factors and residual or random, unexplained geographic risk. Further study of these childhood cancer histotypes in these locations is indicated.
Figure 8. Spatial risks for hepatic tumors by county.

Conclusion
The Bayesian implementation of the MCAR model provided a flexible approach to the spatial modeling of multiple childhood cancer histotypes. The approach partitions case counts by ICCC-3 classification, and flexible priors allow the degree of spatial smoothing and histotype correlation to be determined by the data likelihood. Analysis of cancer risk by county identified four cancer histotypes with greater than 95% likelihood of elevated SMR that warrant further study. The identification of geographic factors supports more focused studies of germ cell tumors and "other" gliomas in areas of intense cropping, hepatic cancer near HAP release facilities, and Hodgkin lymphoma and malignant bone tumors in counties with rapidly growing populations.