Spatial variations in the incidence of breast cancer and potential risks associated with soil dioxin contamination in Midland, Saginaw, and Bay Counties, Michigan, USA
© Dai and Oyana; licensee BioMed Central Ltd. 2008
- Received: 15 July 2008
- Accepted: 21 October 2008
- Published: 21 October 2008
High levels of dioxins in soil and higher-than-average body burdens of dioxins in local residents have been found in the city of Midland and the Tittabawassee River floodplain in Michigan. The objectives of this study were threefold: (1) to evaluate dioxin levels in soils; (2) to evaluate the spatial variations in breast cancer incidence in Midland, Saginaw, and Bay Counties in Michigan; and (3) to evaluate whether breast cancer rates are spatially associated with the dioxin-contaminated areas.
We acquired 532 published soil dioxin sample records collected from 1995 to 2003 and data on female breast cancer cases (n = 4,604) at the ZIP code level in Midland, Saginaw, and Bay Counties for the years 1985 through 2002. Descriptive statistics and the self-organizing map (SOM) algorithm were used to evaluate dioxin levels in soils. Geographic information system (GIS) techniques, Kulldorff's spatial and space-time scan statistics, and genetic algorithms were used to explore variation in the incidence of breast cancer in space and space-time. Odds ratios and their corresponding 95% confidence intervals, adjusted for age, were used to investigate a spatial association between breast cancer incidence and soil dioxin contamination.
High levels of dioxin in soils were observed in the city of Midland and the Tittabawassee River 100-year floodplain. After adjusting for age, we observed high breast cancer incidence rates and detected spatial clusters in the city of Midland and in the confluence area of the Tittabawassee and Saginaw Rivers. In the space-time analysis, we observed a cluster of breast cancer incidence in Midland between 1985 and 1993. The odds ratios further suggest a statistically significant (α = 0.05) increase in breast cancer rates as women get older, and a higher disease burden in Midland and the surrounding areas in close proximity to the dioxin-contaminated areas.
These findings suggest that increased breast cancer incidence is spatially associated with soil dioxin contamination. Aging is a substantial factor in the development of breast cancer. The findings can be used to target heightened surveillance and education, and to formulate new hypotheses for further research.
- Breast Cancer
- Spatial Cluster
- Breast Cancer Incidence
- Flood Frequency
- Annual Percent Change
Dioxin refers to 210 congeners/isomers of structurally and chemically related polychlorinated dibenzo-para-dioxins (PCDDs) and polychlorinated dibenzofurans (PCDFs). 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) is considered the most toxic congener in this group [9, 10]. Toxic equivalency factors (TEFs) express the toxicity of other dioxin congeners relative to that of TCDD. The total toxic equivalent (TEQ) of a sample is then obtained by multiplying the concentration of each congener by its TEF and summing the products. Dioxins are persistent in the environment and resistant to biodegradation. The half-life of TCDD is 5.8 to 11.3 years in the human body, 9 to 15 years in surface soil, and 25 to 100 years in subsurface soil. Human exposure pathways to dioxins include inhalation, ingestion, and dermal contact [3, 8]. TCDD has been classified as a human carcinogen and has the potential to disrupt multiple endocrine pathways [14–16]. Studies have shown an apparent increase in breast cancer incidence [17–19] or mortality [20, 21] with dioxin exposure.
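The TEQ definition above reduces to a TEF-weighted sum over congeners. The Python sketch below computes it for a single sample. The TEF table is an assumption: it lists only a handful of WHO 2005 mammalian TEFs, and the MDEQ database may have used a different TEF scheme. The congener concentrations are hypothetical.

```python
# ASSUMPTION: a few WHO 2005 mammalian TEFs, for illustration only; the
# study's TEQ values may rest on a different (or more complete) TEF scheme.
TEF = {
    "2,3,7,8-TCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "2,3,7,8-TCDF": 0.1,
    "2,3,4,7,8-PeCDF": 0.3,
    "OCDD": 0.0003,
}

def total_teq(concentrations):
    """Sum of congener concentrations (ppt) weighted by their TEFs, giving ppt TEQ."""
    missing = set(concentrations) - set(TEF)
    if missing:
        raise KeyError(f"no TEF defined for: {sorted(missing)}")
    return sum(conc * TEF[name] for name, conc in concentrations.items())

# Hypothetical congener concentrations (ppt, dry weight) for one soil sample.
sample = {"2,3,7,8-TCDD": 5.0, "2,3,7,8-TCDF": 120.0, "OCDD": 3000.0}
print(round(total_teq(sample), 6))  # 17.9  (5.0 + 12.0 + 0.9)
```

The example also shows why the TEQ approach matters. A congener that is far less toxic than TCDD, such as OCDD here, can still contribute measurably when it is present at high concentration.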
Breast cancer refers to malignant tumors arising from the uncontrolled growth and spread of abnormal cells in breast tissue, usually in the ducts and lobules. It is the most common cancer among women in the United States. National breast cancer incidence increased markedly from 1980 to 1987 (annual percent change, APC = 3.7%), increased slightly from 1987 to 2001 (APC = 0.4%), and declined noticeably from 2001 to 2005 (APC = -3.1%). In 2005, the annual incidence rate was 124.3 per 100,000 females. About $8.1 billion is spent each year on breast cancer treatment in the United States. Although the causes of breast cancer are not fully known, risk factors include a history of cancer in one breast, a family history of breast cancer, breast implants, a history of benign breast disease, and exposure to endocrine-disrupting chemicals [22, 26]. Among these risk factors, exposure to carcinogens, especially endocrine-disrupting chemicals, places females at higher-than-average risk of developing breast cancer [14, 15].
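The APC values quoted above come from a log-linear trend model, in which the rate changes by a constant factor each year. The following is a minimal sketch that assumes an unweighted least-squares fit of the log rate on calendar year. Published APCs are usually produced by dedicated trend software, which may weight or segment the fit differently. The input rates are hypothetical.

```python
import math

def annual_percent_change(years, rates):
    """APC (%) from an unweighted least-squares fit of ln(rate) = b0 + b1 * year.

    Under this log-linear model the rate is multiplied by exp(b1) each year,
    so APC = 100 * (exp(b1) - 1).
    """
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two matching (year, rate) pairs")
    logs = [math.log(r) for r in rates]
    n = len(years)
    y_mean = sum(years) / n
    l_mean = sum(logs) / n
    sxy = sum((y - y_mean) * (l - l_mean) for y, l in zip(years, logs))
    sxx = sum((y - y_mean) ** 2 for y in years)
    return 100.0 * (math.exp(sxy / sxx) - 1.0)

# Hypothetical rates per 100,000 that grow by exactly 3.7% per year, 1980-1987.
years = list(range(1980, 1988))
rates = [100.0 * 1.037 ** k for k in range(len(years))]
print(round(annual_percent_change(years, rates), 6))  # 3.7
```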
Previous studies show that breast cancer risk increases with exposure to high levels of dioxins [15, 17–21, 27, 28]. For example, two human epidemiological studies, the Hamburg cohort and the Seveso women's cohort, found an apparent increase in breast cancer incidence with rising dioxin exposure, after validating exposure levels with serum dioxin measurements. Other studies also reported increased breast cancer incidence and mortality [20, 21, 28] with dioxin exposure. Dioxins act as endocrine disruptors [14, 17, 29], which may explain the link between a high body burden of dioxins and increased breast cancer incidence.
Breast cancer is a major burden in Midland, Saginaw, and Bay Counties, Michigan. Data from the Michigan Department of Community Health (MDCH) indicate that breast cancer was one of the highest cancer burdens in the three counties from 1985 through 2002. Breast cancer accounted for 13% of all cancer cases in the three counties, ranking third after prostate cancer (18%) and lung and bronchus cancer (14%). Given the evidence from human epidemiological and animal studies, these high incidence rates are consistent with the hypothesis that dioxin contamination in soils may contribute significantly to the development of breast cancer in this region.
Despite a variety of studies [1–8] investigating soil dioxin contamination in this area, the resulting health effects in local communities are largely unknown. In particular, the spatial relation between soil dioxin contamination and the risk of developing breast cancer is still unclear. Other challenges persist. Very few blood samples and only a limited number of soil samples are available, in part because dioxin testing is expensive. Currently, one soil sample may cost up to $800 and one blood sample between $1,200 and $1,500. The sparsity of samples and their inadequate spatial coverage (Figure 1) fall short of the requirements of conventional statistical, geostatistical, and epidemiological studies. Motivated by this challenge and by growing concern over the co-occurrence of high breast cancer rates and high levels of dioxin in soils, we employed a variety of spatial and statistical techniques to evaluate dioxin levels in soils and to analyze whether they are spatially associated with breast cancer incidence. These techniques include Geographic Information System (GIS) mapping, descriptive statistics, the self-organizing map (SOM) algorithm, odds ratios and their corresponding 95% confidence intervals, Kulldorff's spatial and space-time scan statistics, and genetic algorithms for spatial and space-time clustering.
GIS analysis supported by novel clustering algorithms has become a valuable tool in environmental health for studying the spatial distribution of environmental contaminants and potential disease risks [35–38]. For example, the SOM was employed to evaluate dioxin patterns in breast milk and dietary habits in various countries, and to identify contributing dietary factors in each country. Methods such as the spatial scan statistic and boundary analysis have been applied to various types of cancer to analyze the impact of pesticide use or air toxicity. However, little attention has been paid to the spatial relationship between increased breast cancer incidence and background exposure to dioxin in soils. In this study, we aimed (1) to evaluate dioxin contamination in the study area and (2) to investigate the hypothesis that dioxin-contaminated areas are spatially associated with high breast cancer incidence rates. Answers to the first objective provide information on the extent and severity of dioxin contamination and its contributing factors, so that areas with high levels of dioxins can be given higher priority for cleanup. Answers to the second objective would help target areas with high breast cancer incidence for heightened surveillance and education, and help formulate new hypotheses for further research.
Study area, population, and major river systems
The study area (Figure 1) consists of Midland, Saginaw, and Bay Counties. It contains 38 ZIP codes with a total population of over 400,000 [39, 40]. The cities of Midland, Saginaw, and Bay City are three densely populated areas. The study area has several industries, notably Dow's Midland plant, that contribute significantly to economic growth in the region.
The Tittabawassee and Saginaw Rivers are the two major river systems. The Tittabawassee River extends southeast from the city of Midland to its confluence with the Saginaw River. The Saginaw River flows east into Saginaw Bay on Lake Huron. Land use in the Tittabawassee River floodplain is divided among residential, agricultural, public park, and protected areas, e.g., the Shiawassee National Wildlife Refuge (NWR). The Tittabawassee River floods frequently as a result of rain and/or snowmelt. The fall 1986 flood was classified as a 100- to 500-year event. In spring 2004, another extensive flood struck the area. Some of the flooded areas are currently used as private backyards or public parks.
Breast cancer data
Table: Number of cases of female breast cancer, by age group (15–44, 45–64, and 65–74 years)
Soil dioxin data
The Michigan Department of Environmental Quality (MDEQ) provided the soil dioxin database, which consists of 532 records (Figure 1) collected mainly from the city of Midland and the Tittabawassee River floodplain between 1995 and 2003. Each record has a unique identification number, the coordinates of the sample location, the starting and ending depths, the dioxin concentration in parts per trillion toxic equivalents (ppt TEQ), and the sampling year. TEQ values are given on a dry-weight basis and include only PCDDs and PCDFs (not PCBs). Most samples were collected from surface soil at depths of 0–1, 0–2, or 0–3 inches. In addition to these topsoil samples, subsurface samples were collected at depths of 3–6, 12–15, 16–24, 36–48, or 48–60 inches downstream along the Tittabawassee River. This database is quite comprehensive. It includes soil TEQ concentrations from the three counties collected in various sampling efforts between 1995 and 2003 by the MDEQ, the EPA, Dow, and the U.S. Army Corps of Engineers (USACE). It also accounts for the total toxicity of all toxic dioxin congeners through the toxic equivalent approach, rather than considering TCDD alone.
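The sketch below illustrates one way such records could be summarized by depth. It stores each record in a small data class and groups TEQ values into the depth classes used in the paper's frequency-distribution table. The field names, the rule of classifying a sample by its starting depth, and the demo records are all hypothetical.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class SoilSample:
    # Hypothetical field names mirroring the record layout described above.
    sample_id: str
    x: float                 # sample-location coordinates
    y: float
    start_depth_in: float    # starting depth (inches)
    end_depth_in: float      # ending depth (inches)
    teq_ppt: float           # dioxin concentration, ppt TEQ (dry weight)
    year: int

# Depth classes (inches) matching the paper's frequency-distribution table.
DEPTH_CLASSES = [(0, 3), (3, 6), (6, 15), (15, 24), (24, 36), (36, 48), (48, 60)]

def depth_class(start_depth_in):
    """Assign a sample to a class by its starting depth (an assumed rule)."""
    for lo, hi in DEPTH_CLASSES:
        if lo <= start_depth_in < hi:
            return (lo, hi)
    return None  # 60 inches or deeper, or invalid

def summarize_by_depth(samples):
    """Return {depth class: (count, median TEQ, max TEQ)} for the classes present."""
    groups = {}
    for s in samples:
        cls = depth_class(s.start_depth_in)
        if cls is not None:
            groups.setdefault(cls, []).append(s.teq_ppt)
    return {cls: (len(v), median(v), max(v)) for cls, v in sorted(groups.items())}

# Hypothetical records, for illustration only.
demo = [
    SoilSample("A1", 0.0, 0.0, 0, 1, 10.0, 2002),
    SoilSample("A2", 1.0, 0.0, 0, 3, 30.0, 2002),
    SoilSample("B1", 2.0, 1.0, 3, 6, 5.0, 2003),
]
print(summarize_by_depth(demo))
```

Medians are reported alongside maxima because soil TEQ values tend to be highly skewed.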
Additional geographic data include ZIP code boundaries, county boundaries, rivers, roadways, flood frequency, and census data for both 1990 and 2000. These data were obtained from the MDEQ, the Michigan Center for Geographic Information (MCGI), and the U.S. Census Bureau. Flood frequency is classified as 1-, 2-, 5-, 10-, 50-, 100-, and 500-year according to floodway data published by the U.S. Federal Emergency Management Agency (FEMA). For example, a 5-year floodplain is an area adjacent to a river that is expected to flood on average once every 5 years, i.e., with a 1-in-5 (20%) chance of flooding in any given year.
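A T-year flood has a 1/T chance of occurring in any given year, so the chance of at least one such flood grows over a multi-year horizon. The short sketch below computes that chance under the assumption that years are independent. The 30-year horizon is an arbitrary illustrative choice, not a value from the study.

```python
def prob_flood_within(return_period_years, horizon_years):
    """Chance of at least one T-year flood within a horizon of n years.

    Assumes floods in different years are independent and each year has
    exceedance probability 1/T (the usual reading of a "T-year" flood).
    """
    if return_period_years < 1 or horizon_years < 0:
        raise ValueError("need T >= 1 year and a non-negative horizon")
    annual_p = 1.0 / return_period_years
    return 1.0 - (1.0 - annual_p) ** horizon_years

for T in (1, 2, 5, 10, 50, 100, 500):   # FEMA flood-frequency classes used in the study
    print(f"{T:>3}-year flood: {prob_flood_within(T, 30):.2f} chance within 30 years")
```

Even a 100-year floodplain therefore has roughly a one-in-four chance of flooding over 30 years, which is relevant when floodplain land is used for backyards and parks.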
We used a variety of methods to process and analyze the data: (1) evaluation of soil dioxin contamination using descriptive statistics and the SOM algorithm; (2) evaluation of the association between breast cancer rates and ZIP codes by estimating odds ratios and their corresponding 95% confidence intervals; and (3) cluster detection using Kulldorff's spatial and space-time scan statistics and genetic algorithms for spatial and space-time clustering.
The SOM is an unsupervised data visualization and classification technique that maps high-dimensional data onto a lower-dimensional grid, usually of one or two dimensions. Compared with the variance-covariance matrix and multidimensional scaling, the SOM allows one to visually identify the number of clusters, the classification of different values of each variable, and the relations between variables. The SOM consists of processing elements (neurons). Each neuron is represented by a weight vector with the same dimension as the input vector. In our case, each input vector has four variables: dioxin level (Dioxin Level), distance from the sampling site to the river (Distance to River), flood frequency at the sampling site (Flood Frequency), and the depth at which sample collection begins (Start Depth). Neurons are connected through a neighborhood function $f$, e.g., the Gaussian function $f = \exp\left(-d^2 / (2\sigma_t^2)\right)$, where $d$ is the Euclidean distance between two neurons and $\sigma_t$ is the neighborhood radius at time $t$. Hidden layers ($n$) act as intermediate layers between the input layer and the output layer. The SOM uses the input vectors to update the neurons in a hidden layer, generating the next hidden layer or the output layer. The update follows a learning rule, e.g., $n_i(t+1) = n_i(t) + \alpha(t)\, f_i(t)\, [x(t) - n_i(t)]$, where $i$ indexes the $i$th neuron, $x(t)$ is an input vector drawn from the input data set at time $t$, and $\alpha(t)$ is the learning rate at time $t$. The update makes the neurons more similar to the input vectors. As a result, the neurons on the map become ordered and neighboring neurons become similar. The output map consists of the U-matrix and the component planes. In the U-matrix, neurons with small values represent clusters in the input data, and neurons with large values represent gaps between clusters. Each component plane represents one attribute and its classification in the input data. A neuron at a given position on one map corresponds to the neuron at the same position on the other maps.
By reading several component planes and their color legends together, one can readily examine correlations between attributes. See reference  for a detailed SOM description. We implemented the SOM model using the SOM Toolbox and MATLAB 7.1 (The MathWorks, Inc., Natick, Massachusetts). The SOM may be viewed as a non-linear extension of standard regression models, in the sense that it performs non-linear mappings between the variables in the input, hidden, and output layers. The distance in feet from each sampling site to the river and the flood frequency at each sampling site were obtained using ArcGIS 9.2 (Environmental Systems Research Institute, Inc., Redlands, California).
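The Gaussian neighborhood and learning rule above can be illustrated with a small sequential SOM in plain Python. This is a sketch under stated assumptions, not the SOM Toolbox implementation used in the study. It uses a rectangular grid (the toolbox commonly uses a hexagonal lattice), a linearly decaying learning rate, an exponentially shrinking radius, z-score standardization of the four input variables, and a simplified U-matrix based on 4-connected neighbors. The demo input rows are hypothetical.

```python
import math
import random

def standardize(data):
    """Z-score each column so variables on different scales (ppt, feet, flood
    frequency, inches) contribute comparably to distances."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def train_som(data, rows, cols, n_iter=2000, alpha0=0.5, seed=0):
    """Train a rows x cols rectangular SOM; returns (grid positions, weight vectors)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [list(rng.choice(data)) for _ in grid]   # initialize from random inputs
    sigma0 = max(rows, cols) / 2.0
    for t in range(n_iter):
        frac = t / n_iter
        alpha = alpha0 * (1.0 - frac)            # learning rate alpha(t), linear decay
        sigma = sigma0 * math.exp(-3.0 * frac)   # neighborhood radius sigma_t, shrinking
        x = rng.choice(data)
        # Best-matching unit (BMU): the neuron whose weight vector is closest to x.
        bmu = min(range(len(grid)), key=lambda i: sq_dist(weights[i], x))
        br, bc = grid[bmu]
        for i, (r, c) in enumerate(grid):
            d2 = (r - br) ** 2 + (c - bc) ** 2           # squared map distance to the BMU
            f = math.exp(-d2 / (2.0 * sigma ** 2))       # Gaussian neighborhood f_i(t)
            w = weights[i]
            for k in range(len(w)):
                w[k] += alpha * f * (x[k] - w[k])        # n_i += alpha * f_i * (x - n_i)
    return grid, weights

def u_matrix(grid, weights, rows, cols):
    """Mean distance from each neuron to its 4-connected map neighbors:
    small values mark clusters, large values mark gaps."""
    index = {rc: i for i, rc in enumerate(grid)}
    out = [[0.0] * cols for _ in range(rows)]
    for (r, c), i in index.items():
        nbrs = [index[(r + dr, c + dc)] for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (r + dr, c + dc) in index]
        dists = [math.sqrt(sq_dist(weights[i], weights[j])) for j in nbrs]
        out[r][c] = sum(dists) / max(len(dists), 1)
    return out

def quantization_error(data, weights):
    """Average distance from each input vector to its nearest neuron."""
    return sum(math.sqrt(min(sq_dist(w, x) for w in weights)) for x in data) / len(data)

# Hypothetical rows: (TEQ ppt, distance to river ft, flood frequency yr, start depth in).
demo_rows = [[1200, 50, 10, 0], [900, 80, 10, 0], [15, 4000, 500, 0], [8, 5000, 500, 12]]
grid, w = train_som(standardize(demo_rows), 2, 2, n_iter=500)
print(u_matrix(grid, w, 2, 2))
```

Standardizing first is a design choice. Without it, the distance variable (thousands of feet) would dominate the Euclidean distances used to find the best-matching unit.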
The statistical analysis included estimation of age-adjusted odds ratios and 95% confidence intervals at a significance level of p ≤ 0.05, using SAS 9.1 (SAS Institute, Inc., Cary, North Carolina) and Microsoft Excel (Microsoft, Inc., Redmond, Washington). Our null hypothesis was that high breast cancer incidence rates are randomly distributed among the 22 ZIP codes. The alternative hypothesis was that breast cancer rates increase with proximity to the dioxin-contaminated areas. We used ZIP code 48883, an area located upstream on the river (Figure 1), as the reference for comparison. Because dioxin levels in this area are close to background levels across Michigan, as reported by previous studies [1, 3, 4, 8], its population was assumed to be unexposed to dioxin. The female population of this ZIP code in both 1990 and 2000 was close to the average female population per ZIP code in the study area. To test the sensitivity of the results, we conducted a comparative analysis using the remote ZIP codes 48618, 48657, 48650, 48616, and 48655 as alternative references. Residents of these ZIP codes were assumed to live farther from the contaminated area and to have less chance of exposure to dioxins.
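Age adjustment of an odds ratio is commonly done by stratifying on age group. The sketch below pools one 2x2 table per age group, comparing a study ZIP code with the reference ZIP code. It assumes the Mantel-Haenszel estimator with the Robins-Breslow-Greenland confidence interval; the text does not state which SAS procedure or estimator was used. Using non-case females as the comparison cells, and the counts in the example, are also assumptions.

```python
import math

def mantel_haenszel_or(strata, z=1.96):
    """Mantel-Haenszel odds ratio pooled over strata, with a Robins-Breslow-Greenland CI.

    Each stratum (one age group) is a tuple (a, b, c, d):
      a = cases in the study ZIP code,     b = non-case females in the study ZIP code,
      c = cases in the reference ZIP code, d = non-case females in the reference ZIP code.
    Returns (OR_MH, lower, upper).
    """
    R = S = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r, s = a * d / n, b * c / n          # per-stratum MH numerator/denominator terms
        p, q = (a + d) / n, (b + c) / n
        R += r
        S += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = R / S
    var_log = sum_pr / (2 * R ** 2) + sum_ps_qr / (2 * R * S) + sum_qs / (2 * S ** 2)
    half_width = z * math.sqrt(var_log)      # on the log-odds scale
    return or_mh, or_mh * math.exp(-half_width), or_mh * math.exp(half_width)

# Hypothetical counts for three age groups (15-44, 45-64, 65-74).
strata = [(30, 9970, 20, 9980), (120, 7880, 80, 7920), (90, 2910, 60, 2940)]
or_mh, lo, hi = mantel_haenszel_or(strata)
print(f"age-adjusted OR = {or_mh:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

With a single stratum, the interval reduces to the familiar Woolf interval based on 1/a + 1/b + 1/c + 1/d. The sensitivity analysis described above amounts to re-running the same calculation with each alternative reference ZIP code supplying c and d.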
Incidence rates were adjusted only for age, because patients' race and socioeconomic status were not provided in the breast cancer database. Census data were linked to cases based on the ZIP code of residence at the time of diagnosis. We completed this task in ArcGIS 9.2 by joining ZIP code boundary data with the breast cancer and census data. All cases were matched with the corresponding female demographics and age groups. As input to the space-time scan and genetic algorithm models [32, 33], we preprocessed the data and estimated populations for the years between 1990 and 2000 using linear regression. For these models, we assumed that populations before 1990 and after 2000 were equal to the official U.S. census counts for 1990 and 2000, respectively.
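The population preprocessing described above can be sketched as follows. With only two census points, a least-squares line reduces to linear interpolation, and the text's assumption of constant populations outside 1990–2000 becomes clamping. The second function shows indirect age standardization, which is assumed here as one common way of turning age-specific populations into the expected case counts that scan-type methods compare against observed counts. The 1990 and 2000 counts in the example are the Midland female populations quoted in the Discussion. The age-group numbers in the check are hypothetical.

```python
def interpolate_population(year, pop_1990, pop_2000):
    """Population for `year` by linear interpolation between the 1990 and 2000 censuses.

    With two census points, a least-squares line is exactly this interpolation.
    Years before 1990 (after 2000) use the 1990 (2000) count, as assumed in the text.
    """
    if year <= 1990:
        return float(pop_1990)
    if year >= 2000:
        return float(pop_2000)
    return pop_1990 + (pop_2000 - pop_1990) * (year - 1990) / 10.0

def expected_cases(area_pop_by_age, cases_by_age, pop_by_age):
    """Indirectly age-standardized expected cases for one area:
    sum over age groups g of area_pop[g] * (study-wide cases[g] / study-wide pop[g])."""
    return sum(n * c / p for n, c, p in zip(area_pop_by_age, cases_by_age, pop_by_age))

# Midland female population (1990 and 2000 census counts quoted in this paper).
for yr in (1985, 1995, 2002):
    print(yr, interpolate_population(yr, 13_221, 16_796))
```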
The spatial techniques used to detect clusters of breast cancer incidence include Kulldorff's spatial and space-time scan statistics and genetic algorithms for spatial and space-time clustering. We first used the spatial scan statistic and the genetic algorithm for spatial clustering to explore whether spatial clusters of breast cancer exist in the study area. We then used the space-time scan statistic and the genetic algorithm for space-time clustering to locate clusters in space-time. Kulldorff's spatial and space-time scan statistics were applied to test the null hypothesis (at α = 0.05) that no clusters of increased breast cancer incidence exist, based on 999 Monte Carlo replications. GIS mapping was used to review the resulting spatial clusters of breast cancer and any potential risks in locations suspected of dioxin contamination.
Kulldorff's spatial and space-time scan statistics, implemented in SaTScan 7.0 (developed jointly by M. Kulldorff, Boston, Massachusetts, and Information Management Services, Inc., Silver Spring, Maryland), are popular cluster detection tests for aggregated spatial data. The spatial scan statistic imposes a circular or elliptic search window on the study area. The space-time scan statistic uses a cylindrical search window whose base is circular or elliptic and whose height corresponds to a time interval. The cases within a search window represent a potential cluster. The window is then centered on each data point in turn and varied in size. The number of events in an area during a given period is assumed to follow a Poisson distribution. Consequently, when there are no covariates, the expected number of events within a search window is proportional to the at-risk background population. Under the Poisson assumption, the method calculates the likelihood function for every window. The window with the maximum likelihood is the most likely cluster, i.e., the cluster least likely to have occurred by chance. The method then computes the likelihood ratio test statistic for this window and obtains its P-value through Monte Carlo hypothesis testing. The result shows whether the cases within the maximum-likelihood window constitute a disease cluster and whether this cluster is statistically significant (at α = 0.05). The scan statistics are well suited to detecting clusters that match the shape of the search window. However, SaTScan 7.0 restricts the ratio of the longest to the shortest axis of an ellipse to 1.5, 2, 3, 4, or 5, and limits the number of directions to 4, 6, 9, 12, and 15, respectively. Because the shapes and directions of clusters are usually unknown before analysis, these restrictions may cause detected clusters to include too much at-risk background population. A method that relaxes these restrictions is therefore highly desirable for validating the results.
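The likelihood-ratio search and Monte Carlo test described above can be sketched for the purely spatial Poisson case with circular windows. This is a simplified illustration, not SaTScan. It works on aggregated centroids, caps windows at 50% of the population, uses no covariates, and adds regions at tied distances one at a time. The toy data are hypothetical.

```python
import math
import random

def poisson_llr(c, E, C):
    """Kulldorff's Poisson log-likelihood ratio for a window with c observed and
    E expected cases out of C in total; windows without excess risk score 0."""
    if c <= E:
        return 0.0
    llr = c * math.log(c / E)
    if C > c:
        llr += (C - c) * math.log((C - c) / (C - E))
    return llr

def circular_windows(coords, pops, max_frac=0.5):
    """Yield windows made of a centroid plus its nearest neighbors, growing until the
    window would exceed max_frac of the total population (ties added one at a time)."""
    P = sum(pops)
    for cx, cy in coords:
        order = sorted(range(len(coords)),
                       key=lambda j: (coords[j][0] - cx) ** 2 + (coords[j][1] - cy) ** 2)
        members, pop = [], 0
        for j in order:
            if pop + pops[j] > max_frac * P:
                break
            members.append(j)
            pop += pops[j]
            yield list(members)

def most_likely_cluster(coords, pops, cases):
    """Return (max LLR, member indices) over all circular windows."""
    C, P = sum(cases), sum(pops)
    best = (0.0, [])
    for w in circular_windows(coords, pops):
        c = sum(cases[j] for j in w)
        E = C * sum(pops[j] for j in w) / P   # expected cases, no covariates
        llr = poisson_llr(c, E, C)
        if llr > best[0]:
            best = (llr, w)
    return best

def monte_carlo_p(coords, pops, cases, n_rep=999, seed=0):
    """Monte Carlo p-value: the C cases are redistributed under H0 in proportion to
    population, and the observed maximum LLR is ranked among the simulated maxima."""
    rng = random.Random(seed)
    observed = most_likely_cluster(coords, pops, cases)[0]
    C = sum(cases)
    exceed = 0
    for _ in range(n_rep):
        sim = [0] * len(pops)
        for j in rng.choices(range(len(pops)), weights=pops, k=C):
            sim[j] += 1
        if most_likely_cluster(coords, pops, sim)[0] >= observed:
            exceed += 1
    return (exceed + 1) / (n_rep + 1)

# Toy data: ten areas along a line, with excess cases in the first two.
coords = [(i + 0.1 * i * i, 0.0) for i in range(10)]
pops = [1000] * 10
cases = [40, 40] + [10] * 8
llr, members = most_likely_cluster(coords, pops, cases)
print(sorted(members), round(llr, 2), monte_carlo_p(coords, pops, cases, n_rep=199))
```

Conditioning on the total number of cases, as the simulation does, is what makes the Monte Carlo p-value valid for the maximum over many overlapping windows. A separate test for each window would be invalid because of multiple testing.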
Genetic algorithms for spatial clustering and for space-time clustering were employed to explore spatial patterns of breast cancer incidence and to confirm the results of Kulldorff's methods. Unlike Kulldorff's methods, the genetic algorithms do not restrict the ratio of the longest to the shortest axis, and they allow ellipses in arbitrary directions. They therefore delineate clusters more finely without including unnecessary at-risk background population, and can effectively detect long, narrow clusters. Genetic algorithms are randomized search techniques that simulate the principle of survival of the fittest. They are effective in cluster detection because they produce near-optimal solutions to search problems. Each genetic algorithm consists of an initialization step, a pre-specified number of iterative generations, and three genetic operators (reproduction, crossover, and mutation). The initialization step randomly generates a set of strings (chromosomes), called the population. In our case, each string is an ellipse with five parameters (x, y, a, b, θ). Here x and y are the coordinates of the ellipse's centroid, a and b are the semi-major and semi-minor axes, respectively, and θ is a real number between 0° and 180° representing the orientation angle. The cases within an ellipse represent a potential cluster. After initialization, the algorithm iterates over generations, applying the three genetic operators to the population in each generation. The fitness value of each string is first calculated with a fitness function of c, p, C, and P. In this function, c and p are the observed number of disease cases and the population within an ellipse, and C and P are the total number of cases and the total population in the study area, respectively. A string (ellipse) is exported to a cluster list if its fitness value is greater than 0 under the Poisson assumption. The reproduction operator selects a set of strings with higher fitness values.

These selected strings become the strings (children) of the next generation. Crossover then chooses a proportion (the crossover rate) of the children and mates each pair at a randomly chosen position. In our case, a random integer between 1 and 5 is generated for each pair; e.g., 5 causes the two chosen ellipses to exchange their orientation angles, producing two new ellipses. Mutation selects genes of the mated strings with a given probability (the mutation rate) and changes the value at a randomly chosen position on each string. In our case, an ellipse may have its position, shape, or direction mutated. A number of randomly generated strings are then added to the next generation to maintain the population size. The algorithm updates the population for the specified number of iterations, preserving ellipses with higher fitness values while searching new areas. The genetic algorithm for space-time clustering uses elliptic cylinders as strings, with an elliptic base and a height corresponding to a time interval within the study period. Each string has seven parameters (x, y, a, b, θ, $T_s$, $T_e$), where $T_s$ and $T_e$ are the starting and ending times, respectively. Like SaTScan, the genetic algorithms can adjust for covariates by comparing the observed number of cases in each category with the corresponding at-risk background population. We implemented the two clustering algorithms using Genetic Algorithm Toolbox 1.2. Our performance evaluation shows that the methods are accurate and reliable. A detailed description of the algorithms is presented in [33, 34].
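The genetic operators described above can be sketched as follows for the purely spatial case. The text does not spell out the fitness function, so this sketch assumes Kulldorff's Poisson log-likelihood ratio as the fitness; it is positive only when an ellipse holds more cases than expected. The operator details are also assumptions rather than the authors' implementation: truncation selection with elitism, single-point crossover at a random gene position, and Gaussian mutation. The parameter values and the toy data are hypothetical as well.

```python
import math
import random

# A string (chromosome) is an ellipse (x, y, a, b, theta): centroid, semi-axes,
# and orientation angle in degrees within [0, 180).

def in_ellipse(px, py, e):
    """True if point (px, py) lies inside ellipse e."""
    x, y, a, b, theta = e
    t = math.radians(theta)
    dx, dy = px - x, py - y
    u = dx * math.cos(t) + dy * math.sin(t)    # offset along the a-axis
    v = -dx * math.sin(t) + dy * math.cos(t)   # offset along the b-axis
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

def fitness(e, regions, C, P):
    """ASSUMED fitness: Poisson log-likelihood ratio of the regions inside e.
    regions: list of (x, y, cases, population) centroids."""
    inside = [r for r in regions if in_ellipse(r[0], r[1], e)]
    c = sum(r[2] for r in inside)
    p = sum(r[3] for r in inside)
    E = C * p / P                              # expected cases from population share
    if p == 0 or c <= E:
        return 0.0
    llr = c * math.log(c / E)
    if C > c:
        llr += (C - c) * math.log((C - c) / (C - E))
    return llr

def normalize(e):
    """Keep a >= b; swapping the axes and rotating 90 degrees gives the same ellipse."""
    x, y, a, b, th = e
    if b > a:
        a, b, th = b, a, (th + 90.0) % 180.0
    return (x, y, a, b, th)

def random_ellipse(rng, bounds, max_axis):
    xmin, xmax, ymin, ymax = bounds
    a = rng.uniform(0.1, max_axis)
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax), a,
            rng.uniform(0.05, a), rng.uniform(0.0, 180.0))

def crossover(p1, p2, k):
    """Single-point crossover at gene k (1-based); k = 5 swaps only theta."""
    return p1[:k - 1] + p2[k - 1:], p2[:k - 1] + p1[k - 1:]

def mutate(e, rng, rate, step):
    """Perturb position/shape genes or redraw the angle, each with probability `rate`."""
    genes = list(e)
    for k in range(5):
        if rng.random() < rate:
            genes[k] = rng.uniform(0.0, 180.0) if k == 4 else genes[k] + rng.gauss(0.0, step)
    genes[2], genes[3] = max(genes[2], 0.05), max(genes[3], 0.05)   # keep axes positive
    return normalize(tuple(genes))

def ga_ellipse_clusters(regions, bounds, pop_size=40, generations=30,
                        cx_rate=0.7, mut_rate=0.1, step=0.5, seed=0):
    """Return (best (fitness, ellipse) or None, list of all strings with fitness > 0)."""
    rng = random.Random(seed)
    C = sum(r[2] for r in regions)
    P = sum(r[3] for r in regions)
    max_axis = max(bounds[1] - bounds[0], bounds[3] - bounds[2]) / 2.0
    population = [random_ellipse(rng, bounds, max_axis) for _ in range(pop_size)]
    clusters = []
    for _ in range(generations):
        scored = sorted(((fitness(e, regions, C, P), e) for e in population), reverse=True)
        clusters += [s for s in scored if s[0] > 0]          # export to the cluster list
        parents = [e for _, e in scored[:pop_size // 2]]     # reproduction: fitter half
        nxt = [parents[0]]                                   # elitism: best kept intact
        for i in range(0, len(parents) - 1, 2):
            c1, c2 = parents[i], parents[i + 1]
            if rng.random() < cx_rate:
                c1, c2 = crossover(c1, c2, rng.randint(1, 5))
            nxt += [mutate(c1, rng, mut_rate, step), mutate(c2, rng, mut_rate, step)]
        while len(nxt) < pop_size:                           # random strings fill up
            nxt.append(random_ellipse(rng, bounds, max_axis))
        population = nxt
    best = max(clusters) if clusters else None
    return best, clusters

# Toy 5 x 5 grid of ZIP-code centroids with elevated cases along the diagonal.
regions = [(i, j, 30 if i == j else 5, 1000) for i in range(5) for j in range(5)]
best, found = ga_ellipse_clusters(regions, bounds=(0, 4, 0, 4))
print(best)
```

The diagonal toy cluster is the kind of long, narrow shape that a small set of fixed ellipse ratios and directions can fit poorly. The free angle θ and the unconstrained axis ratio let the search align an ellipse with it.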
Assumptions for exposure assessment of breast cancer incidences and soil dioxin contamination
Table: Description of length of stay at current residence (categories: less than 1 year, 1 to 5 years, 6 to 10 years, 11 to 20 years, 21 to 30 years, and more than 30 years)
We further assumed that the dioxin data from 1995 to 2003 represent dioxin levels in the preceding period, when causative exposure may have occurred. Dioxin samples were collected from 1995 through 2003, whereas cancer cases range from 1985 through 2002. One critique remains: dioxin present in 2000 could not have caused cancers that developed in 1985. Jacquez and Greiling argued that, because of the latency of cancer development, it would not even be plausible to claim that contamination in 1998 could explain diagnoses in 1999. Comprehensive dioxin sampling of the study area first became available in 1995, although smaller sets of dioxin samples were collected as early as 1983. Taking into account the long half-lives of dioxins and their resistance to biodegradation, we assumed that the dioxin data from 1995 to 2003 adequately represent soil dioxin levels before 1995. Although this assumption is not ideal, it is reasonable. Elevated concentrations of PCDD/Fs in river sediments at depths below 60 inches indicate that the contamination is historical, mainly due to the operation of relatively inefficient incinerators at Dow's plant since the 1940s. The modernization of these facilities in 2000 (achieving 99.9999% destruction of dioxins) significantly reduced emissions; however, soil contamination has not yet undergone major remediation [1, 3, 4]. To account for the effects of long cancer latency, we used cancer data starting from 1985 rather than recent data only.
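The TCDD half-lives quoted earlier in the paper offer a rough way to gauge this assumption. Assume a simple first-order decay model with no new deposition; real floodplain soils also receive redeposited sediment, so this is a simplification. Under that model, the fraction of an initial concentration remaining after t years is 0.5^(t / T½). A decade of decay therefore leaves roughly 46% to 63% in surface soil (T½ = 9 to 15 years) and roughly 76% to 93% in subsurface soil (T½ = 25 to 100 years). Under that model, earlier concentrations would have been at least as high as those measured later.

```python
def fraction_remaining(years, half_life_years):
    """Fraction of an initial concentration left after `years` of first-order decay."""
    return 0.5 ** (years / half_life_years)

def implied_earlier_level(measured, years_earlier, half_life_years):
    """Concentration `years_earlier` before a measurement under decay alone
    (assumes no new deposition or redistribution in the meantime)."""
    return measured / fraction_remaining(years_earlier, half_life_years)

# Ten years of decay at the TCDD soil half-life bounds quoted earlier in the paper.
for label, half_life in [("surface, 9 yr", 9), ("surface, 15 yr", 15),
                         ("subsurface, 25 yr", 25), ("subsurface, 100 yr", 100)]:
    print(f"{label}: {fraction_remaining(10, half_life):.2f} remaining after 10 years")
```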
Table: Frequency distribution of the levels (ppt TEQ) of dioxins in soil, classified by depth (0–3, 3–6, 6–15, 15–24, 24–36, 36–48, and 48–60 inches) and location
Table: Distribution of age and ZIP code for breast cancer cases, 1985–2002 (columns include % diagnosed breast cancer and breast cancer rates per 100,000)
In this study, we evaluated levels of dioxin in soils and analyzed spatial variations in breast cancer incidence. There are four major findings. (1) Dioxin contamination sites include the city of Midland and the Tittabawassee River 100-year floodplain, and very high levels of dioxins are confined to areas with a 10-year flood frequency. (2) The number of breast cancer cases increased from 1985 to 2002 among females aged 45 to 64 years, who had the highest risk. (3) After adjustment for age, ZIP codes with high breast cancer rates are not randomly distributed in the study area but are clustered in or near the contaminated areas. (4) Living on or close to the contaminated areas is spatially associated with increased breast cancer incidence. These findings are consistent with those of previous studies [1, 3, 4, 8, 16, 18, 19, 29, 49, 50].
Previous epidemiological studies have found increased breast cancer incidence [17–19] and mortality [20, 21] in females exposed to dioxins. Yet such studies are vulnerable to insufficient sample sizes [14, 19]. Spatial techniques in cancer studies have contributed to the understanding of disease etiology and the impact of contaminants [35, 37, 38]. However, little attention has been paid to using spatial techniques to evaluate dioxin contamination and to analyze its spatial association with breast cancer rates. Our study takes advantage of publicly available historical data, GIS, and spatial and statistical analysis techniques. Publicly available historical breast cancer data provide an opportunity to quickly understand the spatial variation of the disease. The final spatial models, presented as maps, illustrate a nonhomogeneous distribution of breast cancer incidence rates and potential risks associated with soil dioxin contamination among women in the three counties.
Findings in this study gave some interesting insights into the characteristics of dioxin contamination. The most important insight was that the contaminated areas were predominantly the city of Midland and the Tittabawassee River 100-year floodplain. Air deposition from historical operations at Dow and soil relocation activities may explain the very high levels of dioxins in Midland. Flooding may be a contributing factor, continuously sweeping away and redepositing contaminated soil and sediments in the floodplain [7, 8]. Sudden elevation changes, soil relocation activities, or physical barriers to floods may explain the low levels of dioxins in frequently flooded areas. The small number of samples in deeper soil layers and along the Saginaw River warrants additional sampling to determine whether the distribution of dioxin is consistent. We chose the SOM technique partly for the following reason. The dioxin data had a substantial number of outliers with extremely high TEQ values even after log transformation, and these remaining outliers and nonhomogeneous variances between groups made classical statistical methods less reliable. Our approach complements Goovaerts's recently modified geostatistical method, which was used to analyze soil dioxin distribution in the vicinity of an incinerator in Midland [3, 4].
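A quick screen on the log scale can check for the outlier problem described above. The sketch below assumes Tukey's fences (1.5 times the interquartile range beyond the quartiles) applied to log10-transformed TEQ values. The text does not state which outlier criterion was used, and the demo values are hypothetical.

```python
import math
from statistics import quantiles

def log_outliers(values, k=1.5):
    """Return values lying outside Tukey's fences on the log10 scale.

    Fences are Q1 - k*IQR and Q3 + k*IQR of the log10-transformed values.
    """
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive TEQ values")
    logs = [math.log10(v) for v in values]
    q1, _, q3 = quantiles(logs, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v, lv in zip(values, logs) if lv < low or lv > high]

# Hypothetical TEQ values (ppt): one extreme value remains an outlier after the log transform.
teq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(log_outliers(teq))  # [1000]
```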
Preliminary statistical analysis suggests a strong association between elevated breast cancer incidence and aging, particularly among females residing in the city of Midland or near areas contaminated with high dioxin levels. Breast cancer incidence rates increase significantly (α = 0.05) as women get older, which is consistent with previous studies [22, 38, 49, 51]. In addition, the city of Midland, where high levels of dioxins exist, had a statistically significant (α = 0.05) increased rate of breast cancer. This significance was reaffirmed in a comparative analysis using five different remote ZIP codes as references, suggesting that important factors contribute to the high incidence of breast cancer in Midland.
Findings from this study reveal elevated breast cancer incidence in or near areas contaminated by dioxins. Residents living in or near these contaminated areas are more likely to visit them. They are therefore more likely to have been exposed to dioxins than residents living far away. Findings from the Dioxin Exposure Study may support this argument. Long-term exposure through air deposition of high concentrations of dioxins from inefficient incinerators in Midland presents a significant health hazard to local residents. Other pathways may also expose local residents to high risks. These include direct contact with soil and household dust, the use of contaminated sediments as infill material in housing projects, eating fish and game from the contaminated area, water-related activities in the contaminated area, and working at Dow. One study reports that 46% of people living on the floodplain have swum, picnicked, hiked, boated, or participated in other recreational activities in and around the Tittabawassee River, compared with 31% in the near floodplain and 21% in other areas of Midland and Saginaw Counties. The same study indicates that people living on the floodplain are the most likely to have fished in the river during their lifetime.
The cluster analysis provided further evidence of a spatial association between greatly elevated breast cancer incidence rates and soil dioxin contamination. The results from Kulldorff's methods and the genetic algorithms are consistent with the statistical analysis above. The city of Midland was found to have a breast cancer cluster in both space and space-time. The large female population in Midland (13,221 in 1990 and 16,796 in 2000) makes it less likely that this cluster occurred by chance. The detection of clusters in ZIP codes 48611, 48623, and 48626 (Figure 5) is a false positive, since these ZIP codes have much lower breast cancer rates and percentages than the others (see Table 3). This is a common shortcoming of the clustering algorithms used, as they rely on a minimum population size to detect high rates. The clusters in Bay City (Figures 5 and 6) should be interpreted with caution. These clusters are far from Midland and the Tittabawassee River. However, a recent study reported that the sediments and floodplain soils of the Saginaw River, where these clusters are located, are considerably contaminated, with high dioxin levels whose profiles resemble those of the Tittabawassee River. Dioxin contamination may thus be playing a role in the increased breast cancer incidence within these clusters, though other factors cannot be ruled out. This hypothesis underscores the need for more dioxin sampling in these areas. The detection of ZIP codes 48457 (Figure 6) and 48734 (Figures 5 and 6) as spatial clusters may be partly due to their small at-risk background populations (4,164 and 3,924 females in 2000, respectively). Rates in areas with small populations are less reliable because of their higher variance. This small-numbers problem is prevalent in rare disease analysis, especially in cancer studies when rates are used to estimate the underlying risk.
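A Poisson approximation makes the small-numbers problem concrete. The standard error of a count is about the square root of the count, so the same rate estimated from a smaller population carries a wider interval. The sketch below uses a normal approximation; this is an assumption, and exact Poisson intervals are preferable for small counts. The counts are hypothetical, with one population of roughly the size of the small ZIP codes discussed above and another ten times larger.

```python
import math

def rate_with_ci(cases, population, per=100_000, z=1.96):
    """Crude rate per `per` people with an approximate 95% CI.

    Treats the case count as Poisson and uses the normal approximation
    SE(count) = sqrt(cases); the lower bound is truncated at zero.
    """
    rate = cases / population * per
    half = z * math.sqrt(cases) / population * per
    return rate, max(rate - half, 0.0), rate + half

# Hypothetical counts with the same rate in a small and a ten-times larger population.
for cases, pop in [(40, 4_000), (400, 40_000)]:
    rate, lo, hi = rate_with_ci(cases, pop)
    print(f"{cases:>3} cases / {pop:>6,} females: {rate:.0f} per 100,000 "
          f"(95% CI {lo:.0f}-{hi:.0f})")
```

The interval for the smaller population is about sqrt(10), or roughly 3.2 times, wider. A single unusually high rate there is therefore weaker evidence of elevated risk.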
The findings in this study are subject to at least four limitations. First, the sparsity of the soil dioxin data and the scale of the breast cancer incidence data may have introduced uncertainties into the analysis of health outcomes. The lack of TEQ data for soils from background sites/ZIP codes and from locations farther away from Midland was a limiting factor; therefore, we could not definitively confirm spatial clusters located farther away. The number and distribution of soil samples were clearly not sufficient to ascertain the extent of contamination, yet this dioxin database is the most comprehensive in the study area to date. Second, ZIP code of residence at diagnosis is inadequate to describe an individual's location during the development of cancer. This surrogate for exposure is insufficient, especially when causative exposures occur largely in places other than the residence, such as areas of occupational or recreational activity. Further analysis should characterize environmental exposure and cancer risk at the individual level. Third, the data sets lacked residential history information. Breast cancer is known to have long latency periods [26, 35, 49], so the time of diagnosis may not be the time when causative exposures occurred. In addition, migration during the latency period tends to obscure relationships between environmental exposure and cancer incidence. Yet information about residential history is restricted because of privacy concerns. Fourth, this study was not able to fully adjust for all confounding risk factors for breast cancer. We considered the effect of age; however, we did not adjust for other confounders, such as each patient's race, childbearing patterns, socioeconomic status, and exposure to other pollutants, because some of this information is not publicly available. Yet these are substantive factors in the development of breast cancer [22, 38, 53–55].
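The limitation about confounding can be illustrated with a stratified odds ratio. This study reports odds ratios adjusted for age; one standard way to perform such an adjustment is the Mantel-Haenszel odds ratio pooled over age strata, with the Robins-Breslow-Greenland variance for its 95% confidence interval. The sketch below assumes that technique for illustration only. It is not necessarily how the authors computed their estimates, and the `Stratum` class, the function names, and all counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical 2x2 table for one age group."""
    a: int  # exposed cases
    b: int  # exposed non-cases
    c: int  # unexposed cases
    d: int  # unexposed non-cases


def crude_or(strata: list[Stratum]) -> float:
    """Odds ratio after collapsing all strata into one table (no adjustment)."""
    a = sum(s.a for s in strata)
    b = sum(s.b for s in strata)
    c = sum(s.c for s in strata)
    d = sum(s.d for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_or(strata: list[Stratum], z: float = 1.96) -> tuple[float, float, float]:
    """Mantel-Haenszel OR pooled over strata, with a Robins-Breslow-Greenland CI."""
    r_tot = s_tot = 0.0       # sums of a*d/n and b*c/n over strata
    pr = psqr = qs = 0.0      # components of the variance of log(OR_MH)
    for s in strata:
        n = s.a + s.b + s.c + s.d
        p = (s.a + s.d) / n
        q = (s.b + s.c) / n
        r = s.a * s.d / n
        t = s.b * s.c / n
        r_tot += r
        s_tot += t
        pr += p * r
        psqr += p * t + q * r
        qs += q * t
    or_mh = r_tot / s_tot
    var_log = pr / (2 * r_tot ** 2) + psqr / (2 * r_tot * s_tot) + qs / (2 * s_tot ** 2)
    half = z * math.sqrt(var_log)
    return or_mh, math.exp(math.log(or_mh) - half), math.exp(math.log(or_mh) + half)


strata = [
    Stratum(a=10, b=990, c=40, d=3960),   # younger women: stratum OR = 1
    Stratum(a=120, b=2880, c=40, d=960),  # older women: stratum OR = 1
]
or_mh, lo, hi = mantel_haenszel_or(strata)
print(f"crude OR: {crude_or(strata):.2f}")  # → 2.07
print(f"age-adjusted OR: {or_mh:.2f} (95% CI {lo:.2f}-{hi:.2f})")  # OR prints as 1.00
```

In these invented data, older women are both more often exposed and at higher risk, so the crude odds ratio of about 2.07 falls to 1.00 once age is accounted for. An unmeasured confounder such as socioeconomic status or childbearing pattern could distort an estimate in the same way, which is why the unadjusted factors listed above matter.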
In a separate follow-up study, we have critically evaluated the spatial clusters identified in this study in relation to environmental pollutants.
Although our study found an association between increased breast cancer incidence and living on or close to dioxin-contaminated areas, the question of whether exposure to dioxin in soil has caused or is causing breast cancer in this region is complex and is likely to be answered only through comprehensive approaches that control for other confounders. For example, in a separate report we compiled a list of more than 325 chemicals besides dioxins that are released into the environment. These chemicals may also contribute to the high rates of breast cancer.
In summary, this study finds elevated levels of dioxin contamination in the city of Midland and the Tittabawassee River 100-year floodplain. We identified a spatial association between greatly elevated breast cancer incidence rates in the city of Midland and the contaminated areas. The spatial clusters of breast cancer incidence near contaminated areas suggest that important factors contribute to the disease burden among women and must be fully investigated in future research. Although these findings are not sufficient to establish a causal relationship between exposure to dioxin and the development of breast cancer, they are important for formulating new hypotheses about dioxin contamination and breast cancer incidence in this study region.
We thank the Michigan Department of Environmental Quality for providing the dioxin data. The Michigan Department of Community Health provided this study's breast cancer data. We also thank two referees and Dr. Fahui Wang for their constructive advice and suggestions. This research is funded in part by the Dissertation Research Award from the Graduate School of Southern Illinois University, Carbondale.
- Hilscherova K, Kannan K, Nakata H, Hanari N, Yamashita N, Bradley PW, McCabe JM, Taylor AB, Giesy JP: Polychlorinated dibenzo-p-dioxin and dibenzofuran concentration profiles in sediments and flood-plain soils of the Tittabawassee River, Michigan. Environmental Science and Technology. 2003, 37: 468-474. 10.1021/es020920c.
- Demond A, Adriaens P, Towey T, Chang SC, Hong B, Chen Q, Chang CW, Franzblau A, Garabrant D, Gillespie B, Hedgeman E, Knutson K, Lee CY, Lepkowski J, Olson K, Ward B, Zwica L, Luksemburg W, Maier M: Statistical comparison of residential soil concentrations of PCDDs, PCDFs, and PCBs from two communities in Michigan. Environmental Science and Technology. 2008, 42: 5441-5448. 10.1021/es702554g.
- Goovaerts P, Trinh HT, Demond A, Franzblau A, Garabrant D, Gillespie B, Lepkowski J, Adriaens P: Geostatistical modeling of the spatial distribution of soil dioxins in the vicinity of an incinerator. 1. Theory and application to Midland, Michigan. Environmental Science and Technology. 2008, 42: 3648-3654. 10.1021/es702494z.
- Goovaerts P, Trinh HT, Demond A, Towey T, Chang SC, Gwinn D, Hong B, Franzblau A, Garabrant D, Gillespie BW, Lepkowski J, Adriaens P: Geostatistical modeling of the spatial distribution of soil dioxins in the vicinity of an incinerator. 2. Verification and calibration study. Environmental Science and Technology. 2008, 42: 3655-3661. 10.1021/es7024966.
- Trinh HT, Garabrant D, Franzblau A, Hong B, Gwinn D, Towey T, Goovaerts P, Demond A, Adriaens P: Congener specific differentiation of soil samples in the City of Midland, Michigan using multidimensional scaling. Organohalogen Compounds. 2007, 69: 2207-2277.
- Yun SH, Addink R, McCabe JM, Ostaszewski A, Mackenzie-Taylor D, Taylor AB, Kannan K: Polybrominated diphenyl ethers and polybrominated biphenyls in sediments and floodplain soils of the Saginaw River watershed, Michigan, USA. Archives of Environmental Contamination and Toxicology. 2008, 55: 1-10. 10.1007/s00244-007-9084-3.
- Kannan K, Yun SH, Ostaszewski A, McCabe JM, Mackenzie-Taylor D, Taylor AB: Dioxin-like toxicity in the Saginaw River watershed: Polychlorinated dibenzo-p-dioxins, dibenzofurans, and biphenyls in sediments and floodplain soils from the Saginaw and Shiawassee Rivers and Saginaw Bay, Michigan, USA. Archives of Environmental Contamination and Toxicology. 2008, 54: 9-19. 10.1007/s00244-007-9037-x.
- People's exposure to dioxin contamination along the Tittabawassee River and surrounding areas. [http://www.umdioxin.org]
- Giesy JP, Kannan K: Dioxin-like and non-dioxin-like toxic effects of polychlorinated biphenyls (PCBs): Implications for risk assessment. Critical Reviews in Toxicology. 1998, 28: 511-569. 10.1080/10408449891344263.
- Van den Berg M, Birnbaum LS, Bosveld ATC, Brunstrom B, Cook P, Feeley M, Giesy JP, Hanberg A, Hasegawa R, Kennedy SW, Kubiak T, Larsen JC, van Leeuwen FXR, Liem AKD, Nolt C, Peterson RE, Poellinger L, Safe S, Schrenk D, Tillitt D, Tysklind M, Younes M, Wærn F, Zacharewski T: Toxic equivalency factors (TEFs) for PCBs, PCDDs, PCDFs for humans and wildlife. Environmental Health Perspectives. 1998, 106: 775-792. 10.2307/3434121.
- Olson JR: Pharmacokinetics of dioxin and related chemicals. Dioxins and health. Edited by: Schecter A. 1994, Hoboken, NJ: Plenum Press, 163-167.
- Paustenbach DJ, Wenning RJ, Lau V, Harrington NW, Rennix DK, Parsons AH: Recent developments on the hazards posed by 2,3,7,8-tetrachlorodibenzo-p-dioxin in soil: implications for setting risk-based cleanup levels at residential and industrial sites. Journal of Toxicology and Environmental Health. 1992, 36: 103-149.
- International Agency for Research on Cancer: Polychlorinated dibenzo-para-dioxins and polychlorinated dibenzofurans. IARC Monographs on the Evaluation of Carcinogenic Risks to Humans. 1997, 69: 33-343.
- Birnbaum LS, Fenton SE: Cancer and developmental exposure to endocrine disruptors. Environmental Health Perspectives. 2003, 111: 389-394.
- Birnbaum LS: Developmental effects of dioxins and related endocrine disrupting chemicals. Toxicology Letters. 1995, 82–83: 743-750. 10.1016/0378-4274(95)03592-3.
- Eskenazi B, Mocarelli P, Warner M, Needham L, Patterson DG, Samuels S, Turner W, Gerthoux PM, Brambilla P: Relationship of serum TCDD concentrations and age at exposure of female residents of Seveso, Italy. Environmental Health Perspectives. 2004, 112: 22-27.
- Flesch-Janys D, Becher H, Manz A, Morgenstern I, Nagel S, Steindorf K: Epidemiologic investigation of breast cancer incidence in a cohort of female workers with high exposure to PCDD/F and HCH. Organohalogen Compounds. 1999, 44: 379-382.
- Manz A, Berger J, Dwyer J, Flesch-Janys D, Nagel S, Waltsgott H: Cancer mortality among workers in chemical plant contaminated with dioxin. Lancet. 1991, 338: 959-964. 10.1016/0140-6736(91)91835-I.
- Warner M, Eskenazi B, Mocarelli P, Gerthoux PM, Samuels S, Needham L, Patterson D, Brambilla P: Serum dioxin concentrations and breast cancer risk in the Seveso women's health study. Environmental Health Perspectives. 2002, 110: 625-628.
- Consonni D, Pesatori AC, Zocchetti C, Sindaco R, D'oro LC, Rubagotti M, Bertazzi PA: Mortality in a population exposed to dioxin after the Seveso, Italy, accident in 1976: 25 years of follow-up. American Journal of Epidemiology. 2008, 167: 847-858. 10.1093/aje/kwm371.
- Kogevinas M, Becher H, Benn T, Bertazzi PA, Boffetta P, Bueno-de-Mesquita HB, Coggon D, Colin D, Flesch-Janys D, Fingerhut M, Green L, Kauppinen T, Littorin M, Lynge E, Mathews JD, Neuberger M, Pearce N, Saracci R: Cancer mortality in workers exposed to phenoxy herbicides, chlorophenols, and dioxins: An expanded and updated international cohort study. American Journal of Epidemiology. 1997, 145: 1061-1075.
- Burdette WJ: Cancer, etiology, diagnosis, and treatment. 1998, New York: McGraw-Hill
- Parkin DM: Global cancer statistics in the year 2000. Lancet Oncology. 2001, 2: 533-543. 10.1016/S1470-2045(01)00486-7.
- SEER cancer statistics review, 1975–2005. [http://seer.cancer.gov/csr/1975_2005/]
- Brown ML, Riley GF, Schussler N, Etzioni RD: Estimating health care costs related to cancer treatment from SEER-Medicare data. Medical Care. 2002, 40: 104-117. 10.1097/00005650-200208001-00014.
- Terkel SN, Lupiloff-Brazz M: Understanding cancer. 1993, New York: Franklin Watts
- Baccarelli A, Mocarelli P, Patterson DG, Bonzini M, Pesatori AC, Caporaso N, Landi MT: Immunologic effects of dioxin: New results from Seveso and comparison with other studies. 2002, 12: 1169-1173.
- Revich B, Aksel E, Ushakova T, Ivanova I, Zhuchenko N, Klyuev N, Brodsky B, Sotskov Y: Dioxin exposure and public health in Chapaevsk, Russia. Chemosphere. 2001, 43: 951-966. 10.1016/S0045-6535(00)00456-2.
- Birnbaum LS: The mechanism of dioxin toxicity: Relationship to risk assessment. Environmental Health Perspectives. 1994, 102: 157-167. 10.2307/3432197.
- Community cancer incidence and mortality. [http://www.mdch.state.mi.us/pha/osr/index.asp?Id=13]
- Kohonen T: Self-organized formation of topologically correct feature maps. Biological Cybernetics. 1982, 43: 59-69. 10.1007/BF00337288.
- Kulldorff M: A spatial scan statistic. Communications in Statistics-Theory and Methods. 1997, 26: 1481-1496. 10.1080/03610929708831995.
- Dai D, Oyana TJ: An improved genetic algorithm for spatial clustering. 18th IEEE International Conference on Tools with Artificial Intelligence; Arlington, Virginia, USA. 371-380.
- Dai D, Oyana TJ: A genetic algorithm for spatiotemporal cluster detection and analysis. 9th International Conference on GeoComputation; NUI Maynooth, Ireland.
- Jacquez GM, Greiling DA: Geographic boundaries in breast, lung and colorectal cancers in relation to exposure to air toxics in Long Island, New York. International Journal of Health Geographics. 2003, 2:
- Nadal M, Espinosa G, Schuhmacher M, Domingo JL: Patterns of PCDDs and PCDFs in human milk and food and their characterization by artificial neural networks. Chemosphere. 2002, 54: 1375-1382. 10.1016/j.chemosphere.2003.10.045.
- Poulstrup A, Hansen HL: Use of GIS and exposure modeling as tools in a study of cancer incidence in a population exposed to airborne dioxin. Environmental Health Perspectives. 2004, 112: 1032-1036.
- Reynolds P, Hurley SE, Gunier RB, Yerabati S, Quach T, Hertz A: Residential proximity to agricultural pesticide use and incidence of breast cancer in California, 1988–1997. Environmental Health Perspectives. 2005, 113: 993-1000.
- U.S. Census Bureau: Census of population and housing, 2000: summary file 3 (Michigan). 2000, Washington, DC, U.S. Census Bureau
- U.S. Census Bureau: Census of population and housing, 1990: summary file 3 (Michigan). 1990, Washington, DC, U.S. Census Bureau
- Vesanto J, Himberg J, Alhoniemi E, Parhankangas J: SOM Toolbox for Matlab 5. 2000, SOM toolbox team Helsinki University of Technology, 8-20-2005
- Wang F: Modeling traffic emissions with artificial neural networks and regressions. Geographical and Environmental Modelling. 1998, 2: 103-113.
- Dwass M: Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics. 1957, 28: 181-187. 10.1214/aoms/1177707045.
- Openshaw S: Developing automated and smart spatial pattern exploration tools for geographical information systems applications. The Statistician. 1995, 44: 3-16. 10.2307/2348611.
- Chipperfield AJ, Fleming PJ, Fonseca CM: Genetic algorithm tools for control systems engineering. The First International Conference on Adaptive Computing in Engineering Design and Control. 128-133.
- Jacquez GM, Greiling DA: Local clustering in breast, lung, and colorectal cancer on Long Island, NY. International Journal of Health Geographics. 2003, 2:
- Reynolds P, Hurley SE, Gunier RB, Yerabati S, Quach T, Hertz A: Regional variations in breast cancer incidence among California women, 1988–1997. Cancer Causes and Control. 2005, 16: 139-150. 10.1007/s10552-004-2616-5.
- Assessment of Tittabawassee River Floodplain residents' opinions regarding a dioxin exposure investigation. [http://www.michigan.gov/documents/MDCH_TittabawasseeRiverFloodPlainNAReport_92275_7.pdf]
- Becher H, Flesch-Janys D: Dioxins and furans: epidemiologic assessment of cancer risks and other human health effects. Environmental Health Perspectives. 1998, 106: 623-624. 10.2307/3433817.
- Eskenazi B, Mocarelli P, Warner M, Samuels S, Vercellini P, Olive D, Needham L, Patterson DG, Brambilla P, Gavoni N, Casalini S, Panazza S, Turner W, Gerthoux PM: Serum dioxin concentrations and endometriosis: A cohort study in Seveso, Italy. Environmental Health Perspectives. 2002, 110: 629-634.
- de Waard F: Risk factors for breast cancer at various ages. European Journal of Cancer Prevention. 1998, 7:
- Mu L, Wang F: A scale-space clustering method: Mitigating the effect of scale in the analysis of zone-based data. Annals of the Association of American Geographers. 2008, 98: 85-101. 10.1080/00045600701734224.
- Hall SA, Rockhill B: Race, poverty, affluence, and breast cancer [letter]. American Journal of Public Health. 2002, 92: 1559-
- Heck KE, Pamuk ER: Explaining the relation between education and postmenopausal breast cancer. American Journal of Epidemiology. 1997, 145: 366-372.
- Yost K, Perkins C, Cohen R, Morris C, Wright W: Socioeconomic status and breast cancer incidence in California for different race/ethnic groups. Cancer Causes and Control. 2001, 12: 703-711. 10.1023/A:1011240019516.
- Guajardo OA, Oyana TJ: A critical assessment of geographic clusters of breast and lung cancer incidences among residents living near the Tittabawassee River, Michigan [abstract]. Association of American Geographers (AAG) West Lakes Regional Meeting. 2007
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.