Although we observed spatial variability in ASD prevalence in unadjusted analyses, the pattern of ASD appeared to be largely explained by factors influencing diagnosis -- maternal age and education -- which differed across the study area. The larger optimal span size and global p-value in the adjusted analyses, as well as decreased variability in prevalence ratios across the study area, all indicated less geographic variability after adjustment for known risk and predictive factors for ASD. Our results corroborate previous reports
[15, 16] demonstrating spatial variability in ASD prevalence before adjusting for spatial confounding. In secondary analyses Van Meter and colleagues reported the impact of living in neighborhoods with high ASD prevalence was diminished in analyses adjusting for parental education, a finding consistent with our results
Our results highlight the importance of adjusting for the geographic distribution of known individual-level predictors in spatial analyses. Here, adjustment for socioeconomic factors associated with diagnosis greatly attenuated geographic variability in ASD prevalence. This has implications for studies of ASD etiology that assign environmental exposures based on geography, such as air pollutants and agricultural pesticides, for which we caution the causal interpretation of geographic patterns that are not controlled for individual-level factors. Using GAMs allowed us to carefully account for known risk and diagnostic factors, a major strength of these analyses
. In addition, we considered both the statistical and qualitative aspects of the observed geographic patterns, tempering interpretation of areas of sparse data by considering tests of global deviance and changes in the optimal span size along-side visual inspection of the patterns in our interpretation of results.
NC-ADDM Network provided a large population based sample with information on co-occurring conditions, allowing us to consider differences in spatial confounding by the severity and type of disability. Patterns for separate disability groups, ASD+ID and ASD-ID, were visually similar to those for all ASD; however, ASD-ID appeared to be more impacted by spatial confounding as indicated by a greater attenuation in prevalence ratios and a increase of the optimal span size in the adjusted analysis. The associations between ASD (+/−ID) and maternal education and age that we observed in these analyses are consistent with previously reported associations in non-spatial analyses
[3, 4] which indicate that ASD-ID is more strongly associated with higher SES than ASD+ID.
Although the global significance test did not indicate variability, our results suggest that ID prevalence varied geographically across the study area; even after adjusting for known spatial confounders the optimal span size remained small and results were visually suggestive of geographic differences in ID prevalence. There are several possible explanations for the observed variability in ID, including residual confounding by unmeasured variables and differences in the distribution of environmental factors across the study area.
Linking data from the NC-ADDM Network and vital records also strengthened our analyses. Vital records provided individual-level data on a number of covariates, allowing us to account for spatial confounding by these factors, and also provided residential location at the time of birth, which may be more relevant to ASD etiology than the address at diagnosis. Although the majority of children with ASD at age 8 in ADDM were also born in the study area, local changes in address were common (68.14% of children with ASD had a change in residential address between birth and age 8). If the spatial patterns we observed are due to difference in diagnosis, spatial variability may be more apparent using a later address corresponding to the time of diagnosis.
A final strength of our study was our ability to evaluate the influence of potential sibling clusters in analyses. While it is often suggested that including siblings in analyses will induce clustering, we did not observe this pattern in our data. One possible explanation is that the number of siblings in our analyses was relatively small (e.g. 269 sibling pairs in the ASD analysis of 11,566 children), due in part to the biennial study design and the selection of only part of the source population (15%).
There are also several important limitations of our methods. We chose the optimal tradeoff between bias and variance of the smooth (the optimal span size) by minimizing the AIC. Selecting the optimal span size for a dataset, however, may obscure important small scale variability in maps; it is possible that our analyses were conducted on a scale too large to identify small scale environmental exposures relevant in the etiology of ASD. Examining different span sizes may reveal important features of the data. Similarly, we assessed spatial confounding by visually comparing maps of prevalence before and after adjustment holding the span size and included observations constant and by investigating changes in the optimal span size before and after adjustment
; however, more objective methods of assessing confounding are needed and are an important topic for future research. We also used p-values as a screening tool, to evaluate the global spatial variation as well as areas of increased or decreased prevalence. Our use of p-values here was to evaluate whether or not spatial variability existed (i.e. whether the prevalence of ASD was constant across the geographic area). Nonetheless, many epidemiologists prefer the use of confidence intervals, which provide information on the precision of the observed association
. While it is possible to calculate confidence intervals in these analyses, it is visually difficult to display three surfaces simultaneously on maps. Additionally, we conducted permutation tests using the span size of the observed data to test the null hypothesis that the map is flat. Permuted datasets under this null hypothesis may have had a larger optimal span size, particularly for the ID analyses, which had a relatively small span size for the observed data (optimal span = 0.30). Consequently, using the span size of the observed data could result in a p-value that is too small