Accuracy of two geocoding methods for geographic information system-based exposure assessment in epidemiological studies

Background Environmental exposure assessment based on Geographic Information Systems (GIS) and on study participants' residential proximity to environmental exposure sources relies on the positional accuracy of subjects' residences to avoid misclassification bias. Our study compared the positional accuracy of two automatic geocoding methods with that of a manual reference method.
Methods We geocoded 4,247 address records representing the residential history (1990–2008) of 1,685 women from the French national E3N cohort living in the Rhône-Alpes region. We compared two automatic geocoding methods, a free online geocoding service (method A) and an in-house geocoder (method B), with a reference layer created by manually relocating addresses from method A (method R). For each automatic geocoding method, positional accuracy levels were compared according to the urban/rural status of addresses and the time period (1990–2000, 2001–2008) using chi-square tests. Kappa statistics were computed to assess the agreement of the positional accuracy of methods A and B with that of the reference method, overall, by time period and by urban/rural status of addresses.
Results With methods A and B, respectively, 81.4% and 84.4% of addresses were geocoded either to the exact address (65.1% and 61.4%) or to the street segment (16.3% and 23.0%). In the reference layer, geocoding accuracy was higher in urban than in rural areas (74.4% vs. 10.5% of addresses geocoded to the address or interpolated address level, p < 0.0001); no difference was observed according to the period of residence. Compared with the reference method, median positional errors were 0.0 m (IQR = 0.0–37.2 m) for method A and 26.5 m (IQR = 8.0–134.8 m) for method B, with positional errors < 100 m for 82.5% and 71.3% of addresses, respectively. Positional agreement of methods A and B with method R was 'substantial' for both methods, with kappa coefficients of 0.60 and 0.61, respectively.
Conclusion Our study demonstrates the feasibility of geocoding residential addresses in epidemiological studies in which addresses were not initially recorded for environmental exposure assessment, both for recent addresses and for residential locations dating back more than 20 years. The accuracy of the two automatic geocoding methods was comparable. The in-house method (B) allowed better control of the geocoding process and was less time-consuming.


Background
Environmental epidemiology requires reliable assessment of both the temporal and spatial components of exposure. In response to these challenges, epidemiological studies increasingly use the residential addresses of study participants and geographic information systems (GIS) to improve the characterization of environmental exposures and to examine their association with health risks for a wide variety of diseases [1]. GIS have, for instance, been used to investigate the relationship between environmental exposures and the risk of breast cancer [2-4], leukemia [5-7], Parkinson's disease [8,9], adverse birth outcomes [10,11] and respiratory health [12-15]. GIS-based exposure assessment that uses residential proximity to an environmental exposure source (e.g. farmland treated with pesticides, industrial facilities or road traffic) as an exposure surrogate relies on the positional accuracy of the subjects' residences to avoid exposure misclassification [16]. Existing prospective cohorts are increasingly used to investigate environmental causes of disease, although most of them were not initially designed for environmental exposure assessment [17,18]. The strength of existing cohorts lies in prospective, individual-level data collection over many years, which allows adjustment for individual disease risk factors; however, subjects' postal addresses have rarely been collected with a view to geocoding them (i.e. converting them into precise geographic coordinates) for use in GIS. This may result in poor positional accuracy of subjects' addresses and may be an important source of misclassification and imprecision in environmental exposure assessment [13,16,19-24].
Geocoding, i.e. assigning geographic coordinates (latitude and longitude) to the study subjects' residential addresses, is one of the first steps in GIS-based epidemiological studies [20,24-26]. The quality of geocoding depends on the completeness and on the positional accuracy of the located addresses. Completeness is the proportion of addresses that can be geocoded and depends on the quality of the collected address data. Positional accuracy reflects how close geocoded objects are to their true location [27,28]. Residential addresses can be geocoded using three methods. The first consists of using online geocoding services to obtain subjects' coordinates or to create online maps of subjects' residential locations [29,30]. These free services are available on the Internet and do not require specific expertise in geocoding [21]. The second consists of using a commercial service that can handle all steps of geocoding, from spell-checking addresses to locating them on a map [11,13,24,31]. The third is in-house geocoding, in which the geocoding process is handled by the research team using commercially available GIS software equipped with a geocoding tool and a reference street database [7,21,24,32,33]. In Europe, and particularly in France, there is a lack of studies comparing geocoding accuracy across geocoding tools, as well as according to the characteristics of residential locations and the date of residence.
Several American and European studies have evaluated the accuracy of different geocoding methods and of their reference network databases by comparison with field locations obtained using the Global Positioning System (GPS) [13,20,27,34] or with manual locations based on aerial photography [28,35]. These studies have raised awareness of the divergence in geocoding accuracy between methods, with median positional errors ranging from 25 m to 201 m. Geocoding accuracy may also vary according to the urban or rural status of the subjects' residential location [20,24,35,36]. Furthermore, studies investigating differences in the geocoding accuracy of residential addresses by date of residence have yielded inconsistent results [20,36].
The few studies that have explored the feasibility and quality of geocoding the residential addresses of an existing cohort in a European context were conducted on small populations (n = 30 [29], n = 100 [27] or n = 354 [13]). Moreover, these studies did not explore geocoding accuracy across geographical areas (urban or rural) or time periods. Furthermore, the spatial distribution of towns and rural settlements, street patterns (e.g. grid type, street lengths) and population density, factors that have been shown to affect geocoding accuracy [20,24,35,37], differ between Europe and the United States, where most previous studies were conducted. Our study aimed to compare the accuracy of two automatic geocoding methods, an online method and an in-house method, with a manual geocoding method used as the reference, in a French national prospective cohort initiated in 1990. The present study assesses the respective levels of accuracy and confidence of each geocoding method tested in a European context. We further assessed geocoding accuracy according to the urban or rural status of addresses and the period of residence. The aim was to select the most suitable method for geocoding subjects' residences in order to assess environmental exposure in a case-control study nested within the same prospective cohort, taking into account positional accuracy, the ethical use of addresses and privacy protection, and the time and resources required.

Methods

Study population
Our analysis involved subjects from a nested case-control study (the Geo3N research project) including 5,455 breast cancer cases and 5,455 matched controls, which aimed to analyze the association between environmental dioxin exposure and breast cancer risk in the E3N (Etude Epidémiologique des Femmes de la Mutuelle Générale de l'Education Nationale) cohort [38]. E3N is an ongoing French prospective cohort study of 98,995 women investigating risk factors for cancer in women. E3N is the French component of the European Prospective Investigation into Cancer and Nutrition [39]. E3N participants were enrolled in 1990, aged 40 to 65 years, and were members of a national health insurance plan for teachers. Subjects are followed up by self-administered questionnaires every 2 to 3 years. E3N was approved by the French commission for Data Protection and Privacy. For the present analysis, we selected 1,730 subjects from the nested case-control study, all of whom lived in the Rhône-Alpes region at recruitment into the E3N cohort. The Rhône-Alpes region covers 43,196 km² with over 6 million inhabitants and includes a broad diversity of territories, with rural, mountainous and highly urbanized areas. Residential addresses of study participants were collected through the baseline questionnaire and four follow-up questionnaires, sent in 1990, 1997, 2000, 2002 and 2005, respectively.

Data cleaning
To improve standardization and geocoding quality, the spelling of street and municipality names in all subjects' addresses was verified manually using free online databases of French postal codes (e.g. www.codespostaux.com/, www.pagesjaunes.fr/pagesblanches/). We also completed missing or incomplete address fields (postal code, municipality name, street name and street number) by matching with data from previous and subsequent questionnaires for the same residential location (e.g. same street name and same municipality). After exclusion of 45 subjects with a missing address, postal code or municipality name, 1,685 subjects, corresponding to a total of 4,247 addresses collected at successive questionnaires between 1990 and 2008, were included in the analysis.
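The completion step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used for E3N: the record layout (`AddressRecord`), the case- and whitespace-insensitive matching key, and the rule of borrowing a missing street number or postal code from the temporally closest questionnaire of the same subject with the same street and municipality are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class AddressRecord:
    """Hypothetical layout of one address reported in one questionnaire."""
    subject_id: int
    year: int
    street_number: Optional[str]
    street_name: Optional[str]
    postal_code: Optional[str]
    municipality: Optional[str]


def normalize(text: Optional[str]) -> Optional[str]:
    """Case- and whitespace-insensitive key used to compare names."""
    return " ".join(text.lower().split()) if text else None


def complete_records(records: List[AddressRecord]) -> List[AddressRecord]:
    """Fill a missing street number or postal code from other questionnaires
    of the same subject that report the same street and municipality."""
    completed = []
    for rec in records:
        if not (rec.street_name and rec.municipality):
            completed.append(rec)  # no matching key: leave the record unchanged
            continue
        donors = [
            other for other in records
            if other is not rec
            and other.subject_id == rec.subject_id
            and normalize(other.street_name) == normalize(rec.street_name)
            and normalize(other.municipality) == normalize(rec.municipality)
        ]
        # Prefer the questionnaire closest in time to the incomplete record.
        donors.sort(key=lambda other: abs(other.year - rec.year))
        updates = {}
        for field in ("street_number", "postal_code"):
            if getattr(rec, field) is None:
                values = [getattr(d, field) for d in donors if getattr(d, field) is not None]
                if values:
                    updates[field] = values[0]
        completed.append(replace(rec, **updates))
    return completed
```

Records that still lack a postal code or municipality after this step would then fall under the exclusion rule stated above.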

Geocoding methods
For each method assessed, points representing addresses were placed along the street at the entrance of the building (Fig. 1). A trained technician, blinded to the case-control status of the subjects, geocoded all addresses.

Automatic methods
The first technique, "method A", was an automatic method based on a free online geocoding service accessible at http://dehaese.free.fr/Gmaps/testGeocoder.htm. Its reference street network database was based on Google Maps®; the total number of addresses stored in the database was not provided by Google®. After automatic online geocoding, latitude and longitude coordinates in the WGS 84 coordinate system were exported for each geocoded address, together with the accuracy of each location on a scale ranging from 0 to 9 (0: not found, 1: country, 2: region (state and district), 3: county, 4: city, 5: postal code, 6: street segment, 7: street intersection, 8: address, 9: point of interest (building names, church…)). In France, levels 2, 3 and 7 did not correspond to the administrative division of the territory and were therefore not applicable. For addresses geocoded with a precision lower than level 6 (street segment), the spelling of street and municipality names was checked again manually and corrected if necessary. Revised addresses were then geocoded a second time with the same online geocoder. The database was imported into ArcGIS 10.0 (Environmental Systems Research Institute, ESRI, Redlands, CA, USA) to create a data layer, and all coordinates were converted into Lambert 93, the projection system currently used in France.

Fig. 1 Illustration of address locations in urban and rural areas with the three distinct methods. a example of residence located in urban area; b example of residence located in rural area (circle: ArcGIS Online location for method R (a); triangle: manually improved location with method R used as reference; cross: location with method A; square: location with method B; dashed lines: distances between addresses located with ArcGIS Online for method R and method R (a), methods A and R (a, b), and methods B and R (a, b)).
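For readers reproducing the WGS 84 to Lambert 93 conversion outside ArcGIS, the sketch below implements the standard Lambert conformal conic forward projection with two standard parallels, using the published Lambert 93 parameters (GRS80 ellipsoid, standard parallels 44° N and 49° N, origin 46.5° N and 3° E, false easting 700,000 m, false northing 6,600,000 m). It is an illustration, not the ArcGIS routine used in the study, and it assumes that WGS 84 and RGF93 (the datum of Lambert 93) can be treated as interchangeable, which ignores their small metre-level or smaller offset.

```python
import math
from typing import Tuple

# Lambert 93 parameters on the GRS80 ellipsoid.
A = 6378137.0                                   # semi-major axis (m)
FLATTENING = 1 / 298.257222101
E = math.sqrt(2 * FLATTENING - FLATTENING ** 2)  # first eccentricity
LAT_0, LAT_1, LAT_2 = (math.radians(v) for v in (46.5, 49.0, 44.0))
LON_0 = math.radians(3.0)
X_0, Y_0 = 700000.0, 6600000.0                  # false easting / northing (m)


def _m(phi: float) -> float:
    return math.cos(phi) / math.sqrt(1 - (E * math.sin(phi)) ** 2)


def _t(phi: float) -> float:
    es = E * math.sin(phi)
    return math.tan(math.pi / 4 - phi / 2) / ((1 - es) / (1 + es)) ** (E / 2)


# Cone constant, scaling factor and radius at the latitude of origin.
N = (math.log(_m(LAT_1)) - math.log(_m(LAT_2))) / (math.log(_t(LAT_1)) - math.log(_t(LAT_2)))
F = _m(LAT_1) / (N * _t(LAT_1) ** N)
RHO_0 = A * F * _t(LAT_0) ** N


def wgs84_to_lambert93(lon_deg: float, lat_deg: float) -> Tuple[float, float]:
    """Forward Lambert conformal conic projection (datum shift ignored)."""
    rho = A * F * _t(math.radians(lat_deg)) ** N
    theta = N * (math.radians(lon_deg) - LON_0)
    return X_0 + rho * math.sin(theta), Y_0 + RHO_0 - rho * math.cos(theta)
```

By construction, the projection origin (3° E, 46.5° N) maps to (700,000 m, 6,600,000 m), and a point in central Lyon maps to roughly X ≈ 842,000 m, Y ≈ 6,519,000 m.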
The second automatic method, "method B", was an in-house method based on BD Adresse for ArcGIS [40,41] and its reference street network database, BD Adresse® (National Geographic Institute, IGN, Saint-Mandé, France), which includes 26 million addresses. For address matching, the ArcGIS software uses two parameters to determine the best position: the spelling sensitivity, i.e. the degree of spelling variation in a street name allowed when searching for likely match candidates, and the minimum candidate score, i.e. the score that a potential match must reach to be considered a candidate [42]. The spelling sensitivity of an address locator is a value between 0 and 100; a higher value restricts candidates to exact matches. Spelling sensitivity is generally set between 60 and 80 [21,24,43,44], allowing only minor spelling variations. Because the cohort addresses had not been recorded for geocoding, the correct spelling of street and municipality names was uncertain. To maximize the proportion of participants assigned geographic coordinates, we used a lower spelling sensitivity, set arbitrarily to 50 based on the studies by Duncan et al. 2011 [21], Bell et al. 2012 [44] and Schootman et al. 2007 [43], to allow greater spelling variation and the retrieval of additional candidates. To select the most likely candidates with a high level of certainty without being too restrictive, we set the minimum candidate score to 80, as in previous studies conducted in the US [21,24,43,44]. For method B, geocoding accuracy levels ranged from 0 to 7 (0: not found, 1: exact address, 2: interpolated address, 3: street segment, 4: locality, 5: town hall, 6: postal code, 7: city). Interpolated addresses are positioned based on the known positions of addresses along the street. For addresses with several possible matches (N = 499), we selected the address with the highest candidate score. Data were projected in Lambert 93.
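The interplay of the two thresholds can be illustrated with a toy matcher. This sketch only mimics the idea: the similarity measure (Python's `difflib.SequenceMatcher`), the equal weighting of street and municipality similarity and the function names are assumptions, not ESRI's scoring algorithm. Only the threshold values (50 and 80) and the rule of keeping the highest-scoring candidate come from the text.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

SPELLING_SENSITIVITY = 50  # value used in the study
MIN_CANDIDATE_SCORE = 80   # value used in the study


def similarity(a: str, b: str) -> float:
    """0-100 string similarity (illustrative stand-in for a locator's scoring)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_candidate(street: str, city: str, reference: List[Tuple[str, str]]
                   ) -> Optional[Tuple[float, Tuple[str, str]]]:
    """Return (score, (street, city)) for the best candidate, or None if not found."""
    candidates = []
    for ref_street, ref_city in reference:
        street_sim = similarity(street, ref_street)
        # Retrieval step: tolerate spelling variation down to the sensitivity threshold.
        if street_sim < SPELLING_SENSITIVITY:
            continue
        # Hypothetical candidate score: equal weight for street and municipality.
        score = 0.5 * street_sim + 0.5 * similarity(city, ref_city)
        if score >= MIN_CANDIDATE_SCORE:
            candidates.append((score, (ref_street, ref_city)))
    # Several possible matches: keep the one with the highest candidate score.
    return max(candidates, default=None)
```

With a misspelled query such as "rue victor hugp" in Lyon, the same street in another municipality is retrieved by the lenient spelling step but rejected by the minimum candidate score, leaving the Lyon match as the single best candidate.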

Reference method
The reference method in the present study, "method R", was created by manually relocating addresses located by the online geocoder used in method A (http://dehaese.free.fr/Gmaps/testGeocoder.htm) and by the ArcGIS Online geocoding service. Location at the address level meant placing the point in front of the residence (house or building). As it was not feasible, in terms of time, to check all addresses manually, we decided to check manually all addresses with an accuracy level equal to or lower than street segment (≤6), as well as all addresses with an accuracy level of 8 or 9 that deviated by more than 50 m from the position obtained with an independent geocoding method (the ArcGIS Online geocoding service). Given typical property widths, locations within 50 m were considered accurate. Accordingly, all addresses with an accuracy level of 8 (address) or 9 (point of interest: school, hometown, etc.) were geocoded again with the ArcGIS Online geocoding service [45]. The street network database of ArcGIS Online is based on Navteq; the number of reference addresses stored in the database was not available from ESRI. For each address geocoded by both method A (batchgeocoder) and ArcGIS Online, the Euclidean distance between the two locations was computed using the point-to-point function in ArcGIS 10.0. We used the Euclidean distance in the present study [13,21,24,32,46] because the straight-line distance from a person's residence to an environmental exposure source has been shown to be a key determinant of exposure to environmental pollutants, such as agricultural pesticide dispersion, dioxins or traffic-related emissions [5,16,47], and is thus used in GIS-based environmental exposure modelling. Addresses with a distance greater than 50 m were selected for manual checking and verified by a trained technician; points were moved to the correct location when necessary.
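The selection rule for manual checking can be written compactly. In this sketch the function and argument names are hypothetical; accuracy codes follow method A's 0-9 scale, coordinates are assumed to be projected (Lambert 93, in metres) so that the planar Euclidean distance is meaningful, and flagging an address that lacks a second geocode is an added assumption not stated in the text.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]  # (X, Y) in metres, e.g. Lambert 93
TOLERANCE_M = 50.0           # locations within 50 m considered accurate


def euclidean_distance(p: Point, q: Point) -> float:
    """Straight-line distance between two projected points, in metres."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def needs_manual_check(level_a: int, xy_a: Point, xy_online: Optional[Point]) -> bool:
    """Apply the method R selection rule to one method A geocode."""
    if level_a <= 6:
        # Street segment or coarser (including 0, not found): always checked.
        return True
    if xy_online is None:
        # Assumption: without an independent geocode, the address is checked.
        return True
    # Levels 8-9: checked only if the independent geocode disagrees by > 50 m.
    return euclidean_distance(xy_a, xy_online) > TOLERANCE_M
```

An address-level geocode whose ArcGIS Online position lies exactly 50 m away (for instance 30 m east and 40 m north) is kept as is, whereas one 60 m away is sent for manual review.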
For manually checked addresses, the new location and accuracy level (town hall, locality, street segment or address) were recorded, together with the database used to determine the most accurate location manually (i.e. Google Maps®, Google Street View®, Yahoo Map®, Geoportail® from IGN or BD Adresse® from IGN) [35,43,48]. Addresses were placed at the best available location given the information contained in the address itself (e.g., when the street number was missing, the best achievable accuracy was the street segment). When Geo3N addresses lacked exact postal address information (missing street name), the best location was the town hall. Method R was used as the reference to compare the accuracy of the two automatic geocoding methods (methods A and B).

Data analyses
To facilitate comparison of the accuracy of the two automatic geocoding methods with that of method R, we regrouped the accuracy levels of each method into three categories: city or postal code, street segment, and address or point of interest for method A; postal code or town hall, locality or street segment, and interpolated address or address for method B; and town hall, street segment or locality, and address for method R. To assess the geocoding accuracy of each automatic method, we selected unique addresses among the 4,247 residential addresses collected consecutively. Because addresses were collected at each questionnaire, a woman who lived at the same address at two consecutive questionnaires had two address records in the database, possibly with minor spelling differences. To identify unique addresses, we therefore identified overlapping points: we calculated the X and Y coordinates of each point geocoded automatically by methods A and B, as well as the X and Y coordinates from method R. By matching the X and Y coordinates of methods A and R, and of methods B and R, we obtained 2,224 and 2,425 pairs of X and Y coordinates, respectively, corresponding to unique addresses. For method R, we obtained 2,112 unique pairs of X and Y coordinates. We computed two distance matrices, one for method A and one for method B, by calculating the Euclidean distance between each automatically geocoded unique address and its corresponding address in method R. Distances were grouped into six categories (0-25 m, 26-50 m, 51-100 m, 101-400 m, 401-800 m and > 800 m), and the proportion of addresses in each category was computed for methods A and B. These categories were chosen to facilitate comparison of our results with those of previous studies [20,24].
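The recoding and distance-binning steps can be sketched as follows, assuming each geocode is available as projected (X, Y) coordinates in metres. The common category labels, the assignment of method B's city level (7), which the grouping above does not list, to the coarsest category, and the handling of non-integer distances (25.4 m falls in the 26-50 m class) are assumptions. Repeated records of the same residence are collapsed by exact coordinate matching, as described in the text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Common labels so that methods A, B and R can be cross-tabulated.
CATEGORY_A = {4: "municipality", 5: "municipality",  # city, postal code
              6: "street",                           # street segment
              8: "address", 9: "address"}            # address, point of interest
CATEGORY_B = {1: "address", 2: "address",            # exact, interpolated address
              3: "street", 4: "street",              # street segment, locality
              5: "municipality", 6: "municipality",  # town hall, postal code
              7: "municipality"}                     # city: grouping assumed

DISTANCE_BINS = [(25, "0-25 m"), (50, "26-50 m"), (100, "51-100 m"),
                 (400, "101-400 m"), (800, "401-800 m")]
LABELS = [label for _, label in DISTANCE_BINS] + ["> 800 m"]


def distance_category(d: float) -> str:
    """Assign a positional error (m) to one of the six distance classes."""
    for upper, label in DISTANCE_BINS:
        if d <= upper:
            return label
    return "> 800 m"


def unique_pairs(pairs: List[Tuple[Point, Point]]) -> List[Tuple[Point, Point]]:
    """Keep one (automatic, reference) coordinate pair per distinct location."""
    return list(dict.fromkeys(pairs))


def distance_distribution(pairs: List[Tuple[Point, Point]]) -> Dict[str, float]:
    """Percentage of unique addresses in each distance class."""
    counts = Counter(distance_category(math.dist(auto, ref)) for auto, ref in pairs)
    return {label: 100 * counts[label] / len(pairs) for label in LABELS}
```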
To describe the imprecision of addresses located at the street segment level, we calculated the median length of street segments (BD Adresse® from IGN) in urban and rural areas and in the city of Lyon. We selected a sample of streets in the Rhône-Alpes region from a mainly urban department (Rhône) and a mainly rural department (Ardèche) to calculate, in each department, the median length of street segments overall and according to urban or rural status. For each address, urban or rural status was established for the year of residence using data from the French National Institute of Statistics and Economic Studies (INSEE). To account for changes in status over time, we used the 1990 urban area definition (UAD) for addresses from 1990 to 1995, the 1999 UAD for addresses from 1996 to 2004 and the 2010 UAD for addresses from 2005 to 2008. For each of the three geocoding methods, accuracy levels were compared between urban and rural areas and between two time periods (1990-2000 and 2001-2008) using chi-square tests. All p-values were two-sided, and p-values < 0.05 were considered statistically significant. Cohen's kappa coefficients were calculated to assess the agreement between the accuracy of methods A and B and that of method R [49]. Kappa coefficients were also calculated by time period (1990-2000 and 2000-2008) and by urban/rural status. We used SAS statistical software version 9.4 (SAS Institute Inc., Cary, North Carolina) for data analysis.
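The period-specific urban area definition and the two statistics can be sketched in plain Python. The authors used SAS 9.4; this sketch is only an illustration with hypothetical function names, limited to unweighted kappa and to the closed-form p-value that holds for a 2 × 3 urban/rural by accuracy-category table (2 degrees of freedom, where the chi-square upper tail equals exp(-χ²/2)).

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def urban_area_definition(year: int) -> int:
    """INSEE urban area definition (UAD) vintage applied to a residence year."""
    if 1990 <= year <= 1995:
        return 1990
    if 1996 <= year <= 2004:
        return 1999
    if 2005 <= year <= 2008:
        return 2010
    raise ValueError(f"year {year} outside the 1990-2008 study period")


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat: float) -> float:
    """Upper-tail p-value; exact only for 2 degrees of freedom (2 x 3 table)."""
    return math.exp(-stat / 2)


def cohens_kappa(ratings_a: Sequence[str], ratings_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two paired categorical ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: categories absent from either rating contribute zero.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)
```

With the illustrative counts [[10, 20, 30], [30, 20, 10]] (not study data), every expected cell count is 20 and the statistic is 20 on 2 degrees of freedom.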
Results

Based on the 2,112 unique pairs of X and Y coordinates of addresses coded by method R, 723 (34.2%) addresses with an accuracy level equal to or lower than 6 were checked manually, and 329 (45.5%) of these were relocated. In addition, 1,389 (65.8%) addresses with accuracy levels of 8 or 9 were re-geocoded with ArcGIS Online, and 203 (14.6%) of these were relocated manually. Overall, in this reference layer, 63.2% of residential addresses were located at the address level, 29.2% at the street segment or locality level and 7.6% at the town hall. In the reference layer, 74.4% of addresses were located at the address level in urban areas versus 10.5% in rural areas (p < 0.0001, data not shown). The accuracy level did not vary according to the time period of residence: 62.1% and 63.3% of addresses were located at the address level for 1990-2000 and 2001-2008, respectively (p = 0.67, data not shown).
The positional errors of addresses located by methods A and B compared with method R are presented in Table 1. With method A, 405 (18.2%) addresses were geocoded to the city or postal code level, 363 (16.3%) to the street segment and 1,448 (65.1%) to the address or point of interest; 8 addresses (0.4%) could not be geocoded. Among addresses geocoded to the street segment and to the address or point of interest, 226 (62.3%) and 1,241 (85.7%), respectively, had a positional error of less than 25 m compared with the layer generated by method R. Among addresses geocoded to the city or postal code level, 160 (39.5%) had a positional error lower than 25 m and 146 (36.1%) an error above 400 m. Seventeen addresses (0.8%) had a positional error of over 30 km; these addresses were incomplete and had a misspelled municipality name. With method B, 1,490 (61.4%) addresses were geocoded to the point address or interpolated address level, 558 (23.0%) to the street segment or locality and 377 (15.5%) to the town hall or postal code. A positional error of less than 25 m was found for 1,000 (67.1%) addresses located at the address or interpolated address level, 132 (23.7%) addresses located at the street segment or locality level and 53 (14.1%) addresses located at the postal code or town hall level. Addresses with the largest positional errors (> 400 m) compared with method R had been geocoded to the town hall or postal code (n = 244, 64.8%). Overall, 14.7% of addresses required manual checking with method B compared with 30.7% with method A, so that the technician spent less time geocoding with method B.

The concordance with method R was assessed separately for each of the two automatic methods (Table 2). Overall, kappa coefficients were 0.60 between methods A and R and 0.61 between methods B and R.
For addresses located in urban areas, kappa coefficients of 0.56 and 0.52 were found between methods A and R and between methods B and R, respectively, while in rural areas, agreement with method R was 0.39 and 0.54 for methods A and B, respectively. Agreement with method R was 0.61 and 0.60 for methods A and B, respectively, for the period 1990-1999, and 0.56 and 0.70 for the period 2000-2008.
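The summary measures reported for positional errors (median, interquartile range and percentage of errors below 100 m) can be computed from a list of distances as sketched below. The input values in the check are hypothetical, and percentile definitions differ between software packages (the SAS default need not match Python's `method="inclusive"`), so quartiles computed this way may differ slightly from those obtained in SAS.

```python
import statistics
from typing import Dict, Sequence


def positional_error_summary(errors_m: Sequence[float]) -> Dict[str, float]:
    """Median, quartiles and % of positional errors below 100 m."""
    q1, _, q3 = statistics.quantiles(errors_m, n=4, method="inclusive")
    below_100 = 100 * sum(e < 100 for e in errors_m) / len(errors_m)
    return {
        "median": statistics.median(errors_m),
        "q1": q1,
        "q3": q3,
        "pct_below_100m": below_100,
    }
```

For the hypothetical errors [0, 0, 0, 10, 50, 150, 200] m, this gives a median of 10 m, an IQR of 0-100 m and 5 of 7 errors (about 71.4%) below 100 m.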

Discussion
In the present study, we compared the accuracy of two automatic geocoding methods, overall and according to the urban or rural status of addresses and the time period of residence (1990 to 2008). Compared with the reference method, the two geocoding methods gave similar results in terms of overall accuracy, with more than 60% of addresses geocoded to the exact address level and more than 15% to the street segment level. Accuracy was higher in urban than in rural areas, while no difference was observed according to the period of residence. Compared with method A, method B allowed more control at all steps of the geocoding process and was less time-consuming, particularly regarding manual checking.
Based on the Euclidean distance to the address located by method R, 82.5% of addresses geocoded with method A and 71.3% of addresses geocoded with method B had a positional error lower than 100 m. This difference may be explained by the use of the same initial automatic (online) geocoding for methods A and R. Overall, these proportions are comparable to those reported in studies conducted in France and elsewhere, with 80.9% to 82.0% of addresses having a positional error below 100 m in two French studies [31,36] and 72.0% to 86.0% in international studies [20,24,34]. The accuracy level may have important implications for misclassification of individual exposure, depending on the spatial concentration gradient of the exposure of interest, and should be considered in the study design. Although the geocoding errors observed in our study appear modest overall, Ganguly et al. showed that positional errors exceeding 100 m may alter exposure estimates, particularly for exposures with steep spatial gradients, such as traffic-related air pollution [16,50]. To minimize misclassification, studies investigating this type of exposure should include only addresses geocoded at the address level and exclude those geocoded at the street segment or locality level. This would be even more important for studies investigating the health effects of high-voltage power lines, which show an even steeper spatial gradient [31,51]. For other exposures, such as airborne dioxins emitted by industrial facilities with tall stacks, pollutant concentrations decrease to near-background levels at distances of 3 km to 5 km, making it possible to include addresses geocoded at both the street segment and locality levels [52,53]. These observations stress the importance of conducting sensitivity analyses to examine the potential impact of positional errors on exposure estimates.
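A simple way to see why steep gradients matter is to assume an exponential distance-decay exposure index exp(-d/L) around a point source and to displace the residence by the positional error δ. In this sketch the decay lengths L, the point-source geometry, the random-direction simulation and the function names are hypothetical choices for illustration, not values from the study or from the cited exposure models. The worst case corresponds to an error pointing directly away from the source, for which the relative change in exposure is exp(-δ/L) − 1.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def exposure(distance_m: float, decay_length_m: float) -> float:
    """Hypothetical exponential distance-decay exposure index (1 at the source)."""
    return math.exp(-distance_m / decay_length_m)


def worst_case_relative_change(error_m: float, decay_length_m: float) -> float:
    """Relative exposure change if the error moves the residence straight away from the source."""
    return math.exp(-error_m / decay_length_m) - 1.0


def simulated_relative_changes(residence: Point, source: Point, error_m: float,
                               decay_length_m: float, n_draws: int = 1000,
                               seed: int = 1) -> List[float]:
    """Relative exposure changes for an error of fixed size in random directions."""
    rng = random.Random(seed)
    true_exposure = exposure(math.dist(residence, source), decay_length_m)
    changes = []
    for _ in range(n_draws):
        angle = rng.uniform(0.0, 2 * math.pi)
        moved = (residence[0] + error_m * math.cos(angle),
                 residence[1] + error_m * math.sin(angle))
        changes.append(exposure(math.dist(moved, source), decay_length_m) / true_exposure - 1.0)
    return changes
```

With a 100 m error, the worst-case change is about −63% for L = 100 m but only about −6% for L = 1,500 m, in line with the argument that street segment or locality geocodes may be acceptable for slowly decaying exposures but not for steep gradients.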
Our findings are consistent with other studies showing more precise and accurate geocoding for addresses in urban areas than in rural areas [20,24,35,36]. In these studies, median positional errors ranged from 31 m to 56 m in urban areas and from 45 m to 212 m in rural areas, where addresses frequently lack a street number and are often limited to the name of the hamlet. Three studies (two US and one French) have geocoded historical addresses covering the periods 1948-2000, 1930-2000 and 1960-2001, respectively [2,20,36]. In agreement with the present study, two of them did not observe major variations in positional accuracy according to the date of the addresses [20,36], whereas Brody et al., a US study, reported better positional accuracy for recent addresses, with 37% of addresses located at the address level in 1930 vs. 62% in 1970-1980 and 97% in 1990-2001 [2]. Kappa coefficients showed good overall agreement with the reference method for both automatic methods. However, for rural addresses, agreement was higher for method B than for method A (0.54 vs. 0.39). Recent addresses (2000-2008) also showed a higher kappa coefficient for method B than for method A (0.70 vs. 0.56).
The strengths of our study include the large number of addresses (n = 4,247) and study subjects (n = 1,685) whose residential addresses had been prospectively recorded over a 19-year period (1990-2008). Moreover, we were able to classify addresses according to the rural or urban status of the area of residence. Our study is one of the first to investigate the geocoding of subjects from a national prospective cohort and to offer both a spatial and a temporal analysis of geocoding quality using different tools. In addition, the location of a large number of addresses was checked manually based on aerial images (Fig. 2). However, our study has several limitations. First, E3N addresses were not initially collected to be geocoded, which could have affected positional accuracy. However, the proportion of addresses located with a positional error lower than 100 m was consistent with that of another French study in which addresses were recorded for geocoding [36]. Also, to account for potential spelling errors in addresses not collected for geocoding, we used a spelling sensitivity threshold of 50 for method B. A higher threshold, as used by some authors, would have allowed only minor spelling variations and restricted the candidates to exact matches [21,24,41,43]. As E3N participants are mostly teachers, some indicated only the name of their school (n = 49) or workplace (n = 6) in the address field; because these names are not available as such in reference databases, these addresses could not be geocoded automatically, which may have had a minor impact on the overall accuracy of both automatic geocoding methods. Second, because of the large number of addresses and the size of the study territory, as well as poor GPS signal reception in cities [54], the feasibility of using field GPS measurements to validate the true location of all addresses, as others have done [13,20,24,34], was limited.
However, the use of aerial photography [28,44,55] via Google Maps®, Geoportail® and Google Street View® to manually check all addresses initially geocoded with a low level of accuracy allowed us to be confident in the precision of address locations in our reference layer (method R). Third, a weakness of methods A and R is that, despite repeated requests to Google and ESRI France, we could not obtain the number of addresses recorded in their reference databases. Furthermore, geocoding addresses in environmental epidemiology using external services or free online tools, such as batch online geocoding, raises privacy and ethical concerns [56]. Since addresses may allow personal identification of study subjects, transferring them to third parties may breach participants' confidentiality and anonymity, even after removal of any other sensitive information, particularly in well-defined geographic areas with small numbers of study subjects. In-house geocoding generally allows better protection against any type of unauthorized access to sensitive information.
The present study confirmed the choice of the geocoding method to be used in the E3N national cohort as a basis for GIS-based exposure modelling of environmental pollutants at the national level and for analyses of related disease risks, such as breast cancer. The findings will help strengthen the reliability of geocoding- and GIS-based methods for assessing environmental exposures while taking privacy and ethical issues into account. Our results may also be applied in other European cohorts to make greater and more efficient use of the valuable resource of existing cohort data, in order to investigate environmental risk factors based on past and current places of residence. The present study could thus be reproduced in other European cohorts by integrating national road network databases into GIS software (e.g. ArcGIS). Future studies should explore in detail the impact of positional errors and address accuracy levels on exposure misclassification for various environmental pollutants with different distance-decay patterns. Further methodological work is still needed on the feasibility of accurately geocoding addresses before 1990 (complete residential history from birth to recruitment into the cohort) in order to assess lifetime exposures from birth onward.

Conclusion
Our study demonstrated the feasibility of geocoding addresses in epidemiological studies not initially designed for environmental exposure assessment, both for recent addresses and for residential locations dating back more than 20 years. Furthermore, our results showed no major difference in final geocoding accuracy between the two automatic geocoding methods, compared with the manual reference method. Overall, more addresses had a positional error lower than 100 m with method A, whereas kappa coefficients showed higher agreement with the reference method for method B, both in rural areas and for the 2000-2008 period. In addition, the in-house method allowed better control at all steps of the geocoding process and was less time-consuming. Future epidemiological studies should record residential addresses prospectively in a way that facilitates geocoding for environmental exposure assessment. Finally, knowing the accuracy of the geocoding tool used for environmental exposure assessment will help limit misclassification bias due to positional errors. Epidemiological studies should report their reference street network database and the accuracy of their geocoding method.