
Spatial variability in levels of benzene, formaldehyde, and total benzene, toluene, ethylbenzene and xylenes in New York City: a land-use regression study



Hazardous air pollutant exposures are common in urban areas and contribute to increased risk of cancer and other adverse health outcomes. While recent analyses indicate that New York City residents experience significantly higher cancer risks attributable to hazardous air pollutant exposures than the United States as a whole, limited data exist to assess intra-urban variability in air toxics exposures.


To assess intra-urban spatial variability in exposures to common hazardous air pollutants, street-level air sampling for volatile organic compounds and aldehydes was conducted at 70 sites throughout New York City during the spring of 2011. Land-use regression models were developed using a subset of 59 sites and validated against the remaining 11 sites to describe the relationship between concentrations of benzene, total BTEX (benzene, toluene, ethylbenzene, xylenes) and formaldehyde to indicators of local sources, adjusting for temporal variation.


Total BTEX levels exhibited the most spatial variability, followed by benzene and formaldehyde (coefficient of variation of temporally adjusted measurements of 0.57, 0.35, and 0.22, respectively). Total roadway length within 100 m, traffic signal density within 400 m of monitoring sites, and an indicator of temporal variation explained 65% of the total variability in benzene, while 70% of the total variability in BTEX was accounted for by traffic signal density within 450 m, density of permitted solvent-use industries within 500 m, and an indicator of temporal variation. Measures of temporal variation, traffic signal density within 400 m, road length within 100 m, and interior building area within 100 m (an indicator of heating fuel combustion) predicted 83% of the total variability of formaldehyde. The models built with the modeling subset predicted concentrations well, explaining 62% to 68% of the variance in monitored values at validation sites.


Traffic and point source emissions cause substantial variation in street-level exposures to common toxic volatile organic compounds in New York City. Land-use regression models were successfully developed for benzene, formaldehyde, and total BTEX using spatial indicators of on-road vehicle emissions and emissions from stationary sources. These estimates will improve the understanding of health effects of individual pollutants in complex urban pollutant mixtures and inform local air quality improvement efforts that reduce disparities in exposure.



Despite regulatory controls, urban populations are exposed to toxic air pollutants with potential to cause cancer or other serious health effects. The 1990 Amendments to the Clean Air Act identified 187 hazardous air pollutants (HAPs) subject to emissions-based controls due to health effects associated with ambient exposures [1]. These regulations include controls on 174 stationary source categories to meet maximum achievable control technology standards and mobile source air toxics rules that reduce vehicle emissions through fuel controls, including lowering limits on benzene in gasoline beginning in 2011 [2].

HAPs commonly found in urban areas include formaldehyde and a group of aromatic volatile organic compounds (VOC): benzene, toluene, ethylbenzene, xylene (together known as BTEX). Among these, benzene and formaldehyde are classified by the International Agency for Research on Cancer as human carcinogens (Group 1); both are key drivers of estimated cancer risk from organic HAPs in the US [3, 4]. Other BTEX compounds (toluene, ethylbenzene, and xylene) have been found to produce adverse health effects including respiratory and neurological effects [5-7] and react to form secondary organic aerosols, contributing to ambient fine particulate matter (PM2.5) [8]. BTEX and formaldehyde also play important roles in the photochemical reactions that form ozone [9].

Recent analyses suggest that 49% of New York City residents live in census tracts exceeding the 1 in 10,000 HAP-attributable cancer risk benchmark compared to 4.8% of the population nationwide, with the majority of the risk attributed to benzene and formaldehyde exposures [10, 11]. Primary local sources of BTEX are on-road and non-road gasoline vehicles and engines, with emissions from petroleum transport/storage and solvent usage also making substantial contributions [12]. On- and non-road gasoline and diesel vehicles and engines are also predominant sources of primary formaldehyde emissions in NYC with additional contributions from stationary-source fuel combustion [12]. Formaldehyde is also formed secondarily by photooxidation of hydrocarbons. Ambient formaldehyde levels in New York City have been observed to peak in summer months, likely due to seasonal increases in photochemical activity [13].

While national air toxics regulations have reduced exposures, the limited number of monitoring sites in urban areas restricts the ability to assess spatial variation in concentrations within cities for developing local control policies. For example, in New York City there are currently six regulatory monitors reporting VOC measurements and five reporting aldehydes, with monitors operating only every sixth day [14]. While this network provides valuable information on air toxics trends useful in evaluating exposure and regulating ozone, it is not sufficient to understand fine-scale intra-urban spatial variation in concentrations due to localized sources such as traffic [15, 16].

Recently, land-use regression (LUR) models have been increasingly used to estimate intra-urban spatial variability of air pollutants and to develop exposure estimates for epidemiological research [17, 18]. They have been used in New York City to develop exposure estimates for fine particulate matter (PM2.5), oxides of nitrogen (NOx), and sulfur dioxide (SO2) (Clougherty et al. submitted 2011, [19]). While many LUR studies focus on nitrogen dioxide (NO2) and PM2.5, LUR models have also been used to estimate BTEX concentrations [16, 20-23].

This paper evaluates spatial variation in benzene, total BTEX and formaldehyde concentrations across New York City using a saturation sampling campaign conducted in the spring of 2011 and land-use regression modeling.


Spatial and temporal allocation of sites

BTEX and formaldehyde monitoring was conducted at a subset of the 150 sites routinely monitored for PM2.5, elemental carbon, PM2.5 constituents, NOx, SO2 and ozone throughout NYC as part of the New York City Community Air Survey (NYCCAS) network, an initiative within the City’s sustainability plan, PlaNYC [24]. The NYCCAS monitoring network sites were selected to capture the range in variation of key local emissions sources while providing adequate spatial coverage throughout the City. The selection process for these 150 sites is described elsewhere (Matte et al. submitted 2011). In short, 120 sites were selected for monitoring through stratified random sampling of 7,756 grid cells of 300 m x 300 m, with oversampling in areas of high traffic and high building density (indicators of two categories of important local emissions sources) to account for skewed distributions of these source proxies within New York City. We chose building density rather than population density as an indicator of source activity suitable for both residential and commercial areas of the city. Thirty additional sites were selected to fill spatial gaps and capture areas of interest.
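The stratified draw with oversampling described above can be illustrated with a minimal Python sketch. The stratum labels, the rule assigning cells to strata, and the equal split of draws across strata are assumptions made for illustration only; the allocation actually used for NYCCAS is not reported here.

```python
import random

def stratified_sample(cells, n_total, weights, seed=0):
    """Sample grid cells by stratum.

    cells   : list of (cell_id, stratum) pairs
    weights : dict mapping stratum -> relative share of the n_total draws
              (hypothetical values; a small stratum given a large share
              is thereby oversampled)
    """
    rng = random.Random(seed)
    by_stratum = {}
    for cell_id, stratum in cells:
        by_stratum.setdefault(stratum, []).append(cell_id)
    total_weight = sum(weights[s] for s in by_stratum)
    sample = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        # Draws follow the stratum's share, not its size; because of rounding,
        # shares should be chosen so that the counts sum to n_total.
        n_draw = min(len(ids), round(n_total * weights[stratum] / total_weight))
        sample[stratum] = rng.sample(ids, n_draw)
    return sample

def label(i):
    # Hypothetical rule: 10% of cells high-traffic, 10% high-building-density.
    return {0: "high_traffic", 1: "high_building"}.get(i % 10, "other")

cells = [(i, label(i)) for i in range(7756)]
shares = {"high_traffic": 1, "high_building": 1, "other": 1}
selected = stratified_sample(cells, 120, shares)
```

With equal shares, each of the two small strata contributes as many sites as the much larger remainder, which is one simple way to counter a skewed distribution of source proxies.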

Of the original 150 sites, we selected 70 sites for air toxics monitoring (referred to as “distributed” sites) by first retaining 21 sites that were geographically isolated from other monitoring locations or had produced high residuals in our prior statistical models for NOx, SO2, PM2.5, and EC. These sites were included to ensure that the monitoring captured a full range of traffic and land-use settings. We then randomly selected from the remaining available sites. We compared the distributions of these 70 sites in relation to traffic and building density to the distribution in the original 150 sites to confirm that similar coverage of major source density was achieved in the subset of sites selected for air toxics monitoring (Table 1). Three reference sites were selected in parks, away from major sources, in Central Park in Manhattan, Queens College in Queens, and La Tourette Golf Course in Staten Island (Figure 1).

Table 1 Distribution of traffic and building density at NYCCAS network sites and Air Toxics sampling sites

Figure 1 Map of New York City Community Air Survey sites monitored for BTEX compounds and formaldehyde.

We collected samples of BTEX and formaldehyde at each of the 70 distributed sites, 14 of which were allocated at random to each of five two-week sessions, from 3/22/2011 to 6/1/2011. At the three reference sites, samples were collected during all five sessions to assess city-wide temporal variation related to meteorology.

Air sampling and analysis

Formaldehyde and BTEX compounds were measured with Radiello radial passive sampling tubes (Fondazione Salvatore Maugeri, Padova, Italy). Samplers were placed in weather-protective shelters and mounted at a height of 10 feet on street-side signal and lamp posts. Formaldehyde samples were collected over 1 week and BTEX samples over 2 weeks to meet the sampler manufacturer’s sampling-time specifications [25, 26].

Passive BTEX samplers contained activated charcoal that collects VOCs by adsorption. Samples were analyzed by Air Toxics Limited (Folsom, CA) by extraction with carbon disulfide followed by gas chromatography with mass spectrometry (GCMS). GCMS identified five BTEX compounds: benzene, toluene, ethylbenzene, o-xylene, and m/p-xylene, which were summed to compute the total BTEX concentration. These samplers have been used in VOC field monitoring campaigns [27-29] as well as prior LUR studies [20].

Passive aldehyde samplers contained 2,4-dinitrophenylhydrazine (2,4-DNPH) coated silica, which converts aldehydes to stable 2,4-dinitrophenylhydrazone derivatives. Samples were analyzed by Air Toxics Limited (Folsom, CA) by extracting hydrazones with acetonitrile and analyzing them using reverse-phase high-pressure liquid chromatography with ultraviolet detection at 360 nm (HPLC-UV). Passive sampling by 2,4-DNPH derivatization has been evaluated and applied extensively in ambient formaldehyde monitoring studies [30-32].

Quality assurance

During each sampling session one field blank was placed unopened at the La Tourette reference site for the duration of the session and analyzed alongside all other samplers. At two sites in each session, two sets of samplers were deployed side by side to assess differences in collocated samplers. Laboratory quality control procedures followed guidelines established for passive VOC and aldehyde monitoring by the sampler manufacturer using standard EPA and OSHA methodologies [33, 34]. For each pollutant, descriptive statistics were computed by session to identify potential outliers for further investigation.

Data analysis

Descriptive analysis

We computed descriptive statistics across all distributed and reference site measurements and compared concentrations to those reported during the same time period at rooftop regulatory monitors [14]. Raw measurements were then adjusted for temporal variation by dividing the distributed site measurements by the mean reference value in each session then multiplying this ratio by the mean of reference sites across the entire period. We described spatial variability by computing the coefficient of variation (CV) of temporally adjusted measurements across all sessions. We examined spatial distributions within each session by computing the CV (based on unadjusted values) within each session and examining plots of monitored concentrations, session means, and reference site means. To assess temporal variation, we regressed raw distributed site concentrations on session-specific means of reference sites, and used the R-squared (R2) as the indicator of temporal variation (referred to as “temporal R2” in Results section).
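The temporal adjustment and the coefficient of variation described above amount to a few lines of arithmetic. The following Python sketch assumes that the period-wide reference value is the unweighted mean of the session reference means and that the CV uses the sample standard deviation; neither detail is stated in the text, and all numbers are hypothetical.

```python
from statistics import mean, stdev

def temporally_adjust(raw, session, ref_mean_by_session):
    """Divide a raw value by its session's reference mean, then multiply
    by the reference mean over the whole period."""
    overall_ref = mean(ref_mean_by_session.values())
    return raw / ref_mean_by_session[session] * overall_ref

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)

ref_means = {1: 1.0, 2: 3.0}                   # hypothetical session reference means
adjusted = temporally_adjust(2.0, 1, ref_means)
cv = coefficient_of_variation([1.0, 2.0, 3.0])  # hypothetical adjusted values
```

A site measured in a session with low city-wide levels is scaled up, and one measured in a high session is scaled down, so that the adjusted values are comparable across sessions.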

Geographic variables

Spatial data on emission source indicators were collected and analyzed using ArcGIS 9.2 (ESRI, Redlands CA). These datasets were obtained from a variety of public and private sources and encompassed a range of data types and resolution from highly resolved road network line data to traffic volume modeled along “links” between destinations. Source indicator categories included total and road-specific measures of traffic, mobile source diesel combustion, population metrics, built space area, land-use type, and emissions permits from point sources, transportation facilities, and waste treatment and transfer facilities (Table 2). City-issued permits on point sources were filtered by searching the business description field using keywords derived from the EPA National Emissions Inventory [12] of processes known to produce the air toxics of interest. For each indicator, covariates were calculated within 15 buffers surrounding each monitoring location, at distances of 50 to 1000 meters. Detailed descriptions of the GIS datasets used to develop source indicators for NYCCAS analyses are available in Additional file 1: Table S1.
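As an illustration of how a buffer-based covariate is computed, the sketch below counts point features (for example, traffic signals) within several radii of a monitoring site. It is written in plain Python rather than ArcGIS, assumes planar coordinates in meters, and uses a hypothetical subset of radii; the full set of 15 buffer distances is not listed in the text.

```python
from math import hypot

def count_within(site, points, radius_m):
    """Number of point features no farther than radius_m from the site."""
    sx, sy = site
    return sum(1 for px, py in points if hypot(px - sx, py - sy) <= radius_m)

def buffer_covariates(site, points, radii_m):
    """One covariate per buffer radius, keyed by radius in meters."""
    return {r: count_within(site, points, r) for r in radii_m}

# Hypothetical site and feature coordinates, in meters.
site = (0.0, 0.0)
signals = [(30.0, 40.0), (0.0, 100.0), (300.0, 0.0), (0.0, 999.0), (800.0, 800.0)]
covariates = buffer_covariates(site, signals, [50, 100, 400, 1000])
```

Line and area indicators such as road length or built space would need a geometry library to clip features to each buffer; the counting logic per buffer is otherwise the same.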

Table 2 Summary of GIS-based source indicators

LUR model building process

Prior to modeling, concentrations among the three reference sites across the five sampling sessions were examined for similarity in temporal patterns. For benzene, while two reference sites were highly correlated (Pearson’s Correlation (r) = 0.84), one site showed low correlation with the others (r = 0.13 and −0.18) potentially indicating local source influence on temporal variation at that specific site. This site’s benzene measurements were removed to avoid distortion or bias in temporal adjustment. Raw concentrations were then used as the dependent variable in the model building process and each session’s mean pollutant concentration at the reference sites was added as a covariate [35] to adjust for city-wide temporal variation due to meteorology while explicitly accounting for error in estimating the temporal term.
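The reference-site check described above can be sketched as a pairwise Pearson correlation across session means. The site names and concentrations below are hypothetical and are not the study data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_correlations(series_by_site):
    """Correlation for every unordered pair of sites."""
    sites = sorted(series_by_site)
    return {(a, b): pearson(series_by_site[a], series_by_site[b])
            for i, a in enumerate(sites) for b in sites[i + 1:]}

# Hypothetical five-session means at three reference sites.
reference = {
    "site_a": [1.0, 1.2, 0.9, 1.1, 1.3],
    "site_b": [0.8, 1.0, 0.7, 0.9, 1.1],
    "site_c": [1.1, 0.9, 1.2, 0.8, 1.0],
}
correlations = pairwise_correlations(reference)
```

A site whose correlations with both other sites are low or negative, as with the hypothetical site_c here, would be a candidate for exclusion from the temporal term.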

Source indicator variables were grouped into six emission indicator-based categories: total traffic density, truck and bus traffic, permitted combustion-related emissions from point sources, built space density, population density, non-combustion permitted emissions (solvent use, petroleum/chemical bulk storage). For each pollutant, we used a Pearson’s correlation matrix to select the two buffer specific variables within each category most correlated with temporally adjusted pollutant concentrations. Each of these two variables was paired with a second category-specific term that optimized the R2 in a two-variable model against the pollutant concentration. This resulted in a total of four candidate covariates per category that were considered in subsequent model building.

We followed a manual forward step-wise model-building process using reference site concentrations, emissions source covariates, and site characteristics. Models were first fit using a randomly selected “modeling subset” of 85% (n = 59) of distributed sites and the resulting provisional models were validated by comparing predicted values with measured values at the remaining 15% (n = 11) of sites. Model diagnostics, including studentized residuals and Cook’s distance values, were inspected for outliers and highly influential points and models were evaluated for coherence with known emission source patterns and for sensitivity to alternative emission source indicators. Once the provisional models were validated, raw measurements from all 70 sites were used to produce final model parameters describing the spatial and temporal variability in pollutant concentrations and for predictions of seasonal mean values. After building the final model we computed an additional purely spatial model that regressed the temporally-adjusted pollutant concentrations onto the final set of spatial source terms to confirm that both temporal adjustment strategies produced comparable results. The overall fit of this model is reported in the Results section as the amount of spatial variability explained by the model.
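The fit-then-validate step can be sketched in plain Python as an ordinary least-squares fit on a 59-site modeling subset followed by prediction at the 11 held-out sites. The data below are synthetic, the covariate names and coefficients are hypothetical, and the validation statistic is computed as 1 - SS_res/SS_tot, which may differ from the statistic used in the study. The manual forward stepwise selection itself is not reproduced.

```python
import random

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    xtx = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    xty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(beta, X):
    return [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]

def r_squared(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic sites: (signal density, road length, session reference mean).
rng = random.Random(1)
sites = []
for _ in range(70):
    x = [rng.uniform(0, 50), rng.uniform(0, 2), rng.uniform(0.5, 1.5)]
    conc = 0.4 + 0.01 * x[0] + 0.2 * x[1] + 0.5 * x[2] + rng.gauss(0, 0.05)
    sites.append((x, conc))
rng.shuffle(sites)
train, valid = sites[:59], sites[59:]       # modeling subset and hold-out
beta = fit_ols([x for x, _ in train], [y for _, y in train])
validation_r2 = r_squared([y for _, y in valid],
                          predict(beta, [x for x, _ in valid]))
```

Including the session reference mean as a covariate, rather than adjusting the outcome beforehand, mirrors the approach described in the preceding section.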


Descriptive statistics

Across 10 weeks of monitoring, 70 sites were sampled successfully for formaldehyde while 69 of 70 scheduled sites were sampled successfully for BTEX compounds due to a field error where a sampler was not deployed to one site scheduled for monitoring. Measurements in all samples exceeded the limit of quantification (LOQ) for BTEX compounds and formaldehyde. Field blank concentrations were below the LOQ for all BTEX compounds and all but one formaldehyde sample. Collocated samples (n = 10) showed good agreement with mean absolute percent differences of 10.9%, 8.0%, and 4.6% and R2 of 0.80, 0.94, and 0.98 for benzene, BTEX, and formaldehyde, respectively. One formaldehyde result was removed from the analysis because of implausibly high concentrations. This yielded 69 total benzene, BTEX and formaldehyde samples from distributed sites used in further analyses.
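The collocated-sampler agreement statistic can be sketched as follows. The text does not define the denominator of the percent difference; this example assumes the mean of each pair, and the pair values are hypothetical.

```python
def mean_abs_percent_difference(pairs):
    """Average of |a - b| relative to the pair mean, in percent."""
    diffs = [abs(a - b) / ((a + b) / 2) * 100 for a, b in pairs]
    return sum(diffs) / len(diffs)

collocated = [(1.0, 1.0), (2.0, 3.0)]   # hypothetical side-by-side results
mapd = mean_abs_percent_difference(collocated)
```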

Street-side concentrations of all pollutants were higher on average than reference site concentrations while average benzene and BTEX levels at distributed sites showed higher concentrations and wider ranges than those reported at regulatory monitoring sites during the same period (Table 3). Average formaldehyde levels from distributed sites were slightly lower than average regulatory site measurements due to one regulatory monitor reporting high concentrations for several days during the campaign.

Table 3 Summary statistics for pollutant concentrations at NYCCAS sites and rooftop regulatory monitoring sites from 3/22/2011-6/1/2011

Spatial variability, estimated by the CV across all temporally adjusted measurements, was greatest for BTEX, followed by benzene, then formaldehyde (CV of 0.57, 0.35, and 0.22, respectively). Benzene and BTEX concentrations showed little temporal variation; 8% and 3% of variance, respectively, was explained by session (Figure 2). Formaldehyde showed the most city-wide temporal variability (temporal R2 = 46%), with levels generally increasing as the season progressed and temperatures increased (Figure 2). Temporally adjusted concentrations were spatially correlated across all three pollutants with slightly better correlation between benzene and total BTEX or formaldehyde (r = 0.73) than formaldehyde and BTEX (r = 0.69).

Figure 2 Distribution of two-week average benzene and BTEX and one-week average formaldehyde concentrations with average session temperatures measured at monitoring sites.

Modeling results


Predicted benzene concentrations from the provisional model explained 62% of the variance in measured concentrations at the validation sites. Spatial and temporal variability of benzene was associated with, in order of importance based on partial R2, traffic signal density within 400 m of the monitors, length of interstate, state, and county highways within 100 m, and the reference site mean. The bivariate relationships between the spatial model terms and temporally adjusted concentrations demonstrated consistent positive associations across all 69 monitoring sites (Figure 3). Including all 69 sites in the final model showed that after controlling for other model terms, an inter-quartile range (IQR) increase in traffic signal density (an indicator of vehicle traffic and congestion) was associated with an increase in benzene concentration of 0.32 μg/m3 while an IQR increase in road length was associated with an average increase in benzene of 0.15 μg/m3. These terms describe 60% of the spatial variability (not shown) of benzene across all monitoring sites and, together with the reference site means, 65% of the temporal and spatial variation in benzene (Table 4, Figure 4).
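Effects reported per IQR increase are the product of a regression slope and the inter-quartile range of the covariate. A minimal Python sketch follows; the slope and covariate values are hypothetical, and the quartile convention (the default "exclusive" method of `statistics.quantiles`) is an assumption, since the convention used in the study is not stated.

```python
from statistics import quantiles

def iqr(values):
    """Inter-quartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

def iqr_effect(slope, values):
    """Change in predicted concentration for an IQR increase in a covariate."""
    return slope * iqr(values)

signal_density = [1, 2, 3, 4, 5, 6, 7]     # hypothetical covariate values
effect = iqr_effect(0.5, signal_density)   # 0.5 is a hypothetical slope
```

Scaling by the IQR puts covariates measured in different units on a comparable footing when their effects are reported side by side.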

Figure 3 Scatterplots of GIS covariates and temporally adjusted concentrations.

Table 4 Land-use regression model results for benzene, BTEX, and formaldehyde. Final model terms listed in order of importance based on partial R2

Figure 4 Comparisons of temporally adjusted observed measurements vs. LUR predicted estimates at monitoring sites.


For BTEX, two sites showed high studentized residuals (>8) and high Cook’s distance values (>0.6), potentially indicating unusual emissions patterns near these sites. These sites, located in the industrial areas of the South Bronx, were not outliers for benzene and formaldehyde, but showed very high levels of toluene, ethylbenzene, and the xylenes. To avoid distortion of the final, city-wide model, we elected to remove these sites from the final model. Predicted concentrations from the provisional model explained 65% of the variance in concentrations at the validation sites. The bivariate relationships between these spatial model terms and temporally adjusted concentrations confirmed that consistent positive associations were observed across all 67 sites (Figure 3). Spatial and temporal variability of BTEX compounds was associated with, in order of importance based on partial R2, traffic signal density within 450 m of the monitors, kernel-weighted density of solvent-use industries within 500 m, and reference site mean. The final model that included all 67 sites showed an IQR increase in traffic signal density was associated with an increase in BTEX concentration of 1.62 μg/m3 while an IQR increase in density of permitted solvent-use industries was associated with an increase in BTEX concentration of 0.52 μg/m3. These terms described 64% of the spatial variability (not shown) in BTEX across all monitoring sites and, in combination with the reference site means, explained 70% of the spatial and temporal variation in BTEX (Table 4, Figure 4).
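The two diagnostics used to flag these sites can be illustrated for the simplest case of a single-predictor regression. The sketch below computes internally studentized residuals and Cook's distances from the closed-form leverage; the study models had several predictors, the text does not say which form of studentized residual was used, and the data are hypothetical.

```python
from math import sqrt

def influence_diagnostics(x, y):
    """Return (studentized residuals, Cook's distances) for y = a + b*x."""
    n, p = len(x), 2                      # p = number of fitted coefficients
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)              # residual variance
    lever = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]   # hat-matrix diagonal
    student = [e / sqrt(s2 * (1 - h)) for e, h in zip(resid, lever)]
    cooks = [r ** 2 * h / (p * (1 - h)) for r, h in zip(student, lever)]
    return student, cooks

# Hypothetical data: the last point is unusually high for its covariate value.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 20]
student, cooks = influence_diagnostics(x, y)
most_influential = max(range(len(x)), key=lambda i: cooks[i])
```

A point with both a large residual and high leverage dominates the Cook's distance ranking, which is the pattern that would prompt a closer look at a site.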


Predicted formaldehyde concentrations from the provisional model explained 68% of the variance in measured concentrations at the validation sites. Spatial and temporal variability of formaldehyde was associated with, in order of importance based on partial R2, reference site mean, traffic signal density within 400 m of the monitors, length of roads within 100 m, and interior built space within 100 m. The bivariate relationships between these spatial model terms and temporally adjusted concentrations demonstrated consistent positive associations across all 69 monitoring sites (Figure 3). The final model that included all 69 sites showed an IQR increase in signal density was associated, on average, with an increase of 0.36 μg/m3 formaldehyde, an IQR increase in interior built space density (index of amount of fuel combustion for heating) was associated with an increase of 0.08 μg/m3, and an IQR increase in road density was associated with an increase of 0.19 μg/m3. These terms described 69% of the spatial variation (not shown) in formaldehyde across all monitoring sites, and in combination with the reference site means, they described 83% of the spatial and temporal variation (Table 4, Figure 4).


This study demonstrates significant intra-urban spatial variability in ambient levels of benzene, total BTEX, and formaldehyde across New York City monitoring sites, with the widest range in concentrations found in total BTEX. Within the season, we observed limited temporal variability for benzene and BTEX while formaldehyde levels increased with increasing average temperatures. Land-use regression models explained 65%, 70%, and 83% of the total variability of benzene, BTEX, and formaldehyde, respectively, with temporal terms and spatial variables representing traffic density, solvent-use industries and built space. The provisional models built with the modeling subset predicted concentrations well, explaining 62% to 68% of the variance in monitored values at validation sites.

Average benzene and BTEX levels were higher than those measured at rooftop regulatory monitors during the study period, reflecting closer proximity of NYCCAS monitoring sites to traffic sources. Prior NYC-based monitoring studies of air toxics showed higher ambient levels of benzene and BTEX at residential sites mainly in the Bronx and Northern Manhattan than levels reported here [13, 36]. This is likely explained by overall decreases in concentrations in NYC and nationwide over the past decade as well as relatively higher levels of traffic related pollutants in Northern Manhattan and the Bronx compared to the city overall [14, 37]. Associations of benzene and BTEX concentrations with high traffic density are consistent with prior monitoring studies [23, 38, 39].

We found that variables specific to traffic congestion and volume best explained the spatial variability of benzene, with traffic volume indicated through total road lengths around monitoring sites and indicators of traffic density and congestion represented by traffic signal density. These variables were consistent with known sources of benzene in NYC, where gasoline vehicles are, collectively, the predominant source [12]. Prior LUR models for benzene have shown similar results, although some included additional terms related to petroleum usage, proximity to point sources, and population density [16, 21-23]. The association of benzene concentrations with traffic within 400 meters of monitoring locations is consistent with observations that increased benzene levels near roadways decay to background within around 300 meters [40]. In contrast to many prior LUR studies, we chose to address temporal variation by using raw unadjusted concentrations as the dependent variable and the reference site mean as a covariate with the spatial covariates in the model. The advantage of this approach over a model in which temporally adjusted values are regressed onto spatial covariates is that, in estimating the slope for emission source terms, it adjusts for city-wide temporal variation due to meteorology while explicitly accounting for error in estimating the temporal term.

The correlates of spatial variability in total BTEX we observed in New York City are also consistent with known local emission sources including traffic and solvent usage [12] and with prior studies linking higher BTEX concentrations to traffic as well as distance to VOC emitting point sources [20, 21, 41]. Likely due to limited geographic distribution throughout the city, we did not find associations with large point sources reported in the National Emissions Inventory [12] and Toxics Release Inventory [42] or petroleum storage facilities. We did however find associations with density of nearby facilities too small to require Title V permits, but permitted by the City to use solvents in industries known to produce BTEX compounds such as spray booths, graphics industries, and auto body and detailing shops. These facilities are distributed throughout many neighborhoods, although more concentrated in industrial areas. An important limitation of our data is the lack of detailed information on solvent type and quantity at these smaller permitted facilities. Additional sampling near different types of facilities and improved emissions data or proxies could help elucidate these patterns in future work.

Formaldehyde measurements showed less spatial variability than benzene and total BTEX, compatible with findings from prior intra-urban analyses of data from national monitoring networks [43]. We found more temporal variability in formaldehyde, with levels increasing with higher average temperatures. These findings are consistent with studies indicating that higher temperatures and longer daylight hours increase photochemical formation of secondary formaldehyde and that levels peak during warm months and mid-day periods [43-45]. To our knowledge there have been no published LUR models for formaldehyde. The predictors of spatial variation found are consistent with known sources of local primary ambient formaldehyde, with higher levels found in areas of increased traffic emissions and interior built space indicating increased fuel combustion related to space and water heating.

This study indicates that LUR modeling can be applied successfully to predicting benzene, BTEX, and formaldehyde levels for use in exposure assessment and epidemiological research in complex urban environments like New York City. Prior VOC and aldehyde exposure assessments have applied modeled data from EPA’s National Air Toxics Assessment (NATA) [3, 46-48], regulatory monitoring data [49, 50], and combinations of fixed site and personal monitoring [13, 41]. While NATA modeling is useful in estimating relative concentrations in regional scale assessments, in fine scale, urban analyses, estimates are subject to limited spatial resolution of area and mobile sources in the National Emissions Inventory [51]. Similarly, using few central-site regulatory monitors for exposure classification limits the ability to assess near source concentration gradients, such as near roadways [15]. Prior air toxics assessments conducted in New York City using fixed site and personal monitoring have provided important data on indoor, outdoor, and personal exposures among cohorts in specific neighborhoods [13, 36] but have not offered comprehensive assessments across the City.

City-wide average temporally adjusted springtime measurements of benzene correspond to concentrations between EPA’s 1 in 100,000 and 1 in 1,000,000 lifetime cancer risk benchmarks [52]. Average formaldehyde levels in this study correspond to concentrations above the EPA 1 in 100,000 lifetime cancer risk benchmark [53]. While risk benchmarks are based on continuous exposures experienced over a lifetime, these springtime results suggest HAPs may contribute meaningfully to cancer and other health risks among large populations of New Yorkers who reside in close proximity to traffic and other local emission sources.

An important limitation of these results is that data were collected during a single spring season. Pollutant concentrations observed may differ in other seasons, particularly for formaldehyde where differences in photochemical activity will affect secondary formation. However, spatial variation should be consistent throughout the year as patterns in source density overall remain relatively unchanged over short time periods. As with all LUR studies, limited data on specific emitters of VOCs adds uncertainty to model estimates and likely attenuates associations between observed concentrations and source indicators.

These findings, and those from prior saturation sampling and land-use regression studies conducted in New York City (Clougherty et al. submitted 2012, [19, 37]), indicate many of the neighborhoods impacted by high levels of PM2.5 and NO2 exposure may also experience high levels of benzene, BTEX and formaldehyde. High traffic density contributes to higher levels of both criteria and toxic pollutants evaluated here while areas of high building density are associated with high PM2.5 and formaldehyde levels. Because most studies of intra-urban spatial variation in air pollution exposures have focused on criteria pollutants, characterizing spatial patterns of exposure to common urban air toxics will be valuable in elucidating the health effects of individual pollutants in common pollutant mixtures [54] as well as development of emissions reduction strategies that maximize health benefits.


In this analysis we used high density air quality monitoring and land-use regression methods to estimate variability in ambient exposures to benzene, BTEX compounds, and formaldehyde in New York City. We found significant intra-urban spatial variability in all compounds. Indicators of motor vehicle traffic, solvent usage, and stationary source combustion explained much of the variability in concentrations of these air toxics. Many of the same neighborhoods identified by prior studies as being impacted by high levels of criteria air pollutants are also found to have relatively higher levels of these common air toxics due to shared local sources. Characterization of these spatial patterns in air toxics will help improve understanding of the health effects of individual pollutants in complex urban air pollution mixtures and develop targeted air quality management strategies that reduce health disparities in pollutant-attributable adverse health outcomes.



BTEX: Sum of Benzene, Toluene, Ethylbenzene, Xylenes
CV: Coefficient of Variation
2,4-DNPH: 2,4-dinitrophenylhydrazine
EPA: U.S. Environmental Protection Agency
GC/MS: Gas Chromatograph with Mass Spectrometry
HAP: Hazardous Air Pollutant
HPLC/UV: High Pressure Liquid Chromatography with Ultra-Violet detection
LUR: Land-Use Regression
NATA: National Air Toxics Assessment
NO2: Nitrogen Dioxide
NOx: Oxides of nitrogen
NYCCAS: New York City Community Air Survey
OSHA: Occupational Safety and Health Administration
PM2.5: Particulate Matter with aerodynamic diameter less than or equal to 2.5 micrometers
r: Pearson’s correlation coefficient
SO2: Sulfur Dioxide
VOC: Volatile Organic Compounds
WHO: World Health Organization


  1. U.S. EPA: About Air Toxics. 2010.
  2. U.S. EPA: Control of Hazardous Air Pollutants from Mobile Sources. EPA-HQ-OAR-2005-0036. 2007.
  3. Loh MM, Levy JI, Spengler JD, Houseman A, Bennett DH: Ranking cancer risks of organic hazardous air pollutants in the United States. Environ Health Perspect. 2007, 115 (8): 1160-1168. 10.1289/ehp.9884.
  4. WHO: World Health Organization International Agency for Research on Cancer. 2011.
  5. U.S. EPA: Toluene: Hazard Summary. Technology Transfer Network Air Toxics Web Site. 2000a.
  6. U.S. EPA: Ethylbenzene: Hazard Summary. Technology Transfer Network Air Toxics Web Site. 2000b.
  7. U.S. EPA: Xylenes: Hazard Summary. Technology Transfer Network Air Toxics Web Site. 2000c.
  8. Odum JR, Jungkamp TPW, Griffin RJ, Forstner HJL, Flagan RC, Seinfeld JH: Aromatics, Reformulated Gasoline, and Atmospheric Organic Aerosol Formation. Environ Sci Technol. 1997, 31: 1891-1897.
  9. U.S. EPA: Air Quality Criteria for Ozone and Related Photochemical Oxidants. EPA/600/R-05/004aF. 2006, EPA Office of Research and Development.
  10. U.S. EPA: 2005 National Air Toxics Assessment. EPA Office of Air and Radiation. 2011a.
  11. Sax SN, Bennett DH, Chillrud SN, Ross J, Kinney PL, Spengler JD: A cancer risk assessment of inner-city teenagers living in New York City and Los Angeles. Environ Health Perspect. 2006, 114 (10): 1558-1566. 10.1289/ehp.8507.
  12. U.S. EPA: 2005 National Emissions Inventory. 2011b.
  13. Kinney PL, Chillrud SN, Ramstrom S, Ross J, Spengler JD: Exposures to Multiple Air Toxics in New York City. Environ Health Perspect. 2002, 110 (Supp 4): 539-546.
  14. U.S. EPA: EPA Air Quality System Datamart. 2011c.
  15. Isakov V, Touma JS, Khlystov A: A method of assessing air toxics concentrations in urban areas using mobile platform measurements. J Air Waste Manage. 2007, 57: 1286-1295.
  16. Johnson M, Isakov V, Touma JS, Mukerjee S, Ozkaynak H: Evaluation of land-use regression models used to predict air quality concentrations in an urban area. Atmos Environ. 2010, 44: 3660-3668. 10.1016/j.atmosenv.2010.06.041.
  17. Hoek G, Beelen R, de Hoogh K, Vienneau D, Gulliver J, Fischer P, Briggs D: A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos Environ. 2008, 42: 7561-7578. 10.1016/j.atmosenv.2008.05.057.
  18. Jerrett M, Arain MA, Kanaroglou P, Beckerman B, Crouse D, Gilbert NL, Brook JR, Finkelstein N, Finkelstein MM: Modeling the intraurban variability of ambient traffic pollution in Toronto Canada. J Toxicol Env Health. 2007, 70 (3–4): 200-212.
  19. Ross Z, Jerrett M, Ito K, Tempalski B, Thurston GD: A land use regression for predicting fine particulate matter concentrations in the New York City region. Atmos Environ. 2007, 41 (11): 2255-2269. 10.1016/j.atmosenv.2006.11.012.
  20. Aguilera I, Sunyer J, Fernandez-Patier R, Hoek G, Aguirre-Alfaro A, Meliefste K, Bomboi-Mingarro MT, Nieuwenhuijsen MJ, Herce-Garraleta D, Brunekreef B: Estimation of outdoor NOx, NO2, and BTEX exposure in a cohort of pregnant women using land use regression modeling. Environ Sci Technol. 2008, 42: 815-821. 10.1021/es0715492.
  21. Mukerjee S, Smith LA, Johnson MM, Neas LM, Stallings CA: Spatial analysis and land use regression of VOCs and NO2 from school-based urban air monitoring in Detroit/Dearborn, USA. Sci Total Environ. 2009, 407: 4642-4651. 10.1016/j.scitotenv.2009.04.030.
  22. Smith L, Mukerjee S, Gonzales M, Stallings C, Neas L, Norris G, Ozkaynak H: Use of GIS and ancillary variables to predict volatile organic compounds and nitrogen dioxide levels at unmonitored locations. Atmos Environ. 2006, 40: 3773-3787. 10.1016/j.atmosenv.2006.02.036.
  23. Wheeler AJ, Smith-Doiron M, Xu X, Gilbert NL, Brook JR: Intra-urban variability of air pollution in Windsor, Ontario - Measurement and modeling for human exposure assessment. Environ Res. 2008, 106: 7-16. 10.1016/j.envres.2007.09.004.
  24. New York City: PlaNYC 2030. 2007.
  25. Sigma-Aldrich: Radiello diffusive sampling: Aldehydes. 2011.
  26. Sigma-Aldrich: Radiello diffusive sampling: Volatile organic compounds (VOCs) chemically desorbed with CS2. 2011.
  27. Cocheo V, Sacco P, Boaretto C, De Saeger E, Ballesta PP, Skov H, Goelen E, Gonzalez N, Caracena AB: Urban benzene and population exposure. Nature. 2000, 404: 141-142. 10.1038/35004651.
  28. Gallego E, Roca FJ, Perales JF, Guardino X: Evaluation of the effect of different sampling time periods and ambient air pollutant concentrations on the performance of the Radiello diffusive sampler for the analysis of VOCs by TD-GC/MS. J Environ Monitor. 2011, 13: 2612-2622. 10.1039/c1em10075k.
  29. Pilidis GA, Karakitsios SP, Kassomenos PA, Kazos EA, Stalikas CD: Measurements of benzene and formaldehyde in a medium sized urban environment, indoor/outdoor health risk implications on special population groups. Environ Monit Assess. 2009, 150: 285-294. 10.1007/s10661-008-0230-9.
  30. Grosjean D, Williams EL: A passive sampler for airborne formaldehyde. Atmos Environ. 1992, 26 (16): 2923-2928. 10.1016/0960-1686(92)90284-R.
  31. Kume K, Ohura T, Amagai T, Fusaya M: Field monitoring of volatile organic compounds using passive air samplers in an industrial city in Japan. Environ Pollut. 2007, 153: 649-657.
  32. Mason JB, Fujita EM, Campbell DE, Zielinska B: Evaluation of Passive Samplers for Assessment of Community Exposure to Toxic Air Contaminants and Related Pollutants. Environ Sci Technol. 2011, 45: 2243-2249. 10.1021/es102500v.
  33. Air Toxics Limited: NELAP Quality Manual: 13.0 Passive Sampling – Volatile Organic Compounds. Revision 18, 1/2011. 2011, Page 52.
  34. Air Toxics Limited: Methods Manual: 4.0 TO-5, TO-11A, Method 0011, CARB 430 - Aldehydes and Ketones. Revision 18, 1/2011. 2011, Page 12.
  35. Levy JI, Clougherty JE, Baxter LK, Houseman EA, Paciorek CJ: Evaluating Heterogeneity in Indoor and Outdoor Air Pollution using Land-Use Regression and Constrained Factor Analysis. Health Effects Institute. 2010, 152:
  36. Sax SN, Bennett DH, Chillrud SN, Kinney PL, Spengler JD: Differences in source emissions rates of volatile organic compounds in inner-city residences of New York City and Los Angeles. J Expo Sci Environ Epidemiol. 2004, 14: S95-S109.
  37. New York City: The New York Community Air Survey: Results from Year One Monitoring 2008–2009. 2011.
  38. Bruno P, Caselli M, de Gennaro G, de Gennaro L, Tutino M: High spatial resolution monitoring of benzene and toluene in the urban area of Taranto (Italy). J Atmos Chem. 2006, 54: 177-187. 10.1007/s10874-006-9030-1.
  39. Kwon J, Weisel CP, Turpin BJ, Zhang J, Korn LR, Morandi MT, Stock TH, Colome S: Source Proximity and Outdoor-Residential VOC Concentrations: Results from the RIOPA Study. Environ Sci Technol. 2006, 40: 4074-4082. 10.1021/es051828u.
  40. Karner AA, Eisinger DS, Niemeier DA: Near-roadway air quality: Synthesizing the findings from real-world data. Environ Sci Technol. 2010, 44: 5334-5344. 10.1021/es100008x.
  41. Smith LA, Stock TH, Chung KC, Mukerjee S, Liao XL, Stallings C, Afshar M: Spatial Analysis of Volatile Organic Compounds from a Community-Based Air Toxics Monitoring Network in Deer Park, Texas USA. Environ Monit Assess. 2007, 128: 369-379. 10.1007/s10661-006-9320-8.
  42. U.S. EPA: EPA Toxics Release Inventory. 2011d.
  43. Touma JS, Cox WM, Tikvart JA: Spatial and temporal variability of ambient air toxics data. J Air Waste Manage. 2006, 56: 1716-1725. 10.1080/10473289.2006.10464576.
  44. Friedfeld S, Fraser M, Ensor K, Tribble S, Rehle D, Leleux D, Tittel F: Statistical analysis of primary and secondary atmospheric formaldehyde. Atmos Environ. 2002, 36: 4767-4775. 10.1016/S1352-2310(02)00558-7.
  45. Lei W, Zavala M, de Foy B, Volkamer R, Molina MJ, Molina LT: Impact of primary formaldehyde on air pollution in the Mexico City metropolitan area. Atmos Chem Phys. 2009, 9: 2607-2618. 10.5194/acp-9-2607-2009.
  46. Kalkbrenner AE, Daniels JL, Chen JC, Poole C, Emch M, Morrissey J: Perinatal exposure to hazardous air pollutants and autism spectrum disorders at age 8. Epidemiol. 2010, 21 (5): 631-641. 10.1097/EDE.0b013e3181e65d76.
  47. Lupo PJ, Symanski E, Waller DK, Chan W, Langlois PH, Canfield MA, Mitchell LE: Maternal exposure to ambient levels of benzene and neural tube defects among offspring: Texas, 1999–2004. Environ Health Perspect. 2011, 119 (3): 397-402.
  48. Sexton K, Linder SH, Marko D, Bethel H, Lupo PJ: Comparative assessment of Air Pollution-Related Health Risks in Houston. Environ Health Perspect. 2007, 115 (10): 1388-1393.
  49. McCarthy MC, O’Brien TE, Charrier JG, Hafner HR: Characterization of the Chronic Risk and Hazard of Hazardous Air Pollutants in the United States using Ambient Monitoring Data. Environ Health Perspect. 2009, 117 (5): 790-79. 10.1289/ehp.11861.
  50. Whitworth KW, Symanski E, Lai D, Coker AL: Kriged and modeled ambient air levels of benzene in an urban environment: an exposure assessment study. Environ Health. 2011, 10: 21. 10.1186/1476-069X-10-21.
  51. Touma JS, Isakov V, Ching J, Seigneur C: Air quality modeling of hazardous pollutants: current status and future directions. J Air Waste Manage. 2006, 56: 547-558. 10.1080/10473289.2006.10464480.
  52. U.S. EPA: Integrated Risk Information System: Benzene. 2000d.
  53. U.S. EPA: Integrated Risk Information System: Formaldehyde. 1991.
  54. Brauer M: How much, how long, what, and where: Air pollution exposure assessment for epidemiologic studies of respiratory disease. Proc Am Thorac Soc. 2010, 7: 111-115. 10.1513/pats.200908-093RM.


We thank John Gorczynski, Alyssa Benson, Andres Camacho, Jordan Werbe-Fuentes, Rolando Munoz, Bolivar Camacho, and Manny Ortega of Queens College for their help in data collection. This work was supported by City of New York tax levy funds.

Author information


Corresponding author

Correspondence to Iyad Kheirbek.

Additional information

Competing interests

The authors declare they have no competing financial interests.

Authors’ contributions

IK contributed to study design, data collection and analysis, and drafted and edited the manuscript. SJ and ZR conducted the statistical analysis and contributed to manuscript drafting and editing. SJ, ZR, and GP contributed to developing the GIS data layers. HE participated in developing sampling protocols and overseeing the field data collection, and provided comments on the manuscript. KI contributed to interpreting results and provided edits and comments to the manuscript. TM oversaw the method development, implementation, and data analysis and contributed to drafting and editing the manuscript. All authors participated in interpretation of the results, and all authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Kheirbek, I., Johnson, S., Ross, Z. et al. Spatial variability in levels of benzene, formaldehyde, and total benzene, toluene, ethylbenzene and xylenes in New York City: a land-use regression study. Environ Health 11, 51 (2012).
