
Inferential challenges when assessing racial/ethnic health disparities in environmental research


Numerous epidemiologic studies have documented environmental health disparities according to race/ethnicity (R/E) to inform targeted interventions aimed at reducing these disparities. Yet, the use of R/E under the potential outcomes framework implies numerous underlying assumptions for epidemiologic studies that are often not carefully considered in environmental health research. In this commentary, we describe the current state of thinking about the interpretation of R/E variables in etiologic studies. We then discuss how such variables are commonly used in environmental epidemiology. We observed three main uses for R/E: i) as a confounder, ii) as an effect measure modifier and iii) as the main exposure of interest, either through descriptive analysis or under a causal framework. We identified some common methodological concerns in each case and provide some practical solutions. The use of R/E in observational studies requires particular caution in terms of formal interpretation, and this commentary aims to provide a practical resource for future studies assessing racial/ethnic health disparities in environmental research.



Over the last decade, there has been increasing interest in assessing the role of environmental determinants in health disparities [1,2,3]. Among the social factors that can influence the distribution of hazards or modify the impact of environmental factors on health, race/ethnicity (R/E) has been among the most commonly studied [4, 5]. Numerous epidemiologic studies have been conducted to document environmental health disparities according to R/E to inform targeted interventions toward reducing these disparities [6,7,8]. The classification of racial/ethnic groups used in public health research captures long-established and systemic consequences of political, historical, and economic structures and social constructs [9,10,11]. In epidemiological studies, race/ethnicity may operate through various pathways, such as differential treatment, social isolation and structural racism, to generate observed disparities [12, 13].

Nevertheless, the use of R/E implies numerous underlying assumptions for epidemiologic studies that are often overlooked. In parallel, various methodological challenges have recently been highlighted in the social epidemiology literature when R/E is used as a variable in statistical models [14, 15], but these challenges have not been discussed explicitly in the context of environmental research.

In this commentary, we first briefly provide a historical overview of how R/E has been used in epidemiological research and discuss interpretation and inferential challenges. We then introduce the potential outcomes (counterfactual) framework and related assumptions for identification in causal inference. Finally, we assess how R/E is used in environmental studies, using air pollution as a case study, highlight some frequent methodological challenges that relate to the role of environmental determinants in health disparities and propose solutions for further studies that consider racial/ethnic disparities in health.

The use of R/E as a social construct in epidemiological research: a brief historical overview

Every human civilization has crafted categories of race, ethnicity, or caste as a means of creating a hierarchy among groups within the population on the basis of appearance, lineage or geographic origin. These categories are distinct from groupings based on religion or ideology in being fixed at birth, although this distinction can be blurred, as in the case of those who identify as Jewish. The definitions of and boundaries between groups are generally highly mobile over time and place, often as the result of political mobilization and popular agency [16, 17]. Because race and ethnicity have deep resonance within human societies for family formation (partnering, marriage and adoption) and access to resources such as employment, housing and services, they are also highly predictive of structural inequalities and the distribution of health and disease [18].

Racial and ethnic variables have therefore been thoroughly integrated into epidemiological and public health research all over the world, and especially so in countries that created stark racial hierarchies through race-based slavery (e.g. USA, Brazil), racial apartheid (e.g. South Africa), colonization (e.g. Rwanda) or the systematic disenfranchisement of a native population by European settlers (e.g. Canada, Australia, New Zealand). In all of these settings, R/E becomes a major axis of epidemiologic risk, along with sex/gender and social class/position [19, 20]. Policies differ by country, but in the US, the centrality of the R/E experience in patterning exposures, risks and access to services motivates the government to collect these variables on official censuses, health surveys and other administratively gathered databases [21]. These data are then used in surveillance, to detect disparities, and in documenting evidence of racial discrimination in access to resources and services [22].

Epidemiology and clinical medicine have exploited the availability of R/E information in datasets, especially in the US, to generate a vast literature focused on these categories. Even when R/E is not the focus of the research, it is rare to find a US biomedical article on human subjects that does not refer to R/E in the description of the study population or use this information as a control variable in analyses. This has led to numerous recommendations, critiques and lamentations about the ambiguity, confusion or misinterpretation that such ubiquitous and reflexive use has engendered [9].

Causal interpretation(s) of race/ethnicity in epidemiology

A first challenge when using racial or ethnic categories to define health disparities relates to the ambiguous interpretation or active misinterpretation of the “effects” of R/E. Race and ethnicity as epidemiologic variables have notably been theorized through the counterfactual (or potential outcomes) framework [23], which is widely used in contemporary epidemiological research as well as in other fields.

Introducing the potential outcomes framework

The potential outcomes (PO), or counterfactual, framework is widely considered fundamental to understanding causal effects in epidemiology [24, 25]. For example, estimating the effect of a heat wave on stroke rates requires contrasting the observed stroke rate with an estimate of the unobserved stroke rate in the same population on the same day had the heat wave not occurred. In this framework, causal effects are often defined as an average difference (or ratio) between two potential outcomes, one observed (factual) and the other unobserved (counterfactual), although in many settings both halves of the contrast are counterfactual [26]. The fundamental problem of causal inference in this context is that, by definition, at least one half of the contrast can never be observed [27]. Identification strategies, and the assumptions they rely on, have therefore been formulated for estimating causal effects [28].

Three main assumptions are usually formulated when aiming to identify causal effects under the potential outcomes framework: exchangeability, positivity and consistency. Exchangeability means that the potential outcomes are independent of the treatment actually received; some authors also refer to this as unconfoundedness of the assignment to exposure [24]. In practice, this requires the absence of (measured or unmeasured) common causes of the exposure and the outcome. Randomization is expected to achieve exchangeability, and in observational research other identification strategies, such as instrumental variables or difference-in-differences, can be used to the same end [29, 30]. Exchangeability can also be addressed by achieving covariate balance between exposure groups on measured confounders, using analytic or design strategies such as standardization and matching. Although this appeal to randomization as a mechanism for achieving covariate balance is intuitive for most exposures, it is inconceivable for R/E variables, since R/E cannot be assigned in a trial, except in artificial circumstances [13]. Moreover, the common causes of R/E variables are generally diffuse historical and political processes that are beyond the scope of any dataset.
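To make exchangeability and standardization concrete, the following is a minimal Python sketch on simulated data. It assumes a single binary confounder C that raises both the probability of exposure A and the outcome Y, with a true effect of A on Y of 2.0 in every stratum of C; the data-generating model, effect sizes and function names are hypothetical and chosen only for illustration. The crude exposed-versus-unexposed contrast is confounded, whereas averaging the stratum-specific contrasts over the sample distribution of C recovers the assumed effect, provided C is the only common cause.

```python
import random

def simulate(n, seed=1):
    """Hypothetical data: binary confounder C affects both exposure A and outcome Y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.4 else 0
        # Exposure is more likely when C = 1, so exposure groups are not exchangeable
        a = 1 if rng.random() < (0.7 if c else 0.2) else 0
        # Assumed (hypothetical) causal effect of A on Y is 2.0 in both strata of C
        y = 1.0 + 2.0 * a + 3.0 * c + rng.gauss(0, 1)
        rows.append((c, a, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def crude_difference(rows):
    """Naive contrast of mean outcomes between exposed and unexposed."""
    return mean([y for _, a, y in rows if a == 1]) - mean([y for _, a, y in rows if a == 0])

def standardized_difference(rows):
    """Stratum-specific contrasts averaged over the sample distribution of C."""
    n = len(rows)
    total = 0.0
    for level in (0, 1):
        stratum = [r for r in rows if r[0] == level]
        weight = len(stratum) / n  # P(C = level) in the whole sample
        y1 = mean([y for _, a, y in stratum if a == 1])
        y0 = mean([y for _, a, y in stratum if a == 0])
        total += weight * (y1 - y0)
    return total

rows = simulate(20000)
print(round(crude_difference(rows), 2), round(standardized_difference(rows), 2))
```

Under this model the crude difference is expected to be near 3.5 (2.0 plus 3.0 times the 0.5 difference in the prevalence of C between exposure groups), while the standardized difference should be near 2.0. For R/E variables, no randomized benchmark of this kind exists, which is precisely the difficulty described above.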

Positivity, the next assumption of the potential outcomes framework, requires that individuals in every stratum of the covariates have a non-zero probability of being in the exposed group and of being in the unexposed group. This assumption can be violated when studying R/E health disparities. For example, Messer et al. [31] showed that, when studying the effects of SES and racial residential segregation on preterm birth, positivity violations may lead to meaningless conclusions because effect estimates from regression models were based on little or no actual data. This has been described as “structural confounding”, in the sense that certain combinations of R/E and social status are rare in the population as a result of structured relations between social groups. Messer et al. noted that the areas of sparse or missing data in their analyses reflected a structural reality: the systematic co-occurrence of racial segregation and poverty, a relationship that was mandated by law until the Civil Rights Act of 1964.
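A first practical check for positivity is to cross-tabulate exposure against every combination of covariate levels and flag strata in which one exposure group is empty. The Python sketch below does this for hypothetical neighborhood records; the field names (`high_poverty`, `segregation`) and the toy data are illustrative assumptions, loosely echoing the segregation and poverty example above. An empty cell only signals a possible violation: counts alone cannot distinguish a structural violation, of the kind described by Messer et al., from sparse sampling.

```python
from collections import Counter
from itertools import product

def positivity_report(records, exposure, strata_keys):
    """Count exposed and unexposed records in every combination of covariate levels."""
    # Observed levels of each stratifying covariate; product() also yields empty combinations
    levels = [sorted({r[k] for r in records}) for k in strata_keys]
    counts = Counter((tuple(r[k] for k in strata_keys), bool(r[exposure])) for r in records)
    report = {}
    for stratum in product(*levels):
        n_exposed = counts[(stratum, True)]
        n_unexposed = counts[(stratum, False)]
        report[stratum] = {
            "exposed": n_exposed,
            "unexposed": n_unexposed,
            # Empirical signal of a positivity violation: an exposure group is empty
            "violation": n_exposed == 0 or n_unexposed == 0,
        }
    return report

# Hypothetical neighborhood records (field names are illustrative, not from any dataset)
records = (
    [{"segregation": "high", "high_poverty": 1}] * 3
    + [{"segregation": "low", "high_poverty": 1}]
    + [{"segregation": "low", "high_poverty": 0}] * 2
)
report = positivity_report(records, exposure="high_poverty", strata_keys=["segregation"])
for stratum, cells in report.items():
    print(stratum, cells)
```

Here the highly segregated stratum contains no low-poverty neighborhoods, so any regression estimate for that stratum would rest entirely on model extrapolation.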

Finally, the consistency assumption requires that all exposed individuals receive the same version of treatment. It is sometimes folded into the stable unit treatment value assumption (SUTVA), which additionally requires no interference between units [32]. This assumption implies that the exposure of interest must be defined with sufficient precision that any variation within the exposure specification would not result in a different outcome [33]. It has been shown that this assumption is commonly violated in social epidemiological research when using variables such as education or income [34], and also when race/ethnicity is used as the exposure of interest. Indeed, one can distinguish two situations in which the main exposure of interest is: i) R/E itself, or ii) another variable on the pathway between R/E and the health outcome of interest (e.g. residential exposure to fine particulate matter, noise, greenspace) [9]. Because R/E bundles many distinct components whose modification could lead to different outcomes, treating R/E itself as the exposure of interest can thus be seen as ambiguous, and it typically violates the consistency assumption under the potential outcomes framework.

Manipulability of the exposure and different perspectives about the use of R/E in epidemiology

Another challenge when using R/E as the exposure of interest for inferential research (as opposed to descriptive or predictive research) relates to the potential manipulability of the exposure. Indeed, many authors have argued that R/E does not meet the criteria for a well-defined intervention, implying that the causal assumptions discussed above may not hold [23, 35, 36]. Yet estimating the effect of R/E seems more conceivable in some contexts if R/E represents the experience of individual discrimination or structural inequality. For example, self-reported race on job applications [37], or randomly assigning fictitious subjects to a racial classification and observing differential diagnosis or treatment by a clinician [38], have been described as well-defined and potentially manipulable interventions. Even so, some authors [39] point out that some ambiguity remains regarding the meaning of discrimination.

Various authors [40, 41] have proposed that observational studies should specify an ideal randomized controlled trial (RCT) as the target for the causal effect to be estimated, which implicitly suggests that any non-manipulable exposure is ineligible as an exposure of interest. In contrast, other authors propose a non-interventionist interpretation of such causal reasoning that allows R/E to be considered as an exposure of interest, and reject the argument that the causal effects of race require any hypothetical manipulation whatsoever [42,43,44].

In this context, VanderWeele and Robinson [15] proposed two possible interpretations of the “effects” of race as the main exposure of interest. First, the authors proposed a so-called “stronger” interpretation of race where “once the components of race are specified, the effect of race corresponds to the joint effects of these specific components for which interventions are at least somewhat more conceivable”. Second, they proposed a “weaker” interpretation where “R/E regression coefficients in a model with certain control variables are interpreted as estimating what would happen to an observed health inequality if certain socioeconomic status distributions were set to something other than what they in fact were”. They suggest estimating direct and mediated inequality measures by including, in the same model for the health outcome, an R/E contrast (such as Black versus White), a “mediating” variable of interest (such as residential exposure to fine particulate matter) and potentially any number of mediator-outcome confounders (such as age). From this model, the coefficient for the R/E variable can be interpreted, for example, as the inequality in the health outcome that would remain had the distribution of residential fine particulate matter (PM2.5) in the Black population been set to that of the White population with the same values of the confounders. They also proposed estimating the magnitude of the “mediated inequality effect” through residential PM2.5 by comparing the inequality measure before and after controlling for PM2.5 and the mediator-outcome confounders. However, traditional regression adjustment would lead to biased estimates in the presence of another mediator between R/E and the health outcome (Y) that itself affects PM2.5 [45]; in that case, alternative analytical strategies, including inverse probability weighting (IPW), G-computation or stochastic mediation analysis, would be required.

Following VanderWeele and Robinson [15], Jackson and VanderWeele [46] proposed a comprehensive set of hypothetical interventions under the potential outcomes framework to quantify impacts on social disparities of interest. The authors stressed the importance of specifying the hypothetical intervention scenario and its implications for health disparities, and showed how each scenario corresponds to a distinct analytical strategy.
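The difference method behind the “weaker” interpretation of VanderWeele and Robinson can be illustrated with a minimal Python sketch on simulated data. Everything here is an assumption made for illustration: a binary group contrast `r`, a mediator `m` standing in for residential PM2.5, a mediator-outcome confounder `c` independent of `r`, linear models with no exposure-mediator interaction, and no other mediator affecting PM2.5 (the situation in which, as noted above, regression adjustment would be biased). The small least-squares helper solves the normal equations directly so the example needs only the standard library.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y (small, well-conditioned problems only)."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

def simulate(n, seed=7):
    """Hypothetical data-generating model (all coefficients are illustrative assumptions)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        r = 1 if rng.random() < 0.5 else 0          # group contrast (1 vs 0)
        c = rng.gauss(0, 1)                         # mediator-outcome confounder, e.g. standardized age
        m = 2.0 * r + 0.5 * c + rng.gauss(0, 1)     # mediator, e.g. residential PM2.5
        y = 1.0 * r + 0.8 * m + 0.6 * c + rng.gauss(0, 1)
        data.append((r, c, m, y))
    return data

data = simulate(5000)
y = [d[3] for d in data]
# Inequality before adjusting for the mediator
total_inequality = ols([[1.0, d[0]] for d in data], y)[1]
# Inequality remaining after adjusting for the mediator and the mediator-outcome confounder
remaining_inequality = ols([[1.0, d[0], d[2], d[1]] for d in data], y)[1]
mediated_inequality = total_inequality - remaining_inequality
print(round(total_inequality, 2), round(remaining_inequality, 2), round(mediated_inequality, 2))
```

Under these assumptions the inequality before adjustment should be close to 2.6 (a direct component of 1.0 plus 0.8 times 2.0 through the mediator), the remaining inequality close to 1.0 and the difference close to 1.6. If the assumptions fail, the difference loses this interpretation, which is why strategies such as IPW or G-computation may be needed instead.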

Besides the potential outcomes framework, other frameworks for studying R/E disparities in health are worth mentioning, such as the theory of fundamental causes [47]. An interesting facet of this framework is the distinction between replaceable mechanisms, which are framed as not directly actionable, and flexible resources, which need to be targeted to ultimately reduce R/E inequalities in health. Naimi [48] proposed integrating the PO framework and fundamental cause theory to generate evidence on potential interventions to mitigate racial/ethnic disparities in health. He noted that “[…] contrasts of the risk of a health outcome between racial/ethnic groups can validly be interpreted as quantitative expressions of long-standing race relations and are thus not without meaning.” He also argued that racial/ethnic disparity questions “are counterfactual in that they relate to the risk of a health outcome among different racial or social groups that would be observed if a third variable was modified.” Similar interpretations have been suggested by Krieger [49].

This ongoing, vigorous discussion shows that R/E variables can be approached in different ways when assessing health disparities. One must therefore pay particular attention to the interpretation of these variables, be transparent about the intended interpretation, check that the assumptions required for causal inference are sufficiently satisfied, and use appropriate identification strategies. Whatever causal framework is used to highlight potential social disparities in health, race/ethnicity variables require particular caution in terms of their formal interpretation when included in statistical models. This contrasts with the more cavalier practice often encountered in the epidemiologic and biomedical literature.

How race/ethnicity variables are used in environmental epidemiology

Although discussions about the interpretation of R/E may be relatively common in the social epidemiology literature, they are largely absent from environmental epidemiology. Here, we provide examples of environmental health studies that included R/E indicators in order to describe the different ways these indicators have been used. Using Google Scholar, we selected a convenience sample of 15 illustrative studies related to air pollution and health in North America published from 2014 to 2019, as this is an active area of research in which health disparities related to air pollution are of great concern. Details about keywords are provided in Table 1. We focus on how R/E variables are considered in the US context, but the potential mechanisms linking R/E and outcomes vary among countries. Furthermore, some of the issues we highlight in this paper also apply to other SES variables, including education and income. It is important to acknowledge, however, that because R/E is largely fixed at birth, all hypothetical interventions act on intermediates, whereas education or income can be improved directly through policy interventions [65].

Table 1 Illustrative papers that used race/ethnicity in relation to air pollution exposure or effects on health

For each type of application, we aimed to identify some persistent concerns and propose solutions for future studies. This overview does not constitute an assessment of the overall scientific value of the cited papers; it is only a review of the methodology employed for the different uses of R/E variables. Table 1 presents an overview of illustrative papers that used R/E in relation to air pollution exposure or effects on health. It is important to note that a formal interpretation of the “effects” of race was rarely provided in the selected papers. Thus, we recorded the ways that R/E variables were used based on our best interpretations of the text provided in each article.

Race/ethnicity as a confounder

Race/ethnicity is often introduced in regression models as a covariate representing a potential confounder of the relationship between air pollution and the studied health outcomes [50,51,52,53,54,55]. This is by far the most common way that R/E has been incorporated in studies of the health effects of air pollution.

In the US context, R/E is a powerful predictor of SES, as it shapes access to education, occupational opportunity, the ability to accumulate wealth and where someone lives. Yet, when air pollution is the exposure, individual R/E can be seen as a proxy for racial residential segregation rather than a direct cause of an individual’s air pollution exposure. Such an approximation should be clearly motivated and stated when individual R/E is used as a confounder and air pollution is the exposure of interest. Furthermore, neighborhood-level indicators of racial segregation [66] seem to be more strongly associated with air pollution than individual-level characteristics such as socioeconomic indicators [67].

Another situation sometimes encountered is the reporting, and more problematically the interpretation, of regression coefficients for R/E from a model in which the exposure of interest is air pollution, as exemplified in a paper that assessed the impacts of race, social factors and air pollution on birth outcomes [54]. In such settings, air pollution is a mediator on the pathway between R/E and birth outcomes, and the R/E coefficient should be interpreted with caution (as discussed above). Such practices [68] have been critiqued as a source of potential confusion in many areas of epidemiologic research. This problem can easily be prevented by presenting and interpreting only the coefficients related to the main exposure of interest (e.g. PM2.5). For example, consider individual-level exposure to PM2.5 as the main exposure of interest, serum C-reactive protein (CRP) level as the outcome and neighborhood R/E composition as the only confounder (for simplicity). In this setting, we can fit a multivariable linear regression model with CRP as the dependent variable and PM2.5 and neighborhood R/E as independent variables, obtaining a coefficient (i.e. slope) for each. Assuming no other confounding and a correctly specified model, the coefficient for PM2.5 approximates the causal quantity of interest, namely the change in CRP due to PM2.5 conditional on R/E. The interpretation of the R/E coefficient, however, is quite different: because PM2.5 is an intermediate between R/E and CRP, this coefficient represents the association between R/E and CRP with the pathway through PM2.5 removed. This is why we recommend presenting and interpreting only the coefficients related to the main exposure of interest for such inferential research questions. It is nevertheless possible to treat an environmental exposure (e.g. PM2.5) as a mediator between R/E and a given outcome; we discuss this setting in a subsequent section.
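The CRP example can be sketched with simulated data in Python. The numbers are hypothetical: neighborhood R/E composition `z` (a share between 0 and 1) raises PM2.5 and also affects CRP through other pathways, and PM2.5 is assumed to raise CRP by 0.10 units per µg/m³. The sketch shows that the adjusted PM2.5 coefficient recovers the assumed effect while the crude slope does not, and that the R/E coefficient from the same model reflects only the non-PM2.5 pathway rather than the total R/E association.

```python
import random

def slope(x, y):
    """Simple linear regression slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def two_slopes(x1, x2, y):
    """Coefficients of x1 and x2 in y ~ 1 + x1 + x2, from centered sums of squares and cross-products."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

rng = random.Random(11)
n = 5000
z = [rng.random() for _ in range(n)]                      # hypothetical neighborhood R/E composition
pm25 = [8.0 + 6.0 * zi + rng.gauss(0, 2) for zi in z]     # PM2.5 depends on z
crp = [1.0 + 0.10 * p + 1.0 * zi + rng.gauss(0, 0.5) for p, zi in zip(pm25, z)]

crude_pm25 = slope(pm25, crp)                    # confounded by z
adjusted_pm25, coef_z = two_slopes(pm25, z, crp)  # the quantity to report is adjusted_pm25
total_z = slope(z, crp)                          # total R/E association, including via PM2.5
print(round(crude_pm25, 3), round(adjusted_pm25, 3), round(coef_z, 2), round(total_z, 2))
```

Under this model the crude PM2.5 slope is expected near 0.17 and the adjusted one near 0.10, while the R/E coefficient is near 1.0 and the total R/E association near 1.6; reading 1.0 as “the effect of R/E” would understate the disparity by ignoring the PM2.5 pathway.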

In parallel, as discussed above, the positivity assumption can be particularly relevant in environmental epidemiology studies and requires careful attention. For example, when considering exposure to specific hazardous industrial emissions such as polycyclic aromatic hydrocarbons (PAHs), exposed individuals may form a population very distinct from all potentially unexposed individuals, or exposure to such emissions and poverty may systematically co-occur. In this context, it is particularly important to ensure covariate balance between exposed and unexposed individuals, so that unexposed individuals are comparable with regard to measured confounders, including R/E. Propensity score methods [69], for example, can help check covariate balance and select an appropriate control group, thereby improving inference about the effects of air pollution on a given health outcome.
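Covariate balance is commonly summarized with standardized mean differences (SMD) before and after propensity score weighting. The Python sketch below assumes a single binary confounder (a hypothetical indicator that a neighborhood is predominantly composed of a minoritized group) and a binary exposure such as residence near a PAH-emitting facility. With one discrete confounder, the propensity score is simply the exposed proportion within each stratum; with several covariates one would instead fit a model, such as logistic regression. Inverse probability weights require every stratum to contain both exposed and unexposed individuals, so a positivity violation surfaces here as a division by zero.

```python
import math
import random
from collections import Counter

def weighted_mean_var(values, weights):
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, var

def smd(x, a, weights=None):
    """Standardized mean difference of covariate x between exposed (a = 1) and unexposed (a = 0)."""
    if weights is None:
        weights = [1.0] * len(x)
    m1, v1 = weighted_mean_var([xi for xi, ai in zip(x, a) if ai == 1],
                               [wi for wi, ai in zip(weights, a) if ai == 1])
    m0, v0 = weighted_mean_var([xi for xi, ai in zip(x, a) if ai == 0],
                               [wi for wi, ai in zip(weights, a) if ai == 0])
    return (m1 - m0) / math.sqrt((v1 + v0) / 2)

def stratum_propensity(z, a):
    """P(A = 1 | Z = z) estimated as the exposed proportion within each level of z."""
    n = Counter(z)
    n_exposed = Counter(zi for zi, ai in zip(z, a) if ai == 1)
    return {level: n_exposed[level] / n[level] for level in n}

# Hypothetical data: exposure is far more common when z = 1
rng = random.Random(3)
z = [1 if rng.random() < 0.3 else 0 for _ in range(4000)]
a = [1 if rng.random() < (0.6 if zi else 0.15) else 0 for zi in z]

ps = stratum_propensity(z, a)
# Inverse probability of treatment weights (target population: the whole sample)
w = [1 / ps[zi] if ai == 1 else 1 / (1 - ps[zi]) for zi, ai in zip(z, a)]

print(round(smd(z, a), 2), round(abs(smd(z, a, w)), 2))
```

Balance after weighting is exact here only because the propensity score was estimated nonparametrically within the strata of a single confounder; in realistic settings, a commonly used rule of thumb treats an absolute SMD below about 0.1 as adequate balance.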

Race/ethnicity as an effect measure modifier

It is first important to clarify how the concepts of effect measure modification (EMM) and interaction differ under the PO framework. Considering an exposure E, an outcome Y and a third variable Z, the concept of EMM (where the effect of E on Y is modified by Z) refers to the causal effect of E on Y only and not to the causal effect of Z. This means that only E is considered to be a variable on which we hypothetically intervene and for which exchangeability, positivity and consistency apply. In parallel, the concept of interaction refers to a joint causal effect of two distinct treatments E and Z. Identifying interaction thus requires exchangeability, positivity, and consistency for both E and Z [70].

With this distinction in mind, and given the non-manipulability of R/E variables described above, R/E variables can be included as potential effect measure modifiers to represent differential vulnerability or susceptibility to the health impacts of air pollution by R/E status [57,58,59]. To do this, studies typically included R/E in their regression models as product terms with exposure to air pollutants. EMM can occur on both the additive (absolute) and the multiplicative scale. In the papers we reviewed, these product terms were most often included in multiplicative models (e.g. logistic, Poisson) and therefore, by default, represent deviations from multiplicative joint effects rather than deviations from additivity. Many papers discuss why the absolute scale is more appropriate for inferring public health policy implications [71]. For authors interested in quantifying EMM on the additive scale from multiplicative models, several tools facilitate this calculation. The relative excess risk due to interaction (RERI), or interaction contrast ratio (ICR) [72], as well as the synergy index (SI) or ratio of joint exposures (RJE), have all been proposed to measure additive interaction based on relative measures such as risk ratios or odds ratios [73]. Knol and VanderWeele [73] recommend assessing and reporting effect modification on both absolute and relative scales.
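The additive and multiplicative measures mentioned above can be computed directly from three relative risks that share the doubly unexposed group as reference (RR00 = 1). In the Python sketch below, RR11 is the relative risk for high air pollution in the R/E group of interest, RR10 for high air pollution alone and RR01 for membership in the R/E group alone; the input values are hypothetical. Following the effect measure modification framing above, the results are read as modification of the air pollution effect by R/E rather than as a joint causal effect. Odds ratios can stand in for risk ratios only when the outcome is rare.

```python
def interaction_measures(rr11, rr10, rr01):
    """Interaction measures from relative risks with RR00 = 1 as the common reference."""
    reri = rr11 - rr10 - rr01 + 1                       # relative excess risk due to interaction (ICR)
    synergy_index = (rr11 - 1) / ((rr10 - 1) + (rr01 - 1))
    multiplicative = rr11 / (rr10 * rr01)               # corresponds to a product term on the log scale
    return {"RERI": reri, "SI": synergy_index, "multiplicative": multiplicative}

# Hypothetical relative risks: no departure from multiplicativity, yet positive additive interaction
measures = interaction_measures(rr11=3.0, rr10=1.5, rr01=2.0)
print(measures)
```

With these inputs, a product term in a log-linear model would suggest no modification (a multiplicative ratio of 1.0), while a RERI of 0.5 and an SI of about 1.33 (above their null values of 0 and 1) indicate positive modification on the additive scale, illustrating why reporting both scales matters. Confidence intervals would additionally require the delta method or bootstrapping, which this sketch omits.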

Other studies conducted stratified analyses to assess potential vulnerability to the health effects of air pollution [56, 58]. While we did not observe it directly in the selected papers, a common error in observational studies is to consider a “significant” exposure effect in one stratum as distinct from a “non-significant” exposure effect in another stratum [74]. As Gelman & Stern describe this error, the “difference between significant and not significant is not itself statistically significant”. Therefore, if statistical testing shows some evidence of an association between air pollution and a given health outcome in one R/E group, but this association is not “significant” in another R/E group, this does not necessarily imply that the effect of air pollution is heterogeneous across the two R/E groups. Some simple solutions exist. For example, if analyses are conducted across R/E strata, heterogeneity tests such as Cochran’s Q test can provide statistical evidence of EMM [75]. These heterogeneity tests can be used on both additive and multiplicative scales, although they may be underpowered [76]. It is also possible to directly test the ratio of, or the difference between, stratified effect measures [77]. In environmental epidemiology, some examples using a ratio of risk ratios have recently been published [78, 79]. Finally, when a time-series analysis is used, as is common in studies of the acute health impacts of air pollution, it is also possible to directly model the association between air pollution and intra-population disparities using daily differences between two or more groups [80]. Note that the more general issues related to null hypothesis significance testing are not discussed here, but we include in the supplemental material a list of papers that discuss such issues and provide solutions.
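The stratified comparison can be made explicit with a short Python sketch. It assumes that each stratum reports a risk ratio with a 95% confidence interval, recovers the standard error on the log scale as (ln upper - ln lower) / (2 × 1.96), and then computes the ratio of risk ratios with its z-test, together with Cochran’s Q. The two strata and their estimates are hypothetical: one interval excludes 1 and the other does not, yet the formal comparison shows little evidence of heterogeneity.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal distribution

def log_se_from_ci(lower, upper):
    """Standard error of log(RR) recovered from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_975)

def ratio_of_risk_ratios(rr1, ci1, rr2, ci2):
    """Ratio RR1/RR2 with 95% CI and two-sided p-value (independent strata assumed)."""
    diff = math.log(rr1) - math.log(rr2)
    se = math.hypot(log_se_from_ci(*ci1), log_se_from_ci(*ci2))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    ci = (math.exp(diff - Z_975 * se), math.exp(diff + Z_975 * se))
    return math.exp(diff), ci, z, p

def cochran_q(log_estimates, standard_errors):
    """Cochran's Q for heterogeneity of stratum-specific log estimates; df = strata - 1."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, log_estimates)) / sum(weights)
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_estimates))
    return q, len(log_estimates) - 1

# Hypothetical stratum-specific estimates for two R/E groups
group_a = (1.10, (1.02, 1.19))   # interval excludes 1
group_b = (1.05, (0.95, 1.16))   # interval includes 1

ratio, ci, z, p = ratio_of_risk_ratios(group_a[0], group_a[1], group_b[0], group_b[1])
q, df = cochran_q([math.log(group_a[0]), math.log(group_b[0])],
                  [log_se_from_ci(*group_a[1]), log_se_from_ci(*group_b[1])])
print(f"RRR = {ratio:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), p = {p:.2f}; Q = {q:.2f} on {df} df")
```

Here the ratio of risk ratios is about 1.05 with an interval spanning 1 and a p-value near 0.47. For two strata, Q equals the square of the z statistic, so both approaches agree; with more strata, Q would be compared with a chi-square distribution on k - 1 degrees of freedom (for example via `scipy.stats.chi2.sf`). The z-test assumes independent strata, which holds for non-overlapping R/E groups.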

Race/ethnicity as the main exposure of interest

Finally, some studies have considered R/E as the main exposure of interest, with air pollution as either the outcome or a mediator, through descriptive analysis or under a causal framework. Studies that adjust for other variables thought to be confounders, such as age and sex, are implicitly operating under a causal framework and therefore invoke the challenges discussed in previous sections, because confounding is itself a causal construct. In the papers reviewed, two specifications can be observed in this regard.

In the first specification, no health outcome is included; instead, the objective is to understand inequities in air pollution exposures, as in many environmental justice studies [60,61,62,63,64]. Including SES variables, which may lie on the causal pathway between R/E and air pollution, may attenuate the association of interest. It should be noted, however, that the underlying causal mechanism that creates the association between R/E and locally undesirable land uses (LULU) in environmental justice studies has been widely debated, and two non-exclusive mechanisms have been proposed. Some evidence indicates that minority communities are targeted for the placement of polluting facilities (disproportionate siting: disadvantaged R/E precedes LULU) [81,82,83], while other research suggests that demographic changes occur after a hazardous facility has been placed in an area (post-siting demographic change: LULU precedes disadvantaged R/E) [84, 85].

In the second, far less common specification, a health outcome is included and the question of interest is to formally decompose the total effect of R/E on a given health outcome into an indirect effect through exposure to air pollution and a direct effect representing the effect of R/E through other pathways [64]. Treating air pollution as a mediator is potentially supported by the environmental justice theory of disproportionate siting. Several analytic approaches can be applied in this setting.

The use of causal mediation analysis has expanded in social epidemiology over the last decade [86], and more recently in environmental epidemiology studies examining a variety of environmental exposures [87,88,89,90]. Besides the identification challenges related to the non-manipulability of R/E variables and the violation of consistency described above, additional assumptions are required to estimate causal effects in mediation analyses [91], including the absence of unmeasured confounders of the exposure-outcome, mediator-outcome and exposure-mediator associations (the latter only required when estimating natural effects), as well as the absence of mediator-outcome confounders that are themselves affected by the exposure. As previously emphasized, if the decomposition of R/E disparities according to one or more environmental pathways is of interest, we strongly recommend clearly specifying the targeted hypothetical health disparities scenario and adopting an appropriate estimation strategy.
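For discrete variables, natural direct and indirect effects can be computed by G-computation through the mediation formula. The Python sketch below assumes a binary exposure A, a binary mediator M (e.g. high versus low residential PM2.5), a binary covariate C, and that the identification assumptions listed above hold; all probabilities and outcome means are hypothetical inputs that, in practice, would be estimated from data. When A is an R/E contrast, these quantities inherit the interpretive caveats discussed earlier and are better read as disparity measures under a specified hypothetical scenario than as effects of R/E itself.

```python
def mediation_formula(p_c, p_m_given_ac, mean_y_given_amc):
    """
    Natural direct, natural indirect and total effects (difference scale, A = 1 vs A = 0).

    p_c:              {c: P(C = c)}
    p_m_given_ac:     {(a, c): {m: P(M = m | A = a, C = c)}}
    mean_y_given_amc: {(a, m, c): E[Y | A = a, M = m, C = c]}
    """
    def potential_mean(a_y, a_m):
        # E[Y(a_y, M(a_m))]: outcome model evaluated at a_y, mediator distribution at a_m
        return sum(
            p_c[c] * p_m * mean_y_given_amc[(a_y, m, c)]
            for c in p_c
            for m, p_m in p_m_given_ac[(a_m, c)].items()
        )

    y00, y10, y11 = potential_mean(0, 0), potential_mean(1, 0), potential_mean(1, 1)
    return {"NDE": y10 - y00, "NIE": y11 - y10, "TE": y11 - y00}

# Hypothetical inputs
p_c = {0: 0.6, 1: 0.4}
p_high = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.5, (1, 1): 0.6}   # P(M = 1 | A, C)
p_m_given_ac = {key: {1: p, 0: 1 - p} for key, p in p_high.items()}
mean_y_given_amc = {
    (a, m, c): 0.05 + 0.02 * a + 0.03 * m + 0.01 * c
    for a in (0, 1) for m in (0, 1) for c in (0, 1)
}
effects = mediation_formula(p_c, p_m_given_ac, mean_y_given_amc)
print({k: round(v, 4) for k, v in effects.items()})
```

With these inputs the NDE is 0.02 and the NIE is 0.009, summing to a total effect of 0.029. Because the hypothetical outcome model has no exposure-mediator interaction, the NDE here also equals the controlled direct effect; with an interaction, the two would differ.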

In parallel, econometric methods such as the Oaxaca-Blinder decomposition [92,93,94], as well as decompositions of inequality metrics such as the concentration index [93, 95, 96], enable the simultaneous estimation of the contributions of multiple environmental exposures to R/E disparities in a given health outcome under policy-relevant counterfactual intervention scenarios [97]. Jackson and VanderWeele [46] show that, under some circumstances, Oaxaca-Blinder decomposition and mediation analysis coincide. They also explain that when there is time-dependent confounding, the Oaxaca-Blinder technique leads to selection bias and should not be used. In this context, mediation analysis with appropriate methods for time-varying confounders, such as marginal structural models and G-computation, may be preferable. Furthermore, in a recent paper, the same authors emphasize the importance of clarifying the interpretation of variables representing a social construct when applying decomposition techniques, in order to provide interpretable and actionable estimates for addressing intersectional health disparities [98].
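A twofold Oaxaca-Blinder decomposition can be sketched in a few lines of Python. The sketch assumes two hypothetical groups, two environmental exposures (PM2.5 and noise), linear outcome models fitted separately in each group, and group B’s coefficients as the reference; each of these is a modeling choice, and the split between “explained” and “unexplained” components depends on the reference chosen. The explained part attributes the outcome gap to differences in mean exposures valued at the reference coefficients, and the unexplained part collects differences in intercepts and coefficients.

```python
import random

def fit_two_regressors(x1, x2, y):
    """Intercept and slopes for y ~ 1 + x1 + x2 via centered cross-products."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (v - my) for a, v in zip(x1, y))
    s2y = sum((b - m2) * (v - my) for b, v in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

def mean(values):
    return sum(values) / len(values)

def oaxaca_blinder(group_a, group_b):
    """Twofold decomposition of mean(y_A) - mean(y_B) with group B as the reference."""
    int_a, *beta_a = fit_two_regressors(group_a["pm25"], group_a["noise"], group_a["y"])
    int_b, *beta_b = fit_two_regressors(group_b["pm25"], group_b["noise"], group_b["y"])
    names = ("pm25", "noise")
    xbar_a = [mean(group_a[k]) for k in names]
    xbar_b = [mean(group_b[k]) for k in names]
    # Explained: exposure differences valued at the reference (group B) coefficients
    explained = {k: (xa - xb) * bb for k, xa, xb, bb in zip(names, xbar_a, xbar_b, beta_b)}
    # Unexplained: intercept and coefficient differences
    unexplained = (int_a - int_b) + sum(xa * (ba - bb) for xa, ba, bb in zip(xbar_a, beta_a, beta_b))
    gap = mean(group_a["y"]) - mean(group_b["y"])
    return gap, explained, unexplained

def simulate_group(n, pm_mean, noise_mean, shift, rng):
    """Hypothetical exposures and outcome for one group."""
    pm25 = [rng.gauss(pm_mean, 2) for _ in range(n)]
    noise = [rng.gauss(noise_mean, 5) for _ in range(n)]
    y = [2.0 + shift + 0.3 * p + 0.05 * s + rng.gauss(0, 1) for p, s in zip(pm25, noise)]
    return {"pm25": pm25, "noise": noise, "y": y}

rng = random.Random(5)
group_a = simulate_group(2000, pm_mean=12, noise_mean=65, shift=0.5, rng=rng)
group_b = simulate_group(2000, pm_mean=9, noise_mean=58, shift=0.0, rng=rng)
gap, explained, unexplained = oaxaca_blinder(group_a, group_b)
print(round(gap, 2), {k: round(v, 2) for k, v in explained.items()}, round(unexplained, 2))
```

Under this simulation, roughly 0.9 of the gap is attributed to PM2.5, about 0.35 to noise and about 0.5 remains unexplained, and the components sum exactly to the observed gap. As noted above, this decomposition should not be used when there is time-dependent confounding.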


In this commentary, we described the current state of thinking about interpreting R/E variables in epidemiologic studies and provided examples of how R/E is typically used in environmental epidemiology. Although there are ongoing debates about how best to use and interpret R/E in observational studies more broadly, at the very least it is clear that authors should state unambiguously how they are conceptualizing R/E in a given study, provide thoughtful interpretation of R/E variables and evaluate causal inference assumptions as they relate to R/E. We identified three ways that R/E variables are used, highlighted some frequent methodological concerns and proposed solutions for further studies when assessing racial/ethnic health disparities in environmental research.

Availability of data and materials

Not Applicable


  1. Brulle RJ, Pellow DN. Environmental justice: human health and environmental inequalities. Annu Rev Public Health. 2006;27:103–24.
  2. Cushing L, Faust J, August LM, Cendak R, Wieland W, Alexeeff G. Racial/Ethnic Disparities in Cumulative Environmental Health Impacts in California: Evidence From a Statewide Environmental Justice Screening Tool (CalEnviroScreen 1.1). Am J Public Health. 2015;105(11):2341–8.
  3. Saxton DI, Brown P, Seguinot-Medina S, Eckstein L, Carpenter DO, Miller P, Waghiyi V. Environmental health and justice and the right to research: institutional review board denials of community-based chemical biomonitoring of breast milk. Environ Health. 2015;14(1):90.
  4. Hicken MT, Gee GC, Morenoff J, Connell CM, Snow RC, Hu H. A novel look at racial health disparities: the interaction between social disadvantage and environmental health. Am J Public Health. 2012;102(12):2344–51.
  5. Wakefield SE, Baxter J. Linking health inequality and environmental justice: articulating a precautionary framework for research and action. Environ Justice. 2010;3(3):95–102.
  6. Johnson R, Ramsey-White K, Fuller CH. Socio-demographic differences in toxic release inventory siting and emissions in metro Atlanta. Int J Environ Res Public Health. 2016;13(8):747.
  7. Pope R, Wu J, Boone C. Spatial patterns of air pollutants and social groups: a distributive environmental justice study in the Phoenix metropolitan region of USA. Environ Manag. 2016;58(5):753–66.
  8. Teixeira S, Zuberi A. Mapping the racial inequality in place: using youth perceptions to identify unequal exposure to neighborhood environmental hazards. Int J Environ Res Public Health. 2016;13(9):844.
  9. Kaufman JS, Cooper RS. Commentary: considerations for use of racial/ethnic classification in etiologic research. Am J Epidemiol. 2001;154(4):291–8.
  10. Krieger N. Refiguring “race”: epidemiology, racialized biology, and biological expressions of race relations. Int J Health Serv. 2000;30(1):211–6.
  11. Krieger N, Chen JT, Waterman PD, Rehkopf DH, Subramanian S. Race/ethnicity, gender, and monitoring socioeconomic gradients in health: a comparison of area-based socioeconomic measures: the public health disparities geocoding project. Am J Public Health. 2003;93(10):1655–71.
  12. Bravo MA, Anthopolos R, Bell ML, Miranda ML. Racial isolation and exposure to airborne particulate matter and ozone in understudied US populations: Environmental justice applications of downscaled numerical model output. Environ Int. 2016;92:247–55.
  13. Kaufman JS. Dissecting disparities. Los Angeles: Sage Publications; 2008.
  14. Naimi AI, Kaufman JS, MacLehose RF. Mediation misgivings: ambiguous clinical and public health interpretations of natural direct and indirect effects. Int J Epidemiol. 2014;43(5):1656–61.
  15. VanderWeele TJ, Robinson WR. On causal interpretation of race in regressions adjusting for confounding and mediating variables. Epidemiology. 2014;25(4):473.
  16. American Sociological Association. The importance of collecting data and doing social scientific research on race. Washington, DC: American Sociological Association; 2003. Retrieved October 20, 2005.
  17. Sen M, Wasow O. Race as a Bundle of Sticks: Designs that Estimate Effects of Seemingly Immutable Characteristics. Annu Rev Polit Sci. 2016;19:499–522.
  18. Hahn RA, Truman BI, Williams DR. Civil rights as determinants of public health and racial and ethnic health equity: health care, education, employment, and housing in the United States. SSM-Popul Health. 2018;4:17–24.
  19. Borrell C, Palència L, Muntaner C, Urquía M, Malmusi D, O'Campo P. Influence of macrosocial policies on women's health and gender inequalities in health. Epidemiol Rev. 2013;36(1):31–48.
  20. Nuru-Jeter AM, Michaels EK, Thomas MD, Reeves AN, Thorpe RJ Jr, LaVeist TA. Relative roles of race versus socioeconomic position in studies of health inequalities: a matter of interpretation. Annu Rev Public Health. 2018;39:169–88.
  21. Office of Management and Budget. Revisions to the standards for the classification of federal data on race and ethnicity. Fed Regist. 1997;62(210):58782–90.
  22. Smedley B, Stith A, Nelson A. Institute of Medicine. Unequal treatment: Confronting racial and ethnic disparities in health care. Washington, DC: National Academies Press; 2003.
  23. Kaufman JS, Cooper RS. Seeking causal explanations in social epidemiology. Am J Epidemiol. 1999;150(2):113–20.
  24. Imbens GW, Rubin DB. Causal inference in statistics, social, and biomedical sciences. Cambridge University Press; 2015.
  25. Rubin DB. Estimating causal effects of treatments in randomized and nonrandomized studies. J Educ Psychol. 1974;66(5):688.
  26. Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol. 2002;31(2):422–9.
  27. Holland PW. Statistics and causal inference. J Am Stat Assoc. 1986;81(396):945–60.
  28. Glass TA, Goodman SN, Hernán MA, Samet JM. Causal inference in public health. Annu Rev Public Health. 2013;34:61–75.
  29. Angrist JD, Pischke J-S. Mastering 'Metrics: The Path from Cause to Effect. Princeton University Press; 2014.
  30. Bind M-A. Causal modeling in environmental health. Annu Rev Public Health. 2019;40:23–43.
  31. Messer LC, Oakes JM, Mason S. Effects of socioeconomic and racial residential segregation on preterm birth: a cautionary tale of structural confounding. Am J Epidemiol. 2010;171(6):664–73.
  32. Rubin DB. Comment: Which ifs have causal answers. J Am Stat Assoc. 1986;81(396):961–2.
  33. VanderWeele TJ. Concerning the consistency assumption in causal inference. Epidemiology. 2009;20(6):880–3.
  34. Rehkopf DH, Glymour MM, Osypuk TL. The consistency assumption for causal inference in social epidemiology: when a rose is not a rose. Curr Epidemiol Rep. 2016;3(1):63–71.
  35. Holland PW. The false linking of race and causality: lessons from standardized testing. Race Soc. 2001;4(2):219–33.
  36. VanderWeele TJ, Hernán MA. Causal inference under multiple versions of treatment. J Causal Inference. 2013;1(1):1–20.
  37. Bertrand M, Mullainathan S. Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. Am Econ Rev. 2004;94(4):991–1013.
  38. Loring M, Powell B. Gender, race, and DSM-III: A study of the objectivity of psychiatric diagnostic behavior. J Health Soc Behav. 1988:1–22.
  39. Kohler-Hausmann I. Eddie Murphy and the dangers of counterfactual causal thinking about detecting racial discrimination. Nw UL Rev. 2018;113:1163.
  40. Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183(8):758–64.
  41. Bind M-AC, Rubin DB. Bridging observational studies and randomized experiments by embedding the former in the latter. Stat Methods Med Res. 2019;28(7):1958–78.
  42. Glymour C, Glymour MR. Commentary: race and sex are causes. Epidemiology. 2014;25(4):488–90.
  43. Glymour MM, Spiegelman D. Evaluating Public Health Interventions: 5. Causal Inference in Public Health Research: Do Sex, Race, and Biological Factors Cause Health Outcomes? Am J Public Health. 2017;107(1):81–5.
  44. Schwartz S, Gatto NM, Campbell UB. Causal identification: a charge of epidemiology in danger of marginalization. Ann Epidemiol. 2016;26(10):669–73.
  45. VanderWeele TJ, Vansteelandt S, Robins JM. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology (Cambridge, Mass). 2014;25(2):300.
  46. Jackson JW, VanderWeele TJ. Decomposition analysis to identify intervention targets for reducing disparities. Epidemiology. 2018;29(6):825–35.
  47. Phelan JC, Link BG. Is racism a fundamental cause of inequalities in health? Annu Rev Sociol. 2015;41:311–30.
  48. Naimi AI. The counterfactual implications of fundamental cause theory. Curr Epidemiol Rep. 2016;3(1):92–7.
  49. Krieger N. On the causal interpretation of race. Epidemiology. 2014;25(6):937.
  50. Nobles CJ, Grantz KL, Liu D, Williams A, Ouidir M, Seeni I, Sherman S, Mendola P. Ambient air pollution and fetal growth restriction: Physician diagnosis of fetal growth restriction versus population-based small-for-gestational age. Sci Total Environ. 2019;650:2641–7.
  51. McGuinn LA, Schneider A, McGarrah RW, Ward-Caviness C, Neas LM, Di Q, Schwartz J, Hauser ER, Kraus WE, Cascio WE. Association of long-term PM2.5 exposure with traditional and novel lipid measures related to cardiovascular disease risk. Environ Int. 2019;122:193–200.
  52. Bragg-Gresham J, Morgenstern H, McClellan W, Saydah S, Pavkov M, Williams D, Powe N, Tuot D, Hsu R, Saran R. County-level air quality and the prevalence of diagnosed chronic kidney disease in the US Medicare population. PLoS One. 2018;13(7):e0200612.
  53. Ng C, Malig B, Hasheminassab S, Sioutas C, Basu R, Ebisu K. Source apportionment of fine particulate matter and risk of term low birth weight in California: exploring modification by region and maternal characteristics. Sci Total Environ. 2017;605:647–54.
  54. Gray SC, Edwards SE, Schultz BD, Miranda ML. Assessing the impact of race, social factors and air pollution on birth outcomes: a population-based study. Environ Health. 2014;13(1):4.
  55. Chen H, Li Q, Wang J, Lavigne E, Burnett R, Goldberg M, Villeneuve P, Cakmak S, Copes R. Increased Ischemic Heart Disease And Stroke-related Hospitalizations From Cold Temperature in Ontario, Canada: Population-based Study. Am Heart Assoc. 2015;132(suppl_3):A16987.
  56. Leiser CL, Hanson HA, Sawyer K, Steenblik J, Al-Dulaimi R, Madsen T, Gibbins K, Hotaling JM, Ibrahim YO, VanDerslice JA. Acute effects of air pollutants on spontaneous pregnancy loss: a case-crossover study. Fertil Steril. 2019;111(2):341–7.
  57. Laurent O, Hu J, Li L, Kleeman MJ, Bartell SM, Cockburn M, Escobedo L, Wu J. Low birth weight and air pollution in California: Which sources and components drive the risk? Environ Int. 2016;92:471–7.
  58. Delfino RJ, Wu J, Tjoa T, Gullesserian SK, Nickerson B, Gillen DL. Asthma morbidity and ambient air pollution: effect modification by residential traffic-related air pollution. Epidemiology. 2014;25(1):48–57.
  59. Strickland MJ, Klein M, Flanders WD, Chang HH, Mulholland JA, Tolbert PE, Darrow LA. Modification of the effect of ambient air pollution on pediatric asthma emergency visits: susceptible subpopulations. Epidemiology. 2014;25(6):843.
  60. Grineski SE, Collins TW. Geographic and social disparities in exposure to air neurotoxicants at US public schools. Environ Res. 2018;161:580–7.
  61. Tonne C, Milà C, Fecht D, Alvarez M, Gulliver J, Smith J, Beevers S, Anderson HR, Kelly F. Socioeconomic and ethnic inequalities in exposure to air and noise pollution in London. Environ Int. 2018;115:170–9.
  62. Kravitz-Wirtz N, Crowder K, Hajat A, Sass V. The Long-term Dynamics of Racial/Ethnic Inequality in Neighborhood Air Pollution Exposure, 1990–2009. Du Bois Rev Soc Sci Res Race. 2016;13(2):237–59.
  63. Jones MR, Diez-Roux AV, Hajat A, Kershaw KN, O’Neill MS, Guallar E, Post WS, Kaufman JD, Navas-Acien A. Race/ethnicity, residential segregation, and exposure to ambient air pollution: the Multi-Ethnic Study of Atherosclerosis (MESA). Am J Public Health. 2014;104(11):2130–7.
  64. Jones MR, Diez-Roux AV, O’Neill MS, Guallar E, Sharrett AR, Post W, Kaufman JD, Navas-Acien A. Ambient air pollution and racial/ethnic differences in carotid intima-media thickness in the Multi-Ethnic Study of Atherosclerosis (MESA). J Epidemiol Community Health. 2015:jech-2015-205588.
  65. Harper S, Strumpf EC. Commentary: Social Epidemiology: Questionable Answers and Answerable Questions. Epidemiology. 2012;23(6):795–8.
  66. Karlsen S, Nazroo JY. Measuring and analyzing “race,” racism, and racial discrimination. Methods Soc Epidemiol. 2006;1:86.
  67. Hajat A, Diez-Roux AV, Adar SD, Auchincloss AH, Lovasi GS, O’Neill MS, Sheppard L, Kaufman JD. Air pollution and individual and neighborhood socioeconomic status: evidence from the Multi-Ethnic Study of Atherosclerosis (MESA). Environ Health Perspect. 2013;121(11–12):1325–33.
  68. Westreich D, Greenland S. The table 2 fallacy: presenting and interpreting confounder and modifier coefficients. Am J Epidemiol. 2013;177(4):292–8.
  69. Resa M, Zubizarreta JR. Evaluation of subset matching methods and forms of covariate balance. Stat Med. 2016;35(27):4961–79.
  70. VanderWeele TJ. On the distinction between interaction and effect modification. Epidemiology. 2009;20(6):863–71.
  71. Panagiotou OA, Wacholder S. Invited commentary: how big is that interaction (in my community), and in which direction? Am J Epidemiol. 2014;180(12):1150–8.
  72. Richardson DB, Kaufman JS. Estimation of the relative excess risk due to interaction and associated confidence bounds. Am J Epidemiol. 2009:kwn411.
  73. Knol MJ, VanderWeele TJ. Recommendations for presenting analyses of effect modification and interaction. Int J Epidemiol. 2012;41(2):514–20.
  74. Gelman A, Stern H. The difference between “significant” and “not significant” is not itself statistically significant. Am Stat. 2006;60(4):328–31.
  75. Kaufman JS, MacLehose RF. Which of these things is not like the others? Cancer. 2013;119(24):4216–22.
  76. Greenland S. Tests for interaction in epidemiologic studies: a review and a study of power. Stat Med. 1983;2(2):243–51.
  77. Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003;326(7382):219.
  78. Bell ML, Son J-Y, Peng RD, Wang Y, Dominici F. Ambient PM2.5 and Risk of Hospital Admissions: Do Risks Differ for Men and Women? Epidemiology. 2015;20:00.
  79. Benmarhnia T, Deguen S, Kaufman JS, Smargiassi A. Vulnerability to heat-related mortality. Epidemiology. 2015;26(6):781–93.
  80. Benmarhnia T, Grenier P, Brand A, Fournier M, Deguen S, Smargiassi A. Quantifying vulnerability to extreme heat in time series analyses: a novel approach applied to neighborhood social disparities under climate change. Int J Environ Res Public Health. 2015;12(9):11869–79.
  81. Laurian L, Funderburg R. Environmental justice in France? A spatio-temporal analysis of incinerator location. J Environ Plan Manag. 2014;57(3):424–46.
  82. Norton JM, Wing S, Lipscomb HJ, Kaufman JS, Marshall SW, Cravey AJ. Race, wealth, and solid waste facilities in North Carolina. Environ Health Perspect. 2007:1344–50.
  83. Schwarz L, Benmarhnia T, Laurian L. Social Inequalities Related to Hazardous Incinerator Emissions: An Additional Level of Environmental Injustice. Environ Justice. 2015;8(6):213–9.
  84. Mohai P, Pellow D, Roberts JT. Environmental justice. Annu Rev Environ Resour. 2009;34:405–30.
  85. Mohai P, Saha R. Which came first, people or pollution? A review of theory and evidence from longitudinal environmental justice studies. Environ Res Lett. 2015;10(12):125011.
  86. Nandi A, VanderWeele TJ. Mediation Analysis in Social Epidemiology. Methods Soc Epidemiol. 2017;Chapter 17.
  87. Clemente DB, Casas M, Vilahur N, Begiristain H, Bustamante M, Carsin A-E, Fernández MF, Fierens F, Gyselaers W, Iñiguez C. Prenatal ambient air pollution, placental mitochondrial DNA content, and birth weight in the INMA (Spain) and ENVIRONAGE (Belgium) birth cohorts. Environ Health Perspect. 2016;124(5):659.
  88. Ferguson KK, Chen Y-H, VanderWeele TJ, McElrath TF, Meeker JD, Mukherjee B. Mediation of the relationship between maternal phthalate exposure and preterm birth by oxidative stress with repeated measurements across pregnancy. Environ Health Perspect. 2017;125(3):488.
  89. Laurent O, Benmarhnia T, Milesi C, Hu J, Kleeman MJ, Cockburn M, Wu J. Relationships between greenness and low birth weight: Investigating the interaction and mediation effects of air pollution. Environ Res. 2019;175:124–32.
  90. Peng C, Bind M-AC, Colicino E, Kloog I, Byun H-M, Cantone L, Trevisi L, Zhong J, Brennan K, Dereix AE. Particulate air pollution and fasting blood glucose in nondiabetic individuals: associations and epigenetic mediation in the Normative Aging Study, 2000–2011. Environ Health Perspect. 2016;124(11):1715.
  91. VanderWeele TJ. Mediation analysis: a practitioner's guide. Annu Rev Public Health. 2016;37:17–32.
  92. Fairlie RW. An extension of the Blinder-Oaxaca decomposition technique to logit and probit models. Yale Univ Econ Growth Center Discuss Pap. 2006;(873).
  93. O'Donnell OA, Wagstaff A. Analyzing health equity using household survey data: a guide to techniques and their implementation. World Bank Publications; 2008.
  94. Oaxaca R. Male-female wage differentials in urban labor markets. Int Econ Rev. 1973:693–709.
  95. Bohra T, Benmarhnia T, McKinnon B, Kaufman JS. Decomposing Educational Inequalities in Child Mortality: A Temporal Trend Analysis of Access to Water and Sanitation in Peru. Am J Trop Med Hyg. 2017;96(1):57–64.
  96. Su JG, Jerrett M, Morello-Frosch R, Jesdale BM, Kyle AD. Inequalities in cumulative environmental burdens among three urbanized counties in California. Environ Int. 2012;40:79–87.
  97. Benmarhnia T, Huang J, Basu R, Wu J, Bruckner TA. Decomposition Analysis of Black–White Disparities in Birth Outcomes: The Relative Contribution of Air Pollution and Social Factors in California. Environ Health Perspect. 2017;107003:1.
  98. Jackson JW, VanderWeele TJ. Intersectional decomposition analysis with differential exposure, effects, and construct. Soc Sci Med. 2019;226:254–9.



None declared


None declared

Author information




TB, AH and JSK were involved in manuscript conceptualization and contributed to manuscript writing, review and editing. All authors have read and agreed to the submitted version of the manuscript.

Corresponding author

Correspondence to Tarik Benmarhnia.

Ethics declarations

Ethics approval and consent to participate

Not Applicable

Consent for publication

Not Applicable

Competing interests

None declared

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License. The licence permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you:

- give appropriate credit to the original author(s) and the source;
- provide a link to the Creative Commons licence;
- indicate if changes were made.

The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Benmarhnia, T., Hajat, A. & Kaufman, J.S. Inferential challenges when assessing racial/ethnic health disparities in environmental research. Environ Health 20, 7 (2021).



  • Air pollution and health
  • Race/ethnicity
  • Causal inference
  • Social epidemiology