Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons

Sun, Zhichao; Tao, Yebin; Li, Shi; Ferguson, Kelly K; Meeker, John D; Park, Sung Kyun; Batterman, Stuart A; Mukherjee, Bhramar

doi:10.1186/1476-069X-12-85

Environmental Health

Table 2 Simulation results under Scenario 2: single-step versus two-step strategy


Predictor	β	Measure	(A) One-step regression using all predictors				(B) Two-step strategy employing CART at screening step
Predictor	β	Measure	DSA	LASSO	PLSR	SPCA	BMA¹	DSA²	LASSO³	PLSR⁴	SPCA⁵
X₁	0.50	Estimate (ESE)	0.93 (0.29)	0.08 (0.19)	0.03 (0.01)	0.03 (0.04)	0.32 (0.38)	0.93 (0.29)	0.35 (0.36)	0.11 (0.04)	2.3×10^-4 (0.01)
X₁	0.50	Percent included	N/A	28.2%	N/A	98.5%	65.2%	N/A	68.3%	N/A	58.8%
X₂	0.50	Estimate (ESE)	0.75 (0.27)	0.07 (0.22)	0.02 (0.01)	0.02 (0.03)	0.25 (0.32)	0.74 (0.25)	0.33 (0.38)	0.09 (0.04)	−2.7×10^-4 (0.01)
X₂	0.50	Percent included	N/A	22.6%	N/A	94.0%	63.5%	N/A	63.8%	N/A	58.9%
X₆	0.50	Estimate (ESE)	0.88 (0.29)	0.07 (0.19)	0.03 (0.02)	0.02 (0.03)	0.29 (0.36)	0.88 (0.25)	0.36 (0.36)	0.10 (0.05)	−1.2×10^-4 (0.01)
X₆	0.50	Percent included	N/A	25.8%	N/A	96.2%	63.6%	N/A	67.4%	N/A	57.9%
X₇	0.50	Estimate (ESE)	0.71 (0.26)	0.04 (0.22)	0.02 (0.01)	0.01 (0.02)	0.24 (0.30)	0.67 (0.26)	0.32 (0.34)	0.08 (0.04)	9.1×10^-4 (0.01)
X₇	0.50	Percent included	N/A	18.1%	N/A	82.4%	65.6%	N/A	64.3%	N/A	57.8%
X₁*X₂	0.20	Estimate (ESE)	0.002 (0.03)	0.17 (0.14)	0.07 (0.04)	0.07 (0.07)	0.24 (0.22)	0.006 (0.06)	0.21 (0.18)	0.27 (0.08)	0.28 (0.13)
X₁*X₂	0.20	Percent included	0.3%	79.2%	N/A	96.3%	78.4%	1.1%	84.0%	N/A	98.6%
X₁*X₆	0.20	Estimate (ESE)	0.003 (0.05)	0.20 (0.18)	0.06 (0.03)	0.05 (0.06)	0.21 (0.26)	0.006 (0.08)	0.22 (0.22)	0.23 (0.08)	0.19 (0.11)
X₁*X₆	0.20	Percent included	0.3%	77.3%	N/A	99.0%	66.7%	0.9%	78.1%	N/A	99.8%
X₆*X₇	0.20	Estimate (ESE)	0.002 (0.04)	0.17 (0.16)	0.06 (0.03)	0.03 (0.05)	0.25 (0.27)	0.004 (0.05)	0.21 (0.21)	0.19 (0.08)	0.15 (0.11)
X₆*X₇	0.20	Percent included	0.3%	74.4%	N/A	94.1%	73.3%	0.5%	76.9%	N/A	98.6%
		Average model size	20.1	22.8	210	79.3	6.0	4.2	6.7	10.0	8.2

Panel A reports the average estimated effects, empirical standard errors, percentages of correct identification of non-zero coefficients, and average model size for four statistical methods in a cross-sectional study with continuous responses and 20 air pollutants. Panel B reports the corresponding results for five statistical methods applied after an initial CART variable selection under the two-step modeling strategy. The sample size for each replicate was N=250. The true model size was 7, not counting the intercept, and the maximum possible model size was 210. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
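The two model sizes quoted in the note can be checked with a short calculation. The sketch below assumes that the candidate predictors are the 20 pollutant main effects plus all of their pairwise products; the note does not spell this out, but the assumption reproduces the stated maximum of 210. The true model size is counted from the non-zero rows of the table. All variable names are illustrative.

```python
from math import comb

# Assumption (not stated in the note): candidates are the 20 main effects
# plus every pairwise interaction among the 20 pollutants.
n_pollutants = 20
main_effects = n_pollutants
pairwise_interactions = comb(n_pollutants, 2)
max_model_size = main_effects + pairwise_interactions

# Non-zero terms listed in the table: X1, X2, X6, X7 and
# X1*X2, X1*X6, X6*X7 (intercept not counted).
true_main_effects = ["X1", "X2", "X6", "X7"]
true_interactions = [("X1", "X2"), ("X1", "X6"), ("X6", "X7")]
true_model_size = len(true_main_effects) + len(true_interactions)

print(pairwise_interactions, max_model_size, true_model_size)
```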
The estimate for a non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and an indicator that the predictor was correctly identified in that replicate. The percentage for a non-zero predictor is the proportion of the 1000 replicates in which the method correctly identified it. ¹In BMA, predictors with posterior probabilities greater than 10% are regarded as identified. ²In DSA, no variable selection is applied to main effects because individual exposures are forced into the model when their interactions are of interest. Identification of an interaction refers to inclusion of the interaction term in the cross-validated best predictive model. ³Predictors whose estimated LASSO regression coefficients are not equal to zero are considered identified. ⁴No variable selection is applied in PLSR because it uses all predictors. ⁵In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
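The definitions of the estimate and the percent included can be illustrated with a minimal sketch. The function name, its inputs, and the toy numbers are hypothetical and are not taken from the article. The note does not say how the ESE is computed, so the sketch assumes it is the sample standard deviation of the same per-replicate products.

```python
from statistics import mean, stdev

def summarize_predictor(coefs, identified):
    """Summarize one predictor across simulation replicates.

    coefs: estimated coefficient in each replicate (hypothetical input).
    identified: one boolean per replicate, True if the method identified
        the predictor in that replicate.
    """
    # Coefficient times the identification indicator: contributes 0
    # whenever the predictor was not identified in that replicate.
    products = [b if flag else 0.0 for b, flag in zip(coefs, identified)]
    return {
        "estimate": mean(products),
        # Assumption: ESE taken as the standard deviation of the products.
        "ese": stdev(products),
        "percent_included": 100.0 * sum(identified) / len(identified),
    }

# Toy example with four replicates; the third replicate missed the predictor,
# so its coefficient does not contribute to the estimate.
summary = summarize_predictor([0.4, 0.6, 0.3, 0.5], [True, True, False, True])
print(summary)
```

Under this definition, a method that rarely identifies a predictor shows an average estimate pulled toward zero even when its coefficient is close to the truth in the replicates where the predictor is selected.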


ISSN: 1476-069X
