Statistical strategies for constructing health risk models with multiple pollutants and their interactions: possible choices and comparisons

Table 4 Simulation results of four statistical methods under Scenario 4

Predictor	β	Measure	BMA¹	LASSO²	PLSR³	SPCA⁴
X₁	0.20	Estimate (ESE)	0.19 (0.12)	0.15 (0.04)	0.15 (0.03)	0.006 (0.009)
X₁	0.20	Percent included	89.3%	99.9%	N/A	44.1%
X₃	0.20	Estimate (ESE)	0.19 (0.09)	0.15 (0.03)	0.12 (0.03)	0.004 (0.006)
X₃	0.20	Percent included	94.6%	99.8%	N/A	32.8%
X₆	0.20	Estimate (ESE)	0.19 (0.09)	0.14 (0.03)	0.12 (0.04)	0.0006 (0.0014)
X₆	0.20	Percent included	95.8%	99.9%	N/A	18.2%
X₉	0.20	Estimate (ESE)	0.20 (0.09)	0.14 (0.03)	0.08 (0.03)	0.0001 (0.0006)
X₉	0.20	Percent included	94.5%	99.9%	N/A	6.2%
X₁*X₃	0.10	Estimate (ESE)	0.10 (0.03)	0.11 (0.01)	0.10 (0.01)	0.10 (0.07)
X₁*X₃	0.10	Percent included	99.2%	100%	N/A	97.1%
X₁*X₆	0.10	Estimate (ESE)	0.10 (0.03)	0.11 (0.01)	0.10 (0.01)	0.06 (0.05)
X₁*X₆	0.10	Percent included	99.5%	100%	N/A	87.0%
		Average model size	13.1	21.1	55	9.8

Average estimated effects, empirical standard errors, percentages of correct identification of non-zero coefficients, and average model size for four statistical approaches in a time-series study with a count response and 10 air pollutants. The sample size for each replicate was N = 800. The true model size was 6 (intercept not counted), and the maximum possible model size was 55. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for each non-zero predictor is the mean, over replicates, of the product of its estimated regression coefficient and an indicator that the predictor was correctly identified in that replicate. The percent included is the proportion of the 1000 replicates in which the method correctly identified the predictor. ¹In BMA, predictors with posterior probabilities greater than 10% are regarded as identified. ²Predictors whose estimated LASSO regression coefficients are not equal to zero are considered identified. ³No variable selection was applied in PLSR because it uses all predictors. ⁴In SPCA, predictors are identified if their Wald statistics from univariate models are larger than a threshold value.
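The summary measures described in this note can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `summarize_predictor`, the toy coefficients, and the choice to compute the ESE as the sample standard deviation of the per-replicate products are all assumptions introduced here. The toy data use the LASSO-style rule from footnote 2 (a predictor counts as identified when its coefficient is non-zero).

```python
import numpy as np

def summarize_predictor(coefs, identified):
    """Summarize one predictor across simulation replicates.

    coefs      : estimated coefficient of the predictor in each replicate
    identified : True where the method identified the predictor in that replicate
    Returns (estimate, ese, percent_included).
    """
    coefs = np.asarray(coefs, dtype=float)
    identified = np.asarray(identified, dtype=bool)
    if coefs.shape != identified.shape:
        raise ValueError("coefs and identified must have the same shape")

    # Coefficient times the identification indicator, per replicate.
    products = coefs * identified

    # Estimate: mean of the products over all replicates.
    estimate = products.mean()

    # ASSUMPTION: ESE taken as the sample standard deviation of the products;
    # the table note does not spell out the exact formula.
    ese = products.std(ddof=1)

    # Percent included: share of replicates in which the predictor was identified.
    percent_included = 100.0 * identified.mean()
    return estimate, ese, percent_included

# Hypothetical toy data: four replicates of one predictor (not from the paper).
toy_coefs = np.array([0.21, 0.18, 0.0, 0.25])
toy_identified = toy_coefs != 0  # LASSO-style rule: non-zero means identified

estimate, ese, percent_included = summarize_predictor(toy_coefs, toy_identified)
print(f"estimate={estimate:.3f}, ESE={ese:.3f}, included={percent_included:.1f}%")
```

Because replicates in which the predictor is missed contribute zero to the mean, a method with a low inclusion rate yields an estimate shrunk towards zero under this definition, which is consistent with the small SPCA estimates for the main effects in the table.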

ISSN: 1476-069X