| Predictor | β | Measure | BMA^{1} | DSA^{2} | LASSO^{3} | PLSR^{4} | SPCA^{5} |
|---|---|---|---|---|---|---|---|
| X_{2} | 0.50 | Estimate (ESE) | 0.22 (0.40) | 0.76 (0.62) | 0.40 (0.39) | 0.04 (0.02) | 0.08 (0.18) |
| | | Percent included | 51.8% | N/A | 70.8% | N/A | 90.6% |
| X_{3} | 0.50 | Estimate (ESE) | 0.25 (0.44) | 0.86 (0.59) | 0.43 (0.43) | 0.05 (0.03) | 0.08 (0.18) |
| | | Percent included | 53.0% | N/A | 67.9% | N/A | 90.6% |
| X_{2}*X_{3} | 0.20 | Estimate (ESE) | 0.29 (0.11) | 0.02 (0.11) | 0.19 (0.14) | 0.23 (0.11) | 0.16 (0.11) |
| | | Percent included | 96.0% | 4.4% | 83.2% | N/A | 82.5% |
| Average model size | | | 3.2 | 4.5 | 3.7 | 10 | 7.1 |

Average estimated effects, empirical standard errors, percentage of correct identification of nonzero coefficients, and average model size for 5 statistical methods in a cross-sectional study with continuous responses and 4 candidate air pollutants. The sample size for each replicate was N=250. The true model size was 3, not counting the intercept, and the maximum possible model size was 10. ESE, empirical standard error of the estimate. Results are based on 1000 replicates.
The estimate for each nonzero predictor is the mean, over replicates, of the product of that predictor's estimated regression coefficient and an indicator that the predictor was correctly identified in that replicate. The percentage for each nonzero predictor is the proportion of the 1000 replicates in which the method correctly identified it.
^{1}In BMA, predictors with posterior probabilities greater than 10% are regarded as identified.
^{2}In DSA, there is no variable selection for main effects, because individual exposures are forced into the model when their interactions are of interest. Identification of an interaction refers to the inclusion of the interaction term in the cross-validated best predictive model.
^{3}Predictors whose estimated LASSO regression coefficients are not equal to zero are considered identified.
^{4}No variable selection is applied in PLSR because it uses all predictors.
^{5}In SPCA, predictors are identified if their Wald statistics from univariate models exceed a threshold value.
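
The two per-predictor summaries defined in the footnote can be illustrated with a minimal sketch. It assumes the per-replicate coefficients and identification flags are already available as arrays; the function name `summarize_selection` and the toy inputs are hypothetical and are not taken from the study.

```python
import numpy as np

def summarize_selection(coefs, selected):
    """Summarize one predictor across simulation replicates.

    coefs    : estimated coefficient of the predictor in each replicate
    selected : True where the predictor was identified in that replicate
    Returns (estimate, percent_included).
    """
    coefs = np.asarray(coefs, dtype=float)
    selected = np.asarray(selected, dtype=bool)
    # Mean of coefficient times identification indicator, so replicates
    # where the predictor was not identified contribute zero.
    estimate = float(np.mean(coefs * selected))
    # Share of replicates in which the predictor was identified.
    percent_included = float(100.0 * np.mean(selected))
    return estimate, percent_included

# Toy data (made up): four replicates, with a LASSO-style rule in which a
# predictor counts as identified when its coefficient is nonzero.
toy_coefs = np.array([0.5, 0.0, 0.25, 0.0])
est, pct = summarize_selection(toy_coefs, toy_coefs != 0)
print(est, pct)
```

For a BMA-style rule, the `selected` argument would instead be built from posterior inclusion probabilities, for example `post_prob > 0.10`, where `post_prob` is an assumed array of per-replicate probabilities.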