20 Jan, 2016
Stochastic Search Variable Selection
Penalized Optimization (Concluding)
- Problems with the bootstrap for finding predictive intervals:
  - Not applicable in very high dimensions
  - The bootstrap only works (theoretically) when $p^3 \lt n$
Bayesian Approach in High Dimensions
- Bernstein-von Mises Theorem (the Bayesian CLT):
\[\pi(\beta,\gamma \mid y) \approx N\left( \begin{pmatrix}\hat{\beta}\\ \hat{\gamma}\end{pmatrix},\; I\left(\hat{\beta},\hat{\gamma}\right)^{-1} \right),\]
for $n \gg p$, where $I$ is the Fisher information evaluated at the MLE. Under regularity conditions, the priors also wash out. (A small numerical check of this approximation follows this list.)
- Diaconis & Freedman (1986), "On the consistency of Bayes estimates"
- Posterior consistency: $\pi(\beta \mid y) \overset{p}{\rightarrow} \delta_{\beta_0}$ as $n \to \infty$, when the true parameter is $\beta_0$ (the posterior concentrates on a point mass at the truth)
- Improper priors can lead to inconsistency
- Priors need to be informative and contain "some structure"
- Noninformative priors work when $n \gg p$
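As a minimal numerical check of the Bernstein-von Mises approximation (a sketch, not from the lecture: the Bernoulli model, sample size, and grid below are illustrative choices), compare the exact posterior under a flat prior with the normal approximation centered at the MLE:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta0 = 500, 0.3                      # many observations, one parameter
s = rng.binomial(n, theta0)               # number of successes

# exact posterior under a flat Beta(1, 1) prior
posterior = stats.beta(1 + s, 1 + n - s)

# Bernstein-von Mises approximation: N(MLE, I(MLE)^{-1} / n)
mle = s / n
fisher = 1.0 / (mle * (1.0 - mle))        # per-observation Fisher information
approx = stats.norm(mle, np.sqrt(1.0 / (n * fisher)))

# for large n the two densities nearly coincide
grid = np.linspace(mle - 0.05, mle + 0.05, 5)
print(np.round(posterior.pdf(grid), 2))
print(np.round(approx.pdf(grid), 2))
```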
Spike and Slab Prior
\[\begin{array}{rcl}
\beta_j = 0 & \text{w.p.} & \pi_0 \\
\beta_j = g & \text{w.p.} & 1-\pi_0 \\
g & \sim & N(0,c)
\end{array}\]
- A priori, the mean number of parameters included in the model is $p(1-\pi_0)$, which is not always desirable. This can be fixed by putting a (beta) prior on $\pi_0$. (A quick Monte Carlo check of this prior mean follows this list.)
- The model space contains $2^p$ models to explore
- Exhaustively computing posterior probabilities over all $2^p$ models is not feasible when $p$ is large (e.g., more than 10, since $2^p$ already exceeds 1000)
- We need an MCMC scheme to explore the model space
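A quick Monte Carlo check of the prior mean model size (a sketch; $p$, $\pi_0$, $c$, and the number of draws are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
p, pi0, c = 50, 0.9, 1.0

# gamma_j = 1 w.p. 1 - pi0, then draw the slab N(0, c) where included
gamma = rng.binomial(1, 1 - pi0, size=(10_000, p))
beta = gamma * rng.normal(0.0, np.sqrt(c), size=(10_000, p))

# mean model size under the prior should be close to p * (1 - pi0) = 5
print(gamma.sum(axis=1).mean(), p * (1 - pi0))
```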
Stochastic Search Variable Selection (SSVS)
- Relies on MCMC
- Include predictor $j$ if the estimated posterior inclusion probability $\hat p(\gamma_j=1 \mid y) \gt 0.5$; this "median probability model" is proven to be optimal for prediction by Barbieri & Berger (in the Annals of Statistics, with minimal mathematics; an important paper). A toy sampler illustrating this rule follows this list.
- Fails when, e.g., 3 predictors are highly correlated: the posterior inclusion probability is split among them, so each can fall below 0.5
- The choice of the slab scale ($c$ above, often denoted $g$) is problematic and subjective
- Practical only up to roughly $p \lt 5000$; beyond that it becomes too slow. Generally not scalable, but there are tricks to parallelize.
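To make the mechanics concrete, here is a minimal sketch of a Gibbs sampler in the spirit of SSVS, for the point-mass spike-and-slab prior above (the classic George & McCulloch formulation uses a continuous spike instead). It assumes a known noise variance $\sigma^2$ for simplicity; the function name, defaults, and simulated data are all illustrative, not from the lecture:

```python
import numpy as np
from scipy.special import expit

def ssvs_gibbs(X, y, sigma2=1.0, c=10.0, pi0=0.5, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for y = X beta + N(0, sigma2 I) with a point-mass
    spike-and-slab prior: beta_j = 0 w.p. pi0, beta_j ~ N(0, c) w.p. 1 - pi0.
    Returns estimated posterior inclusion probabilities p-hat(gamma_j = 1 | y)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(p, dtype=bool)
    xtx = np.sum(X**2, axis=0)                    # x_j' x_j for each column
    resid = y - X @ beta
    incl = np.zeros(p)
    for it in range(n_iter):
        for j in range(p):
            r = resid + X[:, j] * beta[j]         # residual with beta_j removed
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / c)  # slab conditional variance
            m = v * (X[:, j] @ r) / sigma2         # slab conditional mean
            # log Bayes factor of slab vs. spike, integrating out beta_j
            log_bf = 0.5 * np.log(v / c) + 0.5 * m**2 / v
            p1 = expit(log_bf + np.log((1 - pi0) / pi0))
            gamma[j] = rng.random() < p1
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
            resid = r - X[:, j] * beta[j]
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)

# illustrative run: 3 truly nonzero coefficients out of 20 (X already standardized)
rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
pip = ssvs_gibbs(X, y)
print(np.where(pip > 0.5)[0])   # median probability model: include if PIP > 0.5
```

Each full sweep costs $O(np)$, which is consistent with the method slowing down once $p$ reaches the thousands.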
Reading & Other Notes:
- Barbieri & Berger (2004), "Optimal Predictive Model Selection," Annals of Statistics (the median probability model paper)
- Always standardize covariates when doing regression