20 Jan, 2016
Stochastic Search Variable Selection
Penalized Optimization (Concluding)
- Problems with the bootstrap for finding predictive intervals:
  - Not applicable in very high dimensions
  - The bootstrap only works (theoretically) when $p^3 \lt n$
Bayesian Approach in High Dimensions
- Bernstein-von Mises Theorem (the Bayesian CLT):
\[\pi(\beta,\gamma \mid y) \approx N\left( \begin{pmatrix}\hat{\beta}\\ \hat{\gamma}\end{pmatrix},\; I\left(\hat{\beta},\hat{\gamma}\right)^{-1} \right),\]
for $n \gg p$, where $I$ is the Fisher information evaluated at the MLE. Under regularity conditions, the priors also wash out. (A small numerical check of this approximation follows this list.)
- Diaconis & Freedman (1986), "On the consistency of Bayes estimates"
- Posterior consistency: $\pi(\beta \mid y) \overset{p}{\rightarrow} \delta_{\beta_0}$ as $n \to \infty$, when the true parameter is $\beta_0$ (the posterior concentrates on a point mass at the truth)
- Improper priors can lead to inconsistency
- Priors need to be informative and contain "some structure"
- Noninformative priors work when $n \gg p$
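As a minimal numerical check of the Bernstein-von Mises approximation (a sketch, not from the lecture: the Bernoulli model, sample size, and grid below are illustrative choices), compare the exact posterior under a flat prior with the normal approximation centered at the MLE:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta0 = 500, 0.3                      # many observations, one parameter
s = rng.binomial(n, theta0)               # number of successes

# exact posterior under a flat Beta(1, 1) prior
posterior = stats.beta(1 + s, 1 + n - s)

# Bernstein-von Mises approximation: N(MLE, I(MLE)^{-1} / n)
mle = s / n
fisher = 1.0 / (mle * (1.0 - mle))        # per-observation Fisher information
approx = stats.norm(mle, np.sqrt(1.0 / (n * fisher)))

# for large n the two densities nearly coincide
grid = np.linspace(mle - 0.05, mle + 0.05, 5)
print(np.round(posterior.pdf(grid), 2))
print(np.round(approx.pdf(grid), 2))
```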
Spike and Slab Prior
\[\begin{array}{rcl}
\beta_j = 0 & \text{w.p.} & \pi_0 \\
\beta_j = g & \text{w.p.} & 1-\pi_0 \\
g & \sim & N(0,c)
\end{array}\]
- A priori, the mean number of parameters included in the model is $p(1-\pi_0)$, which is not always desirable. This can be fixed by putting a (beta) prior on $\pi_0$. (A quick Monte Carlo check of this prior mean follows this list.)
- The model space contains $2^p$ models to explore
- Exhaustively computing posterior probabilities over all $2^p$ models is not feasible when $p$ is large (e.g., more than 10, since $2^p$ already exceeds 1000)
- We need an MCMC scheme to explore the model space
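A quick Monte Carlo check of the prior mean model size (a sketch; $p$, $\pi_0$, $c$, and the number of draws are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
p, pi0, c = 50, 0.9, 1.0

# gamma_j = 1 w.p. 1 - pi0, then draw the slab N(0, c) where included
gamma = rng.binomial(1, 1 - pi0, size=(10_000, p))
beta = gamma * rng.normal(0.0, np.sqrt(c), size=(10_000, p))

# mean model size under the prior should be close to p * (1 - pi0) = 5
print(gamma.sum(axis=1).mean(), p * (1 - pi0))
```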
Stochastic Search Variable Selection (SSVS)
- Relies on MCMC
- Include predictor $j$ if the estimated posterior inclusion probability $\hat p(\gamma_j=1 \mid y) \gt 0.5$; this "median probability model" is proven to be optimal for prediction by Barbieri & Berger (in the Annals of Statistics, with minimal mathematics; an important paper). A toy sampler illustrating this rule follows this list.
- Fails when, e.g., 3 predictors are highly correlated: the posterior inclusion probability is split among them, so each can fall below 0.5
- The choice of the slab scale ($c$ above, often denoted $g$) is problematic and subjective
- Practical only up to roughly $p \lt 5000$; beyond that it becomes too slow. Generally not scalable, but there are tricks to parallelize.
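To make the mechanics concrete, here is a minimal sketch of a Gibbs sampler in the spirit of SSVS, for the point-mass spike-and-slab prior above (the classic George & McCulloch formulation uses a continuous spike instead). It assumes a known noise variance $\sigma^2$ for simplicity; the function name, defaults, and simulated data are all illustrative, not from the lecture:

```python
import numpy as np
from scipy.special import expit

def ssvs_gibbs(X, y, sigma2=1.0, c=10.0, pi0=0.5, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for y = X beta + N(0, sigma2 I) with a point-mass
    spike-and-slab prior: beta_j = 0 w.p. pi0, beta_j ~ N(0, c) w.p. 1 - pi0.
    Returns estimated posterior inclusion probabilities p-hat(gamma_j = 1 | y)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, gamma = np.zeros(p), np.zeros(p, dtype=bool)
    xtx = np.sum(X**2, axis=0)                    # x_j' x_j for each column
    resid = y - X @ beta
    incl = np.zeros(p)
    for it in range(n_iter):
        for j in range(p):
            r = resid + X[:, j] * beta[j]         # residual with beta_j removed
            v = 1.0 / (xtx[j] / sigma2 + 1.0 / c)  # slab conditional variance
            m = v * (X[:, j] @ r) / sigma2         # slab conditional mean
            # log Bayes factor of slab vs. spike, integrating out beta_j
            log_bf = 0.5 * np.log(v / c) + 0.5 * m**2 / v
            p1 = expit(log_bf + np.log((1 - pi0) / pi0))
            gamma[j] = rng.random() < p1
            beta[j] = rng.normal(m, np.sqrt(v)) if gamma[j] else 0.0
            resid = r - X[:, j] * beta[j]
        if it >= burn:
            incl += gamma
    return incl / (n_iter - burn)

# illustrative run: 3 truly nonzero coefficients out of 20 (X already standardized)
rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.standard_normal(n)
pip = ssvs_gibbs(X, y)
print(np.where(pip > 0.5)[0])   # median probability model: include if PIP > 0.5
```

Each full sweep costs $O(np)$, which is consistent with the method slowing down once $p$ reaches the thousands.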
Reading & Other Notes:
- Barbieri & Berger (2004), "Optimal Predictive Model Selection," Annals of Statistics (the median probability model paper)
- Always standardize covariates when doing regression