20 Jan, 2016

Stochastic Search Variable Selection


Penalized Optimization (Concluding)

  • Problems with the bootstrap for finding predictive intervals (a sketch of the procedure follows this list):
    • Not applicable in very high dimensions
    • The bootstrap only works (theoretically) when $p^3 \lt n$
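
For concreteness, here is a minimal sketch of the kind of residual-bootstrap predictive interval being criticized, assuming NumPy; `bootstrap_pred_interval` is a hypothetical name, and per the condition above this is only trustworthy when $p$ is small relative to $n$.

```python
import numpy as np

def bootstrap_pred_interval(X, y, x_new, B=2000, alpha=0.05, seed=None):
    """Residual-bootstrap predictive interval for OLS at x_new (a sketch)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    preds = np.empty(B)
    for b in range(B):
        # Resample residuals, refit OLS, and predict with fresh noise.
        y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
        beta_star, *_ = np.linalg.lstsq(X, y_star, rcond=None)
        preds[b] = x_new @ beta_star + rng.choice(resid)
    return np.quantile(preds, [alpha / 2, 1 - alpha / 2])

# Toy usage with p^3 = 125 < n = 200, where the theory applies.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, -2.0]) + rng.normal(size=200)
print(bootstrap_pred_interval(X, y, X[0], seed=1))
```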

Bayesian Approach in High Dimensions

  • Bernstein–von Mises Theorem (Bayesian CLT)
\[\pi(\beta,\gamma \mid y) \approx N\left( {\hat{\beta} \choose \hat{\gamma}},\; I\left({\hat{\beta} \choose \hat{\gamma}}\right)^{-1} \right),\]

for $n \gg p$. Under regularity conditions, the prior also washes out.
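
As a one-dimensional illustration (my own toy example, assuming SciPy is available), consider Bernoulli data with a flat Beta(1, 1) prior: the exact Beta posterior is well approximated by a normal centered at the MLE with variance equal to the inverse Fisher information, exactly as the theorem promises.

```python
import numpy as np
from scipy import stats

# Bernoulli(theta), flat Beta(1, 1) prior: exact posterior is
# Beta(1 + s, 1 + n - s); Bernstein-von Mises says it is approximately
# N(theta_hat, 1 / (n * I(theta_hat))) with I(theta) = 1 / (theta * (1 - theta)).
n, s = 500, 180                     # n trials, s successes
theta_hat = s / n                   # MLE
exact = stats.beta(1 + s, 1 + n - s)
approx = stats.norm(theta_hat, np.sqrt(theta_hat * (1 - theta_hat) / n))

for t in np.linspace(0.30, 0.42, 5):
    print(f"theta={t:.2f}  exact pdf={exact.pdf(t):7.3f}  BvM pdf={approx.pdf(t):7.3f}")
```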

  • Diaconis & Freedman (1986)
    • Posterior consistency: $\pi(\beta \mid y) \overset{p}{\rightarrow} \delta_{\beta_0}$ when the true $\beta$ is $\beta_0$
    • Improper priors can lead to inconsistency
    • Priors need to be informative and contain “some structure”
    • Noninformative priors work when $n \gg p$

Spike and Slab Prior

\[\begin{array}{rcl} \beta_j = 0 & \text{w.p.} & \pi_0 \\\\ \beta_j = g & \text{w.p.} & 1-\pi_0 \\\\ g & \sim & N(0,c) \end{array}\]
  • A priori, the expected number of parameters included in the model is $p(1-\pi_0)$, which is not always desirable. This can be fixed by putting a (Beta) prior on $\pi_0$; see the prior-predictive sketch after this list.
  • Explores a model space of size $2^p$
    • Exhaustive exploration of all $2^p$ models is not feasible when $p$ is large (e.g. more than 10)
    • We need an MCMC scheme to solve this problem
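
A quick prior-predictive check of these two points, assuming NumPy (`draw_prior` is a hypothetical helper): sampling from the spike-and-slab prior confirms the mean model size $p(1-\pi_0)$, and the Beta-prior fix simply draws $\pi_0$ before each model.

```python
import numpy as np

rng = np.random.default_rng(0)
p, pi0, c = 100, 0.9, 4.0

def draw_prior(pi0):
    # beta_j = 0 w.p. pi0, beta_j ~ N(0, c) w.p. 1 - pi0.
    gamma = rng.random(p) > pi0                  # inclusion indicators
    return np.where(gamma, rng.normal(0.0, np.sqrt(c), size=p), 0.0)

sizes = [np.count_nonzero(draw_prior(pi0)) for _ in range(5000)]
print("mean model size:", np.mean(sizes), "vs p * (1 - pi0) =", p * (1 - pi0))

# The fix: pi0 ~ Beta(a, b), so the data can inform the sparsity level.
a, b = 9.0, 1.0                                  # prior mean a / (a + b) = 0.9
sizes_h = [np.count_nonzero(draw_prior(rng.beta(a, b))) for _ in range(5000)]
print("mean model size with Beta prior on pi0:", np.mean(sizes_h))
```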

Stochastic Search Variable Selection (SSVS)

  • Relies on MCMC (a minimal Gibbs sketch follows this list)
  • Include predictor $j$ if $\hat p(\gamma_j = 1 \mid y) \gt 0.5$, i.e. select the median probability model. This is proven to be optimal for prediction by Barbieri & Berger (see Reading) in an important Annals of Statistics paper, with minimal mathematics.
  • Fails when a few predictors (e.g. 3) are highly correlated: the posterior inclusion probability is split among them, so none may clear the 0.5 threshold
  • The choice of the slab $g$ (in particular its variance $c$) is problematic and subjective
  • Feasible for $p \lt 5000$ or so; beyond that it becomes too slow. Generally not scalable, but there are tricks to parallelize.
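
To make the algorithm concrete, here is a minimal sketch of an SSVS-style Gibbs sampler, assuming NumPy, a point-mass spike, a $N(0,c)$ slab, and (for brevity) known $\sigma^2$; `ssvs_gibbs` is a hypothetical name, and a fuller treatment would also sample $\sigma^2$ and $\pi_0$. Each sweep integrates $\beta_j$ out of the slab to update the indicator $\gamma_j$, then redraws $\beta_j$.

```python
import numpy as np

def ssvs_gibbs(X, y, c=10.0, pi0=0.5, sigma2=1.0, n_iter=2000, burn=500, seed=0):
    """Gibbs sampler for a point-mass spike and N(0, c) slab, sigma^2 known."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    xtx = np.einsum("ij,ij->j", X, X)          # x_j' x_j for each column
    incl = np.zeros(p)                          # post-burn-in inclusion counts

    for it in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]    # residual excluding j
            v_j = 1.0 / (xtx[j] / sigma2 + 1.0 / c)
            m_j = v_j * (X[:, j] @ r) / sigma2
            # Inclusion odds: prior odds times the Bayes factor from
            # integrating beta_j out under the slab.
            log_odds = (np.log((1 - pi0) / pi0)
                        + 0.5 * np.log(v_j / c)
                        + 0.5 * m_j ** 2 / v_j)
            p_incl = np.exp(log_odds - np.logaddexp(0.0, log_odds))
            if rng.random() < p_incl:
                beta[j] = rng.normal(m_j, np.sqrt(v_j))
                if it >= burn:
                    incl[j] += 1
            else:
                beta[j] = 0.0
    return incl / (n_iter - burn)               # posterior inclusion probabilities

# Toy usage: standardize X, run the sampler, and apply the median
# probability model rule (include predictor j if its probability > 0.5).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
X = (X - X.mean(axis=0)) / X.std(axis=0)
beta_true = np.zeros(20)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + rng.normal(size=200)
probs = ssvs_gibbs(X, y)
print("selected predictors:", np.where(probs > 0.5)[0])
```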

Reading & Other Notes:

  • Barbieri & Berger (2004), “Optimal predictive model selection”, Annals of Statistics (the median probability model paper)
  • Always standardize covariates when doing regression
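
On the standardization note, a minimal helper (assuming NumPy; `standardize` is an illustrative name). One reason it matters here: with columns on a common scale, a single slab variance $c$ is comparable across all coefficients.

```python
import numpy as np

def standardize(X):
    """Center each covariate and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```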