3 Feb, 2016

Variable selection in BNP


GP

In a GP regression, let the covariance kernel depend on the inputs through the quadratic form

\[\kappa(x_i,x_j;P) = (x_i-x_j)' P (x_i-x_j)\]

where $P$ is a diagonal matrix, and use a spike and slab prior on the diagonal elements of $P$. A zero on the $k$-th diagonal entry drops the $k$-th covariate from the kernel, which is what performs the variable selection.
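A minimal numerical sketch of this setup. Assumptions not in the notes: the quadratic form sits inside a squared-exponential covariance $\exp\brac{-(x_i-x_j)'P(x_i-x_j)}$, the slab is Gamma(2, 1), and the prior inclusion probability is 1/2 — all illustrative choices.

```python
import numpy as np

def ard_kernel(x1, x2, p_diag):
    """Squared-exponential ARD covariance exp{-(x1-x2)' P (x1-x2)}
    with diagonal P given as the vector p_diag (assumed form)."""
    d = x1 - x2
    return np.exp(-np.sum(p_diag * d * d))

rng = np.random.default_rng(0)
p = 5

# Spike-and-slab draw for diag(P): gamma_k = 0 zeroes out covariate k.
gamma = rng.binomial(1, 0.5, size=p)           # inclusion indicators (illustrative prob. 1/2)
p_diag = gamma * rng.gamma(2.0, 1.0, size=p)   # slab: Gamma(2,1); spike: point mass at 0

x_i, x_j = rng.normal(size=p), rng.normal(size=p)
print(gamma, ard_kernel(x_i, x_j, p_diag))
```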

New Topic: posterior consistency in the linear model

For the model,

\[\begin{aligned} y &= X\beta + \epsilon, \quad \epsilon \sim N(0,\sigma^2 I) \\ \beta &\sim \pi \end{aligned}\]

say the true model is $y = X\beta^0 + \epsilon$. Then,

\[\pi(\beta \in U| y) = \frac{\int_U~f(y|\beta)~\pi(\beta)~d\beta}{\int~f(y|\beta)~\pi(\beta)~d\beta} = \frac{\int_U~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta}{\int~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta}\]

where the second equality divides numerator and denominator by $f(y|\beta^0)$, which is a constant in $\beta$.

Let $U$ be a neighborhood around $\beta^0$. The posterior is consistent if $\pi(\beta \in U|y) \rightarrow 1$ as $n\rightarrow\infty$ under $\beta^0$ for any neighborhood $U$ of $\beta^0$.

How do we choose a neighborhood?

  • KL nbd around $\beta^0$
    • KL($\beta,\beta^0$) = $\int~f(y|\beta^0)~\log\frac{f(y|\beta^0)}{f(y|\beta)}~dy$
  • Hellinger nbd around $\beta^0$
    • H$^2$($\beta,\beta^0$) = $\int~\paren{ \sqrt{f(y|\beta^0)} - \sqrt{f(y|\beta)}}^2~dy$
  • L2 nbd around $\beta^0$
    • $U_\epsilon = \{ \beta: \norm{\beta-\beta^0}_2 \lt \epsilon \}$, for example.

  • Convergence in the L2 norm implies convergence in KL and in Hellinger; KL convergence and Hellinger convergence are equivalent.
  • Posterior consistency then reads: for any fixed $\epsilon \gt 0$, $\pi(U_\epsilon|y) \rightarrow 1$ as $n\rightarrow\infty$ under $\beta^0$. (A numerical comparison of these distances follows below.)
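To see the distances concretely, here is a quick numerical check in the one-dimensional Gaussian case, where $f(y|\beta) = N(x\beta, \sigma^2)$ for a fixed scalar $x$. The closed forms used for comparison are the standard equal-variance Gaussian formulas; the values of $x$, $\sigma$, $\beta^0$, and $\beta$ are arbitrary.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# f(y|beta) = N(x*beta, sigma^2) with fixed scalar x (1-d illustration).
x, sigma, beta0, beta = 1.0, 1.0, 0.0, 0.3
f0 = lambda y: norm.pdf(y, x * beta0, sigma)
f1 = lambda y: norm.pdf(y, x * beta, sigma)

kl, _ = quad(lambda y: f0(y) * np.log(f0(y) / f1(y)), -10, 10)
h2, _ = quad(lambda y: (np.sqrt(f0(y)) - np.sqrt(f1(y)))**2, -10, 10)

mu_diff = x * (beta - beta0)
print(kl, mu_diff**2 / (2 * sigma**2))                     # KL, closed form
print(h2, 2 * (1 - np.exp(-mu_diff**2 / (8 * sigma**2))))  # H^2 (unnormalized), closed form
```

A small $\norm{\beta-\beta^0}_2$ forces both quantities to be small, matching the implication noted above.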

Let $B_\epsilon = \{ \beta: \norm{\beta-\beta^0}_2 \lt \epsilon \}$. Let us show $\pi(B_\epsilon|y) \rightarrow 1$ as $n\rightarrow\infty$.

\[\pi(B_\epsilon| y) = \frac{\int_{B_\epsilon}~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta}{\int_{\mathbb{R}^p}~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta}\]
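Before the formal testing argument, a sanity-check simulation: under an assumed conjugate setup (a $N(0,\tau^2 I)$ prior on $\beta$ with known $\sigma^2$; the values of $\beta^0$, $\tau$, and $\epsilon$ are arbitrary), the posterior mass of $B_\epsilon$ should climb toward 1 as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(1)
p, sigma, tau, eps = 3, 1.0, 10.0, 0.5
beta0 = np.array([1.0, -2.0, 0.5])

for n in [10, 100, 1000, 10000]:
    X = rng.normal(size=(n, p))
    y = X @ beta0 + sigma * rng.normal(size=n)
    # Conjugate posterior: N(mu_n, Sigma_n), Sigma_n = (X'X/sigma^2 + I/tau^2)^{-1}.
    Sigma_n = np.linalg.inv(X.T @ X / sigma**2 + np.eye(p) / tau**2)
    mu_n = Sigma_n @ X.T @ y / sigma**2
    draws = rng.multivariate_normal(mu_n, Sigma_n, size=20000)
    print(n, np.mean(np.linalg.norm(draws - beta0, axis=1) < eps))  # ~ pi(B_eps | y)
```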

Let $\{\Phi_n\}_{n=1}^\infty$ be a sequence of test functions for testing $H_0:\beta=\beta^0$ vs. $H_1: \beta \in B_\epsilon^C$, satisfying

\[E_{\beta^0}(\Phi_n) \le c_1 \exp\paren{-cn}, ~~~ c_1, c \gt 0\]

and $\underset{\beta\in B_\epsilon^C}{\sup}~E_\beta\paren{1-\Phi_n} \le b_1 \exp\paren{-bn}$ with $b_1, b \gt 0$.

That is, the level of the sequence of tests goes to 0 at an exponential rate as $n\rightarrow\infty$, and the power goes to 1 at an exponential rate.

Equivalently, we show that $\pi(B_\epsilon^C|y) \rightarrow 0$:

\[\begin{aligned} \pi(B_\epsilon^C | y) &= \frac{\int_{B_\epsilon^C}~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta}{\int_{\mathbb{R}^p}~\frac{f(y|\beta)}{f(y|\beta^0)}~\pi(\beta)~d\beta} \\ &= \frac{J_{B_\epsilon^C}}{J} \\ & \le \Phi_n + \frac{(1-\Phi_n)J_{B_\epsilon^C}}{J} \end{aligned}\]

For the first term, Markov's inequality gives

\[\begin{aligned} P_{\beta^0} (\Phi_n \gt \exp\paren{-cn/2}) &\le \frac{E_{\beta^0}(\Phi_n)}{\exp\paren{-cn/2}} \\ &\le c_1 \exp\paren{\frac{-cn}{2}} \end{aligned}\]

$\sum_{n=1}^\infty P_{\beta^0} (\Phi_n \gt \exp\paren{-cn/2}) \le \sum_{n=1}^\infty c_1 \exp\paren{\frac{-cn}{2}} \lt \infty \Rightarrow P_{\beta^0} (\Phi_n \gt \exp\paren{-cn/2} \text{ infinitely often}) = 0.$ So, for all but finitely many $n$, $\Phi_n \le \exp\paren{-cn/2}$, and hence $\Phi_n \rightarrow 0$ almost surely under $\beta^0$. (Markov's inequality, then the Borel–Cantelli lemma.)
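The second term is handled the same way. A sketch of the standard argument (Fubini for the numerator, prior positivity of KL neighborhoods for the denominator):

\[E_{\beta^0}\brac{(1-\Phi_n)\,J_{B_\epsilon^C}} = \int_{B_\epsilon^C}~E_\beta\paren{1-\Phi_n}~\pi(\beta)~d\beta \le b_1 \exp\paren{-bn},\]

so the same Markov plus Borel–Cantelli step gives $(1-\Phi_n)J_{B_\epsilon^C} \le \exp\paren{-bn/2}$ for all but finitely many $n$, almost surely. If in addition $J \ge \exp\paren{-wn}$ eventually for every $w \gt 0$ (which holds when $\pi$ puts positive mass on KL neighborhoods of $\beta^0$), choosing $w \lt b/2$ yields $\pi(B_\epsilon^C|y) \rightarrow 0$.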

Concretely, one can take $\Phi_n = I_{\{ \norm{\hat{\beta}_n-\beta^0} \gt \epsilon/2 \}}$, where $\hat{\beta}_n = (X'X)^{-1}X'y$ is the least squares estimator.

$\brac{\Phi_n}$ is an exponentially consistent sequence of test functions for testing $H_0:\beta=\beta^0$ vs. $H_1:\beta \in B_\epsilon^C$.
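A quick Monte Carlo check of the level of this test under $\beta^0$, which should decay rapidly in $n$ (the design, $\beta^0$, $\epsilon$, and the replication count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
p, sigma, eps, reps = 3, 1.0, 0.5, 2000
beta0 = np.array([1.0, -2.0, 0.5])

for n in [25, 50, 100, 200]:
    rejections = 0
    for _ in range(reps):
        X = rng.normal(size=(n, p))
        y = X @ beta0 + sigma * rng.normal(size=n)
        betahat = np.linalg.lstsq(X, y, rcond=None)[0]      # (X'X)^{-1} X'y
        rejections += np.linalg.norm(betahat - beta0) > eps / 2
    print(n, rejections / reps)  # Monte Carlo estimate of E_{beta0}(Phi_n)
```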


Potential Project

  • Put a generalized Pareto prior on the diagonal elements of $P$ in the GP kernel above (or try other shrinkage priors).