1 Feb, 2016

g-prior

Let $\phi$ be the precision parameter. The formulations of g-prior is

\[\beta | \phi \sim N(0, \frac{g}{\phi} \paren{X'X}^{-1}), \pi(\phi) \propto 1/\phi\]

for $n \approx 100-1000, p \approx 16-30$

$\pi(y | M_\gamma) = \int~like*prior~d\beta d\phi$

For $M_{null}$, $R_\gamma^2 = 0, p_\gamma=0$. So, bayes factor = $\frac{\pi(y|M_\gamma)}{\pi(y|M_{null})}= \frac{(1+g)^{(n-1-p_\gamma)/2}}{[1+g(1-R_\gamma^2)]^{(n-1)/2}}$

As $g \rightarrow \infty$, BF($M_\gamma: M_{null}$) $\rightarrow 0$. This mean that non-informative prior on $\beta$ will help reject the model even if it is correct. (Bartlett Paradox).

As $R_\gamma^2 \rightarrow 1$, BF $\rightarrow (1+g)^{-p_\gamma/2}$

Since $M_\gamma$ approaches to a model with a perfect fit when $R_\gamma^2 \rightarrow 1$, BF should go to $\infty$. (Information Paradox)

Both paradoxes found in Langer et. al 2008.

Since fixing $g$ leads to paradoxes, we need to put a prior on $g$.

Zellner-Siow prior
- $\pi(g) = \frac{\sqrt{n/2}}{\Gamma(1/2)} g^{-3/2} e^{-n/(2g)}$, $g \gt 0$
- $\beta | \phi \sim $ multivariate t
Hyper-g prior (pretty useless prior - nobody uses it)
- $\pi(g) = \frac{a-2}{2} (1+g)^{-a/2}$, $g \gt 0$
- $\frac{g}{1+g} \sim Beta(1, \frac{a}{2}-1)$
- $\pi(g|y,M_\gamma) = $ ugly…
- but BF is in closed form (need to do 1 numerical integration)

Gaussian Process

$y = \mu_0(x) + \epsilon$, $\mu_0$ is an unknown function, continuously differentiable up to order $s$. $\mu_0^{(\nu)}$ is Lipschitz cont of order $\nu-[\nu]$, $[x]$ is the floor of $x$. Stronger than continuous.

When using GP, you need Lipschitz continuity.

$y = \mu(x) + \epsilon$, $\mu \sim GP(0,\kappa(.,.; \phi))$. e.g. exponential correlation function $\kappa(x_i,x_j;\phi) = e^{-\norm{x_i-x_j}\phi}$, matern class of correlation function.

Likelihood: $y \sim N(\mu,\sigma^2 I)$
Prior: $\mu(x) \sim N(\mathbf 0, K)$, $K_{ij} = \kappa(x_i,x_j; \phi)$
Posterior: available in closed form.
- $p$ is big, model is sparse (not great for GP).
  1. use: $\kappa(x_i,x_j;\phi) = e^{-(x_i-x_j)’P(x_i-x_j)\phi}$. $P$ is a diagonal matrix. - put spike and slab prior of diag elements of P.
  2. Try: $y = \mu(\Phi(x)) + \epsilon$, $p=20000, n = 5000$. Raj, et al (2016 JMLR)

Potential project:

Bayesian Lasso, Generalized Double Pareto, or other shrinkage prior on GP predictors