7 Jan, 2016
Posterior Predictive: $p(y^* | y) = \displaystyle \int p(y^* | \theta,y)~p(\theta | y)~d\theta$
Note that the second factor in the integral is what we sample from (the posterior in this case), and the first factor is what we draw $y^*$ from, given each sampled $\theta$, to obtain our posterior predictive samples. The first factor is also the data-generating mechanism, which simplifies to $p(y^*|\theta)$ if the mechanism doesn't depend on previous draws (which isn't the case in a time series).
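A minimal sketch of this two-step sampling, assuming a Normal likelihood with known $\sigma$ and a conjugate Normal prior so the posterior is available in closed form (all constants below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed data: y_i ~ N(theta, sigma^2), sigma known (illustrative values).
sigma = 1.0
y = rng.normal(loc=2.0, scale=sigma, size=20)

# Conjugate N(mu0, tau0^2) prior on theta gives a Normal posterior.
mu0, tau0 = 0.0, 10.0
post_prec = 1 / tau0**2 + len(y) / sigma**2
post_var = 1 / post_prec
post_mean = post_var * (mu0 / tau0**2 + y.sum() / sigma**2)

# Step 1: sample theta from the posterior p(theta | y).
theta_draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)

# Step 2: for each sampled theta, draw y* from the data-generating mechanism p(y* | theta).
y_star = rng.normal(theta_draws, sigma)

print(y_star.mean(), y_star.var())  # approx. post_mean and post_var + sigma^2
```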
Prior Predictive (a.k.a. Marginal): $p(y^*) = \displaystyle \int p(y^*|\theta)~p(\theta)~d\theta$
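As a concrete example (a standard one, not worked in these notes): with a $\text{Binomial}(n,\theta)$ likelihood and a $\text{Beta}(a,b)$ prior, the prior predictive integral has a closed form,

$\displaystyle p(y^*) = \int_0^1 \binom{n}{y^*}\theta^{y^*}(1-\theta)^{n-y^*}\,\frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a,b)}\,d\theta = \binom{n}{y^*}\,\frac{B(a+y^*,\,b+n-y^*)}{B(a,b)},$

i.e. the Beta-Binomial distribution.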
Conjugate Prior: A prior is conjugate if the posterior comes from the same distributional family as the prior. Note that if the likelihood is in the exponential family, there exists a conjugate prior (often from the exponential family). You can also find likelihoods that are not in the exponential family that have conjugate priors (e.g. Uniform likelihood, Pareto prior). You can also get non-conjugate priors with closed form posteriors (e.g. Normal likelihood, Laplace prior, used in LASSO).
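For example, the Uniform/Pareto pair mentioned above can be checked directly (using the scale-$m$, shape-$k$ parameterization of the Pareto, an assumed convention here): if $x_1,\dots,x_n \mid \theta \sim \text{Unif}(0,\theta)$ and $\pi(\theta) \propto \theta^{-(k+1)}$ for $\theta > m$, then

$\displaystyle \pi(\theta\mid x) \propto \theta^{-n}\,\theta^{-(k+1)}\,\mathbb{1}\{\theta > \max(m,\, \max_i x_i)\},$

so $\theta\mid x \sim \text{Pareto}(\max(m,\max_i x_i),\, k+n)$, again a Pareto.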
Mixtures of conjugate priors are conjugate
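To see why (a one-line sketch of the update, which the note leaves implicit): if $\pi(\theta) = \sum_j w_j\,\pi_j(\theta)$ with each $\pi_j$ conjugate for the likelihood, then

$\displaystyle \pi(\theta\mid y) \propto p(y\mid\theta)\sum_j w_j\,\pi_j(\theta) = \sum_j w_j\, m_j(y)\,\pi_j(\theta\mid y), \qquad m_j(y) = \int p(y\mid\theta)\,\pi_j(\theta)\,d\theta,$

so the posterior is again a mixture from the same family, with component-wise conjugate updates and new weights $\tilde w_j \propto w_j\, m_j(y)$.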
Stopping Rule Principle: The stopping rule $\tau$ should have no influence on the final reported evidence about $\theta$ obtained from the data. Here $\tau_p: \mathbb R^p \rightarrow [0,1]$, with $\tau_p(x_1,\dots,x_p)$ the probability of stopping after observing $x_1,\dots,x_p$.
Natural Exponential Family: $p(x|\eta) = h(x)\exp\{\sum_{k=1}^K \eta_k t_k(x) - \psi(\eta)\}$, with natural parameter $\eta = (\eta_1,\dots,\eta_K)$ and sufficient statistic $t(x) = (t_1(x),\dots,t_K(x))$.
If $p(x|\eta)$ is the sampling distribution in the natural exponential family, then the conjugate prior is $\pi(\eta) = A(\nu,\mu)\exp\{\sum_{k=1}^K \mu_k\eta_k - \nu\psi(\eta)\}$, where $\nu,\mu$ are hyperparameters.
The posterior $p(\eta|x_{1:n}) \propto \pi(\eta)\prod_{i=1}^n p(x_i|\eta)$ stays in the same conjugate family, with updated hyperparameters $\tilde\nu = \nu+n$ and $\tilde{\mu}_k = \mu_k + \sum_{i=1}^n t_k(x_i)$.
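As a standard illustration of this update (not worked in the notes): the Poisson($\lambda$) likelihood in natural form is

$\displaystyle p(x\mid\eta) = \frac{1}{x!}\exp\{\eta\, x - e^\eta\}, \qquad \eta = \log\lambda,\quad t(x) = x,\quad \psi(\eta) = e^\eta,$

so the conjugate prior is $\pi(\eta) \propto \exp\{\mu\eta - \nu e^\eta\}$ (equivalently, after the change of variables, a $\text{Gamma}(\mu,\text{rate}=\nu)$ prior on $\lambda$), and the posterior has $\tilde\nu = \nu + n$ and $\tilde\mu = \mu + \sum_{i=1}^n x_i$.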
Exercises:
Derive the marginal distribution $p(x) = \displaystyle\int\prod_{i=1}^n N(x_i;\theta,\sigma^2)\, N(\theta;\mu,\tau^2)\, d\theta$.
Notice that:

$\mathbf x = \mathbf 1 \theta + \boldsymbol\epsilon$, where $\theta = \mu + \delta$, with $\delta \sim N(0,\tau^2)$ and $\boldsymbol\epsilon \sim N(\mathbf 0, \sigma^2 I_n)$ independent.

So,

$\mathbf x = \mathbf 1 (\mu+\delta) + \boldsymbol\epsilon$

$E[\mathbf x] = \mathbf 1 \mu$

$V[\mathbf x] = \tau^2 \mathbf{11}' + \sigma^2 I_n$

and, since $\mathbf x$ is an affine function of jointly Gaussian variables, $p(x) = N_n(\mathbf x;\ \mathbf 1\mu,\ \tau^2\mathbf{11}' + \sigma^2 I_n)$.
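A quick numerical check of this marginal (a sketch with illustrative constants), simulating the hierarchy and comparing the sample moments against $\mathbf 1\mu$ and $\tau^2\mathbf{11}' + \sigma^2 I_n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, tau, sigma = 3, 1.0, 2.0, 0.5
reps = 200_000

# Simulate the hierarchy: theta ~ N(mu, tau^2), then x | theta ~ N(1*theta, sigma^2 I_n).
theta = rng.normal(mu, tau, size=reps)
x = theta[:, None] + rng.normal(0.0, sigma, size=(reps, n))

# Marginal moments implied by the derivation above.
cov_theory = tau**2 * np.ones((n, n)) + sigma**2 * np.eye(n)

print(x.mean(axis=0))           # should be close to [mu, mu, mu]
print(np.cov(x, rowvar=False))  # should be close to cov_theory
print(cov_theory)
```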
Original r.v. $\phi$ | Transformation $w$ | Back-transformation | Density of $w$, $f_w(w)$ |
---|---|---|---|
$\phi\sim \text{Unif}(a,b)$ | $w = \displaystyle\log\left(\frac{\phi-a}{b-\phi}\right)$ | $\phi=\displaystyle\frac{be^w+a}{1+e^w}$ | $\displaystyle\frac{e^w}{\left(1+e^w\right)^2}$ |
$\phi\sim \text{InvGamma}(a,b)$ | $w=\log(\phi)$ | $\phi=e^w$ | $\displaystyle\frac{b^a}{\Gamma(a)}\, e^{-aw-be^{-w}}$ |
$\phi\sim \text{Gamma}(a,\text{rate}=b)$ | $w=\log(\phi)$ | $\phi=e^w$ | $\displaystyle\frac{b^a}{\Gamma(a)}\, e^{aw-be^{w}}$ |
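A small sanity check of the last row (a sketch, with arbitrarily chosen $a$ and $b$): draw $\phi\sim\text{Gamma}(a,\text{rate}=b)$, transform to $w=\log(\phi)$, and compare a kernel density estimate of $w$ against the table's $f_w(w)$.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

rng = np.random.default_rng(2)
a, b = 3.0, 2.0

# Draw phi ~ Gamma(shape=a, rate=b), then transform to w = log(phi).
phi = rng.gamma(shape=a, scale=1.0 / b, size=500_000)
w = np.log(phi)

# f_w(w) = b^a / Gamma(a) * exp(a*w - b*e^w), evaluated in log space for stability.
def f_w(w):
    return np.exp(a * np.log(b) - gammaln(a) + a * w - b * np.exp(w))

grid = np.array([-1.0, 0.0, 0.5, 1.0])
kde = stats.gaussian_kde(w)
print(f_w(grid))   # density from the table
print(kde(grid))   # estimate from the simulated draws; should be close
```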