8 Feb, 2016

Sequential MCMC (2016)

Algorithm

At time $t+1$ obtain $D_{t+1}$
Update sufficient stats
$C_{1,t+1} = g(C_{1,t},D_{t+1}),…,C_{m,t+1}$
where $C$ is a sufficient statistic and $m$ is the number of sufficient stats
Draw $n_{t+1}$ MCMC samples from the conditional distributions.
Set $t+1$ to $t+2$ and carry out 1-3

Since conditional distributions at the $t+1$ only depend on the data through a few sufficient statistics, once we update sufficient stat at time $t+1$ we can draw samples from the full conditional.

Let $(\theta_1^{(t)},…,\theta_k^{(t)})$ be the $n_t$-th MCMC sample at time $t$. At time $t+1$, this will be the first MCMC sample.

Model: $y_t = X_t \beta + \epsilon$, $\epsilon \sim N(0,\sigma^2)$ $\beta \sim N(0,I)$

$\beta | \sigma^2, D_1,...,D_t \sim N(S\sum_{l=1}^t X_l Y_l, S),$ where $S=\brak{\frac{\sum_{l=1}^t X_l’X_l}{\sigma^2}+I}^{-1}$

$\sigma^2 \sim InvGamma(a+st/2, b+\sum_{l=1}^t (y_t - X_t\beta)’(y_t - X_t\beta)/2)$

$C_{1,t} = \sum_{l=1}^t X_l’X_l$. So at time $t$, $g(C_{1,t},D_{t+1}) = C_{1,t}+X_{t+1}’X_{t+1}$.

Let $\pi_t$ be the full posterior distribution at time $t$. $T_t$ be the transition kernel (proposal density).

If your posterior $\pi_t$ changes slowly over time as $t\rightarrow\infty$
And the transition kernel is “ergodic” (something which is required for markov chain to converge)

then the sequential MCMC

Sequential MC for Fixed Parameter Settings (2002)

important idea that tech companies are starting to use if they are at all using Bayesian statistics
Based on importance sampling
- refer to notes
- Suppose the aim is to compute $E[h(\theta)]$. This means evaluating the integral $\frac{\int~h(\theta)\pi(\theta)~d\theta}{\int~\pi(\theta)~d\theta}$
- if it’s easy to sample from $\pi$ but easy to sample from $g$
- $\frac{\int~h(\theta)\pi(\theta)~d\theta}{\int~\pi(\theta)~d\theta} = \frac{\int~h(\theta)\frac{\pi(\theta)}{g(\theta)}g(\theta)~d\theta}{\int~\frac{\pi(\theta)}{g(\theta)}g(\theta)~d\theta}$

Algorithm:

Draw samples from $\pi_0$ the prior for $\theta$
You got $D_1$
You have to draw samples from $\pi_1$ using importance sampling
Get $D_2$
Use importance sampling with $g=\pi_1$ to draw samples from $\pi_2$.

Possible Project

Compare sequential updating MLE vs Sequential MCMC samples will correspond to samples from $\pi_t$ at large $t$.