8 Feb, 2016

Sequential MCMC (2016)


Algorithm

  1. At time $t+1$ obtain $D_{t+1}$
  2. Update sufficient stats
    $C_{1,t+1} = g_1(C_{1,t},D_{t+1}),\ldots,C_{m,t+1} = g_m(C_{m,t},D_{t+1})$
    where each $C_j$ is a sufficient statistic and $m$ is the number of sufficient statistics
  3. Draw $n_{t+1}$ MCMC samples from the conditional distributions.
  4. Increment $t$ and repeat steps 1-3.

Since the full conditional distributions at time $t+1$ depend on the data only through a few sufficient statistics, once we update the sufficient statistics at time $t+1$ we can draw samples from the full conditionals immediately.

Let $(\theta_1^{(t)},…,\theta_k^{(t)})$ be the $n_t$-th MCMC sample at time $t$. At time $t+1$, this will be the first MCMC sample.
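As a concrete picture of steps 1-4, here is a minimal Python sketch of the loop; the function names (`update_stats`, `gibbs_step`) are hypothetical placeholders, not from the notes.

```python
def sequential_mcmc(data_stream, init_stats, init_theta,
                    update_stats, gibbs_step, n_samples):
    """Sketch of sequential MCMC: sufficient stats absorb each new batch,
    and the last sample at time t warm-starts the chain at time t+1."""
    stats = init_stats                        # (C_1, ..., C_m)
    theta = init_theta                        # (theta_1, ..., theta_k)
    samples_by_time = []
    for D in data_stream:                     # step 1: obtain D_{t+1}
        stats = update_stats(stats, D)        # step 2: C_{j,t+1} = g_j(C_{j,t}, D_{t+1})
        samples = []
        for _ in range(n_samples):            # step 3: draw n_{t+1} MCMC samples;
            theta = gibbs_step(theta, stats)  # the conditionals see the data
            samples.append(theta)             # only through the stats
        samples_by_time.append(samples)       # step 4: move to the next batch
    return samples_by_time
```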

Model: $y_t = X_t \beta + \epsilon_t$, $\epsilon_t \sim N(0,\sigma^2 I)$, with prior $\beta \sim N(0,I)$.

\(\beta \mid \sigma^2, D_1,\ldots,D_t \sim N\!\left(S\,\frac{\sum_{l=1}^t X_l' y_l}{\sigma^2},\, S\right),\) where $S=\left(\frac{\sum_{l=1}^t X_l' X_l}{\sigma^2}+I\right)^{-1}$

$\sigma^2 \mid \beta, D_1,\ldots,D_t \sim \mathrm{InvGamma}\!\left(a+st/2,\; b+\tfrac{1}{2}\sum_{l=1}^t (y_l - X_l\beta)'(y_l - X_l\beta)\right)$, where $s$ is the number of observations per batch.

$C_{1,t} = \sum_{l=1}^t X_l'X_l$, so the update at time $t+1$ is $C_{1,t+1} = g_1(C_{1,t},D_{t+1}) = C_{1,t}+X_{t+1}'X_{t+1}$. The $\sigma^2$ conditional is covered by likewise tracking $\sum_{l=1}^t X_l'y_l$ and $\sum_{l=1}^t y_l'y_l$, since $(y_l - X_l\beta)'(y_l - X_l\beta)$ expands in those terms.
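For this model the Gibbs updates can be written directly in terms of the running sufficient statistics. A minimal sketch, assuming each batch is $D_t=(X_t,y_t)$ and tracking $\sum_l X_l'X_l$, $\sum_l X_l'y_l$, $\sum_l y_l'y_l$, and the observation count; the hyperparameters $a$, $b$ and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_stats(stats, X, y):
    """Step 2: each sufficient statistic absorbs the new batch."""
    XtX, Xty, yty, n = stats
    return (XtX + X.T @ X, Xty + X.T @ y, yty + y @ y, n + len(y))

def gibbs_step(beta, sigma2, stats, a=1.0, b=1.0):
    """One scan of the two full conditionals given above."""
    XtX, Xty, yty, n = stats
    p = XtX.shape[0]
    # beta | sigma^2, D ~ N(S Xty / sigma^2, S),  S = (XtX/sigma^2 + I)^{-1}
    S = np.linalg.inv(XtX / sigma2 + np.eye(p))
    beta = rng.multivariate_normal(S @ Xty / sigma2, S)
    # sigma^2 | beta, D ~ InvGamma(a + n/2, b + SSE/2), where the residual
    # sum of squares expands in the stats: SSE = yty - 2 b'Xty + b'XtX b
    sse = yty - 2 * beta @ Xty + beta @ XtX @ beta
    sigma2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + sse / 2))
    return beta, sigma2
```

Drawing the inverse-gamma as the reciprocal of a gamma draw with the matching rate is the standard workaround, since NumPy has no inverse-gamma sampler.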

Let $\pi_t$ be the full posterior distribution at time $t$, and let $T_t$ be the transition kernel (proposal density).

  1. If the posterior $\pi_t$ changes slowly over time as $t\rightarrow\infty$,
  2. and the transition kernel is “ergodic” (a condition required for the Markov chain to converge),

then the sequential MCMC samples will correspond to samples from $\pi_t$ at large $t$.

Sequential MC for Fixed Parameter Settings (2002)

  • an important idea that tech companies are starting to use, to the extent that they use Bayesian statistics at all
  • Based on importance sampling
    • refer to notes
    • Suppose the aim is to compute $E[h(\theta)]$. This means evaluating the integral $\frac{\int~h(\theta)\pi(\theta)~d\theta}{\int~\pi(\theta)~d\theta}$
    • if it’s hard to sample from $\pi$ but easy to sample from $g$, rewrite the ratio as
    • $\frac{\int~h(\theta)\pi(\theta)~d\theta}{\int~\pi(\theta)~d\theta} = \frac{\int~h(\theta)\frac{\pi(\theta)}{g(\theta)}g(\theta)~d\theta}{\int~\frac{\pi(\theta)}{g(\theta)}g(\theta)~d\theta}$
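Concretely, drawing $\theta_i \sim g$ and setting $w_i = \pi(\theta_i)/g(\theta_i)$ turns the right-hand side into the self-normalized estimator $\sum_i w_i h(\theta_i) / \sum_i w_i$. A minimal sketch with illustrative choices not from the notes ($\pi = N(2,1)$, $g = N(0,2)$, $h(\theta)=\theta$):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

theta = rng.normal(0.0, 2.0, size=100_000)         # draws from g = N(0, 2)
w = norm.pdf(theta, 2, 1) / norm.pdf(theta, 0, 2)  # weights pi/g (pi need not
est = np.sum(w * theta) / np.sum(w)                # be normalized)
print(est)                                         # close to E[theta] = 2
```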

Algorithm:

  1. Draw samples from $\pi_0$, the prior for $\theta$.
  2. Obtain $D_1$.
  3. Draw samples from $\pi_1$ using importance sampling with $g=\pi_0$.
  4. Obtain $D_2$.
  5. Use importance sampling with $g=\pi_1$ to draw samples from $\pi_2$, and so on (see the sketch below).
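A minimal sketch of steps 1-5, assuming a user-supplied per-batch log-likelihood; all names are hypothetical. Since $\pi_{t+1}(\theta) \propto \pi_t(\theta)\, p(D_{t+1} \mid \theta)$, the importance weight under $g = \pi_t$ is just the new batch’s likelihood, and multinomial resampling converts the weighted draws into (approximately) equally weighted samples from each $\pi_t$.

```python
import numpy as np

rng = np.random.default_rng(0)

def smc_fixed_params(prior_draws, data_stream, loglik):
    """prior_draws: array of samples from pi_0 (step 1).
    loglik(theta, D): log p(D | theta) for one batch."""
    particles = np.asarray(prior_draws)
    n = len(particles)
    for D in data_stream:                 # steps 2 and 4: a new batch arrives
        logw = np.array([loglik(th, D) for th in particles])
        w = np.exp(logw - logw.max())     # weights pi_{t+1}/pi_t, stabilized
        w /= w.sum()
        idx = rng.choice(n, size=n, p=w)  # steps 3 and 5: resample so the
        particles = particles[idx]        # particles target pi_{t+1}
    return particles
```

Practical versions add MCMC rejuvenation moves to fight weight degeneracy as $t$ grows; that refinement is omitted here.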

Possible Project

  • Compare sequential updating of the MLE vs. sequential MCMC.