12 Feb, 2016
Approximated posterior belongs to the same class where prior $p^{(0)}(u)$ clbelongs to.
Computer science notation:
$t_0(u) = P^{(0)}(u)$ is the prior $t_i(u) = P(y_i | u)$ is the density $P(D,u) = \prod_{i=0}^n t_i(u)$ is the prior $\times$ likelihood (note the index starts from 0, the index for the prior).
When $P(y_i | u)$ is the density of a mixture Gaussain, one generally uses the approximated class as Gaussian.
We have observed $y_1,…,y_{i-1}$ and our approximated posterior is $q^{-i}$.
WHen the approximating class is Gaussian and $p(y_i | u)$ belongs to the exponential family then the best approximating Gaussian distribution can be obtained by matching moments. (i.e. $E_{\hat{p}}(u) = E_q[u]$ and $E_{\hat{p}}(uu’) = E_q[uu’]$) Nobody has used this in statistics. But computer scientists use this. This depends on data ordering.
ADF is viewed as the approximated posterior with exact likelihood. We can view it as an exact posterior with approximate likelihood (proposed by Minka, 2009).
If the approximating class is in the exponential family, then both $q(u)$ and $q^{-i}(u)$ will be in the same family. Which means $\tilde{t_i(u)}$ will also be in the same family.
Refer to lecture notes for details and issues in ADF.
In ADF, $\tilde{t_i(u)}$ is update only once implicitly but easlier $\tilde{t_i(u)}$’s may be inaccurate.