17 Feb, 2016

Expectation Propagation

Estimated posterior $\hat{p}(u) = \frac{q^{-i}(u) t_i(u)}{q^{-i}(u) t_i(u)}$

Example

\[p(y | u) = (1-w) N(y|u,I) + w~ N(y|0,10I)\]

where w is known.

$ p(u) = N(u | 0,100 I)$. $D = \brac{y_1,...,y_n}$

Goal: Find posterior $p(u | y_1,…,y_n$. We want to find an approximating posterior using ADF.

\[\brac{q:q(u) = N(u|m_u,v_uI)}\]

So the posterior and prior have the same form.

\[\begin{aligned} q^{-i} &= N(u|m_u^{-i},v_u^{-i}I_d) \\ \\ z_i &= \int~p(y_i|u)q{-i}(u)~du \\ &=(1-w)N(y_i | m^{-i}_u, (v_u^{-i})I) + w N(y_i|0,10I)\\ \\ \hat{p}(u) &= \frac{p(y_i|u)q^{-i}(u)}{\int~p(y_i|u)q^{-i}(u)~du} \\ &= \frac{p(y_i|u)q^{-i}(u)}{z_i} \end{aligned}\]

choose $q$ that minimizes KL with $\hat{p}$

$\hat{p}(u)$ approximate this density with a spherical normal distribution $q(u)$. This approximation is wrt the KL divergence. Match moments: solve these two equations for first and second moments: $E_{hat{p}}[u] = E_q[u]$, $E_{hat{p}}[uu’] = E_q[uu’]$. This will be $m_u,v_u I$.

In the example above, we’ll get that $m_u = m_u^{-i} + v_u^{-i} r_i \frac{y_i-m_i^{-i}}{v_u^{-i}+1}$, where

\[\begin{aligned} r_i &= 1-\frac{w}{z_i} N(y_i \| 0, 10I_d) \\ v_u &= v_u^{-i} - \frac{r_i(v_u^{-i})^2}{v_u^{-i}+1} + r_i(1-r_i)(v_u^{-i})^2\frac{\norm{y_i-m_u^{-i}}^2}{d(v_u^{-i}+1)} \\ \end{aligned}\]

Initialize $m_u,v_u$ at the prior.

For each data block $y_i$, update $m_u,v_u$. $S=S^{-i}Z_i$ where $S^{-i}$ is the normalizing constant at time just before $i$.

Check out Ormerod (2015)

Expectation Propagation

\[\begin{aligned} q(u) &= \frac{\prod_{i=0}^n\tilde{t_i}(u)}{\int~\prod_{i=0}^n\tilde{t_i}(u)~du} \\ \tilde{t_i}(u) &\propto \frac{q(u)}{q^{-i}(u)} \text{, which is normal}\\ \end{aligned}\] \[\tilde{t_i}(u) = s_i\exp\brac{-\frac{1}{2v_i} (u-m_i)'(u-m_i)}\]

treat $s_i, m_i, v_i$ as parameters and update them.

priors
- $v_0 = 100$
- $m_0 = 0$
- $s_0 = (2\pi v_0)^{-d/2}$
Initialize $m_u = m_0, v_u = v_0$
$\tilde{t_i}(u)=1, i\ge 1 \Rightarrow m_i=0, v_i=\infty, s_i =1$, the left part is the likelihood, right side are uninformative priors.
Repeat until ($s_i, m_i, v_i$) converge
$i = 1,…,n$. Remove $ \tilde{t_i}(u) $
- $(m_u,v_u) \rightarrow (m_u^{-i},v_u^{-i})$
- $(v_u^{-i})^{-1} = v_u^{-1}-v_i^{-1}$
- $m_u^{-i} = m_u + \frac{v_u^{-1}}{v_i}(m_u-m_i)$
- $q^{-i}(u) = N(m_u^{-i} + v_u^{-i} I_d)$
- get $q(u)$ from KL divergence