17 Feb, 2016

Expectation Propagation


Estimated (tilted) posterior \(\hat{p}(u) = \frac{q^{-i}(u) t_i(u)}{\int~q^{-i}(u) t_i(u)~du}\)

Example

\[p(y | u) = (1-w) N(y|u,I) + w~ N(y|0,10I)\]

where w is known.

Prior: $p(u) = N(u | 0, 100 I)$. Data: \(D = \brac{y_1,...,y_n}\).
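As a concrete reference point, here is a minimal sketch of this model in Python (assuming numpy/scipy are available; the dimension $d=2$, the weight $w=0.25$, and the names `prior_pdf`/`likelihood_pdf` are illustrative choices, not from the notes):

```python
import numpy as np
from scipy.stats import multivariate_normal

d = 2        # dimension of u (illustrative choice)
w = 0.25     # known mixture weight (illustrative choice)

def prior_pdf(u):
    """p(u) = N(u | 0, 100 I)."""
    return multivariate_normal.pdf(u, mean=np.zeros(d), cov=100 * np.eye(d))

def likelihood_pdf(y, u):
    """p(y | u) = (1 - w) N(y | u, I) + w N(y | 0, 10 I)."""
    inlier = multivariate_normal.pdf(y, mean=u, cov=np.eye(d))
    outlier = multivariate_normal.pdf(y, mean=np.zeros(d), cov=10 * np.eye(d))
    return (1 - w) * inlier + w * outlier
```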

Goal: find the posterior $p(u \mid y_1,\dots,y_n)$. We want to find an approximating posterior using ADF, restricted to the family

\[\brac{q:q(u) = N(u|m_u,v_uI)}\]

So the approximating posterior and the prior have the same form.

\[\begin{aligned} q^{-i}(u) &= N(u|m_u^{-i},v_u^{-i}I_d) \\ \\ z_i &= \int~p(y_i|u)~q^{-i}(u)~du \\ &=(1-w)N(y_i | m^{-i}_u, (v_u^{-i}+1)I) + w~N(y_i|0,10I)\\ \\ \hat{p}(u) &= \frac{p(y_i|u)q^{-i}(u)}{\int~p(y_i|u)q^{-i}(u)~du} \\ &= \frac{p(y_i|u)q^{-i}(u)}{z_i} \end{aligned}\]

Choose the $q$ in this family that minimizes $\mathrm{KL}(\hat{p} \,\|\, q)$.


We approximate the density $\hat{p}(u)$ with a spherical normal distribution $q(u)$. The approximation is with respect to the KL divergence, which for this family amounts to matching moments: solve the two equations $E_{\hat{p}}[u] = E_q[u]$ and $E_{\hat{p}}[uu'] = E_q[uu']$ for the first and second moments. This yields the new $m_u, v_u I$.
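Writing this out for the spherical normal family (with $d$ the dimension of $u$), the matched parameters are

\[\begin{aligned} m_u &= E_{\hat{p}}[u] \\ v_u &= \frac{1}{d}\left(E_{\hat{p}}[u'u] - \norm{m_u}^2\right) \end{aligned}\]

since $E_q[u'u] = \norm{m_u}^2 + d\,v_u$ when $q(u) = N(u|m_u, v_u I)$.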

In the example above, we get \(m_u = m_u^{-i} + v_u^{-i} r_i \frac{y_i-m_u^{-i}}{v_u^{-i}+1}\), where

\[\begin{aligned} r_i &= 1-\frac{w}{z_i} N(y_i | 0, 10I_d) \\ v_u &= v_u^{-i} - \frac{r_i(v_u^{-i})^2}{v_u^{-i}+1} + r_i(1-r_i)(v_u^{-i})^2\frac{\norm{y_i-m_u^{-i}}^2}{d(v_u^{-i}+1)^2} \\ \end{aligned}\]

Initialize \(m_u,v_u\) at the prior.

  1. For each data block \(y_i\), update \(m_u,v_u\) using the formulas above, and accumulate the normalizer \(S=S^{-i}z_i\), where \(S^{-i}\) is the normalizing constant just before processing \(y_i\) (see the sketch below).
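A minimal sketch of this ADF sweep for the clutter model, using the update formulas above (assuming numpy/scipy; `adf_sweep` and the synthetic data are illustrative, not from the notes; the normalizer is accumulated in log space):

```python
import numpy as np
from scipy.stats import multivariate_normal

def adf_sweep(Y, w=0.25, prior_var=100.0):
    """One ADF pass over the data Y (n x d); returns (m_u, v_u, log normalizer)."""
    n, d = Y.shape
    m, v = np.zeros(d), prior_var          # initialize at the prior
    log_S = 0.0                            # running log normalizing constant
    for y in Y:
        # z_i = (1 - w) N(y_i | m, (v + 1) I) + w N(y_i | 0, 10 I)
        inlier = multivariate_normal.pdf(y, mean=m, cov=(v + 1) * np.eye(d))
        outlier = multivariate_normal.pdf(y, mean=np.zeros(d), cov=10 * np.eye(d))
        z = (1 - w) * inlier + w * outlier
        r = 1 - w * outlier / z            # responsibility of the "inlier" component
        # moment-matched update of the spherical normal q(u)
        m_new = m + v * r * (y - m) / (v + 1)
        v_new = (v - r * v**2 / (v + 1)
                 + r * (1 - r) * v**2 * np.sum((y - m)**2) / (d * (v + 1)**2))
        m, v = m_new, v_new
        log_S += np.log(z)                 # S = S^{-i} z_i, accumulated in log space
    return m, v, log_S

# Example usage on synthetic data (illustrative only):
rng = np.random.default_rng(0)
Y = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
print(adf_sweep(Y))
```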

Check out Ormerod (2015)

Expectation Propagation

\[\begin{aligned} q(u) &= \frac{\prod_{i=0}^n\tilde{t_i}(u)}{\int~\prod_{i=0}^n\tilde{t_i}(u)~du} \\ \tilde{t_i}(u) &\propto \frac{q(u)}{q^{-i}(u)} \text{, which is normal}\\ \end{aligned}\] \[\tilde{t_i}(u) = s_i\exp\brac{-\frac{1}{2v_i} (u-m_i)'(u-m_i)}\]

Treat \(s_i, m_i, v_i\) as parameters and update them.
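Since both $q$ and $q^{-i}$ are spherical normals, the division can be written out explicitly by subtracting precisions and scaled means (the same algebra as the removal in step 5 below):

\[\begin{aligned} v_i^{-1} &= v_u^{-1} - (v_u^{-i})^{-1} \\ m_i &= m_u + \frac{v_i}{v_u^{-i}}(m_u - m_u^{-i}) \end{aligned}\]

A common convention is then to fix $s_i$ so that \(\int~\tilde{t_i}(u)\,q^{-i}(u)~du = z_i\), which makes the normalizer of $q$ track the evidence.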

  1. priors
    • $v_0 = 100$
    • $m_0 = 0$
    • $s_0 = (2\pi v_0)^{-d/2}$
  2. Initialize $m_u = m_0, v_u = v_0$
  3. Initialize $\tilde{t_i}(u)=1$ for $i\ge 1$, i.e. $m_i=0, v_i=\infty, s_i =1$: the sites with $i\ge 1$ will approximate the likelihood terms, and this choice starts them out uninformative, so $q$ starts at the prior.
  4. Repeat until ($s_i, m_i, v_i$) converge
  5. For $i = 1,\dots,n$: remove $\tilde{t_i}(u)$ from $q(u)$ to form the cavity (a sketch of the full loop follows this list)
    • $(m_u,v_u) \rightarrow (m_u^{-i},v_u^{-i})$
    • $(v_u^{-i})^{-1} = v_u^{-1}-v_i^{-1}$
    • $m_u^{-i} = m_u + \frac{v_u^{-i}}{v_i}(m_u-m_i)$
    • $q^{-i}(u) = N(u \mid m_u^{-i}, v_u^{-i} I_d)$
    • get the new $q(u)$ by minimizing the KL divergence (moment matching, as in the ADF update above), then set $\tilde{t_i}(u) \propto q(u)/q^{-i}(u)$ and update $s_i, m_i, v_i$
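Below is a hedged sketch of this EP loop for the clutter example (assuming numpy/scipy; `ep_clutter`, the fixed number of sweeps, the improper-cavity skip, and the synthetic data are illustrative choices; the $s_i$ terms, which only matter for the evidence estimate, are not tracked):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ep_clutter(Y, w=0.25, prior_var=100.0, n_sweeps=20):
    """EP for the clutter model; returns the approximate posterior N(m_u, v_u I)."""
    n, d = Y.shape
    # Steps 1-2: prior parameters and initial global approximation q(u) = N(m_u, v_u I).
    m_u, v_u = np.zeros(d), prior_var
    # Step 3: sites t~_i(u) = 1, stored in natural form (precision 1/v_i, scaled mean m_i/v_i).
    site_prec = np.zeros(n)
    site_eta = np.zeros((n, d))
    # Steps 4-5: sweep over the sites (a fixed number of sweeps stands in for a convergence check).
    for _ in range(n_sweeps):
        for i, y in enumerate(Y):
            # Remove t~_i: subtract its natural parameters from those of q to get the cavity.
            cav_prec = 1.0 / v_u - site_prec[i]
            if cav_prec <= 0:          # skip updates that would make the cavity improper
                continue
            cav_v = 1.0 / cav_prec
            cav_m = cav_v * (m_u / v_u - site_eta[i])
            # Moment-match q against the exact likelihood term (same update as the ADF step).
            inlier = multivariate_normal.pdf(y, mean=cav_m, cov=(cav_v + 1) * np.eye(d))
            outlier = multivariate_normal.pdf(y, mean=np.zeros(d), cov=10 * np.eye(d))
            z = (1 - w) * inlier + w * outlier
            r = 1 - w * outlier / z
            m_new = cav_m + cav_v * r * (y - cav_m) / (cav_v + 1)
            v_new = (cav_v - r * cav_v**2 / (cav_v + 1)
                     + r * (1 - r) * cav_v**2 * np.sum((y - cav_m)**2) / (d * (cav_v + 1)**2))
            # New site t~_i proportional to q / q^{-i}: subtract natural parameters again.
            site_prec[i] = 1.0 / v_new - cav_prec
            site_eta[i] = m_new / v_new - cav_m / cav_v
            m_u, v_u = m_new, v_new
    return m_u, v_u

# Example usage on synthetic data (illustrative only):
rng = np.random.default_rng(1)
Y = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
print(ep_clutter(Y))
```

Unlike the single ADF pass, the sites are revisited on every sweep, so the final approximation does not depend on the order in which the data are processed.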