26 Feb, 2016
$\log$ is a concave function, so by Jensen's inequality, $E[\log X] \le \log E[X]$.
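A quick numerical sanity check (my own throwaway sketch; the exponential distribution here is an arbitrary choice of positive random variable):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=100_000)  # any positive random variable works

# Jensen's inequality for the concave log: E[log X] <= log E[X]
print(np.log(x).mean())   # ~ log(2) - Euler-Mascheroni constant ~ 0.12
print(np.log(x.mean()))   # ~ log(2) ~ 0.69
```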
So, $\mathrm{KL}(q\,\|\,p(\cdot \mid y)) + \mathrm{ELBO} = \log p(y)$, and since $\log p(y)$ is a constant in $q$, maximizing the ELBO is equivalent to minimizing the KL divergence.
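Filling in the step: expanding the KL term against the posterior $p(\theta \mid y)$,
\[\begin{aligned} \mathrm{KL}(q\,\|\,p(\cdot \mid y)) &= E_q[\log q(\theta) - \log p(\theta \mid y)] \\ &= E_q[\log q(\theta)] - E_q[\log p(\theta, y)] + \log p(y) \\ &= \log p(y) - \mathrm{ELBO}, \end{aligned}\]where $\mathrm{ELBO} = E_q[\log p(\theta, y)] - E_q[\log q(\theta)]$.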
For the Gaussian Mixtures example in the slides,
\[\begin{aligned} q(\theta_1,...,\theta_K,z_1,...,z_n) &= \prod_{k=1}^K q(\theta_k|\tilde{\mu}_k) \prod_{i=1}^n q(z_i|\phi_i) \\ q^*(z_i) &\propto \exp\bc{E_{\theta_{1:K},z_{-i}}\bk{\log p(\theta_{1:K}, z_i,z_{-i}, y_{1:n}) }} \\ \log p(\theta_{1:K}, z_{1:n}, y_{1:n}) &= \log p(\theta_{1:K}) + \sum_{j\ne i} \log p(z_j) + \sum_{j\ne i} \log p(y_j|z_j,\theta_{z_j}) + \log p(z_i) + \log p(y_i|z_i,\theta_{z_i}) \\ \Rightarrow q^*(z_i) &\propto \exp\bc{\log\pi_{z_i} + E\bk{\log p(y_i|z_i,\theta_{z_i})} } \\ E\bk{\log p(y_i|z_i,\theta_{z_i})} &= -\frac{1}{2} \log 2\pi - \frac{y_i^2}{2} + y_i E\bk{\theta_{z_i}} - E\bk{\theta_{z_i}^2/2} \\ \Rightarrow q^*(z_i=k) &\propto \exp\bc{\log\pi_k + y_i E\bk{\theta_k} - E\bk{\theta_k^2/2}} \\ \Rightarrow q^*(\theta_k) &\propto \exp\bc{E_{\theta_{-k},z_{1:n}} \bk{\log p(\theta_{1:K}) + \sum_{j=1}^n \log p(y_j|z_j,\theta_{z_j})} } \end{aligned}\]Only the terms involving $z_i$ matter for $q^*(z_i)$; the rest are absorbed into the normalizing constant. Let the prior on each $\theta_k$ be Gaussian with natural parameters $(\lambda_1,\lambda_2)$, i.e. precision $\lambda_2$ and mean $\lambda_1/\lambda_2$. Then
\[\tilde\lambda_1 = \frac{\lambda_1 + \sum_{i=1}^n E[z_i^k]y_i }{\lambda_2 + \sum_{i=1}^n E[z_i^k]}, ~~~ \tilde\lambda_2 = \frac{1}{\lambda_2 + \sum_{i=1}^n E[z_i^k]}\]And $q^*(\theta_k) = N(\theta_k \mid \tilde\lambda_1, \tilde\lambda_2)$ (mean $\tilde\lambda_1$, variance $\tilde\lambda_2$), where $z_i^k$ is the indicator that $z_i = k$.
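A minimal numerical sketch of these coordinate updates (my own illustration, assuming unit-variance components, known mixing weights $\pi$, and the natural-parameter prior $(\lambda_1,\lambda_2)$ above; the function and variable names are made up):

```python
import numpy as np

def cavi_gmm(y, pi, lam1=0.0, lam2=1.0, n_iters=100, seed=0):
    """CAVI for a unit-variance Gaussian mixture with known weights pi.

    q(z_i) is categorical with parameters phi[i];
    q(theta_k) is N(mean=lam1_tilde[k], var=lam2_tilde[k]).
    """
    rng = np.random.default_rng(seed)
    n, K = len(y), len(pi)
    # random responsibilities to start; results depend heavily on this init
    phi = rng.dirichlet(np.ones(K), size=n)

    for _ in range(n_iters):
        # q*(theta_k): Gaussian with the lambda-tilde updates above
        Nk = phi.sum(axis=0)                           # sum_i E[z_i^k]
        lam2_tilde = 1.0 / (lam2 + Nk)                 # posterior variance
        lam1_tilde = (lam1 + phi.T @ y) * lam2_tilde   # posterior mean

        # q*(z_i = k) propto exp{ log pi_k + y_i E[theta_k] - E[theta_k^2]/2 }
        E_theta = lam1_tilde
        E_theta2 = lam2_tilde + lam1_tilde**2
        log_phi = np.log(pi) + np.outer(y, E_theta) - 0.5 * E_theta2
        log_phi -= log_phi.max(axis=1, keepdims=True)  # numerical stability
        phi = np.exp(log_phi)
        phi /= phi.sum(axis=1, keepdims=True)

    return lam1_tilde, lam2_tilde, phi
```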
Solve the coupled fixed-point equations by iterating:
\[\begin{aligned} E[z_i^k] &= f_1\left(E[\theta_k],E[\theta_k^2]\right) \\ E[\theta_k], E[\theta_k^2] &= f_2\left(E[z_1^k],...,E[z_n^k]\right) \end{aligned}\]The resulting point estimates are very good, but they are very sensitive to the starting values.
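To see that sensitivity, one can run the `cavi_gmm` sketch above from a few random starts on simulated data and compare the fitted component means (again just an illustration with made-up data):

```python
import numpy as np

# simulated data: two unit-variance components with means -2 and 2
rng = np.random.default_rng(42)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

# run the cavi_gmm sketch above from several random initializations
for seed in range(4):
    means, variances, phi = cavi_gmm(y, pi=[0.5, 0.5], seed=seed)
    print(seed, np.round(np.sort(means), 2))
# different starts can land on different local optima (e.g. label swaps,
# or both components collapsing onto the overall mean)
```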