29 Feb, 2016

GP - Latent Variable Model with Variational Inference


Titsias (2009); Titsias & Lawrence (2010)

A Special Case of Variational Inference

Model

\[y = \mu(x) + \epsilon\]

$Y$ is $N\times D$. $X$ is latent and $N\times Q$, with $Q \ll D$; that is, the latent dimension is much smaller than the dimension of the data. \(p(Y\v X) = \prod_{d=1}^D p(y_d\v X)\), because the columns of $Y$, each representing one output dimension, are conditionally independent given $X$; here \(Y=[y_1:y_2:\cdots:y_D]\). \(\kappa(x,x') = \sigma^2_f\exp\p{-\frac{1}{2} \sum_{q=1}^Q\alpha_q(x_q-x_q')^2}\). This is a squared-exponential kernel with ARD (automatic relevance determination) weights $\alpha_q$.
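A minimal numpy sketch of this kernel (function and parameter names `ard_kernel`, `sigma_f2`, `alpha` are mine, not from the paper):

```python
import numpy as np

def ard_kernel(X1, X2, sigma_f2=1.0, alpha=None):
    """k(x, x') = sigma_f^2 * exp(-0.5 * sum_q alpha_q (x_q - x'_q)^2)."""
    Q = X1.shape[1]
    alpha = np.ones(Q) if alpha is None else np.asarray(alpha)
    # Weighted squared distance between every pair of rows.
    diff = X1[:, None, :] - X2[None, :, :]          # (N1, N2, Q)
    d2 = np.einsum('ijq,q->ij', diff**2, alpha)     # (N1, N2)
    return sigma_f2 * np.exp(-0.5 * d2)

X = np.random.randn(5, 2)   # N=5 points, Q=2 latent dims
K = ard_kernel(X, X, sigma_f2=1.0, alpha=[1.0, 0.1])
```

A small $\alpha_q$ effectively switches dimension $q$ off, which is how the model selects the latent dimensionality.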

\(p(x) = \prod_{n=1}^N N(x_n\v 0,I_Q)\). \(p(Y,X)=p(Y\v X)p(X)\).

Side note: what machine learning calls a sparse GP is known in spatial statistics as the predictive process.

$p(X\v Y)$ is intractable, so we approximate it with

\[q(X) = \prod_{n=1}^N N(x_n\v\mu_n,S_n)\]

$S_n$ is a diagonal matrix; $q$ is the variational approximation to the posterior over $X$.

\[\begin{aligned} ELBO &= \int~q(X)\log\frac{p(Y\v X)p(X)}{q(X)}~dX \\ &= \int~q(X)\log p(Y\v X)~dX - \int~q(X)\log\frac{q(X)}{p(X)}~dX \\ &= \sum_{d=1}^D\int~q(X)\log p(y_d\v X)~dX - \int~q(X)\log\frac{q(X)}{p(X)}~dX \\ &= \tilde{F}(q) - KL(q\lVert p) \end{aligned}\]

Where \(\tilde{F}(q) = \sum_{d=1}^D\int~q(X)\log p(y_d\v X)~dX = \sum_{d=1}^D\tilde{F}_d(q)\)
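The KL term has a closed form, since both $q(X)$ and $p(X)$ factorize over points into Gaussians with diagonal covariance. A minimal sketch (the names `mu`, `S` are mine):

```python
import numpy as np

def kl_q_p(mu, S):
    """KL( prod_n N(x_n | mu_n, diag(S_n)) || prod_n N(x_n | 0, I_Q) )."""
    # Per-dimension closed form: 0.5 * (S + mu^2 - 1 - log S), summed over n, q.
    return 0.5 * np.sum(S + mu**2 - 1.0 - np.log(S))

mu = np.random.randn(10, 2)   # variational means, N=10, Q=2
S = np.full((10, 2), 0.5)     # diagonal variances S_n
print(kl_q_p(mu, S))
```

So the only hard part of the ELBO is $\tilde{F}(q)$, which is where the sparse GP machinery below comes in.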

\[p(y_d,f_d,u_d \v X,Z) = p(y_d\v f_d) p(f_d\v u_d,X,Z) p(u_d\v Z)\]

where \(f_d = \begin{pmatrix} \mu(x_1) \\ \mu(x_2) \\ \vdots \\ \mu(x_N) \\ \end{pmatrix}\), \(u_d = \begin{pmatrix} \mu(x_1^*) \\ \mu(x_2^*) \\ \vdots \\ \mu(x_M^*) \\ \end{pmatrix}\), \(Z=\bc{x_1^*,...,x_M^*}\). $Z$ are the knots (inducing inputs), as in the predictive process.

\[p(y_d\v f_d) = N(y_d \v f_d,\beta^{-1}I_N)\]

where \(\begin{pmatrix} \epsilon_1\\ \vdots\\ \epsilon_N \end{pmatrix} \sim N(0,\beta^{-1}I_N)\).

The GP surface conditioned on its values at the latent knots, i.e., the predictive process:

\[p(f_d\v u_d,X,Z) = N(f_d\v K_{NM}K^{-1}_{MM}u_d,K_{NN}-K_{NM}K^{-1}_{MM}K_{MN})\]

Finally, the GP prior over the knot values, evaluated at the knot points (the surface should pass through the knot values):

\[p(u_d\v Z) = N(u_d\v 0,K_{MM})\]
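A minimal sketch of the conditional $p(f_d\v u_d,X,Z)$ above, reusing the earlier (assumed) kernel function:

```python
import numpy as np

def ard_kernel(X1, X2, sigma_f2=1.0, alpha=None):
    Q = X1.shape[1]
    alpha = np.ones(Q) if alpha is None else np.asarray(alpha)
    diff = X1[:, None, :] - X2[None, :, :]
    return sigma_f2 * np.exp(-0.5 * np.einsum('ijq,q->ij', diff**2, alpha))

def conditional_f_given_u(X, Z, u, jitter=1e-8):
    """Mean and covariance of p(f | u, X, Z)."""
    Knn = ard_kernel(X, X)
    Knm = ard_kernel(X, Z)
    Kmm = ard_kernel(Z, Z) + jitter * np.eye(len(Z))
    mean = Knm @ np.linalg.solve(Kmm, u)         # K_NM K_MM^{-1} u
    cov = Knn - Knm @ np.linalg.solve(Kmm, Knm.T)  # K_NN - K_NM K_MM^{-1} K_MN
    return mean, cov

X = np.random.randn(20, 2); Z = np.random.randn(5, 2)
u = np.random.randn(5)
mean, cov = conditional_f_given_u(X, Z, u)
```

At a knot point the conditional variance is zero (up to jitter), which is the "passes through the knots" property.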

\(p(f_d,u_d\v y_d,X)\) is approximated with \(q(f_d,u_d) = p(f_d\v u_d,X)\phi(u_d)\).

\[\log p(y_d\v X) \ge \int\phi(u_d)\log\frac{N(y_d\v K_{NM}K^{-1}_{MM}u_d,\beta^{-1}I_N)\,p(u_d)}{\phi(u_d)}\,du_d - \frac{\beta}{2} Tr(K_{NN}-K_{NM}K^{-1}_{MM}K_{MN})\] \[\begin{aligned} \tilde{F}_d(q) &\ge \int q(X)\left[\int\phi(u_d)\log\frac{N(y_d\v K_{NM}K^{-1}_{MM}u_d,\beta^{-1}I_N)\,p(u_d)}{\phi(u_d)}\,du_d - \frac{\beta}{2} Tr(K_{NN}-K_{NM}K^{-1}_{MM}K_{MN})\right] dX\\ \end{aligned}\]
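For a fixed $X$, optimizing out $\phi(u_d)$ collapses the first bound above into the Titsias (2009) form $\log N(y_d\v 0, Q_{NN}+\beta^{-1}I_N) - \frac{\beta}{2}Tr(K_{NN}-Q_{NN})$ with $Q_{NN}=K_{NM}K_{MM}^{-1}K_{MN}$. A sketch of that collapsed bound (not the full Bayesian GP-LVM bound; kernel and names follow the earlier sketches):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ard_kernel(X1, X2, sigma_f2=1.0, alpha=None):
    Q = X1.shape[1]
    alpha = np.ones(Q) if alpha is None else np.asarray(alpha)
    diff = X1[:, None, :] - X2[None, :, :]
    return sigma_f2 * np.exp(-0.5 * np.einsum('ijq,q->ij', diff**2, alpha))

def titsias_bound(y, X, Z, beta=10.0, jitter=1e-8):
    N, M = len(X), len(Z)
    Knn = ard_kernel(X, X)
    Knm = ard_kernel(X, Z)
    Kmm = ard_kernel(Z, Z) + jitter * np.eye(M)
    Qnn = Knm @ np.linalg.solve(Kmm, Knm.T)   # Nystrom approximation of K_NN
    # log N(y | 0, Q_NN + beta^{-1} I)  -  (beta/2) Tr(K_NN - Q_NN)
    logpdf = multivariate_normal(np.zeros(N), Qnn + np.eye(N) / beta).logpdf(y)
    return logpdf - 0.5 * beta * np.trace(Knn - Qnn)

X = np.random.randn(30, 2); Z = X[:5]; y = np.random.randn(30)
print(titsias_bound(y, X, Z))
```

The trace term penalizes knot placements that leave much of $K_{NN}$ unexplained.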

Then, change the order of the integrals over $u_d$ and $X$… I’ll skip the typing… The expectations of the kernel matrices under $q(X)$ that appear are the $\psi$ statistics:

\[\begin{array}{rcl} \psi_0 &=& Tr(E_{q(X)}[K_{NN}])\\ \psi_1 &=& E_{q(X)}[K_{NM}]\\ \psi_2 &=& E_{q(X)}[K_{MN}K_{NM}]\\ \end{array}\]
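To see what these expectations are, here is a Monte Carlo sketch that just samples $X \sim q(X)$ and averages (the paper gives closed forms; names are assumptions carried over from the earlier sketches):

```python
import numpy as np

def ard_kernel(X1, X2, sigma_f2=1.0, alpha=None):
    Q = X1.shape[1]
    alpha = np.ones(Q) if alpha is None else np.asarray(alpha)
    diff = X1[:, None, :] - X2[None, :, :]
    return sigma_f2 * np.exp(-0.5 * np.einsum('ijq,q->ij', diff**2, alpha))

def psi_stats_mc(mu, S, Z, n_samples=2000, seed=0):
    rng = np.random.default_rng(seed)
    N, Q = mu.shape
    M = len(Z)
    psi0, Psi1, Psi2 = 0.0, np.zeros((N, M)), np.zeros((M, M))
    for _ in range(n_samples):
        Xs = mu + np.sqrt(S) * rng.standard_normal((N, Q))  # X ~ q(X)
        Knm = ard_kernel(Xs, Z)
        psi0 += np.trace(ard_kernel(Xs, Xs))
        Psi1 += Knm
        Psi2 += Knm.T @ Knm                                 # K_MN K_NM
    return psi0 / n_samples, Psi1 / n_samples, Psi2 / n_samples

mu = np.random.randn(10, 2); S = np.full((10, 2), 0.3)
Z = np.random.randn(4, 2)
psi0, Psi1, Psi2 = psi_stats_mc(mu, S, Z)
print(psi0)   # ~ N * sigma_f^2 = 10 for this kernel
```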

So, for the ARD kernel these expectations have closed forms; for example:

\[\begin{array}{rcl} \psi_0 &=& N\sigma^2_f\\ (\psi_1)_{nm} &=& \sigma^2_f \prod_{q=1}^Q \frac{\exp\bc{-\frac{1}{2}\frac{\alpha_q(\mu_{nq}-z_{mq})^2}{\alpha_q S_{nq}+1}} }{(\alpha_q S_{nq}+1)^{1/2}}\\ \end{array}\]
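A sketch of the closed-form $(\psi_1)_{nm}$ above (assuming this is the $\Psi_1$ statistic of Titsias & Lawrence, 2010); it can be checked against the Monte Carlo estimate from the previous sketch:

```python
import numpy as np

def psi1(mu, S, Z, sigma_f2=1.0, alpha=None):
    N, Q = mu.shape
    alpha = np.ones(Q) if alpha is None else np.asarray(alpha)
    denom = alpha * S + 1.0                              # (N, Q): alpha_q S_nq + 1
    # Exponent: -0.5 * alpha_q (mu_nq - z_mq)^2 / (alpha_q S_nq + 1)
    diff2 = (mu[:, None, :] - Z[None, :, :]) ** 2        # (N, M, Q)
    expo = -0.5 * alpha * diff2 / denom[:, None, :]
    return sigma_f2 * np.exp(expo.sum(-1)) / np.sqrt(denom.prod(-1))[:, None]

mu = np.random.randn(10, 2); S = np.full((10, 2), 0.3)
Z = np.random.randn(4, 2)
print(psi1(mu, S, Z))   # (10, 4); compare with Psi1 from the MC sketch
```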

Too much typing… So, just read the paper… Good luck! Variational inference is quite tedious.

The point is that $\tilde{F}_d(q)$ has a closed-form lower bound, computable via the $\psi$ statistics.