22 Feb, 2016
For the inversion of the $n \times n$ matrix $\sigma^2I+K$ there are two ways to proceed.
$\Rightarrow y = Mz + \epsilon$, $\epsilon \sim N(0,\sigma^2I)$, $z\sim N(0,\Sigma)$ with $\Sigma = \sigma_z^2 I$
\[\begin{aligned} \pi(-|y) & \propto & N(y|Mz,\sigma^2I) \times N(z|0,\Sigma) \times \pi(-) \\ \pi(-|y) &\propto & N(y|0,M\Sigma M'+\sigma^2I) \times \pi(-) \end{aligned}\]We need to evaluate $N(y|0,M\Sigma M'+\sigma^2I)$ at every MCMC iteration. By the Woodbury identity, \((M\Sigma M' + \sigma^2I)^{-1} = \frac{1}{\sigma^2}I - \frac{M}{\sigma^2} (\frac{1}{\sigma_z^2}I+\frac{M'M}{\sigma^2})^{-1} \frac{M'}{\sigma^2}\). If $m$ (the number of knots) is small, the approximation is poor; if $m$ is large, the computational savings are lost. The choice of kernel $\kappa$ is arbitrary. The $u^*$ are the knot points; they can be an equally spaced grid over the spatial map, and random knots work too.
\[y = \sum_l^L \kappa_l z_l + \epsilon\]where $\kappa_l = \kappa(x-u_l^*)$, but we do not know which $\kappa$ to use; sometimes wavelet-based functions are used.
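As a concrete (purely illustrative) sketch in R: a Gaussian bump at each knot is one arbitrary choice of $\kappa$, and the resulting basis matrix $M$ lets us check the Woodbury inverse of $M\Sigma M'+\sigma^2I$ against a brute-force inverse. The sizes, bandwidth, and kernel below are all assumptions made up for the example.

```r
set.seed(1)
n <- 200; m <- 15                         # observations and knots (illustrative sizes)
sigma2 <- 0.5; sigma2_z <- 2              # noise variance and knot-coefficient variance

kappa <- function(d, bw = 0.1) exp(-0.5 * (d / bw)^2)  # one arbitrary kernel choice
s <- runif(n)                             # observation locations on [0, 1]
u <- seq(0, 1, length.out = m)            # u^*: equally spaced knot grid
M <- outer(s, u, function(x, uu) kappa(x - uu))         # n x m kernel basis matrix

V <- sigma2_z * tcrossprod(M) + sigma2 * diag(n)        # M Sigma M' + sigma^2 I

# Woodbury: only an m x m system needs to be solved
small <- diag(m) / sigma2_z + crossprod(M) / sigma2     # I/sigma_z^2 + M'M/sigma^2
V_inv <- diag(n) / sigma2 - (M / sigma2) %*% solve(small, t(M) / sigma2)

max(abs(V_inv - solve(V)))                # agreement up to numerical error
```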
For the predictive process, the joint covariance $G$ of $\mu(s)$ and $(\mu(s_1^*),\ldots,\mu(s_m^*))$ is \(\begin{pmatrix} \tau^2 & C \\ C' & K^* \end{pmatrix}\)
So, \(E[\mu(s) \v \mu(s_1^*),\ldots,\mu(s_m^*)] = 0 + C (K^*)^{-1} \mu(s^*)\)
where \(C = cov(\mu(s), \brak{\mu(s_1^*),\ldots,\mu(s_m^*)} )\), and $s$ is any one location in the data set.
If we view \(C(K^*)^{-1}\) as the basis functions and \(\mu(s^*)\) as the basis coefficients, then
\[\begin{aligned} y &= \tilde\mu(s) + \epsilon \\ &= C(s) (K^*)^{-1} \mu(s^*) + \epsilon \\ &= H\mu(s^*) + \epsilon \end{aligned}\]where $\epsilon \sim N(0,\sigma^2I)$ and $C(s)$ is the $n \times m$ matrix stacking the previous row $C$ at every data location.
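A minimal sketch of how $C$, $K^*$, and $H$ could be assembled in R, assuming (for illustration only) a one-dimensional exponential covariance $\tau^2 e^{-\phi|s-s'|}$; nothing in the notes fixes this particular covariance.

```r
tau2 <- 1; phi <- 3                        # illustrative parameter values
cov_fun <- function(d) tau2 * exp(-phi * abs(d))        # assumed exponential covariance

s      <- runif(500)                       # observed locations
s_star <- seq(0, 1, length.out = 25)       # knot locations s^*

C      <- outer(s, s_star, function(a, b) cov_fun(a - b))       # n x m
K_star <- outer(s_star, s_star, function(a, b) cov_fun(a - b))  # m x m
H      <- C %*% solve(K_star)              # the basis functions C (K*)^{-1}
# tilde_mu <- H %*% mu_star would give the predictive process at the data locations
```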
Now,
\[\begin{aligned} \pi(-|y) &\propto N(y | H\mu(s^*), \sigma^2I) \times N(\mu(s^*)|0,K^*) \times \pi(-) \\ \pi(-|y) &\propto N(y | 0, \sigma^2I + HK^*H') \times \pi(-) \\ \pi(-|y) &\propto N(y | 0, \sigma^2I + CK^{*-1}C') \times \pi(-) \end{aligned}\]Put a uniform prior on $\phi$ and inverse-gamma priors on $\sigma^2$ and $\tau^2$.
The mcmc package in R by Charles Geyer does Metropolis updates, but you need to transform the parameters to have support on $(-\infty,\infty)$.
Block update $(\log(\tau),\log(\sigma^2),\log(\frac{\phi-a}{b-\phi}))$ with an MVN proposal.
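A rough sketch of that block update with the metrop function from Geyer's mcmc package. Here log_target is a hypothetical stand-in for the marginal log-posterior above (it must also include the log-Jacobian of the transformation), and a, b are whatever bounds the uniform prior on $\phi$ uses; only the mechanics of the transformation and the call are meant literally.

```r
library(mcmc)

a <- 0.1; b <- 30                      # assumed bounds of the uniform prior on phi

# theta = (log(tau), log(sigma^2), log((phi - a)/(b - phi))), all unconstrained
log_post <- function(theta) {
  tau    <- exp(theta[1])
  sigma2 <- exp(theta[2])
  phi    <- (a + b * exp(theta[3])) / (1 + exp(theta[3]))
  # log N(y | 0, sigma^2 I + H K* H') + log priors + log-Jacobian of the
  # transformation; log_target is a placeholder for that sum, not a real function
  log_target(tau, sigma2, phi, theta)
}

# scale can also be a 3 x 3 matrix, giving a correlated MVN random-walk proposal
out <- metrop(log_post, initial = c(0, 0, 0), nbatch = 5000, scale = 0.3)
out$accept                             # tune scale; rows of out$batch are the draws
```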
The second line is obtained by integrating out \(\mu(s^*)\). We can get the posterior for \(\mu^*\), and then recover \(\mu(s)=C[K^*]^{-1}\mu(s^*)\).
Now we can use the Woodbury identity to invert the covariance matrix:
\[\begin{aligned} (\sigma^2I + HK^*H')^{-1} &= \frac{1}{\sigma^2}I - \frac{H}{\sigma^2} \p{(K^*)^{-1}+\frac{H'H}{\sigma^2}}^{-1} \frac{H'}{\sigma^2} \\ (\sigma^2I + CK^{*-1}C')^{-1} &= \frac{1}{\sigma^2}I - \frac{C}{\sigma^2} \p{K^*+\frac{C'C}{\sigma^2}}^{-1} \frac{C'}{\sigma^2} \\ \end{aligned}\]This only requires solving $m \times m$ systems, e.g. $n = 10000$ with $m = 500$–$600$ knots.
For the determinant, we can use the matrix determinant lemma, the companion to Sherman–Morrison–Woodbury:
\[\begin{aligned} det(A + UWV^T) &= det(W^{-1} + V^T A^{-1} U) det(W) det(A) \\ \abs{\sigma^2I + CK^{*-1}C'} &= \abs{K^* + \frac{C'C}{\sigma^2}} \frac{(\sigma^2)^n}{\abs{K^*}} \\ \end{aligned}\]The predictive process is not a stationary GP because the covariance function \(cov(\tilde\mu(s),\tilde\mu(s'))\) is not a function of the distance $s-s'$ alone.
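Putting the two identities together, the whole log-density $\log N(y\mid 0,\sigma^2I + CK^{*-1}C')$ can be evaluated with only $m\times m$ factorizations (roughly $O(nm^2)$ work). A sketch, assuming C is the $n\times m$ cross-covariance and K_star the $m\times m$ knot covariance as in the earlier example:

```r
loglik_lowrank <- function(y, C, K_star, sigma2) {
  n <- length(y)
  A      <- K_star + crossprod(C) / sigma2        # K* + C'C / sigma^2  (m x m)
  A_chol <- chol(A)
  K_chol <- chol(K_star)

  # quadratic form y' V^{-1} y via Woodbury, never forming the n x n inverse
  Cty  <- crossprod(C, y) / sigma2                # C'y / sigma^2
  quad <- sum(y^2) / sigma2 - sum(backsolve(A_chol, Cty, transpose = TRUE)^2)

  # log|V| = log|K* + C'C/sigma^2| + n log(sigma^2) - log|K*|  (determinant lemma)
  logdet <- 2 * sum(log(diag(A_chol))) + n * log(sigma2) - 2 * sum(log(diag(K_chol)))

  -0.5 * (n * log(2 * pi) + logdet + quad)
}
```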
Want \(y = \mu(s) + \epsilon\) but fitting $y = E[\mu(s) | \mu(s_1^*),\ldots,\mu(s_m^*)] + \epsilon$. Note that $Var(\mu(s)) = Var(E[\mu(s) | \mu(s^*)]) + E(Var[\mu(s) | \mu(s^*)])$. So the predictive model will understate the variance.
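Writing the conditional variance out with the joint covariance $G$ from above (here $C$ is the $1\times m$ row for location $s$) makes the shortfall explicit:
\[Var(\tilde\mu(s)) = Var(E[\mu(s) \v \mu(s^*)]) = C(K^*)^{-1}C' \le \tau^2 = Var(\mu(s)),\]so the missing piece is $E(Var[\mu(s)\v\mu(s^*)]) = \tau^2 - C(K^*)^{-1}C'$, which is what the correction further below puts back.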
The posterior for the mean function at the knot points is
\[\mu^* \v y,\sigma^2,\tau^2,\phi \sim \text{MVN}\left(\tfrac{1}{\sigma^2}S^{-1}H'y,\; S^{-1}\right)\]where $S = (K^*)^{-1}+\frac{H'H}{\sigma^2}$. And the mean function at the original locations is obtained deterministically as \(\mu(s)=C[K^*]^{-1}\mu(s^*)\).
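A short sketch of that conditional update in R, reusing the illustrative H, C, K_star from the earlier sketch and assuming y and sigma2 are already in scope:

```r
# Draw mu_star | y, sigma^2, tau^2, phi and map it back to the data locations
S       <- solve(K_star) + crossprod(H) / sigma2        # posterior precision
S_chol  <- chol(S)
mu_hat  <- backsolve(S_chol, backsolve(S_chol, crossprod(H, y) / sigma2,
                                       transpose = TRUE))      # S^{-1} H'y / sigma^2
mu_star <- mu_hat + backsolve(S_chol, rnorm(ncol(H)))          # one MVN(mu_hat, S^{-1}) draw
mu_s    <- C %*% solve(K_star, mu_star)                        # mu(s) = C (K*)^{-1} mu_star
```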
To restore the full marginal variance we add an independent correction $\eta(s_i)$, setting \(Var(\mu(s_i)) = \tau^2 = Var(\tilde\mu(s_i)) + Var(\eta(s_i))\)
Let \(g = (\kappa(s_i,s_1^*),\ldots, \kappa(s_i,s_m^*))\)
\[\eta(s_i) \overset{ind}{\sim} N(0,\tau^2 - g {K^*}^{-1}g' )\]Another option is to fit the model with the $M$ nearest neighbors. Look up the paper…
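With the illustrative C, K_star, and tau2 from before, the correction variances are just the diagonal of $\tau^2 - C(K^*)^{-1}C'$:

```r
expl_var <- rowSums((C %*% solve(K_star)) * C)   # diag of C (K*)^{-1} C'
eta_var  <- tau2 - expl_var                      # Var(eta(s_i)) at each data location
eta      <- rnorm(length(eta_var), sd = sqrt(pmax(eta_var, 0)))  # guard tiny negatives
```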
The problem once again is inverting $G=(\sigma^2I+K)$ which is $n\times n$. This is $O(n^3)$. If we can strategically make $G$ sparse, we can invert the matrix in much less time.
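For a sense of the payoff, R's Matrix package exploits sparsity automatically in its solve/Cholesky routines. The tridiagonal matrix below is purely a toy stand-in for a strategically sparse $G$, not a spatial model:

```r
library(Matrix)

n <- 3000
G_sparse <- bandSparse(n, k = 0:1,
                       diagonals = list(rep(3, n), rep(-1, n - 1)),
                       symmetric = TRUE)             # toy sparse SPD matrix
b <- rnorm(n)

system.time(x_sparse <- solve(G_sparse, b))             # sparse factorization: fast
system.time(x_dense  <- solve(as.matrix(G_sparse), b))  # dense O(n^3) solve: slow
max(abs(as.numeric(x_sparse) - x_dense))                # same answer
```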