Dynamic Linear Model (WIP)

Consider the following Normal Dynamic Linear Model (model):

\[\begin{aligned} \bm y_t \mid \bm\theta_t &\sim \MvNormal(\bm F_t^T \bm\theta_t, \bm V_t) \\ \bm \theta_t \mid \bm\theta_{t-1} &\sim \MvNormal(\bm G_t \bm\theta_{t-1}, \bm W_t) \end{aligned}\]

for $t\in \bc{1,\dots,T}$, with $\bm\theta_0 \sim \MvNormal(\bm m_0, \bm C_0)$.

Let $\bm D_t = \bc{y_1, \dots, y_T}$. Then at each time $t$, we desire the distribution $\bm\theta_t \mid \bm D_t$, which can be obtained analytically due to all distributions being Normal.

So, at each time point, we have

\[\begin{aligned} \bm y_t \mid \bm\theta_t &\sim \MvNormal(\bm F_t^T \bm\theta_t, \bm V_t) \\ \bm \theta_t \mid \bm\theta_{t-1} &\sim \MvNormal(\bm G_t \bm\theta_{t-1}, \bm W_t) \\ \bm \theta_{t-1} \mid \bm D_{t-1} &\sim \MvNormal(\bm m_{t-1}, \bm C_{t-1}), \end{aligned}\]

where, for convenience, $(\bm \theta_{0} \mid \bm D_{0}) = \bm\theta_0$.

We can marginalize over $\bm\theta_{t-1}$ to obtain

\[\begin{aligned} \bm y_t \mid \bm\theta_t &\sim \MvNormal(\bm F_t^T \bm\theta_t, \bm V_t) \\ \bm \theta_t \mid \bm D_{t-1} &\sim \MvNormal( \bm G_t \bm m_{t-1}, ~\bm G_t \bm C_{t-1} \bm G_t ^ T + \bm W_t ) \\ \end{aligned}\]

Thus, by applying the content from Bayesian Linear Model , we can obtain $ \bm \theta_t \mid \bm y_t, \bm D_{t-1} = \bm \theta_t \mid \bm D_{t} \sim \MvNormal(\bm m_t, \bm C_t) $, where

\[\begin{aligned} \bm C_t &= \p{ \bm F_t \bm V_t^{-1} \bm F_t^T + \bk{\bm G_t \bm C_{t-1} \bm G_t^T + \bm W_t}^{-1} }^{-1} \\ \bm m_t &= \bm C_t \p{ \bm F_t \bm V_t^{-1} \bm y_t + \bk{\bm G_t \bm C_{t-1} \bm G_t^T + \bm W_t}^{-1} \bm G_t \bm m_{t-1} }. \end{aligned}\]

Note that the outer inversion for computing $\bm C_t$ can be done using the Sherman-Morrison-Woodbury formula if the dimensions of $\bm y_t$ is much larger than that of $\bm\theta_t$.

TODO

  • discount factor
    • When $\bm W_t$ is unknown, one can alternatively specify it using a “discount factor” $\delta \in (0, 1]$. Recall that the variance of $\bm\theta_t \mid \bm D_{t-1}$ is $\bm G_t \bm C_{t-1} \bm G_t^T + \bm W_t$. Let $\bm P_t = \bm G_t \bm C_{t-1} \bm G_t^T$. If we now let $\bm P_t + \bm W_t = \bm P_t / \delta$, then $W_t = \bm P_t (1 - \delta) / \delta$. Thus, $W_t$ is obtained by inflating $P_t$ via the discount factor. The discount factor is usually specified using maximum likelihood estimation.
  • Unknown but non-time-varying $\bm V_t$.
    • Can place Inverse Gamma prior on $V_t$ and marginalize it out.
    • Or, can compute ratio between theoretical variance and estimated variance from forecasts.