Combining the results from Bayesian Linear Model and Multivariate Linear Model, we can show the following. Consider the multivariate-response linear model:
\[\begin{aligned} \bm y_i \mid \bm B &\sim \MvNormal(\bm B^T \bm x_i, \bm\Sigma) \\ \vec(\bm B) &\sim \MvNormal(\bm m, \bm S) \end{aligned}\]with dimensions:
$\bm y_i$ | $\bm x_i$ | $\bm\Sigma$ | $\bm B$ |
---|---|---|---|
$(R \times 1)$ | $(P\times 1)$ | $(R\times R)$ | $(P\times R)$ |
Construct $\bm Y$ ($N \times R$) by stacking $N$ rows of $\bm y_i^T$. Similarly, construct $\bm X$ ($N \times P$) by stacking $N$ rows of $\bm x_i^T$.
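As a concrete illustration, here is a minimal NumPy sketch that simulates data from this model and stacks it into $\bm X$ and $\bm Y$ (the dimensions, seed, and variable names are illustrative, not from the original):

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, R = 50, 3, 2                       # observations, predictors, responses

B_true = rng.normal(size=(P, R))         # coefficient matrix (P x R)
Sigma = np.array([[1.0, 0.3],            # response covariance (R x R)
                  [0.3, 0.5]])

X = rng.normal(size=(N, P))              # N stacked rows x_i^T
E = rng.multivariate_normal(np.zeros(R), Sigma, size=N)
Y = X @ B_true + E                       # N stacked rows y_i^T = x_i^T B + e_i^T
```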
Then,
\[\begin{aligned} \vec(\bm B) \mid \vec(\bm Y) &\sim \MvNormal(\bm m^\star, \bm S^\star), \text{ where} \\ \bm S^\star &= \p{\bm\Sigma^{-1} \otimes \bm X^T \bm X + \bm S^{-1}}^{-1} \\ \bm m^\star &= \bm S^\star \p{(\bm\Sigma^{-1} \otimes \bm X^T) ~\vec(\bm Y) + \bm S^{-1} \bm m} \\ \Rightarrow \bm m^\star &= \bm S^\star \p{\vec(\bm X^T \bm Y \bm\Sigma^{-1}) + \bm S^{-1} \bm m}. \end{aligned}\]
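Continuing the sketch above, the posterior parameters can be computed directly with `np.kron`. Since $\vec(\cdot)$ stacks columns, it corresponds to flattening in column-major (Fortran) order in NumPy; the prior hyperparameters below are illustrative:

```python
m = np.zeros(P * R)                      # prior mean of vec(B)
S = np.eye(P * R)                        # prior covariance of vec(B)

Sigma_inv = np.linalg.inv(Sigma)
S_inv = np.linalg.inv(S)

def vec(A):
    """Stack the columns of A into one vector (column-major order)."""
    return A.flatten(order="F")

# S* = (Sigma^{-1} kron X'X + S^{-1})^{-1}
S_star = np.linalg.inv(np.kron(Sigma_inv, X.T @ X) + S_inv)

# m* = S* ((Sigma^{-1} kron X') vec(Y) + S^{-1} m)
m_star = S_star @ (np.kron(Sigma_inv, X.T) @ vec(Y) + S_inv @ m)

# Equivalent form that avoids building the large Kronecker product:
# m* = S* (vec(X' Y Sigma^{-1}) + S^{-1} m)
m_star_alt = S_star @ (vec(X.T @ Y @ Sigma_inv) + S_inv @ m)
assert np.allclose(m_star, m_star_alt)
```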
Note that with minor adjustments, instead of $\vec(\cdot)$, we could have used $\flat(\cdot)$ throughout, where $\flat(\cdot)$ stacks the rows of a matrix into a single column. This is useful in programming languages that store matrices in row-major order (e.g., Python with NumPy). NumPy's flatten method is essentially $\flat(\cdot)$. Replacing vec with flat swaps the order of the factors in the Kronecker products. E.g., the previous equations would be rewritten as follows (with $\bm m$ and $\bm S$ now the prior mean and covariance of $\flat(\bm B)$, a permutation of the originals):
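\[\begin{aligned} \flat(\bm B) \mid \flat(\bm Y) &\sim \MvNormal(\bm m^\star, \bm S^\star), \text{ where} \\ \bm S^\star &= \p{\bm X^T \bm X \otimes \bm\Sigma^{-1} + \bm S^{-1}}^{-1} \\ \bm m^\star &= \bm S^\star \p{(\bm X^T \otimes \bm\Sigma^{-1}) ~\flat(\bm Y) + \bm S^{-1} \bm m} \\ \Rightarrow \bm m^\star &= \bm S^\star \p{\flat(\bm X^T \bm Y \bm\Sigma^{-1}) + \bm S^{-1} \bm m}. \end{aligned}\]
A quick numerical check of the swapped Kronecker order (continuing the sketch above, with NumPy's default row-major flatten playing the role of $\flat(\cdot)$):

```python
def flat(A):
    """Stack the rows of A into one vector; flat(A) = vec(A^T)."""
    return A.flatten()                   # row-major is NumPy's default

# Identity behind the flat version: flat(A B C) = (A kron C^T) flat(B).
lhs = flat(X.T @ Y @ Sigma_inv)
rhs = np.kron(X.T, Sigma_inv) @ flat(Y)
assert np.allclose(lhs, rhs)
```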