16 Oct, 2015

LU and SVD


LU decomposition and singular value decomposition (SVD) are both techniques for solving systems of equations. They both solve problems of the form $A \mathbf x = b$, where $\mathbf x$ is the unknown. The difference is that SVD’s can handle matrices $A$ that are $m \times n$, whereas LU’s can typically only solve problems where $A$ is $n \times n$. Computationally, LU is simpler than SVD.


LU Decomposition

Every matrix $\underset{n \times n}{A}$ can be rewritten as $A=LU$, where $L$ and $U$ are lower and upper traingulars respectively. That is

\[L = \left[ \begin{array}{lll} 1 & 0 & 0 \\\\ L_{11} & 1 & 0 \\\\ L_{21} & L_{32} & 1 \\\\ \end{array} \right]\] \[U = \left[ \begin{array}{lll} U_{11} & U_{12} & U_{13} \\\\ 0 & U_{22} & U_{23} \\\\ 0 & 0 & U_{33} \\\\ \end{array} \right]\]

Solving for the values of $L$ and $U$ is trivial. The key of the LU decomp is noticing that $A\mathbf x = b \Rightarrow L(U\mathbf x) = b$. If we let $\mathbf y = U\mathbf x$, it becomes clear that we need to solve two easy systems of equations in order to solve for $\mathbf x$ \(\begin{cases} \begin{array}{rcl} L\mathbf y &=& b \\\\ U\mathbf x &=& \mathbf y \\\\ \end{array} \end{cases}\) This means that we can first solve for $\mathbf y$ in $L\mathbf y = b$ where $\mathbf y = U\mathbf x$, and then solve for $\mathbf x$ in $U\mathbf x = \mathbf y$.


Singular Value Decomposition (SVD)

Suppose $\underset{n \times m}{A}$ can be rewritten as $USV^\dagger$, where $U,V$ are unitary (which implies that $U^\dagger = U^{-1}$, etc.) and have dimensions $n \times n$ and $m \times m$ respectively, and $\underset{n \times m}{S}$ is “diagonal” in the sense that the first $p=min(n,m)$ diagonal elements are not necessarily 0, but the remainder of the matrix is 0. This means that

\[\begin{array}{rcccl} A^\dagger A &=& VS^\dagger U^\dagger USV^\dagger &=& VS^\dagger SV^\dagger \\\\ A A^\dagger &=& USV^\dagger VS^\dagger U^\dagger &=& USS^\dagger U^\dagger \\\\ \end{array}\]

Recall that for a square matrix $X$, $X = WDW^\dagger = WDW^{-1}$, where $D$ is the diagonalization of $X$ where the diagonal elements are the eigenvalues of $X$, and $S$ is the normalized eigenvectors of $X$. This means that $S$ is also unitary. Then by simple but careful examination, we see that

  1. $S^\dagger S$ and $SS^\dagger$ are “diagonal” (in the sense we said $S$ was diagonal above),
  2. $V$ = normalized eigenvectors for $A^\dagger A$
  3. $U$ = normalized eigenvectors for $AA^\dagger $
  4. diag($S^\dagger S$) are the eigenvalues for $A^\dagger A$
  5. diag($SS^\dagger $) are the eigenvalues for $AA^\dagger $.

The diagonal elements of $S^\dagger S$ and $SS^\dagger $ are $s_1^2,…,s_p^2$. And the singular values for $A$ are $s_1,…,s_p$. Conventionall, the $s_i$’s are arranged such that $s_1>s_2>…>s_p$. And these become the “diagonal” elements of $S$.

Now, we note that $A\mathbf x = b \Rightarrow USV^\dagger \mathbf x= b$. We can estimate $\mathbf x$ with $\hat x = A^-b = (USV^\dagger)^-b = V\bar{S}U^\dagger b$, where $\bar{S}$ is the “inverse” of $S$, computed by transposing $S$ and inverting the non-zero elements (which only occur on the diagonal). Here, I refer you to Mathematical Methods for Physics and Engineering, by Riley, Hobson and Bence (see chapter 8.18.3) for the proof that this is the least squares estimate for $x$.