15 Jan, 2016
Nonegative Matrix Factorization (NMF)
- Distance based clustering (e.g. k-means) is difficult in high dimensions because ni high dimensions, elements are sparse and far apart.
- Alternatives:
- Spectral Clustering
- Clustering via NMF
- Originally, NMF was used to save on storage. That is, storing $W,H$ is much cheaper than storing $M$ ($\because (p+k)n \ll p\times n$). In this case, $W,H$ need not be unique, as long as we can retrieve $M=WH$.
$ \underset{n\times p}{X} $ is a matrix s.t. $n \ll p$.
Let $M = X’ \Rightarrow \underset{p\times n}{M}$, and let
$\underset{p\times k}{W},\underset{k\times n}{H} \gg 0$ (i.e. every element $\gt 0$), $k \lt n$.
We seek to optimize
\(\underset{W,H}{\text{argmin}} \norm{M-WH}_2.\)
NMF
- Similar to factor analysis. NMF provides interpretation for matrix $X$.
- k-means
- $M = WF$
- $ \underset{W}{\text{argmin}} \norm{M-WH}_2 $
- Each of the $k$ columns in $W$ is a cluster center. Each column in $H$ have exactly one $1$ and all other elements are $0$.
- $H_{l,j} = 1 \Rightarrow$ sample $j$ belongs to cluster $l$.
- NMF
- $M \approx WF$
- If $H_{l,j}$ is the biggest element in $H_{,j}$ then sample $j$ belongs to the $l^{th}$ cluster
- $W$ isn’t interpreted but $H$. $W$ is like the $k$ means of the clusters, but no solud interpretation. Only used for clustering.
From a practical perspective, Bayesian methods are used only when uncertainty is of interest. If the practitioner is only interested point estimation, frequentist methods are preferred.