15 Jan, 2016

Nonegative Matrix Factorization (NMF)


  • Distance based clustering (e.g. k-means) is difficult in high dimensions because ni high dimensions, elements are sparse and far apart.
    • Alternatives:
      • Spectral Clustering
      • Clustering via NMF
        • Originally, NMF was used to save on storage. That is, storing $W,H$ is much cheaper than storing $M$ ($\because (p+k)n \ll p\times n$). In this case, $W,H$ need not be unique, as long as we can retrieve $M=WH$.

$ \underset{n\times p}{X} $ is a matrix s.t. $n \ll p$. Let $M = X’ \Rightarrow \underset{p\times n}{M}$, and let $\underset{p\times k}{W},\underset{k\times n}{H} \gg 0$ (i.e. every element $\gt 0$), $k \lt n$. We seek to optimize \(\underset{W,H}{\text{argmin}} \norm{M-WH}_2.\)

NMF

  • Similar to factor analysis. NMF provides interpretation for matrix $X$.
  • k-means
    • $M = WF$
    • $ \underset{W}{\text{argmin}} \norm{M-WH}_2 $
    • Each of the $k$ columns in $W$ is a cluster center. Each column in $H$ have exactly one $1$ and all other elements are $0$.
    • $H_{l,j} = 1 \Rightarrow$ sample $j$ belongs to cluster $l$.
  • NMF
    • $M \approx WF$
    • If $H_{l,j}$ is the biggest element in $H_{,j}$ then sample $j$ belongs to the $l^{th}$ cluster
    • $W$ isn’t interpreted but $H$. $W$ is like the $k$ means of the clusters, but no solud interpretation. Only used for clustering.

Concluding Remarks about Frequentist Methods

From a practical perspective, Bayesian methods are used only when uncertainty is of interest. If the practitioner is only interested point estimation, frequentist methods are preferred.