NMF: Nonnegative Matrix Factorizatioin
#Machine Learning #Factorization
Decomposition
To make it easier to understand, we start with a data point $\mathbf P$ in a $k$dimensional space spanned by $k$ basis vectors $\mathbf V^k$. Naturally, we could write down the component decomposition of the point using the basis vectors $\mathbf V^k$,
$$ \mathbf P = P_k \mathbf V^k. $$
This is immediately obvious to us since we have been dealing with rank 2 $(k, 1)$ basis vectors and we are talking about the $k$ coordinates for a point.
This point is represented by a matrix of rank 2 $(k, 1)$ given this basis.
$$ \mathbf P \to \begin{pmatrix} P_1, P_2, \cdots, P_k \end{pmatrix} $$
If we assume a cartesian coordinate system, the basis are normal vectors.
$$ \mathbf V^1 \to \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}, \mathbf V^2 \to \begin{pmatrix} 0\\ 1\\ \vdots\\ 0 \end{pmatrix}, \cdots $$
These representations of the basis vectors can be joined together
$$ \mathbf V \to \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 1 \end{pmatrix} $$
Using these representations of the abstract vectors, we could represent the point $\mathbf P$ using
$$ \mathbf P \to \begin{pmatrix} P_1, P_2, \cdots, P_k \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 1 \end{pmatrix}, $$
which is insanely trivial since the representation of basis in this case is an identity matrix.
NMF is talking about the same idea: coordinate transformations,
$$ X_{n}^{\phantom{n}k} = H_n^{\phantom{n}r} W_r^{\phantom{r}k}. $$
We will view $X_{n}^{\phantom{n}k}$ as a vertical stack of $n$ points:
$$ \begin{pmatrix} \mathbf P_1 \\ \mathbf P_2 \\ \vdots \\ \mathbf P_n \end{pmatrix} \to \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1k} \\ P_{21} & P_{22} & \cdots & P_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nk} \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{1r} \\ H_{21} & H_{22} & \cdots & H_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ H_{n1} & H_{n2} & \cdots & H_{nr} \end{pmatrix} \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1k} \\ W_{21} & W_{22} & \cdots & W_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ W_{r1} & W_{r2} & \cdots & W_{rk} \end{pmatrix} \leftarrow \begin{pmatrix} \mathbf H_1 \\ \mathbf H_2 \\ \vdots \\ \mathbf H_n \end{pmatrix} \begin{pmatrix} \mathbf W^1 & \mathbf W^2 & \cdots & \mathbf W^k \end{pmatrix} $$
For a point $i$, we have
$$ \mathbf P_i = \mathbf H_i \begin{pmatrix}\mathbf W^1 & \mathbf W^2 & \cdots & \mathbf W^k \end{pmatrix} $$
Compare this with the decomposition of a point in the cartesian coordinate system, this is a decomposition of each point on a basis spanned by $\mathbf W^i$.
Nonnegative Matrix Factorization
NMF is a dimension reduction method that decomposes $X_{n}^{\phantom{n}k}$ using
$$ \begin{equation} X_{n}^{\phantom{n}k} \sim H_n^{\phantom{n}r} W_r^{\phantom{r}k}. \label{eqnmf} \end{equation} $$
while requiring the elements of the decomposition to be nonnegative. But there are many possible decompositions! Then we require this approximation to be as accurate as possible.
How do we measure the approximations?
We use the Frobenius distance Frobenius distance Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$, $$ \lVert X_{n}^{\phantom{n}k}  H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k}  H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$ between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$,
$$ \lVert X_{n}^{\phantom{n}k}  H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k}  H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$
So NMF will require this Frobenius distance to be minimal.
Why Does It Even Work?
Well, it doesn’t always work. We might have many different NMFs for one single matrix.
How do we find a proper decomposition using NMF? We put restrictions on it, such as penalties. Apart from this, we also need to choose the rank of the decomposition $r$ in Eq. ($\ref{eqnmf}$). Even if we fixed all these problems, NMF is computation expensive.
L Ma (2019). 'NMF: Nonnegative Matrix Factorizatioin', Datumorphism, 06 April. Available at: https://datumorphism.leima.is/wiki/machinelearning/factorization/nmf/.
Table of Contents
Current Ref:

wiki/machinelearning/factorization/nmf.md
Links to:
SVD: Singular Value Decomposition
Given a matrix $\mathbf X \to X_{m}^{\phantom{m}n}$, we can decompose it into three matrices $$ …
Eigenvalues and Eigenvectors
To find the eigenvectors $\mathbf x$ of a matrix $\mathbf A$, we construct the eigen equation $$ …
Unsupervised Learning: PCA
Principal component analysis is a method to remove redundancies of the features by looking into the …
Unsupervised Learning: SVM
unsupervised learning: support vector machine
Frobenius distance
Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} …