NMF: Nonnegative Matrix Factorizatioin
Decomposition
For simplicity, we start with a data point $\mathbf P$ in a $k$-dimensional space spanned by $k$ basis vectors $\mathbf V^k$. Naturally, we could write down the component decomposition of the point using the basis vectors $\mathbf V^k$,
$$ \mathbf P = P_k \mathbf V^k. $$
This is immediately obvious to us since we have been dealing with rank 2 $(k, 1)$ basis vectors and we are talking about the $k$ coordinates for a point.
This point is represented by a matrix of rank 2 $(k, 1)$ given this basis.
$$ \mathbf P \to \begin{pmatrix} P_1, P_2, \cdots, P_k \end{pmatrix} $$
If we assume a cartesian coordinate system, the basis are normal vectors.
$$ \mathbf V^1 \to \begin{pmatrix} 1\\ 0\\ \vdots\\ 0 \end{pmatrix}, \mathbf V^2 \to \begin{pmatrix} 0\\ 1\\ \vdots\\ 0 \end{pmatrix}, \cdots $$
These representations of the basis vectors can be joined together
$$ \mathbf V \to \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 1 \end{pmatrix} $$
Using these representations of the abstract vectors, we could represent the point $\mathbf P$ using
$$ \mathbf P \to \begin{pmatrix} P_1, P_2, \cdots, P_k \end{pmatrix} \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & 0 & 1 \end{pmatrix}, $$
which is insanely trivial since the representation of basis in this case is an identity matrix.
NMF is talking about the same idea: coordinate transformations,
$$ X_{n}^{\phantom{n}k} = H_n^{\phantom{n}r} W_r^{\phantom{r}k}. $$
We will view $X_{n}^{\phantom{n}k}$ as a vertical stack of $n$ points:
$$ \begin{pmatrix} \mathbf P_1 \\ \mathbf P_2 \\ \vdots \\ \mathbf P_n \end{pmatrix} \to \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1k} \\ P_{21} & P_{22} & \cdots & P_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ P_{n1} & P_{n2} & \cdots & P_{nk} \end{pmatrix} = \begin{pmatrix} H_{11} & H_{12} & \cdots & H_{1r} \\ H_{21} & H_{22} & \cdots & H_{2r} \\ \vdots & \vdots & \ddots & \vdots \\ H_{n1} & H_{n2} & \cdots & H_{nr} \end{pmatrix} \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1k} \\ W_{21} & W_{22} & \cdots & W_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ W_{r1} & W_{r2} & \cdots & W_{rk} \end{pmatrix} \leftarrow \begin{pmatrix} \mathbf H_1 \\ \mathbf H_2 \\ \vdots \\ \mathbf H_n \end{pmatrix} \begin{pmatrix} \mathbf W^1 & \mathbf W^2 & \cdots & \mathbf W^k \end{pmatrix} $$
For a point $i$, we have
$$ \mathbf P_i = \mathbf H_i \begin{pmatrix}\mathbf W^1 & \mathbf W^2 & \cdots & \mathbf W^k \end{pmatrix} $$
Compare this with the decomposition of a point in the cartesian coordinate system, this is a decomposition of each point on a basis spanned by $\mathbf W^i$.
Nonnegative Matrix Factorization
NMF is a dimension reduction method that decomposes $X_{n}^{\phantom{n}k}$ using
$$ \begin{equation} X_{n}^{\phantom{n}k} \sim H_n^{\phantom{n}r} W_r^{\phantom{r}k}. \label{eq-nmf} \end{equation} $$
while requiring the elements of the decomposition to be nonnegative. But there are many possible decompositions! Then we require this approximation to be as accurate as possible.
How do we measure the approximations?
We use the [[Frobenius distance]] Frobenius distance Frobenius distance between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$, $$ \lVert X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$ between the matrix $X_{n}^{\phantom{n}k}$ and $H_n^{\phantom{n}r} W_r^{\phantom{r}k}$,
$$ \lVert X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k} \rVert^2 \equiv \sum_{n,k} (X_{n}^{\phantom{n}k} - H_n^{\phantom{n}r} W_r^{\phantom{r}k})^2. $$
So NMF will require this Frobenius distance to be minimal.
Why Does It Even Work?
Well, it doesn’t always work. We might have many different NMFs for one single matrix.
How do we find a proper decomposition using NMF? We put restrictions on it, such as penalties. Apart from this, we also need to choose the rank of the decomposition $r$ in Eq. ($\ref{eq-nmf}$). Even if we fixed all these problems, NMF is computation expensive.
wiki/machine-learning/factorization/nmf
Links to:L Ma (2019). 'NMF: Nonnegative Matrix Factorizatioin', Datumorphism, 06 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/factorization/nmf/.