# Covariance Matrix

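The covariance of two discrete series $A$ and $B$ is defined as

$$ \text{Cov} ({A,B}) = \sigma_{A,B}^2 = \frac{ \sum_{i=1}^{n} (a_i - \bar A) (b_i - \bar B) }{ n- 1 }, $$

where $n$ is the length of the series, and $\bar A$ and $\bar B$ are the means of the two series. The normalization factor is $1/(n-1)$ rather than $1/n$ to correct the bias of the estimator (Bessel's correction), which matters most for small $n$.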
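As a minimal sketch of this definition, the following pure-Python function computes the covariance of two series with the $1/(n-1)$ normalization. The function name `sample_covariance` and the example series are illustrative choices, not part of the original card.

```python
def sample_covariance(a, b):
    """Covariance of two equal-length series, normalized by n - 1."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations from the means, divided by n - 1.
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


# Hypothetical example data: b is exactly twice a.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
cov_ab = sample_covariance(a, b)  # 10/3
var_a = sample_covariance(a, a)   # 5/3, the variance of a
```

Passing the same series twice returns its variance.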

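If the normalization is $1/n$ instead of $1/(n-1)$, one can show that

$$ \mathrm{Cov}({A,B}) = E( AB ) - \bar A \bar B, $$

where $E(AB)$ is the mean of the products $a_i b_i$. With the $1/(n-1)$ normalization, the right-hand side is multiplied by $n/(n-1)$.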
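The identity follows from expanding the product of deviations. As a short sketch, assume the $1/n$ normalization and write the mean of the products as $\frac{1}{n}\sum_i a_i b_i$:

$$
\begin{aligned}
\frac{1}{n} \sum_{i=1}^{n} (a_i - \bar A)(b_i - \bar B)
&= \frac{1}{n} \sum_{i=1}^{n} a_i b_i - \bar B \, \frac{1}{n} \sum_{i=1}^{n} a_i - \bar A \, \frac{1}{n} \sum_{i=1}^{n} b_i + \bar A \bar B \\
&= \frac{1}{n} \sum_{i=1}^{n} a_i b_i - \bar A \bar B - \bar A \bar B + \bar A \bar B \\
&= \frac{1}{n} \sum_{i=1}^{n} a_i b_i - \bar A \bar B.
\end{aligned}
$$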

For a complete picture of the data, we collect the covariances of all possible pairs of series in a matrix. For two series $A_1$ and $A_2$,

$$ \mathbf{C} = \begin{pmatrix} \mathrm{Cov} (A_1, A_1) & \mathrm{Cov} (A_1, A_2) \\ \mathrm{Cov} (A_2, A_1) & \mathrm{Cov} (A_2, A_2) \end{pmatrix}. $$

For real series, $\mathrm{Cov} (A_2, A_1) = \mathrm{Cov} (A_1, A_2)$.

Given a dataset $X$,

$$ X = \begin{pmatrix} \mathbf X_{1} & \mathbf X_{2} & \cdots & \mathbf X_{N} \end{pmatrix} $$

where each $\mathbf X_i$ holds the observations of one feature and $N$ is the number of features (variables), the covariance matrix has the elements

$$ C_{ij} = \operatorname{Cov}(\mathbf X_i, \mathbf X_j). $$

When $i=j$, the covariance reduces to the variance, so the diagonal of the covariance matrix holds the variances of the individual features.
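The construction of $C_{ij}$ can be sketched in pure Python as follows. The sketch assumes the dataset is given as a list of feature series of equal length. The name `covariance_matrix` and the example data are hypothetical.

```python
def covariance_matrix(features):
    """Covariance matrix of a list of equal-length feature series (n - 1 normalization)."""
    n_features = len(features)
    n = len(features[0])
    if any(len(f) != n for f in features):
        raise ValueError("all features must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    means = [sum(f) / n for f in features]
    cov = [[0.0] * n_features for _ in range(n_features)]
    for i in range(n_features):
        # The matrix is symmetric, so only the upper triangle is computed.
        for j in range(i, n_features):
            s = sum(
                (x - means[i]) * (y - means[j])
                for x, y in zip(features[i], features[j])
            )
            cov[i][j] = s / (n - 1)
            cov[j][i] = cov[i][j]
    return cov


# Hypothetical dataset with three features and four observations each.
X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
C = covariance_matrix(X)
# C[i][i] is the variance of feature i; C[0][2] is negative
# because the third feature decreases as the first increases.
```

In practice a library routine is usually preferable. For example, `numpy.cov` treats each row as a variable by default and also normalizes by $n-1$.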


L Ma (2020). 'Covariance Matrix', Datumorphism, 03 April. Available at: https://datumorphism.leima.is/cards/statistics/covariance-matrix/.