# Kolmogorov Complexity

A naive definition of **description** of a sequence $\mathcal D=\{x_1, \cdots, x_i, \cdots, x_N\}$ is a program $p$, which can be mapped to $\mathcal D$ through a map $f$, i.e., $f(p)=\mathcal D$. For example, a universal Turing machine can serve as a map $f$ and is mostly independent of the tasks.

For a given data sequence of length $N$, denoted as $\mathcal D = \{x_1, \cdots, x_i, \cdots, x_N\}$, the length of the shortest description of $\mathcal D$ using a specific universal language $l$ is denoted as $L_l(\mathcal D)$ (length of $p$).

The **invariance theorem** indicates that the difference between two shortest descriptions using two different universal languages $l_1$ and $l_2$ is negligible when the length of the sequence $N$ becomes large, ^{1}

$$ \begin{align} \lim_{N\to\infty} \frac{L_{l_1}(\mathcal D) - L_{l_2}(\mathcal D) }{N} = 0. \end{align} $$

Kolmogorov complexity $C_f$ is ^{2}

$$ C_f(x) = \begin{cases} \operatorname{min}\{ \vert p \vert : f(p) = x \} & \text{if x is generated by f} \\ \infty & \text{otherwise} \end{cases} $$

L Ma (2020). 'Kolmogorov Complexity', Datumorphism, 11 April. Available at: https://datumorphism.leima.is/cards/statistics/kolmogorov-complexity/.