A coding function maps source symbols to code words. The expected length of a code word is bounded below by the entropy of the source distribution $p$.
The Shannon information content, also known as self-information, of the outcome $x=a$ is
$$ - \log_2 p(x=a). $$
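As a quick sketch of this quantity (a minimal Python example; the function name `self_information` is my own, not from any standard library):

```python
import math

def self_information(p: float) -> float:
    """Shannon information content of an outcome with probability p, in bits."""
    return -math.log2(p)

# A fair coin flip carries exactly 1 bit; rarer outcomes carry more.
print(self_information(0.5))    # 1.0
print(self_information(0.125))  # 3.0
```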
The Shannon entropy is the expected information content of a random variable $X$ with probability distribution $p(x)$,
$$ \mathcal H = - \sum_{x\in X} p(x) \log_2 p(x). $$
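Computing the entropy numerically (a small sketch, assuming the distribution is given as a plain dict of outcome probabilities):

```python
import math

def entropy(p: dict) -> float:
    """Shannon entropy, in bits, of a distribution {outcome: probability}."""
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

# A uniform distribution over four outcomes carries 2 bits;
# a skewed binary distribution carries less than 1 bit.
print(entropy({"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}))  # 2.0
print(entropy({"a": 0.9, "b": 0.1}))                          # ~0.469
```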
The Shannon source coding theorem states that $N$ i.i.d. samples from the source can be compressed into roughly $N\mathcal H$ bits, and essentially no fewer, as $N$ grows large.