Coding Theory Concepts

Category: { Information }
Summary: The code function maps source symbols to code words. The expected code-word length is bounded from below by the entropy of the source distribution $p$. The Shannon information content, a.k.a. self-information, of the outcome $x=a$ is $$ - \log_2 p(x=a). $$ The Shannon entropy is the expected information content under the probability distribution $p(x)$, $$ \mathcal H = - \sum_{x\in X} p(x) \log_2 p(x). $$ The Shannon source coding theorem says that $N$ samples from the source can be compressed into roughly $N\mathcal H$ bits.
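A minimal numerical sketch of these quantities; the four-symbol source distribution below is an assumption for illustration.

```python
import math

# Hypothetical source distribution p(x)
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon information content (self-information) of each outcome, -log2 p(x=a)
info = {x: -math.log2(px) for x, px in p.items()}

# Shannon entropy: the expected information content
H = sum(px * -math.log2(px) for px in p.values())

N = 1000  # number of samples drawn from the source
print(info)   # {'a': 1.0, 'b': 2.0, 'c': 3.0, 'd': 3.0}
print(H)      # 1.75 bits per symbol
print(N * H)  # the sequence compresses into roughly 1750 bits
```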

Fisher Information

Category: { Information }
Summary: Fisher information measures the second moment of the model's sensitivity, i.e., of the derivative of the log-likelihood with respect to the parameters.
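A rough Monte Carlo sketch of this definition; the unit-variance Gaussian model and the sample size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 0.0, 1.0
x = rng.normal(theta, sigma, size=100_000)  # samples from the assumed model

# Score of N(theta, sigma^2): d/dtheta log f(x; theta) = (x - theta) / sigma^2
score = (x - theta) / sigma**2

# Fisher information as the second moment of the score
fisher_estimate = np.mean(score**2)
print(fisher_estimate)  # close to the exact value 1 / sigma^2 = 1.0
```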

Fraser Information

Category: { Information }
Summary: The Fraser information is $$ I_F(\theta) = \int g(X) \ln f(X;\theta) \, \mathrm d X. $$ When comparing two models, $\theta_0$ and $\theta_1$, the information gain is $$ \propto \left( I_F(\theta_1) - I_F(\theta_0) \right). $$ The Fraser information is closely related to the [[Fisher information]], the Shannon information, and the [[Kullback information]]1. 1. Fraser DAS. On Information in Statistics. Ann Math Stat. 1965;36: 890–896. doi:10.1214/aoms/1177700061 ↩︎
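A numerical sketch of $I_F(\theta)$ for an assumed data-generating density $g = \mathcal N(0, 1)$ and model $f(x;\theta) = \mathcal N(\theta, 1)$; the grid and the two parameter values are illustrative.

```python
import numpy as np

def normal_pdf(x, mu):
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2.0 * np.pi)

x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
g = normal_pdf(x, 0.0)  # assumed data-generating density g(X)

def fraser_information(theta):
    # I_F(theta) = integral of g(X) ln f(X; theta) dX, approximated by a Riemann sum
    return np.sum(g * np.log(normal_pdf(x, theta))) * dx

# Information gain of theta_1 = 0 over theta_0 = 1 is proportional to the difference
print(fraser_information(0.0) - fraser_information(1.0))  # about 0.5, favours theta_1
```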

Mutual Information

Category: { Information }
Summary: Mutual information is defined as $$ I(X;Y) = \mathbb E_{P_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$ In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$. This makes sense, as there would be no “mutual” information if the two variables are independent of each other. Mutual information is closely related to entropy. A simple decomposition shows that $$ I(X;Y) = H(X) - H(X\mid Y), $$ which is the reduction of uncertainty in $X$ after observing $Y$. The definition is also equivalent to the [[KL Divergence]] between the joint distribution and the product of the marginals, $$ I(X;Y) = \operatorname{D}_{\text{KL}} \left( P_{XY} \parallel P_X P_Y \right). $$
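A small numerical check of these identities; the 2×2 joint distribution below is an assumption for illustration, with natural logarithms throughout.

```python
import numpy as np

p_xy = np.array([[0.30, 0.20],
                 [0.10, 0.40]])        # assumed joint distribution P_XY
p_x = p_xy.sum(axis=1, keepdims=True)  # marginal P_X, shape (2, 1)
p_y = p_xy.sum(axis=0, keepdims=True)  # marginal P_Y, shape (1, 2)

# I(X;Y) = E_{P_XY} ln( P_XY / (P_X P_Y) )
mi = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

# Equivalent decomposition: H(X) - H(X|Y)
h_x = -np.sum(p_x * np.log(p_x))
h_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))
print(mi, h_x - h_x_given_y)  # the two expressions agree (about 0.086 nats)
```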

Shannon Entropy

Category: { Information }
Summary: Shannon entropy $H$ is the expectation of the information content $I(X)=-\log \left(p\right)$1, \begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation} 1. Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Entropy_(information_theory) ↩︎
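A minimal sketch of this expectation for a Bernoulli variable; the bias values are assumptions for illustration, using natural logarithms.

```python
import math

def bernoulli_entropy(p):
    # H(p) = E_p[-log p] for a two-outcome distribution, in nats
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

for bias in (0.1, 0.3, 0.5):
    print(bias, bernoulli_entropy(bias))  # largest for the fair coin, bias = 0.5
```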

Jensen-Shannon Divergence

Category: { Information }
Summary: The Jensen-Shannon divergence is a symmetric divergence of distributions $P$ and $Q$, $$ \operatorname{D}_{\text{JS}}(P \parallel Q) = \frac{1}{2} \left[ \operatorname{D}_{\text{KL}} \left(P \bigg\Vert \frac{P+Q}{2} \right) + \operatorname{D}_{\text{KL}} \left(Q \bigg\Vert \frac{P+Q}{2}\right) \right], $$ where $\operatorname{D}_{\text{KL}}$ is the [[KL Divergence]].
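A minimal sketch of this formula for two discrete distributions; the example distributions are assumptions for illustration.

```python
import numpy as np

def kl(p, q):
    # D_KL(p || q) for discrete distributions, natural log
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)  # the mixture (P + Q) / 2
    return 0.5 * (kl(p, m) + kl(q, m))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])
print(js(p, q), js(q, p))  # equal: the divergence is symmetric
```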

f-Divergence

Category: { Information }
Summary: The f-divergence is defined as1 $$ \operatorname{D}_f = \int f\left(\frac{p}{q}\right) q\,\mathrm d\mu, $$ where $p$ and $q$ are two densities and $\mu$ is a reference measure. The generating function $f$ is required to be convex, with $f(1) = 0$. For $f(x) = x \log x$ with $x=p/q$, the f-divergence reduces to the KL divergence, $$ \begin{align} &\int f\left(\frac{p}{q}\right) q\,\mathrm d\mu \\ =& \int \frac{p}{q} \log \left( \frac{p}{q} \right) q\,\mathrm d\mu \\ =& \int p \log \left( \frac{p}{q} \right) \mathrm d\mu. \end{align} $$ For more special cases of f-divergence, please refer to Wikipedia1. Nowozin et al. also provide a concise review of f-divergences2.
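A sketch of the discrete analogue $\operatorname{D}_f(p \Vert q) = \sum_x q(x)\, f\!\left(p(x)/q(x)\right)$, checking that the generator $f(x) = x\log x$ reproduces the KL divergence; the example distributions are assumptions for illustration.

```python
import numpy as np

def f_divergence(p, q, f):
    # Discrete analogue of D_f = integral of f(p/q) q dmu
    return np.sum(q * f(p / q))

p = np.array([0.2, 0.5, 0.3])  # assumed example distributions
q = np.array([0.4, 0.4, 0.2])

kl_generator = lambda x: x * np.log(x)   # f(x) = x log x, convex with f(1) = 0
print(f_divergence(p, q, kl_generator))  # matches the KL divergence
print(np.sum(p * np.log(p / q)))         # direct D_KL(p || q) for comparison
```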

Cross Entropy

Category: { Information }
Summary: Cross entropy is1 $$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$ Cross entropy $H(p, q)$ can also be decomposed, $$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$ where $H(p)$ is the [[entropy of $P$]] and $\operatorname{D}_{\mathrm{KL}}$ is the [[KL Divergence]].
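A quick numerical check of this decomposition; the pair of discrete distributions below is an assumption for illustration.

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

cross_entropy = -np.sum(p * np.log(q))   # H(p, q) = E_p[-log q]
entropy_p = -np.sum(p * np.log(p))       # H(p)
kl_pq = np.sum(p * np.log(p / q))        # D_KL(p || q)
print(cross_entropy, entropy_p + kl_pq)  # the two sides agree
```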