Information
Introduction: My Knowledge Cards
Coding Theory Concepts
Published:
Category: { Information }
Tags:
References:
- Shannon’s Source Coding Theorem (Foundations of information theory: Part 3)
- Lecture 8: Source Coding Theorem, Huffman coding by Aarti Singh
- The Source Coding Theorem by Mario S. Alvim
Summary: The code function maps source symbols to code words. The expected code-word length is bounded from below by the entropy of the source distribution $p$.
The Shannon information content, also known as self-information, of the outcome $x = a$ is
$$ - \log_2 p(x=a). $$
The Shannon entropy is the expected information content under the source distribution $p(x)$,
$$ \mathcal H = - \sum_{x\in X} p(x) \log_2 p(x). $$
The Shannon source coding theorem states that $N$ i.i.d. samples from the source can be compressed into roughly $N\mathcal H$ bits, with negligible risk of information loss as $N$ grows.
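As a rough numerical illustration (with made-up numbers, not from the card): for a Bernoulli($p$) source, almost all of the probability sits on sequences with about $Np$ ones, so indexing those sequences takes about $\log_2 \binom{N}{Np} \approx N\mathcal H$ bits. A minimal Python sketch:

```python
import math

# Minimal sketch (illustrative numbers): for a Bernoulli(p) source, indexing
# the length-N sequences with exactly N*p ones takes about log2(C(N, N*p))
# bits, which approaches N * H for large N.
p = 0.1
N = 1_000

# Shannon entropy of the source, in bits per symbol.
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Bits needed to index all length-N sequences with exactly N*p ones.
k = round(N * p)
bits_typical = math.log2(math.comb(N, k))

print(f"N * H          = {N * H:.1f} bits")
print(f"log2 C(N, N*p) = {bits_typical:.1f} bits")  # close to N * H
```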
Fisher Information
Published:
Category: { Information }
Tags:
References:
- Ly A, Marsman M, Verhagen J, Grasman R, Wagenmakers E-J. A Tutorial on Fisher Information. arXiv [math.ST]. 2017. Available: http://arxiv.org/abs/1705.01064
- Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061
Summary: Fisher information is the second moment of the score, i.e., of the sensitivity of the log-likelihood with respect to the parameters.
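A minimal Monte Carlo sketch of this statement, assuming a Bernoulli($p$) model (the numbers are illustrative): the second moment of the score $\partial_p \log f(X;p)$ should match the analytic Fisher information $1/(p(1-p))$.

```python
import numpy as np

# Minimal sketch (illustrative): estimate the Fisher information of a
# Bernoulli(p) model as the second moment of the score and compare it with
# the analytic value 1 / (p * (1 - p)).
rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# Score of the Bernoulli log-likelihood, evaluated at the true parameter.
score = x / p - (1 - x) / (1 - p)

print("Monte Carlo estimate :", np.mean(score**2))
print("Analytic 1/(p(1-p))  :", 1 / (p * (1 - p)))
```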
Fraser Information
Published:
Category: { Information }
Tags:
References:
- Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061
Summary: The Fraser information is
$$ I_F(\theta) = \int g(X) \ln f(X;\theta) \,\mathrm d X. $$
When comparing two models with parameters $\theta_0$ and $\theta_1$, the information gain is
$$ \propto I_F(\theta_1) - I_F(\theta_0). $$
The Fraser information is closely related to [[Fisher information]], Shannon information, and [[Kullback information]].
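A hypothetical numerical sketch of the definition, assuming $g$ is the data-generating density (here $\mathcal N(0,1)$) and $f(\cdot;\theta)$ is the Gaussian model $\mathcal N(\theta, 1)$; the gain $I_F(\theta_1) - I_F(\theta_0)$ is positive when $\theta_1$ is the better model.

```python
import numpy as np

# Hypothetical sketch: g is taken as the data-generating density N(0, 1) and
# f(x; theta) as the model density N(theta, 1). The Fraser information
# I_F(theta) = ∫ g(x) ln f(x; theta) dx is approximated by a Riemann sum.
def gauss(x, mu, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
g = gauss(x, 0.0)

def fraser_information(theta):
    return np.sum(g * np.log(gauss(x, theta))) * dx

# theta_1 = 0.5 is closer to the true parameter 0 than theta_0 = 1.0,
# so the information gain I_F(theta_1) - I_F(theta_0) is positive (~0.375).
print(fraser_information(0.5) - fraser_information(1.0))
```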
Mutual Information
Published:
Category: { Information }
Tags:
References:
- Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218
- Latham PE, Roudi Y. Mutual information. Scholarpedia. 2009;4. doi:10.4249/scholarpedia.1658
Summary: Mutual information is defined as
$$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$
In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$. This makes sense as there would be no “mutual” information if the two variables are independent of each other.
Entropy and Cross Entropy
Mutual information is closely related to entropy. A simple decomposition shows that
$$ I(X;Y) = H(X) - H(X\mid Y), $$
which is the reduction of uncertainty in $X$ after observing $Y$.
KL Divergence
This definition of mutual information is equivalent to the following [[KL Divergence]],
$$ I(X;Y) = \operatorname{D}_{\text{KL}} \left( P_{XY} \parallel P_X P_Y \right). $$
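A small numerical check of these identities, with a made-up $2\times 2$ joint table: the KL form and the $H(X) - H(X\mid Y)$ form give the same value.

```python
import numpy as np

# Small check with a made-up joint table: compute I(X;Y) both as
# E[ln p(x,y) / (p(x) p(y))] and as H(X) - H(X|Y); the two agree.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])           # joint distribution of (X, Y)
p_x = p_xy.sum(axis=1, keepdims=True)     # marginal of X, shape (2, 1)
p_y = p_xy.sum(axis=0, keepdims=True)     # marginal of Y, shape (1, 2)

# KL form.
mi_kl = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

# Entropy form, with H(X|Y) = -sum_{x,y} p(x,y) ln p(x|y).
h_x = -np.sum(p_x * np.log(p_x))
h_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))

print(mi_kl, h_x - h_x_given_y)           # equal up to floating point
```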
Shannon Entropy
Published:
Category: { Information }
Tags:
Summary: Shannon entropy is the expectation of the information content $I(X) = -\log p(X)$1,
\begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation}
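For example, a biased coin with $p = (0.9, 0.1)$ has
$$ H(p) = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 \approx 0.47 \text{ bits}, $$
which is less than the $1$ bit of a fair coin.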
Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Entropy_(information_theory) ↩︎
Jensen-Shannon Divergence
Published:
Category: { Information }
Tags:
Summary: The Jensen-Shannon divergence is a symmetric divergence of distributions $P$ and $Q$,
$$ \operatorname{D}_{\text{JS}}(P \Vert Q) = \frac{1}{2} \left[ \operatorname{D}_{\text{KL}} \left(P \bigg\Vert \frac{P+Q}{2} \right) + \operatorname{D}_{\text{KL}} \left(Q \bigg\Vert \frac{P+Q}{2}\right) \right], $$
where $\operatorname{D}_{\text{KL}}$ is the [[KL Divergence]].
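A minimal Python sketch of this definition for discrete distributions (with made-up probability vectors), checking that the divergence is symmetric in $P$ and $Q$:

```python
import numpy as np

# Minimal sketch for discrete distributions (made-up numbers): compute the
# Jensen-Shannon divergence from its definition and check its symmetry.
def kl(p, q):
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)                      # the mixture (P + Q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

print(js(p, q), js(q, p))                  # equal: D_JS is symmetric
```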
f-Divergence
Published:
Category: { Information }
Tags:
References:
- Contributors to Wikimedia projects. F-divergence. In: Wikipedia [Internet]. 17 Jul 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/F-divergence
- Nowozin S, Cseke B, Tomioka R. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv [stat.ML]. 2016. Available: http://arxiv.org/abs/1606.00709
Summary: The f-divergence is defined as1
$$ \operatorname{D}_f = \int f\left(\frac{p}{q}\right) q\mathrm d\mu, $$
where $p$ and $q$ are densities with respect to a reference measure $\mu$.
Requirements on the generating function
The generating function $f$ is required to be convex and to satisfy $f(1) = 0$. For $f(x) = x \log x$ with $x = p/q$, the f-divergence reduces to the KL divergence
$$ \begin{align} &\int f\left(\frac{p}{q}\right) q\mathrm d\mu \\ =& \int \frac{p}{q} \log \left( \frac{p}{q} \right) \mathrm d\mu \\ =& \int p \log \left( \frac{p}{q} \right) \mathrm d\mu. \end{align} $$
For more special cases of the f-divergence, refer to Wikipedia1. Nowozin et al. also provide a concise review of f-divergences2.
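A small sketch of the definition for discrete $p$ and $q$ (the counting measure playing the role of $\mu$): with the generator $f(x) = x\log x$ it reproduces the KL divergence derived above, and other generators, e.g. $f(x) = \tfrac{1}{2}\lvert x - 1\rvert$ for total variation, drop in the same way.

```python
import numpy as np

# Sketch for discrete p and q (counting measure as mu):
# D_f(p || q) = sum_i q_i * f(p_i / q_i).
def f_divergence(p, q, f):
    return np.sum(q * f(p / q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# Generator f(x) = x log x recovers the KL divergence.
kl_via_f = f_divergence(p, q, lambda x: x * np.log(x))
kl_direct = np.sum(p * np.log(p / q))
print(kl_via_f, kl_direct)                 # identical

# Generator f(x) = |x - 1| / 2 gives the total variation distance.
print(f_divergence(p, q, lambda x: 0.5 * np.abs(x - 1)))
```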
Cross Entropy
Published:
Category: { Information }
Tags:
References:
- Contributors to Wikimedia projects. Cross entropy. In: Wikipedia [Internet]. 4 Jul 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Cross_entropy
- Mehta P, Wang C-H, Day AGR, Richardson C, Bukov M, Fisher CK, et al. A high-bias, low-variance introduction to Machine Learning for physicists. Phys Rep. 2019;810: 1–124. doi:10.1016/j.physrep.2019.03.001
Summary: Cross entropy is1
$$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$
Cross entropy $H(p, q)$ can also be decomposed,
$$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$
where $H(p)$ is the [[Shannon Entropy]] of $p$ and $\operatorname{D}_{\mathrm{KL}}$ is the [[KL Divergence]].
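A quick numerical check of this decomposition for two made-up discrete distributions:

```python
import numpy as np

# Quick check (made-up numbers): H(p, q) = H(p) + D_KL(p || q).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

cross_entropy = -np.sum(p * np.log(q))
entropy_p = -np.sum(p * np.log(p))
kl_pq = np.sum(p * np.log(p / q))

print(cross_entropy, entropy_p + kl_pq)    # equal up to floating point
```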