Information
Introduction: My Knowledge Cards
Coding Theory Concepts
Published:
Category: { Information }
Tags:
References:
- Shannon’s Source Coding Theorem (Foundations of information theory: Part 3)
- Lecture 8: Source Coding Theorem, Huffman coding by Aarti Singh
- The Source Coding Theorem by Mario S. Alvim
Summary: The code function maps source symbols to code words. The expected code-word length is bounded from below by the entropy of the source distribution $p$.
The Shannon information content, also known as self-information, of the outcome $x = a$ is
$$ - \log_2 p(x=a). $$
The Shannon entropy is the expected information content under the source distribution $p(x)$,
$$ \mathcal H = - \sum_{x\in X} p(x) \log_2 p(x). $$
The Shannon source coding theorem states that $N$ i.i.d. samples from the source can be compressed into roughly $N\mathcal H$ bits, with negligible risk of information loss as $N$ grows.
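As a rough numerical illustration (with made-up numbers, not from the card): for a Bernoulli($p$) source, almost all of the probability sits on sequences with about $Np$ ones, so indexing those sequences takes about $\log_2 \binom{N}{Np} \approx N\mathcal H$ bits. A minimal Python sketch:

```python
import math

# Minimal sketch (illustrative numbers): for a Bernoulli(p) source, indexing
# the length-N sequences with exactly N*p ones takes about log2(C(N, N*p))
# bits, which approaches N * H for large N.
p = 0.1
N = 1_000

# Shannon entropy of the source, in bits per symbol.
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Bits needed to index all length-N sequences with exactly N*p ones.
k = round(N * p)
bits_typical = math.log2(math.comb(N, k))

print(f"N * H          = {N * H:.1f} bits")
print(f"log2 C(N, N*p) = {bits_typical:.1f} bits")  # close to N * H
```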
Fisher Information
Published:
Category: { Information }
Tags:
References:
- Ly A, Marsman M, Verhagen J, Grasman R, Wagenmakers E-J. A Tutorial on Fisher Information. arXiv [math.ST]. 2017. Available: http://arxiv.org/abs/1705.01064
- Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061
Summary: Fisher information is the second moment of the score, i.e., of the sensitivity of the log-likelihood with respect to the parameters.
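A minimal Monte Carlo sketch of this statement, assuming a Bernoulli($p$) model (the numbers are illustrative): the second moment of the score $\partial_p \log f(X;p)$ should match the analytic Fisher information $1/(p(1-p))$.

```python
import numpy as np

# Minimal sketch (illustrative): estimate the Fisher information of a
# Bernoulli(p) model as the second moment of the score and compare it with
# the analytic value 1 / (p * (1 - p)).
rng = np.random.default_rng(0)
p = 0.3
x = rng.binomial(1, p, size=1_000_000)

# Score of the Bernoulli log-likelihood, evaluated at the true parameter.
score = x / p - (1 - x) / (1 - p)

print("Monte Carlo estimate :", np.mean(score**2))
print("Analytic 1/(p(1-p))  :", 1 / (p * (1 - p)))
```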
Fraser Information
Published:
Category: { Information }
Tags:
References:
- Fraser DAS. On Information in Statistics. aoms. 1965;36: 890–896. doi:10.1214/aoms/1177700061
Summary: The Fraser information is
$$ I_F(\theta) = \int g(X) \ln f(X;\theta) \,\mathrm d X. $$
When comparing two models with parameters $\theta_0$ and $\theta_1$, the information gain is
$$ \propto I_F(\theta_1) - I_F(\theta_0). $$
The Fraser information is closely related to [[Fisher information]], Shannon information, and [[Kullback information]].
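A hypothetical numerical sketch of the definition, assuming $g$ is the data-generating density (here $\mathcal N(0,1)$) and $f(\cdot;\theta)$ is the Gaussian model $\mathcal N(\theta, 1)$; the gain $I_F(\theta_1) - I_F(\theta_0)$ is positive when $\theta_1$ is the better model.

```python
import numpy as np

# Hypothetical sketch: g is taken as the data-generating density N(0, 1) and
# f(x; theta) as the model density N(theta, 1). The Fraser information
# I_F(theta) = ∫ g(x) ln f(x; theta) dx is approximated by a Riemann sum.
def gauss(x, mu, sigma=1.0):
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)

x = np.linspace(-10.0, 10.0, 20_001)
dx = x[1] - x[0]
g = gauss(x, 0.0)

def fraser_information(theta):
    return np.sum(g * np.log(gauss(x, theta))) * dx

# theta_1 = 0.5 is closer to the true parameter 0 than theta_0 = 1.0,
# so the information gain I_F(theta_1) - I_F(theta_0) is positive (~0.375).
print(fraser_information(0.5) - fraser_information(1.0))
```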
Mutual Information
Published:
Category: { Information }
Tags:
References:
- Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218
- Latham PE, Roudi Y. Mutual information. Scholarpedia. 2009;4. doi:10.4249/scholarpedia.1658
Summary: Mutual information is defined as
$$ I(X;Y) = \mathbb E_{p_{XY}} \ln \frac{P_{XY}}{P_X P_Y}. $$
In the case that $X$ and $Y$ are independent variables, we have $P_{XY} = P_X P_Y$, thus $I(X;Y) = 0$. This makes sense as there would be no “mutual” information if the two variables are independent of each other.
Entropy and Cross Entropy
Mutual information is closely related to entropy. A simple decomposition shows that
$$ I(X;Y) = H(X) - H(X\mid Y), $$
which is the reduction of uncertainty in $X$ after observing $Y$.
KL Divergence
This definition of mutual information is equivalent to the following [[KL Divergence]],
$$ I(X;Y) = \operatorname{D}_{\text{KL}} \left( P_{XY} \parallel P_X P_Y \right). $$
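A small numerical check of these identities, with a made-up $2\times 2$ joint table: the KL form and the $H(X) - H(X\mid Y)$ form give the same value.

```python
import numpy as np

# Small check with a made-up joint table: compute I(X;Y) both as
# E[ln p(x,y) / (p(x) p(y))] and as H(X) - H(X|Y); the two agree.
p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])           # joint distribution of (X, Y)
p_x = p_xy.sum(axis=1, keepdims=True)     # marginal of X, shape (2, 1)
p_y = p_xy.sum(axis=0, keepdims=True)     # marginal of Y, shape (1, 2)

# KL form.
mi_kl = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))

# Entropy form, with H(X|Y) = -sum_{x,y} p(x,y) ln p(x|y).
h_x = -np.sum(p_x * np.log(p_x))
h_x_given_y = -np.sum(p_xy * np.log(p_xy / p_y))

print(mi_kl, h_x - h_x_given_y)           # equal up to floating point
```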
Shannon Entropy
Published:
Category: { Information }
Tags:
Summary: Shannon entropy is the expectation of the information content $I(X) = -\log p(X)$1,
\begin{equation} H(p) = \mathbb E_{p}\left[ -\log \left(p\right) \right]. \end{equation}
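For example, a biased coin with $p = (0.9, 0.1)$ has
$$ H(p) = -0.9 \log_2 0.9 - 0.1 \log_2 0.1 \approx 0.47 \text{ bits}, $$
which is less than the $1$ bit of a fair coin.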
Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Entropy_(information_theory) ↩︎
Jensen-Shannon Divergence
Published:
Category: { Information }
Tags:
Summary: The Jensen-Shannon divergence is a symmetric divergence of distributions $P$ and $Q$,
$$ \operatorname{D}_{\text{JS}}(P \Vert Q) = \frac{1}{2} \left[ \operatorname{D}_{\text{KL}} \left(P \bigg\Vert \frac{P+Q}{2} \right) + \operatorname{D}_{\text{KL}} \left(Q \bigg\Vert \frac{P+Q}{2}\right) \right], $$
where $\operatorname{D}_{\text{KL}}$ is the [[KL Divergence]].
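A minimal Python sketch of this definition for discrete distributions (with made-up probability vectors), checking that the divergence is symmetric in $P$ and $Q$:

```python
import numpy as np

# Minimal sketch for discrete distributions (made-up numbers): compute the
# Jensen-Shannon divergence from its definition and check its symmetry.
def kl(p, q):
    return np.sum(p * np.log(p / q))

def js(p, q):
    m = 0.5 * (p + q)                      # the mixture (P + Q) / 2
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

print(js(p, q), js(q, p))                  # equal: D_JS is symmetric
```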
f-Divergence
Published:
Category: { Information }
Tags:
References:
- Contributors to Wikimedia projects. F-divergence. In: Wikipedia [Internet]. 17 Jul 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/F-divergence
- Nowozin S, Cseke B, Tomioka R. f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization. arXiv [stat.ML]. 2016. Available: http://arxiv.org/abs/1606.00709
Summary: The f-divergence is defined as1
$$ \operatorname{D}_f = \int f\left(\frac{p}{q}\right) q\mathrm d\mu, $$
where $p$ and $q$ are densities with respect to a reference measure $\mu$.
Requirements on the generating function
The generating function $f$ is required to be convex and to satisfy $f(1) = 0$. For $f(x) = x \log x$ with $x = p/q$, the f-divergence reduces to the KL divergence
$$ \begin{align} &\int f\left(\frac{p}{q}\right) q\mathrm d\mu \\ =& \int \frac{p}{q} \log \left( \frac{p}{q} \right) \mathrm d\mu \\ =& \int p \log \left( \frac{p}{q} \right) \mathrm d\mu. \end{align} $$
For more special cases of the f-divergence, refer to Wikipedia1. Nowozin et al. also provide a concise review of f-divergences2.
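A small sketch of the definition for discrete $p$ and $q$ (the counting measure playing the role of $\mu$): with the generator $f(x) = x\log x$ it reproduces the KL divergence derived above, and other generators, e.g. $f(x) = \tfrac{1}{2}\lvert x - 1\rvert$ for total variation, drop in the same way.

```python
import numpy as np

# Sketch for discrete p and q (counting measure as mu):
# D_f(p || q) = sum_i q_i * f(p_i / q_i).
def f_divergence(p, q, f):
    return np.sum(q * f(p / q))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# Generator f(x) = x log x recovers the KL divergence.
kl_via_f = f_divergence(p, q, lambda x: x * np.log(x))
kl_direct = np.sum(p * np.log(p / q))
print(kl_via_f, kl_direct)                 # identical

# Generator f(x) = |x - 1| / 2 gives the total variation distance.
print(f_divergence(p, q, lambda x: 0.5 * np.abs(x - 1)))
```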
Cross Entropy
Published:
Category: { Information }
Tags:
References:
- Contributors to Wikimedia projects. Cross entropy. In: Wikipedia [Internet]. 4 Jul 2021 [cited 4 Sep 2021]. Available: https://en.wikipedia.org/wiki/Cross_entropy
- Mehta P, Wang C-H, Day AGR, Richardson C, Bukov M, Fisher CK, et al. A high-bias, low-variance introduction to Machine Learning for physicists. Phys Rep. 2019;810: 1–124. doi:10.1016/j.physrep.2019.03.001
Summary: Cross entropy is1
$$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$
Cross entropy $H(p, q)$ can also be decomposed,
$$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$
where $H(p)$ is the [[Shannon Entropy]] of $p$ and $\operatorname{D}_{\mathrm{KL}}$ is the [[KL Divergence]].
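A quick numerical check of this decomposition for two made-up discrete distributions:

```python
import numpy as np

# Quick check (made-up numbers): H(p, q) = H(p) + D_KL(p || q).
p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.3, 0.2])

cross_entropy = -np.sum(p * np.log(q))
entropy_p = -np.sum(p * np.log(p))
kl_pq = np.sum(p * np.log(p / q))

print(cross_entropy, entropy_p + kl_pq)    # equal up to floating point
```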