Cross Entropy

#Information Theory #Entropy

The cross entropy of a distribution $q$ relative to a distribution $p$ is defined as [1]

$$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$
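For discrete distributions the expectation above becomes a sum, $H(p, q) = -\sum_x p(x) \log q(x)$. The following is a minimal sketch of that sum in plain Python. The function name `cross_entropy`, the example distributions, and the use of the natural logarithm (so the result is in nats) are illustrative assumptions, not part of the original note.

```python
import math


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) * log q(x), in nats.

    `p` and `q` are sequences of probabilities over the same outcomes.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of outcomes")
    total = 0.0
    for p_x, q_x in zip(p, q):
        if p_x == 0.0:
            continue  # convention: 0 * log q(x) contributes nothing
        if q_x == 0.0:
            return math.inf  # p puts mass where q puts none
        total -= p_x * math.log(q_x)
    return total


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

print(cross_entropy(p, q))  # cross entropy of q relative to p
print(cross_entropy(p, p))  # equals the Shannon entropy of p
```

Note that the function is not symmetric in its arguments: the expectation is always taken under the first distribution.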

Cross entropy $H(p, q)$ can also be decomposed,

$$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$

where $H(p)$ is the Shannon entropy of $p$, i.e., the expectation of the information content $I(X) = -\log p(X)$ (Contributors to Wikimedia projects. Entropy (information theory). In: Wikipedia [Internet]. 29 Aug 2021 [cited 4 Sep 2021]),

$$ H(p) = \mathbb E_{p}\left[ -\log p \right], $$

and $\operatorname{D}_{\mathrm{KL}}$ is the Kullback–Leibler (KL) divergence, which indicates the difference between two distributions.
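The decomposition follows from writing $-\log q = -\log p + \log (p/q)$ inside the expectation over $p$. As a rough numerical check for discrete distributions, the sketch below compares both sides. The helper names and the example distributions are assumptions made for illustration, and the code assumes $q(x) > 0$ wherever $p(x) > 0$.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum_x p(x) * log p(x), in nats."""
    return -sum(p_x * math.log(p_x) for p_x in p if p_x > 0.0)


def kl_divergence(p, q):
    """KL divergence D_KL(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Assumes q(x) > 0 wherever p(x) > 0.
    """
    return sum(p_x * math.log(p_x / q_x) for p_x, q_x in zip(p, q) if p_x > 0.0)


def cross_entropy(p, q):
    """Cross entropy H(p, q) = -sum_x p(x) * log q(x), in nats."""
    return -sum(p_x * math.log(q_x) for p_x, q_x in zip(p, q) if p_x > 0.0)


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)

# Both sides agree up to floating-point error.
assert math.isclose(lhs, rhs)
```

Because the KL divergence is non-negative, the check also illustrates that $H(p, q) \geq H(p)$, with equality when $q = p$.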

Cross entropy is widely used as a loss function in classification problems, e.g., logistic regression, a simple model for classification [2].
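To make the link to logistic regression concrete, here is a hedged sketch of the binary cross-entropy loss. For each sample, $p$ is the distribution $(1 - y, y)$ given by the true label $y \in \{0, 1\}$, and $q$ is the predicted distribution $(1 - \hat y, \hat y)$ with $\hat y = \sigma(w x + b)$. The data, the parameter values `w` and `b`, and the clipping constant `eps` are all hypothetical choices for illustration.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean cross entropy between true labels and predicted probabilities.

    `labels` are 0 or 1; `probs` are predicted probabilities of class 1.
    Probabilities are clipped to [eps, 1 - eps] to avoid log(0).
    """
    total = 0.0
    for y, y_hat in zip(labels, probs):
        y_hat = min(max(y_hat, eps), 1.0 - eps)
        total -= y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return total / len(labels)


# Hypothetical one-dimensional data set and model parameters.
features = [-2.0, -0.5, 0.5, 2.0]
labels = [0, 0, 1, 1]
w, b = 1.5, 0.0

probs = [sigmoid(w * x + b) for x in features]
loss = binary_cross_entropy(labels, probs)
print(loss)

# A model with the wrong sign of w fits the labels worse, so its loss is higher.
bad_probs = [sigmoid(-w * x + b) for x in features]
print(binary_cross_entropy(labels, bad_probs))
```

Training a logistic regression model amounts to choosing `w` and `b` to minimize this quantity; the sketch only evaluates the loss and does not perform the optimization.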

  1. Contributors to Wikimedia projects. Cross entropy. In: Wikipedia [Internet]. 4 Jul 2021 [cited 4 Sep 2021].

  2. Mehta P, Wang C-H, Day AGR, Richardson C, Bukov M, Fisher CK, et al. A high-bias, low-variance introduction to Machine Learning for physicists. Phys Rep. 2019;810: 1–124. doi:10.1016/j.physrep.2019.03.001

Lei Ma (2021). 'Cross Entropy', Datumorphism, 09 April. Available at:
