The cross entropy of a distribution $q$ relative to a reference distribution $p$ is defined as [1]

$$ H(p, q) = \mathbb E_{p} \left[ -\log q \right]. $$
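For discrete distributions the expectation becomes a sum, $H(p, q) = -\sum_i p_i \log q_i$. The sketch below assumes both distributions are given as Python lists of probabilities over the same outcomes and uses the natural log, so results are in nats. The function name `cross_entropy` is illustrative, not from any particular library.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = E_p[-log q] for discrete distributions given as lists.

    Uses the natural log (nats). Outcomes with p_i = 0 contribute nothing;
    if q_i = 0 where p_i > 0, the cross entropy is infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue
        if q_i == 0:
            return math.inf
        total -= p_i * math.log(q_i)
    return total

p = [0.5, 0.5]   # reference distribution
q = [0.9, 0.1]   # model distribution
print(cross_entropy(p, q))  # about 1.204 nats
```

Outcomes with $p_i = 0$ are skipped because $0 \log q_i$ is taken as $0$; an outcome that $p$ deems possible but $q$ assigns zero probability makes the cross entropy infinite.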

Cross entropy $H(p, q)$ can also be decomposed as

$$ H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right), $$
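The decomposition can be checked numerically. This is a minimal sketch assuming discrete distributions as lists, natural logs, and $q_i > 0$ wherever $p_i > 0$; the helper names and the example distributions are hypothetical.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i log p_i, skipping zero-probability outcomes
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log q_i (assumes q_i > 0 wherever p_i > 0)
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i log(p_i / q_i)
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs, math.isclose(lhs, rhs))  # the two sides agree up to rounding
```

Because $H(p)$ does not depend on $q$, minimizing $H(p, q)$ over $q$ is equivalent to minimizing $\operatorname{D}_{\mathrm{KL}}(p \parallel q)$.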

where $H(p)$ is the Shannon entropy of $p$, i.e., the expectation of the information content $I(X) = -\log p(X)$,

$$ H(p) = \mathbb E_{p}\left[ -\log p \right], $$

and $\operatorname{D}_{\mathrm{KL}}$ is the Kullback–Leibler (KL) divergence, which measures how much $q$ diverges from $p$. It is non-negative, equals zero only when $p = q$, and is not symmetric in its arguments.
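KL divergence is not a distance: swapping its arguments generally changes the value. A small sketch, assuming discrete distributions as lists and natural logs (the distributions are hypothetical), shows both the asymmetry and the zero value for identical distributions.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) = sum_i p_i log(p_i / q_i), in nats.

    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite.
    """
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

print(kl_divergence(p, q))  # D_KL(p || q), equal to log(5/3) here
print(kl_divergence(q, p))  # D_KL(q || p), a different, smaller value
print(kl_divergence(p, p))  # 0.0 for identical distributions
```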

Cross entropy is widely used as a loss function in classification problems, e.g., logistic regression, a simple model for classification [2].
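In binary logistic regression, the label defines $p = (y, 1 - y)$ and the model's predicted probability $\hat y = \sigma(wx + b)$ defines $q = (\hat y, 1 - \hat y)$, so the per-example loss is $H(p, q) = -[y \log \hat y + (1 - y) \log (1 - \hat y)]$. The sketch below fits a one-feature model with plain batch gradient descent; the toy data, learning rate, and step count are hypothetical choices for illustration, not taken from the source.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y, y_hat, eps=1e-12):
    # H(p, q) with p = (y, 1 - y) and q = (y_hat, 1 - y_hat)
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Hypothetical 1-D toy data: feature values with binary labels
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
lr = 0.5  # hypothetical learning rate
for step in range(200):
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        # For sigmoid + cross entropy, d(loss)/dz simplifies to y_hat - y
        err = sigmoid(w * x + b) - y
        grad_w += err * x
        grad_b += err
    w -= lr * grad_w / len(xs)
    b -= lr * grad_b / len(xs)

mean_loss = sum(binary_cross_entropy(y, sigmoid(w * x + b))
                for x, y in zip(xs, ys)) / len(xs)
print(w, b, mean_loss)  # mean loss falls below log(2), the loss at w = b = 0
```

The gradient of the cross-entropy loss with respect to the logit reduces to $\hat y - y$, which is one reason cross entropy pairs naturally with the sigmoid output of logistic regression.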

