The cross entropy of a distribution $q$ relative to a distribution $p$ is

$$H(p, q) = \mathbb E_{p} \left[ -\log q \right].$$
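For discrete distributions, the expectation becomes a weighted sum over outcomes. The sketch below makes a few assumptions:

- distributions are stored as equal-length Python lists;
- the natural logarithm is used, so the result is in nats;
- the function name `cross_entropy` is chosen only for illustration.

```python
import math

def cross_entropy(p, q):
    """H(p, q) = E_p[-log q] for discrete distributions given as equal-length lists.

    Assumes natural log (nats). Outcomes with p_i == 0 are skipped,
    following the convention 0 * log(q_i) = 0.
    """
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.5, 0.5]     # "true" distribution
q = [0.25, 0.75]   # approximating distribution
print(cross_entropy(p, q))
```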

Cross entropy $H(p, q)$ can also be decomposed into two terms,

$$H(p, q) = H(p) + \operatorname{D}_{\mathrm{KL}} \left( p \parallel q \right),$$

where $H(p)$ is the entropy of $p$ and $\operatorname{D}_{\mathrm{KL}}$ is the Kullback-Leibler divergence.
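The decomposition can be checked numerically. This is a minimal sketch, again assuming discrete distributions stored as lists and natural logarithms. It computes each term separately. The helper names `entropy`, `kl_divergence`, and `cross_entropy` and the example distributions are illustrative choices, not taken from any library.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i log p_i, skipping zero-probability outcomes
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)

def kl_divergence(p, q):
    # D_KL(p || q) = sum_i p_i log(p_i / q_i), skipping p_i == 0
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i log q_i
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

p = [0.2, 0.5, 0.3]
q = [0.1, 0.6, 0.3]
lhs = cross_entropy(p, q)
rhs = entropy(p) + kl_divergence(p, q)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

Because $\operatorname{D}_{\mathrm{KL}}(p \parallel q) \ge 0$, the check also illustrates that $H(p, q) \ge H(p)$. Equality holds only when $q = p$.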

Cross entropy is widely used as a loss function in classification problems.

## Binary Cross Entropy

For a dataset with two classes ($0$ and $1$) in the target, we denote the true label probability by $p$ and the predicted probability by $q$. For example, $q_{\hat y=1}$ denotes the predicted probability that the label is $1$.

\begin{align*} H(p, q) =& - p_{y=0} \log (q_{\hat y=0}) - p_{y=1} \log (q_{\hat y=1}) \\ =& - p_{y=0} \log (q_{\hat y=0}) - (1 - p_{y=0}) \log ( 1 - q_{\hat y=0} ) \end{align*}
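The first line of this derivation also covers soft labels, where $p_{y=0}$ may lie strictly between $0$ and $1$. Below is a minimal sketch written in terms of $p_{y=0}$ and $q_{\hat y=0}$, under these assumptions:

- the name `binary_cross_entropy_soft` is hypothetical;
- $0 < q_{\hat y=0} < 1$ wherever the matching true probability is nonzero.

```python
import math

def binary_cross_entropy_soft(p0, q0):
    """Two-class cross entropy in terms of p0 = p_{y=0} and q0 = q_{y_hat=0}.

    Implements -p0 * log(q0) - (1 - p0) * log(1 - q0), skipping any term
    whose true probability is zero.
    """
    p1, q1 = 1.0 - p0, 1.0 - q0
    loss = 0.0
    if p0 > 0:
        loss -= p0 * math.log(q0)
    if p1 > 0:
        loss -= p1 * math.log(q1)
    return loss

print(binary_cross_entropy_soft(0.7, 0.6))  # soft label: 70% class 0
```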

When the true label is a definite class $y\in \{0,1\}$, $p_{y=0}$ is either $1$ or $0$, and the expression reduces to

$$H(p, q) = \begin{cases} - \log (q_{\hat y=0}) , & \text{for } y=0 \\ - \log ( 1 - q_{\hat y=0} ) , & \text{for } y=1. \end{cases}$$

Combining the two expressions, we can simply use the following formula,

$$H(p, q) = - (1 - y) \log (q_{\hat y=0}) - y \log ( 1 - q_{\hat y=0} ).$$
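For hard labels, the two cases can be evaluated with one expression, $-(1-y)\log(q_{\hat y=0}) - y\log(1-q_{\hat y=0})$. Setting $y=0$ leaves only the first case, and $y=1$ leaves only the second. In the small sketch below, `binary_cross_entropy` is an illustrative name. It assumes $0 < q_{\hat y=0} < 1$ so that both logarithms are defined.

```python
import math

def binary_cross_entropy(y, q0):
    """Binary cross entropy for a hard label y in {0, 1}.

    q0 is the predicted probability of class 0 and must satisfy 0 < q0 < 1.
    """
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    # (1 - y) selects the y = 0 case; y selects the y = 1 case.
    return -(1 - y) * math.log(q0) - y * math.log(1 - q0)

# Compare with the piecewise definition for both labels.
q0 = 0.8
assert math.isclose(binary_cross_entropy(0, q0), -math.log(q0))
assert math.isclose(binary_cross_entropy(1, q0), -math.log(1 - q0))
```

Both logarithms are evaluated even though one term is multiplied by zero. That is why the sketch needs $q_{\hat y=0}$ strictly inside $(0, 1)$.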

The probabilities $q_{\hat y=0}$ and $q_{\hat y=1} = 1 - q_{\hat y=0}$ are the quantities predicted by a model.
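In practice the per-example loss is often averaged over a dataset. Predicted probabilities are also commonly clipped away from exactly $0$ and $1$ so the logarithm stays finite. Three parts of this sketch are assumptions, not part of the derivation above:

- the averaging over the dataset;
- the clipping threshold `eps`;
- the hypothetical function name `mean_binary_cross_entropy`.

The prediction values stand in for the output of an unspecified model.

```python
import math

def mean_binary_cross_entropy(labels, q0_predictions, eps=1e-12):
    """Average of -(1 - y) log(q0) - y log(1 - q0) over a dataset.

    labels: hard labels in {0, 1}.
    q0_predictions: a model's predicted probabilities of class 0.
    eps: clip predictions into [eps, 1 - eps] so log never receives 0.
    """
    if len(labels) != len(q0_predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("need at least one example")
    total = 0.0
    for y, q0 in zip(labels, q0_predictions):
        q0 = min(max(q0, eps), 1.0 - eps)  # keep both log arguments positive
        total += -(1 - y) * math.log(q0) - y * math.log(1.0 - q0)
    return total / len(labels)

labels = [0, 1, 1, 0]
q0_predictions = [0.9, 0.2, 0.4, 0.7]  # hypothetical model outputs
print(mean_binary_cross_entropy(labels, q0_predictions))
```

Clipping trades a tiny bias for numerical safety: without it, a single confident wrong prediction such as $q_{\hat y=0} = 1$ for $y = 1$ would make the loss infinite.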

