The log-sum-exp Trick
The cross entropy for binary classification is
$$ -\left[ p \ln \hat p + (1-p) \ln (1-\hat p) \right], $$
where $p$ is the true probability of label A and $\hat p$ is the predicted probability of label A. Since the classes are binary, $p$ is either 1 or 0, whereas the predicted probability $\hat p$ can be any value in $[0,1]$.
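A quick numerical sketch of this definition (assuming NumPy; the helper name and sample values are mine, for illustration only) already hints at the numerical issue discussed next:

```python
import numpy as np

def binary_cross_entropy(p, p_hat):
    """Naive binary cross entropy: -[p ln(p_hat) + (1 - p) ln(1 - p_hat)]."""
    return -(p * np.log(p_hat) + (1.0 - p) * np.log(1.0 - p_hat))

print(binary_cross_entropy(1.0, 0.9))   # ~0.105, a good prediction
print(binary_cross_entropy(1.0, 1e-8))  # ~18.4, a confident wrong prediction
print(binary_cross_entropy(1.0, 0.0))   # inf: ln(0) blows up
```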
The problem is that these probabilities are usually computed from exponentials, which underflow to 0 when the exponent is very negative, and $\ln 0$ then blows up to negative infinity. To deal with this, we can factor an exponential out,
$$ p = e^{a} \left( e^{-a} p \right). $$
We choose $a$ carefully so that it cancels the factor that drives $p \to 0$.
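Applied inside a logarithm, this factoring gives the general log-sum-exp identity: for any numbers $x_i$ and any constant $a$,
$$ \ln \sum_i e^{x_i} = \ln \left( e^{a} \sum_i e^{x_i - a} \right) = a + \ln \sum_i e^{x_i - a}. $$
Taking $a = \max_i x_i$ makes every shifted exponent $x_i - a$ at most 0, so nothing overflows, and at least one term equals 1, so the sum cannot underflow to 0.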
For example, suppose we have a Gaussian-like probability
$$ \ln p \sim \ln \left(\sum_i \exp \left( -x_i^2 \right)\right), $$
where the exponent $x_i^2$ can be as large as $10^6$. A single such value in the training data can destroy our loss function, since we have to compute the exponential and then the $\ln$. Although we could evaluate this by hand just fine, a computer treats $e^{-10^6}$ as 0 in floating-point arithmetic, and $\ln 0$ gives negative infinity, so we do not get the correct answer. To deal with this, we factor out the exponential and rewrite the expression as
$$ \begin{align} \ln p \sim & a + \ln\left(e^{-a} \sum_i \exp \left( -x_i^2 \right)\right) \\ \sim & a + \ln \left(\sum_i \exp \left( -x_i^2 - a \right)\right), \end{align} $$
where $a$ can be $-10^6$, the largest of the exponents $-x_i^2$. In this case, we do not hit the underflow problem on the computer.
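A minimal numerical sketch of the trick (assuming NumPy and SciPy; the values of $x_i$ below are invented for illustration):

```python
import numpy as np
from scipy.special import logsumexp

x = np.array([1000.0, 1000.5, 1001.0])  # x_i^2 is on the order of 1e6
exponents = -x**2

# Naive evaluation: every exp(-x_i^2) underflows to 0, so we get ln(0) = -inf.
naive = np.log(np.sum(np.exp(exponents)))

# Log-sum-exp trick: factor out a = max(exponents), which is about -1e6.
a = np.max(exponents)
stable = a + np.log(np.sum(np.exp(exponents - a)))

print(naive)                 # -inf
print(stable)                # ~ -1000000.0, finite and correct
print(logsumexp(exponents))  # SciPy's implementation agrees with `stable`
```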
- Eisele R. The log-sum-exp trick in Machine Learning • Computer Science and Machine Learning. In: Robert Eisele [Internet]. 22 Jun 2016 [cited 28 Jul 2021]. Available: https://www.xarg.org/2016/06/the-log-sum-exp-trick-in-machine-learning/
- Wang X. Numerical stability of binary cross entropy loss and the log-sum-exp trick – Integrative Biology and Predictive Analytics. In: Integrative Biology and Predictive Analytics [Internet]. 26 Sep 2018 [cited 28 Jul 2021]. Available: http://tagkopouloslab.ucdavis.edu/?p=2197