The logsumexp Trick
#Numerical #Neural Network #Basics
The cross entropy for a binary class is
$$ -p \ln \hat p - (1-p) \ln (1-\hat p), $$
where $p$ is the true probability of label A and $\hat p$ is the predicted probability of label A. Since we have binary classes, $p$ is either 1 or 0. The predicted probability $\hat p$, however, can take any value in $[0,1]$.
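The problem is that the logarithm diverges as the probability goes to 0, and the exponentials used to compute a probability can underflow to exactly 0 in floating point. To deal with it, we can factor an exponential out of the probability $p$,
$$ p = e^{-a} ( e^{a}p ), $$
so that $\ln p = -a + \ln\left( e^{a}p \right)$. We choose $a$ carefully so that it cancels the factor that drives $p\to 0$.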
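To make the failure concrete, here is a minimal sketch of the binary cross entropy above. It assumes that the predicted probability comes from a sigmoid, $\hat p = 1/(1+e^{-z})$, where the logit $z$ is a name introduced here for illustration and does not appear in the text. The function names `naive_bce` and `stable_bce` are likewise hypothetical. The stable version relies on NumPy's `np.logaddexp`, which evaluates $\ln(e^{u} + e^{v})$ with the large exponential factored out.

```python
import numpy as np


def naive_bce(p, z):
    """Cross entropy computed directly from p_hat = 1 / (1 + exp(-z))."""
    with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
        p_hat = 1.0 / (1.0 + np.exp(-z))
        return -p * np.log(p_hat) - (1.0 - p) * np.log(1.0 - p_hat)


def stable_bce(p, z):
    """Same quantity, never forming p_hat explicitly.

    Uses ln(p_hat) = -ln(1 + e^{-z}) and ln(1 - p_hat) = -ln(1 + e^{z}).
    """
    # np.logaddexp(0, t) computes ln(e^0 + e^t) without overflow.
    return p * np.logaddexp(0.0, -z) + (1.0 - p) * np.logaddexp(0.0, z)


# Label A is true (p = 1) but the model is confidently wrong (assumed logit).
p, z = 1.0, -800.0
naive_loss = naive_bce(p, z)    # p_hat underflows to 0, so ln(p_hat) diverges
stable_loss = stable_bce(p, z)  # finite, close to -z
```

For moderate logits the two functions agree, so the rewrite changes only the numerical behavior at the extremes, not the loss itself.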
For example, if we have a Gaussian-like probability
$$ \ln p \sim \ln \left(\sum_i \exp \left( -x_i^2 \right)\right), $$
and we know that $x_i^2$ can be as large as $10^6$, then values of that size in the training data will destroy our loss function, because we have to calculate the exponential and then take $\ln$. Although the expression is perfectly well defined on paper, the computer evaluates $e^{-10^6}$ as 0 in floating point, and $\ln 0$ gives negative infinity, so we do not get the correct answer. To deal with it, we factor out the exponential and rewrite the expression as
$$ \begin{align} \ln p \sim & -a + \ln\left(e^{a} \sum_i \exp \left( -x_i^2 \right)\right) \\ \sim & -a + \ln \left(\sum_i \exp \left( -x_i^2 + a \right)\right), \end{align} $$
where $a$ can be $10^6$. More generally, choosing $a = \min_i x_i^2$ makes the largest term in the sum equal to 1, so the sum no longer underflows to 0.
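The rewrite can be checked numerically. The following is a minimal sketch, assuming NumPy and double precision. The function names and the sample values of $x_i$ are made up for illustration. The shift is written as the maximum exponent, which for exponents $-x_i^2$ equals $-a$ with $a = \min_i x_i^2$.

```python
import numpy as np


def naive_log_sum_exp(z):
    """Compute ln(sum_i exp(z_i)) directly; fails for extreme exponents."""
    z = np.asarray(z, dtype=float)
    with np.errstate(divide="ignore", over="ignore"):
        return np.log(np.sum(np.exp(z)))


def stable_log_sum_exp(z):
    """Compute ln(sum_i exp(z_i)) after factoring out exp(max(z))."""
    z = np.asarray(z, dtype=float)
    shift = np.max(z)  # equals -a in the notation of the text
    # The largest shifted exponent is 0, so the sum is at least 1.
    return shift + np.log(np.sum(np.exp(z - shift)))


# Illustrative data: x_i^2 is around 1e6, so every exp(-x_i^2) underflows.
x = np.array([1000.0, 1001.0, 1002.0])
z = -x**2

naive = naive_log_sum_exp(z)    # every term is 0 in floating point
stable = stable_log_sum_exp(z)  # dominated by the largest term, exp(-1e6)
```

In practice a library routine such as `scipy.special.logsumexp` can be used instead of a hand-written version. The sketch above only spells out the shift that the equations describe.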
LM (2021). 'The logsumexp Trick', Datumorphism, 07 April. Available at: https://datumorphism.leima.is/cards/machinelearning/neuralnetworks/logsumexptrick/.
References:
 Eisele R. The logsumexp trick in Machine Learning • Computer Science and Machine Learning. In: Robert Eisele [Internet]. 22 Jun 2016 [cited 28 Jul 2021]. Available: https://www.xarg.org/2016/06/thelogsumexptrickinmachinelearning/
 Wang X. Numerical stability of binary cross entropy loss and the logsumexp trick – Integrative Biology and Predictive Analytics. In: Integrative Biology and Predictive Analytics [Internet]. 26 Sep 2018 [cited 28 Jul 2021]. Available: http://tagkopouloslab.ucdavis.edu/?p=2197