This article draws heavily on the materials in the references; please see them for more details on the ELBO.

Given a probability density $p(X)$ and a latent variable $Z$, marginalizing the joint density over $Z$ gives

$$\int dZ p(X, Z) = p(X).$$

## Using Jensen’s Inequality

In many models, we are interested in the log probability density $\log p(X)$, which can be decomposed using an auxiliary density $q(Z)$ over the latent variable,

\begin{align} \log p(X) =& \log \int dZ p(X, Z) \\ =& \log \int dZ p(X, Z) \frac{q(Z)}{q(Z)} \\ =& \log \int dZ q(Z) \frac{p(X, Z)}{q(Z)} \\ =& \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right]. \end{align}

Jensen's inequality shows that

$$\log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right] \geq \mathbb E_q \left[ \log\left(\frac{p(X, Z)}{q(Z)}\right) \right],$$

as $\log$ is a concave function.

Applying this inequality, we get

\begin{align} \log p(X) =& \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right] \\ \geq& \mathbb E_q \left[ \log\left(\frac{p(X, Z)}{q(Z)}\right) \right] \\ =& \mathbb E_q \left[ \log p(X, Z)- \log q(Z) \right] \\ =& \mathbb E_q \left[ \log p(X, Z) \right] - \mathbb E_q \left[ \log q(Z) \right] . \end{align}
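As a quick numerical sanity check of the inequality above, consider a toy model with a binary latent variable $Z$ and a single observation $x$; the joint table and the choice of $q(Z)$ below are made up purely for illustration.

```python
import math

# Hypothetical toy model: joint p(x, Z=k) = prior(k) * likelihood(x | k)
p_joint = [0.4 * 0.2, 0.6 * 0.7]   # p(x, Z=0) = 0.08, p(x, Z=1) = 0.42
q = [0.3, 0.7]                     # an arbitrary auxiliary density q(Z)

# Left side: log E_q[p(x, Z)/q(Z)], which collapses to log p(x)
lhs = math.log(sum(qk * (pk / qk) for pk, qk in zip(p_joint, q)))

# Right side: E_q[log(p(x, Z)/q(Z))], i.e. the ELBO
elbo = sum(qk * math.log(pk / qk) for pk, qk in zip(p_joint, q))

print(lhs, elbo)   # lhs = log(0.5) >= elbo, as Jensen's inequality requires
```

Note that the left side is always $\log p(x)$ regardless of the chosen $q$; only the right side moves as $q$ changes.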

Using the definitions of entropy and cross entropy, we know that

$$H(q(Z)) = - \mathbb E_q \left[ \log q(Z) \right]$$

is the entropy of $q(Z)$ and

$$H(q(Z);p(X,Z)) = -\mathbb E_q \left[ \log p(X, Z) \right]$$

is the cross entropy. For convenience, we denote

$$L = \mathbb E_q \left[ \log p(X, Z) \right] - \mathbb E_q \left[ \log q(Z) \right] = - H(q(Z);p(X,Z)) + H(q(Z)),$$

which is called the evidence lower bound (ELBO) as

$$\log p(X) \geq L.$$
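Reusing the illustrative binary-latent numbers from above, the ELBO can be evaluated as entropy minus cross entropy and compared against the evidence:

```python
import math

p_joint = [0.08, 0.42]   # hypothetical joint p(x, Z=k) for one observed x
q = [0.3, 0.7]           # auxiliary density q(Z)

entropy = -sum(qk * math.log(qk) for qk in q)                           # H(q(Z))
cross_entropy = -sum(qk * math.log(pk) for qk, pk in zip(q, p_joint))   # H(q; p)
L = entropy - cross_entropy                                             # the ELBO

log_pX = math.log(sum(p_joint))   # evidence log p(x)
print(L, log_pX)                  # L stays below log p(x)
```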

## KL Divergence

In a latent variable model, we often need to calculate the posterior $p(Z|X)$. When this is intractable, we look for an approximation $q(Z|\theta)$, where $\theta$ is a parametrization such as neural network parameters. To make sure we have a good approximation of the posterior, we compute the KL divergence between $q(Z|\theta)$ and $p(Z|X)$.

Expanding the definition,

\begin{align} D_\text{KL}(q(Z|\theta)\parallel p(Z|X)) =& -\mathbb E_q \log\frac{p(Z|X)}{q(Z|\theta)} \\ =& -\mathbb E_q \log\frac{p(X, Z)/p(X)}{q(Z|\theta)} \\ =& -\mathbb E_q \log\frac{p(X, Z)}{q(Z|\theta)} - \mathbb E_q \log\frac{1}{p(X)} \\ =& - L + \log p(X). \end{align}

Since $D_\text{KL}(q(Z|\theta)\parallel p(Z|X))\geq 0$, we have

$$\log p(X) \geq L,$$

which also indicates that $L$ is the lower bound of $\log p(X)$.

In fact,

$$\log p(X) - L = D_\text{KL}(q(Z|\theta)\parallel p(Z|X))$$

is exactly the Jensen gap: the bound is tight precisely when $q(Z|\theta)$ equals the true posterior $p(Z|X)$.
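With the same illustrative numbers as before, one can verify that the gap $\log p(X) - L$ matches $D_\text{KL}(q(Z|\theta)\parallel p(Z|X))$ exactly:

```python
import math

p_joint = [0.08, 0.42]                     # hypothetical joint p(x, Z=k)
p_X = sum(p_joint)                         # evidence p(x)
posterior = [pk / p_X for pk in p_joint]   # true posterior p(Z | x)
q = [0.3, 0.7]                             # approximate posterior q(Z | theta)

elbo = sum(qk * math.log(pk / qk) for pk, qk in zip(p_joint, q))
kl = sum(qk * math.log(qk / pz) for qk, pz in zip(q, posterior))

# log p(x) = ELBO + KL(q || posterior): the Jensen gap is the KL divergence
print(math.log(p_X) - elbo, kl)
```

Maximizing the ELBO over $\theta$ is therefore the same as minimizing this KL divergence, which is why variational inference optimizes $L$ instead of the intractable posterior directly.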


LM (2021). 'Evidence Lower Bound: ELBO', Datumorphism, 04 April. Available at: https://datumorphism.leima.is/wiki/machine-learning/bayesian/elbo/.