Evidence Lower Bound: ELBO
Given a probability density $p(X)$ and a latent variable $Z$, marginalizing the joint density over the latent variable gives
$$ \int dZ p(X, Z) = p(X). $$
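As a concrete illustration, here is a minimal Python sketch for a hypothetical two-component Gaussian mixture (the prior and means are made-up values), where the latent variable is discrete and the marginalization is just a sum over components.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-component Gaussian mixture: Z in {0, 1} with prior p(Z),
# and p(X|Z=k) = Normal(mu_k, 1). The marginal p(X) is the joint p(X, Z)
# summed (the discrete analogue of the integral) over the latent variable.
prior = np.array([0.3, 0.7])   # p(Z)
mus = np.array([-2.0, 1.0])    # means of p(X|Z)

def marginal_likelihood(x):
    """p(x) = sum_z p(z) p(x|z)."""
    return np.sum(prior * norm.pdf(x, loc=mus, scale=1.0))

print(marginal_likelihood(0.5))
```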
Using Jensen’s Inequality
In many models, we are interested in the log probability density $\log p(X)$ which can be decomposed using an auxiliary density of the latent variable $q(Z)$,
$$ \begin{align} \log p(X) =& \log \int dZ p(X, Z) \\ =& \log \int dZ p(X, Z) \frac{q(Z)}{q(Z)} \\ =& \log \int dZ q(Z) \frac{p(X, Z)}{q(Z)} \\ =& \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right]. \end{align} $$
[[Jensen's inequality]] states that $$ f(\mathbb E(X)) \geq \mathbb E(f(X)) $$ for a concave function $f(\cdot)$, so that
$$ \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right] \geq \mathbb E_q \left[ \log\left(\frac{p(X, Z)}{q(Z)}\right) \right], $$
as $\log$ is a concave function.
Applying this inequality, we get
$$ \begin{align} \log p(X) =& \log \mathbb E_q \left[ \frac{p(X, Z)}{q(Z)} \right] \\ \geq& \mathbb E_q \left[ \log\left(\frac{p(X, Z)}{q(Z)}\right) \right] \\ =& \mathbb E_q \left[ \log p(X, Z)- \log q(Z) \right] \\ =& \mathbb E_q \left[ \log p(X, Z) \right] - \mathbb E_q \left[ \log q(Z) \right] . \end{align} $$
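To make the bound tangible, the following sketch estimates the right-hand side by Monte Carlo for a hypothetical conjugate Gaussian model in which the evidence $p(X)$ is known in closed form; the model, the observed value, and the variational parameters are all illustrative assumptions, not part of the derivation above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical conjugate model used only as a sanity check:
# p(Z) = N(0, 1), p(X|Z) = N(Z, 1), so the evidence is p(X) = N(0, 2).
x = 1.5
log_evidence = norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))

# An arbitrary (not necessarily optimal) variational density q(Z) = N(m, s^2).
m, s = 0.5, 1.2
z = rng.normal(m, s, size=100_000)   # samples Z ~ q

log_joint = norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0)  # log p(X, Z)
log_q = norm.logpdf(z, m, s)                                   # log q(Z)

elbo = np.mean(log_joint - log_q)   # Monte Carlo estimate of E_q[log p(X,Z) - log q(Z)]
print(elbo, "<=", log_evidence)     # the estimate stays below log p(X)
```

With enough samples the estimate settles below $\log p(X)$, as the bound requires; the gap depends on how far $q(Z)$ is from the true posterior.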
Using the definitions of entropy and cross entropy, we know that
$$ H(q(Z)) = - \mathbb E_q \left[ \log q(Z) \right] $$
is the entropy of $q(Z)$ and
$$ H(q(Z);p(X,Z)) = -\mathbb E_q \left[ \log p(X, Z) \right] $$
is the cross entropy. For convenience, we denote
$$ L = \mathbb E_q \left[ \log p(X, Z) \right] - \mathbb E_q \left[ \log q(Z) \right] = - H(q(Z);p(X,Z)) + H(q(Z)), $$
which is called the evidence lower bound (ELBO) as
$$ \log p(X) \geq L. $$
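Continuing the same hypothetical Gaussian example, the bound can also be computed term by term, with a Monte Carlo estimate for the cross entropy and the closed-form entropy of a Gaussian for $H(q(Z))$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Same hypothetical model: p(Z) = N(0, 1), p(X|Z) = N(Z, 1), q(Z) = N(m, s^2).
x, m, s = 1.5, 0.5, 1.2
z = rng.normal(m, s, size=100_000)   # samples Z ~ q

# Cross entropy H(q(Z); p(X, Z)) estimated by Monte Carlo.
cross_entropy = -np.mean(norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0))
# Entropy of a Gaussian q(Z) in closed form.
entropy = 0.5 * np.log(2 * np.pi * np.e * s**2)

elbo = -cross_entropy + entropy      # L = -H(q(Z); p(X,Z)) + H(q(Z))
print(elbo)
```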
KL Divergence
In a latent variable model, we often need the posterior $p(Z|X)$. When it is intractable, we approximate it with a density $q(Z|\theta)$, where $\theta$ denotes the variational parameters, e.g., the weights of a neural network. To check how good the approximation is, we compute the KL divergence between $q(Z|\theta)$ and $p(Z|X)$.
The [[KL divergence]], which quantifies the difference between two distributions, is
$$ \begin{align} D_\text{KL}(q(Z|\theta)\parallel p(Z|X)) =& -\mathbb E_q \log\frac{p(Z|X)}{q(Z|\theta)} \\ =& -\mathbb E_q \log\frac{p(X, Z)/p(X)}{q(Z|\theta)} \\ =& -\mathbb E_q \log\frac{p(X, Z)}{q(Z|\theta)} - \mathbb E_q \log\frac{1}{p(X)} \\ =& - L + \log p(X). \end{align} $$
Since $D_\text{KL}(q(Z|\theta)\parallel p(Z|X))\geq 0$, we have
$$ \log p(X) \geq L, $$
which also indicates that $L$ is the lower bound of $\log p(X)$.
In fact, the Jensen gap is exactly the KL divergence,
$$ \log p(X) - L = D_\text{KL}(q(Z|\theta)\parallel p(Z|X)), $$
so maximizing the ELBO with respect to $\theta$ is equivalent to minimizing the KL divergence between $q(Z|\theta)$ and the true posterior $p(Z|X)$.
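As a sanity check of this identity, the sketch below reuses the same hypothetical conjugate Gaussian model, where the posterior is Gaussian and the KL divergence has a closed form, and verifies numerically that the ELBO plus the KL divergence recovers $\log p(X)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Same hypothetical conjugate model: p(Z) = N(0, 1), p(X|Z) = N(Z, 1),
# so p(X) = N(0, 2) and the true posterior is p(Z|X=x) = N(x/2, 1/2).
x = 1.5
log_evidence = norm.logpdf(x, 0.0, np.sqrt(2.0))
post_m, post_s = x / 2.0, np.sqrt(0.5)

# Variational density q(Z|theta) = N(m, s^2) with arbitrary parameters.
m, s = 0.5, 1.2

# Closed-form KL divergence between the two univariate Gaussians q and p(Z|X).
kl = np.log(post_s / s) + (s**2 + (m - post_m) ** 2) / (2 * post_s**2) - 0.5

# Monte Carlo ELBO, as in the earlier sketch.
z = rng.normal(m, s, size=100_000)
elbo = np.mean(norm.logpdf(z, 0.0, 1.0) + norm.logpdf(x, z, 1.0) - norm.logpdf(z, m, s))

print(elbo + kl, log_evidence)   # the two numbers agree up to Monte Carlo error
```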