Diffusion Models for Forecasting
Objective
In a denoising diffusion model, given
- an input $\mathbf x^0$ drawn from a complicated and unknown distribution $q(\mathbf x^0)$,
we find
- a latent space with a simple and manageable distribution, e.g., normal distribution, and
- the transformations from $\mathbf x^0$ to $\mathbf x^N$, as well as
- the transformations from $\mathbf x^N$ back to $\mathbf x^0$.
An Example
For example, with $N=5$, the forward process takes $\mathbf x^0 \to \mathbf x^1 \to \mathbf x^2 \to \mathbf x^3 \to \mathbf x^4 \to \mathbf x^5$, and the reverse process runs the chain backwards, $\mathbf x^5 \to \mathbf x^4 \to \mathbf x^3 \to \mathbf x^2 \to \mathbf x^1 \to \mathbf x^0$.
The joint distribution of the forward process, conditioned on $\mathbf x^0$, factorizes as
$$ q(\mathbf x^1, \mathbf x^2, \mathbf x^3, \mathbf x^4, \mathbf x^5 \vert \mathbf x^0) = q(\mathbf x^5\vert \mathbf x^4) q(\mathbf x^4\vert \mathbf x^3) q(\mathbf x^3\vert \mathbf x^2)q(\mathbf x^2\vert \mathbf x^1)q(\mathbf x^1\vert \mathbf x^0). $$
A diffusion model assumes a simple diffusion process, e.g.,
$$ \begin{equation} q(\mathbf x^n \vert \mathbf x^{n-1}) \equiv \mathcal N (\mathbf x^n ; \sqrt{ 1 - \beta_n} \mathbf x ^{n -1}, \beta_n\mathbf I). \label{eq-guassian-noise} \end{equation} $$
This simulates an information diffusion process: the information in the original data is gradually smeared out by noise.
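As a concrete illustration, here is a minimal NumPy sketch of the forward step in Eq \ref{eq-guassian-noise}; the linear $\beta_n$ schedule, the toy data vector, and the function name are illustrative placeholders rather than part of any particular implementation.

```python
import numpy as np

def forward_step(x_prev: np.ndarray, beta_n: float, rng: np.random.Generator) -> np.ndarray:
    """Sample x^n ~ N(sqrt(1 - beta_n) * x^{n-1}, beta_n * I)."""
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_n) * x_prev + np.sqrt(beta_n) * noise

# Run the whole forward chain for N = 5 steps.
rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.2, 5)     # illustrative noise schedule (an assumption)
x = rng.standard_normal(8)            # a toy data vector playing the role of x^0
for beta_n in betas:
    x = forward_step(x, beta_n, rng)  # produces x^1, ..., x^5 in turn
```

Iterating the single step like this draws a sample from the joint forward distribution above, one conditional at a time.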
If the chosen diffusion process is reversible, its reverse can be modeled by a similar Markov process
$$ p_\theta (\mathbf x^0, \mathbf x^1, \mathbf x^2, \mathbf x^3, \mathbf x^4, \mathbf x^5) = p_\theta (\mathbf x^0 \vert \mathbf x^1) p_\theta (\mathbf x^1 \vert \mathbf x^2) p_\theta (\mathbf x^2 \vert \mathbf x^3) p_\theta (\mathbf x^3 \vert \mathbf x^4) p_\theta (\mathbf x^4 \vert \mathbf x^5) p(\mathbf x^5). $$
This reverse process is the denoising process.
As long as our model estimates $p_\theta (\mathbf x^{n-1} \vert \mathbf x^n)$ well, we can go $\mathbf x^0 \to \mathbf x^N$ through the forward process and $\mathbf x^N \to \mathbf x^0$ through the reverse process.
The Reverse Process: A Gaussian Example
With the Gaussian forward step in Eq \ref{eq-guassian-noise} and small enough $\beta_n$, each reverse step can also be modeled as a Gaussian,
$$ \begin{equation} p_\theta (\mathbf x^{n-1} \vert \mathbf x^n) = \mathcal N ( \mathbf x^{n-1} ; \mu_\theta(\mathbf x^n, n), \Sigma_\theta(\mathbf x^n, n)\mathbf I). \label{eqn-guassian-reverse-process} \end{equation} $$
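A minimal sketch of ancestral sampling with this Gaussian parameterization is shown below; `mu_theta` and `sigma_theta` stand in for the learned $\mu_\theta$ and $\Sigma_\theta$ and are hypothetical placeholders, not trained networks.

```python
import numpy as np

def reverse_step(x_n, n, mu_theta, sigma_theta, rng):
    """Sample x^{n-1} ~ N(mu_theta(x^n, n), sigma_theta(x^n, n) * I)."""
    mean = mu_theta(x_n, n)
    std = np.sqrt(sigma_theta(x_n, n))
    return mean + std * rng.standard_normal(x_n.shape)

# Hypothetical stand-ins for the learned networks.
mu_theta = lambda x, n: 0.9 * x       # placeholder mean predictor
sigma_theta = lambda x, n: 0.01       # placeholder (scalar) variance

rng = np.random.default_rng(0)
x = rng.standard_normal(8)            # start from x^5 ~ N(0, I)
for n in range(5, 0, -1):             # x^5 -> x^4 -> ... -> x^0
    x = reverse_step(x, n, mu_theta, sigma_theta, rng)
```

With trained $\mu_\theta$ and $\Sigma_\theta$, the final `x` would be a sample approximately distributed according to $q(\mathbf x^0)$.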
Summary
- Forward: perturbs data to noise, step by step;
- Reverse: converts noise to data, step by step.
Optimization
The forward chain is predefined. To close the loop, we have to find $p_\theta$. A natural choice for our loss function is the negative log-likelihood,
$$ \mathbb E_{q(\mathbf x^0)} \left( - \log ( p_\theta (\mathbf x^0) ) \right). $$
Ho et al. (2020) showed that the above loss has an upper bound determined by the diffusion process defined in Eq \ref{eq-guassian-noise}:
$$ \begin{align} &\min_\theta \mathbb E_{q(\mathbf x^0)} \left[ -\log p_\theta (\mathbf x^0) \right] \\ \leq & \min_\theta \mathbb E_{q(\mathbf x^{0:N})} \left[ -\log p(\mathbf x^N) - \sum_{n=1}^{N} \log \frac{p_\theta (\mathbf x^{n-1}\vert \mathbf x^n)}{q(\mathbf x^n \vert \mathbf x^{n-1})} \right] \\ =& \min_\theta \mathbb E_{\mathbf x^0, \epsilon} \left[ \frac{\beta_n^2}{2\Sigma_\theta \alpha_n (1 - \bar \alpha_n)} \left\lVert \epsilon - \epsilon_\theta \left( \sqrt{ \bar \alpha_n} \mathbf x^0 + \sqrt{1-\bar \alpha_n} \epsilon , n \right) \right\rVert^2 \right] \end{align} $$
where $\epsilon$ is a sample from $\mathcal N(0, \mathbf I)$. The inequality is the standard variational (Jensen) bound with $q(\mathbf x^{1:N} \vert \mathbf x^0)$ as the approximate posterior; the final equality uses the Gaussian noise in Eq \ref{eq-guassian-noise}, which is equivalent to
$$ q(\mathbf x^n \vert \mathbf x^0) = \mathcal N (\mathbf x^n ; \sqrt{\bar \alpha_n} \mathbf x^0, (1 - \bar \alpha_n)\mathbf I), $$
with $\alpha_n = 1 - \beta_n$, $\bar \alpha_n = \prod_{i=1}^n \alpha_i$, and $\Sigma_\theta$ as defined in Eq \ref{eqn-guassian-reverse-process}.
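Putting the closed-form $q(\mathbf x^n \vert \mathbf x^0)$ and the noise-prediction objective together, the following is a rough sketch of one Monte Carlo sample of the simplified, unweighted loss in the spirit of Ho et al. (2020); `eps_theta`, the $\beta_n$ schedule, and the uniform sampling of $n$ are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 5
betas = np.linspace(1e-4, 0.2, N)    # illustrative schedule (an assumption)
alphas = 1.0 - betas                 # alpha_n = 1 - beta_n
alpha_bars = np.cumprod(alphas)      # \bar\alpha_n = prod_i alpha_i

def eps_theta(x_n, n):
    """Placeholder noise predictor; a neural network in practice."""
    return np.zeros_like(x_n)

def simple_loss(x0):
    """One Monte Carlo sample of ||eps - eps_theta(x^n, n)||^2."""
    n = int(rng.integers(1, N + 1))               # pick a diffusion step uniformly
    eps = rng.standard_normal(x0.shape)           # eps ~ N(0, I)
    x_n = np.sqrt(alpha_bars[n - 1]) * x0 + np.sqrt(1.0 - alpha_bars[n - 1]) * eps
    return float(np.sum((eps - eps_theta(x_n, n)) ** 2))

x0 = rng.standard_normal(8)          # a training sample standing in for q(x^0)
print(simple_loss(x0))
```

In practice this quantity would be averaged over a batch and minimized with respect to the parameters of `eps_theta` by gradient descent.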