Generative Model: Normalizing Flow

#Self-supervised Learning #Generative Model #Normalizing Flow #Basics

Normalizing flow is a method to convert a complicated distribution $p(x)$ to a simpler distribution $\tilde p(z)$ by building up a map $z=f(y)$ for the variable $x$ to $z$. The relations between the two distributions is established using the conservation law for distributions, $\int p(x) \mathrm d x = \int \tilde p (z) \mathrm d z = 1$. One could imagine that changing the variable also brings in the Jacobian.

Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218

Normalizing Flows: An Introduction and Review of Current Methods

To generate complicated distributions step by step from a simple and interpretable distribution.

Architecture

For a probability density $p(x)$ and a transformation of coordinate $x=g(z)$ or $z=f(x)$, the density can be expressed using the coordinate transformations, i.e.,¹

$$ \begin{align} p(x) &= \tilde p (f(x)) \lvert \operatorname{det} \operatorname{D} g(f(x)) \rvert^{-1} \\ &= \tilde p(f(x)) \lvert \operatorname{det}\operatorname{D} f(x) \rvert \end{align} $$

where the Jacobian is

$$ \operatorname{D} g(z) \to \frac{\partial }{\partial z} g. $$

The operation $g _ { * }\circ \tilde p(z)$ is the pushforward of $\tilde p(z)$. The operation $g _ { * }$ will pushforward simple distribution $\tilde p(z)$ to a more complex distribution $p(x)$.

The generative direction: sample $z$ from distribution $\tilde p(z)$, apply transformation $g(z)$;
The normalizing direction: “simplify” $p(x)$ to some simple distribution $\tilde p(z)$.

The key to the flow model is the chaining of the transformations

$$ \operatorname{det} \operatorname{D} f(x) = \Pi_{i=1}^N \operatorname{det} \operatorname{D} f_i (x_i) $$

where

$$ \begin{align} x_i &= g_i \circ \cdots \circ g_1 (z)\\ &= f_{i+1} \circ \cdots \circ f_N (x). \end{align} $$

Applications

Normalizing flow is good at estimating densities, fast.¹

Variational Inference

One interesting use case of the normalizing flow model is variational inference. We reiterate section 2.2.2 of Liu2020 here.¹

Variational Auto-Encoder

In an inference problem, $p(z\vert x)$, which is used to infer $z$ from $x$.

p(z\vert x) = \frac{p(x, z)}{p(x)}.

For example, we have an observable $x$ and a latent space $z$, we would like to find a good latent space for the observable $x$. However, $p(x)$ is something we don’t really know. We would like to use some simpler quantities to help us inferring $z$ from $x$ or generating $x$ from $z$.

Now we introduce a simple distribution $q(z\vert x)$. We want to make sure this $q(z\vert x)$ …

The variational inference problem: $\ln p(x) = \int \ln p(x, y) dy = $:
- $x$ is the observable;
- $y$ is the latent variable.
Introduce an approximation of the posterior $q(y\vert x, \theta)$, see

Liu2020 Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218 ↩︎ ↩︎ ↩︎

Planted: 2021-08-13 by L Ma;

References:

Liu2020 Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218