# 5 Variational Auto-Encoder

Published:
Category: { Machine Learning }
Summary: In an inference problem we want the posterior $p(z\vert x)$, which is used to infer $z$ from $x$: $$p(z\vert x) = \frac{p(x, z)}{p(x)}.$$ For example, given an observable $x$ and a latent variable $z$, we would like to find a good latent representation for the observable $x$. However, $p(x)$ is something we don't really know, so we would like to use simpler quantities to help us infer $z$ from $x$ or generate $x$ from $z$. We therefore introduce a simple distribution $q(z\vert x)$ and make sure it does a good job of replacing $p(z\vert x)$, i.e., we minimize the [[KL divergence]] (the Kullback–Leibler divergence measures the difference between two distributions) between $q(z\vert x)$ and $p(z\vert x)$. A short numerical sketch of the resulting KL term follows this entry.
Pages: 5
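
The KL term mentioned above has a simple closed form when $q(z\vert x)$ is a diagonal Gaussian and the prior is a standard normal. Below is a minimal NumPy sketch of that term; the shapes, values, and function name are illustrative assumptions, not part of the original post.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    This closed form is the regularization term of the usual VAE objective
    when the prior p(z) is a standard normal (an assumption of this sketch).
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + mu**2 - 1.0 - log_var)

# Example: a 3-dimensional latent code produced by some hypothetical encoder.
mu = np.array([0.1, -0.3, 0.5])
log_var = np.array([-0.2, 0.0, 0.1])
print(kl_diag_gaussian_to_standard_normal(mu, log_var))
```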

# 4 Generative Model: Auto-Encoder

Published:
Category: { Machine Learning }
Summary: Autoencoders (AE) are machines that encode inputs into a compact latent space. The simplest auto-encoder is rather easy to understand. The loss can be chosen based on the task, e.g., cross entropy for binary labels. Notation: dot ($\cdot$). We use a single vertically centered dot, i.e., $\cdot$, to indicate where the function or machine takes its argument. A simple autoencoder can be built from two neural nets, e.g., \begin{align} {\color{green}h} &= {\color{blue}g}{\color{blue}(}{\color{blue}b} + {\color{blue}w} x{\color{blue})} \\ \hat x &= {\color{red}\sigma}{\color{red}(}{\color{red}c} + {\color{red}v} {\color{green}h}{\color{red})}, \end{align} where in this simple example ${\color{blue}g(b + w \cdot )}$ is the encoder and ${\color{red}\sigma(c + v \cdot )}$ is the decoder. A small NumPy sketch of this forward pass follows this entry.
Pages: 5
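
A minimal NumPy sketch of the forward pass above, assuming $g=\tanh$ and $\sigma$ is the logistic sigmoid (both are illustrative choices; the weight names follow the formulas in the summary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: an 8-dimensional input compressed to a 3-dimensional code.
d_in, d_hidden = 8, 3
w, b = rng.normal(size=(d_hidden, d_in)), np.zeros(d_hidden)  # encoder parameters
v, c = rng.normal(size=(d_in, d_hidden)), np.zeros(d_in)      # decoder parameters

def g(a):      # encoder nonlinearity (assumed tanh)
    return np.tanh(a)

def sigma(a):  # decoder nonlinearity (assumed logistic sigmoid)
    return 1.0 / (1.0 + np.exp(-a))

x = rng.uniform(size=d_in)
h = g(b + w @ x)          # h = g(b + w x): the compact latent code
x_hat = sigma(c + v @ h)  # x_hat = sigma(c + v h): the reconstruction

# Cross-entropy reconstruction loss, as mentioned in the summary.
loss = -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
print(loss)
```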

# 3 Generative Model: Normalizing Flow

Published:
Category: { Machine Learning }
Summary: Normalizing flow is a method to convert a complicated distribution $p(x)$ into a simpler distribution $\tilde p(z)$ by building a map $z=f(x)$ from the variable $x$ to $z$. The relation between the two distributions is established using the conservation law for distributions, $\int p(x) \,\mathrm d x = \int \tilde p (z) \,\mathrm d z = 1$. One could imagine that changing the variable also brings in a Jacobian factor. The idea is to generate complicated distributions step by step from a simple and interpretable distribution. A one-dimensional change-of-variables check follows this entry. References: Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218; Normalizing Flows: An Introduction and Review of Current Methods.
Pages: 5
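
The change-of-variables relation can be checked numerically in one dimension. The sketch below uses a single affine map $z = f(x) = (x - \mu)/s$ with a standard-normal base density; the map and the numbers are illustrative assumptions, not a flow from the post.

```python
import numpy as np

# Base (simple) density: standard normal for z.
def p_tilde(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

# One affine flow step z = f(x) = (x - mu) / s, so |df/dx| = 1/s.
mu, s = 2.0, 1.5

def p(x):
    z = (x - mu) / s               # z = f(x)
    return p_tilde(z) * (1.0 / s)  # change of variables: multiply by the Jacobian |df/dx|

# Numerical check of the conservation law: both densities integrate to 1.
grid = np.linspace(-20.0, 20.0, 200001)
dx = grid[1] - grid[0]
print(p(grid).sum() * dx)        # ~1.0
print(p_tilde(grid).sum() * dx)  # ~1.0
```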

# 2 Generative Model: Autoregressive Model

Published:
Category: { Machine Learning }
Summary: An autoregressive (AR) model factorizes the likelihood over a sequence, $$\log p_\theta (x) = \sum_{t=1}^T \log p_\theta ( x_{t} \mid x_{<t} ).$$ For a sequence $x_{1:T}$, the likelihood is modeled as \begin{align} p_\theta (x) &= \prod_{t=1}^T p_\theta (x_t \mid x_{1:t-1}) \\ &= p_\theta(x_1)\, p_\theta(x_2 \mid x_{1:1})\, p_\theta(x_3 \mid x_{1:2}) \cdots p_\theta(x_T \mid x_{1:T-1}). \end{align} Taking the logarithm, $$\log p_\theta (x) = \sum_{t=1}^T \log p_\theta (x_t \mid x_{1:t-1}).$$ Notation and conventions: in AR models, we have to refer to the preceding nodes ($x_{<t}$) of a specific node ($x_{t}$). For $t=5$, the relation between $x_{<5}$ and $x_5$ is illustrated in the full post. A toy log-likelihood computation follows this entry.
Pages: 5
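
The factorized log-likelihood is easy to evaluate once the conditionals are given. The sketch below uses a toy binary model in which $p_\theta(x_t \mid x_{<t})$ is assumed to depend only on the previous symbol; the probabilities are made-up numbers for illustration.

```python
import numpy as np

# Toy AR model over binary sequences. Here p(x_t = 1 | x_{<t}) is assumed to
# depend only on x_{t-1}, a first-order simplification of the general case.
p_first = 0.6              # p(x_1 = 1)
p_next = {0: 0.2, 1: 0.7}  # p(x_t = 1 | x_{t-1})

def log_likelihood(x):
    """log p(x) = sum_t log p(x_t | x_{<t})."""
    total = 0.0
    for t, x_t in enumerate(x):
        p_one = p_first if t == 0 else p_next[x[t - 1]]
        total += np.log(p_one if x_t == 1 else 1.0 - p_one)
    return total

print(log_likelihood([1, 1, 0, 1]))
```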

# 1 An Introduction to Generative Models

Published:
Category: { Machine Learning }
Summary: Discriminative model: the conditional probability of the class label given the data (the posterior), $p(C_k\mid x)$. Generative model: the likelihood $p(x\mid C_k)$; sample from the likelihood to generate data. With latent variables $z$ and neural-network parameters $\theta$: $p(x,z\mid \theta) = p(x\mid z, \theta)\,p(z)$. An ancestral-sampling sketch of this factorization follows this entry.
Pages: 5
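
The latent-variable factorization $p(x,z\mid\theta) = p(x\mid z,\theta)\,p(z)$ suggests ancestral sampling: draw $z$ from the prior, then draw $x$ from the conditional. A small sketch, using a two-component Gaussian mixture purely as an illustrative choice of $p(z)$ and $p(x\mid z,\theta)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ancestral sampling from p(x, z | theta) = p(x | z, theta) p(z), with a
# two-component Gaussian mixture as an illustrative model choice.
p_z = np.array([0.3, 0.7])    # prior p(z) over two latent classes
mu = np.array([-2.0, 3.0])    # theta: per-component means
sigma = np.array([0.5, 1.0])  # theta: per-component standard deviations

def sample(n):
    z = rng.choice(len(p_z), size=n, p=p_z)  # z ~ p(z)
    x = rng.normal(mu[z], sigma[z])          # x ~ p(x | z, theta)
    return x, z

x, z = sample(5)
print(x)
print(z)
```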