# 5 Variational Auto-Encoder

Published:
Category: { Machine Learning }
Summary: In an inference problem we want the posterior $p(z\vert x)$, which is used to infer $z$ from $x$: $$p(z\vert x) = \frac{p(x, z)}{p(x)}.$$ For example, given an observable $x$ and a latent variable $z$, we would like to find a good latent representation for the observable $x$. However, $p(x)$ is something we don't really know, so we would like to use simpler quantities to help us infer $z$ from $x$ or generate $x$ from $z$. We therefore introduce a simple distribution $q(z\vert x)$ and make sure it does a good job of replacing $p(z\vert x)$, i.e., we minimize the [[KL divergence]] (the Kullback–Leibler divergence measures the difference between two distributions) between $q(z\vert x)$ and $p(z\vert x)$. A short numerical sketch of the resulting KL term follows this entry.
Pages: 5
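
The KL term mentioned above has a simple closed form when $q(z\vert x)$ is a diagonal Gaussian and the prior is a standard normal. Below is a minimal NumPy sketch of that term; the shapes, values, and function name are illustrative assumptions, not part of the original post.

```python
import numpy as np

def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions.

    This closed form is the regularization term of the usual VAE objective
    when the prior p(z) is a standard normal (an assumption of this sketch).
    """
    var = np.exp(log_var)
    return 0.5 * np.sum(var + mu**2 - 1.0 - log_var)

# Example: a 3-dimensional latent code produced by some hypothetical encoder.
mu = np.array([0.1, -0.3, 0.5])
log_var = np.array([-0.2, 0.0, 0.1])
print(kl_diag_gaussian_to_standard_normal(mu, log_var))
```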

# 4 Generative Model: Auto-Encoder

Published:
Category: { Machine Learning }
Summary: Autoencoders (AE) are machines that encode inputs into a compact latent space. The simplest auto-encoder is rather easy to understand. The loss can be chosen based on the task, e.g., cross entropy for binary labels. Notation: dot ($\cdot$). We use a single vertically centered dot, i.e., $\cdot$, to indicate where the function or machine takes its argument. A simple autoencoder can be built from two neural nets, e.g., \begin{align} {\color{green}h} &= {\color{blue}g}{\color{blue}(}{\color{blue}b} + {\color{blue}w} x{\color{blue})} \\ \hat x &= {\color{red}\sigma}{\color{red}(}{\color{red}c} + {\color{red}v} {\color{green}h}{\color{red})}, \end{align} where in this simple example ${\color{blue}g(b + w \cdot )}$ is the encoder and ${\color{red}\sigma(c + v \cdot )}$ is the decoder. A small NumPy sketch of this forward pass follows this entry.
Pages: 5
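
A minimal NumPy sketch of the forward pass above, assuming $g=\tanh$ and $\sigma$ is the logistic sigmoid (both are illustrative choices; the weight names follow the formulas in the summary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: an 8-dimensional input compressed to a 3-dimensional code.
d_in, d_hidden = 8, 3
w, b = rng.normal(size=(d_hidden, d_in)), np.zeros(d_hidden)  # encoder parameters
v, c = rng.normal(size=(d_in, d_hidden)), np.zeros(d_in)      # decoder parameters

def g(a):      # encoder nonlinearity (assumed tanh)
    return np.tanh(a)

def sigma(a):  # decoder nonlinearity (assumed logistic sigmoid)
    return 1.0 / (1.0 + np.exp(-a))

x = rng.uniform(size=d_in)
h = g(b + w @ x)          # h = g(b + w x): the compact latent code
x_hat = sigma(c + v @ h)  # x_hat = sigma(c + v h): the reconstruction

# Cross-entropy reconstruction loss, as mentioned in the summary.
loss = -np.mean(x * np.log(x_hat) + (1 - x) * np.log(1 - x_hat))
print(loss)
```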

# 3 Generative Model: Normalizing Flow

Published:
Category: { Machine Learning }
Summary: Normalizing flow is a method to convert a complicated distribution $p(x)$ into a simpler distribution $\tilde p(z)$ by building a map $z=f(x)$ from the variable $x$ to $z$. The relation between the two distributions is established using the conservation law for distributions, $\int p(x) \,\mathrm d x = \int \tilde p (z) \,\mathrm d z = 1$. One could imagine that changing the variable also brings in a Jacobian factor. The idea is to generate complicated distributions step by step from a simple and interpretable distribution. A one-dimensional change-of-variables check follows this entry. References: Liu X, Zhang F, Hou Z, Wang Z, Mian L, Zhang J, et al. Self-supervised Learning: Generative or Contrastive. arXiv [cs.LG]. 2020. Available: http://arxiv.org/abs/2006.08218; Normalizing Flows: An Introduction and Review of Current Methods.
Pages: 5
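
The change-of-variables relation can be checked numerically in one dimension. The sketch below uses a single affine map $z = f(x) = (x - \mu)/s$ with a standard-normal base density; the map and the numbers are illustrative assumptions, not a flow from the post.

```python
import numpy as np

# Base (simple) density: standard normal for z.
def p_tilde(z):
    return np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)

# One affine flow step z = f(x) = (x - mu) / s, so |df/dx| = 1/s.
mu, s = 2.0, 1.5

def p(x):
    z = (x - mu) / s               # z = f(x)
    return p_tilde(z) * (1.0 / s)  # change of variables: multiply by the Jacobian |df/dx|

# Numerical check of the conservation law: both densities integrate to 1.
grid = np.linspace(-20.0, 20.0, 200001)
dx = grid[1] - grid[0]
print(p(grid).sum() * dx)        # ~1.0
print(p_tilde(grid).sum() * dx)  # ~1.0
```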

# 2 Generative Model: Autoregressive Model

Published:
Category: { Machine Learning }
Summary: An autoregressive (AR) model factorizes the likelihood over a sequence, $$\log p_\theta (x) = \sum_{t=1}^T \log p_\theta ( x_{t} \mid x_{<t} ).$$ For a sequence $x_{1:T}$, the likelihood is modeled as \begin{align} p_\theta (x) &= \prod_{t=1}^T p_\theta (x_t \mid x_{1:t-1}) \\ &= p_\theta(x_1)\, p_\theta(x_2 \mid x_{1:1})\, p_\theta(x_3 \mid x_{1:2}) \cdots p_\theta(x_T \mid x_{1:T-1}). \end{align} Taking the logarithm, $$\log p_\theta (x) = \sum_{t=1}^T \log p_\theta (x_t \mid x_{1:t-1}).$$ Notation and conventions: in AR models, we have to refer to the preceding nodes ($x_{<t}$) of a specific node ($x_{t}$). For $t=5$, the relation between $x_{<5}$ and $x_5$ is illustrated in the full post. A toy log-likelihood computation follows this entry.
Pages: 5
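
The factorized log-likelihood is easy to evaluate once the conditionals are given. The sketch below uses a toy binary model in which $p_\theta(x_t \mid x_{<t})$ is assumed to depend only on the previous symbol; the probabilities are made-up numbers for illustration.

```python
import numpy as np

# Toy AR model over binary sequences. Here p(x_t = 1 | x_{<t}) is assumed to
# depend only on x_{t-1}, a first-order simplification of the general case.
p_first = 0.6              # p(x_1 = 1)
p_next = {0: 0.2, 1: 0.7}  # p(x_t = 1 | x_{t-1})

def log_likelihood(x):
    """log p(x) = sum_t log p(x_t | x_{<t})."""
    total = 0.0
    for t, x_t in enumerate(x):
        p_one = p_first if t == 0 else p_next[x[t - 1]]
        total += np.log(p_one if x_t == 1 else 1.0 - p_one)
    return total

print(log_likelihood([1, 1, 0, 1]))
```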

# 1 An Introduction to Generative Models

Published:
Category: { Machine Learning }
Summary: Discriminative model: the conditional probability of the class label given the data (the posterior), $p(C_k\mid x)$. Generative model: the likelihood $p(x\mid C_k)$; sample from the likelihood to generate data. With latent variables $z$ and neural-network parameters $\theta$: $p(x,z\mid \theta) = p(x\mid z, \theta)\,p(z)$. An ancestral-sampling sketch of this factorization follows this entry.
Pages: 5
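
The latent-variable factorization $p(x,z\mid\theta) = p(x\mid z,\theta)\,p(z)$ suggests ancestral sampling: draw $z$ from the prior, then draw $x$ from the conditional. A small sketch, using a two-component Gaussian mixture purely as an illustrative choice of $p(z)$ and $p(x\mid z,\theta)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ancestral sampling from p(x, z | theta) = p(x | z, theta) p(z), with a
# two-component Gaussian mixture as an illustrative model choice.
p_z = np.array([0.3, 0.7])    # prior p(z) over two latent classes
mu = np.array([-2.0, 3.0])    # theta: per-component means
sigma = np.array([0.5, 1.0])  # theta: per-component standard deviations

def sample(n):
    z = rng.choice(len(p_z), size=n, p=p_z)  # z ~ p(z)
    x = rng.normal(mu[z], sigma[z])          # x ~ p(x | z, theta)
    return x, z

x, z = sample(5)
print(x)
print(z)
```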