Deep Generative Models (Part 1): Taxonomy and VAEs

 

A generative model learns a probability distribution from data, possibly with prior knowledge, and produces new samples from the learned distribution.

Deep Generative Models: A Taxonomy

Key choices

Representation

There are two main choices for the learned representation: factorized models and latent variable models.

A factorized model writes the probability distribution as a product of simpler terms via the chain rule.
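For example, an autoregressive model factorizes the joint distribution over pixels as

$$p(x) = \prod_{i=1}^{D} p(x_i \mid x_1, \dots, x_{i-1})$$

so that each factor is a tractable conditional that a network such as PixelCNN can parameterize.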

A latent variable model defines a latent space, much smaller than the data space, that captures the core information in the data.
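Generation then amounts to sampling a code $z$ from the prior and decoding it; the marginal likelihood integrates the latents out:

$$p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$$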


Learning

Maximum Likelihood Estimation

  • fully-observed graphical models: PixelRNN & PixelCNN -> PixelCNN++, WaveNet (audio)
  • latent-variable models: VAE -> VQ-VAE
  • latent-variable invertible models (flow-based): NICE, Real NVP -> MAF, IAF, Glow

Adversarial Training

  • GANs: Vanilla GAN -> improved GAN, DCGAN, cGAN -> WGAN, ProGAN -> SAGAN, StyleGAN, BigGAN

Comparison of GAN, VAE, and flow-based models.

VAE: Variational AutoEncoder

Auto-Encoding Variational Bayes - Kingma - ICLR 2014

  • Title: Auto-Encoding Variational Bayes
  • Task: Image Generation
  • Author: D. P. Kingma and M. Welling
  • Date: Dec. 2013
  • Arxiv: 1312.6114
  • Published: ICLR 2014

Highlights

  • A reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods
  • For i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator

The key idea: approximate the intractable posterior $p_\theta(z \mid x)$ with a simpler, tractable distribution $q_\phi(z \mid x)$.

The graphical model of the variational autoencoder: solid lines denote the generative distribution $p_\theta(\cdot)$ and dashed lines denote the distribution $q_\phi(z \mid x)$ used to approximate the intractable posterior $p_\theta(z \mid x)$.


Loss Function: ELBO via KL Divergence. Expanding the KL divergence between the approximate and true posteriors gives

$$D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) = \log p_\theta(x) + D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right) - \mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z)$$

Rearranging this identity yields the ELBO-based loss:

$$L_{VAE}(\theta, \phi) = -\log p_\theta(x) + D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right) = -\mathbb{E}_{z \sim q_\phi(z \mid x)} \log p_\theta(x \mid z) + D_{KL}\left(q_\phi(z \mid x) \,\|\, p_\theta(z)\right)$$

$$\theta^*, \phi^* = \arg\min_{\theta, \phi} L_{VAE}(\theta, \phi)$$

By minimizing the loss, we are maximizing the lower bound on the probability of generating real data samples.
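As an illustration, here is a minimal sketch of this loss in PyTorch, assuming a Gaussian encoder that outputs `mu` and `logvar` and a Bernoulli decoder whose output `x_recon` lies in [0, 1]; the KL term uses the standard closed form for a diagonal Gaussian against $\mathcal{N}(0, I)$:

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    # -E_{z~q}[log p(x|z)]: Bernoulli reconstruction term
    recon = F.binary_cross_entropy(x_recon, x, reduction='sum')
    # D_KL(N(mu, sigma^2 I) || N(0, I)) in closed form
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl  # minimizing this maximizes the ELBO
```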

The Reparameterization Trick

The expectation term in the loss function involves generating samples from $z \sim q_\phi(z \mid x)$. Sampling is a stochastic process, so we cannot backpropagate the gradient through it. To make it trainable, the reparameterization trick is introduced: it is often possible to express the random variable $z$ as a deterministic variable $z = \mathcal{T}_\phi(x, \epsilon)$, where $\epsilon$ is an auxiliary independent random variable and the transformation function $\mathcal{T}_\phi$, parameterized by $\phi$, converts $\epsilon$ to $z$.

For example, a common choice of the form of $q_\phi(z \mid x)$ is a multivariate Gaussian with a diagonal covariance structure:

$$z \sim q_\phi(z \mid x^{(i)}) = \mathcal{N}(z; \mu^{(i)}, \sigma^{2(i)} I)$$

$$z = \mu + \sigma \odot \epsilon, \quad \text{where } \epsilon \sim \mathcal{N}(0, I)$$

where $\odot$ refers to the element-wise product.
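A minimal sketch of this trick in PyTorch, assuming the encoder outputs `mu` and the log-variance `logvar`:

```python
import torch

def reparameterize(mu, logvar):
    # z = mu + sigma * eps with eps ~ N(0, I); the randomness is moved
    # into eps, so gradients flow through mu and sigma
    std = torch.exp(0.5 * logvar)
    eps = torch.randn_like(std)
    return mu + std * eps
```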


(VQ-VAE) Neural Discrete Representation Learning - van den Oord - NIPS 2017

  • Title: Neural Discrete Representation Learning
  • Task: Image Generation
  • Author: A. van den Oord, O. Vinyals, and K. Kavukcuoglu
  • Date: Nov. 2017
  • Arxiv: 1711.00937
  • Published: NIPS 2017
  • Affiliation: Google DeepMind

Highlights

  • Discrete representation for data distribution
  • The prior is learned rather than fixed

Vector Quantisation (VQ)

Vector quantisation (VQ) is a method to map $K$-dimensional vectors into a finite set of "code" vectors. The encoder output $E(x) = z_e$ goes through a nearest-neighbour lookup to match one of $K$ embedding vectors, and this matched code vector becomes the input to the decoder $D(\cdot)$:

$$z_q(x) = e_k, \quad \text{where} \quad k = \arg\min_j \| z_e(x) - e_j \|_2$$
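A minimal sketch of this lookup in PyTorch; the last line is the straight-through gradient copy the paper uses to pass gradients through the non-differentiable $\arg\min$ (names here are illustrative):

```python
import torch

def quantize(z_e, codebook):
    # z_e: (N, D) encoder outputs; codebook: (K, D) embeddings e_1..e_K
    dists = torch.cdist(z_e, codebook)   # pairwise L2 distances, (N, K)
    k = dists.argmin(dim=1)              # index of the nearest code vector
    z_q = codebook[k]                    # quantized vectors e_k
    # straight-through estimator: forward pass uses z_q, backward pass
    # copies gradients from z_q to z_e
    return z_e + (z_q - z_e).detach(), k
```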

The dictionary items are updated using exponential moving averages (EMA), which is similar to EM algorithms such as K-means.
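Concretely (following the appendix of the paper), each code $e_k$ tracks a running count $N_k$ and a running sum $m_k$ of the $n_k^{(t)}$ encoder outputs assigned to it at step $t$, with decay $\gamma$:

$$N_k^{(t)} = \gamma\, N_k^{(t-1)} + (1-\gamma)\, n_k^{(t)}, \qquad m_k^{(t)} = \gamma\, m_k^{(t-1)} + (1-\gamma) \sum_{j=1}^{n_k^{(t)}} z_{e,j}^{(t)}, \qquad e_k^{(t)} = \frac{m_k^{(t)}}{N_k^{(t)}}$$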


Loss Design

  • Reconstruction loss
  • VQ loss: The L2 error between the embedding space and the encoder outputs.
  • Commitment loss: A measure to encourage the encoder output to stay close to the embedding space and to prevent it from fluctuating too frequently from one code vector to another.
$$L = \underbrace{\| x - D(e_k) \|_2^2}_{\text{reconstruction loss}} + \underbrace{\| \text{sg}[E(x)] - e_k \|_2^2}_{\text{VQ loss}} + \beta\, \underbrace{\| E(x) - \text{sg}[e_k] \|_2^2}_{\text{commitment loss}}$$

where $\text{sg}[\cdot]$ is the stop-gradient operator.
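A minimal sketch of this loss in PyTorch, with `.detach()` playing the role of $\text{sg}[\cdot]$ (tensor names are illustrative):

```python
import torch.nn.functional as F

def vqvae_loss(x, x_recon, z_e, z_q, beta=0.25):
    recon = F.mse_loss(x_recon, x)          # ||x - D(e_k)||^2
    vq = F.mse_loss(z_q, z_e.detach())      # ||sg[E(x)] - e_k||^2
    commit = F.mse_loss(z_e, z_q.detach())  # ||E(x) - sg[e_k]||^2
    return recon + vq + beta * commit
```

When the codebook is instead updated with the EMA scheme above, the VQ term is typically dropped from this loss.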

By training PixelCNN and WaveNet priors over the learned latent space for images and audio respectively, the VQ-VAE model avoids the "posterior collapse" problem that VAEs suffer from.

Generating Diverse High-Fidelity Images with VQ-VAE-2 - Razavi - 2019

  • Title: Generating Diverse High-Fidelity Images with VQ-VAE-2
  • Task: Image Generation
  • Author: A. Razavi, A. van den Oord, and O. Vinyals
  • Date: Jun. 2019
  • Arxiv: 1906.00446
  • Affiliation: Google DeepMind

Highlights

  • Diverse generated results
  • A multi-scale hierarchical organization of VQ-VAE
  • Self-attention mechanism over autoregressive model


Stage 1: Training hierarchical VQ-VAE

The design of hierarchical latent variables intends to separate local patterns (i.e., texture) from global information (i.e., object shapes). The training of the larger bottom-level codebook is conditioned on the smaller top-level code too, so that it does not have to learn everything from scratch.


Stage 2: Learning a prior over the latent discrete codebook

A prior is fit over the latent codes so that the decoder receives input vectors sampled from a distribution similar to the one it saw during training. A powerful autoregressive model enhanced with multi-headed self-attention layers is used to capture correlations between spatial locations that are far apart in the image, with a larger receptive field.

