Variational Autoencoders — Deep Dive




VAEs are generative models that learn a smooth latent space by combining autoencoders with variational inference. Unlike GANs, they are trained by maximizing an explicit lower bound on the data log-likelihood, which gives them a principled probabilistic framework for generation.


From Autoencoder to VAE

Definition: Autoencoder

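A standard autoencoder compresses an input $x$ into a latent code $z$ (encoder) and reconstructs $x$ from $z$ (decoder). It learns a deterministic mapping, but its training objective places no constraint on the latent space between the codes of training inputs, so that space may have gaps: regions that decode to implausible outputs. This makes generating new data by sampling $z$ difficult.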

Definition: Variational Autoencoder (VAE)

A VAE (Kingma & Welling, 2014) learns a probabilistic latent space:

  • Encoder $q_\phi(z|x)$: maps an input to a distribution over latent codes, with mean $\mu$ and variance $\sigma^2$
  • Decoder $p_\theta(x|z)$: reconstructs the input from a sampled latent code $z$
  • Prior $p(z) = \mathcal{N}(0, I)$: standard Gaussian

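Instead of encoding each input to a single point, a VAE encodes it to a distribution. Because the decoder must reconstruct $x$ from any $z$ sampled from that distribution, and a KL penalty pulls every encoder distribution toward the shared prior, nearby latent codes are pushed to decode to similar outputs. This encourages a smooth, continuous latent space.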
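The three components can be sketched in plain Python. This is a toy illustration under stated assumptions, not a trained model: the hypothetical `encoder` and `decoder` functions use fixed linear weights in place of neural networks, and the encoder is assumed to output a log-variance (a common parameterization) rather than $\sigma^2$ directly.

```python
import math
import random

# Toy sketch: 2-D input, 2-D latent. The weights are hypothetical constants
# standing in for trained encoder/decoder networks.

def encoder(x):
    """q_phi(z|x): mean and log-variance of a diagonal Gaussian over z."""
    mu = [0.5 * x[0] - 0.2 * x[1], 0.1 * x[0] + 0.3 * x[1]]
    log_var = [-1.0, -1.0]  # assumed constant for simplicity
    return mu, log_var

def decoder(z):
    """p_theta(x|z): mean of a Gaussian over x (a fixed linear map here)."""
    return [1.2 * z[0] + 0.4 * z[1], -0.3 * z[0] + 0.9 * z[1]]

def sample_prior(dim, rng):
    """Draw z ~ p(z) = N(0, I)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]

rng = random.Random(0)

# Reconstruction: encode x to a distribution, sample z from it, decode.
x = [1.0, 2.0]
mu, log_var = encoder(x)
z = [rng.gauss(m, math.exp(0.5 * lv)) for m, lv in zip(mu, log_var)]
x_recon = decoder(z)

# Generation: decode a sample from the prior; no input is needed.
x_new = decoder(sample_prior(2, rng))
print(mu, x_recon, x_new)
```

Encoding the same $x$ twice yields the same $\mu$ and $\sigma$ but generally a different $z$, which is the sense in which a VAE encodes to a distribution rather than a point.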


Evidence Lower Bound (ELBO)

Training minimizes the negative ELBO, which splits into a reconstruction term and a KL term:

$$\mathcal{L}_{\text{VAE}} = \underbrace{-\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]}_{\text{Reconstruction Loss}} + \underbrace{D_{KL}(q_\phi(z|x) \,\|\, p(z))}_{\text{KL Divergence}}$$
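A single-sample estimate of $\mathcal{L}_{\text{VAE}}$ can be sketched as follows. It assumes a decoder that outputs the mean of a Gaussian with fixed unit variance (so $-\log p_\theta(x|z)$ is half the squared error plus a constant), uses the closed-form KL term given below, and approximates the expectation over $q_\phi(z|x)$ with one sample of $z$. The function names and the identity decoder in the usage line are hypothetical.

```python
import math
import random

def gaussian_nll(x, x_mean):
    """-log N(x; x_mean, I) for a unit-variance Gaussian decoder (assumed)."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, x_mean))
    return 0.5 * sq + 0.5 * d * math.log(2 * math.pi)

def kl_to_standard_normal(mu, log_var):
    """Closed-form D_KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, log_var))

def vae_loss(x, mu, log_var, decode, rng):
    """One-sample estimate of L_VAE = reconstruction loss + KL divergence."""
    # Draw z ~ q_phi(z|x) as z = mu + sigma * eps with eps ~ N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]
    recon = gaussian_nll(x, decode(z))
    kl = kl_to_standard_normal(mu, log_var)
    return recon + kl, recon, kl

# Usage with hand-picked encoder outputs and a hypothetical identity decoder.
rng = random.Random(0)
total, recon, kl = vae_loss([0.3, -0.1], mu=[0.2, 0.0], log_var=[0.0, 0.0],
                            decode=lambda z: z, rng=rng)
print(total, recon, kl)
```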

Reconstruction Loss

$$\mathcal{L}_{\text{recon}} = -\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)]$$

Here,

  • $q_\phi(z|x)$ = encoder distribution (approximate posterior)
  • $p_\theta(x|z)$ = decoder distribution (likelihood)
  • $\log p_\theta(x|z)$ = log-likelihood of the reconstruction
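The concrete form of $-\log p_\theta(x|z)$ depends on the decoder distribution, which this lesson leaves open. As a sketch under that assumption, two common choices are shown: a Bernoulli decoder for binary or $[0, 1]$-valued data, where the loss is binary cross-entropy, and a Gaussian decoder with a fixed standard deviation $s$, where it is a scaled squared error plus a constant. The helper names are illustrative.

```python
import math

def bernoulli_nll(x, p, eps=1e-7):
    """-log p_theta(x|z) for a Bernoulli decoder with probabilities p.
    This is binary cross-entropy summed over dimensions."""
    total = 0.0
    for xi, pi in zip(x, p):
        pi = min(max(pi, eps), 1 - eps)  # clamp to avoid log(0)
        total -= xi * math.log(pi) + (1 - xi) * math.log(1 - pi)
    return total

def gaussian_nll(x, mean, s=1.0):
    """-log p_theta(x|z) for a Gaussian decoder N(mean, s^2 I) with fixed s."""
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return sq / (2 * s ** 2) + d * math.log(s) + 0.5 * d * math.log(2 * math.pi)

x = [1.0, 0.0, 1.0]
p = [0.9, 0.2, 0.6]            # decoder output for some sampled z
print(bernoulli_nll(x, p))     # -(log 0.9 + log 0.8 + log 0.6)
print(gaussian_nll(x, p))      # 0.5 * squared error + constant
```

With $s$ fixed, only the squared-error part depends on $\theta$, which is why a Gaussian decoder is often trained with a mean-squared-error reconstruction loss.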

KL Divergence (diagonal Gaussian encoder, standard normal prior)

$$D_{KL}(q_\phi(z|x) \,\|\, p(z)) = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$

Here,

  • $\mu_j$ = mean of the encoder output for latent dimension $j$
  • $\sigma_j^2$ = variance of the encoder output for latent dimension $j$
  • $J$ = number of latent dimensions
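The closed form can be checked numerically: for a diagonal Gaussian $q_\phi(z|x)$ and the standard normal prior, the average of $\log q_\phi(z|x) - \log p(z)$ over samples $z \sim q_\phi(z|x)$ should approach the formula above. The helper names and the chosen $\mu$, $\sigma^2$ values are illustrative.

```python
import math
import random

def kl_closed_form(mu, var):
    """-1/2 * sum_j (1 + log var_j - mu_j^2 - var_j)."""
    return -0.5 * sum(1 + math.log(v) - m ** 2 - v for m, v in zip(mu, var))

def log_normal(z, mean, var):
    """Log density of a 1-D Gaussian N(mean, var) at z."""
    return -0.5 * (math.log(2 * math.pi * var) + (z - mean) ** 2 / var)

def kl_monte_carlo(mu, var, n, rng):
    """Average of log q(z) - log p(z) over n samples z ~ q (dimensions independent)."""
    total = 0.0
    for _ in range(n):
        for m, v in zip(mu, var):
            z = m + math.sqrt(v) * rng.gauss(0.0, 1.0)
            total += log_normal(z, m, v) - log_normal(z, 0.0, 1.0)
    return total / n

mu, var = [0.5, -1.0], [0.3, 1.5]
exact = kl_closed_form(mu, var)
estimate = kl_monte_carlo(mu, var, 100_000, random.Random(0))
print(exact, estimate)
```

The KL is zero exactly when $\mu_j = 0$ and $\sigma_j^2 = 1$ for every $j$, i.e. when the encoder distribution equals the prior.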

Note: ELBO Derivation

$$\log p(x) = \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] + D_{KL}(q_\phi(z|x) \,\|\, p_\theta(z|x))$$
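One way to see where this identity comes from: $\log p(x)$ does not depend on $z$, so it equals its own expectation under $q_\phi(z|x)$. Bayes' rule, $p_\theta(z|x) = p_\theta(x,z)/p(x)$, then lets us rewrite it, and multiplying and dividing by $q_\phi(z|x)$ inside the logarithm separates the two terms:

$$
\begin{aligned}
\log p(x) &= \mathbb{E}_{q_\phi(z|x)}\left[\log p(x)\right] \\
&= \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x,z)}{p_\theta(z|x)}\right] \\
&= \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] + \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{q_\phi(z|x)}{p_\theta(z|x)}\right] \\
&= \mathbb{E}_{q_\phi(z|x)}\left[\log \frac{p_\theta(x,z)}{q_\phi(z|x)}\right] + D_{KL}(q_\phi(z|x) \,\|\, p_\theta(z|x))
\end{aligned}
$$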

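Since KL divergence is always non-negative, the first term (the ELBO) is a lower bound on $\log p(x)$. Writing $p_\theta(x,z) = p_\theta(x|z)\,p(z)$ splits the ELBO into $\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] - D_{KL}(q_\phi(z|x) \,\|\, p(z))$, which is exactly $-\mathcal{L}_{\text{VAE}}$. Because $\log p(x)$ does not depend on the encoder parameters $\phi$, increasing the ELBO with respect to $\phi$ shrinks the gap $D_{KL}(q_\phi(z|x) \,\|\, p_\theta(z|x))$ to the true posterior by the same amount, while increasing it with respect to $\theta$ pushes up a lower bound on the data log-likelihood.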
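The identity and the bound can be checked on a toy model where every quantity has a closed form. As an illustrative assumption (not a model from this lesson), take a 1-D prior $p(z) = \mathcal{N}(0, 1)$ and decoder $p_\theta(x|z) = \mathcal{N}(z, s^2)$. Then $p(x) = \mathcal{N}(0, 1 + s^2)$ and the true posterior is $\mathcal{N}\big(x/(1+s^2),\ s^2/(1+s^2)\big)$, so for any Gaussian $q(z|x) = \mathcal{N}(m, v)$ the ELBO plus the KL gap should equal $\log p(x)$.

```python
import math

def log_normal(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def kl_gauss(m1, v1, m2, v2):
    """D_KL(N(m1, v1) || N(m2, v2)) for 1-D Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)

def elbo(x, m, v, s2):
    """E_q[log p(x|z)] - D_KL(q || p(z)) with q = N(m, v) and p(x|z) = N(z, s2)."""
    # E_q[(x - z)^2] = (x - m)^2 + v, so the expected log-likelihood is closed form.
    expected_loglik = -0.5 * (math.log(2 * math.pi * s2) + ((x - m) ** 2 + v) / s2)
    return expected_loglik - kl_gauss(m, v, 0.0, 1.0)

x, s2 = 1.5, 0.5                      # illustrative observation and decoder variance
log_px = log_normal(x, 0.0, 1.0 + s2)
post_mean, post_var = x / (1 + s2), s2 / (1 + s2)

for m, v in [(0.0, 1.0), (0.8, 0.4), (post_mean, post_var)]:
    bound = elbo(x, m, v, s2)
    gap = kl_gauss(m, v, post_mean, post_var)
    print(f"q=N({m:.3f}, {v:.3f}): ELBO={bound:.4f} gap={gap:.4f} "
          f"sum={bound + gap:.4f} log p(x)={log_px:.4f}")
```

The gap is zero only when $q$ matches the true posterior, and in that case the ELBO equals $\log p(x)$.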


Reparameterization Trick


Sampling $z \sim q_\phi(z|x)$ directly is not differentiable with respect to the encoder parameters $\phi$. To backpropagate through it, reparameterize the sample as a deterministic transformation of independent noise:

$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

This makes $z$ differentiable with respect to $\mu$ and $\sigma$, and hence with respect to the encoder parameters $\phi$ that produce them, while all of the randomness comes from $\epsilon$, which does not depend on $\phi$.
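A small numerical check of why this works, using the illustrative objective $f(z) = z^2$ (an assumption chosen because $\mathbb{E}[z^2] = \mu^2 + \sigma^2$ has known gradients $2\mu$ and $2\sigma$). The chain rule through $z = \mu + \sigma\epsilon$ gives per-sample gradients $f'(z)$ for $\mu$ and $f'(z)\,\epsilon$ for $\sigma$, and averaging them over samples of $\epsilon$ estimates the gradient of the expectation. The sketch is 1-D plain Python; a deep learning framework's automatic differentiation would compute the same per-sample gradients.

```python
import random

def reparam_grad(mu, sigma, n, rng):
    """Estimate d/dmu and d/dsigma of E_{z ~ N(mu, sigma^2)}[z^2]
    using z = mu + sigma * eps with eps ~ N(0, 1)."""
    g_mu = g_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)   # noise: sampled, never differentiated
        z = mu + sigma * eps        # deterministic in (mu, sigma) given eps
        df_dz = 2.0 * z             # derivative of f(z) = z^2
        g_mu += df_dz * 1.0         # dz/dmu = 1
        g_sigma += df_dz * eps      # dz/dsigma = eps
    return g_mu / n, g_sigma / n

mu, sigma = 0.7, 1.3
g_mu, g_sigma = reparam_grad(mu, sigma, 100_000, random.Random(0))
print(g_mu, 2 * mu)        # estimate vs exact gradient with respect to mu
print(g_sigma, 2 * sigma)  # estimate vs exact gradient with respect to sigma
```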

Reparameterization

$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$

Here,

  • $\mu$ = mean vector from the encoder
  • $\sigma$ = standard deviation vector from the encoder
  • $\epsilon$ = random noise, sampled in the forward pass; no gradient flows into it
  • $\odot$ = element-wise multiplication
