Variational Autoencoders — Deep Dive
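VAEs are generative models that learn a smooth latent space by combining autoencoders with variational inference. Unlike GANs, which learn a generator through an adversarial game, VAEs are trained by maximizing an explicit lower bound on the data log-likelihood. This gives them a principled probabilistic framework for generation.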
From Autoencoder to VAE
Autoencoder
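A standard autoencoder compresses an input into a latent code (encoder) and reconstructs the input from that code (decoder). It learns a deterministic mapping, but nothing forces the latent space to be continuous. Regions between encoded points can decode to meaningless outputs, which makes generating new data by sampling latent codes difficult.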
Variational Autoencoder (VAE)
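A VAE (Kingma & Welling, 2014) learns a probabilistic latent space:
- Encoder $q_\phi(z \mid x)$: Maps input $x$ to a distribution over the latent $z$ (mean $\mu$, variance $\sigma^2$)
- Decoder $p_\theta(x \mid z)$: Reconstructs the input from a sampled latent $z$
- Prior $p(z) = \mathcal{N}(0, I)$: Standard Gaussian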
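Instead of encoding each input to a single point, a VAE encodes it to a distribution. The decoder must reconstruct the input from random samples of that distribution, and the KL term in the objective pulls every encoding toward the shared prior. As a result, nearby latent points decode to similar outputs, and the latent space becomes smooth and continuous.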
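The components above can be sketched as a toy forward pass in plain Python. This is a minimal illustration under stated assumptions, not the architecture from the paper. The linear encoder and decoder, the 2-D input, the 1-D latent, the encoder's log-variance output, and all weights (`ENC_MU_W`, `ENC_LOGVAR_W`, `DEC_W`, and so on) are hypothetical values chosen only to make the data flow concrete.

```python
import math
import random

# Hypothetical toy parameters (not from the source): 2-D input, 1-D latent.
ENC_MU_W, ENC_MU_B = [0.5, -0.25], 0.1         # encoder head for the mean
ENC_LOGVAR_W, ENC_LOGVAR_B = [0.2, 0.2], -1.0  # encoder head for the log-variance
DEC_W, DEC_B = [1.0, -0.5], [0.0, 0.2]         # decoder: z -> 2-D output mean


def encode(x):
    """Encoder q(z|x): return the mean and log-variance of a 1-D Gaussian."""
    mu = sum(w * xi for w, xi in zip(ENC_MU_W, x)) + ENC_MU_B
    logvar = sum(w * xi for w, xi in zip(ENC_LOGVAR_W, x)) + ENC_LOGVAR_B
    return mu, logvar


def decode(z):
    """Decoder p(x|z): return the mean of a Gaussian over x."""
    return [w * z + b for w, b in zip(DEC_W, DEC_B)]


def sample_prior(rng):
    """Prior p(z) = N(0, 1); generating new data needs only this and the decoder."""
    return rng.gauss(0.0, 1.0)


def reconstruct(x, rng):
    """Encode x to a distribution, draw one latent z from it, and decode z."""
    mu, logvar = encode(x)
    sigma = math.exp(0.5 * logvar)           # log-variance -> standard deviation
    z = mu + sigma * rng.gauss(0.0, 1.0)     # a fresh sample on every call
    return decode(z)


rng = random.Random(0)
x = [1.0, 2.0]
print(encode(x))                  # parameters of q(z|x), not a single point
print(reconstruct(x, rng))        # differs from call to call because z is sampled
print(decode(sample_prior(rng)))  # generation: sample the prior, then decode
```

Calling `reconstruct` twice on the same input gives different outputs. This is the "encode to a distribution" behavior described above, in contrast to a deterministic autoencoder.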
Evidence Lower Bound (ELBO)
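A VAE is trained by maximizing the ELBO, which splits into a reconstruction term and a KL regularizer:
$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)$$
Reconstruction Loss
$$\mathcal{L}_{\text{rec}} = -\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right]$$
Here,
- $q_\phi(z \mid x)$ = Encoder distribution (approximate posterior)
- $p_\theta(x \mid z)$ = Decoder distribution (likelihood)
- $\log p_\theta(x \mid z)$ = Log-likelihood of reconstruction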
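The concrete form of the reconstruction term depends on the decoder likelihood, which the text above leaves open. A minimal sketch, assuming two common choices: a Gaussian decoder with a fixed variance `var`, and a Bernoulli decoder for binary data. The expectation over $q_\phi(z \mid x)$ is approximated by averaging over sampled latents. All function names are illustrative, not from any library.

```python
import math


def gaussian_log_likelihood(x, x_mean, var=1.0):
    """log N(x; x_mean, var * I), summed over dimensions (variance assumed fixed)."""
    d = len(x)
    sq_err = sum((xi - mi) ** 2 for xi, mi in zip(x, x_mean))
    return -0.5 * d * math.log(2 * math.pi * var) - sq_err / (2 * var)


def bernoulli_log_likelihood(x, probs, eps=1e-7):
    """log Bernoulli(x; probs) for binary x; this is minus the binary cross-entropy."""
    total = 0.0
    for xi, p in zip(x, probs):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += xi * math.log(p) + (1 - xi) * math.log(1 - p)
    return total


def mc_reconstruction_term(x, z_samples, decode, log_lik):
    """Monte Carlo estimate of E_q[log p(x|z)] from latents sampled from q(z|x)."""
    return sum(log_lik(x, decode(z)) for z in z_samples) / len(z_samples)


print(gaussian_log_likelihood([0.2, 0.9], [0.25, 0.8]))
print(bernoulli_log_likelihood([1, 0, 1], [0.9, 0.2, 0.7]))
```

With a fixed variance, the Gaussian term equals $-\tfrac{1}{2\,\mathrm{var}}\lVert x - \hat{x}\rVert^2$ plus a constant. That is why minimizing $\mathcal{L}_{\text{rec}}$ for such a decoder reduces to a squared-error loss, while a Bernoulli decoder gives binary cross-entropy.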
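KL Divergence
$$D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right) = -\frac{1}{2} \sum_{j=1}^{J} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$
Here,
- $\mu_j$ = Mean of encoder output for dimension $j$
- $\sigma_j^2$ = Variance of encoder output for dimension $j$
- $J$ = Latent dimension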
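The closed form above can be checked numerically. The following is a minimal sketch in plain Python. It assumes the encoder outputs log-variances (`logvar`), a common but not universal convention. `kl_to_standard_normal` implements the formula term by term. The hypothetical helper `kl_monte_carlo` estimates the same quantity as $\mathbb{E}_q[\log q(z) - \log p(z)]$ by sampling, so the two values should agree up to Monte Carlo error.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), summed over the J dims."""
    return -0.5 * sum(1 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def kl_monte_carlo(mu, logvar, n=100_000, seed=0):
    """Estimate the same KL as E_q[log q(z) - log p(z)] by sampling z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        for m, lv in zip(mu, logvar):  # dimensions are independent, so sum per dim
            var = math.exp(lv)
            z = m + math.sqrt(var) * rng.gauss(0.0, 1.0)
            log_q = -0.5 * (math.log(2 * math.pi * var) + (z - m) ** 2 / var)
            log_p = -0.5 * (math.log(2 * math.pi) + z * z)
            total += log_q - log_p
    return total / n


mu, logvar = [0.5, -1.0], [0.0, math.log(0.25)]
print(kl_to_standard_normal(mu, logvar))
print(kl_monte_carlo(mu, logvar))
```

The closed form is zero exactly when every $\mu_j = 0$ and $\sigma_j^2 = 1$, that is, when the encoder distribution matches the prior.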
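ELBO Derivation
$$\log p_\theta(x) = \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{ELBO}} + D_{\mathrm{KL}}\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)$$
Since KL divergence is always non-negative, the first term (the ELBO) is a lower bound on $\log p_\theta(x)$. Maximizing the ELBO with respect to $\theta$ raises this lower bound on the data log-likelihood. Maximizing it with respect to $\phi$ leaves $\log p_\theta(x)$ unchanged, so it can only shrink the gap, which is the KL divergence between the encoder distribution and the true posterior $p_\theta(z \mid x)$.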
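The identity behind the bound can be verified exactly on a toy model where every quantity has a closed form. The sketch below assumes a hypothetical one-dimensional linear-Gaussian model, $p(z) = \mathcal{N}(0, 1)$ and $p(x \mid z) = \mathcal{N}(z, s^2)$ (`s2` in the code). For this model, $\log p(x)$ and the true posterior are known. It is not a VAE, only a check of the derivation. For any Gaussian $q$, the gap $\log p(x) - \text{ELBO}$ should equal $D_{\mathrm{KL}}(q \,\|\, p(z \mid x))$, and the gap should vanish when $q$ is the true posterior.

```python
import math


def log_normal(x, mean, var):
    """log N(x; mean, var) for scalars."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for scalar Gaussians."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)


def elbo(x, q_mean, q_var, s2):
    """ELBO for the toy model p(z)=N(0,1), p(x|z)=N(z, s2), with q(z|x)=N(q_mean, q_var)."""
    # Closed-form expected log-likelihood: E_q[(x - z)^2] = (x - q_mean)^2 + q_var
    expected_log_lik = -0.5 * (math.log(2 * math.pi * s2) + ((x - q_mean) ** 2 + q_var) / s2)
    return expected_log_lik - kl_gauss(q_mean, q_var, 0.0, 1.0)


def log_evidence(x, s2):
    """Exact log p(x): marginally, x ~ N(0, 1 + s2)."""
    return log_normal(x, 0.0, 1.0 + s2)


def true_posterior(x, s2):
    """Exact p(z|x) = N(x / (1 + s2), s2 / (1 + s2))."""
    return x / (1.0 + s2), s2 / (1.0 + s2)


x, s2 = 1.3, 0.5
post_mean, post_var = true_posterior(x, s2)
for q_mean, q_var in [(0.0, 1.0), (0.5, 0.2), (post_mean, post_var)]:
    gap = log_evidence(x, s2) - elbo(x, q_mean, q_var, s2)
    kl_post = kl_gauss(q_mean, q_var, post_mean, post_var)
    print(f"q=N({q_mean:.3f}, {q_var:.3f})  gap={gap:.6f}  KL(q || posterior)={kl_post:.6f}")
```

Each printed `gap` should match the KL column. The last row, where $q$ equals the true posterior, should show a gap of (numerically) zero.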
Reparameterization Trick
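Gradients cannot be backpropagated through a stochastic sampling operation directly. The fix is to reparameterize the sampling as a deterministic transformation of noise: $z = \mu + \sigma \odot \epsilon$, where $\epsilon \sim \mathcal{N}(0, I)$.
This makes $z$ differentiable with respect to $\mu$ and $\sigma$, while all of the randomness comes from $\epsilon$.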
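Reparameterization
$$z = \mu + \sigma \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)$$
Here,
- $\mu$ = Mean vector from encoder
- $\sigma$ = Standard deviation from encoder
- $\epsilon$ = Random noise (sampled during the forward pass; no gradients flow into it)
- $\odot$ = Element-wise multiplication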
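The gradient claim can be checked on a toy objective. The sketch below assumes a scalar latent and the hypothetical objective $f(z) = z^2$. This choice is convenient because $\mathbb{E}[f(z)] = \mu^2 + \sigma^2$ gives exact gradients $2\mu$ and $2\sigma$ to compare against. The code applies the chain rule through $z = \mu + \sigma\epsilon$ by hand. An autodiff framework does the same thing automatically once sampling is written this way.

```python
import random


def reparameterized_grads(mu, sigma, n=200_000, seed=0):
    """Pathwise (reparameterization) estimate of d/dmu and d/dsigma of E[f(z)],
    with z = mu + sigma * eps, eps ~ N(0, 1), and the toy objective f(z) = z**2."""
    rng = random.Random(seed)
    g_mu = g_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)  # noise: sampled, then treated as a constant
        z = mu + sigma * eps       # deterministic in (mu, sigma) given eps
        df_dz = 2.0 * z            # f'(z) for f(z) = z**2
        g_mu += df_dz * 1.0        # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps     # chain rule: dz/dsigma = eps
    return g_mu / n, g_sigma / n


mu, sigma = 1.5, 0.5
print(reparameterized_grads(mu, sigma))  # Monte Carlo estimate
print((2 * mu, 2 * sigma))               # exact gradients of mu**2 + sigma**2
```

Many implementations have the encoder output $\log \sigma^2$ rather than $\sigma$ and set $\sigma = \exp(\tfrac{1}{2}\log\sigma^2)$, which keeps $\sigma$ positive. The chain rule then picks up an extra factor $\partial\sigma / \partial \log\sigma^2 = \sigma / 2$.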