Diffusion Models — Complete Guide
Diffusion models are the current state of the art for image generation and power systems such as DALL-E 2 and later, Stable Diffusion, and Midjourney.
How Diffusion Works
Forward Process (fixed):
Image → Add noise → Add noise → ... → Pure noise
x₀ → x₁ → x₂ → ... → x_T
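The forward chain above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes the standard DDPM single-step rule x_t = √(1-β_t) x_{t-1} + √(β_t) ε, a linear β schedule from 1e-4 to 0.02 over T = 1000 steps, and a toy "image" stored as a plain Python list. The names linear_betas and forward_chain are hypothetical.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def forward_chain(x0, betas, rng):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps, one step at a time."""
    x = list(x0)
    for beta in betas:
        keep, noise = math.sqrt(1.0 - beta), math.sqrt(beta)
        x = [keep * v + noise * rng.gauss(0.0, 1.0) for v in x]
    return x

rng = random.Random(0)
betas = linear_betas(1000)
x0 = [1.0] * 500                 # toy "image": 500 pixels, all equal to 1.0
x_T = forward_chain(x0, betas, rng)

# After T steps the signal is almost gone: the pixel statistics should be
# close to those of a standard normal (mean near 0, variance near 1).
mean = sum(x_T) / len(x_T)
var = sum((v - mean) ** 2 for v in x_T) / len(x_T)
print(f"mean={mean:.3f} var={var:.3f}")
```

Nothing here is learned: the schedule is fixed in advance, which is why the forward process is labeled "fixed".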
Reverse Process (learned):
Pure noise → Remove noise → ... → Image
x_T → x_{T-1} → ... → x₀
Only the reverse process is learned: a neural network is trained to undo the noise one step at a time.
DDPM (Denoising Diffusion Probabilistic Models)
Training:
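1. Take a clean image x₀
2. Sample a timestep t uniformly from {1, ..., T}
3. Add noise: x_t = √(ᾱ_t) x₀ + √(1-ᾱ_t) ε, with ε ~ N(0,I) and ᾱ_t = (1-β_1)(1-β_2)...(1-β_t) for a fixed noise schedule β_1, ..., β_T
4. Model predicts the noise: ε_θ(x_t, t)
5. Loss: ||ε - ε_θ(x_t, t)||², minimized by gradient descent on θ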
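The training steps above map onto code fairly directly. The sketch below is illustrative only: it assumes a linear β schedule (1e-4 to 0.02, T = 1000), represents an image as a list of floats, uses 0-based timestep indices, and stands in a hypothetical zero_model for the neural network ε_θ. A real setup would use a tensor library and an optimizer.

```python
import math
import random

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def q_sample(x0, alpha_bar_t, eps):
    """Step 3: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a, b = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [a * x + b * e for x, e in zip(x0, eps)]

def training_loss(model, x0, abar, rng):
    """Steps 2-5 for a single example; returns the squared error."""
    t = rng.randrange(len(abar))              # step 2: uniform timestep (0-based index)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]   # the noise the model must recover
    x_t = q_sample(x0, abar[t], eps)          # step 3
    eps_pred = model(x_t, t)                  # step 4
    return sum((e - p) ** 2 for e, p in zip(eps, eps_pred))  # step 5

def zero_model(x_t, t):
    """Hypothetical placeholder for eps_theta: always predicts zero noise."""
    return [0.0] * len(x_t)

# Assumed linear schedule; any increasing schedule in (0, 1) would work here.
betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]
abar = alpha_bars(betas)
loss = training_loss(zero_model, [0.5, -0.5, 1.0], abar, random.Random(0))
print(loss)
```

Because x_t is computed in closed form from x₀, training never has to simulate the chain step by step. Each example costs one noise draw and one forward pass of the model.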
Generation:
1. Start with random noise x_T ~ N(0,I)
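2. For t = T, T-1, ..., 1:
   x_{t-1} = μ_θ(x_t, t) + σ_t z, with z ~ N(0,I) for t > 1 and z = 0 for t = 1
   where μ_θ(x_t, t) = (1/√α_t) (x_t - (β_t/√(1-ᾱ_t)) ε_θ(x_t, t)), α_t = 1-β_t, and β_t is the noise variance added at forward step t
3. Return x₀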
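The generation loop above can be sketched as follows. Several details are assumptions rather than something this guide specifies: the mean is computed from the predicted noise as μ_θ = (x_t - β_t/√(1-ᾱ_t) · ε_θ) / √α_t, the variance choice is σ_t² = β_t (one of the options in the DDPM paper), the schedule is linear, and timesteps are 0-based. In place of a trained network, a hypothetical oracle returns the exact noise for a dataset containing one known point, which makes the result easy to check.

```python
import math
import random

def alpha_bars(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def ddpm_sample(model, dim, betas, rng):
    """Ancestral sampling: start from noise and denoise one step at a time."""
    abar = alpha_bars(betas)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]        # step 1: x_T ~ N(0, I)
    for t in reversed(range(len(betas))):                # step 2: t = T, ..., 1
        eps_pred = model(x, t)
        coef = betas[t] / math.sqrt(1.0 - abar[t])
        scale = math.sqrt(1.0 - betas[t])                # sqrt(alpha_t)
        mean = [(xi - coef * ei) / scale for xi, ei in zip(x, eps_pred)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0    # no noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return x                                             # step 3: x_0

def make_oracle(target, abar):
    """Hypothetical stand-in for eps_theta: exact noise when all data equals `target`."""
    def model(x_t, t):
        a, b = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
        return [(xi - a * ci) / b for xi, ci in zip(x_t, target)]
    return model

betas = [1e-4 + (0.02 - 1e-4) * i / 999 for i in range(1000)]  # assumed schedule
target = [0.5, -1.0, 2.0]
oracle = make_oracle(target, alpha_bars(betas))
sample = ddpm_sample(oracle, len(target), betas, random.Random(0))
print(sample)  # close to target, since the oracle knows the only data point
```

With a trained network in place of the oracle, the same loop would produce a different sample for each random seed instead of collapsing to one point.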
Classifier-Free Guidance
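Trades diversity for fidelity to the condition (e.g. a text prompt) by extrapolating from the unconditional noise prediction toward the conditional one: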
ε_guided = ε_uncond + w × (ε_cond - ε_uncond)
w = guidance scale
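├─ w = 0: Unconditional prediction only
├─ w = 1: No guidance (plain conditional prediction)
├─ w = 7.5: Good balance (a common Stable Diffusion default)
└─ w = 15+: Strong prompt adherence, low diversity; can oversaturate or add artifacts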
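The guidance formula is a one-line combination of two noise predictions. The sketch below treats each prediction as a list of floats. The function name guided_noise and the toy values are made up for illustration. In practice both predictions come from the same network, run once with the condition and once without.

```python
def guided_noise(eps_uncond, eps_cond, w):
    """eps_guided = eps_uncond + w * (eps_cond - eps_uncond), elementwise."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]

eps_uncond = [0.0, 1.0]   # toy unconditional prediction
eps_cond = [1.0, 1.0]     # toy conditional prediction

print(guided_noise(eps_uncond, eps_cond, 0.0))   # → [0.0, 1.0]  (unconditional)
print(guided_noise(eps_uncond, eps_cond, 1.0))   # → [1.0, 1.0]  (conditional)
print(guided_noise(eps_uncond, eps_cond, 7.5))   # → [7.5, 1.0]  (extrapolated)
```

Only components where the two predictions disagree are amplified. Where they agree (the second element here), the guidance scale has no effect.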
Key Takeaways
- Diffusion models learn to reverse a noise process
- DDPM is the foundational algorithm
- Classifier-free guidance improves quality
- Latent diffusion (Stable Diffusion) operates in compressed space
- Score matching is an alternative formulation
- Diffusion models have matched or surpassed GANs in sample quality on standard image benchmarks, at the cost of slower sampling
- Text-to-image uses cross-attention for conditioning
- Diffusion also extends to video: OpenAI describes Sora as a diffusion model