GANs — Generative Adversarial Networks
GANs generate realistic images by pitting two neural networks against each other.
How GANs Work
Generator (G): Creates fake images from random noise
Discriminator (D): Distinguishes real from fake images
Training Loop:
1. G generates fake images
2. D classifies real vs fake
3. D improves at detection
4. G improves at fooling D
5. Repeat until G generates realistic images
Analogy: Counterfeiter (G) vs Detective (D)
Counterfeiter gets better → Detective gets better → Counterfeiter gets better...
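The training loop above can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the small MLPs, the 2-D Gaussian "real" data, the `train_step` helper, and the Adam settings (lr=2e-4, betas=(0.5, 0.999), a common choice for GANs) are all stand-ins chosen to keep the example short.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-ins (assumptions): 2-D "real" data and small MLPs instead of image networks.
latent_dim, data_dim = 8, 2
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 16), nn.LeakyReLU(0.2),
                  nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce = nn.BCELoss()

def train_step(real):
    batch = real.size(0)
    ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

    # Steps 1-3: G generates fakes, D classifies real vs fake, D is updated.
    fake = G(torch.randn(batch, latent_dim))
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)  # detach: no G update here
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Step 4: G is updated so that D labels its fakes as real.
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Step 5: repeat. The "real" data here is a Gaussian blob centred at (2, 2).
for step in range(100):
    real = torch.randn(64, data_dim) * 0.5 + 2.0
    loss_d, loss_g = train_step(real)
```

Updating D on `fake.detach()` keeps the discriminator step from pushing gradients into G; G is then updated separately using D's fresh judgement of the same fakes.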
Loss Functions
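Discriminator loss (x is a real image, z a noise vector):
L_D = -[log(D(x)) + log(1 - D(G(z)))]
Generator loss (non-saturating form, usually preferred in practice over minimizing log(1 - D(G(z))) directly):
L_G = -log(D(G(z)))
Minimax game (D maximizes V, G minimizes it):
min_G max_D V(D,G) = E_x[log D(x)] + E_z[log(1 - D(G(z)))]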
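A small worked example may make the formulas concrete. The sketch below evaluates them for a single real/fake pair using only the standard library; the function names and the example probabilities (0.9 and 0.1) are illustrative assumptions.

```python
import math

def discriminator_loss(d_real, d_fake):
    """L_D = -[log D(x) + log(1 - D(G(z)))] for one real/fake pair."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    """L_G = -log D(G(z)), the generator loss as written above."""
    return -math.log(d_fake)

def minimax_generator_term(d_fake):
    """log(1 - D(G(z))), the term G minimizes in the minimax objective."""
    return math.log(1.0 - d_fake)

# A confident discriminator: D(x) = 0.9 on the real image, D(G(z)) = 0.1 on the fake.
print(round(discriminator_loss(0.9, 0.1), 4))   # 0.2107
print(round(generator_loss(0.1), 4))            # 2.3026
print(round(minimax_generator_term(0.1), 4))    # -0.1054
```

When D confidently rejects a fake (D(G(z)) = 0.1), L_G is large and its slope with respect to D(G(z)) is -1/0.1 = -10, while the minimax term has slope -1/0.9, about -1.1. That difference is why L_G gives G a stronger learning signal early in training.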
DCGAN (Deep Convolutional GAN)
Architecture guidelines:
├─ Replace pooling with strided convolutions (D) and transposed convolutions (G)
├─ Use batch normalization in both G and D
├─ Remove fully connected hidden layers
├─ Use ReLU in G (except output: Tanh)
└─ Use LeakyReLU in D
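import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100):
        super().__init__()
        self.latent_dim = latent_dim
        self.gen = nn.Sequential(
            # (latent_dim, 1, 1) -> (512, 4, 4)
            nn.ConvTranspose2d(latent_dim, 512, 4, 1, 0),
            nn.BatchNorm2d(512), nn.ReLU(),
            # (512, 4, 4) -> (256, 8, 8)
            nn.ConvTranspose2d(512, 256, 4, 2, 1),
            nn.BatchNorm2d(256), nn.ReLU(),
            # (256, 8, 8) -> (128, 16, 16)
            nn.ConvTranspose2d(256, 128, 4, 2, 1),
            nn.BatchNorm2d(128), nn.ReLU(),
            # (128, 16, 16) -> (3, 32, 32); Tanh maps pixels to [-1, 1]
            nn.ConvTranspose2d(128, 3, 4, 2, 1),
            nn.Tanh()
        )

    def forward(self, z):
        # Reshape flat noise to (batch, latent_dim, 1, 1) for the first transposed conv
        return self.gen(z.view(-1, self.latent_dim, 1, 1))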
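For completeness, a matching discriminator might look like the sketch below. It applies the D-side guidelines (strided convolutions instead of pooling, batch normalization, LeakyReLU, no fully connected layers) to 3x32x32 inputs, which is the size the generator above outputs (1 → 4 → 8 → 16 → 32). The class name, channel widths, and the LeakyReLU slope of 0.2 are assumptions for illustration.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Hypothetical DCGAN-style discriminator for 3x32x32 images."""
    def __init__(self):
        super().__init__()
        self.disc = nn.Sequential(
            # Strided convolutions downsample instead of pooling: 32 -> 16 -> 8 -> 4.
            nn.Conv2d(3, 128, 4, 2, 1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1),
            nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, 2, 1),
            nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
            # A final 4x4 convolution collapses the 4x4 map to one score per image.
            nn.Conv2d(512, 1, 4, 1, 0),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Flatten (batch, 1, 1, 1) to (batch,) probabilities of "real"
        return self.disc(x).view(-1)

images = torch.randn(8, 3, 32, 32)   # stand-in batch, not real data
probs = Discriminator()(images)
print(probs.shape)                   # torch.Size([8])
```

Batch normalization is left off the first layer here, a common DCGAN choice so that D sees the raw input statistics.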
GAN Variants
StyleGAN: Style-based generator for high-resolution face synthesis
Pix2Pix: Paired image translation (sketch → photo)
CycleGAN: Unpaired image translation (horse → zebra)
ProGAN: Progressive growing for high resolution
Key Takeaways
- GANs consist of Generator vs Discriminator in adversarial training
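- Training is unstable: G and D must stay roughly balanced, or one overpowers the other
- Mode collapse is a common failure: G produces only a limited variety of outputs
- DCGAN established architecture guidelines that make convolutional GANs train more stably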
- StyleGAN produces photorealistic faces
- Wasserstein loss (WGAN) improves training stability
- GANs are being replaced by diffusion models for many tasks
- GANs are still useful for style transfer and image editing
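As a brief illustration of the Wasserstein loss mentioned above: the discriminator becomes a "critic" that outputs an unbounded score rather than a probability, and the log terms are replaced by plain averages of those scores. The sketch below shows only the two loss expressions; the function names are assumptions, and the Lipschitz constraint that WGAN also requires (weight clipping in the original formulation, a gradient penalty in WGAN-GP) is not shown.

```python
import torch

def critic_loss(score_real, score_fake):
    # The critic minimizes E[f(G(z))] - E[f(x)], i.e. it pushes real scores up
    # and fake scores down.
    return score_fake.mean() - score_real.mean()

def generator_loss(score_fake):
    # The generator minimizes -E[f(G(z))], i.e. it pushes fake scores up.
    return -score_fake.mean()

# Hand-picked scores for illustration: real mean = 2.0, fake mean = -1.0.
score_real = torch.tensor([1.0, 3.0])
score_fake = torch.tensor([0.0, -2.0])
print(critic_loss(score_real, score_fake).item())   # -3.0
print(generator_loss(score_fake).item())            # 1.0
```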