Self-Supervised Learning — Complete Guide
Self-supervised learning creates labels from the data itself, enabling training on massive unlabeled datasets.
Why Self-Supervised?
Labeled data: Expensive, scarce
Unlabeled data: Abundant, cheap to collect
Self-supervised: Create pseudo-labels from data
├─ Masked token prediction (BERT)
├─ Contrastive learning (SimCLR, CLIP)
├─ Next token prediction (GPT)
└─ Image rotation prediction
Approaches
Contrastive Learning:
├─ Similar pairs → close in embedding space
├─ Dissimilar pairs → far apart
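├─ SimCLR, MoCo: two augmented views of the same image form a similar pair
└─ CLIP: an image and its matching caption form a similar pair, so the idea works for images and text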
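The pull-together, push-apart idea above can be sketched as an InfoNCE-style loss. This is a minimal, one-directional illustration in plain Python, not the exact loss used by SimCLR, MoCo, or CLIP. The function names, the toy 2-D embeddings, and the temperature of 0.1 are assumptions chosen for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch.

    positives[i] is the similar partner of anchors[i]; every other
    entry of `positives` acts as a dissimilar example for that anchor.
    The temperature default is an illustrative assumption.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine_similarity(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Cross-entropy with the true partner as the correct "class".
        total += log_sum_exp - logits[i]
    return total / len(anchors)

# Toy embeddings: in `aligned`, each anchor sits next to its partner.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(anchors, [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce_loss(anchors, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)
```

The loss is low when each anchor is closest to its own partner and high when the partners are swapped, which is the signal that drives similar pairs together in embedding space.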
Masked Modeling:
├─ Mask parts of input, predict them
├─ BERT: Mask tokens → predict
├─ MAE: Mask image patches → predict
└─ GPT (related, autoregressive): Predict next token from the preceding tokens, nothing is masked out
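A rough sketch of how both objectives turn raw text into (input, label) pairs without human annotation. It is deliberately simplified: every selected token is replaced by a `[MASK]` string, whereas BERT's actual recipe also sometimes keeps or randomly swaps the selected token. The helper names, the whitespace tokenization, and the default masking probability are assumptions for illustration.

```python
import random

MASK = "[MASK]"  # placeholder string; real tokenizers use a reserved token id

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens and keep the originals as labels.

    Returns (inputs, labels). labels[i] is the original token where
    position i was masked, and None where no prediction is required.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

def next_token_pairs(tokens):
    """Autoregressive variant: each prefix is an input, the next token its label."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

sentence = "the cat sat on the mat".split()
masked_inputs, masked_labels = mask_tokens(sentence, mask_prob=0.5, seed=1)
prefix_pairs = next_token_pairs(sentence)
print(masked_inputs, masked_labels)
print(prefix_pairs[0])
```

In both cases the labels come straight from the text itself. The difference is what the model may look at: a masked model sees context on both sides of the gap, while the autoregressive variant only sees the prefix.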
Pretext Tasks:
├─ Predict rotation
├─ Predict relative position of two image patches
├─ Solve jigsaw puzzles (reorder shuffled patches)
└─ Fill in gaps (reconstruct a removed region)
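As an illustration of a pretext task, rotation prediction can be sketched in a few lines: each image is rotated by 0, 90, 180, and 270 degrees, and the number of quarter-turns becomes the label. The image here is a toy nested list rather than a real pixel array, and the function names are made up for this example.

```python
def rotate90(image):
    """Rotate a 2D grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def rotation_examples(image):
    """Build four (rotated_image, label) pairs from one unlabeled image.

    The label is the number of clockwise quarter-turns applied (0 to 3),
    so the supervision signal is generated from the data itself.
    """
    examples = []
    current = [list(row) for row in image]  # copy so the input is untouched
    for label in range(4):
        examples.append((current, label))
        current = rotate90(current)
    return examples

toy_image = [[1, 2],
             [3, 4]]
examples = rotation_examples(toy_image)
for rotated, label in examples:
    print(label, rotated)
```

A classifier trained to recover the label has to pick up on object orientation and shape, which is why the learned features can transfer to other tasks.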
Key Takeaways
- Self-supervised learning creates labels from data
- Contrastive learning learns by comparing pairs
- Masked modeling learns by predicting hidden parts
- GPT and BERT use self-supervised pre-training
- CLIP learns vision-language alignment
- Self-supervised learning enables foundation models
- Self-supervised pre-training followed by supervised fine-tuning is a widely used paradigm
- Self-supervised learning reduces labeled data needs