Transfer Learning — Complete Guide
Transfer learning reuses a pre-trained model on a new task, dramatically reducing data and training requirements.
Why Transfer Learning?
Training from scratch:
├─ Typically needs a very large labeled dataset
├─ Can take days to weeks of GPU time
├─ Expensive compute
└─ High risk of overfitting when data is limited
Transfer learning:
├─ Start with pre-trained weights
├─ Fine-tune on a small dataset
├─ Often hours instead of weeks
└─ Often better performance than training from scratch on the same small dataset
Strategies
Strategy 1: Feature Extraction
├─ Freeze all pre-trained layers
├─ Train only the new classification head
├─ Fast, with lower risk of overfitting
└─ Use when: Small dataset, similar domain
Strategy 2: Partial Fine-Tuning
├─ Unfreeze the top (last) few pre-trained layers
├─ Train them with a small learning rate
├─ Usually better performance than feature extraction when the domain differs
└─ Use when: Small-to-moderate dataset, different domain
Strategy 3: Full Fine-Tuning
├─ Unfreeze everything
├─ Train with a very small learning rate
├─ Highest potential performance, given enough data
└─ Use when: Large dataset, similar or different domain
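In code, the strategies above differ mainly in which parameters have requires_grad set to True. The following is a minimal sketch, assuming a toy two-layer backbone in place of a real pre-trained network; build_toy_model, apply_strategy, count_trainable, and the strategy names are hypothetical helpers introduced only for illustration.

```python
import torch.nn as nn


def build_toy_model(num_classes: int = 5) -> nn.Sequential:
    """Toy stand-in for a pre-trained network: model[0] is the backbone, model[1] the new head."""
    backbone = nn.Sequential(
        nn.Linear(32, 64), nn.ReLU(),  # "early" layer
        nn.Linear(64, 64), nn.ReLU(),  # "top" layer, closest to the head
    )
    head = nn.Linear(64, num_classes)
    return nn.Sequential(backbone, head)


def apply_strategy(model: nn.Sequential, strategy: str) -> None:
    """Set requires_grad flags for one of: feature_extraction, partial, full."""
    if strategy not in ("feature_extraction", "partial", "full"):
        raise ValueError(f"unknown strategy: {strategy}")
    backbone, head = model[0], model[1]
    # Start from the most conservative setting: frozen backbone, trainable head.
    for p in backbone.parameters():
        p.requires_grad = False
    for p in head.parameters():
        p.requires_grad = True
    if strategy == "partial":
        # Unfreeze only the top backbone layer (index 2 in this toy backbone).
        for p in backbone[2].parameters():
            p.requires_grad = True
    elif strategy == "full":
        for p in backbone.parameters():
            p.requires_grad = True


def count_trainable(model: nn.Module) -> int:
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = build_toy_model()
trainable_counts = {}
for name in ("feature_extraction", "partial", "full"):
    apply_strategy(model, name)
    trainable_counts[name] = count_trainable(model)
print(trainable_counts)
```

With a real backbone the same pattern applies, but the choice of which layers count as the "top" depends on the architecture.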
Implementation
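import torch
import torch.nn as nn
from torchvision import models

# Number of target classes (placeholder; set this for your dataset)
num_classes = 10

# Load ResNet-50 with ImageNet weights
# (torchvision >= 0.13; older versions use pretrained=True)
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# Freeze all pre-trained layers
for param in model.parameters():
    param.requires_grad = False

# Replace the classifier; the new layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Pass only the new layer's parameters to the optimizer
optimizer = torch.optim.Adam(model.fc.parameters(), lr=0.001)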
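To show what "only the new layer trains" looks like in practice, here is a sketch of a single training step. It assumes a small toy backbone and a random batch in place of ResNet and real images, so it runs without downloading weights; the tensor shapes and class count are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for a frozen pre-trained backbone plus a new head.
backbone = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 3)
for param in backbone.parameters():
    param.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Random stand-in batch: 8 samples, 16 features, 3 classes.
inputs = torch.randn(8, 16)
targets = torch.randint(0, 3, (8,))

backbone_before = backbone[0].weight.clone()
head_before = head.weight.clone()

# Inference mode for the backbone matters for real networks, where layers
# such as BatchNorm and Dropout behave differently in training mode.
backbone.eval()
head.train()

optimizer.zero_grad()
with torch.no_grad():  # frozen features need no autograd graph
    features = backbone(inputs)
loss = criterion(head(features), targets)
loss.backward()
optimizer.step()

backbone_unchanged = torch.equal(backbone_before, backbone[0].weight)
head_changed = not torch.equal(head_before, head.weight)
print(backbone_unchanged, head_changed)
```

Computing the frozen features under torch.no_grad() is a design choice that saves memory. It only works while the whole backbone is frozen and must be dropped once any backbone layer is unfrozen.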
When to Use
Small data + Similar domain → Feature extraction
Small data + Different domain → Fine-tune top layers
Large data + Similar domain → Fine-tune all layers
Large data + Different domain → Fine-tune or train from scratch
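The table above can be encoded as a simple lookup. This is only a sketch: choose_strategy is a hypothetical helper, and the "small"/"large" and "similar"/"different" labels are coarse categories you would have to judge for your own dataset.

```python
def choose_strategy(data_size: str, domain: str) -> str:
    """Map (data size, domain similarity) to the strategy suggested by the table."""
    table = {
        ("small", "similar"): "feature extraction",
        ("small", "different"): "fine-tune top layers",
        ("large", "similar"): "fine-tune all layers",
        ("large", "different"): "fine-tune or train from scratch",
    }
    key = (data_size.lower(), domain.lower())
    if key not in table:
        raise ValueError(f"expected small/large and similar/different, got {key}")
    return table[key]


print(choose_strategy("small", "different"))
```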
Key Takeaways
- Transfer learning dramatically reduces data and compute needs
- Feature extraction is safest for small datasets
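- Fine-tuning with a small learning rate reduces the risk of catastrophic forgetting
- ImageNet pre-trained models are a strong starting point for many vision tasks
- BERT- and GPT-style pre-trained models are a strong starting point for many NLP tasks
- Discriminative learning rates: use lower rates for early layers, higher rates for later ones
- Gradual unfreezing: unfreeze layer by layer, starting from the layers nearest the output
- Transfer learning is the default starting point for most modern vision and NLP work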
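Discriminative learning rates and gradual unfreezing can both be expressed with standard PyTorch features (optimizer parameter groups and requires_grad). The sketch below assumes a toy three-block network; discriminative_param_groups, unfreeze_last, and the decay factor of 0.1 are illustrative assumptions, not a prescribed recipe.

```python
import torch
import torch.nn as nn

# Toy three-block network standing in for a pre-trained model (hypothetical).
blocks = nn.ModuleList([
    nn.Linear(16, 16),  # earliest block
    nn.Linear(16, 16),  # middle block
    nn.Linear(16, 4),   # head, closest to the output
])


def discriminative_param_groups(blocks, base_lr=1e-3, decay=0.1):
    """The last block gets base_lr; each earlier block gets `decay` times the next block's rate."""
    n = len(blocks)
    return [
        {"params": list(block.parameters()), "lr": base_lr * decay ** (n - 1 - i)}
        for i, block in enumerate(blocks)
    ]


def unfreeze_last(blocks, num_unfrozen):
    """Make only the last `num_unfrozen` blocks trainable."""
    n = len(blocks)
    for i, block in enumerate(blocks):
        trainable = i >= n - num_unfrozen
        for p in block.parameters():
            p.requires_grad = trainable


optimizer = torch.optim.Adam(discriminative_param_groups(blocks))
learning_rates = [group["lr"] for group in optimizer.param_groups]

# Gradual unfreezing: one more block becomes trainable each epoch, head first.
schedule = []
for epoch in range(len(blocks)):
    unfreeze_last(blocks, epoch + 1)
    schedule.append([all(p.requires_grad for p in b.parameters()) for b in blocks])
    # ... one epoch of training would run here ...

print(learning_rates)
print(schedule)
```

Parameters that are still frozen receive no gradients, so the optimizer skips them until they are unfrozen.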