Model Compression
Deep learning models are often over-parameterized for deployment. Model compression reduces model size and computation while maintaining accuracy, enabling deployment on edge devices and reducing inference costs.
Unstructured Pruning
Definition: Unstructured Pruning
Set individual weights to zero when their magnitude falls below a threshold.
This creates sparse weight matrices, which yield a speedup only on hardware or kernels with sparse tensor support.
Magnitude Pruning Threshold
$$\hat{w} = w \cdot \mathbb{1}(|w| > \tau)$$
Here,
- $w$ = Weight to prune
- $\tau$ = Threshold (percentile of absolute weights)
- $\mathbb{1}(\cdot)$ = Indicator function
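The thresholding rule can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `magnitude_prune` and the `sparsity` parameter are assumptions introduced here, and the threshold is taken as a percentile over the whole tensor.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights."""
    # Threshold tau: the `sparsity` percentile of the absolute weights.
    tau = np.percentile(np.abs(weights), sparsity * 100.0)
    # Indicator mask: True where |w| > tau, False elsewhere.
    mask = np.abs(weights) > tau
    return weights * mask

w = np.array([[0.5, -0.1, 0.8],
              [-0.05, 0.3, -0.9]])
pruned = magnitude_prune(w, sparsity=0.5)
print(pruned)
```

The result has the same shape as the input, so on its own it saves neither memory nor compute; the gain only materializes once the zeros are stored and multiplied in a sparse format.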
Structured Pruning
Definition: Structured Pruning
Remove entire structures (filters, channels, heads, layers) rather than individual weights:
- Filter pruning: Remove entire convolutional filters
- Channel pruning: Remove output channels (equivalent to filter pruning)
- Head pruning: Remove attention heads in Transformers
- Layer pruning: Remove entire layers
Structured pruning produces smaller dense matrices that achieve actual speedup without specialized hardware.
Filter Importance (L1-norm)
$$I_f = \sum_{c,h,w} \left| W_{f,c,h,w} \right|$$
Here,
- $I_f$ = Importance score for filter f
- $W_{f,c,h,w}$ = Weight of filter f at position (c,h,w)
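As a rough sketch of L1-norm filter pruning, the snippet below scores each filter of a convolution weight tensor and keeps the highest-scoring ones. It assumes a weight layout of (filters, channels, height, width); the helper names `filter_l1_importance` and `prune_filters` and the `keep` parameter are hypothetical and chosen for illustration.

```python
from typing import Tuple

import numpy as np

def filter_l1_importance(conv_weight: np.ndarray) -> np.ndarray:
    """Return one L1-norm score per filter for weights shaped (F, C, H, W)."""
    return np.abs(conv_weight).sum(axis=(1, 2, 3))

def prune_filters(conv_weight: np.ndarray, keep: int) -> Tuple[np.ndarray, np.ndarray]:
    """Keep the `keep` filters with the largest L1 norm; the result stays dense."""
    scores = filter_l1_importance(conv_weight)
    # Indices of the highest-scoring filters, restored to their original order.
    kept = np.sort(np.argsort(scores)[-keep:])
    return conv_weight[kept], kept

rng = np.random.default_rng(0)
weight = rng.normal(size=(4, 3, 3, 3))
weight[1] *= 0.01  # make filter 1 clearly unimportant
smaller, kept = prune_filters(weight, keep=3)
print(smaller.shape, kept)
```

In a real network, dropping a filter also removes one input channel from the following layer, so that layer's weights would need to be sliced with the same `kept` indices.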
Quantization
Definition: Quantization
Reduce precision of weights and activations from FP32 to lower-bit representations:
- FP16: 16-bit floating point (half precision), 2x memory reduction
- INT8: 8-bit integer, 4x memory reduction and often a 2-3x speedup on hardware with INT8 support
- INT4: 4-bit integer, 8x memory reduction
- Binary: 1-bit, up to 32x memory reduction (extreme compression)
Quantization-aware training (QAT) simulates quantization during training so the model adapts to rounding error, which typically preserves accuracy better than quantizing only after training.
Linear Quantization
$$q = \mathrm{round}\left(\frac{x}{s}\right) + z$$
Here,
- $q$ = Quantized value
- $x$ = Original floating-point value
- $s$ = Scaling factor
- $z$ = Zero-point offset
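Scale and Zero Point
$$s = \frac{x_{\max} - x_{\min}}{2^b - 1}, \qquad z = \mathrm{round}\left(-\frac{x_{\min}}{s}\right)$$
Here,
- $[x_{\min}, x_{\max}]$ = Range of floating-point values
- $b$ = Number of bits (e.g., 8 for INT8)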
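A small round trip makes the scale and zero point concrete. The sketch below assumes an unsigned integer range [0, 2^b - 1] and a min/max calibration of the range; the function names `quant_params`, `quantize`, and `dequantize` are illustrative rather than taken from any library.

```python
import numpy as np

def quant_params(x_min: float, x_max: float, bits: int = 8):
    """Scale s and zero point z mapping [x_min, x_max] onto [0, 2**bits - 1]."""
    s = (x_max - x_min) / (2 ** bits - 1)
    z = int(round(-x_min / s))
    return s, z

def quantize(x: np.ndarray, s: float, z: int, bits: int = 8) -> np.ndarray:
    """q = round(x / s) + z, clamped to the representable integer range."""
    q = np.round(x / s) + z
    return np.clip(q, 0, 2 ** bits - 1).astype(np.int32)

def dequantize(q: np.ndarray, s: float, z: int) -> np.ndarray:
    """Approximate inverse of quantize: x is recovered as s * (q - z)."""
    return s * (q.astype(np.float64) - z)

x = np.array([-1.0, 0.0, 1.0, 3.0])
s, z = quant_params(float(x.min()), float(x.max()), bits=8)
q = quantize(x, s, z)
x_hat = dequantize(q, s, z)
print(q, np.max(np.abs(x - x_hat)))
```

For values inside the calibrated range, the reconstruction error is bounded by half a quantization step (s / 2); values outside the range are clamped and can incur larger error.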
Knowledge Distillation
Definition: Knowledge Distillation
Transfer knowledge from a large teacher model to a smaller student model by training the student on:
- Soft targets: The teacher's outputs, softened with temperature $T$
- Hard targets: Ground truth labels
- Combined loss: Weighted sum of soft and hard losses
The soft targets contain "dark knowledge" — relationships between classes that hard labels miss.
Softened Probabilities
$$p_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$
Here,
- $z_i$ = Logit for class i
- $T$ = Temperature (higher = softer distribution)
- $p_i$ = Softened probability
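The temperature-scaled softmax and the combined loss can be sketched as follows. Several details here are assumptions not stated in the text: the weighting parameter `alpha`, the default temperature of 4, the use of cross-entropy for the soft term, and the `T**2` factor (a common convention for keeping the soft-loss gradients on a comparable scale).

```python
import numpy as np

def softened_probs(logits: np.ndarray, T: float = 1.0) -> np.ndarray:
    """p_i = exp(z_i / T) / sum_j exp(z_j / T), computed stably."""
    scaled = logits / T
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # avoid overflow in exp
    e = np.exp(scaled)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, label, T=4.0, alpha=0.5):
    """Weighted sum of a soft (teacher) loss and a hard (label) loss."""
    p_teacher = softened_probs(teacher_logits, T)
    p_student_T = softened_probs(student_logits, T)
    # Soft loss: cross-entropy between softened teacher and student outputs.
    # The T**2 scaling is an assumed convention, not part of the text above.
    soft = -np.sum(p_teacher * np.log(p_student_T)) * T ** 2
    # Hard loss: cross-entropy against the ground-truth class at T = 1.
    hard = -np.log(softened_probs(student_logits, 1.0)[label])
    return alpha * soft + (1.0 - alpha) * hard

teacher = np.array([6.0, 2.0, 1.0])
student = np.array([3.0, 1.5, 0.5])
p_sharp = softened_probs(teacher, T=1.0)
p_soft = softened_probs(teacher, T=4.0)
loss = distillation_loss(student, teacher, label=0)
print(p_sharp, p_soft, loss)
```

Comparing `p_sharp` with `p_soft` shows the effect of temperature: at higher T the non-maximal classes receive more probability mass, which is where the relationships between classes become visible to the student.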