Regularization — Complete Guide

Regularization prevents overfitting by adding a penalty term to the loss function, constraining model complexity.

The Problem

Without regularization:
├─ Model fits training data perfectly
├─ Complex models with large weights
├─ High variance (overfitting)
└─ Poor generalization to new data

With regularization:
├─ Model balances fit and simplicity
├─ Smaller weights
├─ Lower variance (less overfitting)
└─ Better generalization

Ridge Regression (L2)

Loss = MSE + α × Σwᵢ²

Adds squared magnitude of weights as penalty
Shrinks weights toward zero (but never exactly zero)

Effect: Smoother, simpler model
When to use: Many features, all potentially useful
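The Ridge loss has a closed-form minimizer. Here is a minimal NumPy sketch. It assumes MSE means (1/n)Σ(yᵢ − ŷᵢ)² and that the model has no intercept. Setting the gradient to zero gives w = (XᵀX + nαI)⁻¹Xᵀy. The helper name ridge_closed_form is hypothetical. As α grows, the weight norm falls, but the individual weights stay nonzero.

```python
import numpy as np


def ridge_closed_form(X, y, alpha):
    """Minimize (1/n)||y - Xw||^2 + alpha * ||w||^2 (no intercept term)."""
    n, p = X.shape
    # Zero gradient: (X^T X + n*alpha*I) w = X^T y
    A = X.T @ X + n * alpha * np.eye(p)
    return np.linalg.solve(A, X.T @ y)


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = X @ np.array([4.0, -2.0, 0.5, 0.0]) + 0.5 * rng.standard_normal(100)

for alpha in [0.0, 0.1, 1.0, 10.0]:
    w = ridge_closed_form(X, y, alpha)
    # The norm shrinks as alpha grows; entries get smaller but do not hit exactly 0
    print(f"alpha={alpha:>5}: w={np.round(w, 3)}, ||w||={np.linalg.norm(w):.3f}")
```

scikit-learn's Ridge minimizes the sum rather than the mean of squared errors plus alpha·Σwᵢ². Its alpha therefore plays the role of n·α in this parameterization.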

Lasso Regression (L1)

Loss = MSE + α × Σ|wᵢ|

Adds absolute magnitude of weights as penalty
Can shrink weights to EXACTLY zero (feature selection!)

Effect: Sparse model, automatic feature selection
When to use: Many features, some irrelevant
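Why does L1 produce exact zeros when L2 does not? Consider the special case where the columns of X are orthogonal with XᵀX = nI. This assumption is made only because it gives both losses a closed form. Start from the unregularized least-squares weights ŵ = Xᵀy/n. Ridge scales every weight by 1/(1 + α). Lasso instead soft-thresholds each weight: sign(ŵ)·max(|ŵ| − α/2, 0). Any weight with |ŵ| ≤ α/2 therefore becomes exactly zero. The function names below are illustrative.

```python
import numpy as np


def soft_threshold(w, t):
    """sign(w) * max(|w| - t, 0): shrinks toward zero and clips small values to exactly 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def ridge_orthogonal(w_ols, alpha):
    # Ridge solution when X^T X = n*I: every weight shrinks by the same factor
    return w_ols / (1.0 + alpha)


def lasso_orthogonal(w_ols, alpha):
    # Lasso solution when X^T X = n*I: soft-thresholding at alpha / 2
    return soft_threshold(w_ols, alpha / 2.0)


w_ols = np.array([3.0, 0.4, -0.2, -2.5])
alpha = 1.0
print("ridge:", ridge_orthogonal(w_ols, alpha))
print("lasso:", lasso_orthogonal(w_ols, alpha))
```

Ridge returns 1.5, 0.2, -0.1, -1.25, so every weight shrinks but none is removed. Lasso returns 2.5, 0, 0, -2.0, so the two small weights are dropped entirely.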

Elastic Net

Loss = MSE + α₁ × Σ|wᵢ| + α₂ × Σwᵢ²

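Combines the Ridge (L2) and Lasso (L1) penalties
The L1 term can still set weights exactly to zero. The L2 term keeps the fit stable when features are correlated, since Lasso alone tends to keep one feature from a correlated group and drop the rest somewhat arbitrarily.
In scikit-learn, both strengths are set through a single alpha and a mixing weight l1_ratio (1.0 = pure Lasso, 0.0 = pure Ridge)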

When to use: Correlated features, and you still want feature selection
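Neither Lasso nor Elastic Net has a closed form in general. A common way to fit them is coordinate descent, which is what scikit-learn's Lasso and ElasticNet use. Below is a minimal sketch for the Elastic Net loss above. It assumes no intercept and MSE = (1/n)||y − Xw||². Each pass updates one weight at a time: w_j ← S((2/n)x_jᵀr_j, α₁) / ((2/n)x_jᵀx_j + 2α₂). Here r_j is the residual with feature j's contribution removed, and S is soft-thresholding. Setting α₂ = 0 gives plain Lasso. The function elastic_net_cd and its parameters are hypothetical, not a library API.

```python
import numpy as np


def elastic_net_cd(X, y, alpha1, alpha2, n_iter=1000, tol=1e-8):
    """Minimize (1/n)||y - Xw||^2 + alpha1*sum|w| + alpha2*sum w^2 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    residual = y.astype(float).copy()  # y - X @ w, with w = 0 initially
    col_sq = (2.0 / n) * np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j itself
            rho = (2.0 / n) * (X[:, j] @ residual) + col_sq[j] * w[j]
            # Soft-threshold for the L1 part, then divide by curvature plus the L2 part
            new_wj = np.sign(rho) * max(abs(rho) - alpha1, 0.0) / (col_sq[j] + 2.0 * alpha2)
            residual -= X[:, j] * (new_wj - w[j])  # keep residual = y - X @ w
            max_change = max(max_change, abs(new_wj - w[j]))
            w[j] = new_wj
        if max_change < tol:
            break
    return w


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
# Only the first two features carry signal
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 0.0]) + 0.1 * rng.standard_normal(200)

w_enet = elastic_net_cd(X, y, alpha1=0.5, alpha2=0.1)
print(np.round(w_enet, 3))
```

With alpha1 = alpha2 = 0, the same loop reduces to ordinary least squares. That makes a quick sanity check against np.linalg.lstsq.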

Python Implementation

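from sklearn.linear_model import Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score
import numpy as np

# Assumes X (n_samples × n_features) and y (n_samples,) are already loaded

# Ridge
ridge = Ridge(alpha=1.0)
scores = cross_val_score(ridge, X, y, cv=5, scoring='r2')
print(f"Ridge mean R²: {scores.mean():.3f}")

# Lasso
lasso = Lasso(alpha=0.1)
scores = cross_val_score(lasso, X, y, cv=5, scoring='r2')
print(f"Lasso mean R²: {scores.mean():.3f}")
# cross_val_score fits clones, so fit lasso itself before reading coef_
lasso.fit(X, y)
print(f"Lasso selected {np.sum(lasso.coef_ != 0)} features")

# Elastic Net: l1_ratio mixes the penalties (1.0 = pure Lasso, 0.0 = pure Ridge)
enet = ElasticNet(alpha=0.1, l1_ratio=0.5)
scores = cross_val_score(enet, X, y, cv=5, scoring='r2')
print(f"Elastic Net mean R²: {scores.mean():.3f}")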
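The penalty depends on feature scale, so a common pattern is to chain StandardScaler and the regularized model in one scikit-learn pipeline. The scaler is then refit on each training fold during cross-validation, so no statistics leak in from the validation fold. Below is a minimal sketch; the synthetic make_regression data is an assumption standing in for your own X and y.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 20 features, only 5 of which carry signal
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

model = make_pipeline(StandardScaler(), Lasso(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"mean R²: {scores.mean():.3f}")

# Fit on all data to inspect which coefficients survived
model.fit(X, y)
coef = model.named_steps["lasso"].coef_
print(f"Lasso kept {np.count_nonzero(coef)} of {coef.size} features")
```

make_pipeline names each step after its lowercased class name, which is why the fitted Lasso is reached as named_steps["lasso"].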

Choosing Alpha

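α = 0: No regularization (ordinary least squares)
α → ∞: All weights → 0 (the model predicts a constant; scikit-learn does not penalize the intercept, so that constant is the mean of y)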

Use cross-validation to find optimal α:
alphas = [0.001, 0.01, 0.1, 1, 10, 100]
for a in alphas:
    model = Ridge(alpha=a)
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"α={a}: {score:.3f}")
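scikit-learn also provides estimators that run this search internally. RidgeCV takes a list of candidate alphas. LassoCV builds its own grid of alphas when none are given. Both expose the selected value as alpha_. The sketch below uses synthetic data as a stand-in for your X and y, and standardizes the features first.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
alphas = [0.001, 0.01, 0.1, 1, 10, 100]

# RidgeCV defaults to efficient leave-one-out CV; cv=5 requests 5-fold instead
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas, cv=5)).fit(X, y)
print("Ridge best alpha:", ridge.named_steps["ridgecv"].alpha_)

# LassoCV generates its own alpha grid when alphas is not given
lasso = make_pipeline(StandardScaler(), LassoCV(cv=5)).fit(X, y)
print("Lasso best alpha:", lasso.named_steps["lassocv"].alpha_)
```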

Key Takeaways

Regularization prevents overfitting by penalizing complexity
Ridge (L2) shrinks weights — good when all features matter
Lasso (L1) performs feature selection — zeros out irrelevant features
Elastic Net combines both — good default choice
Cross-validation is essential for choosing alpha
Scale features before regularization (penalty is scale-dependent)
Regularization is crucial for high-dimensional data
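Tree-based models are not regularized with these weight penalties; they limit complexity through settings such as maximum depth, minimum samples per leaf, and pruning (some gradient-boosting libraries, such as XGBoost, also add L1/L2 penalties on leaf weights)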
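Why does scaling matter? Suppose you rescale one feature, say from meters to millimeters. The unregularized fit's predictions stay the same. The Ridge fit changes, because the larger-scale feature needs a smaller weight and so pays a smaller penalty. Below is a minimal NumPy sketch using the closed-form Ridge solution for the mean-squared-error loss, with no intercept. The helper name ridge_fit is hypothetical.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    # Minimizer of (1/n)||y - Xw||^2 + alpha * ||w||^2 (no intercept)
    n, p = X.shape
    return np.linalg.solve(X.T @ X + n * alpha * np.eye(p), X.T @ y)


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.standard_normal(50)

X_rescaled = X.copy()
X_rescaled[:, 0] *= 1000.0  # same information, different units

for alpha in [0.0, 1.0]:
    pred_a = X @ ridge_fit(X, y, alpha)
    pred_b = X_rescaled @ ridge_fit(X_rescaled, y, alpha)
    # alpha=0: predictions match; alpha=1: the rescaled feature is barely penalized
    print(f"alpha={alpha}: max prediction difference = {np.max(np.abs(pred_a - pred_b)):.4f}")
```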
