Ridge Regression (L2 Regularization)


Shrinking Coefficients to Prevent Overfitting

Ridge regression adds an L2 penalty to the ordinary least squares (OLS) objective, shrinking coefficients toward zero. This reduces variance at the cost of a small bias, which improves generalization when predictors are multicollinear or outnumber observations.
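For reference, the penalized objective described above can be written out as follows. The notation is assumed here rather than taken from the text: X is the design matrix, y the response, beta the coefficient vector, and lambda >= 0 the penalty strength. The intercept is taken to be unpenalized, which is usually handled by centering the data first.

```latex
\hat{\beta}^{\text{ridge}}
  = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_2^2
  = (X^\top X + \lambda I)^{-1} X^\top y
```

For any lambda > 0 the matrix X'X + lambda*I is positive definite, so the solution is unique even when X'X itself is singular. This is why ridge stays well defined with collinear predictors or with more predictors than observations.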

Genomics — Handle thousands of gene predictors with limited samples
Finance — Stabilize portfolio weight estimates with correlated assets
Text Mining — Regularize high-dimensional term frequency features
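The genomics case above, with far more predictors than samples, can be illustrated with a minimal sketch on synthetic data. The sizes (30 samples, 200 predictors), the seed, and the penalty `lam = 1.0` are arbitrary assumptions chosen for illustration, and the intercept is omitted for simplicity. The sketch shows that the OLS normal equations are rank deficient in this setting, while the ridge system has a unique solution.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed (assumption)
n, p = 30, 200                          # far more predictors than samples
X = rng.standard_normal((n, p))
y = X[:, 0] - 2.0 * X[:, 1] + rng.standard_normal(n)

gram = X.T @ X                          # p x p matrix with rank at most n
rank = np.linalg.matrix_rank(gram)
print(f"rank(X'X) = {rank}, p = {p}")   # rank < p: OLS has no unique solution

lam = 1.0                               # illustrative penalty (assumption)
ridge_system = gram + lam * np.eye(p)   # positive definite for lam > 0
beta_ridge = np.linalg.solve(ridge_system, X.T @ y)

# Every eigenvalue of the ridge system is at least lam, so it is invertible
min_eig = np.linalg.eigvalsh(ridge_system).min()
print(f"smallest eigenvalue of X'X + lam*I = {min_eig:.3f}")
```

Adding `lam` to the diagonal lifts every eigenvalue of X'X by the same amount, which is what removes the singularity.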

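The tuning parameter lambda controls the bias-variance tradeoff along a continuous path: lambda = 0 recovers the OLS fit, and increasing lambda shrinks the coefficients smoothly toward zero.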
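To make the role of lambda concrete, here is a minimal sketch that computes the closed-form ridge solution on centered synthetic data for a few penalty values. The helper name `ridge_closed_form`, the data, and the lambda grid are hypothetical choices for illustration. With lambda equal to 0 the result matches ordinary least squares, and the coefficient norm shrinks as lambda grows.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Solve (X'X + lam*I) beta = X'y for centered X and y (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)            # arbitrary seed (assumption)
X = rng.standard_normal((50, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.standard_normal(50)

# Center so the unpenalized intercept can be dropped from the solve
Xc = X - X.mean(axis=0)
yc = y - y.mean()

beta_ols = np.linalg.lstsq(Xc, yc, rcond=None)[0]
lams = [0.0, 1.0, 10.0, 100.0, 1000.0]    # illustrative grid (assumption)
norms = []
for lam in lams:
    beta = ridge_closed_form(Xc, yc, lam)
    norms.append(np.linalg.norm(beta))
    print(f"lambda={lam:>7.1f}  ||beta||_2={norms[-1]:.4f}")
```

scikit-learn's `Ridge` exposes this penalty strength under the name `alpha`.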

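The example below applies ridge regression to synthetic data with correlated predictors: it traces the coefficient path over a grid of lambda values, selects lambda by 5-fold cross-validation, and compares the result with OLS on a held-out test set: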


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge, RidgeCV, LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline

np.random.seed(42)
n, p = 100, 20  # 100 samples, 20 features; the first three are correlated
X = np.random.randn(n, p)
# Introduce correlations: features 1 and 2 are noisy copies of feature 0
X[:, 1] = X[:, 0] + np.random.randn(n)*0.3
X[:, 2] = X[:, 0] + np.random.randn(n)*0.3
# True model: only first 5 features matter
true_beta = np.array([2,-1.5,1,0.5,-0.8] + [0]*15)
y = X @ true_beta + np.random.randn(n)*2

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Ridge path: refit over a log-spaced grid of penalties and record the coefficients
lambdas = np.logspace(-3, 4, 100)
coef_path = []
for lam in lambdas:
    ridge = Pipeline([('scaler', StandardScaler()),
                      ('ridge', Ridge(alpha=lam))])
    ridge.fit(X_train, y_train)
    coef_path.append(ridge.named_steps['ridge'].coef_)

coef_path = np.array(coef_path)

fig, axes = plt.subplots(1, 2, figsize=(14, 5))
for j in range(p):
    axes[0].plot(np.log10(lambdas), coef_path[:, j],
                 alpha=0.7, linewidth=1.5,
                 color='red' if j < 5 else 'lightblue')
axes[0].set_xlabel('log10(λ)')
axes[0].set_ylabel('Coefficient Value')
axes[0].set_title('Ridge Coefficient Path\n(Red = true predictors)')
axes[0].axvline(0, color='black', linestyle='--', alpha=0.5)  # reference line at λ = 1

# Cross-validation to select λ
ridge_cv = Pipeline([('scaler', StandardScaler()),
                     ('ridge', RidgeCV(alphas=lambdas, cv=5, scoring='neg_mean_squared_error'))])
ridge_cv.fit(X_train, y_train)
best_lambda = ridge_cv.named_steps['ridge'].alpha_

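# Recompute the 5-fold CV error for each λ so the full curve can be plotted
cv_scores = []
for lam in lambdas:
    model = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge(alpha=lam))])
    # scikit-learn scorers are maximized, so MSE comes back negated
    score = cross_val_score(model, X_train, y_train, cv=5, scoring='neg_mean_squared_error').mean()
    cv_scores.append(-score)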

best_idx = np.argmin(cv_scores)
axes[1].plot(np.log10(lambdas), cv_scores, 'b-', linewidth=2)
axes[1].axvline(np.log10(best_lambda), color='red', linestyle='--',
                label=f'Best λ={best_lambda:.3f}')
axes[1].set_xlabel('log10(λ)')
axes[1].set_ylabel('CV MSE')
axes[1].set_title('Ridge CV: Selecting Optimal λ')
axes[1].legend()

plt.tight_layout()
plt.savefig('ridge_regression.png', dpi=150)
plt.show()

# Compare OLS vs Ridge
ols = Pipeline([('scaler', StandardScaler()), ('ols', LinearRegression())])
best_ridge = Pipeline([('scaler', StandardScaler()), ('ridge', Ridge(alpha=best_lambda))])

ols.fit(X_train, y_train)
best_ridge.fit(X_train, y_train)

print(f"Best λ: {best_lambda:.4f}")
for name, fitted in [("OLS", ols), ("Ridge", best_ridge)]:
    train_mse = np.mean((y_train - fitted.predict(X_train))**2)
    test_mse = np.mean((y_test - fitted.predict(X_test))**2)
    print(f"{name:<5} - Train MSE: {train_mse:.3f}, Test MSE: {test_mse:.3f}")
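The variance reduction claimed at the start of this guide can also be checked by simulation. The sketch below is an illustration under stated assumptions, separate from the script above: it redraws the noise 200 times for a fixed design with two nearly collinear predictors, refits OLS and ridge each time, and compares the mean and spread of the estimated first coefficient. The design, the noise level, `alpha=5.0`, and the number of replications are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)                # arbitrary seed (assumption)
n = 60
x0 = rng.standard_normal(n)
# Second predictor is a noisy copy of the first, so the two are nearly collinear
X = np.column_stack([x0, x0 + 0.1 * rng.standard_normal(n)])
true_beta = np.array([1.0, 1.0])

ols_coefs, ridge_coefs = [], []
for _ in range(200):                          # 200 noise draws (assumption)
    y = X @ true_beta + rng.standard_normal(n)
    ols_coefs.append(LinearRegression().fit(X, y).coef_[0])
    ridge_coefs.append(Ridge(alpha=5.0).fit(X, y).coef_[0])

ols_coefs, ridge_coefs = np.array(ols_coefs), np.array(ridge_coefs)
print(f"OLS   first coef: mean={ols_coefs.mean():.3f}, var={ols_coefs.var():.3f}")
print(f"Ridge first coef: mean={ridge_coefs.mean():.3f}, var={ridge_coefs.var():.3f}")
```

The OLS estimates should center near the true value of 1 but swing widely from draw to draw, while the ridge estimates cluster much more tightly around a slightly shifted mean. That is the bias-variance tradeoff in miniature.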
