Lasso Regression (L1 Regularization)

Feature Selection Through Coefficient Shrinkage

The Lasso adds an L1 penalty that can drive coefficients to exactly zero, performing automatic feature selection. This produces sparse, interpretable models that identify the most important predictors from high-dimensional data.

Biomedical Research — Identify key biomarkers from thousands of candidates
Marketing — Select the most predictive customer attributes for targeting
Environmental Modeling — Pinpoint critical factors from numerous measurements
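
The high-dimensional setting described above can be sketched with a small synthetic example. This is only an illustration under stated assumptions: the data are random Gaussian features with more columns (200) than rows (60), only the first four coefficients are nonzero, and `alpha=0.3` is a hand-picked penalty rather than a tuned one.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Assumed synthetic setup: p > n, with only 4 truly relevant features
rng = np.random.default_rng(1)
n, p = 60, 200
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [4.0, -3.0, 2.5, -2.0]
y = X @ beta + 0.5 * rng.standard_normal(n)

# alpha is hand-picked for illustration; in practice it would be cross-validated
model = Lasso(alpha=0.3, max_iter=50_000).fit(X, y)

# Indices of the features the Lasso kept (nonzero coefficients)
selected_idx = np.flatnonzero(model.coef_)
print(f"{len(selected_idx)} of {p} features kept:", selected_idx)
```

Ordinary least squares has no unique solution when there are more features than observations, whereas the L1 penalty still returns a single sparse fit here. The exact set of selected features depends on the seed and on `alpha`.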

While Ridge shrinks all coefficients, Lasso selects only the essential ones.

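Lasso (Least Absolute Shrinkage and Selection Operator) adds an L1 penalty to the least-squares objective:

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \left( y_i - x_i^\top \beta \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

where $\lambda \ge 0$ controls the strength of the penalty; larger values shrink more coefficients to zero.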

Unlike Ridge, Lasso produces sparse solutions — many coefficients shrink to exactly zero.
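
Why the L1 penalty yields exact zeros is easiest to see in one dimension. The sketch below assumes a single coefficient with unpenalized estimate `b` and compares the minimizer of `0.5*(b - beta)**2 + lam*|beta|` (soft-thresholding) with the minimizer of `0.5*(b - beta)**2 + 0.5*lam*beta**2` (proportional shrinkage). The function names and the example values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + lam*|beta| (one-dimensional L1 penalty)."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def ridge_shrink(b, lam):
    """Minimizer of 0.5*(b - beta)**2 + 0.5*lam*beta**2 (one-dimensional L2 penalty)."""
    return b / (1.0 + lam)

# Hypothetical unpenalized estimates: two large, two small
ols = np.array([3.0, -2.0, 0.4, -0.1])
lam = 0.5

lasso_1d = soft_threshold(ols, lam)  # small entries are set exactly to zero
ridge_1d = ridge_shrink(ols, lam)    # every entry is scaled down, none reaches zero

print("L1:", lasso_1d)
print("L2:", ridge_1d)
```

Any estimate whose magnitude is below `lam` is mapped to zero by the L1 rule, while the L2 rule only rescales it.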


import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso, LassoCV, Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data: 30 features, only the first 5 are truly relevant
np.random.seed(42)
n, p = 150, 30
X = np.random.randn(n, p)
true_beta = np.array([3, -2, 1.5, -1, 0.8] + [0]*25)  # only 5 nonzero
y = X @ true_beta + np.random.randn(n)*1.5  # Gaussian noise, standard deviation 1.5

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

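# Coefficient paths: refit Lasso and Ridge over a grid of penalty strengths.
# Note: scikit-learn scales the two objectives differently, so the same alpha
# is not directly comparable between Lasso and Ridge.
lambdas = np.logspace(-3, 2, 200)
lasso_coefs = []
ridge_coefs = []
for lam in lambdas:
    lasso_pipe = Pipeline([('s', StandardScaler()), ('m', Lasso(alpha=lam, max_iter=10000))])
    ridge_pipe = Pipeline([('s', StandardScaler()), ('m', Ridge(alpha=lam))])
    lasso_pipe.fit(X_train, y_train)
    ridge_pipe.fit(X_train, y_train)
    lasso_coefs.append(lasso_pipe.named_steps['m'].coef_)
    ridge_coefs.append(ridge_pipe.named_steps['m'].coef_)

# Shape (n_lambdas, p): one row of coefficients per penalty value
lasso_coefs = np.array(lasso_coefs)
ridge_coefs = np.array(ridge_coefs)
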
fig, axes = plt.subplots(1, 3, figsize=(16, 5))

# Panel 1: Lasso path (truly relevant features in red, irrelevant ones in gray)
for j in range(p):
    axes[0].plot(np.log10(lambdas), lasso_coefs[:, j],
                 linewidth=1.5, color='red' if j < 5 else 'lightgray', alpha=0.7)
axes[0].set_xlabel('log10(λ)')
axes[0].set_ylabel('Coefficient')
axes[0].set_title('Lasso Path: Sparse Solutions\n(coefficients become exactly 0)')

# Panel 2: coefficients at the cross-validated optimal λ
lasso_cv = Pipeline([('s', StandardScaler()), ('m', LassoCV(cv=5, random_state=42))])
lasso_cv.fit(X_train, y_train)
best_lam = lasso_cv.named_steps['m'].alpha_
best_coefs = lasso_cv.named_steps['m'].coef_

nonzero = (best_coefs != 0).sum()
axes[1].bar(range(p), best_coefs, color=['red' if j<5 else 'steelblue' for j in range(p)])
axes[1].axhline(0, color='black', linewidth=0.5)
axes[1].set_title(f'Lasso Coefficients (λ={best_lam:.4f})\n{nonzero}/{p} nonzero features selected')
axes[1].set_xlabel('Feature Index')

# Panel 3: Lasso vs Ridge, number of nonzero coefficients along the path
lasso_nonzero = [(lasso_coefs[i] != 0).sum() for i in range(len(lambdas))]
ridge_nonzero = [(ridge_coefs[i] != 0).sum() for i in range(len(lambdas))]
axes[2].plot(np.log10(lambdas), lasso_nonzero, 'r-', linewidth=2, label='Lasso')
axes[2].plot(np.log10(lambdas), ridge_nonzero, 'b-', linewidth=2, label='Ridge')
axes[2].set_title('Sparsity: Lasso vs Ridge')
axes[2].set_xlabel('log10(λ)')
axes[2].set_ylabel('# Nonzero Coefficients')
axes[2].legend()

plt.tight_layout()
plt.savefig('lasso_regression.png', dpi=150)
plt.show()

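# Held-out performance and agreement with the true sparsity pattern
test_mse_lasso = np.mean((y_test - lasso_cv.predict(X_test))**2)
print(f"Lasso: best λ={best_lam:.4f}, {nonzero} features selected, Test MSE={test_mse_lasso:.4f}")
print(f"True nonzero features: {(true_beta!=0).sum()}")
# Counts features whose selected/excluded status matches the truth (true zeros included)
n_agree = int(((best_coefs != 0) == (true_beta != 0)).sum())
print(f"Correctly classified (selected or excluded): {n_agree}/{p}")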
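
The agreement count printed at the end of the script gives equal credit to a correctly excluded feature and a correctly selected one, so the 25 irrelevant features dominate it. A minimal sketch of a finer breakdown follows. It is self-contained and makes its own assumptions: it regenerates similar data with `np.random.default_rng` (so the numbers will differ from the script above), fits on the full data rather than a training split, and uses `make_pipeline`, whose step name `lassocv` is derived from the class name.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed setup mirroring the example: 30 features, first 5 relevant
rng = np.random.default_rng(42)
n, p = 150, 30
X = rng.standard_normal((n, p))
true_beta = np.array([3, -2, 1.5, -1, 0.8] + [0] * 25)
y = X @ true_beta + 1.5 * rng.standard_normal(n)

model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=42))
model.fit(X, y)
coefs = model.named_steps["lassocv"].coef_

selected = coefs != 0      # features the Lasso kept
relevant = true_beta != 0  # features that truly matter

tp = int(np.sum(selected & relevant))    # relevant and selected
fp = int(np.sum(selected & ~relevant))   # irrelevant but selected
fn = int(np.sum(~selected & relevant))   # relevant but dropped
tn = int(np.sum(~selected & ~relevant))  # irrelevant and dropped

print(f"true positives={tp}, false positives={fp}, "
      f"false negatives={fn}, true negatives={tn}")
```

Reporting false positives separately matters because a penalty chosen by cross-validation for prediction accuracy often keeps some irrelevant features with small coefficients.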

Ridge vs Lasso

| Aspect | Ridge | Lasso |
|--------|-------|-------|
| Penalty | L2 (sum of squared coefficients) | L1 (sum of absolute coefficients) |
| Shrinkage | Toward 0, but not exactly 0 | Can be exactly 0 |
| Feature selection | No | Yes (sparse) |
| Multicollinearity | Keeps all correlated features | Tends to pick one of a correlated group arbitrarily |
| Solution | Closed form | Iterative (e.g. coordinate descent) |
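
The "Solution" row of the table can be sketched in code. The block below is a minimal illustration, not scikit-learn's implementation: it assumes no intercept, uses the scikit-learn scaling `(1/(2n))*||y - Xw||^2 + alpha*||w||_1` for the Lasso and `||y - Xw||^2 + lam*||w||^2` for Ridge, and the helper names (`lasso_coordinate_descent`, `ridge_closed_form`) are hypothetical. Both results are compared against the library estimators as a sanity check.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_sweeps=500, tol=1e-10):
    """Cyclic coordinate descent for (1/(2n))*||y - Xw||^2 + alpha*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    resid = y.copy()                 # residual y - X @ w, with w = 0 initially
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            w_old = w[j]
            # Correlation of feature j with the residual that excludes feature j
            rho = X[:, j] @ resid / n + col_sq[j] * w_old
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
            if w[j] != w_old:
                resid -= X[:, j] * (w[j] - w_old)
                max_change = max(max_change, abs(w[j] - w_old))
        if max_change < tol:         # stop once a full sweep changes nothing
            break
    return w

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam*I) w = X^T y directly."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Assumed synthetic data for the comparison
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
beta = np.array([2.0, -1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0])
y = X @ beta + 0.5 * rng.standard_normal(100)

w_cd = lasso_coordinate_descent(X, y, alpha=0.1)
w_sk = Lasso(alpha=0.1, fit_intercept=False, tol=1e-10, max_iter=100_000).fit(X, y).coef_

w_ridge = ridge_closed_form(X, y, lam=1.0)
w_ridge_sk = Ridge(alpha=1.0, fit_intercept=False).fit(X, y).coef_

print("Lasso (coordinate descent):", np.round(w_cd, 4))
print("Lasso (scikit-learn):      ", np.round(w_sk, 4))
print("Ridge (closed form):       ", np.round(w_ridge, 4))
```

Each coordinate update is the one-dimensional soft-thresholding step, which is why the iterative solver can set coefficients exactly to zero while the Ridge linear solve cannot.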
