Hyperparameter Tuning: GridSearch, Random Search & Optuna
Hyperparameters vs Parameters
Understanding the difference between hyperparameters and parameters is crucial for model optimization.
Definitions
Model components:
PARAMETERS (learned from data):
- Linear Regression: coefficients (β), intercept
- Neural Networks: weights, biases
- Decision Trees: split thresholds
- SVM: support vectors
HYPERPARAMETERS (set before training):
- Linear Regression: none (or regularization α)
- Neural Networks: learning rate, layers, neurons
- Decision Trees: max_depth, min_samples_split
- SVM: C, kernel, gamma
Mathematical Perspective
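For a model $f_\theta$ with parameters $\theta$ and hyperparameters $\lambda$:
- Parameters are learned by minimizing the training loss: $\theta^* = \arg\min_\theta L(\theta)$
- Hyperparameters control the learning process: $\theta^*(\lambda) = \arg\min_\theta L(\theta; \lambda)$
The goal of hyperparameter tuning is to find the $\lambda$ whose fitted model performs best on held-out data:
$\lambda^* = \arg\min_\lambda L_{\text{val}}(\theta^*(\lambda))$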
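The two levels of optimization can be sketched in plain Python. This is a minimal illustration, not part of the original tutorial: the model is a one-feature ridge regression without intercept, the data are made up, and the names `fit_ridge_1d`, `mse`, and `candidates` are hypothetical. The inner function computes the parameter for a fixed hyperparameter; the outer loop picks the hyperparameter with the lowest validation loss.

```python
def fit_ridge_1d(xs, ys, lam):
    """Inner problem: closed-form theta*(lambda) for y ~ theta * x with an L2 penalty."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def mse(theta, xs, ys):
    """Mean squared error of the fitted slope on a dataset."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up toy data, roughly y = 2x with noise
x_train, y_train = [1.0, 2.0, 3.0], [2.5, 3.5, 6.5]
x_val, y_val = [1.5, 2.5], [3.0, 5.0]

# Outer problem: evaluate each candidate lambda on the validation set
candidates = [0.0, 0.1, 1.0, 10.0]
val_loss = {
    lam: mse(fit_ridge_1d(x_train, y_train, lam), x_val, y_val)
    for lam in candidates
}
best_lam = min(val_loss, key=val_loss.get)

for lam in candidates:
    print(f"lambda={lam:<5} theta={fit_ridge_1d(x_train, y_train, lam):.4f} "
          f"val_mse={val_loss[lam]:.4f}")
print("best lambda:", best_lam)
```

Note that the slope is never chosen by hand: it is always a function of the data and of the hyperparameter, which is what separates the two kinds of quantity.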
Grid Search
The most straightforward approach: exhaustively try all combinations.
Visual Representation
Hyperparameter space for SVM: a 4 × 4 grid with one model trained per grid point.
- gamma (vertical axis): 0.001, 0.01, 0.1, 1.0
- C (horizontal axis): 0.1, 1.0, 10, 100
Total combinations: 4 × 4 = 16 models to train
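The enumeration behind this grid can be sketched with the standard library alone. The dictionary layout mirrors the style scikit-learn uses for parameter grids, but the snippet below is only an illustration of the counting, not how `GridSearchCV` is implemented.

```python
from itertools import product

# The 4 x 4 grid from the illustration above
grid = {
    "C": [0.1, 1.0, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1.0],
}

# Cartesian product of all value lists: one dict per model to train
names = list(grid)
combinations = [dict(zip(names, values)) for values in product(*grid.values())]

print(len(combinations))   # number of models to train
print(combinations[0])     # first configuration
```

Adding a third hyperparameter with 4 values would multiply the count by 4 again, which is why grid search becomes expensive quickly.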
Complete Grid Search Implementation
Grid Search Cross-Validation
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
import time
np.random.seed(42)
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=10,
    n_classes=2, random_state=42
)
# Define parameter grid for SVM.
# Keys need the 'svm__' prefix because the estimator is the pipeline step named 'svm';
# bare names such as 'C' would raise an "invalid parameter" error.
param_grid_svm = {
    'svm__C': [0.01, 0.1, 1, 10, 100],
    'svm__gamma': [0.001, 0.01, 0.1, 1],
    'svm__kernel': ['rbf']
}
print(f"Total combinations: {np.prod([len(v) for v in param_grid_svm.values()])}")
# Scaling lives inside the pipeline so it is refit on each CV training fold
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('svm', SVC(random_state=42))
])
start_time = time.time()
grid_search = GridSearchCV(
    pipeline, param_grid_svm, cv=5, scoring='accuracy',
    n_jobs=-1, verbose=1, return_train_score=True
)
grid_search.fit(X, y)
grid_time = time.time() - start_time
print(f"\nBest parameters: {grid_search.best_params_}")
print(f"Best CV score: {grid_search.best_score_:.4f}")
print(f"Time taken: {grid_time:.2f}s")
Random Search
Random Search samples a fixed number of configurations at random, which usually makes it more efficient than Grid Search in large parameter spaces.
Why Random Search is Often Better
Random Search is more efficient because:
- Budget-constrained: With a fixed budget, Random Search explores more of the parameter space
- Continuous parameters: Can sample from distributions, not just grids
- Unimportant parameters: If only a few parameters matter, every trial still tests a new value of each important one, whereas a grid repeats the same few values across all settings of the unimportant ones
- Mathematical insight: For $d$ dimensions with budget $B$, Random Search covers $B$ unique values per dimension vs Grid's $B^{1/d}$ unique values per dimension
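The coverage argument can be checked with a small standard-library sketch. It assumes a budget of 16 trials over two hyperparameters and log-uniform sampling; the helper names `log_uniform` and `distinct_per_axis` are hypothetical, introduced only for this illustration.

```python
import math
import random

rng = random.Random(0)   # fixed seed for reproducibility
budget = 16

# Grid: 4 values per axis gives 16 points, but only 4 distinct values per hyperparameter
grid_points = [(c, g) for c in [0.1, 1, 10, 100] for g in [0.001, 0.01, 0.1, 1]]

def log_uniform(rng, low, high):
    """Sample uniformly on a log10 scale between low and high."""
    return 10 ** rng.uniform(math.log10(low), math.log10(high))

# Random search: same budget, every trial draws fresh values for both hyperparameters
random_points = [
    (log_uniform(rng, 0.1, 100), log_uniform(rng, 0.001, 1))
    for _ in range(budget)
]

def distinct_per_axis(points):
    """Count how many different values were tried for each hyperparameter."""
    return tuple(len({p[i] for p in points}) for i in range(2))

print("grid   (C, gamma):", distinct_per_axis(grid_points))
print("random (C, gamma):", distinct_per_axis(random_points))
```

If only one of the two hyperparameters affects the score, the grid has effectively run 4 distinct experiments while random search has run 16.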
from scipy.stats import uniform, randint, loguniform
param_distributions = {
    'svm__C': loguniform(0.01, 100),
    'svm__gamma': loguniform(0.001, 1),
    'svm__kernel': ['rbf', 'linear']
}
random_search = RandomizedSearchCV(
    pipeline, param_distributions, n_iter=50, cv=5,
    scoring='accuracy', n_jobs=-1, random_state=42
)
start_time = time.time()
random_search.fit(X, y)
random_time = time.time() - start_time
print(f"Random Search (50 iterations): {random_search.best_score_:.4f} ({random_time:.2f}s)")
print(f"Best params: {random_search.best_params_}")
Bayesian Optimization with Optuna
By default, Optuna uses the Tree-structured Parzen Estimator (TPE), a Bayesian optimization method that uses the results of earlier trials to decide which hyperparameters to try next.
How Optuna Works
Optuna Optimization Process:
Iteration 1: Random sampling → Score = 0.75
Iteration 2: Random sampling → Score = 0.82
Iteration 3: TPE suggests → Score = 0.85 (explores promising area)
Iteration 4: TPE suggests → Score = 0.87 (refines search)
Iteration 5: TPE suggests → Score = 0.86 (explores alternative)
...
Iteration 50: TPE suggests → Score = 0.91 (converges)
Key Features:
- Pruning: Stop bad trials early
- Conditional parameters: Different params for different conditions
- Dashboard: Visual monitoring of optimization
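Pruning, the first feature listed above, can be sketched as follows. This is a minimal illustration under stated assumptions: `simulated_score` is a made-up learning curve standing in for real training, and the choice of `MedianPruner` with these settings is arbitrary. The Optuna calls used (`trial.report`, `trial.should_prune`, `optuna.TrialPruned`) are the library's standard pruning interface.

```python
import math

try:
    import optuna
    OPTUNA_AVAILABLE = True
except ImportError:
    OPTUNA_AVAILABLE = False

def simulated_score(lr, step):
    """Hypothetical learning curve: best near lr = 1e-2, improving with each step."""
    quality = 1.0 - abs(math.log10(lr) + 2.0) / 3.0
    return quality * (1.0 - 0.5 ** (step + 1))

if OPTUNA_AVAILABLE:
    optuna.logging.set_verbosity(optuna.logging.WARNING)

    def objective(trial):
        lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
        score = 0.0
        for step in range(10):
            score = simulated_score(lr, step)
            trial.report(score, step)      # expose the intermediate value to the pruner
            if trial.should_prune():       # pruner compares it with earlier trials
                raise optuna.TrialPruned()
        return score

    study = optuna.create_study(
        direction="maximize",
        sampler=optuna.samplers.TPESampler(seed=0),
        # The first 5 trials always run to completion; later trials may stop after step 2
        pruner=optuna.pruners.MedianPruner(n_startup_trials=5, n_warmup_steps=2),
    )
    study.optimize(objective, n_trials=20)

    n_pruned = sum(t.state == optuna.trial.TrialState.PRUNED for t in study.trials)
    print(f"best lr: {study.best_params['lr']:.4g}, pruned trials: {n_pruned}")
```

In a real objective, each step would be one training epoch or one boosting round, so a pruned trial saves the cost of the remaining steps.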
Complete Optuna Implementation
try:
    import optuna
    from optuna.samplers import TPESampler
    OPTUNA_AVAILABLE = True
except ImportError:
    OPTUNA_AVAILABLE = False
    print("Optuna not installed. Install with: pip install optuna")
if OPTUNA_AVAILABLE:
    # Imports hoisted out of the objective; cross_val_score was used but never imported
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score
    optuna.logging.set_verbosity(optuna.logging.WARNING)
    def objective(trial):
        # Conditional search space: each classifier gets its own hyperparameters
        classifier_name = trial.suggest_categorical('classifier', ['SVM', 'RF', 'GBM'])
        if classifier_name == 'SVM':
            C = trial.suggest_float('svm__C', 0.01, 100, log=True)
            gamma = trial.suggest_float('svm__gamma', 0.001, 1, log=True)
            kernel = trial.suggest_categorical('svm__kernel', ['rbf', 'linear'])
            clf = SVC(C=C, gamma=gamma, kernel=kernel, random_state=42)
        elif classifier_name == 'RF':
            n_estimators = trial.suggest_int('rf__n_estimators', 50, 300)
            max_depth = trial.suggest_int('rf__max_depth', 3, 20)
            min_samples_split = trial.suggest_int('rf__min_samples_split', 2, 20)
            clf = RandomForestClassifier(
                n_estimators=n_estimators, max_depth=max_depth,
                min_samples_split=min_samples_split, random_state=42, n_jobs=-1
            )
        else:
            n_estimators = trial.suggest_int('gbm__n_estimators', 50, 300)
            learning_rate = trial.suggest_float('gbm__learning_rate', 0.01, 0.3, log=True)
            max_depth = trial.suggest_int('gbm__max_depth', 3, 10)
            clf = GradientBoostingClassifier(
                n_estimators=n_estimators, learning_rate=learning_rate,
                max_depth=max_depth, random_state=42
            )
        # Same stratified 5-fold split for every trial, so scores are comparable
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
        scores = cross_val_score(clf, X, y, cv=cv, scoring='accuracy', n_jobs=-1)
        return scores.mean()
    study = optuna.create_study(direction='maximize', sampler=TPESampler(seed=42))
    study.optimize(objective, n_trials=50, show_progress_bar=True)
    print("Best trial:")
    print(f" Value (CV Score): {study.best_trial.value:.4f}")
    print(f" Params: {study.best_trial.params}")
Resource Allocation Strategies
Parallelization Strategies
Strategy 1: Independent Jobs (Easy)
Jobs 1 to 4 each run one trial (Trial 1 to Trial 4) to completion, side by side.
- Each trial independent
- Easy to parallelize
- Wastes resources on bad trials
Strategy 2: Successive Halving (Efficient)
All 16 trials (small budget) → Top 8 advance → Top 4 advance → Top 2 advance → Winner!
- Early stopping for bad configs
- More efficient resource use
- Requires careful budget allocation
Successive Halving
Successive Halving Budget
The budget schedule depends on three quantities:
- Initial total budget
- Elimination factor (typically 2 or 3)
- Number of rounds
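How these three quantities interact can be sketched with a small schedule calculator. It is an illustration under assumptions, not the tutorial's formula: the function name `halving_schedule` and the `min_resources` argument are hypothetical, and the rule used (divide the candidates by the factor and multiply the per-candidate resources by the same factor each round) follows the general scheme that scikit-learn's halving searches also use.

```python
import math

def halving_schedule(n_candidates, factor=2, min_resources=1):
    """Return (round, n_candidates, resources_per_candidate) for each evaluation round."""
    if factor < 2:
        raise ValueError("factor must be at least 2")
    schedule = []
    n, resources, rnd = n_candidates, min_resources, 0
    while n > 1:
        schedule.append((rnd, n, resources))
        n = math.ceil(n / factor)   # keep roughly the top 1/factor of candidates
        resources *= factor         # survivors get `factor` times more resources
        rnd += 1
    return schedule

for rnd, n, r in halving_schedule(16, factor=2, min_resources=10):
    print(f"round {rnd}: {n} candidates x {r} resources = {n * r} total")
```

With 16 candidates and a factor of 2, this gives four evaluation rounds (16, 8, 4, 2 candidates), and each round spends the same total amount of resources, so the winner is found for about four times the cost of the cheapest round rather than 16 full-budget fits.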
from sklearn.experimental import enable_halving_search_cv
from sklearn.model_selection import HalvingGridSearchCV, HalvingRandomSearchCV
param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [2, 5, 10]
}
rf = RandomForestClassifier(random_state=42)
halving_search = HalvingGridSearchCV(
    rf, param_grid, cv=5, scoring='accuracy',
    n_jobs=-1, factor=2, random_state=42
)
start = time.time()
halving_search.fit(X, y)
halving_time = time.time() - start
print(f"Halving Grid Search:")
print(f" Time: {halving_time:.2f}s")
print(f" Best score: {halving_search.best_score_:.4f}")
Best Practices
When to Use Each Method
| Method | When to Use | Pros | Cons |
|---|---|---|---|
| Grid Search | Small parameter space, few params | Exhaustive, simple | Slow for large spaces |
| Random Search | Large parameter spaces | Fast, good coverage | May miss the optimum |
| Optuna | Complex spaces, limited budget | Smart search, pruning | Extra dependency |
| Halving | Many combinations, limited compute | Very efficient | Can drop good configs early (noisy low-budget scores) |
Practical Tips
Hyperparameter Tuning Best Practices
- START COARSE, THEN REFINE: First pass with wide ranges, second pass narrow around best region
- USE LOG SCALE: For learning rates (0.001 to 0.1) and regularization (0.001 to 100)
- MONITOR OVERFITTING: Compare train vs CV scores; large gap = overfitting
- SET RANDOM SEEDS: Ensures reproducibility and fair comparison
- USE PIPELINES: Prevents data leakage; proper preprocessing in CV
- ALLOCATE BUDGET WISELY: Start with fewer CV folds, increase for final selection
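The first two tips (coarse-to-fine search and log scale) combine naturally. The sketch below uses only the standard library; the helpers `log_grid` and `refine_around` are hypothetical names, and the assumption that the coarse pass found its best value at 1.0 is made up for illustration.

```python
import math

def log_grid(low, high, num):
    """Return `num` values evenly spaced on a log10 scale from low to high (inclusive)."""
    lo, hi = math.log10(low), math.log10(high)
    step = (hi - lo) / (num - 1)
    return [10 ** (lo + i * step) for i in range(num)]

def refine_around(best, span=10.0, num=5):
    """Second pass: a narrower log grid covering one `span` centred on `best`."""
    half = math.sqrt(span)
    return log_grid(best / half, best * half, num)

# First pass: one value per decade across a wide range
coarse = log_grid(0.001, 100, 6)
print("coarse:", [f"{v:.3g}" for v in coarse])

# Second pass: assume (for illustration) the coarse search was best at 1.0
fine = refine_around(1.0)
print("fine:  ", [f"{v:.3g}" for v in fine])
```

A linear grid over the same range would put almost every point above 1, leaving the small values that often matter for regularization nearly untested.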
Key Takeaways
Summary: Hyperparameter Tuning
- Grid Search is simple, but its cost grows exponentially with the number of hyperparameters
- Random Search is often more efficient than Grid Search
- Optuna uses Bayesian optimization for intelligent search with pruning
- Halving strategies cut computation by eliminating weak configurations on a small budget
- Always use pipelines to prevent data leakage
- Start coarse, then refine the search space
- Monitor train vs CV scores for overfitting
- Use log scale for learning rates and regularization
Practice Exercises
Exercise 1: Grid vs Random Search
# Compare Grid Search and Random Search on:
# 1. SVM with 3 hyperparameters
# 2. Random Forest with 4 hyperparameters
# 3. Gradient Boosting with 4 hyperparameters
# Measure: Time taken, Best score, Number of evaluations
Exercise 2: Optuna Optimization
# Use Optuna to optimize a Gradient Boosting model:
# 1. Define search space with conditional parameters
# 2. Implement pruning
# 3. Visualize optimization history
# 4. Analyze parameter importance
# 5. Compare with Grid/Random Search
Exercise 3: Resource Allocation
# Implement and compare:
# 1. Standard Grid Search (5-fold CV)
# 2. Halving Grid Search
# 3. Halving Random Search
# On a dataset with 1000+ samples and 10+ features.
Exercise 4: Practical Optimization Pipeline
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)
# Build a complete hyperparameter optimization pipeline:
# 1. Split into train/test (80/20)
# 2. Try 3 different algorithms
# 3. For each: Use Optuna with 100 trials + pruning
# 4. Select best model overall
# 5. Evaluate on test set
# 6. Compare CV estimate with test performance
Summary
| Aspect | Grid Search | Random Search | Optuna | Halving |
|---|---|---|---|---|
| Strategy | Exhaustive | Random sampling | Bayesian (TPE) | Progressive elimination |
| Speed | Slow | Fast | Fast | Very Fast |
| Effectiveness | Good | Good | Often best for a fixed budget | Good |
| Complexity | Low | Low | Medium | Medium |
| Best For | Small spaces | Large spaces | Complex spaces | Many combinations |