Scikit-Learn Model Selection


Introduction

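Scikit-learn's sklearn.model_selection module provides tools for splitting data into training and test sets, evaluating models with cross-validation, and comparing algorithms on the same folds.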

Train-Test Split

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

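# Fix random_state so the synthetic dataset is reproducible
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out 20% for testing; stratify=y keeps class proportions the same in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)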

print(f"Train: {X_train.shape}, Test: {X_test.shape}")
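
To see what stratify=y does, it can help to compare class proportions before and after the split. The following is a minimal sketch, not part of the original lesson: the imbalanced weights=[0.9, 0.1] setting and the class_fractions helper are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Imbalanced toy data: roughly 90% class 0 and 10% class 1 (illustrative choice)
X, y = make_classification(
    n_samples=1000, n_features=20, weights=[0.9, 0.1], random_state=0
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)


def class_fractions(labels):
    """Hypothetical helper: fraction of samples belonging to each class."""
    return np.bincount(labels) / len(labels)


full_frac = class_fractions(y)
train_frac = class_fractions(y_train)
test_frac = class_fractions(y_test)

print("Full :", np.round(full_frac, 3))
print("Train:", np.round(train_frac, 3))
print("Test :", np.round(test_frac, 3))
```

With stratification, the three rows should be nearly identical. Dropping stratify=y lets the minority-class share drift between the splits, which matters most on small or heavily imbalanced datasets.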

Cross-Validation

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

clf = LogisticRegression(max_iter=200)

# 5-fold cross-validation; for a classifier, cv=5 uses stratified folds
scores = cross_val_score(clf, X, y, cv=5, scoring='accuracy')
print(f"CV scores: {scores}")
print(f"Mean: {scores.mean():.3f} ± {scores.std():.3f}")
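
The same function can be used to compare several classifiers, as long as every model is scored on identical folds. Below is a hedged sketch of one way to do that. The three models and their hyperparameters are arbitrary illustrative picks, not recommendations from the lesson.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# One splitter with a fixed seed, reused so every model sees the same folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Illustrative candidates; swap in whichever estimators you want to compare
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")
```

Passing the same splitter object with a fixed random_state keeps the comparison fair, since differences in the mean then come from the models and not from different partitions of the data.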

K-Fold and StratifiedKFold

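from sklearn.model_selection import KFold, StratifiedKFold
from sklearn.linear_model import LogisticRegression

# X, y are the iris arrays loaded in the cross-validation example
kf = KFold(n_splits=5, shuffle=True, random_state=42)
# StratifiedKFold additionally preserves class proportions in each fold
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Swap kf for skf to iterate over stratified folds instead
for train_idx, test_idx in kf.split(X, y):
    X_train, X_test = X[train_idx], X[test_idx]
    y_train, y_test = y[train_idx], y[test_idx]
    # Train on the training fold and score on the held-out fold
    clf = LogisticRegression(max_iter=200)
    clf.fit(X_train, y_train)
    print(f"Fold accuracy: {clf.score(X_test, y_test):.3f}")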
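
The difference between the splitters is easiest to see by counting the classes in each test fold. The sketch below assumes the iris dataset, whose rows are ordered by class, and uses a hypothetical helper named class_counts_per_test_fold.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, StratifiedKFold

# Iris rows are ordered by class: 50 of class 0, then 50 of class 1, then 50 of class 2
X, y = load_iris(return_X_y=True)


def class_counts_per_test_fold(splitter):
    """Hypothetical helper: per-class sample counts in each test fold."""
    return [
        np.bincount(y[test_idx], minlength=3).tolist()
        for _, test_idx in splitter.split(X, y)
    ]


plain_counts = class_counts_per_test_fold(KFold(n_splits=5))
shuffled_counts = class_counts_per_test_fold(
    KFold(n_splits=5, shuffle=True, random_state=42)
)
stratified_counts = class_counts_per_test_fold(
    StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
)

print("KFold, no shuffle:", plain_counts)
print("KFold, shuffled  :", shuffled_counts)
print("StratifiedKFold  :", stratified_counts)
```

Without shuffling, KFold takes contiguous blocks, so on class-ordered data the first test fold contains only class 0. Shuffling mixes the classes but leaves the per-fold counts to chance, while StratifiedKFold puts 10 samples of each class in every test fold.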

Repeated Cross-Validation

from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.svm import SVC

rcv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
scores = cross_val_score(SVC(), X, y, cv=rcv)
# 5 folds x 3 repeats = 15 scores
print(f"Total scores: {len(scores)}, Mean: {scores.mean():.3f}")
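
Repeating the split with different shuffles gives a sense of how much the estimate depends on one particular partition. The sketch below assumes the scores come back repeat by repeat (all folds of the first repeat, then the second, and so on), so they can be reshaped into one row per repeat. The variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

n_splits, n_repeats = 5, 3
rcv = RepeatedStratifiedKFold(
    n_splits=n_splits, n_repeats=n_repeats, random_state=42
)
scores = cross_val_score(SVC(), X, y, cv=rcv)

# Assumption: scores are ordered repeat by repeat, so each row is one full 5-fold run
per_repeat = scores.reshape(n_repeats, n_splits).mean(axis=1)

for i, mean_score in enumerate(per_repeat, start=1):
    print(f"Repeat {i}: mean accuracy = {mean_score:.3f}")
print(f"Overall: {scores.mean():.3f} ± {scores.std():.3f}")
```

If the per-repeat means differ noticeably, a single 5-fold run would have been sensitive to how the data happened to be shuffled.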

Practice Problems

  1. Split data with stratification
  2. Perform k-fold cross-validation
  3. Compare different classifiers using CV
  4. Use shuffle in cross-validation
  5. Implement stratified K-fold
