Ensemble Methods — Bagging, Boosting, Stacking Complete Guide

Ensemble methods combine the predictions of multiple models, which often yields better accuracy and stability than any of the individual models on its own.


Types of Ensembles

Bagging (Bootstrap Aggregating):
├─ Train models INDEPENDENTLY on different data samples
├─ Combine by averaging/voting
├─ Reduces variance
└─ Example: Random Forest
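
To make the bagging steps concrete, here is a minimal hand-rolled sketch. It assumes a hypothetical toy dataset (a noisy sine wave) and uses scikit-learn's DecisionTreeRegressor as the base model; the sample sizes and the number of trees are arbitrary illustration choices, not recommendations.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy sine wave (an assumption for illustration).
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)
X_test = np.linspace(0, 6, 100).reshape(-1, 1)
y_test = np.sin(X_test[:, 0])

# Train each tree independently on its own bootstrap sample
# (same size as the training set, drawn with replacement).
all_preds = []
for _ in range(50):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx])
    all_preds.append(tree.predict(X_test))
all_preds = np.array(all_preds)  # shape: (n_trees, n_test_points)

# Combine by averaging the individual predictions.
bagged_pred = all_preds.mean(axis=0)

# Average test error of the individual trees vs. error of the averaged ensemble.
mean_single_mse = np.mean((all_preds - y_test) ** 2)
bagged_mse = np.mean((bagged_pred - y_test) ** 2)
print(f"avg single tree MSE: {mean_single_mse:.3f}  bagged MSE: {bagged_mse:.3f}")
```

For squared error, the averaged prediction can never be worse than the average individual tree, and with high-variance models such as unpruned trees the gap is usually noticeable.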

Boosting:
├─ Train models SEQUENTIALLY
├─ Each model focuses on the errors left by the previous ones
├─ Primarily reduces bias (can also reduce variance)
└─ Examples: XGBoost, AdaBoost
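
The sequential idea can be sketched by hand as well. The snippet below is a simplified form of gradient boosting with squared loss (each stump is fit to the current residuals); it is not AdaBoost or XGBoost themselves, and the toy data, learning rate, and number of rounds are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy sine wave (an assumption for illustration).
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())      # start from a constant model
train_mse = [np.mean((y - prediction) ** 2)]

stumps = []
for _ in range(100):
    residual = y - prediction               # what the ensemble still gets wrong
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    prediction += learning_rate * stump.predict(X)  # small corrective step
    stumps.append(stump)
    train_mse.append(np.mean((y - prediction) ** 2))

print(f"training MSE: start {train_mse[0]:.3f} -> end {train_mse[-1]:.3f}")
```

Each stump on its own is a weak, high-bias model; adding many small corrections is what drives the bias down. Training error keeps falling, so in practice the number of rounds is chosen on held-out data.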

Stacking:
├─ Train several base models, usually of different types
├─ Train a meta-learner on the base models' predictions
├─ Works best when the base models are diverse
└─ Common in winning competition solutions
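
A rough sketch of what stacking does under the hood, assuming a synthetic dataset from make_classification and two arbitrarily chosen base models. The meta-learner is trained on out-of-fold predictions so that it never sees predictions a base model made on its own training rows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and base models are assumptions for illustration.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
base_models = [
    DecisionTreeClassifier(max_depth=3, random_state=0),
    KNeighborsClassifier(),
]

# One column per base model: out-of-fold P(class = 1) for every training row.
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# The meta-learner learns how much to trust each base model.
meta_learner = LogisticRegression().fit(meta_features, y)

# For prediction, refit the base models on all data and chain the two levels.
for model in base_models:
    model.fit(X, y)
new_meta = np.column_stack([m.predict_proba(X[:5])[:, 1] for m in base_models])
print(meta_learner.predict(new_meta))
```

scikit-learn's StackingClassifier wraps this same pattern (internal cross-validation plus a final estimator), so the manual version is mainly useful for understanding.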

Voting:
├─ Hard voting: majority wins
├─ Soft voting: average probabilities
└─ Simple but effective
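
Hard and soft voting can disagree. The small example below uses made-up probabilities from three hypothetical classifiers for a single sample: two models lean slightly toward class 0, one is very confident in class 1.

```python
from collections import Counter

# Hypothetical P(class = 1) from three classifiers for one sample.
probs_class1 = [0.45, 0.45, 0.90]

# Hard voting: each model casts one vote for its most likely class.
votes = [int(p >= 0.5) for p in probs_class1]
hard_label = Counter(votes).most_common(1)[0][0]

# Soft voting: average the probabilities first, then pick the class.
mean_prob = sum(probs_class1) / len(probs_class1)
soft_label = int(mean_prob >= 0.5)

print("hard voting:", votes, "->", hard_label)           # [0, 0, 1] -> 0
print("soft voting:", round(mean_prob, 2), "->", soft_label)  # 0.6 -> 1
```

Soft voting lets a confident model outweigh two lukewarm ones, which helps only if the predicted probabilities are reasonably well calibrated.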

Python Implementation

import xgboost as xgb  # third-party package, needed for XGBClassifier in the stacking example
from sklearn.ensemble import (
    AdaBoostClassifier, BaggingClassifier, RandomForestClassifier,
    StackingClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Voting (soft voting averages predict_proba outputs, so SVC needs probability=True)
voting = VotingClassifier(estimators=[
    ('lr', LogisticRegression()),
    ('rf', RandomForestClassifier()),
    ('svc', SVC(probability=True))
], voting='soft')

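# Bagging: each tree is fit on a bootstrap sample of the training data
# (scikit-learn 1.2+ uses `estimator`; older releases called it `base_estimator`)
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=100, random_state=42
)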

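# Boosting: decision stumps (depth-1 trees) trained one after another
# (scikit-learn 1.2+ uses `estimator`; older releases called it `base_estimator`)
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=100, learning_rate=0.1
)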

# Stacking
stacking = StackingClassifier(estimators=[
    ('rf', RandomForestClassifier()),
    ('svm', SVC(probability=True)),
    ('xgb', xgb.XGBClassifier())
], final_estimator=LogisticRegression())
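
The estimators above are only constructed, not trained. A minimal sketch of how one might compare ensembles against a single tree with cross-validation follows. The synthetic dataset, the smaller n_estimators values, and the 3-fold setting are assumptions chosen to keep the run fast; XGBoost is left out so the snippet depends on scikit-learn only. The base estimator is passed positionally so the code does not depend on whether the parameter is named estimator or base_estimator.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier, BaggingClassifier, RandomForestClassifier, VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data (an assumption for illustration).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=42
)

models = {
    "single tree": DecisionTreeClassifier(random_state=42),
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=25, random_state=42
    ),
    "boosting": AdaBoostClassifier(
        DecisionTreeClassifier(max_depth=1), n_estimators=25, random_state=42
    ),
    "voting": VotingClassifier(
        [
            ("lr", LogisticRegression(max_iter=1000)),
            ("rf", RandomForestClassifier(n_estimators=25, random_state=42)),
        ],
        voting="soft",
    ),
}

# Mean accuracy over 3 cross-validation folds for each model.
scores = {
    name: cross_val_score(model, X, y, cv=3).mean()
    for name, model in models.items()
}
for name, score in scores.items():
    print(f"{name:12s} {score:.3f}")
```

The exact numbers depend on the data and library version; the point is to measure each ensemble against a simple baseline rather than assume it will win.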

Key Takeaways

  1. Well-built ensembles often outperform their individual models, though gains are not guaranteed
  2. Bagging reduces variance (Random Forest)
  3. Boosting primarily reduces bias and can also reduce variance (XGBoost)
  4. Stacking combines different model types
  5. Diversity between models is key to ensemble success
  6. Voting is simple and makes a good baseline ensemble
  7. XGBoost is one of the most widely used boosting libraries
  8. Ensembles trade interpretability for performance
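
Why diversity matters can be shown with a short calculation. The sketch below assumes an idealized case: every model has the same accuracy and their errors are fully independent, which real models only approximate. The function name is made up for this example.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a majority of n independent models is correct.

    Assumes n_models is odd and each model is correct with probability p.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


for n in (1, 3, 5, 11):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
```

Three independent models at 70% accuracy reach about 78.4% under majority voting, and the figure keeps rising as more are added. If the models all make the same mistakes, the vote is no better than a single model, which is why correlated base models add little.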
