XGBoost & Gradient Boosting — Complete Guide

Gradient boosting builds trees sequentially: each new tree corrects the errors left by the trees before it. XGBoost (eXtreme Gradient Boosting) is one of the most popular implementations.


Boosting vs Bagging

Bagging (Random Forest):
├─ Trees trained INDEPENDENTLY (parallel)
├─ Each tree trained on a different bootstrap sample
├─ Mainly reduces variance
└─ Combined by averaging (or voting for classification)

Boosting (XGBoost):
├─ Trees trained SEQUENTIALLY
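├─ Each tree corrects the errors of the ensemble so far
├─ Mainly reduces bias (shrinkage and subsampling help control variance)
└─ Combined by summing tree outputs, each scaled by the learning rate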
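To make the bagging side concrete, here is a minimal sketch of plain bootstrap aggregation. It is not a full Random Forest, which additionally samples a random subset of features at every split. Each tree is fit independently on its own bootstrap sample, so the loop iterations could run in parallel, and the predictions are simply averaged. The helper names `bagging_fit` and `bagging_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def bagging_fit(X, y, n_trees=25, seed=0):
    """Fit each tree independently on its own bootstrap sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X[idx], y[idx])          # no tree depends on any other
        trees.append(tree)
    return trees


def bagging_predict(trees, X):
    """Combine by simple averaging of all tree predictions."""
    return np.mean([t.predict(X) for t in trees], axis=0)


X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
trees = bagging_fit(X, y)
pred = bagging_predict(trees, X)
print(f"Bagged training MSE: {np.mean((y - pred) ** 2):.1f}")
```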

How Gradient Boosting Works

Algorithm:
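1. Start with a constant prediction (the mean for squared-error regression, the log-odds for binary classification)
2. Compute residuals (errors); in general, the negative gradient of the loss
3. Fit a tree to predict the residuals
4. Add the tree to the ensemble, scaled by the learning rate
5. Repeat steps 2-4 for a fixed number of rounds (or until early stopping)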
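The five steps can be sketched from scratch for squared-error regression. With the loss written as ½(y − f)², the residual y − f is exactly the negative gradient, so "fit the residuals" and "fit the negative gradient" coincide. This minimal illustration uses scikit-learn's `DecisionTreeRegressor` as the base learner; it is not how XGBoost works internally (XGBoost also uses second-order gradient information and regularized split scoring). The function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def boost_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    """Gradient boosting for squared error: each tree fits the current residuals."""
    f0 = float(np.mean(y))                   # step 1: constant starting prediction
    pred = np.full(len(y), f0)
    trees, mse_history = [], []
    for _ in range(n_trees):                 # step 5: repeat
        residuals = y - pred                 # step 2: negative gradient of 0.5*(y - f)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)               # step 3: fit a tree to the residuals
        pred = pred + learning_rate * tree.predict(X)  # step 4: shrunken update
        trees.append(tree)
        mse_history.append(float(np.mean((y - pred) ** 2)))
    return {"f0": f0, "lr": learning_rate, "trees": trees}, mse_history


def boost_predict(model, X):
    """Final prediction = f0 + learning_rate * (sum of all tree outputs)."""
    tree_sum = np.sum([t.predict(X) for t in model["trees"]], axis=0)
    return model["f0"] + model["lr"] * tree_sum


X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
model, mse_history = boost_fit(X, y)
print(f"Training MSE after 1 tree: {mse_history[0]:.1f}, after 100: {mse_history[-1]:.1f}")
```

Because each tree's leaf values are means of the residuals, a step with a learning rate between 0 and 2 can never increase the training MSE, which is why the history only goes down.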

Iteration 0: Predict mean
Iteration 1: Tree predicts residuals → add to prediction
Iteration 2: Tree predicts new residuals → add to prediction
...
Iteration N: Final prediction = initial prediction + learning rate × (sum of all tree outputs)

Key insight: Each tree learns from the MISTAKES of previous trees
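A three-point worked example makes the iterations concrete. It assumes, purely for illustration, that the first tree fits the residuals exactly; a real shallow tree only approximates them. With a learning rate of 0.1, every residual shrinks to 90% of its previous value, which is why many rounds are needed.

```python
import numpy as np

y = np.array([3.0, 5.0, 10.0])
lr = 0.1

pred0 = np.full_like(y, y.mean())   # iteration 0: predict the mean, 6.0
res0 = y - pred0                    # residuals: [-3, -1, 4]

# Hypothetical tree 1 that reproduces the residuals exactly
tree1 = res0.copy()
pred1 = pred0 + lr * tree1          # [5.7, 5.9, 6.4]
res1 = y - pred1                    # [-2.7, -0.9, 3.6] = 0.9 * res0

print("predictions:", pred1)
print("new residuals:", res1)
```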

XGBoost Implementation

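import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train XGBoost (passing eval_metric to the constructor needs xgboost >= 1.6)
model = xgb.XGBClassifier(
    n_estimators=100,        # number of boosting rounds (trees)
    max_depth=6,             # maximum depth of each tree
    learning_rate=0.1,       # shrinkage applied to each tree's contribution
    subsample=0.8,           # fraction of rows sampled per tree
    colsample_bytree=0.8,    # fraction of columns sampled per tree
    random_state=42,
    eval_metric='logloss'
)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")

# Feature importance (plot_importance draws with matplotlib)
xgb.plot_importance(model)
plt.show()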

Key Hyperparameters

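n_estimators: Number of boosting rounds, i.e. trees (100-1000)
learning_rate: Shrinkage applied to each tree's output (0.01-0.3; XGBoost default 0.3)
├─ Lower = more trees needed
├─ Higher = fewer trees needed, higher risk of overfitting
└─ Usually 0.01-0.1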

max_depth: Tree depth (3-10)
├─ Shallower = simpler, less overfitting
└─ Deeper = more complex, may overfit

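subsample: Fraction of rows sampled for each tree (0.5-1.0)
colsample_bytree: Fraction of columns sampled for each tree (0.5-1.0)
├─ Similar in spirit to Random Forest feature sampling (RF samples per split; colsample_bynode is the closer analogue)
└─ Adds randomness, reduces overfitting

min_child_weight: Minimum sum of instance weights (Hessian) required in a child (1-10); higher = more conservative
gamma: Minimum loss reduction required to make a split (0-5); higher = more conservative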

XGBoost vs Random Forest

                    Random Forest          XGBoost
Training            Parallel (per tree)    Sequential (per round)
Speed               Often faster           Often slower, though highly optimized
Overfitting risk    Lower                  Higher without tuning
Performance         Good                   Often better when tuned
Hyperparameters     Fewer                  More
Interpretability    Feature importance     Feature importance

Key Takeaways

  1. Gradient boosting trains trees sequentially; each new tree corrects the remaining errors
  2. XGBoost is a widely used, highly optimized implementation
  3. Learning rate controls how much each tree contributes
  4. Lower learning rate + more trees usually generalizes better, at the cost of training time
  5. XGBoost has featured in many winning Kaggle solutions on tabular data
  6. Early stopping halts training once the validation score stops improving, which limits overfitting
  7. Use cross-validation to find the optimal number of trees
  8. XGBoost handles missing values natively by learning a default direction for them at each split
