XGBoost & Gradient Boosting — Complete Guide

Gradient boosting builds trees sequentially: each new tree is fitted to correct the errors of the trees before it. XGBoost (eXtreme Gradient Boosting) is one of the most widely used implementations.

Boosting vs Bagging

Bagging (Random Forest):
├─ Trees trained INDEPENDENTLY (parallel)
├─ Each tree fit on a different bootstrap sample of the data
├─ Mainly reduces variance
└─ Combined by averaging predictions (or voting, for classification)

Boosting (XGBoost):
├─ Trees trained SEQUENTIALLY
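├─ Each tree corrects the errors of the ensemble so far
├─ Mainly reduces bias (variance is kept in check with the learning rate, subsampling and regularization)
└─ Combined by summing tree outputs, each scaled by the learning rate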
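To make the bagging side concrete, here is a minimal sketch assuming NumPy and scikit-learn's DecisionTreeRegressor; fit_bagging and predict_bagging are hypothetical helper names. This is plain bagging: Random Forest additionally samples a random subset of features at every split. Because each tree sees only its own bootstrap sample, the loop iterations are independent and could run in parallel. Comparing two ensembles built with different bootstrap seeds shows the variance reduction: they disagree much less than two single trees do.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_bagging(X, y, n_trees=25, seed=0):
    """Plain bagging (hypothetical helper): each tree is fit independently
    on its own bootstrap sample, so the trees could be trained in parallel."""
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        # Bootstrap sample: draw n row indices with replacement
        idx = rng.integers(0, len(X), size=len(X))
        trees.append(DecisionTreeRegressor(random_state=0).fit(X[idx], y[idx]))
    return trees

def predict_bagging(trees, X):
    # Combine by plain, unweighted averaging of the individual trees
    return np.mean([t.predict(X) for t in trees], axis=0)

# Synthetic noisy 1-D regression problem
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Two ensembles trained on the same data with different bootstrap seeds
grid = np.linspace(-3, 3, 200).reshape(-1, 1)
a, b = fit_bagging(X, y, seed=1), fit_bagging(X, y, seed=2)
single_gap = np.mean((a[0].predict(grid) - b[0].predict(grid)) ** 2)
ensemble_gap = np.mean((predict_bagging(a, grid) - predict_bagging(b, grid)) ** 2)
print(f"single-tree disagreement: {single_gap:.4f}")
print(f"ensemble disagreement:    {ensemble_gap:.4f}")
```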

How Gradient Boosting Works

Algorithm:
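1. Start with a constant prediction (the mean of the target, for squared-error regression)
2. Compute residuals: the errors of the current prediction (in general, the negative gradient of the loss)
3. Fit a small tree to predict the residuals
4. Add the tree's predictions to the ensemble, scaled by the learning rate
5. Repeat steps 2-4 for a fixed number of rounds (or until early stopping)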
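The steps above can be sketched from scratch for squared-error regression. This assumes NumPy and scikit-learn's DecisionTreeRegressor as the base learner; SimpleGradientBoostingRegressor is a hypothetical name, and this is an illustration of plain gradient boosting, not XGBoost's implementation (XGBoost also uses second-order gradient information and regularization in its trees).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class SimpleGradientBoostingRegressor:
    """Gradient boosting for squared-error loss (illustrative sketch, not XGBoost)."""

    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth

    def fit(self, X, y):
        # Step 1: constant starting prediction (the mean minimizes squared error)
        self.init_ = float(np.mean(y))
        pred = np.full(len(y), self.init_)
        self.trees_, self.train_mse_ = [], []
        for _ in range(self.n_estimators):
            # Step 2: residuals = negative gradient of 0.5 * (y - pred)^2
            residuals = y - pred
            # Step 3: fit a small tree to the residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth, random_state=0)
            tree.fit(X, residuals)
            # Step 4: add the tree's output, shrunk by the learning rate
            pred += self.learning_rate * tree.predict(X)
            self.trees_.append(tree)
            self.train_mse_.append(float(np.mean((y - pred) ** 2)))
        return self

    def predict(self, X):
        # Initial constant + learning_rate * sum of all tree outputs
        pred = np.full(X.shape[0], self.init_)
        for tree in self.trees_:
            pred += self.learning_rate * tree.predict(X)
        return pred

# Usage on synthetic data
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=300)
model = SimpleGradientBoostingRegressor(n_estimators=50).fit(X, y)
print(f"train MSE after 1 tree: {model.train_mse_[0]:.3f}, "
      f"after 50 trees: {model.train_mse_[-1]:.3f}")
```

Because each regression tree predicts the mean residual in each leaf, adding a shrunken tree can never increase the training error for squared loss, so the training MSE falls round after round.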

Iteration 0: Predict mean
Iteration 1: Tree predicts residuals → add to prediction
Iteration 2: Tree predicts new residuals → add to prediction
...
Iteration N: Final prediction = initial prediction + learning rate × (sum of all trees)
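The same iterations written as equations, for squared-error loss. The notation is introduced here for illustration: F_m is the ensemble after m trees, h_m is the tree fitted at iteration m, and η is the learning rate.

```latex
\begin{aligned}
F_0(x) &= \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \\
r_i^{(m)} &= y_i - F_{m-1}(x_i) \qquad \text{(residuals; } h_m \text{ is fitted to } r^{(m)}\text{)} \\
F_m(x) &= F_{m-1}(x) + \eta\, h_m(x) \\
F_N(x) &= F_0(x) + \eta \sum_{m=1}^{N} h_m(x)
\end{aligned}
```

For a general differentiable loss L, the residual is replaced by the negative gradient r_i^{(m)} = -∂L(y_i, F(x_i))/∂F(x_i), evaluated at F = F_{m-1}; for L = ½(y - F)² this reduces to y_i - F_{m-1}(x_i).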

Key insight: Each tree learns from the MISTAKES of previous trees

XGBoost Implementation

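import xgboost as xgb
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train XGBoost (passing eval_metric to the constructor requires XGBoost >= 1.6)
model = xgb.XGBClassifier(
    n_estimators=100,        # number of boosting rounds (trees)
    max_depth=6,             # maximum depth of each tree
    learning_rate=0.1,       # shrinkage applied to each tree's contribution
    subsample=0.8,           # fraction of rows sampled for each tree
    colsample_bytree=0.8,    # fraction of columns sampled for each tree
    random_state=42,
    eval_metric='logloss'
)
model.fit(X_train, y_train)

# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")

# Plot feature importance (requires matplotlib)
xgb.plot_importance(model)
plt.show()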

Key Hyperparameters

n_estimators: Number of trees / boosting rounds (typically 100-1000)
learning_rate (alias eta): Shrinkage applied to each tree's contribution (0.01-0.3)
├─ Lower = need more trees
├─ Higher = need fewer trees
└─ Usually 0.01-0.1
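A back-of-the-envelope model shows why a lower learning rate needs more trees. Assume, as an idealization, that every tree fits the current residual exactly; then each round multiplies the residual by (1 - learning_rate), so after m rounds it is (1 - learning_rate)^m of its starting size. Real trees only approximate the residuals, so actual tree counts differ, but the roughly inverse scaling with the learning rate carries over. rounds_to_shrink is a hypothetical helper.

```python
import math

def rounds_to_shrink(learning_rate, target_fraction=0.01):
    """Rounds until the residual falls to target_fraction of its initial size,
    assuming (idealized) that each tree fits the residual exactly, so every
    round multiplies the residual by (1 - learning_rate)."""
    return math.ceil(math.log(target_fraction) / math.log(1 - learning_rate))

for lr in (0.3, 0.1, 0.01):
    print(f"learning_rate={lr}: {rounds_to_shrink(lr)} rounds")
# → 13, 44 and 459 rounds to shrink the residual to 1%
```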

max_depth: Maximum depth of each tree (3-10; XGBoost's default is 6)
├─ Shallower = simpler, less overfitting
└─ Deeper = more complex, may overfit

subsample: Fraction of rows sampled for each tree (0.5-1.0)
colsample_bytree: Fraction of columns sampled for each tree (0.5-1.0)
├─ Similar in spirit to Random Forest's feature sampling (which samples at every split)
└─ Adds randomness, reduces overfitting
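The idea behind subsample and colsample_bytree can be sketched in NumPy: before building each tree, pick a random fraction of rows and a random fraction of columns without replacement, and train that tree only on the selected slice. This is an illustration of the sampling idea under that assumption, not XGBoost's internal code; sample_rows_and_columns is a hypothetical helper.

```python
import numpy as np

def sample_rows_and_columns(n_rows, n_cols, subsample=0.8,
                            colsample_bytree=0.8, rng=None):
    """Pick the row and column indices one tree would see (illustrative)."""
    if rng is None:
        rng = np.random.default_rng()
    n_sub_rows = max(1, int(round(subsample * n_rows)))
    n_sub_cols = max(1, int(round(colsample_bytree * n_cols)))
    # Sampling WITHOUT replacement, unlike the bootstrap used in bagging
    rows = np.sort(rng.choice(n_rows, size=n_sub_rows, replace=False))
    cols = np.sort(rng.choice(n_cols, size=n_sub_cols, replace=False))
    return rows, cols

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
for tree_id in range(3):
    # A fresh random slice is drawn for every tree
    rows, cols = sample_rows_and_columns(*X.shape, rng=rng)
    X_tree = X[np.ix_(rows, cols)]
    print(tree_id, X_tree.shape)  # (800, 16) for every tree
```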

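min_child_weight: Minimum sum of instance weights (Hessian) required in each child node (1-10); higher = more conservative
gamma (alias min_split_loss): Minimum loss reduction required to make a further split (0-5); higher = more conservative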
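Both regularizers act on XGBoost's split gain. The sketch below follows the regularized second-order gain described in the XGBoost paper (Chen and Guestrin, 2016): with G and H the sums of gradients and Hessians in a child and reg_lambda the L2 penalty, Gain = ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, and a split is worth making only if the gain is positive. split_gain is a hypothetical helper. For squared error every row has Hessian 1, so min_child_weight is simply a minimum row count; for logistic loss the Hessian is p(1 - p), so it is not.

```python
import numpy as np

def split_gain(grad, hess, left, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Regularized split gain (illustrative). Returns None if either child
    has less total Hessian than min_child_weight."""
    grad = np.asarray(grad, dtype=float)
    hess = np.asarray(hess, dtype=float)
    left = np.asarray(left, dtype=bool)
    G_L, H_L = grad[left].sum(), hess[left].sum()
    G_R, H_R = grad[~left].sum(), hess[~left].sum()
    # min_child_weight: each child needs enough total Hessian ("weight")
    if H_L < min_child_weight or H_R < min_child_weight:
        return None

    def score(G, H):
        return G * G / (H + reg_lambda)

    # gamma is subtracted once per split: the split must "pay" for itself
    return 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

# Squared error, loss 0.5*(y - F)^2: gradient = F - y, Hessian = 1
y = np.array([1.0, 1.0, 5.0, 5.0])
F = np.full(4, y.mean())               # current prediction: 3.0
grad, hess = F - y, np.ones(4)         # [-2, -2, 2, 2], [1, 1, 1, 1]
left = [True, True, False, False]

print(split_gain(grad, hess, left))                        # 5.333... (16/3)
print(split_gain(grad, hess, left, gamma=6.0))             # negative: split rejected
print(split_gain(grad, hess, left, min_child_weight=3.0))  # None: children too light
```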

XGBoost vs Random Forest

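                    Random Forest            XGBoost
Tree building       Parallel (independent)   Sequential (each corrects the last)
Training speed      Often faster             Often slower (rounds run in sequence)
Overfitting risk    Lower                    Higher without tuning
Accuracy (tabular)  Good                     Often higher when tuned
Hyperparameters     Fewer to tune            More to tune
Interpretability    Feature importance       Feature importance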

Key Takeaways

Gradient boosting trains trees sequentially, each one correcting the errors of the ensemble so far
XGBoost is one of the most popular and efficient implementations
Learning rate controls how much each tree contributes
A lower learning rate with more trees usually generalizes better, at the cost of longer training
XGBoost has been part of many winning Kaggle solutions on tabular data
Early stopping halts training once a validation metric stops improving, which limits overfitting
Use cross-validation to find the optimal number of trees
XGBoost handles missing values natively (each split learns a default direction for them)
