XGBoost & Gradient Boosting — Complete Guide
Gradient boosting builds trees sequentially: each new tree corrects the errors of the previous ones. XGBoost is one of the most widely used implementations.
Boosting vs Bagging
Bagging (Random Forest):
├─ Trees trained INDEPENDENTLY (parallel)
├─ Each tree on different data sample
├─ Reduce variance
└─ Combine by averaging
Boosting (XGBoost):
├─ Trees trained SEQUENTIALLY
├─ Each tree corrects previous errors
├─ Mainly reduces bias (shrinkage and subsampling help control variance)
└─ Combine by weighted sum
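To make the contrast concrete, here is a minimal sketch that cross-validates one bagging model and one boosting model on the same synthetic data. It uses scikit-learn's RandomForestClassifier and GradientBoostingClassifier as stand-ins (an assumption made for illustration; XGBoost itself is introduced later), and the dataset and settings are arbitrary choices, so the scores say nothing general about which method is better.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data, chosen only for illustration
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)

models = {
    # Bagging: independent trees on bootstrap samples, predictions averaged
    "bagging (RandomForest)": RandomForestClassifier(
        n_estimators=100, random_state=42),
    # Boosting: shallow trees added one at a time, each fitted to the
    # remaining errors of the current ensemble
    "boosting (GradientBoosting)": GradientBoostingClassifier(
        n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42),
}

scores = {}
for name, model in models.items():
    # Mean accuracy over 3 cross-validation folds
    scores[name] = cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.3f}")
```

Which model scores higher depends on the data and on tuning; the point is only that both ensembles are used the same way while being built differently.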
How Gradient Boosting Works
Algorithm:
1. Start with simple prediction (mean for regression)
2. Compute residuals (errors; for a general loss, the negative gradient of the loss)
3. Fit a tree to predict residuals
4. Add tree to ensemble (with learning rate)
5. Repeat steps 2-4
Iteration 0: Predict mean
Iteration 1: Tree predicts residuals → add to prediction
Iteration 2: Tree predicts new residuals → add to prediction
...
Iteration N: Final prediction = initial mean + learning rate × sum of all tree outputs
Key insight: Each tree learns from the MISTAKES of previous trees
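The steps above can be sketched from scratch for squared-error regression. This is a toy illustration, not how XGBoost is implemented internally: the helper names fit_gradient_boosting and predict_gradient_boosting are hypothetical, scikit-learn's DecisionTreeRegressor is assumed as the base learner, and the sine-wave data is made up for the demo.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_trees=50, learning_rate=0.1, max_depth=2):
    """Toy gradient boosting for squared-error regression (hypothetical helper)."""
    base = float(np.mean(y))                 # step 1: start from the mean
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):                 # step 5: repeat
        residuals = y - pred                 # step 2: current errors
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)               # step 3: fit a tree to the residuals
        pred = pred + learning_rate * tree.predict(X)  # step 4: shrunken update
        trees.append(tree)
    return base, trees


def predict_gradient_boosting(X, base, trees, learning_rate=0.1):
    """Initial mean plus the learning-rate-scaled sum of all tree outputs."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred


# Made-up 1-D regression problem: noisy sine wave
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(scale=0.1, size=200)

base, trees = fit_gradient_boosting(X_demo, y_demo)
mse_mean = np.mean((y_demo - base) ** 2)
mse_boost = np.mean(
    (y_demo - predict_gradient_boosting(X_demo, base, trees)) ** 2)
print(f"Training MSE, mean only: {mse_mean:.4f}")
print(f"Training MSE, 50 trees:  {mse_boost:.4f}")
```

The same learning_rate must be used for fitting and prediction, because each tree was trained against residuals that already assumed the shrunken updates.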
XGBoost Implementation
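import matplotlib.pyplot as plt
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Generate a synthetic binary classification dataset
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=10, random_state=42)
# Hold out 20% for testing; fix the seed so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
# Train XGBoost
model = xgb.XGBClassifier(
    n_estimators=100,       # number of boosting rounds (trees)
    max_depth=6,            # maximum depth of each tree
    learning_rate=0.1,      # shrinkage applied to each tree's contribution
    subsample=0.8,          # fraction of rows sampled per tree
    colsample_bytree=0.8,   # fraction of features sampled per tree
    random_state=42,
    eval_metric='logloss'
)
model.fit(X_train, y_train)
# Evaluate on the held-out test set
y_pred = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
# Feature importance (plot_importance draws with matplotlib)
xgb.plot_importance(model)
plt.show()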
Key Hyperparameters
n_estimators: Number of trees (100-1000)
learning_rate: Step size (0.01-0.3)
├─ Lower = need more trees
├─ Higher = need fewer trees
└─ Usually 0.01-0.1
max_depth: Tree depth (3-10)
├─ Shallower = simpler, less overfitting
└─ Deeper = more complex, may overfit
subsample: Fraction of training rows sampled for each tree (0.5-1.0)
colsample_bytree: Fraction of features sampled for each tree (0.5-1.0)
├─ Similar to Random Forest feature sampling
└─ Adds randomness, reduces overfitting
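min_child_weight: Minimum sum of instance weights (Hessian) required in a child node (1-10); higher values give more conservative trees
gamma: Minimum loss reduction required to make a split (0-5); higher values give more conservative trees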
XGBoost vs Random Forest
                   Random Forest            XGBoost
Training           Parallel (independent)   Sequential (tree by tree)
Training speed     Often faster             Often slower
Overfitting        Less likely              More likely without tuning
Accuracy           Good                     Often higher when tuned
Hyperparameters    Fewer                    More
Interpretability   Feature importance       Feature importance
Key Takeaways
- Gradient boosting trains trees sequentially; each tree corrects the errors of the ones before it
- XGBoost is one of the most popular implementations and is optimized for speed
- Learning rate controls how much each tree contributes
- A lower learning rate with more trees usually generalizes better, at the cost of longer training
- XGBoost has featured in many winning Kaggle solutions, especially on tabular data
- Early stopping limits overfitting by halting training once the validation metric stops improving
- Use cross-validation to find optimal number of trees
- XGBoost handles missing values natively