Model Interpretability: SHAP, LIME
Introduction
Model interpretability is crucial for building trust, meeting regulatory requirements, and debugging model behavior. As models become more complex, understanding why a model makes a prediction becomes as important as the prediction itself.
Interpretability Spectrum:
High interpretability  ------------------------------------------>  Low interpretability
Linear regression | Decision trees | Rule lists | Random forest | Gradient boosting | Neural networks | Deep learning

Explanation methods: SHAP (global and local), LIME (local); the two are often used together.
Why Interpretability Matters
Interpretability Benefits:
1. Trust Building: answers questions such as "Why was my loan rejected?" with a transparent decision process.
2. Debugging: reveals when a model relies on correlation instead of causation, which points to data or feature fixes.
3. Regulatory Compliance: GDPR "right to explanation"; the EU AI Act requires transparency.
4. Scientific Discovery: feature importance reveals new domain insights ("What drives Y?").
SHAP (SHapley Additive exPlanations)
Theory
SHAP values are based on Shapley values from cooperative game theory. They provide a unified measure of feature contribution.
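Definition: Shapley Value
For a player $i$ in a coalition game with player set $N$, the Shapley value $\phi_i$ measures the marginal contribution of player $i$ averaged over all possible coalitions.
Shapley Value Formula
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[ f(S \cup \{i\}) - f(S) \right]$$
Here,
- $N$ = Set of all features (players)
- $S$ = Subset of features not including $i$
- $f(S)$ = Model prediction using features in $S$
- $\phi_i$ = SHAP value for feature $i$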
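The weighted average over coalitions can be computed directly when the number of players is small. The following is a minimal sketch, not a production implementation: `shapley_values` and the payoff function `toy_value` are hypothetical names, and the payoffs are made-up numbers chosen so the result is easy to check by hand.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition S that excludes player i.

    `value` maps a frozenset of players to a payoff, playing the role of f(S).
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # |S| ranges over 0 .. n-1
            # Weight |S|! (n - |S| - 1)! / n! from the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: additive effects plus an income/history interaction."""
    payoff = 0.0
    if "income" in coalition:
        payoff += 40.0
    if "history" in coalition:
        payoff += 10.0
    if "debt" in coalition:
        payoff -= 30.0
    if "income" in coalition and "history" in coalition:
        payoff += 20.0  # interaction bonus shared by the two players involved
    return payoff


phi = shapley_values(["income", "history", "debt"], toy_value)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

In this toy game the interaction bonus of 20 is split evenly, so income receives 50, history 20, and debt -30. The enumeration visits 2^(n-1) coalitions per player, which is why practical SHAP explainers rely on model-specific shortcuts or sampling.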
Theorem: Shapley Value Axioms (Unique Solution)
The Shapley value is the unique allocation that satisfies four desirable axioms:
- Efficiency: The sum of all Shapley values equals the total value: $\sum_{i \in N} \phi_i = f(N) - f(\emptyset)$
- Symmetry: If two players contribute equally to all coalitions, they receive equal Shapley values
- Dummy: A player that contributes nothing to any coalition receives a Shapley value of zero
- Additivity: The Shapley value of a sum of games equals the sum of individual Shapley values
Because the Shapley value is the only attribution that satisfies all four axioms simultaneously, SHAP explanations are theoretically grounded. The guarantee is exact for exact Shapley values; sampling-based estimators satisfy the axioms only approximately.
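The four axioms can be checked numerically on a small game. The sketch below uses the equivalent "average over join orders" form of the Shapley value; the function name `shapley_by_orderings` and the games `game` and `doubled` are hypothetical and exist only for illustration.

```python
from itertools import permutations


def shapley_by_orderings(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        for p in order:
            totals[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    return {p: t / len(orderings) for p, t in totals.items()}


def game(coalition):
    """Hypothetical game: "a" and "b" are interchangeable, "c" never changes the payoff."""
    return 10.0 * len(coalition & {"a", "b"}) ** 2


def doubled(coalition):
    """A second game, used to check additivity."""
    return 2.0 * game(coalition)


players = ["a", "b", "c"]
phi = shapley_by_orderings(players, game)
phi_doubled = shapley_by_orderings(players, doubled)
phi_sum = shapley_by_orderings(players, lambda s: game(s) + doubled(s))

# Efficiency: contributions sum to f(N) - f(empty set).
efficiency = abs(sum(phi.values()) - (game(frozenset(players)) - game(frozenset()))) < 1e-9
# Symmetry: interchangeable players get equal values.
symmetry = abs(phi["a"] - phi["b"]) < 1e-9
# Dummy: a player that never changes the payoff gets zero.
dummy = abs(phi["c"]) < 1e-9
# Additivity: values of the summed game equal the sum of the values.
additivity = all(abs(phi_sum[p] - (phi[p] + phi_doubled[p])) < 1e-9 for p in players)

print(phi, efficiency, symmetry, dummy, additivity)
```

Enumerating all orderings costs n! evaluations, so this form is only useful for illustration or as the basis of a permutation-sampling approximation.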
SHAP Properties:
Efficiency Property
$$\sum_{i \in N} \phi_i = f(x) - E[f(X)]$$
Here,
- $\phi_i$ = SHAP value for feature $i$
- $f(x)$ = Model prediction for instance $x$
- $E[f(X)]$ = Expected (average) model prediction
The SHAP values sum to the difference between the prediction and the average prediction.
Note: Why SHAP Values Sum to the Difference
The SHAP value decomposition ensures that the sum of all feature contributions equals the difference between the individual prediction and the average prediction. This provides a complete and consistent explanation: every feature's contribution is accounted for, and no prediction component is left unexplained.
SHAP Value Decomposition Example:
Model Prediction: 750 (Credit Score)
Average Prediction: 680
SHAP Values:
  Income:         +45
  Age:            +20
  Debt:           -30
  Credit History: +35
  -------------------
  Total SHAP:     +70

680 (avg) + 70 = 750 (prediction)
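The same bookkeeping can be reproduced for a model rather than a hand-written table. The sketch below assumes one common way of defining the prediction "using features in S": keep the instance's values for features in S and average the model over a background sample for the rest (this treats features as independent). The scoring function `model`, the `background` rows, and the instance `x` are all hypothetical.

```python
from itertools import combinations
from math import factorial
from statistics import mean


def model(row):
    """Hypothetical scoring model over (income, debt, age)."""
    income, debt, age = row
    return 600 + 0.002 * income - 0.004 * debt + 1.5 * age


# Illustrative background sample and instance to explain (made-up numbers).
background = [(40000, 10000, 30), (60000, 20000, 45), (50000, 0, 60)]
x = (75000, 15000, 35)


def value(subset):
    """f(S): features in `subset` take x's values, the rest are averaged over the background."""
    return mean(
        model(tuple(x[j] if j in subset else z[j] for j in range(len(x))))
        for z in background
    )


def exact_shap():
    """Exact Shapley values of `value` by enumerating all feature subsets."""
    n = len(x)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


phi = exact_shap()
base_value = value(set())   # average prediction over the background sample
prediction = model(x)       # prediction for the instance being explained

print("SHAP values:", [round(p, 2) for p in phi])
print("base + sum:", round(base_value + sum(phi), 2), "prediction:", round(prediction, 2))
```

Because this toy model is linear, each contribution reduces to the coefficient times the feature's deviation from its background mean, and the contributions add up to the gap between the prediction and the base value.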
SHAP Implementation
import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
import shap
import matplotlib.pyplot as plt
# ==================================================
# Load and Prepare Data
# ==================================================
data = fetch_california_housing()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = GradientBoostingRegressor(
    n_estimators=200,
    max_depth=5,
    learning_rate=0.1,
    random_state=42
)
model.fit(X_train, y_train)
print(f"R² Score: {model.score(X_test, y_test):.4f}")
# ==================================================
# SHAP Explainer
# ==================================================
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
# ==================================================
# 1. Summary Plot (Global Feature Importance)
# ==================================================
plt.figure(figsize=(10, 8))
shap.summary_plot(shap_values, X_test, plot_type="bar", show=False)
plt.title("Global Feature Importance (SHAP)")
plt.tight_layout()
plt.savefig('shap_summary_bar.png', dpi=150)
plt.show()
# ==================================================
# 2. Beeswarm Plot (Feature Effects)
# ==================================================
plt.figure(figsize=(10, 8))
shap.summary_plot(shap_values, X_test, show=False)
plt.title("Feature Effects (SHAP Beeswarm)")
plt.tight_layout()
plt.savefig('shap_beeswarm.png', dpi=150)
plt.show()
# ==================================================
# 3. Dependence Plot (Feature Interaction)
# ==================================================
plt.figure(figsize=(10, 6))
shap.dependence_plot(
    "MedInc", shap_values, X_test,
    interaction_index="AveRooms",
    show=False
)
plt.title("MedInc Dependence (colored by AveRooms)")
plt.tight_layout()
plt.savefig('shap_dependence.png', dpi=150)
plt.show()
# ==================================================
# 4. Waterfall Plot (Individual Prediction)
# ==================================================
sample_idx = 0
# The expected value may come back as a length-1 array; the plots need a scalar
base_value = float(np.ravel(explainer.expected_value)[0])
plt.figure(figsize=(10, 8))
shap.plots.waterfall(
    shap.Explanation(
        values=shap_values[sample_idx],
        base_values=base_value,
        data=X_test.iloc[sample_idx].values,
        feature_names=X_test.columns.tolist()
    ),
    show=False
)
plt.title("Individual Prediction Explanation")
plt.tight_layout()
plt.savefig('shap_waterfall.png', dpi=150)
plt.show()
# ==================================================
# 5. Force Plot (Single Prediction)
# ==================================================
shap.initjs()
force_plot = shap.force_plot(
    base_value,
    shap_values[sample_idx],
    X_test.iloc[sample_idx]
)
shap.save_html('shap_force_plot.html', force_plot)
# ==================================================
# SHAP Interaction Values
# ==================================================
# Interaction values are expensive, so only the first 100 rows are used
X_subset = X_test.iloc[:100]
shap_interaction = explainer.shap_interaction_values(X_subset)
plt.figure(figsize=(12, 10))
shap.summary_plot(
    shap_interaction,
    X_subset,
    show=False
)
plt.title("Feature Interactions")
plt.tight_layout()
plt.savefig('shap_interactions.png', dpi=150)
plt.show()
LIME (Local Interpretable Model-agnostic Explanations)
Theory
LIME explains individual predictions by approximating the model locally with an interpretable model.
Definition: LIME Optimization Problem
LIME finds an interpretable model $g$ that approximates the complex model $f$ in the local neighborhood of the instance $x$, balancing fidelity to $f$ with simplicity of $g$.
LIME Objective
$$\xi(x) = \arg\min_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g)$$
Here,
- $f$ = Complex (black-box) model
- $g$ = Interpretable model (e.g., linear model) from a class $G$
- $\pi_x$ = Local neighborhood around $x$
- $\Omega(g)$ = Complexity penalty for $g$
- $\mathcal{L}(f, g, \pi_x)$ = Fidelity loss measuring how well $g$ approximates $f$
Sample Weighting:
LIME Neighborhood Weighting
$$\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right)$$
Here,
- $D(x, z)$ = Distance between original instance $x$ and perturbed instance $z$
- $\sigma$ = Kernel width controlling locality
Tip: Kernel Width Selection
The kernel width $\sigma$ controls the "locality" of the explanation. A smaller $\sigma$ makes the explanation more local (only very nearby points matter), while a larger $\sigma$ considers a broader neighborhood. The default in the LIME library's tabular explainer is $\sigma = 0.75\sqrt{d}$, where $d$ is the number of features.
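To make the effect of the kernel width concrete, the sketch below evaluates an exponential kernel of the form exp(-D^2 / sigma^2) at a few distances. The helper names are hypothetical, and the code follows the formula in this section rather than the library's internal implementation, which may differ in detail.

```python
import math


def lime_kernel_weight(distance, kernel_width):
    """Exponential kernel: weight decays with squared distance relative to the width."""
    return math.exp(-(distance ** 2) / (kernel_width ** 2))


def default_tabular_kernel_width(num_features):
    """Width of 0.75 * sqrt(d), the tabular default described in this section."""
    return 0.75 * math.sqrt(num_features)


distances = [0.0, 0.5, 1.0, 2.0]
for width in (0.5, default_tabular_kernel_width(4), 3.0):
    weights = [lime_kernel_weight(d, width) for d in distances]
    print(f"sigma={width:.2f}:", [f"{w:.3f}" for w in weights])
```

With a narrow width, samples at distance 2 get almost no weight, so the surrogate is fit on a very tight neighborhood; with a wide kernel the same samples still influence the fit noticeably.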
LIME Explanation Process:
Original data point x:
  Income: $75,000   Age: 35   Debt: $15,000   Credit Score: 720

Step 1: Generate Neighborhood
  - Perturb features around x
  - Create weighted samples
Step 2: Get Model Predictions
  - Run complex model on perturbations
  - Assign weights by distance
Step 3: Fit Interpretable Model
  - Train linear model on weighted data
  - Feature importance = coefficients
      Income: +0.45 (most important)
      Credit: +0.35
      Age:    +0.12
      Debt:   -0.25
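The three steps can be sketched in a few lines of NumPy. This is a simplified illustration, not the LIME library's algorithm: it uses Gaussian perturbations, an exponential kernel, and plain weighted least squares, and it omits the discretization, feature selection, and ridge penalty that the library applies. The function `lime_sketch`, its parameters, and the `black_box` model are hypothetical.

```python
import numpy as np


def lime_sketch(predict_fn, x, scale, num_samples=500, kernel_width=None, seed=0):
    """Fit a local linear surrogate around x: perturb, predict, weight, fit."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    scale = np.asarray(scale, dtype=float)
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(x.size)  # assumed default width

    # Step 1: perturb each feature with Gaussian noise scaled per feature.
    z = x + rng.normal(size=(num_samples, x.size)) * scale

    # Step 2: query the black-box model and weight samples by proximity to x.
    y = predict_fn(z)
    offsets = (z - x) / scale                   # standardized distance from x
    distance = np.linalg.norm(offsets, axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Step 3: weighted least squares on the offsets plus an intercept column.
    design = np.column_stack([np.ones(num_samples), offsets])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef[0], coef[1:]  # local prediction at x, weight per standardized feature


def black_box(z):
    """Hypothetical nonlinear model standing in for the complex model."""
    return np.sin(z[:, 0]) + z[:, 1] ** 2


intercept, coefs = lime_sketch(black_box, x=[0.0, 1.0], scale=[0.1, 0.1])
print("local prediction:", round(float(intercept), 3))
print("weights per standardized feature:", np.round(coefs, 3))
```

Dividing each weight by the feature's `scale` gives an estimate of the model's local slope, which is one way to sanity-check a surrogate against a model whose gradient is known.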
Theorem: LIME Local Approximation Guarantees
LIME provides locally faithful explanations: for a given instance $x$, the explanation $g$ approximates the complex model $f$ well in a neighborhood around $x$. However, unlike SHAP, LIME does not guarantee global consistency; explanations for nearby instances may differ significantly due to the sampling-based nature of the method.
Example: LIME Explanation for Loan Application
A bank uses a gradient boosting model to predict loan defaults. For a specific applicant:
Applicant features: Income = $75,000, Age = 35, Debt = $15,000, Credit Score = 720
LIME Explanation (local linear model):
- Income: +0.45 (high income -> lower risk)
- Credit Score: +0.35 (good score -> lower risk)
- Debt: -0.25 (moderate debt -> slight risk increase)
- Age: +0.12 (prime working age -> slight positive)
Interpretation: The model predicts "low risk" primarily because of the applicant's high income and good credit score. The debt partially offsets these positives. This explanation is only valid for this specific prediction; a different applicant might have different feature contributions.
LIME Implementation
from lime.lime_tabular import LimeTabularExplainer
# ==================================================
# LIME Explainer Setup
# ==================================================
lime_explainer = LimeTabularExplainer(
    training_data=X_train.values,
    feature_names=X_train.columns.tolist(),
    mode='regression',
    discretize_continuous=True,
    random_state=42
)
# ==================================================
# Explain Individual Predictions
# ==================================================
# Explain 5 different samples
sample_indices = [0, 50, 100, 200, 500]
for idx in sample_indices:
    # top_labels only applies to classification, so it is omitted here
    exp = lime_explainer.explain_instance(
        X_test.iloc[idx].values,
        model.predict,
        num_features=10
    )
    print(f"\n{'=' * 60}")
    print(f"Sample {idx}")
    # y_test is a NumPy array (data.target), so index it positionally
    print(f"Actual: {y_test[idx]:.4f}")
    print(f"Predicted: {model.predict(X_test.iloc[[idx]])[0]:.4f}")
    print("\nTop Feature Contributions:")
    for feat, weight in exp.as_list()[:5]:
        print(f"  {feat}: {weight:.4f}")
    # Save HTML explanation
    exp.save_to_file(f'lime_explanation_{idx}.html')
# ==================================================
# Visualize LIME Explanation
# ==================================================
exp = lime_explainer.explain_instance(
    X_test.iloc[0].values,
    model.predict,
    num_features=10
)
fig = exp.as_pyplot_figure()
plt.title("LIME Local Explanation")
plt.tight_layout()
plt.savefig('lime_explanation.png', dpi=150)
plt.show()
Partial Dependence Plots
from sklearn.inspection import PartialDependenceDisplay
# ==================================================
# Individual Conditional Expectation (ICE) Plots
# ==================================================
fig, axes = plt.subplots(2, 4, figsize=(20, 10))
features = ['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms',
            'Population', 'AveOccup', 'Latitude', 'Longitude']
for i, feature in enumerate(features):
    row, col = i // 4, i % 4
    PartialDependenceDisplay.from_estimator(
        model, X_train, [feature],
        ax=axes[row, col],
        kind='both',  # ICE + PDP
        subsample=50,
        grid_resolution=50
    )
    axes[row, col].set_title(f'PDP: {feature}')
plt.suptitle("Partial Dependence Plots", fontsize=16, y=1.02)
plt.tight_layout()
plt.savefig('partial_dependence_plots.png', dpi=150, bbox_inches='tight')
plt.show()
# ==================================================
# 2D Partial Dependence (Feature Interactions)
# ==================================================
fig, ax = plt.subplots(figsize=(10, 8))
PartialDependenceDisplay.from_estimator(
    model, X_train, [('MedInc', 'AveRooms')],
    ax=ax,
    grid_resolution=50
)
plt.title("2D PDP: MedInc × AveRooms Interaction")
plt.tight_layout()
plt.savefig('pdp_2d_interaction.png', dpi=150)
plt.show()
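The quantity drawn by these plots can be computed by hand, which helps when interpreting them. The sketch below is a brute-force illustration under simple assumptions, not scikit-learn's implementation: an ICE curve forces one feature to each grid value for a single row, and the partial dependence curve is the average of the ICE curves. The helpers `ice_curves`, `partial_dependence_1d`, and `toy_model` are hypothetical.

```python
import numpy as np


def ice_curves(predict_fn, X, feature, grid):
    """One curve per row: predictions with column `feature` forced to each grid value."""
    X = np.asarray(X, dtype=float)
    curves = np.empty((X.shape[0], len(grid)))
    for k, v in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = v  # overwrite the feature for every row
        curves[:, k] = predict_fn(X_mod)
    return curves


def partial_dependence_1d(predict_fn, X, feature, grid):
    """Partial dependence: the average of the ICE curves at each grid value."""
    return ice_curves(predict_fn, X, feature, grid).mean(axis=0)


def toy_model(X):
    """Hypothetical model with an interaction between columns 0 and 1."""
    return 2.0 * X[:, 0] + X[:, 0] * X[:, 1]


X_demo = np.array([[0.0, 1.0], [0.0, 3.0]])
grid = np.array([0.0, 1.0, 2.0])
ice = ice_curves(toy_model, X_demo, feature=0, grid=grid)
pdp = partial_dependence_1d(toy_model, X_demo, feature=0, grid=grid)
print("ICE curves:\n", ice)
print("PDP:", pdp)
```

In this toy case the two ICE curves have different slopes because of the interaction with column 1, while the averaged curve hides that difference. This is the reason for plotting ICE and PDP together.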
Global vs Local Interpretability
Global vs Local Interpretability:
GLOBAL (Overall Model Behavior)
Methods:
- Feature Importance Rankings
- Partial Dependence Plots
- SHAP Summary Plots
Question: "What features matter most overall?"
Feature Importance:
  1. Income        0.35
  2. Credit Score  0.30
  3. Age           0.15
  4. Debt          0.12
  5. Employment    0.08

LOCAL (Individual Prediction)
Methods:
- LIME Explanations
- SHAP Waterfall Plots
- Counterfactual Explanations
Question: "Why did THIS prediction happen?"
Sample #142:
  Base Value:  $680
  Income:      +$45 (high income)
  Credit:      +$35 (good history)
  Debt:        -$30 (high debt)
  Age:         +$20 (prime age)
  ----------------------------
  Prediction:  $750
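Local attributions can be aggregated into a global ranking, which is how a SHAP bar summary is built: take the mean absolute attribution per feature across many predictions. The sketch below uses a small hypothetical attribution table; the numbers and the helper `global_importance` are illustrative only.

```python
# Hypothetical local attributions: one row per prediction, one column per feature.
feature_names = ["Income", "Credit", "Debt", "Age"]
local_attributions = [
    [45.0, 35.0, -30.0, 20.0],
    [-10.0, 15.0, -50.0, 5.0],
    [30.0, -20.0, 10.0, -5.0],
]


def global_importance(local_values, names):
    """Mean absolute attribution per feature, sorted from most to least important."""
    n = len(local_values)
    scores = {
        name: sum(abs(row[j]) for row in local_values) / n
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = global_importance(local_attributions, feature_names)
for name, score in ranking:
    print(f"{name:<8} {score:.2f}")
```

Taking absolute values matters: a feature that pushes some predictions up and others down would otherwise average toward zero and look unimportant.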
Comparison: SHAP vs LIME
SHAP vs LIME Comparison:
Aspect       | SHAP                                       | LIME
-------------|--------------------------------------------|-------------------------------------
Theory       | Game theory (Shapley values)               | Local surrogate model
Speed        | Slower in general (exact computation);     | Faster (sampling)
             | TreeExplainer is fast for tree models      |
Consistency  | Guaranteed                                 | Not guaranteed
Global       | Yes (summary plots)                        | No (local only)
Local        | Yes (waterfall)                            | Yes (primary use)
Interactions | Yes (SHAP interaction values)              | No (features treated independently)

When to Use:
- SHAP: When you need theoretical guarantees and both global and local explanations
- LIME: When you need fast, model-agnostic local explanations
- Both: For comprehensive interpretability analysis
Key Takeaways
- SHAP is grounded in game theory and satisfies the four Shapley axioms (efficiency, symmetry, dummy, additivity), providing theoretically unique and consistent explanations
- LIME approximates the model locally with an interpretable surrogate, offering fast model-agnostic explanations without theoretical guarantees
- Efficiency Property: SHAP values always sum to $f(x) - E[f(X)]$, providing complete attribution
- Feature importance rankings help with global model understanding; partial dependence plots reveal feature effects and interactions
- Local vs. Global: SHAP supports both local (waterfall) and global (summary) explanations; LIME is primarily local
- Validation: Always validate interpretations with domain experts, because statistical correlation does not imply causation
Practice Exercises
- SHAP vs LIME Agreement: Compare SHAP and LIME explanations for the same predictions - when do they disagree?
- Adversarial Analysis: Find predictions where high feature values lead to low SHAP values (feature correlation effects)
- Interactive Dashboard: Build a Streamlit app that shows SHAP explanations for user-selected predictions
- Causal Interpretation: Discuss when SHAP values can vs cannot be interpreted causally