
Model Interpretability: SHAP, LIME and Feature Importance

Module 8: Tree-Based Models


Why Interpretability Matters

Modern ML models achieve remarkable predictive accuracy, but black-box predictions are insufficient when trust, debugging, and regulatory compliance are required.

The Interpretability Imperative

| Stakeholder | Need | Example |
|---|---|---|
| Regulators | Legal compliance | EU GDPR Article 22: "right to explanation" |
| Domain Experts | Scientific validation | Do feature effects align with theory? |
| Engineers | Debugging and monitoring | Which features drive distribution shifts? |
| End Users | Trust and adoption | Why was my loan denied? |

When Interpretability Is Critical

High-stakes decisions:
├── Healthcare: Diagnosis explanations
├── Finance: Credit scoring, fraud detection
├── Criminal justice: Risk assessment
├── Autonomous systems: Safety-critical decisions
└── Insurance: Premium pricing transparency

A model with 99% accuracy that cannot explain why it makes its predictions is dangerous in regulated industries. In these settings interpretability is not optional: it is often a legal and ethical requirement.


Interpretability Spectrum

Models range from fully interpretable (you can read the rules) to completely opaque (only inputs and outputs visible).

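[Figure: interpretability spectrum. Categories: Intrinsic, Semi-Transparent, Black Box. Models shown, from most to least interpretable: Linear Regression, Decision Tree, Rule List, Random Forest, XGBoost, Neural Net, Transformer. Axis: Interpretability ← → Complexity.]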

Intrinsic vs Post-Hoc Interpretability

| Category | Method | Pros | Cons |
|---|---|---|---|
| Intrinsic | Linear models, trees | Built-in, no extra computation | Limited to simple models |
| Post-Hoc (Model-Specific) | Tree feature importance | Fast, native to model | Only for that model type |
| Post-Hoc (Model-Agnostic) | SHAP, LIME, PDP | Works on any model | Computationally expensive |

Feature Importance

Permutation Feature Importance

Measures importance by randomly shuffling each feature and measuring the drop in model performance.

Algorithm:

  1. Train model $f$, compute baseline score $S_0 = \text{Score}(y, f(X))$
  2. For each feature $j \in \{1, \ldots, p\}$:
    • Create permuted dataset $X_j^{\pi}$ (column $j$ shuffled randomly)
    • Compute $S_j = \text{Score}(y, f(X_j^{\pi}))$
  3. Feature importance: $\text{FI}_j = S_0 - S_j$

Theoretical Foundation:

Permutation importance approximates the expected increase in loss when feature $j$ is decoupled from the target:

$$\text{FI}_j = \mathbb{E}_{X}\left[\text{Loss}(y, f(X_j^{\pi}))\right] - \mathbb{E}_{X}\left[\text{Loss}(y, f(X))\right]$$

where $X_j^{\pi}$ denotes $X$ with feature $j$ replaced by a random permutation. With the score defined as the negative loss, this is the same quantity as $S_0 - S_j$ in the algorithm above.

Permutation importance is model-agnostic and measures the decrease in model performance when a single feature's values are randomly shuffled. It captures both linear and nonlinear effects.
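The algorithm above can be sketched by hand in a few lines of NumPy. This is a minimal illustration, not scikit-learn's implementation: the helper `permutation_importance_manual`, the thresholding "model" `predict`, and the toy data are all assumptions made for the example.

```python
import numpy as np

def permutation_importance_manual(predict, X, y, score, n_repeats=5, seed=0):
    """Return the mean score drop per feature when that column is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = score(y, predict(X))                      # S_0
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # shuffle column j only
            drops.append(baseline - score(y, predict(X_perm)))  # S_0 - S_j
        importances[j] = np.mean(drops)
    return importances

# Toy data: the label depends on feature 0 only; feature 1 is pure noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
y = (X[:, 0] > 0).astype(int)

def predict(X):
    """Hypothetical 'model' that thresholds feature 0."""
    return (X[:, 0] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

fi = permutation_importance_manual(predict, X, y, accuracy)
print(fi)
```

Because the toy model never reads feature 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling feature 0 drops accuracy to roughly chance level.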

Implementation

import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Permutation importance
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)

# Display results
import pandas as pd
importance_df = pd.DataFrame({
    'Feature': data.feature_names,
    'Importance_Mean': result.importances_mean,
    'Importance_Std': result.importances_std
}).sort_values('Importance_Mean', ascending=False)

print(importance_df.head(10))

Feature Importance Comparison

[Figure: Feature Importance Methods Comparison. Grouped bars of normalized importance (0 to 1) from Permutation Importance, Gini Importance, and SHAP Importance for the features worst radius, worst perimeter, mean concave points, worst concave points, mean radius, mean texture, and mean smoothness.]

Gini Importance (Mean Decrease in Impurity)

For tree-based models, Gini importance measures the total reduction of impurity (Gini or entropy) contributed by each feature across all trees:

$$\text{Gini}_j = \sum_{t=1}^{T} \sum_{n \in \text{splits}(t)} \Delta I(n, t) \cdot \mathbb{1}[x_j \text{ used at } n]$$

where $\Delta I(n, t)$ is the impurity decrease at node $n$ in tree $t$.
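To make the impurity decrease concrete, here is a small worked sketch for a single split using Gini impurity. The helper names and the toy labels are assumptions for illustration; the weighting by the node's share of all samples follows the convention scikit-learn documents for its trees, and a feature's Gini importance is then the sum of these terms over every node that splits on it.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for c in labels:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of one split, weighted by the node's share of all samples."""
    n = len(parent)
    children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return (n / n_total) * (gini(parent) - children)

# Root node with 4 samples per class, split into two mostly-pure children.
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left, right = [0, 0, 0, 1], [0, 1, 1, 1]

delta = impurity_decrease(parent, left, right, n_total=len(parent))
print(gini(parent), gini(left), delta)  # 0.5 0.375 0.125
```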


SHAP (SHapley Additive exPlanations)

Game Theory Foundation

SHAP is grounded in cooperative game theory. Each feature is treated as a "player" in a game, and SHAP values compute the fair marginal contribution of each feature.

The Shapley value for feature $j$ is:

$$\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|! \cdot (|F| - |S| - 1)!}{|F|!} \left[ v(S \cup \{j\}) - v(S) \right]$$

where:

  • $F$ is the set of all features
  • $S$ is a subset of features not including $j$
  • $v(S)$ is the value function (model prediction using features in $S$)
  • $\frac{|S|! \cdot (|F| - |S| - 1)!}{|F|!}$ is the weighting factor
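A two-feature case shows how the weights work out. The numbers below are made up for illustration: take $F = \{1, 2\}$ with $v(\emptyset) = 0.5$, $v(\{1\}) = 0.7$, $v(\{2\}) = 0.6$, and $v(\{1, 2\}) = 0.9$. Each feature has two possible coalitions $S$, and both receive weight $\tfrac{1}{2}$:

```latex
\begin{aligned}
\phi_1 &= \tfrac{0!\,1!}{2!}\bigl[v(\{1\}) - v(\emptyset)\bigr]
        + \tfrac{1!\,0!}{2!}\bigl[v(\{1,2\}) - v(\{2\})\bigr]
        = \tfrac{1}{2}(0.2) + \tfrac{1}{2}(0.3) = 0.25 \\
\phi_2 &= \tfrac{0!\,1!}{2!}\bigl[v(\{2\}) - v(\emptyset)\bigr]
        + \tfrac{1!\,0!}{2!}\bigl[v(\{1,2\}) - v(\{1\})\bigr]
        = \tfrac{1}{2}(0.1) + \tfrac{1}{2}(0.2) = 0.15 \\
\phi_1 + \phi_2 &= 0.40 = v(\{1,2\}) - v(\emptyset)
\end{aligned}
```

Each feature's value is the average of its marginal contribution over both orders in which the features could be added.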

Key SHAP Properties

| Property | Definition | Implication |
|---|---|---|
| Efficiency | $\sum_{j=1}^{p} \phi_j = f(x) - \mathbb{E}[f(X)]$ | SHAP values sum to the deviation from the expected prediction |
| Symmetry | If $j$ and $k$ contribute equally, $\phi_j = \phi_k$ | Fair allocation |
| Null Player | If $j$ doesn't affect output, $\phi_j = 0$ | Irrelevant features get zero |
| Linearity | $\phi_j(f + g) = \phi_j(f) + \phi_j(g)$ | Additive decomposition |
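These properties can be checked numerically on a small game by enumerating every subset. The sketch below is a brute-force illustration (exponential in the number of players, so only usable for a handful of features); the value function `v` and the player names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, v):
    """Exact Shapley values by enumerating every subset S of the other players."""
    n = len(players)
    phi = {}
    for j in players:
        others = [p for p in players if p != j]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                S = frozenset(S)
                total += weight * (v(S | {j}) - v(S))  # marginal contribution of j
        phi[j] = total
    return phi

def v(S):
    """Hypothetical value function: 'a' and 'b' are interchangeable, 'c' is never used."""
    return 10.0 * len(S & {"a", "b"}) + 5.0 * ("a" in S and "b" in S)

players = ["a", "b", "c"]
phi = shapley_values(players, v)
print(phi)  # {'a': 12.5, 'b': 12.5, 'c': 0.0}

# Efficiency: the values sum to v(F) - v(empty set).
print(sum(phi.values()), v(frozenset(players)) - v(frozenset()))  # 25.0 25.0
```

Here `a` and `b` receive equal values (symmetry), `c` receives zero (null player), and the three values sum to the total payoff (efficiency).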

SHAP Additive Explanation

The SHAP explanation model is:

$$g(z) = \phi_0 + \sum_{j=1}^{p} \phi_j z_j$$

where $z_j \in \{0, 1\}$ indicates whether feature $j$ is included, and $\phi_0 = \mathbb{E}[f(X)]$.
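For a linear model the additive explanation can be written in closed form, which makes a convenient sanity check. The sketch below assumes features are treated as independent, in which case $\phi_j = w_j (x_j - \mathbb{E}[X_j])$; the coefficients, background data, and instance are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Background data used to estimate E[X] and E[f(X)]
X = rng.normal(loc=[1.0, -2.0, 0.5], scale=1.0, size=(1000, 3))

w = np.array([2.0, -1.0, 0.0])  # hypothetical coefficients; feature 2 is unused
b = 0.3

def f(X):
    """Linear model f(x) = w . x + b."""
    return X @ w + b

x = np.array([2.0, 0.0, 3.0])       # instance to explain
phi_0 = f(X).mean()                 # base value E[f(X)]
phi = w * (x - X.mean(axis=0))      # per-feature contributions

# g(z) with every z_j = 1 should reproduce the model output f(x).
g_full = phi_0 + phi.sum()
print(phi_0, phi)
print(g_full, f(x[None, :])[0])
```

The unused feature gets a contribution of zero, and the base value plus all contributions recovers the prediction for `x`.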

TreeSHAP: Efficient Exact Computation

For tree ensembles, TreeSHAP computes exact Shapley values in $O(TLD^2)$ time (polynomial, not exponential):

$$\text{TreeSHAP}(f, x)_j = \sum_{\delta \in \text{paths}} p(\delta) \cdot \text{contribution}(j, \delta)$$

where:

  • $T$ = number of trees
  • $L$ = maximum number of leaves in any tree
  • $D$ = maximum depth

TreeSHAP avoids enumerating all $2^p$ feature subsets by exploiting the recursive structure of trees.
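A quick back-of-the-envelope comparison shows why this matters. The ensemble sizes below are hypothetical, and big-O notation hides constant factors, so treat the ratio as an order-of-magnitude illustration only.

```python
p = 30                 # number of features (the breast cancer dataset has 30)
T, L, D = 100, 64, 6   # hypothetical ensemble: trees, max leaves per tree, max depth

subsets = 2 ** p                  # coalitions a brute-force Shapley computation enumerates
treeshap_ops = T * L * D ** 2     # T * L * D^2 term for one explained instance

print(f"2^p     = {subsets:,}")        # 1,073,741,824
print(f"T*L*D^2 = {treeshap_ops:,}")   # 230,400
print(f"ratio is roughly {subsets / treeshap_ops:,.0f}x")
```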

[Figure: SHAP waterfall plot. Starts at the base value E[f(X)] = 0.620 and ends at the final prediction 0.341 (Malignant). Bars show per-feature contributions for worst radius, worst perimeter, mean concave points, worst concave points, mean radius, mean texture, and mean smoothness; negative bars lower the prediction, positive bars raise it.]

SHAP Implementation

import shap
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data and train XGBoost
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# use_label_encoder is deprecated in recent XGBoost releases and is omitted here
model = xgb.XGBClassifier(n_estimators=100, eval_metric='logloss')
model.fit(X_train, y_train)

# Compute SHAP values using TreeSHAP
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Summary plot (global feature importance)
shap.summary_plot(shap_values, X_test, feature_names=data.feature_names)

# Waterfall plot (single prediction)
shap.waterfall_plot(shap.Explanation(
    values=shap_values[0],
    base_values=explainer.expected_value,
    data=X_test[0],
    feature_names=data.feature_names
))

# Dependence plot (feature interaction)
shap.dependence_plot("worst radius", shap_values, X_test, 
                     feature_names=data.feature_names)

SHAP for Deep Learning (DeepSHAP / GradientSHAP)

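For neural networks, approximate Shapley values using:

$$\phi_j(x) \approx \frac{\partial f_{x'}(x)}{\partial x_j} \cdot (x_j - \mathbb{E}[X_j])$$

import shap
import torch
import torchvision

# Load pretrained model (torchvision >= 0.13 takes `weights` instead of `pretrained=True`)
model = torchvision.models.resnet18(weights=torchvision.models.ResNet18_Weights.DEFAULT)
model.eval()

# Background and test batches. Random tensors are placeholders only; in practice
# use real, preprocessed images (a sample of the training data for the background).
background = torch.randn(100, 3, 224, 224)
test_images = torch.randn(2, 3, 224, 224)

# DeepExplainer for deep learning
e = shap.DeepExplainer(model, background)
shap_values = e.shap_values(test_images)

# Visualize. image_plot expects channel-last arrays, so the (N, C, H, W) tensors
# and SHAP values may need transposing; the output layout depends on the shap version.
shap.image_plot(shap_values, -test_images.numpy())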

LIME (Local Interpretable Model-agnostic Explanations)

Core Idea

LIME explains individual predictions by fitting a local linear model around the instance of interest in the perturbed input space.

LIME Algorithm

Objective:

$$\xi(x) = \arg\min_{g \in G} \mathcal{L}(f, g, \pi_x) + \Omega(g)$$

where:

  • $f$ is the black-box model
  • $g \in G$ is an interpretable model (e.g., linear model)
  • $\pi_x(z) = \exp\left(-\frac{D(x, z)^2}{\sigma^2}\right)$ is a kernel measuring proximity to instance $x$
  • $\Omega(g)$ is a complexity penalty (e.g., number of features)

Step-by-step:

  1. Perturb: Generate samples $z_i$ around $x$ by sampling from $N(x, \sigma^2 I)$
  2. Predict: Get $f(z_i)$ for each perturbed sample
  3. Weight: Compute weights $w_i = \pi_x(z_i)$
  4. Fit: Train interpretable model $g$ on weighted dataset $\{(z_i, f(z_i), w_i)\}$
  5. Explain: Return coefficients of $g$ as local explanation

[Figure: LIME local explanation. Shows the decision boundary, the instance x, its kernel neighborhood, background samples, weighted perturbed samples, and the fitted local linear model.]
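The five steps can be sketched from scratch with NumPy. This is a simplified illustration of the idea, not the `lime` package's implementation (which, for tabular data, samples and discretizes using training statistics). The function `lime_explain`, the `black_box` model, and the default `sigma` are assumptions made for the example.

```python
import numpy as np

def lime_explain(f, x, sigma=0.5, n_samples=2000, seed=0):
    """Fit a weighted linear surrogate to f around x; return (intercept, slopes)."""
    rng = np.random.default_rng(seed)
    # 1. Perturb: sample z_i ~ N(x, sigma^2 I)
    Z = x + sigma * rng.normal(size=(n_samples, x.size))
    # 2. Predict with the black-box model
    y = f(Z)
    # 3. Weight by proximity: pi_x(z) = exp(-D(x, z)^2 / sigma^2), D = Euclidean distance
    w = np.exp(-np.sum((Z - x) ** 2, axis=1) / sigma ** 2)
    # 4. Fit: weighted least squares on [1, z - x]
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    # 5. Explain: the slopes are the local feature effects
    return coef[0], coef[1:]

def black_box(Z):
    """Hypothetical nonlinear model: f(z) = z_0^2 + 3 * z_1 (feature 2 is ignored)."""
    return Z[:, 0] ** 2 + 3.0 * Z[:, 1]

x = np.array([1.0, 0.0, 5.0])
intercept, slopes = lime_explain(black_box, x)
print(np.round(slopes, 2))
```

Around `x`, the slopes should come out close to the local gradient of the black box, roughly 2, 3, and 0 for the three features. Changing `sigma` changes the size of the neighborhood and can change the explanation, which is one reason LIME explanations are less stable than SHAP values.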

LIME Implementation

import lime
import lime.lime_tabular
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data and train model
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = GradientBoostingClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

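# Create LIME explainer
# In load_breast_cancer, class 0 is malignant and class 1 is benign,
# so take the class names from the dataset to keep the order correct.
explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=data.feature_names,
    class_names=list(data.target_names),
    mode='classification',
    discretize_continuous=True
)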

# Explain a single prediction
instance = X_test[0]
explanation = explainer.explain_instance(
    instance,
    model.predict_proba,
    num_features=10
)

# Visualize
explanation.show_in_notebook()

# Get feature contributions as a list of (feature, weight) pairs
exp_list = explanation.as_list()
print("LIME Explanation:")
for feature, weight in exp_list:
    print(f"  {feature}: {weight:.4f}")

LIME for Images and Text

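# Image classification with LIME
# Assumes `model`, `preprocess`, and `image` (an H x W x 3 array) are defined elsewhere.
from lime import lime_image
from skimage.segmentation import slic

explainer = lime_image.LimeImageExplainer()

def predict_fn(images):
    """Model prediction function for LIME: batch of images -> class probabilities"""
    return model.predict(preprocess(images))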

explanation = explainer.explain_instance(
    image,
    predict_fn,
    top_labels=5,
    hide_color=0,
    num_samples=1000,
    segmentation_fn=slic  # Superpixel segmentation
)

# Get explanation for top predicted label
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0],
    positive_only=True,
    num_features=5,
    hide_rest=False
)

# Text classification with LIME
# Assumes `classifier` is a fitted text pipeline whose predict_proba accepts raw strings.
from lime.lime_text import LimeTextExplainer

text_explainer = LimeTextExplainer(class_names=['Negative', 'Positive'])
text_exp = text_explainer.explain_instance(
    "This movie was absolutely fantastic!",
    classifier.predict_proba,
    num_features=6
)

Partial Dependence Plots (PDP)

Mathematical Definition

The partial dependence function shows the marginal effect of feature $j$ on the prediction:

$$\hat{f}_j(x_j) = \mathbb{E}_{X_C}\left[f(x_j, X_C)\right] = \int f(x_j, X_C) \, dP(X_C)$$

where $X_C$ denotes all features except $j$.

The empirical estimate is:

$$\hat{f}_j(x_j) = \frac{1}{n} \sum_{i=1}^{n} f(x_j, x_{i,C})$$

Individual Conditional Expectation (ICE)

ICE curves extend PDP by showing the effect for each individual instance:

$$\hat{f}_{\text{ICE}, i}(x_j) = f(x_j, x_{i,C}) \quad \text{for each } i = 1, \ldots, n$$

While PDP shows the average effect, ICE reveals heterogeneity in the effect across instances.
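Both estimates are straightforward to compute by hand. The sketch below builds ICE curves by overwriting one feature with each grid value and averages them to get the PDP. The model `f` is a hypothetical function chosen so that the effect of feature 0 flips sign with feature 1, which is exactly the case where the average hides what individual curves reveal.

```python
import numpy as np

def ice_curves(f, X, j, grid):
    """Return an (n_samples, len(grid)) array; row i is the ICE curve of instance i."""
    curves = np.empty((X.shape[0], len(grid)))
    for g, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value        # force feature j to the grid value for every row
        curves[:, g] = f(X_mod)    # the other features x_{i,C} stay as observed
    return curves

def f(X):
    """Hypothetical model with an interaction: the effect of x_0 flips sign with x_1."""
    return X[:, 0] * np.sign(X[:, 1])

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 2))
grid = np.linspace(-2, 2, 5)

ice = ice_curves(f, X, j=0, grid=grid)
pdp = ice.mean(axis=0)             # the PDP is the pointwise average of the ICE curves

print(np.round(pdp, 2))      # nearly flat: the opposing effects cancel on average
print(np.round(ice[:3], 2))  # individual curves have slope +1 or -1
```

A flat PDP here does not mean the feature is unimportant; the ICE curves show a strong effect whose direction depends on the other feature.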

[Figure: Partial Dependence Plot (PDP) + ICE. x-axis: feature value (e.g., "worst radius"); y-axis: partial dependence f(x). Shows the PDP (average) curve over the individual ICE curves.]

PDP and ICE Implementation

from sklearn.inspection import PartialDependenceDisplay
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load data and train model
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

model = GradientBoostingClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Single feature PDP
fig, ax = plt.subplots(figsize=(10, 6))
PartialDependenceDisplay.from_estimator(
    model, X_test,
    features=[0, 1, 2, 3],  # Feature indices
    feature_names=data.feature_names,
    kind='both',  # PDP + ICE
    subsample=50,  # Number of ICE curves
    grid_resolution=50,
    ax=ax
)
plt.suptitle('Partial Dependence and Individual Conditional Expectation')
plt.tight_layout()
plt.show()

# 2D PDP (feature interaction)
fig, ax = plt.subplots(figsize=(10, 8))
PartialDependenceDisplay.from_estimator(
    model, X_test,
    features=[(0, 1)],  # 2D interaction
    feature_names=data.feature_names,
    grid_resolution=30,
    ax=ax
)
plt.title('2D Partial Dependence Plot (Feature Interaction)')
plt.show()

Practical Comparison: When to Use What

| Method | Scope | Speed | Faithfulness | Best For |
|---|---|---|---|---|
| Permutation Importance | Global | Fast | High | Quick feature ranking |
| SHAP | Global + Local | Slow | Very High | Detailed explanations, theory |
| LIME | Local | Medium | Medium | Quick local explanations |
| PDP | Global | Fast | High | Feature effect visualization |
| ICE | Global + Individual | Medium | High | Heterogeneity detection |

Decision Framework

Need to explain...
├── Single prediction → LIME (fast) or SHAP (accurate)
├── Global feature effects → PDP + ICE
├── Feature importance ranking → Permutation or SHAP
├── Feature interactions → SHAP dependence plot or 2D PDP
└── Regulatory compliance → SHAP (game-theoretic guarantees)

Advanced: SHAP Interaction Values

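SHAP can decompose each prediction into main effects and pairwise interaction effects:

$$f(x) = \mathbb{E}[f(X)] + \sum_{j} \phi_{jj} + \sum_{j \neq k} \phi_{jk}$$

where $\phi_{jj}$ is the main effect of feature $j$ and $\phi_{jk}$ (for $j \neq k$) captures the interaction between features $j$ and $k$. The interaction is split equally between $\phi_{jk}$ and $\phi_{kj}$, so the ordinary SHAP value is recovered as $\phi_j = \sum_{k} \phi_{jk}$.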
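The decomposition can be reproduced on a toy game with the Shapley interaction index, computed here by brute force. In this sketch each pair's total interaction is split equally between the (j, k) and (k, j) entries, and the diagonal holds what remains of each Shapley value after its interactions are removed. The value function `v` and the player names are hypothetical.

```python
from itertools import combinations
from math import factorial

def subsets(items):
    """Yield every subset of items as a frozenset."""
    for size in range(len(items) + 1):
        for S in combinations(items, size):
            yield frozenset(S)

def shapley(players, v):
    n = len(players)
    phi = {}
    for i in players:
        rest = [p for p in players if p != i]
        phi[i] = sum(
            factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            * (v(S | {i}) - v(S))
            for S in subsets(rest)
        )
    return phi

def interaction_matrix(players, v):
    """Shapley interaction index with each pair's effect split between (i, j) and (j, i)."""
    n = len(players)
    phi = shapley(players, v)
    inter = {}
    for i in players:
        for j in players:
            if i == j:
                continue
            rest = [p for p in players if p not in (i, j)]
            inter[i, j] = sum(
                factorial(len(S)) * factorial(n - len(S) - 2) / (2 * factorial(n - 1))
                * (v(S | {i, j}) - v(S | {i}) - v(S | {j}) + v(S))
                for S in subsets(rest)
            )
    for i in players:  # main effect: Shapley value minus its interaction shares
        inter[i, i] = phi[i] - sum(inter[i, j] for j in players if j != i)
    return phi, inter

def v(S):
    """Hypothetical game: additive effects for 'a' and 'c', plus an a-b interaction worth 4."""
    return 3.0 * ("a" in S) + 1.0 * ("c" in S) + 4.0 * ("a" in S and "b" in S)

phi, inter = interaction_matrix(["a", "b", "c"], v)
print(phi)                                                # {'a': 5.0, 'b': 2.0, 'c': 1.0}
print(inter["a", "b"], inter["a", "c"], inter["a", "a"])  # 2.0 0.0 3.0
```

The a-b interaction of 4 appears as 2 in each of the two off-diagonal entries, the purely additive pair a-c gets zero, and all entries together sum to the total payoff of 8.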

# SHAP interaction values
explainer = shap.TreeExplainer(model)
shap_interaction = explainer.shap_interaction_values(X_test)

# Visualization
shap.summary_plot(shap_interaction, X_test, feature_names=data.feature_names)

Implementation: Complete Pipeline

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import shap
import lime
import lime.lime_tabular
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score

# Load data
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42
)

# Train model
model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)

print(f"Test Accuracy: {accuracy_score(y_test, model.predict(X_test)):.4f}")

# 1. Permutation Feature Importance
perm_result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=42)
perm_df = pd.DataFrame({
    'Feature': data.feature_names,
    'Importance': perm_result.importances_mean
}).sort_values('Importance', ascending=False)
print("\n=== Permutation Feature Importance (Top 10) ===")
print(perm_df.head(10).to_string(index=False))

# 2. SHAP Analysis
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Older shap releases return a list with one array per class; newer ones return a
# single (n_samples, n_features, n_classes) array. Select class 1 (benign) either way.
shap_class1 = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]

# Summary plot
shap.summary_plot(shap_class1, X_test, feature_names=data.feature_names, show=False)
plt.title("SHAP Summary Plot (Benign Class)")
plt.tight_layout()
plt.savefig("shap_summary.png", dpi=150)
plt.show()

# Single prediction explanation
idx = 0
shap.waterfall_plot(shap.Explanation(
    values=shap_class1[idx],
    base_values=explainer.expected_value[1],
    data=X_test[idx],
    feature_names=data.feature_names
))

# 3. LIME Explanation
lime_explainer = lime.lime_tabular.LimeTabularExplainer(
    training_data=X_train,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),  # class 0 = malignant, class 1 = benign
    mode='classification'
)
lime_exp = lime_explainer.explain_instance(
    X_test[idx],
    model.predict_proba,
    num_features=10
)

# 4. Partial Dependence Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 5))
PartialDependenceDisplay.from_estimator(
    model, X_test,
    features=['worst radius', 'worst concave points'],
    feature_names=data.feature_names,
    kind='both',
    subsample=50,
    ax=axes
)
plt.suptitle('Partial Dependence Plots')
plt.tight_layout()
plt.savefig("pdp_plots.png", dpi=150)
plt.show()

print("\n=== Analysis Complete ===")

Evaluation: Knowledge Check

Q1. What is the fundamental difference between permutation importance and SHAP-based importance?

Q2. Why does SHAP use Shapley values rather than simple marginal contributions?

Q3. When would LIME provide a better explanation than SHAP?

Q4. What does the spread of the ICE curves around the PDP curve reveal?

Q5. Prove that Shapley values satisfy the efficiency property: $\sum_j \phi_j = v(F) - v(\emptyset)$.


Key Takeaways

  1. SHAP provides theoretically grounded explanations with consistency guarantees
  2. LIME is faster for single-instance explanations but lacks global consistency
  3. PDP/ICE reveals how features affect predictions on average and individually
  4. Permutation importance is the simplest model-agnostic global method
  5. TreeSHAP makes exact Shapley computation feasible for tree ensembles
  6. Always validate explanations against domain knowledge; no method is perfect

The field of Explainable AI (XAI) is evolving rapidly. Recent advances include Counterfactual Explanations, Anchors, Concept-based Explanations, and Influence Functions. The techniques covered here form the foundation for understanding any new method that emerges.


Next: Hyperparameter Tuning. Learn systematic approaches to optimizing model performance through Bayesian optimization, grid search, and random search.
