Bootstrapping

Master bootstrap methods for confidence intervals, hypothesis testing, and uncertainty estimation.

ℹ️ Why It Matters

Bootstrapping makes minimal assumptions and provides robust uncertainty estimates for almost any statistic. When you need a confidence interval for a median, a ratio, or a model parameter with no closed-form variance, the bootstrap is a general-purpose tool. It replaces theoretical distributional assumptions with computational resampling, which makes it one of the most versatile methods in modern statistics.


Overview

The bootstrap principle treats the empirical distribution $\hat{F}$ as an approximation of the true population distribution $F$. By resampling from $\hat{F}$ (with replacement), we approximate the sampling distribution of any statistic. The algorithm: (1) resample the data with replacement $B$ times (typically 10,000), (2) compute the statistic on each resample, (3) take percentiles of the bootstrap distribution as the confidence interval. The percentile method is simple and intuitive. The BCa (bias-corrected and accelerated) method improves accuracy by adjusting for bias and skewness. Permutation tests complement bootstrapping by building exact null distributions for hypothesis testing.


Key Concepts

Bootstrap Principle

$\hat{F} \approx F$

Here,

  • $\hat{F}$ = Empirical distribution from observed data
  • $F$ = True population distribution

Bootstrap Resampling

$x^{*}_{b} = \text{Resample}(x_1, \ldots, x_n) \text{ with replacement}; \quad \hat{\theta}^{*}_b = T(x^{*}_{b})$

Here,

  • $B$ = Number of bootstrap resamples (typically 10,000)
  • $x^{*}_{b}$ = $b$-th bootstrap sample (size $n$, drawn with replacement)
  • $\hat{\theta}^{*}_b$ = Bootstrap replicate of the statistic
  • $T(\cdot)$ = The statistic being estimated
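
The resampling step above can be sketched as follows. This is a minimal illustration assuming NumPy; the helper name `bootstrap_replicates`, the seed, and the sample values are hypothetical choices made for the example. It draws all $B$ resamples at once by sampling indices, then applies $T$ to each row.

```python
import numpy as np

def bootstrap_replicates(x, stat_fn, n_boot=10_000, seed=0):
    """Return n_boot bootstrap replicates of stat_fn (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x)
    n = len(x)
    # Each row holds n indices drawn with replacement from 0..n-1
    idx = rng.integers(0, n, size=(n_boot, n))
    resamples = x[idx]  # shape (n_boot, n): one bootstrap sample per row
    # Apply the statistic T to every bootstrap sample
    return np.apply_along_axis(stat_fn, 1, resamples)

# Illustrative sample values
x = np.array([2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7])
theta_star = bootstrap_replicates(x, np.median, n_boot=2000)
print(theta_star.shape)
```

Sampling indices rather than values makes it easy to resample rows of paired or multivariate data with the same code.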

Percentile Bootstrap CI

$\text{CI} = [\hat{\theta}^{*}_{(\alpha/2)},\; \hat{\theta}^{*}_{(1-\alpha/2)}]$

Here,

  • $\hat{\theta}^{*}_{(\alpha/2)}$ = $\alpha/2$ quantile of bootstrap statistics

Bootstrap Standard Error

$SE_{boot} = \text{std}(\hat{\theta}^{*}_1, \hat{\theta}^{*}_2, \ldots, \hat{\theta}^{*}_B)$

Here,

  • $\hat{\theta}^{*}_b$ = $b$-th bootstrap replicate
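
As a sanity check on the formula above, the sketch below compares the bootstrap standard error of the mean with the textbook formula $s/\sqrt{n}$, where the two should roughly agree. The exponential sample, the seed, and the variable names are assumptions made for this illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Hypothetical sample: 50 draws from a skewed (exponential) distribution
x = rng.exponential(scale=2.0, size=50)

n_boot = 5000
# Bootstrap replicates of the sample mean
boot_means = np.array([
    rng.choice(x, size=len(x), replace=True).mean()
    for _ in range(n_boot)
])

se_boot = boot_means.std(ddof=1)              # bootstrap standard error
se_formula = x.std(ddof=1) / np.sqrt(len(x))  # textbook SE of the mean
print(f"bootstrap SE: {se_boot:.3f}, formula SE: {se_formula:.3f}")
```

For the mean the bootstrap adds little, but the same three lines work unchanged for statistics that have no simple SE formula.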

Permutation Test P-Value

$p = \frac{\text{count of permuted statistics} \geq \text{observed}}{B}$

Here,

  • $B$ = Total number of permutations
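
The p-value formula above can be sketched as a one-sided test for a difference in means. This is an illustration under stated assumptions: the function name `permutation_test`, the seed, and the two groups of measurements are hypothetical.

```python
import numpy as np

def permutation_test(a, b, n_perm=10_000, seed=0):
    """One-sided Monte Carlo permutation test for mean(a) - mean(b)."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a), np.asarray(b)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)  # equivalent to shuffling group labels
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if diff >= observed:
            count += 1
    # Fraction of permuted statistics at least as large as the observed one
    return observed, count / n_perm

# Hypothetical measurements for two groups
group_a = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8]
group_b = [10.2, 11.1, 9.8, 12.4, 10.9, 11.5]
observed, p_value = permutation_test(group_a, group_b)
print(observed, p_value)
```

Some references use $(\text{count} + 1)/(B + 1)$ instead, which keeps a Monte Carlo p-value from being exactly zero; the version here follows the formula as stated in this lesson.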

Bootstrap CI Methods

| Method | Description | Best When | Accuracy |
|---|---|---|---|
| Percentile | Direct quantiles of bootstrap distribution | Statistic is symmetric | Good |
| BCa | Bias-corrected and accelerated | Statistic is biased or skewed | Better |
| Basic (Pivotal) | $[2\hat{\theta} - \hat{\theta}^{*}_{(1-\alpha/2)},\; 2\hat{\theta} - \hat{\theta}^{*}_{(\alpha/2)}]$ | Symmetric distributions | Good |
| Studentized | Uses bootstrap estimate of SE | Highest accuracy is needed | Best (expensive) |
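
To make the difference between the first and third rows concrete, the sketch below computes both the percentile and the basic (pivotal) interval from the same bootstrap replicates. The lognormal sample, the seed, and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical right-skewed sample
x = rng.lognormal(mean=0.0, sigma=1.0, size=40)
theta_hat = np.median(x)  # point estimate on the original data

# Bootstrap replicates of the median
boot = np.array([
    np.median(rng.choice(x, size=len(x), replace=True))
    for _ in range(5000)
])
alpha = 0.05
q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])

# Percentile: use the bootstrap quantiles directly
percentile_ci = (q_lo, q_hi)
# Basic (pivotal): reflect the bootstrap quantiles around the point estimate
basic_ci = (2 * theta_hat - q_hi, 2 * theta_hat - q_lo)
print("percentile:", percentile_ci)
print("basic:     ", basic_ci)
```

Both intervals have the same width; they differ only in which side of the estimate gets the longer tail, so they coincide when the bootstrap distribution is symmetric around $\hat{\theta}$.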

Quick Example

📝Bootstrap CI for Median

Given data $x = \{2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7\}$, original median = 3.0.

Step 1: Resample 10,000 times, each time computing the median of 10 values drawn with replacement.

Step 2: Sort bootstrap medians and extract 2.5th and 97.5th percentiles.

Step 3: 95% Bootstrap CI ≈ $[2.5, 3.8]$ (the exact bounds vary slightly with the random draws).

The CI is asymmetric around 3.0, reflecting the skewness of the sampling distribution of the median. A t-interval would force symmetry and be misleading here.
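
The three steps can be reproduced with a few lines of NumPy. This is a sketch using the data from the example; the seed is an arbitrary choice, so the printed bounds may differ slightly from the ones quoted above.

```python
import numpy as np

x = np.array([2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7])
rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only

# Step 1: 10,000 resamples of size 10, computing the median of each
boot_medians = np.array([
    np.median(rng.choice(x, size=len(x), replace=True))
    for _ in range(10_000)
])
# Steps 2 and 3: the 2.5th and 97.5th percentiles give the 95% CI
lower, upper = np.percentile(boot_medians, [2.5, 97.5])
print(f"median = {np.median(x):.1f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```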

📝Permutation Test

Observed difference in means between two groups: 5.3. After 10,000 label shuffles, 237 permutations produce a difference ≥ 5.3.

$p = 237/10000 = 0.0237$. Reject $H_0$ at $\alpha = 0.05$. The permutation test makes no distributional assumptions beyond exchangeability of the labels under $H_0$. With 10,000 random shuffles, the p-value is a Monte Carlo approximation of the exact permutation p-value for this specific dataset.

When to Use Bootstrap vs. Parametric Methods

| Scenario | Recommended Approach |
|---|---|
| Normal data, known $\sigma$ | Z-interval (parametric) |
| Normal data, unknown $\sigma$ | t-interval (parametric) |
| Skewed data, $n > 30$ | Bootstrap CI |
| Any statistic (median, ratio) | Bootstrap CI |
| Small sample ($n < 20$) | BCa bootstrap or double bootstrap |
| Hypothesis test, any distribution | Permutation test |
| Model comparison | Paired bootstrap or permutation |

Python Implementation Tips

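```python
import numpy as np

def bootstrap_ci(data, stat_fn=np.mean, n_boot=10000, ci=0.95, seed=None):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = np.random.default_rng(seed)  # pass a seed for reproducible results
    data = np.asarray(data)
    n = len(data)
    # Each replicate: draw n values with replacement, then apply the statistic
    boot_stats = np.array([
        stat_fn(rng.choice(data, size=n, replace=True))
        for _ in range(n_boot)
    ])
    # Percentile method: cut (1 - ci) / 2 from each tail
    lower = np.percentile(boot_stats, (1 - ci) / 2 * 100)
    upper = np.percentile(boot_stats, (1 + ci) / 2 * 100)
    return lower, upper, boot_stats
```

Key points: (1) Always set a random seed for reproducibility (here via the seed argument passed to np.random.default_rng). (2) Use rng.choice(data, size=n, replace=True) for with-replacement resampling. (3) Use at least $B = 10{,}000$ resamples for stable bounds.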


Key Takeaways

📋Summary: Bootstrapping

  • Core Principle: The empirical distribution $\hat{F}$ approximates $F$. Resample from $\hat{F}$ to approximate the sampling distribution.
  • Algorithm: Resample with replacement $B$ times (≥ 10,000), compute the statistic, take percentiles for the CI.
  • Percentile vs BCa: Percentile is simple and intuitive; BCa adjusts for bias and skewness for better coverage.
  • Permutation Tests: Shuffle labels to build exact null distributions for hypothesis testing. Complements bootstrapping.
  • Few Assumptions: No normality or closed-form variance required. Works for medians, ratios, AUC, and most other statistics.
  • Small Samples: For $n < 20$, consider BCa or double bootstrap for better coverage properties.
  • Practical Tip: Use at least $B = 10{,}000$ resamples for stable CI bounds. More is better but with diminishing returns.

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Bootstrap Methods

  • Bootstrap Methods — Complete theory, algorithms, when each method is appropriate, and Python implementation
