Bootstrapping

ℹ️ Why It Matters

Bootstrapping makes few distributional assumptions, so it yields usable uncertainty estimates for a wide range of statistics. When you need a confidence interval for a median, a ratio, or a model parameter with no closed-form variance, the bootstrap is often the most practical tool. It replaces theoretical distributional assumptions with computational resampling, which makes it one of the most versatile methods in modern statistics.

Overview

The bootstrap principle treats the empirical distribution $\hat{F}$ as an approximation of the true population distribution $F$. By resampling from $\hat{F}$ (that is, drawing from the observed data with replacement), we approximate the sampling distribution of a statistic. The algorithm: (1) resample the data with replacement $B$ times (typically 10,000), (2) compute the statistic on each resample, (3) take percentiles of the bootstrap distribution as the confidence interval. This percentile method is simple and intuitive. The BCa (bias-corrected and accelerated) method improves accuracy by adjusting for bias and skewness. Permutation tests complement bootstrapping by building a null distribution for hypothesis testing from shuffled labels; that distribution is exact when every permutation is enumerated and a Monte Carlo approximation when labels are shuffled at random.

Key Concepts

Bootstrap Principle

$$\hat{F} \approx F$$

Here,

$\hat{F}$ = Empirical distribution from observed data
$F$ = True population distribution

Bootstrap Resampling

$$x^{*}_{b} = \text{Resample}(x_1, \ldots, x_n) \text{ with replacement}; \quad \hat{\theta}^{*}_b = T(x^{*}_{b}), \quad b = 1, \ldots, B$$

Here,

$B$ = Number of bootstrap resamples (typically 10,000)
$x^{*}_{b}$ = $b$-th bootstrap sample (size $n$, drawn with replacement)
$\hat{\theta}^{*}_b$ = Bootstrap replicate of the statistic
$T(\cdot)$ = The statistic being estimated
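To make the notation concrete, here is a small sketch that builds the resamples $x^{*}_{b}$ and the replicates $\hat{\theta}^{*}_b$ with NumPy. The stand-in data, the seed, the choice of the mean for $T$, and the small $B = 200$ are assumptions made for illustration only. It also checks a well-known consequence of drawing with replacement: each resample contains only about $1 - 1/e \approx 63\%$ of the distinct original points.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
x = np.arange(1000)              # stand-in data: 1,000 distinct values
B = 200                          # kept small so the sketch runs quickly

# x_star[b] is the b-th bootstrap sample: n draws from x with replacement.
x_star = rng.choice(x, size=(B, len(x)), replace=True)

# theta_star[b] = T(x_star[b]); here T is the mean.
theta_star = x_star.mean(axis=1)

# Drawing with replacement repeats some points and omits others.
unique_frac = np.mean([len(np.unique(row)) / len(x) for row in x_star])
expected = 1 - (1 - 1 / len(x)) ** len(x)   # approaches 1 - 1/e as n grows
print(f"distinct points per resample: {unique_frac:.3f} (theory: {expected:.3f})")
```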

Percentile Bootstrap CI

$$\text{CI} = [\hat{\theta}^{*}_{(\alpha/2)},\; \hat{\theta}^{*}_{(1-\alpha/2)}]$$

Here,

$\hat{\theta}^{*}_{(\alpha/2)}$, $\hat{\theta}^{*}_{(1-\alpha/2)}$ = $\alpha/2$ and $1-\alpha/2$ quantiles of the bootstrap statistics

Bootstrap Standard Error

$$\text{SE}_{\text{boot}} = \text{std}(\hat{\theta}^{*}_1, \hat{\theta}^{*}_2, \ldots, \hat{\theta}^{*}_B)$$

Here,

$\hat{\theta}^{*}_b$ = $b$-th bootstrap replicate
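As a rough check on the formula above, the sketch below estimates the bootstrap standard error of the sample mean and compares it with the plug-in value $\hat{\sigma}/\sqrt{n}$, which the bootstrap SE of the mean should approach as $B$ grows. The function name `bootstrap_se`, the seed, and the sample values are assumptions chosen for illustration.

```python
import numpy as np

def bootstrap_se(data, stat_fn=np.mean, n_boot=10000, seed=None):
    """Bootstrap standard error: the standard deviation of the replicates."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    rng = np.random.default_rng(seed)
    # Each row of idx selects one resample of size n, drawn with replacement.
    idx = rng.integers(0, n, size=(n_boot, n))
    replicates = np.array([stat_fn(data[row]) for row in idx])
    return replicates.std(ddof=1)

x = np.array([2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7])
se_boot = bootstrap_se(x, seed=0)

# Plug-in SE of the mean: population-style standard deviation over sqrt(n).
se_plugin = x.std(ddof=0) / np.sqrt(len(x))
print(f"bootstrap SE = {se_boot:.3f}, plug-in SE = {se_plugin:.3f}")
```

For statistics without a closed-form SE (a median or a ratio, for example), there is no plug-in value to compare against, which is where the bootstrap estimate earns its keep.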

Permutation Test P-Value

$$p = \frac{\text{count of permuted statistics} \geq \text{observed statistic}}{B}$$

Here,

$B$ = Total number of permutations
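A minimal sketch of this p-value for a one-sided difference in means follows. The function name `permutation_test`, the seed, and the two small groups of measurements are hypothetical and serve only to exercise the formula; random shuffling gives a Monte Carlo estimate of the permutation p-value, not the fully enumerated one.

```python
import numpy as np

def permutation_test(a, b, n_perm=10000, seed=None):
    """One-sided Monte Carlo permutation test for mean(a) - mean(b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)   # reassign group labels at random
        diff = shuffled[:len(a)].mean() - shuffled[len(a):].mean()
        if diff >= observed:
            count += 1
    # p = (count of permuted statistics >= observed) / B
    return observed, count / n_perm

# Hypothetical measurements for two groups.
group_a = [12.1, 14.3, 13.8, 15.2, 14.9, 13.5]
group_b = [11.0, 12.4, 11.9, 12.8, 10.7, 12.2]
observed, p_value = permutation_test(group_a, group_b, seed=0)
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

A common variant reports $(\text{count} + 1)/(B + 1)$ instead, which prevents a reported p-value of exactly zero when no shuffle reaches the observed statistic.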

Bootstrap CI Methods

Method	Description	Best When	Accuracy
Percentile	Direct quantiles of bootstrap distribution	Statistic is symmetric	Good
BCa	Bias-corrected and accelerated	Statistic is biased or skewed	Better
Basic (Pivotal)	$[2\hat{\theta} - \hat{\theta}^{*}_{(1-\alpha/2)},\; 2\hat{\theta} - \hat{\theta}^{*}_{(\alpha/2)}]$	Symmetric distributions	Good
Studentized	Rescales each replicate by its own bootstrap estimate of SE	A per-resample SE is available and accuracy matters most	Best (expensive)
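The BCa row can be sketched as follows. This is a simplified illustration of the standard construction (a bias correction $z_0$ taken from the share of replicates below the original estimate, and an acceleration $a$ taken from jackknife leave-one-out estimates), not a production implementation. The function name `bca_ci`, the seed, and the right-skewed sample are made up for the example, and the code assumes a smooth statistic such as the mean so that the jackknife values are not all identical.

```python
from statistics import NormalDist

import numpy as np

def bca_ci(data, stat_fn=np.mean, n_boot=10000, ci=0.95, seed=None):
    """Bias-corrected and accelerated (BCa) bootstrap interval (sketch)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    rng = np.random.default_rng(seed)
    theta_hat = stat_fn(data)

    # Bootstrap replicates of the statistic.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot = np.array([stat_fn(data[row]) for row in idx])

    norm = NormalDist()
    # Bias correction: how far the replicates sit from the original estimate.
    prop = float(np.clip(np.mean(boot < theta_hat),
                         1 / (n_boot + 1), n_boot / (n_boot + 1)))
    z0 = norm.inv_cdf(prop)

    # Acceleration: skewness of the jackknife (leave-one-out) estimates.
    jack = np.array([stat_fn(np.delete(data, i)) for i in range(n)])
    d = jack.mean() - jack
    denom = 6.0 * np.sum(d ** 2) ** 1.5
    a = float(np.sum(d ** 3) / denom) if denom > 0 else 0.0

    def adjusted(p):
        # Map a nominal quantile level to its BCa-adjusted level.
        z = norm.inv_cdf(p)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    alpha = 1 - ci
    lower, upper = np.quantile(boot, [adjusted(alpha / 2), adjusted(1 - alpha / 2)])
    return lower, upper

# Made-up right-skewed sample.
skewed = [0.2, 0.5, 0.7, 1.1, 1.3, 1.9, 2.4, 3.8, 5.6, 9.7]
lower, upper = bca_ci(skewed, seed=0)
print(f"mean = {np.mean(skewed):.2f}, 95% BCa CI = [{lower:.2f}, {upper:.2f}]")
```

When $z_0 = 0$ and $a = 0$ the adjusted levels reduce to $\alpha/2$ and $1 - \alpha/2$, so the percentile interval is the special case with no bias or skewness correction.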

Quick Example

📝Bootstrap CI for Median

Given data $x = \{2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7\}$, original median = 3.0.

Step 1: Resample 10,000 times, each time computing the median of 10 values drawn with replacement.

Step 2: Sort bootstrap medians and extract 2.5th and 97.5th percentiles.

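Step 3: 95% Bootstrap CI ≈ $[2.5, 3.8]$ (the exact bounds vary slightly with the random seed).

The CI is asymmetric around 3.0, reflecting the skewness of the bootstrap distribution of the median. A symmetric interval of the form estimate ± margin, such as a normal-theory interval, cannot capture this asymmetry.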
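The three steps can be reproduced with a few lines of NumPy. This is a sketch under an arbitrary seed chosen here; the bounds it prints depend on that seed and on the percentile interpolation rule, so they need not match the rounded values quoted above exactly.

```python
import numpy as np

x = np.array([2.1, 3.4, 1.8, 4.2, 2.9, 3.1, 2.5, 3.8, 4.0, 2.7])
rng = np.random.default_rng(42)   # arbitrary seed, for reproducibility

# Step 1: 10,000 resamples of size 10, each reduced to its median.
idx = rng.integers(0, len(x), size=(10_000, len(x)))
boot_medians = np.median(x[idx], axis=1)

# Step 2: 2.5th and 97.5th percentiles of the bootstrap medians.
lower, upper = np.percentile(boot_medians, [2.5, 97.5])

# Step 3: report the interval next to the original median.
print(f"median = {np.median(x):.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```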

📝Permutation Test

Observed difference in means between two groups: 5.3. After 10,000 label shuffles, 237 permutations produce a difference ≥ 5.3.

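$p = 237/10000 = 0.0237$. Reject $H_0$ at $\alpha = 0.05$ (one-sided, since only differences ≥ 5.3 are counted). The permutation test assumes only that group labels are exchangeable under $H_0$, not any particular distribution. With 10,000 random shuffles, the p-value is a Monte Carlo estimate of the exact permutation p-value for this specific dataset.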

When to Use Bootstrap vs. Parametric Methods

Scenario	Recommended Approach
Normal data, known σ	Z-interval (parametric)
Normal data, unknown σ	t-interval (parametric)
Skewed data, $n > 30$	Bootstrap CI
Any statistic (median, ratio)	Bootstrap CI
Small sample ($n < 20$)	BCa bootstrap or double bootstrap
Hypothesis test, any distribution	Permutation test
Model comparison	Paired bootstrap or permutation

Python Implementation Tips

import numpy as np

def bootstrap_ci(data, stat_fn=np.mean, n_boot=10000, ci=0.95, seed=None):
    """Percentile bootstrap confidence interval for any statistic."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)  # seeded generator for reproducibility
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement
    boot_stats = np.array([
        stat_fn(rng.choice(data, size=n, replace=True))
        for _ in range(n_boot)
    ])
    alpha = 1 - ci
    lower, upper = np.percentile(boot_stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper, boot_stats

Key points: (1) Always pass a fixed seed for reproducibility. (2) Use rng.choice(data, size=n, replace=True) for with-replacement resampling. (3) Use at least $B = 10{,}000$ resamples for stable bounds.

Key Takeaways

📋Summary: Bootstrapping

Core Principle: The empirical distribution $\hat{F}$ approximates $F$. Resample from $\hat{F}$ to approximate the sampling distribution.
Algorithm: Resample with replacement $B$ times (≥10,000), compute the statistic on each resample, take percentiles for the CI.
Percentile vs BCa: Percentile is simple and intuitive; BCa adjusts for bias and skewness for better coverage.
Permutation Tests: Shuffle labels to build null distributions for hypothesis testing (exact under full enumeration, approximate under random shuffling). Complements bootstrapping.
Few Assumptions: No normality and no closed-form variance required, only a sample that represents the population. Works for medians, ratios, AUC, and most other statistics.
Small Samples: For $n < 20$, consider BCa or double bootstrap for better coverage properties.
Practical Tip: Use at least $B = 10{,}000$ resamples for stable CI bounds. More is better but with diminishing returns.

Deep Dive

For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:

Bootstrap Methods

Bootstrap Methods — Complete theory, algorithms, when each method is appropriate, and Python implementation

Bootstrap Confidence Intervals

Bootstrap Confidence Intervals — Percentile, BCa, pivotal, and studentized methods with detailed examples
