Bootstrapping
Why It Matters
Bootstrapping makes minimal assumptions and provides robust uncertainty estimates for almost any statistic. When you need a confidence interval for a median, a ratio, or a model parameter with no closed-form variance, the bootstrap is a general-purpose tool. It replaces theoretical distributional assumptions with computational resampling, which makes it one of the most versatile methods in modern statistics.
Overview
The bootstrap principle treats the empirical distribution $\hat{F}$ as an approximation of the true population distribution $F$. By resampling from $\hat{F}$ (with replacement), we approximate the sampling distribution of any statistic. The algorithm: (1) resample the data with replacement $B$ times (typically 10,000), (2) compute the statistic on each resample, (3) take percentiles of the bootstrap distribution as the confidence interval. The percentile method is simple and intuitive. The BCa (bias-corrected and accelerated) method improves accuracy by adjusting for bias and skewness. Permutation tests complement bootstrapping by building exact null distributions for hypothesis testing.
Key Concepts
Bootstrap Principle
$$\hat{F} \approx F$$
Here,
- $\hat{F}$ = Empirical distribution from observed data
- $F$ = True population distribution
Bootstrap Resampling
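$$\hat{\theta}^{*}_{b} = T(X^{*}_{b}), \qquad b = 1, \dots, B$$
Here,
- $B$ = Number of bootstrap resamples (typically 10,000)
- $X^{*}_{b}$ = b-th bootstrap sample (size n, drawn with replacement)
- $\hat{\theta}^{*}_{b}$ = Bootstrap replicate of the statistic
- $T$ = The statistic being estimated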
Percentile Bootstrap CI
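$$\text{CI}_{1-\alpha} = \left[\hat{\theta}^{*}_{(\alpha/2)},\ \hat{\theta}^{*}_{(1-\alpha/2)}\right]$$
Here,
- $\hat{\theta}^{*}_{(\alpha/2)}$ = $\alpha/2$ quantile of bootstrap statistics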
Bootstrap Standard Error
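$$\widehat{\text{SE}}_{\text{boot}} = \sqrt{\frac{1}{B-1}\sum_{b=1}^{B}\left(\hat{\theta}^{*}_{b} - \bar{\theta}^{*}\right)^{2}}, \qquad \bar{\theta}^{*} = \frac{1}{B}\sum_{b=1}^{B}\hat{\theta}^{*}_{b}$$
Here,
- $\hat{\theta}^{*}_{b}$ = b-th bootstrap replicate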
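The bootstrap standard error is the sample standard deviation of the bootstrap replicates. The following is a minimal sketch using only the Python standard library; the function name `bootstrap_se`, its parameters, and the sample values are assumptions made for illustration. For the mean, the result can be checked against the plug-in value $\sqrt{(n-1)/n}\, s / \sqrt{n}$.

```python
import math
import random
import statistics


def bootstrap_se(data, stat_fn=statistics.mean, n_boot=10_000, seed=0):
    """Bootstrap standard error: sample std. dev. of the bootstrap replicates."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n = len(data)
    # Each replicate is the statistic on a size-n resample drawn with replacement.
    replicates = [stat_fn(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)  # divides by B - 1


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
se_boot = bootstrap_se(sample)
# For the mean, the plug-in value is the population-style std. dev. over sqrt(n).
se_plugin = statistics.pstdev(sample) / math.sqrt(len(sample))
print(f"bootstrap SE = {se_boot:.3f}, plug-in SE = {se_plugin:.3f}")
```

The two values should agree up to Monte Carlo error, which shrinks as the number of resamples grows.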
Permutation Test P-Value
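$$p = \frac{\#\{\, i : T^{*}_{i} \ge T_{\text{obs}} \,\}}{N}$$
Here,
- $N$ = Total number of permutations
- $T_{\text{obs}}$ = Test statistic on the observed labels
- $T^{*}_{i}$ = Test statistic on the i-th permutation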
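A permutation p-value is the fraction of label shuffles whose test statistic is at least as extreme as the observed one. The sketch below is one possible one-sided version for a difference in means, written with the standard library only; the function name `permutation_p_value` and the two groups are hypothetical.

```python
import random
import statistics


def permutation_p_value(group_a, group_b, n_perm=10_000, seed=0):
    """One-sided Monte Carlo permutation p-value for mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # shuffling the pooled values reassigns group labels
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if diff >= observed:
            count += 1
    return observed, count / n_perm


# Hypothetical measurements, for illustration only.
treated = [27.1, 30.4, 25.9, 33.2, 29.8, 31.5, 28.7, 32.0]
control = [24.3, 26.8, 22.5, 27.9, 25.1, 23.7, 26.0, 24.9]
observed, p_value = permutation_p_value(treated, control)
print(f"observed difference = {observed:.2f}, p = {p_value:.4f}")
```

Because the shuffles are random rather than exhaustive, this is a Monte Carlo estimate of the exact permutation p-value. Some references add 1 to both the count and the number of permutations so that the estimate can never be exactly zero.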
Bootstrap CI Methods
| Method | Description | Best When | Accuracy |
|---|---|---|---|
| Percentile | Direct quantiles of bootstrap distribution | Statistic is symmetric | Good |
| BCa | Bias-corrected and accelerated | Statistic is biased or skewed | Better |
| Basic (Pivotal) | Reflects bootstrap quantiles around the original estimate | Symmetric distributions | Good |
| Studentized | Uses bootstrap estimate of SE | Highest accuracy is needed | Best (expensive) |
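To make the difference between the percentile and BCa rows concrete, here is a hedged sketch of the standard BCa construction: a bias correction $z_0$ taken from the share of replicates below the original estimate, and an acceleration $a$ taken from jackknife (leave-one-out) estimates. The names `quantile` and `bca_ci` and the sample values are assumptions for illustration, and the code expects `data` to be a list.

```python
import random
import statistics
from statistics import NormalDist


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list, 0 <= q <= 1."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def bca_ci(data, stat_fn, n_boot=10_000, ci=0.95, seed=0):
    """Return (percentile_interval, bca_interval) for stat_fn on a list."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat_fn(data)
    reps = sorted(stat_fn(rng.choices(data, k=n)) for _ in range(n_boot))

    norm = NormalDist()
    alpha = 1 - ci
    z_lo, z_hi = norm.inv_cdf(alpha / 2), norm.inv_cdf(1 - alpha / 2)

    # Bias correction z0 from the share of replicates below theta_hat.
    # Assumes that share is strictly between 0 and 1.
    prop_below = sum(r < theta_hat for r in reps) / n_boot
    z0 = norm.inv_cdf(prop_below)

    # Acceleration a from the skewness of the leave-one-out estimates.
    jack = [stat_fn(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = statistics.fmean(jack)
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    a = num / den  # assumes the jackknife values are not all equal

    # Adjusted quantile levels replace alpha/2 and 1 - alpha/2.
    q_lo = norm.cdf(z0 + (z0 + z_lo) / (1 - a * (z0 + z_lo)))
    q_hi = norm.cdf(z0 + (z0 + z_hi) / (1 - a * (z0 + z_hi)))

    percentile = (quantile(reps, alpha / 2), quantile(reps, 1 - alpha / 2))
    bca = (quantile(reps, q_lo), quantile(reps, q_hi))
    return percentile, bca


# Hypothetical right-skewed sample, for illustration only.
skewed = [0.4, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.7, 3.5, 5.2, 8.9, 17.4]
pct_interval, bca_interval = bca_ci(skewed, statistics.fmean)
print("percentile:", pct_interval)
print("BCa:       ", bca_interval)
```

For the mean of a right-skewed sample, both corrections push the interval toward the long tail, so the BCa endpoints sit to the right of the percentile endpoints.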
Quick Example
Bootstrap CI for Median
Given a sample of 10 observations with original median = 3.0.
Step 1: Resample 10,000 times, each time computing the median of 10 values drawn with replacement.
Step 2: Sort bootstrap medians and extract 2.5th and 97.5th percentiles.
Step 3: 95% Bootstrap CI = [2.5th percentile, 97.5th percentile] of the bootstrap medians.
The CI is asymmetric around 3.0, reflecting the skewness of the sampling distribution of the median. A t-interval would force symmetry and be misleading here.
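The three steps can be sketched with the standard library as follows. The ten values below are hypothetical, chosen only so that the sample is right-skewed with median 3.0; they are not the data from this example, and the seed and variable names are likewise assumptions.

```python
import random
import statistics

# Hypothetical right-skewed sample of 10 values with median 3.0.
data = [1.0, 1.5, 2.0, 2.5, 2.8, 3.2, 4.0, 5.5, 8.0, 12.0]
sample_median = statistics.median(data)

rng = random.Random(42)  # seeded for reproducibility
n_boot = 10_000

# Step 1: resample with replacement and compute the median each time.
boot_medians = sorted(
    statistics.median(rng.choices(data, k=len(data))) for _ in range(n_boot)
)

# Step 2: take the 2.5th and 97.5th percentiles by rank in the sorted replicates.
lower = boot_medians[n_boot * 25 // 1000]
upper = boot_medians[n_boot * 975 // 1000 - 1]

# Step 3: report the interval and its distance from the sample median per side.
print(f"median = {sample_median}, 95% CI = [{lower}, {upper}]")
print(f"below: {sample_median - lower:.2f}, above: {upper - sample_median:.2f}")
```

With a right-skewed sample like this one, the upper endpoint lies farther from the median than the lower endpoint, which is the asymmetry a t-interval cannot represent.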
Permutation Test
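Observed difference in means between two groups: 5.3. After 10,000 label shuffles, 237 permutations produce a difference ≥ 5.3.
$p = 237 / 10{,}000 = 0.0237$. Reject $H_0$ at $\alpha = 0.05$. The permutation test requires no distributional assumptions beyond exchangeability of the labels under the null hypothesis. With 10,000 random shuffles, the p-value is a Monte Carlo approximation of the exact permutation p-value for this specific dataset.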
When to Use Bootstrap vs. Parametric Methods
| Scenario | Recommended Approach |
|---|---|
| Normal data, known σ | Z-interval (parametric) |
| Normal data, unknown σ | t-interval (parametric) |
| Skewed data | Bootstrap CI |
| Any statistic (median, ratio) | Bootstrap CI |
| Small sample | BCa bootstrap or double bootstrap |
| Hypothesis test, any distribution | Permutation test |
| Model comparison | Paired bootstrap or permutation |
Python Implementation Tips
import numpy as np

def bootstrap_ci(data, stat_fn=np.mean, n_boot=10000, ci=0.95):
    """Bootstrap confidence interval for any statistic."""
    n = len(data)
    # Compute the statistic on n_boot resamples of size n, drawn with replacement.
    boot_stats = np.array([
        stat_fn(np.random.choice(data, n, replace=True))
        for _ in range(n_boot)
    ])
    # Percentile method: cut (1 - ci) / 2 of the replicates from each tail.
    lower = np.percentile(boot_stats, (1 - ci) / 2 * 100)
    upper = np.percentile(boot_stats, (1 + ci) / 2 * 100)
    return lower, upper, boot_stats
Key points: (1) Always set a random seed for reproducibility. (2) Use np.random.choice(data, n, replace=True) for with-replacement resampling. (3) Use at least 10,000 resamples for stable bounds.
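One way to follow the seeding advice without touching NumPy's global random state is to pass a seed to a local `Generator`. The sketch below is a variant under stated assumptions, not the function above: the name `bootstrap_ci_seeded` and the `seed` parameter are hypothetical, and `stat_fn` is assumed to accept an `axis` argument, as `np.mean` and `np.median` do.

```python
import numpy as np


def bootstrap_ci_seeded(data, stat_fn=np.mean, n_boot=10_000, ci=0.95, seed=0):
    """Percentile bootstrap CI using a local, seeded Generator."""
    data = np.asarray(data)
    rng = np.random.default_rng(seed)  # local generator, global state untouched
    # Draw all resample indices at once: one row per bootstrap resample.
    idx = rng.integers(0, len(data), size=(n_boot, len(data)))
    boot_stats = stat_fn(data[idx], axis=1)  # assumes stat_fn accepts axis=
    alpha = 1 - ci
    lower, upper = np.quantile(boot_stats, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Hypothetical data, for illustration only.
values = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]
first = bootstrap_ci_seeded(values, seed=123)
second = bootstrap_ci_seeded(values, seed=123)
print(first, second)  # the same seed reproduces the same interval
```

Drawing the index matrix in one call avoids the Python-level loop, at the cost of holding an `n_boot` by `n` array in memory.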
Key Takeaways
Summary: Bootstrapping
- Core Principle: The empirical distribution $\hat{F}$ approximates $F$. Resample from $\hat{F}$ to approximate the sampling distribution.
- Algorithm: Resample with replacement $B$ times ($B \ge 10{,}000$), compute statistic, take percentiles for CI.
- Percentile vs BCa: Percentile is simple and intuitive; BCa adjusts for bias and skewness for better coverage.
- Permutation Tests: Shuffle labels to build exact null distributions for hypothesis testing. Complements bootstrapping.
- Assumption-Light: No normality, no closed-form variance. Works for medians, ratios, AUC, and most other statistics, provided the sample is representative of the population.
- Small Samples: For small samples, consider BCa or double bootstrap for better coverage properties.
- Practical Tip: Use at least 10,000 resamples for stable CI bounds. More is better but with diminishing returns.
Deep Dive
For detailed explanations, worked examples, and Python implementations, explore the dedicated statistics lessons:
Bootstrap Methods
- Bootstrap Methods: Complete theory, algorithms, when each method is appropriate, and Python implementation
Bootstrap Confidence Intervals
- Bootstrap Confidence Intervals: Percentile, BCa, pivotal, and studentized methods with detailed examples
Related Topics
- Permutation Tests: Exact hypothesis tests by shuffling labels
- Confidence Intervals for the Mean: Parametric alternatives when assumptions hold
- Point Estimation: MLE and the sampling distributions bootstrapping approximates
- Nonparametric Methods: Other distribution-free statistical methods