Permutation Tests — Distribution-Free Hypothesis Testing


Permutation Tests

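Permutation tests (also called randomization tests) build the null distribution by reshuffling the observed data itself, so they need no parametric distributional assumptions such as normality. They do rely on one condition: under the null hypothesis the observations must be exchangeable, meaning the group labels carry no information.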

How They Work

  1. Compute the observed test statistic
  2. Repeatedly shuffle (permute) group labels
  3. Compute the test statistic for each permutation
  4. P-value = proportion of permuted statistics at least as extreme as the observed one
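
Step 4 can be written out explicitly. As a sketch, let T_obs be the observed statistic and T_b the statistic computed on the b-th of B random permutations (these symbols are introduced here for illustration and do not appear elsewhere in the lesson). The two-sided Monte Carlo p-value is then:

```latex
p = \frac{1}{B} \sum_{b=1}^{B} \mathbf{1}\left\{ \, |T_b| \ge |T_{\mathrm{obs}}| \, \right\}
```

A common variant uses (1 + count) / (B + 1) instead, so that the estimate can never be exactly zero. The example in this lesson uses the plain proportion.
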
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)

# Example: Do two groups have different means?
# (Without assuming normality)
group_a = np.array([23, 28, 31, 25, 27, 29, 24, 30, 26, 28])
group_b = np.array([30, 35, 33, 28, 37, 32, 34, 36, 31, 33])

observed_diff = group_a.mean() - group_b.mean()
print(f"Observed difference: {observed_diff:.4f}")

# Permutation test
n_permutations = 10000
combined = np.concatenate([group_a, group_b])
n_a = len(group_a)

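# Shuffle the pooled data ONCE per iteration, then split it into two groups.
# (Drawing a separate shuffle for each group would let the groups overlap.)
perm_diffs = np.empty(n_permutations)
for i in range(n_permutations):
    shuffled = np.random.permutation(combined)
    perm_diffs[i] = shuffled[:n_a].mean() - shuffled[n_a:].mean()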

# Two-tailed p-value
p_perm = (np.abs(perm_diffs) >= np.abs(observed_diff)).mean()

# Compare with Welch's t-test (equal_var=False; the default is Student's t-test)
t_stat, p_ttest = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"Permutation test p-value: {p_perm:.4f}")
print(f"Welch's t-test p-value:   {p_ttest:.4f}")

# Visualize permutation distribution
fig, ax = plt.subplots(figsize=(10, 5))
ax.hist(perm_diffs, bins=50, density=True, color='lightblue', edgecolor='black', alpha=0.7)
ax.axvline(observed_diff, color='red', linewidth=2, linestyle='--',
           label=f'Observed = {observed_diff:.3f}')
ax.axvline(-observed_diff, color='red', linewidth=2, linestyle='--')
ax.fill_betweenx([0, 0.15], -20, -abs(observed_diff), alpha=0.3, color='red', label=f'p = {p_perm:.4f}')
ax.fill_betweenx([0, 0.15], abs(observed_diff), 20, alpha=0.3, color='red')
ax.set_title(f'Permutation Distribution of Difference in Means\n(n_perm={n_permutations:,})')
ax.set_xlabel('Difference in Means (A - B)')
ax.legend()
plt.tight_layout()
plt.savefig('permutation_test.png', dpi=150)
plt.show()

# Permutation test for correlation (also distribution-free)
x = np.random.normal(0, 1, 30)
y = 0.5 * x + np.random.normal(0, 1, 30)

obs_r, _ = stats.pearsonr(x, y)
perm_r = [stats.pearsonr(np.random.permutation(x), y)[0] for _ in range(10000)]
p_corr_perm = (np.abs(perm_r) >= np.abs(obs_r)).mean()

print(f"\nCorrelation: r = {obs_r:.4f}")
print(f"Permutation p-value: {p_corr_perm:.4f}")
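
The hand-written loop above can be cross-checked against SciPy's built-in `scipy.stats.permutation_test`. The following is a minimal sketch, assuming SciPy 1.7 or newer is installed; the helper name `mean_diff` is introduced here for illustration. No seed is set, so the p-value varies slightly between runs.

```python
import numpy as np
from scipy import stats

group_a = np.array([23, 28, 31, 25, 27, 29, 24, 30, 26, 28])
group_b = np.array([30, 35, 33, 28, 37, 32, 34, 36, 31, 33])


def mean_diff(a, b):
    """Test statistic: difference in group means (A - B)."""
    return np.mean(a) - np.mean(b)


# permutation_type='independent' reshuffles observations between the
# two samples, which matches the label-shuffling scheme described above.
result = stats.permutation_test(
    (group_a, group_b),
    mean_diff,
    permutation_type='independent',
    vectorized=False,
    n_resamples=9999,
    alternative='two-sided',
)

print(f"Observed difference: {result.statistic:.4f}")
print(f"SciPy permutation p-value: {result.pvalue:.4f}")
```

Because the resampling is random, this p-value should be close to, but not identical to, the one from the manual loop.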

Advantages of Permutation Tests

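| Aspect | Parametric Test | Permutation Test |
|---|---|---|
| Distributional assumptions | Required (e.g. normality) | None beyond exchangeability under the null |
| Works for any statistic | Limited | Yes |
| Exactness | Exact only if the assumptions hold, otherwise approximate | Exact with full enumeration; a Monte Carlo approximation with random permutations |
| Computational cost | Fast | Slower (but fast on modern hardware) |
| Small samples | Problematic | Works well |
| Custom statistics | Difficult | Easy |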
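
Because the procedure only needs a function that maps two samples to a number, swapping in a custom statistic takes very little code. Below is a minimal sketch using the difference in medians; `permutation_pvalue` and `median_diff` are hypothetical helpers written for this illustration (not library functions), and the data are the two groups from the example above.

```python
import numpy as np


def permutation_pvalue(a, b, statistic, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for any two-sample statistic.

    Hypothetical helper for illustration; `statistic` is any callable
    taking two arrays and returning a number.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a)
    b = np.asarray(b)
    observed = statistic(a, b)
    pooled = np.concatenate([a, b])
    n_a = len(a)

    n_extreme = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)  # one shuffle, then split
        permuted = statistic(shuffled[:n_a], shuffled[n_a:])
        if abs(permuted) >= abs(observed):
            n_extreme += 1
    return observed, n_extreme / n_permutations


def median_diff(a, b):
    """Custom statistic: difference in medians (A - B)."""
    return np.median(a) - np.median(b)


group_a = [23, 28, 31, 25, 27, 29, 24, 30, 26, 28]
group_b = [30, 35, 33, 28, 37, 32, 34, 36, 31, 33]

obs_median_diff, p_median = permutation_pvalue(group_a, group_b, median_diff)
print(f"Observed median difference: {obs_median_diff:.2f}")
print(f"Permutation p-value: {p_median:.4f}")
```

Any other statistic (a trimmed mean, a ratio of variances, and so on) can be passed in the same way without changing the resampling code.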

Key Takeaways

  1. No parametric distributional assumptions: validity rests only on exchangeability under the null hypothesis
  2. Works for any test statistic: median difference, correlation, R², etc.
  3. P-value = fraction of permutations at least as extreme as the observed statistic
  4. For small datasets, an exact permutation test (enumerating all arrangements) is possible
  5. Costlier than a parametric test, but usually negligible on modern CPUs (10,000 permutations of a small dataset in <1 second)
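
The exact test mentioned in takeaway 4 can be sketched by enumerating every way of assigning observations to group A. The helper name `exact_permutation_pvalue` and the tiny dataset below are made up for illustration; with 3 and 4 observations there are only C(7, 3) = 35 splits to check.

```python
from itertools import combinations

import numpy as np


def exact_permutation_pvalue(a, b):
    """Exact two-sided permutation p-value for the difference in means.

    Hypothetical helper for illustration: enumerates every split of the
    pooled data into groups of the original sizes.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pooled = np.concatenate([a, b])
    n, n_a = len(pooled), len(a)
    observed = a.mean() - b.mean()

    n_splits = 0
    n_extreme = 0
    for idx in combinations(range(n), n_a):
        mask = np.zeros(n, dtype=bool)
        mask[list(idx)] = True
        diff = pooled[mask].mean() - pooled[~mask].mean()
        n_splits += 1
        # Small tolerance guards against floating-point rounding in ties
        if abs(diff) >= abs(observed) - 1e-12:
            n_extreme += 1
    return observed, n_extreme / n_splits, n_splits


# Illustrative data only (not from a real study)
small_a = [12, 15, 14]
small_b = [18, 20, 19, 17]

obs, p_exact, n_splits = exact_permutation_pvalue(small_a, small_b)
print(f"Observed difference: {obs:.4f}")
print(f"Splits enumerated: {n_splits}")
print(f"Exact p-value: {p_exact:.4f}")
```

The number of splits grows combinatorially (two groups of 10 already give 184,756), which is why random permutations are used for larger samples.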
