Mann-Whitney U Test
The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is the nonparametric alternative to the two-sample t-test for independent samples. It tests whether values from one group tend to be larger than values from the other.
import numpy as np
from scipy import stats
np.random.seed(42)
# Non-normal data: response times (milliseconds)
group_a = np.array([250, 280, 230, 310, 270, 290, 260, 320, 245, 275, 1200]) # outlier!
group_b = np.array([200, 220, 195, 235, 215, 205, 240, 210, 225, 190])
print(f"Group A: median={np.median(group_a):.1f}, mean={np.mean(group_a):.1f}")
print(f"Group B: median={np.median(group_b):.1f}, mean={np.mean(group_b):.1f}")
# Mann-Whitney U test
U_stat, p_mw = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(f"\nMann-Whitney: U={U_stat:.2f}, p={p_mw:.4f}")
# Compare: independent t-test (affected by outlier)
t_stat, p_t = stats.ttest_ind(group_a, group_b)
print(f"Independent t-test: t={t_stat:.4f}, p={p_t:.4f}")
# Effect size: rank biserial correlation
n1, n2 = len(group_a), len(group_b)
# SciPy (>= 1.7) reports U for the first sample, so r > 0 means group A tends to be larger
r_rb = 2*U_stat/(n1*n2) - 1
print(f"Effect size (rank biserial r) = {r_rb:.4f}")
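The comparison above shows the t-test being pulled around by the outlier. As a rough illustration of why the rank-based test is less sensitive, the sketch below reruns both tests after making the outlier ten times larger. The value 12000 and the names `base` and `results` are made up for this example; the remaining data are the same as above.

```python
import numpy as np
from scipy import stats

group_b = np.array([200, 220, 195, 235, 215, 205, 240, 210, 225, 190])
base = [250, 280, 230, 310, 270, 290, 260, 320, 245, 275]  # group A without its outlier

results = {}
for outlier in (1200, 12000):  # 12000 is a hypothetical, more extreme outlier
    group_a = np.array(base + [outlier])
    u, p_u = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
    t, p_t = stats.ttest_ind(group_a, group_b)
    results[outlier] = (u, p_u, t, p_t)
    print(f"outlier={outlier:>5}: U={u:.1f}, p_MW={p_u:.4f} | t={t:.3f}, p_t={p_t:.4f}")
```

Because the outlier holds the top rank in both runs, the ranks (and therefore U and its p-value) do not change, while the t statistic does.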
The U Statistic
U counts the number of pairs in which a value from group 1 exceeds a value from group 2; a tied pair counts as one half:
# Manual U computation (pairwise comparison; ties contribute 0.5)
U_manual = sum((a > b) + 0.5*(a == b) for a in group_a for b in group_b)
print(f"Manual U = {U_manual} (fraction > : {U_manual/(n1*n2):.3f})")
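The pairwise count above can also be obtained from rank sums, which is where the name "rank-sum test" comes from. The sketch below assumes the standard identity U₁ = R₁ − n₁(n₁+1)/2, where R₁ is the sum of the pooled ranks belonging to group 1. The symbols `R1`, `R2`, `U1`, and `U2` are notation introduced here, not taken from the text above.

```python
import numpy as np
from scipy import stats

group_a = np.array([250, 280, 230, 310, 270, 290, 260, 320, 245, 275, 1200])
group_b = np.array([200, 220, 195, 235, 215, 205, 240, 210, 225, 190])
n1, n2 = len(group_a), len(group_b)

# Rank the pooled sample (ties would get average ranks), then split by group
ranks = stats.rankdata(np.concatenate([group_a, group_b]))
R1 = ranks[:n1].sum()
R2 = ranks[n1:].sum()

# Subtract the smallest rank sum each group could possibly have
U1 = R1 - n1 * (n1 + 1) / 2
U2 = R2 - n2 * (n2 + 1) / 2

print(f"R1={R1:.1f}, R2={R2:.1f}")
print(f"U1={U1:.1f}, U2={U2:.1f}, U1+U2={U1 + U2:.1f} (n1*n2={n1 * n2})")
```

The two U values always sum to n₁×n₂, so either one determines the other.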
Key Takeaways
- Tests whether values from one group tend to be larger — stochastic dominance
- Robust to outliers — uses ranks, not raw values
- Does NOT test medians in general (a common misconception); it compares medians only when the two distributions have the same shape and differ by a shift
- U ranges from 0 to n₁×n₂; U = n₁n₂/2 is the value expected under the null hypothesis, meaning neither group tends to be larger
- Effect size (rank biserial) r = 2U/(n₁n₂) − 1, with U computed for the first group; r ranges from −1 to 1. Rough guide: |r| ≈ 0.1 small, ≈ 0.3 medium, ≥ 0.5 large
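To make the range of U and the sign of the effect size concrete, here is a small sketch on toy data. The helper names `pairwise_u` and `rank_biserial` and the toy samples are hypothetical, chosen only for illustration; U is taken for the first sample, with ties counted as one half.

```python
def pairwise_u(x, y):
    """U for x: number of (x_i, y_j) pairs with x_i > y_j, ties counted as 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

def rank_biserial(x, y):
    """Rank biserial r = 2*U/(n1*n2) - 1, with U computed for x."""
    return 2 * pairwise_u(x, y) / (len(x) * len(y)) - 1

cases = {
    "x entirely above y": ([4, 5, 6], [1, 2, 3]),
    "x entirely below y": ([1, 2, 3], [4, 5, 6]),
    "evenly interleaved": ([1, 4, 5, 8], [2, 3, 6, 7]),
}
for label, (x, y) in cases.items():
    print(f"{label}: U={pairwise_u(x, y):.1f}, r={rank_biserial(x, y):+.2f}")
```

Complete separation gives U = n₁n₂ or U = 0 (r = +1 or −1), while an evenly interleaved pair of samples gives U = n₁n₂/2 and r = 0.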