Five-Number Summary
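The five-number summary provides a compact, non-parametric description of a numeric dataset. This section uses the box-plot (Tukey) variant, in which the minimum and maximum are the most extreme values inside the 1.5×IQR fences rather than the raw sample extremes: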
| Statistic | Description |
|---|---|
| Minimum | Smallest value at or above the lower fence (Q1 − 1.5×IQR) |
| Q1 | 25th percentile (lower quartile) |
| Median | 50th percentile |
| Q3 | 75th percentile (upper quartile) |
| Maximum | Largest value at or below the upper fence (Q3 + 1.5×IQR) |
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
tips = sns.load_dataset('tips')
print("=== Five-Number Summary: Total Bill ===")
bill = tips['total_bill']
q1, med, q3 = np.percentile(bill, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5*iqr
upper_fence = q3 + 1.5*iqr
not_outlier = bill[(bill >= lower_fence) & (bill <= upper_fence)]
print(f"Min (non-outlier): ${not_outlier.min():.2f}")
print(f"Q1: ${q1:.2f}")
print(f"Median: ${med:.2f}")
print(f"Q3: ${q3:.2f}")
print(f"Max (non-outlier): ${not_outlier.max():.2f}")
print(f"IQR: ${iqr:.2f}")
print(f"Lower fence: ${lower_fence:.2f}")
print(f"Upper fence: ${upper_fence:.2f}")
outliers = bill[(bill < lower_fence) | (bill > upper_fence)]
# tolist() yields plain Python floats, so the printed list stays readable under NumPy 2
print(f"Outliers: {outliers.sort_values().tolist()}")
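The gap between the raw extremes and the whisker ends is easiest to see on a toy sample. The sketch below assumes a hypothetical ten-value array (not the tips data) with one obvious outlier, and relies on the default linear interpolation of `np.percentile`:

```python
import numpy as np

# Hypothetical toy sample with one obvious outlier (100)
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

q1, med, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep only the values that fall inside the fences
inside = data[(data >= lower_fence) & (data <= upper_fence)]

print(f"Raw min/max:     {data.min():.2f} / {data.max():.2f}")
print(f"Whisker min/max: {inside.min():.2f} / {inside.max():.2f}")
print(f"Q1, median, Q3:  {q1:.2f}, {med:.2f}, {q3:.2f}")
```

With this sample the upper fence is 14.5, so 100 is flagged as an outlier and the upper whisker stops at 9.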
pandas describe() — Extended Summary
print(tips.describe().round(2))
# Shows count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
# Note: min and max here are the raw extremes, outliers included.
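If only the five quartile-based rows are wanted, they can be selected from the `describe()` result by label. A minimal sketch, assuming a hypothetical toy Series rather than the tips data; the `min` and `max` rows come back as the raw extremes, so the outlier is still included:

```python
import pandas as pd

# Hypothetical toy sample with one obvious outlier (100)
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# describe() labels the quartiles '25%', '50%', '75%'
five_num = s.describe().loc[['min', '25%', '50%', '75%', 'max']]
print(five_num)
```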
Comparing Groups with Five-Number Summaries
fig, ax = plt.subplots(figsize=(10, 5))
# observed=True keeps only the day categories that actually occur in the data
groups = tips.groupby('day', observed=True)['total_bill']
labels = []
for i, (day, group) in enumerate(groups):
    q1, med, q3 = np.percentile(group, [25, 50, 75])
    iqr = q3 - q1
    # Whiskers end at the most extreme observations inside the 1.5*IQR fences
    whisker_lo = group[group >= q1 - 1.5*iqr].min()
    whisker_hi = group[group <= q3 + 1.5*iqr].max()
    outliers = group[(group < whisker_lo) | (group > whisker_hi)]
    # Draw box (Q1 to Q3), median line, whiskers, and outlier points
    ax.barh(i, q3 - q1, left=q1, height=0.4, color='steelblue', alpha=0.7)
    ax.plot([med, med], [i - 0.2, i + 0.2], color='red', lw=2)
    ax.plot([whisker_lo, q1], [i, i], color='black', lw=1)
    ax.plot([q3, whisker_hi], [i, i], color='black', lw=1)
    ax.scatter(outliers, [i]*len(outliers), color='red', zorder=5, s=20)
    print(f"{day}: Min={whisker_lo:.1f} Q1={q1:.1f} Med={med:.1f} Q3={q3:.1f} Max={whisker_hi:.1f}")
    labels.append(str(day))
# Label each row with its group key so the ticks always match the plotted order
ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels)
ax.set_xlabel('Total Bill ($)')
ax.set_title('Five-Number Summary: Total Bill by Day')
plt.tight_layout()
plt.savefig('five_num_summary.png', dpi=150)
plt.show()
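The same per-group comparison can also be produced as a table. The sketch below assumes a hypothetical two-group DataFrame (the column names `group` and `value` are placeholders, not part of the tips data) and uses quantile levels 0 and 1, so the `min` and `max` columns are raw extremes rather than whisker ends:

```python
import pandas as pd

# Hypothetical two-group toy data (not the tips dataset)
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# One row per group, one column per quantile level
summary = (
    df.groupby("group")["value"]
      .quantile([0, 0.25, 0.5, 0.75, 1])
      .unstack()
)
summary.columns = ["min", "q1", "median", "q3", "max"]
print(summary)
```

A table like this is easier to scan than printed lines when there are many groups, and it can be sorted by any column.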
Key Takeaways
- Min, Q1, median, Q3, and max are exactly the values a box plot draws, so a box plot is the five-number summary in graphical form
- No distributional assumptions are required, so the summary works for a distribution of any shape
- Outliers are defined by the 1.5×IQR rule, not by the min/max
- pandas describe() reports count, mean, and std alongside the quartiles, but its min and max are the raw extremes, outliers included
- Compare distributions across groups side by side with grouped five-number summaries
- Skewness shows up in the box: a median closer to Q1 suggests right skew, while a median closer to Q3 suggests left skew
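The skewness takeaway can be turned into a number. One common choice, which this section does not itself use, is the quartile (Bowley) skewness coefficient, (Q3 + Q1 − 2·median) / (Q3 − Q1). The sketch below assumes hypothetical toy samples, and the function name `quartile_skewness` is illustrative only:

```python
import numpy as np

def quartile_skewness(values):
    """Quartile (Bowley) skewness: (Q3 + Q1 - 2*median) / (Q3 - Q1)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    if iqr == 0:
        return float("nan")  # undefined when the box has zero width
    return float((q3 + q1 - 2 * med) / iqr)

right_skewed = [1, 1, 2, 2, 3, 5, 8, 13, 21]  # hypothetical sample
symmetric = [1, 2, 3, 4, 5]                    # hypothetical sample

print(quartile_skewness(right_skewed))  # positive: median sits nearer Q1
print(quartile_skewness(symmetric))     # zero: median is centered in the box
```

The coefficient lies between −1 and 1. Because it depends only on the quartiles, values outside the box do not affect it.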