Five-Number Summary — Box Plot Foundation


Five-Number Summary

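The five-number summary is a compact, non-parametric description of any numeric dataset. This lesson uses the box-plot convention, in which the minimum and maximum are the most extreme values that lie inside the 1.5×IQR fences; the classical definition uses the raw sample minimum and maximum instead.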

Statistic   Description
Minimum     Smallest non-outlier value
Q1          25th percentile (lower quartile)
Median      50th percentile
Q3          75th percentile (upper quartile)
Maximum     Largest non-outlier value

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset('tips')

print("=== Five-Number Summary: Total Bill ===")
bill = tips['total_bill']
q1, med, q3 = np.percentile(bill, [25, 50, 75])
iqr = q3 - q1
lower_fence = q1 - 1.5*iqr
upper_fence = q3 + 1.5*iqr
not_outlier = bill[(bill >= lower_fence) & (bill <= upper_fence)]

print(f"Min (non-outlier): ${not_outlier.min():.2f}")
print(f"Q1:                ${q1:.2f}")
print(f"Median:            ${med:.2f}")
print(f"Q3:                ${q3:.2f}")
print(f"Max (non-outlier): ${not_outlier.max():.2f}")
print(f"IQR:               ${iqr:.2f}")
print(f"Lower fence:       ${lower_fence:.2f}")
print(f"Upper fence:       ${upper_fence:.2f}")
outliers = bill[(bill < lower_fence) | (bill > upper_fence)]
print(f"Outliers: {sorted(outliers.values)}")
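
The same steps can be wrapped in a small reusable function. The sketch below is one possible arrangement, not part of the original lesson: the function name five_number_summary and the sample values are hypothetical, and it relies on NumPy's default (linear interpolation) percentile method.

```python
import numpy as np

def five_number_summary(values):
    """Box-plot style summary: min/max are the extremes inside the 1.5*IQR fences."""
    x = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    outside = x[(x < lower_fence) | (x > upper_fence)]
    return {
        "min": float(inside.min()),
        "q1": float(q1),
        "median": float(med),
        "q3": float(q3),
        "max": float(inside.max()),
        "outliers": sorted(outside.tolist()),
    }

# Hypothetical sample (not taken from the tips dataset)
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
summary = five_number_summary(sample)
print(summary)
```

For this sample the upper fence is 14.5, so 50 is flagged as an outlier and the reported maximum is 9 rather than 50.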

pandas describe() — Extended Summary

print(tips.describe().round(2))
# Shows count, mean, std, min, 25%, 50%, 75%, max for every numeric column.
# Here min and max are the raw extremes, so any outliers are included.
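
To pull just the five summary rows out of describe(), the result can be indexed by its row labels. This is a minimal sketch on a hypothetical Series (the values are made up for illustration); it shows that describe() reports the raw maximum even when that value would be treated as an outlier under the 1.5×IQR rule.

```python
import pandas as pd

# Hypothetical bills; the value 50 sits far from the rest
bills = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float, name="total_bill")

stats = bills.describe()

# Keep only the five rows that correspond to the five-number summary
five = stats.loc[["min", "25%", "50%", "75%", "max"]]
print(five)
```

Here the max row is 50, so the fence-based filtering has to be applied separately if the box-plot convention is wanted.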

Comparing Groups with Five-Number Summaries

fig, ax = plt.subplots(figsize=(10, 5))
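# 'day' is a categorical column; observed=True keeps only the days present in the data
groups = tips.groupby('day', observed=True)['total_bill']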

for i, (day, group) in enumerate(groups):
    q1, med, q3 = np.percentile(group, [25, 50, 75])
    iqr = q3 - q1
    whisker_lo = group[group >= q1-1.5*iqr].min()
    whisker_hi = group[group <= q3+1.5*iqr].max()
    outliers = group[(group < whisker_lo) | (group > whisker_hi)]
    
    # Draw box
    ax.barh(i, q3-q1, left=q1, height=0.4, color='steelblue', alpha=0.7)
    ax.plot([med, med], [i-0.2, i+0.2], 'red', lw=2)
    ax.plot([whisker_lo, q1], [i, i], 'black', lw=1)
    ax.plot([q3, whisker_hi], [i, i], 'black', lw=1)
    ax.scatter(outliers, [i]*len(outliers), color='red', zorder=5, s=20)
    print(f"{day}: Min={whisker_lo:.1f} Q1={q1:.1f} Med={med:.1f} Q3={q3:.1f} Max={whisker_hi:.1f}")

ax.set_yticks(range(4))
ax.set_yticklabels(['Thursday','Friday','Saturday','Sunday'])
ax.set_xlabel('Total Bill ($)')
ax.set_title('Five-Number Summary: Total Bill by Day')
plt.tight_layout()
plt.savefig('five_num_summary.png', dpi=150)
plt.show()
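
When a table is more useful than a plot, the per-group values can be collected into a DataFrame. The sketch below assumes a small hypothetical dataset with the same column names as tips; the helper whisker_summary is an illustrative name, not something defined elsewhere in this lesson.

```python
import numpy as np
import pandas as pd

def whisker_summary(group):
    """Five-number summary of one group, with min/max clipped to the 1.5*IQR fences."""
    q1, med, q3 = np.percentile(group, [25, 50, 75])
    iqr = q3 - q1
    lo = group[group >= q1 - 1.5 * iqr].min()
    hi = group[group <= q3 + 1.5 * iqr].max()
    n_out = int(((group < lo) | (group > hi)).sum())
    return {"min": lo, "Q1": q1, "median": med, "Q3": q3, "max": hi, "n_outliers": n_out}

# Hypothetical data: Thur contains one extreme bill (60)
df = pd.DataFrame({
    "day": ["Thur"] * 6 + ["Sat"] * 5,
    "total_bill": [10, 12, 14, 16, 18, 60, 20, 25, 30, 35, 40],
})

# One row per group, one column per statistic
rows = {day: whisker_summary(g) for day, g in df.groupby("day")["total_bill"]}
table = pd.DataFrame.from_dict(rows, orient="index")
print(table)
```

Each row can then be compared directly across groups, and the n_outliers column records how many points fall outside the whiskers.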

Key Takeaways

  1. Min, Q1, median, Q3, and max are the values a box plot draws: the box spans Q1 to Q3, a line marks the median, and the whiskers extend to the min and max
  2. No distributional assumptions required — works for any shape
  3. Outliers are defined by the 1.5×IQR rule, not by the min/max
  4. pandas describe() reports the same quartiles plus count, mean, and std; its min and max are the raw extremes, outliers included
  5. Compare distributions across groups side by side with grouped five-number summaries
  6. Quartile spacing hints at skewness: a median closer to Q1 suggests right skew, and a median closer to Q3 suggests left skew
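
The skewness rule of thumb in the last takeaway can be turned into a number. One common way, which this lesson does not itself use, is the quartile (Bowley) skewness coefficient, (Q3 + Q1 - 2·median) / (Q3 - Q1). The sketch below is an illustration under that assumption; the function name and sample values are hypothetical.

```python
import numpy as np

def quartile_skewness(values):
    """Bowley's quartile skewness: positive when the median sits closer to Q1."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return (q3 + q1 - 2 * med) / (q3 - q1)

# Hypothetical samples
right_skewed = [1, 2, 2, 3, 3, 4, 10, 20]
symmetric = [1, 2, 3, 4, 5]
left_skewed = [-20, -10, -4, -3, -3, -2, -2, -1]

skew_right = quartile_skewness(right_skewed)
skew_sym = quartile_skewness(symmetric)
skew_left = quartile_skewness(left_skewed)
print(skew_right, skew_sym, skew_left)
```

A positive value corresponds to the median lying closer to Q1 (right skew), a negative value to the median lying closer to Q3 (left skew), and zero to a median midway between the quartiles.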
