Percentiles and Quartiles
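The pth percentile is the value at or below which p% of the observations fall; for a finite sample, the exact value depends on the interpolation rule used. Quartiles are the special cases Q1 (25th), Q2 (50th, the median), and Q3 (75th).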
import numpy as np
from scipy import stats
import pandas as pd
data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
sorted_d = np.sort(data)
print(f"Sorted: {sorted_d}")
for p in [10, 25, 50, 75, 90]:
    print(f"P{p:2d}: {np.percentile(data, p):.2f}")
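To make the default behaviour less opaque, here is a minimal sketch of the linear-interpolation rule that np.percentile applies by default: find the fractional position (n - 1) * p / 100 in the sorted data and interpolate between the two neighbouring values. The helper name percentile_linear is hypothetical and used only for illustration.

```python
import numpy as np

def percentile_linear(values, p):
    """Percentile via linear interpolation between sorted values (illustrative helper)."""
    x = np.sort(np.asarray(values, dtype=float))
    # Fractional 0-based position of the pth percentile in the sorted data
    pos = (len(x) - 1) * p / 100.0
    lo = int(np.floor(pos))
    hi = int(np.ceil(pos))
    frac = pos - lo
    # Weighted average of the two neighbouring sorted values
    return x[lo] + frac * (x[hi] - x[lo])

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
for p in [10, 25, 50, 75, 90]:
    manual = percentile_linear(data, p)
    print(f"P{p:2d}: manual={manual:.2f}  numpy={np.percentile(data, p):.2f}")
```

With 15 observations, the 25th percentile sits at position 3.5, halfway between the sorted values 20 and 22, so both columns should report 21.00.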
NumPy Interpolation Methods
d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# NumPy 1.22+ uses the `method` keyword; the older `interpolation` keyword is deprecated.
for method in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    val = np.percentile(d, 50, method=method)
    print(f"  method='{method}': {val}")
Five-Number Summary
def five_num(data, label=''):
    """Print the five-number summary, the IQR, and the 1.5*IQR outlier fences."""
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    # Tukey fences: points outside this range are commonly flagged as outliers
    lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
    if label:
        print(f"\n=== {label} ===")
    print(f"Min: {data.min():.2f} Q1: {q1:.2f} Median: {q2:.2f} Q3: {q3:.2f} Max: {data.max():.2f}")
    print(f"IQR: {iqr:.2f} Fences: [{lower:.2f}, {upper:.2f}]")
np.random.seed(42)
exam = np.random.normal(75, 12, 200).clip(0, 100)
five_num(exam, "Exam Scores")
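The fences printed by the summary can also be used to pull out the flagged points. The following is a rough sketch under that assumption; the helper tukey_outliers, its parameter k, and the small sample array are hypothetical and serve only as illustration.

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] together with the fences."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Boolean mask of points that fall outside the fences
    mask = (x < lower) | (x > upper)
    return x[mask], (lower, upper)

# Made-up sample with one obviously extreme value
sample = np.array([10, 12, 13, 14, 15, 16, 18, 95])
outliers, fences = tukey_outliers(sample)
print(f"Fences: [{fences[0]:.3f}, {fences[1]:.3f}]  Outliers: {outliers}")
```

Here Q1 = 12.75 and Q3 = 16.5, so the fences are [7.125, 22.125] and only the value 95 is flagged. The multiplier 1.5 is a convention, not a law; a larger k flags fewer points.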
Percentile Rank
score = 88
rank = stats.percentileofscore(exam, score, kind='weak')
print(f"Score of {score} is at the {rank:.1f}th percentile")
print(f"{rank:.1f}% of students scored at or below {score}")
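The kind argument controls how ties with the score are counted. As a sketch of the idea, and not SciPy's actual implementation, the 'weak' rank is the percentage of values less than or equal to the score, while the 'strict' rank counts only values strictly below it. The helper percentile_rank and the scores array are hypothetical.

```python
import numpy as np

def percentile_rank(values, score, kind="weak"):
    """Percentage of values <= score ('weak') or < score ('strict')."""
    x = np.asarray(values)
    if kind == "weak":
        return 100.0 * np.mean(x <= score)
    if kind == "strict":
        return 100.0 * np.mean(x < score)
    raise ValueError("kind must be 'weak' or 'strict'")

# Made-up scores containing a tie at 88
scores = np.array([60, 70, 70, 80, 88, 88, 90, 95])
weak = percentile_rank(scores, 88, "weak")      # 6 of 8 values are <= 88
strict = percentile_rank(scores, 88, "strict")  # 4 of 8 values are < 88
print(f"weak={weak:.1f}  strict={strict:.1f}")
```

The two ranks differ only when the score ties with observed values, which is common for integer-valued data such as exam marks.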
Deciles
deciles = np.percentile(exam, range(10, 100, 10))
for i, val in enumerate(deciles, 1):
    print(f"D{i} ({i*10}th pct): {val:.1f}")
Key Takeaways
- P50 = median — percentiles generalize the median to any fraction
- Quartiles divide data into 4 equal-frequency groups (not equal-width intervals)
- IQR = Q3 − Q1 covers the middle 50% and drives outlier fences
- Percentile rank answers "where does this value fall in the distribution?"
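- NumPy's default method='linear' interpolates between the two nearest sorted values and is appropriate for most cases; the choice of method matters mainly for small samples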
- Percentiles are non-parametric — no distributional assumptions needed
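The equal-frequency point in the takeaways can be checked numerically. The sketch below assumes a skewed, synthetic sample (exponential draws, chosen only for illustration) and compares counts in bins bounded by the quartiles with counts in four equal-width bins over the same range.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic right-skewed sample, used only for illustration
x = rng.exponential(scale=10.0, size=1000)

# Equal-frequency bins: edges at the minimum, quartiles, and maximum
quartile_edges = np.percentile(x, [0, 25, 50, 75, 100])
freq_counts, _ = np.histogram(x, bins=quartile_edges)

# Equal-width bins: four intervals of identical length over the same range
width_counts, width_edges = np.histogram(x, bins=4)

print("Quartile bin counts:   ", freq_counts)
print("Equal-width bin counts:", width_counts)
```

The quartile bins each hold a quarter of the observations regardless of the shape of the distribution, whereas the equal-width bins pile most of a skewed sample into the first interval.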