Percentiles and Quartiles

Where Does Any Value Stand Relative to the Rest?

Percentiles tell you the relative standing of any value within a dataset. Quartiles are special percentiles that divide data into four equal parts.

Percentile rank — "You scored better than 85% of test takers"
Quartiles — Q1, Q2 (median), Q3 split data into four equal groups
Deciles — Ten equal groups for finer-grained comparison
Interpolation methods — Different calculators give slightly different answers; know why

Percentiles turn raw scores into meaningful rankings. They are the language of standardized testing and performance evaluation.

What are Percentiles and Quartiles?

Definition

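The pth percentile is the value below which roughly p% of the observations fall. Conventions differ on whether "below" includes values equal to it and on how to interpolate between neighbouring observations, which is why tools can disagree slightly. Quartiles are the 25th (Q1), 50th (Q2, the median), and 75th (Q3) percentiles.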

import numpy as np
from scipy import stats
import pandas as pd

data = np.array([15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25])
sorted_d = np.sort(data)
print(f"Sorted: {sorted_d}")

for p in [10, 25, 50, 75, 90]:
    print(f"P{p:2d}: {np.percentile(data, p):.2f}")
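To see where these numbers come from, the default rule can be reproduced by hand. The sketch below assumes NumPy's default 'linear' rule, which places the pth percentile at fractional index (n - 1) * p / 100 of the sorted data and interpolates between the two neighbouring values. The helper name percentile_linear is illustrative, not a library function.

```python
import math

def percentile_linear(values, p):
    """Percentile by linear interpolation at 0-based position (n - 1) * p / 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100   # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo                      # how far we are between the two neighbours
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])

data = [15, 20, 35, 40, 50, 12, 27, 45, 38, 22, 18, 55, 30, 42, 25]
for p in [10, 25, 50, 75, 90]:
    print(f"P{p:2d}: {percentile_linear(data, p):.2f}")
# P10: 16.20
# P25: 21.00
# P50: 30.00
# P75: 41.00
# P90: 48.00
```

For example, with n = 15 the 25th percentile sits at index 3.5, halfway between the sorted values 20 and 22, which gives 21.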

NumPy Interpolation Methods

# NumPy 1.22 renamed this keyword to `method`; the older `interpolation=` spelling is deprecated.
d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
for method in ['linear', 'lower', 'higher', 'midpoint', 'nearest']:
    val = np.percentile(d, 50, method=method)
    print(f"  method='{method}': {val}")
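The options above decide how to resolve a position that falls between two observations. A second source of disagreement between calculators is where that position is placed to begin with. As a rough illustration, Python's standard-library statistics.quantiles offers two placement rules, and running both on the same ten values shows the effect. The 'inclusive' rule uses the same (n - 1) spacing as NumPy's default; the 'exclusive' rule uses (n + 1) spacing.

```python
import statistics

d = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# 'inclusive': cut points spaced over n - 1 intervals (data treated as the full population)
inclusive = statistics.quantiles(d, n=4, method="inclusive")
# 'exclusive': cut points spaced over n + 1 intervals (data treated as a sample)
exclusive = statistics.quantiles(d, n=4, method="exclusive")

print("inclusive:", inclusive)  # inclusive: [3.25, 5.5, 7.75]
print("exclusive:", exclusive)  # exclusive: [2.75, 5.5, 8.25]
```

The medians agree, but Q1 and Q3 do not. Neither answer is wrong; the two rules simply answer slightly different questions, and the gap shrinks as the dataset grows.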

Five-Number Summary

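def five_num(data, label=''):
    """Print the five-number summary, the IQR, and the 1.5*IQR outlier fences."""
    data = np.asarray(data)  # accept plain lists as well as arrays
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1  # interquartile range: spread of the middle 50%
    # Tukey fences: values beyond 1.5 * IQR from the quartiles are potential outliers
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if label:
        print(f"\n=== {label} ===")
    print(f"Min: {data.min():.2f}  Q1: {q1:.2f}  Median: {q2:.2f}  Q3: {q3:.2f}  Max: {data.max():.2f}")
    print(f"IQR: {iqr:.2f}  Fences: [{lower:.2f}, {upper:.2f}]")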

np.random.seed(42)
exam = np.random.normal(75, 12, 200).clip(0, 100)
five_num(exam, "Exam Scores")

Quartile	Percentile	Description
Q1	25th	Lower quartile — 25% of data falls below this
Q2	50th	Median — middle value of the dataset
Q3	75th	Upper quartile — 75% of data falls below this
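The fences in the five-number summary are usually computed in order to flag outliers. A minimal sketch of that step follows, using a small made-up dataset with one extreme value and the standard library's statistics.quantiles with the 'inclusive' rule (chosen here because it uses the same position formula as NumPy's default).

```python
import statistics

# Made-up values for illustration; 120 is deliberately extreme.
values = [12, 15, 18, 20, 22, 25, 27, 30, 35, 38, 40, 42, 45, 50, 120]

q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Anything outside the fences is flagged for a closer look, not automatically discarded.
outliers = [v for v in values if v < lower or v > upper]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}")   # Q1=21.0, Q3=41.0, IQR=20.0
print(f"Fences: [{lower}, {upper}]")    # Fences: [-9.0, 71.0]
print(f"Outliers: {outliers}")          # Outliers: [120]
```

Because the quartiles ignore how far the extreme value lies from the rest, the fences stay put even if 120 is replaced by 12,000. That robustness is the main reason to prefer the IQR over the standard deviation for this job.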

Percentile Rank

score = 88
rank = stats.percentileofscore(exam, score, kind='weak')
print(f"Score of {score} is at the {rank:.1f}th percentile")
print(f"{rank:.1f}% of students scored at or below {score}")
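The kind argument matters when the score is tied with other values. The sketch below re-implements three of the conventions by hand on a small made-up list, so the difference is visible; percentile_rank is an illustrative helper, and the kind names are borrowed from scipy.stats.percentileofscore.

```python
def percentile_rank(values, score, kind="weak"):
    """Percentile rank of `score` within `values` under three tie conventions."""
    n = len(values)
    below = sum(v < score for v in values)
    equal = sum(v == score for v in values)
    if kind == "strict":      # only values strictly below count
        count = below
    elif kind == "weak":      # values at or below count
        count = below + equal
    elif kind == "mean":      # average of the two: ties count as half
        count = below + equal / 2
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return 100 * count / n

scores = [60, 70, 75, 80, 80, 80, 85, 88, 90, 95]
for kind in ["strict", "weak", "mean"]:
    print(f"{kind:>6}: {percentile_rank(scores, 80, kind):.1f}")
# strict: 30.0
#   weak: 60.0
#   mean: 45.0
```

With three students tied at 80, the same score lands anywhere from the 30th to the 60th percentile, so a reported percentile rank should state which convention it uses.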

Deciles

deciles = np.percentile(exam, range(10, 100, 10))
for i, val in enumerate(deciles, 1):
    print(f"D{i} ({i*10}th pct): {val:.1f}")
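Decile cut points become useful once each observation is assigned to its group. The sketch below does this with the standard library: statistics.quantiles produces the nine cut points and bisect finds which group a score belongs to. The simulated scores and the decile_group helper are illustrative assumptions, not part of the example above.

```python
import bisect
import random
import statistics
from collections import Counter

random.seed(42)
scores = [random.gauss(75, 12) for _ in range(200)]  # simulated exam scores

# Nine cut points (D1..D9) split the data into ten groups.
cuts = statistics.quantiles(scores, n=10, method="inclusive")

def decile_group(value):
    """Return 1 for the bottom 10% of scores, up to 10 for the top 10%."""
    return bisect.bisect_right(cuts, value) + 1

counts = Counter(decile_group(s) for s in scores)
print([counts[g] for g in range(1, 11)])
```

With 200 distinct scores, each group should hold 20 of them, which is what "ten equal groups" means in practice.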

Percentiles in Machine Learning

ML Application	Percentile Usage	Why
Quantile regression	Predict percentiles, not mean	Robust to skewed targets
Feature binning	Cut into quantile bins	Discretize continuous features
Performance metrics	P95 latency, P99 response time	SLA monitoring
Data preprocessing	Clip outliers at percentiles	Robust scaling
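The quantile regression row can be made concrete with the loss it minimises. Under the usual pinball (quantile) loss at level tau, under-predictions cost tau per unit and over-predictions cost (1 - tau) per unit, so the best constant prediction is the tau-th quantile rather than the mean. A minimal sketch with a brute-force search over candidate constants; pinball_loss is an illustrative helper, not a library call.

```python
def pinball_loss(y_true, prediction, tau):
    """Average pinball loss of a single constant prediction at quantile level tau."""
    total = 0.0
    for y in y_true:
        diff = y - prediction
        # Under-prediction (diff >= 0) costs tau per unit; over-prediction costs 1 - tau.
        total += tau * diff if diff >= 0 else (tau - 1) * diff
    return total / len(y_true)

y = list(range(1, 101))   # targets 1..100
tau = 0.9

# Try every observed value as the constant prediction and keep the cheapest.
best = min(y, key=lambda q: pinball_loss(y, q, tau))
print(f"Best constant for tau={tau}: {best}")
```

The search settles at the boundary of the top 10% of the targets (90 or 91 here, which tie up to rounding), far from the mean of 50.5. A quantile regression model applies the same loss with predictions that depend on the features.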

import numpy as np
from sklearn.preprocessing import QuantileTransformer

np.random.seed(42)

# Quantile binning for feature engineering
data = np.random.lognormal(3, 1, 1000)
bins = np.percentile(data, [0, 25, 50, 75, 100])  # min, Q1, median, Q3, max
binned = np.digitize(data, bins[1:-1])  # interior edges only, giving bin indices 0-3
print(f"Quantile bin edges: {bins.round(1)}")
print(f"Counts per bin: {np.bincount(binned)}")

# QuantileTransformer for normality
qt = QuantileTransformer(n_quantiles=100, output_distribution='normal')
transformed = qt.fit_transform(data.reshape(-1,1)).flatten()
print(f"\nOriginal skewness: {float(np.mean(((data-data.mean())/data.std())**3)):.3f}")
print(f"Transformed skewness: {float(np.mean(((transformed-transformed.mean())/transformed.std())**3)):.3f}")
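The remaining two rows of the table, tail-latency metrics and percentile clipping, follow the same pattern. The sketch below uses simulated log-normal latencies (an assumption made for illustration) and the standard library's statistics.quantiles, where n=100 returns the 99 percentile cut points P1 through P99.

```python
import random
import statistics

random.seed(0)
latencies_ms = [random.lognormvariate(3, 1) for _ in range(1000)]  # simulated, right-skewed

cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
p01, p50, p95, p99 = cuts[0], cuts[49], cuts[94], cuts[98]  # cuts[k - 1] is the kth percentile

print(f"P50: {p50:.1f} ms  P95: {p95:.1f} ms  P99: {p99:.1f} ms")
print(f"Mean: {statistics.mean(latencies_ms):.1f} ms")

# Clip (winsorize) to the 1st-99th percentile range before scaling or modelling.
clipped = [min(max(x, p01), p99) for x in latencies_ms]
print(f"Max before: {max(latencies_ms):.1f}  Max after: {max(clipped):.1f}")
```

For skewed data like this, the mean sits well above the median, so P95 and P99 describe the slow requests more directly than any average can. Clipping keeps every row but caps the influence of the extremes.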
