Range and IQR

The Simplest Measures of How Spread Out Your Data Is

Measures of spread tell us how scattered the data is. Range and IQR are the simplest measures — they use only specific order statistics.

Range: the difference between max and min; simple but highly sensitive to outliers
IQR: the width of the middle 50% of the data; robust to outliers and reliable for skewed distributions
Outlier detection: the 1.5 × IQR rule flags values that lie unusually far from the quartiles
Box plot foundation: the IQR defines the box in a standard box plot

Spread matters as much as center. Two datasets with the same mean can behave very differently.
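To make this concrete, here is a minimal sketch with two made-up datasets (`tight` and `wide` are illustrative names, not real data) that share a mean of 50 but differ greatly in range and IQR. The helper `spread_summary` is hypothetical.

```python
import numpy as np

# Two illustrative datasets, both centered on 50
tight = np.array([48, 49, 50, 51, 52])
wide = np.array([0, 25, 50, 75, 100])

def spread_summary(values):
    """Return (mean, range, IQR) using NumPy's default percentile interpolation."""
    q1, q3 = np.percentile(values, [25, 75])
    return values.mean(), values.max() - values.min(), q3 - q1

for name, values in [("tight", tight), ("wide", wide)]:
    mean, value_range, iqr = spread_summary(values)
    print(f"{name}: mean={mean:.1f}, range={value_range}, IQR={iqr:.1f}")
```

The means are identical, yet the second dataset's range and IQR are 25 times larger, which is exactly the information a measure of center cannot convey.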

What are Range and IQR?

Definition

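Measures of spread (dispersion) quantify how far the values in a dataset lie from one another. The range is the maximum minus the minimum, and the IQR is Q3 minus Q1, where Q1 and Q3 are the 25th and 75th percentiles. Both are computed from order statistics only, that is, from values picked out of the sorted data.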

Range

The range is the largest value minus the smallest. It is simple to compute but highly sensitive to outliers: a single extreme value can change it completely.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)
data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)

print(f"Data: {np.sort(data).tolist()}")  # .tolist() prints plain Python ints
print(f"Range = {data.max()} - {data.min()} = {data.max() - data.min()}")
print(f"\nWith outlier (100 added):")
print(f"Range = {data_with_outlier.max()} - {data_with_outlier.min()} = {data_with_outlier.max() - data_with_outlier.min()}")
print("Range grew ninefold (10 to 90) due to one outlier!")

Interquartile Range (IQR)

The IQR is Q3 minus Q1, the width of the middle 50% of the data. Because it ignores the lowest and highest quarters of the values, it is robust to outliers.

# Computing quartiles and IQR
def five_number_summary(data):
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    
    print(f"Min:    {data.min():.2f}")
    print(f"Q1:     {q1:.2f}")
    print(f"Median: {q2:.2f}")
    print(f"Q3:     {q3:.2f}")
    print(f"Max:    {data.max():.2f}")
    print(f"IQR:    {iqr:.2f}")
    print(f"Lower fence (Q1 - 1.5×IQR): {lower_fence:.2f}")
    print(f"Upper fence (Q3 + 1.5×IQR): {upper_fence:.2f}")
    return q1, q2, q3, iqr

print("=== Normal data ===")
five_number_summary(data)
print("\n=== Data with outlier ===")
five_number_summary(data_with_outlier)
print("IQR barely changed — robust!")
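Quartiles are not uniquely defined for small samples, so different tools can report slightly different IQRs for the same data. The sketch below uses Python's standard-library `statistics.quantiles` to show two common conventions on the ten-value dataset from above; the variable names are illustrative.

```python
from statistics import quantiles

data = [12, 15, 14, 10, 18, 20, 16, 11, 13, 17]

# 'inclusive' treats the data as the whole population; for this data it
# gives the same quartiles as np.percentile's default linear interpolation.
q1_inc, _, q3_inc = quantiles(data, n=4, method="inclusive")

# 'exclusive' (the statistics-module default) treats the data as a sample
# and places the quartiles slightly further out.
q1_exc, _, q3_exc = quantiles(data, n=4, method="exclusive")

print(f"inclusive: Q1={q1_inc}, Q3={q3_inc}, IQR={q3_inc - q1_inc}")
print(f"exclusive: Q1={q1_exc}, Q3={q3_exc}, IQR={q3_exc - q1_exc}")
```

When comparing IQRs across libraries or reports, it helps to check which quartile convention each one uses.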

Visualizing Range and IQR

# Two datasets with a similar center (about 50) but very different range and IQR
np.random.seed(0)
dataset_a = np.random.uniform(0, 100, 200)  # Uniform: large IQR
dataset_b = np.random.normal(50, 10, 200)   # Normal: smaller IQR

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

for ax, data, label, color in zip(axes, 
                                    [dataset_a, dataset_b], 
                                    ['Uniform', 'Normal'], 
                                    ['steelblue', 'coral']):
    ax.hist(data, bins=30, color=color, edgecolor='black', alpha=0.7, density=True)
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    ax.axvline(data.min(), color='gray', linestyle=':', label=f'Min={data.min():.0f}')
    ax.axvline(q1, color='blue', linestyle='--', label=f'Q1={q1:.0f}')
    ax.axvline(q2, color='red', linestyle='-', linewidth=2, label=f'Median={q2:.0f}')
    ax.axvline(q3, color='blue', linestyle='--', label=f'Q3={q3:.0f}')
    ax.axvline(data.max(), color='gray', linestyle=':', label=f'Max={data.max():.0f}')
    # Shade the IQR (Q1 to Q3) across the full height of the axes
    ax.axvspan(q1, q3, alpha=0.2, color='yellow', label=f'IQR={q3-q1:.0f}')
    ax.set_title(f'{label} Distribution\nRange={data.max()-data.min():.0f}, IQR={q3-q1:.0f}')
    ax.legend(fontsize=7)

plt.tight_layout()
plt.savefig('range_iqr.png', dpi=150)
plt.show()
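A box plot is drawn from the same quantities. The sketch below computes them numerically, assuming the common Tukey convention (also Matplotlib's default, `whis=1.5`) in which the whiskers extend to the most extreme data points that still lie inside the 1.5 × IQR fences. The helper name `box_plot_stats` is hypothetical.

```python
import numpy as np

def box_plot_stats(values):
    """Compute the numbers a Tukey-style box plot is drawn from."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    outside = values[(values < lower_fence) | (values > upper_fence)]
    return {
        "box": (q1, q3),                           # edges of the box
        "median": median,                          # line inside the box
        "whiskers": (inside.min(), inside.max()),  # most extreme non-outliers
        "fliers": np.sort(outside),                # points drawn individually
    }

stats = box_plot_stats([12, 15, 14, 10, 18, 20, 16, 11, 13, 17, 100])
for key, value in stats.items():
    print(f"{key}: {value}")
```

Here the box spans Q1 to Q3, the whiskers stop at 10 and 20, and the value 100 is drawn as a separate point.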

Comparing Spread Measures

Measure	Formula	Breakdown Point	Outlier Sensitivity
Range	Max - Min	0%	Very sensitive
IQR	Q3 - Q1	25%	Robust
Std Dev	√(Σ(xᵢ-x̄)²/(n-1))	0%	Sensitive
MAD	Median(|xᵢ - Median(x)|)	50%	Very robust
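The difference in robustness between these measures can be checked directly. The following sketch computes all four on the ten-value dataset used earlier, with and without the outlier of 100; the helper name `spread_measures` is illustrative, and MAD is computed by hand as the median absolute deviation from the median.

```python
import numpy as np

def spread_measures(values):
    """Return range, IQR, sample standard deviation, and MAD for 1-D data."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    median = np.median(values)
    return {
        "range": values.max() - values.min(),
        "iqr": q3 - q1,
        "std": values.std(ddof=1),  # sample standard deviation (n - 1)
        "mad": np.median(np.abs(values - median)),
    }

clean = [12, 15, 14, 10, 18, 20, 16, 11, 13, 17]
contaminated = clean + [100]

before, after = spread_measures(clean), spread_measures(contaminated)
for name in before:
    ratio = after[name] / before[name]
    print(f"{name:>5}: {before[name]:6.2f} -> {after[name]:6.2f} (x{ratio:.1f})")
```

The range and standard deviation grow several-fold when the outlier is added, while the IQR and MAD move only slightly.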

Range and IQR in Machine Learning

ML Application	Range/IQR Usage	Why
Outlier detection	IQR fence = Q1-1.5×IQR to Q3+1.5×IQR	Robust to skewed data
Feature selection	Zero/near-zero range → remove feature	No information content
Min-Max normalization	Scale to [0,1] using range	Puts features on a comparable scale, which helps gradient-based training
Box plots	IQR defines the box	Visual model diagnostics
Anomaly detection	IQR-based thresholds	Production data monitoring

import numpy as np
from sklearn.preprocessing import MinMaxScaler

np.random.seed(42)

# IQR-based outlier detection
data = np.concatenate([np.random.normal(50, 10, 100), [200, -50]])
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5*iqr, q3 + 1.5*iqr
outliers = data[(data < lower) | (data > upper)]
print(f"IQR: {iqr:.2f}, Fences: [{lower:.2f}, {upper:.2f}]")
print(f"Outliers detected: {len(outliers)} ({outliers})")

# Min-Max normalization using range
data_features = np.random.randn(100, 3) * [10, 1, 100]  # very different ranges
scaler = MinMaxScaler()
normalized = scaler.fit_transform(data_features)
print(f"\nOriginal ranges: {[f'{d.max()-d.min():.1f}' for d in data_features.T]}")
print(f"Normalized ranges: {[f'{d.max()-d.min():.3f}' for d in normalized.T]}")
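The feature-selection row of the table can be sketched the same way. The example below drops columns whose range falls below a tolerance; the feature matrix, the `tol` value, and the helper name `drop_low_range_features` are all illustrative assumptions. (scikit-learn's `VarianceThreshold` applies a similar idea using variance rather than range.)

```python
import numpy as np

def drop_low_range_features(X, tol=1e-8):
    """Keep only the columns of X whose range (max - min) exceeds tol."""
    X = np.asarray(X, dtype=float)
    ranges = X.max(axis=0) - X.min(axis=0)
    keep = ranges > tol
    return X[:, keep], keep

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.normal(size=50),          # informative feature
    np.full(50, 3.0),             # constant feature: range is exactly 0
    rng.uniform(0, 10, size=50),  # informative feature
])

X_reduced, kept = drop_low_range_features(X)
print(f"Kept columns: {kept.tolist()}")
print(f"Shape before: {X.shape}, after: {X_reduced.shape}")
```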
