Range and Interquartile Range (IQR)
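Measures of spread (dispersion) describe how scattered the data is. Range and IQR are the simplest such measures because each depends on only two order statistics: the minimum and maximum for the range, and the first and third quartiles for the IQR.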
Range
Range = Max - Min. It is simple but highly sensitive to outliers: a single extreme value can change it completely.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(42)

# Ten well-behaved values, then the same values plus one extreme point
data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)

print(f"Data: {np.sort(data)}")
print(f"Range = {data.max()} - {data.min()} = {data.max() - data.min()}")

print("\nWith outlier (100 added):")
print(f"Range = {data_with_outlier.max()} - {data_with_outlier.min()} = {data_with_outlier.max() - data_with_outlier.min()}")
print("Range grew ninefold (10 to 90) because of one outlier!")
```
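NumPy also provides `np.ptp` ("peak to peak"), which returns max minus min in one call. The sketch below reuses the same ten values and also applies the idea per column of a small pandas DataFrame; the DataFrame and its columns `a` and `b` are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)

# np.ptp ("peak to peak") computes max - min directly
range_clean = np.ptp(data)
range_outlier = np.ptp(data_with_outlier)
print(range_clean, range_outlier)  # 10 90

# Per-column range of a DataFrame (hypothetical columns 'a' and 'b')
df = pd.DataFrame({"a": [1, 4, 9], "b": [2.0, 2.5, 3.0]})
col_ranges = df.max() - df.min()
print(col_ranges)
```

Writing `df.max() - df.min()` keeps the per-column computation explicit and works on any numeric DataFrame.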
Interquartile Range (IQR)
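The range of the middle 50% of the data: IQR = Q3 - Q1, where Q1 and Q3 are the 25th and 75th percentiles. Because it ignores the top and bottom quarters of the data, it is robust to outliers.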
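For a finite sample, "the 25th percentile" needs an interpolation rule. NumPy's `np.percentile` defaults to `method="linear"`, which places the p-th quantile at the fractional position h = (n - 1)·p in the sorted data and interpolates between the two neighbouring values. A minimal hand-rolled sketch of that rule (the helper name `linear_quantile` is hypothetical), checked against NumPy on the same ten values:

```python
import math
import numpy as np

def linear_quantile(values, p):
    """Quantile via linear interpolation at position (n - 1) * p, for 0 <= p <= 1."""
    xs = sorted(values)
    h = (len(xs) - 1) * p           # fractional index into the sorted data
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)   # clamp so p = 1 stays in bounds
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

data = [12, 15, 14, 10, 18, 20, 16, 11, 13, 17]
q1 = linear_quantile(data, 0.25)  # h = 2.25 -> 12 + 0.25 * (13 - 12)
q3 = linear_quantile(data, 0.75)  # h = 6.75 -> 16 + 0.75 * (17 - 16)
print(q1, q3, q3 - q1)  # 12.25 16.75 4.5

# NumPy's default method gives the same quartiles
print(np.percentile(data, [25, 75]))
```

Other `method` values, such as `"midpoint"` or `"nearest"`, can give slightly different quartiles on small samples, so IQRs reported by different tools may not match exactly.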
```python
# Computing quartiles and IQR
def five_number_summary(data):
    """Print the five-number summary plus Tukey's 1.5×IQR fences."""
    q1, q2, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    print(f"Min: {data.min():.2f}")
    print(f"Q1: {q1:.2f}")
    print(f"Median: {q2:.2f}")
    print(f"Q3: {q3:.2f}")
    print(f"Max: {data.max():.2f}")
    print(f"IQR: {iqr:.2f}")
    print(f"Lower fence (Q1 - 1.5×IQR): {lower_fence:.2f}")
    print(f"Upper fence (Q3 + 1.5×IQR): {upper_fence:.2f}")
    return q1, q2, q3, iqr

print("=== Clean data ===")
five_number_summary(data)
print("\n=== Data with outlier ===")
five_number_summary(data_with_outlier)
print("IQR barely changed (4.50 to 5.00): robust!")
```
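The fences printed above can be turned into a flag for individual points. A minimal sketch, assuming Tukey's convention that a point is an outlier when it lies strictly outside [Q1 - k·IQR, Q3 + k·IQR]; the function name `tukey_outliers` and its parameter `k` are hypothetical:

```python
import numpy as np

def tukey_outliers(values, k=1.5):
    """Return a boolean mask marking points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)

print(data[tukey_outliers(data)])                            # []
print(data_with_outlier[tukey_outliers(data_with_outlier)])  # [100]
```

With the outlier present, the fences are 12.5 - 7.5 = 5 and 17.5 + 7.5 = 25, so only 100 is flagged. Raising `k` (for example to 3) flags only more extreme points.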
Visualizing Range and IQR
```python
# Two datasets centred near 50 but with very different spread
np.random.seed(0)
dataset_a = np.random.uniform(0, 100, 200)  # Uniform: large IQR
dataset_b = np.random.normal(50, 10, 200)   # Normal: smaller IQR

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, values, label, color in zip(axes,
                                    [dataset_a, dataset_b],
                                    ['Uniform', 'Normal'],
                                    ['steelblue', 'coral']):
    ax.hist(values, bins=30, color=color, edgecolor='black', alpha=0.7, density=True)
    q1, q2, q3 = np.percentile(values, [25, 50, 75])
    # Extremes (dotted), quartiles (dashed), median (solid)
    ax.axvline(values.min(), color='gray', linestyle=':', label=f'Min={values.min():.0f}')
    ax.axvline(q1, color='blue', linestyle='--', label=f'Q1={q1:.0f}')
    ax.axvline(q2, color='red', linestyle='-', linewidth=2, label=f'Median={q2:.0f}')
    ax.axvline(q3, color='blue', linestyle='--', label=f'Q3={q3:.0f}')
    ax.axvline(values.max(), color='gray', linestyle=':', label=f'Max={values.max():.0f}')
    # Shade the middle 50% of the data across the full plot height
    ax.axvspan(q1, q3, alpha=0.2, color='yellow', label=f'IQR={q3 - q1:.0f}')
    ax.set_title(f'{label} Distribution\nRange={values.max() - values.min():.0f}, IQR={q3 - q1:.0f}')
    ax.legend(fontsize=7)

plt.tight_layout()
plt.savefig('range_iqr.png', dpi=150)
plt.show()
```
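The histograms come from random samples, so their IQRs vary from run to run. The population values they estimate can be computed directly. A minimal sketch using only the standard library's `statistics.NormalDist`, with parameters matching the `np.random.uniform(0, 100, ...)` and `np.random.normal(50, 10, ...)` calls above:

```python
from statistics import NormalDist

# Uniform(0, 100): quartiles sit at 25 and 75, so the IQR is 50
uniform_iqr = 75 - 25

# Normal(mean=50, sd=10): IQR = inv_cdf(0.75) - inv_cdf(0.25), about 1.349 * sd
normal = NormalDist(mu=50, sigma=10)
normal_iqr = normal.inv_cdf(0.75) - normal.inv_cdf(0.25)

print(uniform_iqr)           # 50
print(round(normal_iqr, 2))  # 13.49
```

The sample IQRs shown in the plot should land near these values. The normal IQR of roughly 1.349σ is also why IQR / 1.349 is sometimes used as a robust estimate of σ.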
Comparing Spread Measures
| Measure | Formula | Breakdown Point | Sensitivity to Outliers |
|---|---|---|---|
| Range | Max - Min | 0% | Very sensitive |
| IQR | Q3 - Q1 | 25% | Robust |
| Std Dev | √(Σ(xᵢ - x̄)²/(n - 1)) | 0% | Sensitive |
| MAD | Median(\|xᵢ - Median\|) | 50% | Very robust |
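The breakdown points in the table can be seen on the ten-value example from earlier. A minimal sketch computing all four measures with NumPy (the helper `spread_measures` is hypothetical). Note that `np.std` defaults to `ddof=0`, so `ddof=1` is passed to match the (n - 1) formula in the table, and MAD here is the raw, unscaled median absolute deviation:

```python
import numpy as np

def spread_measures(values):
    """Return range, IQR, sample standard deviation, and MAD of a 1-D array."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    median = np.median(values)
    return {
        "range": values.max() - values.min(),
        "iqr": q3 - q1,
        "std": values.std(ddof=1),  # ddof=1 gives the (n - 1) denominator
        "mad": np.median(np.abs(values - median)),
    }

data = np.array([12, 15, 14, 10, 18, 20, 16, 11, 13, 17])
data_with_outlier = np.append(data, 100)

clean = spread_measures(data)
dirty = spread_measures(data_with_outlier)
for name in clean:
    print(f"{name:>5}: {clean[name]:6.2f} -> {dirty[name]:6.2f}")
```

Adding the single value 100 moves the range from 10 to 90 and the standard deviation from about 3.2 to about 25.9, while the IQR only moves from 4.5 to 5.0 and the MAD from 2.5 to 3.0.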
Key Takeaways
- Range is simple but unreliable when outliers are present: a single bad data point can dominate it
- IQR is a robust, easy-to-interpret spread measure covering the middle 50% of the data (MAD is more robust still, with a 50% breakdown point)
- The 1.5×IQR rule for outlier detection is built into most box plot implementations
- For symmetric data without outliers, standard deviation is more informative than IQR
- For skewed data or data with outliers, report IQR instead of (or alongside) standard deviation
- IQR = 0 means at least the middle half of the data shares a single value, which is common in count data with many zeros
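The IQR = 0 case is easy to reproduce. The sketch below uses a made-up sample of counts in which eight of ten values are zero; the data is hypothetical and chosen only to illustrate the point:

```python
import numpy as np

# Hypothetical count data: most observations are zero
counts = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 3])

q1, q3 = np.percentile(counts, [25, 75])
print(q1, q3, q3 - q1)     # 0.0 0.0 0.0
print(counts.std(ddof=1))  # still positive: the nonzero tail carries spread
```

Both quartile positions (2.25 and 6.75 in the sorted array) fall among the zeros, so the IQR is 0 even though the data is not constant. Reporting a second measure alongside the IQR avoids hiding that tail.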