Measures of Central Tendency
A measure of central tendency describes where the "center" of a distribution is located. Different measures capture different notions of center, and choosing the right one depends on your data and question.
The Three Main Measures
Arithmetic Mean (x̄)
The sum of all values divided by the number of values: x̄ = (Σxᵢ) / n. It is the most commonly used measure.
Properties:
- Uses every data point, so it is sensitive to outliers
- The only value c for which the deviations sum to zero: Σ(xᵢ - c) = 0 exactly when c = x̄
- Minimizes the sum of squared deviations
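The two algebraic properties above can be checked numerically. The following is a minimal sketch using only the standard library; the sample values and the grid of candidate centers are made up for illustration.

```python
from statistics import mean

values = [2, 4, 4, 5, 10]      # hypothetical sample
x_bar = mean(values)           # 25 / 5 = 5

# Deviations from the mean cancel out exactly.
deviation_sum = sum(x - x_bar for x in values)

def sum_squared_dev(c):
    """Sum of squared deviations of the sample from a candidate center c."""
    return sum((x - c) ** 2 for x in values)

# Scan candidate centers 0.0, 0.1, ..., 12.0 and keep the one with the
# smallest sum of squared deviations.
candidates = [i / 10 for i in range(0, 121)]
best = min(candidates, key=sum_squared_dev)

print(f"mean = {x_bar}")
print(f"sum of deviations = {deviation_sum}")
print(f"center minimizing squared deviations = {best}")
```

The grid search lands on the mean because the sum of squared deviations is a parabola in c with its vertex at x̄.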
Median (M)
The middle value when data is sorted. For even n, average the two middle values.
Properties:
- Robust to outliers
- Appropriate for ordinal and continuous data
- Minimizes the sum of absolute deviations
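As a rough illustration of these properties, the sketch below uses a small made-up sample to check that the median minimizes the sum of absolute deviations, and that replacing one value with an extreme one moves the mean but not the median. The sample and the candidate grid are assumptions chosen for readability.

```python
from statistics import mean, median

values = [3, 5, 7, 9, 11]              # hypothetical sample, median = 7
with_outlier = values[:-1] + [1000]    # swap the largest value for an extreme one

def sum_abs_dev(data, c):
    """Sum of absolute deviations of data from a candidate center c."""
    return sum(abs(x - c) for x in data)

# Scan candidate centers 0.0, 0.5, ..., 20.0.
candidates = [i / 2 for i in range(0, 41)]
best = min(candidates, key=lambda c: sum_abs_dev(values, c))

print(f"center minimizing absolute deviations = {best}")
print(f"original:     mean = {mean(values)}, median = {median(values)}")
print(f"with outlier: mean = {mean(with_outlier)}, median = {median(with_outlier)}")
```

With an odd number of points the minimizer is unique; with an even number, every value between the two middle points minimizes the sum equally well.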
Mode
The most frequently occurring value(s). A distribution can be unimodal, bimodal, or multimodal.
Properties:
- Only measure appropriate for nominal data
- Can have multiple modes, or no meaningful mode when every value is distinct (common in continuous data unless it is binned or rounded)
- Not affected by extreme values
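A short sketch of the mode on nominal data, using `statistics.multimode` from the standard library (Python 3.8+). The category lists are invented for illustration.

```python
from statistics import multimode

blood_types = ["O", "A", "O", "B", "A", "AB", "O"]   # one clear winner
shirt_sizes = ["S", "M", "L", "M", "L"]             # two values tie
measurements = [1.27, 3.14, 2.72, 1.62]             # every value is distinct

unimodal = multimode(blood_types)     # ['O']
bimodal = multimode(shirt_sizes)      # ['M', 'L']
no_mode = multimode(measurements)     # all four values, each seen once

print(unimodal)
print(bimodal)
print(no_mode)
```

When every value occurs exactly once, `multimode` returns the whole sample, which is a signal that the mode carries no information for that data.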
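import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

def describe_central(data, label):
    """Print the mean, median, and (rounded) mode of a 1-D sample."""
    mean = np.mean(data)
    median = np.median(data)
    # Raw continuous values are almost all unique, so their mode is meaningless;
    # round to whole numbers first so the mode reflects the most common value.
    mode_result = stats.mode(np.round(data), keepdims=True)
    mode = mode_result.mode[0]
    # The sign of (mean - median) is only a rough skew heuristic. Gaps under
    # 10% of the standard deviation (an arbitrary tolerance) count as symmetric.
    diff = mean - median
    if abs(diff) < 0.1 * np.std(data):
        shape = "(roughly symmetric)"
    else:
        shape = "(right-skewed)" if diff > 0 else "(left-skewed)"
    print(f"\n{label}:")
    print(f"  Mean   = {mean:.2f}")
    print(f"  Median = {median:.2f}")
    print(f"  Mode   = {mode:.2f} (after rounding to whole numbers)")
    print(f"  Mean - Median = {diff:.2f} {shape}")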
np.random.seed(42)
# Symmetric distribution
sym = np.random.normal(50, 10, 1000)
describe_central(sym, "Symmetric (Normal)")
# Right-skewed (income-like)
right_skew = np.random.lognormal(3.5, 0.8, 1000)
describe_central(right_skew, "Right-Skewed (Income)")
# Left-skewed
left_skew = 100 - np.random.exponential(10, 1000)
describe_central(left_skew, "Left-Skewed")
# With outlier
with_outlier = np.concatenate([np.random.normal(50, 5, 99), [500]])
describe_central(with_outlier, "Data with One Extreme Outlier (n=100)")
Effect of Skewness on Mean vs Median
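For unimodal distributions, the three measures typically fall in this order (a rule of thumb, not a guarantee):
- Right-skewed: Mode < Median < Mean
- Symmetric: Mode ≈ Median ≈ Mean
- Left-skewed: Mean < Median < Mode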
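The right-skewed ordering can be verified exactly for the lognormal distribution, whose mode, median, and mean have closed forms: exp(μ - σ²), exp(μ), and exp(μ + σ²/2). The sketch below plugs in μ = 3.5 and σ = 0.8, the same parameters as the right-skewed sample earlier; the variable names are illustrative.

```python
import math

mu, sigma = 3.5, 0.8   # parameters of the underlying normal distribution

# Closed-form centers of a lognormal(mu, sigma) distribution.
ln_mode = math.exp(mu - sigma ** 2)
ln_median = math.exp(mu)
ln_mean = math.exp(mu + sigma ** 2 / 2)

print(f"mode   = {ln_mode:.2f}")
print(f"median = {ln_median:.2f}")
print(f"mean   = {ln_mean:.2f}")
print("mode < median < mean:", ln_mode < ln_median < ln_mean)
```

Because σ² is positive, the ordering holds for every lognormal distribution, and the gaps widen as σ grows.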
# One histogram per distribution shape, with mean and median marked
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
datasets = {
    'Right-Skewed': np.random.lognormal(3, 0.8, 2000),
    'Symmetric': np.random.normal(50, 10, 2000),
    'Left-Skewed': 100 - np.random.exponential(10, 2000)
}
for ax, (title, data) in zip(axes, datasets.items()):
    ax.hist(data, bins=40, density=True, color='lightblue', edgecolor='gray', alpha=0.7)
    mean_val = np.mean(data)
    median_val = np.median(data)
    ax.axvline(mean_val, color='red', lw=2, linestyle='--', label=f'Mean={mean_val:.1f}')
    ax.axvline(median_val, color='blue', lw=2, linestyle='-', label=f'Median={median_val:.1f}')
    ax.set_title(title)
    ax.legend()
plt.tight_layout()
plt.savefig('central_tendency.png', dpi=150)
plt.show()
When to Use Which Measure
| Situation | Best Measure | Why |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Most efficient, uses all data |
| Skewed distribution | Median | Robust, not pulled by the tail |
| Outliers present | Median | Mean is distorted |
| Nominal (categorical) data | Mode | Mean/median meaningless |
| Ordinal data | Median | Can rank, but intervals unknown |
| Reporting income, housing prices | Median | Right-skewed distributions |
| Reporting temperature, height | Mean | Approximately normal |
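For the ordinal row of the table, one possible approach is to map each category to its rank and take a median that is always an actual data value, so ranks never need averaging. This sketch assumes a made-up five-point agreement scale and uses `statistics.median_low`.

```python
from statistics import median_low

# Hypothetical ordered scale, lowest to highest.
scale = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]
rank = {label: i for i, label in enumerate(scale)}

responses = ["agree", "neutral", "strongly agree", "agree", "disagree", "agree"]

# median_low returns the lower of the two middle values when n is even,
# so the result always maps back to a real category.
median_rank = median_low(rank[r] for r in responses)
median_response = scale[median_rank]

print(f"median response: {median_response}")
```

Averaging the ranks instead would assume the categories are equally spaced, which is exactly what ordinal data does not guarantee.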
A Famous Example: Income
The mean household income in the US is significantly higher than the median household income because a small number of very high earners pull the mean up. The median better represents the typical household.
# Simulate a US-like income distribution (parameters are illustrative)
np.random.seed(0)
incomes = np.concatenate([
    np.random.lognormal(10.8, 0.7, 9990),  # Most households
    np.random.lognormal(13.5, 1.0, 10)     # Very wealthy (top 0.1%)
])
mean_income = np.mean(incomes)
median_income = np.median(incomes)
print(f"Mean income:   ${mean_income:>12,.0f}")
print(f"Median income: ${median_income:>12,.0f}")
# The gap comes from the right skew of the whole distribution, not only the top 0.1%
print(f"The right tail pulls the mean ${mean_income - median_income:,.0f} above the median")
Key Takeaways
- Mean is best for symmetric data without outliers because it uses all the information in the sample
- Median is robust: prefer it when data is skewed or has outliers
- Mode is for categorical data and identifying the most common value
- In right-skewed data: Mean > Median (the tail pulls the mean right)
- In left-skewed data: Mean < Median
- Always report the appropriate measure for your data type and distribution shape