Measures of Central Tendency — Mean, Median, Mode Compared

Measures of Central Tendency

A measure of central tendency describes where the "center" of a distribution is located. Different measures capture different notions of center, and choosing the right one depends on your data and question.


The Three Main Measures

Arithmetic Mean (x̄)

The sum of all values divided by the number of values, n. It is the most commonly used measure.

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Properties:

  • Uses every data point, so it is sensitive to outliers
  • Deviations from the mean sum to zero, Σ(xᵢ − x̄) = 0, and no other value has this property
  • Minimizes the sum of squared deviations
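A small numerical check of these properties, using an arbitrary five-value sample (the values are illustrative only). It confirms that deviations from x̄ sum to zero and that a grid search over candidate centers c finds the smallest sum of squared deviations at c = x̄:

```python
import numpy as np

# Arbitrary illustrative sample (not real data); sum = 30, n = 5
x = np.array([2.0, 3.0, 5.0, 7.0, 13.0])
x_bar = x.mean()  # 6.0

# Deviations from the mean sum to zero
dev_sum = np.sum(x - x_bar)

# Sum of squared deviations SSD(c) = sum((x - c)^2) over a grid of centers
candidates = np.linspace(0, 15, 1501)  # step 0.01
ssd = np.array([np.sum((x - c) ** 2) for c in candidates])
best_c = candidates[np.argmin(ssd)]

print(f"mean = {x_bar}")
print(f"sum of deviations = {dev_sum}")
print(f"c minimizing squared deviations = {best_c:.2f}")
```

The grid search is only a visual check; the result follows analytically because SSD(c) is a parabola in c whose minimum is at x̄.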

Median (M)

The middle value when data is sorted. For even n, average the two middle values.

Properties:

  • Robust to outliers
  • Appropriate for ordinal and continuous data
  • Minimizes the sum of absolute deviations
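A minimal sketch of the median's behavior on made-up values: the even-n averaging rule, its resistance to a single extreme value (compared with the mean), and a grid search showing that the sum of absolute deviations is smallest at the median:

```python
import numpy as np

# Even n: average of the two middle sorted values
x = np.array([7.0, 1.0, 3.0, 10.0])        # sorted: 1, 3, 7, 10
median_even = np.median(x)                 # (3 + 7) / 2 = 5.0

# Robustness: replace the largest value with an extreme outlier
x_out = np.array([7.0, 1.0, 3.0, 1000.0])
print(f"mean:   {np.mean(x):.2f} -> {np.mean(x_out):.2f}")      # 5.25 -> 252.75
print(f"median: {np.median(x):.2f} -> {np.median(x_out):.2f}")  # 5.00 -> 5.00

# The median minimizes SAD(c) = sum(|y - c|). An odd n is used because for
# even n every c between the two middle values gives the same minimum.
y = np.array([1.0, 2.0, 4.0, 9.0, 20.0])   # median = 4
candidates = np.linspace(0, 20, 2001)      # step 0.01
sad = np.array([np.sum(np.abs(y - c)) for c in candidates])
best_c = candidates[np.argmin(sad)]
print(f"c minimizing absolute deviations = {best_c:.2f}")  # 4.00
```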

Mode

The most frequently occurring value(s). A distribution can be unimodal, bimodal, or multimodal.

Properties:

  • Only measure appropriate for nominal data
  • Can have multiple modes; for continuous data, where values rarely repeat exactly, it is usually undefined or uninformative unless the data are binned
  • Not affected by extreme values
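A minimal sketch of these properties using Python's standard-library `statistics` module (`mode` and `multimode`, both available since Python 3.8). The survey answers, scores, and readings are made-up values:

```python
from statistics import mode, multimode

# Nominal data: only the mode is meaningful (hypothetical survey answers)
colors = ["blue", "red", "blue", "green", "blue", "red"]
print(mode(colors))            # blue (3 of 6 answers)

# Bimodal data: multimode returns every value tied for the highest count
scores = [2, 3, 3, 5, 7, 7, 9]
print(multimode(scores))       # [3, 7]

# Continuous readings rarely repeat, so every value ties as "most frequent"
readings = [1.02, 3.57, 2.41, 0.88]
print(multimode(readings))     # all four values: the mode is uninformative here
```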
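
The following script compares the three measures on simulated symmetric, right-skewed, left-skewed, and outlier-contaminated data:

```python
import numpy as np
import matplotlib.pyplot as plt

def describe_central(data, label):
    """Print mean, median, and an estimated mode, plus a simple skewness check."""
    mean = np.mean(data)
    median = np.median(data)

    # Continuous samples almost never repeat a value, so an exact mode
    # (e.g. scipy.stats.mode) would just return the smallest value.
    # Estimate the mode as the midpoint of the most populated histogram bin.
    counts, edges = np.histogram(data, bins=50)
    peak = np.argmax(counts)
    mode = (edges[peak] + edges[peak + 1]) / 2

    # Pearson's median skewness, 3 * (mean - median) / sd, puts the
    # mean-median gap on a common scale. The +/-0.2 cutoff is an
    # illustrative rule of thumb, not a standard threshold.
    skew = 3 * (mean - median) / np.std(data, ddof=1)
    if skew > 0.2:
        shape = "right-skewed"
    elif skew < -0.2:
        shape = "left-skewed"
    else:
        shape = "roughly symmetric"

    print(f"\n{label}:")
    print(f"  Mean          = {mean:.2f}")
    print(f"  Median        = {median:.2f}")
    print(f"  Mode (est.)   = {mode:.2f}")
    print(f"  Mean - Median = {mean - median:.2f} ({shape})")

np.random.seed(42)

# Symmetric distribution
sym = np.random.normal(50, 10, 1000)
describe_central(sym, "Symmetric (Normal)")

# Right-skewed (income-like)
right_skew = np.random.lognormal(3.5, 0.8, 1000)
describe_central(right_skew, "Right-Skewed (Income)")

# Left-skewed
left_skew = 100 - np.random.exponential(10, 1000)
describe_central(left_skew, "Left-Skewed")

# With outlier
with_outlier = np.concatenate([np.random.normal(50, 5, 99), [500]])
describe_central(with_outlier, "Data with One Extreme Outlier (n=100)")
```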

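Effect of Skewness on Mean vs Median

For a unimodal distribution, the typical ordering is shown below. It is a rule of thumb rather than a guarantee and can fail, particularly for discrete or multimodal data.

Right-Skewed:        Mode < Median < Mean
Symmetric:           Mode ≈ Median ≈ Mean
Left-Skewed:         Mean < Median < Mode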

The following code plots a histogram for each shape with the mean (dashed red) and median (solid blue) marked:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# One simulated sample for each distribution shape
datasets = {
    'Right-Skewed': np.random.lognormal(3, 0.8, 2000),
    'Symmetric': np.random.normal(50, 10, 2000),
    'Left-Skewed': 100 - np.random.exponential(10, 2000)
}

for ax, (title, data) in zip(axes, datasets.items()):
    # Density-normalized histogram of the sample
    ax.hist(data, bins=40, density=True, color='lightblue', edgecolor='gray', alpha=0.7)
    mean_val = np.mean(data)
    median_val = np.median(data)
    # Dashed red line = mean, solid blue line = median
    ax.axvline(mean_val, color='red', lw=2, linestyle='--', label=f'Mean={mean_val:.1f}')
    ax.axvline(median_val, color='blue', lw=2, linestyle='-', label=f'Median={median_val:.1f}')
    ax.set_title(title)
    ax.legend()

plt.tight_layout()
plt.savefig('central_tendency.png', dpi=150)
plt.show()
```
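The histograms use random samples. For these two skewed families the population mode, median, and mean have closed forms, which confirm the orderings in the diagram without sampling noise. This sketch reuses the plotting parameters (lognormal with μ = 3, σ = 0.8; 100 minus an exponential with scale 10) and relies on the standard results: lognormal mode = exp(μ − σ²), median = exp(μ), mean = exp(μ + σ²/2); exponential mode = 0, median = scale · ln 2, mean = scale.

```python
import math

# Lognormal(mu, sigma): population mode, median, mean
mu, sigma = 3.0, 0.8
ln_mode = math.exp(mu - sigma**2)        # about 10.59
ln_median = math.exp(mu)                 # about 20.09
ln_mean = math.exp(mu + sigma**2 / 2)    # about 27.66
print(f"Right-skewed: mode={ln_mode:.2f} < median={ln_median:.2f} < mean={ln_mean:.2f}")

# 100 - Exponential(scale): reflecting the exponential flips the ordering
scale = 10.0
ls_mode = 100.0                          # exponential mode is 0
ls_median = 100 - scale * math.log(2)    # about 93.07
ls_mean = 100 - scale                    # 90.00
print(f"Left-skewed:  mean={ls_mean:.2f} < median={ls_median:.2f} < mode={ls_mode:.2f}")
```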

When to Use Which Measure

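| Situation | Best measure | Why |
|---|---|---|
| Symmetric distribution, no outliers | Mean | Uses all the data; most efficient when the data are roughly normal |
| Skewed distribution | Median | Robust; not pulled toward the tail |
| Outliers present | Median | A few extreme values distort the mean |
| Nominal (categorical) data | Mode | Mean and median are undefined for unordered categories |
| Ordinal data | Median | Values can be ranked, but the intervals between them are not meaningful |
| Reporting income, housing prices | Median | These distributions are typically right-skewed |
| Reporting temperature, height | Mean | Approximately normal |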
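For ordinal data, averaging the two middle categories (as the even-n rule does for numbers) does not yield a valid category. One common convention, assumed here, is to report the lower or upper middle category. The sketch maps hypothetical 5-point Likert responses to ranks and uses the standard-library `statistics.median_low` and `statistics.median_high`:

```python
from statistics import median_low, median_high

# Ordered categories of a hypothetical 5-point Likert item
levels = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]
rank = {label: i for i, label in enumerate(levels)}

# Hypothetical responses (n = 6, so there are two middle values)
responses = ["Agree", "Neutral", "Strongly agree", "Disagree", "Agree", "Neutral"]

# Work on ranks: order is meaningful, distances between categories are not
ranks = [rank[r] for r in responses]      # sorted: 1, 2, 2, 3, 3, 4

# The numeric average of the middle ranks (2.5) is not a category,
# so report the lower or upper middle category instead
low = levels[median_low(ranks)]
high = levels[median_high(ranks)]
print(f"lower median: {low}, upper median: {high}")  # Neutral, Agree
```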

A Famous Example: Income

In the US, mean household income is substantially higher than median household income. Income is right-skewed: a relatively small number of very high earners pull the mean up, while the median does not depend on how large the top incomes are. The median therefore better represents the typical household.

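```python
import numpy as np

# Simulated right-skewed incomes; parameters are illustrative,
# not calibrated to actual US data
np.random.seed(0)
bulk = np.random.lognormal(10.8, 0.7, 9990)   # Most households
top = np.random.lognormal(13.5, 1.0, 10)      # Very high earners (0.1% of the sample)
incomes = np.concatenate([bulk, top])

mean_all = np.mean(incomes)
median_all = np.median(incomes)
print(f"Mean income:   ${mean_all:>12,.0f}")
print(f"Median income: ${median_all:>12,.0f}")
print(f"Mean - median: ${mean_all - median_all:>12,.0f}")

# The lognormal bulk is itself right-skewed, so much of the gap
# remains even after dropping the high-earner group
print(f"Mean without the high earners: ${np.mean(bulk):>12,.0f}")
```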

Key Takeaways

  1. Mean is best for symmetric data without outliers — it uses all information
  2. Median is robust: prefer it when data are skewed or contain outliers
  3. Mode is for categorical data and identifying the most common value
  4. In right-skewed data: Mean > Median (the tail pulls the mean right)
  5. In left-skewed data: Mean < Median
  6. Always report the appropriate measure for your data type and distribution shape
