Arithmetic Mean

Why the Average Isn't Always Right

A neighbourhood has 9 households earning $50k each and one billionaire earning $50M. The mean income is about $5M ($5.045M exactly), yet no household earns anywhere near that. The mean is pulled towards extremes.
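The pull of the single large income can be checked directly. This is a minimal sketch using Python's standard `statistics` module; the ten incomes come from the example above, and the variable names are illustrative only.

```python
from statistics import mean, median

# Nine households at $50k and one at $50M, as in the example above
incomes = [50_000] * 9 + [50_000_000]

mean_income = mean(incomes)      # total income divided by 10 households
median_income = median(incomes)  # middle of the sorted incomes

print(f"Mean:   ${mean_income:,.0f}")    # Mean:   $5,045,000
print(f"Median: ${median_income:,.0f}")  # Median: $50,000
```

The median stays at the typical household's income, while the mean lands two orders of magnitude above it.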

Core Insight: The mean minimises the sum of squared deviations. It's the least-squares centre — powerful but sensitive to outliers.
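The least-squares property follows from a short calculus argument, sketched here with $S(c)$ as an illustrative name for the sum of squared deviations from a candidate centre $c$:

$S(c) = \sum_{i=1}^{n}(x_i - c)^2 \qquad \frac{dS}{dc} = -2\sum_{i=1}^{n}(x_i - c) = 0 \;\Longrightarrow\; c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x}$

Because $\frac{d^2S}{dc^2} = 2n > 0$, this stationary point is a minimum. Squaring is also what makes the mean outlier-sensitive: a deviation ten times larger contributes a hundred times more to $S(c)$.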

Formula

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \mu = \frac{1}{N}\sum_{i=1}^{N} x_i$

Grouped data: $\bar{x} = \sum f_i m_i / \sum f_i$
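When only class intervals and counts are available, each class is usually represented by its midpoint $m_i$. The sketch below assumes a hypothetical frequency table with intervals 10-20 through 50-60; the intervals and variable names are illustrative, not taken from a real data set.

```python
# Hypothetical frequency table: (lower bound, upper bound, frequency)
classes = [(10, 20, 3), (20, 30, 8), (30, 40, 12), (40, 50, 5), (50, 60, 2)]

# Each class is represented by its midpoint m_i = (lower + upper) / 2
weighted_total = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f_i * m_i
total_count = sum(f for _, _, f in classes)                       # sum of f_i

grouped_mean = weighted_total / total_count
print(f"Grouped mean: {grouped_mean:.2f}")  # Grouped mean: 33.33
```

The result is an approximation of the true mean, since it treats every observation in a class as if it sat at the class midpoint.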

Worked Example

Scores: 72, 85, 90, 68, 78, 95, 88, 74

$\bar{x} = \frac{72+85+90+68+78+95+88+74}{8} = \frac{650}{8} = 81.25$

Sorted:  68  72  74  78  85  88  90  95
                      ↑
                 Mean = 81.25

Python Implementation

import numpy as np
import pandas as pd

data = [72, 85, 90, 68, 78, 95, 88, 74]

print(f"Mean:       {np.mean(data):.2f}")     # 81.25
print(f"Pandas:     {pd.Series(data).mean():.2f}")  # 81.25

# Grouped data
midpoints   = [15, 25, 35, 45, 55]
frequencies = [3,   8,  12,  5,  2]
grouped_mean = np.average(midpoints, weights=frequencies)
print(f"Grouped:    {grouped_mean:.2f}")      # 33.33

# Outlier effect
data_out = data + [500]
print(f"Normal mean:  {np.mean(data):.2f}")      # 81.25
print(f"Outlier mean: {np.mean(data_out):.2f}")  # 127.78

R Implementation

data <- c(72, 85, 90, 68, 78, 95, 88, 74)
cat("Mean:", mean(data), "\n")  # 81.25

# Grouped
cat("Grouped:", weighted.mean(c(15,25,35,45,55), c(3,8,12,5,2)), "\n")

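# Trimmed mean (robust): drops floor(n * trim) values from each end.
# With n = 8 and trim = 0.1 nothing is dropped, so this prints 81.25;
# trim = 0.25 would drop the two smallest and two largest scores.
cat("Trimmed (10%):", mean(data, trim=0.1), "\n")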
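For comparison, a trimmed mean can be sketched in plain Python. The helper `trimmed_mean` below is a hypothetical name, and it assumes the same floor(n × trim) rule per tail that R's `mean(x, trim=)` applies; it is an illustration rather than a drop-in equivalent.

```python
import math

def trimmed_mean(values, trim):
    """Mean after dropping floor(n * trim) values from each end of the sorted data."""
    if not 0 <= trim < 0.5:
        raise ValueError("trim must be in [0, 0.5)")
    ordered = sorted(values)
    k = math.floor(len(ordered) * trim)  # observations removed per tail
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [72, 85, 90, 68, 78, 95, 88, 74]
data_out = data + [500]                  # same outlier as in the Python section

plain = sum(data_out) / len(data_out)
trimmed = trimmed_mean(data_out, 0.2)    # floor(9 * 0.2) = 1 value dropped per tail

print(f"Plain mean:   {plain:.2f}")                     # 127.78
print(f"Trimmed mean: {trimmed:.2f}")                   # 83.14
print(f"No trimming:  {trimmed_mean(data, 0.1):.2f}")   # 81.25
```

Dropping one value per tail removes the outlier of 500 and brings the estimate back close to the original 81.25.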

When to Use Mean, Median, or Mode

Situation	Use
Symmetric, no outliers	Mean
Skewed data (income, prices)	Median
Categorical data	Mode
Need SD / variance	Mean
Outliers present	Median
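The categorical row deserves a brief illustration: labels cannot be summed, so neither a mean nor a median is defined, but the mode still is. A minimal sketch with the standard `statistics` module, using a made-up sample of colour labels:

```python
from statistics import mode, multimode

# Hypothetical categorical sample: labels cannot be added or averaged
colours = ["red", "blue", "red", "green", "blue", "red"]

most_common = mode(colours)     # single most frequent label
all_modes = multimode(colours)  # every label tied for most frequent

print(most_common)  # red
print(all_modes)    # ['red']
```

`multimode` is worth preferring when ties are possible, since it returns all of the tied labels instead of only the first one encountered.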

Key Takeaways

Sum ÷ count: $\bar{x} = \sum x_i / n$; always check for outliers first
Outlier sensitivity: a single extreme value can shift the mean substantially
Least-squares: the mean is the value of $c$ that minimises $\sum(x_i - c)^2$
Grouped data: use $\sum f_i m_i / \sum f_i$ when only a frequency table is available
$\mu$ vs $\bar{x}$: population vs sample symbol; the computation is identical
Skewed data: prefer the median for distributions like income or survival times