The Arithmetic Mean

Descriptive Statistics

The Most Used — and Most Misused — Statistical Measure

The arithmetic mean is the most widely used measure in statistics. Understanding it deeply helps you avoid common analytical errors.

Algebraic properties: the sum of deviations equals zero; the mean minimizes the sum of squared deviations
Population vs sample: the same formula applies to both, and the sample mean is an unbiased estimator of the population mean (the n-1 divisor belongs to the sample variance, not the mean)
Sensitivity to outliers: one extreme value can pull the mean far from the center
Trimmed and winsorized alternatives: robust versions for contaminated data

The mean is powerful, but it is not always right. Know when to use it and when to look elsewhere.

What is the Arithmetic Mean?

Definition

The arithmetic mean of a set of values is the sum of the values divided by the number of values. It is the value that minimizes the sum of squared deviations from itself.

Definition and Formula

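For a sample of $n$ observations $x_1, \dots, x_n$:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

For a population of $N$ observations $x_1, \dots, x_N$:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

For a continuous random variable $X$ with probability density function $f(x)$:

$$\mu = E[X] = \int_{-\infty}^{\infty} x \, f(x) \, dx$$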
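As a quick illustration of these definitions, the sketch below computes a sample mean directly and approximates the mean of a continuous random variable by numerical integration. The data values and the exponential density with rate `lam = 2` are assumed examples, not taken from the text; for that density the exact mean is `1/lam`.

```python
import numpy as np

# Sample mean: sum of the values divided by their count
x = np.array([2.0, 4.0, 4.0, 5.0, 10.0])
sample_mean = x.sum() / x.size  # same result as np.mean(x)

# Continuous case: mu = integral of x * f(x) dx.
# Assumed example density: exponential, f(x) = lam * exp(-lam * x) for x >= 0.
lam = 2.0
grid = np.linspace(0.0, 40.0, 400_001)  # the tail beyond 40 is negligible
integrand = grid * lam * np.exp(-lam * grid)

# Trapezoidal rule, written out explicitly
dx = grid[1] - grid[0]
continuous_mean = np.sum((integrand[:-1] + integrand[1:]) / 2.0) * dx

print(f"sample mean     = {sample_mean}")
print(f"continuous mean = {continuous_mean:.4f} (exact value 1/lam = {1 / lam})")
```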

Algebraic Properties of the Mean
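The two properties named in the overview (deviations sum to zero, and the mean minimizes the sum of squared deviations) can be checked numerically. The sketch below also checks linearity, a standard property added here as an assumption about what this section covers. The simulated data and the constants `a` and `b` are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1_000)
x_bar = x.mean()

# Property 1: deviations from the mean sum to zero (up to floating-point error)
sum_of_deviations = np.sum(x - x_bar)

# Property 2: the mean minimizes the sum of squared deviations,
# so any other centre c gives a strictly larger value
def sum_sq_dev(c):
    return np.sum((x - c) ** 2)

sse_at_mean = sum_sq_dev(x_bar)
sse_shifted = [sum_sq_dev(x_bar + d) for d in (-1.0, -0.1, 0.1, 1.0)]

# Property 3 (linearity): mean(a*x + b) = a*mean(x) + b
a, b = 3.0, -7.0
lhs = np.mean(a * x + b)
rhs = a * x_bar + b

print(f"sum of deviations: {sum_of_deviations:.2e}")
print(f"SSE at mean {sse_at_mean:.2f} vs shifted {np.round(sse_shifted, 2)}")
print(f"linearity holds: {np.isclose(lhs, rhs)}")
```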

Proof that the Mean Minimizes Sum of Squared Deviations
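One standard argument is sketched below. The symbol $c$ for an arbitrary candidate centre and $S(c)$ for the sum of squared deviations are notation assumed here. The key step is that the cross term vanishes because the deviations from $\bar{x}$ sum to zero.

```latex
\begin{align*}
S(c) &= \sum_{i=1}^{n} (x_i - c)^2
      = \sum_{i=1}^{n} \bigl[(x_i - \bar{x}) + (\bar{x} - c)\bigr]^2 \\
     &= \sum_{i=1}^{n} (x_i - \bar{x})^2
        + 2(\bar{x} - c) \underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{=\,0}
        + n(\bar{x} - c)^2 \\
     &= \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - c)^2 .
\end{align*}
```

The first term does not depend on $c$ and the second is non-negative, equal to zero only when $c = \bar{x}$. Hence $S(c) \ge S(\bar{x})$ for every $c$, with equality exactly at the mean.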

Weighted Mean

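When observations have different importance, frequency, or precision, each value $x_i$ is given a weight $w_i \ge 0$:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$

The ordinary mean is the special case $w_i = 1$ for all $i$. In the inverse-variance weighting scheme, $w_i = 1/\sigma_i^2$, which gives the minimum-variance unbiased estimator among weighted averages of independent measurements of the same quantity.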
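A minimal sketch of inverse-variance weighting follows. The three measurements and their standard errors are hypothetical values chosen for illustration; `np.average` with its `weights` argument performs the same computation as the explicit formula.

```python
import numpy as np

# Three hypothetical measurements of the same quantity with different standard errors
values = np.array([10.2, 9.8, 10.6])
sigmas = np.array([0.1, 0.2, 0.5])

# Inverse-variance weights: w_i = 1 / sigma_i^2
weights = 1.0 / sigmas**2

# Weighted mean: sum(w_i * x_i) / sum(w_i)
weighted_mean = np.sum(weights * values) / np.sum(weights)
assert np.isclose(weighted_mean, np.average(values, weights=weights))

# Standard error of the inverse-variance weighted mean,
# assuming independent measurements: sqrt(1 / sum(w_i))
weighted_se = np.sqrt(1.0 / np.sum(weights))

# Equal weights recover the ordinary mean
equal_weight_mean = np.average(values, weights=np.ones_like(values))

print(f"weighted mean   = {weighted_mean:.4f} +/- {weighted_se:.4f}")
print(f"unweighted mean = {equal_weight_mean:.4f}")
```

The most precise measurement dominates the result, and the combined standard error is smaller than that of any single measurement.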

Mean for Grouped Data

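When raw data is unavailable (only a frequency table), use the midpoints $m_j$ and frequencies $f_j$ of the $k$ classes:

$$\bar{x} \approx \frac{\sum_{j=1}^{k} f_j m_j}{\sum_{j=1}^{k} f_j}$$

This is an approximation: it treats every observation in a class as if it sat at the class midpoint, so the exact mean cannot be recovered from grouped data.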
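The midpoint calculation can be sketched as follows. The frequency table is hypothetical, and the second half bins simulated raw data (also an assumption) to show how far the grouped estimate can drift from the exact mean.

```python
import numpy as np

# Hypothetical frequency table: class intervals [lower, upper) and their counts
lower = np.array([0, 10, 20, 30, 40])
upper = np.array([10, 20, 30, 40, 50])
freq = np.array([5, 12, 20, 9, 4])

midpoints = (lower + upper) / 2.0
grouped_mean = np.sum(freq * midpoints) / np.sum(freq)
print(f"grouped mean estimate = {grouped_mean:.2f}")

# Compare against the exact mean when the raw data is known
rng = np.random.default_rng(1)
raw = rng.uniform(0, 50, size=500)
counts, edges = np.histogram(raw, bins=5, range=(0, 50))
mids = (edges[:-1] + edges[1:]) / 2.0
approx_mean = np.sum(counts * mids) / counts.sum()
exact_mean = raw.mean()
print(f"exact = {exact_mean:.3f}, grouped approximation = {approx_mean:.3f}")
```

The error of the grouped estimate can never exceed half the class width, since each observation lies within that distance of its class midpoint.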

Trimmed Mean: A Robust Alternative

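The $\alpha$-trimmed mean discards the smallest and largest fraction $\alpha$ of the ordered observations and averages the rest. It trades a small amount of efficiency (under normality) for greatly improved robustness against outliers.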
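A NumPy-only sketch of a trimmed mean and its winsorized counterpart is shown below. The function names, the `proportion` parameter (fraction cut from each tail, rounded down to a whole number of observations), and the data are illustrative assumptions. SciPy offers `scipy.stats.trim_mean` and `scipy.stats.mstats.winsorize` as library alternatives.

```python
import numpy as np

def trimmed_mean(x, proportion):
    """Mean after dropping the lowest and highest `proportion` of the sorted values."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    x = np.sort(np.asarray(x, dtype=float))
    k = int(proportion * x.size)  # values cut from each tail, rounded down
    return x[k : x.size - k].mean()

def winsorized_mean(x, proportion):
    """Mean after replacing the k lowest/highest values by the nearest retained value."""
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    x = np.sort(np.asarray(x, dtype=float))
    k = int(proportion * x.size)
    if k > 0:
        x[:k] = x[k]         # clamp the low tail
        x[-k:] = x[-k - 1]   # clamp the high tail
    return x.mean()

data = [2, 15, 20, 25, 30, 35, 40, 45, 60, 1000]  # one gross outlier
print(f"mean            = {np.mean(data):.2f}")
print(f"median          = {np.median(data):.2f}")
print(f"10% trimmed     = {trimmed_mean(data, 0.1):.2f}")
print(f"10% winsorized  = {winsorized_mean(data, 0.1):.2f}")
```

Trimming removes the extreme values entirely, while winsorizing keeps the sample size and only pulls the extremes inward. Both land close to the median here, while the ordinary mean is dominated by the outlier.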

Influence Function and Breakdown Point

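The influence function of the mean is $IF(x) = x - \mu$, which is unbounded in $x$. The breakdown point of the mean is 0 asymptotically (in a sample of size $n$, contaminating a fraction of only $1/n$ is enough): a single observation at infinity can make the mean infinite.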
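The effect of a single bad observation can be seen in a small experiment. The five data values and the outlier magnitudes are assumed for illustration. One observation is replaced by increasingly extreme values, and the mean follows it without limit while the median does not move.

```python
import numpy as np

clean = np.array([48.0, 49.0, 50.0, 51.0, 52.0])

means, medians = [], []
for outlier in (52.0, 1e3, 1e6, 1e9):
    contaminated = clean.copy()
    contaminated[-1] = outlier  # replace a single observation
    means.append(contaminated.mean())
    medians.append(np.median(contaminated))
    print(f"outlier={outlier:>12.0f}  mean={means[-1]:>14.1f}  median={medians[-1]:.1f}")
```

Replacing one of $n$ values shifts the mean by the size of the change divided by $n$, so the shift grows linearly with the outlier.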

Limitations of the Mean

Problem	Example	Solution
Sensitive to outliers	CEO salary distorts avg company salary	Use median
Meaningless for nominal data	Mean blood type is nonsense	Use mode
Inappropriate for skewed data	Mean income misleads	Use median
May not be a possible value	Mean family size = 2.3 children	Report the median or mode when a realizable value is needed
Hides multimodality	Mean of bimodal = between the modes	Visualize first
Unbounded influence	One extreme value shifts mean arbitrarily	Use trimmed mean or winsorized mean
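The "hides multimodality" row is easy to reproduce. The sketch below uses a simulated bimodal sample (two assumed clusters centred at 20 and 80) and prints a rough text histogram, which is a stand-in for the visualization the table recommends.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical bimodal sample: two well-separated clusters of equal size
sample = np.concatenate([rng.normal(20, 2, 500), rng.normal(80, 2, 500)])

mean = sample.mean()
# Share of observations lying within 10 units of the mean
near_mean = np.mean(np.abs(sample - mean) < 10)

counts, edges = np.histogram(sample, bins=10, range=(0, 100))
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{left:5.1f}, {right:5.1f})  {'#' * (int(c) // 10)}")
print(f"mean = {mean:.1f}, fraction of data within 10 of the mean = {near_mean:.3f}")
```

The mean falls in the empty region between the clusters, so it describes a value that almost no observation is close to.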

The Mean in Machine Learning

The arithmetic mean appears throughout machine learning, from loss functions to preprocessing:

ML Concept	How Mean is Used	Formula
Mean Squared Error	Minimizing MSE with a constant prediction yields the mean	MSE = (1/n)Σ(yᵢ - ŷᵢ)²
Batch Normalization	Stabilizes training by centering activations	x̂ = (x - μ_batch) / σ_batch
Mean Pooling (NLP)	Averages token embeddings	h = (1/T)Σhₜ
StandardScaler	Centers features to mean=0	x_scaled = (x - μ) / σ
Weighted Loss	Inverse-variance weighting	wᵢ = 1/σᵢ²

import numpy as np
from sklearn.datasets import make_regression
from sklearn.preprocessing import StandardScaler

# The mean is the minimizer of the sum of squared errors (SSE)
y_true = np.array([10, 20, 30, 40, 50, 100])  # right-skewed by the value 100

# Grid search: which constant c minimizes sum((y - c)^2)?
candidates = np.linspace(0, 100, 200)
sse = [np.sum((y_true - c) ** 2) for c in candidates]
best = candidates[np.argmin(sse)]
print(f"Value minimizing SSE: {best:.1f}")
print(f"Mean of data: {np.mean(y_true):.1f}")
print("They agree up to the grid resolution: the mean is the least squares estimator.\n")

# StandardScaler uses mean
X, y = make_regression(n_samples=100, n_features=3, noise=10, random_state=42)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(f"Original mean: {X.mean(axis=0).round(3)}")
print(f"Scaled mean:   {X_scaled.mean(axis=0).round(10)}")
print(f"Original std:  {X.std(axis=0).round(3)}")
print(f"Scaled std:    {X_scaled.std(axis=0).round(3)}")
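Two more rows of the table, mean pooling and batch normalization, can be sketched in plain NumPy. The shapes, the random inputs, and the stabilizing constant `eps` are assumptions for illustration. This is the centering and scaling step only, without the learned scale and shift parameters that a real batch normalization layer carries.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mean pooling: average T token embeddings into one vector, h = (1/T) * sum_t h_t
T, d = 6, 4  # hypothetical sequence length and embedding size
token_embeddings = rng.normal(size=(T, d))
pooled = token_embeddings.mean(axis=0)  # shape (d,)

# Batch-normalization-style standardization:
# subtract the per-feature batch mean, divide by the batch standard deviation
batch = rng.normal(loc=3.0, scale=5.0, size=(32, d))
eps = 1e-5  # small constant for numerical stability
mu_batch = batch.mean(axis=0)
var_batch = batch.var(axis=0)
normalized = (batch - mu_batch) / np.sqrt(var_batch + eps)

print("pooled shape:", pooled.shape)
print("normalized mean per feature:", normalized.mean(axis=0).round(6))
print("normalized std per feature: ", normalized.std(axis=0).round(3))
```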
