Variance

Descriptive Statistics

Why Does Data Spread Out?

Variance is the foundation of statistical dispersion — it tells you how far data points wander from the average.

Understanding variance helps you:

Quantify uncertainty — measure how reliable or volatile a dataset truly is
Compare datasets — see which group has more consistent behavior
Build estimators — understand why dividing by n-1 produces unbiased results
Unlock advanced measures — standard deviation, skewness, and kurtosis all build on variance

Master variance and every other measure of spread becomes a natural extension.

What is Variance?

Definition

Variance measures the average squared deviation from the mean. It quantifies the spread or dispersion of a random variable around its expected value.

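For a random variable $X$ with mean $\mu = E[X]$, the variance is:

$$\operatorname{Var}(X) = \sigma^2 = E\left[(X - \mu)^2\right]$$

For a finite population of $N$ equally likely observations $x_1, \dots, x_N$:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2, \qquad \mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$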
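The population definition translates almost word for word into code. The following is a minimal sketch in plain Python; the function name `population_variance` and the data values are illustrative assumptions, not part of the original text.

```python
from statistics import pvariance

def population_variance(values):
    """Average squared deviation from the mean, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values, mean 5.0

print(population_variance(data))  # 4.0
print(pvariance(data))            # standard-library equivalent, also 4.0
```

NumPy users get the same quantity from `np.var(data)`, whose default `ddof=0` divides by $N$.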

The Shortcut Formula

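Using the identity $E[(X - \mu)^2] = E[X^2] - 2\mu E[X] + \mu^2$ together with $E[X] = \mu$, we obtain the computationally equivalent form:

$$\operatorname{Var}(X) = E[X^2] - (E[X])^2$$

This form is useful because it requires only one pass through the data (accumulating $\sum x_i$ and $\sum x_i^2$ simultaneously).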
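A one-pass computation based on the shortcut form might look like the sketch below. The function name and data are illustrative assumptions. One caveat worth knowing: when the mean is large relative to the spread, subtracting two nearly equal large numbers can lose most of the precision, which is why streaming algorithms such as Welford's are often preferred in practice.

```python
def one_pass_variance(values):
    """Population variance via E[X^2] - (E[X])^2, reading the data once."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for x in values:
        n += 1
        total += x          # running sum of x_i
        total_sq += x * x   # running sum of x_i^2
    mean = total / n
    return total_sq / n - mean * mean

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values
print(one_pass_variance(data))  # 4.0

# Same spread, huge offset: the true variance is still 4.0,
# but the shortcut form is exposed to cancellation error here.
shifted = [x + 1e9 for x in data]
print(one_pass_variance(shifted))
```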

Sample Variance and Bessel's Correction

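Given a sample $x_1, \dots, x_n$ drawn i.i.d. from a population with variance $\sigma^2$, the sample variance is:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$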

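Why $n-1$? Degrees of Freedom

The deviations $x_i - \bar{x}$ always sum to zero, so once $n-1$ of them are known the last one is determined. The sample mean has absorbed one degree of freedom, and dividing by $n-1$ instead of $n$ compensates for it.

The computational form of the sample variance is:

$$s^2 = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)$$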
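Both forms of the sample variance, the definition with squared deviations and the computational form with the sum of squares, can be checked against each other. This is a small sketch with hypothetical function names and illustrative data; `statistics.variance` is the standard-library function that divides by $n-1$.

```python
from statistics import variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

def sample_variance_computational(values):
    """Computational form: (sum of x_i^2 - n * x_bar^2) / (n - 1)."""
    n = len(values)
    x_bar = sum(values) / n
    return (sum(x * x for x in values) - n * x_bar * x_bar) / (n - 1)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative values

print(sample_variance(data))                # 32 / 7, about 4.571
print(sample_variance_computational(data))  # same value up to rounding
print(variance(data))                       # standard-library reference
```

In NumPy the same estimator is `np.var(data, ddof=1)`.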

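Algebraic Properties of Variance

For a constant $a$: $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$.

For independent $X$ and $Y$: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.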
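The scaling rule $\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)$ and the additivity rule for independent variables can be checked numerically. The sketch below uses simulated data; the distributions, the seed, and the constant `a` are arbitrary assumptions chosen only for illustration. Scaling holds exactly for any dataset, while additivity holds only approximately in a finite sample because the sample covariance is not exactly zero.

```python
import random
from statistics import pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

x = [rng.gauss(0.0, 2.0) for _ in range(n)]    # theoretical Var(X) = 4
y = [rng.uniform(0.0, 6.0) for _ in range(n)]  # theoretical Var(Y) = 36 / 12 = 3

a = 3.0
var_x = pvariance(x)
var_y = pvariance(y)
var_ax = pvariance([a * xi for xi in x])
var_sum = pvariance([xi + yi for xi, yi in zip(x, y)])

print(f"Var(aX)      = {var_ax:.4f}")
print(f"a^2 Var(X)   = {a * a * var_x:.4f}")
print(f"Var(X+Y)     = {var_sum:.4f}")
print(f"Var(X)+Var(Y) = {var_x + var_y:.4f}")
```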

Variance as a Second Central Moment

The variance is the second central moment of a distribution. The general framework of moments provides:

$$\mu_k = E\left[(X - \mu)^k\right], \qquad \mu_2 = \operatorname{Var}(X) = \sigma^2$$

The skewness uses the third central moment ($\mu_3$), and kurtosis uses the fourth ($\mu_4$). Variance is the foundational building block.
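A single helper for the $k$-th central moment shows how variance, skewness, and kurtosis share one recipe. This is a sketch under stated assumptions: the helper name `central_moment` is hypothetical, the data is the ten-value sample used in the worked example, and the standardization shown (dividing by powers of the second moment) is one common convention that the text itself does not spell out.

```python
def central_moment(values, k):
    """k-th central moment: average of (x - mean)^k, dividing by n."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n

data = [4, 7, 13, 2, 1, 8, 11, 6, 9, 5]

m2 = central_moment(data, 2)  # population variance
m3 = central_moment(data, 3)
m4 = central_moment(data, 4)

skewness = m3 / m2 ** 1.5  # assumed convention: standardized third moment
kurtosis = m4 / m2 ** 2    # assumed convention: standardized fourth moment

print(f"variance = {m2:.2f}")
print(f"skewness = {skewness:.3f}")
print(f"kurtosis = {kurtosis:.3f}")
```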

Worked Example

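Given the sample: $x = \{4, 7, 13, 2, 1, 8, 11, 6, 9, 5\}$ with $n = 10$:

Step 1: Compute $\bar{x}$:

$$\bar{x} = \frac{66}{10} = 6.6$$

Step 2: Compute squared deviations:

$x_i$	$x_i - \bar{x}$	$(x_i - \bar{x})^2$
4	−2.6	6.76
7	0.4	0.16
13	6.4	40.96
2	−4.6	21.16
1	−5.6	31.36
8	1.4	1.96
11	4.4	19.36
6	−0.6	0.36
9	2.4	5.76
5	−1.6	2.56

Step 3: Sum:

$$\sum_{i=1}^{10} (x_i - \bar{x})^2 = 130.4$$

Step 4: Population variance:

$$\sigma^2 = \frac{130.4}{10} = 13.04$$

Step 5: Sample variance:

$$s^2 = \frac{130.4}{9} \approx 14.49$$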
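The five steps can be reproduced in a few lines of Python. This sketch uses the ten values from the table above; the variable names are illustrative.

```python
sample = [4, 7, 13, 2, 1, 8, 11, 6, 9, 5]
n = len(sample)

# Step 1: sample mean
x_bar = sum(sample) / n

# Step 2: squared deviations from the mean
squared_devs = [(x - x_bar) ** 2 for x in sample]

# Step 3: sum of squared deviations
ss = sum(squared_devs)

# Steps 4 and 5: divide by n (population) or by n - 1 (sample)
pop_var = ss / n
samp_var = ss / (n - 1)

print(x_bar)               # 6.6
print(round(ss, 2))        # 130.4
print(round(pop_var, 2))   # 13.04
print(round(samp_var, 2))  # 14.49
```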

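The Bias of the Naive Estimator

The naive estimator divides by $n$ instead of $n-1$ and underestimates the population variance on average:

$$E\left[\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right] = \frac{n-1}{n} \sigma^2$$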
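A simulation can make the downward bias of dividing by $n$ visible. The sketch below repeatedly draws small samples from a normal population with known variance and averages both estimators over many trials. The population parameters, sample size, trial count, and seed are arbitrary assumptions for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
sigma2 = 4.0            # true population variance (assumed for the demo)
n = 5                   # small samples make the bias easy to see
trials = 20_000

naive_total = 0.0
bessel_total = 0.0
for _ in range(trials):
    sample = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    x_bar = sum(sample) / n
    ss = sum((x - x_bar) ** 2 for x in sample)
    naive_total += ss / n         # divide by n
    bessel_total += ss / (n - 1)  # divide by n - 1

naive_avg = naive_total / trials
bessel_avg = bessel_total / trials

print(f"true variance:        {sigma2:.3f}")
print(f"average of naive:     {naive_avg:.3f}")   # expected near (n-1)/n * 4 = 3.2
print(f"average of corrected: {bessel_avg:.3f}")  # expected near 4.0
```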

Relationship to Standard Deviation

The standard deviation, $\sigma = \sqrt{\operatorname{Var}(X)}$, returns the spread to the original units of the data. While variance is mathematically convenient (additive for independent variables), standard deviation is more interpretable because it shares the units of the mean.
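The units argument is easy to see with a concrete case. In this sketch the heights are hypothetical values in centimetres: the variance comes out in square centimetres, while its square root is back in centimetres and can be compared directly with the mean.

```python
from math import sqrt
from statistics import pstdev, pvariance

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical data, mean 170 cm

var_cm2 = pvariance(heights_cm)  # 200.0, in cm^2
sd_cm = sqrt(var_cm2)            # about 14.14, in cm

print(f"variance: {var_cm2:.1f} cm^2")
print(f"std dev:  {sd_cm:.2f} cm")
print(f"pstdev:   {pstdev(heights_cm):.2f} cm")  # standard-library shortcut
```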

Variance in Machine Learning

ML Application	Variance Usage	Why
Bias-variance tradeoff	Model variance tracks overfitting	More flexible models change more from one training set to the next
Feature selection	Drop near-zero-variance features	A near-constant feature carries little signal
Regularization	Penalize large weights	Ridge/Lasso accept some bias in exchange for lower model variance
Ensemble methods	Bagging reduces variance	Averaging many high-variance models smooths out their individual errors
Regression tree splits	Variance reduction = split quality	Regression trees choose the split that most reduces target variance

import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
n = 200
X = np.random.randn(n, 1) * 10
y = 3 * X[:, 0] + np.random.randn(n) * 3  # linear signal + Gaussian noise

# Single deep tree: a high-variance model.
# cross_val_score returns negated MSE per fold, so flip the sign first,
# then take the variance of the per-fold MSE values.
tree = DecisionTreeRegressor(max_depth=10, random_state=42)
tree_mse = -cross_val_score(tree, X, y, cv=10, scoring='neg_mean_squared_error')
tree_var = tree_mse.var()
print(f"Single tree MSE variance across folds: {tree_var:.2f}")

# Bagging: averages trees fit on bootstrap resamples
# (the default base estimator is a decision tree).
bagging = BaggingRegressor(n_estimators=10, random_state=42)
bag_mse = -cross_val_score(bagging, X, y, cv=10, scoring='neg_mean_squared_error')
bag_var = bag_mse.var()
print(f"Bagging MSE variance across folds: {bag_var:.2f}")
print(f"Variance reduction: {(1 - bag_var / tree_var) * 100:.1f}%")
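The idea of variance reduction as split quality can also be sketched by hand. The function below scores a candidate split of a regression target by how much the size-weighted variance of the two child groups drops below the parent's variance. The function name and the toy target values are assumptions for illustration; this is a hand-written sketch of the idea, not scikit-learn's implementation.

```python
from statistics import pvariance

def variance_reduction(y_left, y_right):
    """Parent variance minus the size-weighted variance of the two children."""
    parent = y_left + y_right
    n = len(parent)
    weighted_child_var = (
        len(y_left) / n * pvariance(y_left)
        + len(y_right) / n * pvariance(y_right)
    )
    return pvariance(parent) - weighted_child_var

# A split that separates low targets from high targets...
good = variance_reduction([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
# ...versus one that mixes them.
poor = variance_reduction([1.0, 10.0, 3.0], [2.0, 11.0, 12.0])

print(f"good split reduction: {good:.2f}")  # 20.25
print(f"poor split reduction: {poor:.2f}")
```

A tree builder would evaluate this score for every candidate threshold and keep the split with the largest reduction.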

Key Takeaways

Variance = expected squared deviation from the mean — it quantifies spread

Bessel's correction (dividing by n−1) makes the sample variance unbiased because the sample mean absorbs one degree of freedom

Variance is additive for independent variables: Var(X+Y) = Var(X) + Var(Y)

Scaling: Var(aX) = a²Var(X) — variance scales quadratically with constants

"Variance is the price you pay for randomness." — Every model that ignores spread will be surprised by reality.
