Variance — Population vs Sample Formula and Interpretation

Foundations of StatisticsDescriptive StatisticsFree Lesson

Advertisement

Variance

Variance measures the average squared deviation from the mean.

Population variance: σ² = (1/N) Σ(xᵢ − μ)²

Sample variance (Bessel's correction): s² = (1/(n−1)) Σ(xᵢ − x̄)²

We divide by n−1 to produce an unbiased estimator of σ².

import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

data = np.array([4, 7, 13, 2, 1, 8, 11, 6, 9, 5])
n = len(data)
mean = np.mean(data)

deviations = data - mean
squared_devs = deviations ** 2
pop_var = squared_devs.sum() / n
sample_var = squared_devs.sum() / (n - 1)

print(f"Mean: {mean:.2f}")
print(f"Deviations: {deviations}")
print(f"Squared deviations: {squared_devs}")
print(f"Population variance σ² (÷n):   {pop_var:.4f}")
print(f"Sample variance s² (÷n−1):     {sample_var:.4f}")
print(f"NumPy ddof=0 (population): {np.var(data, ddof=0):.4f}")
print(f"NumPy ddof=1 (sample):     {np.var(data, ddof=1):.4f}")

Why Divide by n−1?

The sample mean x̄ is computed from the data and is the "closest" center to the sample. This causes squared deviations from x̄ to systematically underestimate those from the true μ. Dividing by n−1 corrects this bias.

np.random.seed(42)
pop = np.random.normal(50, 10, 10000)
true_var = np.var(pop)

biased, unbiased = [], []
for _ in range(5000):
    sample = np.random.choice(pop, size=10)
    biased.append(np.var(sample, ddof=0))
    unbiased.append(np.var(sample, ddof=1))

print(f"True population variance:           {true_var:.4f}")
print(f"Average biased estimate (÷n):        {np.mean(biased):.4f}  ← UNDERESTIMATES")
print(f"Average unbiased estimate (÷n−1):    {np.mean(unbiased):.4f}  ← CORRECT")

Computational Shortcut

s2=xi2(xi)2/nn1s^2 = \frac{\sum x_i^2 - (\sum x_i)^2/n}{n-1}

sum_x  = data.sum()
sum_x2 = (data**2).sum()
s2_comp = (sum_x2 - sum_x**2/n) / (n-1)
print(f"Computational formula: {s2_comp:.4f}")
print(f"Standard formula:      {sample_var:.4f}")   # identical

Properties of Variance

a, b = 3, 10
x = np.random.normal(50, 10, 1000)

# Shifting does NOT change variance
assert abs(np.var(x+b, ddof=1) - np.var(x, ddof=1)) < 1e-6

# Scaling multiplies variance by a²
assert abs(np.var(a*x, ddof=1) - a**2*np.var(x, ddof=1)) < 1e-4

print("Var(X+b) = Var(X):  True")
print("Var(aX)  = a²Var(X): True")

# Var(X+Y) = Var(X)+Var(Y) for independent X,Y
y = np.random.normal(30, 5, 1000)
print(f"Var(X+Y) ≈ Var(X)+Var(Y):  {abs(np.var(x+y,ddof=1)-(np.var(x,ddof=1)+np.var(y,ddof=1))) < 200}")

Key Takeaways

  1. Variance = average squared deviation — squaring makes all deviations positive
  2. Always use ddof=1 for samples (Bessel's correction) in Python
  3. Variance units are squared — use standard deviation (√variance) for interpretation
  4. Variance is additive for independent variables: Var(X+Y) = Var(X) + Var(Y)
  5. Scaling by a multiplies variance by a²: Var(aX) = a²Var(X)
  6. Adding a constant does not change variance: Var(X+b) = Var(X)

Advertisement

Need Expert Statistics Help?

Get personalized tutoring, dissertation support, or statistical consulting.

Advertisement