Geometric Mean
The geometric mean is the nth root of the product of n positive values. It is the appropriate average for ratios, rates of change, and multiplicative processes.
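As a minimal sketch of that definition, the snippet below computes the nth root of the product directly with the standard library. The helper name `geometric_mean_naive` is hypothetical, and the running product can overflow on long inputs, so it is meant only to mirror the formula.

```python
import math

def geometric_mean_naive(values):
    """nth root of the product of n positive values (hypothetical helper)."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.prod(values) ** (1 / len(values))

print(geometric_mean_naive([2, 8]))      # square root of 2 * 8
print(geometric_mean_naive([1, 3, 9]))   # cube root of 1 * 3 * 9
```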
Why Geometric Mean for Growth Rates?
Suppose a stock grows: Year 1: +100%, Year 2: −50%.
- Arithmetic mean: (100% − 50%) / 2 = +25% per year → sounds great
- Actual result: $100 → $200 → $100 → back where you started (0% per year!)
- Geometric mean: √(2.00 × 0.50) − 1 = √(1.0) − 1 = 0% → correct!
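The arithmetic above can be checked in a few lines. This is a small sketch using only the standard library; the variable names and the starting value of 100 are assumptions for illustration.

```python
import math

start = 100.0
multipliers = [2.00, 0.50]   # +100% in year 1, then -50% in year 2

# Compound the starting value through both years
end = start * math.prod(multipliers)

# Arithmetic mean of the returns vs. geometric mean of the multipliers
arithmetic = sum(m - 1 for m in multipliers) / len(multipliers)
geometric = math.prod(multipliers) ** (1 / len(multipliers)) - 1

print(f"Final value:     ${end:.2f}")
print(f"Arithmetic mean: {arithmetic:+.0%}")
print(f"Geometric mean:  {geometric:+.0%}")
```

Only the geometric figure agrees with the fact that the final value equals the starting value.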
import numpy as np
from scipy.stats import gmean
# ==========================================
# Example 1: Investment Returns
# ==========================================
annual_returns = [0.25, -0.10, 0.40, -0.20, 0.15] # +25%, -10%, etc.
multipliers = [1 + r for r in annual_returns]
# Geometric mean of multipliers
geo_mean_multiplier = gmean(multipliers)
arith_mean_multiplier = np.mean(multipliers)
geo_cagr = geo_mean_multiplier - 1
arith_mean_return = np.mean(annual_returns)
print("Annual returns:", [f"{r:.0%}" for r in annual_returns])
print(f"Arithmetic mean return: {arith_mean_return:.2%} ← WRONG for compounding")
print(f"Geometric mean (CAGR): {geo_cagr:.2%} ← CORRECT for compounding")
# Verify: compound at geometric mean
initial = 1000
final_actual = initial * np.prod(multipliers)
final_geometric = initial * (1 + geo_cagr)**len(multipliers)
print(f"\nActual final value: ${final_actual:.2f}")
print(f"Projected via CAGR: ${final_geometric:.2f}")
# ==========================================
# Example 2: Population Growth
# ==========================================
years = [2018, 2019, 2020, 2021, 2022, 2023]
population = [50000, 53000, 56180, 59551, 63124, 66911]
growth_rates = [population[i]/population[i-1] for i in range(1, len(population))]
geo_mean_growth = gmean(growth_rates)
print(f"\nAnnual growth rates: {[f'{r:.4f}' for r in growth_rates]}")
print(f"Geometric mean growth rate: {geo_mean_growth:.4f}")
print(f"Expected CAGR: {geo_mean_growth - 1:.2%}")
# Verify
n_years = len(population) - 1
cagr_from_endpoints = (population[-1]/population[0])**(1/n_years)
print(f"CAGR from endpoints: {cagr_from_endpoints - 1:.2%}")
# ==========================================
# Example 3: Geometric vs Arithmetic Mean
# ==========================================
print("\n=== AM-GM Inequality ===")
print("Arithmetic mean is ALWAYS ≥ Geometric mean (for positive values)")
for trial in range(5):
    x = np.random.uniform(1, 100, 10)
    am = np.mean(x)
    gm = gmean(x)
    print(f"  AM = {am:.3f}, GM = {gm:.3f}, AM ≥ GM: {am >= gm}")
Log-transform Shortcut
The geometric mean of X equals exp(arithmetic mean of log X):
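# Same result. Working in log space avoids the overflow or underflow that a
# naive product-then-root can hit; scipy's gmean already computes it this way.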
data = np.random.uniform(1, 100, 1000)
gm_direct = gmean(data)
gm_log = np.exp(np.log(data).mean())
print(f"Direct gmean: {gm_direct:.6f}")
print(f"Via log: {gm_log:.6f}")
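To see why the log form matters, the sketch below compares it with a naive product-then-root on inputs large enough to overflow a float64. The data are made up for illustration.

```python
import numpy as np

values = np.full(200, 1e10)   # 200 copies of 1e10, so the true geometric mean is 1e10

# Naive: the product would be 1e2000, far beyond float64 range, so it becomes inf
with np.errstate(over="ignore"):
    naive = np.prod(values) ** (1 / len(values))

# Log form: average the logs, then exponentiate
via_log = np.exp(np.mean(np.log(values)))

print(f"Naive product-then-root: {naive}")
print(f"Via logs:                {via_log:.6g}")
```

The same idea protects against underflow when the values are very small.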
When to Use Each Mean
| Situation | Use |
|---|---|
| Average price, temperature, test score | Arithmetic mean |
| Average growth rate, CAGR, inflation | Geometric mean |
| Average rate over equal distances or workloads (e.g. speed) | Harmonic mean |
| Ratios, multiplicative processes | Geometric mean |
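The table can be illustrated with Python's standard `statistics` module, which provides all three means. The sample data below are invented for the example.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Additive quantity: test scores
scores = [70, 80, 90]
print(f"Arithmetic mean score: {mean(scores)}")

# Multiplicative quantity: yearly growth multipliers (+10%, +20%, -5%)
growth = [1.10, 1.20, 0.95]
print(f"Geometric mean growth: {geometric_mean(growth) - 1:.2%}")

# Rates over equal distances: 60 km/h out, 40 km/h back
speeds = [60, 40]
print(f"Harmonic mean speed:   {harmonic_mean(speeds):.1f} km/h")
```

In the speed case the slower leg takes more time, which is why the harmonic mean lands below the arithmetic mean of 50 km/h.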
Key Takeaways
- Use the geometric mean for growth rates and ratios; the arithmetic mean overstates compounded growth whenever the rates vary
- AM ≥ GM always (AM-GM inequality), with equality only when all values are identical
- CAGR (compound annual growth rate) is the geometric mean of annual multipliers
- Log transformation converts geometric mean to arithmetic mean of log-transformed data
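- The geometric mean requires strictly positive values: a single zero forces it to zero, and negative values leave it undefined in general, so convert returns to multipliers (1 + r) first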
- Average investment returns, population growth, and inflation rates are all geometric-mean problems
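The positivity requirement is easy to trip over with returns, which are often negative even though their multipliers are not. The sketch below uses a hypothetical helper, `cagr_from_returns`, that converts returns to multipliers and rejects inputs where the log form would fail; the name and the error message are assumptions, not part of any library.

```python
import numpy as np

def cagr_from_returns(returns):
    """Geometric mean return from per-period returns (hypothetical helper).

    Returns are converted to multipliers (1 + r) first: a raw return such as
    -0.10 is negative, but its multiplier 0.90 is positive.
    """
    multipliers = 1.0 + np.asarray(returns, dtype=float)
    if multipliers.size == 0 or np.any(multipliers <= 0):
        raise ValueError("every multiplier 1 + r must be positive")
    return float(np.exp(np.mean(np.log(multipliers))) - 1.0)

print(f"{cagr_from_returns([0.25, -0.10, 0.40, -0.20, 0.15]):.2%}")

try:
    cagr_from_returns([0.10, -1.00])   # a -100% return gives a multiplier of 0
except ValueError as err:
    print("Rejected:", err)
```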