Weighted Mean
The weighted mean assigns different weights to different observations, allowing some values to have more influence on the average than others.
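Written out, the weighted mean of values x_i with weights w_i is sum(w_i * x_i) / sum(w_i). The definition can be sketched in plain Python as below. The function name `weighted_mean` and the sample scores are illustrative assumptions, not part of any library.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores: an exam that counts three times as much as a quiz
scores = [80, 90]
score_weights = [1, 3]
print(weighted_mean(scores, score_weights))  # (80*1 + 90*3) / 4 = 87.5
```

The explicit checks are a design choice for this sketch: a length mismatch or an all-zero weight vector has no meaningful weighted mean, so failing loudly is safer than returning a number.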
When to Use Weighted Mean
| Situation | Why Weights Matter |
|---|---|
| GPA (grade point average) | Courses have different credit hours |
| Portfolio returns | Assets have different dollar weights |
| Survey analysis | Respondents represent different group sizes |
| Grouped frequency data | Each midpoint represents many observations |
| Price indices (CPI, etc.) | Goods have different consumption weights |
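The grouped frequency row can be made concrete with a short sketch: each class midpoint stands in for every observation in its interval, so the frequencies act as weights. The intervals and counts below are made up for illustration, and the result is only an approximation of the mean of the underlying raw data.

```python
import numpy as np

# Hypothetical grouped data for the intervals 0-10, 10-20, 20-30, 30-40
midpoints = np.array([5, 15, 25, 35])
frequencies = np.array([2, 5, 8, 5])  # number of observations per interval

# Frequencies serve as weights: (5*2 + 15*5 + 25*8 + 35*5) / 20
grouped_mean = np.average(midpoints, weights=frequencies)
print(f"Approximate mean from grouped data: {grouped_mean:.1f}")  # 23.0
```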
Python Implementation
import numpy as np
import pandas as pd
# ========================================
# Example 1: GPA Calculation
# ========================================
courses = pd.DataFrame({
    'Course': ['Statistics', 'Linear Algebra', 'Machine Learning', 'Databases', 'Ethics'],
    'Grade_Points': [4.0, 3.7, 3.3, 4.0, 3.0],
    'Credits': [4, 3, 4, 3, 1]
})
weighted_gpa = np.average(courses['Grade_Points'], weights=courses['Credits'])
simple_gpa = courses['Grade_Points'].mean()
print("Courses:")
print(courses.to_string(index=False))
print(f"\nWeighted GPA: {weighted_gpa:.4f}")
print(f"Unweighted GPA: {simple_gpa:.4f}")
print(f"Difference: {weighted_gpa - simple_gpa:+.4f} (credits matter!)")
# ========================================
# Example 2: Portfolio Return
# ========================================
portfolio = pd.DataFrame({
    'Asset': ['Stock A', 'Stock B', 'Bonds', 'Cash'],
    'Weight_pct': [40, 35, 20, 5],
    'Return_pct': [12.5, -3.2, 4.1, 0.5]
})
portfolio_return = np.average(portfolio['Return_pct'],
                              weights=portfolio['Weight_pct'])
simple_avg_return = portfolio['Return_pct'].mean()
print("\nPortfolio:")
print(portfolio.to_string(index=False))
print(f"\nWeighted portfolio return: {portfolio_return:.2f}%")
print(f"Simple average return: {simple_avg_return:.2f}%")
# ========================================
# Example 3: Survey Weighting (post-stratification)
# ========================================
survey = pd.DataFrame({
    'Group': ['18-34', '35-54', '55+'],
    'Survey_pct': [15, 50, 35],   # % in survey sample
    'Pop_pct': [30, 40, 30],      # % in true population
    'Support_pct': [75, 55, 40]   # % supporting the policy
})
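# Raw sample estimate: weighting each group by its share of the sample
# reproduces the unadjusted survey result. It is biased because the sample
# overrepresents the 35-54 group and underrepresents the 18-34 group.
unweighted = np.average(survey['Support_pct'], weights=survey['Survey_pct'])
# Post-stratified estimate: weight each group by its share of the population
weighted = np.average(survey['Support_pct'], weights=survey['Pop_pct'])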
print("\nSurvey with nonrepresentative sample:")
print(survey.to_string(index=False))
print(f"\nUnweighted mean: {unweighted:.1f}% support (biased!)")
print(f"Population-weighted mean: {weighted:.1f}% support (correct)")
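The same post-stratified estimate can be reached through per-group adjustment factors, which is closer to how weights are usually attached to individual respondents. The sketch below reuses the survey figures from Example 3. The column name `Adj_factor` and the variable names are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd

survey = pd.DataFrame({
    'Group': ['18-34', '35-54', '55+'],
    'Survey_pct': [15, 50, 35],   # % in survey sample
    'Pop_pct': [30, 40, 30],      # % in true population
    'Support_pct': [75, 55, 40]   # % supporting the policy
})

# Adjustment factor: how much each respondent in a group should count.
# Above 1 means the group is underrepresented in the sample.
survey['Adj_factor'] = survey['Pop_pct'] / survey['Survey_pct']

# Sample share times adjustment factor recovers the population share,
# so this matches weighting directly by Pop_pct.
effective_weights = survey['Survey_pct'] * survey['Adj_factor']
adjusted = np.average(survey['Support_pct'], weights=effective_weights)

print(survey[['Group', 'Adj_factor']].to_string(index=False))
print(f"Post-stratified support: {adjusted:.1f}%")  # 56.5%
```

Here the 18-34 group gets a factor of 2.0 because it makes up 30% of the population but only 15% of the sample.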
Properties of the Weighted Mean
# Property 1: Reduces to the arithmetic mean when all weights are equal
equal_weights = [1, 1, 1, 1, 1]
data = [10, 20, 30, 40, 50]
print(f"Equal weights → weighted mean = {np.average(data, weights=equal_weights):.1f}")
print(f"Simple mean = {np.mean(data):.1f}")
# Property 2: A dominant weight pulls the weighted mean toward the value it is attached to
extreme_weights = [1, 1, 1, 1, 100]
print(f"Extreme weight on last → weighted mean = {np.average(data, weights=extreme_weights):.1f}")
print(f"Last value: {data[-1]}") # Should be close to 50
# Property 3: Only relative weights matter; normalizing them to sum to 1 leaves the mean unchanged
weights = [4, 3, 4, 3, 1]
norm_weights = [w/sum(weights) for w in weights]
print(f"Normalized weights: {[f'{w:.3f}' for w in norm_weights]}")
print(f"Sum: {sum(norm_weights):.4f}")
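A quick check of the scale-invariance behind Property 3: multiplying every weight by the same positive constant, or normalizing the weights to sum to 1, leaves the weighted mean unchanged, because `np.average` divides by the sum of the weights. The variable names below are illustrative.

```python
import numpy as np

data = [10, 20, 30, 40, 50]
raw = np.array([4, 3, 4, 3, 1], dtype=float)

normalized = raw / raw.sum()   # sums to 1
scaled = raw * 100             # same proportions, larger numbers

mean_raw = np.average(data, weights=raw)
mean_normalized = np.average(data, weights=normalized)
mean_scaled = np.average(data, weights=scaled)

# All three agree: (10*4 + 20*3 + 30*4 + 40*3 + 50*1) / 15 = 26.0
print(f"{mean_raw:.1f} {mean_normalized:.1f} {mean_scaled:.1f}")
```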
Key Takeaways
- Weighted mean is the correct tool when observations differ in importance
- np.average(data, weights=w) computes it in NumPy; the weights need not be normalized, because the function divides by their sum
- GPA, portfolio returns, price indices — all weighted means in disguise
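- Survey post-stratification reweights groups to their known population shares, correcting for over- or underrepresented groups; it cannot remove bias on characteristics that were not used for weighting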
- Equal weights → arithmetic mean (special case of weighted mean)
- A weight of zero excludes an observation; very large weights dominate the result
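The last takeaway can be checked directly. The sketch below shows that a zero weight gives the same result as dropping the observation. It also covers the edge case where every weight is zero, in which `np.average` raises `ZeroDivisionError` because the weights cannot be normalized. The flag `all_zero_failed` is a name introduced here for illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# A zero weight removes the last observation from the calculation
with_zero = np.average(data, weights=[1, 1, 1, 1, 0])
without_last = data[:-1].mean()
print(with_zero, without_last)  # both equal 25.0

# If every weight is zero there is nothing to average
try:
    np.average(data, weights=[0, 0, 0, 0, 0])
    all_zero_failed = False
except ZeroDivisionError:
    all_zero_failed = True
print(f"All-zero weights rejected: {all_zero_failed}")
```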