When Not All Data Points Deserve Equal Say

Descriptive Statistics

The Weighted Mean: Giving Voice to What Matters

A four-credit course shouldn't count the same as a one-credit elective. The weighted mean fixes the blind spots of simple averages.

GPA calculation — credit hours determine how much each grade matters
Portfolio returns — dollar weights reveal true investment performance
Survey correction — adjusting for who actually responded versus who should have

When observations differ in importance, the arithmetic mean misleads; the weighted mean gives each value its proper share.

What is the Weighted Mean?

Definition

The weighted mean assigns different weights to different observations, allowing some values to have more influence on the average than others.
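
The definition can be written as a formula. The notation below is introduced here for illustration: x_i are the observed values and w_i their non-negative weights, with at least one weight greater than zero.

```latex
\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}
```

For example, a 4.0 in a four-credit course and a 3.0 in a one-credit course give (4 × 4.0 + 1 × 3.0) / (4 + 1) = 3.8, whereas the simple average of the two grades is 3.5.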

When to Use Weighted Mean

Situation	Why Weights Matter
GPA (grade point average)	Courses have different credit hours
Portfolio returns	Assets have different dollar weights
Survey analysis	Respondents represent different group sizes
Grouped frequency data	Each midpoint represents many observations
Price indices (CPI, etc.)	Goods have different consumption weights
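
The grouped-data row of the table can be illustrated with a short sketch. The bins and frequencies are hypothetical, and the result only approximates the true mean because the individual values inside each bin are unknown.

```python
import numpy as np

# Hypothetical grouped data: exam scores binned into five classes
bin_edges = np.array([50, 60, 70, 80, 90, 100])
frequencies = np.array([4, 10, 18, 12, 6])  # observations per bin

# Each midpoint stands in for every observation in its bin
midpoints = (bin_edges[:-1] + bin_edges[1:]) / 2

# The frequencies act as weights on the midpoints
grouped_mean = np.average(midpoints, weights=frequencies)
print(f"Approximate mean from grouped data: {grouped_mean:.1f}")
```

Here the frequency of each bin plays the role that credit hours play in a GPA.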

Python Implementation

import numpy as np
import pandas as pd

# ========================================
# Example 1: GPA Calculation
# ========================================
courses = pd.DataFrame({
    'Course': ['Statistics', 'Linear Algebra', 'Machine Learning', 'Databases', 'Ethics'],
    'Grade_Points': [4.0, 3.7, 3.3, 4.0, 3.0],
    'Credits': [4, 3, 4, 3, 1]
})

weighted_gpa = np.average(courses['Grade_Points'], weights=courses['Credits'])
simple_gpa = courses['Grade_Points'].mean()

print("Courses:")
print(courses.to_string(index=False))
print(f"\nWeighted GPA: {weighted_gpa:.4f}")
print(f"Unweighted GPA: {simple_gpa:.4f}")
print(f"Difference: {weighted_gpa - simple_gpa:+.4f} (credits matter!)")

# ========================================
# Example 2: Portfolio Return
# ========================================
portfolio = pd.DataFrame({
    'Asset': ['Stock A', 'Stock B', 'Bonds', 'Cash'],
    'Weight_pct': [40, 35, 20, 5],
    'Return_pct': [12.5, -3.2, 4.1, 0.5]
})

portfolio_return = np.average(portfolio['Return_pct'], 
                               weights=portfolio['Weight_pct'])
simple_avg_return = portfolio['Return_pct'].mean()

print("\nPortfolio:")
print(portfolio.to_string(index=False))
print(f"\nWeighted portfolio return: {portfolio_return:.2f}%")
print(f"Simple average return:     {simple_avg_return:.2f}%")

# ========================================
# Example 3: Survey Weighting (post-stratification)
# ========================================
survey = pd.DataFrame({
    'Group': ['18-34', '35-54', '55+'],
    'Survey_pct': [15, 50, 35],    # % in survey sample
    'Pop_pct': [30, 40, 30],       # % in true population
    'Support_pct': [75, 55, 40]    # % supporting the policy
})

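# Raw sample result: each group counts by its share of the sample,
# which overrepresents 35-54 and 55+ and underrepresents 18-34
unweighted = np.average(survey['Support_pct'], weights=survey['Survey_pct'])
# Post-stratified result: each group counts by its share of the population
weighted = np.average(survey['Support_pct'], weights=survey['Pop_pct'])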

print("\nSurvey with nonrepresentative sample:")
print(survey.to_string(index=False))
print(f"\nUnweighted mean: {unweighted:.1f}% support (biased!)")
print(f"Population-weighted mean: {weighted:.1f}% support (correct)")
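
In practice, post-stratification is usually applied per respondent rather than per group. The sketch below assumes a hypothetical sample of 400 respondents whose group shares and support rates match the survey example; each respondent receives the weight population share / sample share for their group.

```python
import numpy as np

# Hypothetical respondent counts consistent with the example:
# 60 / 200 / 140 respondents (15% / 50% / 35%), of whom 45 / 110 / 56 support
group_sizes = np.array([60, 200, 140])
supporters = np.array([45, 110, 56])
sample_share = group_sizes / group_sizes.sum()
pop_share = np.array([0.30, 0.40, 0.30])

# Post-stratification weight per group: population share / sample share
group_weight = pop_share / sample_share

# One entry per respondent: 1.0 = supports, 0.0 = does not
support = np.concatenate([
    np.repeat([1.0, 0.0], [s, n - s]) for n, s in zip(group_sizes, supporters)
])
respondent_weight = np.repeat(group_weight, group_sizes)

raw = support.mean() * 100
adjusted = np.average(support, weights=respondent_weight) * 100
print(f"Raw sample support:      {raw:.2f}%")
print(f"Post-stratified support: {adjusted:.2f}%")
```

Respondents from the underrepresented 18-34 group get a weight of 2, so each of them stands in for two people, while the other groups are weighted down.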

Properties of the Weighted Mean

# Property 1: Reduces to arithmetic mean when all weights equal
equal_weights = [1, 1, 1, 1, 1]
data = [10, 20, 30, 40, 50]
print(f"Equal weights -> weighted mean = {np.average(data, weights=equal_weights):.1f}")
print(f"Simple mean = {np.mean(data):.1f}")

# Property 2: A dominant weight pulls the weighted mean toward that value
extreme_weights = [1, 1, 1, 1, 100]
print(f"Extreme weight on last -> weighted mean = {np.average(data, weights=extreme_weights):.1f}")
print(f"Last value: {data[-1]}")  # Should be close to 50

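# Property 3: Only relative weights matter. np.average divides by the sum of
# the weights, so normalizing them to sum to 1 leaves the result unchanged.
weights = [4, 3, 4, 3, 1]
norm_weights = [w/sum(weights) for w in weights]
print(f"Normalized weights: {[f'{w:.3f}' for w in norm_weights]}")
print(f"Sum: {sum(norm_weights):.4f}")
print(f"Raw weights -> weighted mean = {np.average(data, weights=weights):.4f}")
print(f"Normalized weights -> weighted mean = {np.average(data, weights=norm_weights):.4f}")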

Key Takeaways

The weighted mean is the correct tool when observations differ in importance — never treat unequal data points equally.

GPA, portfolio returns, and price indices are all weighted means in disguise — recognizing this prevents costly errors.

Survey post-stratification weighting adjusts a sample whose group shares differ from the population's, reducing the bias caused by over- and underrepresented groups.

Equal weights reduce to the arithmetic mean — the simple average is just a special case of the weighted mean.

Weighted Mean in Machine Learning

ML Application	Weighting Scheme	Why
Ensemble models	Accuracy-weighted average	Better models get more vote
Combining noisy estimates	Inverse-variance weights, w = 1/σ²	Minimum-variance combination of independent unbiased estimates
Attention mechanisms	Learned weights	Attention output = weighted mean of value vectors
Loss aggregation	Sample weights	Rare class gets higher weight
Exponential moving average	Exponential decay weights	Recent data weighted more
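
The inverse-variance scheme in the table can be sketched as follows. The three measurements and their standard deviations are hypothetical, and the variance formula assumes the estimates are independent and unbiased.

```python
import numpy as np

# Hypothetical: three independent, unbiased measurements of the same quantity
estimates = np.array([10.2, 9.8, 10.5])
sigmas = np.array([0.1, 0.2, 0.5])  # standard deviation of each measurement

# Precise measurements (small sigma) receive large weights
inv_var_weights = 1.0 / sigmas**2
combined = np.average(estimates, weights=inv_var_weights)

# Standard deviation of the combined estimate under the stated assumptions
combined_sigma = np.sqrt(1.0 / inv_var_weights.sum())

print(f"Combined estimate: {combined:.4f} +/- {combined_sigma:.4f}")
```

The combined standard deviation is smaller than that of the best single measurement, which is the sense in which this weighting is minimum variance.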

import numpy as np

# Ensemble: weighted average of 3 models
# Each row is read here as one model's predicted P(class = 1) for 4 samples
model_preds = np.array([
    [0.9, 0.1, 0.2, 0.8],  # Model A
    [0.1, 0.8, 0.3, 0.7],  # Model B
    [0.2, 0.7, 0.8, 0.3],  # Model C
])
model_weights = np.array([0.5, 0.3, 0.2])  # A is best

weighted_avg = np.average(model_preds, axis=0, weights=model_weights)
print(f"Weighted ensemble prediction: {weighted_avg}")
# Threshold each sample's averaged probability at 0.5 to get a class label
predicted_classes = (weighted_avg >= 0.5).astype(int)
print(f"Predicted classes: {predicted_classes}")

# Attention mechanism (simplified)
# The softmax of the query-key scores gives weights over the value vectors
tokens = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])  # value vectors of 3 tokens
attention_scores = np.array([0.7, 0.2, 0.1])  # softmax output, sums to 1
output = np.average(tokens, axis=0, weights=attention_scores)
print(f"\nAttention output: {output}")
print("The output is a weighted mean of the token value vectors.")
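
The exponential moving average row of the table can be checked the same way. This is a minimal sketch, assuming the common recursive form ema = alpha * x + (1 - alpha) * ema started at the first value; the series and alpha are hypothetical.

```python
import numpy as np

values = np.array([10.0, 12.0, 11.0, 15.0, 14.0])  # hypothetical series, oldest first
alpha = 0.5                                        # smoothing factor (assumed)

# Recursive form: each step blends the new value with the running average
ema = values[0]
for x in values[1:]:
    ema = alpha * x + (1 - alpha) * ema

# Equivalent explicit weights: they decay geometrically with age and sum to 1
n = len(values)
ages = np.arange(n - 1, -1, -1)            # n-1 for the oldest value, 0 for the newest
decay_weights = alpha * (1 - alpha) ** ages
decay_weights[0] = (1 - alpha) ** (n - 1)  # the starting value keeps the leftover weight

explicit = np.average(values, weights=decay_weights)
print(f"Recursive EMA:          {ema:.4f}")
print(f"Weighted-mean form:     {explicit:.4f}")
print(f"Weights (oldest first): {decay_weights}")
```

Both forms give the same number, so the recursive update is a weighted mean in which recent observations carry the most weight.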

Weighting is not a statistical trick — it's an acknowledgment that in the real world, not all observations are created equal.
