Z-Scores — Standardization and the Normal Table

Z-Scores (Standard Scores)

z = \frac{x - \mu}{\sigma}

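A z-score measures how many standard deviations a value x lies from the mean μ, where σ is the standard deviation. A positive z means the value is above the mean; a negative z means it is below.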

import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import StandardScaler

scores = np.array([72, 85, 91, 68, 77, 94, 83, 79, 88, 62])
mean, std = scores.mean(), scores.std(ddof=1)
z = (scores - mean) / std

print(f"Mean={mean:.2f}, Std={std:.2f}\n")
for s, zz in zip(scores, z):
    print(f"  Score={s:3d}  z={zz:+.3f}  ({abs(zz):.1f}σ {'above' if zz>0 else 'below'} mean)")

Comparing Across Different Tests

alice_math  = 85;  mu_m = 75; sd_m = 10   # Math test
alice_essay = 4.5; mu_e = 4.0; sd_e = 0.5  # Essay rubric (0-6)

z_math  = (alice_math  - mu_m) / sd_m
z_essay = (alice_essay - mu_e) / sd_e

print(f"Math z-score:  {z_math:.2f}")
print(f"Essay z-score: {z_essay:.2f}")
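# Both z-scores come out to 1.00 here, so report a tie rather than defaulting to 'Essay'
if abs(z_math - z_essay) < 1e-9:
    better = "Tie (equally far above the mean)"
else:
    better = "Math" if z_math > z_essay else "Essay"
print(f"Better performance: {better}")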

Z-Scores and Normal Probabilities

print("Key z-score probabilities:")
print(f"P(Z < 1.645) = {norm.cdf(1.645):.4f}  (95th pct)")
print(f"P(Z < 1.960) = {norm.cdf(1.960):.4f}  (97.5th pct)")
print(f"P(Z < 2.326) = {norm.cdf(2.326):.4f}  (99th pct)")
print(f"P(-1.96<Z<1.96) = {norm.cdf(1.96)-norm.cdf(-1.96):.4f}  (95%)")

# Given probability → find z
print(f"\nZ for 90th percentile: {norm.ppf(0.90):.4f}")
print(f"Z for 97.5th:          {norm.ppf(0.975):.4f}")
print(f"Z for 99th:            {norm.ppf(0.99):.4f}")
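The two directions (z to probability, probability to z) can be chained with the z-score formula to move between raw scores and percentiles. The sketch below is a minimal illustration, not a definitive implementation: it reuses the math-test mean of 75 and standard deviation of 10 from the comparison example, the helper names `score_to_percentile` and `percentile_to_score` are made up for this example, and the results are only meaningful if the scores are approximately normal.

```python
from scipy.stats import norm

def score_to_percentile(x, mu, sigma):
    """Percentile (0-100) of raw score x under a Normal(mu, sigma) model."""
    z = (x - mu) / sigma          # standardize first
    return 100 * norm.cdf(z)      # then look up the cumulative probability

def percentile_to_score(pct, mu, sigma):
    """Raw score at the given percentile (0-100) under the same model."""
    return mu + sigma * norm.ppf(pct / 100)   # invert z = (x - mu) / sigma

# Assumed values: math test with mean 75 and standard deviation 10
pct = score_to_percentile(85, mu=75, sigma=10)        # z = 1.0
cutoff = percentile_to_score(97.5, mu=75, sigma=10)   # z is about 1.96

print(f"Score 85 is at about the {pct:.1f}th percentile")
print(f"97.5th percentile cutoff: {cutoff:.1f}")
```

A score of 85 (z = 1.0) lands near the 84th percentile, and the 97.5th percentile sits about 1.96 standard deviations above the mean.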

Standardization for Machine Learning

X = np.array([
    [170, 70, 25],
    [180, 85, 30],
    [160, 55, 22],
    [175, 80, 35],
    [165, 60, 28],
], dtype=float)

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print("Standardized features (z-scores):")
print(X_scaled.round(3))
print(f"Column means (should be ~0): {X_scaled.mean(axis=0).round(6)}")
print(f"Column stds  (should be ~1): {X_scaled.std(axis=0).round(4)}")
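One detail worth checking by hand: StandardScaler divides by the population standard deviation (ddof=0), while the test-score example earlier used the sample standard deviation (ddof=1). The sketch below reuses the same five-row matrix and compares both manual formulas against the scaler output; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([
    [170, 70, 25],
    [180, 85, 30],
    [160, 55, 22],
    [175, 80, 35],
    [165, 60, 28],
], dtype=float)

centered = X - X.mean(axis=0)

# Population formula (ddof=0): the convention StandardScaler follows
manual_pop = centered / X.std(axis=0, ddof=0)
# Sample formula (ddof=1): the convention used for the test scores
manual_sample = centered / X.std(axis=0, ddof=1)

sk = StandardScaler().fit_transform(X)
matches_pop = np.allclose(sk, manual_pop)
matches_sample = np.allclose(sk, manual_sample)

print(f"Matches ddof=0 formula: {matches_pop}")
print(f"Matches ddof=1 formula: {matches_sample}")
```

The two versions differ only by the constant factor sqrt((n-1)/n), which matters little for large n but is noticeable with five rows.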

Key Takeaways

  1. z = (x−μ)/σ — units are standard deviations from the mean
  2. Enables comparison across different scales — math score vs essay rubric
  3. P(−1.96 < Z < 1.96) ≈ 95% — foundation of 95% confidence intervals
  4. |z| > 3 flags potential outliers in approximately normal data
  5. StandardScaler in sklearn applies z-score standardization for ML pipelines
  6. Clean probability interpretation requires normality — for skewed data, use percentile ranks
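Takeaways 4 and 6 can be sketched together. The example below uses made-up data (19 values near 50 plus one extreme value), flags points with |z| > 3, and then compares the normal-model percentile of one value with its empirical percentile rank from `scipy.stats.percentileofscore`. It is an illustration under these assumed values, not a general outlier-detection recipe.

```python
import numpy as np
from scipy.stats import norm, percentileofscore

# Made-up data: 19 values near 50 plus one extreme value
data = np.array([48, 52, 50, 49, 51, 47, 53, 50, 49, 51,
                 50, 48, 52, 50, 49, 51, 50, 50, 49, 120], dtype=float)

mean, std = data.mean(), data.std(ddof=1)
z = (data - mean) / std

# Rule of thumb: |z| > 3 marks a potential outlier
flagged = data[np.abs(z) > 3]
print("Flagged as potential outliers:", flagged)

# The extreme value skews the data, so the normal-model percentile and the
# empirical percentile rank of the same value can disagree sharply.
x = 53.0
z_x = (x - mean) / std
normal_pct = 100 * norm.cdf(z_x)        # assumes the data are normal
rank_pct = percentileofscore(data, x)   # uses only the ordering of the data

print(f"x={x}: z={z_x:+.2f}, normal-model percentile={normal_pct:.1f}, "
      f"percentile rank={rank_pct:.1f}")
```

Here 53 is the second-largest value in the data, yet its z-score is slightly negative because the single extreme value pulls the mean upward; the percentile rank reflects its actual position. Note also that with the sample standard deviation, |z| can never exceed (n-1)/sqrt(n), so the |z| > 3 rule cannot trigger in samples of 10 or fewer points.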
