Standard Deviation — Formula, Empirical Rule, and Coefficient of Variation

Standard Deviation

The standard deviation is the square root of the variance. Taking the root returns the measure to the original units of the data, which makes it directly interpretable. For a sample of n observations with mean x̄, it is:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

data = np.array([85, 90, 78, 92, 88, 76, 95, 83, 87, 91])
mean = np.mean(data)
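std  = np.std(data, ddof=1)  # ddof=1 divides by n-1 (sample SD)

print(f"Mean:    {mean:.2f}")
print(f"Std Dev: {std:.4f}")
print(f"Variance: {std**2:.4f}")
# The SD is a root-mean-square deviation, not the mean absolute deviation
print(f"A typical score lies roughly {std:.1f} points from the mean")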
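
To make the formula concrete, the sketch below computes the standard deviation step by step for the same ten scores and compares the result with NumPy. The intermediate names (`deviations`, `ss`, `s_manual`, `sigma_manual`) are illustrative and not part of the original lesson.

```python
import numpy as np

data = np.array([85, 90, 78, 92, 88, 76, 95, 83, 87, 91])
n = data.size

x_bar = data.sum() / n                 # sample mean: 865 / 10 = 86.5
deviations = data - x_bar              # x_i - x_bar for every score
ss = np.sum(deviations ** 2)           # sum of squared deviations: 334.5

s_manual = np.sqrt(ss / (n - 1))       # sample SD (divide by n - 1)
sigma_manual = np.sqrt(ss / n)         # population SD (divide by n)

print(f"Manual sample SD:     {s_manual:.4f}")
print(f"np.std(ddof=1):       {np.std(data, ddof=1):.4f}")
print(f"Manual population SD: {sigma_manual:.4f}")
print(f"np.std(ddof=0):       {np.std(data, ddof=0):.4f}")
```

The sample version comes out slightly larger because it divides by n - 1 rather than n; the gap shrinks as n grows.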

The Empirical Rule (68-95-99.7)

For approximately normal distributions:

Range      % of Data
μ ± 1σ     68.27%
μ ± 2σ     95.45%
μ ± 3σ     99.73%

np.random.seed(42)
scores = np.random.normal(75, 10, 10000)
mu, sigma = scores.mean(), scores.std(ddof=1)  # sample SD (ddof=1)

within_1 = np.abs(scores - mu) <= sigma
within_2 = np.abs(scores - mu) <= 2*sigma
within_3 = np.abs(scores - mu) <= 3*sigma

print(f"Within 1σ: {within_1.mean()*100:.2f}%  (theory: 68.27%)")
print(f"Within 2σ: {within_2.mean()*100:.2f}%  (theory: 95.45%)")
print(f"Within 3σ: {within_3.mean()*100:.2f}%  (theory: 99.73%)")

# Visualization
x = np.linspace(mu-4*sigma, mu+4*sigma, 300)
y = stats.norm.pdf(x, mu, sigma)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, 'k-', lw=2, label='Normal curve')
for k, color, label in zip([3,2,1],
                             ['#b8daff','#c3e6cb','#d4edda'],
                             ['99.73% (±3σ)','95.45% (±2σ)','68.27% (±1σ)']):
    ax.fill_between(x, y, where=(np.abs(x-mu)<=k*sigma), color=color, label=label)
for k in [1,2,3]:
    ax.axvline(mu+k*sigma, color='gray', ls='--', alpha=0.5)
    ax.axvline(mu-k*sigma, color='gray', ls='--', alpha=0.5)
ax.axvline(mu, color='red', lw=2, label=f'μ={mu:.0f}')
ax.set_title('Empirical Rule (68-95-99.7)')
ax.legend()
plt.tight_layout()
plt.savefig('empirical_rule.png', dpi=150)
plt.show()
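
The theoretical percentages quoted for the empirical rule are not arbitrary: they can be recovered from the standard normal CDF. The following is a minimal sketch using `scipy.stats.norm.cdf`; the dictionary name `coverage` is an illustrative choice.

```python
from scipy import stats

# P(|Z| <= k) for a standard normal Z equals Phi(k) - Phi(-k)
coverage = {k: stats.norm.cdf(k) - stats.norm.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"Within {k}σ: {p * 100:.2f}%")
```

Because any normal variable can be standardized to Z = (X - μ)/σ, these probabilities hold for every normal distribution regardless of its mean and standard deviation.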

Coefficient of Variation (CV)

$$CV = \frac{s}{\bar{x}} \times 100\%$$

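The CV expresses the standard deviation as a percentage of the mean. Because it is unitless, it allows variability to be compared across variables measured on different scales or in different units. It is meaningful only for ratio-scale data with a positive mean.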

datasets = {
    'Heights (cm)': np.random.normal(170, 10, 200),
    'Weights (kg)': np.random.normal(70, 15, 200),
    'IQ Scores':    np.random.normal(100, 15, 200),
    'Income ($K)':  np.random.lognormal(4, 0.5, 200),
}

print(f"{'Variable':<20} {'Mean':>8} {'SD':>8} {'CV%':>8}")
print("-" * 48)
for name, d in datasets.items():
    m, s = np.mean(d), np.std(d, ddof=1)  # sample mean and sample SD
    cv = s / m * 100
    print(f"{name:<20} {m:>8.1f} {s:>8.1f} {cv:>7.1f}%")
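
One way to see why the CV suits cross-unit comparisons is to express the same measurements in two units. The sketch below assumes simulated heights and a hypothetical helper `cv_percent`; neither comes from the original lesson, and the seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)              # arbitrary seed, for reproducibility only
heights_cm = rng.normal(170, 10, 200)       # simulated heights in centimetres
heights_m = heights_cm / 100                # the same heights in metres

def cv_percent(x):
    """Coefficient of variation in percent, using the sample SD (ddof=1)."""
    return np.std(x, ddof=1) / np.mean(x) * 100

print(f"SD (cm): {np.std(heights_cm, ddof=1):.3f}   CV: {cv_percent(heights_cm):.2f}%")
print(f"SD (m):  {np.std(heights_m, ddof=1):.3f}   CV: {cv_percent(heights_m):.2f}%")
```

Rescaling changes the SD by a factor of 100 but leaves the CV unchanged. A shift of the zero point (for example Celsius to Kelvin) would change the CV, which is why it is reserved for data with a true zero.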

Outlier Detection (3σ Rule)

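np.random.seed(0)
# 97 typical values plus three planted extremes (120, 5, 3)
data_out = np.concatenate([np.random.normal(50,5,97), [120, 5, 3]])
# z-score: distance from the mean in units of the sample SD
z = np.abs((data_out - data_out.mean()) / data_out.std(ddof=1))
outliers = data_out[z > 3]
print(f"Outliers (|z|>3): {outliers}")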
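
The same check can be wrapped in a small reusable function. This is a sketch only: `flag_outliers` and its `threshold` parameter are hypothetical names introduced here, and the data are simulated with an arbitrary seed.

```python
import numpy as np

def flag_outliers(values, threshold=3.0):
    """Return (mask, z), where mask marks points with |z| > threshold.

    Hypothetical helper for illustration; z-scores use the sample SD (ddof=1).
    """
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std(ddof=1)
    return np.abs(z) > threshold, z

rng = np.random.default_rng(0)
# 97 simulated typical values plus three planted extremes
sample = np.concatenate([rng.normal(50, 5, 97), [120, 5, 3]])

mask, z = flag_outliers(sample)
for value, score in zip(sample[mask], z[mask]):
    print(f"value={value:6.1f}  z={score:+.2f}")
```

Returning the signed z-scores alongside the mask shows whether each flagged point is unusually high or unusually low. Note that extreme values inflate the mean and SD used to compute z, so the rule is best treated as a screening heuristic rather than a definitive test.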

Key Takeaways

  1. Standard deviation is in the same units as the data — directly interpretable
  2. 68-95-99.7 rule applies to normal distributions: memorize these thresholds
  3. CV = SD/mean allows variability comparison across different scales
  4. For samples use ddof=1 in NumPy; for populations use ddof=0
  5. |z| > 3 (3σ from mean) flags potential outliers in approximately normal data
  6. Skewed distributions violate the empirical rule — never apply it blindly
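
The last takeaway can be checked empirically. The sketch below draws from an exponential distribution (chosen here as a convenient skewed example; it is not used in the original lesson) and measures how much of the sample falls within 1, 2, and 3 standard deviations of the mean.

```python
import numpy as np

rng = np.random.default_rng(42)                      # arbitrary seed
skewed = rng.exponential(scale=1.0, size=100_000)    # strongly right-skewed sample

mu, sigma = skewed.mean(), skewed.std(ddof=1)

normal_theory = {1: 68.27, 2: 95.45, 3: 99.73}       # empirical-rule percentages
observed = {}
for k, expected in normal_theory.items():
    # share of the sample within k standard deviations of the mean
    observed[k] = np.mean(np.abs(skewed - mu) <= k * sigma) * 100
    print(f"Within {k}σ: {observed[k]:.2f}%  (normal theory: {expected}%)")
```

For this distribution about 86% of values lie within one standard deviation of the mean, well above the 68% the empirical rule predicts, while the share within three standard deviations falls short of 99.73%.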
