Levels of Measurement
In 1946, psychologist Stanley Smith Stevens proposed a taxonomy of four levels of measurement that has since become foundational in statistics. A variable's level determines which statistical operations yield meaningful results for it.
The Four Levels
1. Nominal Scale
The weakest level. Data is placed into named categories with no meaningful order or distance between them.
Properties:
- Identity: each value belongs to a distinct category
- No order, no distance, no meaningful zero
Examples: Gender, blood type, nationality, color, product ID, political party
Valid statistics: Frequency, mode, chi-square test
Invalid: Mean, median, standard deviation
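import pandas as pd
from scipy.stats import chi2_contingency
# Nominal: blood type distribution
blood_types = pd.Series(['A', 'O', 'B', 'AB', 'O', 'A', 'O', 'A', 'B', 'O'])
print("Mode:", blood_types.mode()[0])
print(blood_types.value_counts())
# Chi-square test of independence (nominal vs nominal)
# Illustrative counts only: rows = blood type (A, O), columns = two hypothetical groups
observed = [[12, 8], [9, 11]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}, dof = {dof}")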
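To see why the mean is listed as invalid for nominal data, here is a minimal sketch that assigns integer codes to the blood types. The two coding schemes (`coding_1`, `coding_2`) are arbitrary, hypothetical choices made for illustration; any one-to-one relabeling is equally legitimate for nominal data.

```python
import pandas as pd

blood_types = pd.Series(['A', 'O', 'B', 'AB', 'O', 'A', 'O', 'A', 'B', 'O'])

# Two arbitrary (hypothetical) integer codings of the same categories
coding_1 = {'A': 0, 'B': 1, 'AB': 2, 'O': 3}
coding_2 = {'O': 0, 'AB': 1, 'A': 2, 'B': 3}

codes_1 = blood_types.map(coding_1)
codes_2 = blood_types.map(coding_2)

# The "mean blood type" depends entirely on which coding was chosen
mean_1 = codes_1.mean()
mean_2 = codes_2.mean()
print(f"Mean under coding 1: {mean_1:.2f}")
print(f"Mean under coding 2: {mean_2:.2f}")

# The mode, decoded back to a label, is the same category either way
decode_1 = {code: label for label, code in coding_1.items()}
decode_2 = {code: label for label, code in coding_2.items()}
mode_1 = decode_1[int(codes_1.mode()[0])]
mode_2 = decode_2[int(codes_2.mode()[0])]
print("Mode under coding 1:", mode_1)
print("Mode under coding 2:", mode_2)
```

The mean changes with the coding while the mode does not, which is the sense in which only identity-based statistics are meaningful at this level.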
2. Ordinal Scale
Categories have a meaningful order, but the intervals between categories are unknown or unequal.
Properties:
- Identity + Order
- No distance, no meaningful zero
Examples:
- Survey Likert scales (Strongly Disagree → Strongly Agree)
- Education level (High School < Bachelor's < Master's < PhD)
- Race finishing position (1st, 2nd, 3rd)
- Socioeconomic status (Low, Middle, High)
Valid statistics: Median, IQR, percentiles, Spearman rank correlation, Mann-Whitney test
Invalid: Arithmetic mean (debated), standard deviation, Pearson r
import numpy as np
from scipy.stats import spearmanr
# Ordinal: finishing positions of the same four runners in two races
race_1 = [1, 3, 5, 7] # positions in the first race
race_2 = [2, 4, 6, 8] # positions of the same runners in the second race
# Spearman correlation (rank-based, appropriate for paired ordinal data)
rho, p = spearmanr(race_1, race_2)
print(f"Spearman ρ = {rho:.3f}, p = {p:.4f}")
# Median is appropriate for ordinal
satisfaction = [3, 4, 2, 5, 4, 3, 4, 5, 2, 4] # 1–5 scale
print(f"Median satisfaction: {np.median(satisfaction)}")
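The case against the arithmetic mean for ordinal data can be sketched as follows. Because the intervals between ordinal categories are unknown, any order-preserving relabeling of the categories is equally valid. The two groups and the `recode` mapping below are hypothetical, chosen only to show that such a relabeling can reverse a comparison of means while leaving the comparison of medians unchanged.

```python
import numpy as np

# Hypothetical responses from two groups on a 1-5 ordinal scale
group_x = [1, 1, 5]
group_y = [3, 3, 3]

# An order-preserving relabeling: only the numeric label of the top category changes
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
group_x_recoded = [recode[v] for v in group_x]
group_y_recoded = [recode[v] for v in group_y]

# Comparison of means before and after relabeling
x_mean_higher_before = np.mean(group_x) > np.mean(group_y)
x_mean_higher_after = np.mean(group_x_recoded) > np.mean(group_y_recoded)

# Comparison of medians before and after relabeling
x_median_higher_before = np.median(group_x) > np.median(group_y)
x_median_higher_after = np.median(group_x_recoded) > np.median(group_y_recoded)

print("X has higher mean (original labels):  ", x_mean_higher_before)
print("X has higher mean (relabeled):        ", x_mean_higher_after)
print("X has higher median (original labels):", x_median_higher_before)
print("X has higher median (relabeled):      ", x_median_higher_after)
```

The conclusion drawn from the means depends on the labels; the conclusion drawn from the medians does not.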
3. Interval Scale
Equal intervals between values, but no true zero — zero is arbitrary, not the absence of the quantity.
Properties:
- Identity + Order + Equal Intervals
- No true zero (ratios meaningless)
Examples:
- Temperature in Celsius or Fahrenheit (0°C ≠ "no temperature")
- IQ scores (IQ 0 doesn't mean no intelligence)
- Calendar years (the starting point of the era is arbitrary)
- Likert scales (when treated as interval — common in practice)
Valid statistics: Mean, standard deviation, Pearson r, t-tests, ANOVA
Invalid: Ratios ("twice as hot" is not meaningful in Celsius)
# Temperature conversion — shows why ratios fail for interval data
celsius_a = 20
celsius_b = 40
# It is NOT true that 40°C is "twice as hot" as 20°C
# Convert to Kelvin (ratio scale) to see why:
kelvin_a = celsius_a + 273.15 # 293.15 K
kelvin_b = celsius_b + 273.15 # 313.15 K
ratio_celsius = celsius_b / celsius_a # 2.0 — misleading!
ratio_kelvin = kelvin_b / kelvin_a # 1.068 — true ratio
print(f"Celsius ratio: {ratio_celsius:.3f} ← NOT meaningful")
print(f"Kelvin ratio: {ratio_kelvin:.3f} ← Meaningful thermodynamic ratio")
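A complementary sketch shows what does remain meaningful on an interval scale. Ratios of values change when the same temperatures are expressed in Fahrenheit, but ratios of differences do not, which is what "equal intervals" buys. The three temperatures are illustrative values.

```python
import numpy as np

celsius = np.array([20.0, 30.0, 40.0])
fahrenheit = celsius * 9 / 5 + 32  # same temperatures, different arbitrary zero and unit

# Ratio of values: depends on the scale, so it carries no physical meaning
ratio_values_c = celsius[2] / celsius[0]
ratio_values_f = fahrenheit[2] / fahrenheit[0]

# Ratio of differences: identical on both scales
ratio_diffs_c = (celsius[2] - celsius[0]) / (celsius[1] - celsius[0])
ratio_diffs_f = (fahrenheit[2] - fahrenheit[0]) / (fahrenheit[1] - fahrenheit[0])

print(f"Ratio of values:      {ratio_values_c:.3f} (°C) vs {ratio_values_f:.3f} (°F)")
print(f"Ratio of differences: {ratio_diffs_c:.3f} (°C) vs {ratio_diffs_f:.3f} (°F)")
```

Statements such as "the rise from 20°C to 40°C is twice the rise from 20°C to 30°C" are therefore valid, even though "40°C is twice as hot as 20°C" is not.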
4. Ratio Scale
The strongest level. Has all properties of interval scale plus a true absolute zero (zero means absence of the attribute).
Properties:
- Identity + Order + Equal Intervals + True Zero
Examples:
- Height, weight, length (0 kg = no mass)
- Age, time duration
- Income (0 = no income)
- Temperature in Kelvin
- Number of items (count data)
Valid statistics: All statistics including geometric mean, coefficient of variation, and ratio comparisons.
import numpy as np
heights_m = np.array([1.65, 1.72, 1.80, 1.58, 1.90])
print(f"Mean: {np.mean(heights_m):.3f} m")
print(f"Ratio (tallest/shortest): {heights_m.max()/heights_m.min():.3f}")
print(f"Geometric mean: {np.exp(np.log(heights_m).mean()):.3f} m")
# ddof=1 gives the sample standard deviation, the usual choice for a sample CV
print(f"CV (coeff of variation): {(np.std(heights_m, ddof=1)/np.mean(heights_m)*100):.1f}%")
# All valid because height is ratio scale
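The role of the true zero can be illustrated with a short sketch. Changing the unit (metres to centimetres) leaves ratios and the coefficient of variation unchanged, whereas moving the zero point does not. The helper `ratio_and_cv` and the shifted "height above 1.5 m" variable are hypothetical, introduced only for this illustration.

```python
import numpy as np

heights_m = np.array([1.65, 1.72, 1.80, 1.58, 1.90])

def ratio_and_cv(x):
    """Return (max/min ratio, coefficient of variation) for an array."""
    return x.max() / x.min(), np.std(x, ddof=1) / np.mean(x)

# Change of unit: multiply by a positive constant (zero stays at zero)
ratio_m, cv_m = ratio_and_cv(heights_m)
ratio_cm, cv_cm = ratio_and_cv(heights_m * 100)

# Shifted zero: hypothetical "height above 1.5 m", no longer a ratio scale
ratio_shifted, cv_shifted = ratio_and_cv(heights_m - 1.5)

print(f"metres:      ratio = {ratio_m:.3f}, CV = {cv_m:.3f}")
print(f"centimetres: ratio = {ratio_cm:.3f}, CV = {cv_cm:.3f}")
print(f"shifted:     ratio = {ratio_shifted:.3f}, CV = {cv_shifted:.3f}")
```

The first two lines agree; the third does not, because the shifted variable's zero no longer means "absence of height".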
Summary Table
| Level | Order | Equal Intervals | True Zero | Example | Central Tendency |
|---|---|---|---|---|---|
| Nominal | ❌ | ❌ | ❌ | Eye color | Mode |
| Ordinal | ✅ | ❌ | ❌ | Satisfaction rating | Median |
| Interval | ✅ | ✅ | ❌ | Temperature (°C) | Arithmetic mean |
| Ratio | ✅ | ✅ | ✅ | Height, weight | Arithmetic or geometric mean |
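The hierarchy is cumulative: each level permits everything the levels below it permit, plus some additions. One possible way to encode this as a lookup is sketched here; the names `LEVELS`, `NEW_STATISTICS`, and `permissible_statistics` are hypothetical, and the lists are abbreviated from the sections above.

```python
# Levels ordered from weakest to strongest
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Statistics that first become meaningful at each level (abbreviated)
NEW_STATISTICS = {
    "nominal": ["frequency", "mode", "chi-square test"],
    "ordinal": ["median", "percentiles", "Spearman rank correlation"],
    "interval": ["mean", "standard deviation", "Pearson r"],
    "ratio": ["geometric mean", "coefficient of variation", "ratio comparisons"],
}

def permissible_statistics(level):
    """Return all statistics permitted at `level`, including those inherited from lower levels."""
    if level not in LEVELS:
        raise ValueError(f"Unknown level of measurement: {level!r}")
    stats = []
    for lower in LEVELS[: LEVELS.index(level) + 1]:
        stats.extend(NEW_STATISTICS[lower])
    return stats

for level in LEVELS:
    print(f"{level:>8}: {', '.join(permissible_statistics(level))}")
```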
Choosing the Right Statistical Test
def suggest_test(level_of_measurement, n_groups, paired=False):
    """Suggest an appropriate statistical test based on measurement level."""
    if level_of_measurement == 'nominal':
        return "Chi-square test (categories) or Fisher's exact test (small samples)"
    elif level_of_measurement == 'ordinal':
        if n_groups == 1:
            return "Sign test (against a hypothesized median)"
        elif n_groups == 2:
            return "Wilcoxon signed-rank" if paired else "Mann-Whitney U"
        else:
            return "Friedman (repeated measures)" if paired else "Kruskal-Wallis"
    elif level_of_measurement in ('interval', 'ratio'):
        if n_groups == 1:
            return "One-sample t-test"
        elif n_groups == 2:
            return "Paired t-test" if paired else "Independent t-test"
        else:
            return "Repeated measures ANOVA" if paired else "One-way ANOVA"
    # Fail loudly on an unrecognized level instead of silently returning None
    raise ValueError(f"Unknown level of measurement: {level_of_measurement!r}")
# Examples
print(suggest_test('nominal', 2))
print(suggest_test('ordinal', 2))
print(suggest_test('ratio', 2, paired=False))
print(suggest_test('ratio', 3))
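Once a test has been chosen, running it with SciPy might look like the sketch below. The data are made up for illustration, and the choice of a two-sided alternative is an assumption rather than something the text prescribes.

```python
from scipy.stats import mannwhitneyu, ttest_ind

# Ordinal, two independent groups: Mann-Whitney U
# (illustrative 1-5 satisfaction ratings, not real data)
ratings_a = [3, 4, 2, 5, 4, 3, 4]
ratings_b = [2, 3, 1, 3, 2, 4, 2]
u_stat, p_ordinal = mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_ordinal:.4f}")

# Ratio, two independent groups: independent-samples t-test
# (illustrative heights in metres, not real data)
heights_a = [1.65, 1.72, 1.80, 1.58, 1.90]
heights_b = [1.60, 1.68, 1.75, 1.62, 1.71]
t_stat, p_ratio = ttest_ind(heights_a, heights_b)
print(f"t = {t_stat:.3f}, p = {p_ratio:.4f}")
```

Note that `ttest_ind` assumes equal variances by default; passing `equal_var=False` requests Welch's t-test instead.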
Key Takeaways
- Nominal: categories only — use frequency, mode, chi-square
- Ordinal: ordered categories — use median, rank-based tests
- Interval: equal gaps, no true zero — use mean, t-test, Pearson r
- Ratio: everything interval + true zero — use all statistics including ratios and CV
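- Applying higher-level statistics to lower-level data (for example, averaging nominal codes) produces numbers that depend on arbitrary coding choices; it is nonetheless common, and the treatment of Likert scales as interval remains debated
- When in doubt, use methods for the lower level (median, rank-based tests): they make fewer assumptions about the scale, at some cost in statistical power when the data really are interval or ratio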
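One practical safeguard is to record the measurement level in the data type itself. The sketch below uses pandas categoricals for this, on the assumption that pandas is the tool in use; the example values are illustrative. An unordered categorical models nominal data, and an ordered one models ordinal data.

```python
import pandas as pd

# Nominal: unordered categorical (no comparisons between categories)
blood = pd.Series(pd.Categorical(['A', 'O', 'B', 'O'],
                                 categories=['A', 'B', 'AB', 'O']))
print("Most common blood type:", blood.mode()[0])

# Ordinal: ordered categorical (comparisons allowed, arithmetic not)
ses = pd.Series(pd.Categorical(['Low', 'High', 'Middle', 'Low', 'High'],
                               categories=['Low', 'Middle', 'High'],
                               ordered=True))

highest = ses.max()                      # order-based statistics work
n_above_low = int((ses > 'Low').sum())   # so do order comparisons
print("Highest status observed:", highest)
print("Observations above 'Low':", n_above_low)

# Arithmetic on categories is rejected rather than silently computed
try:
    ses.mean()
    mean_allowed = True
except TypeError as exc:
    mean_allowed = False
    print("Mean rejected:", exc)
```

With this encoding, an attempt to average ordinal categories fails early instead of producing a number that looks plausible.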