Skewness
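Skewness quantifies the asymmetry of a distribution about its mean.
Positive → longer right tail. Negative → longer left tail. Zero → no net asymmetry. A symmetric distribution with a finite third moment has zero skewness, but zero skewness alone does not prove symmetry.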
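For reference, `stats.skew` with its default `bias=True` returns the sample skewness g1 = m3 / m2^(3/2). Here m_k = (1/n) Σ (x_i − x̄)^k is the k-th central moment. The sketch below recomputes g1 by hand and compares it with SciPy. `sample_skewness` is a hypothetical helper written for illustration and is not part of SciPy.

```python
import numpy as np
from scipy import stats

def sample_skewness(x):
    """Biased sample skewness g1 = m3 / m2**1.5 (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2 = np.mean(d ** 2)  # second central moment, divisor n
    m3 = np.mean(d ** 3)  # third central moment, divisor n
    return m3 / m2 ** 1.5

rng = np.random.default_rng(0)
data = rng.lognormal(0, 0.8, 2000)
# The hand-rolled value and SciPy's default should agree to floating-point precision
print(sample_skewness(data), stats.skew(data))
```

If you pass `bias=False` to `stats.skew`, SciPy applies the small-sample correction G1 = g1 · sqrt(n(n−1)) / (n−2). This correction mostly matters when n is small.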
```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

np.random.seed(42)
right_skew = np.random.lognormal(0, 0.8, 2000)  # income-like: long right tail
symmetric = np.random.normal(0, 1, 2000)
left_skew = -np.random.lognormal(0, 0.8, 2000)  # mirrored lognormal: long left tail

for name, data in [("Right-Skewed", right_skew),
                   ("Symmetric", symmetric),
                   ("Left-Skewed", left_skew)]:
    sk = stats.skew(data)
    print(f"{name:<15}: skew={sk:+.4f}, mean={np.mean(data):.3f}, median={np.median(data):.3f}")
```
Mean vs Median Under Skewness
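For unimodal distributions, the three centres usually line up as follows. This is a rule of thumb, not a guarantee:

- Right-Skewed: Mode < Median < Mean
- Symmetric: Mode ≈ Median ≈ Mean
- Left-Skewed: Mean < Median < Mode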
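You can check the ordering exactly for a lognormal, which is the distribution behind `right_skew`. If log X ~ N(μ, σ²), then mode = exp(μ − σ²), median = exp(μ), and mean = exp(μ + σ²/2). It follows that mode < median < mean for every σ > 0. Here is a small sketch using the same μ = 0 and σ = 0.8 as the data generated above. The helper name is hypothetical.

```python
import numpy as np

def lognormal_mode_median_mean(mu, sigma):
    """Closed-form mode, median and mean of X where log X ~ N(mu, sigma**2)."""
    mode = np.exp(mu - sigma ** 2)
    median = np.exp(mu)
    mean = np.exp(mu + sigma ** 2 / 2)
    return mode, median, mean

mode, median, mean = lognormal_mode_median_mean(0.0, 0.8)  # same parameters as right_skew
print(f"mode={mode:.3f}, median={median:.3f}, mean={mean:.3f}")
# mode=0.527, median=1.000, mean=1.377
```

The sample mean and median printed earlier for `right_skew` should land close to these closed-form values, up to sampling noise.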
```python
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
datasets = [("Right-Skewed", right_skew, '#f8d7da'),
            ("Symmetric", symmetric, '#d4edda'),
            ("Left-Skewed", left_skew, '#d1ecf1')]

for ax, (name, data, color) in zip(axes, datasets):
    ax.hist(data, bins=50, density=True, color=color, edgecolor='gray', alpha=0.7)
    # Dashed red line marks the mean, solid blue line marks the median
    ax.axvline(np.mean(data), color='red', lw=2, ls='--', label=f'Mean={np.mean(data):.2f}')
    ax.axvline(np.median(data), color='blue', lw=2, ls='-', label=f'Median={np.median(data):.2f}')
    ax.set_title(f'{name}\nskewness={stats.skew(data):.3f}')
    ax.legend(fontsize=8)

plt.tight_layout()
plt.savefig('skewness.png', dpi=150)
plt.show()
```
Interpretation Guide
| Absolute Skewness | Interpretation |
|---|---|
| < 0.5 | Approximately symmetric |
| 0.5–1.0 | Moderately skewed |
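| > 1.0 | Highly skewed; consider a transformation |

These cutoffs are a common rule of thumb, not a formal test. With small samples, the sample skewness is itself noisy.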
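To apply the table programmatically, you could use a helper along these lines. `classify_skew` is a hypothetical name, and the cutoffs are simply the ones from the table. Values exactly at 0.5 or 1.0 go to the moderate band here, which is an arbitrary choice.

```python
import numpy as np
from scipy import stats

def classify_skew(data):
    """Label a sample using the table's rule-of-thumb cutoffs (hypothetical helper)."""
    s = stats.skew(data)
    a = abs(s)
    if a < 0.5:
        label = "approximately symmetric"
    elif a <= 1.0:
        label = "moderately skewed"
    else:
        label = "highly skewed"
    return s, label

rng = np.random.default_rng(1)
for name, sample in [("normal", rng.normal(0, 1, 2000)),
                     ("lognormal", rng.lognormal(0, 0.8, 2000))]:
    s, label = classify_skew(sample)
    print(f"{name:<10} skew={s:+.3f} -> {label}")
```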
Fixing Skewness with Transformations
```python
skewed = np.random.lognormal(0, 1, 500)
print(f"Original skewness: {stats.skew(skewed):.4f}")

# Log transform: requires strictly positive values; suits strong right skew
log_transformed = np.log(skewed)
print(f"Log-transformed skewness: {stats.skew(log_transformed):.4f}")

# Square root: a milder correction for moderate right skew; requires non-negative values
sqrt_transformed = np.sqrt(skewed)
print(f"Sqrt-transformed skewness: {stats.skew(sqrt_transformed):.4f}")
```
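The log and square root are both members of the Box-Cox power family. They correspond to λ = 0 and λ = 0.5 respectively, up to a linear rescaling that leaves skewness unchanged. This Box-Cox step goes beyond the two transforms shown above. SciPy's `stats.boxcox` estimates λ by maximum likelihood when `lmbda` is not given, so you can let the data choose the power. The sketch below assumes the data are strictly positive.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
skewed = rng.lognormal(0, 1, 500)  # strictly positive, right-skewed

# With lmbda left as None, boxcox returns the transformed data and the fitted lambda
transformed, lam = stats.boxcox(skewed)
print(f"fitted lambda: {lam:.3f}")
print(f"skewness before: {stats.skew(skewed):.3f}, after: {stats.skew(transformed):.3f}")

# Fixing lambda at 0 reproduces the plain log transform
print(np.allclose(stats.boxcox(skewed, lmbda=0), np.log(skewed)))
```

For lognormal data like this, the fitted λ should come out near 0. That agrees with the log transform being the natural fix.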
Key Takeaways
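- Positive skew = longer right tail: usually mean > median, because the tail pulls the mean rightward
- Negative skew = longer left tail: usually mean < median
- |skew| > 1: strongly skewed; consider a transformation or non-parametric methods
- A log transformation often removes most of the right skew in incomes, prices, and reaction times (it requires strictly positive values)
- Always visualize: a single skewness number cannot reveal multiple peaks, isolated outliers, or which part of the distribution drives the asymmetry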
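- Income and house prices are classic right-skewed examples; stock returns are skewed too, but the sign depends on the asset and horizon (broad index returns often show negative skew)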
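Here is a toy example of a tail pulling the mean. One extreme value drags the mean far to the right, while the median barely moves. The numbers are made up purely for illustration.

```python
import numpy as np

base = np.array([1, 2, 3, 4, 5])
with_tail = np.append(base, 100)  # one extreme value on the right

print(f"{np.mean(base):.2f} {np.median(base):.2f}")            # 3.00 3.00
print(f"{np.mean(with_tail):.2f} {np.median(with_tail):.2f}")  # 19.17 3.50
```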