The Mode
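The mode is the value that appears most frequently in a dataset. It is the only one of the common measures of central tendency that works for nominal (categorical) data. The mean requires values that can be added, and the median requires values that can be ordered, but the mode only requires counting.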
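The definition translates directly into a counting procedure. The sketch below uses `collections.Counter` and a hypothetical helper named `modes`. It returns every value tied for the highest count. When no value repeats it returns an empty list. That last behavior is an assumed convention, because textbooks differ on whether all-unique data has no mode or treats every value as a mode.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency.

    Hypothetical helper: returns [] when every value occurs exactly once,
    treating such data as having no mode (an assumed convention).
    """
    counts = Counter(values)
    if not counts:
        return []  # empty input has no mode
    top = max(counts.values())
    if top == 1 and len(counts) > 1:
        return []  # all values unique: no mode under this convention
    # Counter keeps first-seen order, so tied modes come out in input order
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([1, 2, 2, 3, 3]))                  # [2, 3]
print(modes([4, 5, 6]))                        # []
```

The function works unchanged on strings, which shows why the mode suits nominal data: it never adds or compares values, it only counts them.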
Calculating the Mode
```python
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt

# Discrete data: shoe sizes
shoe_sizes = [8, 9, 8, 10, 8, 9, 10, 9, 8, 11, 9, 8, 10, 9, 8]

# keepdims=True (SciPy >= 1.9) returns 1-element arrays, hence the [0] indexing below
mode_result = stats.mode(shoe_sizes, keepdims=True)
print(f"Shoe sizes: {sorted(shoe_sizes)}")
print(f"Mode = {mode_result.mode[0]} (appears {mode_result.count[0]} times)")

# Using pandas: Series.mode() returns every tied value, not just one
series = pd.Series(shoe_sizes)
print(f"Pandas mode: {series.mode().tolist()}")

# Frequency table: count of each shoe size, ordered by size
freq_table = series.value_counts().sort_index()
print(freq_table)
```
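The two library calls above behave differently when several values tie for the highest count. The shoe-size data has a single mode, so the difference does not show there. The small made-up list below has two tied values. Per its documentation, `scipy.stats.mode` reports only the smallest tied value, while `pandas.Series.mode` returns all of them. The sketch assumes SciPy 1.9 or later, which is when the `keepdims` argument was added.

```python
import pandas as pd
from scipy import stats

# Made-up data where 3 and 7 both appear twice
tied = [3, 7, 7, 3, 5]

# scipy.stats.mode reports one value only: the smallest of the tied modes
res = stats.mode(tied, keepdims=False)
print(res.mode, res.count)

# pandas.Series.mode returns every tied value, sorted
print(pd.Series(tied).mode().tolist())
```

For data that may be multimodal, the pandas result is the safer one to report.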
Bimodal and Multimodal Distributions
By number of peaks, a distribution can be:
- Unimodal: one peak
- Bimodal: two peaks (often indicating two subpopulations)
- Multimodal: more than two peaks
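You can roughly check whether a sample looks unimodal or bimodal by counting peaks in a smoothed density estimate. The sketch below is one heuristic, not a formal test. It fits `scipy.stats.gaussian_kde`, finds local maxima with `scipy.signal.find_peaks`, and ignores any peak whose prominence is below 10% of the highest density. The helper name `count_modes` and the 10% threshold are assumptions chosen for illustration. The two samples reuse the parameters from the plotting code in this section, but with a different random generator.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import gaussian_kde


def count_modes(data, grid_size=512, min_prominence=0.1):
    """Count peaks in a Gaussian KDE of `data` (hypothetical helper).

    Peaks whose prominence is below `min_prominence` times the highest
    density are ignored, so small wiggles from sampling noise are not
    counted. The threshold is an assumption to tune per dataset.
    """
    kde = gaussian_kde(data)
    grid = np.linspace(np.min(data), np.max(data), grid_size)
    density = kde(grid)
    peaks, _ = find_peaks(density, prominence=min_prominence * density.max())
    return len(peaks), grid[peaks]


rng = np.random.default_rng(0)
unimodal = rng.normal(50, 10, 1000)
bimodal = np.concatenate([rng.normal(35, 7, 500), rng.normal(70, 7, 500)])

for name, sample in [("unimodal", unimodal), ("bimodal", bimodal)]:
    n_peaks, locations = count_modes(sample)
    print(f"{name}: {n_peaks} peak(s) near {np.round(locations, 1)}")
```

The KDE bandwidth and the prominence threshold both affect the count. A very narrow bandwidth or a low threshold can turn noise into extra "modes".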
```python
np.random.seed(42)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Unimodal
data_uni = np.random.normal(50, 10, 1000)
axes[0].hist(data_uni, bins=30, color='steelblue', edgecolor='black', alpha=0.7)
axes[0].set_title('Unimodal Distribution')
axes[0].axvline(np.mean(data_uni), color='red', linestyle='--', label='Mean ≈ Mode ≈ Median')
axes[0].legend(fontsize=8)

# Bimodal (two populations)
data_bi = np.concatenate([np.random.normal(35, 7, 500),
                          np.random.normal(70, 7, 500)])
axes[1].hist(data_bi, bins=40, color='coral', edgecolor='black', alpha=0.7)
axes[1].set_title('Bimodal Distribution\n(Two groups mixed!)')
mean_bi = np.mean(data_bi)
# The mean lands in the trough between the two peaks
axes[1].axvline(mean_bi, color='red', linestyle='--', label=f'Mean={mean_bi:.0f} (misleading!)')
axes[1].legend(fontsize=8)

# Mode for categorical data: the tallest bar is the mode
categories = ['Python', 'R', 'SQL', 'Python', 'Python', 'R', 'SQL', 'Python', 'R', 'Python']
cat_counts = pd.Series(categories).value_counts()
cat_counts.plot(kind='bar', ax=axes[2], color=['steelblue', 'coral', 'green'], edgecolor='black')
axes[2].set_title('Mode for Categorical Data\n(Mode = Python)')
axes[2].tick_params(axis='x', labelrotation=0)  # keep category labels horizontal

plt.tight_layout()
plt.savefig('mode_examples.png', dpi=150)
plt.show()
```
Mode for Continuous Data
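Continuous measurements rarely repeat exactly, so counting identical values says little about where the data concentrate. Instead, estimate the mode from the shape of the distribution. You can take the modal class (the tallest bin) of a histogram, or the peak of a kernel density estimate (KDE).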
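The histogram route can be sketched with `np.histogram`. The sketch finds the tallest bin (the modal class). It then refines the estimate inside that bin with the standard grouped-data formula Mode ≈ L + (f1 − f0) / ((f1 − f0) + (f1 − f2)) × h. Here L is the lower edge of the modal class, h is its width, f1 is its count, and f0 and f2 are the counts of the bins just before and after it. The choice of 40 bins and the fallback to the class midpoint when the formula's denominator is zero are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.gamma(shape=3, scale=2, size=1000)  # same distribution as the KDE example

counts, edges = np.histogram(data, bins=40)
i = int(np.argmax(counts))  # index of the modal class (tallest bin)
lower, upper = edges[i], edges[i + 1]
width = upper - lower

# Counts of the modal class and its neighbours (0 beyond the histogram's ends)
f1 = counts[i]
f0 = counts[i - 1] if i > 0 else 0
f2 = counts[i + 1] if i < len(counts) - 1 else 0

# Grouped-data interpolation: the estimate leans toward whichever neighbour
# is taller; fall back to the class midpoint if both neighbours equal f1
denom = (f1 - f0) + (f1 - f2)
grouped_mode = lower + (f1 - f0) / denom * width if denom > 0 else (lower + upper) / 2

print(f"Modal class: [{lower:.2f}, {upper:.2f}), {f1} observations")
print(f"Grouped-data mode estimate: {grouped_mode:.2f}")
```

Both the modal class and the interpolated value depend on the bin count. This is one reason a KDE peak is often preferred for continuous data.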
```python
from scipy import stats as scipy_stats

# Right-skewed continuous data: Gamma with shape k = 3, scale θ = 2
continuous = np.random.gamma(3, 2, 1000)

# Fit a Gaussian KDE and take the grid point where the estimated density peaks
kde = scipy_stats.gaussian_kde(continuous)
x = np.linspace(continuous.min(), continuous.max(), 1000)
mode_estimate = x[np.argmax(kde(x))]
print(f"Continuous data mode (KDE peak): {mode_estimate:.3f}")
print(f"Theoretical mode of Gamma(3, 2): {(3 - 1) * 2:.3f}")  # (k - 1) * θ, k = shape, θ = scale

plt.figure(figsize=(8, 4))
plt.hist(continuous, bins=40, density=True, color='lightblue', edgecolor='gray', alpha=0.7)
plt.plot(x, kde(x), 'r-', linewidth=2, label='KDE')
plt.axvline(mode_estimate, color='green', linewidth=2, linestyle='--',
            label=f'KDE Mode ≈ {mode_estimate:.2f}')
plt.legend()
plt.title('Mode Estimation for Continuous Data')
plt.show()
```
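The theoretical value printed in the KDE example comes from maximizing the Gamma density. NumPy's `np.random.gamma(shape, scale)` uses a shape $k$ and a scale $\theta$, so here $k = 3$ and $\theta = 2$. Setting the derivative of the log-density to zero gives the mode, valid for $k \ge 1$:

$$
f(x) = \frac{x^{k-1} e^{-x/\theta}}{\Gamma(k)\,\theta^{k}}, \qquad
\frac{d}{dx}\ln f(x) = \frac{k-1}{x} - \frac{1}{\theta} = 0
\;\Longrightarrow\;
x_{\text{mode}} = (k-1)\,\theta = (3-1)\cdot 2 = 4.
$$

If the Gamma is instead parameterized by a rate $\beta = 1/\theta$, the same mode reads $(k-1)/\beta$. This is why "$(\alpha-1)\beta$" is only correct when $\beta$ denotes the scale.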
When to Use the Mode
| Data Type | Mode Appropriate? |
|---|---|
| Nominal (color, country, brand) | ✅ Only valid measure of center |
| Ordinal (satisfaction scale) | ✅ Valid |
| Interval/Ratio (symmetric) | Valid, but adds little (mode ≈ mean ≈ median) |
| Interval/Ratio (skewed) | Less useful; the median is usually the better summary |
| Bimodal data | ✅ Essential to report both modes |
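For ordinal data, pandas can store the scale as an ordered `Categorical`. This keeps both the mode and an order-based median available. The survey responses below are made up for illustration. The median is taken from the integer category codes (`Series.cat.codes`) and mapped back to a label. Truncating a half-way median (for example 2.5) down to the lower category is an assumed tie-breaking rule, not a pandas default.

```python
import pandas as pd

# Hypothetical survey responses on an ordered satisfaction scale
levels = ["Very unhappy", "Unhappy", "Neutral", "Happy", "Very happy"]
responses = pd.Series(pd.Categorical(
    ["Happy", "Neutral", "Happy", "Very happy", "Unhappy",
     "Happy", "Neutral", "Very happy", "Happy", "Neutral"],
    categories=levels, ordered=True,
))

# Mode: the most frequent category (needs counting only)
print(responses.mode().tolist())

# Median: needs the order; take the median of the integer codes and map it
# back to a label (int() truncates a half-way value such as 2.5 down to 2)
median_code = int(responses.cat.codes.median())
print(levels[median_code])
```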
Key Takeaways
- Mode is the only valid central tendency measure for nominal data
- Bimodal data often signals two subpopulations, so report both modes, not just one
- Continuous data: use KDE to estimate the mode as the density peak
- Mode can be absent (all values unique) or non-unique (multiple modes)
- Always visualize: in a histogram (numeric data) or bar chart (categorical data), the mode is simply the tallest bar
- The mean of a bimodal distribution may fall between the two modes, in a low-density region where few observations actually lie
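The last takeaway can be checked numerically. The sketch below regenerates the two-group mixture from the bimodal panel, with N(35, 7) and N(70, 7) and 500 draws each, using a different random generator. It then compares how many observations lie within ±5 of the mean versus within ±5 of each mode. The ±5 window and the helper name `share_near` are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
# Same two-group mixture as the bimodal panel: N(35, 7) and N(70, 7)
data = np.concatenate([rng.normal(35, 7, 500), rng.normal(70, 7, 500)])

mean = data.mean()
window = 5  # half-width of the neighbourhood compared below (arbitrary choice)


def share_near(center):
    """Fraction of observations within +/- window of `center` (hypothetical helper)."""
    return np.mean(np.abs(data - center) <= window)


print(f"Mean = {mean:.1f}")
print(f"Share within ±{window} of the mean: {share_near(mean):.1%}")
print(f"Share within ±{window} of 35:       {share_near(35):.1%}")
print(f"Share within ±{window} of 70:       {share_near(70):.1%}")
```

Only a small fraction of observations sit near the mean, while each mode's neighbourhood holds a large share of one group.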