Frequency Distributions
A frequency distribution organizes raw data into a table or chart showing how often each value (or range of values) occurs. It transforms an unreadable list of numbers into an interpretable summary.
Absolute Frequency
The count of observations in each category or interval.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Example: Final exam scores of 50 students
np.random.seed(42)
scores = np.random.normal(72, 12, 50).clip(0, 100).round(0).astype(int)
# --- Ungrouped frequency table (one row per distinct score) ---
from collections import Counter
freq = Counter(scores)
freq_df = pd.DataFrame({'Score': sorted(freq.keys()),
'Frequency': [freq[k] for k in sorted(freq.keys())]})
print("First 10 rows of frequency table:")
print(freq_df.head(10))
Grouped Frequency Distribution
When data is continuous or has many unique values, we group the observations into class intervals (bins).
Steps to build:
- Find range = max − min
- Choose number of classes (typically 5–20; Sturges' rule: k = 1 + 3.322 × log₁₀(n))
- Class width = Range / k (round up)
- Create non-overlapping, equal-width intervals
- Count observations in each interval
# Sturges' rule for number of bins
n = len(scores)
k_sturges = int(np.ceil(1 + 3.322 * np.log10(n)))
print(f"Sturges' rule: k = {k_sturges} bins")
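# Build grouped frequency table
min_score, max_score = int(scores.min()), int(scores.max())
# Round the raw class width (range / k) up to the next multiple of 10 for readable intervals
bin_width = int(np.ceil((max_score - min_score) / k_sturges / 10) * 10)
# Start at the multiple of bin_width at or below the minimum; stop just past the maximum
start = (min_score // bin_width) * bin_width
stop = (max_score // bin_width + 1) * bin_width
bins = range(start, stop + 1, bin_width) # [40,50), [50,60), ...
labels = [f"{b}-{b + bin_width - 1}" for b in bins[:-1]]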
score_series = pd.Series(scores)
grouped = pd.cut(score_series, bins=list(bins), right=False, labels=labels)
# Name the index and the count column directly (works across pandas versions)
freq_table = (grouped.value_counts(sort=False)
              .rename_axis('Interval')
              .reset_index(name='Frequency'))
# Add relative and cumulative frequency
freq_table['Relative Freq'] = freq_table['Frequency'] / n
freq_table['Relative %'] = (freq_table['Relative Freq'] * 100).round(1)
freq_table['Cumulative Freq'] = freq_table['Frequency'].cumsum()
freq_table['Cumulative %'] = (freq_table['Cumulative Freq'] / n * 100).round(1)
print("\nGrouped Frequency Distribution:")
print(freq_table.to_string(index=False))
Output (illustrative; exact counts depend on the generated sample):
Grouped Frequency Distribution:
Interval  Frequency  Relative Freq  Relative %  Cumulative Freq  Cumulative %
   40-49          2          0.040         4.0                2           4.0
   50-59          7          0.140        14.0                9          18.0
   60-69         14          0.280        28.0               23          46.0
   70-79         16          0.320        32.0               39          78.0
   80-89         10          0.200        20.0               49          98.0
   90-99          1          0.020         2.0               50         100.0
Relative Frequency
Each frequency divided by the total number of observations n, so the relative frequencies sum to 1. Useful for comparing distributions across datasets of different sizes.
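As a small illustration of that comparison, the sketch below uses two hypothetical class sections of different sizes (the grade lists are made up for this example) and computes relative frequencies with pandas `value_counts(normalize=True)`.

```python
import pandas as pd

# Hypothetical grade lists for two sections of different sizes (illustrative data)
section_a = pd.Series(["A", "B", "B", "C", "C", "C", "D", "B", "A", "C"])  # n = 10
section_b = pd.Series(["A", "B", "C", "C"])                                # n = 4

# Absolute counts are hard to compare because section A is simply larger
abs_a = section_a.value_counts().sort_index()
abs_b = section_b.value_counts().sort_index()

# Relative frequencies put both sections on the same 0-1 scale
rel_a = section_a.value_counts(normalize=True).sort_index()
rel_b = section_b.value_counts(normalize=True).sort_index()

# Grades missing from a section get a relative frequency of 0
comparison = pd.DataFrame({"Section A": rel_a, "Section B": rel_b}).fillna(0.0)
print(comparison)
```

Grade C has four observations in section A and only two in section B, yet its relative frequency is higher in section B (0.5 versus 0.4), which the raw counts alone would hide.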
Cumulative Frequency
Running total of frequencies up to and including each value or interval. Divided by n, it answers: "What fraction of observations fall at or below this value?"
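A minimal sketch of both ideas, using a short list of made-up scores; `fraction_at_or_below` is a hypothetical helper name chosen for this example, not a library function.

```python
import numpy as np

# Hypothetical scores, made up for this sketch
values = np.array([55, 62, 62, 70, 71, 75, 80, 80, 88, 93])

# Running total: cumulative count at each distinct value
distinct, counts = np.unique(values, return_counts=True)
cumulative_counts = np.cumsum(counts)
for v, c in zip(distinct, cumulative_counts):
    print(f"<= {v}: {c} observations")

def fraction_at_or_below(data, x):
    """Cumulative relative frequency: share of observations <= x."""
    data = np.asarray(data)
    return np.count_nonzero(data <= x) / data.size

print(fraction_at_or_below(values, 62))  # 3 of 10 observations
print(fraction_at_or_below(values, 80))  # 8 of 10 observations
```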
# Three views of the same data: histogram, relative frequency histogram, ogive
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
# Histogram
axes[0].hist(scores, bins=list(bins), edgecolor='black', color='steelblue', alpha=0.7)
axes[0].set_title('Histogram\n(Absolute Frequency)')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Frequency')
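# Relative frequency histogram (weights of 1/n make the bar heights sum to 1;
# density=True would instead scale the bars so that their total area is 1)
axes[1].hist(scores, bins=list(bins), weights=np.ones(n) / n, edgecolor='black', color='coral', alpha=0.7)
axes[1].set_title('Relative Frequency Histogram')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Relative Frequency')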
# Cumulative frequency (ogive), drawn here from the sorted raw scores
sorted_scores = np.sort(scores)
cumulative = np.arange(1, n+1) / n
axes[2].plot(sorted_scores, cumulative, 'b-', linewidth=2)
axes[2].set_title('Ogive (Cumulative Freq)')
axes[2].set_xlabel('Score')
axes[2].set_ylabel('Cumulative Proportion')
# The median is the score at which the curve crosses the 50% line
axes[2].axhline(0.5, color='red', linestyle='--', label='50% (median level)')
axes[2].legend()
plt.tight_layout()
plt.savefig('frequency_distributions.png', dpi=150)
plt.show()
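Percentiles can also be read off the ogive numerically. The sketch below is one way to do it, assuming the class boundaries and counts from the grouped table shown earlier and assuming values are spread evenly within each class. `percentile_from_ogive` is a hypothetical helper name, not a library function.

```python
import numpy as np

# Class boundaries and counts copied from the grouped table shown earlier
edges = np.array([40, 50, 60, 70, 80, 90, 100])
counts = np.array([2, 7, 14, 16, 10, 1])

# Cumulative proportion at each boundary, starting from 0 at the lowest edge
cum_prop = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))

def percentile_from_ogive(p):
    """Score below which a fraction p of observations fall, found by
    linear interpolation along the ogive (assumes values are spread
    evenly within each class)."""
    return float(np.interp(p, cum_prop, edges))

for p in (0.25, 0.50, 0.75):
    print(f"{int(p * 100)}th percentile ~ {percentile_from_ogive(p):.2f}")
```

Because the interpolation works on grouped counts, the result is an approximation; computing `np.percentile` on the raw scores will generally give a slightly different value.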
Key Takeaways
- Frequency distributions transform raw data into interpretable summaries
- Absolute frequency = counts; Relative frequency = proportions; Cumulative = running total
- Grouped distributions suit continuous data or data with many unique values; the bin width affects interpretation
- Sturges' rule (k = 1 + 3.322 log₁₀n) is a starting point for number of bins
- The ogive (cumulative frequency curve) allows you to read off percentiles
- Different bin widths reveal different features — always try several widths
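To make the last point concrete, here is a short sketch that bins one sample several ways with `np.histogram`. The sample is hypothetical and generated only for this example, and the particular bin choices (5, 10, 20, plus NumPy's built-in `"sturges"` and `"fd"` estimators) are just a starting set.

```python
import numpy as np

# Hypothetical sample, generated only for this sketch
rng = np.random.default_rng(0)
sample = rng.normal(72, 12, 200).clip(0, 100)

# Compare a few fixed bin counts with NumPy's Sturges and
# Freedman-Diaconis estimators
for bins in (5, 10, 20, "sturges", "fd"):
    counts, edges = np.histogram(sample, bins=bins)
    width = edges[1] - edges[0]
    print(f"bins={bins!s:>8}: {len(counts):2d} classes, "
          f"width ~ {width:5.2f}, largest class holds {counts.max()} values")
```

Few wide classes smooth the shape and can hide gaps or secondary peaks, while many narrow classes expose detail along with sampling noise, so it helps to compare several settings before settling on one.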