Frequency Distributions

Descriptive Statistics

From Raw Numbers to Understandable Patterns

Absolute frequency — Count observations in each category to see the raw totals
Relative frequency — Convert counts to proportions for comparison across datasets
Cumulative frequency — Track running totals to answer "at or below" questions
Grouped distributions — Handle continuous data by binning into intervals

Before you calculate any statistic, organize your data. A frequency distribution is the first step toward seeing how the values are spread.

What is a Frequency Distribution?

Definition

A frequency distribution organizes raw data into a table or chart showing how often each value (or range of values) occurs. It transforms an unreadable list of numbers into an interpretable summary.

Absolute Frequency

The count of observations in each category or interval.

from collections import Counter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Example: simulated final exam scores for 50 students (mean 72, SD 12),
# clipped to 0-100 and rounded to whole points
np.random.seed(42)
scores = np.random.normal(72, 12, 50).clip(0, 100).round(0).astype(int)

# --- Ungrouped frequency table: one row per distinct score ---
freq = Counter(scores)
distinct_scores = sorted(freq)
freq_df = pd.DataFrame({'Score': distinct_scores,
                        'Frequency': [freq[s] for s in distinct_scores]})
print("First 10 rows of frequency table:")
print(freq_df.head(10))
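The same ungrouped table can also be built without Counter, using pandas' Series.value_counts. A minimal sketch on a small, hypothetical set of quiz scores (the name quiz_freq is illustrative only):

```python
import pandas as pd

# Hypothetical quiz scores, not the simulated exam scores
quiz = pd.Series([7, 9, 7, 10, 8, 7, 9, 10, 7, 8])

# value_counts() counts each distinct value; sort_index() orders rows by score
# instead of by frequency, the usual layout for a frequency table
quiz_freq = (quiz.value_counts()
                 .sort_index()
                 .rename_axis('Score')
                 .reset_index(name='Frequency'))
print(quiz_freq)
```

Here the score 7 appears four times and 8, 9 and 10 twice each, and the Frequency column sums to the number of observations (10).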

Grouped Frequency Distribution

When data is continuous or has many unique values, we group the values into class intervals (bins).

Steps to build:

Find range = max − min
Choose number of classes (typically 5–20; Sturges' rule: k = 1 + 3.322 × log₁₀(n))
Class width = Range / k (round up)
Create non-overlapping, equal-width intervals
Count observations in each interval
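The five steps above translate almost line for line into code. A minimal sketch, assuming integer-valued data and a hypothetical helper named grouped_frequency; it uses numpy.histogram for the counting step because its last class includes its right edge, so the maximum is never lost on the boundary:

```python
import math

import numpy as np


def grouped_frequency(data):
    """Hypothetical helper: class edges and counts following the five steps."""
    data = np.asarray(data)
    data_range = data.max() - data.min()                 # 1. range = max - min
    k = math.ceil(1 + 3.322 * math.log10(data.size))     # 2. Sturges' rule
    width = math.ceil(data_range / k)                     # 3. class width, rounded up
    edges = data.min() + width * np.arange(k + 1)         # 4. k equal-width classes
    # 5. count per class; np.histogram uses [lo, hi) for every class except
    #    the last, which is [lo, hi], so the maximum is always counted
    counts, _ = np.histogram(data, bins=edges)
    return edges, counts


# Hypothetical integer data (n = 20), not the exam scores
values = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31,
          33, 35, 36, 38, 40, 41, 44, 47, 50, 53]
edges, counts = grouped_frequency(values)
for lo, hi, count in zip(edges[:-1], edges[1:], counts):
    print(f"{lo}-{hi - 1}: {count}")   # integer data, so hi - 1 is the top value
```

For these 20 values the range is 41, Sturges' rule gives k = 6, and the width rounds up to 7, producing the classes 12-18 through 47-53.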

# Sturges' rule for the number of classes
n = len(scores)
k_sturges = int(np.ceil(1 + 3.322 * np.log10(n)))
print(f"Sturges' rule: k = {k_sturges} bins")

# Class width = range / k, rounded up to a "nice" multiple of 10.
# Rounding the width up can leave fewer than k classes, which is expected.
min_score, max_score = int(scores.min()), int(scores.max())
bin_width = int(np.ceil((max_score - min_score) / k_sturges / 10) * 10)

# Class edges on multiples of bin_width, e.g. [40, 50), [50, 60), ...
# The top edge sits strictly above max_score so right=False cannot drop it
start = (min_score // bin_width) * bin_width
stop = (max_score // bin_width + 1) * bin_width
bins = range(start, stop + 1, bin_width)

labels = [f"{b}-{b + bin_width - 1}" for b in bins[:-1]]
score_series = pd.Series(scores)
grouped = pd.cut(score_series, bins=list(bins), right=False, labels=labels)

# Count per class, keeping classes in ascending order (sort=False)
freq_table = (grouped.value_counts(sort=False)
              .rename_axis('Interval')
              .reset_index(name='Frequency'))

# Add relative and cumulative frequency
freq_table['Relative Freq'] = freq_table['Frequency'] / n
freq_table['Relative %'] = (freq_table['Relative Freq'] * 100).round(1)
freq_table['Cumulative Freq'] = freq_table['Frequency'].cumsum()
freq_table['Cumulative %'] = (freq_table['Cumulative Freq'] / n * 100).round(1)

print("\nGrouped Frequency Distribution:")
print(freq_table.to_string(index=False))

Output:

Sturges' rule: k = 7 bins

Grouped Frequency Distribution:
Interval  Frequency  Relative Freq  Relative %  Cumulative Freq  Cumulative %
   40-49          2           0.04         4.0                2           4.0
   50-59          9           0.18        18.0               11          22.0
   60-69         16           0.32        32.0               27          54.0
   70-79         14           0.28        28.0               41          82.0
   80-89          5           0.10        10.0               46          92.0
   90-99          4           0.08         8.0               50         100.0
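pd.cut silently returns NaN for any value outside the class edges, and with right=False each class excludes its right edge, including the top one. Because scores are clipped to 100, a perfect score would disappear from a table whose last edge is exactly 100. A quick sanity check, sketched on three hypothetical values:

```python
import pandas as pd

# Hypothetical check: three scores, classes [40, 50), ..., [90, 100)
edges = list(range(40, 101, 10))
values = pd.Series([45, 99, 100])

binned = pd.cut(values, bins=edges, right=False)
print(binned.isna().sum())   # 1: the score 100 falls outside every half-open class

# One possible fix for integer scores: move the top edge to 101,
# so the last class [90, 101) covers 90 through 100
fixed = pd.cut(values, bins=edges[:-1] + [101], right=False)
print(fixed.isna().sum())    # 0
```

In the full example, the Cumulative Freq column ending at n (50) is the same check: if any score had been dropped, the running total would fall short.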

Relative Frequency

Each class frequency divided by the total number of observations, n. Relative frequencies are proportions that sum to 1, so they let you compare distributions across datasets of different sizes.
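To see why proportions matter, compare two groups of very different sizes. A minimal sketch with hypothetical, already-grouped counts for a class of 20 and a class of 60:

```python
import pandas as pd

# Hypothetical grouped counts for two classes of different sizes
intervals = ['50-59', '60-69', '70-79', '80-89']
class_a = pd.Series([4, 8, 6, 2], index=intervals)     # n = 20
class_b = pd.Series([6, 18, 24, 12], index=intervals)  # n = 60

# Relative frequency = class count / total count for that dataset
comparison = pd.DataFrame({
    'A count': class_a,
    'B count': class_b,
    'A relative': class_a / class_a.sum(),
    'B relative': class_b / class_b.sum(),
})
print(comparison.round(2))
```

Class B has more students than class A in every interval, yet A has the larger share in 50-59 and 60-69 (0.40 versus 0.30 in 60-69). Only the relative columns reveal that difference.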

Cumulative Frequency

The running total of frequencies, accumulated from the lowest class upward. It answers questions such as: "What fraction of observations fall at or below this value?"
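As a worked example of an "at or below" question, the sketch below reads the answer straight off a cumulative column and cross-checks it against the raw values. The data set is hypothetical, and the table lookup only works when the threshold lines up with a class boundary:

```python
import numpy as np

# Hypothetical integer scores (n = 15), grouped into [40, 50), ..., [90, 100]
data = np.array([48, 55, 61, 64, 66, 68, 70, 73, 75, 77, 81, 84, 88, 92, 95])
edges = np.arange(40, 101, 10)

counts, _ = np.histogram(data, bins=edges)
cum_counts = np.cumsum(counts)          # running total of frequencies
cum_prop = cum_counts / data.size       # cumulative relative frequency

# "What fraction scored at or below 69?" For integer scores this is every class
# whose upper edge is at most 70, i.e. the cumulative value in the 60-69 row
upper_edges = edges[1:]
answer_from_table = cum_prop[upper_edges <= 70][-1]

# Cross-check against the raw data
answer_from_raw = np.mean(data <= 69)
print(answer_from_table, answer_from_raw)
```

Both answers are 6 out of 15 scores, a proportion of 0.4. For a threshold inside a class, such as 65, the grouped table alone cannot answer exactly; you need the raw data or an interpolation along the ogive.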

# Three views of the same data: histogram, relative frequency histogram, ogive
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
edges = list(bins)

# Histogram (absolute frequency)
axes[0].hist(scores, bins=edges, edgecolor='black', color='steelblue', alpha=0.7)
axes[0].set_title('Histogram\n(Absolute Frequency)')
axes[0].set_xlabel('Score')
axes[0].set_ylabel('Frequency')

# Relative frequency histogram: weight each observation by 1/n so bar heights
# are proportions summing to 1 (density=True would also divide by bin width)
axes[1].hist(scores, bins=edges, weights=np.full(n, 1 / n),
             edgecolor='black', color='coral', alpha=0.7)
axes[1].set_title('Relative Frequency Histogram')
axes[1].set_xlabel('Score')
axes[1].set_ylabel('Relative Frequency')

# Ogive: cumulative proportion plotted at each class boundary, starting at 0
counts, _ = np.histogram(scores, bins=edges)
cum_prop = np.concatenate([[0], np.cumsum(counts)]) / n
axes[2].plot(edges, cum_prop, 'b-o', linewidth=2)
axes[2].set_title('Ogive (Cumulative Freq)')
axes[2].set_xlabel('Score (class boundary)')
axes[2].set_ylabel('Cumulative Proportion')
# The median is the score where the ogive crosses this 50% line
axes[2].axhline(0.5, color='red', linestyle='--', label='50% (median level)')
axes[2].legend()

plt.tight_layout()
plt.savefig('frequency_distributions.png', dpi=150)
plt.show()
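The 50% line on the ogive suggests how to estimate the median from grouped data alone: find the score where the ogive crosses 0.5. A sketch on a small hypothetical data set, assuming scores are spread evenly within each class and that no class at the crossing is empty (np.interp needs strictly increasing cumulative values). It also checks the result against the textbook grouped-median formula:

```python
import numpy as np

# Hypothetical integer scores (n = 15), grouped into [40, 50), ..., [90, 100]
data = np.array([48, 55, 61, 64, 66, 68, 70, 73, 75, 77, 81, 84, 88, 92, 95])
edges = np.arange(40, 101, 10)
counts, _ = np.histogram(data, bins=edges)
n = data.size

# Ogive points: cumulative proportion at each class boundary, starting at 0
ogive_y = np.concatenate([[0.0], np.cumsum(counts) / n])

# Median estimate: the score where the ogive crosses 0.5 (linear interpolation)
median_from_ogive = np.interp(0.5, ogive_y, edges)

# Same value from the grouped-median formula  L + (n/2 - CF) / f * w
cf = np.concatenate([[0], np.cumsum(counts)])   # cumulative count at each boundary
m = np.searchsorted(cf, n / 2) - 1              # index of the median class
L, w = edges[m], edges[m + 1] - edges[m]
median_from_formula = L + (n / 2 - cf[m]) / counts[m] * w

print(median_from_ogive, median_from_formula, np.median(data))
```

Both grouped estimates give 73.75, while the exact median of the raw data is 73. The ogive assumes values are spread evenly inside each class, so it approximates the true median rather than reproducing it.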