Stem-and-Leaf Plots
A stem-and-leaf plot (or stemplot) is a data display that organizes numerical data while retaining the actual values, unlike a histogram, which groups values into bins and loses the individual data points.
Best for: small datasets (n < 100), cases where you want to preserve the original values, and two-group comparisons.
Construction
Rule: Split each data value into a stem (leading digit(s)) and leaf (last digit).
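The rule amounts to integer division by 10 once a value is expressed in units of the leaf. A minimal sketch, assuming non-negative values. `split_stem_leaf` is a hypothetical helper name, and its `leaf_unit` parameter has the same meaning as in `stem_and_leaf` below (1 for ones, 10 for tens, and so on).

```python
def split_stem_leaf(value, leaf_unit=1):
    """Split a non-negative value into (stem, leaf) for the given leaf unit.

    Hypothetical helper: the value is rescaled so the leaf becomes the
    ones digit, then divmod separates the stem from the leaf.
    """
    scaled = int(round(value / leaf_unit))  # e.g. 7.3 with leaf_unit=0.1 -> 73
    return divmod(scaled, 10)


print(split_stem_leaf(72))                  # (7, 2)
print(split_stem_leaf(7.3, leaf_unit=0.1))  # (7, 3)
print(split_stem_leaf(1450, leaf_unit=10))  # (14, 5)
```

Rescaling before splitting avoids a floating-point trap: computing the leaf of 7.3 as `(7.3 % 1) // 0.1` gives 2, not 3, because 7.3 is not stored exactly.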
Example: Exam scores: 72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66
Stem | Leaf
6 | 5 6 8 (65, 66, 68)
7 | 1 2 3 7 9 (71, 72, 73, 77, 79)
8 | 2 4 5 8 (82, 84, 85, 88)
9 | 1 3 (91, 93)
Key: 7|2 = 72
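Because every leaf is kept, the original values can be rebuilt from the display as value = (stem × 10 + leaf) × leaf unit. A minimal sketch: the `plot` dictionary is a hypothetical, hand-copied encoding of the display above, and `decode` is an illustrative helper name.

```python
# The exam-score display, written as a hypothetical {stem: leaves} mapping.
plot = {6: [5, 6, 8], 7: [1, 2, 3, 7, 9], 8: [2, 4, 5, 8], 9: [1, 3]}


def decode(plot, leaf_unit=1):
    """Rebuild the sorted data values from a {stem: leaves} mapping."""
    return [(stem * 10 + leaf) * leaf_unit
            for stem in sorted(plot)
            for leaf in plot[stem]]


values = decode(plot)
print(values)
# [65, 66, 68, 71, 72, 73, 77, 79, 82, 84, 85, 88, 91, 93]

scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]
print(sorted(scores) == values)  # True
```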
```python
from collections import defaultdict

import numpy as np


def stem_and_leaf(data, leaf_unit=1):
    """
    Print a stem-and-leaf display for a list of non-negative numbers.

    leaf_unit: the place value of the leaf (1 for ones, 10 for tens, etc.)
    """
    stems = defaultdict(list)
    for val in sorted(data):
        # Rescale so the leaf is the ones digit; round() avoids
        # floating-point error with fractional leaf units such as 0.1.
        stem, leaf = divmod(int(round(val / leaf_unit)), 10)
        stems[stem].append(leaf)  # leaves arrive in ascending order

    print(f"Stem-and-Leaf Plot (leaf unit = {leaf_unit})")
    print(f"{'Stem':>5} | Leaves")
    print("-" * 20)
    # Walk every stem from min to max so empty stems (gaps) stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = ' '.join(str(leaf) for leaf in stems[stem])
        print(f"{stem:>5} | {leaves}")
    print(f"\nKey: 7|2 = {72 * leaf_unit:g}")


scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]
stem_and_leaf(scores)

# The median can be read off the plot by counting leaves to the middle.
# With n = 14 it is the average of the 7th and 8th values (77 and 79).
print(f"\nMedian = {np.median(scores)}")
print(f"Min = {min(scores)}, Max = {max(scores)}")
```
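Reading the median off the plot means counting leaves from the top row until reaching the middle position. A minimal sketch: `plot` is a hypothetical, hand-copied {stem: leaves} mapping of the exam-score display, and `kth_value` and `median_from_plot` are illustrative names. With n = 14 the median is the average of the 7th and 8th values, 77 and 79, which agrees with `np.median`.

```python
# Hypothetical {stem: leaves} mapping copied from the exam-score display.
plot = {6: [5, 6, 8], 7: [1, 2, 3, 7, 9], 8: [2, 4, 5, 8], 9: [1, 3]}


def kth_value(plot, k, leaf_unit=1):
    """Return the k-th smallest value (1-based) by counting leaves row by row."""
    seen = 0
    for stem in sorted(plot):
        row = plot[stem]
        if seen + len(row) >= k:
            # The k-th value lies in this row.
            return (stem * 10 + row[k - seen - 1]) * leaf_unit
        seen += len(row)
    raise IndexError("k exceeds the number of leaves")


def median_from_plot(plot, leaf_unit=1):
    """Median found by counting to the middle leaf (or the two middle leaves)."""
    n = sum(len(leaves) for leaves in plot.values())
    if n % 2 == 1:
        return kth_value(plot, (n + 1) // 2, leaf_unit)
    # Even n: average the two middle values.
    lower = kth_value(plot, n // 2, leaf_unit)
    upper = kth_value(plot, n // 2 + 1, leaf_unit)
    return (lower + upper) / 2


print(median_from_plot(plot))  # 78.0
```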
Back-to-Back Stem-and-Leaf
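Compare two groups side by side on a shared stem column. Group A's leaves go on the left, written in reverse order so they increase outward from the stem; Group B's leaves go on the right in ascending order.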
```python
from collections import defaultdict

import numpy as np


def split_into_stems(data, leaf_unit=1):
    """Group non-negative values into {stem: [leaves]}, leaves ascending."""
    stems = defaultdict(list)
    for val in sorted(data):
        stem, leaf = divmod(int(round(val / leaf_unit)), 10)
        stems[stem].append(leaf)
    return stems


def back_to_back_stem(data_a, label_a, data_b, label_b, leaf_unit=1):
    """Back-to-back stem-and-leaf plot: group A on the left, group B on the right."""
    stems_a = split_into_stems(data_a, leaf_unit)
    stems_b = split_into_stems(data_b, leaf_unit)
    all_stems = set(stems_a) | set(stems_b)
    stem_range = range(min(all_stems), max(all_stems) + 1)  # keep empty stems

    # Left-hand leaves are reversed so they grow outward from the stem.
    left_rows = {s: ' '.join(str(leaf) for leaf in reversed(stems_a[s]))
                 for s in stem_range}
    width = max(len(label_a), *(len(row) for row in left_rows.values()))

    print(f"Back-to-Back Stem-and-Leaf: {label_a} vs {label_b}")
    print(f"{label_a:>{width}} | Stem | {label_b}")
    print("-" * (width + 20))
    for stem in stem_range:
        right = ' '.join(str(leaf) for leaf in stems_b[stem])
        print(f"{left_rows[stem]:>{width}} | {stem:^4} | {right}")


np.random.seed(0)
class_a = sorted(np.random.normal(74, 8, 20).clip(50, 100).round(0).astype(int))
class_b = sorted(np.random.normal(81, 7, 20).clip(50, 100).round(0).astype(int))
back_to_back_stem(class_a, "Class A", class_b, "Class B")

print(f"\nClass A: median={np.median(class_a):.1f}, mean={np.mean(class_a):.1f}")
print(f"Class B: median={np.median(class_b):.1f}, mean={np.mean(class_b):.1f}")
```
Stem-and-Leaf vs Histogram
| Feature | Stem-and-Leaf | Histogram |
|---|---|---|
| Retains raw values | ✅ Yes | ❌ No |
| Works for large n | ❌ Unwieldy | ✅ Yes |
| Side-by-side comparison | ✅ Back-to-back | ✅ Overlaid |
| Shape visible | ✅ Yes | ✅ Yes |
| Median/quartiles readable | ✅ Yes | ❌ Not directly |
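The first row of the table can be checked by binning the exam scores with NumPy's `np.histogram`, using bins of width 10 that line up with the stems. A minimal sketch: the histogram counts match the number of leaves on each stem row, but the counts alone no longer say which scores fell into each bin.

```python
from collections import Counter

import numpy as np

scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]

# Bins [60, 70), [70, 80), [80, 90), [90, 100] line up with stems 6, 7, 8, 9.
counts, edges = np.histogram(scores, bins=[60, 70, 80, 90, 100])

# Number of leaves on each stem row of the stemplot.
leaves_per_stem = Counter(s // 10 for s in scores)
row_lengths = [leaves_per_stem[stem] for stem in (6, 7, 8, 9)]

print(counts.tolist())  # [3, 5, 4, 2]
print(row_lengths)      # [3, 5, 4, 2]
```

The totals agree, yet only the stemplot still records that the 80s row holds 82, 84, 85 and 88.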
Key Takeaways
- Stem-and-leaf plots preserve original data values — unlike histograms
- They're best for small datasets (n < 100) where individual values matter
- Back-to-back stemplots are excellent for comparing two distributions
- You can read the median directly by counting leaves to the middle value (with an even n, average the two middle values)
- Shape is visible — you can see skewness, gaps, and outliers immediately
- For large datasets, use histograms or KDE plots instead
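Gaps, one of the shape features listed above, show up as empty stems between the smallest and largest stem. A minimal sketch, assuming non-negative data; `empty_stems` is a hypothetical helper, and the extra score of 32 below is invented only to illustrate a gap.

```python
def empty_stems(data, leaf_unit=1):
    """Return the stems between the min and max stem that hold no leaves."""
    used = {int(round(v / leaf_unit)) // 10 for v in data}
    return [s for s in range(min(used), max(used) + 1) if s not in used]


scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]
print(empty_stems(scores))         # []

# Hypothetical extra low score, added only to illustrate a gap:
print(empty_stems(scores + [32]))  # [4, 5]
```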