Stem-and-Leaf Plots — Construction and Interpretation


Stem-and-Leaf Plots

A stem-and-leaf plot (or stemplot) is a data display that organizes numerical data while retaining the actual values, unlike a histogram, which groups values into bins and hides the individual data points.

Best for: small datasets (n < 100), cases where the original values should stay visible, and two-group comparisons.


Construction

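Rule: Split each data value into a stem (the leading digit or digits) and a leaf (the last digit). List the stems in a column in increasing order, then write each leaf beside its stem, also in increasing order.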
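
The split itself can be sketched with integer division. This is a minimal illustration that assumes non-negative integers and a leaf unit of 1; `split_value` is a hypothetical helper name, not part of any library.

```python
def split_value(value):
    """Split a non-negative integer into (stem, leaf), using the ones digit as the leaf."""
    stem, leaf = divmod(value, 10)
    return stem, leaf


for score in [72, 68, 105]:
    stem, leaf = split_value(score)
    print(f"{score} -> stem {stem}, leaf {leaf}")
```

A three-digit value such as 105 gets the two-digit stem 10, which is why the rule speaks of leading digit(s).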

Example: Exam scores: 72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66

Stem | Leaf
   6 | 5 6 8       (65, 66, 68)
   7 | 1 2 3 7 9   (71, 72, 73, 77, 79)
   8 | 2 4 5 8     (82, 84, 85, 88)
   9 | 1 3         (91, 93)

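Key: 7|2 = 72

from collections import defaultdict

import numpy as np


def stem_and_leaf(data, leaf_unit=1):
    """
    Print a stem-and-leaf display for a list of non-negative numbers.
    leaf_unit: the place value of the leaf (1 for ones, 10 for tens, etc.)
    Digits below the leaf unit are truncated, not rounded.
    """
    if len(data) == 0:
        print("No data to plot.")
        return

    stems = defaultdict(list)
    for val in sorted(data):  # sorted input keeps each stem's leaves in order
        stem = int(val // (leaf_unit * 10))
        leaf = int((val % (leaf_unit * 10)) // leaf_unit)
        stems[stem].append(leaf)

    print(f"Stem-and-Leaf Plot (leaf unit = {leaf_unit})")
    print(f"{'Stem':>5} | Leaves")
    print("-" * 20)
    # Print every stem in the range, including empty ones, so gaps stay visible
    for stem in range(min(stems), max(stems) + 1):
        leaves = ' '.join(str(leaf) for leaf in stems.get(stem, []))
        print(f"{stem:>5} | {leaves}")

    # Build the key from the smallest value so it reflects the leaf unit
    first_stem = min(stems)
    first_leaf = stems[first_stem][0]
    key_value = (first_stem * 10 + first_leaf) * leaf_unit
    print(f"\nKey: {first_stem}|{first_leaf} = {key_value:g}")


scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]
stem_and_leaf(scores)

# The leaves are in order, so the median can be read by counting to the middle:
# with 14 values it is the average of the 7th and 8th (77 and 79).
print(f"\nMedian = {np.median(scores)}")
print(f"Min = {min(scores)}, Max = {max(scores)}")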
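
Counting to the middle can also be written out explicitly. The sketch below assumes the plot is stored as a dict that maps each stem to its sorted leaves, with a leaf unit of 1; `median_from_stemplot` and `exam_plot` are hypothetical names used only for illustration.

```python
def median_from_stemplot(stems):
    """Return the median of a stemplot given as {stem: sorted leaves} (leaf unit 1)."""
    # Rebuild the ordered values by walking the stems from smallest to largest
    values = [stem * 10 + leaf for stem in sorted(stems) for leaf in stems[stem]]
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    # Even count: average the two middle values
    return (values[mid - 1] + values[mid]) / 2


# The exam-score plot from the example above
exam_plot = {
    6: [5, 6, 8],
    7: [1, 2, 3, 7, 9],
    8: [2, 4, 5, 8],
    9: [1, 3],
}
print(median_from_stemplot(exam_plot))  # 78.0
```

With 14 leaves, the 7th and 8th values are 77 and 79, so the median is 78.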

Back-to-Back Stem-and-Leaf

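Compare two groups side by side around a shared column of stems. Leaves for Group A go to the left of the stems and leaves for Group B go to the right. On both sides the smallest leaf sits next to the stem, so the left side reads from right to left.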

from collections import defaultdict

import numpy as np


def split_stems(data, leaf_unit=1):
    """Group the leaves of non-negative values by stem, in increasing order."""
    stems = defaultdict(list)
    for val in sorted(data):
        stem = int(val // (leaf_unit * 10))
        leaf = int((val % (leaf_unit * 10)) // leaf_unit)
        stems[stem].append(leaf)
    return stems


def back_to_back_stem(data_a, label_a, data_b, label_b, leaf_unit=1):
    """Back-to-back stem-and-leaf plot: group A on the left, group B on the right."""
    stems_a = split_stems(data_a, leaf_unit)
    stems_b = split_stems(data_b, leaf_unit)
    all_stems = set(stems_a) | set(stems_b)

    rows = []
    # Include empty stems so gaps in either group stay visible
    for stem in range(min(all_stems), max(all_stems) + 1):
        # Left leaves are reversed so the smallest leaf sits next to the stem
        left = ' '.join(str(leaf) for leaf in reversed(stems_a.get(stem, [])))
        right = ' '.join(str(leaf) for leaf in stems_b.get(stem, []))
        rows.append((left, stem, right))

    # Size the left column to the longest row so the stems line up
    width = max(len(label_a), max(len(left) for left, _, _ in rows))

    print(f"Back-to-Back Stem-and-Leaf: {label_a} vs {label_b}")
    print(f"{label_a:>{width}} | Stem | {label_b}")
    print("-" * (2 * width + 10))
    for left, stem, right in rows:
        print(f"{left:>{width}} | {stem:>4} | {right}")


# Simulated scores for two classes (fixed seed for reproducibility)
np.random.seed(0)
class_a = sorted(np.random.normal(74, 8, 20).clip(50, 100).round(0).astype(int))
class_b = sorted(np.random.normal(81, 7, 20).clip(50, 100).round(0).astype(int))

back_to_back_stem(class_a, "Class A", class_b, "Class B")

print(f"\nClass A: median={np.median(class_a):.1f}, mean={np.mean(class_a):.1f}")
print(f"Class B: median={np.median(class_b):.1f}, mean={np.mean(class_b):.1f}")

Stem-and-Leaf vs Histogram

Feature                   | Stem-and-Leaf   | Histogram
Retains raw values        | ✅ Yes          | ❌ No
Works for large n         | ❌ Unwieldy     | ✅ Yes
Side-by-side comparison   | ✅ Back-to-back | ✅ Overlaid
Shape visible             | ✅ Yes          | ✅ Yes
Median/quartiles readable | ✅ Yes          | ❌ Not directly
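
The "retains raw values" row can be checked with a short sketch. It reuses the exam scores from the construction example and assumes bins of width 10 for the histogram; the variable names are illustrative only.

```python
import numpy as np

scores = [72, 68, 85, 91, 73, 77, 88, 65, 82, 79, 84, 93, 71, 66]

# Histogram: only the count per bin survives
counts, edges = np.histogram(scores, bins=[60, 70, 80, 90, 100])
print(counts.tolist())  # [3, 5, 4, 2]

# Stemplot: stem and leaf together rebuild every original value
stems = {}
for score in sorted(scores):
    stems.setdefault(score // 10, []).append(score % 10)
recovered = [stem * 10 + leaf for stem, leaves in stems.items() for leaf in leaves]
print(recovered == sorted(scores))  # True
```

The bin counts say that three scores fall in the 60s, but not which ones; the stems and leaves give back 65, 66, and 68.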

Key Takeaways

  1. Stem-and-leaf plots preserve original data values — unlike histograms
  2. They're best for small datasets (n < 100) where individual values matter
  3. Back-to-back stemplots are excellent for comparing two distributions
  4. You can read the median directly by counting leaves to the middle value (or averaging the two middle values when n is even)
  5. Shape is visible — you can see skewness, gaps, and outliers immediately
  6. For large datasets, use histograms or KDE plots instead
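
Gaps show up in a stemplot as stems with no leaves, and these can be listed directly. The sketch below assumes non-negative integers with a leaf unit of 1; `empty_stems` is a hypothetical helper, and the data are illustrative (the exam scores plus one made-up low score), not results from the lesson.

```python
def empty_stems(data):
    """Return the stems between the smallest and largest value that have no leaves."""
    used = {value // 10 for value in data}
    return [stem for stem in range(min(used), max(used) + 1) if stem not in used]


# Illustrative data: the exam scores plus one very low, made-up score
scores = [38, 65, 66, 68, 71, 72, 73, 77, 79, 82, 84, 85, 88, 91, 93]
print(empty_stems(scores))  # [4, 5]
```

Two empty stems between 38 and the rest of the scores flag 38 as a possible outlier worth a closer look.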
