Data Storytelling and Communication

Module 3: Visualization

Why Data Storytelling Matters

The best analysis in the world is worthless if you cannot communicate it. Data storytelling translates numbers into narratives that drive decisions.

  What You Built                What They See              What They Do
  +----------------+           +----------------+          +----------------+
  | Complex model  |  ------>  | "So what?"     | -------> | Nothing        |
  | 95% accuracy   |           | Confused       |          | Ignore         |
  | 20 features    |           | Tune out       |          | Deprioritize   |
  +----------------+           +----------------+          +----------------+
         |                            |                           |
    Technical depth            Communication gap           No action taken

"The goal is to turn data into information, and information into insight." — Carly Fiorina

The Three Pillars of Data Storytelling

              DATA                    VISUAL                 NARRATIVE
        +--------------+         +--------------+         +--------------+
        | The facts    |         | The chart    |         | The "why"    |
        | The numbers  |         | The design   |         | The story    |
        | The evidence |         | The colors   |         | The insight  |
        +------+-------+         +------+-------+         +------+-------+
               |                        |                        |
               +----------+-------------+------------------------+
                          |
                          v
               Decision-Driving Insight

  Without Data:       Pretty picture, no substance
  Without Visual:     Spreadsheet nobody reads
  Without Narrative:  Data dump, no action
  All Three:          Decision-driving insight

The SOAR Framework

Structure every data presentation using Situation, Obstacle, Analysis, and Result:

  SITUATION  ---------> OBSTACLE  ---------> ANALYSIS  ---------> RESULT
  (Context)             (Problem)            (Discovery)          (Action)
       |                    |                    |                    |
  "We're losing       "Onboarding has      "Step 4 causes       "Reduce to 3
   15% of users"       7 steps"             62% of drops"        steps"

Step        Purpose                 Example
----------  ----------------------  ----------------------------------------
Situation   Set the context         "We're losing 15% of users after signup"
Obstacle    Define the problem      "Our onboarding flow has 7 steps"
Analysis    Show what you found     "Users drop off at step 4 (payment)"
Result      Give a recommendation   "Reduce to 3 steps, expect 20% lift"
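
One way to keep all four SOAR parts in view while drafting is to hold them in a small data structure that refuses to render an incomplete story. The sketch below is illustrative only: `SoarStory` and `to_summary` are hypothetical names, and the example values are the ones from the table above.

```python
from dataclasses import dataclass


@dataclass
class SoarStory:
    """Hypothetical container for the four SOAR parts."""
    situation: str
    obstacle: str
    analysis: str
    result: str

    def to_summary(self) -> str:
        """Render the parts in SOAR order, one labeled line each."""
        parts = [
            ("Situation", self.situation),
            ("Obstacle", self.obstacle),
            ("Analysis", self.analysis),
            ("Result", self.result),
        ]
        # Every part must be filled in before the story is rendered.
        missing = [label for label, text in parts if not text.strip()]
        if missing:
            raise ValueError(f"Missing SOAR parts: {', '.join(missing)}")
        return "\n".join(f"{label}: {text}" for label, text in parts)


story = SoarStory(
    situation="We're losing 15% of users after signup",
    obstacle="Our onboarding flow has 7 steps",
    analysis="Users drop off at step 4 (payment)",
    result="Reduce to 3 steps, expect 20% lift",
)
print(story.to_summary())
```

Raising an error on an empty part is a design choice for this sketch; a draft that skips the Result, for example, has no recommendation to act on.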

Narrative Arc Structure

  Opening ---------> Problem ---------> Evidence ---------> Insight ---------> Action ---------> Resolution
     |                  |                  |                  |                  |                  |
  Context           Tension            Discovery          Finding           Recommendation    Follow-up

Visualization Best Practices

Chart Selection Guide

COMPARISON:
  Bar chart     -> Compare categories
  Grouped bars  -> Compare across groups
  Lollipop      -> Clean comparison with many categories
  Heatmap       -> Two categorical dimensions

TREND:
  Line chart    -> Show change over time
  Area chart    -> Emphasize volume over time
  Sparkline     -> Quick trend in context

DISTRIBUTION:
  Histogram     -> Show frequency distribution
  Box plot      -> Compare distributions across groups
  Violin plot   -> Show distribution shape

RELATIONSHIP:
  Scatter       -> Show correlation between 2 variables
  Bubble        -> Add 3rd dimension (size)
  Heatmap       -> Show correlation matrix

PART-TO-WHOLE:
  Stacked bar   -> Show composition across categories
  Treemap       -> Hierarchical proportions
  Waterfall     -> Show cumulative effect

GEOGRAPHIC:
  Choropleth    -> Show values by region
  Bubble map    -> Show magnitude at locations
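
The guide above can also be kept as a lookup table so that the chart choice starts from the question being asked. This is a minimal sketch: `CHART_GUIDE` and `suggest_charts` are hypothetical names, and the entries simply mirror the guide.

```python
# Hypothetical lookup table that mirrors the chart selection guide.
CHART_GUIDE = {
    "comparison": ["Bar chart", "Grouped bars", "Lollipop", "Heatmap"],
    "trend": ["Line chart", "Area chart", "Sparkline"],
    "distribution": ["Histogram", "Box plot", "Violin plot"],
    "relationship": ["Scatter", "Bubble", "Heatmap"],
    "part-to-whole": ["Stacked bar", "Treemap", "Waterfall"],
    "geographic": ["Choropleth", "Bubble map"],
}


def suggest_charts(goal: str) -> list[str]:
    """Return candidate chart types for an analytical goal (case-insensitive)."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        valid = ", ".join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {valid}")
    return CHART_GUIDE[key]


print(suggest_charts("Trend"))  # ['Line chart', 'Area chart', 'Sparkline']
```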

Visualization Do's and Don'ts

import matplotlib.pyplot as plt
import pandas as pd

# Shared data: quarterly revenue ($K) for two years
df = pd.DataFrame({
    'Quarter': ['Q1', 'Q2', 'Q3', 'Q4'] * 2,
    'Year': ['2023'] * 4 + ['2024'] * 4,
    'Revenue': [45, 52, 48, 61, 49, 58, 55, 72]
})

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# BAD: Too much clutter (rainbow colors, hatching, heavy borders, busy grid)
# Each bar needs a unique label: repeating 'Q1'..'Q4' would draw both
# years on the same four x positions, hiding the 2023 bars.
labels = df['Quarter'] + ' ' + df['Year']
colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#96ceb4'] * 2

axes[0].bar(labels, df['Revenue'], color=colors, edgecolor='black',
            linewidth=2, hatch='//')
axes[0].set_title('Quarterly Performance (BAD)', fontsize=14,
                  fontweight='bold', color='darkblue')
axes[0].tick_params(axis='x', labelrotation=45)
axes[0].grid(True, linestyle='--', alpha=0.7)
axes[0].set_facecolor('#f0f0f0')

# GOOD: Clean (one line per year, labeled axis, minimal decoration)
for year, group in df.groupby('Year'):
    marker = 'o' if year == '2024' else 's'
    linestyle = '-' if year == '2024' else '--'
    axes[1].plot(group['Quarter'], group['Revenue'],
                 marker=marker, linewidth=2.5, markersize=8,
                 label=year, linestyle=linestyle)

axes[1].set_title('Revenue by Quarter', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Revenue ($K)', fontsize=12)
axes[1].legend(title='Year', frameon=True)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()

Color and Accessibility

# Color-blind friendly palettes
COLORBLIND_PALETTE = ['#0072B2', '#E69F00', '#009E73',
                       '#F0E442', '#56B4E9', '#D55E00', '#CC79A7']

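def create_accessible_chart(df, x, y, hue=None):
    """Scatter plot of y against x using the color-blind friendly palette."""
    fig, ax = plt.subplots(figsize=(10, 6))
    if hue:
        # One palette color per group; colors repeat if groups outnumber them
        for i, (name, group) in enumerate(df.groupby(hue)):
            color = COLORBLIND_PALETTE[i % len(COLORBLIND_PALETTE)]
            ax.scatter(group[x], group[y], label=name, color=color,
                       s=100, edgecolors='white', linewidth=0.5)
        ax.legend(title=hue.replace('_', ' ').title())
    else:
        # No grouping column: plot every point in a single palette color
        ax.scatter(df[x], df[y], color=COLORBLIND_PALETTE[0],
                   s=100, edgecolors='white', linewidth=0.5)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.set_xlabel(x.replace('_', ' ').title(), fontsize=12)
    ax.set_ylabel(y.replace('_', ' ').title(), fontsize=12)
    return fig, ax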
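
A color-blind friendly palette does not guarantee that every color stands out against the chart background. The sketch below checks each palette color against a white background using the WCAG 2.x relative luminance and contrast ratio formulas. The function names are hypothetical, and the 3:1 threshold is an assumption borrowed from the WCAG guidance for graphical elements; treat it as a starting point, not a rule from this lesson.

```python
COLORBLIND_PALETTE = ['#0072B2', '#E69F00', '#009E73',
                      '#F0E442', '#56B4E9', '#D55E00', '#CC79A7']


def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color such as '#0072B2'."""
    hex_color = hex_color.lstrip('#')
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Convert gamma-encoded sRGB channels to linear light.
    r, g, b = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """WCAG contrast ratio, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


MIN_RATIO = 3.0  # assumed threshold for this sketch

for color in COLORBLIND_PALETTE:
    ratio = contrast_ratio(color, '#FFFFFF')
    flag = '' if ratio >= MIN_RATIO else '  <- low contrast on white'
    print(f"{color}: {ratio:.2f}{flag}")
```

Colors that fall below the threshold can still work if they get a darker outline or are reserved for large filled areas.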

The Pyramid Principle

Start with the answer, then provide supporting evidence:

                    +--------------------------------------+
                    |            RECOMMENDATION            |
                    |    "Reduce onboarding to 3 steps"    |
                    +------------------+-------------------+
                                       |
                   +-------------------+-------------------+
                   |                   |                   |
              +----v-----+        +----v-----+        +----v-----+
              | Evidence |        | Evidence |        | Evidence |
              |    1     |        |    2     |        |    3     |
              | "40%     |        | "Step 4  |        | "A/B     |
              |  drop    |        |  is the  |        |  test    |
              |  off"    |        |  cliff"  |        |  data"   |
              +----------+        +----------+        +----------+

Stakeholder Communication

Know Your Audience

Audience          Focus                 Format          Detail Level
----------------  --------------------  --------------  ------------
C-Suite           ROI, strategy         1-pager         Minimal
Product Managers  User metrics          Dashboard       Medium
Engineers         Implementation        Technical docs  High
Marketing         Campaign performance  Visual summary  Medium
Finance           Revenue/cost          Tables          Precise
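
The same table can be encoded so that a report pipeline picks the format from the audience. This is a hedged sketch: `AUDIENCE_GUIDE` and `deliverable_brief` are hypothetical names, and the values are copied from the table above.

```python
# Hypothetical mapping: audience -> (focus, format, detail level)
AUDIENCE_GUIDE = {
    "c-suite": ("ROI, strategy", "1-pager", "Minimal"),
    "product managers": ("User metrics", "Dashboard", "Medium"),
    "engineers": ("Implementation", "Technical docs", "High"),
    "marketing": ("Campaign performance", "Visual summary", "Medium"),
    "finance": ("Revenue/cost", "Tables", "Precise"),
}


def deliverable_brief(audience: str) -> str:
    """Return a one-line brief for the given audience (case-insensitive)."""
    key = audience.strip().lower()
    if key not in AUDIENCE_GUIDE:
        raise ValueError(f"No guidance for audience {audience!r}")
    focus, fmt, detail = AUDIENCE_GUIDE[key]
    return f"{audience.strip()}: format={fmt}; focus={focus}; detail={detail}"


print(deliverable_brief("Engineers"))
# Engineers: format=Technical docs; focus=Implementation; detail=High
```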

Email Communication Template

def create_analysis_email(
    finding: str,
    recommendation: str,
    evidence: list[str],
    metrics: dict,
    next_steps: list[str]
) -> str:
    email = f"""Hi Team,

**Key Finding:**
{finding}

**Recommendation:**
{recommendation}

**Supporting Evidence:**
"""
    for i, e in enumerate(evidence, 1):
        email += f"{i}. {e}\n"

    email += "\n**Key Metrics:**\n"
    for metric, value in metrics.items():
        email += f"- {metric}: {value}\n"

    email += "\n**Next Steps:**\n"
    for step in next_steps:
        email += f"- [ ] {step}\n"

    email += "\nHappy to discuss further."
    return email

# Example
email = create_analysis_email(
    finding="Mobile conversion rate dropped about 24% after the latest app update.",
    recommendation="Roll back the checkout flow changes in the next release.",
    evidence=[
        "Conversion dropped from 4.2% to 3.2% within 48 hours",
        "User session recordings show confusion at the new payment screen",
        "Mobile revenue is down $45K week-over-week"
    ],
    metrics={
        "Mobile Conversion Rate": "3.2% (was 4.2%)",
        "Revenue Impact": "-$45K/week",
        "Affected Users": "~12,000/day"
    },
    next_steps=[
        "Schedule emergency rollback for tonight",
        "Prepare A/B test for alternative checkout flow",
        "Post-mortem scheduled for Friday"
    ]
)
print(email)
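
The example email reports a drop from 4.2% to 3.2%. When a finding quotes a change in a rate, it helps to state both the percentage-point change and the relative change, since stakeholders often confuse the two. The helper below is a small sketch of that arithmetic; `describe_rate_change` is a hypothetical name.

```python
def describe_rate_change(before_pct: float, after_pct: float) -> str:
    """Summarize a change between two rates that are already in percent."""
    if before_pct == 0:
        raise ValueError("Relative change is undefined when the starting rate is 0")
    point_change = after_pct - before_pct              # percentage points
    relative_change = point_change / before_pct * 100  # percent of the old rate
    return (f"{before_pct}% -> {after_pct}%: "
            f"{point_change:+.1f} percentage points, "
            f"{relative_change:+.1f}% relative")


print(describe_rate_change(4.2, 3.2))
# 4.2% -> 3.2%: -1.0 percentage points, -23.8% relative
```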

Presentation Skills

The 10-20-30 Rule (Guy Kawasaki)

  • 10 slides maximum
  • 20 minutes maximum
  • 30pt font minimum
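
The three limits are easy to check mechanically before a rehearsal. The function below is an illustrative sketch; `check_10_20_30` and its parameters are hypothetical names, and the slide count, duration, and smallest font size would have to be supplied by hand or by whatever tooling you use.

```python
def check_10_20_30(slide_count: int, minutes: float,
                   smallest_font_pt: float) -> list[str]:
    """Return a list of rule violations; an empty list means the deck passes."""
    problems = []
    if slide_count > 10:
        problems.append(f"{slide_count} slides (limit is 10)")
    if minutes > 20:
        problems.append(f"{minutes} minutes (limit is 20)")
    if smallest_font_pt < 30:
        problems.append(f"{smallest_font_pt}pt font (minimum is 30pt)")
    return problems


print(check_10_20_30(slide_count=6, minutes=18, smallest_font_pt=32))   # []
print(check_10_20_30(slide_count=14, minutes=25, smallest_font_pt=18))
```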

Slide Structure Template

Slide 1: Title + One-Line Recommendation
  "Reducing Churn by 15% — Data Science Team | Q4 2024"

Slide 2: The Problem (1 slide)
  "We're losing $2M/month to churn"
  + Simple chart showing the trend

Slides 3-4: What We Found (2 slides)
  Key insight with supporting visualization

Slide 5: The Solution (1 slide)
  Our recommendation with expected impact

Slide 6: Next Steps (1 slide)
  Action items with owners and timeline

Live Presentation Checklist

presentation_checklist = {
    "Before": [
        "Test all links and code demos",
        "Prepare backup screenshots",
        "Know your audience (technical vs business)",
        "Prepare for tough questions",
        "Have one-page summary ready"
    ],
    "During": [
        "Start with the answer, not the journey",
        "Use concrete numbers, not vague claims",
        "Pause after key points",
        "Watch for confused faces",
        "Stay within time limit"
    ],
    "After": [
        "Send follow-up with key slides",
        "Include actionable next steps",
        "Provide data/code for technical audience",
        "Schedule follow-up if needed"
    ]
}

Real-World Communication Scenarios

Scenario: Explaining Model Performance to Executives

# BAD: Technical explanation
technical_explanation = """
Our XGBoost model achieved an AUC-ROC of 0.847 on the test set,
with a precision-recall tradeoff at threshold 0.35, and the SHAP
values indicate feature importance is dominated by recency_score...
"""

# GOOD: Business language
def executive_summary():
    return """
    What we built:
    A system that predicts which customers are about to leave.

    How well it works:
    Out of 100 customers we flag, 73 actually leave (73% precision).
    We catch 81% of all customers who leave (81% recall).

    Business impact:
    If we intervene with our top 1,000 at-risk customers:
    - Expected saves: ~250 customers
    - Revenue retained: ~$1.2M annually
    - Cost of intervention: ~$50K (discounts + outreach)
    - Net benefit: ~$1.15M

    Recommendation:
    Pilot the retention program with the top 500 customers next month.
    """
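
The plain-language sentences and the net benefit in the summary above follow from simple arithmetic. The sketch below shows one way to derive them. All function names are hypothetical, and the confusion-matrix counts are made-up values chosen only to reproduce the 73% and 81% figures; the dollar amounts are the ones quoted in the summary.

```python
def precision_recall(true_pos: int, false_pos: int,
                     false_neg: int) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("Need at least one flagged and one actual positive")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


def plain_language(precision: float, recall: float) -> str:
    """Restate precision and recall without the jargon."""
    return (f"Out of 100 customers we flag, about {round(precision * 100)} "
            f"actually leave. We catch about {round(recall * 100)}% of all "
            f"customers who leave.")


def net_benefit(revenue_retained: float, intervention_cost: float) -> float:
    """Revenue kept minus what the intervention costs."""
    return revenue_retained - intervention_cost


# Hypothetical counts, picked so the results match the summary's figures.
precision, recall = precision_recall(true_pos=73, false_pos=27, false_neg=17)
print(plain_language(precision, recall))
print(f"Net benefit: ${net_benefit(1_200_000, 50_000):,.0f}")
```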

Scenario: Data-Driven Product Recommendation

def product_recommendation_deck():
    slides = [
        {
            "title": "The Opportunity",
            "content": "Users who complete onboarding within 24 hours have 3x higher LTV",
            "visual": "bar_chart: onboarding_speed_vs_ltv"
        },
        {
            "title": "Current State",
            "content": "Only 34% of users complete onboarding within 24 hours",
            "visual": "funnel_chart: onboarding_completion"
        },
        {
            "title": "Root Cause",
            "content": "Step 3 (payment setup) causes 62% of drop-offs",
            "visual": "waterfall_chart: dropoff_by_step"
        },
        {
            "title": "Our Recommendation",
            "content": "Move payment setup to post-trial, after users experience value",
            "visual": "before_after: conversion_comparison"
        },
        {
            "title": "Expected Impact",
            "content": "+18% onboarding completion -> +$3.2M annual revenue",
            "visual": "projection_chart: revenue_impact"
        }
    ]
    return slides
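
A slide list in this shape can be turned into a text outline for review before anyone opens a presentation tool. The sketch below assumes each slide is a dict with `title`, `content`, and `visual` keys, as in the function above; `render_outline` is a hypothetical helper, and the two sample slides are copied from that deck.

```python
REQUIRED_KEYS = ("title", "content", "visual")


def render_outline(slides: list[dict]) -> str:
    """Render slides as a numbered text outline, failing on incomplete slides."""
    lines = []
    for number, slide in enumerate(slides, start=1):
        missing = [key for key in REQUIRED_KEYS if not slide.get(key)]
        if missing:
            raise ValueError(f"Slide {number} is missing: {', '.join(missing)}")
        lines.append(f"Slide {number}: {slide['title']}")
        lines.append(f"  Message: {slide['content']}")
        lines.append(f"  Visual:  {slide['visual']}")
    return "\n".join(lines)


sample_slides = [
    {
        "title": "The Opportunity",
        "content": "Users who complete onboarding within 24 hours have 3x higher LTV",
        "visual": "bar_chart: onboarding_speed_vs_ltv",
    },
    {
        "title": "Current State",
        "content": "Only 34% of users complete onboarding within 24 hours",
        "visual": "funnel_chart: onboarding_completion",
    },
]
print(render_outline(sample_slides))
```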

Key Takeaways

Summary: Data Storytelling

  1. Always start with the answer, not the analysis journey — busy stakeholders need the key message first
  2. Match your communication style to your audience — C-suite wants ROI, engineers want implementation details
  3. Use the SOAR framework: Situation, Obstacle, Analysis, Result — it provides a clear narrative arc
  4. Choose visualizations that support your narrative, not decorate it — every chart should answer a question
  5. Focus on business impact, not technical details — translate AUC into dollars saved
  6. Practice the 10-20-30 rule for presentations — fewer slides, bigger font, more impact
  7. Follow up with actionable next steps and data — ensure decisions are made

Practice Exercises

  1. Build a 1-pager: Take a complex analysis and summarize it on one page for an executive
  2. Create a chart makeover: Find a bad visualization online and redesign it following best practices
  3. Practice the elevator pitch: Explain a data project in 60 seconds to a non-technical friend
  4. Write a stakeholder email: Draft an email communicating a key finding with recommendations
  5. Build a 5-slide deck: Create a presentation for a product recommendation using the template
  6. Role-play Q&A: Practice answering tough questions about your analysis
