Why Data Storytelling Matters
The best analysis in the world is worthless if you cannot communicate it. Data storytelling translates numbers into narratives that drive decisions.
  What You Built             What They See                What They Do
+----------------+         +----------------+          +----------------+
| Complex model  | ------> | "So what?"     | -------> | Nothing        |
| 95% accuracy   |         | Confused       |          | Ignore         |
| 20 features    |         | Tune out       |          | Deprioritize   |
+----------------+         +----------------+          +----------------+
        |                          |                           |
 Technical depth           Communication gap            No action taken
"The goal is to turn data into information, and information into insight." — Carly Fiorina
The Three Pillars of Data Storytelling
      DATA                 VISUAL              NARRATIVE
+--------------+      +--------------+      +--------------+
| The facts    |      | The chart    |      | The "why"    |
| The numbers  |      | The design   |      | The story    |
| The evidence |      | The colors   |      | The insight  |
+------+-------+      +------+-------+      +------+-------+
       |                     |                     |
       +---------------------+---------------------+
                             |
                             v
                 Decision-Driving Insight
Without Data: Pretty picture, no substance
Without Visual: Spreadsheet nobody reads
Without Narrative: Data dump, no action
All Three: Decision-driving insight
The SOAR Framework
Structure every data presentation using Situation, Obstacle, Analysis, and Result:
SITUATION ---------> OBSTACLE ---------> ANALYSIS ---------> RESULT
(Context)            (Problem)           (Discovery)         (Action)
    |                    |                   |                  |
"We're losing        "Onboarding has     "Step 4 causes      "Reduce to 3
 15% of users"        7 steps"            62% of drops"       steps"
| Step | Purpose | Example |
|---|---|---|
| Situation | Set the context | "We're losing 15% of users after signup" |
| Obstacle | Define the problem | "Our onboarding flow has 7 steps" |
| Analysis | Show what you found | "Users drop off at step 4 (payment)" |
| Result | Give a recommendation | "Reduce to 3 steps, expect 20% lift" |
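One way to keep a presentation on this structure is to capture the four parts in a small container before building any slides. The sketch below is illustrative only: `SoarStory` and `to_outline` are hypothetical names, not part of any library, and the example values are the ones from the table above.

```python
from dataclasses import dataclass


@dataclass
class SoarStory:
    """Hypothetical container for the four SOAR parts."""
    situation: str
    obstacle: str
    analysis: str
    result: str

    def to_outline(self) -> str:
        """Render the story as a four-line outline, in SOAR order."""
        parts = [
            ("Situation", self.situation),
            ("Obstacle", self.obstacle),
            ("Analysis", self.analysis),
            ("Result", self.result),
        ]
        return "\n".join(f"{label}: {text}" for label, text in parts)


story = SoarStory(
    situation="We're losing 15% of users after signup",
    obstacle="Our onboarding flow has 7 steps",
    analysis="Users drop off at step 4 (payment)",
    result="Reduce to 3 steps, expect 20% lift",
)
print(story.to_outline())
```

Because every field is required, an empty Result slot raises an error at construction time, which is a cheap reminder that the story is not finished until it ends in a recommendation.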
Narrative Arc Structure
Opening ---------> Problem ---------> Evidence ---------> Insight ---------> Action ---------> Resolution
   |                  |                  |                   |                  |                  |
Context            Tension            Discovery           Finding            Recommendation    Follow-up
Visualization Best Practices
Chart Selection Guide
COMPARISON:
  Bar chart    -> Compare categories
  Grouped bars -> Compare across groups
  Lollipop     -> Clean comparison with many categories
  Heatmap      -> Two categorical dimensions
TREND:
  Line chart   -> Show change over time
  Area chart   -> Emphasize volume over time
  Sparkline    -> Quick trend in context
DISTRIBUTION:
  Histogram    -> Show frequency distribution
  Box plot     -> Compare distributions across groups
  Violin plot  -> Show distribution shape
RELATIONSHIP:
  Scatter      -> Show correlation between 2 variables
  Bubble       -> Add 3rd dimension (size)
  Heatmap      -> Show correlation matrix
PART-TO-WHOLE:
  Stacked bar  -> Show composition across categories
  Treemap      -> Hierarchical proportions
  Waterfall    -> Show cumulative effect
GEOGRAPHIC:
  Choropleth   -> Show values by region
  Bubble map   -> Show magnitude at locations
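The guide can also be kept as a lookup table so that chart choice starts from the question being asked rather than from habit. This is a minimal sketch; `CHART_GUIDE` and `suggest_charts` are hypothetical names introduced here, and the entries simply mirror the guide above.

```python
# Hypothetical lookup table built from the chart selection guide.
CHART_GUIDE = {
    "comparison": ["bar chart", "grouped bars", "lollipop", "heatmap"],
    "trend": ["line chart", "area chart", "sparkline"],
    "distribution": ["histogram", "box plot", "violin plot"],
    "relationship": ["scatter", "bubble", "heatmap"],
    "part-to-whole": ["stacked bar", "treemap", "waterfall"],
    "geographic": ["choropleth", "bubble map"],
}


def suggest_charts(goal: str) -> list[str]:
    """Return candidate chart types for an analytical goal (case-insensitive)."""
    key = goal.strip().lower()
    if key not in CHART_GUIDE:
        valid = ", ".join(sorted(CHART_GUIDE))
        raise ValueError(f"Unknown goal {goal!r}; expected one of: {valid}")
    return CHART_GUIDE[key]


print(suggest_charts("Trend"))
```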
Visualization Do's and Don'ts
import matplotlib.pyplot as plt
import pandas as pd

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# BAD: Too much clutter (rainbow colors, hatching, heavy borders, dense grid)
quarters = ['Q1', 'Q2', 'Q3', 'Q4'] * 2
years = ['2023']*4 + ['2024']*4
values = [45, 52, 48, 61, 49, 58, 55, 72]
# Label each bar with quarter and year; repeated 'Q1'..'Q4' labels alone
# would draw the 2024 bars on top of the 2023 bars
labels = [f'{q}\n{y}' for q, y in zip(quarters, years)]
colors = ['#ff6b6b', '#4ecdc4', '#45b7d1', '#96ceb4',
          '#ff6b6b', '#4ecdc4', '#45b7d1', '#96ceb4']
axes[0].bar(labels, values, color=colors, edgecolor='black',
            linewidth=2, hatch='//')
axes[0].set_title('Quarterly Performance (BAD)', fontsize=14, fontweight='bold', color='darkblue')
axes[0].grid(True, linestyle='--', alpha=0.7)
axes[0].set_facecolor('#f0f0f0')

# GOOD: Clean, one line per year so the year-over-year comparison is obvious
df = pd.DataFrame({
    'Quarter': ['Q1', 'Q2', 'Q3', 'Q4'] * 2,
    'Year': ['2023']*4 + ['2024']*4,
    'Revenue': [45, 52, 48, 61, 49, 58, 55, 72]
})
for year, group in df.groupby('Year'):
    # Distinguish years by marker and line style, not by color alone
    marker = 'o' if year == '2024' else 's'
    linestyle = '-' if year == '2024' else '--'
    axes[1].plot(group['Quarter'], group['Revenue'],
                 marker=marker, linewidth=2.5, markersize=8,
                 label=year, linestyle=linestyle)
axes[1].set_title('Revenue by Quarter', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Revenue ($K)', fontsize=12)
axes[1].legend(title='Year', frameon=True)
axes[1].spines['top'].set_visible(False)
axes[1].spines['right'].set_visible(False)
axes[1].grid(axis='y', alpha=0.3)

plt.tight_layout()
plt.show()
Color and Accessibility
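import matplotlib.pyplot as plt

# Color-blind friendly palette
COLORBLIND_PALETTE = ['#0072B2', '#E69F00', '#009E73',
                      '#F0E442', '#56B4E9', '#D55E00', '#CC79A7']

def create_accessible_chart(df, x, y, hue=None):
    """Scatter plot of y against x, optionally colored by the `hue` column."""
    fig, ax = plt.subplots(figsize=(10, 6))
    if hue:
        # One color per group, cycling through the palette if there are many groups
        for i, (name, group) in enumerate(df.groupby(hue)):
            color = COLORBLIND_PALETTE[i % len(COLORBLIND_PALETTE)]
            ax.scatter(group[x], group[y], label=name, color=color,
                       s=100, edgecolors='white', linewidth=0.5)
        ax.legend(title=hue.replace('_', ' ').title())
    else:
        # No grouping column: plot every point in the first palette color
        ax.scatter(df[x], df[y], color=COLORBLIND_PALETTE[0],
                   s=100, edgecolors='white', linewidth=0.5)
    ax.spines['top'].set_visible(False)
    ax.spines['right'].set_visible(False)
    ax.set_xlabel(x.replace('_', ' ').title(), fontsize=12)
    ax.set_ylabel(y.replace('_', ' ').title(), fontsize=12)
    return fig, ax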
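A color-blind friendly hue can still be hard to see if it is too close in lightness to the background. The sketch below checks each palette color against a white background using the WCAG 2.x contrast ratio formula; the function names are hypothetical, and the 3:1 threshold is the WCAG guideline for graphical objects, applied here as an assumption about what counts as readable in a chart.

```python
def relative_luminance(hex_color: str) -> float:
    """WCAG 2.x relative luminance of an sRGB color given as '#RRGGBB'."""
    hex_color = hex_color.lstrip("#")
    channels = [int(hex_color[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Linearize each sRGB channel before weighting
    linear = [
        c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
        for c in channels
    ]
    r, g, b = linear
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(color_a: str, color_b: str) -> float:
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(color_a)
    lum_b = relative_luminance(color_b)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)


PALETTE = ['#0072B2', '#E69F00', '#009E73',
           '#F0E442', '#56B4E9', '#D55E00', '#CC79A7']

for color in PALETTE:
    ratio = contrast_ratio(color, '#FFFFFF')
    flag = "ok" if ratio >= 3 else "low contrast on white"
    print(f"{color}: {ratio:.2f} ({flag})")
```

Colors that fall below the threshold, such as the yellow, are better reserved for filled areas with a dark outline than for thin lines or small markers on a white background.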
The Pyramid Principle
Start with the answer, then provide supporting evidence:
              +-------------------------------------+
              |           RECOMMENDATION            |
              |   "Reduce onboarding to 3 steps"    |
              +------------------+------------------+
                                 |
          +----------------------+----------------------+
          |                      |                      |
     +----v-----+           +----v-----+           +----v-----+
     | Evidence |           | Evidence |           | Evidence |
     |    1     |           |    2     |           |    3     |
     | "40%     |           | "Step 4  |           | "A/B     |
     |  drop    |           |  is the  |           |  test    |
     |  off"    |           |  cliff"  |           |  data"   |
     +----------+           +----------+           +----------+
Stakeholder Communication
Know Your Audience
| Audience | Focus | Format | Detail Level |
|---|---|---|---|
| C-Suite | ROI, strategy | 1-pager | Minimal |
| Product Managers | User metrics | Dashboard | Medium |
| Engineers | Implementation | Technical docs | High |
| Marketing | Campaign performance | Visual summary | Medium |
| Finance | Revenue/cost | Tables | Precise |
Email Communication Template
def create_analysis_email(
    finding: str,
    recommendation: str,
    evidence: list[str],
    metrics: dict,
    next_steps: list[str]
) -> str:
    """Build an answer-first email: finding, recommendation, evidence, metrics, next steps."""
    email = f"""Hi Team,

**Key Finding:**
{finding}

**Recommendation:**
{recommendation}

**Supporting Evidence:**
"""
    # Numbered evidence so readers can refer to items in replies
    for i, e in enumerate(evidence, 1):
        email += f"{i}. {e}\n"
    email += "\n**Key Metrics:**\n"
    for metric, value in metrics.items():
        email += f"- {metric}: {value}\n"
    # Checkbox items make ownership and completion easy to track
    email += "\n**Next Steps:**\n"
    for step in next_steps:
        email += f"- [ ] {step}\n"
    email += "\nHappy to discuss further."
    return email
# Example
email = create_analysis_email(
    finding="Mobile conversion rate dropped 24% after the latest app update.",
    recommendation="Roll back the checkout flow changes in the next release.",
    evidence=[
        "Conversion dropped from 4.2% to 3.2% within 48 hours",
        "User session recordings show confusion at the new payment screen",
        "Mobile revenue is down $45K week-over-week"
    ],
    metrics={
        "Mobile Conversion Rate": "3.2% (was 4.2%)",
        "Revenue Impact": "-$45K/week",
        "Affected Users": "~12,000/day"
    },
    next_steps=[
        "Schedule emergency rollback for tonight",
        "Prepare A/B test for alternative checkout flow",
        "Post-mortem scheduled for Friday"
    ]
)
print(email)
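Headline numbers like the conversion drop in this example are easy to misstate, because a change between two rates can be reported in percentage points or as a relative change. The helper below is a small sketch (the name `describe_change` is hypothetical) that computes both, so the email can state which one it means.

```python
def describe_change(before: float, after: float) -> dict:
    """Percentage-point and relative change between two rates given in percent."""
    if before == 0:
        raise ValueError("before must be non-zero to compute a relative change")
    points = after - before            # absolute change, in percentage points
    relative = points / before * 100   # change relative to the starting rate
    return {"points": round(points, 1), "relative_pct": round(relative, 1)}


change = describe_change(before=4.2, after=3.2)
print(f"{change['points']:+.1f} percentage points, "
      f"{change['relative_pct']:+.1f}% relative")
```

For the drop from 4.2% to 3.2%, this gives one percentage point, or roughly 24% in relative terms. Stating both avoids a reader mistaking one for the other.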
Presentation Skills
The 10-20-30 Rule (Guy Kawasaki)
- 10 slides maximum
- 20 minutes maximum
- 30pt font minimum
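The three limits are simple enough to check mechanically before a rehearsal. The following is a minimal sketch; `check_10_20_30` is a hypothetical helper, and the slide count, duration, and smallest font size are assumed to be supplied by hand rather than read from a slide file.

```python
def check_10_20_30(slide_count: int, minutes: float, min_font_pt: float) -> list[str]:
    """Return a list of rule violations; an empty list means the deck passes."""
    problems = []
    if slide_count > 10:
        problems.append(f"{slide_count} slides (limit is 10)")
    if minutes > 20:
        problems.append(f"{minutes} minutes (limit is 20)")
    if min_font_pt < 30:
        problems.append(f"smallest font is {min_font_pt}pt (minimum is 30pt)")
    return problems


# Example: a 14-slide deck planned for 18 minutes with 24pt body text
for problem in check_10_20_30(slide_count=14, minutes=18, min_font_pt=24):
    print("Fix:", problem)
```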
Slide Structure Template
Slide 1: Title + One-Line Recommendation
"Reducing Churn by 15% — Data Science Team | Q4 2024"
Slide 2: The Problem (1 slide)
"We're losing $2M/month to churn"
+ Simple chart showing the trend
Slides 3-4: What We Found (2 slides)
Key insight with supporting visualization
Slide 5: The Solution (1 slide)
Our recommendation with expected impact
Slide 6: Next Steps (1 slide)
Action items with owners and timeline
Live Presentation Checklist
presentation_checklist = {
    "Before": [
        "Test all links and code demos",
        "Prepare backup screenshots",
        "Know your audience (technical vs business)",
        "Prepare for tough questions",
        "Have one-page summary ready"
    ],
    "During": [
        "Start with the answer, not the journey",
        "Use concrete numbers, not vague claims",
        "Pause after key points",
        "Watch for confused faces",
        "Stay within time limit"
    ],
    "After": [
        "Send follow-up with key slides",
        "Include actionable next steps",
        "Provide data/code for technical audience",
        "Schedule follow-up if needed"
    ]
}
Real-World Communication Scenarios
Scenario: Explaining Model Performance to Executives
# BAD: Technical explanation
"""
Our XGBoost model achieved an AUC-ROC of 0.847 on the test set,
with a precision-recall tradeoff at threshold 0.35, and the SHAP
values indicate feature importance is dominated by recency_score...
"""
# GOOD: Business language
def executive_summary():
    """Return a plain-language summary: what, how well, impact, recommendation."""
    return """
What we built:
A system that predicts which customers are about to leave.

How well it works:
Out of 100 customers we flag, 73 actually leave (73% precision).
We catch 81% of all customers who leave (81% recall).

Business impact:
If we intervene with our top 1,000 at-risk customers:
- Expected saves: ~250 customers
- Revenue retained: ~$1.2M annually
- Cost of intervention: ~$50K (discounts + outreach)
- Net benefit: ~$1.15M

Recommendation:
Pilot the retention program with the top 500 customers next month.
"""
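The plain-language figures in a summary like this come straight from the model's confusion-matrix counts and a subtraction. The sketch below shows that translation; the function names are hypothetical, and the counts are made-up values chosen only to reproduce 73% precision and 81% recall.

```python
def explain_classifier(true_pos: int, false_pos: int, false_neg: int) -> str:
    """Translate confusion-matrix counts into plain-language statements."""
    precision = true_pos / (true_pos + false_pos)  # flagged customers who really leave
    recall = true_pos / (true_pos + false_neg)     # leavers the model catches
    return (
        f"Out of 100 customers we flag, about {precision * 100:.0f} actually leave. "
        f"We catch about {recall * 100:.0f}% of all customers who leave."
    )


def net_benefit(revenue_retained: float, intervention_cost: float) -> float:
    """Revenue kept by the intervention minus what the intervention costs."""
    return revenue_retained - intervention_cost


# Hypothetical counts: 730 correct flags, 270 false alarms, 171 missed leavers
print(explain_classifier(true_pos=730, false_pos=270, false_neg=171))
print(f"Net benefit: ${net_benefit(1_200_000, 50_000):,.0f}")
```

Keeping the calculation next to the summary makes it easy to update the wording when the model is retrained, and gives a ready answer if an executive asks where a number came from.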
Scenario: Data-Driven Product Recommendation
def product_recommendation_deck():
    """Return the deck as a list of slides, each with a title, message, and visual."""
    slides = [
        {
            "title": "The Opportunity",
            "content": "Users who complete onboarding within 24 hours have 3x higher LTV",
            "visual": "bar_chart: onboarding_speed_vs_ltv"
        },
        {
            "title": "Current State",
            "content": "Only 34% of users complete onboarding within 24 hours",
            "visual": "funnel_chart: onboarding_completion"
        },
        {
            "title": "Root Cause",
            "content": "Step 4 (payment setup) causes 62% of drop-offs",
            "visual": "waterfall_chart: dropoff_by_step"
        },
        {
            "title": "Our Recommendation",
            "content": "Move payment setup to post-trial, after users experience value",
            "visual": "before_after: conversion_comparison"
        },
        {
            "title": "Expected Impact",
            "content": "+18% onboarding completion -> +$3.2M annual revenue",
            "visual": "projection_chart: revenue_impact"
        }
    ]
    return slides
Key Takeaways
Summary: Data Storytelling
- Always start with the answer, not the analysis journey — busy stakeholders need the key message first
- Match your communication style to your audience — C-suite wants ROI, engineers want implementation details
- Use the SOAR framework: Situation, Obstacle, Analysis, Result — it provides a clear narrative arc
- Choose visualizations that support your narrative, not decorate it — every chart should answer a question
- Focus on business impact, not technical details — translate AUC into dollars saved
- Practice the 10-20-30 rule for presentations — fewer slides, bigger font, more impact
- Follow up with actionable next steps and data — ensure decisions are made
Practice Exercises
- Build a 1-pager: Take a complex analysis and summarize it on one page for an executive
- Create a chart makeover: Find a bad visualization online and redesign it following best practices
- Practice the elevator pitch: Explain a data project in 60 seconds to a non-technical friend
- Write a stakeholder email: Draft an email communicating a key finding with recommendations
- Build a 5-slide deck: Create a presentation for a product recommendation using the template
- Role-play Q&A: Practice answering tough questions about your analysis