Advanced Visualization: Plotly & Interactive Charts
Learning Objectives
By the end of this tutorial, you will be able to:
- Distinguish when static vs interactive visualizations are appropriate
- Build publication-quality interactive charts with Plotly Express
- Construct complex multi-panel layouts with subplots and insets
- Create statistical visualizations including marginal distributions and correlation heatmaps
- Render geographical data on interactive maps
- Animate temporal data with transitions and animation frames
- Build dashboard-ready charts with updatemenus, sliders, and buttons
- Customize templates, colorscales, and themes for consistency
- Export visualizations to PNG, SVG, and standalone HTML
- Select the right library (matplotlib vs seaborn vs plotly) for any scenario
1. Static vs Interactive: When to Use Each
The Fundamental Tradeoff
The choice between static and interactive visualization is not aesthetic; it is a cognitive design decision that affects how viewers process information.
Definition: Information Density
The amount of data-encoded information per unit area of a visualization. Static charts have fixed information density set at design time. Interactive charts have variable information density: the base layer is simple, but hover tooltips, zoom, and filtering allow users to reveal deeper layers on demand, effectively increasing density without increasing visual clutter.
VISUALIZATION DECISION TREE
What is your delivery medium?
- Print / PDF / Paper -> STATIC (matplotlib, seaborn)
- Web page / Dashboard -> INTERACTIVE (plotly, bokeh)
- Both needed -> STATIC primary + HTML export
How many dimensions of data?
- ≤ 3 dimensions -> Static is sufficient
- ≥ 4 dimensions -> Interactive (hover, filter)
Audience expertise?
- Domain experts -> Static (precise, reproducible)
- Exploratory / stakeholders -> Interactive (self-serve)
Data point count?
- < 1,000 -> Either works
- > 10,000 -> Interactive (zoom, pan)
Formal Comparison
| Property | Static (matplotlib/seaborn) | Interactive (Plotly) |
|---|---|---|
| Output format | PNG, SVG, PDF | HTML, JS bundle |
| File size | Small (KB to MB) | Larger (KB to MB, depends on data) |
| Reproducibility | Pixel-perfect | Viewport-dependent |
| Hover information | Not possible | Native support |
| Zoom / Pan | Not possible | Native support |
| Animation | Requires save as GIF/MP4 | Real-time in browser |
| Server requirement | None (embedded) | None (standalone HTML) |
| Learning curve | Moderate | Steep initially |
| Customization | Infinite (low-level) | Template-based (high-level) |
| Rendering backend | Agg, TkAgg, Cairo | WebGL, SVG, Canvas |
The rendering backend matters. Matplotlib's default Agg backend rasterizes on the CPU, which is deterministic but can be slow for very large figures. Plotly Express can draw scatter and line plots with WebGL, which uses the GPU; with the default render_mode="auto" it switches from SVG to WebGL at roughly 1,000 points, which keeps large datasets responsive. For simple figures, however, Plotly's SVG rendering in the browser can be slower than matplotlib.
When Plotly Wins
Plotly excels when you need exploration, presentation to non-technical stakeholders, or embeddable web content. The hover-to-reveal detail pattern reduces cognitive load because viewers only see complexity on demand.
When matplotlib Wins
Matplotlib wins when you need pixel-perfect control, publication-quality output, or reproducible figures for academic journals where every line width and font size must be exact.
2. Plotly Express: The Grammar of Interactive Graphics
Installation and Import
# Install if needed
# pip install plotly pandas numpy
import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
import pandas as pd
import numpy as np
2.1 Scatter Plots
The scatter plot is the workhorse of exploratory data analysis. Plotly Express adds interactivity automatically.
Definition: Logarithmic Scale
A nonlinear scale where equal distances represent equal ratios rather than equal differences. On a log scale, the distance from 10 to 100 equals the distance from 100 to 1000. This is essential for visualizing data that spans orders of magnitude (e.g., GDP per capita from about $100 to $100,000) because it compresses the right tail and reveals structure in the left portion that would otherwise be invisible.
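The "equal distances represent equal ratios" property can be checked with a few lines of arithmetic. This is a small sketch; log_distance is a helper name invented here, not part of Plotly.

```python
import math

def log_distance(a, b):
    """Distance between a and b on a base-10 log axis, measured in decades."""
    return math.log10(b) - math.log10(a)

# Equal ratios (x10) occupy equal distances on the axis
d_10_to_100 = log_distance(10, 100)
d_100_to_1000 = log_distance(100, 1000)

# A range of 100 to 100,000 spans only three decades
d_gdp_range = log_distance(100, 100_000)

print(d_10_to_100, d_100_to_1000, d_gdp_range)
```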
# Load the built-in Gapminder dataset
df = px.data.gapminder()
# Basic scatter: GDP per capita vs life expectancy
fig = px.scatter(
df.query("year == 2007"),
x="gdpPercap",
y="lifeExp",
size="pop",
color="continent",
hover_name="country",
log_x=True,
size_max=60,
title="GDP per Capita vs Life Expectancy (2007)",
labels={
"gdpPercap": "GDP per Capita (USD, log scale)",
"lifeExp": "Life Expectancy (years)",
"pop": "Population",
"continent": "Continent"
}
)
fig.update_layout(
template="plotly_white",
font=dict(family="Arial", size=12),
title_font_size=18
)
fig.show()
Output: An interactive scatter plot where hovering over any bubble reveals the country name, GDP, life expectancy, and population. The bubble size encodes population magnitude.
Key parameters explained:
- log_x=True: Puts the x-axis on a log scale (essential for GDP data, which spans orders of magnitude)
- size_max=60: Caps the maximum bubble diameter to prevent visual dominance of large populations
- hover_name: Specifies which column appears as the bold title in hover tooltips
2.2 Line Charts
# Single-country time series (Canada)
df_canada = px.data.gapminder().query("country == 'Canada'")
fig = px.line(
df_canada,
x="year",
y="lifeExp",
title="Life Expectancy in Canada Over Time",
markers=True,
labels={"year": "Year", "lifeExp": "Life Expectancy (years)"},
template="plotly_white"
)
fig.update_traces(
line=dict(color="#2E86AB", width=3),
marker=dict(size=8, symbol="circle")
)
fig.update_layout(
xaxis=dict(dtick=5, gridcolor="lightgray"),
yaxis=dict(range=[65, 85])
)
fig.show()
The markers=True parameter adds data point markers to the line, making individual observations distinguishable. This is critical for small time series, where interpolation between points can be misleading.
2.3 Bar Charts
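# Grouped bar chart of the mean bill per day and sex
df_tips = px.data.tips()
# px.bar draws one bar per row, so aggregate to the mean before plotting
df_avg = df_tips.groupby(["day", "sex"], as_index=False)["total_bill"].mean()
fig = px.bar(
    df_avg,
    x="day",
    y="total_bill",
    color="sex",
    barmode="group",
    category_orders={"day": ["Thur", "Fri", "Sat", "Sun"]},
    title="Average Total Bill by Day and Gender",
    labels={
        "day": "Day of Week",
        "total_bill": "Average Total Bill ($)",
        "sex": "Sex"
    },
    text_auto=".2f",
    template="plotly_white"
)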
fig.update_layout(
legend_title_text="Customer Sex",
font=dict(size=13)
)
fig.show()
The text_auto=".2f" parameter automatically labels each bar with its value formatted to two decimal places, eliminating the need for manual annotation.
2.4 Histograms
# Overlaid histograms with a marginal rug plot
df = px.data.tips()
fig = px.histogram(
    df,
    x="total_bill",
    color="time",
    nbins=30,
    barmode="overlay",  # draw the groups on top of each other instead of stacking
    opacity=0.7,
histnorm="probability density",
marginal="rug",
title="Distribution of Total Bills (Lunch vs Dinner)",
labels={"total_bill": "Total Bill ($)", "time": "Meal Time"},
template="plotly_white"
)
fig.update_layout(
bargap=0.1,
legend_title_text="Meal Time"
)
fig.show()
histnorm="probability density" normalizes each group so that the total area of its bars equals 1, enabling comparison between groups of different sizes. marginal="rug" adds a rug plot in a marginal panel above the histogram, showing individual data points.
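The area-equals-one property of density normalization can be verified outside Plotly. The sketch below uses np.histogram(density=True) as a stand-in for histnorm="probability density"; the two sample groups are synthetic and their sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
# Two hypothetical groups of different sizes (e.g. 60 lunches vs 180 dinners)
small_group = rng.gamma(shape=4.0, scale=4.0, size=60)
large_group = rng.gamma(shape=4.0, scale=5.0, size=180)

bins = np.linspace(0, 80, 31)  # 30 shared bins

def density_area(values, bin_edges):
    """Total bar area (height x width) of a density-normalized histogram."""
    density, edges = np.histogram(values, bins=bin_edges, density=True)
    return float(np.sum(density * np.diff(edges)))

area_small = density_area(small_group, bins)
area_large = density_area(large_group, bins)
print(area_small, area_large)
```

Because both areas equal 1, bar heights are comparable even though one group has three times as many observations.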
2.5 Box Plots
fig = px.box(
df,
x="day",
y="total_bill",
color="day",
points="all",
notched=True,
title="Total Bill Distribution by Day (with All Points)",
labels={"day": "Day of Week", "total_bill": "Total Bill ($)"},
template="plotly_white"
)
fig.update_layout(showlegend=False)
fig.show()
notched=True draws notches around the median that span an approximate confidence interval for it; non-overlapping notches suggest statistically different medians. points="all" overlays all individual data points, revealing distribution shape hidden by the box.
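As a rough illustration of what a notch encodes, the sketch below computes the interval median ± 1.57 · IQR / √n, the formula Plotly's documentation gives for notched boxes. The helper notch_interval is a name made up here, and NumPy's default percentile interpolation may differ slightly from the quartile method Plotly applies.

```python
import numpy as np

def notch_interval(values):
    """Approximate confidence interval for the median: median +/- 1.57 * IQR / sqrt(n)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(values.size)
    return float(median - half_width), float(median + half_width)

# Hypothetical sample: the integers 1..100
notch_low, notch_high = notch_interval(np.arange(1, 101))
print(notch_low, notch_high)
```

Two groups whose intervals do not overlap are candidates for a formal test of median difference; the notch itself is only a visual heuristic.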
2.6 Violin Plots
Definition: Kernel Density Estimation (KDE)
A non-parametric method for estimating the probability density function of a random variable. Instead of binning data like a histogram, KDE places a smooth kernel function (typically Gaussian) at each data point and sums them to produce a continuous density estimate. The smoothness is controlled by the bandwidth parameter $h$.
fig = px.violin(
df,
x="day",
y="total_bill",
color="day",
box=True,
points="all",
violinmode="group",
title="Violin Plot of Total Bills by Day",
labels={"day": "Day of Week", "total_bill": "Total Bill ($)"},
template="plotly_white"
)
fig.update_layout(showlegend=False)
fig.show()
Violin plots combine the box plot (median, quartiles) with a kernel density estimation (KDE), revealing multimodality that box plots hide.
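The KDE estimation:
Kernel Density Estimator
$$\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
Here,
- $\hat{f}_h(x)$ = estimated density at the point $x$
- $n$ = number of observations
- $x_i$ = the $i$-th observation
- $h$ = bandwidth (smoothing parameter)
where $K$ is the kernel function (typically Gaussian) and $h$ is the bandwidth selected via Silverman's rule:
$$h = 0.9 \, \min\!\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) n^{-1/5}$$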
Bandwidth selection tradeoff: A smaller bandwidth (small $h$) produces a wiggly, low-bias estimate that captures fine structure but has high variance. A larger bandwidth produces a smooth, low-variance estimate but may obscure real features. Silverman's rule is a reasonable default, but for multimodal data, consider using cross-validation or a smaller bandwidth to avoid oversmoothing.
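A Gaussian KDE with a Silverman-style bandwidth can be written directly in NumPy. This is a minimal sketch, assuming the common rule-of-thumb form h = 0.9 · min(σ, IQR/1.34) · n^(-1/5); the function names and the bimodal sample are invented for illustration, and Plotly's own violin implementation may differ in detail.

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth: 0.9 * min(std, IQR / 1.34) * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    q75, q25 = np.percentile(x, [75, 25])
    spread = min(x.std(ddof=1), (q75 - q25) / 1.34)
    return float(0.9 * spread * x.size ** (-1 / 5))

def gaussian_kde(x, grid, h):
    """Evaluate (1 / (n h)) * sum_i K((grid - x_i) / h) with a Gaussian kernel K."""
    x = np.asarray(x, dtype=float)
    u = (grid[:, None] - x[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (x.size * h)

# Hypothetical bimodal sample: two normal clusters
rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(10, 2, 150), rng.normal(25, 3, 100)])

grid = np.linspace(sample.min() - 15, sample.max() + 15, 2000)
h = silverman_bandwidth(sample)
density = gaussian_kde(sample, grid, h)

# A valid density integrates to roughly 1 (Riemann sum over the grid)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(h, area)
```

Re-running gaussian_kde with h / 3 shows the wigglier estimate described above, which is one quick way to check whether the default oversmooths a multimodal sample.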
3. Subplots: Multi-Panel Layouts
3.1 Basic Subplots with make_subplots
from plotly.subplots import make_subplots
fig = make_subplots(
rows=2, cols=2,
subplot_titles=("Scatter", "Bar", "Histogram", "Box"),
horizontal_spacing=0.12,
vertical_spacing=0.15
)
# Panel 1: Scatter
fig.add_trace(
go.Scatter(x=df_tips["total_bill"], y=df_tips["tip"],
mode="markers", marker=dict(size=6, opacity=0.6),
name="Scatter"),
row=1, col=1
)
# Panel 2: Bar
day_means = df_tips.groupby("day")["total_bill"].mean().reset_index()
fig.add_trace(
go.Bar(x=day_means["day"], y=day_means["total_bill"],
name="Bar", marker_color="#2E86AB"),
row=1, col=2
)
# Panel 3: Histogram
fig.add_trace(
go.Histogram(x=df_tips["total_bill"], nbinsx=20,
name="Histogram", marker_color="#A23B72"),
row=2, col=1
)
# Panel 4: Box
for day in df_tips["day"].unique():
    day_data = df_tips[df_tips["day"] == day]
    fig.add_trace(
        go.Box(y=day_data["total_bill"], name=day),
        row=2, col=2
    )
fig.update_layout(
height=700, width=900,
title_text="Multi-Panel EDA Dashboard",
template="plotly_white",
showlegend=False
)
fig.show()
3.2 Secondary Y-Axes
# Plot two variables with different scales
months = pd.date_range("2024-01-01", periods=12, freq="MS")
revenue = np.random.uniform(50000, 120000, 12)
orders = np.random.poisson(200, 12)
df_multi = pd.DataFrame({
"month": months,
"revenue": revenue,
"orders": orders
})
fig = make_subplots(specs=[[{"secondary_y": True}]])
fig.add_trace(
go.Scatter(
x=df_multi["month"], y=df_multi["revenue"],
name="Revenue ($)",
line=dict(color="#2E86AB", width=3),
mode="lines+markers"
),
secondary_y=False
)
fig.add_trace(
go.Bar(
x=df_multi["month"], y=df_multi["orders"],
name="Orders",
marker_color="rgba(162, 59, 114, 0.5)"
),
secondary_y=True
)
fig.update_yaxes(title_text="Revenue ($)", secondary_y=False)
fig.update_yaxes(title_text="Number of Orders", secondary_y=True)
fig.update_layout(
title="Revenue and Orders Over Time",
template="plotly_white",
height=450
)
fig.show()
Why secondary axes matter: When two metrics have fundamentally different units (dollars vs. counts) or magnitudes (thousands vs. single digits), overlaying them on the same axis renders one invisible. Secondary axes solve this by providing independent scales.
Dual-axis pitfalls: Secondary axes can be misleading if used carelessly. Viewers may interpret a visual intersection as meaningful when it is merely an artifact of axis scaling. Always clearly label both axes, and consider whether a small-multiples layout (separate panels) would be more honest.
3.3 Inset Plots
# Main plot with an inset showing detail
x = np.linspace(0, 4 * np.pi, 1000)
y = np.sin(x) * np.exp(-0.1 * x)
fig = go.Figure()
# Main trace
fig.add_trace(go.Scatter(
x=x, y=y, mode="lines",
line=dict(color="#2E86AB", width=2),
name="Damped Sine"
))
# Highlight the x-range shown in the inset
# (add_vrect spans the full plot height, so only x0 and x1 are given)
fig.add_vrect(
    x0=0, x1=2,
    fillcolor="rgba(255,0,0,0.1)",
    line_width=1, line_color="red"
)
# Add inset with zoomed view
fig.update_layout(
xaxis2=dict(
domain=[0.5, 0.85],
anchor="y2"
),
yaxis2=dict(
domain=[0.6, 0.95],
anchor="x2"
)
)
fig.add_trace(go.Scatter(
x=x[(x >= 0) & (x <= 2)],
y=y[(x >= 0) & (x <= 2)],
mode="lines",
line=dict(color="red", width=2),
xaxis="x2", yaxis="y2",
name="Zoomed Region"
))
fig.update_layout(
title="Damped Sine Wave with Inset Detail",
template="plotly_white",
height=500
)
fig.show()
4. Statistical Visualizations
4.1 Marginal Distributions
fig = px.scatter(
df_tips,
x="total_bill",
y="tip",
marginal_x="histogram",
marginal_y="rug",
color="time",
title="Tip vs Total Bill with Marginal Distributions",
opacity=0.6,
template="plotly_white"
)
fig.show()
marginal_x="histogram" places a histogram above the scatter showing the x-distribution, while marginal_y="rug" places rug marks on the right showing individual y-values. This triple-panel view reveals both the joint relationship and marginal distributions simultaneously.
4.2 Correlation Heatmaps
# Compute correlation matrix
numeric_cols = ["total_bill", "tip", "size"]
corr_matrix = df_tips[numeric_cols].corr()
fig = px.imshow(
corr_matrix,
text_auto=".3f",
color_continuous_scale="RdBu_r",
zmin=-1, zmax=1,
title="Correlation Matrix Heatmap",
labels=dict(color="Correlation"),
aspect="auto"
)
fig.update_layout(
template="plotly_white",
font=dict(size=13)
)
fig.show()
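The correlation coefficient formula:
Pearson Product-Moment Correlation Coefficient
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
Here,
- $r$ = sample correlation coefficient
- $x_i, y_i$ = the $i$-th paired observations
- $n$ = number of paired observations
where $\bar{x}$ and $\bar{y}$ are sample means. The coefficient ranges from $-1$ (perfect negative correlation) to $+1$ (perfect positive correlation), with $0$ indicating no linear relationship.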
Correlation does not imply causation, but it constrains it. A high correlation (|r| > 0.8) between two variables does not prove one causes the other, but it does imply that any causal explanation must account for the observed association. Always consider confounding variables, reverse causality, and coincidence before drawing causal conclusions.
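To connect the Pearson coefficient to what DataFrame.corr() reports, here is a small sketch that computes r from centered sums and compares it with np.corrcoef. The bill and tip arrays are made-up numbers, and pearson_r is a helper defined only for this example.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: sum of centered cross-products over the product of centered norms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical paired observations
bill = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tip = np.array([1.5, 3.0, 4.0, 6.5, 7.0])

r_manual = pearson_r(bill, tip)
r_numpy = float(np.corrcoef(bill, tip)[0, 1])

# Exact linear relationships hit the bounds of the range
r_perfect_pos = pearson_r(bill, 2 * bill + 1)
r_perfect_neg = pearson_r(bill, -bill)
print(r_manual, r_numpy, r_perfect_pos, r_perfect_neg)
```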
4.3 Pair Plots
fig = px.scatter_matrix(
df_tips,
dimensions=["total_bill", "tip", "size"],
color="time",
title="Pair Plot: Tips Dataset",
opacity=0.5,
template="plotly_white"
)
fig.update_traces(diagonal_visible=True)
fig.show()
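Pair plots show all pairwise relationships in a single grid, which is essential for multivariate EDA. In Plotly's scatter_matrix, each diagonal cell plots a variable against itself rather than showing its marginal distribution, so add separate histograms if the marginals matter.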
5. Geographical Maps
5.1 Choropleth Maps
Definition: Choropleth Map
A thematic map where geographic regions (countries, states, counties) are shaded or patterned in proportion to the value of a variable. The color intensity encodes magnitude, making it easy to identify spatial patterns. Plotly uses ISO 3166-1 alpha-3 country codes to match data to built-in GeoJSON boundaries.
# World choropleth with life expectancy
df_2007 = px.data.gapminder().query("year == 2007")
fig = px.choropleth(
df_2007,
locations="iso_alpha",
color="lifeExp",
hover_name="country",
color_continuous_scale=px.colors.sequential.Viridis,
title="World Life Expectancy (2007)",
labels={"lifeExp": "Life Expectancy", "iso_alpha": "Country Code"}
)
fig.update_layout(
geo=dict(
showframe=False,
showcoastlines=True,
projection_type="equirectangular"
),
template="plotly_white"
)
fig.show()
Choropleth maps encode data values as color intensity within geographic regions. The iso_alpha column uses ISO 3166-1 alpha-3 country codes, which Plotly matches to built-in GeoJSON boundaries.
5.2 Scatter Mapbox
# Simulate GPS data
np.random.seed(42)
n_points = 200
lats = 40.7128 + np.random.normal(0, 0.05, n_points) # NYC area
lons = -74.0060 + np.random.normal(0, 0.05, n_points)
values = np.random.exponential(50, n_points)
df_map = pd.DataFrame({"lat": lats, "lon": lons, "value": values})
fig = px.scatter_mapbox(
df_map,
lat="lat",
lon="lon",
color="value",
size="value",
color_continuous_scale="Viridis",
size_max=15,
zoom=11,
mapbox_style="carto-positron",
title="NYC Point Data (Scatter Mapbox)",
labels={"value": "Intensity"}
)
fig.update_layout(
margin=dict(l=0, r=0, t=40, b=0),
template="plotly_white"
)
fig.show()
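Mapbox styles: carto-positron (light), carto-darkmatter (dark), open-street-map (standard tiles). None of these three styles requires a Mapbox access token. In recent Plotly releases, px.scatter_mapbox is deprecated in favor of the MapLibre-based px.scatter_map, which takes map_style instead of mapbox_style.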
6. Animated Charts
6.1 Animation Frames
df = px.data.gapminder()
fig = px.scatter(
df,
x="gdpPercap",
y="lifeExp",
size="pop",
color="continent",
hover_name="country",
log_x=True,
size_max=55,
range_x=[100, 100000],
range_y=[25, 90],
animation_frame="year",
animation_group="country",
title="Global Development Over Time (1952-2007)",
labels={"gdpPercap": "GDP per Capita (log)", "lifeExp": "Life Expectancy"}
)
fig.update_layout(template="plotly_white")
fig.show()
How animation works: animation_frame="year" creates a time slider. For each frame, Plotly renders only the data for that year. The animation_group="country" parameter ensures each country's bubble moves smoothly between frames rather than appearing/disappearing.
6.2 Animated Bar Charts (Race Charts)
# Animated bar chart
df_movies = px.data.gapminder()  # Gapminder data, despite the variable name
# Get the top_n most populous countries for selected years
years = [1952, 1972, 1992, 2007]
top_n = 8
frames = []
for year in years:
    year_data = (
        df_movies[df_movies["year"] == year]
        .nlargest(top_n, "pop")
        .sort_values("pop", ascending=True)
    )
    frames.append(go.Frame(
        data=[go.Bar(
            x=year_data["pop"],
            y=year_data["country"],
            orientation="h",
            marker_color=year_data["pop"]
        )],
        name=str(year)
    ))
# Initial frame
initial = df_movies[df_movies["year"] == 1952].nlargest(top_n, "pop").sort_values("pop", ascending=True)
fig = go.Figure(
data=[go.Bar(x=initial["pop"], y=initial["country"], orientation="h")],
frames=frames
)
fig.update_layout(
xaxis_title="Population",
yaxis_title="Country",
title="Most Populous Countries Over Time",
updatemenus=[dict(
type="buttons",
showactive=False,
y=1.15, x=0.5, xanchor="center",
buttons=[
dict(label="Play", method="animate",
args=[None, dict(frame=dict(duration=800, redraw=True),
fromcurrent=True)]),
dict(label="Pause", method="animate",
args=[[None], dict(frame=dict(duration=0, redraw=False),
mode="immediate")])
]
)],
sliders=[dict(
active=0,
steps=[dict(args=[[str(y)], dict(frame=dict(duration=800, redraw=True),
mode="immediate")],
method="animate", label=str(y))
for y in years],
x=0, len=1, currentvalue=dict(prefix="Year: "),
transition=dict(duration=300)
)],
template="plotly_white",
height=500
)
fig.show()
7. Dashboard-Ready: Updatemenus, Sliders, Buttons
7.1 Dropdown Menus for Trace Visibility
fig = go.Figure()
# Add one trace per continent; only Asia is drawn initially,
# the others start as legend-only entries
continents = df["continent"].unique()
for continent in continents:
    cdf = df[df["continent"] == continent]
    fig.add_trace(go.Scatter(
        x=cdf["gdpPercap"], y=cdf["lifeExp"],
        mode="markers",
        marker=dict(size=cdf["pop"] / 2e7, opacity=0.6),
        name=continent,
        visible=True if continent == "Asia" else "legendonly"
    ))
# Add dropdown buttons
fig.update_layout(
updatemenus=[
dict(
buttons=[
dict(label="All Continents",
method="update",
args=[{"visible": [True] * len(continents)}]),
] + [
dict(label=c,
method="update",
args=[{"visible": [i == idx for i in range(len(continents))]}])
for idx, c in enumerate(continents)
],
direction="down",
showactive=True,
x=0.17, xanchor="left",
y=1.15, yanchor="top"
)
],
template="plotly_white",
height=550,
title="GDP vs Life Expectancy (Select Continent)"
)
fig.show()
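The dropdown works because each button carries a boolean mask with one entry per trace, in the order the traces were added. The sketch below isolates that mask logic in plain Python; visibility_buttons is a hypothetical helper, not a Plotly function.

```python
def visibility_buttons(trace_names):
    """Build updatemenu button dicts: one 'All' button plus one button per trace."""
    n = len(trace_names)
    buttons = [dict(label="All", method="update", args=[{"visible": [True] * n}])]
    for idx, name in enumerate(trace_names):
        # Mask is True only at the position of the selected trace
        mask = [i == idx for i in range(n)]
        buttons.append(dict(label=name, method="update", args=[{"visible": mask}]))
    return buttons

buttons = visibility_buttons(["Asia", "Europe", "Africa"])
for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

The resulting list can be passed as the buttons entry of an updatemenus dict. The mask order must match the trace order in fig.data, so reordering traces without rebuilding the buttons silently selects the wrong series.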
7.2 Range Slider
fig = go.Figure()
fig.add_trace(go.Scatter(
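# Convert integer years to datetimes so the date axis and range selector work
x=pd.to_datetime(df_canada["year"].astype(str), format="%Y"),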
y=df_canada["lifeExp"],
mode="lines+markers",
line=dict(color="#2E86AB", width=3),
fill="tozeroy",
fillcolor="rgba(46,134,171,0.2)"
))
fig.update_layout(
xaxis=dict(
rangeselector=dict(
buttons=[
dict(count=10, label="10Y", step="year", stepmode="backward"),
dict(count=20, label="20Y", step="year", stepmode="backward"),
dict(step="all", label="All")
]
),
rangeslider=dict(visible=True),
type="date"
),
title="Life Expectancy with Range Slider",
template="plotly_white"
)
fig.show()
8. Customizing Themes
8.1 Built-in Templates
# List all available templates
import plotly.io as pio
print(list(pio.templates))
# Output includes: 'plotly', 'plotly_white', 'plotly_dark', 'ggplot2',
# 'seaborn', 'simple_white', 'none' (plus a few others, depending on version)
# Apply a template globally
pio.templates.default = "plotly_white"
# Or per-figure
fig = px.scatter(df_tips, x="total_bill", y="tip", template="plotly_dark")
fig.show()
8.2 Custom Template
custom_template = go.layout.Template(
layout=go.Layout(
title=dict(
font=dict(family="Helvetica", size=18, color="#1a1a2e"),
x=0.5, xanchor="center"
),
font=dict(family="Helvetica", size=12, color="#1a1a2e"),
plot_bgcolor="white",
paper_bgcolor="white",
xaxis=dict(
gridcolor="#e0e0e0",
zerolinecolor="#c0c0c0",
linecolor="#c0c0c0"
),
yaxis=dict(
gridcolor="#e0e0e0",
zerolinecolor="#c0c0c0",
linecolor="#c0c0c0"
),
colorway=["#2E86AB", "#A23B72", "#F18F01", "#C73E1D", "#3B1F2B"],
legend=dict(
bgcolor="rgba(255,255,255,0.8)",
bordercolor="#e0e0e0",
borderwidth=1
)
)
)
fig = px.scatter(df_tips, x="total_bill", y="tip", color="time",
template=custom_template, title="Custom Themed Chart")
fig.show()
8.3 Colorscale Customization
fig = px.imshow(
np.random.randn(20, 20),
color_continuous_scale=[
[0, "#1a1a2e"],
[0.25, "#16213e"],
[0.5, "#0f3460"],
[0.75, "#533483"],
[1, "#e94560"]
],
title="Custom Colorscale (Midnight Gradient)"
)
fig.show()
9. Publication Quality: Export
9.1 Static Image Export
# Requires kaleido: pip install kaleido
fig = px.scatter(
df.query("year == 2007"),
x="gdpPercap", y="lifeExp", size="pop", color="continent",
log_x=True, size_max=55, template="plotly_white",
title="Publication-Quality Figure"
)
fig.update_layout(
font=dict(family="Times New Roman", size=12),
title_font_size=16,
legend=dict(title_font_size=12)
)
# Export to various formats
fig.write_image("figure.png", scale=3, width=800, height=500)
fig.write_image("figure.svg", width=800, height=500)
fig.write_image("figure.pdf", width=800, height=500)
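scale=3 multiplies the pixel dimensions by 3 (2400 x 1500 here), which corresponds to about 300 DPI when the figure is printed 8 inches wide and is typically sufficient for journal submission. SVG is a vector format and scales without loss.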
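The relationship between width, scale, and print resolution is simple arithmetic, sketched below. The helper names and the 8-inch print width are assumptions made for this example; the actual DPI always depends on the physical size at which the image is printed.

```python
def export_pixels(width, height, scale):
    """Pixel dimensions of an exported raster image for a given scale factor."""
    return int(width * scale), int(height * scale)

def effective_dpi(pixel_width, print_width_inches):
    """Dots per inch when an image of pixel_width is printed at the given width."""
    return pixel_width / print_width_inches

# width=800, height=500, scale=3, as in the export call above
width_px, height_px = export_pixels(800, 500, 3)

# Assumed print width: 8 inches (roughly a full-page-width figure)
dpi_full_width = effective_dpi(width_px, 8.0)
# The same file printed in a 4-inch column has twice the density
dpi_single_column = effective_dpi(width_px, 4.0)

print(width_px, height_px, dpi_full_width, dpi_single_column)
```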
9.2 Standalone HTML
fig.write_html(
"interactive_chart.html",
include_plotlyjs=True, # Bundle Plotly.js (larger file, works offline)
full_html=True, # Full HTML document
config={"displayModeBar": True, "scrollZoom": True}
)
When to use HTML: Embedding in web pages, sharing via email, or when stakeholders need to interact with the data. The file is self-contained and works in any modern browser.
File size optimization: A standalone HTML file with include_plotlyjs=True bundles the full Plotly.js library (~3.5 MB). If embedding multiple charts on a single page, use include_plotlyjs='cdn' to load the library once from a CDN, reducing per-chart overhead to ~5 KB.
10. Comparison: matplotlib vs seaborn vs plotly
| Criterion | matplotlib | seaborn | plotly |
|---|---|---|---|
| Best for | Fine-grained control, academic papers | Statistical plots, EDA | Interactive dashboards, web |
| Learning curve | Steep | Moderate | Moderate |
| Default aesthetics | Basic | Beautiful | Modern |
| Interactivity | None (unless mpld3) | None | Native |
| Statistical models | Manual | Built-in (regplot, etc.) | Limited |
| 3D plotting | Axes3D | Limited | plotly.graph_objects |
| Subplots | plt.subplots | FacetGrid | make_subplots |
| Animation | FuncAnimation | Limited | animation_frame |
| Export quality | Excellent | Excellent | Good (via kaleido) |
| File size | Small | Small | Larger |
| Browser embedding | No | No | Yes |
| Dashboard integration | As static images (Dash, Streamlit, Flask) | As static images (Dash, Streamlit, Flask) | Dash, Streamlit, Flask |
Decision Framework
LIBRARY SELECTION GUIDE
- "I need a figure for a journal paper" -> matplotlib (reproducible, precise control)
- "I need quick EDA with beautiful defaults" -> seaborn (statistical plots out of the box)
- "I need stakeholders to explore the data" -> plotly (hover, zoom, filter)
- "I need a dashboard" -> plotly + dash (or plotly in streamlit)
- "I need publication-quality AND interactivity" -> matplotlib for static + plotly for supplementary HTML
- "I need statistical visualizations (regression, distributions)" -> seaborn (regplot, displot, pairplot)
- "I need maps or 3D visualizations" -> plotly (choropleth, scatter_mapbox, scatter_3d)
11. Complete EDA Dashboard
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
# Load and prepare data
df = px.data.tips()
df["tip_pct"] = (df["tip"] / df["total_bill"] * 100).round(2)
df["bill_per_person"] = (df["total_bill"] / df["size"]).round(2)
# Create comprehensive EDA dashboard
fig = make_subplots(
rows=3, cols=3,
    # Titles follow the panel order below (row by row)
    subplot_titles=(
        "Total Bill Distribution", "Tip vs Total Bill", "Average Bill by Day",
        "Tip Percentage Distribution", "Heatmap: Day vs Time",
        "Smoker Status", "Box: Tip % by Day",
        "Violin: Tip % by Day", "Box: Bill per Person by Sex"
    ),
specs=[
[{"type": "histogram"}, {"type": "scatter"}, {"type": "bar"}],
[{"type": "histogram"}, {"type": "heatmap"}, {"type": "pie"}],
[{"type": "box"}, {"type": "violin"}, {"type": "box"}]
],
vertical_spacing=0.08,
horizontal_spacing=0.06
)
# Panel 1: Histogram of total bill
fig.add_trace(
go.Histogram(x=df["total_bill"], nbinsx=25, marker_color="#2E86AB",
name="Total Bill", opacity=0.75),
row=1, col=1
)
# Panel 2: Scatter - Tip vs Bill
for time in df["time"].unique():
    tdf = df[df["time"] == time]
    fig.add_trace(
        go.Scatter(x=tdf["total_bill"], y=tdf["tip"], mode="markers",
                   marker=dict(size=7, opacity=0.6),
                   name=f"Tip ({time})", showlegend=False),
        row=1, col=2
    )
# Panel 3: Bar - Average bill by day
day_means = df.groupby("day")["total_bill"].mean().sort_values()
fig.add_trace(
go.Bar(x=day_means.index, y=day_means.values,
marker_color=["#A23B72", "#F18F01", "#2E86AB", "#C73E1D"],
name="Avg Bill", showlegend=False),
row=1, col=3
)
# Panel 4: Histogram - Tip percentage
fig.add_trace(
go.Histogram(x=df["tip_pct"], nbinsx=20, marker_color="#A23B72",
name="Tip %", opacity=0.75, showlegend=False),
row=2, col=1
)
# Panel 5: Heatmap - Day vs Time counts
cross_tab = pd.crosstab(df["day"], df["time"])
fig.add_trace(
go.Heatmap(z=cross_tab.values, x=cross_tab.columns.tolist(),
y=cross_tab.index.tolist(), colorscale="Viridis",
text=cross_tab.values, texttemplate="%{text}",
name="Counts", showlegend=False),
row=2, col=2
)
# Panel 6: Pie - Smoker status
smoker_counts = df["smoker"].value_counts()
fig.add_trace(
go.Pie(labels=smoker_counts.index, values=smoker_counts.values,
marker_colors=["#2E86AB", "#C73E1D"], name="Smoker",
showlegend=False),
row=2, col=3
)
# Panel 7: Box - Tip percentage by day
for day in df["day"].unique():
    ddf = df[df["day"] == day]
    fig.add_trace(
        go.Box(y=ddf["tip_pct"], name=day, showlegend=False),
        row=3, col=1
    )
# Panel 8: Violin - Tip % by day
for day in df["day"].unique():
    ddf = df[df["day"] == day]
    fig.add_trace(
        go.Violin(y=ddf["tip_pct"], name=day, showlegend=False,
                  box_visible=True, meanline_visible=True),
        row=3, col=2
    )
# Panel 9: Box - Bill per person
for sex in df["sex"].unique():
    sdf = df[df["sex"] == sex]
    fig.add_trace(
        go.Box(y=sdf["bill_per_person"], name=sex, showlegend=False),
        row=3, col=3
    )
fig.update_layout(
height=1100, width=1200,
title_text="Complete EDA Dashboard: Tips Dataset",
title_font_size=20,
template="plotly_white",
showlegend=True,
legend=dict(x=1.02, y=1, font=dict(size=10))
)
# Axis labels
fig.update_xaxes(title_text="Total Bill ($)", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_xaxes(title_text="Total Bill ($)", row=1, col=2)
fig.update_yaxes(title_text="Tip ($)", row=1, col=2)
fig.update_yaxes(title_text="Avg Total Bill ($)", row=1, col=3)
fig.update_xaxes(title_text="Tip Percentage (%)", row=2, col=1)
fig.update_yaxes(title_text="Count", row=2, col=1)
fig.update_yaxes(title_text="Tip %", row=3, col=1)
fig.update_yaxes(title_text="Tip %", row=3, col=2)
fig.update_yaxes(title_text="Bill per Person ($)", row=3, col=3)
fig.show()
Key Takeaways
Summary: Advanced Visualization
- Interactive visualization is a cognitive design choice, not just aesthetics. Use Plotly when stakeholders need to explore data; use matplotlib when precision and reproducibility matter. The choice affects how viewers process information and the depth of insight they can extract.
- Plotly Express provides the highest-level API: a single function call generates a fully interactive figure with hover, zoom, and legends. Under the hood, it builds on plotly.graph_objects, which provides lower-level control.
- make_subplots is the foundation for multi-panel dashboards. Combine specs with secondary_y for complex layouts where variables have different scales or units.
- Statistical visualizations (marginal distributions, correlation heatmaps, pair plots) are built into Plotly Express via marginal_x, marginal_y, and px.imshow. KDE-based violin plots reveal multimodality hidden by box plots.
- Animations via animation_frame create self-contained temporal visualizations without external video tools. The animation_group parameter ensures smooth transitions between frames.
- Export to HTML for web embedding; export to PNG/SVG/PDF via kaleido for print and publications. Use scale=3 for 300 DPI equivalent output suitable for journal submission.
- The matplotlib vs seaborn vs plotly decision depends on your medium (print vs web), audience (experts vs stakeholders), and need for interactivity. There is no single best choice; each excels in different contexts.
Practice Exercises
Exercise 1: Scatter Plot Mastery
Create an interactive scatter plot using px.data.iris() with:
- Sepal length on x-axis, sepal width on y-axis
- Color by species, shape by species
- Add marginal distributions on both axes
- Add a trendline (hint: use trendline="ols")
- Export as HTML
Exercise 2: Subplot Dashboard
Build a 2x3 subplot dashboard using the px.data.gapminder() dataset:
- Panel 1: Histogram of life expectancy
- Panel 2: Scatter of GDP vs life expectancy
- Panel 3: Bar chart of population by continent
- Panel 4: Box plot of life expectancy by continent
- Panel 5: Line chart of global average life expectancy over time
- Panel 6: Pie chart of continent population shares
Exercise 3: Animated Gapminder
Create the animated Gapminder bubble chart and:
- Customize the colorscale to use a custom 5-color palette
- Add range slider for the x-axis
- Add a dropdown to filter by continent
- Export as standalone HTML
Exercise 4: Correlation Analysis
Using a dataset of your choice:
- Compute the correlation matrix
- Create a masked heatmap (upper triangle only)
- Add annotations with correlation values
- Color scale: diverging (red-white-blue)
- Add a colorbar with title
Exercise 5: Geo Visualization
Using px.data.gapminder() for year 2007:
- Create a choropleth map colored by GDP per capita
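- Add a scatter_geo overlay with the top 20 most populous countries (scatter_mapbox needs latitude/longitude columns and a different map subplot, so it cannot be layered on a choropleth)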
- Customize the map style
- Add hover data showing country, GDP, life expectancy, and population