Advanced Visualization: Plotly & Interactive Charts

Learning Objectives

By the end of this tutorial, you will be able to:

  • Distinguish when static vs interactive visualizations are appropriate
  • Build publication-quality interactive charts with Plotly Express
  • Construct complex multi-panel layouts with subplots and insets
  • Create statistical visualizations including marginal distributions and correlation heatmaps
  • Render geographical data on interactive maps
  • Animate temporal data with transitions and animation frames
  • Build dashboard-ready charts with updatemenus, sliders, and buttons
  • Customize templates, colorscales, and themes for consistency
  • Export visualizations to PNG, SVG, and standalone HTML
  • Select the right library (matplotlib vs seaborn vs plotly) for any scenario

1. Static vs Interactive: When to Use Each

The Fundamental Tradeoff

The choice between static and interactive visualization is not aesthetic; it is a cognitive design decision that affects how viewers process information.

Definition: Information Density

The amount of data-encoded information per unit area of a visualization. Static charts have fixed information density set at design time. Interactive charts have variable information density: the base layer is simple, but hover tooltips, zoom, and filtering allow users to reveal deeper layers on demand, effectively increasing density without increasing visual clutter.

Architecture Diagram

VISUALIZATION DECISION TREE

What is your delivery medium?
│
├─► Print / PDF / Paper ──────────► STATIC (matplotlib, seaborn)
│
├─► Web page / Dashboard ─────────► INTERACTIVE (plotly, bokeh)
│
└─► Both needed ──────────────────► STATIC primary + HTML export

How many dimensions of data?
│
├─► ≤ 3 dimensions ───────────────► Static is sufficient
│
└─► ≥ 4 dimensions ───────────────► Interactive (hover, filter)

Audience expertise?
│
├─► Domain experts ───────────────► Static (precise, reproducible)
│
└─► Exploratory / stakeholders ───► Interactive (self-serve)

Data point count?
│
├─► < 1,000 ──────────────────────► Either works
│
└─► > 10,000 ─────────────────────► Interactive (zoom, pan)

Formal Comparison

| Property | Static (matplotlib/seaborn) | Interactive (Plotly) |
|---|---|---|
| Output format | PNG, SVG, PDF | HTML, JS bundle |
| File size | Small (KB–MB) | Larger (KB–MB, depends on data) |
| Reproducibility | Pixel-perfect | Viewport-dependent |
| Hover information | Not possible | Native support |
| Zoom / Pan | Not possible | Native support |
| Animation | Requires save as GIF/MP4 | Real-time in browser |
| Server requirement | None (embedded) | None (standalone HTML) |
| Learning curve | Moderate | Steep initially |
| Customization | Infinite (low-level) | Template-based (high-level) |
| Rendering backend | Agg, TkAgg, Cairo | WebGL, SVG, Canvas |

The rendering backend matters: Matplotlib's Agg backend rasterizes on the CPU, which is deterministic but slow for large figures. Plotly draws with SVG by default, and Plotly Express switches scatter and line plots to GPU-accelerated WebGL traces once the data exceeds roughly 1,000 points (the default render_mode="auto"), which makes large datasets significantly faster to draw. For simple figures, however, Plotly's SVG rendering can be slower than matplotlib.

When Plotly Wins

Plotly excels when you need exploration, presentation to non-technical stakeholders, or embeddable web content. The hover-to-reveal detail pattern reduces cognitive load because viewers only see complexity on demand.

When matplotlib Wins

Matplotlib wins when you need pixel-perfect control, publication-quality output, or reproducible figures for academic journals where every line width and font size must be exact.


2. Plotly Express: The Grammar of Interactive Graphics

Installation and Import

# Install if needed
# pip install plotly pandas numpy

import plotly.express as px
import plotly.graph_objects as go
import plotly.subplots as sp
import pandas as pd
import numpy as np

2.1 Scatter Plots

The scatter plot is the workhorse of exploratory data analysis. Plotly Express adds interactivity automatically.

Definition: Logarithmic Scale

A nonlinear scale where equal distances represent equal ratios rather than equal differences. On a log scale, the distance from 10 to 100 equals the distance from 100 to 1000. This is essential for visualizing data that spans orders of magnitude (e.g., GDP from $500 to $100,000) because it compresses the right tail and reveals structure in the left portion that would otherwise be invisible.
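
The "equal distances represent equal ratios" property can be checked numerically. The short sketch below uses a base-10 logarithm, which is an illustrative choice; any base gives the same conclusion.

```python
import numpy as np

values = np.array([10.0, 100.0, 1000.0])

# Position of each value along a base-10 logarithmic axis
log_positions = np.log10(values)

# Gaps between neighbouring values on the log axis and on a linear axis
log_gaps = np.diff(log_positions)
linear_gaps = np.diff(values)

print(log_gaps)     # both gaps are equal on the log axis
print(linear_gaps)  # the second gap is ten times the first on a linear axis
```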

# Load the built-in Gapminder dataset
df = px.data.gapminder()

# Basic scatter: GDP per capita vs life expectancy
fig = px.scatter(
    df.query("year == 2007"),
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=60,
    title="GDP per Capita vs Life Expectancy (2007)",
    labels={
        "gdpPercap": "GDP per Capita (USD, log scale)",
        "lifeExp": "Life Expectancy (years)",
        "pop": "Population",
        "continent": "Continent"
    }
)
fig.update_layout(
    template="plotly_white",
    font=dict(family="Arial", size=12),
    title_font_size=18
)
fig.show()

Output: An interactive scatter plot where hovering over any bubble reveals the country name, GDP, life expectancy, and population. The bubble size encodes population magnitude.

Key parameters explained:

  • log_x=True: Applies log transformation to x-axis (essential for GDP data which spans orders of magnitude)
  • size_max=60: Caps maximum bubble diameter to prevent visual dominance of large populations
  • hover_name: Specifies which column appears as the bold title in hover tooltips

2.2 Line Charts

# Multi-line time series
df_canada = px.data.gapminder().query("country == 'Canada'")

fig = px.line(
    df_canada,
    x="year",
    y="lifeExp",
    title="Life Expectancy in Canada Over Time",
    markers=True,
    labels={"year": "Year", "lifeExp": "Life Expectancy (years)"},
    template="plotly_white"
)
fig.update_traces(
    line=dict(color="#2E86AB", width=3),
    marker=dict(size=8, symbol="circle")
)
fig.update_layout(
    xaxis=dict(dtick=5, gridcolor="lightgray"),
    yaxis=dict(range=[65, 85])
)
fig.show()

The markers=True parameter adds data point markers to the line, making individual observations distinguishable. This is critical for small time series where interpolation between points can be misleading.

2.3 Bar Charts

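# Grouped bar chart of the mean bill per day and sex
df_tips = px.data.tips()

# Aggregate first: px.bar draws one bar segment per row, so passing the
# raw table would not show averages
df_avg = df_tips.groupby(["day", "sex"], as_index=False)["total_bill"].mean()

fig = px.bar(
    df_avg,
    x="day",
    y="total_bill",
    color="sex",
    barmode="group",
    title="Average Total Bill by Day and Sex",
    labels={
        "day": "Day of Week",
        "total_bill": "Average Total Bill ($)",
        "sex": "Sex"
    },
    text_auto=".2f",
    template="plotly_white"
)
fig.update_layout(
    legend_title_text="Customer Sex",
    font=dict(size=13)
)
fig.show()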

The text_auto=".2f" parameter automatically places a formatted numeric label on each bar, eliminating the need for manual annotation.

2.4 Histograms

# Overlaid, density-normalized histograms with a rug marginal
df = px.data.tips()

fig = px.histogram(
    df,
    x="total_bill",
    color="time",
    nbins=30,
    opacity=0.7,
    barmode="overlay",  # the default ("relative") would stack the groups
    histnorm="probability density",
    marginal="rug",
    title="Distribution of Total Bills (Lunch vs Dinner)",
    labels={"total_bill": "Total Bill ($)", "time": "Meal Time"},
    template="plotly_white"
)
fig.update_layout(
    bargap=0.1,
    legend_title_text="Meal Time"
)
fig.show()

histnorm="probability density" normalizes each group so that the total area of its bars equals 1, enabling comparison between groups of different sizes. marginal="rug" adds a rug plot in a panel above the histogram showing the individual data points.
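
To see why density normalization makes groups of different sizes comparable, the following sketch reproduces the idea with NumPy's np.histogram rather than Plotly itself. The two synthetic samples and their names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
small_group = rng.normal(15, 4, size=60)    # hypothetical small group
large_group = rng.normal(22, 6, size=180)   # hypothetical large group
bins = np.linspace(0, 50, 31)

def density_area(sample, bin_edges):
    """Return the total bar area of a density-normalized histogram."""
    density, edges = np.histogram(sample, bins=bin_edges, density=True)
    return float(np.sum(density * np.diff(edges)))

area_small = density_area(small_group, bins)
area_large = density_area(large_group, bins)
print(area_small, area_large)  # both areas equal 1 despite different sample sizes
```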

2.5 Box Plots

fig = px.box(
    df,
    x="day",
    y="total_bill",
    color="day",
    points="all",
    notched=True,
    title="Total Bill Distribution by Day (with All Points)",
    labels={"day": "Day of Week", "total_bill": "Total Bill ($)"},
    template="plotly_white"
)
fig.update_layout(showlegend=False)
fig.show()

notched=True draws notches whose width is proportional to the confidence interval of the median; non-overlapping notches suggest statistically different medians. points="all" overlays all individual data points, revealing distribution shape hidden by the box.
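
The notch logic can be sketched numerically. The version below assumes the common rule median ± 1.57 · IQR / √n for the notch limits; the quartile method (np.percentile defaults) and the synthetic samples are assumptions, so the limits may differ slightly from what a plotting library draws.

```python
import numpy as np

def notch_interval(values):
    """Approximate notch limits: median +/- 1.57 * IQR / sqrt(n)."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    half_width = 1.57 * (q3 - q1) / np.sqrt(values.size)
    return median - half_width, median + half_width

def notches_overlap(a, b):
    """True when the two notch intervals share at least one point."""
    lo_a, hi_a = notch_interval(a)
    lo_b, hi_b = notch_interval(b)
    return lo_a <= hi_b and lo_b <= hi_a

rng = np.random.default_rng(2)
group_a = rng.normal(17, 5, size=60)  # hypothetical bills for one day
group_b = rng.normal(25, 5, size=60)  # hypothetical bills for another day

print(notch_interval(group_a), notch_interval(group_b))
print(notches_overlap(group_a, group_b))
```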

2.6 Violin Plots

Definition: Kernel Density Estimation (KDE)

A non-parametric method for estimating the probability density function of a random variable. Instead of binning data like a histogram, KDE places a smooth kernel function (typically Gaussian) at each data point and sums them to produce a continuous density estimate. The smoothness is controlled by the bandwidth parameter h.

fig = px.violin(
    df,
    x="day",
    y="total_bill",
    color="day",
    box=True,
    points="all",
    violinmode="group",
    title="Violin Plot of Total Bills by Day",
    labels={"day": "Day of Week", "total_bill": "Total Bill ($)"},
    template="plotly_white"
)
fig.update_layout(showlegend=False)
fig.show()

Violin plots combine the box plot (median, quartiles) with a kernel density estimation (KDE), revealing multimodality that box plots hide.

The KDE estimation:

Kernel Density Estimator

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$

Here,

  • $K$ = the kernel function (typically Gaussian)
  • $h$ = the bandwidth
  • $n$ = the number of data points
  • $x_i$ = the $i$-th data point

The bandwidth $h$ is selected via Silverman's rule, where $\hat{\sigma}$ is the sample standard deviation and $\mathrm{IQR}$ is the interquartile range:

$$h = 0.9 \cdot \min\left(\hat{\sigma}, \frac{\mathrm{IQR}}{1.34}\right) \cdot n^{-1/5}$$

Bandwidth selection tradeoff: A smaller bandwidth (h) produces a wiggly, low-bias estimate that captures fine structure but has high variance. A larger bandwidth produces a smooth, low-variance estimate but may obscure real features. Silverman's rule is a reasonable default, but for multimodal data, consider using cross-validation or a smaller bandwidth to avoid oversmoothing.
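
The estimator and Silverman's rule can be written out in a few lines of NumPy. This is a minimal sketch under stated assumptions: a Gaussian kernel, a synthetic sample, and hypothetical helper names (silverman_bandwidth, kde_estimate). It is not how Plotly computes its violins internally.

```python
import numpy as np

def silverman_bandwidth(data):
    """h = 0.9 * min(sigma, IQR / 1.34) * n^(-1/5)."""
    data = np.asarray(data, dtype=float)
    sigma = data.std(ddof=1)
    q75, q25 = np.percentile(data, [75, 25])
    return 0.9 * min(sigma, (q75 - q25) / 1.34) * data.size ** (-1 / 5)

def kde_estimate(x_grid, data, h):
    """Evaluate f_hat(x) = 1/(n h) * sum_i K((x - x_i) / h) with a Gaussian K."""
    data = np.asarray(data, dtype=float)
    u = (x_grid[:, None] - data[None, :]) / h
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (data.size * h)

rng = np.random.default_rng(0)
sample = rng.normal(loc=20.0, scale=8.0, size=244)  # synthetic stand-in data

h = silverman_bandwidth(sample)
grid = np.linspace(sample.min() - 4 * h, sample.max() + 4 * h, 2001)
density = kde_estimate(grid, sample, h)

# A valid density should integrate to approximately 1 (Riemann sum check)
area = float(np.sum(density) * (grid[1] - grid[0]))
print(round(h, 3), round(area, 3))
```

Halving h in the call to kde_estimate gives a visibly wigglier curve, which is a quick way to explore the bias-variance tradeoff described above.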


3. Subplots: Multi-Panel Layouts

3.1 Basic Subplots with make_subplots

from plotly.subplots import make_subplots

fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=("Scatter", "Bar", "Histogram", "Box"),
    horizontal_spacing=0.12,
    vertical_spacing=0.15
)

# Panel 1: Scatter
fig.add_trace(
    go.Scatter(x=df_tips["total_bill"], y=df_tips["tip"],
               mode="markers", marker=dict(size=6, opacity=0.6),
               name="Scatter"),
    row=1, col=1
)

# Panel 2: Bar
day_means = df_tips.groupby("day")["total_bill"].mean().reset_index()
fig.add_trace(
    go.Bar(x=day_means["day"], y=day_means["total_bill"],
           name="Bar", marker_color="#2E86AB"),
    row=1, col=2
)

# Panel 3: Histogram
fig.add_trace(
    go.Histogram(x=df_tips["total_bill"], nbinsx=20,
                 name="Histogram", marker_color="#A23B72"),
    row=2, col=1
)

# Panel 4: Box
for day in df_tips["day"].unique():
    day_data = df_tips[df_tips["day"] == day]
    fig.add_trace(
        go.Box(y=day_data["total_bill"], name=day),
        row=2, col=2
    )

fig.update_layout(
    height=700, width=900,
    title_text="Multi-Panel EDA Dashboard",
    template="plotly_white",
    showlegend=False
)
fig.show()

3.2 Secondary Y-Axes

# Plot two variables with different scales
months = pd.date_range("2024-01-01", periods=12, freq="MS")
revenue = np.random.uniform(50000, 120000, 12)
orders = np.random.poisson(200, 12)

df_multi = pd.DataFrame({
    "month": months,
    "revenue": revenue,
    "orders": orders
})

fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(
    go.Scatter(
        x=df_multi["month"], y=df_multi["revenue"],
        name="Revenue ($)",
        line=dict(color="#2E86AB", width=3),
        mode="lines+markers"
    ),
    secondary_y=False
)

fig.add_trace(
    go.Bar(
        x=df_multi["month"], y=df_multi["orders"],
        name="Orders",
        marker_color="rgba(162, 59, 114, 0.5)"
    ),
    secondary_y=True
)

fig.update_yaxes(title_text="Revenue ($)", secondary_y=False)
fig.update_yaxes(title_text="Number of Orders", secondary_y=True)
fig.update_layout(
    title="Revenue and Orders Over Time",
    template="plotly_white",
    height=450
)
fig.show()

Why secondary axes matter: When two metrics have fundamentally different units (dollars vs. counts) or magnitudes (thousands vs. single digits), overlaying them on the same axis renders one invisible. Secondary axes solve this by providing independent scales.

Dual-axis pitfalls: Secondary axes can be misleading if used carelessly. Viewers may interpret a visual intersection as meaningful when it is merely an artifact of axis scaling. Always clearly label both axes, and consider whether a small-multiples layout (separate panels) would be more honest.

3.3 Inset Plots

# Main plot with an inset showing detail
x = np.linspace(0, 4 * np.pi, 1000)
y = np.sin(x) * np.exp(-0.1 * x)

fig = go.Figure()

# Main trace
fig.add_trace(go.Scatter(
    x=x, y=y, mode="lines",
    line=dict(color="#2E86AB", width=2),
    name="Damped Sine"
))

# Highlight the x-range shown in the inset (a full-height band)
fig.add_vrect(
    x0=0, x1=2,
    fillcolor="rgba(255,0,0,0.1)",
    line_width=1, line_color="red"
)

# Add inset with zoomed view
fig.update_layout(
    xaxis2=dict(
        domain=[0.5, 0.85],
        anchor="y2"
    ),
    yaxis2=dict(
        domain=[0.6, 0.95],
        anchor="x2"
    )
)

fig.add_trace(go.Scatter(
    x=x[(x >= 0) & (x <= 2)],
    y=y[(x >= 0) & (x <= 2)],
    mode="lines",
    line=dict(color="red", width=2),
    xaxis="x2", yaxis="y2",
    name="Zoomed Region"
))

fig.update_layout(
    title="Damped Sine Wave with Inset Detail",
    template="plotly_white",
    height=500
)
fig.show()

4. Statistical Visualizations

4.1 Marginal Distributions

fig = px.scatter(
    df_tips,
    x="total_bill",
    y="tip",
    marginal_x="histogram",
    marginal_y="rug",
    color="time",
    title="Tip vs Total Bill with Marginal Distributions",
    opacity=0.6,
    template="plotly_white"
)
fig.show()

marginal_x="histogram" places a histogram above the scatter showing the x-distribution, while marginal_y="rug" places rug marks on the right showing individual y-values. This triple-panel view reveals both the joint relationship and marginal distributions simultaneously.

4.2 Correlation Heatmaps

# Compute correlation matrix
numeric_cols = ["total_bill", "tip", "size"]
corr_matrix = df_tips[numeric_cols].corr()

fig = px.imshow(
    corr_matrix,
    text_auto=".3f",
    color_continuous_scale="RdBu_r",
    zmin=-1, zmax=1,
    title="Correlation Matrix Heatmap",
    labels=dict(color="Correlation"),
    aspect="auto"
)
fig.update_layout(
    template="plotly_white",
    font=dict(size=13)
)
fig.show()

The correlation coefficient formula:

Pearson Product-Moment Correlation Coefficient

$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

Here,

  • $\bar{x}, \bar{y}$ = the sample means of $x$ and $y$
  • $n$ = the number of paired observations
  • $r_{xy}$ = the sample correlation coefficient between $x$ and $y$

The coefficient ranges from $-1$ (perfect negative correlation) to $+1$ (perfect positive correlation), with $0$ indicating no linear relationship.

Correlation does not imply causation β€” but it constrains it. A high correlation (|r| > 0.8) between two variables does not prove one causes the other, but it does imply that any causal explanation must account for the observed association. Always consider confounding variables, reverse causality, and coincidence before drawing causal conclusions.
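
The Pearson formula can be evaluated term by term and compared against NumPy's built-in np.corrcoef. The synthetic bill and tip arrays below are assumptions chosen only to give a clearly positive association.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from the definition."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

rng = np.random.default_rng(4)
bill = rng.uniform(5, 50, size=200)             # hypothetical bills
tip = 0.15 * bill + rng.normal(0, 1, size=200)  # tips with added noise

r_manual = pearson_r(bill, tip)
r_numpy = float(np.corrcoef(bill, tip)[0, 1])
print(r_manual, r_numpy)  # the two values agree up to floating-point error
```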

4.3 Pair Plots

fig = px.scatter_matrix(
    df_tips,
    dimensions=["total_bill", "tip", "size"],
    color="time",
    title="Pair Plot: Tips Dataset",
    opacity=0.5,
    template="plotly_white"
)
fig.update_traces(diagonal_visible=True)
fig.show()

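Pair plots show all pairwise relationships in a single grid, which makes them essential for multivariate EDA. Note that Plotly's scatter matrix plots each variable against itself on the diagonal rather than drawing its marginal distribution; set diagonal_visible=False to hide those panels.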


5. Geographical Maps

5.1 Choropleth Maps

Definition: Choropleth Map

A thematic map where geographic regions (countries, states, counties) are shaded or patterned in proportion to the value of a variable. The color intensity encodes magnitude, making it easy to identify spatial patterns. Plotly uses ISO 3166-1 alpha-3 country codes to match data to built-in GeoJSON boundaries.

# World choropleth with life expectancy
df_2007 = px.data.gapminder().query("year == 2007")

fig = px.choropleth(
    df_2007,
    locations="iso_alpha",
    color="lifeExp",
    hover_name="country",
    color_continuous_scale=px.colors.sequential.Viridis,
    title="World Life Expectancy (2007)",
    labels={"lifeExp": "Life Expectancy", "iso_alpha": "Country Code"}
)
fig.update_layout(
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection_type="equirectangular"
    ),
    template="plotly_white"
)
fig.show()

Choropleth maps encode data values as color intensity within geographic regions. The iso_alpha column uses ISO 3166-1 alpha-3 country codes, which Plotly matches to built-in GeoJSON boundaries.

5.2 Scatter Mapbox

# Simulate GPS data
np.random.seed(42)
n_points = 200
lats = 40.7128 + np.random.normal(0, 0.05, n_points)  # NYC area
lons = -74.0060 + np.random.normal(0, 0.05, n_points)
values = np.random.exponential(50, n_points)

df_map = pd.DataFrame({"lat": lats, "lon": lons, "value": values})

fig = px.scatter_mapbox(
    df_map,
    lat="lat",
    lon="lon",
    color="value",
    size="value",
    color_continuous_scale="Viridis",
    size_max=15,
    zoom=11,
    mapbox_style="carto-positron",
    title="NYC Point Data (Scatter Mapbox)",
    labels={"value": "Intensity"}
)
fig.update_layout(
    margin=dict(l=0, r=0, t=40, b=0),
    template="plotly_white"
)
fig.show()

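Mapbox styles: carto-positron (light), carto-darkmatter (dark), open-street-map (standard tiles). None of these three styles requires a Mapbox access token. In Plotly 5.24 and later, px.scatter_mapbox is deprecated in favor of the MapLibre-based px.scatter_map, which takes map_style in place of mapbox_style.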


6. Animated Charts

6.1 Animation Frames

df = px.data.gapminder()

fig = px.scatter(
    df,
    x="gdpPercap",
    y="lifeExp",
    size="pop",
    color="continent",
    hover_name="country",
    log_x=True,
    size_max=55,
    range_x=[100, 100000],
    range_y=[25, 90],
    animation_frame="year",
    animation_group="country",
    title="Global Development Over Time (1952-2007)",
    labels={"gdpPercap": "GDP per Capita (log)", "lifeExp": "Life Expectancy"}
)
fig.update_layout(template="plotly_white")
fig.show()

How animation works: animation_frame="year" creates a time slider. For each frame, Plotly renders only the data for that year. The animation_group="country" parameter ensures each country's bubble moves smoothly between frames rather than appearing/disappearing.

6.2 Animated Bar Charts (Race Charts)

# Animated bar chart
df_movies = px.data.gapminder()

# Keep the top_n most populous countries for each selected year
years = [1952, 1972, 1992, 2007]
top_n = 8

frames = []
for year in years:
    year_data = (
        df_movies[df_movies["year"] == year]
        .nlargest(top_n, "pop")
        .sort_values("pop", ascending=True)
    )
    frames.append(go.Frame(
        data=[go.Bar(
            x=year_data["pop"],
            y=year_data["country"],
            orientation="h",
            marker_color=year_data["pop"]
        )],
        name=str(year)
    ))

# Initial frame
initial = df_movies[df_movies["year"] == 1952].nlargest(top_n, "pop").sort_values("pop", ascending=True)
fig = go.Figure(
    data=[go.Bar(x=initial["pop"], y=initial["country"], orientation="h")],
    frames=frames
)

fig.update_layout(
    xaxis_title="Population",
    yaxis_title="Country",
    title="Most Populous Countries Over Time",
    updatemenus=[dict(
        type="buttons",
        showactive=False,
        y=1.15, x=0.5, xanchor="center",
        buttons=[
            dict(label="Play", method="animate",
                 args=[None, dict(frame=dict(duration=800, redraw=True),
                                  fromcurrent=True)]),
            dict(label="Pause", method="animate",
                 args=[[None], dict(frame=dict(duration=0, redraw=False),
                                    mode="immediate")])
        ]
    )],
    sliders=[dict(
        active=0,
        steps=[dict(args=[[str(y)], dict(frame=dict(duration=800, redraw=True),
                                          mode="immediate")],
                    method="animate", label=str(y))
               for y in years],
        x=0, len=1, currentvalue=dict(prefix="Year: "),
        transition=dict(duration=300)
    )],
    template="plotly_white",
    height=500
)
fig.show()

7. Dashboard-Ready: Updatemenus, Sliders, Buttons

7.1 Dropdown Menus for Trace Visibility

fig = go.Figure()

# Add one trace per continent (Asia visible, the others legend-only)
continents = df["continent"].unique()
for continent in continents:
    cdf = df[df["continent"] == continent]
    fig.add_trace(go.Scatter(
        x=cdf["gdpPercap"], y=cdf["lifeExp"],
        mode="markers",
        marker=dict(size=cdf["pop"] / 2e7, opacity=0.6),
        name=continent,
        visible=True if continent == "Asia" else "legendonly"
    ))

# Add dropdown buttons
fig.update_layout(
    updatemenus=[
        dict(
            buttons=[
                dict(label="All Continents",
                     method="update",
                     args=[{"visible": [True] * len(continents)}]),
            ] + [
                dict(label=c,
                     method="update",
                     args=[{"visible": [i == idx for i in range(len(continents))]}])
                for idx, c in enumerate(continents)
            ],
            direction="down",
            showactive=True,
            x=0.17, xanchor="left",
            y=1.15, yanchor="top"
        )
    ],
    template="plotly_white",
    height=550,
    title="GDP vs Life Expectancy (Select Continent)"
)
fig.show()
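
The button logic above relies on one boolean mask per button, with one entry per trace. The helper below (visibility_buttons, a hypothetical name) isolates that pattern in plain Python so the masks can be inspected without building a figure.

```python
def visibility_buttons(trace_names):
    """Build updatemenu buttons: one 'All' button plus one per trace."""
    n = len(trace_names)
    buttons = [
        dict(label="All", method="update", args=[{"visible": [True] * n}])
    ]
    for idx, name in enumerate(trace_names):
        # Only the trace at position idx stays visible
        mask = [i == idx for i in range(n)]
        buttons.append(
            dict(label=name, method="update", args=[{"visible": mask}])
        )
    return buttons

buttons = visibility_buttons(["Asia", "Europe", "Africa"])
for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

The order of trace_names must match the order in which the traces were added to the figure, because the mask is applied by position.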

7.2 Range Slider

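fig = go.Figure()

# The date axis configured below expects datetimes; integer years would be
# read as milliseconds since 1970, so convert them first
canada_dates = pd.to_datetime(df_canada["year"].astype(str), format="%Y")

fig.add_trace(go.Scatter(
    x=canada_dates,
    y=df_canada["lifeExp"],
    mode="lines+markers",
    line=dict(color="#2E86AB", width=3),
    fill="tozeroy",
    fillcolor="rgba(46,134,171,0.2)"
))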

fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=[
                dict(count=10, label="10Y", step="year", stepmode="backward"),
                dict(count=20, label="20Y", step="year", stepmode="backward"),
                dict(step="all", label="All")
            ]
        ),
        rangeslider=dict(visible=True),
        type="date"
    ),
    title="Life Expectancy with Range Slider",
    template="plotly_white"
)
fig.show()

8. Customizing Themes

8.1 Built-in Templates

# List all available templates
import plotly.io as pio
print(list(pio.templates))
# Output includes: 'plotly', 'plotly_white', 'plotly_dark', 'ggplot2',
#                  'seaborn', 'simple_white', 'presentation', 'none'

# Apply a template globally
pio.templates.default = "plotly_white"

# Or per-figure
fig = px.scatter(df_tips, x="total_bill", y="tip", template="plotly_dark")
fig.show()

8.2 Custom Template

custom_template = go.layout.Template(
    layout=go.Layout(
        title=dict(
            font=dict(family="Helvetica", size=18, color="#1a1a2e"),
            x=0.5, xanchor="center"
        ),
        font=dict(family="Helvetica", size=12, color="#1a1a2e"),
        plot_bgcolor="white",
        paper_bgcolor="white",
        xaxis=dict(
            gridcolor="#e0e0e0",
            zerolinecolor="#c0c0c0",
            linecolor="#c0c0c0"
        ),
        yaxis=dict(
            gridcolor="#e0e0e0",
            zerolinecolor="#c0c0c0",
            linecolor="#c0c0c0"
        ),
        colorway=["#2E86AB", "#A23B72", "#F18F01", "#C73E1D", "#3B1F2B"],
        legend=dict(
            bgcolor="rgba(255,255,255,0.8)",
            bordercolor="#e0e0e0",
            borderwidth=1
        )
    )
)

fig = px.scatter(df_tips, x="total_bill", y="tip", color="time",
                 template=custom_template, title="Custom Themed Chart")
fig.show()

8.3 Colorscale Customization

fig = px.imshow(
    np.random.randn(20, 20),
    color_continuous_scale=[
        [0, "#1a1a2e"],
        [0.25, "#16213e"],
        [0.5, "#0f3460"],
        [0.75, "#533483"],
        [1, "#e94560"]
    ],
    title="Custom Colorscale (Midnight Gradient)"
)
fig.show()

9. Publication Quality: Export

9.1 Static Image Export

# Requires kaleido: pip install kaleido
fig = px.scatter(
    df.query("year == 2007"),
    x="gdpPercap", y="lifeExp", size="pop", color="continent",
    log_x=True, size_max=55, template="plotly_white",
    title="Publication-Quality Figure"
)
fig.update_layout(
    font=dict(family="Times New Roman", size=12),
    title_font_size=16,
    legend=dict(title_font_size=12)
)

# Export to various formats
fig.write_image("figure.png", scale=3, width=800, height=500)
fig.write_image("figure.svg", width=800, height=500)
fig.write_image("figure.pdf", width=800, height=500)

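scale=3 multiplies the pixel dimensions by 3, so the 800 x 500 figure is written as a 2400 x 1500 PNG. That corresponds to 300 DPI when printed 8 inches wide, which suits most journal submissions. SVG is a vector format and scales without loss of resolution.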
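
The relationship between scale, pixel dimensions, and print resolution is simple arithmetic. The helper names and the print widths below are assumptions for illustration; the actual DPI always depends on the physical size at which the figure is printed.

```python
def exported_pixels(width, height, scale):
    """Pixel dimensions of a raster export: layout size times scale."""
    return round(width * scale), round(height * scale)

def effective_dpi(pixel_width, print_width_inches):
    """Dots per inch when the image is printed at the given width."""
    return pixel_width / print_width_inches

pixel_w, pixel_h = exported_pixels(800, 500, 3)
dpi_full_width = effective_dpi(pixel_w, 8.0)    # assumed 8-inch print width
dpi_half_width = effective_dpi(pixel_w, 4.0)    # assumed 4-inch print width

print(pixel_w, pixel_h)        # 2400 1500
print(dpi_full_width)          # 300.0
print(dpi_half_width)          # 600.0
```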

9.2 Standalone HTML

fig.write_html(
    "interactive_chart.html",
    include_plotlyjs=True,     # Bundle Plotly.js (larger file, works offline)
    full_html=True,            # Full HTML document
    config={"displayModeBar": True, "scrollZoom": True}
)

When to use HTML: Embedding in web pages, sharing via email, or when stakeholders need to interact with the data. The file is self-contained and works in any modern browser.

File size optimization: A standalone HTML file with include_plotlyjs=True bundles the full Plotly.js library (~3.5 MB). If embedding multiple charts on a single page, use include_plotlyjs='cdn' to load the library once from a CDN, reducing per-chart overhead to ~5 KB.


10. Comparison: matplotlib vs seaborn vs plotly

| Criterion | matplotlib | seaborn | plotly |
|---|---|---|---|
| Best for | Fine-grained control, academic papers | Statistical plots, EDA | Interactive dashboards, web |
| Learning curve | Steep | Moderate | Moderate |
| Default aesthetics | Basic | Beautiful | Modern |
| Interactivity | None (unless mpld3) | None | Native |
| Statistical models | Manual | Built-in (regplot, etc.) | Limited |
| 3D plotting | Axes3D | Limited | plotly.graph_objects |
| Subplots | plt.subplots | FacetGrid | make_subplots |
| Animation | FuncAnimation | Limited | animation_frame |
| Export quality | Excellent | Excellent | Good (via kaleido) |
| File size | Small | Small | Larger |
| Browser embedding | No | No | Yes |
| Dashboard integration | Static images only | Static images only | Dash, Streamlit, Flask |

Decision Framework

Architecture Diagram

LIBRARY SELECTION GUIDE

"I need a figure for a journal paper"
  └─► matplotlib (reproducible, precise control)

"I need quick EDA with beautiful defaults"
  └─► seaborn (statistical plots out of the box)

"I need stakeholders to explore the data"
  └─► plotly (hover, zoom, filter)

"I need a dashboard"
  └─► plotly + dash (or plotly in streamlit)

"I need publication-quality AND interactivity"
  └─► matplotlib for static + plotly for supplementary HTML

"I need statistical visualizations (regression, distributions)"
  └─► seaborn (regplot, displot, pairplot)

"I need maps or 3D visualizations"
  └─► plotly (choropleth, scatter_mapbox, scatter_3d)

11. Complete EDA Dashboard

import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np

# Load and prepare data
df = px.data.tips()
df["tip_pct"] = (df["tip"] / df["total_bill"] * 100).round(2)
df["bill_per_person"] = (df["total_bill"] / df["size"]).round(2)

# Create comprehensive EDA dashboard
fig = make_subplots(
    rows=3, cols=3,
    subplot_titles=(
        "Total Bill Distribution", "Tip vs Total Bill", "Average Bill by Day",
        "Tip Percentage Distribution", "Heatmap: Day vs Time",
        "Party Count by Smoker Status", "Box: Tip % by Day",
        "Violin: Tip % by Day", "Box: Bill per Person"
    ),
    specs=[
        [{"type": "histogram"}, {"type": "scatter"}, {"type": "bar"}],
        [{"type": "histogram"}, {"type": "heatmap"}, {"type": "pie"}],
        [{"type": "box"}, {"type": "violin"}, {"type": "box"}]
    ],
    vertical_spacing=0.08,
    horizontal_spacing=0.06
)

# Panel 1: Histogram of total bill
fig.add_trace(
    go.Histogram(x=df["total_bill"], nbinsx=25, marker_color="#2E86AB",
                 name="Total Bill", opacity=0.75),
    row=1, col=1
)

# Panel 2: Scatter - Tip vs Bill (one trace per meal time)
for meal_time in df["time"].unique():
    tdf = df[df["time"] == meal_time]
    fig.add_trace(
        go.Scatter(x=tdf["total_bill"], y=tdf["tip"], mode="markers",
                   marker=dict(size=7, opacity=0.6),
                   # Shown in the legend so the two marker colors can be told apart
                   name=f"Tip ({meal_time})", showlegend=True),
        row=1, col=2
    )

# Panel 3: Bar - Average bill by day
day_means = df.groupby("day")["total_bill"].mean().sort_values()
fig.add_trace(
    go.Bar(x=day_means.index, y=day_means.values,
           marker_color=["#A23B72", "#F18F01", "#2E86AB", "#C73E1D"],
           name="Avg Bill", showlegend=False),
    row=1, col=3
)

# Panel 4: Histogram - Tip percentage
fig.add_trace(
    go.Histogram(x=df["tip_pct"], nbinsx=20, marker_color="#A23B72",
                 name="Tip %", opacity=0.75, showlegend=False),
    row=2, col=1
)

# Panel 5: Heatmap - Day vs Time counts
cross_tab = pd.crosstab(df["day"], df["time"])
fig.add_trace(
    go.Heatmap(z=cross_tab.values, x=cross_tab.columns.tolist(),
               y=cross_tab.index.tolist(), colorscale="Viridis",
               text=cross_tab.values, texttemplate="%{text}",
               # Cell labels already show the counts; hiding the colorbar
               # keeps it from overlapping the legend placed at x=1.02.
               showscale=False, name="Counts", showlegend=False),
    row=2, col=2
)

# Panel 6: Pie - Smoker status
smoker_counts = df["smoker"].value_counts()
fig.add_trace(
    go.Pie(labels=smoker_counts.index, values=smoker_counts.values,
           marker_colors=["#2E86AB", "#C73E1D"], name="Smoker",
           showlegend=False),
    row=2, col=3
)

# Panel 7: Box - Tip percentage by day
for day in df["day"].unique():
    ddf = df[df["day"] == day]
    fig.add_trace(
        go.Box(y=ddf["tip_pct"], name=day, showlegend=False),
        row=3, col=1
    )

# Panel 8: Violin - Tip % by day
for day in df["day"].unique():
    ddf = df[df["day"] == day]
    fig.add_trace(
        go.Violin(y=ddf["tip_pct"], name=day, showlegend=False,
                  box_visible=True, meanline_visible=True),
        row=3, col=2
    )

# Panel 9: Box - Bill per person
for sex in df["sex"].unique():
    sdf = df[df["sex"] == sex]
    fig.add_trace(
        go.Box(y=sdf["bill_per_person"], name=sex, showlegend=False),
        row=3, col=3
    )

fig.update_layout(
    height=1100, width=1200,
    title_text="Complete EDA Dashboard: Tips Dataset",
    title_font_size=20,
    template="plotly_white",
    showlegend=True,
    legend=dict(x=1.02, y=1, font=dict(size=10))
)

# Axis labels
fig.update_xaxes(title_text="Total Bill ($)", row=1, col=1)
fig.update_yaxes(title_text="Count", row=1, col=1)
fig.update_xaxes(title_text="Total Bill ($)", row=1, col=2)
fig.update_yaxes(title_text="Tip ($)", row=1, col=2)
fig.update_yaxes(title_text="Avg Total Bill ($)", row=1, col=3)
fig.update_xaxes(title_text="Tip Percentage (%)", row=2, col=1)
fig.update_yaxes(title_text="Count", row=2, col=1)
fig.update_yaxes(title_text="Tip %", row=3, col=1)
fig.update_yaxes(title_text="Tip %", row=3, col=2)
fig.update_yaxes(title_text="Bill per Person ($)", row=3, col=3)

fig.show()

Key Takeaways

📋 Summary: Advanced Visualization

  1. Interactive visualization is a cognitive design choice, not just an aesthetic one. Use Plotly when stakeholders need to explore the data themselves; use matplotlib when you need a fixed figure with precise control over every element, as in print. The choice affects how viewers process information and how much insight they can extract.
  2. Plotly Express provides the highest-level API: a single function call generates a fully interactive figure with hover, zoom, and legends. Under the hood, it builds on plotly.graph_objects, which provides lower-level control.
  3. make_subplots is the foundation for multi-panel dashboards. Use specs to set each cell's subplot type, and set secondary_y=True inside a cell's specs entry when two variables with different scales or units share one panel.
  4. Statistical visualizations are built into Plotly Express: marginal distributions via marginal_x and marginal_y, correlation heatmaps via px.imshow, and pair plots via px.scatter_matrix. KDE-based violin plots reveal multimodality hidden by box plots.
  5. Animations via animation_frame create self-contained temporal visualizations without external video tools. The animation_group parameter tells Plotly which rows describe the same object in each frame, so markers move smoothly between frames instead of being redrawn.
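  6. Export to HTML for web embedding; export to PNG/SVG/PDF via kaleido for print and publications. For raster output, scale=3 triples the pixel dimensions, which corresponds to roughly 300 DPI if the figure is laid out at 100 pixels per inch. Check the target journal's size requirements before submitting.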
  7. The matplotlib vs seaborn vs plotly decision depends on your medium (print vs web), audience (experts vs stakeholders), and need for interactivity. There is no single best choice; each excels in different contexts.

Practice Exercises

Exercise 1: Scatter Plot Mastery

Create an interactive scatter plot using px.data.iris() with:

  • Sepal length on x-axis, sepal width on y-axis
  • Color by species, shape by species
  • Add marginal distributions on both axes
  • Add a trendline (hint: use trendline="ols")
  • Export as HTML

Exercise 2: Subplot Dashboard

Build a 2x3 subplot dashboard using the px.data.gapminder() dataset:

  • Panel 1: Histogram of life expectancy
  • Panel 2: Scatter of GDP vs life expectancy
  • Panel 3: Bar chart of population by continent
  • Panel 4: Box plot of life expectancy by continent
  • Panel 5: Line chart of global average life expectancy over time
  • Panel 6: Pie chart of continent population shares

Exercise 3: Animated Gapminder

Create the animated Gapminder bubble chart and:

  1. Replace the default colors with a custom 5-color palette
  2. Add a range slider for the x-axis
  3. Add a dropdown to filter by continent
  4. Export as standalone HTML

Exercise 4: Correlation Analysis

Using a dataset of your choice:

  1. Compute the correlation matrix
  2. Create a masked heatmap (upper triangle only)
  3. Add annotations with correlation values
  4. Color scale: diverging (red-white-blue)
  5. Add a colorbar with title

Exercise 5: Geo Visualization

Using px.data.gapminder() for year 2007:

  1. Create a choropleth map colored by GDP per capita
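  2. Add a scatter overlay with the top 20 most populous countries (use scatter_geo so the markers share the choropleth's geo subplot; scatter_mapbox draws on a separate tile-map subplot)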
  3. Customize the map style
  4. Add hover data showing country, GDP, life expectancy, and population
