The Interview Question
"Amazon's checkout conversion rate dropped 5% last week. Walk me through how you'd investigate and fix this."
Funnel analysis is a core data science skill β it's how you turn raw user behavior into actionable business insights.
Why Companies Ask This
βΉοΈ
Every business has a funnel: awareness β consideration β conversion β retention. Amazon and Meta need data scientists who can diagnose where users drop off, why they drop off, and how to fix it.
Interviewers evaluate:
- Structured Analysis β Can you systematically investigate a funnel?
- SQL/Data Skills β Can you write queries to extract funnel data?
- Root Cause Thinking β Do you look beyond symptoms to causes?
- Action Orientation β Do you recommend specific fixes, not just observations?
- Business Understanding β Do you know what drives conversions?
The Funnel Analysis Framework
Step 1: Define the Funnel
Architecture Diagram
E-commerce Checkout Funnel:
1. Product View
2. Add to Cart
3. Begin Checkout
4. Enter Payment
5. Complete Purchase
Step 2: Measure Conversion Rates
import pandas as pd
import numpy as np
def calculate_funnel_metrics(sessions_df, events_df):
"""
Calculate conversion rates at each funnel step.
"""
# Define funnel steps in order
funnel_steps = [
'product_view',
'add_to_cart',
'begin_checkout',
'enter_payment',
'complete_purchase'
]
# For each session, find the furthest step reached
events_df['step_index'] = events_df['event_type'].map(
{step: i for i, step in enumerate(funnel_steps)}
)
max_step_per_session = events_df.groupby('session_id')['step_index'].max()
# Count users reaching each step
funnel_counts = {}
for i, step in enumerate(funnel_steps):
funnel_counts[step] = (max_step_per_session >= i).sum()
# Calculate conversion rates
total_users = funnel_counts[funnel_steps[0]]
funnel_metrics = []
for i, step in enumerate(funnel_steps):
count = funnel_counts[step]
step_conversion = count / total_users
drop_from_previous = (
1 - count / funnel_counts[funnel_steps[i-1]]
if i > 0 else 0
)
funnel_metrics.append({
'step': step,
'users': count,
'overall_conversion': step_conversion,
'step_conversion': 1 - drop_from_previous,
'drop_off_rate': drop_from_previous,
})
return pd.DataFrame(funnel_metrics)
# Example output:
# step users overall_conv step_conv drop_off
# product_view 100,000 100.0% 100.0% 0.0%
# add_to_cart 45,000 45.0% 45.0% 55.0%
# begin_checkout 28,000 28.0% 62.2% 37.8%
# enter_payment 22,000 22.0% 78.6% 21.4%
# complete_purchase 18,000 18.0% 81.8% 18.2%
Step 3: Identify the Biggest Drop-off
def identify_drop_off_points(funnel_df):
"""
Identify where the biggest drop-offs are occurring.
"""
# Find steps with highest absolute and relative drop-off
drop_offs = funnel_df[funnel_df['step'] != funnel_df['step'].iloc[0]]
# Sort by drop-off rate
worst_steps = drop_offs.sort_values('drop_off_rate', ascending=False)
return {
'biggest_drop_off': worst_steps.iloc[0]['step'],
'drop_off_rate': worst_steps.iloc[0]['drop_off_rate'],
'users_lost': (
worst_steps.iloc[0]['users'] -
funnel_df.iloc[funnel_df[
funnel_df['step'] == worst_steps.iloc[0]['step']
].index[0] - 1]['users']
),
}
Example: Investigating Amazon Checkout Drop-off
Phase 1: Quantify the Problem
"The checkout conversion dropped from 4.2% to 3.99% β a 5% relative decline. This represents approximately 2,100 fewer completed purchases per day, or roughly $250K in lost daily revenue."
Phase 2: Segment the Drop-off
def segment_funnel_analysis(data, segment_by):
"""
Analyze funnel by different segments to find where the drop is concentrated.
"""
segments = data[segment_by].unique()
results = []
for segment in segments:
segment_data = data[data[segment_by] == segment]
funnel = calculate_funnel_metrics(segment_data)
results.append({
'segment': segment,
'overall_conversion': funnel.iloc[-1]['overall_conversion'],
'biggest_drop': funnel.sort_values(
'drop_off_rate', ascending=False
).iloc[0]['step'],
})
return pd.DataFrame(results)
# Segment by device
device_analysis = segment_funnel_analysis(data, 'device_type')
# Result: Mobile conversion = 2.8%, Desktop = 5.1%, Tablet = 4.5%
# β Mobile is significantly underperforming
# Segment by geography
geo_analysis = segment_funnel_analysis(data, 'country')
# Result: US = 4.1%, UK = 4.0%, Germany = 3.2%, India = 2.1%
# β Germany and India are underperforming
# Segment by traffic source
source_analysis = segment_funnel_analysis(data, 'traffic_source')
# Result: Organic = 4.5%, Paid = 3.2%, Email = 5.1%, Social = 2.8%
# β Paid and Social traffic converting poorly
Phase 3: Deep Dive into the Problem Segment
def deep_dive_mobile_checkout(data):
"""
Investigate mobile checkout experience specifically.
"""
mobile_data = data[data['device_type'] == 'mobile']
# Check step-level conversion for mobile vs desktop
mobile_funnel = calculate_funnel_metrics(mobile_data)
desktop_funnel = calculate_funnel_metrics(
data[data['device_type'] == 'desktop']
)
comparison = mobile_funnel.merge(
desktop_funnel,
on='step',
suffixes=('_mobile', '_desktop')
)
comparison['conversion_gap'] = (
comparison['step_conversion_mobile'] -
comparison['step_conversion_desktop']
)
# Find where mobile loses the most ground
worst_gap = comparison.sort_values('conversion_gap').iloc[0]
return {
'worst_step': worst_gap['step'],
'mobile_conversion': worst_gap['step_conversion_mobile'],
'desktop_conversion': worst_gap['step_conversion_desktop'],
'gap': worst_gap['conversion_gap'],
}
# Result: Payment entry step has largest gap
# Mobile: 65% complete payment entry
# Desktop: 82% complete payment entry
# Gap: -17 percentage points
Phase 4: Investigate the Root Cause
def investigate_payment_entry(data):
"""
Investigate why mobile users fail at payment entry.
"""
mobile_checkout = data[
(data['device_type'] == 'mobile') &
(data['reached_payment_entry'])
]
# Check for technical issues
technical_issues = {
'page_load_time': mobile_checkout['payment_page_load_ms'].mean(),
'error_rate': mobile_checkout['payment_error'].mean(),
'timeout_rate': mobile_checkout['payment_timeout'].mean(),
'form abandonment': mobile_checkout['form_abandoned'].mean(),
}
# Check for UX issues
ux_issues = {
'autofill_used': mobile_checkout['autofill_detected'].mean(),
'keyboard_type_changes': mobile_checkout['keyboard_switches'].mean(),
'field_revisits': mobile_checkout['field_revisit_count'].mean(),
}
# Check for payment method issues
payment_issues = {
'preferred_method_available': mobile_checkout[
'preferred_payment_available'
].mean(),
'payment_method_switches': mobile_checkout[
'payment_method_changes'
].mean(),
}
return {
'technical': technical_issues,
'ux': ux_issues,
'payment': payment_issues,
}
# Results show:
# - Payment page load time: 4.2 seconds (desktop: 1.1 seconds)
# - Error rate: 12% (desktop: 3%)
# - Autofill used: 35% (desktop: 78%)
# β Mobile payment page is slow, error-prone, and autofill doesn't work well
Advanced Funnel Analysis Techniques
Time-to-Convert Analysis
def time_to_convert_analysis(data, step_from, step_to):
"""
Analyze how long users take to move between funnel steps.
"""
# Find timestamp of each step per user
step_times = data[data['event_type'].isin([step_from, step_to])].pivot_table(
index='user_id',
columns='event_type',
values='timestamp',
aggfunc='min'
)
# Calculate time difference
step_times['time_to_convert'] = (
step_times[step_to] - step_times[step_from]
).dt.total_seconds() / 60 # Convert to minutes
# Distribution analysis
return {
'median_time_minutes': step_times['time_to_convert'].median(),
'mean_time_minutes': step_times['time_to_convert'].mean(),
'p90_time_minutes': step_times['time_to_convert'].quantile(0.9),
'abandonment_rate': step_times['time_to_convert'].isna().mean(),
}
Funnel Cohort Analysis
def funnel_cohort_analysis(data, cohort_by='acquisition_week'):
"""
Track how funnel conversion changes across cohorts.
"""
cohorts = data[cohort_by].unique()
results = []
for cohort in cohorts:
cohort_data = data[data[cohort_by] == cohort]
funnel = calculate_funnel_metrics(cohort_data)
results.append({
'cohort': cohort,
'overall_conversion': funnel.iloc[-1]['overall_conversion'],
'add_to_cart_rate': funnel[
funnel['step'] == 'add_to_cart'
]['overall_conversion'].values[0],
})
return pd.DataFrame(results).sort_values('cohort')
# Result: Conversion improving over time (product improvements working)
# But add_to_cart rate declining (product page issues?)
Common Funnel Analysis Mistakes
β οΈ
These mistakes lead to wrong conclusions and wasted engineering effort:
- Not defining a unique session β Users might have multiple sessions
- Ignoring time windows β Users might take days to convert
- Not accounting for returning users β Funnel might span multiple visits
- Confusing correlation with causation β Drop-off might be caused by something upstream
- Not segmenting β Average hides important variation
- Ignoring technical issues β Sometimes it's a bug, not user behavior