🔄

Asked at Amazon & Meta

Funnel Analysis

Conversion Optimization & Drop-off Investigation

The Interview Question

"Amazon's checkout conversion rate dropped 5% last week. Walk me through how you'd investigate and fix this."

Funnel analysis is a core data science skill — it's how you turn raw user behavior into actionable business insights.

Why Companies Ask This

ℹ️

Every business has a funnel: awareness → consideration → conversion → retention. Amazon and Meta need data scientists who can diagnose where users drop off, why they drop off, and how to fix it.

Interviewers evaluate:

Structured Analysis — Can you systematically investigate a funnel?
SQL/Data Skills — Can you write queries to extract funnel data?
Root Cause Thinking — Do you look beyond symptoms to causes?
Action Orientation — Do you recommend specific fixes, not just observations?
Business Understanding — Do you know what drives conversions?

The Funnel Analysis Framework

Step 1: Define the Funnel

Architecture Diagram

E-commerce Checkout Funnel:
1. Product View
2. Add to Cart
3. Begin Checkout
4. Enter Payment
5. Complete Purchase

Step 2: Measure Conversion Rates

import pandas as pd
import numpy as np

def calculate_funnel_metrics(sessions_df, events_df):
    """
    Calculate conversion rates at each funnel step.
    """
    # Define funnel steps in order
    funnel_steps = [
        'product_view',
        'add_to_cart', 
        'begin_checkout',
        'enter_payment',
        'complete_purchase'
    ]
    
    # For each session, find the furthest step reached
    events_df['step_index'] = events_df['event_type'].map(
        {step: i for i, step in enumerate(funnel_steps)}
    )
    
    max_step_per_session = events_df.groupby('session_id')['step_index'].max()
    
    # Count users reaching each step
    funnel_counts = {}
    for i, step in enumerate(funnel_steps):
        funnel_counts[step] = (max_step_per_session >= i).sum()
    
    # Calculate conversion rates
    total_users = funnel_counts[funnel_steps[0]]
    funnel_metrics = []
    
    for i, step in enumerate(funnel_steps):
        count = funnel_counts[step]
        step_conversion = count / total_users
        drop_from_previous = (
            1 - count / funnel_counts[funnel_steps[i-1]] 
            if i > 0 else 0
        )
        
        funnel_metrics.append({
            'step': step,
            'users': count,
            'overall_conversion': step_conversion,
            'step_conversion': 1 - drop_from_previous,
            'drop_off_rate': drop_from_previous,
        })
    
    return pd.DataFrame(funnel_metrics)

# Example output:
# step               users    overall_conv  step_conv  drop_off
# product_view       100,000  100.0%        100.0%     0.0%
# add_to_cart        45,000   45.0%         45.0%      55.0%
# begin_checkout     28,000   28.0%         62.2%      37.8%
# enter_payment      22,000   22.0%         78.6%      21.4%
# complete_purchase  18,000   18.0%         81.8%      18.2%

Step 3: Identify the Biggest Drop-off

def identify_drop_off_points(funnel_df):
    """
    Identify where the biggest drop-offs are occurring.
    """
    # Find steps with highest absolute and relative drop-off
    drop_offs = funnel_df[funnel_df['step'] != funnel_df['step'].iloc[0]]
    
    # Sort by drop-off rate
    worst_steps = drop_offs.sort_values('drop_off_rate', ascending=False)
    
    return {
        'biggest_drop_off': worst_steps.iloc[0]['step'],
        'drop_off_rate': worst_steps.iloc[0]['drop_off_rate'],
        'users_lost': (
            worst_steps.iloc[0]['users'] - 
            funnel_df.iloc[funnel_df[
                funnel_df['step'] == worst_steps.iloc[0]['step']
            ].index[0] - 1]['users']
        ),
    }

Example: Investigating Amazon Checkout Drop-off

Phase 1: Quantify the Problem

"The checkout conversion dropped from 4.2% to 3.99% — a 5% relative decline. This represents approximately 2,100 fewer completed purchases per day, or roughly $250K in lost daily revenue."

Phase 2: Segment the Drop-off

def segment_funnel_analysis(data, segment_by):
    """
    Analyze funnel by different segments to find where the drop is concentrated.
    """
    segments = data[segment_by].unique()
    results = []
    
    for segment in segments:
        segment_data = data[data[segment_by] == segment]
        funnel = calculate_funnel_metrics(segment_data)
        
        results.append({
            'segment': segment,
            'overall_conversion': funnel.iloc[-1]['overall_conversion'],
            'biggest_drop': funnel.sort_values(
                'drop_off_rate', ascending=False
            ).iloc[0]['step'],
        })
    
    return pd.DataFrame(results)

# Segment by device
device_analysis = segment_funnel_analysis(data, 'device_type')
# Result: Mobile conversion = 2.8%, Desktop = 5.1%, Tablet = 4.5%
# → Mobile is significantly underperforming

# Segment by geography
geo_analysis = segment_funnel_analysis(data, 'country')
# Result: US = 4.1%, UK = 4.0%, Germany = 3.2%, India = 2.1%
# → Germany and India are underperforming

# Segment by traffic source
source_analysis = segment_funnel_analysis(data, 'traffic_source')
# Result: Organic = 4.5%, Paid = 3.2%, Email = 5.1%, Social = 2.8%
# → Paid and Social traffic converting poorly

Phase 3: Deep Dive into the Problem Segment

def deep_dive_mobile_checkout(data):
    """
    Investigate mobile checkout experience specifically.
    """
    mobile_data = data[data['device_type'] == 'mobile']
    
    # Check step-level conversion for mobile vs desktop
    mobile_funnel = calculate_funnel_metrics(mobile_data)
    desktop_funnel = calculate_funnel_metrics(
        data[data['device_type'] == 'desktop']
    )
    
    comparison = mobile_funnel.merge(
        desktop_funnel, 
        on='step', 
        suffixes=('_mobile', '_desktop')
    )
    
    comparison['conversion_gap'] = (
        comparison['step_conversion_mobile'] - 
        comparison['step_conversion_desktop']
    )
    
    # Find where mobile loses the most ground
    worst_gap = comparison.sort_values('conversion_gap').iloc[0]
    
    return {
        'worst_step': worst_gap['step'],
        'mobile_conversion': worst_gap['step_conversion_mobile'],
        'desktop_conversion': worst_gap['step_conversion_desktop'],
        'gap': worst_gap['conversion_gap'],
    }

# Result: Payment entry step has largest gap
# Mobile: 65% complete payment entry
# Desktop: 82% complete payment entry
# Gap: -17 percentage points

Phase 4: Investigate the Root Cause

def investigate_payment_entry(data):
    """
    Investigate why mobile users fail at payment entry.
    """
    mobile_checkout = data[
        (data['device_type'] == 'mobile') & 
        (data['reached_payment_entry'])
    ]
    
    # Check for technical issues
    technical_issues = {
        'page_load_time': mobile_checkout['payment_page_load_ms'].mean(),
        'error_rate': mobile_checkout['payment_error'].mean(),
        'timeout_rate': mobile_checkout['payment_timeout'].mean(),
        'form abandonment': mobile_checkout['form_abandoned'].mean(),
    }
    
    # Check for UX issues
    ux_issues = {
        'autofill_used': mobile_checkout['autofill_detected'].mean(),
        'keyboard_type_changes': mobile_checkout['keyboard_switches'].mean(),
        'field_revisits': mobile_checkout['field_revisit_count'].mean(),
    }
    
    # Check for payment method issues
    payment_issues = {
        'preferred_method_available': mobile_checkout[
            'preferred_payment_available'
        ].mean(),
        'payment_method_switches': mobile_checkout[
            'payment_method_changes'
        ].mean(),
    }
    
    return {
        'technical': technical_issues,
        'ux': ux_issues,
        'payment': payment_issues,
    }

# Results show:
# - Payment page load time: 4.2 seconds (desktop: 1.1 seconds)
# - Error rate: 12% (desktop: 3%)
# - Autofill used: 35% (desktop: 78%)
# → Mobile payment page is slow, error-prone, and autofill doesn't work well

Advanced Funnel Analysis Techniques

Time-to-Convert Analysis

def time_to_convert_analysis(data, step_from, step_to):
    """
    Analyze how long users take to move between funnel steps.
    """
    # Find timestamp of each step per user
    step_times = data[data['event_type'].isin([step_from, step_to])].pivot_table(
        index='user_id',
        columns='event_type',
        values='timestamp',
        aggfunc='min'
    )
    
    # Calculate time difference
    step_times['time_to_convert'] = (
        step_times[step_to] - step_times[step_from]
    ).dt.total_seconds() / 60  # Convert to minutes
    
    # Distribution analysis
    return {
        'median_time_minutes': step_times['time_to_convert'].median(),
        'mean_time_minutes': step_times['time_to_convert'].mean(),
        'p90_time_minutes': step_times['time_to_convert'].quantile(0.9),
        'abandonment_rate': step_times['time_to_convert'].isna().mean(),
    }

Funnel Cohort Analysis

def funnel_cohort_analysis(data, cohort_by='acquisition_week'):
    """
    Track how funnel conversion changes across cohorts.
    """
    cohorts = data[cohort_by].unique()
    results = []
    
    for cohort in cohorts:
        cohort_data = data[data[cohort_by] == cohort]
        funnel = calculate_funnel_metrics(cohort_data)
        
        results.append({
            'cohort': cohort,
            'overall_conversion': funnel.iloc[-1]['overall_conversion'],
            'add_to_cart_rate': funnel[
                funnel['step'] == 'add_to_cart'
            ]['overall_conversion'].values[0],
        })
    
    return pd.DataFrame(results).sort_values('cohort')

# Result: Conversion improving over time (product improvements working)
# But add_to_cart rate declining (product page issues?)

Common Funnel Analysis Mistakes

⚠️

These mistakes lead to wrong conclusions and wasted engineering effort:

Not defining a unique session — Users might have multiple sessions
Ignoring time windows — Users might take days to convert
Not accounting for returning users — Funnel might span multiple visits
Confusing correlation with causation — Drop-off might be caused by something upstream
Not segmenting — Average hides important variation
Ignoring technical issues — Sometimes it's a bug, not user behavior

Funnel Analysis: Conversion Optimization, Drop-off Investigation