πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Funnel Analysis: Conversion Optimization, Drop-off Investigation

Data Scientist Role InterviewFunnel Analysis & Conversion Optimization⭐ Premium

Advertisement

πŸ”„

Asked at Amazon & Meta

Funnel Analysis

Conversion Optimization & Drop-off Investigation

The Interview Question

"Amazon's checkout conversion rate dropped 5% last week. Walk me through how you'd investigate and fix this."

Funnel analysis is a core data science skill β€” it's how you turn raw user behavior into actionable business insights.


Why Companies Ask This

ℹ️

Every business has a funnel: awareness β†’ consideration β†’ conversion β†’ retention. Amazon and Meta need data scientists who can diagnose where users drop off, why they drop off, and how to fix it.

Interviewers evaluate:

  1. Structured Analysis β€” Can you systematically investigate a funnel?
  2. SQL/Data Skills β€” Can you write queries to extract funnel data?
  3. Root Cause Thinking β€” Do you look beyond symptoms to causes?
  4. Action Orientation β€” Do you recommend specific fixes, not just observations?
  5. Business Understanding β€” Do you know what drives conversions?

The Funnel Analysis Framework

Step 1: Define the Funnel

Architecture Diagram
E-commerce Checkout Funnel:
1. Product View
2. Add to Cart
3. Begin Checkout
4. Enter Payment
5. Complete Purchase

Step 2: Measure Conversion Rates

import pandas as pd
import numpy as np

def calculate_funnel_metrics(sessions_df, events_df):
    """
    Calculate conversion rates at each funnel step.
    """
    # Define funnel steps in order
    funnel_steps = [
        'product_view',
        'add_to_cart', 
        'begin_checkout',
        'enter_payment',
        'complete_purchase'
    ]
    
    # For each session, find the furthest step reached
    events_df['step_index'] = events_df['event_type'].map(
        {step: i for i, step in enumerate(funnel_steps)}
    )
    
    max_step_per_session = events_df.groupby('session_id')['step_index'].max()
    
    # Count users reaching each step
    funnel_counts = {}
    for i, step in enumerate(funnel_steps):
        funnel_counts[step] = (max_step_per_session >= i).sum()
    
    # Calculate conversion rates
    total_users = funnel_counts[funnel_steps[0]]
    funnel_metrics = []
    
    for i, step in enumerate(funnel_steps):
        count = funnel_counts[step]
        step_conversion = count / total_users
        drop_from_previous = (
            1 - count / funnel_counts[funnel_steps[i-1]] 
            if i > 0 else 0
        )
        
        funnel_metrics.append({
            'step': step,
            'users': count,
            'overall_conversion': step_conversion,
            'step_conversion': 1 - drop_from_previous,
            'drop_off_rate': drop_from_previous,
        })
    
    return pd.DataFrame(funnel_metrics)

# Example output:
# step               users    overall_conv  step_conv  drop_off
# product_view       100,000  100.0%        100.0%     0.0%
# add_to_cart        45,000   45.0%         45.0%      55.0%
# begin_checkout     28,000   28.0%         62.2%      37.8%
# enter_payment      22,000   22.0%         78.6%      21.4%
# complete_purchase  18,000   18.0%         81.8%      18.2%

Step 3: Identify the Biggest Drop-off

def identify_drop_off_points(funnel_df):
    """
    Identify where the biggest drop-offs are occurring.
    """
    # Find steps with highest absolute and relative drop-off
    drop_offs = funnel_df[funnel_df['step'] != funnel_df['step'].iloc[0]]
    
    # Sort by drop-off rate
    worst_steps = drop_offs.sort_values('drop_off_rate', ascending=False)
    
    return {
        'biggest_drop_off': worst_steps.iloc[0]['step'],
        'drop_off_rate': worst_steps.iloc[0]['drop_off_rate'],
        'users_lost': (
            worst_steps.iloc[0]['users'] - 
            funnel_df.iloc[funnel_df[
                funnel_df['step'] == worst_steps.iloc[0]['step']
            ].index[0] - 1]['users']
        ),
    }

Example: Investigating Amazon Checkout Drop-off

Phase 1: Quantify the Problem

"The checkout conversion dropped from 4.2% to 3.99% β€” a 5% relative decline. This represents approximately 2,100 fewer completed purchases per day, or roughly $250K in lost daily revenue."

Phase 2: Segment the Drop-off

def segment_funnel_analysis(data, segment_by):
    """
    Analyze funnel by different segments to find where the drop is concentrated.
    """
    segments = data[segment_by].unique()
    results = []
    
    for segment in segments:
        segment_data = data[data[segment_by] == segment]
        funnel = calculate_funnel_metrics(segment_data)
        
        results.append({
            'segment': segment,
            'overall_conversion': funnel.iloc[-1]['overall_conversion'],
            'biggest_drop': funnel.sort_values(
                'drop_off_rate', ascending=False
            ).iloc[0]['step'],
        })
    
    return pd.DataFrame(results)

# Segment by device
device_analysis = segment_funnel_analysis(data, 'device_type')
# Result: Mobile conversion = 2.8%, Desktop = 5.1%, Tablet = 4.5%
# β†’ Mobile is significantly underperforming

# Segment by geography
geo_analysis = segment_funnel_analysis(data, 'country')
# Result: US = 4.1%, UK = 4.0%, Germany = 3.2%, India = 2.1%
# β†’ Germany and India are underperforming

# Segment by traffic source
source_analysis = segment_funnel_analysis(data, 'traffic_source')
# Result: Organic = 4.5%, Paid = 3.2%, Email = 5.1%, Social = 2.8%
# β†’ Paid and Social traffic converting poorly

Phase 3: Deep Dive into the Problem Segment

def deep_dive_mobile_checkout(data):
    """
    Investigate mobile checkout experience specifically.
    """
    mobile_data = data[data['device_type'] == 'mobile']
    
    # Check step-level conversion for mobile vs desktop
    mobile_funnel = calculate_funnel_metrics(mobile_data)
    desktop_funnel = calculate_funnel_metrics(
        data[data['device_type'] == 'desktop']
    )
    
    comparison = mobile_funnel.merge(
        desktop_funnel, 
        on='step', 
        suffixes=('_mobile', '_desktop')
    )
    
    comparison['conversion_gap'] = (
        comparison['step_conversion_mobile'] - 
        comparison['step_conversion_desktop']
    )
    
    # Find where mobile loses the most ground
    worst_gap = comparison.sort_values('conversion_gap').iloc[0]
    
    return {
        'worst_step': worst_gap['step'],
        'mobile_conversion': worst_gap['step_conversion_mobile'],
        'desktop_conversion': worst_gap['step_conversion_desktop'],
        'gap': worst_gap['conversion_gap'],
    }

# Result: Payment entry step has largest gap
# Mobile: 65% complete payment entry
# Desktop: 82% complete payment entry
# Gap: -17 percentage points

Phase 4: Investigate the Root Cause

def investigate_payment_entry(data):
    """
    Investigate why mobile users fail at payment entry.
    """
    mobile_checkout = data[
        (data['device_type'] == 'mobile') & 
        (data['reached_payment_entry'])
    ]
    
    # Check for technical issues
    technical_issues = {
        'page_load_time': mobile_checkout['payment_page_load_ms'].mean(),
        'error_rate': mobile_checkout['payment_error'].mean(),
        'timeout_rate': mobile_checkout['payment_timeout'].mean(),
        'form abandonment': mobile_checkout['form_abandoned'].mean(),
    }
    
    # Check for UX issues
    ux_issues = {
        'autofill_used': mobile_checkout['autofill_detected'].mean(),
        'keyboard_type_changes': mobile_checkout['keyboard_switches'].mean(),
        'field_revisits': mobile_checkout['field_revisit_count'].mean(),
    }
    
    # Check for payment method issues
    payment_issues = {
        'preferred_method_available': mobile_checkout[
            'preferred_payment_available'
        ].mean(),
        'payment_method_switches': mobile_checkout[
            'payment_method_changes'
        ].mean(),
    }
    
    return {
        'technical': technical_issues,
        'ux': ux_issues,
        'payment': payment_issues,
    }

# Results show:
# - Payment page load time: 4.2 seconds (desktop: 1.1 seconds)
# - Error rate: 12% (desktop: 3%)
# - Autofill used: 35% (desktop: 78%)
# β†’ Mobile payment page is slow, error-prone, and autofill doesn't work well

Advanced Funnel Analysis Techniques

Time-to-Convert Analysis

def time_to_convert_analysis(data, step_from, step_to):
    """
    Analyze how long users take to move between funnel steps.
    """
    # Find timestamp of each step per user
    step_times = data[data['event_type'].isin([step_from, step_to])].pivot_table(
        index='user_id',
        columns='event_type',
        values='timestamp',
        aggfunc='min'
    )
    
    # Calculate time difference
    step_times['time_to_convert'] = (
        step_times[step_to] - step_times[step_from]
    ).dt.total_seconds() / 60  # Convert to minutes
    
    # Distribution analysis
    return {
        'median_time_minutes': step_times['time_to_convert'].median(),
        'mean_time_minutes': step_times['time_to_convert'].mean(),
        'p90_time_minutes': step_times['time_to_convert'].quantile(0.9),
        'abandonment_rate': step_times['time_to_convert'].isna().mean(),
    }

Funnel Cohort Analysis

def funnel_cohort_analysis(data, cohort_by='acquisition_week'):
    """
    Track how funnel conversion changes across cohorts.
    """
    cohorts = data[cohort_by].unique()
    results = []
    
    for cohort in cohorts:
        cohort_data = data[data[cohort_by] == cohort]
        funnel = calculate_funnel_metrics(cohort_data)
        
        results.append({
            'cohort': cohort,
            'overall_conversion': funnel.iloc[-1]['overall_conversion'],
            'add_to_cart_rate': funnel[
                funnel['step'] == 'add_to_cart'
            ]['overall_conversion'].values[0],
        })
    
    return pd.DataFrame(results).sort_values('cohort')

# Result: Conversion improving over time (product improvements working)
# But add_to_cart rate declining (product page issues?)

Common Funnel Analysis Mistakes

⚠️

These mistakes lead to wrong conclusions and wasted engineering effort:

  1. Not defining a unique session β€” Users might have multiple sessions
  2. Ignoring time windows β€” Users might take days to convert
  3. Not accounting for returning users β€” Funnel might span multiple visits
  4. Confusing correlation with causation β€” Drop-off might be caused by something upstream
  5. Not segmenting β€” Average hides important variation
  6. Ignoring technical issues β€” Sometimes it's a bug, not user behavior

Quiz: Test Your Understanding


Related Topics

Advertisement