DS Case Study Prep

The Case Study Interview

Data science case studies test your ability to think through messy, real-world problems. Unlike coding interviews, they have no single correct answer; interviewers want to see your reasoning process.

┌──────────────────────────────────────────────────────────────────┐
│                 What Interviewers Evaluate                         │
│                                                                   │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐          │
│  │  Structure   │  │  Technical   │  │ Communication│          │
│  │              │  │   Depth      │  │              │          │
│  │ Can you      │  │ Do you know  │  │ Can you      │          │
│  │ break down   │  │ the right    │  │ explain      │          │
│  │ a problem?   │  │ tools?       │  │ clearly?     │          │
│  └──────────────┘  └──────────────┘  └──────────────┘          │
│         40%              30%              30%                    │
└──────────────────────────────────────────────────────────────────┘

Case Study Format

The Standard Framework

Case Study Response Structure

A structured approach to answering data science case study questions.

┌──────────────────────────────────────────────────────────────┐
│               Case Study Response Structure                    │
│                                                               │
│  1. CLARIFY (2-3 min)                                        │
│     └─ Ask questions to understand the problem               │
│                                                               │
│  2. STRUCTURE (3-5 min)                                      │
│     └─ Break into components, choose approach                │
│                                                               │
│  3. DIVE DEEP (15-20 min)                                    │
│     └─ Work through the analysis                            │
│                                                               │
│  4. SYNTHESIZE (3-5 min)                                     │
│     └─ Summarize findings, recommend action                  │
│                                                               │
│  Total: 25-30 minutes                                        │
└──────────────────────────────────────────────────────────────┘

Clarifying Questions to Always Ask

clarifying_questions = {
    "Problem Definition": [
        "What business problem are we trying to solve?",
        "Who is the stakeholder and what decisions will this inform?",
        "What does success look like? How will we measure it?",
        "Is this a one-time analysis or an ongoing system?"
    ],
    "Data Understanding": [
        "What data do we have access to?",
        "How is the data collected? What's the data generation process?",
        "What's the time range and granularity?",
        "Are there known data quality issues?",
        "What are the key entities and relationships?"
    ],
    "Constraints": [
        "What's the timeline for this project?",
        "Are there engineering resources for deployment?",
        "What's the current process and why does it need to change?",
        "Are there regulatory or ethical constraints?"
    ],
    "Scope": [
        "What's the minimum viable solution?",
        "Are there existing baselines or benchmarks?",
        "What has been tried before and what happened?"
    ]
}

Product Metrics

The Metric Framework

Product Metrics Hierarchy

A hierarchical approach to defining product metrics, starting with a north star metric and breaking down into input, process, and output metrics.

┌──────────────────────────────────────────────────────────────┐
│                  Product Metrics Hierarchy                     │
│                                                               │
│  ┌──────────────────────────────────────────────────────┐   │
│  │              NORTH STAR METRIC                        │   │
│  │         (Single metric that captures value)          │   │
│  │              e.g., Weekly Active Buyers               │   │
│  └────────────────────────┬─────────────────────────────┘   │
│                           │                                   │
│          ┌────────────────┼────────────────┐                 │
│          │                │                │                 │
│    ┌─────▼─────┐   ┌─────▼─────┐   ┌─────▼─────┐          │
│    │  Input     │   │ Process   │   │  Output   │          │
│    │  Metrics   │   │ Metrics   │   │  Metrics  │          │
│    │            │   │           │   │           │          │
│    │ # of new   │   │ Conversion│   │ Revenue   │          │
│    │ signups    │   │ rate      │   │ per user  │          │
│    └───────────┘   └───────────┘   └───────────┘          │
└──────────────────────────────────────────────────────────────┘

Common Product Metrics Definitions

# Key metrics every data scientist should know

METRICS = {
    # Acquisition
    "CAC": "Customer Acquisition Cost = Marketing Spend / New Customers",

    # Activation
    "Activation Rate": "% of signups completing key action within X days",
    "Time to Value": "Time from signup to first 'aha moment'",

    # Engagement
    "DAU/MAU": "Daily Active Users / Monthly Active Users = Stickiness",
    "Session Duration": "Average time spent per session",
    "Feature Adoption": "% of users using a specific feature",

    # Revenue
    "ARPU": "Average Revenue Per User = Total Revenue / Active Users",
    "LTV": "Customer Lifetime Value = ARPU × Average Lifespan",
    "LTV/CAC": "Target: >3 for healthy unit economics",

    # Retention
    "Retention Rate": "% of users returning after N days",
    "Churn Rate": "% of users leaving per period",
    "Net Revenue Retention": "(Starting Revenue + Expansion - Contraction - Churn) / Starting Revenue",

    # Satisfaction
    "NPS": "Net Promoter Score = % Promoters - % Detractors",
    "CSAT": "Customer Satisfaction Score = % of survey responses rated satisfied (e.g., 4-5 on a 5-point scale)"
}
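
To make a few of these definitions concrete, here is a minimal sketch that computes them from aggregate numbers. Every input below (spend, customer counts, the 24-month lifespan, the survey answers) is made up for illustration, and the function names are hypothetical. The LTV uses the simple ARPU × lifespan form from the table, not a discounted or churn-based model.

```python
def unit_economics(marketing_spend, new_customers, total_revenue,
                   active_users, avg_lifespan_periods):
    """Compute CAC, ARPU, LTV and LTV/CAC from aggregate inputs."""
    cac = marketing_spend / new_customers
    arpu = total_revenue / active_users      # revenue per user per period
    ltv = arpu * avg_lifespan_periods        # simple LTV = ARPU x lifespan
    return {"CAC": cac, "ARPU": arpu, "LTV": ltv, "LTV/CAC": ltv / cac}


def net_promoter_score(scores):
    """NPS from 0-10 answers: % promoters (9-10) minus % detractors (0-6)."""
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)


# Hypothetical month: $50,000 spend, 1,000 new customers,
# $200,000 revenue from 20,000 active users, 24-month average lifespan.
econ = unit_economics(50_000, 1_000, 200_000, 20_000, 24)
print(econ)

nps = net_promoter_score([10, 9, 9, 8, 7, 6, 3, 10, 5, 9])
print(f"NPS: {nps:.0f}")
```

In an interview, the point of walking through numbers like these is to show which inputs move the ratio: here LTV/CAC improves either by lowering spend per customer or by extending lifespan.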

Retention Curve Analysis

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def retention_curve_analysis(retention_data: dict) -> dict:
    """
    Analyze retention curves and identify key patterns.

    Parameters:
        retention_data: {day: retention_rate} e.g., {1: 1.0, 7: 0.45, 30: 0.25}
    """
    days = np.array(list(retention_data.keys()))
    rates = np.array(list(retention_data.values()))

    # Fit exponential decay: R(t) = a * exp(-b*t) + c
    def exp_decay(t, a, b, c):
        return a * np.exp(-b * t) + c

    try:
        popt, pcov = curve_fit(exp_decay, days, rates, p0=[0.8, 0.1, 0.1],
                                maxfev=5000)
        a, b, c = popt

        # Steady-state retention (asymptote)
        steady_state = c

        # Half-life of the decaying component (days for retention above
        # the steady state to halve)
        half_life = np.log(2) / b if b > 0 else float('inf')

        # Flattening day: the observed day whose fitted slope is closest
        # to -0.01 (losing one percentage point of retention per day)
        derivatives = np.gradient(exp_decay(days, *popt), days)
        flattening_day = days[np.argmin(np.abs(derivatives + 0.01))]
    except Exception:
        steady_state = rates[-1]
        half_life = None
        flattening_day = None

    return {
        "steady_state_retention": steady_state,
        "half_life_days": half_life,
        "flattening_day": flattening_day,
        "d1_retention": retention_data.get(1, None),
        "d7_retention": retention_data.get(7, None),
        "d30_retention": retention_data.get(30, None),
        "is_healthy": steady_state > 0.15 if steady_state else False
    }

# Example usage
retention = {
    1: 1.00, 2: 0.65, 3: 0.52, 7: 0.38,
    14: 0.28, 30: 0.22, 60: 0.18, 90: 0.16
}

analysis = retention_curve_analysis(retention)
print(f"Steady-state retention: {analysis['steady_state_retention']:.1%}")
print(f"Healthy: {analysis['is_healthy']}")
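
The `{day: retention_rate}` input used above has to be derived from raw activity data first. The following is a minimal sketch of that step, assuming a hypothetical log of `(user_id, day)` events and a signup-day lookup; the function name and data layout are illustrative, not part of the analysis code above. It computes classic day-N retention (active exactly N days after signup).

```python
from collections import defaultdict


def day_n_retention(signup_day, activity, horizon_days):
    """
    signup_day: {user_id: day index of signup}
    activity: iterable of (user_id, day index) events
    Returns {n: share of the cohort active exactly n days after signup}.
    """
    active_on_offset = defaultdict(set)
    for user, day in activity:
        if user in signup_day:
            offset = day - signup_day[user]
            if offset >= 0:
                # A set de-duplicates repeated events by the same user
                active_on_offset[offset].add(user)
    cohort_size = len(signup_day)
    return {n: len(active_on_offset[n]) / cohort_size for n in horizon_days}


# Hypothetical cohort of four users; user "d" never comes back
signups = {"a": 0, "b": 0, "c": 1, "d": 1}
events = [("a", 1), ("a", 7), ("b", 1), ("c", 2), ("c", 8), ("a", 1)]

curve = day_n_retention(signups, events, horizon_days=[1, 7])
print(curve)
```

Whether to count "active exactly on day N" or "active on day N or later" is a definitional choice worth stating out loud in a case study, since the two give different curves.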

Experimentation Questions

A/B Test Design Framework

import numpy as np

class ABTestDesign:
    def __init__(self, baseline_rate: float, mde: float, alpha: float = 0.05,
                 power: float = 0.80):
        """
        Design an A/B test.

        Args:
            baseline_rate: Current conversion rate (e.g., 0.05 for 5%)
            mde: Minimum detectable effect (e.g., 0.1 for 10% relative lift)
            alpha: Significance level (default 0.05)
            power: Statistical power (default 0.80)
        """
        self.baseline = baseline_rate
        self.mde = mde
        self.alpha = alpha
        self.power = power

    def calculate_sample_size(self) -> int:
        """Calculate required sample size per variant."""
        from scipy import stats

        # Effect size (Cohen's h for proportions)
        p1 = self.baseline
        p2 = self.baseline * (1 + self.mde)

        h = 2 * (np.arcsin(np.sqrt(p2)) - np.arcsin(np.sqrt(p1)))

        # Z-scores
        z_alpha = stats.norm.ppf(1 - self.alpha / 2)
        z_beta = stats.norm.ppf(self.power)

        # Sample size formula
        n = ((z_alpha + z_beta) / h) ** 2 * 2

        return int(np.ceil(n))

    def calculate_duration(self, daily_traffic: int) -> int:
        """Calculate test duration in days."""
        sample_per_variant = self.calculate_sample_size()
        total_sample = sample_per_variant * 2

        days = total_sample / daily_traffic
        return int(np.ceil(days))

    def analyze_results(self, control_n: int, control_conv: int,
                        treatment_n: int, treatment_conv: int) -> dict:
        """Analyze A/B test results."""
        from scipy import stats

        p_control = control_conv / control_n
        p_treatment = treatment_conv / treatment_n

        # Pooled proportion
        p_pooled = (control_conv + treatment_conv) / (control_n + treatment_n)

        # Standard error
        se = np.sqrt(p_pooled * (1 - p_pooled) * (1/control_n + 1/treatment_n))

        # Z-test
        z_stat = (p_treatment - p_control) / se
        p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))

        # 95% confidence interval for the difference (unpooled SE).
        # 1.96 is fixed, so this stays a 95% interval even if self.alpha changes.
        se_diff = np.sqrt(p_control*(1-p_control)/control_n +
                          p_treatment*(1-p_treatment)/treatment_n)
        ci_lower = (p_treatment - p_control) - 1.96 * se_diff
        ci_upper = (p_treatment - p_control) + 1.96 * se_diff

        # Relative lift
        relative_lift = (p_treatment - p_control) / p_control * 100

        return {
            "control_rate": p_control,
            "treatment_rate": p_treatment,
            "absolute_difference": p_treatment - p_control,
            "relative_lift_pct": relative_lift,
            "p_value": p_value,
            "significant": p_value < self.alpha,
            "ci_95": (ci_lower, ci_upper),
            "recommendation": "Ship treatment" if p_value < self.alpha
                             and p_treatment > p_control else "No change"
        }

# Example: Design a test
design = ABTestDesign(baseline_rate=0.05, mde=0.10)
print(f"Required sample size: {design.calculate_sample_size():,} per variant")
print(f"At 10K daily users: {design.calculate_duration(10000)} days")
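
As a cross-check on the arcsine-based calculation above, the sketch below uses the common normal-approximation formula for a two-sided two-proportion z-test. It is a separate, standard-library-only illustration (the function name is hypothetical), not a replacement for the class above. For small effects the two approaches should give similar answers; for the 5% to 5.5% example both land at roughly 31,000 users per variant.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-variant n for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # Pooled variance under H0, unpooled variance under H1
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Same scenario as above: 5% baseline, 10% relative lift -> 5.5%
n = sample_size_two_proportions(0.05, 0.055)
print(f"{n:,} per variant")
```

Being able to explain why halving the detectable effect roughly quadruples the sample size (the squared difference in the denominator) is often more valuable in an interview than the exact number.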

Common Experimentation Pitfalls

Be aware of these common pitfalls when designing and analyzing experiments.

pitfalls = {
    "Peeking Problem": {
        "issue": "Checking results before reaching sample size inflates false positives",
        "solution": "Use sequential testing or pre-commit to a sample size"
    },
    "Multiple Comparisons": {
        "issue": "Testing many variants increases chance of false positive",
        "solution": "Apply Bonferroni correction or control the False Discovery Rate (e.g., Benjamini-Hochberg)"
    },
    "Novelty Effect": {
        "issue": "New feature gets temporary boost, then fades",
        "solution": "Run test for 2+ weeks, check whether the lift decays over time, and compare against new users (who have no old experience to contrast with)"
    },
    "Simpson's Paradox": {
        "issue": "Overall trend reverses when segmented",
        "solution": "Always check results by key segments (platform, geography)"
    },
    "Network Effects": {
        "issue": "Control group affected by treatment users",
        "solution": "Use cluster randomization (by user group, not individual)"
    }
}
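
The multiple-comparisons fix listed above can be sketched in a few lines. This is a minimal illustration of the Bonferroni correction and the Benjamini-Hochberg step-up procedure on made-up p-values; in practice a library routine would normally be used instead.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from five variant-vs-control comparisons
p_vals = [0.001, 0.012, 0.028, 0.045, 0.200]
print("Bonferroni:        ", bonferroni(p_vals))
print("Benjamini-Hochberg:", benjamini_hochberg(p_vals))
```

On these inputs Bonferroni keeps only the smallest p-value, while Benjamini-Hochberg keeps the three smallest. That illustrates the trade-off: Bonferroni is stricter, and FDR control accepts some false discoveries in exchange for power.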

Problem-Solving Frameworks

Framework 1: Metric Definition

Metric Definition Framework

A structured approach to defining product metrics for any feature or product.

"How would you measure success for [feature/product]?"

Step 1: Clarify the product and its goals
Step 2: Identify the user journey (acquire → activate → retain → revenue)
Step 3: Define metrics at each stage
Step 4: Pick ONE north star metric
Step 5: Identify guardrail metrics (things that shouldn't get worse)

def metric_definition_framework(product: str) -> dict:
    """
    Framework for defining product metrics.
    """
    return {
        "product": product,
        "user_journey": {
            "acquisition": {
                "metrics": ["signups", "downloads", "traffic"],
                "example": "Daily new signups"
            },
            "activation": {
                "metrics": ["first_action", "setup_complete", "aha_moment"],
                "example": "% completing onboarding within 24h"
            },
            "engagement": {
                "metrics": ["DAU/MAU", "session_length", "features_used"],
                "example": "Weekly sessions per user"
            },
            "retention": {
                "metrics": ["D1/D7/D30 retention", "churn rate"],
                "example": "D30 retention rate"
            },
            "revenue": {
                "metrics": ["conversion", "ARPU", "LTV"],
                "example": "Revenue per active user"
            }
        },
        "north_star": "Pick the metric that best captures user value delivery",
        "guardrails": "Metrics that must not degrade (e.g., latency, errors)"
    }

Framework 2: Metric Drop Investigation

Metric Drop Investigation Framework

A structured approach to investigating why a metric dropped.

"Metric X dropped by Y%. How would you investigate?"

Step 1: Validate the drop (is it real or a data artifact?)
Step 2: Segment the drop (by platform, geography, user type, time)
Step 3: Identify the most affected segment
Step 4: Generate hypotheses for that segment
Step 5: Prioritize hypotheses by likelihood and impact
Step 6: Design analysis to test top hypothesis
Step 7: Recommend action
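
Step 1 (validating the drop) often comes down to asking whether the change is outside normal week-to-week noise. A minimal sketch of that check, assuming a hypothetical list of past week-over-week percentage changes, is shown here. A simple z-score like this ignores seasonality and trend, so treat it as a first pass only.

```python
from statistics import mean, stdev


def drop_z_score(historical_changes, observed_change):
    """Standard deviations between the observed change and the
    historical mean week-over-week change."""
    mu = mean(historical_changes)
    sigma = stdev(historical_changes)
    return (observed_change - mu) / sigma


# Hypothetical week-over-week % changes for the previous 8 weeks
history = [1.2, -0.8, 0.5, -1.5, 2.0, -0.3, 0.9, -1.0]

z = drop_z_score(history, observed_change=-8.0)
print(f"z = {z:.1f}")
print("Outside normal variance" if abs(z) > 3 else "Within normal variance")
```

The cutoff of 3 standard deviations is an assumption chosen for illustration; the right threshold depends on how costly false alarms are for the team.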

def metric_drop_investigation(metric_name: str, drop_pct: float) -> dict:
    """
    Structured investigation plan for a metric drop.
    """
    return {
        "step_1_validate": {
            "questions": [
                f"Is the {drop_pct}% drop in {metric_name} statistically significant?",
                "Is this within normal variance?",
                "Did the metric definition change?",
                "Is there a data pipeline issue?"
            ],
            "tools": ["Check data freshness", "Run significance test",
                      "Compare to historical variance"]
        },
        "step_2_segment": {
            "dimensions": [
                "Platform (iOS vs Android vs Web)",
                "Geography (country, region)",
                "User type (new vs returning)",
                "Time of day / day of week",
                "App version / release"
            ],
            "goal": "Find the segment with the largest drop"
        },
        "step_3_hypothesize": {
            "categories": {
                "Technical": ["Bug in latest release", "Performance degradation",
                             "Data pipeline issue"],
                "External": ["Seasonality", "Competitor action",
                            "Market event"],
                "Product": ["Recent feature change", "UI change",
                           "Pricing change"]
            }
        },
        "step_4_analyze": {
            "approach": "Compare affected segment vs unaffected segment",
            "tools": ["Funnel analysis", "Cohort analysis",
                     "Before/after comparison"]
        }
    }
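
The segmentation step can be made concrete by computing how much of the total change each segment accounts for. The sketch below assumes an additive metric such as DAU and uses made-up platform numbers that add up to an 8% overall drop; the function name and data are hypothetical.

```python
def segment_contributions(before, after):
    """
    before / after: {segment: metric value} for two periods.
    Returns each segment's own % change and its share of the total change,
    sorted so the segment explaining most of the change comes first.
    Segments that appear only in `after` are ignored in this sketch.
    """
    total_change = sum(after.values()) - sum(before.values())
    rows = []
    for seg, old_value in before.items():
        change = after.get(seg, 0) - old_value
        rows.append({
            "segment": seg,
            "pct_change": 100 * change / old_value,
            "share_of_total_change": change / total_change if total_change else 0.0,
        })
    return sorted(rows, key=lambda r: r["share_of_total_change"], reverse=True)


# Hypothetical DAU by platform: 100,000 -> 92,000 overall (-8%)
before = {"iOS": 40_000, "Android": 50_000, "Web": 10_000}
after = {"iOS": 39_600, "Android": 43_000, "Web": 9_400}

rows = segment_contributions(before, after)
for row in rows:
    print(row)
```

Reporting both columns matters: a small segment can show a large percentage change while explaining little of the total, and the reverse is also possible.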

Framework 3: Design a Prediction System

Prediction System Framework

A structured approach to designing a prediction system for any problem.

"Design a system to predict [X]"

Step 1: Define the problem (what, why, for whom)
Step 2: Define the prediction target and time horizon
Step 3: Identify features and data sources
Step 4: Choose evaluation metrics
Step 5: Baseline model (simple heuristics)
Step 6: Production requirements (latency, throughput, freshness)
Step 7: Monitoring and iteration plan

def prediction_system_framework(problem: str) -> dict:
    return {
        "problem_definition": {
            "what": problem,
            "why": "Business value and decision it enables",
            "who": "Stakeholders who will use it"
        },
        "prediction_target": {
            "definition": "Precise, measurable, time-bounded",
            "examples": {
                "churn": "Will this user be inactive in 30 days?",
                "conversion": "Will this user purchase within 7 days?",
                "fraud": "Is this transaction fraudulent?"
            }
        },
        "features": {
            "behavioral": ["click patterns", "session data", "usage frequency"],
            "transactional": ["purchase history", "payment method", "AOV"],
            "contextual": ["device", "location", "time of day"],
            "derived": ["trends", "ratios", "aggregations"]
        },
        "evaluation": {
            "classification": ["AUC-ROC", "Precision@K", "Recall@K",
                               "F1", "Business-specific metric"],
            "regression": ["RMSE", "MAE", "MAPE", "R²"]
        },
        "production": {
            "latency": "Real-time vs batch",
            "throughput": "Requests per second",
            "freshness": "How often to retrain",
            "monitoring": "Drift detection, performance tracking"
        }
    }
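
To illustrate the baseline and evaluation steps together, here is a minimal sketch of a heuristic churn baseline scored with Precision@K. The data is invented, and "days since last activity" is only one plausible heuristic; any model proposed later would need to beat this number to justify its complexity.

```python
def precision_at_k(y_true, scores, k):
    """Share of actual positives among the k highest-scored items."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_k = ranked[:k]
    return sum(label for _, label in top_k) / k


# Hypothetical users: days since last activity, and whether they churned (1/0)
days_inactive = [30, 2, 14, 21, 1, 9, 25, 3]
churned = [1, 0, 1, 0, 0, 0, 1, 0]

# Heuristic baseline: the longer a user has been inactive, the higher the risk
baseline_scores = days_inactive

p_at_3 = precision_at_k(churned, baseline_scores, k=3)
base_rate = sum(churned) / len(churned)
print(f"Precision@3: {p_at_3:.2f} vs base rate {base_rate:.2f}")
```

Precision@K fits cases where the business can only act on a fixed number of users (for example, a retention campaign with a capped budget), which is worth stating when choosing the metric.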

Framework 4: Experimentation Design

Experimentation Design Framework

A structured approach to designing experiments.

def experimentation_framework(hypothesis: str) -> dict:
    return {
        "hypothesis": hypothesis,
        "components": {
            "control": "Current experience",
            "treatment": "New experience to test",
            "unit_of_randomization": "User level vs session level",
            "primary_metric": "Main success metric",
            "guardrail_metrics": "Metrics that must not degrade"
        },
        "design": {
            "sample_size": "Calculate using power analysis",
            "duration": "Minimum 1-2 weeks for weekly patterns",
            "segments": "Pre-specify key segments to analyze",
            "novelty_effect": "Run long enough for the initial bump to fade, or analyze new users separately"
        },
        "analysis": {
            "method": "Two-proportion z-test or t-test",
            "significance": "p < 0.05 (or adjusted for multiple tests)",
            "practical_significance": "Is the effect size meaningful?",
            "segments": "Check for heterogeneous treatment effects"
        }
    }
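
For user-level randomization, one common approach is to hash the user ID together with the experiment name, so that a user always sees the same variant and different experiments are assigned independently. The sketch below is an assumption-laden illustration (the function name, the experiment name `checkout_v2`, and the ID format are all hypothetical), not a description of any particular experimentation platform.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # First 32 bits of the hash, scaled to a value in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


assignments = [assign_variant(f"user_{i}", "checkout_v2") for i in range(10_000)]
share = assignments.count("treatment") / len(assignments)
print(f"Treatment share: {share:.3f}")
```

Checking that the observed split is close to the intended one is also a useful guard against sample ratio mismatch once the experiment is running.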

Practice Problems

Problem 1: Metric Definition

Question: You're the data scientist for a music streaming app.
How would you measure the success of a new "Discover Weekly" playlist feature?

Your answer should cover:
1. Define the user journey
2. Suggest metrics at each stage
3. Pick a north star metric
4. Identify guardrail metrics

Problem 2: Metric Drop

Question: Daily active users dropped 8% week-over-week.
Walk through your investigation.

Your answer should cover:
1. Validation steps
2. Segmentation strategy
3. Hypothesis generation
4. Analysis plan
5. Communication of findings

Problem 3: Experimentation

Question: You want to test a new checkout flow that simplifies
the purchase process from 5 steps to 3 steps.

Your answer should cover:
1. Hypothesis and prediction
2. Primary metric selection
3. Sample size calculation approach
4. Test duration
5. Potential pitfalls to watch for

Problem 4: Prediction System

Question: Design a system to predict which users will churn
in the next 30 days.

Your answer should cover:
1. Problem definition and business value
2. Target variable definition
3. Feature ideas (at least 5)
4. Evaluation metrics
5. Baseline approach
6. Production requirements

Problem 5: Trade-off Analysis

Question: Your model has 85% precision and 70% recall.
The business wants to reduce false positives (wrongly flagging
good users). How do you approach this trade-off?

Your answer should cover:
1. Explain precision-recall trade-off
2. How to adjust the threshold
3. Impact on business metrics
4. How to find the optimal balance

# Solution to Problem 5: Threshold optimization
import numpy as np
from sklearn.metrics import precision_recall_curve

def optimize_threshold(y_true, y_scores, cost_fp=10, cost_fn=50):
    """
    Find optimal classification threshold based on business costs.

    Args:
        y_true: True labels
        y_scores: Predicted probabilities
        cost_fp: Cost of false positive (e.g., $10 discount given wrongly)
        cost_fn: Cost of false negative (e.g., $50 lost from churn)
    """
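    # Accept plain lists as well as arrays so the comparisons below work
    y_true = np.asarray(y_true)
    y_scores = np.asarray(y_scores)
    precisions, recalls, thresholds = precision_recall_curve(y_true, y_scores)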

    # Calculate total cost for each threshold
    costs = []
    for i, threshold in enumerate(thresholds):
        predictions = (y_scores >= threshold).astype(int)

        tp = np.sum((predictions == 1) & (y_true == 1))
        fp = np.sum((predictions == 1) & (y_true == 0))
        fn = np.sum((predictions == 0) & (y_true == 1))

        total_cost = (fp * cost_fp) + (fn * cost_fn)
        costs.append(total_cost)

    optimal_idx = np.argmin(costs)
    optimal_threshold = thresholds[optimal_idx]

    return {
        "optimal_threshold": optimal_threshold,
        "precision_at_optimal": precisions[optimal_idx],
        "recall_at_optimal": recalls[optimal_idx],
        "min_cost": costs[optimal_idx]
    }
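
The precision-recall trade-off itself is easiest to explain on a tiny hand-checkable example. The sketch below uses ten made-up scores and labels and shows what happens as the threshold rises; it depends only on the standard library and is separate from the cost-based function above.

```python
def precision_recall_at(y_true, y_scores, threshold):
    """Precision and recall when flagging every score >= threshold."""
    pairs = list(zip(y_true, y_scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    # Convention used here: precision is 1.0 when nothing is flagged
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels (1 = positive) and model scores, sorted by score
y_true = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
y_scores = [0.95, 0.90, 0.80, 0.70, 0.65, 0.55, 0.45, 0.30, 0.20, 0.10]

for t in (0.40, 0.60, 0.75):
    p, r = precision_recall_at(y_true, y_scores, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold from 0.40 to 0.75 removes both false positives in this toy data but misses two of the five true positives. This is the cost the business accepts when it asks for fewer wrongly flagged users.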

Key Takeaways

📋 Summary: DS Case Study Prep

Always structure your answer: Clarify -> Structure -> Dive Deep -> Synthesize -- this keeps the answer organized and ensures completeness
Ask clarifying questions -- it shows maturity and avoids wasted effort on the wrong problem
Know your product metrics cold: LTV, CAC, retention, churn, activation -- these are the language of product teams
For experimentation, always think about sample size, duration, and pitfalls -- many A/B test failures are methodological, not statistical
Communicate trade-offs explicitly -- every decision has a cost, and interviewers want to see you reason about them
Start with a simple baseline before proposing complex solutions -- this shows pragmatic engineering judgment
Tie everything back to business impact and actionable recommendations -- the best analysis is useless if it doesn't drive action

Practice Exercises

Mock interview: Practice with a friend using problems above (25 min limit)
Metric journal: For every app you use, write down what metrics they likely track
Read case studies: Study published case studies from tech companies
Build a framework doc: Create your own cheat sheet for common case types
Time yourself: Practice staying within the 25-30 minute window
