⚡ Performance Tuning

Microsoft & Netflix Interview Deep Dive

🏢 Microsoft🏢 Netflix⚡ Difficulty: Hard⏱️ 45 min

📋 Interview Question

ℹ️🔴 Microsoft/Netflix Interview Question

"A production query is running slow. Walk through your systematic approach to: 1) Identify the bottleneck, 2) Analyze the query plan, 3) Optimize without changing business logic, 4) Verify improvements. What metrics would you track?"

Companies: Microsoft, Netflix | Difficulty: Hard | Time: 45 minutes

📊 Setup: Large Dataset

-- Create performance test tables
CREATE TABLE orders_large (
    order_id BIGSERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    product_id INT NOT NULL,
    order_date TIMESTAMP NOT NULL,
    quantity INT DEFAULT 1,
    unit_price DECIMAL(10, 2),
    total_amount DECIMAL(12, 2),
    status VARCHAR(20),
    INDEX idx_customer (customer_id),
    INDEX idx_date (order_date),
    INDEX idx_status (status)
);

CREATE TABLE customers_large (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    email VARCHAR(200),
    city VARCHAR(100),
    segment VARCHAR(50),
    INDEX idx_city (city),
    INDEX idx_segment (segment)
);

CREATE TABLE products_large (
    product_id SERIAL PRIMARY KEY,
    name VARCHAR(200),
    category VARCHAR(100),
    price DECIMAL(10, 2),
    INDEX idx_category (category)
);

-- Insert large dataset
INSERT INTO orders_large (customer_id, product_id, order_date, quantity, unit_price, total_amount, status)
SELECT
    (random() * 100000)::INT + 1,
    (random() * 50000)::INT + 1,
    '2020-01-01'::timestamp + (random() * 1825 || ' days')::interval,
    (random() * 10)::INT + 1,
    (random() * 1000)::decimal(10,2),
    (random() * 10000)::decimal(12,2),
    CASE (random() * 4)::INT
        WHEN 0 THEN 'pending'
        WHEN 1 THEN 'shipped'
        WHEN 2 THEN 'delivered'
        WHEN 3 THEN 'cancelled'
        ELSE 'refunded'
    END
FROM generate_series(1, 50000000);

🔍 Part 1: Systematic Debugging Approach

ℹ️🔍 Performance Tuning Steps

Reproduce: Get the exact slow query
Measure: Establish baseline metrics
Analyze: Read EXPLAIN ANALYZE output
Identify: Find the most expensive operation
Hypothesize: Form theories about root cause
Test: Apply one change at a time
Verify: Confirm improvement with metrics
Document: Record findings and changes

Step 1: Capture Slow Query

-- Enable query logging for slow queries
ALTER SYSTEM SET log_min_duration_statement = 1000;  -- Log queries > 1 second
SELECT pg_reload_conf();

-- Find slow queries in pg_stat_statements
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    rows
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;

Step 2: Baseline Metrics

-- Get baseline with EXPLAIN ANALYZE
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT
    c.customer_id,
    c.name,
    COUNT(o.order_id) AS order_count,
    SUM(o.total_amount) AS total_spent
FROM customers_large c
JOIN orders_large o ON c.customer_id = o.customer_id
WHERE o.order_date >= '2024-01-01'
AND o.status = 'completed'
GROUP BY c.customer_id, c.name
HAVING SUM(o.total_amount) > 10000
ORDER BY total_spent DESC
LIMIT 100;

-- Record key metrics:
-- Planning Time: X ms
-- Execution Time: X ms
-- Rows: X
-- Shared Hit: X
-- Shared Read: X

📈 Part 2: Analyzing Query Plans

Understanding Key Operations

⚠️⚠️ Expensive Operations

Operation	Cost	Alternative
Seq Scan	High on large tables	Add index
Nested Loop	High with large datasets	Use Hash/Merge Join
Sort (external merge)	Disk I/O	Increase work_mem
HashAggregate (high memory)	Memory pressure	Increase work_mem
Materialize	Temporary storage	Simplify query

-- Analyze specific bottlenecks
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM orders_large
WHERE customer_id = 12345
AND order_date >= '2024-01-01'
AND status = 'shipped';

-- Look for:
-- 1. Seq Scan on large tables → Need index
-- 2. High "Rows Removed by Filter" → Missing index on filter column
-- 3. Sort Method: external merge → Increase work_mem
-- 4. Hash Batches > 1 → Increase work_mem

🔧 Part 3: Common Optimizations

Optimization 1: Add Missing Index

-- Before: Seq Scan
EXPLAIN ANALYZE
SELECT * FROM orders_large
WHERE customer_id = 12345
AND order_date >= '2024-01-01';

-- Create composite index
CREATE INDEX idx_orders_customer_date
ON orders_large (customer_id, order_date DESC);

-- After: Index Scan
EXPLAIN ANALYZE
SELECT * FROM orders_large
WHERE customer_id = 12345
AND order_date >= '2024-01-01';

Optimization 2: Covering Index

-- Query only needs customer_id, order_date, total_amount
-- Create covering index with INCLUDE
CREATE INDEX idx_orders_covering
ON orders_large (customer_id, order_date)
INCLUDE (total_amount, status);

-- Now query can be answered entirely from index
EXPLAIN ANALYZE
SELECT customer_id, order_date, total_amount
FROM orders_large
WHERE customer_id = 12345;
-- Output: Index Only Scan

Optimization 3: Query Rewriting

-- BAD: Subquery
EXPLAIN ANALYZE
SELECT *
FROM orders_large
WHERE customer_id IN (
    SELECT customer_id
    FROM customers_large
    WHERE city = 'New York'
);

-- GOOD: JOIN
EXPLAIN ANALYZE
SELECT DISTINCT o.*
FROM orders_large o
JOIN customers_large c ON o.customer_id = c.customer_id
WHERE c.city = 'New York';

-- GOOD: EXISTS
EXPLAIN ANALYZE
SELECT *
FROM orders_large o
WHERE EXISTS (
    SELECT 1
    FROM customers_large c
    WHERE c.customer_id = o.customer_id
    AND c.city = 'New York'
);

Optimization 4: Window Function vs Subquery

-- BAD: Correlated subquery
EXPLAIN ANALYZE
SELECT
    *,
    (SELECT COUNT(*) FROM orders_large o2 WHERE o2.customer_id = o1.customer_id) AS order_count
FROM orders_large o1;

-- GOOD: Window function
EXPLAIN ANALYZE
SELECT *,
    COUNT(*) OVER (PARTITION BY customer_id) AS order_count
FROM orders_large;

Optimization 5: Materialized Aggregation

-- Create materialized view for frequent aggregations
CREATE MATERIALIZED VIEW mv_customer_summary AS
SELECT
    customer_id,
    COUNT(*) AS order_count,
    SUM(total_amount) AS total_spent,
    AVG(total_amount) AS avg_order_value,
    MAX(order_date) AS last_order_date
FROM orders_large
GROUP BY customer_id;

CREATE UNIQUE INDEX idx_mv_customer ON mv_customer_summary(customer_id);

-- Query now reads from pre-aggregated data
SELECT * FROM mv_customer_summary
WHERE total_spent > 10000;

📊 Part 4: Statistics and Configuration

Updating Statistics

-- Check when statistics were last updated
SELECT
    schemaname,
    relname,
    last_analyze,
    last_autoanalyze,
    n_live_tup,
    n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'orders_large';

-- Update statistics
ANALYZE orders_large;

-- Update statistics for specific columns
ANALYZE orders_large (customer_id, order_date);

Configuration Tuning

-- Increase work_mem for complex sorts/aggregations
SET work_mem = '256MB';  -- Default is usually 4MB

-- Increase shared_buffers for data caching
-- (requires restart)
ALTER SYSTEM SET shared_buffers = '4GB';

-- Enable parallel query execution
SET max_parallel_workers_per_gather = 4;

-- Check current settings
SHOW work_mem;
SHOW shared_buffers;
SHOW max_parallel_workers_per_gather;

📈 Part 5: Monitoring Queries

Find Missing Indexes

-- Queries with high sequential scans
SELECT
    schemaname,
    relname,
    seq_scan,
    seq_tup_read,
    idx_scan,
    CASE
        WHEN seq_scan > 1000 AND idx_scan = 0 THEN 'Missing Index'
        WHEN seq_scan > idx_scan THEN 'Consider Index'
        ELSE 'OK'
    END AS recommendation
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY seq_tup_read DESC;

-- Unused indexes
SELECT
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;

Track Query Performance Over Time

-- Create performance tracking table
CREATE TABLE query_performance (
    log_id SERIAL PRIMARY KEY,
    query_name VARCHAR(100),
    execution_time_ms DECIMAL(10, 2),
    rows_returned INT,
    buffers_hit INT,
    buffers_read INT,
    logged_at TIMESTAMP DEFAULT NOW()
);

-- Log performance
INSERT INTO query_performance (query_name, execution_time_ms, rows_returned)
SELECT 'customer_report', EXTRACT(MILLISECONDS FROM clock_timestamp() - statement_timestamp()), COUNT(*)
FROM orders_large WHERE order_date >= '2024-01-01';

-- Track performance trends
SELECT
    DATE_TRUNC('day', logged_at) AS day,
    AVG(execution_time_ms) AS avg_time,
    MAX(execution_time_ms) AS max_time,
    MIN(execution_time_ms) AS min_time
FROM query_performance
WHERE query_name = 'customer_report'
GROUP BY DATE_TRUNC('day', logged_at)
ORDER BY day DESC;

🎯 Quiz Section

🏆 Best Practices for Interviews

💡✅ Performance Tuning Best Practices

1. Measure Before and After:

-- Always establish baseline
EXPLAIN (ANALYZE, BUFFERS, SUMMARY)
SELECT ...;

-- Apply optimization

-- Verify improvement
EXPLAIN (ANALYZE, BUFFERS, SUMMARY)
SELECT ...;

2. One Change at a Time:

-- Don't make multiple changes simultaneously
-- Test each optimization independently
-- Measure the impact of each change

3. Focus on Expensive Operations:

-- Look for:
-- - Sequential scans on large tables
-- - Nested loops with high row estimates
-- - External merge sorts
-- - High buffer reads (disk I/O)

4. Consider the Whole System:

-- A query might be slow due to:
-- - Missing indexes
-- - Stale statistics
-- - Insufficient memory
-- - Lock contention
-- - Concurrent workloads

5. Document Everything:

-- Keep a performance log
-- Record what was slow, what you changed, and the result
-- This helps with future debugging

⚠️⚠️ Common Tuning Mistakes

Over-indexing: Too many indexes slow down writes
Ignoring statistics: Stale stats lead to bad plans
Premature optimization: Always measure first
Not testing under load: Single-user performance ≠ production
Forgetting about caching: Repeated queries may be cached

Topic: SQL Performance Tuning for FAANG Interviews