EXPLAIN ANALYZE: Reading Query Plans, Bottlenecks, Costs

Interview Question: "Walk me through reading an EXPLAIN ANALYZE output. How do you identify a sequential scan bottleneck? What does 'Buffers' tell you?" — Asked at PostgreSQL, CitusData, Amazon Aurora for Database Engineer roles

ℹ️

Difficulty: Advanced | Companies: PostgreSQL, CitusData, Amazon Aurora, Google Cloud SQL, Azure SQL | Time: 60-75 minutes

EXPLAIN Options

-- Basic EXPLAIN
EXPLAIN SELECT * FROM orders WHERE customer_id = 1234;

-- EXPLAIN with ANALYZE (executes query)
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 1234;

-- Full analysis with BUFFERS
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) 
SELECT * FROM orders WHERE customer_id = 1234;

-- JSON output for programmatic analysis
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) 
SELECT * FROM orders WHERE customer_id = 1234;

-- Verbose output
EXPLAIN (ANALYZE, VERBOSE) 
SELECT * FROM orders WHERE customer_id = 1234;

Reading EXPLAIN Output

-- Create sample tables
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(255),
    region VARCHAR(20)
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT REFERENCES customers(customer_id),
    order_date DATE,
    total_amount DECIMAL(12,2),
    status VARCHAR(20)
);

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_date ON orders(order_date);

-- Analyze a simple query
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2024-01-01'
    AND o.total_amount > 1000;

Output Explanation:

Architecture Diagram

Hash Join  (cost=2567.89..19456.78 rows=62500 width=64) (actual time=12.345..89.012 rows=62489 loops=1)
  Hash Cond: (o.customer_id = c.customer_id)
  Buffers: shared hit=1234 read=567
  ->  Seq Scan on orders o  (cost=0.00..15678.90 rows=250000 width=16) (actual time=0.012..45.678 rows=250000 loops=1)
        Filter: ((order_date > '2024-01-01') AND (total_amount > 1000))
        Rows Removed by Filter: 750000
        Buffers: shared hit=890 read=345
  ->  Hash  (cost=1234.56..1234.56 rows=10000 width=48) (actual time=8.901..8.901 rows=10000 loops=1)
        Buckets: 16384  Batches: 1  Memory Usage: 1234kB
        Buffers: shared hit=344 read=222
        ->  Seq Scan on customers c  (cost=0.00..1234.56 rows=10000 width=48) (actual time=0.005..4.567 rows=10000 loops=1)
              Buffers: shared hit=344 read=222
Planning Time: 0.234 ms
Execution Time: 89.567 ms

Cost Model Breakdown

\text{Startup Cost} = \text{Cost before first row returned}

\text{Total Cost} = \text{Cost to complete execution}

\text{Cost per Row} = \frac{\text{Total Cost}}{\text{Estimated Rows}}

-- Understand cost calculations
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE customer_id = 1234;

-- Check different join strategies
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Compare with nested loop
SET enable_hashjoin = off;
SET enable_mergejoin = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

RESET enable_hashjoin;
RESET enable_mergejoin;

Identifying Bottlenecks

Common Bottleneck Patterns:

Sequential Scan on Large Table

Architecture Diagram

Seq Scan on orders  (cost=0.00..15678.90 rows=250000)
  Rows Removed by Filter: 750000

Nested Loop with High Row Estimate

Architecture Diagram

Nested Loop  (cost=0.00..1234567.89 rows=1000000)

Hash Join with High Memory

Architecture Diagram

Hash  (cost=123456.78..123456.78 rows=1000000)
  Buckets: 2097152  Batches: 4  Memory Usage: 65536kB

-- Identify missing indexes
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE status = 'pending' AND region = 'North';

-- Check for implicit type casts
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE customer_id = '1234';  -- String instead of int

-- Analyze subquery performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE customer_id IN (
    SELECT customer_id FROM customers WHERE region = 'North'
);

Buffer Analysis

\text{Cache Hit Ratio} = \frac{\text{shared hit}}{\text{shared hit} + \text{shared read}}

-- Analyze buffer usage
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE order_date > '2024-01-01';

-- Calculate I/O cost
WITH plan AS (
    SELECT 
        (SELECT SUM(shared_hit) FROM pg_stat_statements) AS total_hit,
        (SELECT SUM(shared_read) FROM pg_stat_statements) AS total_read
)
SELECT 
    total_hit,
    total_read,
    ROUND(total_hit::decimal / (total_hit + total_read) * 100, 2) AS hit_ratio,
    ROUND(total_read::decimal / (total_hit + total_read) * 100, 2) AS miss_ratio
FROM plan;

Execution Plan Visualization

-- Get JSON plan for visualization tools
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT o.*, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date > '2024-01-01';

-- Output for explain.dalibo.com or pganalyze
/*
{
  "Plan": {
    "Node Type": "Hash Join",
    "Parallel Aware": false,
    "Relation Name": "orders",
    "Alias": "o",
    "Startup Cost": 2567.89,
    "Total Cost": 19456.78,
    "Plan Rows": 62500,
    "Plan Width": 64,
    "Actual Startup Time": 12.345,
    "Actual Total Time": 89.012,
    "Actual Rows": 62489,
    "Actual Loops": 1,
    "Shared Hit Blocks": 1234,
    "Shared Read Blocks": 567
  }
}
*/

Cost Estimation Accuracy

-- Compare estimated vs actual rows
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders 
WHERE order_date BETWEEN '2024-01-01' AND '2024-03-31'
    AND total_amount > 5000;

-- Check statistics accuracy
SELECT 
    attname,
    n_distinct,
    most_common_vals,
    most_common_freqs,
    histogram_bounds
FROM pg_stats
WHERE tablename = 'orders'
    AND attname IN ('order_date', 'total_amount');

Estimation Error Formula:

\text{Error} = \frac{|\text{Estimated Rows} - \text{Actual Rows}|}{\text{Actual Rows}} \times 100\%

Advanced: Plan Comparison

-- Compare plans before/after index
-- Before index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';

-- Create index
CREATE INDEX idx_orders_status ON orders(status);

-- After index
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE status = 'pending';

-- Compare with different settings
SET work_mem = '64MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders GROUP BY status;

SET work_mem = '4MB';
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders GROUP BY status;

RESET work_mem;

Performance Metrics

Metric	Good	Warning	Critical
Cache Hit Ratio	> 99%	95-99%	< 95%
Seq Scan %	< 10%	10-30%	> 30%
Index Scan %	> 90%	70-90%	< 70%
Estimation Error	< 10%	10-50%	> 50%

-- Monitor query performance
SELECT 
    query,
    calls,
    total_time,
    mean_time,
    rows,
    shared_hit,
    shared_read
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;

-- Track slow queries
SELECT 
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';

ℹ️

Pro Tip: Use EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) with tools like pganalyze, explain.dalibo.com, or pgMustard for visual plan analysis.

Mathematical Cost Formulas

Sequential scan cost:

C_{seq} = P \times SPC + R \times CTC

Where:

$P$ = pages
$SPC$ = sequential page cost (default 1.0)
$R$ = rows
$CTC$ = CPU tuple cost (default 0.01)

Index scan cost:

C_{idx} = L \times RPC + R \times (COC + CTC)

Where:

$L$ = leaf pages
$RPC$ = random page cost (default 4.0)
$COC$ = CPU operator cost (default 0.0025)

Hash join cost:

C_{hash} = 3 \times (C_{build} + C_{probe})

⚠️

Common Mistake: Don't optimize based on cost numbers alone. Always compare actual execution times and buffer usage.