πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: CTEs and Recursive Queries for FAANG Interviews

SQL AdvancedCTEs and Recursion⭐ Premium

Advertisement

πŸ”„ CTEs & Recursive Queries

Amazon & Microsoft Interview Deep Dive

🏒 Amazon🏒 Microsoft⚑ Difficulty: Hard⏱️ 50 min

πŸ“‹ Interview Question

β„ΉοΈπŸ”΄ Amazon/Microsoft Interview Question

"Given an employees table with employee_id, name, and manager_id, write a query to find the management hierarchy β€” the full reporting chain from each employee up to the CEO. Also find all subordinates of a specific manager and calculate the total number of direct and indirect reports for each manager."

Companies: Amazon, Microsoft | Difficulty: Hard | Time: 50 minutes

πŸ“Š Setup: Organizational Data

CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    employee_name VARCHAR(100) NOT NULL,
    manager_id INT REFERENCES employees(employee_id),
    department VARCHAR(50),
    position VARCHAR(50),
    salary DECIMAL(10, 2),
    hire_date DATE
);

-- Insert hierarchical data (CEO at top)
INSERT INTO employees (employee_id, employee_name, manager_id, department, position, salary, hire_date) VALUES
(1, 'Sarah Chen (CEO)', NULL, 'Executive', 'CEO', 300000, '2015-01-01'),
(2, 'Michael Park', 1, 'Engineering', 'VP Engineering', 250000, '2016-03-15'),
(3, 'Jennifer Wu', 1, 'Marketing', 'VP Marketing', 220000, '2016-06-01'),
(4, 'David Kim', 1, 'Sales', 'VP Sales', 230000, '2017-01-10'),
(5, 'Emily Rodriguez', 2, 'Engineering', 'Director', 180000, '2017-04-20'),
(6, 'James Thompson', 2, 'Engineering', 'Director', 175000, '2017-07-15'),
(7, 'Lisa Wang', 3, 'Marketing', 'Director', 165000, '2018-01-05'),
(8, 'Robert Chen', 4, 'Sales', 'Director', 170000, '2018-03-10'),
(9, 'Amanda Foster', 5, 'Engineering', 'Senior Engineer', 140000, '2018-06-01'),
(10, 'Chris Martinez', 5, 'Engineering', 'Senior Engineer', 138000, '2018-08-15'),
(11, 'Nicole Adams', 6, 'Engineering', 'Engineer', 120000, '2019-01-10'),
(12, 'Kevin Lee', 6, 'Engineering', 'Engineer', 118000, '2019-04-20'),
(13, 'Rachel Green', 7, 'Marketing', 'Specialist', 95000, '2019-06-01'),
(14, 'Tom Wilson', 8, 'Sales', 'Account Executive', 85000, '2019-09-10'),
(15, 'Maria Garcia', 9, 'Engineering', 'Junior Engineer', 95000, '2020-01-15');

πŸ”„ Part 1: Basic CTEs (Non-Recursive)

Why CTEs Over Subqueries?

πŸ’‘πŸ’‘ CTE vs Subquery

CTEs provide:

  • Readability: Named, reusable result sets
  • Modularity: Break complex queries into logical steps
  • Recursion: Enable recursive hierarchical queries
  • Reference: Multiple references to same CTE without re-execution (in most databases)
-- CTE: Calculate department statistics
WITH department_stats AS (
    SELECT
        department,
        COUNT(*) AS employee_count,
        AVG(salary) AS avg_salary,
        SUM(salary) AS total_salary
    FROM employees
    WHERE manager_id IS NOT NULL  -- Exclude CEO
    GROUP BY department
),
salary_bands AS (
    SELECT
        employee_id,
        employee_name,
        department,
        salary,
        CASE
            WHEN salary >= 200000 THEN 'Executive'
            WHEN salary >= 150000 THEN 'Senior'
            WHEN salary >= 100000 THEN 'Mid-Level'
            ELSE 'Junior'
        END AS salary_band
    FROM employees
)
SELECT
    sb.employee_name,
    sb.department,
    sb.salary,
    sb.salary_band,
    ds.employee_count AS dept_size,
    ROUND(sb.salary * 100.0 / ds.total_salary, 2) AS pct_of_dept_total
FROM salary_bands sb
JOIN department_stats ds ON sb.department = ds.department
ORDER BY sb.department, sb.salary DESC;

Multiple CTEs for Complex Analysis

WITH
-- Step 1: Find direct reports count
direct_reports AS (
    SELECT
        manager_id,
        COUNT(*) AS direct_report_count
    FROM employees
    WHERE manager_id IS NOT NULL
    GROUP BY manager_id
),
-- Step 2: Find total team size (direct + indirect)
team_sizes AS (
    SELECT
        manager_id,
        SUM(direct_report_count) AS total_reports
    FROM direct_reports
    GROUP BY manager_id
),
-- Step 3: Calculate management metrics
manager_metrics AS (
    SELECT
        e.employee_id,
        e.employee_name,
        e.department,
        e.salary,
        COALESCE(dr.direct_report_count, 0) AS direct_reports,
        COALESCE(ts.total_reports, 0) AS total_reports,
        e.salary / NULLIF(COALESCE(ts.total_reports, 0), 0) AS cost_per_report
    FROM employees e
    LEFT JOIN direct_reports dr ON e.employee_id = dr.manager_id
    LEFT JOIN team_sizes ts ON e.employee_id = ts.manager_id
)
SELECT
    employee_name,
    department,
    salary,
    direct_reports,
    total_reports,
    ROUND(COALESCE(cost_per_report, 0), 2) AS cost_per_report,
    CASE
        WHEN total_reports > 10 THEN 'Large Team'
        WHEN total_reports > 5 THEN 'Medium Team'
        WHEN total_reports > 0 THEN 'Small Team'
        ELSE 'Individual Contributor'
    END AS team_size_category
FROM manager_metrics
ORDER BY total_reports DESC;

🌳 Part 2: Recursive CTEs

β„ΉοΈπŸ” Recursive CTE Structure

Recursive CTEs have two parts:

  1. Anchor member: Initial query (base case)
  2. Recursive member: References the CTE itself (recursive step)
  3. Termination: Implicit (when no more rows) or explicit (LIMIT/depth check)

Basic Recursive Hierarchy

-- Find complete management chain for each employee
WITH RECURSIVE management_chain AS (
    -- Anchor: Start with employees (non-CEOs)
    SELECT
        employee_id,
        employee_name,
        manager_id,
        department,
        1 AS level,
        employee_name::TEXT AS chain,
        ARRAY[employee_id] AS path
    FROM employees
    WHERE manager_id IS NOT NULL

    UNION ALL

    -- Recursive: Join back to find managers
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        e.department,
        mc.level + 1,
        mc.chain || ' β†’ ' || e.employee_name,
        mc.path || e.employee_id
    FROM employees e
    INNER JOIN management_chain mc ON e.employee_id = mc.manager_id
    WHERE mc.level < 20  -- Safety: prevent infinite loops
)
SELECT
    employee_id,
    employee_name,
    department,
    level AS hierarchy_depth,
    chain AS reporting_chain,
    path AS employee_path
FROM management_chain
WHERE manager_id IS NULL  -- Only full chains to CEO
ORDER BY employee_id;

Finding All Subordinates of a Manager

-- Find all subordinates (direct and indirect) of Employee ID 2
WITH RECURSIVE subordinates AS (
    -- Anchor: Direct reports
    SELECT
        employee_id,
        employee_name,
        manager_id,
        department,
        position,
        1 AS depth
    FROM employees
    WHERE manager_id = 2  -- Michael Park's direct reports

    UNION ALL

    -- Recursive: Reports of reports
    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        e.department,
        e.position,
        s.depth + 1
    FROM employees e
    INNER JOIN subordinates s ON e.manager_id = s.employee_id
)
SELECT
    employee_id,
    employee_name,
    department,
    position,
    depth AS reporting_depth
FROM subordinates
ORDER BY depth, employee_name;

Counting Total Reports Per Manager

-- Calculate total number of direct and indirect reports for each manager
WITH RECURSIVE all_reports AS (
    -- Anchor: All employees with managers
    SELECT
        employee_id,
        manager_id,
        employee_id AS original_manager,
        1 AS distance
    FROM employees
    WHERE manager_id IS NOT NULL

    UNION ALL

    -- Recursive: Find who reports to the manager's reports
    SELECT
        ar.employee_id,
        e.manager_id,
        ar.original_manager,
        ar.distance + 1
    FROM all_reports ar
    INNER JOIN employees e ON ar.manager_id = e.employee_id
    WHERE ar.distance < 20
),
report_counts AS (
    SELECT
        original_manager AS manager_id,
        COUNT(DISTINCT employee_id) AS total_indirect_reports
    FROM all_reports
    GROUP BY original_manager
)
SELECT
    e.employee_id,
    e.employee_name,
    e.department,
    e.position,
    COALESCE(rc.total_indirect_reports, 0) AS indirect_reports,
    (SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) AS direct_reports,
    COALESCE(rc.total_indirect_reports, 0) +
        (SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) AS total_reports
FROM employees e
LEFT JOIN report_counts rc ON e.employee_id = rc.manager_id
WHERE (SELECT COUNT(*) FROM employees WHERE manager_id = e.employee_id) > 0
ORDER BY total_reports DESC;

πŸ—ΊοΈ Part 3: Graph Traversal with Recursive CTEs

⚠️⚠️ Graph Cycle Detection

When traversing graphs (not trees), you must detect cycles to prevent infinite recursion. Use an array or set to track visited nodes.

Shortest Path in a Graph

-- Create a social network graph
CREATE TABLE friendships (
    user_id INT,
    friend_id INT,
    since_date DATE
);

INSERT INTO friendships VALUES
(1, 2, '2020-01-01'), (1, 3, '2020-02-15'),
(2, 4, '2020-03-01'), (3, 5, '2020-04-10'),
(4, 6, '2020-05-20'), (5, 6, '2020-06-01'),
(6, 7, '2020-07-15'), (7, 8, '2020-08-01'),
(8, 1, '2020-09-10');  -- Creates a cycle!

-- Find shortest path from user 1 to user 8
WITH RECURSIVE shortest_path AS (
    -- Anchor: Direct connections
    SELECT
        user_id,
        friend_id,
        ARRAY[user_id, friend_id] AS path,
        1 AS distance
    FROM friendships
    WHERE user_id = 1

    UNION

    -- Recursive: Explore friends of friends
    SELECT
        sp.user_id,
        f.friend_id,
        sp.path || f.friend_id,
        sp.distance + 1
    FROM shortest_path sp
    INNER JOIN friendships f ON sp.friend_id = f.user_id
    WHERE f.friend_id != ALL(sp.path)  -- Cycle detection
    AND sp.distance < 10  -- Max depth
)
SELECT
    user_id AS start_user,
    friend_id AS end_user,
    path AS shortest_path,
    distance
FROM shortest_path
WHERE friend_id = 8
ORDER BY distance
LIMIT 1;

πŸ“Š Part 4: Hierarchical Aggregation

Organizational Cost Analysis

-- Calculate total cost of each management subtree
WITH RECURSIVE org_tree AS (
    -- Anchor: Individual employees
    SELECT
        employee_id,
        employee_name,
        manager_id,
        department,
        salary,
        employee_id AS subtree_root,
        salary AS subtree_cost,
        1 AS subtree_size,
        employee_name::TEXT AS subtree_members
    FROM employees

    UNION ALL

    -- Recursive: Aggregate upward
    SELECT
        ot.employee_id,
        ot.employee_name,
        ot.manager_id,
        ot.department,
        ot.salary,
        ot.subtree_root,
        ot.subtree_cost + e.salary,
        ot.subtree_size + 1,
        ot.subtree_members || ', ' || e.employee_name
    FROM org_tree ot
    INNER JOIN employees e ON ot.manager_id = e.employee_id
    WHERE ot.subtree_size < 100  -- Safety limit
)
SELECT DISTINCT
    employee_id,
    employee_name,
    department,
    subtree_size - 1 AS total_reports,
    subtree_cost - salary AS total_team_cost,
    ROUND(subtree_cost * 100.0 / (SELECT SUM(salary) FROM employees), 2) AS pct_of_total_cost
FROM org_tree
WHERE subtree_size > 1
ORDER BY total_team_cost DESC;

πŸ”— Part 5: Multiple Recursive Members

πŸ’‘πŸ’‘ Multiple Anchor Members

You can have multiple anchor members united by UNION ALL. This is useful for finding paths from multiple starting points or handling different node types.

-- Find all paths between two specific employees (bidirectional)
WITH RECURSIVE all_paths AS (
    -- Forward direction
    SELECT
        user_id AS start_node,
        friend_id AS end_node,
        ARRAY[user_id, friend_id] AS path,
        1 AS path_length
    FROM friendships
    WHERE user_id = 1

    UNION

    -- Backward direction
    SELECT
        friend_id AS start_node,
        user_id AS end_node,
        ARRAY[friend_id, user_id] AS path,
        1 AS path_length
    FROM friendships
    WHERE friend_id = 1

    UNION ALL

    -- Extend paths
    SELECT
        ap.start_node,
        CASE
            WHEN f.user_id = ap.end_node THEN f.friend_id
            ELSE f.user_id
        END,
        ap.path || CASE
            WHEN f.user_id = ap.end_node THEN f.friend_id
            ELSE f.user_id
        END,
        ap.path_length + 1
    FROM all_paths ap
    INNER JOIN friendships f ON f.user_id = ap.end_node OR f.friend_id = ap.end_node
    WHERE NOT (
        CASE
            WHEN f.user_id = ap.end_node THEN f.friend_id
            ELSE f.user_id
        END = ANY(ap.path)
    )
    AND ap.path_length < 6
)
SELECT
    start_node,
    end_node,
    path,
    path_length
FROM all_paths
WHERE end_node = 8
ORDER BY path_length
LIMIT 5;

πŸ—οΈ Part 6: Advanced Patterns

Materialized Path Pattern

-- Store materialized path for fast ancestor/descendant queries
ALTER TABLE employees ADD COLUMN materialized_path TEXT;

-- Populate materialized path using recursive CTE
WITH RECURSIVE path_builder AS (
    SELECT
        employee_id,
        employee_name,
        manager_id,
        '/' || employee_id || '/' AS materialized_path,
        1 AS depth
    FROM employees
    WHERE manager_id IS NULL

    UNION ALL

    SELECT
        e.employee_id,
        e.employee_name,
        e.manager_id,
        pb.materialized_path || e.employee_id || '/',
        pb.depth + 1
    FROM employees e
    INNER JOIN path_builder pb ON e.manager_id = pb.employee_id
)
UPDATE employees
SET materialized_path = pb.materialized_path,
    depth = pb.depth
FROM path_builder pb
WHERE employees.employee_id = pb.employee_id;

-- Fast ancestor query using materialized path
SELECT
    e.employee_id,
    e.employee_name,
    e.materialized_path
FROM employees e
WHERE '/1/2/5/' LIKE e.materialized_path || '%'
ORDER BY e.materialized_path;

-- Fast descendant query
SELECT
    e.employee_id,
    e.employee_name,
    e.materialized_path
FROM employees e
WHERE e.materialized_path LIKE '/1/2/%'
AND e.employee_id != 2
ORDER BY e.materialized_path;

Recursive Date Series Generation

-- Generate a date series for daily reporting
WITH RECURSIVE date_series AS (
    SELECT
        DATE '2024-01-01' AS report_date,
        1 AS day_number

    UNION ALL

    SELECT
        report_date + INTERVAL '1 day',
        day_number + 1
    FROM date_series
    WHERE report_date < DATE '2024-12-31'
)
SELECT
    report_date,
    day_number,
    EXTRACT(DOW FROM report_date) AS day_of_week,
    CASE EXTRACT(DOW FROM report_date)
        WHEN 0 THEN 'Sunday'
        WHEN 6 THEN 'Saturday'
        ELSE 'Weekday'
    END AS day_type
FROM date_series
WHERE EXTRACT(DOW FROM report_date) NOT IN (0, 6)  -- Weekdays only
ORDER BY report_date;

⏱️ Complexity Analysis

OperationTime ComplexitySpace ComplexityNotes
Simple CTEO(n)O(n)Result set materialization
Recursive CTE (Tree)O(n)O(depth)Stack depth = tree height
Recursive CTE (Graph)O(V + E)O(V)Vertices + Edges
Path FindingO(V!) worstO(V)Exponential without optimization
Materialized PathO(n) queryO(path_length)Fast lookups after O(n) build

🎯 Quiz Section

πŸ† Best Practices for Interviews

πŸ’‘βœ… CTE & Recursion Best Practices

1. Always Set Recursion Limits:

-- BAD: Potential infinite loop
WITH RECURSIVE cte AS (
    SELECT * FROM graph
    UNION ALL
    SELECT g.* FROM graph g JOIN cte ON ...
)

-- GOOD: Safety limit
WITH RECURSIVE cte AS (
    SELECT *, 1 AS depth FROM graph
    UNION ALL
    SELECT g.*, c.depth + 1
    FROM graph g JOIN cte ON ...
    WHERE c.depth < 100
)

2. Use UNION for Cycle Detection:

-- For graphs, use UNION to automatically deduplicate visited nodes
-- For trees, UNION ALL is safe and faster

-- Manual cycle detection with path tracking:
WHERE NOT (node_id = ANY(visited_path))

3. Break Complex Problems into CTEs:

-- Step-by-step approach
WITH
step1 AS (...),  -- Get raw data
step2 AS (...),  -- Filter/transform
step3 AS (...),  -- Aggregate
SELECT * FROM step3;

4. Consider Alternative Patterns:

  • Adjacency List: Simple but slow for deep hierarchies
  • Materialized Path: Fast reads, complex updates
  • Nested Sets: Fast ancestors/descendants, expensive inserts
  • Closure Table: Balanced read/write performance

πŸ”— Common Follow-Up Questions

  1. "How would you handle circular references?" β€” Track visited nodes in an array
  2. "What's the maximum recursion depth?" β€” Database-dependent (PostgreSQL: 300, SQL Server: 100)
  3. "Can recursive CTEs be used in views?" β€” Yes, but performance varies by database
  4. "How do you optimize recursive CTEs?" β€” Add indexes on join columns, limit depth

⚠️⚠️ Performance Warning

Recursive CTEs can be expensive for deep hierarchies. Consider materializing paths or using adjacency list with indexes for frequently queried hierarchies. Always explain the trade-offs to interviewers.

Advertisement