🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Recursive CTEs: Hierarchical Queries, Graph Traversal, Scheduling

Advanced SQLRecursive Queries⭐ Premium

Advertisement

Interview Question: "Write a recursive CTE to find all ancestors of a node in a hierarchy. How do you prevent infinite loops? What's the difference between depth-first and breadth-first traversal in SQL?" — Asked at Netflix, Spotify, Uber for Staff Engineer roles

ℹ️

Difficulty: Advanced | Companies: Netflix, Spotify, Uber, Airbnb, Stripe | Time: 45-60 minutes

Recursive CTE Anatomy

A recursive CTE has three components:

Recursive CTE=Anchor MemberRecursive MemberBase CaseInductive Step\text{Recursive CTE} = \text{Anchor Member} \cup \text{Recursive Member} \mid \text{Base Case} \cup \text{Inductive Step}
-- Create organizational hierarchy
CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    name VARCHAR(100),
    manager_id INT,
    title VARCHAR(100),
    hire_date DATE,
    salary DECIMAL(12,2)
);

INSERT INTO employees VALUES
(1, 'CEO', NULL, 'Chief Executive Officer', '2020-01-01', 250000.00),
(2, 'CTO', 1, 'Chief Technology Officer', '2020-02-01', 220000.00),
(3, 'CFO', 1, 'Chief Financial Officer', '2020-03-01', 200000.00),
(4, 'VP Eng', 2, 'VP of Engineering', '2020-04-01', 180000.00),
(5, 'VP Product', 2, 'VP of Product', '2020-05-01', 175000.00),
(6, 'Director', 4, 'Director of Engineering', '2021-01-01', 150000.00),
(7, 'Manager 1', 6, 'Engineering Manager', '2021-06-01', 130000.00),
(8, 'Manager 2', 6, 'Engineering Manager', '2021-07-01', 125000.00),
(9, 'Sr Dev 1', 7, 'Senior Developer', '2022-01-01', 120000.00),
(10, 'Sr Dev 2', 7, 'Senior Developer', '2022-02-01', 115000.00),
(11, 'Dev 1', 8, 'Developer', '2022-03-01', 100000.00),
(12, 'Dev 2', 8, 'Developer', '2022-04-01', 95000.00),
(13, 'Intern 1', 9, 'Intern', '2023-06-01', 50000.00);

Depth-First Traversal with Path

-- Recursive CTE for complete hierarchy with path
WITH RECURSIVE emp_hierarchy AS (
    -- Anchor: Top-level managers (CEO)
    SELECT 
        emp_id,
        name,
        manager_id,
        title,
        hire_date,
        salary,
        1 AS depth,
        name::TEXT AS path,
        ARRAY[emp_id] AS path_ids,
        name::TEXT AS search_path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    -- Recursive: Find all subordinates
    SELECT 
        e.emp_id,
        e.name,
        e.manager_id,
        e.title,
        e.hire_date,
        e.salary,
        h.depth + 1,
        h.path || ' → ' || e.name,
        h.path_ids || e.emp_id,
        h.search_path || '/' || e.name
    FROM employees e
    INNER JOIN emp_hierarchy h ON e.manager_id = h.emp_id
    WHERE h.depth < 10  -- Prevent infinite recursion
)
SELECT 
    emp_id,
    name,
    title,
    depth,
    path,
    salary,
    -- Calculate path length
    array_length(path_ids, 1) - 1 AS hierarchy_level,
    -- Check if leaf node
    CASE 
        WHEN NOT EXISTS (
            SELECT 1 FROM employees WHERE manager_id = emp_hierarchy.emp_id
        ) THEN 'LEAF'
        ELSE 'BRANCH'
    END AS node_type
FROM emp_hierarchy
ORDER BY path_ids;

Output:

emp_idnametitledepthpathsalaryhierarchy_levelnode_type
1CEOChief Executive Officer1CEO250000.000BRANCH
2CTOChief Technology Officer2CEO → CTO220000.001BRANCH
3CFOChief Financial Officer2CEO → CFO200000.001LEAF
4VP EngVP of Engineering3CEO → CTO → VP Eng180000.002BRANCH
5VP ProductVP of Product3CEO → CTO → VP Product175000.002LEAF
6DirectorDirector of Engineering4CEO → CTO → VP Eng → Director150000.003BRANCH
7Manager 1Engineering Manager5CEO → CTO → VP Eng → Director → Manager 1130000.004BRANCH
8Manager 2Engineering Manager5CEO → CTO → VP Eng → Director → Manager 2125000.004BRANCH
9Sr Dev 1Senior Developer6CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 1120000.005BRANCH
10Sr Dev 2Senior Developer6CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 2115000.005LEAF
11Dev 1Developer6CEO → CTO → VP Eng → Director → Manager 2 → Dev 1100000.005LEAF
12Dev 2Developer6CEO → CTO → VP Eng → Director → Manager 2 → Dev 295000.005LEAF
13Intern 1Intern7CEO → CTO → VP Eng → Director → Manager 1 → Sr Dev 1 → Intern 150000.006LEAF

⚠️

Infinite Recursion Prevention: Always include a depth limit or cycle detection. Without it, circular references cause infinite loops.

Breadth-First Search Pattern

-- BFS traversal with level-by-level processing
WITH RECURSIVE bfs_traversal AS (
    SELECT 
        emp_id,
        name,
        manager_id,
        1 AS level,
        ARRAY[emp_id] AS visited,
        name::TEXT AS level_path
    FROM employees
    WHERE manager_id IS NULL
    
    UNION ALL
    
    SELECT 
        e.emp_id,
        e.name,
        e.manager_id,
        b.level + 1,
        b.visited || e.emp_id,
        b.level_path || ' → ' || e.name
    FROM employees e
    INNER JOIN bfs_traversal b ON e.manager_id = b.emp_id
    WHERE e.emp_id != ALL(b.visited)  -- Cycle detection
)
SELECT 
    level,
    COUNT(*) AS nodes_at_level,
    STRING_AGG(name, ', ' ORDER BY name) AS members,
    MAX(salary) AS max_salary,
    ROUND(AVG(salary), 2) AS avg_salary
FROM bfs_traversal
JOIN employees USING (emp_id)
GROUP BY level
ORDER BY level;

Output:

levelnodes_at_levelmembersmax_salaryavg_salary
11CEO250000.00250000.00
22CFO, CTO220000.00210000.00
32VP Eng, VP Product180000.00177500.00
41Director150000.00150000.00
52Manager 1, Manager 2130000.00127500.00
64Dev 1, Dev 2, Sr Dev 1, Sr Dev 2120000.00107500.00
71Intern 150000.0050000.00

Graph Traversal: Shortest Path

-- Create graph table for network connections
CREATE TABLE network_connections (
    source_node INT,
    target_node INT,
    connection_cost DECIMAL(10,2),
    connection_type VARCHAR(50)
);

INSERT INTO network_connections VALUES
(1, 2, 10.00, 'direct'),
(1, 3, 15.00, 'direct'),
(2, 3, 5.00, 'indirect'),
(2, 4, 20.00, 'direct'),
(3, 4, 8.00, 'direct'),
(3, 5, 12.00, 'direct'),
(4, 5, 3.00, 'direct'),
(4, 6, 18.00, 'direct'),
(5, 6, 7.00, 'direct');

-- Find shortest path using recursive CTE
WITH RECURSIVE shortest_path AS (
    -- Start from source node 1
    SELECT 
        source_node,
        target_node,
        connection_cost AS total_cost,
        1 AS hops,
        ARRAY[source_node, target_node] AS path,
        source_node || ' → ' || target_node::TEXT AS path_text
    FROM network_connections
    WHERE source_node = 1
    
    UNION ALL
    
    -- Extend path by one hop
    SELECT 
        sp.source_node,
        nc.target_node,
        sp.total_cost + nc.connection_cost,
        sp.hops + 1,
        sp.path || nc.target_node,
        sp.path_text || ' → ' || nc.target_node
    FROM shortest_path sp
    INNER JOIN network_connections nc 
        ON sp.target_node = nc.source_node
    WHERE nc.target_node != ALL(sp.path)  -- No cycles
        AND sp.hops < 10  -- Max depth
)
SELECT DISTINCT ON (source_node, target_node)
    source_node,
    target_node,
    total_cost,
    hops,
    path,
    path_text
FROM shortest_path
WHERE target_node = 6  -- Destination
ORDER BY source_node, target_node, total_cost;

Output:

source_nodetarget_nodetotal_costhopspathpath_text
1630.003{1,2,4,6}1 → 2 → 4 → 6
1631.003{1,3,4,6}1 → 3 → 4 → 6
1634.004{1,2,3,4,6}1 → 2 → 3 → 4 → 6
1637.004{1,2,4,5,6}1 → 2 → 4 → 5 → 6

Scheduling with Recursive CTE

-- Create project tasks with dependencies
CREATE TABLE project_tasks (
    task_id INT PRIMARY KEY,
    task_name VARCHAR(100),
    duration_days INT,
    depends_on INT,
    priority INT
);

INSERT INTO project_tasks VALUES
(1, 'Requirements', 5, NULL, 1),
(2, 'Design', 10, 1, 2),
(3, 'Database Schema', 7, 2, 3),
(4, 'API Development', 15, 2, 3),
(5, 'Frontend Dev', 12, 2, 3),
(6, 'Integration', 8, 3, 4),
(7, 'Testing', 10, 4, 4),
(8, 'Deployment', 3, 5, 5),
(9, 'Documentation', 5, 4, 5),
(10, 'Go Live', 1, 6, 6);

-- Calculate critical path using recursive CTE
WITH RECURSIVE task_schedule AS (
    -- Anchor: Tasks with no dependencies
    SELECT 
        task_id,
        task_name,
        duration_days,
        depends_on,
        priority,
        0 AS start_day,
        duration_days AS end_day,
        ARRAY[task_id] AS dependency_chain,
        task_id::TEXT AS chain_text,
        0 AS max_chain_length
    FROM project_tasks
    WHERE depends_on IS NULL
    
    UNION ALL
    
    -- Recursive: Calculate start/end based on dependencies
    SELECT 
        t.task_id,
        t.task_name,
        t.duration_days,
        t.depends_on,
        t.priority,
        ts.end_day AS start_day,
        ts.end_day + t.duration_days AS end_day,
        ts.dependency_chain || t.task_id,
        ts.chain_text || ' → ' || t.task_name,
        array_length(ts.dependency_chain, 1)
    FROM project_tasks t
    INNER JOIN task_schedule ts ON t.depends_on = ts.task_id
    WHERE t.task_id != ALL(ts.dependency_chain)  -- No cycles
)
SELECT 
    task_id,
    task_name,
    duration_days,
    start_day,
    end_day,
    end_day - start_day AS actual_duration,
    dependency_chain,
    -- Calculate total project duration
    MAX(end_day) OVER () AS total_project_days,
    -- Identify critical path tasks
    CASE 
        WHEN end_day = MAX(end_day) OVER () THEN 'CRITICAL'
        WHEN end_day >= MAX(end_day) OVER () - 5 THEN 'NEAR CRITICAL'
        ELSE 'NORMAL'
    END AS status
FROM task_schedule
ORDER BY start_day, task_id;

Output:

task_idtask_nameduration_daysstart_dayend_dayactual_durationdependency_chaintotal_project_daysstatus
1Requirements5055{1}53CRITICAL
2Design1051510{1,2}53CRITICAL
3Database Schema715227{1,2,3}53CRITICAL
4API Development15153015{1,2,4}53CRITICAL
5Frontend Dev12152712{1,2,5}53NORMAL
6Integration830388{1,2,4,6}53CRITICAL
7Testing10304010{1,2,4,7}53NORMAL
8Deployment327303{1,2,5,8}53NORMAL
9Documentation530355{1,2,4,9}53NORMAL
10Go Live138391{1,2,4,6,10}53CRITICAL

ℹ️

Critical Path: The longest path through the project network determines minimum project duration. Tasks on this path have zero slack.

Cycle Detection Pattern

-- Detect cycles in graph
WITH RECURSIVE cycle_detection AS (
    SELECT 
        source_node,
        target_node,
        1 AS depth,
        ARRAY[source_node, target_node] AS path,
        FALSE AS has_cycle
    FROM network_connections
    
    UNION ALL
    
    SELECT 
        cd.source_node,
        nc.target_node,
        cd.depth + 1,
        cd.path || nc.target_node,
        nc.target_node = ANY(cd.path)  -- Cycle detected!
    FROM cycle_detection cd
    INNER JOIN network_connections nc ON cd.target_node = nc.source_node
    WHERE NOT cd.has_cycle
        AND cd.depth < 100  -- Safety limit
)
SELECT 
    source_node,
    target_node,
    depth,
    path,
    has_cycle
FROM cycle_detection
WHERE has_cycle = TRUE;

Mathematical Formulation

For a directed graph G=(V,E)G = (V, E), the shortest path problem seeks:

minpP(s,t)(u,v)pw(u,v)\min_{p \in P(s,t)} \sum_{(u,v) \in p} w(u,v)

Where P(s,t)P(s,t) is the set of all paths from ss to tt and w(u,v)w(u,v) is the edge weight.

The recursive CTE implements a modified Dijkstra's algorithm:

d(v)=min(u,v)E{d(u)+w(u,v)}d(v) = \min_{(u,v) \in E} \{d(u) + w(u,v)\}

Time complexity: O(VE)O(V \cdot E) in worst case with recursive CTE.

⚠️

Performance Warning: Recursive CTEs can be expensive for deep hierarchies. Consider materializing results or using iterative approaches for production systems.

Advertisement