🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Analytical SQL: Pivoting, Unpivoting, Grouping Sets, Cube

Advanced SQLAnalytical Patterns⭐ Premium

Advertisement

Interview Question: "Explain the difference between ROLLUP and CUBE. How do you pivot rows to columns without a CASE statement? What are GROUPING SETS and when would you use them?" — Asked at Tableau, Looker, Databricks for Analytics Engineer roles

ℹ️

Difficulty: Advanced | Companies: Tableau, Looker, Databricks, Snowflake, Google BigQuery | Time: 45-60 minutes

GROUPING SETS Fundamentals

GROUPING SETS allow multiple grouping specifications in one query:

GROUPING SETS={(A),(B),(A,B),()}\text{GROUPING SETS} = \{(A), (B), (A,B), ()\}
-- Create sales data for analysis
CREATE TABLE regional_sales (
    year INT,
    quarter INT,
    region VARCHAR(50),
    product_category VARCHAR(50),
    sales_amount DECIMAL(12,2),
    quantity INT
);

INSERT INTO regional_sales VALUES
(2023, 1, 'North', 'Electronics', 150000.00, 1200),
(2023, 1, 'North', 'Clothing', 80000.00, 2400),
(2023, 1, 'South', 'Electronics', 120000.00, 960),
(2023, 1, 'South', 'Clothing', 95000.00, 2850),
(2023, 2, 'North', 'Electronics', 165000.00, 1320),
(2023, 2, 'North', 'Clothing', 75000.00, 2250),
(2023, 2, 'South', 'Electronics', 135000.00, 1080),
(2023, 2, 'South', 'Clothing', 88000.00, 2640),
(2024, 1, 'North', 'Electronics', 180000.00, 1440),
(2024, 1, 'North', 'Clothing', 85000.00, 2550),
(2024, 1, 'South', 'Electronics', 145000.00, 1160),
(2024, 1, 'South', 'Clothing', 92000.00, 2760);

-- GROUPING SETS: Multiple aggregations in one query
SELECT 
    year,
    region,
    product_category,
    SUM(sales_amount) AS total_sales,
    SUM(quantity) AS total_qty,
    GROUPING(year) AS is_year_agg,
    GROUPING(region) AS is_region_agg,
    GROUPING(product_category) AS is_category_agg
FROM regional_sales
GROUP BY GROUPING SETS (
    (year, region, product_category),  -- Detail level
    (year, region),                    -- Region subtotals
    (year),                            -- Year totals
    ()                                 -- Grand total
)
ORDER BY year, region, product_category;

Output:

yearregionproduct_categorytotal_salestotal_qtyis_year_aggis_region_aggis_category_agg
2023NorthClothing155000.004650000
2023NorthElectronics315000.002520000
2023North*470000.007170001
2023SouthClothing183000.005490000
2023SouthElectronics255000.002040000
2023South*438000.007530001
2023**908000.0014700011
2024NorthClothing85000.002550000
2024NorthElectronics180000.001440000
2024North*265000.003990001
2024SouthClothing92000.002760000
2024SouthElectronics145000.001160000
2024South*237000.003920001
2024**502000.007910011
***1410000.0022610111

ℹ️

GROUPING Function: Returns 1 if column is aggregated (NULL from grouping), 0 if column is part of the GROUP BY.

ROLLUP vs CUBE

ROLLUP(A,B,C)={(A,B,C),(A,B),(A),()}\text{ROLLUP}(A, B, C) = \{(A,B,C), (A,B), (A), ()\}
CUBE(A,B,C)={(A,B,C),(A,B),(A,C),(B,C),(A),(B),(C),()}\text{CUBE}(A, B, C) = \{(A,B,C), (A,B), (A,C), (B,C), (A), (B), (C), ()\}
-- ROLLUP: Hierarchical subtotals
SELECT 
    year,
    quarter,
    region,
    SUM(sales_amount) AS total_sales
FROM regional_sales
GROUP BY ROLLUP(year, quarter, region)
ORDER BY year, quarter, region;

-- CUBE: All possible combinations
SELECT 
    year,
    quarter,
    region,
    SUM(sales_amount) AS total_sales
FROM regional_sales
GROUP BY CUBE(year, quarter, region)
ORDER BY year, quarter, region;

ROLLUP Output:

yearquarterregiontotal_sales
20231North230000.00
20231South215000.00
20231*445000.00
20232North240000.00
20232South223000.00
20232*463000.00
2023**908000.00
20241North265000.00
20241South237000.00
20241*502000.00
2024**502000.00
***1410000.00

Pivoting Rows to Columns

-- Static pivot using conditional aggregation
SELECT 
    year,
    SUM(CASE WHEN region = 'North' THEN sales_amount ELSE 0 END) AS north_sales,
    SUM(CASE WHEN region = 'South' THEN sales_amount ELSE 0 END) AS south_sales,
    SUM(CASE WHEN region = 'East' THEN sales_amount ELSE 0 END) AS east_sales,
    SUM(CASE WHEN region = 'West' THEN sales_amount ELSE 0 END) AS west_sales,
    SUM(sales_amount) AS total_sales
FROM regional_sales
WHERE year = 2024
GROUP BY year;

-- Pivot using crosstab (requires tablefunc extension)
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT * FROM crosstab(
    'SELECT year, region, SUM(sales_amount)::INT 
     FROM regional_sales 
     GROUP BY year, region 
     ORDER BY year, region',
    'SELECT DISTINCT region FROM regional_sales ORDER BY region'
) AS ct(year INT, east INT, north INT, south INT, west INT);

Pivot Output:

yeareast_salesnorth_salessouth_saleswest_salestotal_sales
20230470000.00438000.000908000.00
20240265000.00237000.000502000.00

Unpivoting Columns to Rows

-- Create pivoted table
CREATE TABLE yearly_sales (
    year INT,
    north_sales DECIMAL(12,2),
    south_sales DECIMAL(12,2),
    east_sales DECIMAL(12,2),
    west_sales DECIMAL(12,2)
);

INSERT INTO yearly_sales VALUES
(2023, 470000.00, 438000.00, 0, 0),
(2024, 265000.00, 237000.00, 0, 0);

-- Unpivot using UNION ALL
SELECT year, 'North' AS region, north_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'South' AS region, south_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'East' AS region, east_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'West' AS region, west_sales AS sales_amount
FROM yearly_sales
ORDER BY year, region;

-- Unpivot using LATERAL JOIN
SELECT 
    y.year,
    r.region,
    r.sales_amount
FROM yearly_sales y
CROSS JOIN LATERAL (
    VALUES 
        ('North', y.north_sales),
        ('South', y.south_sales),
        ('East', y.east_sales),
        ('West', y.west_sales)
) AS r(region, sales_amount)
ORDER BY y.year, r.region;

Unpivot Output:

yearregionsales_amount
2023East0
2023North470000.00
2023South438000.00
2023West0
2024East0
2024North265000.00
2024South237000.00
2024West0

⚠️

Performance: Unpivoting with UNION ALL creates multiple scans. For large datasets, consider using LATERAL JOIN or stored procedures.

Dynamic Pivot with EXECUTE

-- Dynamic pivot for unknown number of columns
DO $$
DECLARE
    sql TEXT;
    regions TEXT;
BEGIN
    -- Get distinct regions
    SELECT STRING_AGG(DISTINCT region, ', ' ORDER BY region) INTO regions
    FROM regional_sales;
    
    -- Build dynamic query
    sql := 'SELECT year, ';
    sql := sql || STRING_AGG(
        format('SUM(CASE WHEN region = %L THEN sales_amount ELSE 0 END) AS %I_sales', 
               region, LOWER(region)),
        ', '
    );
    sql := sql || ' FROM regional_sales GROUP BY year ORDER BY year';
    
    -- Execute and return results
    RAISE NOTICE '%', sql;
    EXECUTE sql;
END $$;

Window Functions with GROUPING SETS

-- Combine window functions with GROUPING SETS
WITH regional_totals AS (
    SELECT 
        year,
        region,
        SUM(sales_amount) AS total_sales
    FROM regional_sales
    GROUP BY GROUPING SETS ((year, region), (year), ())
)
SELECT 
    year,
    region,
    total_sales,
    -- Rank within year
    RANK() OVER (
        PARTITION BY year 
        ORDER BY total_sales DESC
    ) AS rank_in_year,
    -- Percentage of year total
    ROUND(
        total_sales * 100.0 / 
        SUM(total_sales) OVER (PARTITION BY year),
        2
    ) AS pct_of_year,
    -- Running total
    SUM(total_sales) OVER (
        ORDER BY year, region
        ROWS UNBOUNDED PRECEDING
    ) AS running_total
FROM regional_totals
ORDER BY year, region NULLS LAST;

Mathematical Properties

For GROUPING SETS with nn columns:

Number of grouping sets=2n (for CUBE)\text{Number of grouping sets} = 2^n \text{ (for CUBE)}
Number of grouping sets=n+1 (for ROLLUP)\text{Number of grouping sets} = n + 1 \text{ (for ROLLUP)}

The aggregation level hierarchy:

Level(S)=Swhere S{A1,A2,...,An}\text{Level}(S) = |S| \quad \text{where } S \subseteq \{A_1, A_2, ..., A_n\}

Advanced: LISTAGG and String Aggregation

-- Aggregate strings with LISTAGG
SELECT 
    year,
    region,
    LISTAGG(DISTINCT product_category, ', ') WITHIN GROUP (ORDER BY product_category) AS categories,
    STRING_AGG(DISTINCT product_category, ' | ' ORDER BY product_category) AS categories_alt
FROM regional_sales
GROUP BY year, region
ORDER BY year, region;

Output:

yearregioncategoriescategories_alt
2023NorthClothing, ElectronicsClothing | Electronics
2023SouthClothing, ElectronicsClothing | Electronics
2024NorthClothing, ElectronicsClothing | Electronics
2024SouthClothing, ElectronicsClothing | Electronics

ℹ️

LISTAGG vs STRING_AGG: LISTAGG is SQL standard, STRING_AGG is PostgreSQL-specific. Both support DISTINCT, ORDER BY, and separators.

Pivot with Aggregation Functions

-- Pivot with multiple aggregation functions
SELECT 
    year,
    region,
    COUNT(*) AS transaction_count,
    SUM(sales_amount) AS total_sales,
    AVG(sales_amount) AS avg_sale,
    MAX(sales_amount) AS max_sale,
    MIN(sales_amount) AS min_sale,
    STDDEV(sales_amount) AS stddev_sale
FROM regional_sales
GROUP BY year, region
ORDER BY year, region;

Performance Considerations

OperationTime ComplexitySpace ComplexityNotes
GROUPING SETSO(nlogn)O(n \log n)O(n)O(n)Sort + aggregate
ROLLUPO(nlogn)O(n \log n)O(n)O(n)Hierarchical
CUBEO(nlogn)O(n \log n)O(2k)O(2^k)All combinations
PIVOTO(n)O(n)O(n)O(n)Conditional
UNPIVOTO(nk)O(n \cdot k)O(nk)O(n \cdot k)UNION ALL

⚠️

Memory Warning: CUBE with many columns generates exponential grouping sets. Monitor memory usage with EXPLAIN ANALYZE.

Advertisement