Interview Question: "Explain the difference between ROLLUP and CUBE. How do you pivot rows to columns without a CASE statement? What are GROUPING SETS and when would you use them?" — Asked at Tableau, Looker, Databricks for Analytics Engineer roles
ℹ️
Difficulty: Advanced | Companies: Tableau, Looker, Databricks, Snowflake, Google BigQuery | Time: 45-60 minutes
GROUPING SETS Fundamentals
GROUPING SETS allow multiple grouping specifications in one query:
-- Create sales data for analysis
CREATE TABLE regional_sales (
year INT,
quarter INT,
region VARCHAR(50),
product_category VARCHAR(50),
sales_amount DECIMAL(12,2),
quantity INT
);
INSERT INTO regional_sales VALUES
(2023, 1, 'North', 'Electronics', 150000.00, 1200),
(2023, 1, 'North', 'Clothing', 80000.00, 2400),
(2023, 1, 'South', 'Electronics', 120000.00, 960),
(2023, 1, 'South', 'Clothing', 95000.00, 2850),
(2023, 2, 'North', 'Electronics', 165000.00, 1320),
(2023, 2, 'North', 'Clothing', 75000.00, 2250),
(2023, 2, 'South', 'Electronics', 135000.00, 1080),
(2023, 2, 'South', 'Clothing', 88000.00, 2640),
(2024, 1, 'North', 'Electronics', 180000.00, 1440),
(2024, 1, 'North', 'Clothing', 85000.00, 2550),
(2024, 1, 'South', 'Electronics', 145000.00, 1160),
(2024, 1, 'South', 'Clothing', 92000.00, 2760);
-- GROUPING SETS: Multiple aggregations in one query
SELECT
year,
region,
product_category,
SUM(sales_amount) AS total_sales,
SUM(quantity) AS total_qty,
GROUPING(year) AS is_year_agg,
GROUPING(region) AS is_region_agg,
GROUPING(product_category) AS is_category_agg
FROM regional_sales
GROUP BY GROUPING SETS (
(year, region, product_category), -- Detail level
(year, region), -- Region subtotals
(year), -- Year totals
() -- Grand total
)
ORDER BY year, region, product_category;
Output:
| year | region | product_category | total_sales | total_qty | is_year_agg | is_region_agg | is_category_agg |
|---|---|---|---|---|---|---|---|
| 2023 | North | Clothing | 155000.00 | 4650 | 0 | 0 | 0 |
| 2023 | North | Electronics | 315000.00 | 2520 | 0 | 0 | 0 |
| 2023 | North | * | 470000.00 | 7170 | 0 | 0 | 1 |
| 2023 | South | Clothing | 183000.00 | 5490 | 0 | 0 | 0 |
| 2023 | South | Electronics | 255000.00 | 2040 | 0 | 0 | 0 |
| 2023 | South | * | 438000.00 | 7530 | 0 | 0 | 1 |
| 2023 | * | * | 908000.00 | 14700 | 0 | 1 | 1 |
| 2024 | North | Clothing | 85000.00 | 2550 | 0 | 0 | 0 |
| 2024 | North | Electronics | 180000.00 | 1440 | 0 | 0 | 0 |
| 2024 | North | * | 265000.00 | 3990 | 0 | 0 | 1 |
| 2024 | South | Clothing | 92000.00 | 2760 | 0 | 0 | 0 |
| 2024 | South | Electronics | 145000.00 | 1160 | 0 | 0 | 0 |
| 2024 | South | * | 237000.00 | 3920 | 0 | 0 | 1 |
| 2024 | * | * | 502000.00 | 7910 | 0 | 1 | 1 |
| * | * | * | 1410000.00 | 22610 | 1 | 1 | 1 |
ℹ️
GROUPING Function: Returns 1 if column is aggregated (NULL from grouping), 0 if column is part of the GROUP BY.
ROLLUP vs CUBE
-- ROLLUP: Hierarchical subtotals
SELECT
year,
quarter,
region,
SUM(sales_amount) AS total_sales
FROM regional_sales
GROUP BY ROLLUP(year, quarter, region)
ORDER BY year, quarter, region;
-- CUBE: All possible combinations
SELECT
year,
quarter,
region,
SUM(sales_amount) AS total_sales
FROM regional_sales
GROUP BY CUBE(year, quarter, region)
ORDER BY year, quarter, region;
ROLLUP Output:
| year | quarter | region | total_sales |
|---|---|---|---|
| 2023 | 1 | North | 230000.00 |
| 2023 | 1 | South | 215000.00 |
| 2023 | 1 | * | 445000.00 |
| 2023 | 2 | North | 240000.00 |
| 2023 | 2 | South | 223000.00 |
| 2023 | 2 | * | 463000.00 |
| 2023 | * | * | 908000.00 |
| 2024 | 1 | North | 265000.00 |
| 2024 | 1 | South | 237000.00 |
| 2024 | 1 | * | 502000.00 |
| 2024 | * | * | 502000.00 |
| * | * | * | 1410000.00 |
Pivoting Rows to Columns
-- Static pivot using conditional aggregation
SELECT
year,
SUM(CASE WHEN region = 'North' THEN sales_amount ELSE 0 END) AS north_sales,
SUM(CASE WHEN region = 'South' THEN sales_amount ELSE 0 END) AS south_sales,
SUM(CASE WHEN region = 'East' THEN sales_amount ELSE 0 END) AS east_sales,
SUM(CASE WHEN region = 'West' THEN sales_amount ELSE 0 END) AS west_sales,
SUM(sales_amount) AS total_sales
FROM regional_sales
WHERE year = 2024
GROUP BY year;
-- Pivot using crosstab (requires tablefunc extension)
CREATE EXTENSION IF NOT EXISTS tablefunc;
SELECT * FROM crosstab(
'SELECT year, region, SUM(sales_amount)::INT
FROM regional_sales
GROUP BY year, region
ORDER BY year, region',
'SELECT DISTINCT region FROM regional_sales ORDER BY region'
) AS ct(year INT, east INT, north INT, south INT, west INT);
Pivot Output:
| year | east_sales | north_sales | south_sales | west_sales | total_sales |
|---|---|---|---|---|---|
| 2023 | 0 | 470000.00 | 438000.00 | 0 | 908000.00 |
| 2024 | 0 | 265000.00 | 237000.00 | 0 | 502000.00 |
Unpivoting Columns to Rows
-- Create pivoted table
CREATE TABLE yearly_sales (
year INT,
north_sales DECIMAL(12,2),
south_sales DECIMAL(12,2),
east_sales DECIMAL(12,2),
west_sales DECIMAL(12,2)
);
INSERT INTO yearly_sales VALUES
(2023, 470000.00, 438000.00, 0, 0),
(2024, 265000.00, 237000.00, 0, 0);
-- Unpivot using UNION ALL
SELECT year, 'North' AS region, north_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'South' AS region, south_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'East' AS region, east_sales AS sales_amount
FROM yearly_sales
UNION ALL
SELECT year, 'West' AS region, west_sales AS sales_amount
FROM yearly_sales
ORDER BY year, region;
-- Unpivot using LATERAL JOIN
SELECT
y.year,
r.region,
r.sales_amount
FROM yearly_sales y
CROSS JOIN LATERAL (
VALUES
('North', y.north_sales),
('South', y.south_sales),
('East', y.east_sales),
('West', y.west_sales)
) AS r(region, sales_amount)
ORDER BY y.year, r.region;
Unpivot Output:
| year | region | sales_amount |
|---|---|---|
| 2023 | East | 0 |
| 2023 | North | 470000.00 |
| 2023 | South | 438000.00 |
| 2023 | West | 0 |
| 2024 | East | 0 |
| 2024 | North | 265000.00 |
| 2024 | South | 237000.00 |
| 2024 | West | 0 |
⚠️
Performance: Unpivoting with UNION ALL creates multiple scans. For large datasets, consider using LATERAL JOIN or stored procedures.
Dynamic Pivot with EXECUTE
-- Dynamic pivot for unknown number of columns
DO $$
DECLARE
sql TEXT;
regions TEXT;
BEGIN
-- Get distinct regions
SELECT STRING_AGG(DISTINCT region, ', ' ORDER BY region) INTO regions
FROM regional_sales;
-- Build dynamic query
sql := 'SELECT year, ';
sql := sql || STRING_AGG(
format('SUM(CASE WHEN region = %L THEN sales_amount ELSE 0 END) AS %I_sales',
region, LOWER(region)),
', '
);
sql := sql || ' FROM regional_sales GROUP BY year ORDER BY year';
-- Execute and return results
RAISE NOTICE '%', sql;
EXECUTE sql;
END $$;
Window Functions with GROUPING SETS
-- Combine window functions with GROUPING SETS
WITH regional_totals AS (
SELECT
year,
region,
SUM(sales_amount) AS total_sales
FROM regional_sales
GROUP BY GROUPING SETS ((year, region), (year), ())
)
SELECT
year,
region,
total_sales,
-- Rank within year
RANK() OVER (
PARTITION BY year
ORDER BY total_sales DESC
) AS rank_in_year,
-- Percentage of year total
ROUND(
total_sales * 100.0 /
SUM(total_sales) OVER (PARTITION BY year),
2
) AS pct_of_year,
-- Running total
SUM(total_sales) OVER (
ORDER BY year, region
ROWS UNBOUNDED PRECEDING
) AS running_total
FROM regional_totals
ORDER BY year, region NULLS LAST;
Mathematical Properties
For GROUPING SETS with columns:
The aggregation level hierarchy:
Advanced: LISTAGG and String Aggregation
-- Aggregate strings with LISTAGG
SELECT
year,
region,
LISTAGG(DISTINCT product_category, ', ') WITHIN GROUP (ORDER BY product_category) AS categories,
STRING_AGG(DISTINCT product_category, ' | ' ORDER BY product_category) AS categories_alt
FROM regional_sales
GROUP BY year, region
ORDER BY year, region;
Output:
| year | region | categories | categories_alt |
|---|---|---|---|
| 2023 | North | Clothing, Electronics | Clothing | Electronics |
| 2023 | South | Clothing, Electronics | Clothing | Electronics |
| 2024 | North | Clothing, Electronics | Clothing | Electronics |
| 2024 | South | Clothing, Electronics | Clothing | Electronics |
ℹ️
LISTAGG vs STRING_AGG: LISTAGG is SQL standard, STRING_AGG is PostgreSQL-specific. Both support DISTINCT, ORDER BY, and separators.
Pivot with Aggregation Functions
-- Pivot with multiple aggregation functions
SELECT
year,
region,
COUNT(*) AS transaction_count,
SUM(sales_amount) AS total_sales,
AVG(sales_amount) AS avg_sale,
MAX(sales_amount) AS max_sale,
MIN(sales_amount) AS min_sale,
STDDEV(sales_amount) AS stddev_sale
FROM regional_sales
GROUP BY year, region
ORDER BY year, region;
Performance Considerations
| Operation | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| GROUPING SETS | Sort + aggregate | ||
| ROLLUP | Hierarchical | ||
| CUBE | All combinations | ||
| PIVOT | Conditional | ||
| UNPIVOT | UNION ALL |
⚠️
Memory Warning: CUBE with many columns generates exponential grouping sets. Monitor memory usage with EXPLAIN ANALYZE.