Interview Question: "Explain the difference between running total and moving average. How do you calculate year-over-year growth using window functions? What are the performance implications?" — Asked at Tableau, Looker, Databricks for Analytics roles
ℹ️
Difficulty: Advanced | Companies: Tableau, Looker, Databricks, Snowflake, Google BigQuery | Time: 45-60 minutes
Running Totals
-- Create sales data
CREATE TABLE daily_sales (
sale_date DATE PRIMARY KEY,
revenue DECIMAL(12,2),
units_sold INT
);
INSERT INTO daily_sales VALUES
('2024-01-01', 15000.00, 150),
('2024-01-02', 18000.00, 180),
('2024-01-03', 16500.00, 165),
('2024-01-04', 21000.00, 210),
('2024-01-05', 19500.00, 195),
('2024-01-08', 22000.00, 220),
('2024-01-09', 24000.00, 240),
('2024-01-10', 20500.00, 205),
('2024-01-11', 25500.00, 255),
('2024-01-12', 27000.00, 270);
-- Running total
SELECT
sale_date,
revenue,
SUM(revenue) OVER (ORDER BY sale_date) AS running_total,
SUM(units_sold) OVER (ORDER BY sale_date) AS running_units,
-- Running average
AVG(revenue) OVER (ORDER BY sale_date) AS running_avg
FROM daily_sales
ORDER BY sale_date;
Output:
| sale_date | revenue | running_total | running_units | running_avg |
|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 15000.00 | 150 | 15000.00 |
| 2024-01-02 | 18000.00 | 33000.00 | 330 | 16500.00 |
| 2024-01-03 | 16500.00 | 49500.00 | 495 | 16500.00 |
| 2024-01-04 | 21000.00 | 70500.00 | 705 | 17625.00 |
| 2024-01-05 | 19500.00 | 90000.00 | 900 | 18000.00 |
| 2024-01-08 | 22000.00 | 112000.00 | 1120 | 18666.67 |
| 2024-01-09 | 24000.00 | 136000.00 | 1360 | 19428.57 |
| 2024-01-10 | 20500.00 | 156500.00 | 1565 | 19562.50 |
| 2024-01-11 | 25500.00 | 182000.00 | 1820 | 20222.22 |
| 2024-01-12 | 27000.00 | 209000.00 | 2090 | 20900.00 |
Moving Averages
-- Simple Moving Average (SMA)
SELECT
sale_date,
revenue,
-- 3-day SMA
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS sma_3day,
-- 5-day SMA
AVG(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 4 PRECEDING AND CURRENT ROW
) AS sma_5day,
-- Weighted Moving Average
SUM(revenue * weight) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) / SUM(weight) OVER (
ORDER BY sale_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS wma_3day
FROM daily_sales
ORDER BY sale_date;
-- Exponential Moving Average (EMA)
WITH ema_calc AS (
SELECT
sale_date,
revenue,
0.3 AS alpha, -- Smoothing factor
LAG(revenue) OVER (ORDER BY sale_date) AS prev_revenue
FROM daily_sales
)
SELECT
sale_date,
revenue,
CASE
WHEN prev_revenue IS NULL THEN revenue
ELSE alpha * revenue + (1 - alpha) * prev_revenue
END AS ema
FROM ema_calc
ORDER BY sale_date;
Cumulative Distribution
-- Cumulative distribution
SELECT
sale_date,
revenue,
CUME_DIST() OVER (ORDER BY revenue) AS cumulative_dist,
PERCENT_RANK() OVER (ORDER BY revenue) AS percentile_rank,
NTILE(4) OVER (ORDER BY revenue) AS quartile,
NTILE(10) OVER (ORDER BY revenue) AS decile
FROM daily_sales
ORDER BY sale_date;
Output:
| sale_date | revenue | cumulative_dist | percentile_rank | quartile | decile |
|---|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 0.10 | 0.00 | 1 | 1 |
| 2024-01-02 | 18000.00 | 0.20 | 0.11 | 1 | 2 |
| 2024-01-03 | 16500.00 | 0.30 | 0.22 | 2 | 3 |
| 2024-01-04 | 21000.00 | 0.40 | 0.33 | 2 | 4 |
| 2024-01-05 | 19500.00 | 0.50 | 0.44 | 3 | 5 |
| 2024-01-08 | 22000.00 | 0.60 | 0.55 | 3 | 6 |
| 2024-01-09 | 24000.00 | 0.70 | 0.66 | 4 | 7 |
| 2024-01-10 | 20500.00 | 0.80 | 0.77 | 4 | 8 |
| 2024-01-11 | 25500.00 | 0.90 | 0.88 | 4 | 9 |
| 2024-01-12 | 27000.00 | 1.00 | 1.00 | 4 | 10 |
Year-over-Year Growth
-- Year-over-year growth
SELECT
sale_date,
revenue,
LAG(revenue, 1) OVER (ORDER BY sale_date) AS prev_day,
LAG(revenue, 7) OVER (ORDER BY sale_date) AS prev_week,
LAG(revenue, 365) OVER (ORDER BY sale_date) AS prev_year,
-- Day-over-day growth
ROUND(
(revenue - LAG(revenue, 1) OVER (ORDER BY sale_date)) * 100.0 /
NULLIF(LAG(revenue, 1) OVER (ORDER BY sale_date), 0), 2
) AS dod_growth,
-- Week-over-week growth
ROUND(
(revenue - LAG(revenue, 7) OVER (ORDER BY sale_date)) * 100.0 /
NULLIF(LAG(revenue, 7) OVER (ORDER BY sale_date), 0), 2
) AS wow_growth
FROM daily_sales
ORDER BY sale_date;
Advanced Window Aggregates
-- Standard deviation and variance
SELECT
sale_date,
revenue,
AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_avg,
STDDEV(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_stddev,
VARIANCE(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_variance,
-- Coefficient of variation
ROUND(
STDDEV(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 100.0 /
NULLIF(AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0),
2
) AS cv_percent
FROM daily_sales
ORDER BY sale_date;
-- First and last values in window
SELECT
sale_date,
revenue,
FIRST_VALUE(revenue) OVER (ORDER BY sale_date) AS first_value,
LAST_VALUE(revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS last_value,
NTH_VALUE(revenue, 3) OVER (ORDER BY sale_date) AS third_value
FROM daily_sales
ORDER BY sale_date;
Performance Considerations
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Running Total | ||
| Moving Average | ||
| Cumulative Dist | ||
| YoY Growth |
-- Optimize window functions
EXPLAIN (ANALYZE, BUFFERS)
SELECT
sale_date,
SUM(revenue) OVER (ORDER BY sale_date) AS running_total
FROM daily_sales;
-- Check for sort operations
EXPLAIN (ANALYZE, BUFFERS)
SELECT
sale_date,
AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sma
FROM daily_sales;
ℹ️
Pro Tip: Window functions are computed after WHERE but before ORDER BY. Use CTEs to break complex calculations into readable steps.
Mathematical Properties
For running total :
For exponential moving average:
The smoothing factor controls responsiveness:
Cumulative Distribution Functions
-- CDF and percentiles
SELECT
sale_date,
revenue,
CUME_DIST() OVER (ORDER BY revenue) AS cume_dist,
PERCENT_RANK() OVER (ORDER BY revenue) AS percent_rank,
NTILE(4) OVER (ORDER BY revenue) AS quartile,
NTILE(10) OVER (ORDER BY revenue) AS decile,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY revenue) OVER () AS median
FROM daily_sales
ORDER BY sale_date;
Output:
| sale_date | revenue | cume_dist | percent_rank | quartile | decile | median |
|---|---|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 0.10 | 0.00 | 1 | 1 | 20250.00 |
| 2024-01-02 | 18000.00 | 0.20 | 0.11 | 1 | 2 | 20250.00 |
| 2024-01-03 | 16500.00 | 0.30 | 0.22 | 2 | 3 | 20250.00 |
| 2024-01-04 | 21000.00 | 0.40 | 0.33 | 2 | 4 | 20250.00 |
| 2024-01-05 | 19500.00 | 0.50 | 0.44 | 3 | 5 | 20250.00 |
| 2024-01-08 | 22000.00 | 0.60 | 0.55 | 3 | 6 | 20250.00 |
| 2024-01-09 | 24000.00 | 0.70 | 0.66 | 4 | 7 | 20250.00 |
| 2024-01-10 | 20500.00 | 0.80 | 0.77 | 4 | 8 | 20250.00 |
| 2024-01-11 | 25500.00 | 0.90 | 0.88 | 4 | 9 | 20250.00 |
| 2024-01-12 | 27000.00 | 1.00 | 1.00 | 4 | 10 | 20250.00 |
Moving Percentiles
-- 7-day moving percentile
SELECT
sale_date,
revenue,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_q1,
PERCENTILE_CONT(0.50) WITHIN GROUP (ORDER BY revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_median,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_q3,
PERCENTILE_CONT(0.90) WITHIN GROUP (ORDER BY revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_p90
FROM daily_sales
ORDER BY sale_date;
Z-Score and Outlier Detection
-- Calculate z-scores using window functions
WITH stats AS (
SELECT
sale_date,
revenue,
AVG(revenue) OVER () AS mean_revenue,
STDDEV(revenue) OVER () AS stddev_revenue
FROM daily_sales
)
SELECT
sale_date,
revenue,
ROUND(mean_revenue, 2) AS mean,
ROUND(stddev_revenue, 2) AS stddev,
ROUND((revenue - mean_revenue) / NULLIF(stddev_revenue, 0), 4) AS z_score,
CASE
WHEN ABS((revenue - mean_revenue) / NULLIF(stddev_revenue, 0)) > 2 THEN 'OUTLIER'
WHEN ABS((revenue - mean_revenue) / NULLIF(stddev_revenue, 0)) > 1.5 THEN 'UNUSUAL'
ELSE 'NORMAL'
END AS classification
FROM stats
ORDER BY sale_date;
Output:
| sale_date | revenue | mean | stddev | z_score | classification |
|---|---|---|---|---|---|
| 2024-01-01 | 15000.00 | 20900.00 | 4178.51 | -1.4117 | NORMAL |
| 2024-01-02 | 18000.00 | 20900.00 | 4178.51 | -0.6940 | NORMAL |
| 2024-01-03 | 16500.00 | 20900.00 | 4178.51 | -1.0529 | NORMAL |
| 2024-01-04 | 21000.00 | 20900.00 | 4178.51 | 0.0239 | NORMAL |
| 2024-01-05 | 19500.00 | 20900.00 | 4178.51 | -0.3350 | NORMAL |
| 2024-01-08 | 22000.00 | 20900.00 | 4178.51 | 0.2633 | NORMAL |
| 2024-01-09 | 24000.00 | 20900.00 | 4178.51 | 0.7419 | NORMAL |
| 2024-01-10 | 20500.00 | 20900.00 | 4178.51 | -0.0957 | NORMAL |
| 2024-01-11 | 25500.00 | 20900.00 | 4178.51 | 1.1009 | NORMAL |
| 2024-01-12 | 27000.00 | 20900.00 | 4178.51 | 1.4599 | NORMAL |
Performance Analysis
-- Compare window function performance
EXPLAIN (ANALYZE, BUFFERS)
SELECT
sale_date,
SUM(revenue) OVER (ORDER BY sale_date) AS running_total,
AVG(revenue) OVER (ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS sma_3,
LAG(revenue, 1) OVER (ORDER BY sale_date) AS prev_value
FROM daily_sales;
-- Check sort operations
EXPLAIN (ANALYZE, BUFFERS)
SELECT
sale_date,
revenue,
RANK() OVER (ORDER BY revenue DESC) AS rank
FROM daily_sales;
Window Function Optimization Tips
- Partition wisely: Use PARTITION BY to reduce window size
- Frame bounds: Use precise ROWS bounds instead of RANGE
- Index support: Ensure ORDER BY columns are indexed
- Avoid repeated sorts: Combine window functions with same ORDER BY
- Materialize CTEs: Use MATERIALIZED for complex window calculations
ℹ️
Performance Tip: Window functions are computed after WHERE but before ORDER BY. Use CTEs to break complex calculations into readable steps.
Mathematical Properties
For running total :
For exponential moving average:
The smoothing factor controls responsiveness:
Time complexity for window operations:
Space complexity:
⚠️
Edge Cases: Window functions return NULL for rows where the frame is empty. Handle with COALESCE or CASE statements.