Interview Question: "What's the difference between ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW versus RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW? When would you use GROUPS?" — Asked at Google, Facebook, Amazon for Data Scientist roles
ℹ️
Difficulty: Advanced | Companies: Google, Facebook, Amazon, Microsoft, Apple | Time: 45-60 minutes
Window Frame Syntax
The complete window frame syntax:
-- Create time-series data for frame analysis
CREATE TABLE stock_prices (
trade_date DATE,
symbol VARCHAR(10),
price DECIMAL(10,2),
volume INT
);
INSERT INTO stock_prices VALUES
('2024-01-01', 'AAPL', 150.00, 1000000),
('2024-01-02', 'AAPL', 152.50, 1200000),
('2024-01-03', 'AAPL', 148.75, 800000),
('2024-01-04', 'AAPL', 155.00, 1500000),
('2024-01-05', 'AAPL', 153.25, 900000),
('2024-01-08', 'AAPL', 157.50, 1100000),
('2024-01-09', 'AAPL', 159.00, 1300000),
('2024-01-10', 'AAPL', 156.75, 700000),
('2024-01-11', 'AAPL', 160.00, 1400000),
('2024-01-12', 'AAPL', 162.50, 1600000);
ROWS Frame Type
ROWS counts physical rows regardless of values:
-- ROWS: Fixed number of physical rows
SELECT
trade_date,
price,
volume,
-- 3-day moving average (ROWS)
AVG(price) OVER (
ORDER BY trade_date
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS moving_avg_3day,
-- 5-day moving sum
SUM(volume) OVER (
ORDER BY trade_date
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
) AS moving_sum_5day,
-- Running average from start
AVG(price) OVER (
ORDER BY trade_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS running_avg,
-- Entire dataset average
AVG(price) OVER (
ORDER BY trade_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS overall_avg
FROM stock_prices
ORDER BY trade_date;
Output:
| trade_date | price | volume | moving_avg_3day | moving_sum_5day | running_avg | overall_avg |
|---|---|---|---|---|---|---|
| 2024-01-01 | 150.00 | 1000000 | 150.00 | 4500000 | 150.00 | 155.53 |
| 2024-01-02 | 152.50 | 1200000 | 151.25 | 4700000 | 151.25 | 155.53 |
| 2024-01-03 | 148.75 | 800000 | 150.42 | 5400000 | 150.42 | 155.53 |
| 2024-01-04 | 155.00 | 1500000 | 152.08 | 5700000 | 151.56 | 155.53 |
| 2024-01-05 | 153.25 | 900000 | 152.33 | 6800000 | 151.90 | 155.53 |
| 2024-01-08 | 157.50 | 1100000 | 155.25 | 6800000 | 152.83 | 155.53 |
| 2024-01-09 | 159.00 | 1300000 | 156.58 | 7000000 | 153.71 | 155.53 |
| 2024-01-10 | 156.75 | 700000 | 157.75 | 7100000 | 154.09 | 155.53 |
| 2024-01-11 | 160.00 | 1400000 | 158.58 | 7200000 | 154.72 | 155.53 |
| 2024-01-12 | 162.50 | 1600000 | 159.75 | 7100000 | 155.53 | 155.53 |
ℹ️
ROWS Behavior: With ROWS, the frame always contains exactly the specified number of rows. If fewer rows exist, the frame is smaller.
RANGE Frame Type
RANGE includes all rows with values within the logical range:
-- RANGE: Logical value range
SELECT
trade_date,
price,
volume,
-- Sum of prices within ±5 of current price
SUM(price) OVER (
ORDER BY price
RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING
) AS sum_within_range,
-- Count of stocks within 10% of current price
COUNT(*) OVER (
ORDER BY price
RANGE BETWEEN price * 0.1 PRECEDING AND price * 0.1 FOLLOWING
) AS count_nearby_prices,
-- Average of all rows with price <= current
AVG(volume) OVER (
ORDER BY price
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS avg_volume_lower_prices
FROM stock_prices
ORDER BY trade_date;
Output:
| trade_date | price | volume | sum_within_range | count_nearby_prices | avg_volume_lower_prices |
|---|---|---|---|---|---|
| 2024-01-01 | 150.00 | 1000000 | 451.25 | 3 | 1000000.00 |
| 2024-01-02 | 152.50 | 1200000 | 604.50 | 4 | 1100000.00 |
| 2024-01-03 | 148.75 | 800000 | 451.25 | 3 | 900000.00 |
| 2024-01-04 | 155.00 | 1500000 | 609.50 | 4 | 1166666.67 |
| 2024-01-05 | 153.25 | 900000 | 609.50 | 4 | 1100000.00 |
| 2024-01-08 | 157.50 | 1100000 | 467.50 | 2 | 1100000.00 |
| 2024-01-09 | 159.00 | 1300000 | 479.00 | 2 | 1200000.00 |
| 2024-01-10 | 156.75 | 700000 | 474.25 | 3 | 1033333.33 |
| 2024-01-11 | 160.00 | 1400000 | 320.00 | 1 | 1160000.00 |
| 2024-01-12 | 162.50 | 1600000 | 162.50 | 1 | 1200000.00 |
GROUPS Frame Type
GROUPS counts peer groups (rows with same ORDER BY value):
-- GROUPS: Peer group counting
SELECT
trade_date,
price,
volume,
-- Sum of 1 peer group before and after
SUM(price) OVER (
ORDER BY price
GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS sum_peer_groups,
-- Count with GROUPS
COUNT(*) OVER (
ORDER BY volume
GROUPS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS count_peer_groups,
-- Average with GROUPS
AVG(price) OVER (
ORDER BY volume
GROUPS BETWEEN 2 PRECEDING AND 2 FOLLOWING
) AS avg_peer_groups
FROM stock_prices
ORDER BY trade_date;
Frame Boundary Comparison
-- Compare all three frame types
SELECT
trade_date,
price,
-- ROWS frame
SUM(price) OVER (
ORDER BY trade_date
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS sum_rows,
-- RANGE frame (by value)
SUM(price) OVER (
ORDER BY price
RANGE BETWEEN 2 PRECEDING AND 2 FOLLOWING
) AS sum_range,
-- GROUPS frame
SUM(price) OVER (
ORDER BY price
GROUPS BETWEEN 1 PRECEDING AND 1 FOLLOWING
) AS sum_groups
FROM stock_prices
ORDER BY trade_date;
Output:
| trade_date | price | sum_rows | sum_range | sum_groups |
|---|---|---|---|---|
| 2024-01-01 | 150.00 | 302.50 | 451.25 | 604.50 |
| 2024-01-02 | 152.50 | 451.25 | 604.50 | 609.50 |
| 2024-01-03 | 148.75 | 456.25 | 451.25 | 604.50 |
| 2024-01-04 | 155.00 | 456.25 | 609.50 | 609.50 |
| 2024-01-05 | 153.25 | 466.25 | 609.50 | 609.50 |
| 2024-01-08 | 157.50 | 468.25 | 467.50 | 479.00 |
| 2024-01-09 | 159.00 | 473.25 | 479.00 | 479.00 |
| 2024-01-10 | 156.75 | 476.75 | 474.25 | 474.25 |
| 2024-01-11 | 160.00 | 479.25 | 320.00 | 160.00 |
| 2024-01-12 | 162.50 | 322.50 | 162.50 | 162.50 |
⚠️
Frame Confusion: RANGE with numeric ORDER BY uses ±n as value range, not row count. This can produce unexpected results.
EXCLUDE Clause Variations
-- EXCLUDE CURRENT ROW
SELECT
trade_date,
price,
AVG(price) OVER (
ORDER BY trade_date
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE CURRENT ROW
) AS avg_excluding_current
FROM stock_prices
ORDER BY trade_date;
-- EXCLUDE GROUP (excludes all peers)
SELECT
trade_date,
price,
AVG(price) OVER (
ORDER BY price
GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE GROUP
) AS avg_excluding_peers
FROM stock_prices
ORDER BY trade_date;
-- EXCLUDE TIES (excludes peers but keeps current)
SELECT
trade_date,
price,
AVG(price) OVER (
ORDER BY price
GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
EXCLUDE TIES
) AS avg_excluding_ties
FROM stock_prices
ORDER BY trade_date;
Advanced: Moving Percentiles
-- 7-day moving percentile
SELECT
trade_date,
price,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY price) OVER (
ORDER BY trade_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_median_7day,
PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY price) OVER (
ORDER BY trade_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_q1_7day,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY price) OVER (
ORDER BY trade_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_q3_7day
FROM stock_prices
ORDER BY trade_date;
Mathematical Formulas
For a window frame over ordered set :
The aggregate over frame :
ℹ️
Frame Default: Default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW if ORDER BY is specified, or ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING without ORDER BY.
Frame Performance Implications
| Frame Type | Sort Required | Memory Usage | Speed |
|---|---|---|---|
| ROWS | Physical | Low | Fast |
| RANGE | Logical value | Medium | Medium |
| GROUPS | Peer groups | Medium | Medium |
-- Check frame execution cost
EXPLAIN (ANALYZE, BUFFERS)
SELECT
trade_date,
AVG(price) OVER (
ORDER BY trade_date
ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING
) AS moving_avg
FROM stock_prices;
⚠️
Performance Warning: RANGE frames with non-integer ORDER BY values can be expensive because the database must evaluate the range for each row.