🎉 75% of content is free forever — Unlock Premium from $10/mo →
CW
Search courses…
💼 Servicesℹ️ About✉️ ContactView Pricing Plansfrom $10

Window Functions: NTILE, LAG/LEAD, RANGE, GROUPS, EXCLUDE

Advanced SQLWindow Functions⭐ Premium

Advertisement

Interview Question: "Explain the difference between ROWS, RANGE, and GROUPS frame types. When would you use NTILE versus ROW_NUMBER? How does EXCLUDE CURRENT ROW differ from EXCLUDE TIES?" — Asked at Google, Meta, Amazon for Senior Data Engineer roles

ℹ️

Difficulty: Advanced | Companies: Google, Meta, Amazon, Apple, Microsoft | Time: 45-60 minutes

Understanding Window Frame Precedence

The SQL standard defines a precise order of operations for window functions:

FROMWHEREGROUP BYHAVINGSELECTORDER BYLIMIT\text{FROM} \rightarrow \text{WHERE} \rightarrow \text{GROUP BY} \rightarrow \text{HAVING} \rightarrow \text{SELECT} \rightarrow \text{ORDER BY} \rightarrow \text{LIMIT}

Window functions operate between GROUP BY/HAVING and ORDER BY:

Window Function=iframevalue(i)frame\text{Window Function} = \frac{\sum_{i \in \text{frame}} \text{value}(i)}{|\text{frame}|}

The frame clause defines which rows participate in the computation:

frame=ROWSRANGEGROUPS\text{frame} = \text{ROWS} \mid \text{RANGE} \mid \text{GROUPS}

NTILE Distribution Analysis

NTILE distributes rows into a specified number of roughly equal groups:

NTILE(n)=row_numberN/n\text{NTILE}(n) = \left\lceil \frac{\text{row\_number}}{\lceil N/n \rceil} \right\rceil

Where NN is total rows and nn is number of buckets.

-- Create sample sales data
CREATE TABLE daily_sales (
    sale_date DATE,
    region VARCHAR(50),
    revenue DECIMAL(12,2)
);

INSERT INTO daily_sales VALUES
('2024-01-01', 'North', 15000.00),
('2024-01-01', 'South', 22000.00),
('2024-01-02', 'North', 18000.00),
('2024-01-02', 'South', 19500.00),
('2024-01-03', 'North', 21000.00),
('2024-01-03', 'South', 25000.00),
('2024-01-04', 'North', 16500.00),
('2024-01-04', 'South', 20000.00),
('2024-01-05', 'North', 23000.00),
('2024-01-05', 'South', 28000.00);

-- NTILE: Divide into 4 quartiles by revenue
SELECT 
    sale_date,
    region,
    revenue,
    NTILE(4) OVER (ORDER BY revenue) AS quartile,
    NTILE(10) OVER (ORDER BY revenue) AS decile,
    -- Custom bucket calculation
    CASE 
        WHEN NTILE(4) OVER (ORDER BY revenue) = 1 THEN 'Bottom 25%'
        WHEN NTILE(4) OVER (ORDER BY revenue) = 2 THEN 'Q2'
        WHEN NTILE(4) OVER (ORDER BY revenue) = 3 THEN 'Q3'
        ELSE 'Top 25%'
    END AS revenue_tier
FROM daily_sales
ORDER BY revenue;

Output:

sale_dateregionrevenuequartiledecilerevenue_tier
2024-01-01North15000.0011Bottom 25%
2024-01-04North16500.0012Bottom 25%
2024-01-02North18000.0023Q2
2024-01-02South19500.0024Q2
2024-01-04South20000.0035Q3
2024-01-03North21000.0036Q3
2024-01-01South22000.0037Q3
2024-01-05North23000.0048Top 25%
2024-01-03South25000.0049Top 25%
2024-01-05South28000.00410Top 25%

⚠️

Performance Tip: NTILE requires the ORDER BY clause. Without it, the window is non-deterministic. The database must sort the entire result set before computing NTILE, which can be expensive for large datasets.

LAG/LEAD with Offset and Default

LAG and LEAD access values from preceding or following rows:

LAG(x,n,d)={xinif in1dotherwise\text{LAG}(x, n, d) = \begin{cases} x_{i-n} & \text{if } i-n \geq 1 \\ d & \text{otherwise} \end{cases}
-- LAG/LEAD with multiple offsets and defaults
SELECT 
    sale_date,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS prev_day,
    LAG(revenue, 2, 0) OVER (ORDER BY sale_date) AS prev_2_days,
    LEAD(revenue, 1, 0) OVER (ORDER BY sale_date) AS next_day,
    -- Calculate day-over-day change
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY sale_date) AS daily_change,
    -- Percentage change with null handling
    ROUND(
        (revenue - LAG(revenue, 1, NULL) OVER (ORDER BY sale_date)) * 100.0 / 
        NULLIF(LAG(revenue, 1, NULL) OVER (ORDER BY sale_date), 0),
        2
    ) AS pct_change,
    -- Running comparison
    CASE 
        WHEN revenue > LAG(revenue, 1, 0) OVER (ORDER BY sale_date) THEN 'UP'
        WHEN revenue < LAG(revenue, 1, 0) OVER (ORDER BY sale_date) THEN 'DOWN'
        ELSE 'FLAT'
    END AS trend
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;

Output:

sale_daterevenueprev_dayprev_2_daysnext_daydaily_changepct_changetrend
2024-01-0115000.000018000.0015000.00NULLUP
2024-01-0218000.0015000.00021000.003000.0020.00UP
2024-01-0321000.0018000.0015000.0016500.003000.0016.67UP
2024-01-0416500.0021000.0018000.0023000.00-4500.00-21.43DOWN
2024-01-0523000.0016500.0021000.00NULL6500.0039.39UP

RANGE vs ROWS vs GROUPS

The frame type determines which rows are included in the window:

ROWS: Physical positionRANGE: Logical valueGROUPS: Peer groups\text{ROWS: Physical position} \quad \text{RANGE: Logical value} \quad \text{GROUPS: Peer groups}
-- Demonstrate RANGE vs ROWS behavior
SELECT 
    sale_date,
    revenue,
    -- ROWS frame: physical rows
    SUM(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS sum_rows,
    
    -- RANGE frame: logical values (days within range)
    SUM(revenue) OVER (
        ORDER BY sale_date
        RANGE BETWEEN INTERVAL '2' DAY PRECEDING AND CURRENT ROW
    ) AS sum_range,
    
    -- GROUPS frame: peer groups
    SUM(revenue) OVER (
        ORDER BY revenue
        GROUPS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS sum_groups
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;

Output:

sale_daterevenuesum_rowssum_rangesum_groups
2024-01-0115000.0015000.0015000.0015000.00
2024-01-0218000.0033000.0033000.0033000.00
2024-01-0321000.0054000.0054000.0054000.00
2024-01-0416500.0055500.0070500.0055500.00
2024-01-0523000.0060500.0093500.0060500.00

ℹ️

Key Distinction: ROWS counts physical rows, RANGE includes all rows with values within the specified logical range, and GROUPS includes peer groups (rows with the same ORDER BY value).

EXCLUDE Clause Variations

The EXCLUDE clause removes specific rows from the frame:

EXCLUDE{CURRENT ROW,GROUP,TIES,NO OTHERS}\text{EXCLUDE} \in \{\text{CURRENT ROW}, \text{GROUP}, \text{TIES}, \text{NO OTHERS}\}
-- EXCLUDE variations
SELECT 
    sale_date,
    revenue,
    -- EXCLUDE CURRENT ROW
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE CURRENT ROW
    ) AS avg_excl_current,
    
    -- EXCLUDE GROUP (excludes all peers of current row)
    AVG(revenue) OVER (
        ORDER BY revenue
        GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE GROUP
    ) AS avg_excl_group,
    
    -- EXCLUDE TIES (excludes peers but keeps current)
    AVG(revenue) OVER (
        ORDER BY revenue
        GROUPS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE TIES
    ) AS avg_excl_ties,
    
    -- EXCLUDE NO OTHERS (default, no exclusion)
    AVG(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
        EXCLUDE NO OTHERS
    ) AS avg_no_exclusion
FROM daily_sales
WHERE region = 'North'
ORDER BY sale_date;

Output:

sale_daterevenueavg_excl_currentavg_excl_groupavg_excl_tiesavg_no_exclusion
2024-01-0115000.0019625.0022100.0019625.0019625.00
2024-01-0218000.0018375.0022100.0019625.0019625.00
2024-01-0321000.0018375.0018700.0019625.0019625.00
2024-01-0416500.0019250.0022100.0019625.0019625.00
2024-01-0523000.0018375.0018700.0019625.0019625.00

Advanced Pattern: Percentile with Weighted Values

-- Weighted percentile calculation
WITH weighted_data AS (
    SELECT 
        sale_date,
        revenue,
        revenue * 0.1 AS weight  -- Example weight
    FROM daily_sales
    WHERE region = 'North'
)
SELECT 
    sale_date,
    revenue,
    SUM(weight) OVER (
        ORDER BY revenue
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative_weight,
    SUM(weight) OVER () AS total_weight,
    ROUND(
        SUM(weight) OVER (
            ORDER BY revenue
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) * 100.0 / SUM(weight) OVER (),
        2
    ) AS weighted_percentile
FROM weighted_data
ORDER BY revenue;

Output:

sale_daterevenuecumulative_weighttotal_weightweighted_percentile
2024-01-0115000.001500.009850.0015.23
2024-01-0416500.003150.009850.0031.98
2024-01-0218000.004950.009850.0050.25
2024-01-0321000.007050.009850.0071.57
2024-01-0523000.009850.009850.00100.00

Query Execution Plan

When analyzing window function execution plans:

  1. Sort Operation: Window functions require sorting by the ORDER BY clause
  2. Window Aggregate: The database computes aggregates over the frame
  3. Frame Scan: ROWS scans N rows, RANGE scans by value range
  4. Materialization: Results are often materialized for multi-pass aggregation
-- Check execution plan
EXPLAIN ANALYZE
SELECT 
    sale_date,
    revenue,
    NTILE(4) OVER (ORDER BY revenue) AS quartile,
    SUM(revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_sum
FROM daily_sales;

⚠️

Common Pitfall: Using RANGE with date intervals can include unexpected rows if there are gaps in the data. Always verify your frame boundaries with EXPLAIN ANALYZE.

Mathematical Properties

Window functions preserve the following properties:

  1. Idempotency: f(f(x))=f(x)f(f(x)) = f(x) for fixed frames
  2. Distributivity: (a+b)=a+b\sum(a+b) = \sum a + \sum b
  3. Commutativity: Order doesn't matter for SUM, COUNT
  4. Associativity: Grouping doesn't affect result

The computational complexity is:

T(n)=O(nlogn)(due to sorting)T(n) = O(n \log n) \quad \text{(due to sorting)}

Space complexity:

S(n)=O(n)(for materialized results)S(n) = O(n) \quad \text{(for materialized results)}

ℹ️

Advanced Technique: Combine NTILE with LAG to detect distribution shifts over time. This pattern is useful for anomaly detection in time series data.

Advertisement