πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Topic: SQL Window Functions Mastery for FAANG Interviews

SQL AdvancedWindow Functions⭐ Premium

Advertisement

🎯 SQL Window Functions

Google & Meta Interview Deep Dive

🏒 Google🏒 Meta⚑ Difficulty: Hard⏱️ 45 min

πŸ“‹ Interview Question

β„ΉοΈπŸ”΄ Google/Meta Interview Question

"Given a table of employee salaries with department information, write a query to rank employees within each department by salary. Then find the top 3 earners per department. Additionally, calculate each employee's salary as a percentage of their department's total and the difference from the previous and next employee's salary using LAG and LEAD."

Companies: Google, Meta | Difficulty: Hard | Time: 45 minutes

πŸ“Š Setup: Sample Data

Let's create a realistic dataset to work with:

-- Create tables
CREATE TABLE employees (
    employee_id SERIAL PRIMARY KEY,
    employee_name VARCHAR(100),
    department VARCHAR(50),
    salary DECIMAL(10, 2),
    hire_date DATE
);

-- Insert sample data
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Alice Johnson', 'Engineering', 120000.00, '2020-01-15'),
('Bob Smith', 'Engineering', 95000.00, '2019-06-20'),
('Charlie Brown', 'Engineering', 135000.00, '2018-03-10'),
('Diana Ross', 'Engineering', 88000.00, '2021-09-01'),
('Eve Davis', 'Engineering', 110000.00, '2020-07-14'),
('Frank Wilson', 'Marketing', 78000.00, '2019-11-30'),
('Grace Lee', 'Marketing', 92000.00, '2020-04-22'),
('Hank Miller', 'Marketing', 65000.00, '2021-01-15'),
('Ivy Chen', 'Marketing', 105000.00, '2018-08-05'),
('Jack Taylor', 'Marketing', 72000.00, '2020-12-01'),
('Karen White', 'Data Science', 130000.00, '2019-02-14'),
('Leo Martin', 'Data Science', 115000.00, '2020-05-30'),
('Mia Garcia', 'Data Science', 140000.00, '2018-11-20'),
('Noah Clark', 'Data Science', 98000.00, '2021-03-10'),
('Olivia Lewis', 'Data Science', 108000.00, '2020-08-25');

πŸ”’ Part 1: ROW_NUMBER, RANK, and DENSE_RANK

The Core Problem

⚠️⚠️ Common Interview Pitfall

Interviewers often ask: "What's the difference between RANK and DENSE_RANK?" The key is handling ties:

  • ROW_NUMBER() β€” No ties, always unique sequential numbers
  • RANK() β€” Ties get same number, gaps in sequence (1, 1, 3)
  • DENSE_RANK() β€” Ties get same number, no gaps (1, 1, 2)
-- ROW_NUMBER: Assigns unique sequential numbers
SELECT
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS row_num,
    RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS rank_num,
    DENSE_RANK() OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS dense_rank_num
FROM employees
ORDER BY department, salary DESC;

Output:

employee_namedepartmentsalaryrow_numrank_numdense_rank_num
Charlie BrownEngineering135000111
Alice JohnsonEngineering120000222
Eve DavisEngineering110000333
Bob SmithEngineering95000444
Diana RossEngineering88000555
Ivy ChenMarketing105000111
Grace LeeMarketing92000222
Frank WilsonMarketing78000333
Jack TaylorMarketing72000444
Hank MillerMarketing65000555
Mia GarciaData Science140000111
Karen WhiteData Science130000222
Leo MartinData Science115000333
Olivia LewisData Science108000444
Noah ClarkData Science98000555

Handling Ties: When Rankings Are Equal

-- Adding duplicate salaries to demonstrate tie handling
INSERT INTO employees (employee_name, department, salary, hire_date) VALUES
('Zara Adams', 'Engineering', 120000.00, '2021-05-10'),  -- Same as Alice
('Yuki Tanaka', 'Engineering', 120000.00, '2020-03-15');  -- Same as Alice

-- See the difference in rankings
SELECT
    employee_name,
    department,
    salary,
    ROW_NUMBER() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS row_number,
    RANK() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS rank_val,
    DENSE_RANK() OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS dense_rank_val
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

Key insight for interviews: ROW_NUMBER() will arbitrarily assign different numbers to ties (deterministic only if you add a secondary ORDER BY column), while RANK() creates gaps and DENSE_RANK() does not.

πŸ“ˆ Part 2: Finding Top N Per Group

πŸ’‘πŸ’‘ Pro Tip for Interviews

The most efficient way to find top N per group uses a subquery with DENSE_RANK or ROW_NUMBER in the WHERE clause. Always consider edge cases: what if there are fewer than N rows in a group?

-- Top 3 highest earners per department
WITH ranked_employees AS (
    SELECT
        employee_id,
        employee_name,
        department,
        salary,
        DENSE_RANK() OVER (
            PARTITION BY department
            ORDER BY salary DESC
        ) AS salary_rank
    FROM employees
)
SELECT
    employee_id,
    employee_name,
    department,
    salary,
    salary_rank
FROM ranked_employees
WHERE salary_rank <= 3
ORDER BY department, salary_rank;

Alternative: Using ROW_NUMBER for strictly unique results:

-- If you want exactly 3 rows per department (even with ties)
SELECT
    employee_id,
    employee_name,
    department,
    salary
FROM (
    SELECT
        *,
        ROW_NUMBER() OVER (
            PARTITION BY department
            ORDER BY salary DESC, employee_id  -- Break ties deterministically
        ) AS rn
    FROM employees
) sub
WHERE rn <= 3
ORDER BY department, rn;

πŸ“Š Part 3: LAG and LEAD Functions

β„ΉοΈπŸ” LAG and LEAD Explained

  • LAG(column, offset, default) β€” Access previous row's value
  • LEAD(column, offset, default) β€” Access next row's value
  • Third parameter provides a default when no previous/next row exists
-- Calculate salary difference from previous and next employee
-- Also compute running department total percentage
SELECT
    employee_name,
    department,
    salary,
    LAG(employee_name, 1, 'N/A') OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS prev_higher_earner,
    LAG(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS prev_salary,
    salary - LAG(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS diff_from_prev,
    LEAD(employee_name, 1, 'N/A') OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS next_lower_earner,
    LEAD(salary, 1, 0) OVER (
        PARTITION BY department ORDER BY salary DESC
    ) AS next_salary,
    ROUND(
        salary * 100.0 / SUM(salary) OVER (PARTITION BY department),
        2
    ) AS pct_of_dept_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

Output:

employee_namesalaryprev_higher_earnerprev_salarydiff_from_prevnext_lower_earnernext_salarypct_of_dept_total
Charlie Brown135000N/A0135000Alice Johnson12000023.32%
Alice Johnson120000Charlie Brown135000-15000Eve Davis11000020.75%
Eve Davis110000Alice Johnson120000-10000Bob Smith9500019.03%
Bob Smith95000Eve Davis110000-15000Diana Ross8800016.43%
Diana Ross88000Bob Smith95000-7000N/A015.21%

πŸƒ Part 4: NTILE for Data Distribution

-- Divide employees into quartiles based on salary
SELECT
    employee_name,
    department,
    salary,
    NTILE(4) OVER (
        PARTITION BY department
        ORDER BY salary DESC
    ) AS salary_quartile,
    CASE
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 1
            THEN 'Top 25%'
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 2
            THEN 'Second 25%'
        WHEN NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) = 3
            THEN 'Third 25%'
        ELSE 'Bottom 25%'
    END AS quartile_label
FROM employees
ORDER BY department, salary DESC;

⚠️⚠️ NTILE Edge Case

NTILE distributes rows as evenly as possible. When the number of rows isn't evenly divisible by the number of buckets, the first few buckets get the extra rows. For example, 5 rows into 4 buckets: bucket 1 has 2 rows, buckets 2-4 have 1 each.

πŸ”— Part 5: Running Totals and Cumulative Calculations

-- Running total and cumulative average within each department
SELECT
    employee_name,
    department,
    salary,
    hire_date,
    SUM(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    AVG(salary) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_avg,
    COUNT(*) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_count,
    FIRST_VALUE(employee_name) OVER (
        PARTITION BY department
        ORDER BY hire_date
    ) AS first_hired,
    LAST_VALUE(employee_name) OVER (
        PARTITION BY department
        ORDER BY hire_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS last_hired
FROM employees
ORDER BY department, hire_date;

🎯 Part 6: Advanced Real-World Scenarios

Scenario: Salary Band Analysis

-- Calculate moving average salary (3-month window)
-- and identify salary outliers within department
WITH salary_stats AS (
    SELECT
        employee_name,
        department,
        salary,
        AVG(salary) OVER (PARTITION BY department) AS dept_avg,
        STDDEV(salary) OVER (PARTITION BY department) AS dept_stddev,
        MIN(salary) OVER (PARTITION BY department) AS dept_min,
        MAX(salary) OVER (PARTITION BY department) AS dept_max
    FROM employees
)
SELECT
    employee_name,
    department,
    salary,
    ROUND(dept_avg, 2) AS dept_avg,
    ROUND(salary - dept_avg, 2) AS deviation_from_avg,
    CASE
        WHEN salary > dept_avg + 2 * dept_stddev THEN 'High Outlier'
        WHEN salary < dept_avg - 2 * dept_stddev THEN 'Low Outlier'
        WHEN salary > dept_avg + dept_stddev THEN 'Above Average'
        WHEN salary < dept_avg - dept_stddev THEN 'Below Average'
        ELSE 'Average'
    END AS salary_category,
    ROUND((salary - dept_min) * 100.0 / NULLIF(dept_max - dept_min, 0), 2) AS percentile_position
FROM salary_stats
ORDER BY department, salary DESC;

Scenario: Year-over-Year Growth

-- Calculate salary growth for employees with multiple records
-- (Simulated with hire dates representing salary changes)
WITH salary_changes AS (
    SELECT
        employee_name,
        salary,
        hire_date,
        LAG(salary) OVER (
            PARTITION BY employee_name ORDER BY hire_date
        ) AS prev_salary,
        LAG(hire_date) OVER (
            PARTITION BY employee_name ORDER BY hire_date
        ) AS prev_date
    FROM employees
)
SELECT
    employee_name,
    salary AS current_salary,
    prev_salary,
    salary - prev_salary AS absolute_change,
    ROUND(
        (salary - prev_salary) * 100.0 / NULLIF(prev_salary, 0),
        2
    ) AS pct_change,
    EXTRACT(DAY FROM hire_date - prev_date) AS days_between_changes
FROM salary_changes
WHERE prev_salary IS NOT NULL
ORDER BY pct_change DESC;

πŸ§ͺ Part 7: Window Frames Deep Dive

πŸ’‘πŸ’‘ Window Frame Syntax

Understanding frame clauses is crucial for advanced window function usage:

  • ROWS BETWEEN n PRECEDING AND m FOLLOWING β€” Physical rows
  • RANGE BETWEEN n PRECEDING AND m FOLLOWING β€” Logical value range
  • GROUPS BETWEEN n PRECEDING AND m FOLLOWING β€” Groups of ties (PostgreSQL 11+)
-- Different window frame behaviors
SELECT
    employee_name,
    department,
    salary,
    -- Current row only
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN CURRENT ROW AND CURRENT ROW
    ) AS current_only,
    -- 1 row before and after
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
    ) AS moving_3,
    -- All rows up to current
    SUM(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cumulative,
    -- All rows in partition
    SUM(salary) OVER (
        PARTITION BY department
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS partition_total
FROM employees
WHERE department = 'Engineering'
ORDER BY salary DESC;

⏱️ Complexity Analysis

OperationTime ComplexitySpace ComplexityNotes
Simple Window FunctionO(n log n)O(n)Sort required for ORDER BY
Partition + RankO(n log n)O(n)Per-partition sorting
LAG/LEADO(n)O(1)Sequential access only
NTILEO(n log n)O(n)Requires full sort
Running TotalO(n log n)O(n)Cumulative computation

🎯 Quiz Section

πŸ† Best Practices for Interviews

πŸ’‘βœ… Best Practices

1. Always Specify Window Frame:

-- BAD: Implicit frame (might not be what you want)
SUM(salary) OVER (ORDER BY hire_date)

-- GOOD: Explicit frame
SUM(salary) OVER (ORDER BY hire_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)

2. Break Ties Deterministically:

-- BAD: Non-deterministic ordering
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary)

-- GOOD: Deterministic with secondary sort
ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC, employee_id)

3. Use CTEs for Readability:

-- BAD: Nested subqueries
SELECT * FROM (
    SELECT *, RANK() OVER (...) AS rn FROM (
        SELECT * FROM employees
    ) sub
) sub2 WHERE rn = 1

-- GOOD: CTE with clear steps
WITH ranked AS (
    SELECT *, RANK() OVER (...) AS rn FROM employees
)
SELECT * FROM ranked WHERE rn = 1

4. Performance Consideration:

  • Window functions with ORDER BY require sorting: O(n log n)
  • PARTITION BY without ORDER BY is cheaper
  • Avoid multiple window functions with different ORDER BY on same partition

πŸ”— Common Follow-Up Questions

  1. "How would you handle NULLs in LAG/LEAD?" β€” Use COALESCE with the default parameter
  2. "Can you use window functions in WHERE?" β€” No, use a CTE or subquery first
  3. "What's the difference between ROW_NUMBER and a sequential ID?" β€” ROW_NUMBER resets per partition; IDs don't
  4. "How do you optimize queries with multiple window functions?" β€” Combine partitions with same ORDER BY when possible

⚠️⚠️ Critical Interview Note

Always clarify with the interviewer: Do they want ties broken? Should the ranking include or exclude certain rows? What should happen at partition boundaries? These questions show attention to detail and real-world thinking.

Advertisement