Python List Comprehensions — Elegant Data Transformation

List comprehensions are one of Python's most elegant features. They let you create lists in a single, expressive line — transforming, filtering, and combining data without writing verbose loops.


Learning Objectives

By the end of this tutorial, you will be able to:

  1. Write basic list comprehensions to replace simple for-loops
  2. Apply conditions to filter and transform data
  3. Use the ternary operator inside comprehensions for conditional values
  4. Nest comprehensions to flatten 2D lists and generate combinations
  5. Leverage the walrus operator (:=) in Python 3.8+ for efficiency
  6. Compare performance of comprehensions, loops, and map()
  7. Write dict and set comprehensions for other collection types
  8. Use generator expressions for memory-efficient iteration
  9. Recognize when a comprehension is the wrong tool
  10. Follow readability guidelines and avoid common mistakes

What Are List Comprehensions?

A list comprehension is a concise syntax for creating a new list by performing an operation on each item in an existing iterable — optionally filtering items along the way.

Traditional For-Loop

squares = []
for x in range(10):
    squares.append(x ** 2)

print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

Equivalent List Comprehension

squares = [x ** 2 for x in range(10)]

print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

The comprehension communicates intent directly: "I'm building a list of squares from a range."

When to Use Comprehensions vs Loops

Use a Comprehension When                | Use a For-Loop When
----------------------------------------|---------------------------------------
Simple transformation of one iterable   | Complex logic with multiple statements
Filtering with clear conditions         | Side effects (printing, writing files)
Building a new list from existing data  | Nested loops deeper than 2 levels
The expression is readable in one line  | Debugging step-by-step is needed

Basic Syntax

[expression for item in iterable]

Reads as: "Give me expression for every item in iterable."

# Doubling numbers
numbers = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in numbers]
print(doubled)  # [2, 4, 6, 8, 10]

# String transformation
words = ["hello", "world"]
upper_words = [word.upper() for word in words]
print(upper_words)  # ['HELLO', 'WORLD']

# Using a function
import math
values = [1, 4, 9, 16, 25]
roots = [math.sqrt(x) for x in values]
print(roots)  # [1.0, 2.0, 3.0, 4.0, 5.0]
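The iterable can be anything Python can loop over, not only a list. This short sketch (the sample names, ages and inventory are made up for illustration) iterates over a string, unpacks tuples produced by enumerate() and zip(), and reads a dictionary's items():

```python
names = ["ada", "grace", "linus"]
ages = [36, 45, 28]

# A string is iterable: one item per character
letters = [ch for ch in "abc"]

# Tuple unpacking in the for clause works just like in a for-loop
numbered = [f"{i}. {name}" for i, name in enumerate(names, start=1)]

# zip() pairs up items from several iterables
pairs = [f"{name} is {age}" for name, age in zip(names, ages)]

# dict.items() yields (key, value) tuples
inventory = {"apples": 3, "pears": 0}
stock_lines = [f"{item}: {qty}" for item, qty in inventory.items()]

print(letters)      # ['a', 'b', 'c']
print(numbered)     # ['1. ada', '2. grace', '3. linus']
print(pairs)        # ['ada is 36', 'grace is 45', 'linus is 28']
print(stock_lines)  # ['apples: 3', 'pears: 0']
```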

Side-by-Side Comparison

# FOR LOOP
result = []
for char in "Python":
    if char.lower() in "aeiou":
        result.append(char.upper())

# LIST COMPREHENSION (one line)
result = [char.upper() for char in "Python" if char.lower() in "aeiou"]

print(result)  # ['O']

Filtering with Conditions

Add an if clause at the end to filter items:

[expression for item in iterable if condition]

# Even numbers
numbers = range(1, 11)
evens = [x for x in numbers if x % 2 == 0]
print(evens)  # [2, 4, 6, 8, 10]

# Strings by length
words = ["hi", "hello", "hey", "greetings", "yo"]
long_words = [word for word in words if len(word) > 3]
print(long_words)  # ['hello', 'greetings']

# Filtering with a predicate
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True

primes = [x for x in range(2, 30) if is_prime(x)]
print(primes)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

Multiple Conditions

numbers = range(1, 51)
# Multiples of 3 AND greater than 20
result = [x for x in numbers if x % 3 == 0 if x > 20]
print(result)  # [21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
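Two chained if clauses keep an item only when both are true, so they behave like one condition joined with and. A quick check reusing the range above; note that or has no chained form and must be written inside a single condition:

```python
numbers = range(1, 51)

# Chained if clauses: every clause must be true
chained = [x for x in numbers if x % 3 == 0 if x > 20]
# The same filter written with "and"
combined = [x for x in numbers if x % 3 == 0 and x > 20]
print(chained == combined)  # True

# "or" must be spelled out in one condition
either = [x for x in range(1, 11) if x % 3 == 0 or x % 5 == 0]
print(either)  # [3, 5, 6, 9, 10]
```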

Filtering with not in

vowels = "aeiou"
consonants = [c for c in "python" if c not in vowels]
print(consonants)  # ['p', 'y', 't', 'h', 'n']

Conditional Expressions (Ternary)

To produce different values based on a condition, use the ternary operator before the for:

[expr_if_true if condition else expr_if_false for item in iterable]

# Classify numbers
numbers = [3, -1, 0, 7, -4, 2]
labels = ["positive" if x > 0 else "negative" if x < 0 else "zero" for x in numbers]
print(labels)  # ['positive', 'negative', 'zero', 'positive', 'negative', 'positive']

# Map to pass/fail
scores = [85, 42, 91, 67, 73, 58]
grades = ["pass" if s >= 60 else "fail" for s in scores]
print(grades)  # ['pass', 'fail', 'pass', 'pass', 'pass', 'fail']

# Clamp values to the range 0 to 10 (max/min, no ternary needed)
raw_values = [-5, 0, 12, 25, 8, -3]
clamped = [max(0, min(10, x)) for x in raw_values]
print(clamped)  # [0, 0, 10, 10, 8, 0]

Critical: Condition Placement

# TERNARY (before for) — transforms values
[x ** 2 if x % 2 == 0 else x for x in range(6)]
# [0, 1, 4, 3, 16, 5]

# FILTER (after for) — removes items
[x ** 2 for x in range(6) if x % 2 == 0]
# [0, 4, 16]
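Both positions can appear in the same comprehension. The trailing if decides which items survive; the ternary in front decides what each survivor becomes. A small sketch with made-up values:

```python
values = [4, -2, 7, 0, -9, 3]

# Drop zeros (filter after for), then label the rest (ternary before for)
labels = ["pos" if v > 0 else "neg" for v in values if v != 0]
print(labels)  # ['pos', 'neg', 'pos', 'neg', 'pos']

# An "else" after the for clause is a SyntaxError:
# [v for v in values if v > 0 else 0]
```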

Nested Comprehensions

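A comprehension can contain several for clauses, which run like nested for-loops with the leftmost clause as the outermost loop. The expression itself can also be another comprehension.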

Flattening a 2D List

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

flat = [num for row in matrix for num in row]
print(flat)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

Read left to right: "For each row in matrix, for each num in row, yield num."

Matrix Transposition

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
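Another common way to transpose, shown here as an alternative rather than the tutorial's method, is zip(*matrix): the * unpacks the rows as separate arguments and zip() regroups them column by column. It also works for non-square matrices, though rows of unequal length are silently truncated to the shortest one:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]  # 2 rows x 3 columns

# zip(*matrix) yields one tuple per column; convert each tuple to a list
transposed = [list(col) for col in zip(*matrix)]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]
```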

Generating Coordinate Pairs

grid = [(x, y) for x in range(3) for y in range(3)]
print(grid)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
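The leftmost for clause is the outer loop, just as in nested for-loops, and a filter may refer to variables from any earlier clause. Keeping only pairs with x < y yields each unordered pair once, which matches what the standard library's itertools.combinations produces:

```python
from itertools import combinations

# Keep only pairs where x < y, so each unordered pair appears once
pairs = [(x, y) for x in range(4) for y in range(4) if x < y]
print(pairs)  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

# Same pairs, same order, from the standard library
print(pairs == list(combinations(range(4), 2)))  # True
```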

Filtering Nested Data

students = [
    {"name": "Alice", "grade": 92},
    {"name": "Bob", "grade": 78},
    {"name": "Charlie", "grade": 85},
    {"name": "Diana", "grade": 95}
]

honors = [s["name"] for s in students if s["grade"] >= 85]
print(honors)  # ['Alice', 'Charlie', 'Diana']

Readability Warning

If you need more than 2 levels of nesting or multiple conditions, break it into a regular loop:

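# HARD TO READ (matrix1 is a sample 3-level nested list)
matrix1 = [[[1, -2], [3]], [[-4, 5], [6]]]
result = [c for a in matrix1 for b in a for c in b if c > 0]  # [1, 3, 5, 6]

# EASIER TO READ: the same logic as explicit loops
result = []
for a in matrix1:
    for b in a:
        for c in b:
            if c > 0:
                result.append(c)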

Walrus Operator in Comprehensions (Python 3.8+)

The walrus operator (:=) assigns a value to a variable as part of an expression, avoiding redundant computation.

import math

numbers = [1, 15, 3, 22, 8, 30]

# WITHOUT walrus: sqrt is called up to three times per item
result = [(x, math.sqrt(x)) for x in numbers if math.sqrt(x) == int(math.sqrt(x))]

# WITH walrus — computes sqrt once, reuses it
result = [(x, root) for x in numbers if (root := math.sqrt(x)) == int(root)]
print(result)  # [(1, 1.0)]

Filtering with Expensive Calls

# Avoid calling expensive_call twice (a hypothetical stand-in for slow work)
def expensive_call(x):
    return x - 3
data = [1, 2, 3, 4, 5]
result = [val * 10 for x in data if (val := expensive_call(x)) > 0]
print(result)  # [10, 20]
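A common practical use is regex matching, where the match object is both the filter test and the source of the output value. A sketch with made-up log lines; re.search() returns a Match object or None, so the walrus both tests and keeps the result:

```python
import re

lines = ["user=alice id=17", "heartbeat", "user=bob id=42"]

# Keep the match object from the filter and reuse it in the expression
ids = [int(m.group(1)) for line in lines if (m := re.search(r"id=(\d+)", line))]
print(ids)  # [17, 42]
```

An assignment expression inside a comprehension binds its name in the enclosing scope, so m is still defined after this line runs, unlike the loop variable line.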

Performance

Comprehensions are usually somewhat faster than an equivalent for-loop that calls append(). The main reason is that CPython compiles a comprehension to a dedicated LIST_APPEND bytecode instead of looking up and calling the append method on every iteration. Compared with map(), it depends on the function: map() with a built-in such as str is often as fast or faster, while map() with a lambda is usually slower than the comprehension.
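Exact numbers depend on the Python version and the machine, so it is worth measuring rather than trusting a rule of thumb. A minimal sketch using the standard timeit module; the three helper functions are illustrative names, and the assert checks that they build identical lists before any timing is printed:

```python
import timeit

def with_loop(n):
    result = []
    for x in range(n):
        result.append(x * 2)
    return result

def with_comprehension(n):
    return [x * 2 for x in range(n)]

def with_map(n):
    return list(map(lambda x: x * 2, range(n)))

n = 10_000
# All three approaches must produce the same list
assert with_loop(n) == with_comprehension(n) == with_map(n)

for fn in (with_loop, with_comprehension, with_map):
    # Total seconds for 100 calls; compare relative values, not absolutes
    seconds = timeit.timeit(lambda: fn(n), number=100)
    print(f"{fn.__name__:<20} {seconds:.3f}s")
```

Typically the comprehension beats the append loop modestly, and map() with a lambda pays for a Python-level function call per item, so it often comes last here.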

Memory Considerations

A list comprehension builds the entire list in memory at once:

# The list's internal array of 1,000,000 pointers alone is ~8 MB on a 64-bit build
big_list = [x ** 2 for x in range(1_000_000)]

For very large datasets, consider a generator expression instead.

When NOT to Use Comprehensions

# BAD: used only for side effects
[print(x) for x in range(5)]  # Prints 0 to 4, then builds a throwaway list of five Nones

# BAD: too complex (transform, fallback, validate, data and excluded are placeholders)
result = [
    transform(x) if x > 0 else fallback(x)
    for x in data
    if validate(x) and x not in excluded and len(str(x)) > 2
]  # Use a loop instead

Dict and Set Comprehensions

Dict Comprehension

# From pairs
pairs = [("a", 1), ("b", 2), ("c", 3)]
d = {k: v for k, v in pairs}
print(d)  # {'a': 1, 'b': 2, 'c': 3}

# Squares dict
squares_dict = {x: x ** 2 for x in range(6)}
print(squares_dict)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

# Invert a dictionary
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'a', 2: 'b', 3: 'c'}

# Filter
prices = {"apple": 1.0, "banana": 0.5, "steak": 15.0, "bread": 2.5}
cheap = {k: v for k, v in prices.items() if v < 5.0}
print(cheap)  # {'apple': 1.0, 'banana': 0.5, 'bread': 2.5}
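Two details worth knowing, shown with made-up data: zip() turns two parallel lists into key/value pairs (for a plain pairing, dict(zip(keys, vals)) does the same without a comprehension), and inverting a dictionary silently loses entries when values repeat, because each key can appear only once and later pairs overwrite earlier ones:

```python
keys = ["x", "y", "z"]
vals = [10, 20, 30]

# zip() pairs each key with the value at the same position
coords = {k: v for k, v in zip(keys, vals)}
print(coords)                          # {'x': 10, 'y': 20, 'z': 30}
print(coords == dict(zip(keys, vals)))  # True

# Duplicate values: "cherry" overwrites "apple" under the key "red"
colors = {"apple": "red", "cherry": "red", "lime": "green"}
by_color = {v: k for k, v in colors.items()}
print(by_color)  # {'red': 'cherry', 'green': 'lime'}
```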

Set Comprehension

sentence = "the quick brown fox jumps over the lazy dog"
unique_lengths = {len(word) for word in sentence.split()}
print(unique_lengths)  # {3, 4, 5}
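A set comprehension is a convenient way to de-duplicate after normalizing values. Sets are unordered, so this sketch (the sample words are made up) sorts the result before printing; note also that {} creates an empty dict, not an empty set:

```python
words = ["Apple", "avocado", "Banana", "blueberry", "cherry"]

# Lowercase first so "A" and "a" count as the same letter
first_letters = {w[0].lower() for w in words}
print(sorted(first_letters))  # ['a', 'b', 'c']

# {} is an empty dict; an empty set must be written set()
empty = set()
```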

Generator Expressions

A generator expression uses () instead of []. It produces items lazily — one at a time — without building the entire list in memory.

import sys

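# List: getsizeof reports ~8 MB (the pointer array only; the int objects are extra)
big_list = [x ** 2 for x in range(1_000_000)]
print(sys.getsizeof(big_list))

# Generator: a small fixed size (roughly 100 to 200 bytes, depending on the Python version)
big_gen = (x ** 2 for x in range(1_000_000))
print(sys.getsizeof(big_gen))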

When to Use Generators

# USE LIST when you need indexing, len(), or multiple passes
first_ten = [x ** 2 for x in range(10)]

# USE GENERATOR for single-pass iteration over large data
total = sum(x ** 2 for x in range(10_000_000))  # Efficient

# Short-circuit with any/all
has_negative = any(x < 0 for x in [1, 2, -3, 4])
all_positive = all(x > 0 for x in [1, 2, 3, 4])
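One caveat before reaching for generators everywhere: a generator can be consumed only once. After sum() or any other loop drains it, further iteration yields nothing:

```python
squares = (x ** 2 for x in range(5))

print(sum(squares))   # 30
print(sum(squares))   # 0, because the generator is already exhausted
print(list(squares))  # []

# Need several passes? Build a list (or recreate the generator each time)
squares_list = [x ** 2 for x in range(5)]
print(sum(squares_list), max(squares_list))  # 30 16
```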

Common Patterns

# Mapping and transforming
str_nums = ["1", "2", "3"]
int_nums = [int(s) for s in str_nums]

# Filtering and flattening
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens = [x for row in nested for x in row if x % 2 == 0]

# String manipulation
words = ["Portable", "Document", "Format"]
acronym = "".join(w[0] for w in words)  # "PDF"

sentence = "the quick brown fox"
title = " ".join(w.capitalize() for w in sentence.split())
# "The Quick Brown Fox"

Readability Guidelines

  • Rule of two: Don't nest more than 2 levels
  • Break complex comprehensions into regular loops when readability suffers
  • Keep on one line if possible; use multi-line formatting for long expressions
  • Name results descriptively when the transformation isn't obvious
# GOOD — readable
eligible_voters = [
    citizen["name"]
    for citizen in population
    if citizen["age"] >= 18
    and citizen["registered"]
]

# BAD — anonymous and cryptic
x = [c["n"] for c in p if c["a"] >= 18 and c["r"]]
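Another way to keep a long filter readable is to move it into a small named function, so the comprehension reads like a sentence. A sketch with made-up sample data; can_vote is a hypothetical helper, and the dictionary keys mirror the example above:

```python
population = [
    {"name": "Ana", "age": 34, "registered": True},
    {"name": "Ben", "age": 17, "registered": True},
    {"name": "Cy", "age": 52, "registered": False},
]

def can_vote(citizen):
    """Hypothetical predicate: True if the citizen is an adult and registered."""
    return citizen["age"] >= 18 and citizen["registered"]

eligible_voters = [c["name"] for c in population if can_vote(c)]
print(eligible_voters)  # ['Ana']
```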

Common Mistakes

1. Overly Complex Comprehensions

# BAD
result = [(x, y, z) for x in range(10) for y in range(10) for z in range(10)
          if x + y + z == 15 and x * y * z > 30]

# BETTER — use a loop
result = []
for x in range(10):
    for y in range(10):
        for z in range(10):
            if x + y + z == 15 and x * y * z > 30:
                result.append((x, y, z))

2. Side Effects in Comprehensions

# BAD — list of Nones
[print(x) for x in range(5)]

# GOOD — use a loop
for x in range(5):
    print(x)

3. Forgetting the Condition Position

# FILTER (after for) — removes items
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]

# TERNARY (before for) — transforms values
result = [x ** 2 if x % 2 == 0 else x for x in range(10)]
# [0, 1, 4, 3, 16, 5, 36, 7, 64, 9]

4. Using [] When () Would Suffice

# BAD — builds full list in memory
total = sum([x ** 2 for x in range(10_000_000)])

# GOOD — lazy evaluation
total = sum(x ** 2 for x in range(10_000_000))
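The same advice applies to max(), min(), any() and all(). One detail: the generator needs its own parentheses whenever it is not the only argument, as in the call below that uses max()'s default keyword to avoid a ValueError on empty input:

```python
words = ["comprehension", "generator", "set"]

# Sole argument: no extra parentheses needed
longest = max(len(w) for w in words)
print(longest)  # 13

# Extra keyword argument: the generator must be parenthesized
empty_longest = max((len(w) for w in []), default=0)
print(empty_longest)  # 0
```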

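5. Shadowing the Loop Variable

# MYTH: in Python 3 the comprehension's x does NOT overwrite the outer x
x = 10
result = [x for x in range(5)]
print(x)  # 10 (only Python 2 leaked the variable and printed 4)

# What DOES leak: plain for-loop variables and walrus targets
for i in range(5):
    pass
print(i)  # 4
squares = [(last := n * n) for n in range(4)]
print(last)  # 9

# BETTER: avoid reusing outer names so the code reads unambiguously
result = [n for n in range(5)]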

Practice Exercises

Exercise 1: Data Pipeline

Extract the names of students who scored above 80, sorted by score descending.

students = [
    {"name": "Alice", "score": 92},
    {"name": "Bob", "score": 75},
    {"name": "Charlie", "score": 88},
    {"name": "Diana", "score": 95},
    {"name": "Eve", "score": 62},
]

top_students = [
    s["name"]
    for s in sorted(students, key=lambda st: st["score"], reverse=True)
    if s["score"] > 80
]

print(top_students)  # ['Diana', 'Alice', 'Charlie']

Exercise 2: Unique Values from Nested Data

Flatten a matrix and return unique values in order.

matrix = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [5, 6, 7]
]

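seen = set()
# Works, but relies on a side effect: set.add() returns None, so "not seen.add(x)" is always True
unique_ordered = [x for row in matrix for x in row if x not in seen and not seen.add(x)]
# Side-effect-free alternative: dicts keep insertion order (Python 3.7+)
unique_ordered = list(dict.fromkeys(x for row in matrix for x in row))
print(unique_ordered)  # [1, 2, 3, 4, 5, 6, 7]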

Exercise 3: Revenue Calculation

Calculate total revenue from nested order data.

orders = [
    {"id": 1, "items": [{"name": "A", "price": 10, "qty": 2}, {"name": "B", "price": 5, "qty": 4}]},
    {"id": 2, "items": [{"name": "C", "price": 20, "qty": 1}]},
    {"id": 3, "items": [{"name": "A", "price": 10, "qty": 1}, {"name": "D", "price": 15, "qty": 3}]},
]

total_revenue = sum(
    item["price"] * item["qty"]
    for order in orders
    for item in order["items"]
)

print(total_revenue)  # 115

Key Takeaways

  1. List comprehensions replace simple for-loops with concise, expressive syntax
  2. Filtering uses if at the end; transforming uses ternary if-else before for
  3. Nested comprehensions flatten and combine iterables — keep nesting to 2 levels max
  4. The walrus operator (:=) avoids redundant computation in complex comprehensions
  5. Comprehensions are usually faster than append-based for-loops because CPython avoids a method call per item
  6. Dict and set comprehensions extend the pattern to other collection types
  7. Generator expressions use () for memory-efficient lazy evaluation
  8. Prefer generators with sum(), max(), any(), all() for large datasets
  9. Break complex comprehensions into regular loops when readability suffers
  10. Avoid side effects — if you're printing or writing, use a loop instead

What's Next

→ Python Dictionary Methods — Complete Reference
→ Python Sets — Theory and Operations
→ Lambda Functions and Functional Programming
→ Error Handling with try/except
