Python List Comprehensions: Elegant Data Transformation
List comprehensions are one of Python's most elegant features. They let you build a list in a single expressive line that transforms, filters, and combines data without a verbose loop.
Learning Objectives
By the end of this tutorial, you will be able to:
- Write basic list comprehensions to replace simple for-loops
- Apply conditions to filter and transform data
- Use the ternary operator inside comprehensions for conditional values
- Nest comprehensions to flatten 2D lists and generate combinations
- Leverage the walrus operator (:=) in Python 3.8+ for efficiency
- Compare performance of comprehensions, loops, and map()
- Write dict and set comprehensions for other collection types
- Use generator expressions for memory-efficient iteration
- Recognize when a comprehension is the wrong tool
- Follow readability guidelines and avoid common mistakes
What Are List Comprehensions?
A list comprehension is a concise syntax for creating a new list by performing an operation on each item in an existing iterable, optionally filtering items along the way.
Traditional For-Loop
squares = []
for x in range(10):
    squares.append(x ** 2)
print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Equivalent List Comprehension
squares = [x ** 2 for x in range(10)]
print(squares)
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
The comprehension communicates intent directly: "I'm building a list of squares from a range."
When to Use Comprehensions vs Loops
| Use a Comprehension When | Use a For-Loop When |
|---|---|
| Simple transformation of one iterable | Complex logic with multiple statements |
| Filtering with clear conditions | Side effects (printing, writing files) |
| Building a new list from existing data | Nested loops deeper than 2 levels |
| The expression is readable in one line | Debugging step-by-step is needed |
Basic Syntax
[expression for item in iterable]
Reads as: "Give me expression for every item in iterable."
# Doubling numbers
numbers = [1, 2, 3, 4, 5]
doubled = [x * 2 for x in numbers]
print(doubled) # [2, 4, 6, 8, 10]
# String transformation
words = ["hello", "world"]
upper_words = [word.upper() for word in words]
print(upper_words) # ['HELLO', 'WORLD']
# Using a function
import math
values = [1, 4, 9, 16, 25]
roots = [math.sqrt(x) for x in values]
print(roots) # [1.0, 2.0, 3.0, 4.0, 5.0]
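The iterable does not have to be a plain list. The following sketch uses made-up sample data to show that comprehensions work the same way over zip() and enumerate() objects, unpacking each tuple directly in the for clause:

```python
items = ["apple", "bread", "milk"]
quantities = [3, 1, 2]

# zip() pairs the lists element by element; the for clause unpacks each pair
labels = [f"{qty} x {item}" for item, qty in zip(items, quantities)]
print(labels)  # ['3 x apple', '1 x bread', '2 x milk']

# enumerate() adds a counter; start=1 makes it 1-based
numbered = [f"{i}. {item}" for i, item in enumerate(items, start=1)]
print(numbered)  # ['1. apple', '2. bread', '3. milk']
```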
Side-by-Side Comparison
# FOR LOOP
result = []
for char in "Python":
    if char.lower() in "aeiou":
        result.append(char.upper())
# LIST COMPREHENSION (one line)
result = [char.upper() for char in "Python" if char.lower() in "aeiou"]
print(result) # ['O']
Filtering with Conditions
Add an if clause at the end to filter items:
[expression for item in iterable if condition]
# Even numbers
numbers = range(1, 11)
evens = [x for x in numbers if x % 2 == 0]
print(evens) # [2, 4, 6, 8, 10]
# Strings by length
words = ["hi", "hello", "hey", "greetings", "yo"]
long_words = [word for word in words if len(word) > 3]
print(long_words) # ['hello', 'greetings']
# Filtering with a predicate
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
primes = [x for x in range(2, 30) if is_prime(x)]
print(primes) # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
Multiple Conditions
numbers = range(1, 51)
# Multiples of 3 AND greater than 20
result = [x for x in numbers if x % 3 == 0 if x > 20]
print(result) # [21, 24, 27, 30, 33, 36, 39, 42, 45, 48]
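Stacked if clauses behave like conditions joined with and. Stacking has no equivalent for or, so "either" conditions must go in a single if clause. A quick sketch of both:

```python
numbers = range(1, 51)

# Two if clauses: an item must pass both
stacked = [x for x in numbers if x % 3 == 0 if x > 20]
# One if clause with `and`: same result, often easier to read
combined = [x for x in numbers if x % 3 == 0 and x > 20]
print(stacked == combined)  # True

# "Either" conditions need `or` inside a single if clause
fizz_or_buzz = [x for x in range(1, 21) if x % 3 == 0 or x % 5 == 0]
print(fizz_or_buzz)  # [3, 5, 6, 9, 10, 12, 15, 18, 20]
```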
Filtering with not in
vowels = "aeiou"
consonants = [c for c in "python" if c not in vowels]
print(consonants) # ['p', 'y', 't', 'h', 'n']
Conditional Expressions (Ternary)
To produce different values based on a condition, use the ternary operator before the for:
[expr_if_true if condition else expr_if_false for item in iterable]
# Classify numbers
numbers = [3, -1, 0, 7, -4, 2]
labels = ["positive" if x > 0 else "negative" if x < 0 else "zero" for x in numbers]
print(labels) # ['positive', 'negative', 'zero', 'positive', 'negative', 'positive']
# Map to pass/fail
scores = [85, 42, 91, 67, 73, 58]
grades = ["pass" if s >= 60 else "fail" for s in scores]
print(grades) # ['pass', 'fail', 'pass', 'pass', 'pass', 'fail']
# Clamp values to range
raw_values = [-5, 0, 12, 25, 8, -3]
clamped = [max(0, min(10, x)) for x in raw_values]
print(clamped) # [0, 0, 10, 10, 8, 0]
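The chained ternary in the first example works, but three outcomes already strain a single expression. One option is to move the branching into a small function and keep the comprehension simple. The sketch below uses a hypothetical helper named sign_label:

```python
def sign_label(x):
    """Hypothetical helper: classify a number by its sign."""
    if x > 0:
        return "positive"
    if x < 0:
        return "negative"
    return "zero"

numbers = [3, -1, 0, 7, -4, 2]
labels = [sign_label(x) for x in numbers]
print(labels)  # ['positive', 'negative', 'zero', 'positive', 'negative', 'positive']
```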
Critical: Condition Placement
# TERNARY (before for): transforms values
[x ** 2 if x % 2 == 0 else x for x in range(6)]
# [0, 1, 4, 3, 16, 5]
# FILTER (after for): removes items
[x ** 2 for x in range(6) if x % 2 == 0]
# [0, 4, 16]
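Both positions can appear in the same comprehension. The trailing if decides which items survive, and the leading if-else decides what each survivor becomes. Putting either form in the wrong position is a SyntaxError, which this sketch checks with the built-in compile():

```python
# Drop multiples of 3, then square the evens and negate the odds
result = [x ** 2 if x % 2 == 0 else -x for x in range(10) if x % 3 != 0]
print(result)  # [-1, 4, 16, -5, -7, 64]

# Misplaced conditions fail to compile
bad_sources = [
    "[x if x > 2 for x in range(5)]",         # ternary without else
    "[x for x in range(5) if x > 2 else 0]",  # else after a filter
]
failed = []
for src in bad_sources:
    try:
        compile(src, "<example>", "eval")
    except SyntaxError:
        failed.append(src)
print(len(failed), "of", len(bad_sources), "raise SyntaxError")  # 2 of 2 raise SyntaxError
```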
Nested Comprehensions
Nested comprehensions iterate over multiple levels of an iterable.
Flattening a 2D List
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
flat = [num for row in matrix for num in row]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
Read left to right: "For each row in matrix, for each num in row, yield num."
Matrix Transposition
matrix = [
[1, 2, 3],
[4, 5, 6],
[7, 8, 9]
]
transposed = [[row[i] for row in matrix] for i in range(3)]
print(transposed) # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
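The example above hard-codes range(3), so it only works for a matrix with exactly three columns. The sketch below sizes the outer loop from the first row and applies it to a non-square matrix. It also shows the common zip(*matrix) idiom:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
]  # 2 rows x 3 columns

# Size the outer loop from the row length instead of hard-coding 3
transposed = [[row[i] for row in matrix] for i in range(len(matrix[0]))]
print(transposed)  # [[1, 4], [2, 5], [3, 6]]

# zip(*matrix) yields the columns as tuples; convert each one to a list
via_zip = [list(column) for column in zip(*matrix)]
print(via_zip)  # [[1, 4], [2, 5], [3, 6]]
```

Both versions assume every row has the same length. With ragged rows, zip() silently stops at the shortest row.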
Generating Coordinate Pairs
grid = [(x, y) for x in range(3) for y in range(3)]
print(grid)
# [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
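Multiple for clauses run left to right, exactly like nested loops: the first clause is the outer loop. A later clause can also use a variable bound by an earlier one. The sketch below checks the first point against itertools.product:

```python
from itertools import product

# x is the outer loop, y the inner loop
grid = [(x, y) for x in range(3) for y in range(3)]
print(grid == list(product(range(3), range(3))))  # True

# The inner range depends on x, giving only pairs with y >= x
upper_pairs = [(x, y) for x in range(3) for y in range(x, 3)]
print(upper_pairs)  # [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2), (2, 2)]
```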
Filtering Nested Data
students = [
{"name": "Alice", "grade": 92},
{"name": "Bob", "grade": 78},
{"name": "Charlie", "grade": 85},
{"name": "Diana", "grade": 95}
]
honors = [s["name"] for s in students if s["grade"] >= 85]
print(honors) # ['Alice', 'Charlie', 'Diana']
Readability Warning
If you need more than 2 levels of nesting or multiple conditions, break it into a regular loop:
matrix1 = [[[1, -2], [3]], [[-4, 5], [6]]]  # a 3-level nested list
# HARD TO READ
result = [c for a in matrix1 for b in a for c in b if c > 0]
# EASIER TO READ
result = []
for a in matrix1:
    for b in a:
        for c in b:
            if c > 0:
                result.append(c)
Walrus Operator in Comprehensions (Python 3.8+)
The walrus operator (:=) assigns a value to a variable as part of an expression, avoiding redundant computation.
import math
numbers = [1, 15, 3, 22, 8, 30]
# WITHOUT walrus: math.sqrt runs twice in the filter and again in the output
result = [(x, math.sqrt(x)) for x in numbers if math.sqrt(x) == int(math.sqrt(x))]
# WITH walrus: math.sqrt runs once per item and the result is reused
result = [(x, root) for x in numbers if (root := math.sqrt(x)) == int(root)]
print(result) # [(1, 1.0)]
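The example above has a caveat. math.sqrt works on floats, so the perfect-square test can give wrong answers for large integers. The sketch below keeps the walrus pattern but swaps in math.isqrt (integer square root, Python 3.8+) for an exact check. The extra value 2 ** 60 + 1 is included only to expose the difference:

```python
import math

numbers = [1, 15, 3, 22, 8, 30, 2 ** 60 + 1]

# Float-based test: 2**60 + 1 rounds to 2**60 when converted to float,
# so its square root looks like a whole number
float_check = [x for x in numbers if (root := math.sqrt(x)) == int(root)]

# Integer-based test with math.isqrt: exact for any non-negative int
exact_check = [x for x in numbers if (r := math.isqrt(x)) * r == x]

print(float_check)  # [1, 1152921504606846977]
print(exact_check)  # [1]
```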
Filtering with Expensive Calls
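def expensive_call(x):  # hypothetical stand-in for a slow function
    return x - 3
data = [1, 2, 3, 4, 5]
# Call expensive_call only once per item
result = [val * 10 for x in data if (val := expensive_call(x)) > 0]
print(result)  # [10, 20]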
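A common real-world case is a regular-expression match, where both the filter and the output expression need the match object. Here is a sketch with made-up log lines:

```python
import re

log_lines = ["user=alice status=ok", "heartbeat", "user=bob status=fail"]

# Keep the match object from the filter and reuse it in the output expression
users = [m.group(1) for line in log_lines if (m := re.search(r"user=(\w+)", line))]
print(users)  # ['alice', 'bob']
```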
Performance
Comprehensions are typically faster than equivalent for-loops that call append(). In CPython, the speedup comes from a dedicated list-append bytecode, which avoids looking up and calling the append method on every iteration. Compared with map(), the result depends on the function. map() with a built-in such as str or abs is often slightly faster, while map() with a lambda is usually slower than the equivalent comprehension.
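Timings vary by machine and Python version, so it is worth measuring rather than trusting a rule of thumb. Here is a minimal timeit sketch comparing the three approaches on the same workload. The values of N and number are arbitrary choices:

```python
import timeit

N = 10_000

def with_loop():
    squares = []
    for x in range(N):
        squares.append(x * x)
    return squares

def with_comprehension():
    return [x * x for x in range(N)]

def with_map():
    return list(map(lambda x: x * x, range(N)))

# All three build the same list; only the speed differs
assert with_loop() == with_comprehension() == with_map()

for fn in (with_loop, with_comprehension, with_map):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__:<20} {seconds:.3f} s")
```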
Memory Considerations
A list comprehension builds the entire list in memory at once:
# Stores all 1,000,000 results at once: ~8 MB of pointers on 64-bit CPython, plus the int objects
big_list = [x ** 2 for x in range(1_000_000)]
For very large datasets, consider a generator expression instead.
When NOT to Use Comprehensions
# BAD: side effects
[print(x) for x in range(5)] # Creates list of Nones
# BAD: too complex
result = [
transform(x) if x > 0 else fallback(x)
for x in data
if validate(x) and x not in excluded and len(str(x)) > 2
] # Use a loop instead
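The same logic is easier to read as a loop. In the sketch below, transform, fallback, and validate are hypothetical stand-ins, because the example above leaves them undefined. The data and excluded values are sample inputs:

```python
# Hypothetical stand-ins for the helpers named in the comprehension above
def transform(x):
    return x * 10

def fallback(x):
    return 0

def validate(x):
    return isinstance(x, int)

data = [5, 120, -300, 4567, "oops", 89]
excluded = {4567}

result = []
for x in data:
    # One guard per line instead of a chain of `and`s
    if not validate(x):
        continue
    if x in excluded or len(str(x)) <= 2:
        continue
    result.append(transform(x) if x > 0 else fallback(x))

print(result)  # [1200, 0]
```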
Dict and Set Comprehensions
Dict Comprehension
# From pairs
pairs = [("a", 1), ("b", 2), ("c", 3)]
d = {k: v for k, v in pairs}
print(d) # {'a': 1, 'b': 2, 'c': 3}
# Squares dict
squares_dict = {x: x ** 2 for x in range(6)}
print(squares_dict) # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
# Invert a dictionary
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
print(inverted) # {1: 'a', 2: 'b', 3: 'c'}
# Filter
prices = {"apple": 1.0, "banana": 0.5, "steak": 15.0, "bread": 2.5}
cheap = {k: v for k, v in prices.items() if v < 5.0}
print(cheap) # {'apple': 1.0, 'banana': 0.5, 'bread': 2.5}
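Inverting a dictionary with a comprehension silently drops data when two keys share a value, because each later entry overwrites the earlier one. The sketch below shows the problem and one loop-based alternative:

```python
original = {"a": 1, "b": 2, "c": 1}  # two keys share the value 1

# The later pair wins, so "a" is lost
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Keeping every key requires grouping, which is clearer as a loop
grouped = {}
for k, v in original.items():
    grouped.setdefault(v, []).append(k)
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```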
Set Comprehension
sentence = "the quick brown fox jumps over the lazy dog"
unique_lengths = {len(word) for word in sentence.split()}
print(unique_lengths) # {3, 4, 5}
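A set keeps only one copy of each value, so normalizing inside the expression is a compact way to deduplicate. The sketch below uses sample text. sorted() is applied only to make the printed order predictable:

```python
text = "The cat and the hat And THE bat"

# Normalize case so "The", "the", and "THE" collapse into one entry
unique_words = {word.lower() for word in text.split()}
print(sorted(unique_words))  # ['and', 'bat', 'cat', 'hat', 'the']

# {} is an empty dict, not an empty set; use set() for an empty set
print(type({}).__name__, type(set()).__name__)  # dict set
```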
Generator Expressions
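A generator expression uses () instead of []. It produces items lazily, one at a time, without building the entire list in memory.
import sys
# List: ~8 MB for the pointer array alone on 64-bit CPython (getsizeof ignores the int objects)
big_list = [x ** 2 for x in range(1_000_000)]
print(sys.getsizeof(big_list))
# Generator: a small fixed size (a few hundred bytes at most), however long the range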
big_gen = (x ** 2 for x in range(1_000_000))
print(sys.getsizeof(big_gen))
When to Use Generators
# USE LIST when you need indexing, len(), or multiple passes
first_ten = [x ** 2 for x in range(10)]
# USE GENERATOR for single-pass iteration over large data
total = sum(x ** 2 for x in range(10_000_000)) # Efficient
# Short-circuit with any/all
has_negative = any(x < 0 for x in [1, 2, -3, 4])
all_positive = all(x > 0 for x in [1, 2, 3, 4])
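The list-versus-generator advice above rests on two behaviors. A generator can be consumed only once, and any() and all() stop pulling items as soon as the answer is known. In the sketch below, is_negative is a hypothetical helper that records each item it sees, purely for demonstration:

```python
squares = (x ** 2 for x in range(5))
first_pass = sum(squares)
second_pass = sum(squares)
print(first_pass, second_pass)  # 30 0 (the generator is exhausted after one pass)

# any() stops at the first True, so later items are never examined
checked = []

def is_negative(x):
    checked.append(x)  # record which items any() actually looked at
    return x < 0

print(any(is_negative(x) for x in [1, 2, -3, 4, 5]))  # True
print(checked)  # [1, 2, -3]
```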
Common Patterns
# Mapping and transforming
str_nums = ["1", "2", "3"]
int_nums = [int(s) for s in str_nums]
# Filtering and flattening
nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
evens = [x for row in nested for x in row if x % 2 == 0]
# String manipulation
words = ["Portable", "Document", "Format"]
acronym = "".join(w[0] for w in words) # "PDF"
sentence = "the quick brown fox"
title = " ".join(w.capitalize() for w in sentence.split())
# "The Quick Brown Fox"
Readability Guidelines
- Rule of two: Don't nest more than 2 levels
- Break complex comprehensions into regular loops when readability suffers
- Keep on one line if possible; use multi-line formatting for long expressions
- Name results descriptively when the transformation isn't obvious
# GOOD: readable
eligible_voters = [
    citizen["name"]
    for citizen in population
    if citizen["age"] >= 18
    and citizen["registered"]
]
# BAD: anonymous and cryptic
x = [c["n"] for c in p if c["a"] >= 18 and c["r"]]
Common Mistakes
1. Overly Complex Comprehensions
# BAD
result = [(x, y, z) for x in range(10) for y in range(10) for z in range(10)
          if x + y + z == 15 and x * y * z > 30]
# BETTER: use a loop
result = []
for x in range(10):
    for y in range(10):
        for z in range(10):
            if x + y + z == 15 and x * y * z > 30:
                result.append((x, y, z))
2. Side Effects in Comprehensions
# BAD: builds a throwaway list of Nones
[print(x) for x in range(5)]
# GOOD: use a loop
for x in range(5):
    print(x)
3. Forgetting the Condition Position
# FILTER (after for): removes items
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]
# TERNARY (before for): transforms values
result = [x ** 2 if x % 2 == 0 else x for x in range(10)]
# [0, 1, 4, 3, 16, 5, 36, 7, 64, 9]
4. Using [] When () Would Suffice
# BAD: builds the full list in memory
total = sum([x ** 2 for x in range(10_000_000)])
# GOOD: lazy evaluation
total = sum(x ** 2 for x in range(10_000_000))
5. Shadowing the Loop Variable
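# In Python 3, a comprehension's loop variable is local to the comprehension
x = 10
result = [x for x in range(5)]
print(x)  # 10: unchanged (Python 2 leaked the loop variable and printed 4)
# BETTER: a distinct name avoids confusing readers
result = [i for i in range(5)]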
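The walrus operator is the one exception to this isolation. In Python 3.8+, a name bound with := inside a comprehension is assigned in the enclosing scope, while the ordinary loop variable stays local. Here is a sketch:

```python
values = [3, 8, 1]

# The walrus target `last` is bound in the enclosing scope...
doubled = [(last := v * 2) for v in values]
print(doubled)  # [6, 16, 2]
print(last)     # 2: the value from the final iteration

# ...but the loop variable `v` is not
try:
    v
    v_visible = True
except NameError:
    v_visible = False
print(v_visible)  # False
```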
Practice Exercises
Exercise 1: Data Pipeline
Extract the names of students who scored above 80, sorted by score descending.
students = [
{"name": "Alice", "score": 92},
{"name": "Bob", "score": 75},
{"name": "Charlie", "score": 88},
{"name": "Diana", "score": 95},
{"name": "Eve", "score": 62},
]
top_students = [
    s["name"]
    for s in sorted(students, key=lambda student: student["score"], reverse=True)
    if s["score"] > 80
]
print(top_students) # ['Diana', 'Alice', 'Charlie']
Exercise 2: Unique Values from Nested Data
Flatten a matrix and return unique values in order.
matrix = [
[1, 2, 3],
[2, 3, 4],
[3, 4, 5],
[5, 6, 7]
]
seen = set()
unique_ordered = [x for row in matrix for x in row if x not in seen and not seen.add(x)]
print(unique_ordered) # [1, 2, 3, 4, 5, 6, 7]
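The seen.add(x) trick works because set.add() returns None. However, it relies on a side effect inside the comprehension, which the Common Mistakes section advises against. The sketch below shows a side-effect-free alternative that uses dict.fromkeys():

```python
matrix = [
    [1, 2, 3],
    [2, 3, 4],
    [3, 4, 5],
    [5, 6, 7],
]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_ordered = list(dict.fromkeys(x for row in matrix for x in row))
print(unique_ordered)  # [1, 2, 3, 4, 5, 6, 7]
```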
Exercise 3: Revenue Calculation
Calculate total revenue from nested order data.
orders = [
{"id": 1, "items": [{"name": "A", "price": 10, "qty": 2}, {"name": "B", "price": 5, "qty": 4}]},
{"id": 2, "items": [{"name": "C", "price": 20, "qty": 1}]},
{"id": 3, "items": [{"name": "A", "price": 10, "qty": 1}, {"name": "D", "price": 15, "qty": 3}]},
]
total_revenue = sum(
    item["price"] * item["qty"]
    for order in orders
    for item in order["items"]
)
print(total_revenue)  # 115
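A natural extension is to break revenue down per order. The sketch below combines a dict comprehension on the outside with a generator expression on the inside:

```python
orders = [
    {"id": 1, "items": [{"name": "A", "price": 10, "qty": 2}, {"name": "B", "price": 5, "qty": 4}]},
    {"id": 2, "items": [{"name": "C", "price": 20, "qty": 1}]},
    {"id": 3, "items": [{"name": "A", "price": 10, "qty": 1}, {"name": "D", "price": 15, "qty": 3}]},
]

# Outer dict comprehension keyed by order id; inner generator sums each order
revenue_by_order = {
    order["id"]: sum(item["price"] * item["qty"] for item in order["items"])
    for order in orders
}
print(revenue_by_order)  # {1: 40, 2: 20, 3: 55}
print(sum(revenue_by_order.values()))  # 115
```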
Key Takeaways
- List comprehensions replace simple for-loops with concise, expressive syntax
- Filtering uses if at the end; transforming uses a ternary if-else before for
- Nested comprehensions flatten and combine iterables; keep nesting to 2 levels max
- The walrus operator (:=) avoids redundant computation in complex comprehensions
- Comprehensions are usually faster than equivalent append() loops because CPython runs them with a dedicated list-append bytecode
- Dict and set comprehensions extend the pattern to other collection types
- Generator expressions use () for memory-efficient lazy evaluation
- Prefer generators with sum(), max(), any(), all() for large datasets
- Break complex comprehensions into regular loops when readability suffers
- Avoid side effects: if you're printing or writing, use a loop instead
What's Next
- Python Dictionary Methods: Complete Reference
- Python Sets: Theory and Operations
- Lambda Functions and Functional Programming
- Error Handling with try/except