Generators, Iterators, yield, iter, next
Understanding iteration protocols and lazy evaluation in Python
Interview Question
"Explain the difference between iterators and generators. How does yield work internally? Implement custom iterators and generators for real-world use cases. Discuss memory efficiency and performance implications."
Difficulty: Hard | Frequently asked at Google, Meta, Amazon, Microsoft
Theoretical Foundation
Iterator Protocol
The iterator protocol consists of two methods:
__iter__(): Returns the iterator object itself__next__(): Returns the next value or raisesStopIteration
# Basic iterator implementation
class CountUp:
"""Custom iterator that counts up to a limit."""
def __init__(self, start=0, limit=10):
self.current = start
self.limit = limit
def __iter__(self):
return self
def __next__(self):
if self.current >= self.limit:
raise StopIteration
value = self.current
self.current += 1
return value
# Usage
counter = CountUp(start=1, limit=5)
for num in counter:
print(num, end=" ")
# Output: 1 2 3 4 5
# Manual iteration
counter2 = CountUp(start=1, limit=3)
print(next(counter2)) # 1
print(next(counter2)) # 2
print(next(counter2)) # 3
# next(counter2) would raise StopIteration
Iterables vs Iterators
# Iterable: Has __iter__() method
# Iterator: Has __iter__() AND __next__() methods
class MyIterable:
"""Example of an iterable."""
def __init__(self, data):
self.data = data
def __iter__(self):
# Returns a new iterator each time
return MyIterator(self.data)
class MyIterator:
"""Example of an iterator."""
def __init__(self, data):
self.data = data
self.index = 0
def __iter__(self):
return self
def __next__(self):
if self.index >= len(self.data):
raise StopIteration
value = self.data[self.index]
self.index += 1
return value
# Usage
my_iterable = MyIterable([1, 2, 3])
# Can create multiple independent iterators
for item in my_iterable:
print(item, end=" ") # 1 2 3
print()
for item in my_iterable: # Works again!
print(item, end=" ") # 1 2 3
ℹ️
Key Concept: Iterables can create multiple independent iterators, while iterators are consumed once.
Generators
Generator Functions
Generators are functions that use yield to produce values lazily.
def countdown(n):
"""Generator function for countdown."""
print("Starting countdown")
while n > 0:
yield n # Pauses here
n -= 1
print("Done!")
# Usage
gen = countdown(3)
print(next(gen)) # Starting countdown, 3
print(next(gen)) # 2
print(next(gen)) # 1
# next(gen) would print "Done!" and raise StopIteration
# In for loop
for num in countdown(3):
print(num, end=" ")
# Output: Starting countdown 3 2 1 Done!
Generator Expressions
# List comprehension (creates full list in memory)
squares_list = [x*x for x in range(1000000)]
print(f"List size: {len(squares_list)} items")
print(f"Memory: {squares_list.__sizeof__()} bytes")
# Generator expression (lazy evaluation)
squares_gen = (x*x for x in range(1000000))
print(f"Generator: {squares_gen}")
print(f"Memory: {squares_gen.__sizeof__()} bytes")
# Process one item at a time
total = sum(x*x for x in range(1000000))
print(f"Total: {total}")
Output:
List size: 1000000 items
Memory: 8448728 bytes
Generator: <generator object <genexpr> at 0x...>
Memory: 200 bytes
Total: 333333333333500000
How yield Works Internally
def simple_generator():
"""Step-by-step yield execution."""
print("Step 1: Before first yield")
yield 1
print("Step 2: After first yield, before second")
yield 2
print("Step 3: After second yield")
return "Done" # Optional return value
# Demonstrate execution flow
gen = simple_generator()
print("Creating generator (nothing executes yet)")
print(f"First next(): ", end="")
print(next(gen)) # Executes until first yield
print(f"Second next(): ", end="")
print(next(gen)) # Resumes and executes until second yield
print(f"Third next(): ", end="")
try:
next(gen) # Resumes and executes until return/StopIteration
except StopIteration as e:
print(f"StopIteration with value: {e.value}")
Output:
Creating generator (nothing executes yet)
First next(): Step 1: Before first yield
1
Second next(): Step 2: After first yield, before second
2
Third next(): Step 3: After second yield
StopIteration with value: Done
💡
Key Insight: yield pauses the generator's execution and saves its state. next() resumes from where it left off.
Memory Efficiency
Generator vs List
import sys
import time
def memory_comparison():
"""Compare memory usage of generators vs lists."""
n = 1000000
# List: Creates all items in memory
start = time.time()
list_data = [i * i for i in range(n)]
list_time = time.time() - start
list_memory = sys.getsizeof(list_data)
# Generator: Creates items on demand
start = time.time()
gen_data = (i * i for i in range(n))
gen_time = time.time() - start
gen_memory = sys.getsizeof(gen_data)
print(f"List: {list_memory:,} bytes, {list_time:.4f}s")
print(f"Generator: {gen_memory:,} bytes, {gen_time:.4f}s")
print(f"Memory savings: {(1 - gen_memory/list_memory) * 100:.1f}%")
memory_comparison()
Output:
List: 8,448,728 bytes, 0.0312s
Generator: 200 bytes, 0.0001s
Memory savings: 100.0%
Infinite Sequences
def fibonacci():
"""Infinite Fibonacci sequence."""
a, b = 0, 1
while True:
yield a
a, b = b, a + b
def primes():
"""Infinite prime number generator."""
yield 2
candidate = 3
while True:
is_prime = True
for i in range(2, int(candidate ** 0.5) + 1):
if candidate % i == 0:
is_prime = False
break
if is_prime:
yield candidate
candidate += 2
# Get first 10 Fibonacci numbers
fib = fibonacci()
first_10_fib = [next(fib) for _ in range(10)]
print(f"First 10 Fibonacci: {first_10_fib}")
# Get first 10 primes
prime_gen = primes()
first_10_primes = [next(prime_gen) for _ in range(10)]
print(f"First 10 Primes: {first_10_primes}")
Output:
First 10 Fibonacci: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
First 10 Primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
ℹ️
Memory Efficiency: Generators can represent infinite sequences with O(1) memory, while lists would consume all available memory.
Advanced Generator Patterns
Generator Pipeline
def read_large_file(file_path):
"""Read file line by line (memory efficient)."""
with open(file_path, 'r') as f:
for line in f:
yield line.strip()
def filter_comments(lines):
"""Filter out comment lines."""
for line in lines:
if not line.startswith('#'):
yield line
def parse_csv(lines):
"""Parse CSV lines."""
for line in lines:
yield line.split(',')
def transform_data(records):
"""Transform data."""
for record in records:
yield {
'name': record[0],
'age': int(record[1]),
'email': record[2]
}
# Pipeline: file -> filter -> parse -> transform
def process_large_file(file_path):
"""Process large file with generator pipeline."""
lines = read_large_file(file_path)
non_comments = filter_comments(lines)
csv_data = parse_csv(non_comments)
transformed = transform_data(csv_data)
return transformed
# Usage
# for record in process_large_file('large_data.csv'):
# process(record)
Generator with send()
def accumulator():
"""Generator that accumulates values using send()."""
total = 0
while True:
value = yield total
if value is None:
break
total += value
# Usage
acc = accumulator()
next(acc) # Prime the generator
print(acc.send(10)) # 10
print(acc.send(20)) # 30
print(acc.send(30)) # 60
# Advanced: Average calculator
def running_average():
"""Calculate running average using send()."""
total = 0
count = 0
average = 0
while True:
value = yield average
if value is None:
break
total += value
count += 1
average = total / count
avg = running_average()
next(avg) # Prime
print(avg.send(10)) # 10.0
print(avg.send(20)) # 15.0
print(avg.send(30)) # 20.0
Generator Delegation with yield from
def flatten_nested(nested_list):
"""Flatten arbitrarily nested lists using yield from."""
for item in nested_list:
if isinstance(item, (list, tuple)):
yield from flatten_nested(item)
else:
yield item
# Usage
nested = [1, [2, 3], [4, [5, 6]], [[7, 8], 9]]
flat = list(flatten_nested(nested))
print(f"Flattened: {flat}")
# Generator delegation
def outer_generator():
"""Delegate to inner generators."""
yield from inner_generator_1()
yield from inner_generator_2()
def inner_generator_1():
yield 1
yield 2
def inner_generator_2():
yield 3
yield 4
# Usage
for item in outer_generator():
print(item, end=" ") # 1 2 3 4
💡
Interview Tip: yield from is a powerful tool for generator delegation and flattening nested structures.
Real-World Use Cases
Database Pagination
def fetch_paginated(url, page_size=100):
"""Fetch paginated data from API."""
page = 1
while True:
# In real code: response = requests.get(f"{url}?page={page}&size={page_size}")
# Simulate API response
data = [{"id": i, "page": page} for i in range(page_size)]
if not data:
break
yield from data
page += 1
# Usage
# for record in fetch_paginated("https://api.example.com/users"):
# process(record)
File Processing
def read_in_chunks(file_obj, chunk_size=8192):
"""Read file in chunks for memory efficiency."""
while True:
chunk = file_obj.read(chunk_size)
if not chunk:
break
yield chunk
def process_large_file(filename):
"""Process large file chunk by chunk."""
with open(filename, 'rb') as f:
for chunk in read_in_chunks(f):
# Process chunk
yield len(chunk)
# Usage
# for chunk_size in process_large_file("huge_file.bin"):
# print(f"Processed {chunk_size} bytes")
Data Stream Processing
import time
from collections import deque
def tail_log(file_path, n=10):
"""Tail last n lines of a log file (like Unix tail -f)."""
with open(file_path, 'r') as f:
# Read last n lines
last_lines = deque(maxlen=n)
for line in f:
last_lines.append(line)
yield from last_lines
# Follow new lines
while True:
line = f.readline()
if not line:
time.sleep(0.1)
continue
yield line
# Usage
# for line in tail_log("/var/log/app.log"):
# print(line, end="")
Web Scraping
import re
def extract_links(html_content):
"""Extract links from HTML content."""
link_pattern = r'href=["\'](.*?)["\']'
for match in re.finditer(link_pattern, html_content):
yield match.group(1)
def crawl(start_url, max_depth=2):
"""Simple web crawler using generators."""
visited = set()
queue = [(start_url, 0)]
while queue:
url, depth = queue.pop(0)
if url in visited or depth > max_depth:
continue
visited.add(url)
# In real code: response = requests.get(url)
# html = response.text
html = f"<a href='http://example.com/page{i}'>Link {i}</a>"
yield {'url': url, 'depth': depth, 'links': list(extract_links(html))}
# Add new URLs to queue
for link in extract_links(html):
if link not in visited:
queue.append((link, depth + 1))
ℹ️
Real-World Pattern: Generators are essential for processing large datasets, streaming data, and memory-efficient operations.
Itertools for Advanced Iteration
import itertools
import operator
def itertools_examples():
"""Demonstrate powerful itertools functions."""
# chain: Combine multiple iterables
list1 = [1, 2, 3]
list2 = [4, 5, 6]
chained = itertools.chain(list1, list2)
print(f"chain: {list(chained)}")
# islice: Slice an iterator
def infinite_counter():
n = 0
while True:
yield n
n += 1
first_10 = list(itertools.islice(infinite_counter(), 10))
print(f"islice: {first_10}")
# groupby: Group consecutive elements
data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('A', 5)]
grouped = itertools.groupby(data, key=lambda x: x[0])
for key, group in grouped:
print(f"groupby {key}: {list(group)}")
# product: Cartesian product
colors = ['red', 'blue']
sizes = ['S', 'M', 'L']
products = list(itertools.product(colors, sizes))
print(f"product: {products}")
# permutations and combinations
items = ['A', 'B', 'C']
perms = list(itertools.permutations(items, 2))
print(f"permutations: {perms}")
combos = list(itertools.combinations(items, 2))
print(f"combinations: {combos}")
# accumulate: Running totals
numbers = [1, 2, 3, 4, 5]
accumulated = list(itertools.accumulate(numbers))
print(f"accumulate: {accumulated}")
itertools_examples()
Output:
chain: [1, 2, 3, 4, 5, 6]
islice: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groupby A: [('A', 1), ('A', 2)]
groupby B: [('B', 3), ('B', 4)]
groupby A: [('A', 5)]
product: [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
permutations: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
combinations: [('A', 'B'), ('A', 'C'), ('B', 'C')]
accumulate: [1, 3, 6, 10, 15]
Performance Analysis
Time Complexity
| Operation | List | Generator | Notes |
|---|---|---|---|
| Creation | O(n) | O(1) | Generator is lazy |
| First element | O(1) | O(1) | |
| All elements | O(n) | O(n) | Same |
| Random access | O(1) | O(n) | Generator must iterate |
| Memory | O(n) | O(1) | Generator is better |
Space Complexity
import sys
def space_comparison():
"""Compare space complexity."""
n = 1000000
# List: O(n) space
list_data = [i for i in range(n)]
print(f"List memory: {sys.getsizeof(list_data):,} bytes")
# Generator: O(1) space
gen_data = (i for i in range(n))
print(f"Generator memory: {sys.getsizeof(gen_data)} bytes")
# Even better: Use itertools
import itertools
gen_itertools = itertools.count()
print(f"itertools memory: {sys.getsizeof(gen_itertools)} bytes")
space_comparison()
Output:
List memory: 8,448,728 bytes
Generator memory: 200 bytes
itertools memory: 56 bytes
⚠️
Performance Trade-off: Generators save memory but don't support random access. Choose based on your access pattern.
Common Patterns and Idioms
Generator Comprehension
# Generator comprehension syntax
squares = (x*x for x in range(10))
even_squares = (x for x in squares if x % 2 == 0)
# Chaining generators
def process_data(data):
return (
transform(item)
for item in data
if item is not None
and item > 0
)
def transform(x):
return x * 2
StopIteration for Flow Control
def first_positive(numbers):
"""Return first positive number using StopIteration."""
def _generator():
for n in numbers:
if n > 0:
yield n
return # Raises StopIteration with None
return next(_generator(), None)
# Alternative: Custom exception
class Found(Exception):
def __init__(self, value):
self.value = value
def find_in_nested(data, target):
"""Find target in nested structure."""
def _search(obj):
if isinstance(obj, list):
for item in obj:
_search(item)
elif obj == target:
raise Found(obj)
try:
_search(data)
return None
except Found as e:
return e.value
result = find_in_nested([1, [2, 3], [4, [5]]], 3)
print(f"Found: {result}")
Context Manager Generators
from contextlib import contextmanager
@contextmanager
def managed_resource(name):
"""Context manager using generator."""
print(f"Acquiring {name}")
resource = {"name": name, "active": True}
try:
yield resource
finally:
print(f"Releasing {name}")
resource["active"] = False
# Usage
with managed_resource("database") as res:
print(f"Using {res}")
# Do work
print(f"After: {res}")
💡
Interview Tip: Generators + context managers = powerful patterns for resource management.
Interview Tips
Common Follow-up Questions
-
"What's the difference between
returnandyieldin a generator?"yieldproduces a value and pausesreturnraisesStopIteration(can optionally carry a value)returnin generators (Python 3.3+) setsStopIteration.value
-
"How do you send values to a generator?"
- Use
generator.send(value) - Generator must be primed with
next()first yieldexpression returns the sent value
- Use
-
"What are generator pipelines?"
- Chain multiple generators together
- Each generator processes data and passes to next
- Memory efficient for large data processing
Code Review Tips
# BAD: Creating large list
def get_data():
return [process(x) for x in large_dataset]
# GOOD: Generator for large data
def get_data():
return (process(x) for x in large_dataset)
# BAD: Not handling StopIteration
def first_item(iterable):
return next(iterable)
# GOOD: Handling empty iterable
def first_item(iterable, default=None):
return next(iterable, default)
# BAD: Consuming generator multiple times
def process():
gen = (x for x in range(10))
list1 = list(gen)
list2 = list(gen) # Empty!
# GOOD: Create new generator
def process():
def gen():
return (x for x in range(10))
list1 = list(gen())
list2 = list(gen()) # Works!
⚠️
Common Mistake: Generators are consumed once. Create new generators if you need multiple passes.
Summary
| Feature | Iterator | Generator | Generator Expression |
|---|---|---|---|
| Syntax | Class with __iter__, __next__ | Function with yield | (expr for x in iterable) |
| Memory | O(n) or O(1) | O(1) | O(1) |
| State | Manual management | Automatic | Automatic |
| Complexity | More code | Less code | Least code |
| Flexibility | Full control | Limited | Limited |
When to Use What?
- Iterators: When you need full control over iteration
- Generators: For lazy evaluation and memory efficiency
- Generator Expressions: For simple transformations
- itertools: For complex iteration patterns
ℹ️
Key Takeaway: Generators and iterators are fundamental to Python's memory efficiency. Master them for writing Pythonic, performant code.
Practice Problems
- Custom Range: Implement a range() function using an iterator
- Flatten Nested: Create a generator to flatten arbitrarily nested lists
- Window Sliding: Implement a sliding window generator
- Chunk Splitter: Create a generator that splits data into chunks
- Infinite Buffer: Implement a ring buffer using generators
Further Reading
- PEP 255: Simple Generators
- PEP 342: Coroutines via Enhanced Generators
- PEP 380: Syntax for Delegating to a Sub-Generator
- Books: "Fluent Python" by Luciano Ramalho
Remember: Generators are lazy iterators that yield values one at a time. They're essential for memory-efficient processing of large datasets.