Python Generators

Introduction

Python generators are functions that produce a sequence of values lazily, one at a time, using the yield keyword. Because each value is computed on demand rather than stored up front, generators allow memory-efficient iteration over large datasets and are well suited to streaming data in data science applications. A generator keeps its local state between iterations, which makes it a useful building block for multi-stage data processing pipelines.

Key Concepts

Lazy evaluation: Values are produced on demand rather than stored in memory up front
Yield keyword: Pauses the function and returns a value without terminating it
Generator object: Iterator returned by calling a generator function; it produces values as it is iterated
Memory efficiency: Only the current item needs to be in memory, not the whole sequence
Stateful iteration: Local variables and position are preserved between iterations
Generator expressions: Written like list comprehensions, but with parentheses and lazy evaluation
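
The concepts above can be observed directly. The following is a minimal sketch, not a benchmark: noisy_numbers and log are hypothetical names introduced only for this illustration, and the size comparison measures the container objects themselves, not the integers they refer to.

```python
import sys

log = []  # records when the generator body actually executes

def noisy_numbers(limit):
    """Hypothetical generator used only for this illustration."""
    for n in range(limit):
        log.append(f"producing {n}")
        yield n

gen = noisy_numbers(3)        # creates the generator object; the body has not started
ran_before_next = list(log)   # empty, because nothing has executed yet

first = next(gen)             # runs the body up to the first yield
second = next(gen)            # resumes right after that yield with state intact

# A generator expression holds only its own bookkeeping,
# while the equivalent list holds a reference to every element.
lazy = (x * x for x in range(100_000))
eager = [x * x for x in range(100_000)]
print(sys.getsizeof(lazy) < sys.getsizeof(eager))  # True
```

Calling the generator function does no work by itself; each next() call advances the body to the following yield and then pauses again.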

Python Implementation

# Basic generator function
def count_up_to(n):
    count = 1
    while count <= n:
        yield count
        count += 1

for i in count_up_to(5):
    print(i)  # Prints 1, 2, 3, 4, 5

# Generator for large data processing
def process_large_file(filepath):
    with open(filepath, 'r') as file:
        for line in file:
            yield line.strip()

# Fibonacci generator
def fibonacci():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = fibonacci()
for i in range(10):
    print(next(fib))  # 0, 1, 1, 2, 3, 5, 8, 13, 21, 34

# Generator expression (like list comprehension but lazy)
squares_gen = (x**2 for x in range(1000000))
# Does not create list in memory - each value computed on demand

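# Chaining generators: each stage pulls items lazily from the previous one
def pipeline(data):
    for item in data:
        yield item * 2

def filter_pipeline(data):
    for item in data:
        if item > 0:
            yield item

# Compose the stages; nothing runs until the result is consumed
doubled_positives = filter_pipeline(pipeline([-2, 0, 3, 5]))
print(list(doubled_positives))  # [6, 10]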

# Using send() for two-way communication
def counter():
    total = 0
    while True:
        value = yield total
        if value:
            total += value

c = counter()
print(c.send(None))  # 0 (primes the generator; equivalent to next(c))
print(c.send(5))     # 5
print(c.send(3))     # 8

When to Use

Processing large datasets that don't fit in memory
Streaming data from files or APIs
Creating infinite sequences
Building data pipelines with transformation steps
Avoiding memory overhead of list creation
Implementing custom iteration patterns
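
Several of these use cases combine naturally. As a rough sketch, the example below streams an unbounded source in fixed-size batches, so only one batch is held in memory at a time. The names in_batches and naturals are hypothetical helpers written for this illustration; only itertools.islice comes from the standard library.

```python
from itertools import islice

def in_batches(iterable, size):
    """Yield lists of up to `size` items from any iterable (hypothetical helper)."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:          # source is exhausted
            return
        yield batch

def naturals():
    """Infinite sequence 0, 1, 2, ... (stands in for a stream of records)."""
    n = 0
    while True:
        yield n
        n += 1

# Take only the first two batches; the infinite source is never fully consumed.
first_batches = list(islice(in_batches(naturals(), 3), 2))
print(first_batches)  # [[0, 1, 2], [3, 4, 5]]
```

The same in_batches helper works on a finite source such as an open file or a list, in which case the last batch may be shorter than size.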

Key Takeaways

Generators are memory-efficient for large datasets as they don't store all values
The yield keyword pauses execution, maintaining state between iterations
Generator expressions create generators with a compact, comprehension-like syntax
Generators can be chained for efficient data pipelines
Once exhausted, generators cannot be reused without recreation
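
The last point is easy to overlook, so here is a short sketch of what exhaustion looks like in practice. The letters function is a hypothetical example generator.

```python
def letters():
    """Hypothetical two-item generator."""
    yield "a"
    yield "b"

gen = letters()
first_pass = list(gen)          # consumes every value
second_pass = list(gen)         # the same object is now exhausted
fresh_pass = list(letters())    # calling the function again gives a new generator

# next() on an exhausted generator raises StopIteration unless a default is given.
after_end = next(gen, None)

print(first_pass, second_pass, fresh_pass, after_end)
# ['a', 'b'] [] ['a', 'b'] None
```

If the values are needed more than once, either recreate the generator or store the results in a list, accepting the memory cost.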
