Google, Meta & Amazon Interview

Generators, Iterators, yield, iter, next

Understanding iteration protocols and lazy evaluation in Python

Interview Question

"Explain the difference between iterators and generators. How does yield work internally? Implement custom iterators and generators for real-world use cases. Discuss memory efficiency and performance implications."

Difficulty: Hard | Frequently asked at Google, Meta, Amazon, Microsoft

Theoretical Foundation

Iterator Protocol

The iterator protocol consists of two methods:

__iter__(): Returns the iterator object itself
__next__(): Returns the next value or raises StopIteration

# Basic iterator implementation
class CountUp:
    """Custom iterator that counts up to a limit."""
    
    def __init__(self, start=0, limit=10):
        self.current = start
        self.limit = limit
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration
        value = self.current
        self.current += 1
        return value

# Usage
counter = CountUp(start=1, limit=5)
for num in counter:
    print(num, end=" ")
# Output: 1 2 3 4 5

# Manual iteration
counter2 = CountUp(start=1, limit=3)
print(next(counter2))  # 1
print(next(counter2))  # 2
print(next(counter2))  # 3
# next(counter2) would raise StopIteration

Iterables vs Iterators

# Iterable: Has __iter__() method
# Iterator: Has __iter__() AND __next__() methods

class MyIterable:
    """Example of an iterable."""
    
    def __init__(self, data):
        self.data = data
    
    def __iter__(self):
        # Returns a new iterator each time
        return MyIterator(self.data)

class MyIterator:
    """Example of an iterator."""
    
    def __init__(self, data):
        self.data = data
        self.index = 0
    
    def __iter__(self):
        return self
    
    def __next__(self):
        if self.index >= len(self.data):
            raise StopIteration
        value = self.data[self.index]
        self.index += 1
        return value

# Usage
my_iterable = MyIterable([1, 2, 3])

# Can create multiple independent iterators
for item in my_iterable:
    print(item, end=" ")  # 1 2 3
print()

for item in my_iterable:  # Works again!
    print(item, end=" ")  # 1 2 3

ℹ️

Key Concept: Iterables can create multiple independent iterators, while iterators are consumed once.

Generators

Generator Functions

Generators are functions that use yield to produce values lazily.

def countdown(n):
    """Generator function for countdown."""
    print("Starting countdown")
    while n > 0:
        yield n  # Pauses here
        n -= 1
    print("Done!")

# Usage
gen = countdown(3)
print(next(gen))  # Starting countdown, 3
print(next(gen))  # 2
print(next(gen))  # 1
# next(gen) would print "Done!" and raise StopIteration

# In for loop
for num in countdown(3):
    print(num, end=" ")
# Output: Starting countdown 3 2 1 Done!

Generator Expressions

# List comprehension (creates full list in memory)
squares_list = [x*x for x in range(1000000)]
print(f"List size: {len(squares_list)} items")
print(f"Memory: {squares_list.__sizeof__()} bytes")

# Generator expression (lazy evaluation)
squares_gen = (x*x for x in range(1000000))
print(f"Generator: {squares_gen}")
print(f"Memory: {squares_gen.__sizeof__()} bytes")

# Process one item at a time
total = sum(x*x for x in range(1000000))
print(f"Total: {total}")

Output:

Architecture Diagram

List size: 1000000 items
Memory: 8448728 bytes
Generator: <generator object <genexpr> at 0x...>
Memory: 200 bytes
Total: 333333333333500000

How yield Works Internally

def simple_generator():
    """Step-by-step yield execution."""
    print("Step 1: Before first yield")
    yield 1
    print("Step 2: After first yield, before second")
    yield 2
    print("Step 3: After second yield")
    return "Done"  # Optional return value

# Demonstrate execution flow
gen = simple_generator()

print("Creating generator (nothing executes yet)")
print(f"First next(): ", end="")
print(next(gen))  # Executes until first yield

print(f"Second next(): ", end="")
print(next(gen))  # Resumes and executes until second yield

print(f"Third next(): ", end="")
try:
    next(gen)  # Resumes and executes until return/StopIteration
except StopIteration as e:
    print(f"StopIteration with value: {e.value}")

Output:

Architecture Diagram

Creating generator (nothing executes yet)
First next(): Step 1: Before first yield
1
Second next(): Step 2: After first yield, before second
2
Third next(): Step 3: After second yield
StopIteration with value: Done

💡

Key Insight: yield pauses the generator's execution and saves its state. next() resumes from where it left off.

Memory Efficiency

Generator vs List

import sys
import time

def memory_comparison():
    """Compare memory usage of generators vs lists."""
    
    n = 1000000
    
    # List: Creates all items in memory
    start = time.time()
    list_data = [i * i for i in range(n)]
    list_time = time.time() - start
    list_memory = sys.getsizeof(list_data)
    
    # Generator: Creates items on demand
    start = time.time()
    gen_data = (i * i for i in range(n))
    gen_time = time.time() - start
    gen_memory = sys.getsizeof(gen_data)
    
    print(f"List: {list_memory:,} bytes, {list_time:.4f}s")
    print(f"Generator: {gen_memory:,} bytes, {gen_time:.4f}s")
    print(f"Memory savings: {(1 - gen_memory/list_memory) * 100:.1f}%")

memory_comparison()

Output:

Architecture Diagram

List: 8,448,728 bytes, 0.0312s
Generator: 200 bytes, 0.0001s
Memory savings: 100.0%

Infinite Sequences

def fibonacci():
    """Infinite Fibonacci sequence."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def primes():
    """Infinite prime number generator."""
    yield 2
    candidate = 3
    while True:
        is_prime = True
        for i in range(2, int(candidate ** 0.5) + 1):
            if candidate % i == 0:
                is_prime = False
                break
        if is_prime:
            yield candidate
        candidate += 2

# Get first 10 Fibonacci numbers
fib = fibonacci()
first_10_fib = [next(fib) for _ in range(10)]
print(f"First 10 Fibonacci: {first_10_fib}")

# Get first 10 primes
prime_gen = primes()
first_10_primes = [next(prime_gen) for _ in range(10)]
print(f"First 10 Primes: {first_10_primes}")

Output:

Architecture Diagram

First 10 Fibonacci: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
First 10 Primes: [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

ℹ️

Memory Efficiency: Generators can represent infinite sequences with O(1) memory, while lists would consume all available memory.

Advanced Generator Patterns

Generator Pipeline

def read_large_file(file_path):
    """Read file line by line (memory efficient)."""
    with open(file_path, 'r') as f:
        for line in f:
            yield line.strip()

def filter_comments(lines):
    """Filter out comment lines."""
    for line in lines:
        if not line.startswith('#'):
            yield line

def parse_csv(lines):
    """Parse CSV lines."""
    for line in lines:
        yield line.split(',')

def transform_data(records):
    """Transform data."""
    for record in records:
        yield {
            'name': record[0],
            'age': int(record[1]),
            'email': record[2]
        }

# Pipeline: file -> filter -> parse -> transform
def process_large_file(file_path):
    """Process large file with generator pipeline."""
    lines = read_large_file(file_path)
    non_comments = filter_comments(lines)
    csv_data = parse_csv(non_comments)
    transformed = transform_data(csv_data)
    return transformed

# Usage
# for record in process_large_file('large_data.csv'):
#     process(record)

Generator with send()

def accumulator():
    """Generator that accumulates values using send()."""
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

# Usage
acc = accumulator()
next(acc)  # Prime the generator
print(acc.send(10))   # 10
print(acc.send(20))   # 30
print(acc.send(30))   # 60

# Advanced: Average calculator
def running_average():
    """Calculate running average using send()."""
    total = 0
    count = 0
    average = 0
    while True:
        value = yield average
        if value is None:
            break
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)  # Prime
print(avg.send(10))   # 10.0
print(avg.send(20))   # 15.0
print(avg.send(30))   # 20.0

Generator Delegation with yield from

def flatten_nested(nested_list):
    """Flatten arbitrarily nested lists using yield from."""
    for item in nested_list:
        if isinstance(item, (list, tuple)):
            yield from flatten_nested(item)
        else:
            yield item

# Usage
nested = [1, [2, 3], [4, [5, 6]], [[7, 8], 9]]
flat = list(flatten_nested(nested))
print(f"Flattened: {flat}")

# Generator delegation
def outer_generator():
    """Delegate to inner generators."""
    yield from inner_generator_1()
    yield from inner_generator_2()

def inner_generator_1():
    yield 1
    yield 2

def inner_generator_2():
    yield 3
    yield 4

# Usage
for item in outer_generator():
    print(item, end=" ")  # 1 2 3 4

💡

Interview Tip: yield from is a powerful tool for generator delegation and flattening nested structures.

Real-World Use Cases

Database Pagination

def fetch_paginated(url, page_size=100):
    """Fetch paginated data from API."""
    page = 1
    while True:
        # In real code: response = requests.get(f"{url}?page={page}&size={page_size}")
        # Simulate API response
        data = [{"id": i, "page": page} for i in range(page_size)]
        
        if not data:
            break
        
        yield from data
        page += 1

# Usage
# for record in fetch_paginated("https://api.example.com/users"):
#     process(record)

File Processing

def read_in_chunks(file_obj, chunk_size=8192):
    """Read file in chunks for memory efficiency."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:
            break
        yield chunk

def process_large_file(filename):
    """Process large file chunk by chunk."""
    with open(filename, 'rb') as f:
        for chunk in read_in_chunks(f):
            # Process chunk
            yield len(chunk)

# Usage
# for chunk_size in process_large_file("huge_file.bin"):
#     print(f"Processed {chunk_size} bytes")

Data Stream Processing

import time
from collections import deque

def tail_log(file_path, n=10):
    """Tail last n lines of a log file (like Unix tail -f)."""
    with open(file_path, 'r') as f:
        # Read last n lines
        last_lines = deque(maxlen=n)
        for line in f:
            last_lines.append(line)
        
        yield from last_lines
        
        # Follow new lines
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.1)
                continue
            yield line

# Usage
# for line in tail_log("/var/log/app.log"):
#     print(line, end="")

Web Scraping

import re

def extract_links(html_content):
    """Extract links from HTML content."""
    link_pattern = r'href=["\'](.*?)["\']'
    for match in re.finditer(link_pattern, html_content):
        yield match.group(1)

def crawl(start_url, max_depth=2):
    """Simple web crawler using generators."""
    visited = set()
    queue = [(start_url, 0)]
    
    while queue:
        url, depth = queue.pop(0)
        
        if url in visited or depth > max_depth:
            continue
        
        visited.add(url)
        
        # In real code: response = requests.get(url)
        # html = response.text
        html = f"<a href='http://example.com/page{i}'>Link {i}</a>"
        
        yield {'url': url, 'depth': depth, 'links': list(extract_links(html))}
        
        # Add new URLs to queue
        for link in extract_links(html):
            if link not in visited:
                queue.append((link, depth + 1))

ℹ️

Real-World Pattern: Generators are essential for processing large datasets, streaming data, and memory-efficient operations.

Itertools for Advanced Iteration

import itertools
import operator

def itertools_examples():
    """Demonstrate powerful itertools functions."""
    
    # chain: Combine multiple iterables
    list1 = [1, 2, 3]
    list2 = [4, 5, 6]
    chained = itertools.chain(list1, list2)
    print(f"chain: {list(chained)}")
    
    # islice: Slice an iterator
    def infinite_counter():
        n = 0
        while True:
            yield n
            n += 1
    
    first_10 = list(itertools.islice(infinite_counter(), 10))
    print(f"islice: {first_10}")
    
    # groupby: Group consecutive elements
    data = [('A', 1), ('A', 2), ('B', 3), ('B', 4), ('A', 5)]
    grouped = itertools.groupby(data, key=lambda x: x[0])
    for key, group in grouped:
        print(f"groupby {key}: {list(group)}")
    
    # product: Cartesian product
    colors = ['red', 'blue']
    sizes = ['S', 'M', 'L']
    products = list(itertools.product(colors, sizes))
    print(f"product: {products}")
    
    # permutations and combinations
    items = ['A', 'B', 'C']
    perms = list(itertools.permutations(items, 2))
    print(f"permutations: {perms}")
    
    combos = list(itertools.combinations(items, 2))
    print(f"combinations: {combos}")
    
    # accumulate: Running totals
    numbers = [1, 2, 3, 4, 5]
    accumulated = list(itertools.accumulate(numbers))
    print(f"accumulate: {accumulated}")

itertools_examples()

Output:

Architecture Diagram

chain: [1, 2, 3, 4, 5, 6]
islice: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
groupby A: [('A', 1), ('A', 2)]
groupby B: [('B', 3), ('B', 4)]
groupby A: [('A', 5)]
product: [('red', 'S'), ('red', 'M'), ('red', 'L'), ('blue', 'S'), ('blue', 'M'), ('blue', 'L')]
permutations: [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
combinations: [('A', 'B'), ('A', 'C'), ('B', 'C')]
accumulate: [1, 3, 6, 10, 15]

Performance Analysis

Time Complexity

Operation	List	Generator	Notes
Creation	O(n)	O(1)	Generator is lazy
First element	O(1)	O(1)
All elements	O(n)	O(n)	Same
Random access	O(1)	O(n)	Generator must iterate
Memory	O(n)	O(1)	Generator is better

Space Complexity

import sys

def space_comparison():
    """Compare space complexity."""
    
    n = 1000000
    
    # List: O(n) space
    list_data = [i for i in range(n)]
    print(f"List memory: {sys.getsizeof(list_data):,} bytes")
    
    # Generator: O(1) space
    gen_data = (i for i in range(n))
    print(f"Generator memory: {sys.getsizeof(gen_data)} bytes")
    
    # Even better: Use itertools
    import itertools
    gen_itertools = itertools.count()
    print(f"itertools memory: {sys.getsizeof(gen_itertools)} bytes")

space_comparison()

Output:

Architecture Diagram

List memory: 8,448,728 bytes
Generator memory: 200 bytes
itertools memory: 56 bytes

⚠️

Performance Trade-off: Generators save memory but don't support random access. Choose based on your access pattern.

Common Patterns and Idioms

Generator Comprehension

# Generator comprehension syntax
squares = (x*x for x in range(10))
even_squares = (x for x in squares if x % 2 == 0)

# Chaining generators
def process_data(data):
    return (
        transform(item)
        for item in data
        if item is not None
        and item > 0
    )

def transform(x):
    return x * 2

StopIteration for Flow Control

def first_positive(numbers):
    """Return first positive number using StopIteration."""
    def _generator():
        for n in numbers:
            if n > 0:
                yield n
                return  # Raises StopIteration with None
    
    return next(_generator(), None)

# Alternative: Custom exception
class Found(Exception):
    def __init__(self, value):
        self.value = value

def find_in_nested(data, target):
    """Find target in nested structure."""
    def _search(obj):
        if isinstance(obj, list):
            for item in obj:
                _search(item)
        elif obj == target:
            raise Found(obj)
    
    try:
        _search(data)
        return None
    except Found as e:
        return e.value

result = find_in_nested([1, [2, 3], [4, [5]]], 3)
print(f"Found: {result}")

Context Manager Generators

from contextlib import contextmanager

@contextmanager
def managed_resource(name):
    """Context manager using generator."""
    print(f"Acquiring {name}")
    resource = {"name": name, "active": True}
    try:
        yield resource
    finally:
        print(f"Releasing {name}")
        resource["active"] = False

# Usage
with managed_resource("database") as res:
    print(f"Using {res}")
    # Do work
print(f"After: {res}")

💡

Interview Tip: Generators + context managers = powerful patterns for resource management.

Interview Tips

Common Follow-up Questions

"What's the difference between return and yield in a generator?"
- yield produces a value and pauses
- return raises StopIteration (can optionally carry a value)
- return in generators (Python 3.3+) sets StopIteration.value
"How do you send values to a generator?"
- Use generator.send(value)
- Generator must be primed with next() first
- yield expression returns the sent value
"What are generator pipelines?"
- Chain multiple generators together
- Each generator processes data and passes to next
- Memory efficient for large data processing

Code Review Tips

# BAD: Creating large list
def get_data():
    return [process(x) for x in large_dataset]

# GOOD: Generator for large data
def get_data():
    return (process(x) for x in large_dataset)

# BAD: Not handling StopIteration
def first_item(iterable):
    return next(iterable)

# GOOD: Handling empty iterable
def first_item(iterable, default=None):
    return next(iterable, default)

# BAD: Consuming generator multiple times
def process():
    gen = (x for x in range(10))
    list1 = list(gen)
    list2 = list(gen)  # Empty!

# GOOD: Create new generator
def process():
    def gen():
        return (x for x in range(10))
    list1 = list(gen())
    list2 = list(gen())  # Works!

⚠️

Common Mistake: Generators are consumed once. Create new generators if you need multiple passes.

Summary

Feature	Iterator	Generator	Generator Expression
Syntax	Class with `__iter__`, `__next__`	Function with `yield`	`(expr for x in iterable)`
Memory	O(n) or O(1)	O(1)	O(1)
State	Manual management	Automatic	Automatic
Complexity	More code	Less code	Least code
Flexibility	Full control	Limited	Limited

When to Use What?

Iterators: When you need full control over iteration
Generators: For lazy evaluation and memory efficiency
Generator Expressions: For simple transformations
itertools: For complex iteration patterns

ℹ️

Key Takeaway: Generators and iterators are fundamental to Python's memory efficiency. Master them for writing Pythonic, performant code.

Practice Problems

Custom Range: Implement a range() function using an iterator
Flatten Nested: Create a generator to flatten arbitrarily nested lists
Window Sliding: Implement a sliding window generator
Chunk Splitter: Create a generator that splits data into chunks
Infinite Buffer: Implement a ring buffer using generators

Generators, Iterators, yield, __iter__, __next__

Generators, Iterators, yield, iter, next

Interview Question

Theoretical Foundation

Iterator Protocol

Iterables vs Iterators

Generators

Generator Functions

Generator Expressions

How yield Works Internally

Memory Efficiency

Generator vs List

Infinite Sequences

Advanced Generator Patterns

Generator Pipeline

Generator with send()

Generator Delegation with yield from

Real-World Use Cases

Database Pagination

File Processing

Data Stream Processing

Web Scraping

Itertools for Advanced Iteration

Performance Analysis

Time Complexity

Space Complexity

Common Patterns and Idioms

Generator Comprehension

StopIteration for Flow Control

Context Manager Generators

Interview Tips

Common Follow-up Questions

Code Review Tips

Summary

When to Use What?

Practice Problems

Further Reading

Generators, Iterators, yield, iter, next