Python File I/O — Advanced Patterns

Advanced file I/O patterns cover large files, fast random access, safe writes, temporary storage, and change detection. They matter most in production applications, where files are big and crashes must not corrupt data.

Learning Objectives

Process large files efficiently with generators
Use memory-mapped files for fast random access
Write atomic files safely (prevent corruption)
Work with temporary files and directories

Large File Processing

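Loading an entire file into memory is risky when the file is large. Calling f.read() on a 10GB log file tries to hold the whole file in memory at once, which will likely exhaust RAM (raising MemoryError or causing heavy swapping). Instead, process the file line by line.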

def process_large_file(filename):
    """Process file line by line without loading into memory."""
    with open(filename, 'r', buffering=8192) as f:
        for line in f:  # Python reads in chunks automatically
            yield line.strip()

# Memory-efficient line counting
def count_lines(filename):
    return sum(1 for _ in process_large_file(filename))

# Memory-efficient word counting
def count_words(filename):
    word_count = 0
    for line in process_large_file(filename):
        word_count += len(line.split())
    return word_count

# Process only specific lines
def process_every_nth(filename, n=10):
    for i, line in enumerate(process_large_file(filename)):
        if i % n == 0:
            yield line

Why This Works

When you iterate over a file object (for line in f), Python does NOT read the entire file into memory. Instead:

It fills an internal buffer from disk (typically 8KB by default; the exact size depends on the platform and Python version)
Yields one line at a time
Reads more from disk when buffer is exhausted

This means you can process gigabyte files with only megabytes of RAM.
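
Line iteration relies on the file containing newlines. For binary data, or text with very long lines, the same buffer-at-a-time idea can be applied by reading fixed-size chunks. The following is a minimal sketch; the names read_in_chunks and file_size_by_chunks and the 64KB chunk size are illustrative assumptions, not part of the original examples.

```python
def read_in_chunks(filename, chunk_size=64 * 1024):
    """Yield successive chunks of bytes; the last chunk may be shorter."""
    with open(filename, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # An empty bytes object means end of file
                break
            yield chunk

def file_size_by_chunks(filename, chunk_size=64 * 1024):
    """Count bytes while holding at most one chunk in memory at a time."""
    return sum(len(chunk) for chunk in read_in_chunks(filename, chunk_size))
```

Peak memory use is bounded by chunk_size rather than by the file size, which is the same property that makes line iteration safe for large files.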

Memory-Mapped Files

Memory mapping lets you access file contents as if they were in memory, while the OS loads pages on demand. This makes it well suited to random access in large files.

import mmap

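# Create a memory-mapped file (the file must exist and must not be empty)
with open('large_file.bin', 'r+b') as f:
    # Map the entire file (length 0 = entire file)
    with mmap.mmap(f.fileno(), 0) as mm:
        # Access like a byte array
        print(mm[0:100])       # First 100 bytes
        print(mm[1000:2000])   # Bytes 1000-1999

        # Seek and read
        mm.seek(500)
        data = mm.read(100)    # Read 100 bytes from position 500

        # Write (overwrites bytes in place; the file size does not change)
        mm.seek(0)
        mm.write(b'HEADER')
        mm.flush()             # Push changes to the underlying file
    # The map is closed automatically when the inner with block exits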

When to Use Memory Mapping

Scenario	Regular File	Memory Map
Sequential read	Good	Good
Random access	Slow (seek)	Fast
Small file (less than 1MB)	Good	Overkill
Large file (greater than 100MB)	OK	Better
Multiple processes reading	Complex	Easy
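
To make the random-access row concrete, here is a hedged sketch that reads one fixed-size record from a file through a read-only map. The record layout (a 4-byte id followed by an 8-byte value) and the name read_record are assumptions made for illustration only.

```python
import mmap
import struct

# Hypothetical record layout: little-endian uint32 id + uint64 value (12 bytes)
RECORD = struct.Struct('<IQ')

def read_record(filename, index):
    """Return the (id, value) tuple stored at the given record index."""
    if index < 0:
        raise IndexError("record index must be non-negative")
    with open(filename, 'rb') as f:
        # ACCESS_READ maps the file read-only, so it cannot be modified by accident
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            offset = index * RECORD.size
            if offset + RECORD.size > len(mm):
                raise IndexError(f"record {index} is out of range")
            # Only the pages containing this record need to be loaded by the OS
            return RECORD.unpack_from(mm, offset)
```

Note that mapping an empty file raises ValueError, so callers may want to check the file size first.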

Atomic Writes

Atomic writes prevent file corruption. If your program crashes mid-write, readers see either the complete old file or the complete new file, never a partially written one.

import tempfile
import os

def atomic_write(filename, content):
    """Write atomically using temp file + rename."""
    dir_name = os.path.dirname(filename) or '.'

    # Write to temp file in same directory (same filesystem)
    with tempfile.NamedTemporaryFile(
        mode='w',
        dir=dir_name,
        delete=False,
        suffix='.tmp'
    ) as tmp:
        tmp_path = tmp.name
        tmp.write(content)
        tmp.flush()
        os.fsync(tmp.fileno())  # Make sure the data reaches the disk

    # os.replace is atomic on POSIX; the temp file must be on the same filesystem
    os.replace(tmp_path, filename)

# Usage
atomic_write('config.json', '{"key": "value"}')

Why This Matters

Without atomic writes:

Program starts writing to config.json
Program crashes at 50% — file is now corrupt
On restart, program reads corrupt config → crash loop

With atomic writes:

Program writes to a temporary file next to config.json
Program crashes — config.json is untouched
On restart, program reads valid config.json
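
The scenario above can be simulated with a writer that fails partway through. This is a sketch under stated assumptions: atomic_write_chunks and failing_chunks are hypothetical helpers written for this illustration, and the "crash" is an ordinary exception rather than a real process kill.

```python
import os
import tempfile

def atomic_write_chunks(filename, chunks):
    """Write an iterable of strings to filename, replacing it only on success."""
    dir_name = os.path.dirname(filename) or '.'
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as tmp:
            for chunk in chunks:
                tmp.write(chunk)
            tmp.flush()
            os.fsync(tmp.fileno())  # Ask the OS to push the data to disk
        os.replace(tmp_path, filename)
    except BaseException:
        # The write failed partway: discard the temp file, leave the target alone
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

def failing_chunks():
    """Simulate a crash halfway through producing the new contents."""
    yield '{"key": '
    raise RuntimeError("simulated crash")
```

Calling atomic_write_chunks('config.json', failing_chunks()) raises RuntimeError, but an existing config.json keeps its previous contents because os.replace is never reached. A real crash (power loss, kill -9) skips the cleanup step, so a stray .tmp file may remain even though the target stays intact.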

Temporary Files and Directories

import tempfile
import os

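# Temporary file (auto-deleted when closed)
with tempfile.NamedTemporaryFile(
    mode='w',
    suffix='.txt',
    delete=True  # Auto-delete on close
) as f:
    f.write("temporary data")
    f.flush()  # Make the data visible to anything that opens temp_path
    temp_path = f.name
    # Process data using temp_path here, while the file still exists
    # (on Windows the file may not be reopenable by name while it is open)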

# Temporary directory (all contents deleted)
with tempfile.TemporaryDirectory() as tmpdir:
    # Create files in tmpdir
    path = os.path.join(tmpdir, 'data.txt')
    with open(path, 'w') as f:
        f.write("temp")
    # All files in tmpdir are deleted when exiting

# Persistent temporary file (you must delete manually)
tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
# ... use tmp.name ...
os.unlink(tmp.name)  # Delete when done
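
As a small end-to-end illustration of the auto-cleanup behaviour, the sketch below uses a temporary directory as scratch space and returns the scratch path so the caller can confirm it no longer exists. The function name uppercase_via_scratch and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def uppercase_via_scratch(text):
    """Round-trip text through a scratch directory; return (result, scratch_path)."""
    with tempfile.TemporaryDirectory(prefix='scratch_') as tmpdir:
        scratch = Path(tmpdir)
        src = scratch / 'input.txt'
        dst = scratch / 'output.txt'
        src.write_text(text, encoding='utf-8')
        dst.write_text(src.read_text(encoding='utf-8').upper(), encoding='utf-8')
        result = dst.read_text(encoding='utf-8')
    # The directory and both files are removed once the with block exits
    return result, scratch
```

After the call, scratch.exists() is False, so intermediate files never accumulate even if the processing step raises.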

File Watching

import time
from pathlib import Path

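def watch_file(filename, callback, interval=1.0):
    """Poll a file's modification time and call callback when it changes."""
    path = Path(filename)
    try:
        last_mtime = path.stat().st_mtime
    except FileNotFoundError:
        last_mtime = None  # File does not exist yet

    while True:  # Runs until interrupted (e.g. Ctrl+C)
        try:
            current_mtime = path.stat().st_mtime
        except FileNotFoundError:
            current_mtime = None  # File is temporarily missing
        if current_mtime is not None and current_mtime != last_mtime:
            callback(filename)
            last_mtime = current_mtime
        time.sleep(interval)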

# Usage
def on_change(filename):
    print(f"{filename} was modified!")

watch_file('config.json', on_change)  # Blocks forever; stop with Ctrl+C
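
Because the polling loop above never returns, a bounded variant can be easier to use in scripts and tests. The following sketch is an assumption-laden alternative, not a replacement: wait_for_change is a hypothetical name, it compares nanosecond timestamps (st_mtime_ns), and unlike the loop above it treats the file appearing or disappearing as a change.

```python
import os
import time

def wait_for_change(filename, timeout=5.0, interval=0.1):
    """Return True if the file's mtime changes within timeout seconds, else False."""
    def mtime():
        try:
            return os.stat(filename).st_mtime_ns
        except FileNotFoundError:
            return None  # Treat a missing file as its own state

    baseline = mtime()
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if mtime() != baseline:
            return True
        time.sleep(interval)
    return False
```

Polling only detects changes that alter the modification time, and it can miss two writes that land within the filesystem's timestamp resolution.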

Key Takeaways

Use generators for memory-efficient line-by-line processing
Memory-mapped files are fast for random access in large files
Atomic writes prevent corruption on crashes
Use tempfile for temporary storage (auto-cleanup)
Buffer large reads for better performance
Use os.replace() for atomic file replacement
Always close files (use context managers)
Use buffering parameter for custom buffer sizes
