Python File Handling — Reading, Writing, and Managing Files

Learning Objectives

By the end of this tutorial, you will be able to:

Open files using the open() function and understand all file modes
Use context managers (with statement) to safely handle files
Read files using read(), readline(), readlines(), and iteration
Write data to files with write() and writelines()
Work with file paths using os.path and pathlib
Perform common file operations like renaming, copying, and deleting
Read and write CSV and JSON files
Use temporary files safely
Avoid common file handling mistakes

Opening Files

Python's built-in open() function is the gateway to file operations. It returns a file object that you can use to read from or write to the file.

The `open()` Function

file = open("example.txt", "r")
content = file.read()
file.close()

The open() function takes two primary arguments: the file path and the mode. The mode determines how you interact with the file — whether you are reading, writing, or appending data.

File Modes

Mode	Description	File Must Exist	Creates File	Position
`'r'`	Read (text)	Yes	No	Start
`'w'`	Write (text)	No	Yes (overwrites)	Start
`'a'`	Append (text)	No	Yes	End
`'r+'`	Read + Write (text)	Yes	No	Start
`'w+'`	Write + Read (text)	No	Yes (overwrites)	Start
`'a+'`	Append + Read (text)	No	Yes	End
`'x'`	Exclusive create (text)	No (must not exist)	Yes (fails if exists)	Start

# Reading a file ('r' is the default, so open("data.txt") does the same)
f = open("data.txt", "r")
content = f.read()
f.close()

# Writing to a file (overwrites existing content)
f = open("output.txt", "w")
f.write("Hello, World!")
f.close()

# Appending to a file
f = open("log.txt", "a")
f.write("New log entry\n")
f.close()

# Creating a new file (fails if file already exists)
f = open("new_file.txt", "x")
f.write("Created for the first time")
f.close()

# This will raise a FileExistsError
# f = open("new_file.txt", "x")  # FileExistsError
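
To see how the modes in the table differ in practice, here is a small sketch that runs inside a throwaway directory from the tempfile module, so it touches no real files. It assumes nothing beyond the standard library; the file name demo.txt is made up. It shows that 'r+' keeps existing content and writes from the start, 'a' always adds to the end, 'w+' empties the file as soon as it is opened, and 'x' refuses an existing file.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "demo.txt"  # hypothetical file name for the demo

    # 'w' creates the file (or empties an existing one)
    with open(path, "w", encoding="utf-8") as f:
        f.write("Hello, World!")

    # 'r+' keeps the content; writing at position 0 overwrites in place
    with open(path, "r+", encoding="utf-8") as f:
        f.write("J")
    after_r_plus = path.read_text(encoding="utf-8")  # 'Jello, World!'

    # 'a' always writes at the end of the file
    with open(path, "a", encoding="utf-8") as f:
        f.write("!!")
    after_append = path.read_text(encoding="utf-8")  # 'Jello, World!!!'

    # 'w+' truncates the file when it is opened, so there is nothing to read
    with open(path, "w+", encoding="utf-8") as f:
        after_w_plus = f.read()  # ''

    # 'x' refuses to open a file that already exists
    try:
        with open(path, "x", encoding="utf-8"):
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

print(after_r_plus, after_append, repr(after_w_plus), x_failed)
```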

Binary Modes

Binary modes are used for non-text files like images, audio, or any data that should be treated as raw bytes rather than text.

# Reading a binary file (e.g., an image)
with open("photo.jpg", "rb") as f:
    data = f.read()
    print(type(data))  # <class 'bytes'>
    print(len(data))   # Size in bytes

# Writing binary data
with open("output.bin", "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04")

The Encoding Parameter

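When working with text files, the encoding determines how characters are stored as bytes. UTF-8 is the most common encoding, but it is not what open() uses by default: without an encoding argument, Python falls back to the platform's locale encoding (often cp1252 on Windows, usually UTF-8 on Linux and macOS). Passing encoding explicitly keeps behavior consistent across systems.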

# Explicitly specifying encoding (recommended)
with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Writing with specific encoding
with open("unicode.txt", "w", encoding="utf-8") as f:
    f.write("Caf\u00e9 r\u00e9sum\u00e9")

# Using a different encoding
with open("legacy.txt", "r", encoding="latin-1") as f:
    content = f.read()
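
A quick way to see why the encoding matters is to write text with one encoding and read it back with another. The sketch below (standard library only; the file name is made up) prints the fallback encoding open() would use on the current machine, then shows that a mismatched encoding does not raise an error but silently garbles the text.

```python
import locale
import tempfile
from pathlib import Path

# The encoding open() falls back to when none is given (platform dependent)
print("Default text encoding here:", locale.getpreferredencoding(False))

text = "Caf\u00e9"  # 'Caf' followed by U+00E9 (e with acute accent)

# The same text becomes different bytes under different encodings
utf8_bytes = text.encode("utf-8")      # b'Caf\xc3\xa9' (5 bytes)
latin1_bytes = text.encode("latin-1")  # b'Caf\xe9' (4 bytes)

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "cafe.txt"  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading back with the matching encoding restores the original text
    with open(path, "r", encoding="utf-8") as f:
        correct = f.read()

    # Reading with the wrong encoding does not fail; it garbles the text:
    # the two UTF-8 bytes of U+00E9 become two separate Latin-1 characters
    with open(path, "r", encoding="latin-1") as f:
        garbled = f.read()  # 'Caf\u00c3\u00a9'

print(len(utf8_bytes), len(latin1_bytes), correct == text, garbled == text)
```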

Context Managers (with Statement)

The with statement ensures that files are properly closed after the block of code exits, even if an exception occurs. This is the recommended way to handle files in Python.

Why Use `with` Statement

Without with, you must manually close files, which is error-prone:

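# Bad: Manual closing with no protection (not recommended)
f = open("data.txt", "r")
content = f.read()  # If this raises an exception, f.close() is never called
f.close()

# Works, but verbose: try/finally guarantees that close() runs
f = open("data.txt", "r")
try:
    content = f.read()
finally:
    f.close()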

# Good: Using context manager
with open("data.txt", "r") as f:
    content = f.read()
# File is automatically closed here, even if an exception occurred

Auto-Closing Files

The with statement calls __enter__ when entering the block and __exit__ when leaving it. The __exit__ method of file objects calls close().

with open("example.txt", "w") as f:
    f.write("This file will be closed automatically")
    print(f.closed)  # False — still inside the block

print(f.closed)  # True — automatically closed
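
The protocol is not specific to files. The toy class below (ManagedResource is a hypothetical name, not a standard library type) implements __enter__ and __exit__ the way file objects do, and shows that __exit__ still runs, and the resource is still closed, when the block raises an exception.

```python
class ManagedResource:
    """Toy stand-in for a file object (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        # Whatever __enter__ returns is bound to the name after 'as'
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs on normal exit and also when an exception leaves the block
        self.close()
        return False  # a falsy return value lets the exception propagate


try:
    with ManagedResource("demo") as res:
        print("inside block, closed =", res.closed)
        raise ValueError("something went wrong")
except ValueError:
    print("exception propagated, closed =", res.closed)
```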

Multiple Files in One `with`

You can handle multiple files in a single with statement:

with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        outfile.write(line.upper())

# Or using parenthesized context manager (Python 3.10+)
with (
    open("input.txt", "r") as infile,
    open("output.txt", "w") as outfile,
):
    for line in infile:
        outfile.write(line.upper())
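
Both forms above need the files listed in the source code. When the number of files is only known at run time, contextlib.ExitStack from the standard library can manage them: every file registered with enter_context() is closed when the stack's block exits. The sketch below merges a few demo files created in a temporary directory; the file names are made up.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical input files created just for this demo
    paths = []
    for i in range(3):
        p = Path(tmpdir) / f"part{i}.txt"
        p.write_text(f"line from part {i}\n", encoding="utf-8")
        paths.append(p)

    merged_path = Path(tmpdir) / "merged.txt"
    with ExitStack() as stack:
        # enter_context() opens each file and registers it to be closed later
        files = [stack.enter_context(open(p, "r", encoding="utf-8")) for p in paths]
        out = stack.enter_context(open(merged_path, "w", encoding="utf-8"))
        for f in files:
            out.write(f.read())
    # Every file opened through the stack is closed at this point

    all_closed = all(f.closed for f in files) and out.closed
    merged = merged_path.read_text(encoding="utf-8")

print(merged)
```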

Reading Files

Python provides several methods to read file content, each suited for different use cases.

`read()` — Entire File

with open("poem.txt", "r") as f:
    content = f.read()
    print(content)

Output:

Roses are red,
Violets are blue,
Python is awesome,
And so are you.

# Reading a file and printing its content
with open("data.txt", "r") as f:
    whole_file = f.read()
    print(f"File content:\n{whole_file}")
    print(f"Total characters: {len(whole_file)}")

`read(n)` — N Characters at a Time

# Assumes data.txt begins with "Hello, World!"
with open("data.txt", "r") as f:
    chunk1 = f.read(5)   # Read first 5 characters
    chunk2 = f.read(5)   # Read next 5 characters
    print(f"Chunk 1: '{chunk1}'")
    print(f"Chunk 2: '{chunk2}'")

Output:

Chunk 1: 'Hello'
Chunk 2: ', Wor'
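
read(n) is the building block for processing a file in fixed-size pieces instead of all at once. One common use, sketched below, is hashing a large binary file: read() returns an empty bytes object at end of file, which stops the loop. The helper name sha256_of_file and the 64 KiB default chunk size are illustrative choices, not fixed requirements.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file without loading it all into memory (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" (end of file)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "blob.bin"
    payload = bytes(range(256)) * 1000  # 256,000 bytes of sample data
    path.write_bytes(payload)
    file_hash = sha256_of_file(path, chunk_size=4096)

print(file_hash == hashlib.sha256(payload).hexdigest())  # True
```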

`readline()` — One Line at a Time

with open("fruits.txt", "r") as f:
    line1 = f.readline()
    line2 = f.readline()
    print(f"Line 1: {line1.strip()}")  # strip() removes trailing newline
    print(f"Line 2: {line2.strip()}")

Output:

Line 1: Apple
Line 2: Banana

`readlines()` — List of Lines

with open("fruits.txt", "r") as f:
    lines = f.readlines()
    print(lines)
    print(f"Total lines: {len(lines)}")

Output:

['Apple\n', 'Banana\n', 'Cherry\n', 'Date\n']
Total lines: 4

Iterating Line by Line (Memory Efficient)

This is the most memory-efficient way to read large files, as only one line is loaded into memory at a time.

with open("large_file.txt", "r") as f:
    for line in f:
        print(line.strip())  # strip() removes the trailing newline

# Counting lines in a file
with open("data.txt", "r") as f:
    line_count = sum(1 for line in f)
    print(f"File has {line_count} lines")

# Finding lines containing a keyword
with open("logs.txt", "r") as f:
    error_lines = [line.strip() for line in f if "ERROR" in line]
    print(f"Found {len(error_lines)} error lines")
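
Line iteration combines naturally with enumerate() when you also need line numbers, for example to report where each match occurred. The sketch below uses a hypothetical find_lines helper and a small log file written to a temporary directory.

```python
import tempfile
from pathlib import Path


def find_lines(path, keyword):
    """Return (line_number, text) pairs for lines containing keyword (hypothetical helper)."""
    matches = []
    with open(path, "r", encoding="utf-8") as f:
        # enumerate(f, start=1) numbers lines the way text editors do
        for line_number, line in enumerate(f, start=1):
            if keyword in line:
                matches.append((line_number, line.rstrip("\n")))
    return matches


with tempfile.TemporaryDirectory() as tmpdir:
    log = Path(tmpdir) / "logs.txt"
    log.write_text("INFO start\nERROR disk full\nINFO retry\nERROR timeout\n", encoding="utf-8")
    errors = find_lines(log, "ERROR")

print(errors)  # [(2, 'ERROR disk full'), (4, 'ERROR timeout')]
```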

Writing Files

`write()` — Write a String

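with open("output.txt", "w", encoding="utf-8") as f:
    chars_written = f.write("First line\n")  # Text mode: returns the number of characters written
    f.write("Second line\n")
    f.write("Third line\n")
    print(f"Characters written: {chars_written}")  # Characters written: 11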

# Writing formatted data
total_sales = 12450.00
items_sold = 342

with open("report.txt", "w", encoding="utf-8") as f:
    f.write("=" * 40 + "\n")
    f.write("SALES REPORT\n")
    f.write("=" * 40 + "\n\n")
    f.write(f"Total Sales: ${total_sales:,.2f}\n")
    f.write(f"Items Sold: {items_sold}\n")
    f.write(f"Average: ${total_sales / items_sold:,.2f}\n")

`writelines()` — Write a List of Strings

lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("output.txt", "w") as f:
    f.writelines(lines)

Important: writelines() does not add newlines. You must include \n in each string.

# writelines does NOT add newlines automatically
fruits = ["apple", "banana", "cherry"]

with open("fruits.txt", "w") as f:
    f.writelines(fruits)  # Results in: "applebananacherry"

# Correct approach — include newlines
with open("fruits.txt", "w") as f:
    f.writelines(f"{fruit}\n" for fruit in fruits)
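
Another option is the built-in print() with its file= argument, which appends a newline after each call by default, unlike write() and writelines(). A short sketch, writing to a temporary directory so it can run anywhere:

```python
import tempfile
from pathlib import Path

fruits = ["apple", "banana", "cherry"]

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "fruits.txt"
    with open(path, "w", encoding="utf-8") as f:
        for fruit in fruits:
            # print() adds "\n" at the end by default (its end= parameter)
            print(fruit, file=f)
    content = path.read_text(encoding="utf-8")

print(repr(content))  # 'apple\nbanana\ncherry\n'
```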

Truncating Files

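The truncate(size) method resizes a file to the given size in bytes. Called without an argument, it cuts the file off at the current position. It does not change the current position.

with open("data.txt", "r+", encoding="utf-8") as f:
    content = f.read()
    print(f"Original size: {len(content)} characters")
    f.truncate(10)   # Keep only the first 10 bytes (10 characters for ASCII text)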
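
A common use of truncate() is rewriting a file in place with shorter content: seek back to the start, write the new text, then call truncate() with no argument to drop whatever old bytes remain after the new end. Without that last call, the tail of the longer original line would stay in the file. The file name and contents below are made up for the demo.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "status.txt"  # hypothetical file for the demo
    path.write_text("status: pending review\n", encoding="utf-8")

    with open(path, "r+", encoding="utf-8") as f:
        old = f.read()
        new = old.replace("pending review", "done")  # shorter than the old text
        f.seek(0)       # go back to the start before overwriting
        f.write(new)
        f.truncate()    # no argument: cut the file at the current position

    result = path.read_text(encoding="utf-8")

print(repr(result))  # 'status: done\n'
```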

Working with Paths

`os.path` Module

The os.path module provides functions for common path operations:

import os

# Joining path components
path = os.path.join("folder", "subfolder", "file.txt")
print(path)  # folder/subfolder/file.txt (on Unix)
# folder\subfolder\file.txt (on Windows)

# Getting absolute path
abs_path = os.path.abspath("data.txt")
print(abs_path)  # e.g. C:\Users\user\project\data.txt on Windows

# Getting filename and extension
filename = os.path.basename("/home/user/report.pdf")
print(filename)  # report.pdf

dirname = os.path.dirname("/home/user/report.pdf")
print(dirname)  # /home/user

name, ext = os.path.splitext("report.pdf")
print(name, ext)  # report .pdf

# Checking file existence
if os.path.exists("config.txt"):
    print("Config file found!")
else:
    print("Config file not found.")

# Checking file type
print(os.path.isfile("data.txt"))    # True if it's a file
print(os.path.isdir("folder"))       # True if it's a directory

`pathlib` Module (Modern Approach)

pathlib provides an object-oriented interface to filesystem paths. It is the recommended approach in modern Python.

from pathlib import Path

# Creating Path objects
p = Path("folder") / "subfolder" / "file.txt"
print(p)  # folder/subfolder/file.txt

# Path components
path = Path("/home/user/documents/report.pdf")
print(path.name)      # report.pdf
print(path.stem)      # report
print(path.suffix)    # .pdf
print(path.parent)    # /home/user/documents

# Absolute path
abs_path = Path("data.txt").resolve()
print(abs_path)

# Checking existence
if Path("config.txt").exists():
    print("Config file exists!")

if Path("data.txt").is_file():
    print("It's a file!")

if Path("my_folder").is_dir():
    print("It's a directory!")

Creating, Checking, and Joining Paths

from pathlib import Path

# Create a Path object
base = Path("project")
data_dir = base / "data"
output_file = data_dir / "results.csv"

print(data_dir)     # project/data
print(output_file)  # project/data/results.csv

# Check before creating
if not data_dir.exists():
    data_dir.mkdir(parents=True)  # parents=True creates parent dirs too
    print(f"Created: {data_dir}")
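
For whole-file reads and writes, Path objects also provide read_text() and write_text(), which open, read or write, and close the file in a single call. A small sketch in a temporary directory; project/notes.txt is a made-up path.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    notes = Path(tmpdir) / "project" / "notes.txt"
    notes.parent.mkdir(parents=True, exist_ok=True)  # create project/ first

    # write_text() returns the number of characters written
    chars = notes.write_text("first note\n", encoding="utf-8")

    # read_text() returns the whole file as a str
    content = notes.read_text(encoding="utf-8")

print(chars, repr(content))  # 11 'first note\n'
```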

File Operations

Renaming Files

import os
from pathlib import Path

# Using os.rename
os.rename("old_name.txt", "new_name.txt")

# Or, instead, using pathlib (recommended); rename() returns the new Path
path = Path("old_name.txt")
new_path = path.rename("new_name.txt")
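
Renaming is also the basis of a common safe-save pattern: write the new content to a temporary file in the same directory, then move it over the target with os.replace(). Unlike os.rename(), which raises an error on Windows when the destination exists, os.replace() overwrites it on every platform, and on POSIX systems the replacement is atomic, so readers see either the old file or the new one, never a half-written file. The helper name atomic_write_text is hypothetical, and this sketch skips os.fsync(), so it does not guarantee durability across a power failure.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text, encoding="utf-8"):
    """Hypothetical helper: write text to a temp file, then swap it into place."""
    path = Path(path)
    # Create the temp file next to the target so os.replace() stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
        os.replace(tmp_name, path)  # overwrites an existing target
    except BaseException:
        os.unlink(tmp_name)  # don't leave the temp file behind on failure
        raise


with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "settings.txt"
    target.write_text("old\n", encoding="utf-8")
    atomic_write_text(target, "new\n")
    final = target.read_text(encoding="utf-8")
    leftovers = sorted(p.name for p in Path(tmpdir).iterdir())

print(repr(final), leftovers)  # 'new\n' ['settings.txt']
```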

Copying Files

import shutil

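# Copy a file's contents and permission bits
shutil.copy("source.txt", "destination.txt")

# Also try to preserve metadata such as timestamps
shutil.copy2("source.txt", "destination.txt")

# Copy an entire directory tree (the destination must not already exist,
# unless you pass dirs_exist_ok=True, available since Python 3.8)
shutil.copytree("source_dir", "destination_dir")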

Deleting Files

import os
from pathlib import Path

# Using os.remove
os.remove("file_to_delete.txt")

# Or, instead, using pathlib (the file is already gone after os.remove above)
Path("file_to_delete.txt").unlink()

# Deleting a directory and all its contents
import shutil
shutil.rmtree("directory_to_delete")

# Safe deletion (only if exists)
Path("maybe_exists.txt").unlink(missing_ok=True)

Creating Directories

import os
from pathlib import Path

# Create a single directory
os.mkdir("new_folder")

# Create directory and any missing parents
os.makedirs("path/to/new/folder", exist_ok=True)

# Using pathlib
Path("new_folder").mkdir(exist_ok=True)
Path("path/to/new/folder").mkdir(parents=True, exist_ok=True)

Listing Directory Contents

import os
from pathlib import Path

# List all items in a directory
items = os.listdir("my_folder")
print(items)

# Using pathlib
path = Path("my_folder")
for item in path.iterdir():
    print(item.name, "is", "dir" if item.is_dir() else "file")

# Glob pattern matching
py_files = list(Path(".").glob("*.py"))
print(py_files)

# Recursive glob
all_py = list(Path(".").rglob("*.py"))
print(all_py)

Working with CSV

CSV Module Basics

The csv module provides built-in support for reading and writing CSV files.

import csv

# Basic CSV reading
with open("data.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)

Reading CSV Files

import csv

# Reading CSV into a list of lists
with open("employees.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)  # Read the header row (and move past it)
    data = list(reader)
    print(f"Header: {header}")
    print(f"Rows: {len(data)}")
    for row in data[:3]:  # Print first 3 rows
        print(row)

Output:

Header: ['Name', 'Department', 'Salary']
Rows: 5
['Alice', 'Engineering', '85000']
['Bob', 'Marketing', '72000']
['Charlie', 'Engineering', '90000']

Writing CSV Files

import csv

# Writing CSV from a list of lists
data = [
    ["Name", "Age", "City"],
    ["Alice", "30", "New York"],
    ["Bob", "25", "San Francisco"],
    ["Charlie", "35", "Chicago"],
]

with open("people.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)
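
The csv module matters as soon as a field contains the delimiter or a quote character, which a naive line.split(",") would break apart. The sketch below writes a row whose field contains both, using io.StringIO in place of a real file (created with newline="" to mirror the open() calls above), and shows that csv.reader recovers the original fields. The sample data is made up.

```python
import csv
import io

rows = [
    ["Name", "Quote"],
    ["Alice", 'She said "hi", then left'],  # contains a comma and quotes
]

# io.StringIO stands in for a real file opened with newline=""
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
raw = buffer.getvalue()  # the field is wrapped in quotes, inner quotes doubled

# Reading it back recovers the original fields exactly
buffer.seek(0)
parsed = list(csv.reader(buffer))

print(raw)
print(parsed == rows)  # True
```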

DictReader and DictWriter

import csv

# DictReader — each row is a dictionary
with open("employees.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['Name']} works in {row['Department']}")

# DictWriter — write from dictionaries
employees = [
    {"Name": "Alice", "Department": "Engineering", "Salary": "85000"},
    {"Name": "Bob", "Department": "Marketing", "Salary": "72000"},
]

with open("output.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["Name", "Department", "Salary"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employees)

Working with JSON

`json.load()` and `json.dump()`

These functions work with file objects directly.

import json

# Writing JSON to a file
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "SQL", "JavaScript"],
    "address": {
        "city": "New York",
        "zip": "10001"
    }
}

with open("person.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)

# Reading JSON from a file
with open("person.json", "r", encoding="utf-8") as f:
    loaded_data = json.load(f)
    print(loaded_data["name"])  # Alice
    print(loaded_data["skills"])  # ['Python', 'SQL', 'JavaScript']

`json.loads()` and `json.dumps()`

These functions work with strings instead of files.

import json

# Convert Python dict to JSON string
data = {"name": "Bob", "age": 25}
json_string = json.dumps(data)
print(json_string)  # {"name": "Bob", "age": 25}
print(type(json_string))  # <class 'str'>

# Convert JSON string to Python dict
parsed = json.loads(json_string)
print(parsed["name"])  # Bob

Pretty Printing

import json

data = {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}

# Compact output
compact = json.dumps(data)
print(compact)
# {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}

# Pretty printed output
pretty = json.dumps(data, indent=4)
print(pretty)
# {
#     "users": [
#         {
#             "name": "Alice",
#             "age": 30
#         },
#         {
#             "name": "Bob",
#             "age": 25
#         }
#     ]
# }

# Sorted keys
sorted_json = json.dumps(data, indent=2, sort_keys=True)
print(sorted_json)
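
By default, json.dumps() and json.dump() escape non-ASCII characters as \uXXXX sequences. Passing ensure_ascii=False keeps them readable; when writing such output to a file, open it with encoding="utf-8". The sample string below is illustrative.

```python
import json

data = {"city": "Caf\u00e9 M\u00fcnchen"}  # contains two non-ASCII characters

# Default: non-ASCII characters are escaped
escaped = json.dumps(data)
# ensure_ascii=False keeps the characters as-is
readable = json.dumps(data, ensure_ascii=False)

print(escaped)  # {"city": "Caf\u00e9 M\u00fcnchen"}
print(json.loads(escaped) == json.loads(readable))  # True: both decode to the same data
```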

Custom Encoders and Decoders

import json
from datetime import datetime, date

# Custom encoder for objects that aren't JSON serializable
class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        return super().default(obj)

data = {
    "event": "Conference",
    "date": datetime.now()
}

serialized = json.dumps(data, cls=DateEncoder, indent=2)
print(serialized)  # Example output; the "date" value reflects the current time:
# {
#   "event": "Conference",
#   "date": "2025-01-15T10:30:00.123456"
# }

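# Custom decoder (object_hook): converts any string value that parses as an
# ISO 8601 datetime, including strings that only happen to look like dates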
def date_decoder(dct):
    for key, value in dct.items():
        if isinstance(value, str):
            try:
                dct[key] = datetime.fromisoformat(value)
            except (ValueError, TypeError):
                pass
    return dct

parsed = json.loads(serialized, object_hook=date_decoder)
print(parsed["date"])  # datetime object
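
Subclassing JSONEncoder is one option; json.dumps() and json.dump() also accept a default= function that is called for any object the encoder cannot handle natively. The sketch below (to_jsonable is a hypothetical name) converts dates to ISO strings and sets to sorted lists, and raises TypeError for anything else, which is how the json module reports unsupported types.

```python
import json
from datetime import date, datetime


def to_jsonable(obj):
    """Fallback called by json for objects it cannot serialize (hypothetical helper)."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    if isinstance(obj, set):
        return sorted(obj)  # sets have no JSON equivalent; sort for stable output
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


record = {"when": date(2025, 1, 15), "tags": {"python", "files"}}
encoded = json.dumps(record, default=to_jsonable)
print(encoded)  # {"when": "2025-01-15", "tags": ["files", "python"]}
```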

Temporary Files

The tempfile module creates temporary files and directories that can be cleaned up automatically.

`NamedTemporaryFile`

import os
import tempfile

# Creates a temporary file with a name on disk
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=True) as f:
    f.write("Temporary data")
    print(f"File name: {f.name}")
    # File is deleted when the block exits

# Keeping the file after closing
temp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
temp.write("Persistent temp data")
temp.close()
print(f"Temp file: {temp.name}")
os.unlink(temp.name)  # With delete=False, you must remove the file yourself

`TemporaryDirectory`

import tempfile
import os

# Creates a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    print(f"Temp directory: {tmpdir}")

    # Create files inside it
    filepath = os.path.join(tmpdir, "data.txt")
    with open(filepath, "w") as f:
        f.write("Temporary content")

    # Process the files
    with open(filepath, "r") as f:
        print(f.read())

# Directory and all contents are automatically deleted

File Locking (Preview)

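File locking lets cooperating processes coordinate access to the same file, which helps prevent data corruption from simultaneous writes. On Unix, locks taken with fcntl.flock() are advisory: they only affect processes that also request the lock, so a program that ignores locking can still read or write the file.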

`fcntl` on Unix

import fcntl

with open("shared.txt", "r+") as f:
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # Exclusive lock, non-blocking
        # Critical section — only one process at a time
        data = f.read()
        f.seek(0)
        f.write(data + "Updated by process 1\n")
    except BlockingIOError:
        print("File is locked by another process")
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # Release the lock

`msvcrt` on Windows

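import msvcrt

with open("shared.txt", "r+") as f:
    locked = False
    try:
        f.seek(0)  # locking() acts on bytes starting at the current position
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1)  # Lock 1 byte, non-blocking
        locked = True
        # Critical section
        data = f.read()
    except OSError:
        print("File is locked by another process")
    finally:
        if locked:
            f.seek(0)  # Return to the locked byte; read() moved the position
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)  # Release lock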

Common Mistakes

Mistake 1: Not Closing Files

Problem: Forgetting to close files leads to resource leaks and potential data loss.

# Bad: File never closed
f = open("data.txt", "w")
f.write("Hello")
# What if an exception happens before f.close()?

# Good: Use context manager
with open("data.txt", "w") as f:
    f.write("Hello")
# Automatically closed, even if exception occurs

Mistake 2: Forgetting the Encoding Parameter

Problem: Different operating systems use different default encodings, which can cause Unicode errors.

# Bad: Relies on system default encoding
with open("data.txt", "r") as f:
    content = f.read()  # May fail on some systems

# Good: Explicitly specify encoding
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()  # Works consistently everywhere

Mistake 3: Reading Entire Large Files into Memory

Problem: Loading huge files consumes all available memory.

# Bad: Loads entire file into memory
with open("huge_log.txt", "r") as f:
    lines = f.readlines()  # Could use GBs of RAM!

# Good: Process line by line
with open("huge_log.txt", "r") as f:
    for line in f:  # Only one line in memory at a time
        process(line.strip())  # process() is a placeholder for your own handling code

Mistake 4: Not Handling `FileNotFoundError`

Problem: Assuming a file exists without checking first.

# Bad: Crashes if file doesn't exist
with open("config.txt", "r") as f:
    config = f.read()

# Good: Handle the exception or check first
from pathlib import Path

config_file = Path("config.txt")
if config_file.exists():
    with open(config_file, "r", encoding="utf-8") as f:
        config = f.read()
else:
    config = "default_config"

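# Or use try/except (often preferred: it also avoids a race if the file
# is deleted between the exists() check and open())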
try:
    with open("config.txt", "r", encoding="utf-8") as f:
        config = f.read()
except FileNotFoundError:
    config = "default_config"

Mistake 5: Text Mode vs. Binary Mode Confusion

Problem: Using text mode for binary files or vice versa can corrupt data.

# Bad: Opening a binary file in text mode
with open("image.png", "r") as f:
    data = f.read()  # May raise UnicodeDecodeError

# Good: Use binary mode for non-text files
with open("image.png", "rb") as f:
    data = f.read()

# Bad: Using binary mode for text files
with open("data.txt", "rb") as f:
    text = f.read()  # Returns bytes, not str

# Good: Use text mode for text files
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read()  # Returns str

Practice Exercises

Exercise 1: File Word Counter

Write a function that reads a text file and returns a dictionary with word counts.

def count_words(filename):
    word_counts = {}
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            words = line.strip().split()
            for word in words:
                word = word.lower().strip(".,!?;:'\"")
                if word:  # Skip tokens that were only punctuation
                    word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts

# Test
result = count_words("poem.txt")
for word, count in sorted(result.items(), key=lambda x: x[1], reverse=True):
    print(f"{word}: {count}")
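
The standard library's collections.Counter can replace the manual dict.get() bookkeeping in this exercise, and its most_common() method handles the sorting. The sketch keeps the same tokenizing rule (split on whitespace, lowercase, strip punctuation) and skips tokens that were pure punctuation; count_words_counter and the two-line sample poem are made up for the demo.

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_words_counter(filename):
    """Counter-based variant of count_words() (hypothetical name)."""
    counts = Counter()
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            for word in line.split():
                word = word.lower().strip(".,!?;:'\"")
                if word:  # skip tokens that were only punctuation
                    counts[word] += 1
    return counts


with tempfile.TemporaryDirectory() as tmpdir:
    poem = Path(tmpdir) / "poem.txt"
    poem.write_text("Roses are red,\nViolets are blue,\n", encoding="utf-8")
    counts = count_words_counter(poem)

print(counts.most_common(1))  # [('are', 2)]
```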

Exercise 2: CSV Data Processor

Write a function that reads a CSV file and calculates the average of a numeric column.

import csv

def average_column(filename, column_name):
    with open(filename, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        values = []
        for row in reader:
            try:
                values.append(float(row[column_name]))
            except (ValueError, KeyError):
                continue

        if not values:
            return 0
        return sum(values) / len(values)

# Test with employees.csv
avg_salary = average_column("employees.csv", "Salary")
print(f"Average salary: ${avg_salary:,.2f}")

Exercise 3: JSON Configuration Manager

Write a class that manages configuration settings stored in a JSON file.

import json
from pathlib import Path

class ConfigManager:
    def __init__(self, config_file="config.json"):
        self.config_file = Path(config_file)
        self.config = self._load()

    def _load(self):
        if self.config_file.exists():
            with open(self.config_file, "r", encoding="utf-8") as f:
                return json.load(f)
        return {}

    def save(self):
        with open(self.config_file, "w", encoding="utf-8") as f:
            json.dump(self.config, f, indent=2)

    def get(self, key, default=None):
        return self.config.get(key, default)

    def set(self, key, value):
        self.config[key] = value
        self.save()

    def delete(self, key):
        if key in self.config:
            del self.config[key]
            self.save()

# Usage
config = ConfigManager()
config.set("theme", "dark")
config.set("language", "en")
config.set("font_size", 14)

print(config.get("theme"))       # dark
print(config.get("language"))    # en
print(config.get("missing", "default"))  # default

Key Takeaways

Always use with statements to ensure files are properly closed, even when exceptions occur.
Specify encoding explicitly (encoding="utf-8") for consistent behavior across operating systems.
Use line-by-line iteration for large files to avoid memory issues.
Use pathlib for modern, object-oriented path manipulation — it's cleaner than os.path.
Use csv.DictReader and csv.DictWriter when working with CSV files that have headers.
Use json.dump() and json.load() for file-based JSON operations, json.dumps() and json.loads() for string-based operations.
Always handle FileNotFoundError when reading files that may not exist.
Use tempfile module for temporary files to ensure automatic cleanup.
