Python File Handling — Reading, Writing, and Managing Files
Learning Objectives
By the end of this tutorial, you will be able to:
- Open files using the open() function and understand all file modes
- Use context managers (with statement) to safely handle files
- Read files using read(), readline(), readlines(), and iteration
- Write data to files with write() and writelines()
- Work with file paths using os.path and pathlib
- Perform common file operations like renaming, copying, and deleting
- Read and write CSV and JSON files
- Use temporary files safely
- Avoid common file handling mistakes
Opening Files
Python's built-in open() function is the gateway to file operations. It returns a file object that you can use to read from or write to the file.
The open() Function
file = open("example.txt", "r")
content = file.read()
file.close()
The open() function takes two primary arguments: the file path and the mode. The mode determines how you interact with the file — whether you are reading, writing, or appending data.
File Modes
| Mode | Description | File Must Exist | Creates File | Position |
|---|---|---|---|---|
| 'r' | Read (text) | Yes | No | Start |
| 'w' | Write (text) | No | Yes (overwrites) | Start |
| 'a' | Append (text) | No | Yes | End |
| 'r+' | Read + Write (text) | Yes | No | Start |
| 'w+' | Write + Read (text) | No | Yes (overwrites) | Start |
| 'a+' | Append + Read (text) | No | Yes | End |
| 'x' | Exclusive create | No | Yes (fails if exists) | Start |
# Reading a file (default mode is 'r')
f = open("data.txt", "r")
# Writing to a file (overwrites existing content)
f = open("output.txt", "w")
f.write("Hello, World!")
f.close()
# Appending to a file
f = open("log.txt", "a")
f.write("New log entry\n")
f.close()
# Creating a new file (fails if file already exists)
f = open("new_file.txt", "x")
f.write("Created for the first time")
f.close()
# This will raise a FileExistsError
# f = open("new_file.txt", "x") # FileExistsError
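The mode table above can be checked with a small experiment. The following is a minimal sketch that runs inside a temporary directory so it does not touch real files; the helper name demo_modes and the file name demo.txt are hypothetical, chosen only for illustration.

```python
import tempfile
from pathlib import Path

def demo_modes(directory):
    """Exercise 'w', 'a', and 'x' on one file and return what was observed."""
    target = Path(directory) / "demo.txt"  # hypothetical file name
    observed = {}

    # 'w' creates the file, or empties it if it already exists
    with open(target, "w", encoding="utf-8") as f:
        f.write("first\n")
    with open(target, "w", encoding="utf-8") as f:
        f.write("second\n")
    observed["after_w"] = target.read_text(encoding="utf-8")

    # 'a' keeps existing content and writes at the end
    with open(target, "a", encoding="utf-8") as f:
        f.write("third\n")
    observed["after_a"] = target.read_text(encoding="utf-8")

    # 'x' refuses to open a file that already exists
    try:
        with open(target, "x", encoding="utf-8") as f:
            f.write("never written\n")
        observed["x_failed"] = False
    except FileExistsError:
        observed["x_failed"] = True

    return observed

with tempfile.TemporaryDirectory() as tmpdir:
    result = demo_modes(tmpdir)

print(result)
```

The second 'w' open discards "first", the 'a' open adds to what is left, and the 'x' open fails because the file is already there.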
Binary Modes
Binary modes are used for non-text files like images, audio, or any data that should be treated as raw bytes rather than text.
# Reading a binary file (e.g., an image)
with open("photo.jpg", "rb") as f:
    data = f.read()
print(type(data)) # <class 'bytes'>
print(len(data)) # Size in bytes
# Writing binary data
with open("output.bin", "wb") as f:
    f.write(b"\x00\x01\x02\x03\x04")
The Encoding Parameter
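When working with text files, encoding determines how characters are stored as bytes. UTF-8 is the most common encoding, but open() does not always default to it: without an explicit encoding argument, Python uses the platform's locale encoding, which is often UTF-8 on Linux and macOS and frequently a legacy code page such as cp1252 on Windows.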
# Explicitly specifying encoding (recommended)
with open("notes.txt", "r", encoding="utf-8") as f:
    content = f.read()
# Writing with specific encoding
with open("unicode.txt", "w", encoding="utf-8") as f:
    f.write("Caf\u00e9 r\u00e9sum\u00e9")
# Using a different encoding
with open("legacy.txt", "r", encoding="latin-1") as f:
    content = f.read()
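To make the effect of the encoding choice concrete, the sketch below encodes the same string two ways and then decodes it with the wrong codec. It works on in-memory strings and bytes only, so no files are needed; the variable names are illustrative.

```python
text = "Café résumé"

utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(len(text))          # 11 characters
print(len(utf8_bytes))    # 14 bytes: each é takes two bytes in UTF-8
print(len(latin1_bytes))  # 11 bytes: each é takes one byte in Latin-1

# Decoding with the wrong codec does not always fail; it can silently garble text
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # CafÃ© rÃ©sumÃ©

# Decoding Latin-1 bytes as UTF-8 fails loudly instead
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)  # True
```

The silent garbling case is the more dangerous one, which is why passing encoding explicitly on both the writing and the reading side is worth the extra argument.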
Context Managers (with Statement)
The with statement ensures that files are properly closed after the block of code exits, even if an exception occurs. This is the recommended way to handle files in Python.
Why Use with Statement
Without with, you must manually close files, which is error-prone:
# Verbose: manual closing needs try/finally to be safe (not recommended)
f = open("data.txt", "r")
try:
    content = f.read()
    # If an exception occurs here, the finally block still closes the file;
    # without try/finally, f.close() would never be called
finally:
    f.close()
# Good: Using context manager
with open("data.txt", "r") as f:
    content = f.read()
# File is automatically closed here, even if an exception occurred
Auto-Closing Files
The with statement calls __enter__ when entering the block and __exit__ when leaving it. The __exit__ method of file objects calls close().
with open("example.txt", "w") as f:
    f.write("This file will be closed automatically")
    print(f.closed) # False: still inside the block
print(f.closed) # True: automatically closed
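The __enter__ and __exit__ protocol described above can be observed with a small class that records each call. This is an illustrative sketch, not how file objects are implemented internally; TracedResource is a hypothetical name.

```python
class TracedResource:
    """Hypothetical resource that records the order of protocol calls."""

    def __init__(self):
        self.events = []
        self.closed = False

    def __enter__(self):
        self.events.append("enter")
        return self  # this is the object bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")
        self.closed = True
        return False  # do not suppress exceptions

resource = TracedResource()
try:
    with resource as r:
        r.events.append("body")
        raise ValueError("simulated failure")
except ValueError:
    pass

print(resource.events)  # ['enter', 'body', 'exit']
print(resource.closed)  # True
```

Even though the body raises, __exit__ still runs before the exception reaches the except clause, which is the same guarantee that closes a file.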
Multiple Files in One with
You can handle multiple files in a single with statement:
with open("input.txt", "r") as infile, open("output.txt", "w") as outfile:
    for line in infile:
        outfile.write(line.upper())
# Or using parenthesized context managers (Python 3.10+)
with (
    open("input.txt", "r") as infile,
    open("output.txt", "w") as outfile,
):
    for line in infile:
        outfile.write(line.upper())
Reading Files
Python provides several methods to read file content, each suited for different use cases.
read() — Entire File
with open("poem.txt", "r") as f:
    content = f.read()
print(content)
Output:
Roses are red,
Violets are blue,
Python is awesome,
And so are you.
# Reading a file and printing its content
with open("data.txt", "r") as f:
    whole_file = f.read()
print(f"File content:\n{whole_file}")
print(f"Total characters: {len(whole_file)}")
read(n) — N Characters at a Time
with open("data.txt", "r") as f:
    chunk1 = f.read(5) # Read first 5 characters
    chunk2 = f.read(5) # Read next 5 characters
print(f"Chunk 1: '{chunk1}'")
print(f"Chunk 2: '{chunk2}'")
Output:
Chunk 1: 'Hello'
Chunk 2: ', Wor'
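Calling read(n) repeatedly is the usual basis for processing a file in fixed-size pieces: keep reading until an empty string comes back. The sketch below wraps that loop in a generator; read_in_chunks is a hypothetical helper, and io.StringIO stands in for a real text file so the example is self-contained.

```python
import io

def read_in_chunks(file_obj, chunk_size=4):
    """Yield successive pieces of at most chunk_size characters."""
    while True:
        chunk = file_obj.read(chunk_size)
        if not chunk:  # empty string means end of file
            break
        yield chunk

# io.StringIO stands in for a text file opened with open(..., "r")
sample = io.StringIO("Hello, World!")
chunks = list(read_in_chunks(sample, chunk_size=5))
print(chunks)  # ['Hello', ', Wor', 'ld!']
```

The last chunk is shorter than chunk_size because only three characters remain.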
readline() — One Line at a Time
with open("fruits.txt", "r") as f:
    line1 = f.readline()
    line2 = f.readline()
print(f"Line 1: {line1.strip()}") # strip() removes trailing newline
print(f"Line 2: {line2.strip()}")
Output:
Line 1: Apple
Line 2: Banana
readlines() — List of Lines
with open("fruits.txt", "r") as f:
    lines = f.readlines()
print(lines)
print(f"Total lines: {len(lines)}")
Output:
['Apple\n', 'Banana\n', 'Cherry\n', 'Date\n']
Total lines: 4
Iterating Line by Line (Memory Efficient)
This is the most memory-efficient way to read large files, as only one line is loaded into memory at a time.
with open("large_file.txt", "r") as f:
    for line in f:
        print(line.strip()) # strip() removes the trailing newline
# Counting lines in a file
with open("data.txt", "r") as f:
    line_count = sum(1 for line in f)
print(f"File has {line_count} lines")
# Finding lines containing a keyword
with open("logs.txt", "r") as f:
    error_lines = [line.strip() for line in f if "ERROR" in line]
print(f"Found {len(error_lines)} error lines")
Writing Files
write() — Write a String
with open("output.txt", "w") as f:
    # In text mode, write() returns the number of characters written, not bytes
    chars_written = f.write("First line\n")
    f.write("Second line\n")
    f.write("Third line\n")
print(f"Characters written: {chars_written}")
# Writing formatted data
with open("report.txt", "w", encoding="utf-8") as f:
    f.write("=" * 40 + "\n")
    f.write("SALES REPORT\n")
    f.write("=" * 40 + "\n\n")
    f.write("Total Sales: $12,450.00\n")
    f.write("Items Sold: 342\n")
    f.write("Average: $36.40\n")
writelines() — Write a List of Strings
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]
with open("output.txt", "w") as f:
    f.writelines(lines)
Important: writelines() does not add newlines. You must include \n in each string.
# writelines does NOT add newlines automatically
fruits = ["apple", "banana", "cherry"]
with open("fruits.txt", "w") as f:
    f.writelines(fruits) # Results in: "applebananacherry"
# Correct approach: include newlines
with open("fruits.txt", "w") as f:
    f.writelines(f"{fruit}\n" for fruit in fruits)
Truncating Files
You can truncate a file to a specific size using truncate(). The size is given in bytes; if you omit it, the file is cut at the current position.
with open("data.txt", "r+") as f:
    content = f.read()
    print(f"Original size: {len(content)} characters")
    f.seek(0) # Go back to the start
    f.truncate(10) # Keep only the first 10 bytes (10 characters for ASCII text)
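Because truncate() works in bytes and, without an argument, cuts at the current position, it pairs naturally with seek() and tell(). The following sketch uses binary mode so that positions and sizes are unambiguous byte counts; the file name data.bin is hypothetical and lives in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "data.bin"  # hypothetical file name
    path.write_bytes(b"0123456789ABCDEF")

    with open(path, "r+b") as f:
        f.seek(4)              # move to byte offset 4
        position = f.tell()    # 4
        f.truncate()           # no argument: cut at the current position
        f.seek(0)
        remaining = f.read()   # b"0123"

    size_after = path.stat().st_size

print(position, remaining, size_after)  # 4 b'0123' 4
```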
Working with Paths
os.path Module
The os.path module provides functions for common path operations:
import os
# Joining path components
path = os.path.join("folder", "subfolder", "file.txt")
print(path) # folder/subfolder/file.txt (on Unix)
# folder\subfolder\file.txt (on Windows)
# Getting absolute path
abs_path = os.path.abspath("data.txt")
print(abs_path) # C:\Users\user\project\data.txt
# Getting filename and extension
filename = os.path.basename("/home/user/report.pdf")
print(filename) # report.pdf
dirname = os.path.dirname("/home/user/report.pdf")
print(dirname) # /home/user
name, ext = os.path.splitext("report.pdf")
print(name, ext) # report .pdf
# Checking file existence
if os.path.exists("config.txt"):
    print("Config file found!")
else:
    print("Config file not found.")
# Checking file type
print(os.path.isfile("data.txt")) # True if it's a file
print(os.path.isdir("folder")) # True if it's a directory
pathlib Module (Modern Approach)
pathlib provides an object-oriented interface to filesystem paths. It is the recommended approach in modern Python.
from pathlib import Path
# Creating Path objects
p = Path("folder") / "subfolder" / "file.txt"
print(p) # folder/subfolder/file.txt
# Path components
path = Path("/home/user/documents/report.pdf")
print(path.name) # report.pdf
print(path.stem) # report
print(path.suffix) # .pdf
print(path.parent) # /home/user/documents
# Absolute path
abs_path = Path("data.txt").resolve()
print(abs_path)
# Checking existence
if Path("config.txt").exists():
    print("Config file exists!")
if Path("data.txt").is_file():
    print("It's a file!")
if Path("my_folder").is_dir():
    print("It's a directory!")
Creating, Checking, and Joining Paths
from pathlib import Path
# Create a Path object
base = Path("project")
data_dir = base / "data"
output_file = data_dir / "results.csv"
print(data_dir) # project/data
print(output_file) # project/data/results.csv
# Check before creating
if not data_dir.exists():
    data_dir.mkdir(parents=True) # parents=True creates parent dirs too
    print(f"Created: {data_dir}")
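Path objects can also read and write small files directly, which avoids an explicit open() call for simple cases. The sketch below combines mkdir(), write_text(), read_text(), and with_suffix() inside a temporary directory; the path project/data/notes.txt is a hypothetical example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    notes = Path(tmpdir) / "project" / "data" / "notes.txt"  # hypothetical path

    # Create missing parent directories, then write and read in one call each
    notes.parent.mkdir(parents=True, exist_ok=True)
    notes.write_text("line one\nline two\n", encoding="utf-8")
    text = notes.read_text(encoding="utf-8")

    lines = text.splitlines()
    renamed = notes.with_suffix(".md")  # builds a new path; nothing is renamed on disk
    renamed_exists = renamed.exists()

print(lines)           # ['line one', 'line two']
print(renamed.name)    # notes.md
print(renamed_exists)  # False
```

Note that with_suffix() only computes a new Path object. Changing the file on disk still requires rename().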
File Operations
Renaming Files
import os
from pathlib import Path
# Using os.rename
os.rename("old_name.txt", "new_name.txt")
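# Using pathlib instead (recommended); use one approach or the other, since the source file no longer exists after the first rename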
path = Path("old_name.txt")
path.rename("new_name.txt")
Copying Files
import shutil
# Copy a file
shutil.copy("source.txt", "destination.txt")
# Copy with metadata (permissions, timestamps)
shutil.copy2("source.txt", "destination.txt")
# Copy an entire directory
shutil.copytree("source_dir", "destination_dir")
Deleting Files
import os
from pathlib import Path
# Using os.remove
os.remove("file_to_delete.txt")
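# Using pathlib instead (choose one approach; a second delete of the same file raises FileNotFoundError)
Path("file_to_delete.txt").unlink()
# Deleting a directory and all its contents
import shutil
shutil.rmtree("directory_to_delete")
# Safe deletion (no error if the file is missing; missing_ok requires Python 3.8+)
Path("maybe_exists.txt").unlink(missing_ok=True)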
Creating Directories
import os
from pathlib import Path
# Create a single directory
os.mkdir("new_folder")
# Create directory and any missing parents
os.makedirs("path/to/new/folder", exist_ok=True)
# Using pathlib
Path("new_folder").mkdir(exist_ok=True)
Path("path/to/new/folder").mkdir(parents=True, exist_ok=True)
Listing Directory Contents
import os
from pathlib import Path
# List all items in a directory
items = os.listdir("my_folder")
print(items)
# Using pathlib
path = Path("my_folder")
for item in path.iterdir():
    print(item.name, "is", "dir" if item.is_dir() else "file")
# Glob pattern matching
py_files = list(Path(".").glob("*.py"))
print(py_files)
# Recursive glob
all_py = list(Path(".").rglob("*.py"))
print(all_py)
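Recursive globbing combines well with is_file() and suffix to summarize a directory tree. The sketch below counts files per extension; count_by_suffix is a hypothetical helper, and the sample tree is created in a temporary directory purely for demonstration.

```python
import tempfile
from collections import Counter
from pathlib import Path

def count_by_suffix(root):
    """Count regular files under root, grouped by file extension."""
    counts = Counter()
    for item in Path(root).rglob("*"):
        if item.is_file():
            counts[item.suffix or "(none)"] += 1
    return counts

with tempfile.TemporaryDirectory() as tmpdir:
    base = Path(tmpdir)
    (base / "pkg").mkdir()
    # Build a small sample tree of empty files
    for name in ["a.py", "b.py", "pkg/c.py", "README", "pkg/data.csv"]:
        (base / name).write_text("", encoding="utf-8")
    counts = count_by_suffix(base)

print(dict(counts))
```

Files without an extension have an empty suffix, so the helper groups them under the placeholder key "(none)".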
Working with CSV
CSV Module Basics
The csv module provides built-in support for reading and writing CSV files.
import csv
# Basic CSV reading
with open("data.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    for row in reader:
        print(row)
Reading CSV Files
import csv
# Reading CSV into a list of lists
with open("employees.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader) # Skip header
    data = list(reader)
print(f"Header: {header}")
print(f"Rows: {len(data)}")
for row in data[:3]: # Print first 3 rows
    print(row)
Output:
Header: ['Name', 'Department', 'Salary']
Rows: 5
['Alice', 'Engineering', '85000']
['Bob', 'Marketing', '72000']
['Charlie', 'Engineering', '90000']
Writing CSV Files
import csv
# Writing CSV from a list of lists
data = [
    ["Name", "Age", "City"],
    ["Alice", "30", "New York"],
    ["Bob", "25", "San Francisco"],
    ["Charlie", "35", "Chicago"],
]
with open("people.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerows(data)
DictReader and DictWriter
import csv
# DictReader — each row is a dictionary
with open("employees.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"{row['Name']} works in {row['Department']}")
# DictWriter: write from dictionaries
employees = [
    {"Name": "Alice", "Department": "Engineering", "Salary": "85000"},
    {"Name": "Bob", "Department": "Marketing", "Salary": "72000"},
]
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["Name", "Department", "Salary"]
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(employees)
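Two details of the csv module are easy to miss: every value comes back as a string, and fields containing commas are quoted automatically. The sketch below shows both using in-memory text (io.StringIO) in place of a real file; the sample rows are invented for illustration.

```python
import csv
import io

# Inline CSV text stands in for a file such as employees.csv
raw = io.StringIO(
    "Name,Department,Salary\n"
    "Alice,Engineering,85000\n"
    '"Smith, Bob",Marketing,72000\n'
)

rows = []
for row in csv.DictReader(raw):
    row["Salary"] = int(row["Salary"])  # csv always yields strings
    rows.append(row)

total = sum(row["Salary"] for row in rows)
print(rows[1]["Name"])  # Smith, Bob
print(total)            # 157000

# Writing back: the writer quotes the field that contains a comma
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Department", "Salary"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```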
Working with JSON
json.load() and json.dump()
These functions work with file objects directly.
import json
# Writing JSON to a file
data = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "SQL", "JavaScript"],
    "address": {
        "city": "New York",
        "zip": "10001"
    }
}
with open("person.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2)
# Reading JSON from a file
with open("person.json", "r", encoding="utf-8") as f:
    loaded_data = json.load(f)
print(loaded_data["name"]) # Alice
print(loaded_data["skills"]) # ['Python', 'SQL', 'JavaScript']
json.loads() and json.dumps()
These functions work with strings instead of files.
import json
# Convert Python dict to JSON string
data = {"name": "Bob", "age": 25}
json_string = json.dumps(data)
print(json_string) # {"name": "Bob", "age": 25}
print(type(json_string)) # <class 'str'>
# Convert JSON string to Python dict
parsed = json.loads(json_string)
print(parsed["name"]) # Bob
Pretty Printing
import json
data = {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}
# Compact output
compact = json.dumps(data)
print(compact)
# {"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}
# Pretty printed output
pretty = json.dumps(data, indent=4)
print(pretty)
# {
#     "users": [
#         {
#             "name": "Alice",
#             "age": 30
#         },
#         {
#             "name": "Bob",
#             "age": 25
#         }
#     ]
# }
# Sorted keys
sorted_json = json.dumps(data, indent=2, sort_keys=True)
print(sorted_json)
Custom Encoders and Decoders
import json
from datetime import datetime, date
# Custom encoder for objects that aren't JSON serializable
class DateEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        return super().default(obj)
data = {
    "event": "Conference",
    "date": datetime.now()
}
serialized = json.dumps(data, cls=DateEncoder, indent=2)
print(serialized)
# {
#   "event": "Conference",
#   "date": "2025-01-15T10:30:00.123456"
# }
# Custom decoder
def date_decoder(dct):
    for key, value in dct.items():
        if isinstance(value, str):
            try:
                dct[key] = datetime.fromisoformat(value)
            except (ValueError, TypeError):
                pass
    return dct
parsed = json.loads(serialized, object_hook=date_decoder)
print(parsed["date"]) # datetime object
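For simple cases, subclassing JSONEncoder is not required: json.dumps() also accepts a default= function that is called for objects it cannot serialize. The sketch below shows that alternative and, on the reading side, converts only a field known to hold a date instead of probing every string. The function name to_jsonable and the sample data are hypothetical.

```python
import json
from datetime import date, datetime

def to_jsonable(obj):
    """Fallback used by json.dumps for objects it cannot serialize itself."""
    if isinstance(obj, (datetime, date)):
        return obj.isoformat()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

event = {"event": "Conference", "date": date(2025, 1, 15)}
encoded = json.dumps(event, default=to_jsonable)
print(encoded)  # {"event": "Conference", "date": "2025-01-15"}

decoded = json.loads(encoded)
restored = date.fromisoformat(decoded["date"])  # convert only the known field
print(restored == event["date"])  # True
```

Converting named fields explicitly is more predictable than an object_hook that tries every string, because ordinary strings that happen to look like dates are left alone.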
Temporary Files
The tempfile module creates temporary files and directories that are automatically cleaned up.
NamedTemporaryFile
import tempfile
# Creates a temporary file with a name
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=True) as f:
    f.write("Temporary data")
    print(f"File name: {f.name}")
# File is deleted when the block exits
# Keeping the file after closing
temp = tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False)
temp.write("Persistent temp data")
temp.close()
print(f"Temp file: {temp.name}")
# You must delete it manually: os.unlink(temp.name)
TemporaryDirectory
import tempfile
import os
# Creates a temporary directory
with tempfile.TemporaryDirectory() as tmpdir:
    print(f"Temp directory: {tmpdir}")
    # Create files inside it
    filepath = os.path.join(tmpdir, "data.txt")
    with open(filepath, "w") as f:
        f.write("Temporary content")
    # Process the files
    with open(filepath, "r") as f:
        print(f.read())
# Directory and all contents are automatically deleted
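One common use of temporary files is replacing an existing file safely: write the new content to a temporary file in the same directory, then swap it into place with os.replace(). The sketch below illustrates that pattern under the assumption that the temporary file and the target are on the same filesystem; atomic_write_text and settings.txt are hypothetical names.

```python
import os
import tempfile
from pathlib import Path

def atomic_write_text(path, text, encoding="utf-8"):
    """Write text to a temp file in the same directory, then swap it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as tmp:
            tmp.write(text)
        os.replace(tmp_name, path)  # replaces the target in a single step
    except BaseException:
        # Clean up the temp file if anything went wrong before the swap
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise

with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "settings.txt"  # hypothetical file name
    atomic_write_text(target, "version=1\n")
    atomic_write_text(target, "version=2\n")
    final_text = target.read_text(encoding="utf-8")
    leftovers = [p.name for p in Path(tmpdir).iterdir()]

print(final_text)
print(leftovers)  # ['settings.txt']
```

With this approach a reader sees either the old content or the new content, never a half-written file, and no temporary file is left behind.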
File Locking (Preview)
File locking lets cooperating processes coordinate access to a shared file, which helps prevent data corruption. On Unix, the locks shown here are advisory: they only affect other processes that also request the lock.
fcntl on Unix
import fcntl
with open("shared.txt", "r+") as f:
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB) # Exclusive lock, non-blocking
        # Critical section: only one process at a time
        data = f.read()
        f.seek(0)
        f.write(data + "Updated by process 1\n")
    except BlockingIOError:
        print("File is locked by another process")
    finally:
        fcntl.flock(f, fcntl.LOCK_UN) # Release the lock
msvcrt on Windows
import msvcrt
with open("shared.txt", "r+") as f:
    try:
        msvcrt.locking(f.fileno(), msvcrt.LK_NBLCK, 1) # Lock 1 byte at the current position (offset 0)
        # Critical section
        data = f.read()
    except OSError:
        print("File is locked by another process")
    finally:
        try:
            f.seek(0) # The lock starts at offset 0, so return there before unlocking
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1) # Release lock
        except OSError:
            pass
Common Mistakes
Mistake 1: Not Closing Files
Problem: Forgetting to close files leads to resource leaks and potential data loss.
# Bad: File never closed
f = open("data.txt", "w")
f.write("Hello")
# What if an exception happens before f.close()?
# Good: Use context manager
with open("data.txt", "w") as f:
    f.write("Hello")
# Automatically closed, even if exception occurs
Mistake 2: Forgetting the Encoding Parameter
Problem: Different operating systems use different default encodings, which can cause Unicode errors.
# Bad: Relies on system default encoding
with open("data.txt", "r") as f:
    content = f.read() # May fail on some systems
# Good: Explicitly specify encoding
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read() # Works consistently everywhere
Mistake 3: Reading Entire Large Files into Memory
Problem: Loading a huge file in one call can exhaust available memory.
# Bad: Loads entire file into memory
with open("huge_log.txt", "r") as f:
    lines = f.readlines() # Could use GBs of RAM!
# Good: Process line by line
with open("huge_log.txt", "r") as f:
    for line in f: # Only one line in memory at a time
        process(line.strip()) # process() is a placeholder for your own function
Mistake 4: Not Handling FileNotFoundError
Problem: Assuming a file exists without checking first.
# Bad: Crashes if file doesn't exist
with open("config.txt", "r") as f:
    config = f.read()
# Good: Handle the exception or check first
from pathlib import Path
config_file = Path("config.txt")
if config_file.exists():
    with open(config_file, "r", encoding="utf-8") as f:
        config = f.read()
else:
    config = "default_config"
# Or use try/except (also covers a file that disappears between the check and the open)
try:
    with open("config.txt", "r", encoding="utf-8") as f:
        config = f.read()
except FileNotFoundError:
    config = "default_config"
Mistake 5: Text Mode vs. Binary Mode Confusion
Problem: Using text mode for binary files or vice versa can corrupt data.
# Bad: Opening a binary file in text mode
with open("image.png", "r") as f:
    data = f.read() # May raise UnicodeDecodeError
# Good: Use binary mode for non-text files
with open("image.png", "rb") as f:
    data = f.read()
# Bad: Using binary mode for text files
with open("data.txt", "rb") as f:
    text = f.read() # Returns bytes, not str
# Good: Use text mode for text files
with open("data.txt", "r", encoding="utf-8") as f:
    text = f.read() # Returns str
Practice Exercises
Exercise 1: File Word Counter
Write a function that reads a text file and returns a dictionary with word counts.
def count_words(filename):
    word_counts = {}
    with open(filename, "r", encoding="utf-8") as f:
        for line in f:
            words = line.strip().split()
            for word in words:
                word = word.lower().strip(".,!?;:'\"")
                if word: # Skip tokens that were only punctuation
                    word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts
# Test
result = count_words("poem.txt")
for word, count in sorted(result.items(), key=lambda x: x[1], reverse=True):
    print(f"{word}: {count}")
Exercise 2: CSV Data Processor
Write a function that reads a CSV file and calculates the average of a numeric column.
import csv
def average_column(filename, column_name):
    with open(filename, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        values = []
        for row in reader:
            try:
                values.append(float(row[column_name]))
            except (ValueError, KeyError):
                continue
    if not values:
        return 0
    return sum(values) / len(values)
# Test with employees.csv
avg_salary = average_column("employees.csv", "Salary")
print(f"Average salary: ${avg_salary:,.2f}")
Exercise 3: JSON Configuration Manager
Write a class that manages configuration settings stored in a JSON file.
import json
from pathlib import Path
class ConfigManager:
    def __init__(self, config_file="config.json"):
        self.config_file = Path(config_file)
        self.config = self._load()
    def _load(self):
        if self.config_file.exists():
            with open(self.config_file, "r", encoding="utf-8") as f:
                return json.load(f)
        return {}
    def save(self):
        with open(self.config_file, "w", encoding="utf-8") as f:
            json.dump(self.config, f, indent=2)
    def get(self, key, default=None):
        return self.config.get(key, default)
    def set(self, key, value):
        self.config[key] = value
        self.save()
    def delete(self, key):
        if key in self.config:
            del self.config[key]
            self.save()
# Usage
config = ConfigManager()
config.set("theme", "dark")
config.set("language", "en")
config.set("font_size", 14)
print(config.get("theme")) # dark
print(config.get("language")) # en
print(config.get("missing", "default")) # default
Key Takeaways
- Always use with statements to ensure files are properly closed, even when exceptions occur.
- Specify encoding explicitly (encoding="utf-8") for consistent behavior across operating systems.
- Use line-by-line iteration for large files to avoid memory issues.
- Use pathlib for modern, object-oriented path manipulation; it is cleaner than os.path.
- Use csv.DictReader and csv.DictWriter when working with CSV files that have headers.
- Use json.dump() and json.load() for file-based JSON operations, and json.dumps() and json.loads() for string-based operations.
- Always handle FileNotFoundError when reading files that may not exist.
- Use the tempfile module for temporary files to ensure automatic cleanup.