Python Dataclasses — Modern Class Syntax
Dataclasses reduce boilerplate for classes that primarily store data. They auto-generate `__init__`, `__repr__`, `__eq__`, and more using a simple decorator.
Learning Objectives
- Create dataclasses with the `@dataclass` decorator
- Use `field()` for custom defaults and options
- Create frozen (immutable) and slotted (memory-efficient) dataclasses
- Use `__post_init__` for computed fields and validation
- Compare dataclasses with NamedTuple, attrs, and regular classes
Basic Dataclass
The @dataclass decorator automatically generates special methods:
```python
from dataclasses import dataclass

@dataclass
class Point:
    x: float
    y: float

p = Point(3.0, 4.0)
print(p)                      # Point(x=3.0, y=4.0)
print(p.x, p.y)               # 3.0 4.0
print(p == Point(3.0, 4.0))   # True (auto-generated __eq__)
```
What @dataclass generates automatically:
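| Method | Generated by default | How to control it |
|---|---|---|
| `__init__` | Yes | `init=False` skips it |
| `__repr__` | Yes | `repr=False` skips it |
| `__eq__` | Yes | `eq=False` skips it |
| `__hash__` | Only with `frozen=True`; with the defaults (`eq=True`, `frozen=False`) it is set to `None`, so instances are unhashable | `frozen=True` or `unsafe_hash=True` |
| `__lt__`, `__le__`, `__gt__`, `__ge__` | No | `order=True` |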
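The `__hash__` and ordering rows are easiest to see in code. The sketch below uses hypothetical classes (`Version`, `MutablePoint`, `FrozenPoint`); `order=True` and `frozen=True` are standard `@dataclass` parameters.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Version:
    major: int
    minor: int

# order=True generates __lt__, __le__, __gt__, __ge__, comparing fields as a tuple
print(Version(1, 2) < Version(1, 10))          # True
print(sorted([Version(2, 0), Version(1, 5)]))  # [Version(major=1, minor=5), Version(major=2, minor=0)]

@dataclass
class MutablePoint:
    x: float
    y: float

# With eq=True (the default) and frozen=False, __hash__ is set to None
print(MutablePoint.__hash__ is None)  # True

@dataclass(frozen=True)
class FrozenPoint:
    x: float
    y: float

# frozen=True together with eq=True generates a field-based __hash__
print(hash(FrozenPoint(1.0, 2.0)) == hash(FrozenPoint(1.0, 2.0)))  # True
```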
field() Options
Use field() when you need mutable defaults or custom behavior:
```python
from dataclasses import dataclass, field
from typing import List, Dict

@dataclass
class Student:
    name: str
    age: int
    grades: List[float] = field(default_factory=list)
    gpa: float = field(init=False)
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        if self.grades:
            self.gpa = sum(self.grades) / len(self.grades)
        else:
            self.gpa = 0.0

s = Student("Alice", 20, [3.5, 4.0, 3.8])
print(s.gpa)   # 3.766...
print(s.tags)  # {}
```
Field options reference:
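| Option | Description |
|---|---|
| `default` | Default value for the field |
| `default_factory` | Zero-argument callable that creates the default; use it for mutable defaults |
| `init` | Include the field as an `__init__` parameter (default: `True`) |
| `repr` | Include the field in `__repr__` (default: `True`) |
| `compare` | Include the field in `__eq__` and the ordering methods such as `__lt__` (default: `True`) |
| `hash` | Include the field in `__hash__` (default: `None`, which follows the `compare` setting) |
| `metadata` | Read-only mapping of user-defined data; ignored by `dataclasses` itself |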
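A short sketch of `repr`, `compare`, and `metadata` in use, assuming a hypothetical `User` class. The `"max_length"` metadata key is an arbitrary illustration, since `dataclasses` does not interpret metadata; `fields()` from the `dataclasses` module reads it back.

```python
from dataclasses import dataclass, field, fields

@dataclass
class User:
    username: str
    # repr=False keeps the secret out of printed output and logs
    password: str = field(repr=False)
    # compare=False excludes the field from __eq__ (and ordering, if enabled)
    login_count: int = field(default=0, compare=False)
    # metadata is stored as a read-only mapping; dataclasses itself ignores it
    email: str = field(default="", metadata={"max_length": 254})

u1 = User("alice", "s3cret")
u2 = User("alice", "s3cret", login_count=5)
print(u1)        # User(username='alice', login_count=0, email='')
print(u1 == u2)  # True, because login_count is ignored by __eq__

# fields() returns the Field objects, including their metadata
email_field = next(f for f in fields(User) if f.name == "email")
print(email_field.metadata["max_length"])  # 254
```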
Frozen Dataclass (Immutable)
Create immutable instances that can be used as dict keys or set elements:
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Config:
    host: str = "localhost"
    port: int = 8080
    debug: bool = False

config = Config()
# config.port = 9000  # raises dataclasses.FrozenInstanceError (a subclass of AttributeError)

# Hashable, so instances can be dict keys or set elements
configs = {Config(): "default", Config("0.0.0.0", 9000): "custom"}
print(configs[Config()])  # default
```
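Two follow-ups to the frozen example: catching the exception an assignment raises, and deriving a modified copy with `dataclasses.replace()`. This is a minimal sketch that redefines `Config` so the block runs on its own; `dev_config` is a hypothetical name.

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Config:
    host: str = "localhost"
    port: int = 8080
    debug: bool = False

config = Config()

try:
    config.port = 9000
except FrozenInstanceError as exc:  # subclass of AttributeError
    print(type(exc).__name__)       # FrozenInstanceError

# replace() builds a new instance with the selected fields changed
dev_config = replace(config, port=9000, debug=True)
print(dev_config)   # Config(host='localhost', port=9000, debug=True)
print(config.port)  # 8080, the original is untouched
```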
Slots for Memory Efficiency
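Use slots=True (Python 3.10+) to reduce per-instance memory usage:
```python
from dataclasses import dataclass

@dataclass(slots=True)
class Vector:
    x: float
    y: float

v = Vector(1.0, 2.0)
print(v.x, v.y)  # 1.0 2.0
# Without slots, each instance carries its own __dict__ for attributes.
# With slots, attributes live in fixed slots and no __dict__ is created,
# which saves memory across many instances (exact savings vary by Python version).
```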
`__post_init__` for Computed Fields
Use `__post_init__` to compute derived values or validate data. The generated `__init__` calls it after all fields have been assigned:
```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    duration_hours: float = field(init=False)

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("End must be after start")
        delta = self.end - self.start
        self.duration_hours = delta.total_seconds() / 3600

event = Event("Meeting", datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11))
print(event.duration_hours)  # 2.0
```
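One consequence of this design: `__post_init__` runs only during construction, so a computed field is not refreshed, and validation is not repeated, when a field is reassigned later. The sketch below reuses `Event` to show this and, as one possible alternative, computes the value on demand with a `@property` in a hypothetical `LiveEvent` class.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    duration_hours: float = field(init=False)

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("End must be after start")
        self.duration_hours = (self.end - self.start).total_seconds() / 3600

event = Event("Meeting", datetime(2024, 1, 1, 9), datetime(2024, 1, 1, 11))

# __post_init__ ran once, inside the generated __init__
event.end = datetime(2024, 1, 1, 12)
print(event.duration_hours)  # 2.0, now stale

# Validation is skipped for later assignments as well
event.end = datetime(2024, 1, 1, 8)  # no error raised

@dataclass
class LiveEvent:
    name: str
    start: datetime
    end: datetime

    @property
    def duration_hours(self) -> float:
        # Recomputed on every access, so it never goes stale
        return (self.end - self.start).total_seconds() / 3600
```

The property trades a small per-access cost for always-current values; `__post_init__` remains the better fit for one-time validation.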
Inheritance
Dataclasses support inheritance cleanly:
```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    species: str

@dataclass
class Pet(Animal):
    owner: str
    vaccinated: bool = True

pet = Pet("Rex", "Dog", "Alice")
print(pet)  # Pet(name='Rex', species='Dog', owner='Alice', vaccinated=True)
```
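Warning: frozen and non-frozen dataclasses cannot be mixed in one inheritance chain; Python raises TypeError when the subclass is defined. Field order is another pitfall: once a base class field has a default, subclass fields without defaults raise TypeError unless they are made keyword-only (kw_only=True, Python 3.10+).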
Comparison Table
| Feature | Dataclass | NamedTuple | attrs | Regular Class |
|---|---|---|---|---|
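| Boilerplate | Low | Low | Medium | High |
| Mutable | Yes (unless `frozen=True`) | No | Yes (unless frozen) | Yes |
| Hashable | Optional (`frozen=True` or `unsafe_hash=True`) | Yes, if all fields are hashable | Optional | Yes, by identity (unless `__eq__` is defined without `__hash__`) |
| Type hints | Required | Required (class syntax) | Optional | Optional |
| Inheritance | Yes | Limited | Yes | Yes |
| Memory per instance | Moderate (lower with `slots=True`) | Low | Low with slots (the `attrs.define` default) | Moderate |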
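A small sketch of the mutability and hashability rows, using hypothetical `PointNT`, `PointDC`, and `PointPlain` classes built on `typing.NamedTuple`, `@dataclass`, and a plain class respectively. attrs is left out so the block needs no third-party install.

```python
from dataclasses import dataclass
from typing import NamedTuple

class PointNT(NamedTuple):
    x: float
    y: float

@dataclass
class PointDC:
    x: float
    y: float

class PointPlain:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

nt = PointNT(1.0, 2.0)
dc = PointDC(1.0, 2.0)
plain = PointPlain(1.0, 2.0)

# NamedTuple: immutable and tuple-based, so unpacking and indexing work
try:
    nt.x = 5.0
except AttributeError:
    print("NamedTuple fields are read-only")
x, y = nt
print(nt[0])  # 1.0

# Dataclass: mutable by default, unhashable unless frozen or unsafe_hash
dc.x = 5.0
print(dc)  # PointDC(x=5.0, y=2.0)

# Regular class: hashable by identity, and equality is identity too
print(hash(plain) == hash(plain))                    # True
print(PointPlain(1.0, 2.0) == PointPlain(1.0, 2.0))  # False
```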
Real-World Example: API Response Model
```python
from dataclasses import dataclass, field
from typing import List, Optional
from datetime import datetime

@dataclass
class APIResponse:
    status: int
    data: List[dict]
    message: str = "OK"
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)

    @property
    def is_success(self) -> bool:
        return 200 <= self.status < 300

    def add_metadata(self, key: str, value):
        self.metadata[key] = value

response = APIResponse(200, [{"id": 1}], "Users retrieved")
response.add_metadata("page", 1)
print(response.is_success)  # True
```
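API models like this are often serialized. One option, sketched below, is `dataclasses.asdict()` followed by `json.dumps()`. Converting `datetime` with `default=str` is an assumption for illustration, not the only choice (`isoformat()` is another).

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import List

@dataclass
class APIResponse:
    status: int
    data: List[dict]
    message: str = "OK"
    timestamp: datetime = field(default_factory=datetime.now)
    metadata: dict = field(default_factory=dict)

    @property
    def is_success(self) -> bool:
        return 200 <= self.status < 300

response = APIResponse(200, [{"id": 1}], "Users retrieved")

# asdict() recursively copies fields into plain dicts and lists;
# is_success is a property, not a field, so it is not included
payload = asdict(response)
print(sorted(payload))  # ['data', 'message', 'metadata', 'status', 'timestamp']

# datetime is not JSON-serializable, so default=str converts it to a string here
print(json.dumps(payload, default=str))
```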
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
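| Using a mutable default | `grades: List = []` raises ValueError when the class is defined | Use `field(default_factory=list)` |
| Forgetting `init=False` | A computed field becomes a required `__init__` parameter | Set `init=False` for derived fields |
| Mixing frozen and mutable | `frozen=True` is shallow: a list field can still be mutated, and `hash()` raises TypeError | Use immutable types (tuple, frozenset), or exclude the field from hashing with `field(hash=False)` |
| Overcomplicating | Using a dataclass for behavior-heavy classes | Use a regular class if methods dominate |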
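The first and third mistakes fail in ways that are easy to reproduce. This sketch uses hypothetical `BadStudent`, `Student`, `Team`, and `FrozenTeam` classes to show the error each mistake produces and the corresponding fix.

```python
from dataclasses import dataclass, field
from typing import List

# Mistake 1: a bare list default is rejected when the class is defined
try:
    @dataclass
    class BadStudent:
        name: str
        grades: List[float] = []
except ValueError as exc:
    print(exc)  # the message suggests using default_factory

@dataclass
class Student:
    name: str
    grades: List[float] = field(default_factory=list)

a, b = Student("A"), Student("B")
a.grades.append(4.0)
print(b.grades)  # [], since each instance gets its own list

# Mistake 3: frozen=True is shallow, and a list field breaks hashing
@dataclass(frozen=True)
class Team:
    name: str
    members: list

team = Team("core", ["alice"])
team.members.append("bob")  # allowed: the list object itself is mutable
try:
    hash(team)
except TypeError:
    print("unhashable: members is a list")

# A tuple keeps the instance deeply immutable and hashable
@dataclass(frozen=True)
class FrozenTeam:
    name: str
    members: tuple

print(hash(FrozenTeam("core", ("alice",))) == hash(FrozenTeam("core", ("alice",))))  # True
```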
Key Takeaways
- `@dataclass` auto-generates `__init__`, `__repr__`, and `__eq__`, removing most of the boilerplate of a plain data-holding class
- Always use `field(default_factory=...)` for mutable defaults (lists, dicts, sets)
- `frozen=True` creates immutable, hashable instances, which suits config objects well
- `__post_init__` enables computed fields and validation without writing a custom `__init__`
- Use `slots=True` (Python 3.10+) for memory-efficient instances when storing many objects
- Prefer dataclasses over NamedTuple for mutable data containers