Python Sets — Mathematical Operations & Patterns
Sets are unordered collections of unique elements. They support mathematical set operations and are invaluable for deduplication and membership testing.
Learning Objectives
- Perform union, intersection, difference, and symmetric difference
- Use frozenset for immutable sets and dict keys
- Apply sets for deduplication and fast lookups
- Understand set performance characteristics
Set Operations
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7, 8}
# Union — all elements from both sets
print(a | b) # {1, 2, 3, 4, 5, 6, 7, 8}
print(a.union(b)) # Same result
# Intersection — common elements
print(a & b) # {4, 5}
print(a.intersection(b))
# Difference — elements in a but not b
print(a - b) # {1, 2, 3}
print(a.difference(b))
# Symmetric difference — elements in either but not both
print(a ^ b) # {1, 2, 3, 6, 7, 8}
print(a.symmetric_difference(b))
Subset and Superset
small = {1, 2}
large = {1, 2, 3, 4, 5}
print(small.issubset(large)) # True
print(small <= large) # True
print(large.issuperset(small)) # True
print(large >= small) # True
# Proper subset (strictly smaller)
print(small < large) # True
print({1, 2} < {1, 2}) # False (not strict)
frozenset — Immutable Sets
# frozenset can be used as dict key or set element
fs = frozenset([1, 2, 3])
# Use as dict key
permissions = {
frozenset(["read"]): "viewer",
frozenset(["read", "write"]): "editor",
frozenset(["read", "write", "admin"]): "admin"
}
user_perms = frozenset(["read", "write"])
role = permissions.get(user_perms, "unknown")
Practical Patterns
# Deduplication preserving order
items = ["apple", "banana", "apple", "cherry", "banana"]
unique = list(dict.fromkeys(items)) # ['apple', 'banana', 'cherry']
# Find common elements between lists
list_a = [1, 2, 3, 4, 5]
list_b = [4, 5, 6, 7, 8]
common = list(set(list_a) & set(list_b)) # [4, 5]
# Remove None values
data = [1, None, 2, None, 3]
clean = [x for x in data if x is not None]
# Check if all items in list are unique
def all_unique(items):
return len(items) == len(set(items))
# Set-based fast lookup
valid_codes = {200, 201, 204, 301, 302, 304}
status = 200
if status in valid_codes:
print("Success")
Performance Comparison
import time
large_list = list(range(100000))
large_set = set(range(100000))
# Membership test: set is dramatically faster
start = time.time()
99999 in large_list
list_time = time.time() - start # ~2ms
start = time.time()
99999 in large_set
set_time = time.time() - start # ~0.001ms
Key Takeaways
- Sets are O(1) for membership testing vs O(n) for lists
- Use
|,&,-,^for set operations frozensetis hashable — use as dict keys- Sets automatically remove duplicates
- Use sets for fast deduplication and membership checks