Python Sets — Understanding Unordered Collections
Learning Objectives
By the end of this tutorial, you will be able to:
- Create sets using literals, constructors, and comprehensions
- Perform all mathematical set operations: union, intersection, difference, and symmetric difference
- Test subset, superset, and disjoint relationships between sets
- Modify sets using add, update, remove, discard, pop, and clear
- Understand frozenset and when to use immutable sets
- Write set comprehensions with conditions
- Recognize the performance advantages of sets for membership testing
- Apply sets to solve real-world programming problems like deduplication and data validation
What Are Sets?
A set is an unordered collection of unique, hashable elements. Unlike lists or tuples, sets do not preserve insertion order and automatically eliminate duplicates.
# A simple set
fruits = {"apple", "banana", "cherry"}
print(fruits) # {'banana', 'cherry', 'apple'} (order may vary)
print(type(fruits)) # <class 'set'>
Key Characteristics
| Characteristic | Description |
|---|---|
| Unordered | Elements have no defined position |
| Mutable | Can add and remove elements (frozenset is immutable) |
| No duplicates | Each element appears at most once |
| Heterogeneous | Can contain different data types |
| Unindexable | Cannot access elements by index or slice |
| Hashable elements | Elements must be hashable (immutable types) |
| Fast lookup | O(1) average time for membership testing |
# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 3, 3}
print(numbers) # {1, 2, 3}
# Mixed types work
mixed = {1, "hello", 3.14, True}
print(mixed) # {1, 3.14, 'hello'} (order may vary; True == 1, so True counts as a duplicate of 1)
Hashable vs Unhashable Elements
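Elements stored in a set must be hashable: they need a hash value that never changes during their lifetime. In practice this means immutable built-in types work, such as strings, numbers, frozensets, and tuples (as long as every item inside the tuple is hashable). Lists, dictionaries, and regular sets do not.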
# Valid set elements
valid = {1, "hello", (1, 2, 3), frozenset({4, 5})}
# Invalid — uncommenting raises TypeError
# invalid = {1, [2, 3]} # lists are unhashable
# invalid = {1, {"a": 1}} # dicts are unhashable
# invalid = {1, {2, 3}} # sets are unhashable
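Hashability depends on what an object contains, not just its outer type: a tuple is hashable only if everything inside it is. The sketch below uses a hypothetical helper, is_hashable, that simply tries hash() and reports whether it succeeded.

```python
def is_hashable(value):
    """Return True if value could be stored in a set (hypothetical helper)."""
    try:
        hash(value)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))       # True
print(is_hashable((1, [2, 3])))     # False: the tuple contains a list
print(is_hashable(frozenset({4})))  # True
print(is_hashable({4}))             # False: regular sets are mutable

# Adding an unhashable value to a set raises TypeError
s = set()
try:
    s.add((1, [2, 3]))
except TypeError as exc:
    print(f"TypeError: {exc}")  # TypeError: unhashable type: 'list'
```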
Creating Sets
1. Set Literal
Use curly braces — but not for empty sets (that creates a dictionary):
# Set with elements
colors = {"red", "green", "blue"}
print(colors) # {'red', 'green', 'blue'}
# Empty set — must use set(), NOT {}
empty_set = set()
empty_dict = {} # This is a dictionary, not a set!
print(type(empty_set)) # <class 'set'>
print(type(empty_dict)) # <class 'dict'>
2. set() Constructor
Convert any iterable into a set:
# From a list
from_list = set([1, 2, 3, 2, 1])
print(from_list) # {1, 2, 3}
# From a tuple
from_tuple = set((10, 20, 30))
print(from_tuple) # {10, 20, 30}
# From a string (each character becomes an element)
from_string = set("hello")
print(from_string) # {'h', 'e', 'l', 'o'}
# From a range
from_range = set(range(5))
print(from_range) # {0, 1, 2, 3, 4}
# From a dictionary (keys become elements)
from_dict = set({"a": 1, "b": 2, "c": 3})
print(from_dict) # {'a', 'b', 'c'}
3. Set Comprehension
Create sets using comprehension syntax:
# Squares of 0-9
squares = {x**2 for x in range(10)}
print(squares) # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
# Even numbers only
evens = {x for x in range(20) if x % 2 == 0}
print(evens) # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}
# Transform and filter
processed = {x.upper() for x in ["hello", "world", "python"] if len(x) > 4}
print(processed) # {'HELLO', 'WORLD', 'PYTHON'} (all three words are longer than 4 characters)
4. From Strings, Lists, Tuples
# Characters from a string (duplicates removed)
word = "mississippi"
unique_chars = set(word)
print(unique_chars) # {'m', 'i', 's', 'p'}
# Deduplicate a list
names = ["Alice", "Bob", "Alice", "Charlie", "Bob"]
unique_names = set(names)
print(unique_names) # {'Alice', 'Bob', 'Charlie'}
# Convert back to sorted list
unique_sorted = sorted(set(names))
print(unique_sorted) # ['Alice', 'Bob', 'Charlie']
Set Operations
Sets support the standard operations of mathematical set theory. Python provides both operator syntax and method syntax for most of them.
Visual Reference
Set A = {1, 2, 3, 4} Set B = {3, 4, 5, 6}
Union (A | B):
{1, 2, 3, 4, 5, 6}
Intersection (A & B):
{3, 4}
Difference (A - B):
{1, 2}
Symmetric Difference (A ^ B):
{1, 2, 5, 6}
Union (| or union())
Combines all elements from both sets:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Operator syntax
print(A | B) # {1, 2, 3, 4, 5, 6}
# Method syntax
print(A.union(B)) # {1, 2, 3, 4, 5, 6}
# Union with multiple sets
C = {7, 8}
print(A | B | C) # {1, 2, 3, 4, 5, 6, 7, 8}
# Union with other iterables
print(A.union([5, 6, 7])) # {1, 2, 3, 4, 5, 6, 7}
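The operator and method forms differ in what they accept. The union() method takes any number of iterables, while the | operator only works when both operands are sets or frozensets. A small sketch using sample values:

```python
A = {1, 2, 3, 4}

# The method accepts any iterables, several at once
print(sorted(A.union([5, 6], (7,), range(8, 10))))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# The | operator requires sets (or frozensets) on both sides
try:
    A | [5, 6]
except TypeError as exc:
    print(f"TypeError: {exc}")

# Convert the other iterable first if you prefer the operator form
print(sorted(A | set([5, 6])))  # [1, 2, 3, 4, 5, 6]
```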
Intersection (& or intersection())
Elements present in both sets:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Operator syntax
print(A & B) # {3, 4}
# Method syntax
print(A.intersection(B)) # {3, 4}
# Intersection with multiple sets
C = {4, 5, 6, 7}
print(A & B & C) # {4}
# Intersection with other iterables
print(A.intersection([3, 4, 5, 6])) # {3, 4}
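When the number of sets is only known at runtime (for example, a list of sets), you can unpack the list into set.intersection(). The tag data below is made up for illustration, and the list must contain at least one set.

```python
# Sample data: tags attached to three articles
article_tags = [
    {"python", "sets", "tutorial"},
    {"python", "sets", "performance"},
    {"python", "sets", "interview"},
]

# Tags shared by every article: unpack the list into set.intersection
common = set.intersection(*article_tags)
print(sorted(common))  # ['python', 'sets']

# Tags used anywhere: start from an empty set and union everything in
all_tags = set().union(*article_tags)
print(len(all_tags))  # 5
```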
Difference (- or difference())
Elements in the first set but not in the second:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Operator syntax
print(A - B) # {1, 2}
# Method syntax
print(A.difference(B)) # {1, 2}
# Reverse difference
print(B - A) # {5, 6}
# Difference with multiple sets
C = {2, 3}
print(A - B - C) # {1}
# Difference with other iterables
print(A.difference([3, 4, 5])) # {1, 2}
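Two further details about difference: the method can subtract several iterables in one call, and the operation is not commutative. The in-place variant, difference_update(), modifies the set instead of returning a new one. A short sketch:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {2, 3}

# One call, several iterables: same result as A - B - C
print(A.difference(B, C))  # {1}

# Order matters: A - B is not the same as B - A
print(A - B == B - A)  # False

# difference_update() removes elements from A itself
A.difference_update([1])
print(sorted(A))  # [2, 3, 4]
```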
Symmetric Difference (^ or symmetric_difference())
Elements in either set but not in both:
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Operator syntax
print(A ^ B) # {1, 2, 5, 6}
# Method syntax
print(A.symmetric_difference(B)) # {1, 2, 5, 6}
# Symmetric difference update (in-place)
A ^= B
print(A) # {1, 2, 5, 6}
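Symmetric difference can also be expressed with the other operations, which helps when reasoning about it: it is the union minus the intersection. Chaining ^ across more than two sets keeps the elements that appear in an odd number of them. The third set below is sample data.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Three equivalent ways to get "in either set, but not both"
via_operator = A ^ B
via_union = (A | B) - (A & B)
via_differences = (A - B) | (B - A)
print(via_operator == via_union == via_differences)  # True
print(sorted(via_operator))  # [1, 2, 5, 6]

# With three sets, ^ keeps elements present in an odd number of them
C = {1, 5, 7}
print(sorted(A ^ B ^ C))  # [2, 6, 7]
```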
Visual Summary of Operations
A = {1, 2, 3, 4} B = {3, 4, 5, 6}
┌─────────────────────────────────────────────┐
│ Operation │ Result │
├─────────────────────────────────────────────┤
│ A | B (union) │ {1,2,3,4,5,6} │
│ A & B (intersection) │ {3,4} │
│ A - B (difference) │ {1,2} │
│ B - A (difference) │ {5,6} │
│ A ^ B (sym diff) │ {1,2,5,6} │
└─────────────────────────────────────────────┘
Subset (<= or issubset())
Tests if all elements of one set are in another:
A = {1, 2}
B = {1, 2, 3, 4}
# Operator syntax
print(A <= B) # True
print(B <= A) # False
# Method syntax
print(A.issubset(B)) # True
# A set is a subset of itself
print(A <= A) # True
print(A.issubset(A)) # True
# Proper subset (<)
print(A < B) # True
print(A < A) # False (not proper subset of itself)
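A few edge cases of the subset relation are worth seeing side by side: equal sets, the empty set, and the difference between the issubset() method (which accepts any iterable) and the <= operator (which needs sets on both sides). The values are illustrative.

```python
A = {1, 2}
B = {1, 2, 3, 4}
C = {1, 2}  # equal to A

# Equal sets are subsets of each other, but neither is a proper subset
print(A <= C, A < C)  # True False

# The empty set is a subset of every set
print(set() <= A)  # True

# A proper subset is a subset that is not equal to the other set
print((A < B) == (A <= B and A != B))  # True

# issubset() accepts any iterable; <= needs a set on both sides
print(A.issubset([1, 2, 3]))  # True
try:
    A <= [1, 2, 3]
except TypeError as exc:
    print(f"TypeError: {exc}")
```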
Superset (>= or issuperset())
Tests if a set contains all elements of another:
A = {1, 2, 3, 4}
B = {1, 2}
# Operator syntax
print(A >= B) # True
print(B >= A) # False
# Method syntax
print(A.issuperset(B)) # True
# Proper superset (>)
print(A > B) # True
print(A > A) # False
Disjoint (isdisjoint())
Returns True if two sets have no elements in common:
A = {1, 2, 3}
B = {4, 5, 6}
C = {3, 4, 5}
print(A.isdisjoint(B)) # True (no common elements)
print(A.isdisjoint(C)) # False (share element 3)
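isdisjoint() answers the same question as "is the intersection empty?", but it accepts any iterable, and in CPython it can stop at the first shared element instead of building the full intersection. A small sketch:

```python
A = {1, 2, 3}
C = {3, 4, 5}

# Same answer as checking for an empty intersection
print(A.isdisjoint(C) == (not (A & C)))  # True

# Works with lists, ranges, and generators too
print(A.isdisjoint([4, 5, 6]))            # True
print(A.isdisjoint(n for n in range(3)))  # False: 1 and 2 are shared
```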
Modifying Sets
Adding Elements
fruits = {"apple", "banana"}
# add() — adds a single element
fruits.add("cherry")
print(fruits) # {'apple', 'banana', 'cherry'}
# add() does nothing if element already exists
fruits.add("apple")
print(fruits) # {'apple', 'banana', 'cherry'} (no change)
# update() — adds multiple elements from an iterable
fruits.update(["date", "elderberry"])
print(fruits) # {'apple', 'banana', 'cherry', 'date', 'elderberry'}
# update() with sets
fruits.update({"fig", "grape"})
print(fruits) # {'apple', 'banana', 'cherry', 'date', 'elderberry', 'fig', 'grape'}
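A common surprise with update() is that it iterates over its argument, so passing a bare string adds its individual characters. update() also accepts several iterables in one call. The tag names below are sample data.

```python
tags = {"python"}

# A bare string is iterated character by character
tags.update("sql")
print(sorted(tags))  # ['l', 'python', 'q', 's']

# To add whole strings, use add() or wrap them in a list or tuple
tags = {"python"}
tags.add("sql")
tags.update(["rust"], ("go",))  # several iterables in one call
print(sorted(tags))  # ['go', 'python', 'rust', 'sql']
```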
Removing Elements
colors = {"red", "green", "blue", "yellow"}
# remove() — removes element, raises KeyError if not found
colors.remove("red")
print(colors) # {'green', 'blue', 'yellow'}
# This raises KeyError:
# colors.remove("purple")
# discard() — removes element, does nothing if not found
colors.discard("green")
print(colors) # {'blue', 'yellow'}
colors.discard("purple") # No error!
print(colors) # {'blue', 'yellow'}
# pop() — removes and returns an arbitrary element
popped = colors.pop()
print(popped) # 'blue' or 'yellow' (which one is arbitrary)
print(colors) # the element that was not popped, e.g. {'yellow'}
# clear() — removes all elements
colors.clear()
print(colors) # set()
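Because pop() removes an arbitrary element, it suits "process everything, in any order" loops. Calling it on an empty set raises KeyError, much like remove() on a missing element. The task names are placeholders.

```python
queue = {"task-a", "task-b", "task-c"}

# Drain the set: pop until it is empty
processed = []
while queue:
    processed.append(queue.pop())

print(len(processed))  # 3
print(queue)           # set()

# pop() on an empty set raises KeyError
try:
    queue.pop()
except KeyError as exc:
    print(f"KeyError: {exc}")
```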
remove() vs discard()
| Method | Element exists | Element missing |
|---|---|---|
| remove() | Removes element | Raises KeyError |
| discard() | Removes element | Does nothing |
s = {1, 2, 3}
# Use remove() when missing element is an error
s.remove(2)
# Use discard() when missing element is expected
s.discard(99) # No error
Set Update Operations
A = {1, 2, 3}
B = {3, 4, 5}
# |= is the in-place form of union, like update()
A |= B
print(A) # {1, 2, 3, 4, 5}
# &= is shorthand for intersection_update
A = {1, 2, 3, 4, 5}
A &= {2, 3, 6}
print(A) # {2, 3}
# -= is shorthand for difference_update
A = {1, 2, 3, 4}
A -= {3, 4, 5}
print(A) # {1, 2}
# ^= is shorthand for symmetric_difference_update
A = {1, 2, 3}
A ^= {2, 3, 4}
print(A) # {1, 4}
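The in-place operators mutate the existing set object, while the plain operators build a new set. This matters when another name refers to the same set. The named update methods, unlike the operators, also accept any iterable. A minimal sketch:

```python
A = {1, 2, 3}
alias = A        # both names refer to the same set object

A |= {4}         # in-place: mutates the shared object
print(alias)     # {1, 2, 3, 4}

A = A | {5}      # rebinding: builds a new set and points A at it
print(alias)     # {1, 2, 3, 4} (unchanged)
print(A)         # {1, 2, 3, 4, 5}

# Named methods accept any iterable, not just sets
A.intersection_update([1, 2, 5, 99])
print(A)         # {1, 2, 5}
```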
Frozenset
A frozenset is an immutable version of set. Once created, it cannot be modified — making it hashable and usable as a dictionary key or element of another set.
# Creating a frozenset
fs = frozenset([1, 2, 3, 4])
print(fs) # frozenset({1, 2, 3, 4})
print(type(fs)) # <class 'frozenset'>
# frozenset is hashable — can be a dictionary key
locations = {
frozenset({"office", "home"}): "commute",
frozenset({"gym", "park"}): "exercise",
}
print(locations[frozenset({"office", "home"})]) # 'commute'
# frozenset can be an element of a set
nested = {frozenset({1, 2}), frozenset({3, 4})}
print(nested) # {frozenset({1, 2}), frozenset({3, 4})}
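Since a frozenset ignores element order and compares equal to a set with the same contents, it works well as a key for unordered pairs or as a snapshot of a mutable set. The place names and distance below are sample data.

```python
# A frozenset compares equal to a set with the same elements
print(frozenset({1, 2}) == {1, 2})  # True

# Element order does not matter, so {"A", "B"} and {"B", "A"} are the same key
distances = {}
distances[frozenset({"A", "B"})] = 5
print(distances[frozenset({"B", "A"})])  # 5

# Freeze a mutable set to get an independent, hashable snapshot
visited = {"home", "office"}
snapshot = frozenset(visited)
visited.add("gym")    # the snapshot is not affected
print(len(snapshot))  # 2
```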
Frozenset Operations
Frozensets support all read-only set operations but not mutation operations:
fs1 = frozenset({1, 2, 3})
fs2 = frozenset({3, 4, 5})
# These work (return new frozensets)
print(fs1 | fs2) # frozenset({1, 2, 3, 4, 5})
print(fs1 & fs2) # frozenset({3})
print(fs1 - fs2) # frozenset({1, 2})
print(fs1 ^ fs2) # frozenset({1, 2, 4, 5})
print(fs1 <= fs2) # False
print(fs1.issubset(fs2)) # False
# These raise AttributeError:
# fs1.add(4)
# fs1.remove(1)
# fs1.update([5, 6])
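When a set and a frozenset are combined with a binary operator, the result takes the type of the left operand. Augmented operators such as |= still work on a name bound to a frozenset, but they rebind the name to a new frozenset rather than changing the original object. A short sketch:

```python
fs = frozenset({1, 2})
s = {2, 3}

# The result type follows the left operand
print(type(fs | s).__name__)  # frozenset
print(type(s | fs).__name__)  # set

# |= on a frozenset creates a new frozenset; the original is untouched
original = fs
fs |= {9}
print(sorted(original))  # [1, 2]
print(sorted(fs))        # [1, 2, 9]
```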
When to Use frozenset
| Scenario | Use frozenset? |
|---|---|
| Set needs to be a dictionary key | Yes |
| Set needs to be an element of another set | Yes |
| Set must not be modified after creation | Yes |
| Need to add/remove elements | No — use regular set |
| Need fast, immutable membership testing | Yes |
Set Comprehensions
Set comprehensions create sets using a concise expression, similar to list comprehensions but with curly braces.
Basic Syntax
# {expression for item in iterable}
squares = {x**2 for x in range(10)}
print(squares) # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
With Conditions
# Filter: only even numbers
evens = {x for x in range(20) if x % 2 == 0}
print(evens) # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}
# Transform and filter
long_words = {word.upper() for word in ["hello", "hi", "world", "python", "a"]
if len(word) > 2}
print(long_words) # {'HELLO', 'WORLD', 'PYTHON'}
Nested Iteration
# Ordered pairs of distinct values (the Cartesian product minus pairs where x == y)
pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs)
# {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}
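The nested comprehension above can be cross-checked against the standard library: filtering out x == y leaves exactly the 2-element permutations, and dropping the filter gives the full Cartesian product.

```python
from itertools import permutations, product

# Ordered pairs of distinct values, as in the comprehension above
pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs == set(permutations(range(3), 2)))   # True

# Without the filter: the full Cartesian product
full = {(x, y) for x in range(3) for y in range(3)}
print(full == set(product(range(3), repeat=2)))  # True
print(len(full))                                 # 9
```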
Practical Examples
# Extract unique file extensions
files = ["report.pdf", "data.csv", "image.png", "backup.pdf", "notes.csv"]
extensions = {f.split(".")[-1] for f in files}
print(extensions) # {'pdf', 'csv', 'png'}
# Unique first letters
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
first_letters = {word[0] for word in words}
print(first_letters) # {'a', 'b', 'c'}
# Unique word lengths
sentence = "the quick brown fox"
unique_lengths = {len(word) for word in sentence.split()}
print(unique_lengths) # {3, 5}
Performance
Sets are optimized for membership testing and mathematical operations. Understanding their performance characteristics helps you choose the right data structure.
Time Complexity
| Operation | Set | List | Dict |
|---|---|---|---|
| Membership test (x in s) | O(1) average | O(n) | O(1) average |
| Add element | O(1) average | O(1)* | O(1) average |
| Remove element | O(1) average | O(n) | O(1) average |
| Union | O(n + m) | — | — |
| Intersection | O(min(n, m)) | — | — |
| Iteration | O(n) | O(n) | O(n) |
* amortized for append
Membership Testing: Set vs List
import time
# Create test data
large_list = list(range(1_000_000))
large_set = set(range(1_000_000))
# Test membership: the list is scanned element by element
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_list
list_time = time.perf_counter() - start
# Test membership: the set does a hash lookup
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_set
set_time = time.perf_counter() - start
print(f"List: {list_time:.4f}s")
print(f"Set: {set_time:.4f}s")
# 999_999 is the last element, so every list check scans all 1,000,000 items.
# The set is typically faster by several orders of magnitude; the exact ratio varies by machine.
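For more reliable numbers, the standard timeit module repeats a statement and times it with a high-resolution clock (temporarily disabling garbage collection). This sketch uses smaller sizes than the example above so it finishes quickly; the sizes and repeat count are arbitrary choices.

```python
import timeit

data_list = list(range(100_000))
data_set = set(data_list)
target = 99_999  # last element: the worst case for a list scan

# timeit.timeit calls the lambda `number` times and returns total seconds
list_time = timeit.timeit(lambda: target in data_list, number=200)
set_time = timeit.timeit(lambda: target in data_set, number=200)

print(f"List: {list_time:.4f}s")
print(f"Set:  {set_time:.6f}s")
print(f"Set was roughly {list_time / set_time:,.0f}x faster on this run")
```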
Memory Considerations
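Sets use more memory than lists because their hash table keeps spare slots and stores each element's hash. Use sets when fast lookup matters more than memory. Note that sys.getsizeof() reports only the container itself, not the objects it references:
import sys
lst = list(range(1000))
s = set(range(1000))
print(f"List size: {sys.getsizeof(lst):,} bytes") # ~8,056 bytes on 64-bit CPython
print(f"Set size: {sys.getsizeof(s):,} bytes") # roughly 32 KB on 64-bit CPython (varies by version)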
Real-World Use Cases
1. Removing Duplicates
# Deduplicate a list while preserving order
def deduplicate(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
users = ["Alice", "Bob", "Alice", "Charlie", "Bob", "David"]
print(deduplicate(users))
# ['Alice', 'Bob', 'Charlie', 'David']
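For hashable items there is also a common one-line alternative to the loop above: dictionary keys are unique and, since Python 3.7, keep insertion order, so dict.fromkeys() deduplicates while preserving first-seen order.

```python
users = ["Alice", "Bob", "Alice", "Charlie", "Bob", "David"]

# dict keys are unique and keep insertion order (Python 3.7+)
unique_users = list(dict.fromkeys(users))
print(unique_users)  # ['Alice', 'Bob', 'Charlie', 'David']
```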
2. Membership Testing
# Validate user input against allowed values
VALID_STATUSES = {"active", "inactive", "pending", "suspended"}
def validate_status(status):
    if status not in VALID_STATUSES:
        raise ValueError(f"Invalid status: {status}")
    return status
print(validate_status("active")) # 'active'
# validate_status("deleted") # Raises ValueError
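When validating many values at once, a set difference reports every offending value in one step, and issuperset() gives a plain yes/no answer. The helper name invalid_statuses and the submitted values are hypothetical.

```python
VALID_STATUSES = {"active", "inactive", "pending", "suspended"}

def invalid_statuses(statuses):
    """Return submitted statuses that are not allowed (hypothetical helper)."""
    return set(statuses) - VALID_STATUSES

submitted = ["active", "deleted", "pending", "archived", "deleted"]
print(sorted(invalid_statuses(submitted)))  # ['archived', 'deleted']

# issuperset() accepts any iterable and returns True/False
print(VALID_STATUSES.issuperset(submitted))  # False
```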
3. Finding Common Elements
# Students enrolled in both courses
course_a = {"Alice", "Bob", "Charlie", "Diana"}
course_b = {"Bob", "Diana", "Eve", "Frank"}
both_courses = course_a & course_b
print(both_courses) # {'Bob', 'Diana'}
either_course = course_a | course_b
print(either_course) # {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}
only_a = course_a - course_b
print(only_a) # {'Alice', 'Charlie'}
4. Venn Diagram Analysis
# Analyzing overlapping categories
frontend = {"HTML", "CSS", "JavaScript", "React"}
backend = {"Python", "SQL", "JavaScript", "Docker"}
devops = {"Docker", "Kubernetes", "AWS", "Python"}
# Skills unique to each area
only_frontend = frontend - backend - devops
only_backend = backend - frontend - devops
only_devops = devops - frontend - backend
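# Skills shared across all three
common_all = frontend & backend & devops
print(f"Frontend only: {only_frontend}")
print(f"Backend only: {only_backend}")
print(f"DevOps only: {only_devops}")
print(f"All three: {common_all}")
print(f"Backend and DevOps: {backend & devops}")
# Frontend only: {'HTML', 'CSS', 'React'} (order may vary)
# Backend only: {'SQL'}
# DevOps only: {'Kubernetes', 'AWS'} (order may vary)
# All three: set() (no skill appears in all three areas)
# Backend and DevOps: {'Python', 'Docker'} (order may vary)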
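A related question is which skills appear in at least two of the areas. One way to answer it, sketched with the same sample data, is to take the union of the pairwise intersections.

```python
frontend = {"HTML", "CSS", "JavaScript", "React"}
backend = {"Python", "SQL", "JavaScript", "Docker"}
devops = {"Docker", "Kubernetes", "AWS", "Python"}

# In at least two areas: union of every pairwise intersection
shared = (frontend & backend) | (frontend & devops) | (backend & devops)
print(sorted(shared))  # ['Docker', 'JavaScript', 'Python']

# Every distinct skill across all areas
print(len(frontend | backend | devops))  # 9
```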
5. Data Validation and Cleaning
# Report missing required fields
required_fields = {"name", "email", "password"}
submitted_data = {"name": "Alice", "email": "alice@example.com"}
missing = required_fields - set(submitted_data.keys())
if missing:
print(f"Missing fields: {missing}")
# Missing fields: {'password'}
# Find duplicate IDs
user_ids = [101, 202, 303, 101, 202, 404]
seen = set()
duplicates = set()
for uid in user_ids:
    if uid in seen:
        duplicates.add(uid)
    seen.add(uid)
print(f"Duplicate IDs: {duplicates}")
# Duplicate IDs: {101, 202} (display order may vary)
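The same duplicate check can be written with collections.Counter, which tallies each ID; a set comprehension then keeps the IDs seen more than once. This trades the explicit loop for a second pass over the counts.

```python
from collections import Counter

user_ids = [101, 202, 303, 101, 202, 404]

# Count every ID, then keep those that occur more than once
counts = Counter(user_ids)
duplicates = {uid for uid, n in counts.items() if n > 1}
print(sorted(duplicates))  # [101, 202]
```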
6. Permission Checking
# Check what a user can access
admin_perms = {"read", "write", "delete", "admin"}
user_perms = {"read", "write"}
guest_perms = {"read"}
def check_access(user_role, required_permission):
    role_perms = {
        "admin": admin_perms,
        "user": user_perms,
        "guest": guest_perms,
    }
    return required_permission in role_perms.get(user_role, set())
print(check_access("admin", "delete")) # True
print(check_access("user", "delete")) # False
print(check_access("guest", "read")) # True
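When an action needs several permissions at once, a subset test checks them all in one expression, and a set difference reports exactly which ones are missing. The helpers has_all and missing_perms and the ROLE_PERMS mapping are hypothetical, mirroring the example above.

```python
ROLE_PERMS = {
    "admin": {"read", "write", "delete", "admin"},
    "user": {"read", "write"},
    "guest": {"read"},
}

def has_all(user_role, required):
    """True if the role grants every permission in `required` (hypothetical helper)."""
    return set(required) <= ROLE_PERMS.get(user_role, set())

def missing_perms(user_role, required):
    """Permissions in `required` that the role lacks (hypothetical helper)."""
    return set(required) - ROLE_PERMS.get(user_role, set())

print(has_all("user", {"read", "write"}))         # True
print(has_all("user", {"read", "delete"}))        # False
print(has_all("unknown", {"read"}))               # False
print(missing_perms("guest", ["read", "write"]))  # {'write'}
```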
Set vs Other Types
set vs frozenset
| Feature | set | frozenset |
|---|---|---|
| Mutable | Yes | No |
| Hashable | No | Yes |
| Can be dict key | No | Yes |
| Can be set element | No | Yes |
| Supports add/remove | Yes | No |
| Performance | Same | Same |
set vs list
| Feature | set | list |
|---|---|---|
| Ordered | No | Yes |
| Indexable | No | Yes |
| Duplicates | No | Yes |
| Membership test | O(1) average | O(n) |
| Mutable | Yes | Yes |
| Use case | Unique items, fast lookup | Ordered data, duplicates |
set vs dict
| Feature | set | dict |
|---|---|---|
| Stores | Keys only | Key-value pairs |
| Membership test (element or key) | O(1) average | O(1) average |
| Search by value | n/a | O(n) |
| Use case | Unique keys | Mappings |
set vs Counter
| Feature | set | Counter |
|---|---|---|
| Counts occurrences | No | Yes |
| Stores frequencies | No | Yes |
| Use case | Unique items | Frequency analysis |
from collections import Counter
# Counter tracks how many times each element appears
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)
print(word_counts) # Counter({'apple': 3, 'banana': 2, 'cherry': 1})
# Set only knows about existence
unique_words = set(words)
print(unique_words) # {'apple', 'banana', 'cherry'}
Common Mistakes
1. Empty Set Syntax
# WRONG — this creates a dictionary
empty = {}
# CORRECT — use set()
empty = set()
print(type({})) # <class 'dict'>
print(type(set())) # <class 'set'>
2. Unhashable Elements
# WRONG — lists cannot be set elements
# s = {[1, 2], [3, 4]} # TypeError: unhashable type: 'list'
# CORRECT — use tuples or frozensets
s = {(1, 2), (3, 4)}
print(s) # {(1, 2), (3, 4)}
# CORRECT — use frozenset for nested sets
s = {frozenset({1, 2}), frozenset({3, 4})}
print(s) # {frozenset({1, 2}), frozenset({3, 4})}
3. Modifying During Iteration
# WRONG — modifying set while iterating causes RuntimeError
s = {1, 2, 3, 4, 5}
# for x in s:
#     if x % 2 == 0:
#         s.remove(x) # RuntimeError: Set changed size during iteration
# CORRECT — iterate over a copy
s = {1, 2, 3, 4, 5}
for x in s.copy():
    if x % 2 == 0:
        s.remove(x)
print(s) # {1, 3, 5}
# CORRECT — use set comprehension
s = {1, 2, 3, 4, 5}
s = {x for x in s if x % 2 != 0}
print(s) # {1, 3, 5}
4. Order Not Guaranteed
# Sets do NOT guarantee insertion order
s = {3, 1, 4, 1, 5, 9, 2, 6}
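print(s) # Display order is an implementation detail; never rely on it
# If you need ordered unique elements, use dict.fromkeys():
# regular dicts preserve insertion order since Python 3.7
ordered = list(dict.fromkeys([3, 1, 4, 1, 5, 9, 2, 6]))
print(ordered) # [3, 1, 4, 5, 9, 2, 6]
# collections.OrderedDict.fromkeys() gives the same result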
5. Confusing | with ||
# WRONG — || is not a valid operator
# result = set1 || set2 # SyntaxError
# CORRECT — use | or union()
set1 = {1, 2}
set2 = {2, 3}
result = set1 | set2
result = set1.union(set2)
print(result) # {1, 2, 3}
Practice Exercises
Exercise 1: Find the Symmetric Difference
Write a function that finds elements present in exactly one of two sets.
def unique_to_each(set1, set2):
    """
    Return elements that are in set1 or set2, but not both.
    >>> unique_to_each({1, 2, 3}, {3, 4, 5})
    {1, 2, 4, 5}
    >>> unique_to_each({"a", "b"}, {"a", "b", "c"})
    {'c'}
    """
    return set1 ^ set2
# Test
print(unique_to_each({1, 2, 3}, {3, 4, 5})) # {1, 2, 4, 5}
print(unique_to_each({"a", "b"}, {"a", "b", "c"})) # {'c'}
Solution:
def unique_to_each(set1, set2):
    return set1 ^ set2
# Alternative: (set1 - set2) | (set2 - set1)
# Test cases
assert unique_to_each({1, 2, 3}, {3, 4, 5}) == {1, 2, 4, 5}
assert unique_to_each({"a", "b"}, {"a", "b", "c"}) == {"c"}
assert unique_to_each(set(), set()) == set()
print("All tests passed!")
Exercise 2: Group by First Letter
Write a function that groups words by their first letter using a dictionary of sets.
def group_by_first_letter(words):
    """
    Group words by their first letter.
    >>> group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
    {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}}
    """
    groups = {}
    for word in words:
        first = word[0]
        if first not in groups:
            groups[first] = set()
        groups[first].add(word)
    return groups
# Test
words = ["apple", "banana", "avocado", "blueberry", "cherry", "blueberry"]
print(group_by_first_letter(words))
# {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}, 'c': {'cherry'}}
Solution:
def group_by_first_letter(words):
    groups = {}
    for word in words:
        first = word[0]
        groups.setdefault(first, set()).add(word)
    return groups
# Test
result = group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
assert result == {"a": {"apple", "avocado"}, "b": {"banana", "blueberry"}}
print("All tests passed!")
Exercise 3: Set-Based Data Validation
Write a function that validates a dataset against constraints using set operations.
def validate_student_records(records):
    """
    Validate student records:
    - All required fields present
    - No duplicate student IDs
    - All grades are valid
    Returns list of error messages.
    >>> validate_student_records([
    ...     {"id": 1, "name": "Alice", "grade": "A"},
    ...     {"id": 2, "name": "Bob", "grade": "B"},
    ... ])
    []
    """
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []
    seen_ids = set()
    for i, record in enumerate(records):
        # Check required fields
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")
        # Check duplicate IDs (records without an ID are already reported above)
        student_id = record.get("id")
        if student_id is not None:
            if student_id in seen_ids:
                errors.append(f"Record {i}: duplicate ID {student_id}")
            seen_ids.add(student_id)
        # Check valid grade
        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")
    return errors
# Test
records = [
{"id": 1, "name": "Alice", "grade": "A"},
{"id": 2, "name": "Bob", "grade": "B"},
{"id": 1, "name": "Charlie", "grade": "X"}, # duplicate ID, invalid grade
{"id": 3}, # missing fields
]
print(validate_student_records(records))
Solution:
def validate_student_records(records):
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []
    seen_ids = set()
    for i, record in enumerate(records):
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")
        student_id = record.get("id")
        # Skip records without an ID; they are already reported as missing fields
        if student_id is not None:
            if student_id in seen_ids:
                errors.append(f"Record {i}: duplicate ID {student_id}")
            seen_ids.add(student_id)
        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")
    return errors
# Test
records = [
{"id": 1, "name": "Alice", "grade": "A"},
{"id": 2, "name": "Bob", "grade": "B"},
{"id": 1, "name": "Charlie", "grade": "X"},
{"id": 3},
]
errors = validate_student_records(records)
assert len(errors) == 3
assert "duplicate ID 1" in errors[0]
assert "invalid grade" in errors[1]
assert "missing fields" in errors[2]
print("All tests passed!")
Key Takeaways
- Sets store unique, hashable elements; duplicates are automatically removed
- Use set(), not {}, for empty sets: {} creates a dictionary
- Set operations mirror mathematics: union (|), intersection (&), difference (-), symmetric difference (^)
- Membership testing is O(1) on average, so sets are vastly faster than lists for in checks
- remove() raises KeyError if the element is missing; discard() does not
- frozenset is immutable: use it as a dictionary key or as an element of another set
- Set comprehensions use curly braces: {x for x in range(10)}
- Sets don't preserve order; if order matters, use a list, or dict.fromkeys() / OrderedDict for ordered unique values
- Elements must be hashable: no lists, dicts, or other sets as elements
- Use sets for deduplication, membership testing, and mathematical operations