Python Sets — Understanding Unordered Collections

Learning Objectives

By the end of this tutorial, you will be able to:

  • Create sets using literals, constructors, and comprehensions
  • Perform all mathematical set operations: union, intersection, difference, and symmetric difference
  • Test subset, superset, and disjoint relationships between sets
  • Modify sets using add, update, remove, discard, pop, and clear
  • Understand frozenset and when to use immutable sets
  • Write set comprehensions with conditions
  • Recognize the performance advantages of sets for membership testing
  • Apply sets to solve real-world programming problems like deduplication and data validation

What Are Sets?

A set is an unordered collection of unique, hashable elements. Unlike lists or tuples, sets do not preserve insertion order and automatically eliminate duplicates.

# A simple set
fruits = {"apple", "banana", "cherry"}
print(fruits)       # {'banana', 'cherry', 'apple'} (order may vary)
print(type(fruits)) # <class 'set'>

Key Characteristics

    Characteristic      Description
    -----------------   ----------------------------------------------------
    Unordered           Elements have no defined position
    Mutable             Can add and remove elements (frozenset is immutable)
    No duplicates       Each element appears at most once
    Heterogeneous       Can contain different data types
    Unindexable         Cannot access elements by index or slice
    Hashable elements   Elements must be hashable (immutable types)
    Fast lookup         O(1) average time for membership testing

# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)  # {1, 2, 3}

# Mixed types work, but True == 1, so True counts as a duplicate of 1
mixed = {1, "hello", 3.14, True}
print(mixed)  # {1, 3.14, 'hello'} (order may vary)

Hashable vs Unhashable Elements

Elements stored in a set must be hashable. For built-in types this generally means immutable values: you can store strings, numbers, frozensets, and tuples whose items are themselves hashable, but not lists, dictionaries, or other sets.

# Valid set elements
valid = {1, "hello", (1, 2, 3), frozenset({4, 5})}

# Invalid — uncommenting raises TypeError
# invalid = {1, [2, 3]}       # lists are unhashable
# invalid = {1, {"a": 1}}     # dicts are unhashable
# invalid = {1, {2, 3}}       # sets are unhashable
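
Hashable and immutable usually go together, but they are not quite the same thing: a tuple is immutable, yet it is only hashable if everything inside it is hashable too. The sketch below probes this with the built-in hash(); the helper name is_hashable is made up for illustration.

```python
def is_hashable(obj):
    """Return True if obj could be stored in a set (hypothetical helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, 3)))       # True
print(is_hashable((1, [2, 3])))     # False: the tuple contains a list
print(is_hashable(frozenset({1})))  # True
print(is_hashable({1, 2}))          # False: regular sets are mutable
```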

Creating Sets

1. Set Literal

Use curly braces — but not for empty sets (that creates a dictionary):

# Set with elements
colors = {"red", "green", "blue"}
print(colors)  # {'red', 'green', 'blue'}

# Empty set — must use set(), NOT {}
empty_set = set()
empty_dict = {}  # This is a dictionary, not a set!

print(type(empty_set))  # <class 'set'>
print(type(empty_dict))  # <class 'dict'>

2. set() Constructor

Convert any iterable into a set:

# From a list
from_list = set([1, 2, 3, 2, 1])
print(from_list)  # {1, 2, 3}

# From a tuple
from_tuple = set((10, 20, 30))
print(from_tuple)  # {10, 20, 30}

# From a string (each character becomes an element)
from_string = set("hello")
print(from_string)  # {'h', 'e', 'l', 'o'}

# From a range
from_range = set(range(5))
print(from_range)  # {0, 1, 2, 3, 4}

# From a dictionary (keys become elements)
from_dict = set({"a": 1, "b": 2, "c": 3})
print(from_dict)  # {'a', 'b', 'c'}

3. Set Comprehension

Create sets using comprehension syntax:

# Squares of 0-9
squares = {x**2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

# Even numbers only
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

# Transform and filter
processed = {x.upper() for x in ["hello", "world", "python"] if len(x) > 4}
print(processed)  # {'HELLO', 'WORLD', 'PYTHON'} (all three words are longer than 4 characters)

4. From Strings, Lists, Tuples

# Characters from a string (duplicates removed)
word = "mississippi"
unique_chars = set(word)
print(unique_chars)  # {'m', 'i', 's', 'p'}

# Deduplicate a list
names = ["Alice", "Bob", "Alice", "Charlie", "Bob"]
unique_names = set(names)
print(unique_names)  # {'Alice', 'Bob', 'Charlie'}

# Convert back to sorted list
unique_sorted = sorted(set(names))
print(unique_sorted)  # ['Alice', 'Bob', 'Charlie']

Set Operations

Sets support all the mathematical operations you'd find in set theory. Python provides both operator syntax and method syntax for each operation.

Visual Reference

    Set A = {1, 2, 3, 4}       Set B = {3, 4, 5, 6}

    Union (A | B):
    {1, 2, 3, 4, 5, 6}

    Intersection (A & B):
    {3, 4}

    Difference (A - B):
    {1, 2}

    Symmetric Difference (A ^ B):
    {1, 2, 5, 6}

Union (| or union())

Combines all elements from both sets:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A | B)       # {1, 2, 3, 4, 5, 6}

# Method syntax
print(A.union(B))  # {1, 2, 3, 4, 5, 6}

# Union with multiple sets
C = {7, 8}
print(A | B | C)   # {1, 2, 3, 4, 5, 6, 7, 8}

# Union with other iterables
print(A.union([5, 6, 7]))  # {1, 2, 3, 4, 5, 6, 7}
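
One difference between the two syntaxes is worth seeing in action: the method form accepts any iterable, while the operator form expects a set (or frozenset) on both sides. A small sketch, reusing A from above:

```python
A = {1, 2, 3, 4}

via_method = A.union([5, 6])       # any iterable is accepted
print(via_method)                  # {1, 2, 3, 4, 5, 6}

try:
    A | [5, 6]                     # operators require a set or frozenset
except TypeError as exc:
    print(f"TypeError: {exc}")

via_operator = A | set([5, 6])     # convert first, then use the operator
print(via_operator == via_method)  # True
```

The same rule applies to &, - and ^ and their method counterparts.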

Intersection (& or intersection())

Elements present in both sets:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A & B)              # {3, 4}

# Method syntax
print(A.intersection(B))  # {3, 4}

# Intersection with multiple sets
C = {4, 5, 6, 7}
print(A & B & C)          # {4}

# Intersection with other iterables
print(A.intersection([3, 4, 5, 6]))  # {3, 4}

Difference (- or difference())

Elements in the first set but not in the second:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A - B)             # {1, 2}

# Method syntax
print(A.difference(B))   # {1, 2}

# Reverse difference
print(B - A)             # {5, 6}

# Difference with multiple sets
C = {2, 3}
print(A - B - C)         # {1}

# Difference with other iterables
print(A.difference([3, 4, 5]))  # {1, 2}

Symmetric Difference (^ or symmetric_difference())

Elements in either set but not in both:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A ^ B)                        # {1, 2, 5, 6}

# Method syntax
print(A.symmetric_difference(B))    # {1, 2, 5, 6}

# Symmetric difference update (in-place)
A ^= B
print(A)  # {1, 2, 5, 6}

Visual Summary of Operations

    A = {1, 2, 3, 4}    B = {3, 4, 5, 6}

    ┌─────────────────────────┬───────────────────┐
    │  Operation              │  Result           │
    ├─────────────────────────┼───────────────────┤
    │  A | B  (union)         │  {1,2,3,4,5,6}    │
    │  A & B  (intersection)  │  {3,4}            │
    │  A - B  (difference)    │  {1,2}            │
    │  B - A  (difference)    │  {5,6}            │
    │  A ^ B  (sym diff)      │  {1,2,5,6}        │
    └─────────────────────────┴───────────────────┘
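
The results in this summary are tied together by the usual set-theory identities. The checks below are a small sketch using the same A and B; every assertion passes silently.

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Symmetric difference is the union minus the intersection...
assert A ^ B == (A | B) - (A & B)
# ...or, equivalently, the two one-sided differences combined
assert A ^ B == (A - B) | (B - A)

# Union and intersection are commutative; difference is not
assert A | B == B | A
assert A & B == B & A
assert A - B != B - A

# Inclusion-exclusion on sizes
assert len(A | B) == len(A) + len(B) - len(A & B)

print("All identities hold")
```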

Subset (<= or issubset())

Tests if all elements of one set are in another:

A = {1, 2}
B = {1, 2, 3, 4}
C = {1, 2}

# Operator syntax
print(A <= B)             # True
print(B <= A)             # False

# Method syntax
print(A.issubset(B))      # True

# A set is a subset of itself
print(A <= A)             # True
print(A.issubset(A))      # True

# Proper subset (<)
print(A < B)              # True
print(A < A)              # False (not proper subset of itself)

Superset (>= or issuperset())

Tests if a set contains all elements of another:

A = {1, 2, 3, 4}
B = {1, 2}

# Operator syntax
print(A >= B)              # True
print(B >= A)              # False

# Method syntax
print(A.issuperset(B))    # True

# Proper superset (>)
print(A > B)              # True
print(A > A)              # False

Disjoint (isdisjoint())

Returns True if two sets have no elements in common:

A = {1, 2, 3}
B = {4, 5, 6}
C = {3, 4, 5}

print(A.isdisjoint(B))  # True  (no common elements)
print(A.isdisjoint(C))  # False (share element 3)
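
isdisjoint() answers the same question as "is the intersection empty?", and like the other methods it accepts any iterable. A brief sketch:

```python
A = {1, 2, 3}
C = {3, 4, 5}

overlap = A & C
print(overlap)                         # {3}
print(A.isdisjoint(C) == (not overlap))  # True: both say "they overlap"

# Any iterable works as the argument
print(A.isdisjoint([7, 8, 9]))         # True
print(A.isdisjoint(range(3, 6)))       # False: 3 is shared
```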

Modifying Sets

Adding Elements

fruits = {"apple", "banana"}

# add() — adds a single element
fruits.add("cherry")
print(fruits)  # {'apple', 'banana', 'cherry'}

# add() does nothing if element already exists
fruits.add("apple")
print(fruits)  # {'apple', 'banana', 'cherry'} (no change)

# update() — adds multiple elements from an iterable
fruits.update(["date", "elderberry"])
print(fruits)  # {'apple', 'banana', 'cherry', 'date', 'elderberry'}

# update() with sets
fruits.update({"fig", "grape"})
print(fruits)  # {'apple', 'banana', 'cherry', 'date', 'elderberry', 'fig', 'grape'}

Removing Elements

colors = {"red", "green", "blue", "yellow"}

# remove() — removes element, raises KeyError if not found
colors.remove("red")
print(colors)  # {'green', 'blue', 'yellow'}

# This raises KeyError:
# colors.remove("purple")

# discard() — removes element, does nothing if not found
colors.discard("green")
print(colors)  # {'blue', 'yellow'}

colors.discard("purple")  # No error!
print(colors)  # {'blue', 'yellow'}

# pop() — removes and returns an arbitrary element
popped = colors.pop()
print(popped)   # 'blue' (or 'yellow', order not guaranteed)
print(colors)   # {'yellow'} (or {'blue'}, whichever was not popped)

# clear() — removes all elements
colors.clear()
print(colors)  # set()

remove() vs discard()

    Method      Element exists    Element missing
    ---------   ---------------   ---------------
    remove()    Removes element   Raises KeyError
    discard()   Removes element   Does nothing

s = {1, 2, 3}

# Use remove() when missing element is an error
s.remove(2)

# Use discard() when missing element is expected
s.discard(99)  # No error
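
When a missing element should be reported rather than ignored, remove() can be wrapped in try/except. The sketch below contrasts the two styles; active_sessions and both function names are hypothetical.

```python
active_sessions = {"alice", "bob"}

def log_out_strict(user):
    """Remove user and report whether they were actually logged in."""
    try:
        active_sessions.remove(user)
    except KeyError:
        return f"{user} was not logged in"
    return f"{user} logged out"

def log_out_quiet(user):
    """Remove user if present; silently ignore them otherwise."""
    active_sessions.discard(user)

print(log_out_strict("alice"))  # alice logged out
print(log_out_strict("carol"))  # carol was not logged in
log_out_quiet("carol")          # no error
print(active_sessions)          # {'bob'}
```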

Set Update Operations

A = {1, 2, 3}
B = {3, 4, 5}

# |= is shorthand for update/union
A |= B
print(A)  # {1, 2, 3, 4, 5}

# &= is shorthand for intersection_update
A = {1, 2, 3, 4, 5}
A &= {2, 3, 6}
print(A)  # {2, 3}

# -= is shorthand for difference_update
A = {1, 2, 3, 4}
A -= {3, 4, 5}
print(A)  # {1, 2}

# ^= is shorthand for symmetric_difference_update
A = {1, 2, 3}
A ^= {2, 3, 4}
print(A)  # {1, 4}
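
There is one practical difference between A |= B and A = A | B. The augmented form changes the existing set object, so every name that refers to it sees the change; the binary form builds a new set and rebinds only the name on the left. A minimal sketch:

```python
original = {1, 2, 3}
alias = original           # a second name for the same set object

original |= {4}            # in-place: the shared object changes
print(alias)               # {1, 2, 3, 4}

original = original | {5}  # builds a new set and rebinds the name
print(original)            # {1, 2, 3, 4, 5}
print(alias)               # {1, 2, 3, 4}
print(alias is original)   # False
```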

Frozenset

A frozenset is an immutable version of set. Once created, it cannot be modified — making it hashable and usable as a dictionary key or element of another set.

# Creating a frozenset
fs = frozenset([1, 2, 3, 4])
print(fs)        # frozenset({1, 2, 3, 4})
print(type(fs))  # <class 'frozenset'>

# frozenset is hashable — can be a dictionary key
locations = {
    frozenset({"office", "home"}): "commute",
    frozenset({"gym", "park"}): "exercise",
}
print(locations[frozenset({"office", "home"})])  # 'commute'

# frozenset can be an element of a set
nested = {frozenset({1, 2}), frozenset({3, 4})}
print(nested)  # {frozenset({1, 2}), frozenset({3, 4})}

Frozenset Operations

Frozensets support all read-only set operations but not mutation operations:

fs1 = frozenset({1, 2, 3})
fs2 = frozenset({3, 4, 5})

# These work (return new frozensets)
print(fs1 | fs2)   # frozenset({1, 2, 3, 4, 5})
print(fs1 & fs2)   # frozenset({3})
print(fs1 - fs2)   # frozenset({1, 2})
print(fs1 ^ fs2)   # frozenset({1, 2, 4, 5})
print(fs1 <= fs2)  # False
print(fs1.issubset(fs2))  # False

# These raise AttributeError:
# fs1.add(4)
# fs1.remove(1)
# fs1.update([5, 6])

When to Use frozenset

    Scenario                                    Use frozenset?
    -----------------------------------------   ---------------------
    Set needs to be a dictionary key            Yes
    Set needs to be an element of another set   Yes
    Set must not be modified after creation     Yes
    Need to add/remove elements                 No, use a regular set
    Need fast, immutable membership testing     Yes
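
As one illustration of storing sets inside another set, undirected graph edges can be kept as frozensets so that ("a", "b") and ("b", "a") collapse into a single entry. This is only a sketch, and the edge data is invented for the example.

```python
raw_edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b"), ("a", "c")]

# Each edge becomes a frozenset, so direction no longer matters
undirected = {frozenset(edge) for edge in raw_edges}

print(len(undirected))                      # 3
print(frozenset({"b", "a"}) in undirected)  # True
```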

Set Comprehensions

Set comprehensions create sets using a concise expression, similar to list comprehensions but with curly braces.

Basic Syntax

# {expression for item in iterable}
squares = {x**2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

With Conditions

# Filter: only even numbers
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

# Transform and filter
long_words = {word.upper() for word in ["hello", "hi", "world", "python", "a"]
              if len(word) > 2}
print(long_words)  # {'HELLO', 'WORLD', 'PYTHON'}

Nested Iteration

# Ordered pairs from the Cartesian product, excluding pairs where x == y
pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs)
# {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}

Practical Examples

# Extract unique file extensions
files = ["report.pdf", "data.csv", "image.png", "backup.pdf", "notes.csv"]
extensions = {f.split(".")[-1] for f in files}
print(extensions)  # {'pdf', 'csv', 'png'}

# Unique first letters
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
first_letters = {word[0] for word in words}
print(first_letters)  # {'a', 'b', 'c'}

# Unique word lengths
sentence = "the quick brown fox"
unique_lengths = {len(word) for word in sentence.split()}
print(unique_lengths)  # {3, 5}

Performance

Sets are optimized for membership testing and mathematical operations. Understanding their performance characteristics helps you choose the right data structure.

Time Complexity

    Operation                  Set            List    Dict
    ------------------------   ------------   -----   ------------
    Membership test (x in s)   O(1) average   O(n)    O(1) average
    Add element                O(1) average   O(1)*   O(1) average
    Remove element             O(1) average   O(n)    O(1) average
    Union                      O(n + m)       -       -
    Intersection               O(min(n, m))   -       -
    Iteration                  O(n)           O(n)    O(n)

* amortized for append

Membership Testing: Set vs List

import time

# Create test data
large_list = list(range(1_000_000))
large_set = set(range(1_000_000))

# Test membership: the list is scanned element by element
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_list
list_time = time.perf_counter() - start

# Test membership: the set does a hash lookup
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_set
set_time = time.perf_counter() - start

print(f"List: {list_time:.4f}s")
print(f"Set:  {set_time:.4f}s")
# With a million elements the set is typically several orders of magnitude faster;
# exact numbers depend on the machine
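
Hand-rolled timing loops are sensitive to clock resolution and to whatever else the machine is doing. The standard library's timeit module is built for this kind of comparison. The sketch below uses a smaller collection so it finishes quickly; the sizes and repeat count are arbitrary choices, and the printed numbers will differ from machine to machine.

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the last element

# timeit.timeit runs the callable `number` times and returns total seconds
list_time = timeit.timeit(lambda: target in data_list, number=200)
set_time = timeit.timeit(lambda: target in data_set, number=200)

print(f"List: {list_time:.6f}s for 200 lookups")
print(f"Set:  {set_time:.6f}s for 200 lookups")
```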

Memory Considerations

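Sets use more memory than lists due to hash table overhead. Use sets when fast lookup matters more than memory. Note that sys.getsizeof reports only the container itself, not the objects it refers to, and the exact figures vary by Python version and platform: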

import sys

lst = list(range(1000))
s = set(range(1000))

print(f"List size: {sys.getsizeof(lst):,} bytes")  # ~8,056 bytes
print(f"Set size:  {sys.getsizeof(s):,} bytes")    # ~32,768 bytes

Real-World Use Cases

1. Removing Duplicates

# Deduplicate a list while preserving order
def deduplicate(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

users = ["Alice", "Bob", "Alice", "Charlie", "Bob", "David"]
print(deduplicate(users))
# ['Alice', 'Bob', 'Charlie', 'David']
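
For hashable items there is also a one-line alternative: dictionaries keep insertion order on Python 3.7 and later, so dict.fromkeys gives the same first-seen ordering. A short sketch with the same data:

```python
users = ["Alice", "Bob", "Alice", "Charlie", "Bob", "David"]

# dict keys are unique and keep first-seen order (Python 3.7+)
unique_users = list(dict.fromkeys(users))
print(unique_users)  # ['Alice', 'Bob', 'Charlie', 'David']
```

The explicit loop above is still the better fit when you need to do extra work per item, such as normalizing case before comparing.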

2. Membership Testing

# Validate user input against allowed values
VALID_STATUSES = {"active", "inactive", "pending", "suspended"}

def validate_status(status):
    if status not in VALID_STATUSES:
        raise ValueError(f"Invalid status: {status}")
    return status

print(validate_status("active"))   # 'active'
# validate_status("deleted")      # Raises ValueError

3. Finding Common Elements

# Students enrolled in both courses
course_a = {"Alice", "Bob", "Charlie", "Diana"}
course_b = {"Bob", "Diana", "Eve", "Frank"}

both_courses = course_a & course_b
print(both_courses)  # {'Bob', 'Diana'}

either_course = course_a | course_b
print(either_course)  # {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}

only_a = course_a - course_b
print(only_a)  # {'Alice', 'Charlie'}

4. Venn Diagram Analysis

# Analyzing overlapping categories
frontend = {"HTML", "CSS", "JavaScript", "React"}
backend = {"Python", "SQL", "JavaScript", "Docker"}
devops = {"Docker", "Kubernetes", "AWS", "Python"}

# Skills unique to each area
only_frontend = frontend - backend - devops
only_backend = backend - frontend - devops
only_devops = devops - frontend - backend

# Skills shared across all three
common_all = frontend & backend & devops

print(f"Frontend only: {only_frontend}")
print(f"Backend only:  {only_backend}")
print(f"DevOps only:   {only_devops}")
print(f"All three:     {common_all}")
# All three:     set()  (no skill appears in all three sets)

5. Data Validation and Cleaning

# Find required fields that are missing from a submission
required_fields = {"name", "email", "password"}
submitted_data = {"name": "Alice", "email": "alice@example.com"}

missing = required_fields - set(submitted_data.keys())
if missing:
    print(f"Missing fields: {missing}")
# Missing fields: {'password'}

# Find duplicate IDs
user_ids = [101, 202, 303, 101, 202, 404]
seen = set()
duplicates = set()
for uid in user_ids:
    if uid in seen:
        duplicates.add(uid)
    seen.add(uid)
print(f"Duplicate IDs: {duplicates}")
# Duplicate IDs: {101, 202}
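
Running the difference in the other direction reports fields that were submitted but are not recognized. This is a sketch only: optional_fields, allowed_fields, and the sample payload are assumptions made up for the example.

```python
required_fields = {"name", "email", "password"}
optional_fields = {"phone"}                      # hypothetical
allowed_fields = required_fields | optional_fields

submitted_data = {"name": "Alice", "email": "alice@example.com", "admin": True}

submitted = set(submitted_data)          # iterating a dict yields its keys
missing = required_fields - submitted    # required but absent
unexpected = submitted - allowed_fields  # present but not allowed

print(sorted(missing))     # ['password']
print(sorted(unexpected))  # ['admin']
```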

6. Permission Checking

# Check what a user can access
admin_perms = {"read", "write", "delete", "admin"}
user_perms = {"read", "write"}
guest_perms = {"read"}

def check_access(user_role, required_permission):
    role_perms = {
        "admin": admin_perms,
        "user": user_perms,
        "guest": guest_perms,
    }
    return required_permission in role_perms.get(user_role, set())

print(check_access("admin", "delete"))   # True
print(check_access("user", "delete"))    # False
print(check_access("guest", "read"))     # True
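
When an action needs several permissions at once, a subset test expresses the check directly. The helper has_all below is hypothetical, a sketch rather than part of the example above.

```python
user_perms = {"read", "write"}
admin_perms = {"read", "write", "delete", "admin"}

def has_all(granted, required):
    """Return True if every required permission is in granted."""
    return set(required) <= granted

print(has_all(user_perms, {"read", "write"}))     # True
print(has_all(user_perms, {"write", "delete"}))   # False
print(has_all(admin_perms, ["write", "delete"]))  # True

# Which permissions is the user missing for a given action?
print({"write", "delete"} - user_perms)           # {'delete'}
```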

Set vs Other Types

set vs frozenset

    Feature               set    frozenset
    -------------------   ----   ---------
    Mutable               Yes    No
    Hashable              No     Yes
    Can be dict key       No     Yes
    Can be set element    No     Yes
    Supports add/remove   Yes    No
    Performance           Same   Same

set vs list

    Feature           set                         list
    ---------------   -------------------------   ------------------------
    Ordered           No                          Yes
    Indexable         No                          Yes
    Duplicates        No                          Yes
    Membership test   O(1)                        O(n)
    Mutable           Yes                         Yes
    Use case          Unique items, fast lookup   Ordered data, duplicates

set vs dict

    Feature           set           dict
    ---------------   -----------   ---------------
    Stores            Keys only     Key-value pairs
    Lookup by value   O(1)          O(n)
    Lookup by key     O(1)          O(1)
    Use case          Unique keys   Mappings

set vs Counter

    Feature              set            Counter
    ------------------   ------------   ------------------
    Counts occurrences   No             Yes
    Stores frequencies   No             Yes
    Use case             Unique items   Frequency analysis

from collections import Counter

# Counter tracks how many times each element appears
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)
print(word_counts)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})

# Set only knows about existence
unique_words = set(words)
print(unique_words)  # {'apple', 'banana', 'cherry'}

Common Mistakes

1. Empty Set Syntax

# WRONG — this creates a dictionary
empty = {}

# CORRECT — use set()
empty = set()

print(type({}))   # <class 'dict'>
print(type(set()))  # <class 'set'>

2. Unhashable Elements

# WRONG — lists cannot be set elements
# s = {[1, 2], [3, 4]}  # TypeError: unhashable type: 'list'

# CORRECT — use tuples or frozensets
s = {(1, 2), (3, 4)}
print(s)  # {(1, 2), (3, 4)}

# CORRECT — use frozenset for nested sets
s = {frozenset({1, 2}), frozenset({3, 4})}
print(s)  # {frozenset({1, 2}), frozenset({3, 4})}

3. Modifying During Iteration

# WRONG — modifying set while iterating causes RuntimeError
s = {1, 2, 3, 4, 5}
# for x in s:
#     if x % 2 == 0:
#         s.remove(x)  # RuntimeError

# CORRECT — iterate over a copy
s = {1, 2, 3, 4, 5}
for x in s.copy():
    if x % 2 == 0:
        s.remove(x)
print(s)  # {1, 3, 5}

# CORRECT — use set comprehension
s = {1, 2, 3, 4, 5}
s = {x for x in s if x % 2 != 0}
print(s)  # {1, 3, 5}

4. Order Not Guaranteed

# Sets do NOT guarantee insertion order
s = {3, 1, 4, 1, 5, 9, 2, 6}
print(s)  # Order is unpredictable!

# If you need ordered unique elements:
from collections import OrderedDict
ordered = list(OrderedDict.fromkeys([3, 1, 4, 1, 5, 9, 2, 6]))
print(ordered)  # [3, 1, 4, 5, 9, 2, 6]
# On Python 3.7+, a plain dict keeps insertion order too: list(dict.fromkeys(...))

5. Confusing | with ||

set1 = {1, 2}
set2 = {2, 3}

# WRONG: || is not a valid Python operator
# result = set1 || set2  # SyntaxError

# CORRECT: use | or union()
result = set1 | set2
result = set1.union(set2)
print(result)  # {1, 2, 3}

Practice Exercises

Exercise 1: Find the Symmetric Difference

Write a function that finds elements present in exactly one of two sets.

def unique_to_each(set1, set2):
    """
    Return elements that are in set1 or set2, but not both.
    
    >>> unique_to_each({1, 2, 3}, {3, 4, 5})
    {1, 2, 4, 5}
    >>> unique_to_each({"a", "b"}, {"a", "b", "c"})
    {'c'}
    """
    return set1 ^ set2

# Test
print(unique_to_each({1, 2, 3}, {3, 4, 5}))  # {1, 2, 4, 5}
print(unique_to_each({"a", "b"}, {"a", "b", "c"}))  # {'c'}

Solution:

def unique_to_each(set1, set2):
    return set1 ^ set2
    # Alternative: (set1 - set2) | (set2 - set1)

# Test cases
assert unique_to_each({1, 2, 3}, {3, 4, 5}) == {1, 2, 4, 5}
assert unique_to_each({"a", "b"}, {"a", "b", "c"}) == {"c"}
assert unique_to_each(set(), set()) == set()
print("All tests passed!")

Exercise 2: Group by First Letter

Write a function that groups words by their first letter using a dictionary of sets.

def group_by_first_letter(words):
    """
    Group words by their first letter.
    
    >>> group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
    {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}}
    """
    groups = {}
    for word in words:
        first = word[0]
        if first not in groups:
            groups[first] = set()
        groups[first].add(word)
    return groups

# Test
words = ["apple", "banana", "avocado", "blueberry", "cherry", "blueberry"]
print(group_by_first_letter(words))
# {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}, 'c': {'cherry'}}

Solution:

def group_by_first_letter(words):
    groups = {}
    for word in words:
        first = word[0]
        groups.setdefault(first, set()).add(word)
    return groups

# Test
result = group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
assert result == {"a": {"apple", "avocado"}, "b": {"banana", "blueberry"}}
print("All tests passed!")
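
collections.defaultdict is another common way to write the same grouping. The sketch below renames the function so it does not shadow the solution above, and it also skips empty strings, which would otherwise fail on word[0]; that guard is an addition, not part of the exercise.

```python
from collections import defaultdict

def group_by_first_letter_dd(words):
    """Group words by first letter, skipping empty strings."""
    groups = defaultdict(set)          # missing keys start as an empty set
    for word in words:
        if word:                       # word[0] would raise IndexError on ""
            groups[word[0]].add(word)
    return dict(groups)                # plain dict for easy comparison

result = group_by_first_letter_dd(["apple", "banana", "avocado", "", "blueberry"])
print(result == {"a": {"apple", "avocado"}, "b": {"banana", "blueberry"}})  # True
```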

Exercise 3: Set-Based Data Validation

Write a function that validates a dataset against constraints using set operations.

def validate_student_records(records):
    """
    Validate student records:
    - All required fields present
    - No duplicate student IDs
    - All grades are valid
    
    Returns list of error messages.
    
    >>> validate_student_records([
    ...     {"id": 1, "name": "Alice", "grade": "A"},
    ...     {"id": 2, "name": "Bob", "grade": "B"},
    ... ])
    []
    """
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []
    
    seen_ids = set()
    for i, record in enumerate(records):
        # Check required fields
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")
        
        # Check duplicate IDs
        student_id = record.get("id")
        if student_id in seen_ids:
            errors.append(f"Record {i}: duplicate ID {student_id}")
        seen_ids.add(student_id)
        
        # Check valid grade
        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")
    
    return errors

# Test
records = [
    {"id": 1, "name": "Alice", "grade": "A"},
    {"id": 2, "name": "Bob", "grade": "B"},
    {"id": 1, "name": "Charlie", "grade": "X"},  # duplicate ID, invalid grade
    {"id": 3},  # missing fields
]
print(validate_student_records(records))

Solution:

def validate_student_records(records):
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []
    
    seen_ids = set()
    for i, record in enumerate(records):
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")
        
        student_id = record.get("id")
        if student_id in seen_ids:
            errors.append(f"Record {i}: duplicate ID {student_id}")
        seen_ids.add(student_id)
        
        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")
    
    return errors

# Test
records = [
    {"id": 1, "name": "Alice", "grade": "A"},
    {"id": 2, "name": "Bob", "grade": "B"},
    {"id": 1, "name": "Charlie", "grade": "X"},
    {"id": 3},
]
errors = validate_student_records(records)
assert len(errors) == 3
assert "duplicate ID 1" in errors[0]
assert "invalid grade" in errors[1]
assert "missing fields" in errors[2]
print("All tests passed!")

Key Takeaways

  1. Sets store unique, hashable elements — duplicates are automatically removed
  2. Use set() not {} for empty sets: {} creates a dictionary
  3. Set operations mirror mathematics — union (|), intersection (&), difference (-), symmetric difference (^)
  4. Membership testing is O(1) on average, so sets are much faster than lists for "in" checks on large collections
  5. remove() raises KeyError if element is missing; discard() does not
  6. frozenset is immutable — use it as dictionary keys or set elements
  7. Set comprehensions use curly braces: {x for x in range(10)}
  8. Sets don't preserve order — if order matters, use lists or OrderedDict
  9. Elements must be hashable — no lists, dicts, or other sets as elements
  10. Use sets for deduplication, membership testing, and mathematical operations
