Python Sets — Understanding Unordered Collections

Learning Objectives

By the end of this tutorial, you will be able to:

Create sets using literals, constructors, and comprehensions
Perform all mathematical set operations: union, intersection, difference, and symmetric difference
Test subset, superset, and disjoint relationships between sets
Modify sets using add, update, remove, discard, pop, and clear
Understand frozenset and when to use immutable sets
Write set comprehensions with conditions
Recognize the performance advantages of sets for membership testing
Apply sets to solve real-world programming problems like deduplication and data validation

What Are Sets?

A set is an unordered collection of unique, hashable elements. Unlike lists or tuples, sets do not preserve insertion order and automatically eliminate duplicates.

# A simple set
fruits = {"apple", "banana", "cherry"}
print(fruits)       # {'banana', 'cherry', 'apple'} (order may vary)
print(type(fruits)) # <class 'set'>

Key Characteristics

Characteristic	Description
Unordered	Elements have no defined position
Mutable	Can add and remove elements (frozenset is immutable)
No duplicates	Each element appears at most once
Heterogeneous	Can contain different data types
Unindexable	Cannot access elements by index or slice
Hashable elements	Elements must be hashable (in practice, immutable built-in types)
Fast lookup	O(1) average time for membership testing

# Duplicates are automatically removed
numbers = {1, 2, 2, 3, 3, 3}
print(numbers)  # {1, 2, 3}

# Mixed types work
mixed = {1, "hello", 3.14, True}
print(mixed)  # {1, 3.14, 'hello'} (True == 1, so True is dropped as a duplicate of 1)
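Sets decide whether two elements are "the same" using `==` together with `hash()`, not by type. The short sketch below illustrates this with `1`, `1.0`, and `True`; the variable names are illustrative only.

```python
# 1, 1.0 and True compare equal and share a hash, so a set keeps only one of them
print(1 == 1.0 == True)                    # True
print(hash(1) == hash(1.0) == hash(True))  # True

same = {1, 1.0, True}
print(len(same))  # 1

# When building from a list, the element seen first is the one that is kept
first_kept = set([True, 1])
print(first_kept)  # {True}
```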

Hashable vs Unhashable Elements

Elements stored in a set must be hashable. For built-in types this generally means immutable: you can store strings, numbers, frozensets, and tuples (provided every item inside the tuple is itself hashable), but not lists, dictionaries, or other sets.

# Valid set elements
valid = {1, "hello", (1, 2, 3), frozenset({4, 5})}

# Invalid — uncommenting raises TypeError
# invalid = {1, [2, 3]}       # lists are unhashable
# invalid = {1, {"a": 1}}     # dicts are unhashable
# invalid = {1, {2, 3}}       # sets are unhashable
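Hashability can be checked directly with the built-in `hash()`. The sketch below uses a hypothetical helper, `is_hashable`, to show that a tuple is only hashable when everything inside it is hashable.

```python
def is_hashable(obj):
    """Hypothetical helper: report whether obj could be stored in a set."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

ok = (1, 2, 3)
bad = (1, [2, 3])  # the tuple is immutable, but it holds a mutable list

print(is_hashable(ok))           # True
print(is_hashable(bad))          # False
print(is_hashable(frozenset()))  # True
print(is_hashable({1, 2}))       # False
```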

Creating Sets

1. Set Literal

Use curly braces — but not for empty sets (that creates a dictionary):

# Set with elements
colors = {"red", "green", "blue"}
print(colors)  # {'red', 'green', 'blue'}

# Empty set — must use set(), NOT {}
empty_set = set()
empty_dict = {}  # This is a dictionary, not a set!

print(type(empty_set))  # <class 'set'>
print(type(empty_dict))  # <class 'dict'>

2. set() Constructor

Convert any iterable into a set:

# From a list
from_list = set([1, 2, 3, 2, 1])
print(from_list)  # {1, 2, 3}

# From a tuple
from_tuple = set((10, 20, 30))
print(from_tuple)  # {10, 20, 30}

# From a string (each character becomes an element)
from_string = set("hello")
print(from_string)  # {'h', 'e', 'l', 'o'}

# From a range
from_range = set(range(5))
print(from_range)  # {0, 1, 2, 3, 4}

# From a dictionary (keys become elements)
from_dict = set({"a": 1, "b": 2, "c": 3})
print(from_dict)  # {'a', 'b', 'c'}

3. Set Comprehension

Create sets using comprehension syntax:

# Squares of 0-9
squares = {x**2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

# Even numbers only
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

# Transform and filter
processed = {x.upper() for x in ["hello", "world", "python"] if len(x) > 4}
print(processed)  # {'WORLD', 'PYTHON'}

4. From Strings, Lists, Tuples

# Characters from a string (duplicates removed)
word = "mississippi"
unique_chars = set(word)
print(unique_chars)  # {'m', 'i', 's', 'p'}

# Deduplicate a list
names = ["Alice", "Bob", "Alice", "Charlie", "Bob"]
unique_names = set(names)
print(unique_names)  # {'Alice', 'Bob', 'Charlie'}

# Convert back to sorted list
unique_sorted = sorted(set(names))
print(unique_sorted)  # ['Alice', 'Bob', 'Charlie']

Set Operations

Sets support all the mathematical operations you'd find in set theory. Python provides both operator syntax and method syntax for each operation.

Visual Reference

    Set A = {1, 2, 3, 4}       Set B = {3, 4, 5, 6}

    Union (A | B):
    {1, 2, 3, 4, 5, 6}

    Intersection (A & B):
    {3, 4}

    Difference (A - B):
    {1, 2}

    Symmetric Difference (A ^ B):
    {1, 2, 5, 6}

Union (| or union())

Combines all elements from both sets:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A | B)       # {1, 2, 3, 4, 5, 6}

# Method syntax
print(A.union(B))  # {1, 2, 3, 4, 5, 6}

# Union with multiple sets
C = {7, 8}
print(A | B | C)   # {1, 2, 3, 4, 5, 6, 7, 8}

# Union with other iterables
print(A.union([5, 6, 7]))  # {1, 2, 3, 4, 5, 6, 7}
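One difference between the two syntaxes is worth seeing in code: the methods accept any iterable, while the operators require a set or frozenset on both sides. The same holds for intersection, difference, and symmetric difference. A minimal sketch:

```python
A = {1, 2, 3, 4}

# The method accepts any iterable
merged = A.union([5, 6])
print(merged)  # {1, 2, 3, 4, 5, 6}

# The operator requires a set (or frozenset) on both sides
error = None
try:
    A | [5, 6]
except TypeError as exc:
    error = str(exc)

print(error)  # unsupported operand type(s) for |: 'set' and 'list'
```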

Intersection (& or intersection())

Elements present in both sets:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A & B)              # {3, 4}

# Method syntax
print(A.intersection(B))  # {3, 4}

# Intersection with multiple sets
C = {4, 5, 6, 7}
print(A & B & C)          # {4}

# Intersection with other iterables
print(A.intersection([3, 4, 5, 6]))  # {3, 4}

Difference (- or difference())

Elements in the first set but not in the second:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A - B)             # {1, 2}

# Method syntax
print(A.difference(B))   # {1, 2}

# Reverse difference
print(B - A)             # {5, 6}

# Difference with multiple sets
C = {2, 3}
print(A - B - C)         # {1}

# Difference with other iterables
print(A.difference([3, 4, 5]))  # {1, 2}

Symmetric Difference (^ or symmetric_difference())

Elements in either set but not in both:

A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

# Operator syntax
print(A ^ B)                        # {1, 2, 5, 6}

# Method syntax
print(A.symmetric_difference(B))    # {1, 2, 5, 6}

# Symmetric difference update (in-place)
A ^= B
print(A)  # {1, 2, 5, 6}

Visual Summary of Operations

    A = {1, 2, 3, 4}    B = {3, 4, 5, 6}

    +---------------------------------------------+
    |  Operation              |  Result           |
    +---------------------------------------------+
    |  A | B  (union)         |  {1,2,3,4,5,6}   |
    |  A & B  (intersection)  |  {3,4}            |
    |  A - B  (difference)    |  {1,2}            |
    |  B - A  (difference)    |  {5,6}            |
    |  A ^ B  (sym diff)      |  {1,2,5,6}       |
    +---------------------------------------------+

Subset (<= or issubset())

Tests if all elements of one set are in another:

A = {1, 2}
B = {1, 2, 3, 4}
C = {1, 2}

# Operator syntax
print(A <= B)             # True
print(B <= A)             # False

# Method syntax
print(A.issubset(B))      # True

# A set is a subset of itself
print(A <= A)             # True
print(A.issubset(A))      # True

# Proper subset (<)
print(A < B)              # True
print(A < A)              # False (not proper subset of itself)

Superset (>= or issuperset())

Tests if a set contains all elements of another:

A = {1, 2, 3, 4}
B = {1, 2}

# Operator syntax
print(A >= B)              # True
print(B >= A)              # False

# Method syntax
print(A.issuperset(B))    # True

# Proper superset (>)
print(A > B)              # True
print(A > A)              # False

Disjoint (isdisjoint())

Returns True if two sets have no elements in common:

A = {1, 2, 3}
B = {4, 5, 6}
C = {3, 4, 5}

print(A.isdisjoint(B))  # True  (no common elements)
print(A.isdisjoint(C))  # False (share element 3)
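As a rough illustration, `isdisjoint()` gives the same answer as checking that the intersection is empty, and like the other set methods it accepts any iterable:

```python
A = {1, 2, 3}
C = {3, 4, 5}

# Same answer as testing for an empty intersection
same_answer = A.isdisjoint(C) == (not (A & C))
print(same_answer)  # True

# Any iterable works as the argument
print(A.isdisjoint([7, 8, 9]))    # True
print(A.isdisjoint(range(3, 6)))  # False (3 is shared)
```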

Modifying Sets

Adding Elements

fruits = {"apple", "banana"}

# add() — adds a single element
fruits.add("cherry")
print(fruits)  # {'apple', 'banana', 'cherry'}

# add() does nothing if element already exists
fruits.add("apple")
print(fruits)  # {'apple', 'banana', 'cherry'} (no change)

# update() — adds multiple elements from an iterable
fruits.update(["date", "elderberry"])
print(fruits)  # {'apple', 'banana', 'cherry', 'date', 'elderberry'}

# update() with sets
fruits.update({"fig", "grape"})
print(fruits)  # {'apple', 'banana', 'cherry', 'date', 'elderberry', 'fig', 'grape'}

Removing Elements

colors = {"red", "green", "blue", "yellow"}

# remove() — removes element, raises KeyError if not found
colors.remove("red")
print(colors)  # {'green', 'blue', 'yellow'}

# This raises KeyError:
# colors.remove("purple")

# discard() — removes element, does nothing if not found
colors.discard("green")
print(colors)  # {'blue', 'yellow'}

colors.discard("purple")  # No error!
print(colors)  # {'blue', 'yellow'}

# pop() — removes and returns an arbitrary element
popped = colors.pop()
print(popped)   # 'blue' (or 'yellow'; which one is arbitrary)
print(colors)   # {'yellow'} (or {'blue'})

# clear() — removes all elements
colors.clear()
print(colors)  # set()

remove() vs discard()

Method	Element exists	Element missing
`remove()`	Removes element	Raises `KeyError`
`discard()`	Removes element	Does nothing

s = {1, 2, 3}

# Use remove() when missing element is an error
s.remove(2)

# Use discard() when missing element is expected
s.discard(99)  # No error

Set Update Operations

A = {1, 2, 3}
B = {3, 4, 5}

# |= is shorthand for update() (an in-place union)
A |= B
print(A)  # {1, 2, 3, 4, 5}

# &= is shorthand for intersection_update
A = {1, 2, 3, 4, 5}
A &= {2, 3, 6}
print(A)  # {2, 3}

# -= is shorthand for difference_update
A = {1, 2, 3, 4}
A -= {3, 4, 5}
print(A)  # {1, 2}

# ^= is shorthand for symmetric_difference_update
A = {1, 2, 3}
A ^= {2, 3, 4}
print(A)  # {1, 4}
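The augmented operators mutate the existing set, whereas `A = A | B` builds a new set and rebinds the name. The difference shows up when another variable refers to the same set. A small sketch (the name `alias` is illustrative):

```python
A = {1, 2, 3}
alias = A          # second name for the same set object

A |= {4}           # in-place: mutates the existing object
print(alias)       # {1, 2, 3, 4}
print(alias is A)  # True

A = A | {5}        # builds a new set and rebinds the name A
print(A)           # {1, 2, 3, 4, 5}
print(alias)       # {1, 2, 3, 4}
print(alias is A)  # False
```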

Frozenset

A frozenset is an immutable version of set. Once created, it cannot be modified — making it hashable and usable as a dictionary key or element of another set.

# Creating a frozenset
fs = frozenset([1, 2, 3, 4])
print(fs)        # frozenset({1, 2, 3, 4})
print(type(fs))  # <class 'frozenset'>

# frozenset is hashable — can be a dictionary key
locations = {
    frozenset({"office", "home"}): "commute",
    frozenset({"gym", "park"}): "exercise",
}
print(locations[frozenset({"office", "home"})])  # 'commute'

# frozenset can be an element of a set
nested = {frozenset({1, 2}), frozenset({3, 4})}
print(nested)  # {frozenset({1, 2}), frozenset({3, 4})}

Frozenset Operations

Frozensets support all read-only set operations but not mutation operations:

fs1 = frozenset({1, 2, 3})
fs2 = frozenset({3, 4, 5})

# These work (return new frozensets)
print(fs1 | fs2)   # frozenset({1, 2, 3, 4, 5})
print(fs1 & fs2)   # frozenset({3})
print(fs1 - fs2)   # frozenset({1, 2})
print(fs1 ^ fs2)   # frozenset({1, 2, 4, 5})
print(fs1 <= fs2)  # False
print(fs1.issubset(fs2))  # False

# These raise AttributeError:
# fs1.add(4)
# fs1.remove(1)
# fs1.update([5, 6])
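Sets and frozensets can be mixed in the same expression. The sketch below shows two behaviours that are easy to miss: the result takes the type of the left operand, and equality compares contents rather than type.

```python
s = {1, 2}
fs = frozenset({2, 3})

# The result takes the type of the left operand
print(type(s | fs).__name__)   # set
print(type(fs | s).__name__)   # frozenset

# Equality compares contents, not the concrete type
print({1, 2} == frozenset({1, 2}))  # True

# Mutating methods are simply absent on frozenset
print(hasattr(fs, "add"))  # False
```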

When to Use frozenset

Scenario	Use frozenset?
Set needs to be a dictionary key	Yes
Set needs to be an element of another set	Yes
Set must not be modified after creation	Yes
Need to add/remove elements	No — use regular set
Need fast, immutable membership testing	Yes

Set Comprehensions

Set comprehensions create sets using a concise expression, similar to list comprehensions but with curly braces.

Basic Syntax

# {expression for item in iterable}
squares = {x**2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

With Conditions

# Filter: only even numbers
evens = {x for x in range(20) if x % 2 == 0}
print(evens)  # {0, 2, 4, 6, 8, 10, 12, 14, 16, 18}

# Transform and filter
long_words = {word.upper() for word in ["hello", "hi", "world", "python", "a"]
              if len(word) > 2}
print(long_words)  # {'HELLO', 'WORLD', 'PYTHON'}

Nested Iteration

# Ordered pairs from the Cartesian product, excluding pairs where x == y
pairs = {(x, y) for x in range(3) for y in range(3) if x != y}
print(pairs)
# {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)}

Practical Examples

# Extract unique file extensions
files = ["report.pdf", "data.csv", "image.png", "backup.pdf", "notes.csv"]
extensions = {f.split(".")[-1] for f in files}
print(extensions)  # {'pdf', 'csv', 'png'}

# Unique first letters
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
first_letters = {word[0] for word in words}
print(first_letters)  # {'a', 'b', 'c'}

# Unique word lengths
sentence = "the quick brown fox"
unique_lengths = {len(word) for word in sentence.split()}
print(unique_lengths)  # {3, 5}

Performance

Sets are optimized for membership testing and mathematical operations. Understanding their performance characteristics helps you choose the right data structure.

Time Complexity

Operation	Set	List	Dict
Membership test (`x in s`)	O(1) average	O(n)	O(1) average
Add element	O(1) average	O(1)*	O(1) average
Remove element	O(1) average	O(n)	O(1) average
Union	O(n + m)	—	—
Intersection	O(min(n, m))	—	—
Iteration	O(n)	O(n)	O(n)

* amortized for append

Membership Testing: Set vs List

import time

# Create test data
large_list = list(range(1_000_000))
large_set = set(range(1_000_000))

# Test membership: the list scans every element to reach the last one
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_list
list_time = time.perf_counter() - start

# Test membership: the set does a single hash lookup
start = time.perf_counter()
for i in range(1000):
    _ = 999_999 in large_set
set_time = time.perf_counter() - start

print(f"List: {list_time:.4f}s")
print(f"Set:  {set_time:.4f}s")
# For this worst-case lookup the set is typically several orders of magnitude faster;
# exact numbers depend on the machine and Python version
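The standard library's `timeit` module is a common alternative to a hand-rolled timing loop. The sketch below repeats the comparison on a smaller collection; the names `data_list`, `data_set`, and `target` are illustrative, and absolute timings will vary by machine.

```python
import timeit

n = 100_000
data_list = list(range(n))
data_set = set(data_list)
target = n - 1  # worst case for the list: the last element

# timeit.timeit runs the callable `number` times and returns the total seconds
list_time = timeit.timeit(lambda: target in data_list, number=200)
set_time = timeit.timeit(lambda: target in data_set, number=200)

print(f"List: {list_time:.6f}s")
print(f"Set:  {set_time:.6f}s")
```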

Memory Considerations

Sets use more memory than lists due to hash table overhead. Use sets when fast lookup matters more than memory:

import sys

lst = list(range(1000))
s = set(range(1000))

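# sys.getsizeof() measures the container only, not the integers it refers to;
# exact figures vary by Python version and platform
print(f"List size: {sys.getsizeof(lst):,} bytes")  # roughly 8 KB on 64-bit CPython
print(f"Set size:  {sys.getsizeof(s):,} bytes")    # roughly 32 KB on 64-bit CPython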

Real-World Use Cases

1. Removing Duplicates

# Deduplicate a list while preserving order
def deduplicate(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

users = ["Alice", "Bob", "Alice", "Charlie", "Bob", "David"]
print(deduplicate(users))
# ['Alice', 'Bob', 'Charlie', 'David']

2. Membership Testing

# Validate user input against allowed values
VALID_STATUSES = {"active", "inactive", "pending", "suspended"}

def validate_status(status):
    if status not in VALID_STATUSES:
        raise ValueError(f"Invalid status: {status}")
    return status

print(validate_status("active"))   # 'active'
# validate_status("deleted")      # Raises ValueError

3. Finding Common Elements

# Students enrolled in both courses
course_a = {"Alice", "Bob", "Charlie", "Diana"}
course_b = {"Bob", "Diana", "Eve", "Frank"}

both_courses = course_a & course_b
print(both_courses)  # {'Bob', 'Diana'}

either_course = course_a | course_b
print(either_course)  # {'Alice', 'Bob', 'Charlie', 'Diana', 'Eve', 'Frank'}

only_a = course_a - course_b
print(only_a)  # {'Alice', 'Charlie'}

4. Venn Diagram Analysis

# Analyzing overlapping categories
frontend = {"HTML", "CSS", "JavaScript", "React"}
backend = {"Python", "SQL", "JavaScript", "Docker"}
devops = {"Docker", "Kubernetes", "AWS", "Python"}

# Skills unique to each area
only_frontend = frontend - backend - devops
only_backend = backend - frontend - devops
only_devops = devops - frontend - backend

# Skills shared across all three
common_all = frontend & backend & devops

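print(f"Frontend only: {only_frontend}")  # {'HTML', 'CSS', 'React'}
print(f"Backend only:  {only_backend}")   # {'SQL'}
print(f"DevOps only:   {only_devops}")    # {'Kubernetes', 'AWS'}
print(f"All three:     {common_all}")     # set() (no skill appears in all three)
print(f"Backend & DevOps: {backend & devops}")  # {'Python', 'Docker'}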

5. Data Validation and Cleaning

# Check for missing required fields
required_fields = {"name", "email", "password"}
submitted_data = {"name": "Alice", "email": "alice@example.com"}

missing = required_fields - set(submitted_data.keys())
if missing:
    print(f"Missing fields: {missing}")
# Missing fields: {'password'}

# Find duplicate IDs
user_ids = [101, 202, 303, 101, 202, 404]
seen = set()
duplicates = set()
for uid in user_ids:
    if uid in seen:
        duplicates.add(uid)
    seen.add(uid)
print(f"Duplicate IDs: {duplicates}")
# Duplicate IDs: {101, 202}
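Dictionary key views are set-like, so the same operators can be applied to `dict.keys()` without first converting to a set. The sketch below also flags unexpected fields; the name `allowed_fields` and the extra `role` field are illustrative assumptions.

```python
allowed_fields = {"name", "email", "password"}
submitted_data = {"name": "Alice", "email": "alice@example.com", "role": "admin"}

# dict.keys() returns a set-like view, so set operators work on it directly
unexpected = submitted_data.keys() - allowed_fields
missing = allowed_fields - submitted_data.keys()

print(unexpected)  # {'role'}
print(missing)     # {'password'}
```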

6. Permission Checking

# Check what a user can access
admin_perms = {"read", "write", "delete", "admin"}
user_perms = {"read", "write"}
guest_perms = {"read"}

def check_access(user_role, required_permission):
    role_perms = {
        "admin": admin_perms,
        "user": user_perms,
        "guest": guest_perms,
    }
    return required_permission in role_perms.get(user_role, set())

print(check_access("admin", "delete"))   # True
print(check_access("user", "delete"))    # False
print(check_access("guest", "read"))     # True
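When an action needs several permissions at once, the subset and disjoint tests covered earlier apply directly. A minimal sketch, using two hypothetical helpers (`has_all`, `has_any`) that are not part of the example above:

```python
user_perms = {"read", "write"}

def has_all(granted, required):
    """Hypothetical helper: True if every required permission is granted."""
    return set(required) <= granted

def has_any(granted, candidates):
    """Hypothetical helper: True if at least one candidate is granted."""
    return not granted.isdisjoint(candidates)

print(has_all(user_perms, {"read", "write"}))    # True
print(has_all(user_perms, {"read", "delete"}))   # False
print(has_any(user_perms, ["delete", "write"]))  # True
print(has_any(user_perms, ["delete", "admin"]))  # False
```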

Set vs Other Types

set vs frozenset

Feature	set	frozenset
Mutable	Yes	No
Hashable	No	Yes
Can be dict key	No	Yes
Can be set element	No	Yes
Supports add/remove	Yes	No
Performance	Same	Same

set vs list

Feature	set	list
Ordered	No	Yes
Indexable	No	Yes
Duplicates	No	Yes
Membership test	O(1)	O(n)
Mutable	Yes	Yes
Use case	Unique items, fast lookup	Ordered data, duplicates

set vs dict

Feature	set	dict
Stores	Keys only	Key-value pairs
Find a stored value	O(1) average (membership test)	O(n) (scan of the values)
Lookup by key	O(1)	O(1)
Use case	Unique keys	Mappings

set vs Counter

Feature	set	Counter
Counts occurrences	No	Yes
Stores frequencies	No	Yes
Use case	Unique items	Frequency analysis

from collections import Counter

# Counter tracks how many times each element appears
words = ["apple", "banana", "apple", "cherry", "banana", "apple"]
word_counts = Counter(words)
print(word_counts)  # Counter({'apple': 3, 'banana': 2, 'cherry': 1})

# Set only knows about existence
unique_words = set(words)
print(unique_words)  # {'apple', 'banana', 'cherry'}

Common Mistakes

1. Empty Set Syntax

# WRONG — this creates a dictionary
empty = {}

# CORRECT — use set()
empty = set()

print(type({}))   # <class 'dict'>
print(type(set()))  # <class 'set'>

2. Unhashable Elements

# WRONG — lists cannot be set elements
# s = {[1, 2], [3, 4]}  # TypeError: unhashable type: 'list'

# CORRECT — use tuples or frozensets
s = {(1, 2), (3, 4)}
print(s)  # {(1, 2), (3, 4)}

# CORRECT — use frozenset for nested sets
s = {frozenset({1, 2}), frozenset({3, 4})}
print(s)  # {frozenset({1, 2}), frozenset({3, 4})}

3. Modifying During Iteration

# WRONG — modifying set while iterating causes RuntimeError
s = {1, 2, 3, 4, 5}
# for x in s:
#     if x % 2 == 0:
#         s.remove(x)  # RuntimeError

# CORRECT — iterate over a copy
s = {1, 2, 3, 4, 5}
for x in s.copy():
    if x % 2 == 0:
        s.remove(x)
print(s)  # {1, 3, 5}

# CORRECT — use set comprehension
s = {1, 2, 3, 4, 5}
s = {x for x in s if x % 2 != 0}
print(s)  # {1, 3, 5}
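To see the failure without crashing a script, the error can be caught. This sketch simply wraps the problematic loop in `try`/`except` and records the message:

```python
s = {1, 2, 3, 4, 5}
message = None
try:
    for x in s:
        if x % 2 == 0:
            s.remove(x)  # changes the set's size mid-iteration
except RuntimeError as exc:
    message = str(exc)

print(message)  # Set changed size during iteration
```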

4. Order Not Guaranteed

# Sets do NOT guarantee insertion order
s = {3, 1, 4, 1, 5, 9, 2, 6}
print(s)  # Order is unpredictable!

# If you need ordered unique elements:
from collections import OrderedDict
ordered = list(OrderedDict.fromkeys([3, 1, 4, 1, 5, 9, 2, 6]))
print(ordered)  # [3, 1, 4, 5, 9, 2, 6]
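On Python 3.7 and later, plain dictionaries also preserve insertion order, so the same result can be obtained without importing `OrderedDict`. A minimal sketch (the name `items` is illustrative):

```python
items = [3, 1, 4, 1, 5, 9, 2, 6]

# dict.fromkeys() keeps the first occurrence of each key, in insertion order
ordered = list(dict.fromkeys(items))
print(ordered)  # [3, 1, 4, 5, 9, 2, 6]
```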

5. Confusing | with ||

# WRONG — || is not a valid operator
# result = set1 || set2  # SyntaxError

# CORRECT — use | or union()
set1, set2 = {1, 2}, {2, 3}  # example sets so the snippet runs
result = set1 | set2
result = set1.union(set2)
print(result)  # {1, 2, 3}

Practice Exercises

Exercise 1: Find the Symmetric Difference

Write a function that finds elements present in exactly one of two sets.

def unique_to_each(set1, set2):
    """
    Return elements that are in set1 or set2, but not both.

    >>> unique_to_each({1, 2, 3}, {3, 4, 5})
    {1, 2, 4, 5}
    >>> unique_to_each({"a", "b"}, {"a", "b", "c"})
    {'c'}
    """
    return set1 ^ set2

# Test
print(unique_to_each({1, 2, 3}, {3, 4, 5}))  # {1, 2, 4, 5}
print(unique_to_each({"a", "b"}, {"a", "b", "c"}))  # {'c'}

Solution:

def unique_to_each(set1, set2):
    return set1 ^ set2
    # Alternative: (set1 - set2) | (set2 - set1)

# Test cases
assert unique_to_each({1, 2, 3}, {3, 4, 5}) == {1, 2, 4, 5}
assert unique_to_each({"a", "b"}, {"a", "b", "c"}) == {"c"}
assert unique_to_each(set(), set()) == set()
print("All tests passed!")

Exercise 2: Group by First Letter

Write a function that groups words by their first letter using a dictionary of sets.

def group_by_first_letter(words):
    """
    Group words by their first letter.

    >>> group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
    {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}}
    """
    groups = {}
    for word in words:
        first = word[0]
        if first not in groups:
            groups[first] = set()
        groups[first].add(word)
    return groups

# Test
words = ["apple", "banana", "avocado", "blueberry", "cherry", "blueberry"]
print(group_by_first_letter(words))
# {'a': {'apple', 'avocado'}, 'b': {'banana', 'blueberry'}, 'c': {'cherry'}}

Solution:

def group_by_first_letter(words):
    groups = {}
    for word in words:
        first = word[0]
        groups.setdefault(first, set()).add(word)
    return groups

# Test
result = group_by_first_letter(["apple", "banana", "avocado", "blueberry"])
assert result == {"a": {"apple", "avocado"}, "b": {"banana", "blueberry"}}
print("All tests passed!")
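Another common way to write this, offered here as an alternative sketch rather than the intended solution, is `collections.defaultdict`, which creates the empty set on first access. This version also skips empty strings, which is an added assumption about the input.

```python
from collections import defaultdict

def group_by_first_letter(words):
    groups = defaultdict(set)   # missing keys start as an empty set
    for word in words:
        if word:                # skip empty strings, which have no first letter
            groups[word[0]].add(word)
    return dict(groups)         # convert back to a plain dict

result = group_by_first_letter(["apple", "banana", "avocado", "", "blueberry"])
print(result == {"a": {"apple", "avocado"}, "b": {"banana", "blueberry"}})  # True
```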

Exercise 3: Set-Based Data Validation

Write a function that validates a dataset against constraints using set operations.

def validate_student_records(records):
    """
    Validate student records:
    - All required fields present
    - No duplicate student IDs
    - All grades are valid

    Returns list of error messages.

    >>> validate_student_records([
    ...     {"id": 1, "name": "Alice", "grade": "A"},
    ...     {"id": 2, "name": "Bob", "grade": "B"},
    ... ])
    []
    """
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []

    seen_ids = set()
    for i, record in enumerate(records):
        # Check required fields
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")

        # Check duplicate IDs (a missing ID is already reported above)
        student_id = record.get("id")
        if student_id is not None:
            if student_id in seen_ids:
                errors.append(f"Record {i}: duplicate ID {student_id}")
            seen_ids.add(student_id)

        # Check valid grade
        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")

    return errors

# Test
records = [
    {"id": 1, "name": "Alice", "grade": "A"},
    {"id": 2, "name": "Bob", "grade": "B"},
    {"id": 1, "name": "Charlie", "grade": "X"},  # duplicate ID, invalid grade
    {"id": 3},  # missing fields
]
print(validate_student_records(records))

Solution:

def validate_student_records(records):
    required_fields = {"id", "name", "grade"}
    valid_grades = {"A", "B", "C", "D", "F"}
    errors = []

    seen_ids = set()
    for i, record in enumerate(records):
        missing = required_fields - set(record.keys())
        if missing:
            errors.append(f"Record {i}: missing fields {missing}")

        student_id = record.get("id")
        if student_id is not None:  # a missing ID is already reported above
            if student_id in seen_ids:
                errors.append(f"Record {i}: duplicate ID {student_id}")
            seen_ids.add(student_id)

        grade = record.get("grade")
        if grade and grade not in valid_grades:
            errors.append(f"Record {i}: invalid grade '{grade}'")

    return errors

# Test
records = [
    {"id": 1, "name": "Alice", "grade": "A"},
    {"id": 2, "name": "Bob", "grade": "B"},
    {"id": 1, "name": "Charlie", "grade": "X"},
    {"id": 3},
]
errors = validate_student_records(records)
assert len(errors) == 3
assert "duplicate ID 1" in errors[0]
assert "invalid grade" in errors[1]
assert "missing fields" in errors[2]
print("All tests passed!")

Key Takeaways

Sets store unique, hashable elements — duplicates are automatically removed
Use set() not {} for empty sets — {} creates a dictionary
Set operations mirror mathematics — union (|), intersection (&), difference (-), symmetric difference (^)
Membership testing is O(1) on average, so sets are much faster than lists for `in` checks on large collections
remove() raises KeyError if element is missing; discard() does not
frozenset is immutable — use it as dictionary keys or set elements
Set comprehensions use curly braces: {x for x in range(10)}
Sets don't preserve order — if order matters, use lists or OrderedDict
Elements must be hashable — no lists, dicts, or other sets as elements
Use sets for deduplication, membership testing, and mathematical operations
