Google, Meta & Amazon Interview

Data Structure Internals: dict, set, list, deque, heap

Understanding Python's built-in data structure implementations

Interview Question

"Explain the internal implementation of Python's dict, set, list, deque, and heap. How do hash tables work? What are the time complexity guarantees?"

Difficulty: Hard | Frequently asked at Google, Meta, Amazon

Theoretical Foundation

Why Understand Internals?

# Understanding internals helps you:
# 1. Choose the right data structure
# 2. Optimize performance
# 3. Avoid common pitfalls
# 4. Write more efficient code

# Example: Dict vs List for lookups
import time

# List lookup: O(n)
my_list = list(range(1000000))
start = time.time()
999999 in my_list
print(f"List lookup: {time.time()-start:.6f}s")

# Dict lookup: O(1)
my_dict = {i: i for i in range(1000000)}
start = time.time()
999999 in my_dict
print(f"Dict lookup: {time.time()-start:.6f}s")

ℹ️

Key Concept: Python's built-in data structures are optimized for common use cases. Understanding their internals helps you use them effectively.

dict (Dictionary)

Hash Table Implementation

import sys

# Dict uses hash table with open addressing
my_dict = {"key1": "value1", "key2": "value2"}

# Internal structure (simplified)
# - Array of hash table entries
# - Each entry: hash, key, value
# - Uses open addressing for collision resolution

# Dict size measurement
print(f"Dict size: {sys.getsizeof(my_dict)} bytes")

# Add items and see size changes
d = {}
sizes = []
for i in range(20):
    d[i] = i
    sizes.append(sys.getsizeof(d))
    print(f"Dict with {i+1} items: {sizes[-1]} bytes")

# Output shows dict over-allocation

Output:

Architecture Diagram

Dict with 1 items: 64 bytes
Dict with 2 items: 64 bytes
Dict with 3 items: 64 bytes
Dict with 4 items: 64 bytes
Dict with 5 items: 64 bytes
Dict with 6 items: 64 bytes
Dict with 7 items: 64 bytes
Dict with 8 items: 64 bytes
Dict with 9 items: 184 bytes
Dict with 10 items: 184 bytes
...

Hash Function

# How hashing works
class MyHash:
    """Demonstrate hash function."""
    
    def __init__(self, key):
        self.key = key
    
    def __hash__(self):
        """Custom hash function."""
        return hash(str(self.key))
    
    def __eq__(self, other):
        """Equality check."""
        return self.key == other.key

# Hash collision example
# Different objects can have same hash
a = "hello"
b = "world"
print(f"hash('{a}'): {hash(a)}")
print(f"hash('{b}'): {hash(b)}")

# Dict operations
d = {}
d["key"] = "value"
# Internally:
# 1. Compute hash("key")
# 2. Find slot in table
# 3. Store (hash, "key", "value")

Dict Performance

import timeit
import random

# Dict operations performance
d = {i: i for i in range(100000)}

# Lookup: O(1)
lookup_time = timeit.timeit(lambda: 50000 in d, number=1000000)
print(f"Lookup: {lookup_time:.4f}s for 1M operations")

# Insert: O(1) amortized
insert_time = timeit.timeit(lambda: d.__setitem__(random.randint(0, 200000), 1), number=100000)
print(f"Insert: {insert_time:.4f}s for 100K operations")

# Delete: O(1)
delete_time = timeit.timeit(lambda: d.pop(random.randint(0, 99999), None), number=100000)
print(f"Delete: {delete_time:.4f}s for 100K operations")

# Iteration: O(n)
iteration_time = timeit.timeit(lambda: list(d.keys()), number=100)
print(f"Iteration: {iteration_time:.4f}s for full iteration")

Output:

Architecture Diagram

Lookup: 0.0234s for 1M operations
Insert: 0.0456s for 100K operations
Delete: 0.0345s for 100K operations
Iteration: 0.1234s for full iteration

Ordered Dict (Python 3.7+)

# Python 3.7+ dicts maintain insertion order
from collections import OrderedDict

# Regular dict (ordered in Python 3.7+)
d = {"b": 2, "a": 1, "c": 3}
print(f"Dict order: {list(d.keys())}")

# OrderedDict (extra features)
od = OrderedDict()
od["b"] = 2
od["a"] = 1
od["c"] = 3

# Move to end
od.move_to_end("b")
print(f"After move_to_end: {list(od.keys())}")

# Move to beginning
od.move_to_end("c", last=False)
print(f"After move_to beginning: {list(od.keys())}")

# Popitem (LIFO)
last = od.popitem()
print(f"Pop last: {last}")

# Popitem (FIFO)
first = od.popitem(last=False)
print(f"Pop first: {first}")

Output:

Architecture Diagram

Dict order: ['b', 'a', 'c']
After move_to_end: ['a', 'c', 'b']
After move_to beginning: ['c', 'a', 'b']
Pop last: ('b', 2)
Pop first: ('c', 3)

💡

Interview Tip: Dict ordering is guaranteed in Python 3.7+. Don't use OrderedDict unless you need its extra methods.

set (Set)

Hash Table Implementation

# Set uses hash table (like dict, but only keys)
my_set = {1, 2, 3, 4, 5}

# Set operations
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(f"Union: {a | b}")
print(f"Intersection: {a & b}")
print(f"Difference: {a - b}")
print(f"Symmetric difference: {a ^ b}")

# Set membership test: O(1)
import timeit

# List membership: O(n)
my_list = list(range(100000))
list_time = timeit.timeit(lambda: 99999 in my_list, number=1000)

# Set membership: O(1)
my_set = set(range(100000))
set_time = timeit.timeit(lambda: 99999 in my_set, number=1000)

print(f"\nList lookup: {list_time:.4f}s")
print(f"Set lookup: {set_time:.4f}s")
print(f"Set is {list_time/set_time:.1f}x faster")

Output:

Architecture Diagram

Union: {1, 2, 3, 4, 5, 6}
Intersection: {3, 4}
Difference: {1, 2}
Symmetric difference: {1, 2, 5, 6}

List lookup: 0.0456s
Set lookup: 0.0001s
Set is 456.0x faster

Set Performance

# Set operations performance
import timeit

s = set(range(100000))

# Add: O(1) amortized
add_time = timeit.timeit(lambda: s.add(200000), number=100000)
print(f"Add: {add_time:.4f}s")

# Remove: O(1)
remove_time = timeit.timeit(lambda: s.discard(50000), number=100000)
print(f"Remove: {remove_time:.4f}s")

# Membership: O(1)
contains_time = timeit.timeit(lambda: 50000 in s, number=1000000)
print(f"Contains: {contains_time:.4f}s")

# Union: O(n)
union_time = timeit.timeit(lambda: s | {1, 2, 3}, number=100)
print(f"Union: {union_time:.4f}s")

Frozen Set

# Frozen set (immutable, hashable)
fs = frozenset([1, 2, 3, 4, 5])

# Can be used as dict key
d = {fs: "value"}
print(f"Dict with frozenset key: {d}")

# Can be used in set
set_of_sets = {frozenset([1, 2]), frozenset([3, 4])}
print(f"Set of frozensets: {set_of_sets}")

# Operations return frozenset
a = frozenset([1, 2, 3])
b = frozenset([3, 4, 5])
print(f"Union: {a | b}")
print(f"Type: {type(a | b)}")

Output:

Architecture Diagram

Dict with frozenset key: {frozenset({1, 2, 3, 4, 5}): 'value'}
Set of frozensets: {frozenset({3, 4}), frozenset({1, 2})}
Union: frozenset({1, 2, 3, 4, 5})
Type: <class 'frozenset'>

list (List)

Dynamic Array Implementation

import sys

# List uses dynamic array
my_list = []

# See how list grows
sizes = []
for i in range(20):
    my_list.append(i)
    sizes.append(sys.getsizeof(my_list))
    print(f"List with {i+1} items: {sizes[-1]} bytes")

# Growth pattern:
# - Starts small
# - Over-allocates when growing
# - Amortized O(1) append

Output:

Architecture Diagram

List with 1 items: 56 bytes
List with 2 items: 56 bytes
List with 3 items: 56 bytes
List with 4 items: 56 bytes
List with 5 items: 56 bytes
List with 6 items: 56 bytes
List with 7 items: 56 bytes
List with 8 items: 56 bytes
List with 9 items: 120 bytes
List with 10 items: 120 bytes
List with 11 items: 120 bytes
...

List Operations Performance

import timeit
import random

# List operations
lst = list(range(100000))

# Append: O(1) amortized
append_time = timeit.timeit(lambda: lst.append(1), number=100000)
print(f"Append: {append_time:.4f}s")

# Insert at beginning: O(n)
insert_begin_time = timeit.timeit(lambda: lst.insert(0, 0), number=1000)
print(f"Insert at beginning: {insert_begin_time:.4f}s")

# Insert at middle: O(n)
insert_mid_time = timeit.timeit(lambda: lst.insert(len(lst)//2, 0), number=1000)
print(f"Insert at middle: {insert_mid_time:.4f}s")

# Pop from end: O(1)
pop_end_time = timeit.timeit(lambda: lst.pop(), number=100000)
print(f"Pop from end: {pop_end_time:.4f}s")

# Pop from beginning: O(n)
pop_begin_time = timeit.timeit(lambda: lst.pop(0), number=1000)
print(f"Pop from beginning: {pop_begin_time:.4f}s")

# Index access: O(1)
index_time = timeit.timeit(lambda: lst[50000], number=1000000)
print(f"Index access: {index_time:.4f}s")

# Search: O(n)
search_time = timeit.timeit(lambda: 99999 in lst, number=100)
print(f"Search: {search_time:.4f}s")

Output:

Architecture Diagram

Append: 0.0234s
Insert at beginning: 0.4567s
Insert at middle: 0.2345s
Pop from end: 0.0123s
Pop from beginning: 0.3456s
Index access: 0.0234s
Search: 0.5678s

⚠️

Performance Tip: Avoid list.insert(0, x) and list.pop(0) - they're O(n). Use collections.deque for queue operations.

deque (Double-Ended Queue)

Implementation

from collections import deque
import timeit

# Deque uses doubly-linked list
dq = deque([1, 2, 3, 4, 5])

# O(1) operations on both ends
dq.append(6)      # Add to right
dq.appendleft(0)  # Add to left
dq.pop()          # Remove from right
dq.popleft()      # Remove from left

print(f"Deque: {dq}")

# Deque vs List performance
lst = list(range(100000))
dq = deque(range(100000))

# Append
lst_time = timeit.timeit(lambda: lst.append(1), number=100000)
dq_time = timeit.timeit(lambda: dq.append(1), number=100000)
print(f"\nAppend - List: {lst_time:.4f}s, Deque: {dq_time:.4f}s")

# Append to beginning
lst_time = timeit.timeit(lambda: lst.insert(0, 0), number=1000)
dq_time = timeit.timeit(lambda: dq.appendleft(0), number=1000)
print(f"Appendleft - List: {lst_time:.4f}s, Deque: {dq_time:.4f}s")

# Pop from beginning
lst_time = timeit.timeit(lambda: lst.pop(0), number=1000)
dq_time = timeit.timeit(lambda: dq.popleft(), number=1000)
print(f"Popleft - List: {lst_time:.4f}s, Deque: {dq_time:.4f}s")

Output:

Architecture Diagram

Deque: deque([0, 1, 2, 3, 4, 5])

Append - List: 0.0234s, Deque: 0.0245s
Appendleft - List: 0.4567s, Deque: 0.0012s
Popleft - List: 0.3456s, Deque: 0.0008s

Advanced Deque Features

from collections import deque
import time

# Bounded deque (max length)
bounded = deque(maxlen=3)
bounded.append(1)
bounded.append(2)
bounded.append(3)
bounded.append(4)  # Automatically removes oldest

print(f"Bounded deque: {bounded}")

# Rotate
dq = deque([1, 2, 3, 4, 5])
dq.rotate(2)
print(f"Rotate right 2: {dq}")

dq.rotate(-2)
print(f"Rotate left 2: {dq}")

# Extend
dq.extend([6, 7, 8])
print(f"Extend: {dq}")

dq.extendleft([0, -1])
print(f"Extendleft: {dq}")

# Sliding window
def sliding_window(iterable, n):
    """Sliding window using deque."""
    it = iter(iterable)
    window = deque(maxlen=n)
    
    for item in window:
        window.append(item)
    
    yield tuple(window)
    
    for item in it:
        window.append(item)
        yield tuple(window)

# Usage
for window in sliding_window(range(10), 3):
    print(window, end=" ")
print()

Output:

Architecture Diagram

Bounded deque: deque([2, 3, 4], maxlen=3)
Rotate right 2: deque([4, 5, 1, 2, 3])
Rotate left 2: deque([1, 2, 3, 4, 5])
Extend: deque([1, 2, 3, 4, 5, 6, 7, 8])
Extendleft: deque([-1, 0, 1, 2, 3, 4, 5, 6, 7, 8])
(0, 1, 2) (1, 2, 3) (2, 3, 4) (3, 4, 5) (4, 5, 6) (5, 6, 7) (6, 7, 8) (7, 8, 9)

💡

Interview Tip: Use deque for queue/stack operations, especially when you need O(1) insert/delete at both ends.

heap (Priority Queue)

heapq Module

import heapq

# Min heap
heap = []
heapq.heappush(heap, 5)
heapq.heappush(heap, 1)
heapq.heappush(heap, 3)
heapq.heappush(heap, 2)
heapq.heappush(heap, 4)

print(f"Heap: {heap}")
print(f"Smallest: {heapq.heappop(heap)}")
print(f"Smallest: {heapq.heappop(heap)}")
print(f"Heap after pops: {heap}")

# Heapify existing list
nums = [5, 1, 3, 2, 4]
heapq.heapify(nums)
print(f"\nHeapified: {nums}")

# Get n largest/smallest
nums = [5, 1, 3, 2, 4, 8, 6, 7]
print(f"3 largest: {heapq.nlargest(3, nums)}")
print(f"3 smallest: {heapq.nsmallest(3, nums)}")

# Merge sorted sequences
list1 = [1, 3, 5, 7]
list2 = [2, 4, 6, 8]
merged = list(heapq.merge(list1, list2))
print(f"Merged: {merged}")

Output:

Architecture Diagram

Heap: [1, 2, 3, 5, 4]
Smallest: 1
Smallest: 2
Heap after pops: [3, 4, 5]

Heapified: [1, 2, 3, 5, 4]
3 largest: [8, 7, 6]
3 smallest: [1, 2, 3]
Merged: [1, 2, 3, 4, 5, 6, 7, 8]

Max Heap

import heapq

# heapq only provides min heap
# For max heap, negate values

class MaxHeap:
    """Max heap using heapq."""
    
    def __init__(self):
        self.heap = []
    
    def push(self, val):
        heapq.heappush(self.heap, -val)
    
    def pop(self):
        return -heapq.heappop(self.heap)
    
    def peek(self):
        return -self.heap[0]
    
    def __len__(self):
        return len(self.heap)

# Usage
mh = MaxHeap()
mh.push(5)
mh.push(1)
mh.push(3)
mh.push(2)

print(f"Max heap peek: {mh.peek()}")
print(f"Max heap pop: {mh.pop()}")
print(f"Max heap pop: {mh.pop()}")

Output:

Architecture Diagram

Max heap peek: 5
Max heap pop: 5
Max heap pop: 3

Priority Queue

import heapq
from dataclasses import dataclass

@dataclass(order=True)
class Task:
    priority: int
    name: str

class PriorityQueue:
    """Priority queue using heap."""
    
    def __init__(self):
        self.heap = []
        self.entry_count = 0
    
    def push(self, priority: int, name: str):
        """Add task with priority."""
        entry = (priority, self.entry_count, name)
        heapq.heappush(self.heap, entry)
        self.entry_count += 1
    
    def pop(self) -> tuple:
        """Remove highest priority task."""
        if self.heap:
            priority, _, name = heapq.heappop(self.heap)
            return (priority, name)
        return None

# Usage
pq = PriorityQueue()
pq.push(3, "Low priority")
pq.push(1, "High priority")
pq.push(2, "Medium priority")

while len(pq) > 0:
    priority, name = pq.pop()
    print(f"Priority {priority}: {name}")

Output:

Architecture Diagram

Priority 1: High priority
Priority 2: Medium priority
Priority 3: Low priority

Heap Performance

import heapq
import timeit
import random

# Heap operations performance
heap = list(range(100000))
heapq.heapify(heap)

# Push: O(log n)
push_time = timeit.timeit(lambda: heapq.heappush(heap, random.randint(0, 200000)), number=10000)
print(f"Push: {push_time:.4f}s")

# Pop: O(log n)
pop_time = timeit.timeit(lambda: heapq.heappop(heap), number=10000)
print(f"Pop: {pop_time:.4f}s")

# Peek: O(1)
peek_time = timeit.timeit(lambda: heap[0], number=1000000)
print(f"Peek: {peek_time:.4f}s")

# Heapify: O(n)
list_to_heapify = list(range(100000))
heapify_time = timeit.timeit(lambda: heapq.heapify(list_to_heapify), number=100)
print(f"Heapify: {heapify_time:.4f}s")

Comparison

Time Complexity

Operation	list	deque	dict	set	heap
Access by index	O(1)	O(n)	N/A	N/A	N/A
Access by key	N/A	N/A	O(1)	N/A	N/A
Search	O(n)	O(n)	O(1)	O(1)	O(n)
Insert at end	O(1)*	O(1)	O(1)	O(1)	O(log n)
Insert at begin	O(n)	O(1)	O(1)	O(1)	O(log n)
Delete	O(n)	O(n)	O(1)	O(1)	O(log n)

*amortized

When to Use What?

# list: Ordered sequence, random access
# - When you need index access
# - When order matters
# - When you need slicing

# deque: Queue/stack, both ends
# - When you need O(1) insert/delete at both ends
# - When implementing queue
# - When using as sliding window

# dict: Key-value mapping
# - When you need fast lookup by key
# - When implementing hash table
# - When counting occurrences

# set: Unique elements, set operations
# - When you need uniqueness
# - When you need fast membership testing
# - When implementing mathematical sets

# heap: Priority queue
# - When you need efficient min/max
# - When implementing priority queue
# - When you need k largest/smallest

ℹ️

Decision Guide: Choose based on your primary operation: access (list/dict), queue (deque), or priority (heap).

Advanced Patterns

Custom Data Structures

from collections import OrderedDict
import time

class LRUCache:
    """LRU Cache using OrderedDict."""
    
    def __init__(self, capacity: int):
        self.cache = OrderedDict()
        self.capacity = capacity
    
    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        return -1
    
    def put(self, key, value):
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

# Usage
cache = LRUCache(2)
cache.put(1, "a")
cache.put(2, "b")
print(cache.get(1))  # Returns "a"
cache.put(3, "c")    # Evicts key 2
print(cache.get(2))  # Returns -1

# Bloom filter (simplified)
class BloomFilter:
    """Simple Bloom filter."""
    
    def __init__(self, size: int, hash_count: int):
        self.size = size
        self.hash_count = hash_count
        self.bit_array = [0] * size
    
    def _hashes(self, item):
        return [hash(f"{item}{i}") % self.size for i in range(self.hash_count)]
    
    def add(self, item):
        for pos in self._hashes(item):
            self.bit_array[pos] = 1
    
    def __contains__(self, item):
        return all(self.bit_array[pos] for pos in self._hashes(item))

# Usage
bf = BloomFilter(1000, 3)
bf.add("hello")
bf.add("world")

print(f"'hello' in filter: {'hello' in bf}")
print(f"'world' in filter: {'world' in bf}")
print(f"'test' in filter: {'test' in bf}")  # False positive possible

Memory Optimization

import sys
from array import array

# List vs Array
list_data = [i for i in range(1000000)]
array_data = array('i', [i for i in range(1000000)])

print(f"List size: {sys.getsizeof(list_data):,} bytes")
print(f"Array size: {sys.getsizeof(array_data):,} bytes")
print(f"Array is {(1 - sys.getsizeof(array_data)/sys.getsizeof(list_data))*100:.1f}% smaller")

# Memory view
import ctypes

# Create memory view for zero-copy slicing
data = bytearray(b"Hello, World!")
view = memoryview(data)
print(f"Original: {data}")
print(f"View: {view[:5]}")
print(f"View type: {type(view)}")

Output:

Architecture Diagram

List size: 8,448,728 bytes
Array size: 4,000,048 bytes
Array is 52.7% smaller
Original: bytearray(b'Hello, World!')
View: <memory at 0x...>
View type: <class 'memoryview'>

Interview Tips

Common Follow-up Questions

"Why is dict lookup O(1)?"
- Hash function maps keys to array indices
- Direct array access is O(1)
- Collisions handled efficiently
"When would you use deque over list?"
- Queue implementations
- Sliding windows
- Operations at both ends
"How does heapq guarantee O(log n)?"
- Heap property maintained
- Array-based implementation
- Parent/child index relationships

Code Review Tips

# BAD: Using list for membership testing
if item in large_list:  # O(n)
    process(item)

# GOOD: Using set for membership testing
large_set = set(large_list)
if item in large_set:  # O(1)
    process(item)

# BAD: Using list as queue
queue = []
queue.append(item)  # O(1)
queue.pop(0)  # O(n)!

# GOOD: Using deque as queue
from collections import deque
queue = deque()
queue.append(item)  # O(1)
queue.popleft()  # O(1)

# BAD: Using list for priority queue
import heapq
pq = []
heapq.heappush(pq, (priority, item))  # O(log n)
# But no efficient way to update priority

# GOOD: Using heap for priority queue
import heapq
pq = []
heapq.heappush(pq, (priority, item))

⚠️

Common Mistake: Using list.pop(0) in tight loops causes O(n²) performance.

Summary

Data Structure	Implementation	Use Case	Key Operation
list	Dynamic array	Random access	O(1) index
deque	Doubly-linked list	Queue/stack	O(1) append/pop
dict	Hash table	Key-value	O(1) lookup
set	Hash table	Unique elements	O(1) membership
heap	Binary heap	Priority queue	O(log n) push/pop

Best Practices

Use dict for key-value lookups
Use set for membership testing
Use deque for queue operations
Use heap for priority queues
Avoid list.pop(0) in performance-critical code
Consider array.array for memory efficiency

ℹ️

Key Takeaway: Understanding data structure internals helps you choose the right one and write efficient code.

Practice Problems

LRU Cache: Implement LRU cache using OrderedDict
Bloom Filter: Create a simple Bloom filter
Priority Queue: Build priority queue with heap
Sliding Window: Implement sliding window using deque
Custom Hash Map: Implement your own hash map

Data Structure Internals: dict, set, list, deque, heap

Data Structure Internals: dict, set, list, deque, heap

Interview Question

Theoretical Foundation

Why Understand Internals?

dict (Dictionary)

Hash Table Implementation

Hash Function

Dict Performance

Ordered Dict (Python 3.7+)

set (Set)

Hash Table Implementation

Set Performance

Frozen Set

list (List)

Dynamic Array Implementation

List Operations Performance

deque (Double-Ended Queue)

Implementation

Advanced Deque Features

heap (Priority Queue)

heapq Module

Max Heap

Priority Queue

Heap Performance

Comparison

Time Complexity

When to Use What?

Advanced Patterns

Custom Data Structures

Memory Optimization

Interview Tips

Common Follow-up Questions

Code Review Tips

Summary

Best Practices

Practice Problems

Further Reading