Python Data Types and Structures
Understanding data types and structures is fundamental to efficient data manipulation in Python.
Primitive Types
# Numeric Types
x_int = 42 # int
x_float = 3.14 # float
x_complex = 2+3j # complex
# Boolean
flag = True # bool (subclass of int: True == 1, False == 0)
# String
name = "Data Science" # str
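Because `bool` is a subclass of `int`, `True` and `False` behave as 1 and 0 in arithmetic. One practical consequence is that `sum()` can count how many items satisfy a condition. A minimal sketch; the `scores` list and the threshold of 80 are made-up illustration values.

```python
# Hypothetical example data: exam scores
scores = [72, 95, 88, 61, 90]

# Each comparison yields True (1) or False (0), so sum() counts the matches
num_passing = sum(score >= 80 for score in scores)
print(num_passing)  # 3

# bool really is an int subclass
print(isinstance(True, int))  # True
print(True + True)            # 2
```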
Type Checking and Conversion
type(42) # <class 'int'>
type(3.14) # <class 'float'>
int(3.99) # 3 (truncates toward zero)
float(42) # 42.0
str(100) # '100'
bool(0) # False
bool("") # False
bool([]) # False
bool(None) # False
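"Truncates toward zero" matters for negative numbers, where `int()` and rounding down disagree. A short sketch comparing `int()`, `math.floor()`, and `round()`, plus string parsing; the input values are arbitrary.

```python
import math

# int() drops the fractional part, moving toward zero
truncated = int(-3.99)        # -3
# math.floor() always rounds down, so it differs from int() for negatives
floored = math.floor(-3.99)   # -4
# round() rounds exact halves to the nearest even integer ("banker's rounding")
halves = [round(0.5), round(1.5), round(2.5)]  # [0, 2, 2]

# int() parses integer strings but rejects decimal text
parsed = int("42")            # 42
try:
    int("3.14")
except ValueError as exc:
    print("ValueError:", exc)

# Going through float() first works, then truncates as above
parsed_decimal = int(float("3.14"))  # 3
```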
Collections Overview
Mutability Comparison
# Mutable: list, dict, set
lst = [1, 2, 3]
lst[0] = 99 # Valid: [99, 2, 3]
# Immutable: int, float, str, tuple, frozenset
tup = (1, 2, 3)
tup[0] = 99 # raises TypeError: 'tuple' object does not support item assignment
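Mutability has consequences beyond item assignment. A minimal sketch of two of them: assigning a list to a second name shares the same object (aliasing), and only hashable values such as tuples of immutables can serve as dict keys or set members. The variable names and coordinates are illustrative.

```python
# Two names bound to the same list see each other's changes (aliasing)
a = [1, 2, 3]
b = a
b.append(4)
print(a)  # [1, 2, 3, 4]

# Copy explicitly when you need an independent list
c = a.copy()
c.append(5)
print(a)  # [1, 2, 3, 4] (unchanged)

# A tuple can be a dict key; a list cannot, because it is unhashable
locations = {(40.7, -74.0): "NYC"}
try:
    bad = {[40.7, -74.0]: "NYC"}
except TypeError as exc:
    print("TypeError:", exc)
```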
Core Operations
List Operations
lst = [3, 1, 4, 1, 5, 9, 2, 6]
lst.append(7) # [3, 1, 4, 1, 5, 9, 2, 6, 7]
lst.insert(0, 0) # [0, 3, 1, 4, 1, 5, 9, 2, 6, 7]
lst.pop() # returns 7, lst = [0, 3, 1, 4, 1, 5, 9, 2, 6]
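lst.remove(1) # removes the first occurrence of 1: [0, 3, 4, 1, 5, 9, 2, 6]
lst.sort() # sorts in place: [0, 1, 2, 3, 4, 5, 6, 9]
sorted_lst = sorted(lst) # returns a new sorted list; lst is unchanged
lst.reverse() # reverses in place: [9, 6, 5, 4, 3, 2, 1, 0]
lst.extend([10, 11]) # appends each element: [9, 6, 5, 4, 3, 2, 1, 0, 10, 11]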
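A common pitfall with in-place methods such as `sort()`, `reverse()`, and `append()` is that they return `None`. A short sketch with made-up word data, also showing the `key` and `reverse` parameters that both `sort()` and `sorted()` accept.

```python
words = ["banana", "Apple", "cherry", "date"]

result = words.sort()
print(result)  # None: sort() modifies words in place
print(words)   # ['Apple', 'banana', 'cherry', 'date'] (uppercase sorts first)

# sorted() leaves its input alone and returns a new list; the sort is stable
by_length = sorted(words, key=len)
print(by_length)  # ['date', 'Apple', 'banana', 'cherry']

# Case-insensitive sort, largest first
desc = sorted(words, key=str.lower, reverse=True)
print(desc)  # ['date', 'cherry', 'banana', 'Apple']
```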
Dictionary Operations
data = {"name": "Alice", "age": 30, "city": "NYC"}
data["name"] # "Alice"
data.get("salary", 0) # 0 (default if key missing)
data["salary"] = 85000 # add new key-value pair
del data["city"] # remove key-value pair
data.keys() # dict_keys(['name', 'age', 'salary'])
data.values() # dict_values(['Alice', 30, 85000])
data.items() # dict_items([('name', 'Alice'), ('age', 30), ('salary', 85000)])
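Square-bracket lookup and `get()` differ in how they treat a missing key. A minimal sketch using the `data` dict in its final state from the lines above; the numeric filter at the end is only an illustration of a dict comprehension.

```python
data = {"name": "Alice", "age": 30, "salary": 85000}

# Square-bracket access raises KeyError for a missing key; get() does not
try:
    data["city"]
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'city'

# items() yields (key, value) pairs, convenient for unpacking in a loop
for key, value in data.items():
    print(f"{key}: {value}")

# A dict comprehension builds a new dict, here keeping only numeric values
numeric = {k: v for k, v in data.items() if isinstance(v, (int, float))}
print(numeric)  # {'age': 30, 'salary': 85000}
```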
Set Operations
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
A | B # Union: {1, 2, 3, 4, 5, 6, 7, 8}
A & B # Intersection: {4, 5}
A - B # Difference: {1, 2, 3}
A ^ B # Symmetric: {1, 2, 3, 6, 7, 8}
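Each set operator also has a method form, and sets support subset and overlap tests. A short sketch; the `values` list is illustration data, and the last line uses a common order-preserving deduplication idiom that relies on dicts keeping insertion order (Python 3.7+).

```python
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}

# Method forms accept any iterable; the operators require both operands to be sets
union = A.union([9, 10])              # {1, 2, 3, 4, 5, 9, 10}
common = A.intersection(range(4, 7))  # {4, 5}

# Subset and disjointness tests
is_subset = {4, 5} <= A               # True
no_overlap = A.isdisjoint(B)          # False (4 and 5 are shared)

# Deduplication: set() discards order, dict.fromkeys() keeps first-seen order
values = [3, 1, 3, 2, 1]
unique_any_order = set(values)                # {1, 2, 3}
unique_ordered = list(dict.fromkeys(values))  # [3, 1, 2]
```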
Time Complexity of Operations
| Operation | list | dict | set | tuple |
|---|---|---|---|---|
| Access by index/key | O(1) | O(1) avg | N/A | O(1) |
| Search (x in s) | O(n) | O(1) avg | O(1) avg | O(n) |
| Insert | O(1) amortized append, O(n) insert | O(1) avg | O(1) avg | N/A |
| Delete | O(1) pop from end, O(n) otherwise | O(1) avg | O(1) avg | N/A |
| Update | O(1) by index | O(1) avg | N/A | N/A |
| Length | O(1) | O(1) | O(1) | O(1) |
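The O(n) versus O(1) average membership rows of the table can be observed directly. A rough sketch using the standard `timeit` module; the size of 100,000 and the 100 repetitions are arbitrary choices, and absolute timings depend on the machine.

```python
import timeit

n = 100_000
as_list = list(range(n))
as_set = set(as_list)
missing = -1  # worst case for the list: every element is compared

# Time 100 membership tests against each structure
list_time = timeit.timeit(lambda: missing in as_list, number=100)
set_time = timeit.timeit(lambda: missing in as_set, number=100)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```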
Slicing Syntax
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lst[2:5] # [2, 3, 4]
lst[:3] # [0, 1, 2]
lst[7:] # [7, 8, 9]
lst[::2] # [0, 2, 4, 6, 8] (every 2nd element)
lst[::-1] # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] (reversed)
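A few more slicing behaviors are worth knowing. A minimal sketch on the same ten-element list: negative indices, out-of-range bounds, copying with `[:]`, and assigning to a slice.

```python
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Negative indices count from the end; the third value is the step
print(lst[-3:])    # [7, 8, 9]
print(lst[1:8:3])  # [1, 4, 7]

# Out-of-range slice bounds are clipped instead of raising IndexError
print(lst[8:100])  # [8, 9]

# A full slice makes a shallow copy
clone = lst[:]
clone[0] = 99
print(lst[0])      # 0

# Assigning to a slice replaces that region (its length can change)
lst[2:5] = ["a", "b"]
print(lst)         # [0, 1, 'a', 'b', 5, 6, 7, 8, 9]
```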
Unpacking and Packing
# Tuple/list unpacking
a, b, *rest = [1, 2, 3, 4, 5]
# a=1, b=2, rest=[3, 4, 5]
first, *middle, last = (10, 20, 30, 40, 50)
# first=10, middle=[20, 30, 40], last=50
# Dictionary unpacking
dict1 = {"a": 1, "b": 2}
dict2 = {"b": 3, "c": 4}
merged = {**dict1, **dict2} # {"a": 1, "b": 3, "c": 4}
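The same `*` and `**` syntax also unpacks arguments in function calls. A minimal sketch; the `describe` function is hypothetical, and the `|` merge operator at the end requires Python 3.9 or later.

```python
def describe(name, age, city="unknown"):
    # Hypothetical helper used only to demonstrate argument unpacking
    return f"{name}, {age}, {city}"

# * unpacks a sequence into positional arguments, ** a dict into keyword arguments
args = ["Alice", 30]
kwargs = {"name": "Bob", "age": 25, "city": "NYC"}
print(describe(*args))     # Alice, 30, unknown
print(describe(**kwargs))  # Bob, 25, NYC

# Swapping two variables via tuple packing and unpacking
a, b = 1, 2
a, b = b, a
print(a, b)  # 2 1

# Python 3.9+: the | operator merges dicts (the right-hand value wins)
merged = {"a": 1, "b": 2} | {"b": 3, "c": 4}
print(merged)  # {'a': 1, 'b': 3, 'c': 4}
```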
Nested Structures
# List of dicts (common in data science)
students = [
    {"name": "Alice", "grade": 95, "courses": ["ML", "Stats"]},
    {"name": "Bob", "grade": 88, "courses": ["DL", "NLP"]},
    {"name": "Charlie", "grade": 92, "courses": ["Stats", "DL"]}
]
# Access nested data
students[0]["courses"][1] # "Stats"
# Dict of lists
matrix = {
    "row1": [1, 2, 3],
    "row2": [4, 5, 6],
    "row3": [7, 8, 9]
}
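Nested structures like these are usually queried with comprehensions. A minimal sketch that reuses the `students` and `matrix` data above (redefined so the block runs on its own) to compute a mean, filter records, and extract a column.

```python
students = [
    {"name": "Alice", "grade": 95, "courses": ["ML", "Stats"]},
    {"name": "Bob", "grade": 88, "courses": ["DL", "NLP"]},
    {"name": "Charlie", "grade": 92, "courses": ["Stats", "DL"]},
]

# Mean grade across all records
mean_grade = sum(s["grade"] for s in students) / len(students)
print(round(mean_grade, 2))  # 91.67

# Names of students enrolled in a given course
stats_students = [s["name"] for s in students if "Stats" in s["courses"]]
print(stats_students)  # ['Alice', 'Charlie']

matrix = {"row1": [1, 2, 3], "row2": [4, 5, 6], "row3": [7, 8, 9]}

# Extract a "column" by taking the same index from every row
second_column = [row[1] for row in matrix.values()]
print(second_column)  # [2, 5, 8]
```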
Specialized Collections
from collections import defaultdict, Counter, namedtuple, deque
# defaultdict - auto-initializes missing keys
word_count = defaultdict(int)
for word in ["hello", "world", "hello"]:
    word_count[word] += 1
# word_count: defaultdict(<class 'int'>, {'hello': 2, 'world': 1})
# Counter - frequency counting
Counter(["a", "b", "a", "c", "a"]) # Counter({'a': 3, 'b': 1, 'c': 1})
# namedtuple - immutable with named fields
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
p.x # 3
# deque - O(1) appends and pops at both ends
dq = deque([1, 2, 3])
dq.appendleft(0) # deque([0, 1, 2, 3])
dq.pop() # returns 3
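A few further features of these types show up often in data work: a bounded `deque` as a sliding window, `Counter.most_common()`, and `namedtuple._replace()`. A minimal sketch; `moving_average` is a hypothetical helper written for illustration, not part of `collections`.

```python
from collections import Counter, deque, namedtuple

def moving_average(values, k=3):
    # deque(maxlen=k) drops the oldest item automatically: a fixed-size window
    window = deque(maxlen=k)
    averages = []
    for v in values:
        window.append(v)
        if len(window) == k:
            averages.append(sum(window) / k)
    return averages

print(moving_average([1, 2, 3, 4, 5]))  # [2.0, 3.0, 4.0]

# most_common(n) returns the n highest counts; ties keep first-seen order
counts = Counter("mississippi")
print(counts.most_common(2))  # [('i', 4), ('s', 4)]

# namedtuple fields are read-only; _replace() returns a modified copy
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
moved = p._replace(x=10)
print(p, moved)  # Point(x=3, y=4) Point(x=10, y=4)
```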
Summary
- Lists → ordered, mutable sequences (most flexible)
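- Tuples → ordered, immutable sequences (hashable when all their elements are; lighter and faster to create than lists)
- Dicts → key-value mappings (O(1) average lookup; insertion-ordered since Python 3.7)
- Sets → unordered collections of unique, hashable elements (fast membership tests, set algebra)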
- Choose the right structure based on mutability needs and access patterns.