Python NumPy Essentials
Master NumPy — array operations, broadcasting, vectorization, and linear algebra.
Why NumPy?
DfNumPy
NumPy (Numerical Python) is the foundational library for scientific computing in Python. It provides a high-performance multi-dimensional array object (ndarray) and tools for working with these arrays. NumPy arrays are stored as contiguous blocks of memory, enabling vectorized operations that are 10-100x faster than Python lists.
NumPy vs Python Lists — Performance Comparison
import numpy as np
import time
size = 1_000_000
python_list = list(range(size))
start = time.time()
python_result = [x * 2 for x in python_list]
list_time = time.time() - start
numpy_array = np.arange(size)
start = time.time()
numpy_result = numpy_array * 2
numpy_time = time.time() - start
print(f"Python list time: {list_time:.4f}s")
print(f"NumPy array time: {numpy_time:.4f}s")
print(f"NumPy is {list_time/numpy_time:.1f}x faster")
Output:
Python list time: 0.1234s
NumPy array time: 0.0021s
NumPy is 58.8x faster
Why NumPy Is Fast
NumPy's speed comes from three key design choices:
- Contiguous memory layout: Arrays are stored in continuous memory blocks, enabling CPU cache efficiency
- Vectorization: Operations are applied element-wise without Python loops (loops run in C)
- Broadcasting: Arithmetic operations automatically handle arrays of different shapes
Memory Contiguity and Cache Efficiency
Here,
- =Number of elements in the array
- =Object boxing, type checking, reference counting (~100 ns/op)
- =Non-contiguous memory access causing cache misses
- =Single Instruction Multiple Data — CPU processes 4-16 elements per cycle
ndarray — The Core Data Structure
Dfndarray
An ndarray is a homogeneous, fixed-size, multi-dimensional array. Each element occupies the same number of bytes, enabling direct pointer arithmetic for indexing. The array metadata (shape, strides, dtype) is stored separately from the data buffer.
import numpy as np
# From Python lists
arr1d = np.array([1, 2, 3, 4, 5])
print(f"Shape: {arr1d.shape}") # (5,)
print(f"ndim: {arr1d.ndim}") # 1
print(f"dtype: {arr1d.dtype}") # int64
print(f"itemsize: {arr1d.itemsize}") # 8 bytes
print(f"nbytes: {arr1d.nbytes}") # 40 bytes
# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Shape: {arr2d.shape}") # (2, 3)
print(f"Strides: {arr2d.strides}") # (24, 8) — bytes to next row/col
Understanding Strides
Strides tell NumPy how many bytes to skip in memory to move to the next element along each axis. For a (m, n) array of int64: strides = (n*8, 8). Transposing only swaps strides — no data is moved.
Array Creation Functions
# Filled arrays
zeros_2x3 = np.zeros((2, 3)) # float64 by default
ones_int = np.ones((4, 2), dtype=int) # specify dtype
full_val = np.full((3, 3), 7.5) # fill with constant
empty_arr = np.empty((2, 2)) # uninitialized (fast)
# Sequences
linear = np.linspace(0, 1, 100) # 100 points from 0 to 1 (inclusive)
aranged = np.arange(0, 10, 0.5) # step-based
# Identity and diagonal
eye = np.eye(4, dtype=int)
diag = np.diag([1, 2, 3, 4])
# From existing data
copied = np.array(arr1d, copy=True)
asarray = np.asarray(arr1d, dtype=float)
Broadcasting
DfBroadcasting
Broadcasting is NumPy's mechanism for performing arithmetic operations on arrays with different shapes. It automatically "stretches" the smaller array across the larger array so that they have compatible shapes, without copying data.
# Broadcasting examples
arr2d = np.array([[1, 2, 3], [4, 5, 6]]) # shape (2, 3)
arr1d = np.array([10, 20, 30]) # shape (3,)
result = arr2d + arr1d
print(result)
# [[11, 22, 33],
# [14, 25, 36]]
Broadcasting Compatibility Rule
Here,
- =Size of dimension i in the first array
- =Size of dimension j in the second array
- =Singleton dimension — broadcasts to match the other
When Broadcasting Fails
a = np.ones((3, 4))
b = np.ones((3,))
# ValueError: operands could not be broadcast together with shapes (3,4) (3,)
# Reason: (3,4) vs (1,3) → dim 1: 4 != 3 and neither is 1
# Fix: reshape b to (1,3) or (3,1) depending on intent
b_row = b.reshape(1, 3) # (1, 3) → broadcasts across rows
b_col = b.reshape(3, 1) # (3, 1) → broadcasts across columns
Vectorization
DfVectorization
Vectorization is the process of applying operations to entire arrays at once, without explicit Python loops. This is possible because NumPy operations are implemented in optimized C code that operates on contiguous memory blocks.
a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])
print(f"Addition: {a + b}") # [11, 22, 33, 44, 55]
print(f"Subtraction: {b - a}") # [9, 18, 27, 36, 45]
print(f"Multiplication: {a * b}") # [10, 40, 90, 160, 250]
print(f"Division: {b / a}") # [10, 10, 10, 10, 10]
print(f"Power: {a ** 2}") # [1, 4, 9, 16, 25]
# Scalar broadcast
print(f"Add scalar: {a + 10}") # [11, 12, 13, 14, 15]
print(f"Multiply scalar: {a * 3}") # [3, 6, 9, 12, 15]
Vectorized vs Loop — Practical Benchmark
# Non-vectorized (slow)
def euclidean_distance_loop(x1, y1, x2, y2):
return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
# Vectorized (fast)
def euclidean_distance_vec(x1, y1, x2, y2):
return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)
n = 1_000_000
x1 = np.random.rand(n)
y1 = np.random.rand(n)
x2 = np.random.rand(n)
y2 = np.random.rand(n)
# Both produce identical results, but vectorized is ~50x faster
Matrix Operations
Matrix Multiplication
Here,
- =First matrix (m × n)
- =Second matrix (n × p)
- =Result matrix (m × p)
- =Element in row i, column k of A
- =Element in row k, column j of B
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
# Matrix multiplication (3 equivalent forms)
C1 = A @ B
C2 = np.dot(A, B)
C3 = np.matmul(A, B)
# Verify
assert np.allclose(C1, C2) and np.allclose(C2, C3)
# Element-wise (Hadamard product)
D = A * B # [[5, 12], [21, 32]]
# Transpose
print(A.T) # [[1, 3], [2, 4]]
# Determinant
det = np.linalg.det(A) # -2.0
# Inverse (only if det != 0)
inv = np.linalg.inv(A)
# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"λ = {eigenvalues}")
print(f"v = {eigenvectors}")
# SVD
U, S, Vt = np.linalg.svd(A)
# A = U @ diag(S) @ Vt
Matrix Multiplication Rules
For to be defined: columns of must equal rows of . If is and is , result is . Matrix multiplication is not commutative: in general.
Eigenvalue Decomposition
Here,
- =Square matrix (n × n)
- =Eigenvalue (scalar)
- =Eigenvector (non-zero vector)
Indexing and Slicing
arr = np.array([[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12]])
# Single element
print(arr[1, 2]) # 7
# Slice: rows 0:2, cols 1:3
print(arr[0:2, 1:3])
# [[2, 3],
# [6, 7]]
# Column extraction
print(arr[:, 2]) # [3, 7, 11]
# Boolean indexing
data = np.array([10, 25, 30, 15, 40])
mask = data > 20
print(data[mask]) # [25, 30, 40]
# Fancy indexing
indices = [0, 2, 4]
print(data[indices]) # [10, 30, 40]
# Conditional assignment
np.where(data > 20, data, 0) # [0, 25, 30, 0, 40]
Boolean Indexing as Matrix Multiplication
Boolean indexing arr[mask] is equivalent to element-wise multiplication with a boolean mask broadcast to the array shape, followed by flattening. This is mathematically: .
Linear Algebra Applications
Solving Linear Systems
Linear System
Here,
- =Coefficient matrix (n × n)
- =Unknown vector (n × 1)
- =Right-hand side vector (n × 1)
# Solve: 2x + 3y = 8, x + 2y = 5
A = np.array([[2, 3], [1, 2]])
b = np.array([8, 5])
x = np.linalg.solve(A, b)
print(f"x = {x[0]:.1f}, y = {x[1]:.1f}") # x = 1.0, y = 2.0
# Verify: A @ x should equal b
print(f"Verification: {A @ x}") # [8. 5.]
Least Squares (Overdetermined Systems)
Ordinary Least Squares
Here,
- =Design matrix (n × p) with n observations, p features
- =Response vector (n × 1)
- =OLS estimate of coefficients
# Generate synthetic data: y = 3x + 2 + noise
np.random.seed(42)
X = np.random.rand(100, 1)
y = 3 * X + 2 + np.random.randn(100, 1) * 0.3
# Add intercept column
X_design = np.hstack([np.ones((100, 1)), X])
# OLS solution
beta_hat = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y
print(f"Intercept: {beta_hat[0, 0]:.2f}, Slope: {beta_hat[1, 0]:.2f}")
# Intercept ≈ 2.00, Slope ≈ 3.00
Random Sampling
# Reproducibility
np.random.seed(42)
# Uniform distribution [0, 1)
uniform = np.random.rand(3, 3)
# Standard normal (μ=0, σ=1)
normal = np.random.randn(3, 3)
# Custom normal
normal_custom = np.random.normal(loc=5.0, scale=2.0, size=(3, 3))
# Integers
dice = np.random.randint(1, 7, size=(4, 4))
# Choice sampling
population = np.array([10, 20, 30, 40, 50])
sample = np.random.choice(population, size=3, replace=False)
# Shuffle
np.random.shuffle(dice) # in-place
# Permutation
perm = np.random.permutation(population)
# New-style Generator (recommended)
rng = np.random.default_rng(seed=42)
samples = rng.normal(0, 1, size=1000)
Random Generator Evolution
np.random.seed() and np.random.randn() use the legacy RandomState. The modern np.random.default_rng() uses PCG64, which is faster, has better statistical properties, and supports newer distributions. Prefer default_rng() for new code.
Useful NumPy Functions
arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])
# Aggregation
print(f"Sum: {np.sum(arr)}") # 40
print(f"Mean: {np.mean(arr)}") # 4.0
print(f"Median: {np.median(arr)}") # 3.5
print(f"Std Dev: {np.std(arr):.2f}") # 2.16
print(f"Min: {np.min(arr)}") # 1
print(f"Max: {np.max(arr)}") # 9
print(f"Argmin: {np.argmin(arr)}") # 1
print(f"Argmax: {np.argmax(arr)}") # 5
# Cumulative
print(f"Cumsum: {np.cumsum(arr)}")
# Sorting
print(f"Sorted: {np.sort(arr)}")
# Unique values
arr2 = np.array([1, 1, 2, 2, 3, 3, 3])
values, counts = np.unique(arr2, return_counts=True)
print(f"Values: {values}, Counts: {counts}")
# Where (conditional)
scores = np.array([85, 92, 78, 95, 88])
grades = np.where(scores >= 90, 'A',
np.where(scores >= 80, 'B', 'C'))
print(f"Grades: {grades}")
# Axis-wise operations
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Row sums: {matrix.sum(axis=1)}") # [6, 15]
print(f"Col means: {matrix.mean(axis=0)}") # [2.5, 3.5, 4.5]
Practical Example: Data Normalization
Min-Max Normalization
Here,
- =Original data value
- =Minimum value in the dataset
- =Maximum value in the dataset
- =Normalized value in [0, 1]
Z-Score Standardization
Here,
- =Original data value
- =Population mean
- =Population standard deviation
- =Standardized score (number of std devs from mean)
from sklearn.preprocessing import MinMaxScaler, StandardScaler
X = np.array([[25, 50000], [30, 80000], [35, 120000], [40, 95000]])
# Min-Max
X_minmax = MinMaxScaler().fit_transform(X)
# Z-Score
X_standard = StandardScaler().fit_transform(X)
print(f"Mean: {X_standard.mean(axis=0)}") # [0, 0]
print(f"Std: {X_standard.std(axis=0)}") # [1, 1]
| Scenario | Use | Why |
|---|---|---|
| Neural network with ReLU | Min-Max | Keeps values positive, bounded range |
| Linear/Logistic Regression | Z-Score | Assumptions align with standardized data |
| KNN / K-Means | Z-Score | Distance metrics need comparable scales |
| PCA | Z-Score | Variance scaling must be equal |
| Tree-based models | Neither | Trees are scale-invariant |
Key Takeaways
Summary: NumPy Essentials
- NumPy arrays are significantly faster than Python lists due to contiguous memory and vectorized operations
- Vectorization eliminates explicit loops — operations are applied element-wise in optimized C
- Broadcasting enables operations on arrays of different shapes without data copying
- Matrix multiplication uses
@operator; remember dimension rules ( times ) - Master indexing and slicing for efficient data manipulation (label-based, position-based, boolean)
- Normalization is critical for distance-based algorithms — choose Min-Max or Z-Score based on your model's assumptions
- NumPy is the foundation for Pandas, Scikit-learn, TensorFlow, and PyTorch
Practice Exercise
- Create a 5x5 random matrix and compute its eigenvalues and eigenvectors. Verify that and .
- Implement row-wise and column-wise L2 normalization using broadcasting: .
- Given two sets of 3D points, compute the pairwise Euclidean distance matrix using broadcasting: .
- Implement a vectorized softmax function: . Verify numerical stability by subtracting the max before exponentiation.
- Use SVD to perform rank- approximation of a random matrix. Compute the relative reconstruction error for .