CW

Python NumPy Essentials

Module 2: NumPy & PandasFree Lesson

Advertisement

Python NumPy Essentials

Master NumPy — array operations, broadcasting, vectorization, and linear algebra.

NumPy EcosystemndarrayContiguous MemoryHomogeneous TypesFixed SizeBroadcastingShape AlignmentNo Data CopyImplicit ExpansionVectorizationElement-wise OpsC-level Loops10-100x FasterLinear AlgebraDot ProductEigen DecompositionSVD / QRRandom SamplingDistributions / SeedsArray Creationzeros / ones / arangeIndexing and SlicingFancy / BooleanMemory LayoutRow-major (C) / Col-major

Why NumPy?

DfNumPy

NumPy (Numerical Python) is the foundational library for scientific computing in Python. It provides a high-performance multi-dimensional array object (ndarray) and tools for working with these arrays. NumPy arrays are stored as contiguous blocks of memory, enabling vectorized operations that are 10-100x faster than Python lists.

NumPy vs Python Lists — Performance Comparison

import numpy as np
import time

size = 1_000_000

python_list = list(range(size))
start = time.time()
python_result = [x * 2 for x in python_list]
list_time = time.time() - start

numpy_array = np.arange(size)
start = time.time()
numpy_result = numpy_array * 2
numpy_time = time.time() - start

print(f"Python list time: {list_time:.4f}s")
print(f"NumPy array time: {numpy_time:.4f}s")
print(f"NumPy is {list_time/numpy_time:.1f}x faster")

Output:

Architecture Diagram
Python list time: 0.1234s
NumPy array time: 0.0021s
NumPy is 58.8x faster

Why NumPy Is Fast

NumPy's speed comes from three key design choices:

  1. Contiguous memory layout: Arrays are stored in continuous memory blocks, enabling CPU cache efficiency
  2. Vectorization: Operations are applied element-wise without Python loops (loops run in C)
  3. Broadcasting: Arithmetic operations automatically handle arrays of different shapes

Memory Contiguity and Cache Efficiency

textTimetextlist=O(n)times(textPythonoverhead+textpointerchase)textTimetextnumpy=O(n)timestextSIMDtextthroughput\\text{Time}_{\\text{list}} = O(n) \\times (\\text{Python overhead} + \\text{pointer chase}) \\\\ \\text{Time}_{\\text{numpy}} = O(n) \\times \\text{SIMD}_{\\text{throughput}}

Here,

  • nn=Number of elements in the array
  • PythonoverheadPython overhead=Object boxing, type checking, reference counting (~100 ns/op)
  • pointerchasepointer chase=Non-contiguous memory access causing cache misses
  • SIMDthroughputSIMD_{throughput}=Single Instruction Multiple Data — CPU processes 4-16 elements per cycle

ndarray — The Core Data Structure

Dfndarray

An ndarray is a homogeneous, fixed-size, multi-dimensional array. Each element occupies the same number of bytes, enabling direct pointer arithmetic for indexing. The array metadata (shape, strides, dtype) is stored separately from the data buffer.

import numpy as np

# From Python lists
arr1d = np.array([1, 2, 3, 4, 5])
print(f"Shape: {arr1d.shape}")        # (5,)
print(f"ndim: {arr1d.ndim}")          # 1
print(f"dtype: {arr1d.dtype}")        # int64
print(f"itemsize: {arr1d.itemsize}")  # 8 bytes
print(f"nbytes: {arr1d.nbytes}")      # 40 bytes

# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Shape: {arr2d.shape}")        # (2, 3)
print(f"Strides: {arr2d.strides}")    # (24, 8) — bytes to next row/col

Understanding Strides

Strides tell NumPy how many bytes to skip in memory to move to the next element along each axis. For a (m, n) array of int64: strides = (n*8, 8). Transposing only swaps strides — no data is moved.

Array Creation Functions

# Filled arrays
zeros_2x3 = np.zeros((2, 3))            # float64 by default
ones_int = np.ones((4, 2), dtype=int)   # specify dtype
full_val = np.full((3, 3), 7.5)         # fill with constant
empty_arr = np.empty((2, 2))            # uninitialized (fast)

# Sequences
linear = np.linspace(0, 1, 100)          # 100 points from 0 to 1 (inclusive)
aranged = np.arange(0, 10, 0.5)          # step-based

# Identity and diagonal
eye = np.eye(4, dtype=int)
diag = np.diag([1, 2, 3, 4])

# From existing data
copied = np.array(arr1d, copy=True)
asarray = np.asarray(arr1d, dtype=float)

Broadcasting

DfBroadcasting

Broadcasting is NumPy's mechanism for performing arithmetic operations on arrays with different shapes. It automatically "stretches" the smaller array across the larger array so that they have compatible shapes, without copying data.

Broadcasting Rules — Step by StepStep 1: Align Dimensions (pad with 1s on left)A: (2, 3)B: (1, 3)Step 2: Check Compatibility (equal or 1)dim 0: max(2, 1) = 2 → broadcastdim 1: max(3, 3) = 3 → matchStep 3: Result Shape:(2, 3)
# Broadcasting examples
arr2d = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
arr1d = np.array([10, 20, 30])             # shape (3,)

result = arr2d + arr1d
print(result)
# [[11, 22, 33],
#  [14, 25, 36]]

Broadcasting Compatibility Rule

textTwodimensionsarecompatiblewhen:di=djquadtextorquaddi=1quadtextorquaddj=1\\text{Two dimensions are compatible when:} \\\\ d_i = d_j \\quad \\text{or} \\quad d_i = 1 \\quad \\text{or} \\quad d_j = 1

Here,

  • did_i=Size of dimension i in the first array
  • djd_j=Size of dimension j in the second array
  • 11=Singleton dimension — broadcasts to match the other

When Broadcasting Fails

a = np.ones((3, 4))
b = np.ones((3,))
# ValueError: operands could not be broadcast together with shapes (3,4) (3,)
# Reason: (3,4) vs (1,3) → dim 1: 4 != 3 and neither is 1

# Fix: reshape b to (1,3) or (3,1) depending on intent
b_row = b.reshape(1, 3)  # (1, 3) → broadcasts across rows
b_col = b.reshape(3, 1)  # (3, 1) → broadcasts across columns

Vectorization

DfVectorization

Vectorization is the process of applying operations to entire arrays at once, without explicit Python loops. This is possible because NumPy operations are implemented in optimized C code that operates on contiguous memory blocks.

a = np.array([1, 2, 3, 4, 5])
b = np.array([10, 20, 30, 40, 50])

print(f"Addition: {a + b}")        # [11, 22, 33, 44, 55]
print(f"Subtraction: {b - a}")     # [9, 18, 27, 36, 45]
print(f"Multiplication: {a * b}")  # [10, 40, 90, 160, 250]
print(f"Division: {b / a}")        # [10, 10, 10, 10, 10]
print(f"Power: {a ** 2}")          # [1, 4, 9, 16, 25]

# Scalar broadcast
print(f"Add scalar: {a + 10}")     # [11, 12, 13, 14, 15]
print(f"Multiply scalar: {a * 3}") # [3, 6, 9, 12, 15]

Vectorized vs Loop — Practical Benchmark

# Non-vectorized (slow)
def euclidean_distance_loop(x1, y1, x2, y2):
    return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

# Vectorized (fast)
def euclidean_distance_vec(x1, y1, x2, y2):
    return np.sqrt((x2 - x1)**2 + (y2 - y1)**2)

n = 1_000_000
x1 = np.random.rand(n)
y1 = np.random.rand(n)
x2 = np.random.rand(n)
y2 = np.random.rand(n)

# Both produce identical results, but vectorized is ~50x faster

Matrix Operations

Matrix Multiplication

Cij=sumk=1nAikcdotBkjC_{ij} = \\sum_{k=1}^{n} A_{ik} \\cdot B_{kj}

Here,

  • AA=First matrix (m × n)
  • BB=Second matrix (n × p)
  • CC=Result matrix (m × p)
  • AikA_{ik}=Element in row i, column k of A
  • BkjB_{kj}=Element in row k, column j of B
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix multiplication (3 equivalent forms)
C1 = A @ B
C2 = np.dot(A, B)
C3 = np.matmul(A, B)

# Verify
assert np.allclose(C1, C2) and np.allclose(C2, C3)

# Element-wise (Hadamard product)
D = A * B  # [[5, 12], [21, 32]]

# Transpose
print(A.T)  # [[1, 3], [2, 4]]

# Determinant
det = np.linalg.det(A)  # -2.0

# Inverse (only if det != 0)
inv = np.linalg.inv(A)

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print(f"λ = {eigenvalues}")
print(f"v = {eigenvectors}")

# SVD
U, S, Vt = np.linalg.svd(A)
# A = U @ diag(S) @ Vt

Matrix Multiplication Rules

For A×BA \times B to be defined: columns of AA must equal rows of BB. If AA is (m×n)(m \times n) and BB is (n×p)(n \times p), result is (m×p)(m \times p). Matrix multiplication is not commutative: ABBAAB \neq BA in general.

Eigenvalue Decomposition

Av=lambdavAv = \\lambda v

Here,

  • AA=Square matrix (n × n)
  • λ\lambda=Eigenvalue (scalar)
  • vv=Eigenvector (non-zero vector)

Indexing and Slicing

arr = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

# Single element
print(arr[1, 2])       # 7

# Slice: rows 0:2, cols 1:3
print(arr[0:2, 1:3])
# [[2, 3],
#  [6, 7]]

# Column extraction
print(arr[:, 2])       # [3, 7, 11]

# Boolean indexing
data = np.array([10, 25, 30, 15, 40])
mask = data > 20
print(data[mask])       # [25, 30, 40]

# Fancy indexing
indices = [0, 2, 4]
print(data[indices])    # [10, 30, 40]

# Conditional assignment
np.where(data > 20, data, 0)  # [0, 25, 30, 0, 40]

Boolean Indexing as Matrix Multiplication

Boolean indexing arr[mask] is equivalent to element-wise multiplication with a boolean mask broadcast to the array shape, followed by flattening. This is mathematically: textresult=textflatten(AodotM)\\text{result} = \\text{flatten}(A \\odot M).

Linear Algebra Applications

Solving Linear Systems

Linear System

Ax=bimpliesx=A1bAx = b \\implies x = A^{-1}b

Here,

  • AA=Coefficient matrix (n × n)
  • xx=Unknown vector (n × 1)
  • bb=Right-hand side vector (n × 1)
# Solve: 2x + 3y = 8, x + 2y = 5
A = np.array([[2, 3], [1, 2]])
b = np.array([8, 5])

x = np.linalg.solve(A, b)
print(f"x = {x[0]:.1f}, y = {x[1]:.1f}")  # x = 1.0, y = 2.0

# Verify: A @ x should equal b
print(f"Verification: {A @ x}")  # [8. 5.]

Least Squares (Overdetermined Systems)

Ordinary Least Squares

hatbeta=(XTX)1XTy\\hat{\\beta} = (X^TX)^{-1}X^Ty

Here,

  • XX=Design matrix (n × p) with n observations, p features
  • yy=Response vector (n × 1)
  • β^\hat{\beta}=OLS estimate of coefficients
# Generate synthetic data: y = 3x + 2 + noise
np.random.seed(42)
X = np.random.rand(100, 1)
y = 3 * X + 2 + np.random.randn(100, 1) * 0.3

# Add intercept column
X_design = np.hstack([np.ones((100, 1)), X])

# OLS solution
beta_hat = np.linalg.inv(X_design.T @ X_design) @ X_design.T @ y
print(f"Intercept: {beta_hat[0, 0]:.2f}, Slope: {beta_hat[1, 0]:.2f}")
# Intercept ≈ 2.00, Slope ≈ 3.00

Random Sampling

# Reproducibility
np.random.seed(42)

# Uniform distribution [0, 1)
uniform = np.random.rand(3, 3)

# Standard normal (μ=0, σ=1)
normal = np.random.randn(3, 3)

# Custom normal
normal_custom = np.random.normal(loc=5.0, scale=2.0, size=(3, 3))

# Integers
dice = np.random.randint(1, 7, size=(4, 4))

# Choice sampling
population = np.array([10, 20, 30, 40, 50])
sample = np.random.choice(population, size=3, replace=False)

# Shuffle
np.random.shuffle(dice)  # in-place

# Permutation
perm = np.random.permutation(population)

# New-style Generator (recommended)
rng = np.random.default_rng(seed=42)
samples = rng.normal(0, 1, size=1000)

Random Generator Evolution

np.random.seed() and np.random.randn() use the legacy RandomState. The modern np.random.default_rng() uses PCG64, which is faster, has better statistical properties, and supports newer distributions. Prefer default_rng() for new code.

Useful NumPy Functions

arr = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3])

# Aggregation
print(f"Sum: {np.sum(arr)}")           # 40
print(f"Mean: {np.mean(arr)}")         # 4.0
print(f"Median: {np.median(arr)}")     # 3.5
print(f"Std Dev: {np.std(arr):.2f}")   # 2.16
print(f"Min: {np.min(arr)}")           # 1
print(f"Max: {np.max(arr)}")           # 9
print(f"Argmin: {np.argmin(arr)}")     # 1
print(f"Argmax: {np.argmax(arr)}")     # 5

# Cumulative
print(f"Cumsum: {np.cumsum(arr)}")

# Sorting
print(f"Sorted: {np.sort(arr)}")

# Unique values
arr2 = np.array([1, 1, 2, 2, 3, 3, 3])
values, counts = np.unique(arr2, return_counts=True)
print(f"Values: {values}, Counts: {counts}")

# Where (conditional)
scores = np.array([85, 92, 78, 95, 88])
grades = np.where(scores >= 90, 'A',
         np.where(scores >= 80, 'B', 'C'))
print(f"Grades: {grades}")

# Axis-wise operations
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Row sums: {matrix.sum(axis=1)}")    # [6, 15]
print(f"Col means: {matrix.mean(axis=0)}")  # [2.5, 3.5, 4.5]

Practical Example: Data Normalization

Min-Max Normalization

xtextnormalized=fracxmin(x)max(x)min(x)x_{\\text{normalized}} = \\frac{x - \\min(x)}{\\max(x) - \\min(x)}

Here,

  • xx=Original data value
  • min(x)min(x)=Minimum value in the dataset
  • max(x)max(x)=Maximum value in the dataset
  • xnormalizedx_normalized=Normalized value in [0, 1]

Z-Score Standardization

z=fracxmusigmaz = \\frac{x - \\mu}{\\sigma}

Here,

  • xx=Original data value
  • μ\mu=Population mean
  • σ\sigma=Population standard deviation
  • zz=Standardized score (number of std devs from mean)
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[25, 50000], [30, 80000], [35, 120000], [40, 95000]])

# Min-Max
X_minmax = MinMaxScaler().fit_transform(X)

# Z-Score
X_standard = StandardScaler().fit_transform(X)
print(f"Mean: {X_standard.mean(axis=0)}")  # [0, 0]
print(f"Std:  {X_standard.std(axis=0)}")   # [1, 1]
ScenarioUseWhy
Neural network with ReLUMin-MaxKeeps values positive, bounded range
Linear/Logistic RegressionZ-ScoreAssumptions align with standardized data
KNN / K-MeansZ-ScoreDistance metrics need comparable scales
PCAZ-ScoreVariance scaling must be equal
Tree-based modelsNeitherTrees are scale-invariant

Key Takeaways

Summary: NumPy Essentials

  1. NumPy arrays are significantly faster than Python lists due to contiguous memory and vectorized operations
  2. Vectorization eliminates explicit loops — operations are applied element-wise in optimized C
  3. Broadcasting enables operations on arrays of different shapes without data copying
  4. Matrix multiplication uses @ operator; remember dimension rules (m×nm \times n times n×pn \times p)
  5. Master indexing and slicing for efficient data manipulation (label-based, position-based, boolean)
  6. Normalization is critical for distance-based algorithms — choose Min-Max or Z-Score based on your model's assumptions
  7. NumPy is the foundation for Pandas, Scikit-learn, TensorFlow, and PyTorch

Practice Exercise

  1. Create a 5x5 random matrix and compute its eigenvalues and eigenvectors. Verify that λi=tr(A)\sum \lambda_i = \text{tr}(A) and λi=det(A)\prod \lambda_i = \det(A).
  2. Implement row-wise and column-wise L2 normalization using broadcasting: xtextnorm=xx2x_{\\text{norm}} = \frac{x}{\|x\|_2}.
  3. Given two sets of 3D points, compute the pairwise Euclidean distance matrix using broadcasting: dij=k(xikxjk)2d_{ij} = \sqrt{\sum_k (x_{ik} - x_{jk})^2}.
  4. Implement a vectorized softmax function: σ(zi)=ezijezj\sigma(z_i) = \frac{e^{z_i}}{\sum_j e^{z_j}}. Verify numerical stability by subtracting the max before exponentiation.
  5. Use SVD to perform rank-kk approximation of a random 100×100100 \times 100 matrix. Compute the relative reconstruction error for k=5,10,20,50k = 5, 10, 20, 50.

Advertisement

Need Expert Data Science Help?

Get personalized tutoring, project support, or professional consulting.

Advertisement