Vectors and Vector Spaces

ℹ️ Why It Matters

Vectors are the basic building blocks for representing data in machine learning. A data point such as an image, a word, or a patient record is typically encoded as a vector of numbers before a model can process it. Understanding vectors deeply is non-negotiable for ML engineers.

What is a Vector?

A vector is an ordered list of numbers that represents both magnitude and direction. Think of it as an arrow pointing to a location in space.

Definition: Vector

A vector is an ordered tuple of numbers that represents a point or direction in space. A vector in n-dimensional space is written as:

Vector Notation

\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n

Here,

$\vec{v}$ =The vector (an ordered list of numbers)
$v_1, v_2, \ldots, v_n$ =Individual components of the vector
$\mathbb{R}^n$ =n-dimensional real coordinate space

Real-World Analogy: A vector is like a GPS coordinate: it pins down exactly where something is relative to a reference point. [3, 4] means 3 steps east and 4 steps north of the origin.

Types of Vectors

Type	Description	Example	Dimension
2D Vector	Two components (x, y)	`[3, 4]`	2
3D Vector	Three components (x, y, z)	`[1, 2, 3]`	3
nD Vector	n components	`[v₁, v₂, ..., vₙ]`	n
Row Vector	Written horizontally	`[1, 2, 3]`	1×n
Column Vector	Written vertically	`[1; 2; 3]`	n×1

⚠️ Convention

In ML, we typically work with column vectors (n×1 matrices). When someone says "vector" without qualification, they usually mean column vector.

Vector Operations

Vector Addition

Two vectors of the same size can be added by adding their corresponding components.

Vector Addition

\vec{u} + \vec{v} = \begin{bmatrix} u_1 + v_1 \\ u_2 + v_2 \\ \vdots \\ u_n + v_n \end{bmatrix}

Here,

$\vec{u}, \vec{v}$ =Two vectors of the same dimension
$\vec{u} + \vec{v}$ =Resultant vector with summed components

📝Example: Vector Addition

If $\vec{u} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ , then:

\vec{u} + \vec{v} = \begin{bmatrix} 1 + 3 \\ 2 + 4 \end{bmatrix} = \begin{bmatrix} 4 \\ 6 \end{bmatrix}

Geometric Interpretation: Vector addition follows the "tip-to-tail" rule: place the tail of $\vec{v}$ at the tip of $\vec{u}$. The resultant $\vec{u} + \vec{v}$ runs from the tail of $\vec{u}$ to the tip of $\vec{v}$.

Properties of Vector Addition:

Commutative: $\vec{u} + \vec{v} = \vec{v} + \vec{u}$
Associative: $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$
Identity: $\vec{v} + \vec{0} = \vec{v}$
Inverse: $\vec{v} + (-\vec{v}) = \vec{0}$
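The addition rule and its four properties can be checked numerically. This is a minimal sketch that uses NumPy arrays as vectors; `w` is an extra vector introduced only for the associativity check.

```python
import numpy as np

u = np.array([1, 2])
v = np.array([3, 4])
w = np.array([-5, 0])  # extra vector, used only for associativity

# Component-wise addition, matching the worked example above
total = u + v
print(total.tolist())  # [4, 6]

zero = np.zeros_like(u)
assert np.array_equal(u + v, v + u)              # commutative
assert np.array_equal((u + v) + w, u + (v + w))  # associative
assert np.array_equal(v + zero, v)               # identity
assert np.array_equal(v + (-v), zero)            # inverse

# Vectors of different dimensions cannot be added
try:
    np.array([1, 2]) + np.array([3, 4, 5])
except ValueError as err:
    print("shape mismatch:", err)
```

One caution: NumPy broadcasting will silently combine a length-1 array with a vector of any length, so the automatic shape error only appears when both lengths are greater than one.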

Scalar Multiplication

A vector can be multiplied by a scalar (a single number), which scales each component.

Scalar Multiplication

c\vec{v} = \begin{bmatrix} c \cdot v_1 \\ c \cdot v_2 \\ \vdots \\ c \cdot v_n \end{bmatrix}

Here,

$c$ =A scalar (single real number)
$\vec{v}$ =The vector to be scaled
$c\vec{v}$ =The scaled vector

📝Example: Scalar Multiplication

If $\vec{v} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $c = 4$ , then:

c\vec{v} = 4 \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 8 \\ 12 \end{bmatrix}

Properties of Scalar Multiplication:

Distributive: $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$
Associative: $c(d\vec{v}) = (cd)\vec{v}$
Identity: $1 \cdot \vec{v} = \vec{v}$
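A similar sketch for scalar multiplication, again assuming NumPy arrays as vectors; `u`, `c`, and `d` are arbitrary values chosen only for the property checks.

```python
import numpy as np

v = np.array([2, 3])
u = np.array([1, -1])  # second vector, used only for distributivity
c, d = 4, -2

# Each component is multiplied by the scalar, matching the worked example
scaled = c * v
print(scaled.tolist())  # [8, 12]

assert np.array_equal(c * (u + v), c * u + c * v)  # distributive
assert np.array_equal(c * (d * v), (c * d) * v)    # associative
assert np.array_equal(1 * v, v)                    # identity
```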

Dot Product (Inner Product)

The dot product measures how much two vectors point in the same direction. It returns a scalar, not a vector.

Dot Product (Algebraic)

\vec{u} \cdot \vec{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n

Here,

$\vec{u} \cdot \vec{v}$ =The dot product (scalar result)
$u_i, v_i$ =The i-th components of vectors u and v

Geometric Interpretation:

Dot Product (Geometric)

\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos\theta

Here,

$\|\vec{u}\|, \|\vec{v}\|$ =Magnitudes of the vectors
$\theta$ =Angle between the vectors

📝Example: Dot Product

If $\vec{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 4 \\ 5 \\ 6 \end{bmatrix}$ :

\vec{u} \cdot \vec{v} = (1)(4) + (2)(5) + (3)(6) = 4 + 10 + 18 = 32

💡 Key Insight

If the dot product of two nonzero vectors is zero, the vectors are perpendicular (orthogonal). Dot products also underpin many machine learning methods: PCA, attention mechanisms, and similarity measures all rely on them.

Properties of Dot Product:

Commutative: $\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$
Distributive: $\vec{u} \cdot (\vec{v} + \vec{w}) = \vec{u} \cdot \vec{v} + \vec{u} \cdot \vec{w}$
Scalar: $(c\vec{u}) \cdot \vec{v} = c(\vec{u} \cdot \vec{v})$
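To connect the two definitions, the sketch below implements the algebraic sum directly (the helper `dot` is hypothetical, written for illustration), compares it with `np.dot` on the worked example, and then checks the geometric form on 2D vectors whose lengths and directions are chosen in advance, so the angle θ is known without using the dot product itself.

```python
import math
import numpy as np

def dot(u, v):
    """Algebraic dot product: the sum of component-wise products."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(ui * vi for ui, vi in zip(u, v))

u = [1, 2, 3]
v = [4, 5, 6]
print(dot(u, v))     # 32
print(np.dot(u, v))  # 32, NumPy agrees

# Geometric form: build 2D vectors with known lengths and directions
a, b = math.radians(70), math.radians(10)
p = [2 * math.cos(a), 2 * math.sin(a)]  # ||p|| = 2, pointing at 70 degrees
q = [3 * math.cos(b), 3 * math.sin(b)]  # ||q|| = 3, pointing at 10 degrees
theta = a - b                           # 60 degrees between them
print(round(dot(p, q), 6), round(2 * 3 * math.cos(theta), 6))  # 3.0 3.0
```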

Magnitude (Length) of a Vector

The magnitude of a vector is its length — the distance from the origin to the point it represents.

Vector Magnitude (Euclidean Norm)

\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}

Here,

$\|\vec{v}\|$ =The magnitude (length) of vector v
$v_i$ =The i-th component of vector v

📝Example: Magnitude

For $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ :

\|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5

Other Norms (Generalized Magnitude):

Norm	Formula	Name	Use Case
L1	$\|\vec{v}\|_1 = \sum |v_i|$	Manhattan	Feature selection (Lasso)
L2	$\|\vec{v}\|_2 = \sqrt{\sum v_i^2}$	Euclidean	Most common, Ridge
L∞	$\|\vec{v}\|_\infty = \max |v_i|$	Maximum	Worst-case analysis
Lp	$\|\vec{v}\|_p = (\sum |v_i|^p)^{1/p}$	General	Generalizes L1 and L2; a true norm only for p ≥ 1
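The table's formulas can be written as one function. This is a sketch; `lp_norm` is a hypothetical helper, and `np.linalg.norm` with its `ord` argument is used only to cross-check it.

```python
import numpy as np

def lp_norm(v, p):
    """General Lp norm (sum |v_i|^p)^(1/p); p = np.inf gives the max norm."""
    v = np.asarray(v, dtype=float)
    if p == np.inf:
        return float(np.max(np.abs(v)))
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

v = [3, -4]
print(lp_norm(v, 1))       # 7.0  (Manhattan)
print(lp_norm(v, 2))       # 5.0  (Euclidean)
print(lp_norm(v, np.inf))  # 4.0  (maximum)

# Cross-check against NumPy's built-in vector norms
for p in (1, 2, 3, np.inf):
    assert np.isclose(lp_norm(v, p), np.linalg.norm(v, ord=p))
```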

Unit Vector

A unit vector is a vector with magnitude 1. It preserves direction while removing scale.

Unit Vector

\hat{v} = \frac{\vec{v}}{\|\vec{v}\|}

Here,

$\hat{v}$ =The unit vector in the direction of v
$\vec{v}$ =The original vector
$\|\vec{v}\|$ =The magnitude of v

📝Example: Unit Vector

For $\vec{v} = \begin{bmatrix} 3 \\ 4 \end{bmatrix}$ with $\|\vec{v}\| = 5$ :

\hat{v} = \frac{1}{5}\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 0.8 \end{bmatrix}

Verification: $\|\hat{v}\| = \sqrt{0.6^2 + 0.8^2} = \sqrt{0.36 + 0.64} = 1$ ✓

ℹ️ Why Unit Vectors Matter

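Unit vectors represent direction only. Some attention variants normalize query and key vectors to unit length so that their dot product is exactly the cosine similarity; standard scaled dot-product attention does not normalize them and instead divides the dot products by $\sqrt{d_k}$.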
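A minimal sketch of normalization with a zero-vector guard, showing that the dot product of two unit vectors equals the cosine similarity of the original vectors. The helper `normalize`, its `eps` threshold, and the stand-in vectors `q` and `k` are hypothetical.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Return v / ||v||; refuse (near-)zero vectors, which have no direction."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        raise ValueError("cannot normalize a zero vector")
    return v / norm

q = np.array([3.0, 4.0])
k = np.array([4.0, 3.0])

q_hat, k_hat = normalize(q), normalize(k)
print(q_hat)                  # [0.6 0.8]
print(np.linalg.norm(q_hat))  # approximately 1.0

# The dot product of unit vectors equals the cosine similarity of the originals
cosine = q @ k / (np.linalg.norm(q) * np.linalg.norm(k))
print(np.isclose(q_hat @ k_hat, cosine))  # True
```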

Angle Between Vectors

Angle Formula

\cos\theta = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \|\vec{v}\|}

Here,

$\theta$ =Angle between vectors u and v
$\vec{u} \cdot \vec{v}$ =Dot product
$\|\vec{u}\|, \|\vec{v}\|$ =Magnitudes

📝Example: Angle

For $\vec{u} = [1, 0]$ and $\vec{v} = [0, 1]$ :

\cos\theta = \frac{(1)(0) + (0)(1)}{1 \times 1} = 0 \implies \theta = 90^\circ

The vectors are perpendicular (orthogonal).
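The angle formula as a small function, assuming NumPy; `angle_degrees` is a hypothetical helper name. The `np.clip` call matters in practice: rounding can push the computed cosine slightly outside [-1, 1], where `np.arccos` returns NaN.

```python
import numpy as np

def angle_degrees(u, v):
    """Angle between two nonzero vectors via cos(theta) = u.v / (|u| |v|)."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        raise ValueError("the angle is undefined for a zero vector")
    cos_theta = np.dot(u, v) / denom
    # Clip guards against values like 1.0000000000000002 caused by rounding
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

print(round(angle_degrees([1, 0], [0, 1]), 6))  # 90.0
print(round(angle_degrees([1, 0], [1, 1]), 6))  # 45.0
```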

Linear Independence

Vectors are linearly independent if none of them can be written as a linear combination of the others.

Definition: Linear Independence

A set of vectors $\{\vec{v}_1, \vec{v}_2, \ldots, \vec{v}_k\}$ is linearly independent if the only solution to:

c_1\vec{v}_1 + c_2\vec{v}_2 + \cdots + c_k\vec{v}_k = \vec{0}

is $c_1 = c_2 = \cdots = c_k = 0$ .

📝Example: Linear Independence

Check if $\vec{u} = [1, 2]$ and $\vec{v} = [3, 4]$ are linearly independent.

Set $c_1[1, 2] + c_2[3, 4] = [0, 0]$ :

$c_1 + 3c_2 = 0$
$2c_1 + 4c_2 = 0$

From equation 1: $c_1 = -3c_2$ . Substitute into equation 2: $2(-3c_2) + 4c_2 = -2c_2 = 0 \implies c_2 = 0 \implies c_1 = 0$ .

Linearly independent ✓

⚠️ Common Mistake

If one vector is a scalar multiple of another, they are linearly dependent. For example, $[1, 2]$ and $[2, 4]$ are dependent because $[2, 4] = 2[1, 2]$ .
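The rank test gives a mechanical way to check independence: put the vectors into a matrix as columns, and they are independent exactly when the rank equals the number of vectors. This sketch assumes NumPy, and `are_independent` is a hypothetical helper.

```python
import numpy as np

def are_independent(*vectors):
    """True if the vectors, stacked as matrix columns, have full column rank."""
    A = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(A) == A.shape[1])

print(are_independent([1, 2], [3, 4]))          # True: the worked example
print(are_independent([1, 2], [2, 4]))          # False: [2, 4] = 2 * [1, 2]
print(are_independent([1, 0], [0, 1], [1, 1]))  # False: 3 vectors in R^2
```

`np.linalg.matrix_rank` decides the rank using a numerical tolerance on singular values, so nearly dependent vectors carrying floating-point noise may be reported either way.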

Span

Definition: Span

The span of a set of vectors is the set of all possible linear combinations of those vectors.

Span

\text{Span}(\{\vec{v}_1, \ldots, \vec{v}_k\}) = \{c_1\vec{v}_1 + \cdots + c_k\vec{v}_k : c_i \in \mathbb{R}\}

Here,

$\vec{v}_1, \ldots, \vec{v}_k$ =The generating vectors
$c_i$ =Scalar coefficients

📝Example: Span

$\text{Span}([1, 0], [0, 1]) = \text{entire 2D plane} \, (\mathbb{R}^2)$
$\text{Span}([1, 1]) = \text{just the line } y = x$
$\text{Span}([1, 0], [2, 0]) = \text{the x-axis only}$ (second vector is redundant)
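Whether a vector lies in a span can be tested by solving for the coefficients $c_i$. The sketch below assumes NumPy and uses a hypothetical helper `in_span` with an arbitrary tolerance: it finds least-squares coefficients and checks whether they reproduce the target exactly, rerunning the three span examples above.

```python
import numpy as np

def in_span(target, *vectors, tol=1e-9):
    """Return (is_member, coefficients) for target in Span(vectors)."""
    A = np.column_stack(vectors).astype(float)
    b = np.asarray(target, dtype=float)
    # Least squares minimizes ||A c - b||; a zero residual means b is in the span
    c, *_ = np.linalg.lstsq(A, b, rcond=None)
    return bool(np.allclose(A @ c, b, atol=tol)), c

print(in_span([5, -2], [1, 0], [0, 1])[0])  # True: the standard basis spans R^2
print(in_span([3, 3], [1, 1])[0])           # True: [3, 3] is on the line y = x
print(in_span([3, 1], [1, 1])[0])           # False: [3, 1] is off that line
print(in_span([7, 0], [1, 0], [2, 0])[0])   # True: on the x-axis
print(in_span([0, 1], [1, 0], [2, 0])[0])   # False: the span is only the x-axis
```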

Vector Space

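Definition: Vector Space

A vector space (over the real numbers, for our purposes) is a set of vectors equipped with addition and scalar multiplication such that adding or scaling vectors always gives another vector in the set, and the axioms below hold.

Axioms of a Vector Space (for all $\vec{u}, \vec{v}, \vec{w}$ in the space and all scalars $c, d$):

Closure under addition: $\vec{u} + \vec{v}$ is in the space
Closure under scalar multiplication: $c\vec{v}$ is in the space
Associativity of addition: $(\vec{u} + \vec{v}) + \vec{w} = \vec{u} + (\vec{v} + \vec{w})$
Commutativity of addition: $\vec{u} + \vec{v} = \vec{v} + \vec{u}$
Existence of zero vector: $\vec{v} + \vec{0} = \vec{v}$
Existence of additive inverse: $\vec{v} + (-\vec{v}) = \vec{0}$
Distributivity: $c(\vec{u} + \vec{v}) = c\vec{u} + c\vec{v}$ and $(c + d)\vec{v} = c\vec{v} + d\vec{v}$
Compatibility of scalar multiplication: $c(d\vec{v}) = (cd)\vec{v}$
Scalar identity: $1 \cdot \vec{v} = \vec{v}$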

📝Examples of Vector Spaces

$\mathbb{R}^n$ (all n-dimensional real vectors)
All polynomials of degree ≤ n
All continuous functions on [a, b]
The set of all m×n matrices
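The polynomial example is less abstract than it looks: a polynomial of degree ≤ 2 is determined by its coefficient vector, and adding or scaling polynomials adds or scales those vectors. A sketch using NumPy's `Polynomial` class (the particular polynomials are arbitrary):

```python
import numpy as np
from numpy.polynomial import Polynomial

# p(x) = 1 + 2x + 3x^2 and q(x) = 4 - x^2, stored by their coefficient vectors
p = Polynomial([1, 2, 3])
q = Polynomial([4, 0, -1])

r = 2 * p + q           # a linear combination: still degree <= 2
print(r.coef.tolist())  # [6.0, 4.0, 5.0]

# Same result as combining the coefficient vectors directly
assert np.allclose(r.coef, 2 * np.array([1, 2, 3]) + np.array([4, 0, -1]))
# Evaluating r agrees with evaluating the combination pointwise
assert np.isclose(r(1.5), 2 * p(1.5) + q(1.5))
```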

Subspace

Definition: Subspace

A subspace is a nonempty subset of a vector space that is itself a vector space under the same operations. Equivalently, it contains the zero vector and is closed under addition and scalar multiplication.

📝Example: Subspace

The x-axis in $\mathbb{R}^2$ is a subspace
The set of all solutions to $Ax = 0$ is a subspace (null space)
The set of all n×n symmetric matrices is a subspace of the n×n matrices
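To make the null-space example concrete, the sketch below computes a basis for the solutions of $Ax = 0$ from the SVD: rows of $V^T$ beyond the numerical rank span the null space. The matrix `A` and the `1e-10` tolerance are arbitrary choices, and the closure checks only sample a few combinations rather than proving the property.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])  # second row = 2 * first row, so rank 1

_, s, Vt = np.linalg.svd(A)      # full SVD: Vt is 3 x 3
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:]           # remaining rows span {x : Ax = 0}
print(null_basis.shape)          # (2, 3): the null space is 2-dimensional

x1, x2 = null_basis
# Sums and scalar multiples of solutions are still solutions
for x in (x1, x2, x1 + x2, -3.5 * x1):
    assert np.allclose(A @ x, 0)
```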

Basis and Dimension

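Definition: Basis

A basis is a set of linearly independent vectors that spans the space, so every vector in the space can be written as a linear combination of them in exactly one way.

Definition: Dimension

The dimension is the number of vectors in a basis. Every basis of a given space has the same number of vectors, so the dimension is well defined.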

📝Example: Basis

Standard basis for $\mathbb{R}^3$ : $\{\vec{e}_1, \vec{e}_2, \vec{e}_3\} = \{[1,0,0], [0,1,0], [0,0,1]\}$

Any vector $[a, b, c] = a\vec{e}_1 + b\vec{e}_2 + c\vec{e}_3$

Dimension of $\mathbb{R}^3 = 3$
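Any basis works, not just the standard one. This sketch finds the coordinates of a vector in a non-standard basis of $\mathbb{R}^3$ by solving $Bc = x$, where the columns of `B` are the basis vectors; the particular basis and vector are arbitrary examples.

```python
import numpy as np

# Columns are the basis vectors b1 = [1, 0, 0], b2 = [1, 1, 0], b3 = [1, 1, 1]
B = np.column_stack([[1, 0, 0], [1, 1, 0], [1, 1, 1]]).astype(float)
assert np.linalg.matrix_rank(B) == 3  # independent, so they form a basis of R^3

x = np.array([2.0, 3.0, 5.0])
coords = np.linalg.solve(B, x)        # unique because B is invertible
print(np.round(coords, 6).tolist())   # [-1.0, -2.0, 5.0]

# Rebuilding x from the coordinates: x = c1*b1 + c2*b2 + c3*b3
assert np.allclose(B @ coords, x)
```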

Python Implementation

import numpy as np

# Create vectors
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Vector addition
addition = u + v  # [5, 7, 9]

# Scalar multiplication
scaled = 3 * u  # [3, 6, 9]

# Dot product
dot_product = np.dot(u, v)  # 32

# Magnitude (L2 norm)
magnitude = np.linalg.norm(u)  # sqrt(14) ≈ 3.74

# Other norms
l1_norm = np.linalg.norm(u, ord=1)  # |1| + |2| + |3| = 6
linf_norm = np.linalg.norm(u, ord=np.inf)  # max(|1|, |2|, |3|) = 3

# Unit vector
unit_vector = u / np.linalg.norm(u)

# Angle between vectors
cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle_degrees = np.degrees(np.arccos(np.clip(cos_angle, -1, 1)))

# Linear independence check: stack the vectors as columns; they are
# independent exactly when the rank equals the number of vectors
matrix = np.column_stack([u, v])
rank = np.linalg.matrix_rank(matrix)
print(f"Rank: {rank}, Independent: {rank == matrix.shape[1]}")

print(f"Addition: {addition}")
print(f"Scaled: {scaled}")
print(f"Dot product: {dot_product}")
print(f"Magnitude: {magnitude:.4f}")
print(f"L1 norm: {l1_norm}")
print(f"L∞ norm: {linf_norm}")
print(f"Unit vector: {unit_vector}")
print(f"Angle: {angle_degrees:.2f}°")

Applications in AI/ML

Application	Vector Concept	How It's Used
Word Embeddings	High-dimensional vectors	Word2Vec and GloVe represent words as dense vectors, commonly 100 to 300 dimensions
Image Representations	Pixel vectors	Flattened image as a vector for classification
Attention Mechanism	Dot product similarity	Query·Key dot product measures relevance
Cosine Similarity	Angle between vectors	Semantic search, recommendation systems
Feature Vectors	Data point vectors	Each row in a dataset is a feature vector
Loss Landscapes	Gradient vectors	Gradient descent follows negative gradient

💡 Practical Insight

The dot product is arguably the most heavily used operation in ML. Attention in transformers is scaled dot-product attention: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$, where each entry of $QK^T$ is the dot product of one query vector with one key vector. Understanding vectors deeply helps you understand transformers.
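A minimal single-head sketch of the attention formula in NumPy, without masking, batching, or the learned projections that produce $Q$, $K$, and $V$ in a real transformer. The random inputs and the helper names `softmax` and `attention` are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtracting the row max avoids overflow and leaves the result unchanged
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for a single head, no masking."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # each entry is one query-key dot product
    weights = softmax(scores, axis=-1)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))  # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))  # 3 keys
V = rng.normal(size=(3, 5))  # 3 values, d_v = 5

out, weights = attention(Q, K, V)
print(out.shape)             # (2, 5)
print(weights.sum(axis=-1))  # each row of weights sums to 1
```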

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Adding vectors of different dimensions	$[1,2] + [3,4,5]$ is undefined	Ensure same dimension
Forgetting the dot product returns a scalar	$\vec{u} \cdot \vec{v}$ is a number, not a vector	Use element-wise multiplication for a vector of products (or the cross product in 3D for a perpendicular vector)
Confusing dot product with element-wise	Dot product sums, element-wise doesn't	$\vec{u} \cdot \vec{v} = \sum u_i v_i$
Normalizing the zero vector	$\|\vec{0}\| = 0$, so normalizing divides by zero	Check magnitude before normalizing
Assuming orthogonal = uncorrelated	Orthogonal vectors are uncorrelated only if mean-centered	Center data first
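The last row is easy to misread, so here is a small counterexample, assuming NumPy: two vectors that are orthogonal yet strongly correlated. Pearson correlation equals the cosine of the angle between the mean-centered vectors, which is why centering is the fix.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 1.0, -1.0])

print(x @ y)          # 0.0: x and y are orthogonal
r = np.corrcoef(x, y)[0, 1]
print(round(r, 4))    # -0.866: yet they are strongly correlated

# Pearson correlation is the cosine of the angle between the centered vectors
xc, yc = x - x.mean(), y - y.mean()
cos_centered = xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc))
print(np.isclose(r, cos_centered))  # True
```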

Interview Questions

Q1: What's the difference between dot product and cross product? A: Dot product returns a scalar measuring alignment. Cross product returns a vector perpendicular to both inputs (3D only).

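Q2: Why are attention scores scaled or normalized? A: Standard scaled dot-product attention divides $QK^T$ by $\sqrt{d_k}$: if query and key components are independent with mean 0 and variance 1, their dot product has variance $d_k$, and large scores push the softmax into saturation where gradients are tiny. Some variants instead normalize queries and keys to unit length, which turns each score into a cosine similarity bounded between -1 and 1.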

Q3: How do you check if a set of vectors is linearly independent? A: Form a matrix with vectors as columns and compute the rank. If rank = number of vectors, they're independent.

Q4: What's the geometric interpretation of the dot product? A: $\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos\theta$. It equals $\|\vec{v}\|$ times the signed length of the projection of $\vec{u}$ onto $\vec{v}$, so it measures how far one vector points along the other.

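Q5: Why are L1 and L2 norms used differently in regularization? A: L1 (Lasso) tends to drive some weights exactly to zero, giving sparse solutions and implicit feature selection. L2 (Ridge) shrinks all weights smoothly toward zero, usually without making any exactly zero. Both penalties help reduce overfitting.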

Practice Problems

Find the dot product of $\vec{a} = [2, -1, 3]$ and $\vec{b} = [1, 4, -2]$ .
Calculate the magnitude of $\vec{v} = [6, -8, 0]$ .
Are the vectors $\vec{u} = [1, 2]$ and $\vec{v} = [2, 4]$ linearly independent? Why or why not?
Find the unit vector of $\vec{w} = [3, -3]$ .
Find the angle between $\vec{a} = [1, 0]$ and $\vec{b} = [1, 1]$ .
Determine if $\{\vec{v}_1 = [1, 0, 0], \vec{v}_2 = [0, 1, 0], \vec{v}_3 = [1, 1, 0]\}$ is linearly independent.

💡Solutions

Problem 1: $\vec{a} \cdot \vec{b} = (2)(1) + (-1)(4) + (3)(-2) = 2 - 4 - 6 = -8$

Problem 2: $\|\vec{v}\| = \sqrt{6^2 + (-8)^2 + 0^2} = \sqrt{36 + 64} = \sqrt{100} = 10$

Problem 3: $\vec{v} = 2\vec{u}$ , so $\vec{v}$ is a scalar multiple of $\vec{u}$ . They are linearly dependent.

Problem 4: $\|\vec{w}\| = \sqrt{3^2 + (-3)^2} = \sqrt{18} = 3\sqrt{2}$ . So $\hat{w} = \frac{1}{3\sqrt{2}}\begin{bmatrix} 3 \\ -3 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \end{bmatrix}$

Problem 5: $\cos\theta = \frac{(1)(1) + (0)(1)}{1 \cdot \sqrt{2}} = \frac{1}{\sqrt{2}} \implies \theta = 45^\circ$

Problem 6: $\vec{v}_3 = \vec{v}_1 + \vec{v}_2$ , so the set is linearly dependent.
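The answers can be cross-checked numerically. This sketch assumes NumPy and confirms the arithmetic only; the written reasoning above is still the point of the exercises.

```python
import numpy as np

p1 = np.dot([2, -1, 3], [1, 4, -2])                                 # Problem 1
p2 = np.linalg.norm([6, -8, 0])                                     # Problem 2
p3_rank = np.linalg.matrix_rank(np.column_stack([[1, 2], [2, 4]]))  # Problem 3
w = np.array([3.0, -3.0])
p4 = w / np.linalg.norm(w)                                          # Problem 4
a, b = np.array([1.0, 0.0]), np.array([1.0, 1.0])
cos_t = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
p5 = float(np.degrees(np.arccos(cos_t)))                            # Problem 5
p6_rank = np.linalg.matrix_rank(
    np.column_stack([[1, 0, 0], [0, 1, 0], [1, 1, 0]]))             # Problem 6

print(p1, p2)            # -8 10.0
print(p3_rank, p6_rank)  # 1 2: each is below its vector count, so both sets are dependent
print(p4)                # entries are ±1/√2 ≈ ±0.7071
print(round(p5, 6))      # 45.0
```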

Quick Reference

Concept	Formula	Key Property
Addition	$\vec{u} + \vec{v} = [u_i + v_i]$	Commutative, associative
Scalar	$c\vec{v} = [cv_i]$	Distributive
Dot Product	$\vec{u} \cdot \vec{v} = \sum u_i v_i$	Returns scalar
Magnitude	$\|\vec{v}\| = \sqrt{\sum v_i^2}$	Always ≥ 0
Unit Vector	$\hat{v} = \vec{v}/\|\vec{v}\|$	Magnitude = 1
Angle	$\cos\theta = \frac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \|\vec{v}\|}$	Perpendicular if 0
Linear Independence	$c_1\vec{v}_1 + \cdots = \vec{0} \implies c_i = 0$	No redundancy
Span	All linear combinations	Generates the space
Basis	Linearly independent + spanning	Dimension = basis size
