Vectors and Vector Spaces
ℹ️ Why It Matters
Vectors are the fundamental building blocks of all data in machine learning. Every data point (an image, a word, a patient record) is represented as a vector. Understanding vectors deeply is non-negotiable for ML engineers.
What is a Vector?
A vector is an ordered list of numbers that represents both magnitude and direction. Think of it as an arrow pointing to a location in space.
Definition: Vector
A vector is an ordered tuple of numbers that represents a point or direction in space. A vector in n-dimensional space is written as:
Vector Notation
$$\mathbf{v} = [v_1, v_2, \ldots, v_n]^T \in \mathbb{R}^n$$
Here,
- $\mathbf{v}$ = The vector (an ordered list of numbers)
- $v_1, v_2, \ldots, v_n$ = Individual components of the vector
- $\mathbb{R}^n$ = n-dimensional real coordinate space
Real-World Analogy: A vector is like a GPS coordinate: it tells you exactly where something is. [3, 4] means 3 steps east, 4 steps north.
Types of Vectors
| Type | Description | Example | Dimension |
|---|---|---|---|
| 2D Vector | Two components (x, y) | [3, 4] | 2 |
| 3D Vector | Three components (x, y, z) | [1, 2, 3] | 3 |
| nD Vector | n components | [v₁, v₂, ..., vₙ] | n |
| Row Vector | Written horizontally | [1, 2, 3] | 1×n |
| Column Vector | Written vertically | [1; 2; 3] | n×1 |
⚠️ Convention
In ML, we typically work with column vectors (n×1 matrices). When someone says "vector" without qualification, they usually mean a column vector.
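The row/column distinction shows up in code as array shape. The sketch below uses NumPy with illustrative values (not taken from the text) to show how a flat array, a column vector, and a row vector differ, and why the distinction matters once matrix products are involved.

```python
import numpy as np

# Illustrative values only; they are an assumption, not part of the text above.
flat = np.array([1.0, 2.0, 3.0])   # 1-D array, shape (3,)
column = flat.reshape(-1, 1)       # column vector, shape (3, 1)
row = flat.reshape(1, -1)          # row vector, shape (1, 3)

print(flat.shape, column.shape, row.shape)

# Transposing a 1-D array changes nothing; transposing a column gives a row.
flat_t_shape = flat.T.shape
column_t_shape = column.T.shape

# row @ column is a 1x1 matrix (inner product);
# column @ row is a 3x3 matrix (outer product).
inner = row @ column
outer = column @ row
```

NumPy's 1-D arrays are neither rows nor columns, so reshaping explicitly is one way to make the intended orientation unambiguous.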
Vector Operations
Vector Addition
Two vectors of the same size can be added by adding their corresponding components.
Vector Addition
$$\mathbf{u} + \mathbf{v} = [u_1 + v_1,\; u_2 + v_2,\; \ldots,\; u_n + v_n]^T$$
Here,
- $\mathbf{u}, \mathbf{v}$ = Two vectors of the same dimension
- $\mathbf{u} + \mathbf{v}$ = Resultant vector with summed components
Example: Vector Addition
If and , then:
Geometric Interpretation: Vector addition follows the "tip-to-tail" rule: place the tail of one vector at the tip of the other. The resultant goes from the starting point to the final tip.
Properties of Vector Addition:
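- Commutative: $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$
- Associative: $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
- Identity: $\mathbf{v} + \mathbf{0} = \mathbf{v}$
- Inverse: $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$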
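The four properties of vector addition can be checked numerically. This is a minimal sketch with assumed example vectors (chosen so the arithmetic is exact in floating point); it illustrates the properties and is not a proof.

```python
import numpy as np

# Assumed example vectors, not taken from the text.
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])
w = np.array([0.5, 4.0])
zero = np.zeros(2)

# Component-wise sum: [1 + 3, 2 + (-1)]
total = u + v

properties = {
    "commutative": np.array_equal(u + v, v + u),
    "associative": np.array_equal((u + v) + w, u + (v + w)),
    "identity": np.array_equal(v + zero, v),
    "inverse": np.array_equal(v + (-v), zero),
}
print(total)
print(properties)
```

With arbitrary floating-point values, associativity may hold only up to rounding, so `np.allclose` is usually the safer comparison outside of hand-picked examples.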
Scalar Multiplication
A vector can be multiplied by a scalar (a single number), which scales each component.
Scalar Multiplication
$$c\,\mathbf{v} = [c v_1,\; c v_2,\; \ldots,\; c v_n]^T$$
Here,
- $c$ = A scalar (single real number)
- $\mathbf{v}$ = The vector to be scaled
- $c\,\mathbf{v}$ = The scaled vector
Example: Scalar Multiplication
If and , then:
Properties of Scalar Multiplication:
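- Distributive: $c(\mathbf{u} + \mathbf{v}) = c\mathbf{u} + c\mathbf{v}$
- Associative: $(ab)\mathbf{v} = a(b\mathbf{v})$
- Identity: $1\,\mathbf{v} = \mathbf{v}$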
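Geometrically, multiplying by a scalar stretches or shrinks a vector by a factor of |c| and reverses its direction when c is negative. The sketch below checks both effects for an assumed vector and scalar.

```python
import numpy as np

v = np.array([3.0, 4.0])   # assumed example vector with length 5
c = -2.0                   # assumed scalar

scaled = c * v             # every component is multiplied by c

# The length changes by a factor of |c|.
length_ratio = np.linalg.norm(scaled) / np.linalg.norm(v)

# A negative scalar reverses direction: the cosine between v and c*v is -1.
cosine = np.dot(v, scaled) / (np.linalg.norm(v) * np.linalg.norm(scaled))

print(scaled, length_ratio, cosine)
```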
Dot Product (Inner Product)
The dot product measures how much two vectors point in the same direction. It returns a scalar, not a vector.
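Dot Product (Algebraic)
$$\mathbf{u} \cdot \mathbf{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$
Here,
- $\mathbf{u} \cdot \mathbf{v}$ = The dot product (scalar result)
- $u_i, v_i$ = The i-th components of vectors u and v
Geometric Interpretation:
Dot Product (Geometric)
$$\mathbf{u} \cdot \mathbf{v} = \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert \cos\theta$$
Here,
- $\lVert \mathbf{u} \rVert, \lVert \mathbf{v} \rVert$ = Magnitudes of the vectors
- $\theta$ = Angle between the vectors
Example: Dot Product
If and :
💡 Key Insight
If the dot product of two nonzero vectors is zero, the vectors are perpendicular (orthogonal). Many machine learning methods build on the dot product: PCA, attention mechanisms, and similarity measures all rely on it.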
Properties of Dot Product:
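- Commutative: $\mathbf{u} \cdot \mathbf{v} = \mathbf{v} \cdot \mathbf{u}$
- Distributive: $\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w}$
- Scalar: $(c\,\mathbf{u}) \cdot \mathbf{v} = c\,(\mathbf{u} \cdot \mathbf{v})$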
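To see that the algebraic and geometric definitions of the dot product agree, the sketch below implements the algebraic form in plain Python and compares it with magnitude times magnitude times cosine for a pair of vectors whose angle is known. The vectors and the helper names `dot` and `magnitude` are assumptions made for illustration.

```python
import math

def dot(u, v):
    """Algebraic dot product: sum of products of matching components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))

def magnitude(v):
    """Euclidean length, computed as the square root of v . v."""
    return math.sqrt(dot(v, v))

u = [1.0, 0.0]
v = [1.0, 1.0]   # sits at 45 degrees from u

algebraic = dot(u, v)
geometric = magnitude(u) * magnitude(v) * math.cos(math.radians(45))

# A perpendicular pair: the products of components cancel out.
orthogonal = dot([1.0, 2.0], [-2.0, 1.0])

print(algebraic, geometric, orthogonal)
```

The two values match only up to floating-point rounding, which is why comparisons of this kind normally use a tolerance.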
Magnitude (Length) of a Vector
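The magnitude of a vector is its length: the distance from the origin to the point it represents.
Vector Magnitude (Euclidean Norm)
$$\lVert \mathbf{v} \rVert = \sqrt{\sum_{i=1}^{n} v_i^2} = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}$$
Here,
- $\lVert \mathbf{v} \rVert$ = The magnitude (length) of vector v
- $v_i$ = The i-th component of vector v
Example: Magnitude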
For :
Other Norms (Generalized Magnitude):
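| Norm | Formula | Name | Use Case |
|---|---|---|---|
| L1 | $\lVert \mathbf{v} \rVert_1 = \sum_i \lvert v_i \rvert$ | Manhattan | Feature selection (Lasso) |
| L2 | $\lVert \mathbf{v} \rVert_2 = \sqrt{\sum_i v_i^2}$ | Euclidean | Most common, Ridge |
| L∞ | $\lVert \mathbf{v} \rVert_\infty = \max_i \lvert v_i \rvert$ | Maximum | Worst-case analysis |
| Lp | $\lVert \mathbf{v} \rVert_p = \left(\sum_i \lvert v_i \rvert^p\right)^{1/p}$ | General | Flexibility |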
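The norms in the table are all instances of one formula. As a rough sketch, a hypothetical helper `p_norm` (not a library function) can implement the general Lp case and treat L∞ as the limiting maximum; the result can be compared against `np.linalg.norm`.

```python
import numpy as np

def p_norm(v, p):
    """Lp norm of a vector; p = np.inf gives the maximum absolute component."""
    v = np.asarray(v, dtype=float)
    if np.isinf(p):
        return float(np.max(np.abs(v)))
    return float(np.sum(np.abs(v) ** p) ** (1.0 / p))

v = np.array([3.0, -4.0])   # assumed example vector

l1 = p_norm(v, 1)           # |3| + |-4|
l2 = p_norm(v, 2)           # sqrt(9 + 16)
linf = p_norm(v, np.inf)    # max(|3|, |-4|)
l3 = p_norm(v, 3)           # a less common choice of p

print(l1, l2, linf, l3)
```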
Unit Vector
A unit vector is a vector with magnitude 1. It preserves direction while removing scale.
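Unit Vector
$$\hat{\mathbf{v}} = \frac{\mathbf{v}}{\lVert \mathbf{v} \rVert}$$
Here,
- $\hat{\mathbf{v}}$ = The unit vector in the direction of v
- $\mathbf{v}$ = The original vector
- $\lVert \mathbf{v} \rVert$ = The magnitude of v
Example: Unit Vector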
For with :
Verification: $\lVert \hat{\mathbf{v}} \rVert = 1$ ✓
ℹ️ Why Unit Vectors Matter
Unit vectors represent direction only. Some attention variants normalize query and key vectors to unit length so that their dot product equals the cosine similarity; standard scaled dot-product attention instead divides the scores by $\sqrt{d_k}$.
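Normalization divides by the magnitude, so it fails for the zero vector. The sketch below shows one defensive way to write it; the helper name `normalize` and the threshold `eps` are assumptions chosen for illustration. It also shows that the dot product of two unit vectors is their cosine similarity.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Return v scaled to unit length; refuse vectors that are (nearly) zero."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        raise ValueError("cannot normalize a zero (or near-zero) vector")
    return v / norm

a = normalize([3.0, 4.0])   # direction of [3, 4]
b = normalize([4.0, 3.0])   # direction of [4, 3]

# For unit vectors, the dot product is the cosine of the angle between them.
cosine = float(np.dot(a, b))

print(a, b, cosine)
```

Raising an error is only one option; returning the zero vector unchanged or adding a small constant to the denominator are other common choices, depending on the application.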
Angle Between Vectors
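Angle Formula
$$\cos\theta = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$$
Here,
- $\theta$ = Angle between vectors u and v
- $\mathbf{u} \cdot \mathbf{v}$ = Dot product
- $\lVert \mathbf{u} \rVert, \lVert \mathbf{v} \rVert$ = Magnitudes
Example: Angle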
For and :
The vectors are perpendicular (orthogonal).
Linear Independence
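Vectors are linearly independent if none of them can be written as a linear combination of the others.
Definition: Linear Independence
A set of vectors $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_k\}$ is linearly independent if the only solution to:
$$c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \cdots + c_k \mathbf{v}_k = \mathbf{0}$$
is $c_1 = c_2 = \cdots = c_k = 0$.
Example: Linear Independence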
Check if and are linearly independent.
Set :
From equation 1: . Substitute into equation 2: .
Linearly independent ✓
⚠️ Common Mistake
If one vector is a scalar multiple of another, they are linearly dependent. For example, and are dependent because .
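One practical way to test independence is to stack the vectors as columns and compare the matrix rank with the number of vectors. The following sketch wraps that idea in a hypothetical helper, `is_linearly_independent`, and runs it on assumed example vectors.

```python
import numpy as np

def is_linearly_independent(vectors):
    """True if the rank of the column-stacked matrix equals the number of vectors."""
    matrix = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(matrix) == matrix.shape[1])

# Two vectors pointing in different directions.
independent = is_linearly_independent([np.array([1.0, 0.0]), np.array([1.0, 1.0])])

# The second vector is 2 times the first, so the pair is dependent.
dependent = is_linearly_independent([np.array([1.0, 2.0]), np.array([2.0, 4.0])])

# Three vectors in a 2-dimensional space can never be independent.
too_many = is_linearly_independent(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
)

print(independent, dependent, too_many)
```

`np.linalg.matrix_rank` decides rank with a numerical tolerance, so nearly dependent vectors may be reported as dependent.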
Span
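Definition: Span
The span of a set of vectors is the set of all possible linear combinations of those vectors.
Span
$$\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_k) = \{\, c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k : c_1, \ldots, c_k \in \mathbb{R} \,\}$$
Here,
- $\mathbf{v}_1, \ldots, \mathbf{v}_k$ = The generating vectors
- $c_1, \ldots, c_k$ = Scalar coefficients
Example: Span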
- (second vector is redundant)
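Whether a given vector lies in a span can be checked numerically by looking for coefficients that reproduce it. The sketch below does this with a least-squares solve and a residual test; the helper `in_span`, its tolerance, and the example vectors are all assumptions for illustration.

```python
import numpy as np

def in_span(target, vectors, tol=1e-10):
    """True if `target` is (numerically) a linear combination of `vectors`."""
    A = np.column_stack(vectors)
    # Least squares finds the coefficients that bring A @ coeffs closest to target.
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = np.linalg.norm(A @ coeffs - target)
    return bool(residual <= tol)

# Two vectors spanning the xy-plane inside 3-D space.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
inside = in_span(np.array([2.0, 3.0, 0.0]), [e1, e2])
outside = in_span(np.array([0.0, 0.0, 1.0]), [e1, e2])

# A redundant second vector adds nothing: the span is still a line.
line = [np.array([1.0, 2.0]), np.array([2.0, 4.0])]
on_line = in_span(np.array([3.0, 6.0]), line)
off_line = in_span(np.array([1.0, 0.0]), line)

print(inside, outside, on_line, off_line)
```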
Vector Space
Definition: Vector Space
A vector space is a collection of vectors where you can add them and multiply by scalars and still stay in the space.
Axioms of a Vector Space:
- Closure under addition: $\mathbf{u} + \mathbf{v}$ is in the space
- Closure under scalar multiplication: $c\,\mathbf{v}$ is in the space
- Associativity of addition
- Commutativity of addition
- Existence of zero vector
- Existence of additive inverse
- Distributivity of scalar multiplication
- Compatibility of scalar multiplication
Examples of Vector Spaces
- $\mathbb{R}^n$ (all n-dimensional real vectors)
- All polynomials of degree ≤ n
- All continuous functions on [a, b]
- The set of all mรn matrices
Subspace
Definition: Subspace
A subspace is a subset of a vector space that is itself a vector space: it contains the zero vector and is closed under addition and scalar multiplication.
Example: Subspace
- The x-axis in $\mathbb{R}^2$ is a subspace
- The set of all solutions to $A\mathbf{x} = \mathbf{0}$ is a subspace (null space)
- The set of all symmetric n×n matrices is a subspace of the n×n matrices
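Closure is what separates a subspace from an arbitrary subset. The sketch below, using an assumed 1×3 matrix `A`, checks that a linear combination of null-space vectors stays in the null space, and contrasts this with the solutions of a system whose right-hand side is nonzero, which are not closed under addition.

```python
import numpy as np

# Assumed example matrix: its null space is the plane x + y + z = 0.
A = np.array([[1.0, 1.0, 1.0]])

x1 = np.array([1.0, -1.0, 0.0])   # A @ x1 = 0
x2 = np.array([0.0, 1.0, -1.0])   # A @ x2 = 0

# Any linear combination of null-space vectors is again in the null space.
combo = 2.0 * x1 - 3.0 * x2
stays_in_null_space = bool(np.allclose(A @ combo, 0.0))

# Solutions of A @ y = 1 do NOT form a subspace: their sum solves A @ y = 2.
y1 = np.array([1.0, 0.0, 0.0])
y2 = np.array([0.0, 1.0, 0.0])
sum_still_solves = bool(np.allclose(A @ (y1 + y2), 1.0))

print(stays_in_null_space, sum_still_solves)
```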
Basis and Dimension
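Definition: Basis
A basis is a set of linearly independent vectors that spans the space, so every vector in the space can be written as a unique linear combination of the basis vectors.
Definition: Dimension
The dimension is the number of vectors in any basis of the space.
Example: Basis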
Standard basis for :
Any vector
Dimension of
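A basis does not have to be the standard one. As a minimal sketch with an assumed non-standard basis of the plane, the coordinates of a vector in that basis can be found by solving a small linear system, and the number of basis vectors matches the rank of the basis matrix.

```python
import numpy as np

# Assumed non-standard basis for the plane (illustrative values).
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
B = np.column_stack([b1, b2])   # basis vectors as columns

v = np.array([3.0, 1.0])

# Coordinates of v in this basis: solve B @ coords = v.
coords = np.linalg.solve(B, v)

# Rebuilding v from its coordinates confirms the representation.
reconstructed = coords[0] * b1 + coords[1] * b2

# Two independent basis vectors, so the space has dimension 2.
dimension = int(np.linalg.matrix_rank(B))

print(coords, reconstructed, dimension)
```

`np.linalg.solve` requires a square, invertible matrix, which is exactly the condition that the columns form a basis.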
Python Implementation
import numpy as np
# Create vectors
u = np.array([1, 2, 3])
v = np.array([4, 5, 6])
# Vector addition
addition = u + v # [5, 7, 9]
# Scalar multiplication
scaled = 3 * u # [3, 6, 9]
# Dot product
dot_product = np.dot(u, v) # 32
# Magnitude (L2 norm)
magnitude = np.linalg.norm(u) # sqrt(14) ≈ 3.74
# Other norms
l1_norm = np.linalg.norm(u, ord=1) # |1| + |2| + |3| = 6
linf_norm = np.linalg.norm(u, ord=np.inf) # max(|1|, |2|, |3|) = 3
# Unit vector
unit_vector = u / np.linalg.norm(u)
# Angle between vectors
cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
angle_degrees = np.degrees(np.arccos(np.clip(cos_angle, -1, 1)))
# Linear independence check: stack the vectors as columns and compare
# the rank with the number of vectors
matrix = np.column_stack([u, v])
rank = np.linalg.matrix_rank(matrix)
print(f"Rank: {rank}, Independent: {rank == matrix.shape[1]}")
print(f"Addition: {addition}")
print(f"Scaled: {scaled}")
print(f"Dot product: {dot_product}")
print(f"Magnitude: {magnitude:.4f}")
print(f"L1 norm: {l1_norm}")
print(f"L∞ norm: {linf_norm}")
print(f"Unit vector: {unit_vector}")
print(f"Angle: {angle_degrees:.2f}°")
Applications in AI/ML
| Application | Vector Concept | How It's Used |
|---|---|---|
| Word Embeddings | High-dimensional vectors | Word2Vec, GloVe represent words as 300D vectors |
| Image Representations | Pixel vectors | Flattened image as a vector for classification |
| Attention Mechanism | Dot product similarity | Query·Key dot product measures relevance |
| Cosine Similarity | Angle between vectors | Semantic search, recommendation systems |
| Feature Vectors | Data point vectors | Each row in a dataset is a feature vector |
| Loss Landscapes | Gradient vectors | Gradient descent follows the negative gradient |
💡 Practical Insight
The dot product is one of the most important operations in ML. Attention in transformers is scaled dot-product attention: $\operatorname{Attention}(Q, K, V) = \operatorname{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$. Understanding vectors deeply helps you understand transformers.
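To make the role of the dot product in attention concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single head, with no masking and no batching. The shapes and random inputs are assumptions chosen only to exercise the code; real implementations add more machinery.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # one query-key dot product per pair
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

# Assumed toy shapes: 2 queries, 3 keys/values, key size 4, value size 5.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 5))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

Each output row is a weighted average of the rows of `V`, with weights determined by how strongly the corresponding query aligns with each key.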
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
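| Adding vectors of different dimensions | $\mathbf{u} + \mathbf{v}$ is undefined when $\mathbf{u} \in \mathbb{R}^m$, $\mathbf{v} \in \mathbb{R}^n$, and $m \neq n$ | Ensure same dimension |
| Forgetting dot product returns scalar | $\mathbf{u} \cdot \mathbf{v}$ is a number, not a vector | Use the cross product (3D only) or the element-wise product when a vector result is needed |
| Confusing dot product with element-wise | The dot product sums the component products; the element-wise product does not | Use `np.dot(u, v)` for the dot product and `u * v` for the element-wise product |
| Normalizing zero vector | $\lVert \mathbf{0} \rVert = 0$, division by zero | Check magnitude before normalizing |
| Assuming orthogonal = uncorrelated | Orthogonal vectors are uncorrelated only if mean-centered | Center data first |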
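The last row of the table is easy to get wrong, so a small numerical illustration may help. In this sketch, two assumed vectors have a dot product of zero yet are perfectly negatively correlated; only after subtracting the means does the dot product reflect the correlation.

```python
import numpy as np

# Assumed example: orthogonal as raw vectors, but not mean-centered.
a = np.array([1.0, 0.0, 1.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 1.0])

raw_dot = float(np.dot(a, b))                  # zero: the vectors are orthogonal
correlation = float(np.corrcoef(a, b)[0, 1])   # yet strongly (negatively) correlated

# After centering, the normalized dot product equals the correlation.
a_c = a - a.mean()
b_c = b - b.mean()
centered_cosine = float(
    np.dot(a_c, b_c) / (np.linalg.norm(a_c) * np.linalg.norm(b_c))
)

print(raw_dot, correlation, centered_cosine)
```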
Interview Questions
Q1: What's the difference between dot product and cross product? A: Dot product returns a scalar measuring alignment. Cross product returns a vector perpendicular to both inputs (3D only).
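Q2: Why do we normalize vectors in attention mechanisms? A: Normalizing to unit length makes the dot product equal to cosine similarity, which is bounded between -1 and 1. Bounded scores help keep the softmax from saturating. Standard scaled dot-product attention addresses the same problem by dividing the scores by $\sqrt{d_k}$ rather than normalizing the vectors.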
Q3: How do you check if a set of vectors is linearly independent? A: Form a matrix with vectors as columns and compute the rank. If rank = number of vectors, they're independent.
Q4: What's the geometric interpretation of the dot product? A: $\mathbf{u} \cdot \mathbf{v} = \lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert \cos\theta$. It measures how much one vector projects onto another.
Q5: Why are L1 and L2 norms used differently in regularization? A: L1 (Lasso) creates sparse solutions (feature selection). L2 (Ridge) shrinks all weights toward zero (prevents overfitting).
Practice Problems
- Find the dot product of and .
- Calculate the magnitude of .
- Are the vectors and linearly independent? Why or why not?
- Find the unit vector of .
- Find the angle between and .
- Determine if is linearly independent.
💡 Solutions
Problem 1:
Problem 2:
Problem 3: , so is a scalar multiple of . They are linearly dependent.
Problem 4: . So
Problem 5:
Problem 6: , so the set is linearly dependent.
Quick Reference
| Concept | Formula | Key Property |
|---|---|---|
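| Addition | $\mathbf{u} + \mathbf{v} = [u_1 + v_1, \ldots, u_n + v_n]^T$ | Commutative, associative |
| Scalar | $c\,\mathbf{v} = [c v_1, \ldots, c v_n]^T$ | Distributive |
| Dot Product | $\mathbf{u} \cdot \mathbf{v} = \sum_i u_i v_i$ | Returns scalar |
| Magnitude | $\lVert \mathbf{v} \rVert = \sqrt{\sum_i v_i^2}$ | Always ≥ 0 |
| Unit Vector | $\hat{\mathbf{v}} = \mathbf{v} / \lVert \mathbf{v} \rVert$ | Magnitude = 1 |
| Angle | $\cos\theta = \dfrac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert}$ | Perpendicular if $\mathbf{u} \cdot \mathbf{v} = 0$ |
| Linear Independence | $c_1 \mathbf{v}_1 + \cdots + c_k \mathbf{v}_k = \mathbf{0} \Rightarrow c_1 = \cdots = c_k = 0$ | No redundancy |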
| Span | All linear combinations | Generates the space |
| Basis | Linearly independent + spanning | Dimension = basis size |