Linear Algebra — The Language of Data

ℹ️ Why It Matters

Every image, text, dataset, and neural network in AI is represented as numbers arranged in grids. Linear algebra is the math of those grids (matrices and vectors). Without it, there is no modern AI.

What is a Vector?

A vector is a list of numbers arranged in order. Think of it as an arrow pointing to a location in space.

Simple Analogy: Think of a vector like a GPS coordinate — it tells you exactly where something is.

Definition: Vector

A vector is an ordered tuple of numbers that represents a point or direction in space. A vector in n-dimensional space is written as:

Vector Notation

\vec{v} = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^n

Here,

$\vec{v}$ =The vector (an ordered list of numbers)
$v_1, v_2, \ldots, v_n$ =Individual components of the vector
$\mathbb{R}^n$ =n-dimensional real coordinate space

Real-World Example:

A student's grades: [Math: 85, English: 90, Science: 78] is a 3D vector
An RGB pixel color: [255, 128, 0] is a 3D vector (orange color)
Word embeddings in NLP: the word "king" might be represented as a 300-dimensional vector

Vector Operations

Addition: Add corresponding elements

Vector Addition

\vec{u} + \vec{v} = [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n]

Here,

$\vec{u}, \vec{v}$ =Two vectors of the same dimension
$\vec{u} + \vec{v}$ =Resultant vector with summed components

📝Example: Vector Addition

If $\vec{u} = [1, 2, 3]$ and $\vec{v} = [4, 5, 6]$ , then:

\vec{u} + \vec{v} = [1+4, 2+5, 3+6] = [5, 7, 9]
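The same addition can be sketched in a few lines of plain Python. The function name `add_vectors` is an illustrative choice, and vectors are assumed to be stored as ordinary lists.

```python
def add_vectors(u, v):
    """Return the component-wise sum of two vectors of equal dimension."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(u, v)]


print(add_vectors([1, 2, 3], [4, 5, 6]))  # [5, 7, 9]
```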

Scalar Multiplication: Multiply every element by a number

Scalar Multiplication

c\vec{v} = [c \cdot v_1, c \cdot v_2, \ldots, c \cdot v_n]

Here,

$c$ =A scalar (single real number)
$\vec{v}$ =The vector to be scaled
$c\vec{v}$ =The scaled vector

📝Example: Scalar Multiplication

If $\vec{v} = [2, 4, 6]$ and $c = 3$ , then:

c\vec{v} = 3 \times [2, 4, 6] = [6, 12, 18]
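As a minimal sketch, scalar multiplication is a single list comprehension in plain Python. The name `scale_vector` is hypothetical and chosen only for this example.

```python
def scale_vector(c, v):
    """Multiply every component of the vector v by the scalar c."""
    return [c * x for x in v]


print(scale_vector(3, [2, 4, 6]))  # [6, 12, 18]
```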

Dot Product: Multiply corresponding elements and sum them up

Dot Product

\vec{u} \cdot \vec{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n

Here,

$\vec{u} \cdot \vec{v}$ =The dot product (scalar result)
$u_i, v_i$ =The i-th components of vectors u and v

📝Example: Dot Product

If $\vec{u} = [1, 2, 3]$ and $\vec{v} = [4, 5, 6]$ :

\vec{u} \cdot \vec{v} = (1 \times 4) + (2 \times 5) + (3 \times 6) = 4 + 10 + 18 = 32
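The calculation above can be reproduced with a short plain-Python sketch. The helper name `dot` is an assumption made for illustration.

```python
def dot(u, v):
    """Sum of the products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


print(dot([1, 2, 3], [4, 5, 6]))  # 32
```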

💡 Why dot product matters

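It measures how closely two vectors align. Since $\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos\theta$, where $\theta$ is the angle between the vectors, a positive dot product means they point in broadly the same direction (similar), zero means they are perpendicular (unrelated), and a negative value means they point in broadly opposite directions. The value also grows with the lengths of the vectors, so it reflects scale as well as direction.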
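One common way to turn the dot product into a similarity score that ignores vector length is cosine similarity: divide the dot product by the product of the two magnitudes. The sketch below is illustrative only; the function names are hypothetical, and the three test vectors were chosen to show the same-direction, perpendicular, and opposite cases.

```python
import math


def dot(u, v):
    """Sum of the products of corresponding components."""
    return sum(a * b for a, b in zip(u, v))


def cosine_similarity(u, v):
    """Dot product divided by both magnitudes; the result lies in [-1, 1]."""
    norm_u = math.sqrt(dot(u, u))
    norm_v = math.sqrt(dot(v, v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot(u, v) / (norm_u * norm_v)


print(cosine_similarity([1, 0], [2, 0]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 3]))   # 0.0  (perpendicular)
print(cosine_similarity([1, 0], [-4, 0]))  # -1.0 (opposite direction)
```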

Magnitude (Length of a Vector)

Vector Magnitude (Euclidean Norm)

\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}

Here,

$\|\vec{v}\|$ =The magnitude (length) of vector v
$v_i$ =The i-th component of vector v

📝Example: Magnitude

For $\vec{v} = [3, 4]$ :

\|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5

Unit Vector

A unit vector is a vector with magnitude 1. It preserves direction while removing scale.

Unit Vector

\hat{v} = \frac{\vec{v}}{\|\vec{v}\|}

Here,

$\hat{v}$ =The unit vector in the direction of v
$\vec{v}$ =The original vector
$\|\vec{v}\|$ =The magnitude of v

📝Example: Unit Vector

For $\vec{v} = [3, 4]$ with $\|\vec{v}\| = 5$ :

\hat{v} = \frac{1}{5}[3, 4] = [0.6, 0.8]

\|\hat{v}\| = \sqrt{0.6^2 + 0.8^2} = \sqrt{0.36 + 0.64} = \sqrt{1} = 1 \checkmark
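Magnitude and normalization can be sketched together in plain Python. The names `magnitude` and `normalize` are illustrative; the zero vector is rejected because it has no direction to preserve.

```python
import math


def magnitude(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Divide v by its magnitude to get a unit vector in the same direction."""
    length = magnitude(v)
    if length == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [x / length for x in v]


print(magnitude([3, 4]))  # 5.0
print(normalize([3, 4]))  # [0.6, 0.8]
```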

What is a Matrix?

A matrix is a rectangular grid of numbers arranged in rows and columns.

Definition: Matrix

A matrix is a rectangular array of numbers arranged in rows and columns. An $m \times n$ matrix has $m$ rows and $n$ columns.

Matrix Notation

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}

Here,

$A$ =The matrix
$a_{ij}$ =Element in row i, column j
$m \times n$ =Dimensions (rows × columns)

Real-World Examples:

Dataset: Each row = one data point, each column = one feature
Grayscale Image: Each pixel is a number (0=black, 255=white)
Neural Network Weights: Each layer is a matrix of connection strengths

Matrix Operations

Addition: Add corresponding elements (matrices must be same size)

📝Example: Matrix Addition

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}
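A minimal sketch of the same addition, assuming a matrix is stored as a list of rows. The name `add_matrices` is hypothetical.

```python
def add_matrices(A, B):
    """Add two matrices of identical shape, element by element."""
    same_shape = len(A) == len(B) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(A, B)
    )
    if not same_shape:
        raise ValueError("matrices must have the same shape")
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


print(add_matrices([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
```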

Matrix Multiplication: Row × Column, then sum

Matrix Multiplication

(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

Here,

$A$ =First matrix (m × n)
$B$ =Second matrix (n × p)
$AB$ =Result matrix (m × p)

📝Example: Matrix Multiplication

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} (1 \times 5 + 2 \times 7) & (1 \times 6 + 2 \times 8) \\ (3 \times 5 + 4 \times 7) & (3 \times 6 + 4 \times 8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}

⚠️ Important Rule

For $A \times B$ to be defined, the number of columns of $A$ must equal the number of rows of $B$.
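The row-times-column rule and the shape requirement can both be seen in a short plain-Python sketch. The function name `matmul` is an illustrative choice, and matrices are assumed to be lists of rows.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B, giving an m x p result."""
    n = len(A[0])
    if n != len(B):
        raise ValueError("columns of A must match rows of B")
    p = len(B[0])
    # Entry (i, j) is the dot product of row i of A with column j of B.
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
        for i in range(len(A))
    ]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))  # [[19, 22], [43, 50]]
print(matmul(B, A))  # [[23, 34], [31, 46]]
```

The second call shows that swapping the order generally changes the result: matrix multiplication is not commutative.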

Transpose

Flip rows and columns (mirror along the diagonal).

Matrix Transpose

(A^T)_{ij} = A_{ji}

Here,

$A^T$ =The transpose of matrix A
$A_{ji}$ =Element from row j, column i of original

📝Example: Transpose

\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
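In plain Python, a transpose can be sketched with the built-in `zip`, which groups the i-th entries of every row into the i-th new row. The function name `transpose` is illustrative.

```python
def transpose(A):
    """Turn rows into columns: entry (i, j) of the result is entry (j, i) of A."""
    return [list(column) for column in zip(*A)]


print(transpose([[1, 2, 3], [4, 5, 6]]))  # [[1, 4], [2, 5], [3, 6]]
```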

Identity Matrix

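The "1" of matrices. Multiplying a matrix by an identity matrix of compatible size gives the same matrix back.

Definition: Identity Matrix

The identity matrix $I$ is a square matrix with 1s on the diagonal and 0s elsewhere. For any matrix $A$ with $n$ columns and the $n \times n$ identity $I$: $A \times I = A$. When $A$ is square, $A \times I = I \times A = A$.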

📝Example: 3×3 Identity Matrix

I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

A \times I = A \text{ (for any matrix } A \text{ with 3 columns)}
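A quick check of the identity property, as a sketch in plain Python. The helper names and the sample 2×3 matrix are made up for illustration; note that the identity on the right must be 3×3 and the identity on the left must be 2×2.

```python
def identity(n):
    """Build the n x n identity matrix: 1 on the diagonal, 0 elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Multiply an m x n matrix by an n x p matrix."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


A = [[2, 7, 1], [8, 2, 8]]  # a 2 x 3 example matrix
print(matmul(A, identity(3)) == A)  # True
print(matmul(identity(2), A) == A)  # True
```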

Inverse Matrix

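The matrix analogue of division. If $A$ is a square matrix and $A \times B = B \times A = I$, then $B$ is the inverse of $A$, written as $A^{-1}$. Not every square matrix has an inverse: one exists only when $\det(A) \neq 0$.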

Matrix Inverse

A \times A^{-1} = I

Here,

$A$ =The original matrix
$A^{-1}$ =The inverse matrix
$I$ =The identity matrix

ℹ️ Why it matters in AI

Used in solving systems of linear equations, which comes up in regression and optimization.
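To make the link to linear systems concrete, here is a minimal sketch for the 2×2 case only, using the closed-form inverse (swap the diagonal, negate the off-diagonal, divide by the determinant). The function names and the example system are assumptions for illustration; numerical libraries generally solve systems without forming the inverse explicitly.

```python
def inverse_2x2(A):
    """Inverse of [[a, b], [c, d]] via the determinant formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


def solve_2x2(A, rhs):
    """Solve A x = rhs by computing x = A^{-1} rhs."""
    inv = inverse_2x2(A)
    return [
        inv[0][0] * rhs[0] + inv[0][1] * rhs[1],
        inv[1][0] * rhs[0] + inv[1][1] * rhs[1],
    ]


# Example system: 2x + y = 5 and x + 3y = 10, whose solution is x = 1, y = 3.
print(solve_2x2([[2, 1], [1, 3]], [5, 10]))  # approximately [1.0, 3.0]
```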

Eigenvalues and Eigenvectors

The Big Idea: When you multiply a matrix by certain special nonzero vectors, the vector stays on the same line through the origin: it only gets stretched, shrunk, or (when the factor is negative) flipped. Those special vectors are eigenvectors, and the stretching factor is the eigenvalue.

Eigenvalue Equation

A \vec{v} = \lambda \vec{v}

Here,

$A$ =The matrix
$\vec{v}$ =The eigenvector
$\lambda$ =The eigenvalue (scalar)

Analogy: Imagine putting a rubber sheet on a board and stretching it. Most arrows on the sheet will change direction. But some arrows, when you stretch, still point the same way — just longer or shorter. Those are eigenvectors.

ℹ️ How to find them

Solve the characteristic equation: $\det(A - \lambda I) = 0$

📝Example: Finding Eigenvalues

Let $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$

\det(A - \lambda I) = 0

\begin{vmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{vmatrix} = 0

(4-\lambda)(3-\lambda) - (1)(2) = 0

12 - 4\lambda - 3\lambda + \lambda^2 - 2 = 0

\lambda^2 - 7\lambda + 10 = 0

(\lambda - 5)(\lambda - 2) = 0

\lambda_1 = 5, \quad \lambda_2 = 2
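For a 2×2 matrix $\begin{bmatrix} a & b \\ c & d \end{bmatrix}$, the characteristic equation always has the form $\lambda^2 - (a+d)\lambda + (ad - bc) = 0$, which matches $\lambda^2 - 7\lambda + 10 = 0$ above. The sketch below solves it with the quadratic formula and then checks $A\vec{v} = \lambda\vec{v}$. The eigenvectors $[1, 1]$ and $[1, -2]$ are not given in the text; they were worked out by hand from $(A - \lambda I)\vec{v} = 0$. Function names are illustrative, and complex eigenvalues are not handled.

```python
import math


def eigenvalues_2x2(A):
    """Real eigenvalues of a 2 x 2 matrix from its trace and determinant."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        raise ValueError("complex eigenvalues are not handled in this sketch")
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2


def matvec(A, v):
    """Multiply matrix A by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


A = [[4, 1], [2, 3]]
print(eigenvalues_2x2(A))   # (5.0, 2.0)
print(matvec(A, [1, 1]))    # [5, 5], which is 5 * [1, 1]
print(matvec(A, [1, -2]))   # [2, -4], which is 2 * [1, -2]
```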

Applications in AI/DS:

Principal Component Analysis (PCA): Find the directions of maximum variance in data
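Google's PageRank: The page ranking is the dominant eigenvector of the web's link matrix
Stability analysis: The eigenvalues of a system's matrix indicate whether the system is stable
Face recognition (Eigenfaces): Use the leading eigenvectors of the covariance matrix of face images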

Matrix Decompositions

LU Decomposition

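Break a matrix into the product of a lower triangular matrix and an upper triangular matrix. Some matrices need their rows reordered first, which gives $PA = LU$ with a permutation matrix $P$.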

LU Decomposition

A = L \times U

Here,

$A$ =The original matrix
$L$ =Lower triangular matrix
$U$ =Upper triangular matrix

Use: Solving systems of equations efficiently.
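As a rough sketch of how the factors arise, the code below performs Gaussian elimination and records each elimination factor in $L$ (the Doolittle form, with 1s on the diagonal of $L$). It assumes a square matrix and does no row swaps, so it fails on a zero pivot. The function name `lu_decompose` and the sample matrix are made up for illustration.

```python
def lu_decompose(A):
    """Factor a square matrix as A = L * U without pivoting (Doolittle form)."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[float(x) for x in row] for row in A]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: this sketch does not swap rows")
        for i in range(k + 1, n):
            factor = U[i][k] / U[k][k]
            L[i][k] = factor  # remember the multiplier used to clear U[i][k]
            for j in range(k, n):
                U[i][j] -= factor * U[k][j]
    return L, U


L, U = lu_decompose([[4, 3], [6, 3]])
print(L)  # [[1.0, 0.0], [1.5, 1.0]]
print(U)  # [[4.0, 3.0], [0.0, -1.5]]
```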

SVD (Singular Value Decomposition)

One of the most widely used matrix decompositions in data science.

Singular Value Decomposition

A = U \times \Sigma \times V^T

Here,

$A$ =The original matrix
$U$ =Left singular vectors (orthogonal)
$\Sigma$ =Diagonal matrix of singular values
$V$ =Right singular vectors (orthogonal)

Analogy: SVD is like breaking down a number into its prime factors, but for matrices. Every matrix can be decomposed this way.

Applications:

Image compression: Keep only the largest singular values, throw away the rest
Recommendation systems (Netflix Prize): Decompose the user-movie rating matrix
Noise reduction: Remove small singular values (noise)
Latent Semantic Analysis: Find hidden topics in text
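The "keep only the largest singular values" idea can be illustrated without a full SVD routine. The sketch below uses power iteration on $A^T A$ (a technique swapped in here for simplicity, not a full SVD algorithm) to estimate just the largest singular value and its singular vectors, then builds the rank-1 approximation $\sigma_1 \vec{u}_1 \vec{v}_1^T$. The function names, start vector, and iteration count are assumptions; the method can fail if the start vector happens to be orthogonal to the top right singular vector.

```python
import math


def matvec(A, v):
    """Multiply matrix A by vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    """Swap rows and columns."""
    return [list(col) for col in zip(*A)]


def norm(v):
    """Euclidean length of v."""
    return math.sqrt(sum(x * x for x in v))


def top_singular_triplet(A, iterations=200):
    """Estimate (sigma_1, u_1, v_1) by power iteration on A^T A."""
    At = transpose(A)
    v = [1.0] + [0.0] * (len(A[0]) - 1)  # arbitrary start vector (assumption)
    for _ in range(iterations):
        w = matvec(At, matvec(A, v))
        length = norm(w)
        if length == 0:
            raise ValueError("iteration collapsed to the zero vector")
        v = [x / length for x in w]
    Av = matvec(A, v)
    sigma = norm(Av)
    u = [x / sigma for x in Av]
    return sigma, u, v


def rank1_approximation(A):
    """Best rank-1 approximation sigma_1 * u_1 * v_1^T."""
    sigma, u, v = top_singular_triplet(A)
    return [[sigma * ui * vj for vj in v] for ui in u]


A = [[3.0, 0.0], [4.0, 5.0]]
sigma, u, v = top_singular_triplet(A)
print(round(sigma, 6))  # 6.708204, i.e. sqrt(45)
print([[round(x, 6) for x in row] for row in rank1_approximation(A)])
```

For this example matrix, $A^T A$ has eigenvalues 45 and 5, so the singular values are $\sqrt{45}$ and $\sqrt{5}$; the rank-1 approximation keeps only the first.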

QR Decomposition

A = Q \times R

Here,

$A$ =The original matrix
$Q$ =Orthogonal matrix
$R$ =Upper triangular matrix

Use: Least squares problems, eigenvalue algorithms.
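One textbook way to obtain $Q$ and $R$ is the classical Gram-Schmidt process: orthonormalize the columns of $A$ one at a time and record the projection coefficients in $R$. The sketch below assumes linearly independent columns and returns the reduced form, where $Q$ has the same shape as $A$ (for a square $A$ this $Q$ is an orthogonal matrix). The function name and the example matrix are hypothetical, and production code typically uses more numerically stable methods.

```python
import math


def qr_gram_schmidt(A):
    """Reduced QR factorization of A via classical Gram-Schmidt."""
    m, n = len(A), len(A[0])
    columns = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    q_columns = []
    R = [[0.0] * n for _ in range(n)]
    for j, a in enumerate(columns):
        v = list(a)
        for i, q in enumerate(q_columns):
            # Projection coefficient of column j onto the i-th orthonormal vector.
            R[i][j] = sum(qk * ak for qk, ak in zip(q, a))
            v = [vk - R[i][j] * qk for vk, qk in zip(v, q)]
        R[j][j] = math.sqrt(sum(x * x for x in v))
        if R[j][j] == 0:
            raise ValueError("columns are linearly dependent")
        q_columns.append([x / R[j][j] for x in v])
    Q = [[q_columns[j][i] for j in range(n)] for i in range(m)]
    return Q, R


Q, R = qr_gram_schmidt([[3, 1], [4, 2]])
print([[round(x, 6) for x in row] for row in Q])  # [[0.6, -0.8], [0.8, 0.6]]
print([[round(x, 6) for x in row] for row in R])  # [[5.0, 2.2], [0.0, 0.4]]
```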

Vector Spaces and Subspaces

Definition: Vector Space

A vector space is a collection of vectors where you can add them and multiply by scalars and still stay in the space.

Definition: Subspace

A subspace is a smaller vector space inside a bigger one.

Examples:

All 2D vectors [x, y] form a vector space (the entire 2D plane)
All vectors [x, 0] (y=0) form a subspace (the x-axis)

Definition: Basis

A basis is a set of linearly independent vectors whose linear combinations can represent every vector in the space.

Definition: Dimension

The dimension is the number of vectors in any basis of the space.

Definition: Span

The span is all possible linear combinations of a set of vectors.

📝Example: Span

\text{Span}([1,0], [0,1]) = \text{entire 2D plane}

\text{Span}([1,1]) = \text{just the line } y=x

Definition: Linear Independence

Vectors are linearly independent if none of them can be written as a linear combination of the others.

Definition: Rank

The rank is the number of linearly independent rows (or columns) in a matrix.

ℹ️ Why it matters

Rank tells you how much "information" a matrix contains
If rank < number of features, your data has redundancy (you can compress it)
PCA finds a low-rank approximation of your data
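Rank can be computed by row-reducing the matrix and counting the pivots that survive. The following is a minimal sketch using Gaussian elimination with partial pivoting; the function name `matrix_rank` and the tolerance `tol` for treating tiny values as zero are assumptions made for this example.

```python
def matrix_rank(A, tol=1e-10):
    """Count the pivots left after Gaussian elimination."""
    M = [[float(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    rank = 0
    for col in range(cols):
        # Choose the remaining row with the largest entry in this column.
        pivot = max(range(rank, rows), key=lambda r: abs(M[r][col]), default=None)
        if pivot is None or abs(M[pivot][col]) < tol:
            continue  # no usable pivot: this column adds no new direction
        M[rank], M[pivot] = M[pivot], M[rank]
        for r in range(rank + 1, rows):
            factor = M[r][col] / M[rank][col]
            for c in range(col, cols):
                M[r][c] -= factor * M[rank][c]
        rank += 1
        if rank == rows:
            break
    return rank


print(matrix_rank([[1, 0], [0, 1]]))                   # 2
print(matrix_rank([[1, 2], [2, 4]]))                   # 1 (second row is twice the first)
print(matrix_rank([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 2
```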

Norms

A norm is a way to measure the "size" of a vector.

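Norm	Formula	Name	Use Case
L1 norm	$\|\vec{v}\|_1 = \sum_{i=1}^{n} |v_i|$	Manhattan norm	Lasso regularization, sparsity
L2 norm	$\|\vec{v}\|_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$	Euclidean norm	Ridge regularization, distances
L∞ norm	$\|\vec{v}\|_\infty = \max_i |v_i|$	Max norm	Bounding the largest component
Lp norm	$\|\vec{v}\|_p = \left(\sum_{i=1}^{n} |v_i|^p\right)^{1/p}$	General p-norm	Generalizes the norms above ($p \geq 1$)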

Applications in AI:

L1 regularization (Lasso): Shrinks some weights to exactly 0 → feature selection
L2 regularization (Ridge): Shrinks all weights toward 0 → prevents overfitting
Distance metrics: KNN, K-means clustering all use norms
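The standard norms are short to write in plain Python. The sketch below is illustrative, with hypothetical function names; `lp_norm` assumes $p \geq 1$.

```python
def l1_norm(v):
    """Sum of absolute values (Manhattan norm)."""
    return sum(abs(x) for x in v)


def l2_norm(v):
    """Square root of the sum of squares (Euclidean norm)."""
    return sum(x * x for x in v) ** 0.5


def linf_norm(v):
    """Largest absolute component (max norm)."""
    return max(abs(x) for x in v)


def lp_norm(v, p):
    """General p-norm; p = 1 and p = 2 reproduce the L1 and L2 norms."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x) ** p for x in v) ** (1 / p)


v = [3, -4]
print(l1_norm(v))    # 7
print(l2_norm(v))    # 5.0
print(linf_norm(v))  # 4
```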

Linear Algebra in Machine Learning — Summary

📋Summary: Linear Algebra

A vector is an ordered list of numbers representing magnitude and direction
Vector addition adds corresponding components
Scalar multiplication scales each component by a number
Dot product measures alignment between two vectors (returns scalar)
A matrix is a rectangular grid of numbers
Eigenvalues/eigenvectors reveal hidden structure in data
SVD decomposes any matrix into orthogonal components
Norms measure vector/matrix size

ML Concept	Linear Algebra Concept
Dataset	Matrix (rows=samples, cols=features)
Neural Network layer	Matrix multiplication + bias
PCA	Eigendecomposition / SVD
Recommendations	Matrix factorization
Image processing	Matrix operations
Word embeddings	Vector spaces
Regression (XᵀX)⁻¹Xᵀy	Matrix inverse, transpose
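As one illustration of the last row, the sketch below fits a straight line $y = b_0 + b_1 x$ by evaluating $(X^T X)^{-1} X^T y$ by hand, assuming a design matrix $X$ whose rows are $[1, x_i]$. In that case $X^T X$ is a 2×2 matrix, so its inverse has a closed form. The function name `fit_line` and the data points are made up for this example.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope from the normal equations."""
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular: all x values are identical")
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Points that lie exactly on y = 1 + 2x.
print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```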