Mathematics for Data Science & AI

Linear Algebra — The Language of Data

Master linear algebra for data science and AI: vectors, matrices, eigenvalues, SVD, and their applications in machine learning.

📂 Linear Algebra · 📖 Lesson 1 of 100 · 🎓 Free Course


ℹ️ Why It Matters

Every image, text, dataset, and neural network in AI is represented as numbers arranged in grids. Linear algebra is the math of those grids (matrices and vectors). Without it, there is no modern AI.


What is a Vector?

A vector is a list of numbers arranged in order. Think of it as an arrow pointing to a location in space.

Simple Analogy: Think of a vector like a GPS coordinate — it tells you exactly where something is.

Definition: Vector

A vector is an ordered tuple of numbers that represents a point or direction in space. A vector in n-dimensional space is written as:

Vector Notation

$$\vec{v} = [v_1, v_2, \ldots, v_n] \in \mathbb{R}^n$$

Here,

  • $\vec{v}$ = The vector (an ordered list of numbers)
  • $v_1, v_2, \ldots, v_n$ = Individual components of the vector
  • $\mathbb{R}^n$ = n-dimensional real coordinate space

Real-World Example:

  • A student's grades: [Math: 85, English: 90, Science: 78] is a 3D vector
  • An RGB pixel color: [255, 128, 0] is a 3D vector (orange color)
  • Word embeddings in NLP: the word "king" might be represented as a 300-dimensional vector

Vector Operations

Addition: Add corresponding elements

Vector Addition

$$\vec{u} + \vec{v} = [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n]$$

Here,

  • $\vec{u}, \vec{v}$ = Two vectors of the same dimension
  • $\vec{u} + \vec{v}$ = Resultant vector with summed components

📝 Example: Vector Addition

If $\vec{u} = [1, 2, 3]$ and $\vec{v} = [4, 5, 6]$, then:

$$\vec{u} + \vec{v} = [1+4, 2+5, 3+6] = [5, 7, 9]$$

Scalar Multiplication: Multiply every element by a number

Scalar Multiplication

$$c\vec{v} = [c \cdot v_1, c \cdot v_2, \ldots, c \cdot v_n]$$

Here,

  • $c$ = A scalar (single real number)
  • $\vec{v}$ = The vector to be scaled
  • $c\vec{v}$ = The scaled vector

📝 Example: Scalar Multiplication

If $\vec{v} = [2, 4, 6]$ and $c = 3$, then:

$$c\vec{v} = 3 \times [2, 4, 6] = [6, 12, 18]$$
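
The two worked examples above can be reproduced in a few lines. This is a minimal sketch assuming NumPy is available (the lesson does not name a library); the variable names are illustrative only.

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

# Vector addition: corresponding components are summed,
# so both vectors must have the same length.
total = u + v

# Scalar multiplication: every component is multiplied by c.
c = 3
scaled = c * np.array([2, 4, 6])

print(total)   # [5 7 9]
print(scaled)  # [ 6 12 18]
```

NumPy applies `+` and `*` component by component, which matches the two formulas directly.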

Dot Product: Multiply corresponding elements and sum them up

Dot Product

$$\vec{u} \cdot \vec{v} = \sum_{i=1}^{n} u_i v_i = u_1 v_1 + u_2 v_2 + \cdots + u_n v_n$$

Here,

  • $\vec{u} \cdot \vec{v}$ = The dot product (scalar result)
  • $u_i, v_i$ = The i-th components of vectors u and v

📝 Example: Dot Product

If $\vec{u} = [1, 2, 3]$ and $\vec{v} = [4, 5, 6]$:

$$\vec{u} \cdot \vec{v} = (1 \times 4) + (2 \times 5) + (3 \times 6) = 4 + 10 + 18 = 32$$

💡 Why dot product matters

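It measures how well two vectors are aligned. If the dot product is positive, the angle between the vectors is less than 90° and they point in broadly the same direction (similar). If it's zero, they're perpendicular (unrelated). If it's negative, they point in broadly opposite directions. The size of the value also depends on the lengths of the vectors, so a large dot product can come from long vectors as well as from close alignment.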
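
To make the sign behaviour concrete, the sketch below computes the dot product from the example and then a length-normalized version, usually called cosine similarity. Cosine similarity is an addition here rather than something the lesson defines, and the helper name `cosine_similarity` is hypothetical. NumPy is assumed.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product divided by both magnitudes; assumes neither vector is all zeros."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

dot_uv = float(np.dot(u, v))   # 1*4 + 2*5 + 3*6 = 32

sim_same = cosine_similarity(u, 2 * u)   # same direction, close to 1
sim_perp = cosine_similarity(np.array([1.0, 0.0]),
                             np.array([0.0, 1.0]))   # perpendicular, 0
sim_opp = cosine_similarity(u, -u)       # opposite direction, close to -1
```

Dividing by the magnitudes keeps the result between -1 and 1, so vectors of different lengths can be compared on the same scale.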

Magnitude (Length of a Vector)

Vector Magnitude (Euclidean Norm)

$$\|\vec{v}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2} = \sqrt{\sum_{i=1}^{n} v_i^2}$$

Here,

  • $\|\vec{v}\|$ = The magnitude (length) of vector v
  • $v_i$ = The i-th component of vector v

📝 Example: Magnitude

For $\vec{v} = [3, 4]$:

$$\|\vec{v}\| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$$

Unit Vector

A unit vector is a vector with magnitude 1. It preserves direction while removing scale.

Unit Vector

$$\hat{v} = \frac{\vec{v}}{\|\vec{v}\|}$$

Here,

  • $\hat{v}$ = The unit vector in the direction of v
  • $\vec{v}$ = The original vector
  • $\|\vec{v}\|$ = The magnitude of v

📝 Example: Unit Vector

For $\vec{v} = [3, 4]$ with $\|\vec{v}\| = 5$:

$$\hat{v} = \frac{1}{5}[3, 4] = [0.6, 0.8]$$
$$\|\hat{v}\| = \sqrt{0.6^2 + 0.8^2} = \sqrt{0.36 + 0.64} = \sqrt{1} = 1 \checkmark$$
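
The magnitude and unit-vector examples can be checked numerically. This is a small sketch assuming NumPy; the helper `normalize` is a hypothetical name, and rejecting the zero vector is a choice made here because the zero vector has no direction to preserve.

```python
import numpy as np

def normalize(v):
    """Return v scaled to length 1; the zero vector is rejected."""
    length = np.linalg.norm(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return v / length

v = np.array([3.0, 4.0])
length = float(np.linalg.norm(v))   # sqrt(9 + 16) = 5
v_hat = normalize(v)                # approximately [0.6, 0.8]
```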

What is a Matrix?

A matrix is a rectangular grid of numbers arranged in rows and columns.

Definition: Matrix

A matrix is a rectangular array of numbers arranged in rows and columns. An $m \times n$ matrix has $m$ rows and $n$ columns.

Matrix Notation

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}$$

Here,

  • $A$ = The matrix
  • $a_{ij}$ = Element in row i, column j
  • $m \times n$ = Dimensions (rows × columns)

Real-World Examples:

  • Dataset: Each row = one data point, each column = one feature
  • Grayscale Image: Each pixel is a number (0=black, 255=white)
  • Neural Network Weights: Each layer is a matrix of connection strengths

Matrix Operations

Addition: Add corresponding elements (matrices must be same size)

📝 Example: Matrix Addition

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Matrix Multiplication: Row × Column, then sum

Matrix Multiplication

$$(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

Here,

  • $A$ = First matrix (m × n)
  • $B$ = Second matrix (n × p)
  • $AB$ = Result matrix (m × p)

📝 Example: Matrix Multiplication

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \times \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} (1 \times 5 + 2 \times 7) & (1 \times 6 + 2 \times 8) \\ (3 \times 5 + 4 \times 7) & (3 \times 6 + 4 \times 8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$

⚠️ Important Rule

For $A \times B$ to be defined, the number of columns of $A$ must equal the number of rows of $B$.
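
The addition and multiplication examples, together with the shape rule, can be sketched as follows. NumPy is assumed, and the matrices `C` and `D` are hypothetical, chosen only to show shapes.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

added = A + B     # element-wise sum
product = A @ B   # rows of A combined with columns of B

# Shape rule: (m x n) @ (n x p) gives (m x p).
C = np.ones((2, 3))
D = np.ones((3, 4))
result_shape = (C @ D).shape   # (2, 4)

# (2 x 3) @ (2 x 3): inner dimensions 3 and 2 differ, so NumPy raises ValueError.
try:
    C @ C
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Note that `@` is matrix multiplication, while `*` on NumPy arrays multiplies element by element.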

Transpose

Flip rows and columns (mirror along the diagonal).

Matrix Transpose

$$(A^T)_{ij} = A_{ji}$$

Here,

  • $A^T$ = The transpose of matrix A
  • $A_{ji}$ = Element from row j, column i of original

📝 Example: Transpose

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
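
In code the transpose is usually a one-liner. A brief sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

At = A.T                    # shape (3, 2)

# Element (i, j) of the transpose is element (j, i) of the original.
same_entry = At[2, 0] == A[0, 2]

# Transposing twice returns the original matrix.
round_trip = np.array_equal(At.T, A)
```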

Identity Matrix

The "1" of matrices. Multiplying anything by it gives the same thing back.

Definition: Identity Matrix

The identity matrix $I$ is a square matrix with 1s on the diagonal and 0s elsewhere. For any matrix $A$ whose dimensions are compatible with $I$: $A \times I = A$ and $I \times A = A$.

📝 Example: 3×3 Identity Matrix

$$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
$$A \times I = A \text{ (for any matrix } A \text{ with 3 columns)}$$
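
A quick check of the identity property, assuming NumPy. For a non-square matrix the identity on the right and the identity on the left have different sizes, which the hypothetical 2×3 matrix below makes visible.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # 2 x 3

I3 = np.eye(3)   # 3 x 3 identity, used on the right of A
I2 = np.eye(2)   # 2 x 2 identity, used on the left of A

right = A @ I3
left = I2 @ A
```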

Inverse Matrix

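Like dividing. If $A$ is a square matrix and $A \times B = B \times A = I$, then $B$ is the inverse of $A$, written as $A^{-1}$. Not every matrix has one: only square matrices with a nonzero determinant are invertible.

Matrix Inverse

$$A \times A^{-1} = A^{-1} \times A = I$$

Here,

  • $A$ = The original (square, invertible) matrix
  • $A^{-1}$ = The inverse matrix
  • $I$ = The identity matrix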

ℹ️ Why it matters in AI

Used in solving systems of linear equations, which comes up in regression and optimization.
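
As a sketch of how the inverse relates to solving a linear system $A\vec{x} = \vec{b}$, the example below uses a hypothetical right-hand side `b` and assumes NumPy. It also shows `np.linalg.solve`, which solves the system without forming the inverse explicitly and is generally the preferred route in numerical code.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
b = np.array([6.0, 8.0])    # hypothetical right-hand side

A_inv = np.linalg.inv(A)
check = A @ A_inv           # approximately the identity matrix

x_via_inverse = A_inv @ b            # x = A^{-1} b
x_via_solve = np.linalg.solve(A, b)  # same solution, no explicit inverse

# A matrix with linearly dependent rows has determinant 0 and no inverse.
singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])
det_singular = np.linalg.det(singular)
```

Here the solution is $\vec{x} = [1, 2]$, since $4 \cdot 1 + 1 \cdot 2 = 6$ and $2 \cdot 1 + 3 \cdot 2 = 8$.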


Eigenvalues and Eigenvectors

The Big Idea: When you multiply a matrix by certain special nonzero vectors, the vector stays on the same line through the origin: it only gets stretched, shrunk, or (for a negative factor) flipped. Those special vectors are eigenvectors, and the scaling factor is the eigenvalue.

Eigenvalue Equation

$$A \vec{v} = \lambda \vec{v}$$

Here,

  • $A$ = The matrix
  • $\vec{v}$ = The eigenvector
  • $\lambda$ = The eigenvalue (scalar)

Analogy: Imagine putting a rubber sheet on a board and stretching it. Most arrows on the sheet will change direction. But some arrows, when you stretch, still point the same way — just longer or shorter. Those are eigenvectors.

ℹ️ How to find them

Solve the characteristic equation: $\det(A - \lambda I) = 0$

📝 Example: Finding Eigenvalues

Let $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$

$$\det(A - \lambda I) = 0$$
$$\begin{vmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{vmatrix} = 0$$
$$(4-\lambda)(3-\lambda) - (1)(2) = 0$$
$$12 - 4\lambda - 3\lambda + \lambda^2 - 2 = 0$$
$$\lambda^2 - 7\lambda + 10 = 0$$
$$(\lambda - 5)(\lambda - 2) = 0$$
$$\lambda_1 = 5, \quad \lambda_2 = 2$$
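
The hand calculation can be cross-checked numerically. The sketch below assumes NumPy and does three things: finds the roots of the characteristic polynomial, calls `np.linalg.eig` on the same matrix, and verifies $A\vec{v} = \lambda\vec{v}$. The eigenvectors $[1, 1]$ and $[1, -2]$ are worked out here by solving $(A - \lambda I)\vec{v} = 0$ and are not given in the lesson.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Roots of lambda^2 - 7*lambda + 10 = 0.
roots = np.sort(np.roots([1.0, -7.0, 10.0]))

# eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)

# Residual of A v - lambda v for each computed pair (should be near zero).
residuals = [
    np.linalg.norm(A @ eigenvectors[:, k] - eigenvalues[k] * eigenvectors[:, k])
    for k in range(2)
]

# Hand-derived eigenvectors for lambda = 5 and lambda = 2.
v5 = np.array([1.0, 1.0])
v2 = np.array([1.0, -2.0])
```

NumPy returns eigenvectors scaled to unit length, so they may differ from the hand-derived ones by a constant factor while pointing along the same line.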

Applications in AI/DS:

  • Principal Component Analysis (PCA): Find the directions of maximum variance in data
  • Google's PageRank: The ranking scores form the dominant eigenvector of the web's link matrix
  • Stability analysis: Check if a system is stable
  • Face recognition (Eigenfaces): Use the leading eigenvectors of the covariance matrix of face images

Matrix Decompositions

LU Decomposition

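Break a square matrix into the product of a lower triangular and an upper triangular matrix. In practice, rows are often reordered first (giving $PA = LU$ with a permutation matrix $P$) to keep the computation numerically stable.

LU Decomposition

$$A = L \times U$$

Here,

  • $A$ = The original matrix
  • $L$ = Lower triangular matrix
  • $U$ = Upper triangular matrix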

Use: Solving systems of equations efficiently.
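
To show what the factors look like, here is a minimal sketch of LU factorization by Gaussian elimination (the Doolittle form, with 1s on the diagonal of L). It assumes NumPy and, importantly, assumes every pivot is nonzero, so it performs no row swaps. The function name `lu_no_pivot` and the example matrix are hypothetical. Library routines such as SciPy's `scipy.linalg.lu` include pivoting and are the safer choice for real work.

```python
import numpy as np

def lu_no_pivot(A):
    """Doolittle LU without row swaps; assumes every pivot U[k, k] is nonzero."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.array(A, dtype=float)   # working copy that becomes upper triangular
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]    # multiplier that clears U[i, k]
            U[i, :] -= L[i, k] * U[k, :]   # subtract a multiple of the pivot row
    return L, U

A = np.array([[4.0, 3.0],
              [6.0, 3.0]])
L, U = lu_no_pivot(A)
# L = [[1, 0], [1.5, 1]], U = [[4, 3], [0, -1.5]]
```

Once L and U are known, a system $A\vec{x} = \vec{b}$ reduces to two triangular solves, which is why the factorization pays off when the same matrix is reused with many right-hand sides.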

SVD (Singular Value Decomposition)

Arguably the most widely used matrix decomposition in data science.

Singular Value Decomposition

$$A = U \times \Sigma \times V^T$$

Here,

  • $A$ = The original matrix
  • $U$ = Orthogonal matrix whose columns are the left singular vectors
  • $\Sigma$ = Diagonal matrix of non-negative singular values, usually sorted from largest to smallest
  • $V$ = Orthogonal matrix whose columns are the right singular vectors

Analogy: SVD is like breaking down a number into its prime factors, but for matrices. Every matrix, square or rectangular, can be decomposed this way.

Applications:

  • Image compression: Keep only the largest singular values, throw away the rest
  • Recommendation systems (Netflix Prize): Decompose the user-movie rating matrix
  • Noise reduction: Remove small singular values (noise)
  • Latent Semantic Analysis: Find hidden topics in text
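
The "keep only the largest singular values" idea behind compression and noise reduction can be sketched as below. NumPy is assumed; the data is a hypothetical rank-2 matrix with a little noise added, standing in for an image or a rating matrix, and `low_rank_approx` is an illustrative helper name.

```python
import numpy as np

def low_rank_approx(A, k):
    """Rebuild A from its k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical data: an exactly rank-2 matrix plus small random noise.
clean = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15))
noisy = clean + 0.01 * rng.normal(size=clean.shape)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
reconstructed = U @ np.diag(s) @ Vt    # all singular values: recovers the input
approx = low_rank_approx(noisy, k=2)   # only the two largest are kept
```

`np.linalg.svd` returns the singular values as a 1-D array in descending order and returns $V^T$ rather than $V$. The truncated result is the closest rank-2 matrix to `noisy` in the least-squares sense.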

QR Decomposition

QR Decomposition

$$A = Q \times R$$

Here,

  • $A$ = The original matrix
  • $Q$ = Orthogonal matrix
  • $R$ = Upper triangular matrix

Use: Least squares problems, eigenvalue algorithms.
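
As an illustration of the least squares use, the sketch below fits a straight line by factoring the design matrix and solving the triangular system $R\vec{w} = Q^T\vec{y}$. NumPy is assumed and the data is hypothetical, generated exactly from $y = 1 + 2x$. By default `np.linalg.qr` returns the reduced form, in which $Q$ is rectangular with orthonormal columns rather than a full square orthogonal matrix.

```python
import numpy as np

# Hypothetical data lying exactly on y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x

# Design matrix: a column of ones (intercept) and the feature column.
X = np.column_stack([np.ones_like(x), x])

Q, R = np.linalg.qr(X)   # Q is 4 x 2, R is 2 x 2 upper triangular

# Minimizing ||X w - y|| reduces to solving R w = Q^T y.
w = np.linalg.solve(R, Q.T @ y)   # approximately [1, 2]
```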


Vector Spaces and Subspaces

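Definition: Vector Space

A vector space is a collection of vectors where you can add them and multiply by scalars and still stay in the space.

Definition: Subspace

A subspace is a subset of a vector space that is itself a vector space: it contains the zero vector and is closed under addition and scalar multiplication.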

Examples:

  • All 2D vectors [x, y] form a vector space (the entire 2D plane)
  • All vectors [x, 0] (y=0) form a subspace (the x-axis)

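Definition: Basis

A basis is a set of linearly independent vectors that span the space, so every vector in the space can be written in exactly one way as a linear combination of them.

Definition: Dimension

The dimension is the number of vectors in any basis of the space.

Definition: Span

The span is the set of all possible linear combinations of a set of vectors.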

📝 Example: Span

$$\text{Span}([1,0], [0,1]) = \text{entire 2D plane}$$
$$\text{Span}([1,1]) = \text{just the line } y=x$$

Definition: Linear Independence

Vectors are linearly independent if none of them can be written as a linear combination of the others.

Definition: Rank

The rank is the number of linearly independent rows in a matrix, which always equals the number of linearly independent columns.

ℹ️ Why it matters

  • Rank tells you how much "information" a matrix contains
  • If rank < number of features, your data has redundancy (you can compress it)
  • PCA finds a low-rank approximation of your data
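
Rank and redundancy can be checked numerically. In this sketch (assuming NumPy) the small dataset `X` is hypothetical, built so that its third feature is the sum of the first two.

```python
import numpy as np

independent = np.array([[1, 0],
                        [0, 1]])
dependent = np.array([[1, 1],
                      [2, 2]])   # second row is 2 times the first

rank_independent = np.linalg.matrix_rank(independent)   # 2
rank_dependent = np.linalg.matrix_rank(dependent)       # 1

# Hypothetical dataset: 4 samples, 3 features, feature 3 = feature 1 + feature 2.
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 9.0],
              [7.0, 8.0, 15.0],
              [2.0, 0.0, 2.0]])
rank_X = np.linalg.matrix_rank(X)   # 2, fewer than the 3 features
```

`np.linalg.matrix_rank` counts the singular values above a small tolerance, so with noisy real-world data it usually reports full rank even when the features are nearly redundant.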

Norms

A norm is a way to measure the "size" of a vector.

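| Norm | Formula | Name | Use Case |
| --- | --- | --- | --- |
| L1 norm | $\lVert \vec{v} \rVert_1 = \sum_{i=1}^{n} \lvert v_i \rvert$ | Manhattan norm | Lasso regularization, sparsity |
| L2 norm | $\lVert \vec{v} \rVert_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$ | Euclidean norm | Ridge regularization, Euclidean distance |
| L∞ norm | $\lVert \vec{v} \rVert_\infty = \max_i \lvert v_i \rvert$ | Maximum norm | Size of the largest component |
| Lp norm | $\lVert \vec{v} \rVert_p = \left( \sum_{i=1}^{n} \lvert v_i \rvert^p \right)^{1/p}$ | General p-norm | Generalizes the norms above |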

Applications in AI:

  • L1 regularization (Lasso): Shrinks some weights to exactly 0 → feature selection
  • L2 regularization (Ridge): Shrinks all weights toward 0 → prevents overfitting
  • Distance metrics: KNN, K-means clustering all use norms
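
The common norms can be compared on a single vector. This sketch assumes NumPy and uses the standard definitions: L1 is the sum of absolute values, L2 is the Euclidean length, L∞ is the largest absolute component, and Lp is the p-th root of the sum of p-th powers of absolute values. The example vector is hypothetical.

```python
import numpy as np

v = np.array([3.0, -4.0])

l1 = np.linalg.norm(v, ord=1)          # |3| + |-4| = 7
l2 = np.linalg.norm(v)                 # sqrt(9 + 16) = 5 (default is L2)
linf = np.linalg.norm(v, ord=np.inf)   # max(|3|, |-4|) = 4
l3 = np.linalg.norm(v, ord=3)          # (27 + 64) ** (1/3)

# Distance between two points is the norm of their difference.
a = np.array([1.0, 1.0])
b = np.array([4.0, 5.0])
euclidean_distance = np.linalg.norm(a - b)   # 5
```

For any vector the three values satisfy L∞ ≤ L2 ≤ L1, which the numbers 4, 5 and 7 illustrate.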

Linear Algebra in Machine Learning — Summary

📋Summary: Linear Algebra

  • A vector is an ordered list of numbers representing magnitude and direction
  • Vector addition adds corresponding components
  • Scalar multiplication scales each component by a number
  • Dot product measures alignment between two vectors (returns scalar)
  • A matrix is a rectangular grid of numbers
  • Eigenvalues/eigenvectors reveal hidden structure in data
  • SVD decomposes any matrix into orthogonal components
  • Norms measure vector/matrix size

| ML Concept | Linear Algebra Concept |
| --- | --- |
| Dataset | Matrix (rows = samples, columns = features) |
| Neural Network layer | Matrix multiplication + bias |
| PCA | Eigendecomposition / SVD |
| Recommendations | Matrix factorization |
| Image processing | Matrix operations |
| Word embeddings | Vector spaces |
| Regression (XᵀX)⁻¹Xᵀy | Matrix inverse, transpose |
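
Two rows of this mapping are easy to sketch in code: a neural network layer as a matrix multiplication plus bias, and regression through the formula (XᵀX)⁻¹Xᵀy. NumPy is assumed, and all weights and data below are hypothetical values chosen so the results can be checked by hand.

```python
import numpy as np

# Dataset as a matrix: 2 samples (rows) x 3 features (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Neural network layer: matrix multiplication plus bias.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])   # 3 inputs -> 2 outputs (hypothetical weights)
b = np.array([0.5, -0.5])
layer_out = X @ W + b        # shape (2, 2)

# Regression with the normal equation w = (F^T F)^{-1} F^T y,
# on hypothetical noise-free data generated from y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0])
F = np.column_stack([np.ones_like(x), x])   # intercept column + feature
y = 1.0 + 2.0 * x
w = np.linalg.inv(F.T @ F) @ F.T @ y        # approximately [1, 2]
```

The explicit inverse mirrors the formula in the table. Numerical libraries typically solve the same problem with a QR or SVD based routine such as `np.linalg.lstsq`, which behaves better when features are nearly redundant.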