← Math|19 of 100
Linear Algebra

Linear Transformations

Understand linear maps, matrix representations, kernel, image, and their role in neural networks.

📂 Transformations | 📖 Lesson 19 of 100 | 🎓 Free Course


ℹ️ Why It Matters

Every fully connected layer in a neural network applies a linear transformation (plus a bias) followed by a non-linearity. Computer graphics uses transformations to rotate, scale, and project objects. Physics engines rely on them to update positions and orientations. Understanding them is key to understanding how data flows through these systems.


What is a Linear Transformation?

Definition: Linear Transformation

A function $T: V \to W$ between vector spaces $V$ and $W$ is linear if, for all vectors $\vec{u}, \vec{v} \in V$ and all scalars $c$:

  1. Additivity: $T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v})$
  2. Homogeneity: $T(c\vec{v}) = cT(\vec{v})$

⚠️ Non-Example

$T(\vec{x}) = \vec{x} + \vec{1}$ is not linear because $T(\vec{0}) = \vec{1} \neq \vec{0}$. Linear transformations must map the zero vector to the zero vector.
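The non-example can also be checked numerically. The sketch below is illustrative only: the function name `shift` and the test vectors `u` and `v` are assumptions chosen for this example, not part of the lesson.

```python
import numpy as np

def shift(x):
    """The map T(x) = x + 1 (adds the all-ones vector); affine, not linear."""
    return x + np.ones_like(x)

# Hypothetical test vectors
u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# A linear map must send the zero vector to the zero vector
zero_maps_to_zero = np.allclose(shift(np.zeros(2)), np.zeros(2))

# A linear map must satisfy T(u + v) = T(u) + T(v)
additive = np.allclose(shift(u + v), shift(u) + shift(v))

print(zero_maps_to_zero)  # False
print(additive)           # False
```

Both checks fail, which is enough to rule out linearity: a single counterexample to either property suffices.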


Matrix Representation

Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ can be written as matrix multiplication.

Matrix Representation

$$T(\vec{x}) = A\vec{x}$$

Here,

  • $A$ = $m \times n$ matrix representing the transformation
  • $\vec{x}$ = Input vector in $\mathbb{R}^n$
  • $T(\vec{x})$ = Output vector in $\mathbb{R}^m$

How to find A: The columns of $A$ are $T(\vec{e}_1), T(\vec{e}_2), \ldots, T(\vec{e}_n)$, where $\vec{e}_i$ are the standard basis vectors.

📝 Example: Finding Transformation Matrix

Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x + y \\ x - y \end{bmatrix}$.

$$T(\vec{e}_1) = T\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}, \qquad T(\vec{e}_2) = T\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

So $A = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}$.
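The column-by-column recipe can be sketched in a few lines of NumPy. The helper name `matrix_of` is hypothetical, and the sketch assumes the function passed in really is linear (it does not check).

```python
import numpy as np

def matrix_of(T, n):
    """Build the matrix of a linear map T: R^n -> R^m column by column.

    Column i is T applied to the i-th standard basis vector.
    Assumes T is linear; this is not verified.
    """
    basis = np.eye(n)
    return np.column_stack([T(basis[:, i]) for i in range(n)])

def T(v):
    """The example map T([x, y]) = [2x + y, x - y]."""
    x, y = v
    return np.array([2 * x + y, x - y])

A = matrix_of(T, 2)
print(A)

# Sanity check on an arbitrary input: A @ x should agree with T(x)
x = np.array([1.0, 2.0])
agrees = np.allclose(A @ x, T(x))
print(agrees)  # True
```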


Kernel and Image

Definition: Kernel (Null Space)

The kernel of $T$ is the set of all vectors that map to zero:

$$\ker(T) = \{\vec{x} \in V : T(\vec{x}) = \vec{0}\}$$

Definition: Image (Range)

The image of $T$ is the set of all possible outputs:

$$\text{im}(T) = \{T(\vec{x}) : \vec{x} \in V\}$$

Theorem: Rank-Nullity

For a linear transformation $T: V \to W$ where $V$ is finite-dimensional:

$$\dim(\ker(T)) + \dim(\text{im}(T)) = \dim(V)$$

📝 Example: Kernel and Image

Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$.

Kernel: Solve $A\vec{x} = \vec{0}$. Both rows reduce to $x_1 + 2x_2 = 0 \implies x_1 = -2x_2$, so $\ker(T) = \text{span}\{[-2, 1]^T\}$ and $\dim(\ker(T)) = 1$.

Image: The image is the column space. The second column is twice the first, so $\text{im}(T) = \text{span}\{[1, 2]^T\}$ and $\dim(\text{im}(T)) = 1$.

Check: $1 + 1 = 2 = \dim(\mathbb{R}^2)$ ✓
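The same kernel and image can be recovered numerically. The sketch below uses the singular value decomposition, which is one common way to do this and not necessarily how the lesson intends it; the tolerance `tol` is an assumed cutoff for treating a singular value as zero.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Full SVD: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)

tol = 1e-10  # assumed threshold for "numerically zero"
rank = int(np.sum(s > tol))

# Rows of Vt beyond the rank span the kernel; the first `rank`
# columns of U span the image (column space).
kernel_basis = Vt[rank:].T
image_basis = U[:, :rank]

nullity = kernel_basis.shape[1]
print(f"rank = {rank}, nullity = {nullity}")

# Rank-Nullity: dim(ker) + dim(im) = number of columns of A
print(rank + nullity == A.shape[1])  # True
```

The basis vectors returned by the SVD are unit length and may differ in sign from the hand-computed ones, but they span the same lines.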


Common Transformations

Rotation

2D Rotation

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Here,

  • $\theta$ = Angle of rotation (counterclockwise)

📝 Example: 90° Rotation

$$R(90^\circ) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
$$R(90^\circ)\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

The point $(1, 0)$ rotates to $(0, 1)$. ✓
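A small sketch of the rotation matrix in code, assuming angles are given in radians. The function name `rotation` and the test vector `v` are illustrative choices.

```python
import numpy as np

def rotation(theta):
    """2D counterclockwise rotation matrix for an angle in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s],
                     [s,  c]])

R90 = rotation(np.pi / 2)
e1 = np.array([1.0, 0.0])
rotated = R90 @ e1  # approximately [0, 1], up to floating-point error

# Rotations preserve lengths: compare norms before and after
v = np.array([3.0, 4.0])
length_before = np.linalg.norm(v)
length_after = np.linalg.norm(rotation(0.7) @ v)

print(np.allclose(rotated, [0.0, 1.0]))        # True
print(np.isclose(length_before, length_after))  # True
```

Because `cos(pi/2)` is not exactly zero in floating point, comparisons use `np.allclose` instead of exact equality.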

Scaling

Scaling

$$S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$

Here,

  • $s_x$ = Scale factor in x-direction
  • $s_y$ = Scale factor in y-direction

Reflection

📝 Example: Reflection over x-axis

$$\text{Ref}_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$$
$$\text{Ref}_x\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}$$

Shear

Horizontal Shear

$$\text{Shear}_x = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$$

Here,

  • $k$ = Shear factor

Projection

📝 Example: Projection onto x-axis

$$\text{Proj}_x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$
$$\text{Proj}_x\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}$$

Summary Table:

| Transformation | Matrix | Determinant | Effect |
|---|---|---|---|
| Rotation ($\theta$) | $\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$ | $1$ | Rotates, preserves lengths |
| Scaling ($s_x, s_y$) | $\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}$ | $s_x s_y$ | Stretches/shrinks |
| Reflection | $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$ | $-1$ | Flips, preserves lengths |
| Shear | $\begin{bmatrix}1&k\\0&1\end{bmatrix}$ | $1$ | Slants |
| Projection | $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ | $0$ | Collapses dimension |
| Zero | $\begin{bmatrix}0&0\\0&0\end{bmatrix}$ | $0$ | Maps everything to origin |
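The determinant column can be spot-checked numerically. In 2D the determinant is the signed factor by which the map scales areas, so the values below match the "Effect" column. The parameter values (`theta`, `s_x`, `s_y`, `k`) are arbitrary samples chosen for this sketch.

```python
import numpy as np

# Hypothetical sample parameters for the parametrized rows of the table
theta, s_x, s_y, k = np.pi / 6, 2.0, 3.0, 1.5

matrices = {
    "rotation":   np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]]),
    "scaling":    np.array([[s_x, 0.0], [0.0, s_y]]),
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "shear":      np.array([[1.0, k], [0.0, 1.0]]),
    "projection": np.array([[1.0, 0.0], [0.0, 0.0]]),
    "zero":       np.zeros((2, 2)),
}

# Determinant of each matrix, keyed by transformation name
dets = {name: np.linalg.det(M) for name, M in matrices.items()}

for name, d in dets.items():
    print(f"{name:>10}: det = {d:.3f}")
```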

Composition of Transformations

The composition of two linear transformations corresponds to matrix multiplication.

Composition

$$(T_2 \circ T_1)(\vec{x}) = T_2(T_1(\vec{x})) = (A_2 A_1)\vec{x}$$

Here,

  • $T_1, T_2$ = Linear transformations
  • $A_1, A_2$ = Their representing matrices

📝 Example: Composition

Rotate by 90°, then reflect over x-axis:

$$\text{Ref}_x \cdot R(90^\circ) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$

Compare with reverse order: $R(90^\circ) \cdot \text{Ref}_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

Different results! Order matters.

💡 Key Insight

Composition = matrix multiplication. Since $AB \neq BA$ in general, the order of transformations matters. This is why layer order in neural networks matters.
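The order dependence is easy to confirm in code. This sketch reuses the rotation and reflection matrices from the example; the variable names are illustrative.

```python
import numpy as np

R90 = np.array([[0, -1],
                [1,  0]])   # rotate 90 degrees counterclockwise
ref_x = np.array([[1,  0],
                  [0, -1]])  # reflect over the x-axis

# The transformation applied first sits on the right of the product
rotate_then_reflect = ref_x @ R90
reflect_then_rotate = R90 @ ref_x

print(rotate_then_reflect)  # [[ 0 -1], [-1  0]]
print(reflect_then_rotate)  # [[0 1], [1 0]]

# Applying the maps one at a time agrees with the single product matrix
x = np.array([1, 0])
step_by_step = ref_x @ (R90 @ x)
same = np.array_equal(step_by_step, rotate_then_reflect @ x)
print(same)  # True
```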


Change of Basis

Definition: Change of Basis Matrix

If $B$ and $B'$ are two bases of the same space, and $P$ is the matrix whose columns are the vectors of $B'$ written in $B$-coordinates, then $[\vec{x}]_B = P[\vec{x}]_{B'}$, and therefore:

$$[\vec{x}]_{B'} = P^{-1}[\vec{x}]_B$$

📝 Example: Change of Basis

Let $B = \{[1,0], [0,1]\}$ (standard) and $B' = \{[1,1], [1,-1]\}$ (new basis).

The change of basis matrix from $B'$ to $B$ is $P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$.

$$P^{-1} = \frac{1}{-2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix}$$
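To see the change of basis in action, the sketch below converts one vector from standard coordinates to $B'$-coordinates. The vector `x_B = [3, 1]` is a hypothetical input chosen for illustration; solving the linear system is used here in place of forming the inverse explicitly, which gives the same result.

```python
import numpy as np

# Columns of P are the B' basis vectors in standard coordinates
P = np.array([[1.0,  1.0],
              [1.0, -1.0]])

x_B = np.array([3.0, 1.0])  # hypothetical vector in standard coordinates

# [x]_{B'} = P^{-1} [x]_B, computed by solving P @ x_Bp = x_B
x_Bp = np.linalg.solve(P, x_B)
print(x_Bp)  # [2. 1.]

# Check: 2*[1, 1] + 1*[1, -1] reproduces the original vector
reconstructed = P @ x_Bp
print(np.allclose(reconstructed, x_B))  # True
```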

Eigenvalues and Transformations

Eigenvalues reveal the geometric behavior of a transformation.

ℹ️ Geometric Interpretation

  • Positive eigenvalue: Scales along the eigenvector direction (stretches if $\lambda > 1$, shrinks if $0 < \lambda < 1$)
  • Negative eigenvalue: Flips the eigenvector direction and scales by $|\lambda|$
  • Zero eigenvalue: Collapses that direction
  • Complex eigenvalue: Rotation component

📝 Example: Eigenvalues Tell the Story

$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$ has eigenvalues 2 and 3.

This transformation stretches the x-axis by 2 and the y-axis by 3: it is a non-uniform scaling.
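The geometric reading of eigenvalues can be illustrated on matrices already seen in this lesson. This is a sketch using `np.linalg.eigvals`; the variable names are illustrative.

```python
import numpy as np

scaling = np.array([[2.0, 0.0],
                    [0.0, 3.0]])
projection = np.array([[1.0, 0.0],
                       [0.0, 0.0]])
rotation_90 = np.array([[0.0, -1.0],
                        [1.0,  0.0]])

scale_eigs = np.linalg.eigvals(scaling)      # real: stretch factors 2 and 3
proj_eigs = np.linalg.eigvals(projection)    # contains 0: one direction collapses
rot_eigs = np.linalg.eigvals(rotation_90)    # complex pair: pure rotation

print(scale_eigs)
print(proj_eigs)
print(rot_eigs)

# A 90-degree rotation has no real eigenvector directions;
# its eigenvalues are purely imaginary with magnitude 1.
print(np.iscomplexobj(rot_eigs))  # True
```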


Affine vs Linear Transformations

Definition: Affine Transformation

An affine transformation is a linear transformation plus a translation:

$$T(\vec{x}) = A\vec{x} + \vec{b}$$

ℹ️ Homogeneous Coordinates

To represent affine transformations as matrix multiplication, use homogeneous coordinates:

$$\begin{bmatrix} A & \vec{b} \\ \vec{0}^T & 1 \end{bmatrix}\begin{bmatrix} \vec{x} \\ 1 \end{bmatrix} = \begin{bmatrix} A\vec{x} + \vec{b} \\ 1 \end{bmatrix}$$

This is used in computer graphics and robotics.
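A minimal sketch of the homogeneous-coordinate construction follows. The helper name `homogeneous` and the sample values of `A`, `b`, and `x` are assumptions for illustration.

```python
import numpy as np

def homogeneous(A, b):
    """Embed the affine map x -> A @ x + b in one (m+1) x (n+1) matrix."""
    m, n = A.shape
    M = np.zeros((m + 1, n + 1))
    M[:m, :n] = A   # linear part in the top-left block
    M[:m, n] = b    # translation in the last column
    M[m, n] = 1.0   # bottom-right entry keeps the trailing 1
    return M

# Hypothetical example: rotate by 90 degrees, then translate by (2, 3)
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
b = np.array([2.0, 3.0])
M = homogeneous(A, b)

x = np.array([1.0, 0.0])
x_h = np.append(x, 1.0)   # homogeneous coordinates [x, 1]
y_h = M @ x_h

print(y_h)  # [2. 4. 1.]
print(np.allclose(y_h[:2], A @ x + b))  # True
```

One practical benefit of this form is that a chain of affine maps composes by ordinary matrix multiplication of their homogeneous matrices.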


Python Implementation

import numpy as np

# Define a linear transformation matrix
A = np.array([[2, 1], [1, -1]])

# Apply transformation
x = np.array([1, 2])
Tx = A @ x  # [4, -1]

# Rotation
theta = np.pi / 4  # 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

# Scale
S = np.array([[2, 0], [0, 0.5]])

# Composition: rotate then scale
M = S @ R

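# Kernel (null space); A is invertible here, so the kernel is trivial
from scipy.linalg import null_space, orth
kernel = null_space(A)  # shape (2, 0): no nonzero vector maps to zero
print(f"Kernel basis shape: {kernel.shape}")

# Image (column space): orth returns an orthonormal basis for it
image_basis = orth(A)
# The rank gives the dimension of the image
rank = np.linalg.matrix_rank(A)
print(f"Rank: {rank}, Image dimension: {image_basis.shape[1]}")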

# Verify linearity: T(u + v) = T(u) + T(v)
u, v = np.array([1, 0]), np.array([0, 1])
lhs = A @ (u + v)
rhs = A @ u + A @ v
print(f"T(u+v) = T(u)+T(v): {np.allclose(lhs, rhs)}")

# Verify: T(cu) = cT(u)
c = 3
lhs = A @ (c * u)
rhs = c * (A @ u)
print(f"T(cu) = cT(u): {np.allclose(lhs, rhs)}")

Applications in AI/ML

| Application | Transformation | How It's Used |
|---|---|---|
| Neural Network Layer | $W\vec{x} + \vec{b}$ | Linear map plus bias, then activation |
| Data Augmentation | Rotation, flip, crop | Increases training data diversity |
| Attention Mechanism | $QK^T$ | Scores query-key similarity using linearly projected queries and keys |
| PCA | Projection onto principal components | Dimensionality reduction |
| Word Embeddings | Linear map from vocabulary to semantic space | Maps one-hot vectors to dense vectors |
| Computer Graphics | Rotation, scaling, projection | Rendering 3D scenes |

💡 Deep Learning Insight

A deep network is a composition of transformations: $T = T_L \circ T_{L-1} \circ \cdots \circ T_1$, where each $T_i(\vec{x}) = \sigma(W_i \vec{x} + \vec{b}_i)$. Without the non-linearity $\sigma$, the entire network collapses to a single affine map $T(\vec{x}) = W\vec{x} + \vec{b}$ (a single linear transformation $W\vec{x}$ when the biases are zero), no matter how many layers it has!
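The collapse can be demonstrated with two layers and no activation. The layer sizes, the random seed, and all variable names below are arbitrary assumptions for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical layers: R^3 -> R^4 -> R^2, with no activation between them
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same map as a single layer: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

collapsed = np.allclose(two_layers, one_layer)
print(collapsed)  # True

# ReLU breaks additivity, so it cannot be absorbed into a single matrix
def relu(z):
    return np.maximum(z, 0.0)

u, v = np.array([1.0, -1.0]), np.array([-1.0, 1.0])
relu_is_additive = np.allclose(relu(u + v), relu(u) + relu(v))
print(relu_is_additive)  # False
```

The first check holds for any weights and inputs because it is just the algebraic identity $W_2(W_1\vec{x} + \vec{b}_1) + \vec{b}_2 = (W_2 W_1)\vec{x} + (W_2\vec{b}_1 + \vec{b}_2)$.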


Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Assuming order doesn't matter | $AB \neq BA$ generally | Always specify transformation order |
| Forgetting non-linearity | Stack of linear transforms = one linear transform | Add activation between layers |
| Confusing kernel with image | Kernel = inputs that map to $\vec{0}$, Image = all outputs | Use Rank-Nullity to relate them |
| Treating a nonzero computed determinant as proof of invertibility | In exact arithmetic $\det \neq 0$ does mean invertible, but in floating point a nearly singular matrix can have a small nonzero determinant | Check the rank or condition number for robustness |
| Using wrong basis for matrix | Matrix depends on basis | Use change-of-basis when switching |

Interview Questions

Q1: What's the kernel of a transformation and why does it matter?
A: The kernel (null space) is the set of inputs mapping to zero. Its dimension tells you how much information is lost. In neural networks, a layer with a large kernel maps many distinct inputs to the same output, which is bad for discrimination.

Q2: Why must neural networks have non-linearities?
A: Without them, any number of layers collapses to a single linear transformation $T(\vec{x}) = W\vec{x}$ (plus a bias term if the layers have biases). Non-linearities (ReLU, sigmoid) break this collapse, allowing the network to learn complex decision boundaries.

Q3: What's the geometric meaning of a determinant of zero?
A: The transformation collapses space to a lower dimension. The kernel is non-trivial (contains more than just $\vec{0}$), and the transformation is not invertible.

Q4: How does data augmentation relate to transformations?
A: Data augmentation applies random transformations (rotation, flip, crop) to training data. This encourages the model to become invariant to those transformations, improving generalization.

Q5: What's the relationship between eigenvalues and transformation behavior?
A: Eigenvalues tell you how the transformation scales along eigenvector directions. Positive eigenvalues scale without flipping, negative eigenvalues flip the direction, zero eigenvalues collapse it, and complex eigenvalues indicate rotation.


Practice Problems

  1. Find the matrix representation of $T: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T([x, y]) = [x + y, x - y]$.
  2. Find the kernel and image of $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$.
  3. What transformation does $R(45^\circ) \cdot R(45^\circ)$ represent?
  4. Is $T(\vec{x}) = \|\vec{x}\|$ a linear transformation? Prove or disprove.

💡 Solutions

Problem 1: $T(\vec{e}_1) = [1, 1]$, $T(\vec{e}_2) = [1, -1]$. Matrix: $A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$.

Problem 2: Row reduce: $\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$. Kernel = $\text{span}\{[-2, 1]^T\}$, Image = $\text{span}\{[1, 3]^T\}$. Rank-Nullity: $1 + 1 = 2$ ✓.

Problem 3: $R(45^\circ) \cdot R(45^\circ) = R(90^\circ)$. Rotation by 90°.

Problem 4: Not linear. $T(\vec{0}) = 0$ ✓ and $T(2\vec{x}) = 2\|\vec{x}\| = 2T(\vec{x})$ ✓, but homogeneity fails for negative scalars: $T(-\vec{x}) = \|\vec{x}\| \neq -\|\vec{x}\|$ for $\vec{x} \neq \vec{0}$. Additivity also fails: $T(\vec{x} + \vec{y}) = \|\vec{x} + \vec{y}\| \neq \|\vec{x}\| + \|\vec{y}\| = T(\vec{x}) + T(\vec{y})$ in general. For example, with $\vec{y} = -\vec{x} \neq \vec{0}$, the left side is $0$ while the right side is $2\|\vec{x}\|$.


Quick Reference

| Transformation | Matrix | Properties |
|---|---|---|
| Rotation ($\theta$) | $\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$ | Preserves lengths, $\det = 1$ |
| Scaling ($s_x, s_y$) | $\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}$ | Stretches, $\det = s_x s_y$ |
| Reflection | $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$ | Preserves lengths, $\det = -1$ |
| Shear | $\begin{bmatrix}1&k\\0&1\end{bmatrix}$ | Preserves area, $\det = 1$ |
| Projection | $\begin{bmatrix}1&0\\0&0\end{bmatrix}$ | Collapses, $\det = 0$ |

| Property | Formula |
|---|---|
| Kernel | $\ker(T) = \{\vec{x}: T(\vec{x}) = \vec{0}\}$ |
| Image | $\text{im}(T) = \{T(\vec{x}): \vec{x} \in V\}$ |
| Rank-Nullity | $\dim(\ker) + \dim(\text{im}) = \dim(V)$ |
| Composition | $T_2 \circ T_1 \leftrightarrow A_2 A_1$ |
