Linear Transformations

ℹ️ Why It Matters

Most layers in a neural network apply a linear (strictly, affine) map followed by a non-linearity. Computer graphics uses transformations to rotate, scale, and project objects, and physics engines use them to move and orient simulated bodies. Understanding them is key to understanding how data flows through these systems.

What is a Linear Transformation?

Definition: Linear Transformation

A function $T: V \to W$ between vector spaces $V$ and $W$ is linear if for all vectors $\vec{u}, \vec{v} \in V$ and all scalars $c$ :

Additivity: $T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v})$
Homogeneity: $T(c\vec{v}) = cT(\vec{v})$

⚠️ Non-Example

$T(\vec{x}) = \vec{x} + \vec{1}$ is not linear because $T(\vec{0}) = \vec{1} \neq \vec{0}$ . Linear transformations must map the zero vector to the zero vector.
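
The two defining properties can be spot-checked numerically. The sketch below assumes NumPy and uses a hypothetical helper, `is_linear_on_samples`, that is not part of the original text. It tests additivity and homogeneity on random vectors: a single failure disproves linearity, while passing is only evidence, not a proof.

```python
import numpy as np

def is_linear_on_samples(T, n, trials=100, seed=0):
    """Test additivity and homogeneity of T on random vectors in R^n.

    Hypothetical helper: a failure disproves linearity, a pass is only evidence.
    """
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u, v = rng.normal(size=n), rng.normal(size=n)
        c = rng.normal()
        if not np.allclose(T(u + v), T(u) + T(v)):
            return False  # additivity violated
        if not np.allclose(T(c * u), c * T(u)):
            return False  # homogeneity violated
    return True

A = np.array([[2.0, 1.0], [1.0, -1.0]])
matrix_map = lambda x: A @ x      # x -> Ax, linear
shifted_map = lambda x: x + 1.0   # x -> x + 1, the non-example above

print(is_linear_on_samples(matrix_map, 2))   # True
print(is_linear_on_samples(shifted_map, 2))  # False
```

The shifted map fails at the first additivity check, because the constant offset is added once on the left-hand side and twice on the right.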

Matrix Representation

Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ can be written as matrix multiplication.

Matrix Representation

T(\vec{x}) = A\vec{x}

Here,

$A$ = $m \times n$ matrix representing the transformation
$\vec{x}$ = Input vector in $\mathbb{R}^n$
$T(\vec{x})$ = Output vector in $\mathbb{R}^m$

How to find A: The columns of $A$ are $T(\vec{e}_1), T(\vec{e}_2), \ldots, T(\vec{e}_n)$ where $\vec{e}_i$ are the standard basis vectors.

📝Example: Finding Transformation Matrix

Let $T: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x + y \\ x - y \end{bmatrix}$ .

T(\vec{e}_1) = T\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}

T(\vec{e}_2) = T\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}

So $A = \begin{bmatrix} 2 & 1 \\ 1 & -1 \end{bmatrix}$
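
The column-by-column recipe translates directly into code. The sketch below assumes NumPy; `matrix_of` is a hypothetical helper name chosen for illustration. It applies the example map to each standard basis vector and stacks the results as columns.

```python
import numpy as np

def T(v):
    """The example map T([x, y]) = [2x + y, x - y]."""
    x, y = v
    return np.array([2 * x + y, x - y])

def matrix_of(T, n):
    """Build the matrix of T by applying it to each standard basis vector of R^n."""
    columns = [T(e) for e in np.eye(n)]  # rows of the identity are e_1, ..., e_n
    return np.column_stack(columns)

A = matrix_of(T, 2)
print(A)

# The matrix reproduces T on an arbitrary input
x = np.array([3.0, -2.0])
print(np.allclose(A @ x, T(x)))
```

This only yields a faithful representation when `T` really is linear; for a non-linear function the stacked columns still form a matrix, but it will not agree with `T` away from the basis vectors.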

Kernel and Image

Definition: Kernel (Null Space)

The kernel of $T$ is the set of all vectors that map to zero:

\ker(T) = \{\vec{x} \in V : T(\vec{x}) = \vec{0}\}

Definition: Image (Range)

The image of $T$ is the set of all possible outputs:

\text{im}(T) = \{T(\vec{x}) : \vec{x} \in V\}

Theorem: Rank-Nullity Theorem

For a linear transformation $T: V \to W$ where $V$ is finite-dimensional:

\dim(\ker(T)) + \dim(\text{im}(T)) = \dim(V)

📝Example: Kernel and Image

Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ .

Kernel: Solve $A\vec{x} = \vec{0}$ : $x_1 + 2x_2 = 0 \implies x_1 = -2x_2$ . $\ker(T) = \text{span}\{[-2, 1]^T\}$ , $\dim = 1$ .

Image: Column space = span of columns = $\text{span}\{[1, 2]^T\}$ , $\dim = 1$ .

Check: $1 + 1 = 2 = \dim(\mathbb{R}^2)$ ✓
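
The same example can be reproduced numerically. The sketch below is one possible approach, assuming NumPy: it reads bases for the image and kernel off the singular value decomposition. The tolerance `tol` is an arbitrary choice made for this illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])

# Singular value decomposition: A = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(A)
tol = 1e-10                      # assumed threshold for "numerically zero"
rank = int(np.sum(s > tol))

image_basis = U[:, :rank]        # columns span the image (column space)
kernel_basis = Vt[rank:].T       # columns span the kernel (null space)
nullity = kernel_basis.shape[1]

print(f"rank = {rank}, nullity = {nullity}")
print("A @ kernel_basis is zero:", np.allclose(A @ kernel_basis, 0))
print("rank + nullity == dim(V):", rank + nullity == A.shape[1])
```

The returned basis vectors are unit length and may differ in sign from the hand computation, so the kernel basis is a scalar multiple of $[-2, 1]^T$ rather than that exact vector.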

Common Transformations

Rotation

2D Rotation

R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}

Here,

$\theta$ = Angle of rotation (counterclockwise)

📝Example: 90° Rotation

R(90°) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}

R(90°)\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \end{bmatrix}

The point $(1, 0)$ rotates to $(0, 1)$ . ✓
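
A small sketch of the rotation formula, assuming NumPy and angles in radians; the function name `rotation` is chosen here for illustration. Because $\cos(\pi/2)$ is not exactly zero in floating point, results are compared with `np.allclose` rather than exact equality.

```python
import numpy as np

def rotation(theta):
    """2D counterclockwise rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R90 = rotation(np.pi / 2)
p = R90 @ np.array([1.0, 0.0])
print(np.allclose(p, [0.0, 1.0]))  # (1, 0) rotates to (0, 1)

# Rotations preserve lengths and have determinant 1
v = np.array([3.0, 4.0])
R = rotation(0.7)
print(np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v)))
print(np.isclose(np.linalg.det(R), 1.0))
```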

Scaling

S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}

Here,

$s_x$ = Scale factor in x-direction
$s_y$ = Scale factor in y-direction

Reflection

📝Example: Reflection over x-axis

\text{Ref}_x = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}

\text{Ref}_x\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ -4 \end{bmatrix}

Shear

Horizontal Shear

\text{Shear}_x = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}

Here,

$k$ = Shear factor

Projection

📝Example: Projection onto x-axis

\text{Proj}_x = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}

\text{Proj}_x\begin{bmatrix} 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 3 \\ 0 \end{bmatrix}

Summary Table:

Transformation	Matrix	Determinant	Effect
Rotation (θ)	$\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$	1	Rotates, preserves lengths
Scaling ( $s_x, s_y$ )	$\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}$	$s_x s_y$	Stretches/shrinks
Reflection	$\begin{bmatrix}1&0\\0&-1\end{bmatrix}$	-1	Flips, preserves lengths
Shear	$\begin{bmatrix}1&k\\0&1\end{bmatrix}$	1	Slants
Projection	$\begin{bmatrix}1&0\\0&0\end{bmatrix}$	0	Collapses dimension
Zero	$\begin{bmatrix}0&0\\0&0\end{bmatrix}$	0	Maps everything to origin
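
The determinant column can be checked numerically. The sketch below assumes NumPy and picks arbitrary sample parameters ($\theta = \pi/6$, $k = 1.5$, $s_x = 2$, $s_y = 0.5$) purely for illustration.

```python
import numpy as np

theta, k = np.pi / 6, 1.5  # sample parameters, chosen arbitrarily
examples = {
    "rotation": np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta), np.cos(theta)]]),
    "scaling": np.array([[2.0, 0.0], [0.0, 0.5]]),
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "shear": np.array([[1.0, k], [0.0, 1.0]]),
    "projection": np.array([[1.0, 0.0], [0.0, 0.0]]),
    "zero": np.zeros((2, 2)),
}

# The determinant is the signed factor by which the map scales areas
dets = {name: np.linalg.det(M) for name, M in examples.items()}
for name, d in dets.items():
    print(f"{name:>10}: det = {d:+.3f}")
```

A determinant of zero (projection, zero map) signals that the map flattens the plane onto a line or a point, while a negative determinant (reflection) signals a change of orientation.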

Composition of Transformations

The composition of two linear transformations corresponds to matrix multiplication.

Composition

(T_2 \circ T_1)(\vec{x}) = T_2(T_1(\vec{x})) = (A_2 A_1)\vec{x}

Here,

$T_1, T_2$ = Linear transformations
$A_1, A_2$ = Their representing matrices

📝Example: Composition

Rotate by 90°, then reflect over x-axis:

\text{Ref}_x \cdot R(90°) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}

Compare with reverse order: $R(90°) \cdot \text{Ref}_x = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$

Different results! Order matters.

💡 Key Insight

Composition = matrix multiplication. Since $AB \neq BA$ in general, the order of transformations matters. This is why layer order in neural networks matters.
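
The order dependence is easy to confirm in code. This sketch assumes NumPy and reuses the reflection and 90° rotation matrices from the example; the variable names are chosen for illustration. Note that the transformation applied first sits on the right of the product.

```python
import numpy as np

ref_x = np.array([[1, 0], [0, -1]])    # reflect over the x-axis
rot_90 = np.array([[0, -1], [1, 0]])   # rotate 90 degrees counterclockwise

rotate_then_reflect = ref_x @ rot_90   # rightmost matrix acts first
reflect_then_rotate = rot_90 @ ref_x

x = np.array([1, 0])
print(rotate_then_reflect @ x)  # (1, 0) ends up at (0, -1)
print(reflect_then_rotate @ x)  # (1, 0) ends up at (0, 1)
print(np.array_equal(rotate_then_reflect, reflect_then_rotate))  # False
```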

Change of Basis

Definition: Change of Basis Matrix

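If $B$ and $B'$ are two bases of the same space and $P$ is the matrix whose columns are the vectors of $B'$ written in $B$-coordinates, then $P$ converts $B'$-coordinates to $B$-coordinates, and its inverse converts back: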

[\vec{x}]_{B'} = P^{-1}[\vec{x}]_B

📝Example: Change of Basis

Let $B = \{[1,0], [0,1]\}$ (standard) and $B' = \{[1,1], [1,-1]\}$ (new basis).

The change of basis matrix from $B'$ to $B$ is $P = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ .

P^{-1} = \frac{1}{-2}\begin{bmatrix} -1 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{bmatrix}
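
To see the formula in action, the sketch below converts one vector between the two bases. It assumes NumPy, and the sample vector $[3, 1]$ is a hypothetical choice not taken from the original example. Solving the linear system is generally preferred over forming $P^{-1}$ explicitly.

```python
import numpy as np

P = np.array([[1.0, 1.0], [1.0, -1.0]])  # columns are the B' basis vectors
x_B = np.array([3.0, 1.0])               # sample coordinates in the standard basis

# Coordinates in B': solve P @ x_Bprime = x_B (equivalent to P^{-1} @ x_B)
x_Bprime = np.linalg.solve(P, x_B)
print(x_Bprime)

# Converting back recovers the original coordinates
x_back = P @ x_Bprime
print(np.allclose(x_back, x_B))
```

Here the result says that $[3, 1] = 2 \cdot [1, 1] + 1 \cdot [1, -1]$, which can be verified by hand.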

Eigenvalues and Transformations

Eigenvalues reveal the geometric behavior of a transformation.

ℹ️ Geometric Interpretation

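Positive eigenvalue $\lambda$: Scales along the eigenvector direction (stretches if $\lambda > 1$, shrinks if $0 < \lambda < 1$)
Negative eigenvalue: Flips the eigenvector direction and scales by $|\lambda|$
Zero eigenvalue: Collapses that direction
Complex eigenvalue (of a real matrix): Rotation component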

📝Example: Eigenvalues Tell the Story

$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$ has eigenvalues 2 and 3.

This transformation stretches the x-axis by 2 and y-axis by 3 — it's a non-uniform scaling.
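
The geometric reading of eigenvalues can be illustrated on three matrices from this section. The sketch assumes NumPy and uses `np.linalg.eigvals`; the order in which eigenvalues are returned is not guaranteed, so the code sorts them before printing.

```python
import numpy as np

scaling = np.array([[2.0, 0.0], [0.0, 3.0]])       # non-uniform scaling
rotation_90 = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90 degree rotation
projection = np.array([[1.0, 0.0], [0.0, 0.0]])    # projection onto the x-axis

scaling_eigs = np.linalg.eigvals(scaling)
rotation_eigs = np.linalg.eigvals(rotation_90)
projection_eigs = np.linalg.eigvals(projection)

print("scaling:   ", np.sort(scaling_eigs.real))     # real, positive: pure stretch
print("rotation:  ", rotation_eigs)                  # complex pair: rotation
print("projection:", np.sort(projection_eigs.real))  # a zero: one direction collapses
```

The rotation has no real eigenvectors, which matches the intuition that a 90° rotation leaves no direction in the plane unchanged.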

Affine vs Linear Transformations

Definition: Affine Transformation

An affine transformation is a linear transformation plus a translation:

T(\vec{x}) = A\vec{x} + \vec{b}

ℹ️ Homogeneous Coordinates

To represent affine transformations as matrix multiplication, use homogeneous coordinates:

\begin{bmatrix} A & \vec{b} \\ \vec{0}^T & 1 \end{bmatrix}\begin{bmatrix} \vec{x} \\ 1 \end{bmatrix} = \begin{bmatrix} A\vec{x} + \vec{b} \\ 1 \end{bmatrix}

This is used in computer graphics and robotics.
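
A minimal sketch of the block-matrix construction, assuming NumPy and a square $A$. The helper name `to_homogeneous_matrix` and the sample rotation and translation are hypothetical choices for illustration.

```python
import numpy as np

def to_homogeneous_matrix(A, b):
    """Pack the affine map x -> A x + b into one (n+1) x (n+1) matrix."""
    n = A.shape[0]
    M = np.eye(n + 1)   # bottom row stays [0, ..., 0, 1]
    M[:n, :n] = A       # linear part in the top-left block
    M[:n, n] = b        # translation in the last column
    return M

A = np.array([[0.0, -1.0], [1.0, 0.0]])  # rotate by 90 degrees
b = np.array([2.0, 3.0])                 # then translate
M = to_homogeneous_matrix(A, b)

x = np.array([1.0, 0.0])
x_h = np.append(x, 1.0)                  # homogeneous coordinates [x, 1]
y_h = M @ x_h

print(y_h)                               # last entry remains 1
print(np.allclose(y_h[:2], A @ x + b))
```

One benefit of this representation is that chains of affine maps compose by plain matrix multiplication of their homogeneous matrices.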

Python Implementation

import numpy as np

# Define a linear transformation matrix
A = np.array([[2, 1], [1, -1]])

# Apply transformation
x = np.array([1, 2])
Tx = A @ x  # [4, -1]

# Rotation
theta = np.pi / 4  # 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

# Scale
S = np.array([[2, 0], [0, 0.5]])

# Composition: rotate then scale
M = S @ R

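# Kernel (null space): A above is invertible, so use the singular matrix
# from the Kernel and Image example to get a non-trivial kernel
from scipy.linalg import null_space, orth
A_sing = np.array([[1, 2], [2, 4]])
kernel = null_space(A_sing)  # orthonormal basis, proportional to [-2, 1]
print(f"Kernel: {kernel.flatten()}")

# Image (column space): orth returns an orthonormal basis of the range
image = orth(A_sing)
rank = np.linalg.matrix_rank(A_sing)
print(f"Rank: {rank}, Image dimension: {image.shape[1]}")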

# Verify linearity: T(u + v) = T(u) + T(v)
u, v = np.array([1, 0]), np.array([0, 1])
lhs = A @ (u + v)
rhs = A @ u + A @ v
print(f"T(u+v) = T(u)+T(v): {np.allclose(lhs, rhs)}")

# Verify: T(cu) = cT(u)
c = 3
lhs = A @ (c * u)
rhs = c * (A @ u)
print(f"T(cu) = cT(u): {np.allclose(lhs, rhs)}")

Applications in AI/ML

Application	Transformation	How It's Used
Neural Network Layer	$W\vec{x} + \vec{b}$	Linear + bias, then activation
Data Augmentation	Rotation, flip, crop	Increases training data diversity
Attention Mechanism	$Q = XW_Q$, $K = XW_K$, $V = XW_V$	Learned linear projections of the input; $QK^T$ then scores query-key similarity
PCA	Projection onto principal components	Dimensionality reduction
Word Embeddings	Linear map from vocabulary to semantic space	Maps one-hot to dense vectors
Computer Graphics	Rotation, scaling, projection	Rendering 3D scenes

💡 Deep Learning Insight

A deep network is a composition of transformations: $T = T_L \circ T_{L-1} \circ \cdots \circ T_1$ . Each $T_i(\vec{x}) = \sigma(W_i \vec{x} + \vec{b}_i)$ . Without the non-linearity $\sigma$ , the entire network collapses to a single affine transformation $T(\vec{x}) = W\vec{x} + \vec{b}$ (linear when all biases are zero), no matter how many layers!
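
The collapse can be demonstrated on a tiny two-layer example. The sketch below assumes NumPy; the weights, biases, and input are hand-picked hypothetical values, chosen so that the ReLU actually clips one pre-activation and the difference is visible.

```python
import numpy as np

# Hand-picked parameters for illustration (not from a trained model)
W1, b1 = np.array([[1.0, -1.0], [-1.0, 1.0]]), np.array([0.5, -0.5])
W2, b2 = np.array([[1.0, 1.0]]), np.array([0.1])

def two_layers(x, activation=lambda z: z):
    """Two stacked layers; the default activation is the identity."""
    return W2 @ activation(W1 @ x + b1) + b2

# Without an activation the stack equals one affine map W x + b
W = W2 @ W1
b = W2 @ b1 + b2

x = np.array([2.0, 0.0])
collapsed = W @ x + b
no_activation = two_layers(x)
with_relu = two_layers(x, activation=lambda z: np.maximum(z, 0.0))

print(np.allclose(no_activation, collapsed))  # True: the layers collapse
print(np.allclose(with_relu, collapsed))      # False: ReLU breaks the collapse
```

In this particular example the collapsed weight matrix is zero, so the linear stack ignores its input entirely, while the ReLU version still responds to it.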

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Assuming order doesn't matter	$AB \neq BA$ generally	Always specify transformation order
Forgetting non-linearity	Stack of linear transforms = one linear transform	Add activation between layers
Confusing kernel with image	Kernel = inputs→0, Image = all outputs	Use Rank-Nullity to relate them
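Testing invertibility with a floating-point determinant	In exact arithmetic over a field, $\det \neq 0$ does mean invertible, but a computed determinant is distorted by scaling and rounding	Check rank or condition number numerically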
Using wrong basis for matrix	Matrix depends on basis	Use change-of-basis when switching

Interview Questions

Q1: What's the kernel of a transformation and why does it matter? A: The kernel (null space) is the set of inputs mapping to zero. Its dimension tells you how much information is lost. In neural networks, a large kernel means many inputs produce the same output — bad for discrimination.

Q2: Why must neural networks have non-linearities? A: Without them, any number of layers collapses to a single affine transformation $T(\vec{x}) = W\vec{x} + \vec{b}$ (linear if there are no biases). Non-linearities (ReLU, sigmoid) break this collapse, allowing the network to learn complex decision boundaries.

Q3: What's the geometric meaning of a determinant of zero? A: The transformation collapses space to a lower dimension. The kernel is non-trivial (contains more than just $\vec{0}$ ), and the transformation is not invertible.

Q4: How does data augmentation relate to transformations? A: Data augmentation applies random transformations (rotation, flip, crop) to training data. This encourages the model to become approximately invariant to those transformations, which tends to improve generalization.

Q5: What's the relationship between eigenvalues and transformation behavior? A: Eigenvalues tell you how the transformation scales along eigenvector directions. A positive eigenvalue scales without flipping (stretching if greater than 1, shrinking if less than 1), a negative eigenvalue also flips the direction, a zero eigenvalue collapses it, and complex eigenvalues of a real matrix indicate a rotation component.

Practice Problems

1. Find the matrix representation of $T: \mathbb{R}^2 \to \mathbb{R}^2$ defined by $T([x, y]) = [x + y, x - y]$ .
2. Find the kernel and image of $A = \begin{bmatrix} 1 & 2 \\ 3 & 6 \end{bmatrix}$ .
3. What transformation does $R(45°) \cdot R(45°)$ represent?
4. Is $T(\vec{x}) = \|\vec{x}\|$ a linear transformation? Prove or disprove.

💡Solutions

Problem 1: $T(\vec{e}_1) = [1, 1]$ , $T(\vec{e}_2) = [1, -1]$ . Matrix: $A = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ .

Problem 2: Row reduce: $\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$ . Kernel = span $\{[-2, 1]^T\}$ , Image = span $\{[1, 3]^T\}$ . Rank-Nullity: $1 + 1 = 2$ ✓.

Problem 3: $R(45°) \cdot R(45°) = R(90°)$ . Rotation by 90°.

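Problem 4: Not linear. $T(\vec{0}) = 0$ holds, and homogeneity holds for non-negative scalars, e.g. $T(2\vec{x}) = 2\|\vec{x}\| = 2T(\vec{x})$ . But it fails for negative scalars: $T(-\vec{x}) = \|\vec{x}\| \neq -\|\vec{x}\| = -T(\vec{x})$ whenever $\vec{x} \neq \vec{0}$ . Additivity fails too: $T(\vec{x} + \vec{y}) = \|\vec{x} + \vec{y}\| \neq \|\vec{x}\| + \|\vec{y}\| = T(\vec{x}) + T(\vec{y})$ in general. For example, $\vec{y} = -\vec{x} \neq \vec{0}$ gives $0 \neq 2\|\vec{x}\|$ .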
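
The failure of linearity for the norm map in Problem 4 can also be checked with a concrete counterexample. This sketch assumes NumPy and uses the hypothetical sample vectors $\vec{x} = [1, 0]$ and $\vec{y} = [-1, 0]$.

```python
import numpy as np

T = np.linalg.norm  # T(x) = ||x||, the Euclidean norm

x = np.array([1.0, 0.0])
y = np.array([-1.0, 0.0])

# Additivity: ||x + y|| = 0, but ||x|| + ||y|| = 2
additivity_holds = np.isclose(T(x + y), T(x) + T(y))

# Homogeneity with c = -1: ||-x|| = 1, but -||x|| = -1
homogeneity_holds = np.isclose(T(-1 * x), -1 * T(x))

print(additivity_holds, homogeneity_holds)
```

One counterexample per property is enough to disprove linearity; no amount of passing cases would be enough to prove it.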

Quick Reference

Transformation	Matrix	Properties
Rotation (θ)	$\begin{bmatrix}\cos\theta&-\sin\theta\\\sin\theta&\cos\theta\end{bmatrix}$	Preserves lengths, det=1
Scaling ( $s_x, s_y$ )	$\begin{bmatrix}s_x&0\\0&s_y\end{bmatrix}$	Stretches, det= $s_x s_y$
Reflection	$\begin{bmatrix}1&0\\0&-1\end{bmatrix}$	Preserves lengths, det=-1
Shear	$\begin{bmatrix}1&k\\0&1\end{bmatrix}$	Preserves area, det=1
Projection	$\begin{bmatrix}1&0\\0&0\end{bmatrix}$	Collapses, det=0

Property	Formula
Kernel	$\ker(T) = \{\vec{x}: T(\vec{x}) = \vec{0}\}$
Image	$\text{im}(T) = \{T(\vec{x}): \vec{x} \in V\}$
Rank-Nullity	$\dim(\ker) + \dim(\text{im}) = \dim(V)$
Composition	$T_2 \circ T_1 \leftrightarrow A_2 A_1$

Cross-References

Vectors and vector spaces
Matrix operations
Eigenvalues and eigenvectors
Basis and dimension
Matrix calculus for backpropagation
Chain rule (composition of functions)