Linear Transformations
ℹ️ Why It Matters
Nearly every layer in a neural network applies a linear (more precisely, affine) transformation followed by a non-linearity. Computer graphics uses transformations to rotate, scale, and project objects. Physics engines update positions and orientations with them. Understanding linear transformations is key to understanding how data flows through these systems.
What is a Linear Transformation?
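Definition: Linear Transformation
A function $T: V \to W$ between vector spaces $V$ and $W$ is linear if for all vectors $\mathbf{u}, \mathbf{v} \in V$ and all scalars $c$:
- Additivity: $T(\mathbf{u} + \mathbf{v}) = T(\mathbf{u}) + T(\mathbf{v})$
- Homogeneity: $T(c\mathbf{u}) = cT(\mathbf{u})$
⚠️ Non-Example
A map that moves the origin, for example $T(x) = x + 1$, is not linear because $T(0) = 1 \neq 0$. Linear transformations must map the zero vector to the zero vector.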
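The two defining properties can be spot-checked numerically. The sketch below is only illustrative: `looks_linear`, `matrix_map`, and `shifted_map` are hypothetical names, and passing a random-sample check suggests linearity but does not prove it, whereas a single failure does disprove it.

```python
import numpy as np

def looks_linear(f, dim, trials=100, seed=0):
    """Return False if f violates additivity or homogeneity on a random sample."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        u, v = rng.normal(size=dim), rng.normal(size=dim)
        c = rng.normal()
        if not np.allclose(f(u + v), f(u) + f(v)):
            return False
        if not np.allclose(f(c * u), c * f(u)):
            return False
    return True

A = np.array([[2.0, 1.0], [1.0, -1.0]])

def matrix_map(x):
    return A @ x          # multiplication by a matrix: linear

def shifted_map(x):
    return x + 1.0        # sends the zero vector to (1, 1): not linear

print(looks_linear(matrix_map, dim=2))   # True
print(looks_linear(shifted_map, dim=2))  # False
```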
Matrix Representation
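Every linear transformation $T: \mathbb{R}^n \to \mathbb{R}^m$ can be written as matrix multiplication.
Matrix Representation
$$T(\mathbf{x}) = A\mathbf{x}$$
Here,
- $A$ = $m \times n$ matrix representing the transformation
- $\mathbf{x}$ = Input vector in $\mathbb{R}^n$
- $T(\mathbf{x})$ = Output vector in $\mathbb{R}^m$
How to find $A$: The columns of $A$ are $T(\mathbf{e}_1), \dots, T(\mathbf{e}_n)$, where $\mathbf{e}_1, \dots, \mathbf{e}_n$ are the standard basis vectors.
Example: Finding Transformation Matrix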
Let be defined by .
So
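The column rule can be sketched in NumPy. The map `T` below is a hypothetical example, chosen to match the matrix used in the Python Implementation section, and is not necessarily the map from the example above: apply `T` to each standard basis vector and stack the results as columns.

```python
import numpy as np

def T(v):
    """Hypothetical example map T(x, y) = (2x + y, x - y)."""
    x, y = v
    return np.array([2 * x + y, x - y])

n = 2
# Column i of A is T applied to the i-th standard basis vector (a row of the identity).
A = np.column_stack([T(e) for e in np.eye(n)])
print(A)

# The matrix reproduces the map on any input vector.
x = np.array([1.0, 2.0])
print(np.allclose(A @ x, T(x)))  # True
```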
Kernel and Image
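Definition: Kernel (Null Space)
The kernel of $T: V \to W$ is the set of all vectors that map to zero:
$$\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}$$
Definition: Image (Range)
The image of $T$ is the set of all possible outputs:
$$\operatorname{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}$$
Theorem: Rank-Nullity
For a linear transformation $T: V \to W$ where $V$ is finite-dimensional:
$$\dim(\ker T) + \dim(\operatorname{im} T) = \dim V$$
Example: Kernel and Image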
Let .
Kernel: Solve : . , .
Image: Column space = span of columns = , .
Check: ✓
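As a hedged illustration of kernel, image, and rank-nullity, the sketch below uses a made-up rank-1 matrix, not necessarily the one from the example above. It reads both subspaces off NumPy's singular value decomposition. The tolerance `tol` is an assumption and may need adjusting for other matrices.

```python
import numpy as np

# Hypothetical rank-1 matrix: the second row is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# Right singular vectors with (near-)zero singular values span the kernel;
# left singular vectors with nonzero singular values span the image.
U, s, Vt = np.linalg.svd(A)
tol = 1e-10
rank = int(np.sum(s > tol))
kernel_basis = Vt[rank:].T      # columns span ker(A)
image_basis = U[:, :rank]       # columns span im(A)

nullity = kernel_basis.shape[1]
print(rank, nullity)                      # 1 1
print(np.allclose(A @ kernel_basis, 0))   # True
print(rank + nullity == A.shape[1])       # True (rank-nullity)
```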
Common Transformations
Rotation
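2D Rotation
$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
Here,
- $\theta$ = Angle of rotation (counterclockwise)
Example: 90° Rotation
$$R(90^\circ) = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$$
The point $(1, 0)$ rotates to $(0, 1)$. ✓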
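A minimal sketch of the rotation matrix in NumPy. The helper name `rotation` and the test point are illustrative choices. It rotates a unit vector on the x-axis by 90° and checks that the length is unchanged.

```python
import numpy as np

def rotation(theta):
    """2D counterclockwise rotation matrix for an angle theta in radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R = rotation(np.pi / 2)
p = np.array([1.0, 0.0])
rotated = R @ p

print(np.round(rotated, 10))                                   # approximately (0, 1)
print(np.isclose(np.linalg.norm(rotated), np.linalg.norm(p)))  # True: length preserved
```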
Scaling
Scaling
$$S = \begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$$
Here,
- $s_x$ = Scale factor in x-direction
- $s_y$ = Scale factor in y-direction
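Reflection
Example: Reflection over x-axis
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$$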
Shear
Horizontal Shear
$$H = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$$
Here,
- $k$ = Shear factor
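Projection
Example: Projection onto x-axis
$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}$$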
Summary Table:
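| Transformation | Matrix | Determinant | Effect |
|---|---|---|---|
| Rotation ($\theta$) | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | 1 | Rotates, preserves lengths |
| Scaling ($s_x, s_y$) | $\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$ | $s_x s_y$ | Stretches/shrinks |
| Reflection | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ (over x-axis) | -1 | Flips, preserves lengths |
| Shear | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | 1 | Slants |
| Projection | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (onto x-axis) | 0 | Collapses dimension |
| Zero | $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ | 0 | Maps everything to origin |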
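The determinant column can be checked numerically. In this sketch the parameter values (`theta`, `sx`, `sy`, `k`) and the choice of the x-axis for the reflection and projection are arbitrary, made only for illustration.

```python
import numpy as np

theta, sx, sy, k = np.pi / 6, 2.0, 3.0, 1.5   # arbitrary illustrative values

matrices = {
    "rotation":   np.array([[np.cos(theta), -np.sin(theta)],
                            [np.sin(theta),  np.cos(theta)]]),
    "scaling":    np.array([[sx, 0.0], [0.0, sy]]),
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),   # over the x-axis
    "shear":      np.array([[1.0, k], [0.0, 1.0]]),      # horizontal
    "projection": np.array([[1.0, 0.0], [0.0, 0.0]]),    # onto the x-axis
    "zero":       np.zeros((2, 2)),
}

# The determinant is the signed factor by which each map scales areas.
dets = {name: float(np.linalg.det(M)) for name, M in matrices.items()}
for name, d in dets.items():
    print(f"{name:<10} det = {d:.3f}")
```

A determinant of 0 (projection, zero map) signals a collapsed dimension, and a negative determinant (reflection) signals flipped orientation.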
Composition of Transformations
The composition of two linear transformations corresponds to matrix multiplication.
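Composition
$$(S \circ T)(\mathbf{x}) = S(T(\mathbf{x})) = AB\mathbf{x}$$
Here,
- $S, T$ = Linear transformations
- $A, B$ = Their representing matrices ($A$ for $S$, $B$ for $T$)
Example: Composition
Rotate by 90°, then reflect over x-axis:
$$\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ -1 & 0 \end{bmatrix}$$
Compare with reverse order:
$$\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$$
Different results! Order matters.
💡 Key Insight
Composition = matrix multiplication. Since $AB \neq BA$ in general, the order of transformations matters. This is why layer order in neural networks matters.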
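A short sketch of the order dependence, using a 90° rotation and a reflection over the x-axis. The variable names and the test point are illustrative. Note that the matrix written on the right is applied first.

```python
import numpy as np

R = np.array([[0, -1], [1, 0]])    # rotate 90 degrees counterclockwise
F = np.array([[1, 0], [0, -1]])    # reflect over the x-axis

rotate_then_reflect = F @ R        # rightmost matrix acts first
reflect_then_rotate = R @ F

p = np.array([1, 0])
print(rotate_then_reflect @ p)     # [ 0 -1]
print(reflect_then_rotate @ p)     # [0 1]
print(np.array_equal(rotate_then_reflect, reflect_then_rotate))  # False
```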
Change of Basis
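Definition: Change of Basis Matrix
If $\mathcal{B}$ and $\mathcal{C}$ are two bases of the same space, the change of basis matrix $P_{\mathcal{C} \leftarrow \mathcal{B}}$ satisfies:
$$[\mathbf{v}]_{\mathcal{C}} = P_{\mathcal{C} \leftarrow \mathcal{B}}\,[\mathbf{v}]_{\mathcal{B}}$$
Example: Change of Basis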
Let (standard) and (new basis).
The change of basis matrix from to is .
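A hedged sketch of a change of basis in NumPy. The new basis vectors below are hypothetical and not necessarily those of the example above. With the new basis vectors stored as the columns of `P`, multiplying by `P` converts new-basis coordinates to standard coordinates, and solving with `P` goes the other way.

```python
import numpy as np

# Hypothetical new basis, stored as the columns of P: c1 = (1, 1), c2 = (1, -1).
P = np.array([[1.0, 1.0],
              [1.0, -1.0]])

v_std = np.array([3.0, 1.0])           # coordinates in the standard basis
v_new = np.linalg.solve(P, v_std)      # coordinates in the new basis: 2*c1 + 1*c2
print(v_new)                           # approximately [2, 1]
print(np.allclose(P @ v_new, v_std))   # True: converting back recovers v
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a common numerical choice, though either works for a small well-conditioned basis like this one.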
Eigenvalues and Transformations
Eigenvalues reveal the geometric behavior of a transformation.
ℹ️ Geometric Interpretation
- Positive eigenvalue: Scales along the eigenvector direction (stretches if greater than 1, shrinks if less than 1)
- Negative eigenvalue: Flips the eigenvector direction and scales
- Zero eigenvalue: Collapses that direction
- Complex eigenvalue: Rotation component
Example: Eigenvalues Tell the Story
$A = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}$ has eigenvalues 2 and 3.
This transformation stretches the x-axis by 2 and the y-axis by 3, so it is a non-uniform scaling.
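The geometric reading of eigenvalues can be explored with `np.linalg.eig`. This sketch compares a diagonal scaling with a 90° rotation; both matrices are illustrative choices.

```python
import numpy as np

scaling = np.array([[2.0, 0.0], [0.0, 3.0]])
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])   # 90 degree rotation

vals_s, vecs_s = np.linalg.eig(scaling)
vals_r, _ = np.linalg.eig(rotation)

print(vals_s)   # real eigenvalues 2 and 3: pure stretching along the axes
print(vals_r)   # a complex conjugate pair: no real direction is left unrotated

# Each eigenvector (a column of vecs_s) is only rescaled: A v = lambda v.
checks = [np.allclose(scaling @ v, lam * v) for lam, v in zip(vals_s, vecs_s.T)]
print(all(checks))  # True
```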
Affine vs Linear Transformations
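Definition: Affine Transformation
An affine transformation is a linear transformation plus a translation:
$$T(\mathbf{x}) = A\mathbf{x} + \mathbf{b}$$
ℹ️ Homogeneous Coordinates
To represent affine transformations as matrix multiplication, use homogeneous coordinates:
$$\begin{bmatrix} A & \mathbf{b} \\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x} \\ 1 \end{bmatrix} = \begin{bmatrix} A\mathbf{x} + \mathbf{b} \\ 1 \end{bmatrix}$$
This is used in computer graphics and robotics.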
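A minimal sketch of the homogeneous-coordinate trick in 2D. The linear part `A` and translation `b` are arbitrary illustrative values. Appending a 1 to the input lets a single 3×3 matrix apply both the linear part and the translation.

```python
import numpy as np

A = np.array([[0.0, -1.0], [1.0, 0.0]])   # linear part: 90 degree rotation
b = np.array([2.0, 3.0])                  # translation

# Build the 3x3 homogeneous matrix [[A, b], [0, 1]].
M = np.eye(3)
M[:2, :2] = A
M[:2, 2] = b

x = np.array([1.0, 0.0])
x_h = np.append(x, 1.0)                   # homogeneous coordinates (x, y, 1)
y_h = M @ x_h

print(y_h)                                # [2. 4. 1.]
print(np.allclose(y_h[:2], A @ x + b))    # True
```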
Python Implementation
import numpy as np
# Define a linear transformation matrix
A = np.array([[2, 1], [1, -1]])
# Apply transformation
x = np.array([1, 2])
Tx = A @ x # [4, -1]
# Rotation
theta = np.pi / 4 # 45 degrees
R = np.array([[np.cos(theta), -np.sin(theta)],
[np.sin(theta), np.cos(theta)]])
# Scale
S = np.array([[2, 0], [0, 0.5]])
# Composition: rotate then scale
M = S @ R
# Kernel (null space); A is invertible here, so the basis is empty
from scipy.linalg import null_space
kernel = null_space(A)
print(f"Kernel: {kernel.flatten()}")
# Image (column space): orth returns an orthonormal basis for it
from scipy.linalg import orth
image_basis = orth(A)
# The rank gives the dimension of the image
rank = np.linalg.matrix_rank(A)
print(f"Rank: {rank}, Image dimension: {rank}")
# Verify linearity: T(u + v) = T(u) + T(v)
u, v = np.array([1, 0]), np.array([0, 1])
lhs = A @ (u + v)
rhs = A @ u + A @ v
print(f"T(u+v) = T(u)+T(v): {np.allclose(lhs, rhs)}")
# Verify: T(cu) = cT(u)
c = 3
lhs = A @ (c * u)
rhs = c * (A @ u)
print(f"T(cu) = cT(u): {np.allclose(lhs, rhs)}")
Applications in AI/ML
| Application | Transformation | How It's Used |
|---|---|---|
| Neural Network Layer | $\mathbf{y} = \sigma(W\mathbf{x} + \mathbf{b})$ | Linear + bias, then activation |
| Data Augmentation | Rotation, flip, crop | Increases training data diversity |
| Attention Mechanism | Query/key/value projection | Projects inputs so that query-key similarity can be computed |
| PCA | Projection onto principal components | Dimensionality reduction |
| Word Embeddings | Linear map from vocabulary to semantic space | Maps one-hot to dense vectors |
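| Computer Graphics | Rotation, scaling, projection | Rendering 3D scenes |
💡 Deep Learning Insight
A deep network is a composition of transformations: $f = f_L \circ \cdots \circ f_2 \circ f_1$. Each layer is $f_i(\mathbf{x}) = \sigma(W_i\mathbf{x} + \mathbf{b}_i)$. Without the non-linearity $\sigma$, the entire network collapses to a single linear (affine, once biases are included) transformation, no matter how many layers!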
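The collapse can be seen on a tiny example. The weights and input below are hand-picked hypothetical values and biases are omitted for brevity. Two stacked linear layers equal one layer whose matrix is the product, while inserting a ReLU between them breaks that equality.

```python
import numpy as np

W1 = np.array([[1.0, -1.0], [-1.0, 1.0]])
W2 = np.array([[1.0, 1.0], [0.0, 1.0]])
x = np.array([1.0, 0.0])

def relu(z):
    return np.maximum(z, 0.0)

stacked_linear = W2 @ (W1 @ x)     # two layers, no activation
collapsed = (W2 @ W1) @ x          # one layer with W = W2 W1
with_relu = W2 @ relu(W1 @ x)      # activation between the layers

print(np.allclose(stacked_linear, collapsed))   # True
print(np.allclose(with_relu, collapsed))        # False
```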
Common Mistakes
| Mistake | Why It's Wrong | Correct Approach |
|---|---|---|
| Assuming order doesn't matter | $AB \neq BA$ generally | Always specify transformation order |
| Forgetting non-linearity | Stack of linear transforms = one linear transform | Add activation between layers |
| Confusing kernel with image | Kernel = inputs → 0, Image = all outputs | Use Rank-Nullity to relate them |
| Treating any nonzero computed determinant as proof of safe invertibility | In exact arithmetic, determinant ≠ 0 does mean invertible over a field such as ℝ, but in floating point a tiny or rounded determinant says little about near-singularity | Check rank or condition number for robustness |
| Using wrong basis for matrix | Matrix depends on basis | Use change-of-basis when switching |
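A small sketch of why rank or condition number is a more robust numerical check than the size of the determinant. The matrix is a made-up example: a scaled identity is trivially invertible, yet its determinant is vanishingly small in high dimension.

```python
import numpy as np

M = 0.1 * np.eye(50)             # perfectly well-conditioned, easy to invert

det = np.linalg.det(M)
rank = np.linalg.matrix_rank(M)
cond = np.linalg.cond(M)

print(det)    # tiny (about 1e-50), even though M is far from singular
print(rank)   # 50: full rank
print(cond)   # about 1: the best possible condition number
```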
Interview Questions
Q1: What's the kernel of a transformation and why does it matter? A: The kernel (null space) is the set of inputs mapping to zero. Its dimension tells you how much information is lost. In neural networks, a large kernel means many distinct inputs produce the same output, which is bad for discrimination.
Q2: Why must neural networks have non-linearities? A: Without them, any number of layers collapses to a single linear transformation $W_L \cdots W_2 W_1 \mathbf{x} = W\mathbf{x}$. Non-linearities (ReLU, sigmoid) break this collapse, allowing the network to learn complex decision boundaries.
Q3: What's the geometric meaning of a determinant of zero? A: The transformation collapses space to a lower dimension. The kernel is non-trivial (contains more than just $\mathbf{0}$), and the transformation is not invertible.
Q4: How does data augmentation relate to transformations? A: Data augmentation applies random transformations (rotation, flip, crop) to training data. This encourages the model to become invariant to those transformations, improving generalization.
Q5: What's the relationship between eigenvalues and transformation behavior? A: Eigenvalues tell you how the transformation scales along eigenvector directions. Positive eigenvalues stretch, negative eigenvalues reflect, zero eigenvalues collapse, and complex eigenvalues rotate.
Practice Problems
- Find the matrix representation of defined by .
- Find the kernel and image of .
- What transformation does represent?
- Is a linear transformation? Prove or disprove.
💡 Solutions
Problem 1: , . Matrix: .
Problem 2: Row reduce: . Kernel = span, Image = span. Rank-Nullity: ✓.
Problem 3: . Rotation by 90°.
Problem 4: ✓. But ✓. However, in general. Not linear (violates additivity).
Quick Reference
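| Transformation | Matrix | Properties |
|---|---|---|
| Rotation ($\theta$) | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | Preserves lengths, det=1 |
| Scaling ($s_x, s_y$) | $\begin{bmatrix} s_x & 0 \\ 0 & s_y \end{bmatrix}$ | Stretches, det=$s_x s_y$ |
| Reflection | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ (over x-axis) | Preserves lengths, det=-1 |
| Shear | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | Preserves area, det=1 |
| Projection | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (onto x-axis) | Collapses, det=0 |

| Property | Formula |
|---|---|
| Kernel | $\ker(T) = \{\mathbf{v} \in V : T(\mathbf{v}) = \mathbf{0}\}$ |
| Image | $\operatorname{im}(T) = \{T(\mathbf{v}) : \mathbf{v} \in V\}$ |
| Rank-Nullity | $\dim(\ker T) + \dim(\operatorname{im} T) = \dim V$ |
| Composition | $(S \circ T)(\mathbf{x}) = AB\mathbf{x}$, where $A$ and $B$ represent $S$ and $T$ |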