Matrix Operations and Properties

Why It Matters

Matrices are the backbone of modern computing, data science, and artificial intelligence. Every digital image you view, every recommendation you receive, and every voice assistant you interact with relies on matrix operations under the hood. Understanding matrices unlocks:

Data Representation: Tabular data in machine learning is literally a matrix — rows are samples, columns are features.
Geometric Transformations: Rotating, scaling, and translating objects in computer graphics is all matrix multiplication.
Neural Networks: Deep learning models perform millions of matrix multiplications per inference.
Optimization: Solving systems of linear equations (and thus optimizing loss functions) depends on matrix algebra.
Quantum Mechanics: Physical systems are described using matrices and linear operators.

Without matrix algebra, fields like computer vision, natural language processing, robotics, and computational finance would not exist in their current form.

What is a Matrix?

A matrix is a rectangular array of numbers (or expressions) arranged in rows and columns. It is the fundamental tool for representing and computing linear transformations, solving systems of equations, and organizing data.

Definition: Matrix

A matrix $A$ of size $m \times n$ (read "m by n") is a rectangular array with $m$ rows and $n$ columns. Each entry $a_{ij}$ is the element in the $i$-th row and $j$-th column. Matrices are conventionally denoted with uppercase letters, often set in bold ( $\mathbf{A}$ ); this page mostly writes plain $A$ .

General Matrix Notation

A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n}

Here,

$A$ =The matrix (m rows, n columns)
$a_{ij}$ =Element in the i-th row and j-th column
$\mathbb{R}^{m \times n}$ =Set of all real m×n matrices
$m, n$ =Positive integers representing dimensions

Real-World Analogy: A matrix is like a spreadsheet — numbers organized in a grid where each cell's position determines its role. A $3 \times 2$ matrix could represent 3 students' scores on 2 exams.

ℹ️ Notation Convention

When we write $A \in \mathbb{R}^{m \times n}$ , we mean $A$ is a matrix with $m$ rows and $n$ columns, and all entries are real numbers. For complex entries, we write $A \in \mathbb{C}^{m \times n}$ .

Types of Matrices

Understanding matrix types helps you choose the right operations and recognize special structure.

Type	Description	Dimensions	Example
Square Matrix	Same number of rows and columns ( $m = n$ )	$n \times n$	$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$
Row Matrix	Single row	$1 \times n$	$\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$
Column Matrix	Single column	$m \times 1$	$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$
Identity Matrix	1s on diagonal, 0s elsewhere ( $I_n$ )	$n \times n$	$\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$
Zero Matrix	All elements are zero ( $\mathbf{0}$ )	$m \times n$	$\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$
Diagonal Matrix	Non-zero entries only on main diagonal	$n \times n$	$\begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$
Upper Triangular	All entries below diagonal are zero	$n \times n$	$\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$
Lower Triangular	All entries above diagonal are zero	$n \times n$	$\begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$
Symmetric Matrix	$A = A^T$ (equals its transpose)	$n \times n$	$\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$
Orthogonal Matrix	$A^T A = A A^T = I$	$n \times n$	$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$
Sparse Matrix	Most elements are zero	Any	$\begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 7 & 0 \end{bmatrix}$
Dense Matrix	Most elements are non-zero	Any	$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$

ℹ️ Storage Efficiency

Sparse matrices are common in real-world applications (e.g., recommendation systems, graph adjacency matrices). Storing only the non-zero entries can save enormous amounts of memory: a dense $10{,}000 \times 10{,}000$ matrix of 64-bit floats takes about 800 MB, while a sparse version with only a few thousand non-zero entries fits in tens to hundreds of kilobytes.
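The memory saving can be estimated with simple arithmetic. The sketch below assumes 64-bit float values, 32-bit indices, a compressed sparse row (CSR) layout, and an illustrative count of 1,000 non-zero entries; none of these figures come from a measurement.

```python
n = 10_000
nnz = 1_000                      # assumed number of non-zero entries (illustrative)

dense_bytes = n * n * 8          # every entry stored as a float64

# CSR stores: nnz values (8 bytes each), nnz column indices (4 bytes each),
# and n + 1 row pointers (4 bytes each).
csr_bytes = nnz * 8 + nnz * 4 + (n + 1) * 4

print(dense_bytes / 1e6, "MB dense")    # 800.0 MB dense
print(csr_bytes / 1e3, "KB in CSR")     # 52.004 KB in CSR
```

Most of the CSR footprint here is the row-pointer array, which grows with the number of rows rather than with the number of non-zeros.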

Matrix Transpose

The transpose of a matrix flips it over its main diagonal, swapping rows and columns.

Matrix Transpose

(A^T)_{ij} = A_{ji}

Here,

$A^T$ =The transpose of matrix A
$(A^T)_{ij}$ =Element at position (i,j) in the transpose
$A_{ji}$ =Element at position (j,i) in the original matrix

📝Example: Transpose

If $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ (a $2 \times 3$ matrix), then:

A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \text{ (a } 3 \times 2 \text{ matrix)}

Notice: the first row of $A$ becomes the first column of $A^T$ , and the second row becomes the second column.

Properties of Transpose

Property	Formula	Description
Involution	$(A^T)^T = A$	Transposing twice returns the original matrix
Addition	$(A + B)^T = A^T + B^T$	Transpose of a sum equals sum of transposes
Scalar Multiplication	$(cA)^T = c A^T$	Scalar factors pass through the transpose
Multiplication	$(AB)^T = B^T A^T$	Order reverses when transposing a product
Inverse	$(A^{-1})^T = (A^T)^{-1}$	Transpose of inverse equals inverse of transpose
Determinant	$\det(A^T) = \det(A)$	Transpose does not change the determinant

⚠️ Common Mistake

$(AB)^T = B^T A^T$ , not $A^T B^T$ . The order of multiplication reverses! The same reversal appears for matrix inverses: $(AB)^{-1} = B^{-1} A^{-1}$ .
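The reversal rule is easy to check numerically. The sketch below assumes NumPy and uses two small rectangular matrices chosen for illustration; with these shapes the un-reversed product $A^T B^T$ does not even have the right size.

```python
import numpy as np

# Illustrative matrices: A is 2x3, B is 3x2.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])

lhs = (A @ B).T      # transpose of the product, shape (2, 2)
rhs = B.T @ A.T      # factors transposed and reversed, shape (2, 2)
wrong = A.T @ B.T    # same order as the original product, shape (3, 3)

print(np.array_equal(lhs, rhs))    # True
print(lhs.shape, wrong.shape)      # (2, 2) (3, 3)
```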

Matrix Addition

Matrix addition combines two matrices of the same dimensions element-wise.

Matrix Addition

C = A + B \implies c_{ij} = a_{ij} + b_{ij}

Here,

$A$ =First matrix (m × n)
$B$ =Second matrix (m × n) — must match A's dimensions
$C$ =Result matrix (m × n)
$c_{ij}$ =Sum of corresponding elements

📝Example: Matrix Addition

\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}

Properties of Addition

Property	Formula
Commutativity	$A + B = B + A$
Associativity	$(A + B) + C = A + (B + C)$
Identity	$A + \mathbf{0} = A$ (adding the zero matrix)
Inverse	$A + (-A) = \mathbf{0}$ (additive inverse)

Scalar Multiplication

Scalar multiplication multiplies every element of a matrix by a single number (scalar).

Scalar Multiplication

cA = \begin{bmatrix} c \cdot a_{11} & c \cdot a_{12} & \cdots & c \cdot a_{1n} \\ c \cdot a_{21} & c \cdot a_{22} & \cdots & c \cdot a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c \cdot a_{m1} & c \cdot a_{m2} & \cdots & c \cdot a_{mn} \end{bmatrix}

Here,

$c$ =A scalar (single number)
$A$ =A matrix (m × n)
$cA$ =Result: every element of A multiplied by c

📝Example: Scalar Multiplication

3 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 9 & 12 \end{bmatrix}

Properties of Scalar Multiplication

Property	Formula
Associativity	$c(dA) = (cd)A$
Distributivity	$c(A + B) = cA + cB$
Distributivity	$(c + d)A = cA + dA$
Identity	$1 \cdot A = A$

Matrix Multiplication

Matrix multiplication is the most important (and most frequently used) matrix operation. It combines two matrices to produce a new matrix through a specific rule of dot products.

Matrix Multiplication

C = AB \implies c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}

Here,

$A$ =First matrix (m × n)
$B$ =Second matrix (n × p) — columns of A must match rows of B
$C$ =Result matrix (m × p)
$c_{ij}$ =Dot product of i-th row of A and j-th column of B

⚠️ Dimension Rule

To multiply $A \times B$ : if $A$ is $m \times n$ and $B$ is $n \times p$ , the result $C$ is $m \times p$ . The inner dimensions ( $n$ ) must match; otherwise multiplication is undefined.
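The summation formula translates directly into three nested loops. The following is a minimal pure-Python sketch for matrices stored as lists of rows; the function name `matmul` is just an illustrative choice, and real code would use NumPy's `@` operator instead.

```python
def matmul(A, B):
    """Multiply matrices given as lists of rows, using c_ij = sum_k a_ik * b_kj."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        raise ValueError("inner dimensions must match: A is m x n, B must be n x p")
    p = len(B[0])
    C = [[0] * p for _ in range(m)]
    for i in range(m):            # row of A
        for j in range(p):        # column of B
            for k in range(n):    # shared inner index
                C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul(A, B))   # [[19, 22], [43, 50]]
```

The three loops run over $m$, $p$, and $n$ respectively, which is where the $mnp$ multiplication count comes from.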

📝Example 1: 2×2 Matrix Multiplication

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$ :

C = AB = \begin{bmatrix} (1)(5) + (2)(7) & (1)(6) + (2)(8) \\ (3)(5) + (4)(7) & (3)(6) + (4)(8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}

📝Example 2: Non-Commutative Multiplication

Using the same $A$ and $B$ :

BA = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} (5)(1)+(6)(3) & (5)(2)+(6)(4) \\ (7)(1)+(8)(3) & (7)(2)+(8)(4) \end{bmatrix} = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}

Since $AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \neq BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$ , matrix multiplication is not commutative in general.

📝Example 3: Rectangular Matrices

Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ ( $2 \times 3$ ) and $B = \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix}$ ( $3 \times 2$ ):

AB = \begin{bmatrix} (1)(7)+(2)(9)+(3)(11) & (1)(8)+(2)(10)+(3)(12) \\ (4)(7)+(5)(9)+(6)(11) & (4)(8)+(5)(10)+(6)(12) \end{bmatrix} = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix}

Result: $2 \times 2$ matrix.

Properties of Matrix Multiplication

Property	Formula
Associativity	$(AB)C = A(BC)$
Left Distributivity	$A(B + C) = AB + AC$
Right Distributivity	$(A + B)C = AC + BC$
Scalar Commutes	$c(AB) = (cA)B = A(cB)$
Identity	$AI = IA = A$
Not Commutative	$AB \neq BA$ (in general)

💡 Computational Complexity

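Multiplying an $m \times n$ matrix by an $n \times p$ matrix with the standard algorithm requires $mnp$ scalar multiplications, i.e. $O(mnp)$ time. For two $n \times n$ matrices, this is $O(n^3)$ . Asymptotically faster algorithms exist: Strassen's algorithm runs in about $O(n^{2.81})$ , and Coppersmith-Winograd and its successors reach roughly $O(n^{2.37})$ . The latter are rarely used in practice because of their large constant factors; optimized libraries typically rely on cache-friendly $O(n^3)$ kernels.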

Matrix Inverse

The inverse of a matrix is the matrix equivalent of the reciprocal of a number. It "undoes" the transformation performed by the original matrix.

Matrix Inverse

AA^{-1} = A^{-1}A = I

Here,

$A$ =The original square matrix (must be n × n)
$A^{-1}$ =The inverse of A
$I$ =The n×n identity matrix

⚠️ Existence Condition

For a square matrix $A$ , the inverse exists if and only if $\det(A) \neq 0$ . Such a matrix is called non-singular or invertible. If $\det(A) = 0$ , the matrix is singular and has no inverse.

Formula for 2×2 Inverse

2×2 Matrix Inverse

A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}

Here,

$A$ =A 2×2 matrix
$ad - bc$ =The determinant of A

📝Example: 2×2 Inverse

For $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ :

\det(A) = (1)(4) - (2)(3) = 4 - 6 = -2

A^{-1} = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}

Verification: $A A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$ ✓
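The closed-form formula can be written as a short function. This sketch uses Python's `fractions.Fraction` so the result is exact for integer input; the helper name `inverse_2x2` is hypothetical and only meant for illustration.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Invert [[a, b], [c, d]] with the closed-form 2x2 formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular (determinant is zero)")
    inv_det = Fraction(1, 1) / det
    # Swap the diagonal entries, negate the off-diagonal ones, divide by det.
    return [[d * inv_det, -b * inv_det],
            [-c * inv_det, a * inv_det]]

A = [[1, 2], [3, 4]]
A_inv = inverse_2x2(A)
print(A_inv)   # entries -2, 1, 3/2, -1/2 as Fractions
```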

Properties of Inverse

Property	Formula
Double Inverse	$(A^{-1})^{-1} = A$
Transpose Inverse	$(A^T)^{-1} = (A^{-1})^T$
Product Inverse	$(AB)^{-1} = B^{-1} A^{-1}$ (order reverses!)
Scalar Inverse	$(cA)^{-1} = \frac{1}{c} A^{-1}$ (for $c \neq 0$ )

Methods for Computing Inverse

For matrices larger than $2 \times 2$ , common methods include:

Gauss-Jordan Elimination: Augment $[A | I]$ and row reduce to $[I | A^{-1}]$ .
Adjugate Method: $A^{-1} = \frac{1}{\det(A)} \text{adj}(A)$ where $\text{adj}(A)$ is the adjugate (transpose of cofactor matrix).
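LU Decomposition: Factor $A = LU$ (with row pivoting in practice, $PA = LU$ ), then solve $LU x_j = e_j$ for each column $e_j$ of $I$ ; the solutions $x_j$ are the columns of $A^{-1}$ .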
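Of the methods listed above, Gauss-Jordan elimination is the most direct to sketch in code. The version below assumes NumPy and adds partial pivoting and a singularity tolerance `tol`, neither of which is specified in the text; it is meant as an illustration, not a replacement for `np.linalg.inv`.

```python
import numpy as np

def gauss_jordan_inverse(A, tol=1e-12):
    """Invert a square matrix by row-reducing [A | I] to [I | A^{-1}]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n):
        raise ValueError("only square matrices can be inverted")
    aug = np.hstack([A, np.eye(n)])              # augmented matrix [A | I]
    for col in range(n):
        # Partial pivoting: pick the largest remaining entry in this column.
        pivot = col + int(np.argmax(np.abs(aug[col:, col])))
        if abs(aug[pivot, col]) < tol:
            raise ValueError("matrix is singular to working precision")
        aug[[col, pivot]] = aug[[pivot, col]]    # swap rows
        aug[col] /= aug[col, col]                # scale so the pivot equals 1
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col]   # clear the rest of the column
    return aug[:, n:]                            # right half now holds the inverse

M = np.array([[1, 2, 3], [0, 4, 5], [1, 0, 6]])
M_inv = gauss_jordan_inverse(M)
print(np.allclose(M @ M_inv, np.eye(3)))   # True
```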

💡 Numerical Stability

In practice, avoid computing $A^{-1}$ explicitly just to solve $Ax = b$ . Instead, solve the system directly using LU decomposition or QR factorization. Explicit inversion takes more arithmetic and is generally less accurate, especially when $A$ is ill-conditioned.

Determinant

The determinant is a scalar value computed from a square matrix that encodes essential properties: whether the matrix is invertible, how it scales volumes, and whether it preserves or reverses orientation.

Determinant (2×2)

\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc

Here,

$A$ =A 2×2 matrix
$a, b, c, d$ =Elements of the matrix
$\det(A)$ =The determinant (scalar value)

Determinant (3×3) — Cofactor Expansion

\det(A) = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)

Here,

$A$ =A 3×3 matrix
$a, b, c, ...$ =Elements of the matrix

📝Example: 2×2 Determinant

A = \begin{bmatrix} 3 & 8 \\ 4 & 6 \end{bmatrix}

\det(A) = (3)(6) - (8)(4) = 18 - 32 = -14

Since $\det(A) = -14 \neq 0$ , the matrix is invertible.

📝Example: 3×3 Determinant

A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}

\det(A) = 1(4 \cdot 6 - 5 \cdot 0) - 2(0 \cdot 6 - 5 \cdot 1) + 3(0 \cdot 0 - 4 \cdot 1)

= 1(24) - 2(-5) + 3(-4) = 24 + 10 - 12 = 22
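Cofactor expansion generalizes to any size by recursion. The following pure-Python sketch expands along the first row; it is fine for small matrices but takes on the order of $n!$ operations, so it is an illustration rather than a practical method.

```python
def det(M):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    n = len(M)
    if any(len(row) != n for row in M):
        raise ValueError("determinant is only defined for square matrices")
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        sign = -1 if j % 2 else 1
        total += sign * M[0][j] * det(minor)
    return total

print(det([[3, 8], [4, 6]]))                    # -14
print(det([[1, 2, 3], [0, 4, 5], [1, 0, 6]]))   # 22
```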

Geometric Meaning

$\det(A)$	Interpretation
$\det(A) > 0$	Matrix preserves orientation; scales area by $\det(A)$
$\det(A) < 0$	Matrix reverses orientation; scales area by $|\det(A)|$
$\det(A) = 0$	Matrix collapses space to a lower dimension (singular)
$\det(A) = 1$	Matrix preserves area exactly (unimodular)

Intuition: For a $2 \times 2$ matrix, $|\det(A)|$ is the area of the parallelogram formed by the column vectors. For a $3 \times 3$ matrix, it is the volume of the parallelepiped.
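The area interpretation can be checked by transforming the unit square and measuring the result. This sketch assumes NumPy; the shoelace-formula helper `polygon_area` is an illustrative addition, not something defined in the text.

```python
import numpy as np

def polygon_area(points):
    """Unsigned area of a simple polygon (one vertex per row) via the shoelace formula."""
    x, y = points[:, 0], points[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

A = np.array([[3.0, 8.0],
              [4.0, 6.0]])

# Corners of the unit square, counter-clockwise, one point per row.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
image = square @ A.T          # apply A to every corner

print(polygon_area(square))          # 1.0
print(polygon_area(image))           # 14.0
print(abs(np.linalg.det(A)))         # approximately 14.0
```

The image of the unit square is the parallelogram spanned by the columns of $A$ , so its area matches $|\det(A)|$ .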

Properties of Determinants

Property	Formula
Identity	$\det(I) = 1$
Row Swap	Swapping two rows negates the determinant
Scalar	$\det(cA) = c^n \det(A)$ for an $n \times n$ matrix
Product	$\det(AB) = \det(A) \cdot \det(B)$
Transpose	$\det(A^T) = \det(A)$
Inverse	$\det(A^{-1}) = \frac{1}{\det(A)}$
Zero Rows	If a row is all zeros, $\det(A) = 0$
Equal Rows	If two rows are identical, $\det(A) = 0$
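Several of these properties can be spot-checked on random matrices. The sketch below assumes NumPy, a fixed seed, and an arbitrary scalar `c = 2.5`; a numerical check like this illustrates the identities but does not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)     # fixed seed so the run is repeatable
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
c = 2.5

det = np.linalg.det

print(np.isclose(det(c * A), c**n * det(A)))     # scalar rule
print(np.isclose(det(A @ B), det(A) * det(B)))   # product rule
print(np.isclose(det(A.T), det(A)))              # transpose rule
print(np.isclose(det(A[[1, 0, 2]]), -det(A)))    # swapping rows 0 and 1 flips the sign
```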

Trace

The trace of a square matrix is the sum of its diagonal elements. Despite its simplicity, the trace has profound connections to eigenvalues and matrix similarity.

Trace

\text{tr}(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}

Here,

$A$ =An n×n square matrix
$a_{ii}$ =Diagonal elements
$\text{tr}(A)$ =The trace (scalar)

📝Example: Trace

A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}

\text{tr}(A) = 1 + 5 + 9 = 15

Properties of Trace

Property	Formula
Linearity	$\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)$
Scalar	$\text{tr}(cA) = c \cdot \text{tr}(A)$
Cyclic	$\text{tr}(AB) = \text{tr}(BA)$ (more generally $\text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB)$ ; only cyclic shifts are allowed, so $\text{tr}(ABC) \neq \text{tr}(ACB)$ in general)
Transpose	$\text{tr}(A) = \text{tr}(A^T)$
Eigenvalues	$\text{tr}(A) = \sum_{i=1}^{n} \lambda_i$ (sum of eigenvalues, counted with algebraic multiplicity)
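A short numerical check makes the cyclic property concrete. The sketch assumes NumPy; the third matrix `C` is an arbitrary choice added for illustration.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.array([[0, 1], [1, 1]])   # arbitrary third matrix

print(np.trace(A @ B), np.trace(B @ A))           # 69 69
print(np.trace(A @ B @ C), np.trace(B @ C @ A))   # 115 115 (cyclic shift)
print(np.trace(A @ C @ B))                        # 111 (not a cyclic shift)

eigenvalues = np.linalg.eigvals(A)
print(np.isclose(eigenvalues.sum(), np.trace(A)))  # True
```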

Rank

The rank of a matrix is the maximum number of linearly independent rows (or columns). It tells you the "effective dimensionality" of the matrix.

Definition: Matrix Rank

The rank of a matrix $A$ , denoted $\text{rank}(A)$ or $\text{rk}(A)$ , is the dimension of the column space (or equivalently, the row space) of $A$ .

📝Example: Rank

A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 3 & 5 \end{bmatrix}

Row 2 = 2 × Row 1, so it's redundant. Row 3 is not a multiple of Row 1. Thus $\text{rank}(A) = 2$ .
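The same reasoning can be automated by row-reducing and counting pivots. This pure-Python sketch uses exact rational arithmetic (`fractions.Fraction`) to sidestep floating-point tolerance questions; for floating-point data, `np.linalg.matrix_rank` is the usual tool.

```python
from fractions import Fraction

def rank(M):
    """Rank via Gaussian elimination in exact arithmetic (counts pivot rows)."""
    rows = [[Fraction(x) for x in row] for row in M]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots = 0
    for col in range(n_cols):
        # Look for a non-zero entry in this column at or below the next pivot row.
        pivot_row = next((r for r in range(pivots, n_rows) if rows[r][col] != 0), None)
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        for r in range(pivots + 1, n_rows):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
        if pivots == n_rows:
            break
    return pivots

A = [[1, 2, 3], [2, 4, 6], [1, 3, 5]]
print(rank(A))   # 2
```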

Properties of Rank

Property	Description
$0 \leq \text{rank}(A) \leq \min(m, n)$	Rank is bounded by matrix dimensions
Full Rank	$\text{rank}(A) = \min(m, n)$ — no redundant rows/columns
$\text{rank}(A) = n$ (square)	$A$ is invertible
$\text{rank}(AB) \leq \min(\text{rank}(A), \text{rank}(B))$	Product rank is bounded by component ranks
$\text{rank}(A^T) = \text{rank}(A)$	Rank is invariant under transpose
Rank-Nullity Theorem	$\text{rank}(A) + \text{nullity}(A) = n$ for $A \in \mathbb{R}^{m \times n}$

ℹ️ Rank in Practice

A matrix with rank less than $\min(m, n)$ is called rank-deficient: some of its rows or columns are linearly dependent. A square rank-deficient matrix is singular (non-invertible).

Special Matrices

Some matrices have unique properties that make them particularly important:

Definition: Orthogonal Matrix

A square matrix $Q$ is orthogonal if $Q^T Q = Q Q^T = I$ . Equivalently, $Q^{-1} = Q^T$ . Orthogonal matrices represent rotations and reflections, and they preserve lengths and angles (Euclidean norm).

Definition: Symmetric Matrix

A matrix $A$ is symmetric if $A = A^T$ . Real symmetric matrices have real eigenvalues and are always diagonalizable by an orthogonal matrix (the spectral theorem). They appear frequently as covariance matrices and Hessian matrices.

Definition: Positive Definite Matrix

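A symmetric matrix $A$ is positive definite if $x^T A x > 0$ for all non-zero vectors $x$ . Positive definite matrices have all positive eigenvalues and appear in optimization (second-order sufficient conditions for a minimum). Covariance matrices are always positive semidefinite ( $x^T A x \geq 0$ ) and are positive definite when they are non-singular.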
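One common way to test positive definiteness numerically is to check symmetry and then the sign of the eigenvalues. The sketch below assumes NumPy; the function name and the tolerance `tol` are illustrative choices, and a Cholesky factorization attempt would be an equally reasonable test.

```python
import numpy as np

def is_positive_definite(A, tol=1e-12):
    """Return True if A is symmetric and all its eigenvalues exceed tol."""
    A = np.asarray(A, dtype=float)
    if A.shape[0] != A.shape[1] or not np.allclose(A, A.T):
        return False
    # eigvalsh is intended for symmetric matrices and returns real eigenvalues.
    return bool(np.all(np.linalg.eigvalsh(A) > tol))

P = np.array([[2.0, -1.0], [-1.0, 2.0]])   # eigenvalues 1 and 3
N = np.array([[1.0, 2.0], [2.0, 1.0]])     # eigenvalues -1 and 3

print(is_positive_definite(P))   # True
print(is_positive_definite(N))   # False
```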

Special Matrix	Key Property	Example
Idempotent	$A^2 = A$	$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (projection)
Nilpotent	$A^k = 0$ for some $k$	$\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ ( $k=2$ )
Involutory	$A^2 = I$ (self-inverse)	$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ (swap)
Permutation	Each row and each column has exactly one 1; all other entries are 0	$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$
Stochastic	Non-negative entries; rows (or columns) sum to 1 (probabilities)	$\begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$
Diagonal	$a_{ij} = 0$ for $i \neq j$	$\begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$
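The defining properties in the table can be verified directly on the example matrices. This sketch assumes NumPy and uses short variable names that are not part of the text.

```python
import numpy as np

P = np.array([[1, 0], [0, 0]])            # idempotent (projection onto the first axis)
N = np.array([[0, 1], [0, 0]])            # nilpotent with k = 2
S = np.array([[0, 1], [1, 0]])            # involutory, and also a permutation (swap)
T = np.array([[0.7, 0.3], [0.4, 0.6]])    # row-stochastic

print(np.array_equal(P @ P, P))                  # True
print(np.array_equal(N @ N, np.zeros((2, 2))))   # True
print(np.array_equal(S @ S, np.eye(2)))          # True
print(np.allclose(T.sum(axis=1), 1.0))           # True: each row sums to 1
```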

Python Implementation

NumPy provides efficient tools for all matrix operations:

import numpy as np

# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix addition
C = A + B
print("A + B:\n", C)
# [[ 6  8]
#  [10 12]]

# Scalar multiplication
print("3A:\n", 3 * A)
# [[ 3  6]
#  [ 9 12]]

# Transpose
print("A^T:\n", A.T)
# [[1 3]
#  [2 4]]

# Matrix multiplication (two ways)
print("A @ B:\n", A @ B)       # preferred
print("np.dot(A, B):\n", np.dot(A, B))
# [[19 22]
#  [43 50]]

# Determinant (computed in floating point, so expect small round-off)
print("det(A):", np.linalg.det(A))  # approximately -2.0

# Inverse
A_inv = np.linalg.inv(A)
print("A^{-1}:\n", A_inv)
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Verify: A @ A^{-1} = I (up to round-off; entries may be off by about 1e-16)
print("A @ A^{-1}:\n", A @ A_inv)
print("Close to I?", np.allclose(A @ A_inv, np.eye(2)))  # True

# Trace
print("tr(A):", np.trace(A))  # 5

# Rank
print("rank(A):", np.linalg.matrix_rank(A))  # 2

# Solve linear system Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print("x:", x)  # [1. 2.]

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# 3x3 Determinant
M = np.array([[1, 2, 3], [0, 4, 5], [1, 0, 6]])
print("det(3x3):", np.linalg.det(M))  # approximately 22.0

# Check if matrix is symmetric
print("A symmetric?", np.allclose(A, A.T))  # False
S = np.array([[1, 2], [2, 3]])
print("S symmetric?", np.allclose(S, S.T))  # True

# Check if matrix is orthogonal
Q = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]])
print("Q orthogonal?", np.allclose(Q @ Q.T, np.eye(2)))  # True

# Sparse matrix
from scipy import sparse
S_sparse = sparse.csr_matrix(np.array([[0, 0, 3], [0, 0, 0], [0, 7, 0]]))
print("Sparse matrix:\n", S_sparse)
print("Dense form:\n", S_sparse.toarray())

Applications in AI/ML

Matrices are not just theoretical constructs — they are the computational engine of modern AI and machine learning:

Data Representation

Feature matrices: In supervised learning, data is an $n \times d$ matrix where $n$ is the number of samples and $d$ is the number of features.
Image data: A grayscale image is a matrix of pixel intensities. Color images are 3D tensors (height × width × channels).

Neural Networks

Each layer computes $h = \sigma(Wx + b)$ where $W$ is a weight matrix, $x$ is the input vector, $b$ is a bias vector, and $\sigma$ is an activation function.
Training involves computing gradients of the loss with respect to weight matrices using the chain rule (backpropagation), which is fundamentally matrix calculus.
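A single layer's forward pass is only a few lines. The sketch below assumes NumPy, a ReLU activation, and small hand-picked weights; all values are illustrative and not taken from any trained model.

```python
import numpy as np

def relu(z):
    """Element-wise ReLU activation."""
    return np.maximum(z, 0.0)

W = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 0.5]])    # weight matrix: 2 outputs, 3 inputs
b = np.array([0.0, -1.0])           # bias vector
x = np.array([2.0, 3.0, 4.0])       # one input vector

h = relu(W @ x + b)
print(h)    # entries 0.0 and 3.5

# A batch of inputs is stacked as rows, so the layer becomes X @ W.T + b.
X = np.stack([x, -x])
H = relu(X @ W.T + b)
print(H.shape)   # (2, 2)
```

Stacking inputs as rows turns many matrix-vector products into one matrix-matrix product, which is why batched inference maps so well onto matrix multiplication hardware.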

Dimensionality Reduction

PCA (Principal Component Analysis): Finds the principal directions (eigenvectors) of the data covariance matrix to reduce dimension while preserving variance.
SVD (Singular Value Decomposition): Factorizes any matrix $A = U\Sigma V^T$ into rotation, scaling, and rotation — used for compression, noise reduction, and recommendation systems.
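The link between PCA and the SVD can be seen on synthetic data. This sketch assumes NumPy and randomly generated data with a fixed seed; the data shape, the scaling factors, and `k = 2` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 200 samples, 3 features, with very different variances.
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                   # center each feature
cov = Xc.T @ Xc / (Xc.shape[0] - 1)       # sample covariance matrix (3 x 3)

eigvals, eigvecs = np.linalg.eigh(cov)    # eigh: symmetric input, ascending eigenvalues
order = np.argsort(eigvals)[::-1]         # reorder by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
Z = Xc @ eigvecs[:, :k]                   # project onto the top-k principal directions

# Cross-check: squared singular values of Xc, divided by n - 1, are the eigenvalues.
singular_values = np.linalg.svd(Xc, compute_uv=False)
print(np.allclose(singular_values**2 / (Xc.shape[0] - 1), eigvals))   # True
print(Z.shape)                                                        # (200, 2)
```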

Computer Vision

Homogeneous coordinates: 3D transformations (rotation, translation, scaling) are represented as $4 \times 4$ matrices.
Convolutional layers: Apply learnable filter matrices to input images.
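Homogeneous coordinates let a translation be written as a matrix product, so it can be chained with rotations. The sketch below assumes NumPy; the helper names `translation` and `rotation_z` are hypothetical.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous matrix that shifts points by (tx, ty, tz)."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    """4x4 homogeneous matrix that rotates by theta radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])    # the point (1, 0, 0) with a trailing 1

# The rightmost matrix acts first: rotate by 90 degrees, then translate along z.
M = translation(0.0, 0.0, 5.0) @ rotation_z(np.pi / 2)
print(np.round(M @ p, 6))   # approximately (0, 1, 5, 1)
```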

Natural Language Processing

Word embeddings: Words are represented as dense vectors; similarity is computed via matrix operations (dot products).
Attention mechanisms: The transformer architecture computes attention scores as matrix multiplications: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$ .
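The attention formula maps almost line for line onto NumPy. This is a minimal single-head sketch with no masking or batching, using random inputs with assumed shapes; it is not how any particular framework implements it.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention; returns the output and the weight matrix."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # shape (n_queries, n_keys)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))     # 4 queries, d_k = 8
K = rng.standard_normal((6, 8))     # 6 keys
V = rng.standard_normal((6, 16))    # 6 values of dimension 16

out, weights = attention(Q, K, V)
print(out.shape)                               # (4, 16)
print(np.allclose(weights.sum(axis=1), 1.0))   # True: each row is a distribution
```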

Recommendation Systems

User-item interaction matrices are factorized (using SVD or matrix factorization) to predict missing entries.

Graph Neural Networks

Graphs are represented as adjacency matrices; message passing involves matrix multiplications.
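One round of message passing reduces to multiplying the adjacency matrix by the node-feature matrix. The sketch below assumes NumPy and a three-node path graph invented for illustration, with sum and mean aggregation and no learned weights.

```python
import numpy as np

# Undirected path graph 0 - 1 - 2 (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.array([[1.0], [2.0], [4.0]])     # one feature per node

neighbor_sum = A @ X                    # each node sums its neighbors' features
degree = A.sum(axis=1, keepdims=True)   # number of neighbors per node
neighbor_mean = neighbor_sum / degree   # mean aggregation

print(neighbor_sum.ravel())    # values 2, 5, 2
print(neighbor_mean.ravel())   # values 2, 2.5, 2
```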

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Assuming $AB = BA$	Matrix multiplication is not commutative	Always check dimension compatibility and order
Adding matrices of different sizes	Undefined operation	Verify both matrices have identical dimensions
Forgetting $(AB)^T = B^T A^T$	Transpose reverses multiplication order	Remember: transpose each factor and reverse the product
Using $A^{-1}$ to solve $Ax = b$	Numerically unstable and wasteful	Use `np.linalg.solve(A, b)` directly
Confusing element-wise and matrix multiplication	`A * B` vs `A @ B` in NumPy	Use `@` (or `np.matmul`) for matrix multiplication; `*` for element-wise
Assuming $(A + B)^{-1} = A^{-1} + B^{-1}$	This is false in general	No simple formula — solve the system directly
Thinking rank equals dimension	Only true for full-rank matrices	Compute rank explicitly with Gaussian elimination or SVD
Ignoring numerical precision	Floating-point errors accumulate	Use tolerance-based comparisons (`np.allclose`)
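Two of the mistakes in the table are easy to demonstrate. The sketch assumes NumPy and reuses the 2×2 matrices from the earlier examples.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B    # multiplies matching entries: [[5, 12], [21, 32]]
product = A @ B        # true matrix product:         [[19, 22], [43, 50]]
print(np.array_equal(elementwise, product))   # False

# Floating-point comparison: exact equality fails where a tolerance succeeds.
print(0.1 + 0.2 == 0.3)             # False
print(np.isclose(0.1 + 0.2, 0.3))   # True
```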

Interview Questions

ℹ️ Common Interview Questions

These questions frequently appear in technical interviews for data science, ML engineering, and quantitative finance roles.

Q1: Why is matrix multiplication not commutative?

A: Because the operation represents sequential linear transformations, and the order of transformations matters. Rotating then reflecting gives a different result than reflecting then rotating. Mathematically, the dot product of row $i$ of $A$ with column $j$ of $B$ uses different elements than the dot product of row $i$ of $B$ with column $j$ of $A$ .

Q2: When does the inverse of a matrix exist?

A: A square matrix $A$ has an inverse if and only if $\det(A) \neq 0$ (equivalently, $A$ has full rank, all eigenvalues are non-zero, or the null space contains only the zero vector). Such matrices are called non-singular or invertible.

Q3: What is the geometric interpretation of the determinant?

A: The absolute value of the determinant represents the factor by which the matrix scales volume. For a $2 \times 2$ matrix, $|\det(A)|$ is the area of the parallelogram formed by the column vectors. If $\det(A) = 0$ , the matrix collapses space to a lower dimension. If $\det(A) < 0$ , the transformation reverses orientation.

Q4: How do you solve a system of linear equations using matrices?

A: Given $Ax = b$ , if $A$ is invertible, $x = A^{-1}b$ . In practice, solve the system with a factorization (LU or QR) rather than forming the inverse, for numerical stability: `x = np.linalg.solve(A, b)`.

Q5: What is the difference between a symmetric and an orthogonal matrix?

A: A symmetric matrix satisfies $A = A^T$ (it equals its transpose). An orthogonal matrix satisfies $Q^T Q = I$ (its columns form an orthonormal set). Symmetric matrices have real eigenvalues; orthogonal matrices represent rotations/reflections and preserve lengths.

Q6: What is the relationship between rank and invertibility?

A: An $n \times n$ matrix is invertible if and only if $\text{rank}(A) = n$ (full rank). If $\text{rank}(A) < n$ , the matrix is singular and cannot be inverted.

Q7: Why should you avoid computing $A^{-1}$ explicitly?

A: Computing $A^{-1}$ is $O(n^3)$ and can amplify numerical errors. Solving $Ax = b$ directly via LU decomposition is also $O(n^3)$ but needs roughly a third of the arithmetic, is generally more accurate, and doesn't require storing the full inverse matrix.

Practice Problems

Problem 1: Find the transpose of $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ .

Problem 2: Compute $AB$ where $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$ .

Problem 3: Find the inverse of $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ .

Problem 4: Compute the determinant of $A = \begin{bmatrix} 3 & 8 \\ 4 & 6 \end{bmatrix}$ .

Problem 5: Compute the determinant of $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{bmatrix}$ using cofactor expansion.

Problem 6: Find the trace of $A = \begin{bmatrix} 2 & -1 & 0 \\ 3 & 4 & 5 \\ 1 & 0 & 7 \end{bmatrix}$ .

Problem 7: Determine the rank of $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 3 & 5 \end{bmatrix}$ .

Problem 8: If $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$ , compute $A^3$ .

Problem 9: Verify that $Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is orthogonal by showing $Q^T Q = I$ .

Problem 10: Solve the system $Ax = b$ where $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 13 \end{bmatrix}$ .

💡Solutions

Solution 1: $A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$

Solution 2: $AB = \begin{bmatrix} (1)(3)+(0)(5) & (1)(4)+(0)(6) \\ (0)(3)+(2)(5) & (0)(4)+(2)(6) \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 10 & 12 \end{bmatrix}$

Solution 3: $\det(A) = (2)(3) - (1)(5) = 1$

A^{-1} = \frac{1}{1}\begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}

Solution 4: $\det(A) = (3)(6) - (8)(4) = 18 - 32 = -14$

Solution 5: Using cofactor expansion along the first row:

\det(A) = 1\begin{vmatrix} 1 & 4 \\ 6 & 0 \end{vmatrix} - 2\begin{vmatrix} 0 & 4 \\ 5 & 0 \end{vmatrix} + 3\begin{vmatrix} 0 & 1 \\ 5 & 6 \end{vmatrix}

= 1(0 - 24) - 2(0 - 20) + 3(0 - 5) = -24 + 40 - 15 = 1

Solution 6: $\text{tr}(A) = 2 + 4 + 7 = 13$

Solution 7: Row 2 = 2 × Row 1 (redundant). Row 3 is not a multiple of Row 1. After row reduction, we get 2 non-zero rows. Thus $\text{rank}(A) = 2$ .

Solution 8: $A^2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$

A^3 = A^2 \cdot A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}

Solution 9:

Q^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}

$Q^T Q = \begin{bmatrix} \cos^2\theta + \sin^2\theta & \cos\theta\sin\theta - \sin\theta\cos\theta \\ -\sin\theta\cos\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$ ✓

Solution 10: From Problem 3, $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$ .

x = A^{-1}b = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}\begin{bmatrix} 4 \\ 13 \end{bmatrix} = \begin{bmatrix} 12 - 13 \\ -20 + 26 \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \end{bmatrix}

Verification: $A x = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 + 6 \\ -5 + 18 \end{bmatrix} = \begin{bmatrix} 4 \\ 13 \end{bmatrix}$ ✓

Quick Reference

Operation	Formula	NumPy
Transpose	$(A^T)_{ij} = A_{ji}$	`A.T` or `np.transpose(A)`
Addition	$(A+B)_{ij} = a_{ij} + b_{ij}$	`A + B`
Scalar Multiply	$(cA)_{ij} = c \cdot a_{ij}$	`c * A`
Matrix Multiply	$(AB)_{ij} = \sum_k a_{ik} b_{kj}$	`A @ B` or `np.matmul(A, B)`
Determinant (2×2)	$ad - bc$	`np.linalg.det(A)`
Inverse	$AA^{-1} = I$	`np.linalg.inv(A)`
Solve $Ax=b$	$x = A^{-1}b$	`np.linalg.solve(A, b)`
Trace	$\sum_i a_{ii}$	`np.trace(A)`
Rank	dim of column space	`np.linalg.matrix_rank(A)`
Eigenvalues	$\det(A - \lambda I) = 0$	`np.linalg.eig(A)`
SVD	$A = U\Sigma V^T$	`np.linalg.svd(A)`
Norm	$\|A\|_F = \sqrt{\sum_{i,j} a_{ij}^2}$	`np.linalg.norm(A)`

Cross-References

Vectors and Dot Products: Foundation for understanding matrix rows and columns as vector operations.
Linear Transformations: Matrices as functions mapping vectors to vectors, and how matrices represent geometric operations.
Eigenvalues and Eigenvectors: Diagonalization, spectral decomposition, and the characteristic equation.
Systems of Linear Equations: Solving $Ax = b$ using Gaussian elimination, LU decomposition, and matrix inverses.
Singular Value Decomposition (SVD): Factorizing any matrix into rotation-scaling-rotation — the Swiss army knife of linear algebra.
Principal Component Analysis (PCA): Using eigenvalue decomposition of covariance matrices for dimensionality reduction.
Optimization and Gradient Descent: Hessians, Jacobians, and second-order methods rely on matrix calculus.
Probability and Statistics: Covariance matrices, correlation matrices, and multivariate distributions.