Linear Algebra

Matrix Operations and Properties

Master matrix multiplication, transpose, inverse, and their applications in data transformation.

Matrix Algebra · Lesson 10 of 100 · Free Course

Why It Matters

Matrices are the backbone of modern computing, data science, and artificial intelligence. Every digital image you view, every recommendation you receive, and every voice assistant you interact with relies on matrix operations under the hood. Understanding matrices unlocks:

  • Data Representation: Tabular data in machine learning is literally a matrix — rows are samples, columns are features.
  • Geometric Transformations: Rotating, scaling, and translating objects in computer graphics is all matrix multiplication.
  • Neural Networks: Deep learning models perform millions of matrix multiplications per inference.
  • Optimization: Least-squares fitting and many other optimization methods reduce to solving systems of linear equations, which depends on matrix algebra.
  • Quantum Mechanics: Physical systems are described using matrices and linear operators.

Without matrix algebra, fields like computer vision, natural language processing, robotics, and computational finance would not exist in their current form.


What is a Matrix?

A matrix is a rectangular array of numbers (or expressions) arranged in rows and columns. It is the fundamental tool for representing and computing linear transformations, solving systems of equations, and organizing data.

Definition: Matrix

A matrix $A$ of size $m \times n$ (read "m by n") is a rectangular array with $m$ rows and $n$ columns. Each entry $a_{ij}$ represents the element in the $i$-th row and $j$-th column. Matrices are denoted with uppercase letters such as $A$, often set in bold ($\mathbf{A}$).

General Matrix Notation

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \in \mathbb{R}^{m \times n}$$

Here,

  • $A$ = The matrix ($m$ rows, $n$ columns)
  • $a_{ij}$ = Element in the $i$-th row and $j$-th column
  • $\mathbb{R}^{m \times n}$ = Set of all real $m \times n$ matrices
  • $m, n$ = Positive integers representing dimensions

Real-World Analogy: A matrix is like a spreadsheet: numbers organized in a grid where each cell's position determines its role. A $3 \times 2$ matrix could represent 3 students' scores on 2 exams.

ℹ️ Notation Convention

When we write $A \in \mathbb{R}^{m \times n}$, we mean $A$ is a matrix with $m$ rows and $n$ columns, and all entries are real numbers. For complex entries, we write $A \in \mathbb{C}^{m \times n}$.
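To connect the notation to code, here is a minimal sketch of the students-and-exams analogy in NumPy. The score values are made up for illustration; the only point is how $m$, $n$, and $a_{ij}$ map onto `shape` and zero-based indexing.

```python
import numpy as np

# Hypothetical data: 3 students (rows) x 2 exams (columns).
scores = np.array([[88, 92],
                   [75, 81],
                   [93, 67]])

m, n = scores.shape      # m rows, n columns
a_21 = scores[1, 0]      # math entry a_21 -> NumPy index [1, 0] (zero-based)
print(m, n, a_21)        # 3 2 75
```

Note the off-by-one shift: the mathematical entry $a_{ij}$ lives at `scores[i - 1, j - 1]`.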


Types of Matrices

Understanding matrix types helps you choose the right operations and recognize special structure.

| Type | Description | Dimensions | Example |
| --- | --- | --- | --- |
| Square Matrix | Same number of rows and columns ($m = n$) | $n \times n$ | $\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ |
| Row Matrix | Single row | $1 \times n$ | $\begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$ |
| Column Matrix | Single column | $m \times 1$ | $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ |
| Identity Matrix | 1s on diagonal, 0s elsewhere ($I_n$) | $n \times n$ | $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ |
| Zero Matrix | All elements are zero ($\mathbf{0}$) | $m \times n$ | $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ |
| Diagonal Matrix | Non-zero entries only on main diagonal | $n \times n$ | $\begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$ |
| Upper Triangular | All entries below diagonal are zero | $n \times n$ | $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$ |
| Lower Triangular | All entries above diagonal are zero | $n \times n$ | $\begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$ |
| Symmetric Matrix | $A = A^T$ (equals its transpose) | $n \times n$ | $\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ |
| Orthogonal Matrix | $A^T A = A A^T = I$ | $n \times n$ | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ |
| Sparse Matrix | Most elements are zero | Any | $\begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & 0 \\ 0 & 7 & 0 \end{bmatrix}$ |
| Dense Matrix | Most elements are non-zero | Any | $\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$ |

ℹ️ Storage Efficiency

Sparse matrices are common in real-world applications (e.g., recommendation systems, graph adjacency matrices). Storing only non-zero entries can save enormous amounts of memory: a $10{,}000 \times 10{,}000$ matrix of 8-byte floats takes about 800 MB when stored densely, while a sparse format holding only a few thousand non-zero entries needs on the order of 100 KB.
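The storage gap can be estimated with a short sketch. The matrix size matches the text; the number of non-zeros (`nnz = 5_000`) and the random positions are assumptions chosen only for illustration, and the CSR format is one of several sparse layouts SciPy offers.

```python
import numpy as np
from scipy import sparse

n = 10_000
nnz = 5_000  # assumed number of non-zero entries (illustrative)

rng = np.random.default_rng(0)
rows = rng.integers(0, n, size=nnz)
cols = rng.integers(0, n, size=nnz)
vals = rng.random(nnz)

# Build in COO form, then convert to CSR (duplicate positions are summed).
S = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

dense_bytes = n * n * 8  # float64 dense storage, computed without allocating it
sparse_bytes = S.data.nbytes + S.indices.nbytes + S.indptr.nbytes

print(f"dense:  {dense_bytes / 1e6:.0f} MB")   # dense:  800 MB
print(f"sparse: {sparse_bytes / 1e3:.0f} KB")
```

The sparse figure depends on the index dtype and on how many non-zeros there are, so treat it as an order-of-magnitude estimate.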


Matrix Transpose

The transpose of a matrix flips it over its main diagonal, swapping rows and columns.

Matrix Transpose

$$(A^T)_{ij} = A_{ji}$$

Here,

  • $A^T$ = The transpose of matrix $A$
  • $(A^T)_{ij}$ = Element at position $(i, j)$ in the transpose
  • $A_{ji}$ = Element at position $(j, i)$ in the original matrix

📝 Example: Transpose

If $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ (a $2 \times 3$ matrix), then:

$$A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \text{ (a } 3 \times 2 \text{ matrix)}$$

Notice: the first row of $A$ becomes the first column of $A^T$, and the second row becomes the second column.

Properties of Transpose

| Property | Formula | Description |
| --- | --- | --- |
| Involution | $(A^T)^T = A$ | Transposing twice returns the original matrix |
| Addition | $(A + B)^T = A^T + B^T$ | Transpose of a sum equals sum of transposes |
| Scalar Multiplication | $(cA)^T = c A^T$ | Scalar factors pass through the transpose |
| Multiplication | $(AB)^T = B^T A^T$ | Order reverses when transposing a product |
| Inverse | $(A^{-1})^T = (A^T)^{-1}$ | Transpose of inverse equals inverse of transpose |
| Determinant | $\det(A^T) = \det(A)$ | Transpose does not change the determinant |

⚠️ Common Mistake

$(AB)^T = B^T A^T$, not $A^T B^T$. The order of multiplication reverses! The same reversal appears in the inverse of a product: $(AB)^{-1} = B^{-1} A^{-1}$ for invertible matrices.
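A quick numerical check makes the reversal rule concrete. This is a sketch with randomly generated integer matrices (the shapes and seed are arbitrary choices); with rectangular factors, the wrong order is not even defined.

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.integers(-5, 5, size=(2, 3))   # 2 x 3
B = rng.integers(-5, 5, size=(3, 4))   # 3 x 4

lhs = (A @ B).T          # shape (4, 2)
rhs = B.T @ A.T          # (4 x 3) @ (3 x 2) -> shape (4, 2)
print(np.array_equal(lhs, rhs))   # True

# A.T @ B.T would be (3 x 2) @ (4 x 3): inner dimensions 2 and 4 do not match.
try:
    A.T @ B.T
except ValueError as err:
    print("A.T @ B.T is not defined for these shapes:", err)
```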


Matrix Addition

Matrix addition combines two matrices of the same dimensions element-wise.

Matrix Addition

$$C = A + B \implies c_{ij} = a_{ij} + b_{ij}$$

Here,

  • $A$ = First matrix ($m \times n$)
  • $B$ = Second matrix ($m \times n$); must match $A$'s dimensions
  • $C$ = Result matrix ($m \times n$)
  • $c_{ij}$ = Sum of corresponding elements

📝 Example: Matrix Addition

$$\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 1+5 & 2+6 \\ 3+7 & 4+8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}$$

Properties of Addition

| Property | Formula |
| --- | --- |
| Commutativity | $A + B = B + A$ |
| Associativity | $(A + B) + C = A + (B + C)$ |
| Identity | $A + \mathbf{0} = A$ (adding the zero matrix) |
| Inverse | $A + (-A) = \mathbf{0}$ (additive inverse) |

Scalar Multiplication

Scalar multiplication multiplies every element of a matrix by a single number (scalar).

Scalar Multiplication

$$cA = \begin{bmatrix} c \cdot a_{11} & c \cdot a_{12} & \cdots & c \cdot a_{1n} \\ c \cdot a_{21} & c \cdot a_{22} & \cdots & c \cdot a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c \cdot a_{m1} & c \cdot a_{m2} & \cdots & c \cdot a_{mn} \end{bmatrix}$$

Here,

  • $c$ = A scalar (single number)
  • $A$ = A matrix ($m \times n$)
  • $cA$ = Result: every element of $A$ multiplied by $c$

📝 Example: Scalar Multiplication

$$3 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 9 & 12 \end{bmatrix}$$

Properties of Scalar Multiplication

| Property | Formula |
| --- | --- |
| Associativity | $c(dA) = (cd)A$ |
| Distributivity | $c(A + B) = cA + cB$ |
| Distributivity | $(c + d)A = cA + dA$ |
| Identity | $1 \cdot A = A$ |

Matrix Multiplication

Matrix multiplication is the most important (and most frequently used) matrix operation. It combines two matrices to produce a new matrix through a specific rule of dot products.

Matrix Multiplication

$$C = AB \implies c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$$

Here,

  • $A$ = First matrix ($m \times n$)
  • $B$ = Second matrix ($n \times p$); columns of $A$ must match rows of $B$
  • $C$ = Result matrix ($m \times p$)
  • $c_{ij}$ = Dot product of the $i$-th row of $A$ and the $j$-th column of $B$

⚠️ Dimension Rule

To multiply $A \times B$: if $A$ is $m \times n$ and $B$ is $n \times p$, the result $C$ is $m \times p$. The inner dimensions ($n$) must match; otherwise multiplication is undefined.
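The summation formula and the dimension rule translate directly into three nested loops. The function name `matmul_naive` is a hypothetical helper written for this illustration; in real code, `A @ B` is far faster.

```python
import numpy as np

def matmul_naive(A, B):
    """Multiply A (m x n) by B (n x p) using c_ij = sum_k a_ik * b_kj."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"inner dimensions differ: {n} vs {n_b}")
    C = np.zeros((m, p), dtype=np.result_type(A, B))
    for i in range(m):            # row of A
        for j in range(p):        # column of B
            for k in range(n):    # shared inner index
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2, 3], [4, 5, 6]])        # 2 x 3
B = np.array([[7, 8], [9, 10], [11, 12]])   # 3 x 2
print(matmul_naive(A, B))
# [[ 58  64]
#  [139 154]]
```

The three loops also show where the $m \cdot n \cdot p$ scalar multiplications come from.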

📝Example 1: 2×2 Matrix Multiplication

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}$:

$$C = AB = \begin{bmatrix} (1)(5) + (2)(7) & (1)(6) + (2)(8) \\ (3)(5) + (4)(7) & (3)(6) + (4)(8) \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$$

📝 Example 2: Non-Commutative Multiplication

Using the same $A$ and $B$:

$$BA = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} (5)(1)+(6)(3) & (5)(2)+(6)(4) \\ (7)(1)+(8)(3) & (7)(2)+(8)(4) \end{bmatrix} = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$$

Since $AB = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \neq BA = \begin{bmatrix} 23 & 34 \\ 31 & 46 \end{bmatrix}$, matrix multiplication is not commutative.

📝 Example 3: Rectangular Matrices

Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$ ($2 \times 3$) and $B = \begin{bmatrix} 7 & 8 \\ 9 & 10 \\ 11 & 12 \end{bmatrix}$ ($3 \times 2$):

$$AB = \begin{bmatrix} (1)(7)+(2)(9)+(3)(11) & (1)(8)+(2)(10)+(3)(12) \\ (4)(7)+(5)(9)+(6)(11) & (4)(8)+(5)(10)+(6)(12) \end{bmatrix} = \begin{bmatrix} 58 & 64 \\ 139 & 154 \end{bmatrix}$$

Result: a $2 \times 2$ matrix.

Properties of Matrix Multiplication

| Property | Formula |
| --- | --- |
| Associativity | $(AB)C = A(BC)$ |
| Left Distributivity | $A(B + C) = AB + AC$ |
| Right Distributivity | $(A + B)C = AC + BC$ |
| Scalar Commutes | $c(AB) = (cA)B = A(cB)$ |
| Identity | $AI = IA = A$ |
| Not Commutative | $AB \neq BA$ (in general) |

💡 Computational Complexity

Multiplying an $m \times n$ matrix by an $n \times p$ matrix with the definition above requires $mnp$ scalar multiplications, i.e. $O(mnp)$ work. For two $n \times n$ matrices, naive multiplication is $O(n^3)$. Strassen's algorithm runs in about $O(n^{2.81})$, and Coppersmith-Winograd-style algorithms reach roughly $O(n^{2.37})$, but the latter are not used in practice because of their very large constant factors.


Matrix Inverse

The inverse of a matrix is the matrix equivalent of the reciprocal of a number. It "undoes" the transformation performed by the original matrix.

Matrix Inverse

$$AA^{-1} = A^{-1}A = I$$

Here,

  • $A$ = The original square matrix (must be $n \times n$)
  • $A^{-1}$ = The inverse of $A$
  • $I$ = The $n \times n$ identity matrix

⚠️ Existence Condition

The inverse exists only if $\det(A) \neq 0$. Such a matrix is called non-singular or invertible. If $\det(A) = 0$, the matrix is singular and has no inverse.

Formula for 2×2 Inverse

2×2 Matrix Inverse

For $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:

$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$$

Here,

  • $A$ = A $2 \times 2$ matrix
  • $ad - bc$ = The determinant of $A$

📝 Example: 2×2 Inverse

For $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$:

$$\det(A) = (1)(4) - (2)(3) = 4 - 6 = -2$$

$$A^{-1} = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix}$$

Verification: $A A^{-1} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$
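The closed-form 2×2 inverse can be written as a small function. `inverse_2x2` and its `tol` threshold are hypothetical names chosen for this sketch; the threshold for treating the determinant as zero is an assumption, not a universal constant.

```python
import numpy as np

def inverse_2x2(A, tol=1e-12):
    """Invert a 2x2 matrix with the closed-form formula.

    tol is an assumed threshold below which det(A) is treated as zero.
    """
    (a, b), (c, d) = A
    det = a * d - b * c
    if abs(det) < tol:
        raise ValueError("matrix is singular (det = 0)")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[1.0, 2.0], [3.0, 4.0]])
A_inv = inverse_2x2(A)
print(A_inv)
print(np.allclose(A @ A_inv, np.eye(2)))   # True
```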

Properties of Inverse

| Property | Formula |
| --- | --- |
| Double Inverse | $(A^{-1})^{-1} = A$ |
| Transpose Inverse | $(A^T)^{-1} = (A^{-1})^T$ |
| Product Inverse | $(AB)^{-1} = B^{-1} A^{-1}$ (order reverses!) |
| Scalar Inverse | $(cA)^{-1} = \frac{1}{c} A^{-1}$ (for $c \neq 0$) |

Methods for Computing Inverse

For matrices larger than $2 \times 2$, common methods include:

  1. Gauss-Jordan Elimination: Augment $[A \mid I]$ and row reduce to $[I \mid A^{-1}]$.
  2. Adjugate Method: $A^{-1} = \frac{1}{\det(A)} \text{adj}(A)$ where $\text{adj}(A)$ is the adjugate (transpose of the cofactor matrix).
  3. LU Decomposition: Factor $A = LU$, then solve $LUX = I$ one column at a time; the solution columns form $A^{-1}$.
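As an illustration of the first method, here is a minimal Gauss-Jordan sketch. The function name `inverse_gauss_jordan`, the partial-pivoting step, and the `tol` threshold are choices made for this example; production code would normally rely on `np.linalg.inv` or, better, a direct solver.

```python
import numpy as np

def inverse_gauss_jordan(A, tol=1e-12):
    """Invert a square matrix by row-reducing [A | I] to [I | A^-1]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if A.shape != (n, n):
        raise ValueError("matrix must be square")
    aug = np.hstack([A, np.eye(n)])            # augmented matrix [A | I]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = col + np.argmax(np.abs(aug[col:, col]))
        if abs(aug[pivot, col]) < tol:
            raise ValueError("matrix is singular")
        aug[[col, pivot]] = aug[[pivot, col]]  # swap rows
        aug[col] /= aug[col, col]              # scale so the pivot equals 1
        for row in range(n):
            if row != col:
                aug[row] -= aug[row, col] * aug[col]   # clear the column
    return aug[:, n:]                          # right half is A^-1

M = np.array([[1, 2, 3], [0, 4, 5], [1, 0, 6]])
M_inv = inverse_gauss_jordan(M)
print(np.allclose(M @ M_inv, np.eye(3)))   # True
```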

💡 Numerical Stability

In practice, avoid computing $A^{-1}$ explicitly just to solve $Ax = b$. Instead, solve the system directly using LU decomposition or QR factorization. Explicit inversion costs more arithmetic and usually gives a less accurate solution.
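One way to follow this advice when several right-hand sides share the same matrix is to factor once and reuse the factors. The sketch below uses SciPy's `lu_factor` and `lu_solve`; the matrix and right-hand sides are small made-up examples.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[2.0, 1.0], [5.0, 3.0]])
b1 = np.array([4.0, 13.0])
b2 = np.array([1.0, 0.0])

lu, piv = lu_factor(A)          # factor once: O(n^3)
x1 = lu_solve((lu, piv), b1)    # each later solve: O(n^2)
x2 = lu_solve((lu, piv), b2)

print(np.allclose(A @ x1, b1), np.allclose(A @ x2, b2))   # True True
```

For a single right-hand side, `np.linalg.solve(A, b)` does the factorization and the solve in one call.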


Determinant

The determinant is a scalar value computed from a square matrix that encodes essential properties: whether the matrix is invertible, how it scales volumes, and whether it preserves or reverses orientation.

Determinant (2×2)

$$\det(A) = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc$$

Here,

  • $A$ = A $2 \times 2$ matrix
  • $a, b, c, d$ = Elements of the matrix
  • $\det(A)$ = The determinant (scalar value)

Determinant (3×3): Cofactor Expansion

$$\det(A) = \begin{vmatrix} a & b & c \\ d & e & f \\ g & h & i \end{vmatrix} = a(ei - fh) - b(di - fg) + c(dh - eg)$$

Here,

  • $A$ = A $3 \times 3$ matrix
  • $a, b, c, \ldots$ = Elements of the matrix

📝Example: 2×2 Determinant

$$A = \begin{bmatrix} 3 & 8 \\ 4 & 6 \end{bmatrix}$$

$$\det(A) = (3)(6) - (8)(4) = 18 - 32 = -14$$

Since $\det(A) = -14 \neq 0$, the matrix is invertible.

📝 Example: 3×3 Determinant

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 1 & 0 & 6 \end{bmatrix}$$

$$\det(A) = 1(4 \cdot 6 - 5 \cdot 0) - 2(0 \cdot 6 - 5 \cdot 1) + 3(0 \cdot 0 - 4 \cdot 1)$$

$$= 1(24) - 2(-5) + 3(-4) = 24 + 10 - 12 = 22$$
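Cofactor expansion generalizes to any size by recursion. The helper `det_cofactor` below is a teaching sketch, not a practical algorithm: its cost grows like $n!$, whereas `np.linalg.det` uses an LU factorization.

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (teaching only)."""
    A = np.asarray(A)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    total = 0
    for j in range(n):
        # Minor: delete row 0 and column j.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

print(det_cofactor([[3, 8], [4, 6]]))                    # -14
print(det_cofactor([[1, 2, 3], [0, 4, 5], [1, 0, 6]]))   # 22
```

With integer input the result is exact, which makes it handy for checking hand calculations.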

Geometric Meaning

| $\det(A)$ | Interpretation |
| --- | --- |
| $\det(A) > 0$ | Matrix preserves orientation; scales area by $\det(A)$ |
| $\det(A) < 0$ | Matrix reverses orientation; scales area by $\lvert\det(A)\rvert$ |
| $\det(A) = 0$ | Matrix collapses space to a lower dimension (singular) |
| $\det(A) = 1$ | Matrix preserves area exactly (unimodular) |

Intuition: For a $2 \times 2$ matrix, $\lvert\det(A)\rvert$ is the area of the parallelogram formed by the column vectors. For a $3 \times 3$ matrix, it is the volume of the parallelepiped.

Properties of Determinants

| Property | Formula |
| --- | --- |
| Identity | $\det(I) = 1$ |
| Row Swap | Swapping two rows negates the determinant |
| Scalar | $\det(cA) = c^n \det(A)$ for an $n \times n$ matrix |
| Product | $\det(AB) = \det(A) \cdot \det(B)$ |
| Transpose | $\det(A^T) = \det(A)$ |
| Inverse | $\det(A^{-1}) = \frac{1}{\det(A)}$ |
| Zero Rows | If a row is all zeros, $\det(A) = 0$ |
| Equal Rows | If two rows are identical, $\det(A) = 0$ |

Trace

The trace of a square matrix is the sum of its diagonal elements. Despite its simplicity, the trace has profound connections to eigenvalues and matrix similarity.

Trace

$$\text{tr}(A) = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn}$$

Here,

  • $A$ = An $n \times n$ square matrix
  • $a_{ii}$ = Diagonal elements
  • $\text{tr}(A)$ = The trace (scalar)

📝 Example: Trace

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

$$\text{tr}(A) = 1 + 5 + 9 = 15$$

Properties of Trace

| Property | Formula |
| --- | --- |
| Linearity | $\text{tr}(A + B) = \text{tr}(A) + \text{tr}(B)$ |
| Scalar | $\text{tr}(cA) = c \cdot \text{tr}(A)$ |
| Cyclic | $\text{tr}(AB) = \text{tr}(BA)$, and more generally $\text{tr}(ABC) = \text{tr}(BCA) = \text{tr}(CAB)$ (only cyclic shifts are allowed; $\text{tr}(ACB)$ can differ) |
| Transpose | $\text{tr}(A) = \text{tr}(A^T)$ |
| Eigenvalues | $\text{tr}(A) = \sum_{i=1}^{n} \lambda_i$ (sum of eigenvalues) |
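The cyclic property and the eigenvalue identity are easy to check numerically. The three matrices below are arbitrary small examples picked so that a non-cyclic reordering visibly changes the trace.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])
C = np.array([[2.0, 0.0], [1.0, 1.0]])

# Cyclic shifts leave the trace unchanged.
print(np.trace(A @ B @ C), np.trace(B @ C @ A), np.trace(C @ A @ B))   # 8.0 8.0 8.0

# A non-cyclic reordering can give a different value.
print(np.trace(A @ C @ B))   # 12.0

# The trace equals the sum of the eigenvalues.
eigenvalues = np.linalg.eigvals(A)
print(np.isclose(eigenvalues.sum(), np.trace(A)))   # True
```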

Rank

The rank of a matrix is the maximum number of linearly independent rows (or columns). It tells you the "effective dimensionality" of the matrix.

Definition: Matrix Rank

The rank of a matrix $A$, denoted $\text{rank}(A)$ or $\text{rk}(A)$, is the dimension of the column space (or equivalently, the row space) of $A$.

📝 Example: Rank

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 3 & 5 \end{bmatrix}$$

Row 2 = 2 × Row 1, so it's redundant. Row 3 is not a multiple of Row 1. Thus $\text{rank}(A) = 2$.

Properties of Rank

| Property | Description |
| --- | --- |
| $0 \leq \text{rank}(A) \leq \min(m, n)$ | Rank is bounded by matrix dimensions |
| Full Rank | $\text{rank}(A) = \min(m, n)$: the rows or the columns (whichever are fewer) are linearly independent |
| $\text{rank}(A) = n$ (square) | $A$ is invertible |
| $\text{rank}(AB) \leq \min(\text{rank}(A), \text{rank}(B))$ | Product rank is bounded by component ranks |
| $\text{rank}(A^T) = \text{rank}(A)$ | Rank is invariant under transpose |
| Rank-Nullity Theorem | $\text{rank}(A) + \text{nullity}(A) = n$ for $A \in \mathbb{R}^{m \times n}$ |

ℹ️ Rank in Practice

A matrix with rank less than $\min(m, n)$ is called rank-deficient: some of its rows or columns are linearly dependent. A rank-deficient square matrix is singular (non-invertible).
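The rank-nullity theorem can be checked on the example matrix with a short sketch. It assumes SciPy is available for `null_space`, which returns an orthonormal basis of the null space as columns.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1, 2, 3],
              [2, 4, 6],
              [1, 3, 5]])

rank = np.linalg.matrix_rank(A)
nullity = null_space(A).shape[1]   # number of basis vectors of the null space

print(rank, nullity, rank + nullity == A.shape[1])   # 2 1 True
```

Both functions decide rank with an SVD and a tolerance, so results for nearly dependent rows depend on that tolerance.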


Special Matrices

Some matrices have unique properties that make them particularly important:

Definition: Orthogonal Matrix

A square matrix $Q$ is orthogonal if $Q^T Q = Q Q^T = I$. Equivalently, $Q^{-1} = Q^T$. Orthogonal matrices represent rotations and reflections, and they preserve lengths (the Euclidean norm) and angles.

Definition: Symmetric Matrix

A matrix $A$ is symmetric if $A = A^T$. Real symmetric matrices have real eigenvalues and are always diagonalizable. They appear frequently as covariance matrices and Hessian matrices.

Definition: Positive Definite Matrix

A symmetric matrix $A$ is positive definite if $x^T A x > 0$ for all non-zero vectors $x$. Positive definite matrices have all positive eigenvalues and appear in optimization (second-order conditions) and covariance matrices.
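A common practical test for positive definiteness is to attempt a Cholesky factorization, which succeeds only for positive definite input. The helper `is_positive_definite` and its `tol` symmetry tolerance are hypothetical names for this sketch.

```python
import numpy as np

def is_positive_definite(A, tol=1e-10):
    """Return True if A is symmetric and positive definite (Cholesky test)."""
    A = np.asarray(A, dtype=float)
    if A.ndim != 2 or A.shape[0] != A.shape[1]:
        return False
    if not np.allclose(A, A.T, atol=tol):
        return False
    try:
        np.linalg.cholesky(A)      # raises LinAlgError if A is not positive definite
        return True
    except np.linalg.LinAlgError:
        return False

print(is_positive_definite([[2, -1], [-1, 2]]))   # True  (eigenvalues 1 and 3)
print(is_positive_definite([[1, 2], [2, 1]]))     # False (eigenvalues -1 and 3)
```

Checking that `np.linalg.eigvalsh(A)` returns only positive values is an equivalent, slightly more expensive alternative.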

| Special Matrix | Key Property | Example |
| --- | --- | --- |
| Idempotent | $A^2 = A$ | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ (projection) |
| Nilpotent | $A^k = 0$ for some $k$ | $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ ($k = 2$) |
| Involutory | $A^2 = I$ (self-inverse) | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ (swap) |
| Permutation | Each row and column has exactly one 1, all other entries 0 | $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ |
| Stochastic | Non-negative entries; each row (or each column) sums to 1 (probabilities) | $\begin{bmatrix} 0.7 & 0.3 \\ 0.4 & 0.6 \end{bmatrix}$ |
| Diagonal | $a_{ij} = 0$ for $i \neq j$ | $\begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}$ |

Python Implementation

NumPy provides efficient tools for all matrix operations:

import numpy as np

# Define matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# Matrix addition
C = A + B
print("A + B:\n", C)
# [[ 6  8]
#  [10 12]]

# Scalar multiplication
print("3A:\n", 3 * A)
# [[ 3  6]
#  [ 9 12]]

# Transpose
print("A^T:\n", A.T)
# [[1 3]
#  [2 4]]

# Matrix multiplication (two ways)
print("A @ B:\n", A @ B)       # preferred
print("np.dot(A, B):\n", np.dot(A, B))
# [[19 22]
#  [43 50]]

# Determinant
print("det(A):", np.linalg.det(A))  # approximately -2.0 (floating-point result)

# Inverse
A_inv = np.linalg.inv(A)
print("A^{-1}:\n", A_inv)
# [[-2.   1. ]
#  [ 1.5 -0.5]]

# Verify: A @ A^{-1} = I (up to floating-point round-off)
print("A @ A^{-1}:\n", A @ A_inv)
print("Is identity?", np.allclose(A @ A_inv, np.eye(2)))  # True

# Trace
print("tr(A):", np.trace(A))  # 5

# Rank
print("rank(A):", np.linalg.matrix_rank(A))  # 2

# Solve linear system Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print("x:", x)  # [1. 2.]

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:\n", eigenvectors)

# 3x3 Determinant
M = np.array([[1, 2, 3], [0, 4, 5], [1, 0, 6]])
print("det(3x3):", np.linalg.det(M))  # approximately 22.0

# Check if matrix is symmetric
print("A symmetric?", np.allclose(A, A.T))  # False
S = np.array([[1, 2], [2, 3]])
print("S symmetric?", np.allclose(S, S.T))  # True

# Check if matrix is orthogonal
Q = np.array([[np.cos(np.pi/4), -np.sin(np.pi/4)],
              [np.sin(np.pi/4),  np.cos(np.pi/4)]])
print("Q orthogonal?", np.allclose(Q @ Q.T, np.eye(2)))  # True

# Sparse matrix
from scipy import sparse
S_sparse = sparse.csr_matrix(np.array([[0, 0, 3], [0, 0, 0], [0, 7, 0]]))
print("Sparse matrix:\n", S_sparse)
print("Dense form:\n", S_sparse.toarray())

Applications in AI/ML

Matrices are not just theoretical constructs — they are the computational engine of modern AI and machine learning:

Data Representation

  • Feature matrices: In supervised learning, data is an $n \times d$ matrix where $n$ is the number of samples and $d$ is the number of features.
  • Image data: A grayscale image is a matrix of pixel intensities. Color images are 3D tensors (height × width × channels).

Neural Networks

  • Each layer computes $h = \sigma(Wx + b)$ where $W$ is a weight matrix, $x$ is the input vector, $b$ is a bias vector, and $\sigma$ is an activation function.
  • Training involves computing gradients of the loss with respect to weight matrices using the chain rule (backpropagation), which is fundamentally matrix calculus.
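The layer formula $h = \sigma(Wx + b)$ can be sketched in a few lines of NumPy. The layer sizes, the random weights, and the choice of ReLU as $\sigma$ are all assumptions made for illustration.

```python
import numpy as np

def relu(z):
    """ReLU activation, used here as one possible choice of sigma."""
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
d_in, d_out = 4, 3                      # assumed layer sizes
W = rng.normal(size=(d_out, d_in))      # weight matrix
b = np.zeros(d_out)                     # bias vector
x = rng.normal(size=d_in)               # a single input vector

h = relu(W @ x + b)                     # h = sigma(Wx + b)
print(h.shape)                          # (3,)

# A batch stored as rows of X (n x d_in) uses X @ W.T so each row is one sample.
X = rng.normal(size=(5, d_in))
H = relu(X @ W.T + b)
print(H.shape)                          # (5, 3)
```

The batched form is why frameworks spend most of their time in matrix-matrix products rather than matrix-vector products.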

Dimensionality Reduction

  • PCA (Principal Component Analysis): Finds the principal directions (eigenvectors) of the data covariance matrix to reduce dimension while preserving variance.
  • SVD (Singular Value Decomposition): Factorizes any matrix as $A = U\Sigma V^T$, an orthogonal matrix (rotation or reflection), a diagonal scaling, and another orthogonal matrix. It is used for compression, noise reduction, and recommendation systems.
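The link between PCA and the SVD can be illustrated on synthetic data. Everything about the data below (200 samples, 3 features, the mixing matrix) is made up for the sketch; the point is that the right singular vectors of the centered data are the principal directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: 200 samples, 3 features (illustrative only).
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.0, 0.5, 0.1]])

Xc = X - X.mean(axis=0)                          # center each feature
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Z = Xc @ Vt[:k].T                                # project onto the top-k directions
explained = s**2 / np.sum(s**2)                  # variance fraction per component
print(Z.shape)                                   # (200, 2)

# The covariance eigenvalues equal the squared singular values / (n - 1).
cov = Xc.T @ Xc / (len(Xc) - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]          # sorted in descending order
print(np.allclose(eigvals, s**2 / (len(Xc) - 1)))   # True
```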

Computer Vision

  • Homogeneous coordinates: 3D transformations (rotation, translation, scaling) are represented as $4 \times 4$ matrices.
  • Convolutional layers: Apply learnable filter matrices to input images.
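A short sketch of homogeneous coordinates shows why the extra dimension is useful: translation becomes a matrix product and composes with rotation. The helper names `translation` and `rotation_z` are hypothetical and written for this example.

```python
import numpy as np

def translation(tx, ty, tz):
    """4x4 homogeneous translation matrix."""
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]
    return T

def rotation_z(theta):
    """4x4 homogeneous rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])                 # point (1, 0, 0), w = 1

M = translation(0, 0, 5) @ rotation_z(np.pi / 2)   # rotate first, then translate
q = M @ p
print(np.allclose(q, [0.0, 1.0, 5.0, 1.0]))        # True
```

Because matrix multiplication is not commutative, `rotation_z(...) @ translation(...)` would apply the operations in the opposite order.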

Natural Language Processing

  • Word embeddings: Words are represented as dense vectors; similarity is computed via matrix operations (dot products).
  • Attention mechanisms: The transformer architecture computes attention scores as matrix multiplications: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$.
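The attention formula is two matrix products with a row-wise softmax in between. The sketch below is a single-head, unmasked version with random inputs; the shapes (2 queries, 3 keys, $d_k = 4$) are arbitrary assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    """Row-wise softmax; subtracting the max improves numerical stability."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n_queries, n_keys)
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, d_k = 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 5))   # 3 values, d_v = 5

out, weights = attention(Q, K, V)
print(out.shape, weights.shape)   # (2, 5) (2, 3)
```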

Recommendation Systems

  • User-item interaction matrices are factorized (using SVD or other low-rank matrix factorizations) to predict missing entries.

Graph Neural Networks

  • Graphs are represented as adjacency matrices; message passing involves matrix multiplications.
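One round of message passing can be written as a product with a normalized adjacency matrix. The tiny path graph, the single node feature, and the mean-aggregation rule below are illustrative assumptions; real GNN layers add learnable weights and a nonlinearity.

```python
import numpy as np

# Undirected path graph 0 - 1 - 2 (illustrative).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.array([[1.0], [2.0], [3.0]])        # one feature per node

A_hat = A + np.eye(3)                      # add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))   # inverse degree matrix
H_next = D_inv @ A_hat @ H                 # each node averages itself and its neighbors

print(H_next.ravel())                      # approximately [1.5, 2.0, 2.5]
```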

Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
| --- | --- | --- |
| Assuming $AB = BA$ | Matrix multiplication is not commutative | Always check dimension compatibility and order |
| Adding matrices of different sizes | Undefined operation | Verify both matrices have identical dimensions |
| Forgetting $(AB)^T = B^T A^T$ | Transpose reverses multiplication order | Remember: transpose each factor and reverse the product |
| Using $A^{-1}$ to solve $Ax = b$ | Numerically unstable and wasteful | Use np.linalg.solve(A, b) directly |
| Confusing element-wise and matrix multiplication | A * B vs A @ B in NumPy | Use @ (or np.matmul) for matrix multiplication; * for element-wise |
| Assuming $(A + B)^{-1} = A^{-1} + B^{-1}$ | This is false in general | No simple formula; solve the system directly |
| Thinking rank equals dimension | Only true for full-rank matrices | Compute rank explicitly with Gaussian elimination or SVD |
| Ignoring numerical precision | Floating-point errors accumulate | Use tolerance-based comparisons (np.allclose) |
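The element-wise versus matrix product confusion is easiest to see side by side. This sketch reuses the $2 \times 2$ matrices from the multiplication examples.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A * B)   # element-wise (Hadamard) product
# [[ 5 12]
#  [21 32]]

print(A @ B)   # matrix product
# [[19 22]
#  [43 50]]
```

Both expressions run without error on same-shaped square arrays, which is why this mistake often goes unnoticed.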

Interview Questions

ℹ️ Common Interview Questions

These questions frequently appear in technical interviews for data science, ML engineering, and quantitative finance roles.

Q1: Why is matrix multiplication not commutative?

A: Because the operation represents sequential linear transformations, and the order of transformations matters. Rotating then reflecting gives a different result than reflecting then rotating. Mathematically, the dot product of row $i$ of $A$ with column $j$ of $B$ uses different elements than the dot product of row $i$ of $B$ with column $j$ of $A$.


Q2: When does the inverse of a matrix exist?

A: A square matrix $A$ has an inverse if and only if $\det(A) \neq 0$ (equivalently, $A$ has full rank, all eigenvalues are non-zero, or the null space contains only the zero vector). Such matrices are called non-singular or invertible.


Q3: What is the geometric interpretation of the determinant?

A: The absolute value of the determinant represents the factor by which the matrix scales volume. For a $2 \times 2$ matrix, $\lvert\det(A)\rvert$ is the area of the parallelogram formed by the column vectors. If $\det(A) = 0$, the matrix collapses space to a lower dimension. If $\det(A) < 0$, the transformation reverses orientation.


Q4: How do you solve a system of linear equations using matrices?

A: Given $Ax = b$, if $A$ is invertible, $x = A^{-1}b$. In practice, use LU decomposition or QR factorization for numerical stability: x = np.linalg.solve(A, b).


Q5: What is the difference between a symmetric and an orthogonal matrix?

A: A symmetric matrix satisfies $A = A^T$ (it equals its transpose). An orthogonal matrix satisfies $Q^T Q = I$ (its columns form an orthonormal set). Symmetric matrices have real eigenvalues; orthogonal matrices represent rotations/reflections and preserve lengths.


Q6: What is the relationship between rank and invertibility?

A: An $n \times n$ matrix is invertible if and only if $\text{rank}(A) = n$ (full rank). If $\text{rank}(A) < n$, the matrix is singular and cannot be inverted.


Q7: Why should you avoid computing $A^{-1}$ explicitly?

A: Computing $A^{-1}$ is $O(n^3)$ and can amplify numerical errors. Solving $Ax = b$ directly via LU decomposition is also $O(n^3)$, but it needs roughly a third of the arithmetic, is more numerically stable, and doesn't require storing the full inverse matrix.


Practice Problems

Problem 1: Find the transpose of $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$.

Problem 2: Compute $AB$ where $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 4 \\ 5 & 6 \end{bmatrix}$.

Problem 3: Find the inverse of $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$.

Problem 4: Compute the determinant of $A = \begin{bmatrix} 3 & 8 \\ 4 & 6 \end{bmatrix}$.

Problem 5: Compute the determinant of $A = \begin{bmatrix} 1 & 2 & 3 \\ 0 & 1 & 4 \\ 5 & 6 & 0 \end{bmatrix}$ using cofactor expansion.

Problem 6: Find the trace of $A = \begin{bmatrix} 2 & -1 & 0 \\ 3 & 4 & 5 \\ 1 & 0 & 7 \end{bmatrix}$.

Problem 7: Determine the rank of $A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 1 & 3 & 5 \end{bmatrix}$.

Problem 8: If $A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}$, compute $A^3$.

Problem 9: Verify that $Q = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ is orthogonal by showing $Q^T Q = I$.

Problem 10: Solve the system $Ax = b$ where $A = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}$ and $b = \begin{bmatrix} 4 \\ 13 \end{bmatrix}$.

💡Solutions

Solution 1: $A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$

Solution 2: $AB = \begin{bmatrix} (1)(3)+(0)(5) & (1)(4)+(0)(6) \\ (0)(3)+(2)(5) & (0)(4)+(2)(6) \end{bmatrix} = \begin{bmatrix} 3 & 4 \\ 10 & 12 \end{bmatrix}$

Solution 3: $\det(A) = (2)(3) - (1)(5) = 1$

$$A^{-1} = \frac{1}{1}\begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$$

Solution 4: $\det(A) = (3)(6) - (8)(4) = 18 - 32 = -14$

Solution 5: Using cofactor expansion along the first row:

$$\det(A) = 1\begin{vmatrix} 1 & 4 \\ 6 & 0 \end{vmatrix} - 2\begin{vmatrix} 0 & 4 \\ 5 & 0 \end{vmatrix} + 3\begin{vmatrix} 0 & 1 \\ 5 & 6 \end{vmatrix}$$

$$= 1(0 - 24) - 2(0 - 20) + 3(0 - 5) = -24 + 40 - 15 = 1$$

Solution 6: $\text{tr}(A) = 2 + 4 + 7 = 13$

Solution 7: Row 2 = 2 × Row 1 (redundant). Row 3 is not a multiple of Row 1. After row reduction, we get 2 non-zero rows. Thus $\text{rank}(A) = 2$.

Solution 8: $A^2 = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}$

$$A^3 = A^2 \cdot A = \begin{bmatrix} 1 & 4 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 6 \\ 0 & 1 \end{bmatrix}$$

Solution 9:

$$Q^T = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

$$Q^T Q = \begin{bmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \sin\theta\cos\theta \\ -\sin\theta\cos\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

Solution 10: From Problem 3, $A^{-1} = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}$.

$$x = A^{-1}b = \begin{bmatrix} 3 & -1 \\ -5 & 2 \end{bmatrix}\begin{bmatrix} 4 \\ 13 \end{bmatrix} = \begin{bmatrix} 12 - 13 \\ -20 + 26 \end{bmatrix} = \begin{bmatrix} -1 \\ 6 \end{bmatrix}$$

Verification: $A x = \begin{bmatrix} 2 & 1 \\ 5 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 + 6 \\ -5 + 18 \end{bmatrix} = \begin{bmatrix} 4 \\ 13 \end{bmatrix}$


Quick Reference

| Operation | Formula | NumPy |
| --- | --- | --- |
| Transpose | $(A^T)_{ij} = A_{ji}$ | A.T or np.transpose(A) |
| Addition | $(A+B)_{ij} = a_{ij} + b_{ij}$ | A + B |
| Scalar Multiply | $(cA)_{ij} = c \cdot a_{ij}$ | c * A |
| Matrix Multiply | $(AB)_{ij} = \sum_k a_{ik} b_{kj}$ | A @ B or np.matmul(A, B) |
| Determinant (2×2) | $ad - bc$ | np.linalg.det(A) |
| Inverse | $AA^{-1} = I$ | np.linalg.inv(A) |
| Solve $Ax = b$ | $x = A^{-1}b$ | np.linalg.solve(A, b) |
| Trace | $\sum_i a_{ii}$ | np.trace(A) |
| Rank | dim of column space | np.linalg.matrix_rank(A) |
| Eigenvalues | $\det(A - \lambda I) = 0$ | np.linalg.eig(A) |
| SVD | $A = U\Sigma V^T$ | np.linalg.svd(A) |
| Norm | $\|A\|_F = \sqrt{\sum a_{ij}^2}$ | np.linalg.norm(A) |

Cross-References

  • Vectors and Dot Products: Foundation for understanding matrix rows and columns as vector operations.
  • Linear Transformations: Matrices as functions mapping vectors to vectors, and how matrices represent geometric operations.
  • Eigenvalues and Eigenvectors: Diagonalization, spectral decomposition, and the characteristic equation.
  • Systems of Linear Equations: Solving $Ax = b$ using Gaussian elimination, LU decomposition, and matrix inverses.
  • Singular Value Decomposition (SVD): Factorizing any matrix into rotation-scaling-rotation; the Swiss army knife of linear algebra.
  • Principal Component Analysis (PCA): Using eigenvalue decomposition of covariance matrices for dimensionality reduction.
  • Optimization and Gradient Descent: Hessians, Jacobians, and second-order methods rely on matrix calculus.
  • Probability and Statistics: Covariance matrices, correlation matrices, and multivariate distributions.