Tensor Calculus
ℹ️ Why It Matters
Tensor calculus is the mathematical language of multi-dimensional data manipulation. Every operation in deep learning—matrix multiplication, convolutions, attention mechanisms—is fundamentally a tensor operation. Mastering tensor calculus unlocks understanding of how frameworks like PyTorch and TensorFlow compute gradients through complex architectures, and enables implementation of custom operations that go beyond standard library functions. The Einstein summation convention provides a compact, powerful notation for expressing complex multi-index operations that would otherwise require cumbersome summation symbols.
Core Definitions
DfTensor
A tensor is a multidimensional array of numbers that transforms according to specific rules under a change of coordinates. A tensor of rank (also called order-) has indices. A scalar is rank-0, a vector is rank-1, a matrix is rank-2, and higher-order tensors have 3 or more indices. Formally, a type tensor is a multilinear map from copies of the dual space and copies of the vector space to the scalar field.
DfEinstein Summation Convention
When an index appears exactly twice in a single term, summation over that index is implied. For example, means . Free indices (appearing once) label the components of the result. Dummy indices (appearing twice) are summed over and do not appear in the result. This convention eliminates the need for explicit sigma notation and makes tensor manipulations far more concise and readable.
DfCovariant Tensor
A covariant tensor (or lower-index tensor) transforms with the Jacobian of the coordinate change. Components are written with subscripts: . Under a coordinate transformation , a rank-2 covariant tensor transforms as . Covariant tensors are "measured" quantities—they measure distances, angles, and other geometric properties.
DfContravariant Tensor
A contravariant tensor (or upper-index tensor) transforms with the inverse Jacobian. Components are written with superscripts: . Under coordinate transformation, . Contravariant tensors represent "physical" quantities like velocity and position that transform inversely to the coordinate system.
DfMetric Tensor
The metric tensor is a symmetric positive-definite covariant tensor that defines distances and angles on a manifold. The infinitesimal distance is . The metric also defines raising and lowering of indices: and . In Euclidean space, the metric tensor is simply the identity matrix, but in curved spaces it encodes the geometry.
DfTensor Contraction
Tensor contraction is the operation of summing over a pair of indices—one upper and one lower. For a mixed tensor , contraction yields a scalar: . Contraction generalizes the trace operation and is the fundamental operation underlying matrix multiplication, dot products, and all einsum computations.
Key Formulas
Einstein Summation Notation
Here,
- =First tensor component with row index i and summation index k
- =Second tensor component with summation index k and column index j
- =Result tensor with free indices i and j
Tensor Contraction
Here,
- =Inverse metric tensor (raises the index)
- =Covariant tensor of rank 2
- =Mixed tensor with one upper and one lower index
Outer Product (Tensor Product)
Here,
- =Component of the first vector (rank-1 tensor)
- =Component of the second vector (rank-1 tensor)
- =Resulting rank-2 tensor
Trace
Here,
- =Diagonal elements of a rank-2 tensor
- =Contracted index notation for trace
Frobenius Norm
Here,
- =Elements of the matrix
- =Transpose of A
Kronecker Product
Here,
- =Element of matrix A
- =Element of matrix B
- =Block structure of Kronecker product
Index Lowering and Raising
Here,
- =Metric tensor (lowers indices)
- =Inverse metric tensor (raises indices)
- =Covariant and contravariant components
Important Theorems
ThUniversal Approximation Theorem (Tensor Version)
A feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of to arbitrary accuracy, provided the activation function is non-constant, bounded, and monotonically-increasing. The weights and biases are tensor parameters optimized via backpropagation. This theorem guarantees that neural networks are universal function approximators.
ThTensor Rank Decomposition
Any rank- tensor can be decomposed as a sum of rank-1 tensors: . The minimum needed is called the tensor rank. For matrices (rank-2 tensors), this is the standard matrix rank. For higher-order tensors, tensor rank can differ from the notion of multilinear rank.
ThEinstein Summation Equivalence
Every Einstein summation expression can be written as a sequence of tensor contractions, which are themselves compositions of matrix multiplications, transpositions, and reshaping operations. This guarantees that any einsum expression can be implemented as an efficient sequence of BLAS/LAPACK operations, ensuring computational tractability for arbitrary tensor expressions.
ThPolyadic Decomposition (CP Decomposition)
A tensor can be decomposed as , where , , , and denotes the outer product. The minimum for exact decomposition is the tensor CP rank. This decomposition generalizes low-rank matrix factorization to higher dimensions.
Worked Examples
📝Matrix Multiplication via Einstein Summation
Problem: Compute where and using Einstein summation notation.
Solution:
The matrix multiplication is expressed as with summation over .
In einsum notation: 'ik,kj->ij'
Step 1: Identify free and dummy indices.
- Free indices: (row of ), (column of )
- Dummy index: (summed over dimension 4)
Step 2: Compute each element:
Step 3: The result .
Concrete example: If and , then:
And so on for all 15 elements.
📝Batch Matrix Multiplication
Problem: Given batches of matrices and , express batch matrix multiplication using einsum.
Solution:
Einsum notation: 'bmk,bkn->bmn'
Step 1: For each batch index , compute the standard matrix product:
Step 2: The batch index appears only once in the output, so it is NOT summed over.
Step 3: Result shape:
Example values: If , , , , this computes 32 independent matrix multiplications of size times .
Computational advantage: Batch matrix multiplication can be parallelized on GPUs, processing all matrix multiplications simultaneously. The einsum notation makes this parallelism explicit—the batch dimension is free (not summed), so operations along different batches are independent.
📝Dot Product and Outer Product
Problem: For vectors and , express both the dot product and outer product using Einstein summation.
Solution:
Dot product: (in einsum: 'i,i->')
This sums over the single index:
The result is a scalar (rank-0 tensor).
Outer product: (in einsum: 'i,j->ij')
No indices are summed; both and are free.
The result is a rank-2 tensor: .
Key distinction: The dot product contracts both indices into a scalar. The outer product keeps both indices free, producing a matrix.
Example: For and :
Dot product:
Outer product:
📝Trace Using Index Contraction
Problem: Express the trace of a matrix using Einstein summation, and compute it for a specific matrix.
Solution:
Einsum notation: 'ii->' or equivalently
Step 1: The index appears twice (as row and column), so it is summed over:
Step 2: For :
Step 3: The trace is invariant under cyclic permutation: even when .
Step 4: Generalization: The trace of a product of matrices is invariant under cyclic permutation:
This property is used extensively in matrix calculus and backpropagation.
📝Computing Tensor Gradients
Problem: In a neural network layer , compute the gradient given the loss gradient .
Solution:
Step 1: Write in index notation: (Einstein summation over ).
Step 2: Apply the chain rule:
Step 3: Compute :
where is the Kronecker delta (1 if , 0 otherwise).
Step 4: Substitute:
Step 5: In matrix form: (outer product)
Einsum notation: 'k,i->ki' where indexes and indexes .
Practice Problems
📝Problem 1: Batch Attention Computation
Problem: In transformer attention, given , , and , write the full attention computation using einsum notation.
Solution:
Step 1: Compute attention scores:
Einsum: 'btd,bsd->bts'
Step 2: Scale by :
Step 3: Apply softmax over the dimension:
Step 4: Compute weighted sum:
Einsum: 'bts,bsd->btd'
Combined: scores = 'btd,bsd->bts'; out = 'bts,bsd->btd'
Result shape:
The attention mechanism is a batched weighted average over source positions, where weights are determined by query-key similarity.
📝Problem 2: Frobenius Norm via Einstein Summation
Problem: Express the Frobenius norm of a matrix using Einstein summation, and verify that it equals .
Solution:
The Frobenius norm squared is:
Einsum notation: 'ij,ij->'
Verification: , so:
Therefore ✓
Alternative expression: where are the singular values of .
📝Problem 3: Tensor Product of Two Vectors
Problem: Compute the tensor product where and .
Solution:
The tensor product is a rank-2 tensor:
Result shape: (outer product of a 3-vector with a 2-vector).
Einsum: 'i,j->ij' with and .
Key properties:
- (rank-1 matrix)
- (dot product)
📝Problem 4: Index Raising and Lowering
Problem: Given a metric and covariant vector , find the contravariant components .
Solution:
Step 1: Compute the inverse metric .
Step 2: Raise the index:
Result:
Verification: Lower back: ✓
And ✓
📝Problem 5: Kronecker Product Properties
Problem: Given and , compute and verify that .
Solution:
Step 1: Compute the Kronecker product:
Step 2: Compute trace of :
Step 3: Compute :
Verification: ✓
Additional properties of Kronecker product:
- (mixed-product property)
- (when inverses exist)
Common Mistakes
| Mistake | Correct Approach |
|---|---|
| Confusing (dot product) with (matrix multiply) | Check which indices are repeated vs. free |
Forgetting that einsum 'ij,kl->ijkl' is outer product, not contraction | Only repeated indices in the same term are summed |
| Assuming (matrix multiply is not commutative) | Order matters: in general |
| Using same index name for different dimensions | Each dummy index must appear exactly twice; use distinct names |
| Ignoring index alignment in batch operations | Batch dimensions must match or be broadcastable |
| Confusing covariant (lower) and contravariant (upper) indices | Upper indices transform with Jacobian, lower with inverse Jacobian |
| Forgetting that einsum does not broadcast | All dimensions must match or appear in the output |
Connections to Machine Learning
ℹ️ Connections to Machine Learning
Tensor calculus is the foundation of deep learning computation. Backpropagation computes gradients that are themselves tensors, and the chain rule becomes a tensor contraction. Attention mechanisms in transformers are tensor operations: is a batched outer product over sequence positions. Convolutional layers apply weight tensors to input tensors via einsum-like operations. Understanding tensor calculus enables efficient implementation of custom gradient functions, understanding of framework internals, and development of novel architectures. The tensor rank decomposition connects to model compression: low-rank factorization of weight matrices reduces parameters while preserving function. Tensor networks (MPS, Tensor Train) provide structured parameterizations that reduce the exponential cost of representing high-order correlations.
Exam/Interview Questions
Q1: Write the einsum notation for computing the matrix where and .
Answer: 'ik,jk->ij'. The summation is over (the shared dimension), and the free indices and give the row and column of .
Q2: Explain the difference between a covariant and contravariant tensor.
Answer: A covariant tensor (lower indices ) transforms with the Jacobian of the coordinate change: . A contravariant tensor (upper indices ) transforms with the inverse Jacobian: . Intuitively, covariant components shrink when the coordinate system expands, while contravariant components expand.
Q3: What is the tensor product of two vectors, and how does it relate to the outer product?
Answer: The tensor product produces a rank-2 tensor with components . This is identical to the outer product in linear algebra. The tensor product generalizes to higher-order tensors: the product of a rank-2 and rank-1 tensor gives a rank-3 tensor.
Q4: Why is the Frobenius norm of a matrix equal to ?
Answer: , so . Taking the square root gives the Frobenius norm. This identity is useful for computing norms without explicitly forming the full matrix.
Q5: In transformer attention, the operation is . Explain what tensor operations are involved.
Answer: is a batched matrix multiplication with einsum 'btd,bsd->bts', producing attention scores. Division by is elementwise scaling. Softmax is applied over the (source) dimension. The final multiplication with is 'bts,bsd->btd', a weighted sum over source positions. All operations are tensor contractions or elementwise.
Q6: What is the rank of a tensor, and why does it matter for model compression?
Answer: The tensor rank is the minimum number of rank-1 tensors needed to express a tensor as a sum: . A low-rank weight tensor can be decomposed into smaller factors, reducing the number of parameters from to . This is the basis of tensor decomposition methods for neural network compression.
Q7: How does backpropagation use tensor contractions?
Answer: Backpropagation applies the chain rule through each layer. For , the gradient is an outer product (tensor contraction with the input). For multi-dimensional tensors (e.g., convolutions), the gradient is computed via transposed convolution operations, which are also tensor contractions. Einsum notation makes these operations explicit and ensures correct index alignment.
Quick Reference
| Operation | Einstein Notation | Einsum String | Result Shape |
|---|---|---|---|
| Matrix Multiply | 'ik,kj->ij' | ||
| Batch MatMul | 'bik,bkj->bij' | ||
| Dot Product | 'i,i->' | scalar | |
| Outer Product | 'i,j->ij' | ||
| Trace | 'ii->' | scalar | |
| Frobenius Norm | 'ij,ij->' | scalar | |
| Transpose | 'ij->ji' | ||
| Diagonal | 'ii->i' | ||
| Sum | 'ij->' | scalar | |
| Elementwise Multiply | 'ij,ij->ij' | ||
| Matrix Norm | 'ij,kl->' | scalar | |
| Batch Trace | 'bii->b' |
Cross-References
- 097-advanced-differential-geometry — Metric tensors on manifolds extend the concept of inner products to curved spaces; Christoffel symbols are tensor expressions
- 098-advanced-functional-analysis — Tensor products of vector spaces generalize to infinite-dimensional Hilbert spaces; operator tensors in quantum mechanics
- 099-advanced-measure-theory — Integration of tensor fields requires measure-theoretic foundations; tensor-valued measures
- Linear Algebra (earlier modules) — Matrix operations, eigenvalues, and SVD are foundational to tensor decompositions
- Neural Networks: Backpropagation computes gradients as tensor contractions through the computational graph