Tensor Calculus

ℹ️ Why It Matters

Tensor calculus is the mathematical language of multi-dimensional data manipulation. Every operation in deep learning—matrix multiplication, convolutions, attention mechanisms—is fundamentally a tensor operation. Mastering tensor calculus unlocks understanding of how frameworks like PyTorch and TensorFlow compute gradients through complex architectures, and enables implementation of custom operations that go beyond standard library functions. The Einstein summation convention provides a compact, powerful notation for expressing complex multi-index operations that would otherwise require cumbersome summation symbols.

Core Definitions

DfTensor

A tensor is a multidimensional array of numbers that transforms according to specific rules under a change of coordinates. A tensor of rank $k$ (also called order- $k$ ) has $k$ indices. A scalar is rank-0, a vector is rank-1, a matrix is rank-2, and higher-order tensors have 3 or more indices. Formally, a type $(p,q)$ tensor is a multilinear map from $q$ copies of the dual space and $p$ copies of the vector space to the scalar field.

DfEinstein Summation Convention

When an index appears exactly twice in a single term, summation over that index is implied. For example, $a_{ik}b_{kj}$ means $\sum_{k} a_{ik}b_{kj}$ . Free indices (appearing once) label the components of the result. Dummy indices (appearing twice) are summed over and do not appear in the result. This convention eliminates the need for explicit sigma notation and makes tensor manipulations far more concise and readable.

DfCovariant Tensor

A covariant tensor (or lower-index tensor) transforms with the Jacobian of the coordinate change. Components are written with subscripts: $T_{ij}$ . Under a coordinate transformation $x \to x'$ , a rank-2 covariant tensor transforms as $T'_{ij} = \frac{\partial x^k}{\partial x'^i}\frac{\partial x^l}{\partial x'^j}T_{kl}$ . Covariant tensors are "measured" quantities—they measure distances, angles, and other geometric properties.

DfContravariant Tensor

A contravariant tensor (or upper-index tensor) transforms with the inverse Jacobian. Components are written with superscripts: $T^{ij}$ . Under coordinate transformation, $T'^{ij} = \frac{\partial x'^i}{\partial x^k}\frac{\partial x'^j}{\partial x^l}T^{kl}$ . Contravariant tensors represent "physical" quantities like velocity and position that transform inversely to the coordinate system.

DfMetric Tensor

The metric tensor $g_{ij}$ is a symmetric positive-definite covariant tensor that defines distances and angles on a manifold. The infinitesimal distance is $ds^2 = g_{ij}dx^i dx^j$ . The metric also defines raising and lowering of indices: $v^i = g^{ij}v_j$ and $w_i = g_{ij}w^j$ . In Euclidean space, the metric tensor is simply the identity matrix, but in curved spaces it encodes the geometry.

DfTensor Contraction

Tensor contraction is the operation of summing over a pair of indices—one upper and one lower. For a mixed tensor $T^i_{\ j}$ , contraction yields a scalar: $T^i_{\ i} = \sum_i T^i_{\ i}$ . Contraction generalizes the trace operation and is the fundamental operation underlying matrix multiplication, dot products, and all einsum computations.

Key Formulas

Einstein Summation Notation

c_{ij} = \sum_k a_{ik}b_{kj} \equiv a_{ik}b_{kj}

Here,

$a_{ik}$ =First tensor component with row index i and summation index k
$b_{kj}$ =Second tensor component with summation index k and column index j
$c_{ij}$ =Result tensor with free indices i and j

Tensor Contraction

T^{i}_{\ j} = g^{ik}T_{kj}

Here,

$g^{ik}$ =Inverse metric tensor (raises the index)
$T_{kj}$ =Covariant tensor of rank 2
$T^{i}_{ j}$ =Mixed tensor with one upper and one lower index

Outer Product (Tensor Product)

(a \otimes b)_{ij} = a_i b_j

Here,

$a_i$ =Component of the first vector (rank-1 tensor)
$b_j$ =Component of the second vector (rank-1 tensor)
$(a otimes b)_{ij}$ =Resulting rank-2 tensor

Trace

\text{tr}(A) = A^{i}_{\ i} = \sum_i A_{ii}

Here,

$A_{ii}$ =Diagonal elements of a rank-2 tensor
$A^{i}_{ i}$ =Contracted index notation for trace

Frobenius Norm

\|A\|_F = \sqrt{A_{ij}A_{ij}} = \sqrt{\text{tr}(A^T A)}

Here,

$A_{ij}$ =Elements of the matrix
$A^T$ =Transpose of A

Kronecker Product

(A \otimes B)_{ik,jl} = A_{ij}B_{kl}

Here,

$A_{ij}$ =Element of matrix A
$B_{kl}$ =Element of matrix B
$(A otimes B)_{ik,jl}$ =Block structure of Kronecker product

Index Lowering and Raising

v_i = g_{ij}v^j, \quad v^i = g^{ij}v_j

Here,

$g_{ij}$ =Metric tensor (lowers indices)
$g^{ij}$ =Inverse metric tensor (raises indices)
$v_i, v^i$ =Covariant and contravariant components

Important Theorems

ThUniversal Approximation Theorem (Tensor Version)

A feedforward neural network with a single hidden layer containing a finite number of neurons can approximate any continuous function on a compact subset of $\mathbb{R}^n$ to arbitrary accuracy, provided the activation function is non-constant, bounded, and monotonically-increasing. The weights and biases are tensor parameters optimized via backpropagation. This theorem guarantees that neural networks are universal function approximators.

ThTensor Rank Decomposition

Any rank- $r$ tensor $T \in \mathbb{R}^{n_1 \times n_2 \times \cdots \times n_k}$ can be decomposed as a sum of $r$ rank-1 tensors: $T = \sum_{s=1}^{r} a_s^{(1)} \otimes a_s^{(2)} \otimes \cdots \otimes a_s^{(k)}$ . The minimum $r$ needed is called the tensor rank. For matrices (rank-2 tensors), this is the standard matrix rank. For higher-order tensors, tensor rank can differ from the notion of multilinear rank.

ThEinstein Summation Equivalence

Every Einstein summation expression can be written as a sequence of tensor contractions, which are themselves compositions of matrix multiplications, transpositions, and reshaping operations. This guarantees that any einsum expression can be implemented as an efficient sequence of BLAS/LAPACK operations, ensuring computational tractability for arbitrary tensor expressions.

ThPolyadic Decomposition (CP Decomposition)

A tensor $T \in \mathbb{R}^{I \times J \times K}$ can be decomposed as $T \approx \sum_{r=1}^{R} \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r$ , where $\mathbf{a}_r \in \mathbb{R}^I$ , $\mathbf{b}_r \in \mathbb{R}^J$ , $\mathbf{c}_r \in \mathbb{R}^K$ , and $\circ$ denotes the outer product. The minimum $R$ for exact decomposition is the tensor CP rank. This decomposition generalizes low-rank matrix factorization to higher dimensions.

Worked Examples

📝Matrix Multiplication via Einstein Summation

Problem: Compute $C = AB$ where $A \in \mathbb{R}^{3 \times 4}$ and $B \in \mathbb{R}^{4 \times 5}$ using Einstein summation notation.

Solution:

The matrix multiplication is expressed as $C_{ij} = A_{ik}B_{kj}$ with summation over $k$ .

In einsum notation: 'ik,kj->ij'

Step 1: Identify free and dummy indices.

Free indices: $i$ (row of $C$ ), $j$ (column of $C$ )
Dummy index: $k$ (summed over dimension 4)

Step 2: Compute each element:

C_{ij} = \sum_{k=1}^{4} A_{ik}B_{kj} = A_{i1}B_{1j} + A_{i2}B_{2j} + A_{i3}B_{3j} + A_{i4}B_{4j}

Step 3: The result $C \in \mathbb{R}^{3 \times 5}$ .

Concrete example: If $A = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 5 & 6 & 7 & 8 \\ 9 & 10 & 11 & 12 \end{pmatrix}$ and $B = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \\ 0 & 0 \end{pmatrix}$ , then:

C_{11} = 1 \cdot 1 + 2 \cdot 0 + 3 \cdot 1 + 4 \cdot 0 = 4

C_{12} = 1 \cdot 0 + 2 \cdot 1 + 3 \cdot 1 + 4 \cdot 0 = 5

And so on for all 15 elements.

📝Batch Matrix Multiplication

Problem: Given batches of matrices $A \in \mathbb{R}^{B \times M \times K}$ and $B \in \mathbb{R}^{B \times K \times N}$ , express batch matrix multiplication using einsum.

Solution:

Einsum notation: 'bmk,bkn->bmn'

Step 1: For each batch index $b$ , compute the standard matrix product:

C_{bmn} = \sum_{k=1}^{K} A_{bmk}B_{bkn}

Step 2: The batch index $b$ appears only once in the output, so it is NOT summed over.

Step 3: Result shape: $C \in \mathbb{R}^{B \times M \times N}$

Example values: If $B=32$ , $M=128$ , $K=64$ , $N=256$ , this computes 32 independent matrix multiplications of size $128 \times 64$ times $64 \times 256$ .

Computational advantage: Batch matrix multiplication can be parallelized on GPUs, processing all $B$ matrix multiplications simultaneously. The einsum notation makes this parallelism explicit—the batch dimension $b$ is free (not summed), so operations along different batches are independent.

📝Dot Product and Outer Product

Problem: For vectors $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{b} \in \mathbb{R}^n$ , express both the dot product and outer product using Einstein summation.

Solution:

Dot product: $c = a_i b_i$ (in einsum: 'i,i->')

This sums over the single index: $c = \sum_{i=1}^{n} a_i b_i$

The result is a scalar (rank-0 tensor).

Outer product: $C_{ij} = a_i b_j$ (in einsum: 'i,j->ij')

No indices are summed; both $i$ and $j$ are free.

The result is a rank-2 tensor: $C \in \mathbb{R}^{n \times n}$ .

Key distinction: The dot product contracts both indices into a scalar. The outer product keeps both indices free, producing a matrix.

Example: For $\mathbf{a} = (1, 2, 3)$ and $\mathbf{b} = (4, 5, 6)$ :

Dot product: $1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6 = 32$

Outer product: $\begin{pmatrix} 4 & 5 & 6 \\ 8 & 10 & 12 \\ 12 & 15 & 18 \end{pmatrix}$

📝Trace Using Index Contraction

Problem: Express the trace of a matrix $A \in \mathbb{R}^{n \times n}$ using Einstein summation, and compute it for a specific matrix.

Solution:

Einsum notation: 'ii->' or equivalently $A^{i}_{\ i}$

Step 1: The index $i$ appears twice (as row and column), so it is summed over:

\text{tr}(A) = \sum_{i=1}^{n} A_{ii}

Step 2: For $A = \begin{pmatrix} 3 & 1 \\ 2 & 4 \end{pmatrix}$ :

\text{tr}(A) = A_{11} + A_{22} = 3 + 4 = 7

Step 3: The trace is invariant under cyclic permutation: $\text{tr}(AB) = \text{tr}(BA)$ even when $AB \neq BA$ .

Step 4: Generalization: The trace of a product of $k$ matrices is invariant under cyclic permutation:

\text{tr}(A_1 A_2 \cdots A_k) = \text{tr}(A_2 A_3 \cdots A_k A_1) = \cdots

This property is used extensively in matrix calculus and backpropagation.

📝Computing Tensor Gradients

Problem: In a neural network layer $\mathbf{y} = W\mathbf{x} + \mathbf{b}$ , compute the gradient $\frac{\partial L}{\partial W}$ given the loss gradient $\frac{\partial L}{\partial \mathbf{y}}$ .

Solution:

Step 1: Write in index notation: $y_i = W_{ij}x_j + b_i$ (Einstein summation over $j$ ).

Step 2: Apply the chain rule:

\frac{\partial L}{\partial W_{ij}} = \frac{\partial L}{\partial y_k} \cdot \frac{\partial y_k}{\partial W_{ij}}

Step 3: Compute $\frac{\partial y_k}{\partial W_{ij}}$ :

\frac{\partial y_k}{\partial W_{ij}} = \frac{\partial (W_{kl}x_l + b_k)}{\partial W_{ij}} = \delta_{ki}\delta_{jl}x_l = \delta_{ki}x_j

where $\delta_{ki}$ is the Kronecker delta (1 if $k=i$ , 0 otherwise).

Step 4: Substitute:

\frac{\partial L}{\partial W_{ij}} = \frac{\partial L}{\partial y_k} \delta_{ki} x_j = \frac{\partial L}{\partial y_i} x_j

Step 5: In matrix form: $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathbf{y}} \mathbf{x}^T$ (outer product)

Einsum notation: 'k,i->ki' where $k$ indexes $\partial L/\partial \mathbf{y}$ and $i$ indexes $\mathbf{x}$ .

Practice Problems

📝Problem 1: Batch Attention Computation

Problem: In transformer attention, given $Q \in \mathbb{R}^{B \times T \times d}$ , $K \in \mathbb{R}^{B \times S \times d}$ , and $V \in \mathbb{R}^{B \times S \times d}$ , write the full attention computation $\text{softmax}(QK^T/\sqrt{d})V$ using einsum notation.

Solution:

Step 1: Compute attention scores: $A_{bts} = \sum_d Q_{btd}K_{bsd}$ Einsum: 'btd,bsd->bts'

Step 2: Scale by $\sqrt{d}$ : $A_{bts} \leftarrow A_{bts}/\sqrt{d}$

Step 3: Apply softmax over the $s$ dimension: $\hat{A}_{bts} = \text{softmax}_s(A_{bts})$

Step 4: Compute weighted sum: $O_{btd} = \sum_s \hat{A}_{bts}V_{bsd}$ Einsum: 'bts,bsd->btd'

Combined: scores = 'btd,bsd->bts'; out = 'bts,bsd->btd'

Result shape: $O \in \mathbb{R}^{B \times T \times d}$

The attention mechanism is a batched weighted average over source positions, where weights are determined by query-key similarity.

📝Problem 2: Frobenius Norm via Einstein Summation

Problem: Express the Frobenius norm of a matrix $A \in \mathbb{R}^{m \times n}$ using Einstein summation, and verify that it equals $\sqrt{\text{tr}(A^T A)}$ .

Solution:

The Frobenius norm squared is:

\|A\|_F^2 = A_{ij}A_{ij} = \sum_{i=1}^{m}\sum_{j=1}^{n} A_{ij}^2

Einsum notation: 'ij,ij->'

Verification: $(A^T A)_{jj} = \sum_i A_{ij}^2$ , so:

\text{tr}(A^T A) = \sum_j \sum_i A_{ij}^2 = \sum_{i,j} A_{ij}^2 = \|A\|_F^2

Therefore $\|A\|_F = \sqrt{\text{tr}(A^T A)}$ ✓

Alternative expression: $\|A\|_F = \sqrt{\sum_i \sigma_i^2}$ where $\sigma_i$ are the singular values of $A$ .

📝Problem 3: Tensor Product of Two Vectors

Problem: Compute the tensor product $\mathbf{u} \otimes \mathbf{v}$ where $\mathbf{u} = (1, 2, 3)^T$ and $\mathbf{v} = (4, 5)^T$ .

Solution:

The tensor product is a rank-2 tensor: $(u \otimes v)_{ij} = u_i v_j$

u \otimes v = \begin{pmatrix} 1 \cdot 4 & 1 \cdot 5 \\ 2 \cdot 4 & 2 \cdot 5 \\ 3 \cdot 4 & 3 \cdot 5 \end{pmatrix} = \begin{pmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{pmatrix}

Result shape: $3 \times 2$ (outer product of a 3-vector with a 2-vector).

Einsum: 'i,j->ij' with $u = (1,2,3)$ and $v = (4,5)$ .

Key properties:

$\text{rank}(u \otimes v) = 1$ (rank-1 matrix)
$\text{tr}(u \otimes v) = \sum_i u_i v_i = \mathbf{u} \cdot \mathbf{v}$ (dot product)
$\|u \otimes v\|_F = \|\mathbf{u}\| \cdot \|\mathbf{v}\|$

📝Problem 4: Index Raising and Lowering

Problem: Given a metric $g_{ij} = \begin{pmatrix} 2 & 1 \\ 1 & 3 \end{pmatrix}$ and covariant vector $v_j = (3, 7)$ , find the contravariant components $v^i$ .

Solution:

Step 1: Compute the inverse metric $g^{ij}$ .

g^{ij} = \frac{1}{\det(g)} \begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix}

\det(g) = 2 \cdot 3 - 1 \cdot 1 = 5

g^{ij} = \frac{1}{5}\begin{pmatrix} 3 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 0.6 & -0.2 \\ -0.2 & 0.4 \end{pmatrix}

Step 2: Raise the index: $v^i = g^{ij}v_j$

v^1 = g^{11}v_1 + g^{12}v_2 = 0.6 \times 3 + (-0.2) \times 7 = 1.8 - 1.4 = 0.4

v^2 = g^{21}v_1 + g^{22}v_2 = (-0.2) \times 3 + 0.4 \times 7 = -0.6 + 2.8 = 2.2

Result: $v^i = (0.4, 2.2)$

Verification: Lower back: $v_j = g_{ij}v^i = g_{11}(0.4) + g_{12}(2.2) = 2(0.4) + 1(2.2) = 0.8 + 2.2 = 3$ ✓

And $v_2 = g_{21}(0.4) + g_{22}(2.2) = 1(0.4) + 3(2.2) = 0.4 + 6.6 = 7$ ✓

📝Problem 5: Kronecker Product Properties

Problem: Given $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ and $B = \begin{pmatrix} 0 & 5 \\ 6 & 7 \end{pmatrix}$ , compute $A \otimes B$ and verify that $\text{tr}(A \otimes B) = \text{tr}(A) \cdot \text{tr}(B)$ .

Solution:

Step 1: Compute the Kronecker product:

A \otimes B = \begin{pmatrix} 1 \cdot B & 2 \cdot B \\ 3 \cdot B & 4 \cdot B \end{pmatrix} = \begin{pmatrix} 0 & 5 & 0 & 10 \\ 6 & 7 & 12 & 14 \\ 0 & 15 & 0 & 20 \\ 18 & 21 & 24 & 28 \end{pmatrix}

Step 2: Compute trace of $A \otimes B$ :

\text{tr}(A \otimes B) = 0 + 7 + 0 + 28 = 35

Step 3: Compute $\text{tr}(A) \cdot \text{tr}(B)$ :

\text{tr}(A) = 1 + 4 = 5, \quad \text{tr}(B) = 0 + 7 = 7

\text{tr}(A) \cdot \text{tr}(B) = 5 \times 7 = 35

Verification: $\text{tr}(A \otimes B) = 35 = \text{tr}(A) \cdot \text{tr}(B)$ ✓

Additional properties of Kronecker product:

$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$ (mixed-product property)
$\text{rank}(A \otimes B) = \text{rank}(A) \cdot \text{rank}(B)$
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ (when inverses exist)

Common Mistakes

Mistake	Correct Approach
Confusing $a_{ij}b_{ij}$ (dot product) with $a_{ij}b_{jk}$ (matrix multiply)	Check which indices are repeated vs. free
Forgetting that einsum `'ij,kl->ijkl'` is outer product, not contraction	Only repeated indices in the same term are summed
Assuming $A_{ik}B_{kj} = B_{kj}A_{ik}$ (matrix multiply is not commutative)	Order matters: $AB \neq BA$ in general
Using same index name for different dimensions	Each dummy index must appear exactly twice; use distinct names
Ignoring index alignment in batch operations	Batch dimensions must match or be broadcastable
Confusing covariant (lower) and contravariant (upper) indices	Upper indices transform with Jacobian, lower with inverse Jacobian
Forgetting that einsum does not broadcast	All dimensions must match or appear in the output

Connections to Machine Learning

ℹ️ Connections to Machine Learning

Tensor calculus is the foundation of deep learning computation. Backpropagation computes gradients that are themselves tensors, and the chain rule becomes a tensor contraction. Attention mechanisms in transformers are tensor operations: $QK^T$ is a batched outer product over sequence positions. Convolutional layers apply weight tensors to input tensors via einsum-like operations. Understanding tensor calculus enables efficient implementation of custom gradient functions, understanding of framework internals, and development of novel architectures. The tensor rank decomposition connects to model compression: low-rank factorization of weight matrices reduces parameters while preserving function. Tensor networks (MPS, Tensor Train) provide structured parameterizations that reduce the exponential cost of representing high-order correlations.

Exam/Interview Questions

Q1: Write the einsum notation for computing the matrix $C = AB^T$ where $A \in \mathbb{R}^{m \times k}$ and $B \in \mathbb{R}^{n \times k}$ .

Answer: 'ik,jk->ij'. The summation is over $k$ (the shared dimension), and the free indices $i$ and $j$ give the row and column of $C \in \mathbb{R}^{m \times n}$ .

Q2: Explain the difference between a covariant and contravariant tensor.

Answer: A covariant tensor (lower indices $T_{ij}$ ) transforms with the Jacobian of the coordinate change: $T'_{ij} = \frac{\partial x^k}{\partial x'^i}\frac{\partial x^l}{\partial x'^j}T_{kl}$ . A contravariant tensor (upper indices $T^{ij}$ ) transforms with the inverse Jacobian: $T'^{ij} = \frac{\partial x'^i}{\partial x^k}\frac{\partial x'^j}{\partial x^l}T^{kl}$ . Intuitively, covariant components shrink when the coordinate system expands, while contravariant components expand.

Q3: What is the tensor product of two vectors, and how does it relate to the outer product?

Answer: The tensor product $\mathbf{u} \otimes \mathbf{v}$ produces a rank-2 tensor with components $(u \otimes v)_{ij} = u_i v_j$ . This is identical to the outer product in linear algebra. The tensor product generalizes to higher-order tensors: the product of a rank-2 and rank-1 tensor gives a rank-3 tensor.

Q4: Why is the Frobenius norm of a matrix equal to $\sqrt{\text{tr}(A^T A)}$ ?

Answer: $(A^T A)_{jj} = \sum_i A_{ij}^2$ , so $\text{tr}(A^T A) = \sum_j \sum_i A_{ij}^2 = \sum_{i,j} A_{ij}^2 = \|A\|_F^2$ . Taking the square root gives the Frobenius norm. This identity is useful for computing norms without explicitly forming the full matrix.

Q5: In transformer attention, the operation is $\text{softmax}(QK^T/\sqrt{d})V$ . Explain what tensor operations are involved.

Answer: $QK^T$ is a batched matrix multiplication with einsum 'btd,bsd->bts', producing attention scores. Division by $\sqrt{d}$ is elementwise scaling. Softmax is applied over the $s$ (source) dimension. The final multiplication with $V$ is 'bts,bsd->btd', a weighted sum over source positions. All operations are tensor contractions or elementwise.

Q6: What is the rank of a tensor, and why does it matter for model compression?

Answer: The tensor rank is the minimum number of rank-1 tensors needed to express a tensor as a sum: $T = \sum_{s=1}^{r} a_s^{(1)} \otimes \cdots \otimes a_s^{(k)}$ . A low-rank weight tensor can be decomposed into smaller factors, reducing the number of parameters from $O(n_1 n_2 \cdots n_k)$ to $O(r \sum n_i)$ . This is the basis of tensor decomposition methods for neural network compression.

Q7: How does backpropagation use tensor contractions?

Answer: Backpropagation applies the chain rule through each layer. For $y = Wx$ , the gradient $\frac{\partial L}{\partial W} = \frac{\partial L}{\partial y} x^T$ is an outer product (tensor contraction with the input). For multi-dimensional tensors (e.g., convolutions), the gradient is computed via transposed convolution operations, which are also tensor contractions. Einsum notation makes these operations explicit and ensures correct index alignment.

Quick Reference

Operation	Einstein Notation	Einsum String	Result Shape
Matrix Multiply	$C_{ij} = A_{ik}B_{kj}$	`'ik,kj->ij'`	$(m, n)$
Batch MatMul	$C_{bij} = A_{bik}B_{bkj}$	`'bik,bkj->bij'`	$(b, m, n)$
Dot Product	$c = a_i b_i$	`'i,i->'`	scalar
Outer Product	$C_{ij} = a_i b_j$	`'i,j->ij'`	$(m, n)$
Trace	$c = A_{ii}$	`'ii->'`	scalar
Frobenius Norm	$\\|A\\|_F = \sqrt{A_{ij}A_{ij}}$	`'ij,ij->'`	scalar
Transpose	$B_{ij} = A_{ji}$	`'ij->ji'`	$(n, m)$
Diagonal	$d_i = A_{ii}$	`'ii->i'`	$(n,)$
Sum	$s = A_{ij}$	`'ij->'`	scalar
Elementwise Multiply	$C_{ij} = A_{ij}B_{ij}$	`'ij,ij->ij'`	$(m, n)$
Matrix Norm	$\\|A\\| = \sqrt{A_{ij}A_{kl}}$	`'ij,kl->'`	scalar
Batch Trace	$t_b = A_{bii}$	`'bii->b'`	$(b,)$

Cross-References

097-advanced-differential-geometry — Metric tensors on manifolds extend the concept of inner products to curved spaces; Christoffel symbols are tensor expressions
098-advanced-functional-analysis — Tensor products of vector spaces generalize to infinite-dimensional Hilbert spaces; operator tensors in quantum mechanics
099-advanced-measure-theory — Integration of tensor fields requires measure-theoretic foundations; tensor-valued measures
Linear Algebra (earlier modules) — Matrix operations, eigenvalues, and SVD are foundational to tensor decompositions
Neural Networks: Backpropagation computes gradients as tensor contractions through the computational graph