Multivariable Calculus

ℹ️ Why It Matters

Multivariable calculus is the mathematical foundation of modern machine learning and artificial intelligence. Every neural network training step involves computing gradients in high-dimensional spaces, where the loss function depends on millions or billions of parameters. Understanding partial derivatives, the gradient vector, directional derivatives, and second-order information (the Hessian matrix) enables you to diagnose training issues, design better optimization algorithms, and understand why certain methods converge while others fail. In generative models such as normalizing flows, change of variables through the Jacobian determinant is essential for computing probability densities. Double and multiple integrals appear in Bayesian inference, expectation calculations, and probabilistic models. Advanced AI research and implementation rely on these tools throughout.

Multivariable Functions

Definition: Multivariable Function

A multivariable function is a mapping $f: \mathbb{R}^n \to \mathbb{R}^m$ that takes a vector of $n$ inputs and produces a vector of $m$ outputs. When $m = 1$ , it is a scalar-valued function; when $m > 1$ , it is a vector-valued function. Common notation includes $f(x_1, x_2, \ldots, x_n)$ for scalar functions and $\vec{f}(\vec{x})$ for vector functions.

Scalar Function

f: \mathbb{R}^n \to \mathbb{R}, \quad \vec{x} = (x_1, x_2, \ldots, x_n) \mapsto f(\vec{x})

Here,

$n$ =Number of input variables (dimension)
$\vec{x}$ =Input vector in n-dimensional space
$f(\vec{x})$ =Scalar output value

Vector Function

\vec{f}: \mathbb{R}^n \to \mathbb{R}^m, \quad \vec{x} \mapsto (f_1(\vec{x}), f_2(\vec{x}), \ldots, f_m(\vec{x}))

Here,

$m$ =Number of output components
$f_i$ =The i-th component function

Example: A neural network layer with 256 inputs and 64 outputs is a vector function $\vec{f}: \mathbb{R}^{256} \to \mathbb{R}^{64}$ .
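
As a rough illustration of the example above, the sketch below builds a layer of that shape in NumPy. The weight matrix `W`, the bias `b`, and the `tanh` activation are placeholder assumptions chosen only to make the shapes concrete; the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer parameters; only the shapes matter here.
W = rng.standard_normal((64, 256)) * 0.1   # maps R^256 -> R^64
b = np.zeros(64)

def layer(x):
    """Vector-valued function f: R^256 -> R^64 with components f_1, ..., f_64."""
    return np.tanh(W @ x + b)

x = rng.standard_normal(256)   # one input vector in R^256
y = layer(x)
print(x.shape, y.shape)        # (256,) (64,)
```

Each entry `y[i]` is the value of one component function $f_i$ evaluated at the same input vector.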

Partial Derivatives Review

ℹ️ Quick Summary

The partial derivative of $f(x_1, \ldots, x_n)$ with respect to $x_i$ is the derivative of $f$ while holding all other variables constant. It measures how $f$ changes as only $x_i$ varies.

Partial Derivative

\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}

Here,

$\frac{\partial f}{\partial x_i}$ =Partial derivative with respect to x_i
$h$ =Increment in x_i (taken to zero in the limit)

Key Rules:

Sum Rule: $\frac{\partial}{\partial x_i}(f + g) = \frac{\partial f}{\partial x_i} + \frac{\partial g}{\partial x_i}$
Product Rule: $\frac{\partial}{\partial x_i}(f \cdot g) = \frac{\partial f}{\partial x_i} g + f \frac{\partial g}{\partial x_i}$
Chain Rule: $\frac{\partial}{\partial x_i} f(g_1(\vec{x}), \ldots) = \sum_j \frac{\partial f}{\partial g_j} \frac{\partial g_j}{\partial x_i}$
Clairaut's Theorem: $\frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}$ (for continuous second derivatives)
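
The limit definition above can be watched converging numerically. The sketch below uses an illustrative function $f(x, y) = x^2 y$ (an assumption, not taken from the text) and shrinks $h$ in the difference quotient while holding $y$ fixed.

```python
def f(x, y):
    # Illustrative function (an assumption, not from the text): f(x, y) = x^2 * y
    return x**2 * y

x0, y0 = 3.0, 2.0
exact = 2 * x0 * y0   # analytic partial derivative df/dx = 2xy = 12

estimates = []
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    # Difference quotient from the definition: only x is perturbed, y is held fixed
    quotient = (f(x0 + h, y0) - f(x0, y0)) / h
    estimates.append(quotient)
    print(f"h = {h:.0e}: quotient = {quotient:.6f}, error = {abs(quotient - exact):.2e}")
```

For this function the quotient works out to $12 + 2h$, so the error shrinks linearly with $h$, as expected from a one-sided difference.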

The Gradient Vector

Definition: Gradient Vector

The gradient of a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$ is the vector of all first-order partial derivatives. It points in the direction of steepest ascent and has magnitude equal to the maximum rate of change of $f$ .

Gradient

\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)

Here,

$\nabla f$ =Gradient vector (nabla f)
$\frac{\partial f}{\partial x_i}$ =Partial derivative along i-th axis

ℹ️ Geometric Interpretation

The gradient $\nabla f(\vec{x}_0)$ is perpendicular (normal) to the level curve $f(\vec{x}) = c$ passing through $\vec{x}_0$ .
The gradient points in the direction of steepest increase of $f$ .
The negative gradient $-\nabla f$ points in the direction of steepest decrease.
The magnitude $|\nabla f|$ equals the maximum rate of change at that point.

Gradient Magnitude as Directional Max

|\nabla f| = \max_{\|\vec{u}\| = 1} D_{\vec{u}} f = \max_{\|\vec{u}\| = 1} \nabla f \cdot \vec{u}

Here,

$\vec{u}$ =Unit direction vector
$D_{\vec{u}} f$ =Directional derivative in direction u

Example: For $f(x, y) = x^2 y + \sin(y)$ :

\nabla f = \left( 2xy, \; x^2 + \cos(y) \right)

At point $(1, \pi)$ : $\nabla f(1, \pi) = (2\pi, \; 1 - 1) = (2\pi, 0)$
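
A quick way to sanity-check a hand-computed gradient like the one above is to compare it with central differences. The helper name `numerical_gradient` and the step size `h` below are illustrative choices, not part of the original text.

```python
import numpy as np

def f(v):
    x, y = v
    return x**2 * y + np.sin(y)

def numerical_gradient(f, v, h=1e-6):
    """Central-difference estimate of the gradient of a scalar function."""
    v = np.asarray(v, dtype=float)
    grad = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step[i] = h
        grad[i] = (f(v + step) - f(v - step)) / (2 * h)
    return grad

point = np.array([1.0, np.pi])
# Analytic gradient (2xy, x^2 + cos(y)) evaluated at the point
analytic = np.array([2 * point[0] * point[1], point[0]**2 + np.cos(point[1])])
numeric = numerical_gradient(f, point)
print("analytic:", analytic)
print("numeric: ", numeric)
```

Both results should agree with $(2\pi, 0)$ up to finite-difference error.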

Directional Derivative

Definition: Directional Derivative

The directional derivative of $f$ at point $\vec{x}_0$ in the direction of unit vector $\vec{u}$ measures the rate of change of $f$ as we move from $\vec{x}_0$ in direction $\vec{u}$ .

Directional Derivative

D_{\vec{u}} f(\vec{x}_0) = \lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} = \nabla f(\vec{x}_0) \cdot \vec{u}

Here,

$D_{\vec{u}} f$ =Directional derivative in direction u
$\vec{u}$ =Unit vector (||u|| = 1)
$\nabla f \cdot \vec{u}$ =Dot product with gradient

Theorem: Cauchy-Schwarz Bound

The directional derivative is bounded by:

|D_{\vec{u}} f| \leq |\nabla f| \cdot |\vec{u}| = |\nabla f|

Equality holds when $\vec{u}$ is parallel to $\nabla f$ (maximum ascent) or anti-parallel (maximum descent).

⚠️ Important

The direction vector $\vec{u}$ must be a unit vector for the formula $D_{\vec{u}} f = \nabla f \cdot \vec{u}$ to give the rate of change per unit distance. If $\vec{v}$ is not normalized, use $\vec{u} = \vec{v}/|\vec{v}|$ .

Example: For $f(x,y) = x^2 + y^2$ at $(1,1)$ , direction $\vec{u} = (1/\sqrt{2}, 1/\sqrt{2})$ :

$\nabla f(1,1) = (2, 2)$ , so $D_{\vec{u}} f = 2 \cdot \frac{1}{\sqrt{2}} + 2 \cdot \frac{1}{\sqrt{2}} = 2\sqrt{2}$
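
The same example can be reproduced in a few lines, which also shows what goes wrong when the direction is left un-normalized. This is a minimal sketch; the step size `h` used for the difference quotient is an arbitrary small value.

```python
import numpy as np

def f(p):
    return p[0]**2 + p[1]**2

def grad_f(p):
    return np.array([2 * p[0], 2 * p[1]])

x0 = np.array([1.0, 1.0])
v = np.array([1.0, 1.0])            # un-normalized direction
u = v / np.linalg.norm(v)           # unit vector (1/sqrt(2), 1/sqrt(2))

d_formula = grad_f(x0) @ u          # gradient dotted with the unit direction

h = 1e-6
d_quotient = (f(x0 + h * u) - f(x0)) / h   # limit definition with a small h

d_unnormalized = grad_f(x0) @ v     # skipping normalization scales the result by |v|
print(d_formula, d_quotient, d_unnormalized)
```

The first two values approximate $2\sqrt{2} \approx 2.83$, while the un-normalized dot product gives $4$, which is the rate of change per unit of $\vec{v}$ rather than per unit distance.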

Hessian Matrix

Definition: Hessian Matrix

The Hessian matrix of a scalar-valued function $f: \mathbb{R}^n \to \mathbb{R}$ is the $n \times n$ matrix of all second-order partial derivatives. It captures the local curvature (convexity/concavity) of $f$ and is essential for classifying critical points and designing second-order optimization methods.

Hessian

H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \left( \frac{\partial f}{\partial x_j} \right)

Here,

$H_{ij}$ =The (i,j) entry of the Hessian
$f$ =The scalar-valued function
$n$ =Dimension of the input space

ℹ️ Properties

The Hessian is symmetric whenever the second partial derivatives are continuous (by Clairaut's theorem): $H_{ij} = H_{ji}$ for all $i, j$.
In that case it has at most $n(n+1)/2$ distinct entries (not $n^2$).
At a critical point ( $\nabla f = 0$ ), the Hessian determines whether the point is a minimum, maximum, or saddle point.
The trace of the Hessian equals the Laplacian: $\text{tr}(H) = \nabla^2 f = \sum_i \frac{\partial^2 f}{\partial x_i^2}$ .

Example: For $f(x, y) = x^4 + y^4 - 4xy + 1$ :

\nabla f = (4x^3 - 4y, \; 4y^3 - 4x)

H = \begin{bmatrix} 12x^2 & -4 \\ -4 & 12y^2 \end{bmatrix}

At critical point $(1,1)$ : $H = \begin{bmatrix} 12 & -4 \\ -4 & 12 \end{bmatrix}$

Multivariable Taylor Series

Second-Order Taylor Expansion

f(\vec{x}_0 + \vec{h}) = f(\vec{x}_0) + \nabla f(\vec{x}_0)^T \vec{h} + \frac{1}{2} \vec{h}^T H(\vec{x}_0) \vec{h} + O(\|\vec{h}\|^3)

Here,

$\vec{x}_0$ =Expansion point
$\vec{h}$ =Small displacement vector
$\nabla f(\vec{x}_0)$ =Gradient at x_0
$H(\vec{x}_0)$ =Hessian at x_0

ℹ️ Interpretation of Terms

Zeroth order: $f(\vec{x}_0)$ — the function value at the expansion point.
First order: $\nabla f^T \vec{h}$ — linear approximation using the gradient.
Second order: $\frac{1}{2} \vec{h}^T H \vec{h}$ — curvature correction using the Hessian.
Higher-order terms $O(\|\vec{h}\|^3)$ become negligible for small $\vec{h}$ .

Theorem: Taylor's Theorem with Remainder

For $f: \mathbb{R}^n \to \mathbb{R}$ with continuous partial derivatives up to order $k+1$ :

f(\vec{x}_0 + \vec{h}) = \sum_{|\alpha| \leq k} \frac{D^\alpha f(\vec{x}_0)}{\alpha!} \vec{h}^\alpha + R_k(\vec{h})

where $\alpha$ is a multi-index and $R_k$ is the remainder term.

Second-order approximation (scalar case):

f(x_0 + h) \approx f(x_0) + f'(x_0)h + \frac{1}{2}f''(x_0)h^2

Vector form: $\vec{h}^T H \vec{h} = \sum_{i,j} H_{ij} h_i h_j$ is a quadratic form.
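
The $O(\|\vec{h}\|^3)$ claim can be checked empirically: halving the displacement should cut the approximation error by about $2^3 = 8$. The sketch below uses an illustrative function $f(x, y) = e^x \cos y$ expanded at the origin; the function and the sample displacements are assumptions made for the demonstration.

```python
import numpy as np

def f(v):
    # Illustrative function (an assumption, not from the text): f(x, y) = e^x * cos(y)
    return np.exp(v[0]) * np.cos(v[1])

x0 = np.array([0.0, 0.0])
f0 = 1.0                                  # f(0, 0)
grad = np.array([1.0, 0.0])               # (e^x cos y, -e^x sin y) at the origin
hess = np.array([[1.0, 0.0],
                 [0.0, -1.0]])            # second partials at the origin

def taylor2(h):
    """Second-order Taylor approximation of f around x0."""
    return f0 + grad @ h + 0.5 * h @ hess @ h

errors = []
for t in (0.1, 0.05, 0.025):
    h = np.array([t, t])
    err = abs(f(x0 + h) - taylor2(h))
    errors.append(err)
    print(f"displacement ({t}, {t}): error = {err:.3e}")

# Halving h should shrink the error by roughly 2^3 = 8
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
print("error ratios:", ratios)
```

Ratios close to 8 are the numerical signature of a cubic remainder term.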

Critical Points Classification

Definition: Critical Point

A point $\vec{x}_0$ is a critical point of $f$ if $\nabla f(\vec{x}_0) = \vec{0}$ (all partial derivatives are zero simultaneously).

Theorem: Second Derivative Test

At a critical point $\vec{x}_0$ where $\nabla f(\vec{x}_0) = \vec{0}$ , classify using the Hessian $H(\vec{x}_0)$ :

Positive definite (all eigenvalues $> 0$ ): $\vec{x}_0$ is a local minimum.
Negative definite (all eigenvalues $< 0$ ): $\vec{x}_0$ is a local maximum.
Indefinite (eigenvalues of mixed signs): $\vec{x}_0$ is a saddle point.
Semi-definite but not definite (at least one eigenvalue $= 0$, the rest sharing one sign): Test is inconclusive; need higher-order analysis.

Determinant Criterion (2D)

\det(H) = H_{11}H_{22} - H_{12}^2

Here,

$H_{11}$ =Second partial w.r.t. x
$H_{22}$ =Second partial w.r.t. y
$H_{12}$ =Mixed partial

Classification rules for $f(x,y)$ :

$H_{11}$	$\det(H)$	Classification
$> 0$	$> 0$	Local minimum
$< 0$	$> 0$	Local maximum
—	$< 0$	Saddle point
—	$= 0$	Inconclusive

Example: $f(x,y) = x^2 + y^2 - 2x - 4y + 5$

\nabla f = (2x - 2, \; 2y - 4) = (0,0) \implies (x,y) = (1,2)

$H = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}$ , eigenvalues $\{2, 2\} > 0 \implies$ local minimum
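
The eigenvalue form of the second derivative test translates directly into code. The function name `classify_critical_point` and the tolerance `tol` (used to decide when an eigenvalue counts as zero) are assumptions for this sketch; the Hessians passed in are the ones computed by hand in this document.

```python
import numpy as np

def classify_critical_point(H, tol=1e-10):
    """Classify a critical point from the eigenvalues of its (symmetric) Hessian."""
    eig = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(eig > tol):
        return "local minimum"
    if np.all(eig < -tol):
        return "local maximum"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point"
    return "inconclusive"

# f(x, y) = x^2 + y^2 - 2x - 4y + 5 at (1, 2)
print(classify_critical_point([[2, 0], [0, 2]]))        # local minimum

# f(x, y) = x^4 + y^4 - 4xy + 1 has Hessian [[12x^2, -4], [-4, 12y^2]]
print(classify_critical_point([[12, -4], [-4, 12]]))    # at (1, 1): local minimum
print(classify_critical_point([[0, -4], [-4, 0]]))      # at (0, 0): saddle point

# Zero Hessian: the test cannot decide
print(classify_critical_point([[0, 0], [0, 0]]))        # inconclusive
```

Using `eigvalsh` rather than `eigvals` relies on the Hessian being symmetric and returns real eigenvalues in ascending order.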

Double Integrals

Double Integral over a Region

\iint_D f(x,y) \, dA = \int_a^b \int_{g_1(x)}^{g_2(x)} f(x,y) \, dy \, dx

Here,

$D$ =The region of integration in the xy-plane
$f(x,y)$ =The integrand function
$g_1(x), g_2(x)$ =Lower and upper bounds for y

ℹ️ Geometric Interpretation

If $f(x,y) \geq 0$ , the double integral $\iint_D f(x,y) \, dA$ represents the volume under the surface $z = f(x,y)$ and above the region $D$ .
If $f(x,y) = 1$ , the integral $\iint_D 1 \, dA = \text{Area}(D)$ gives the area of region $D$ .
Probability density: $\iint_D p(x,y) \, dA$ gives the probability that $(X,Y) \in D$ .
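
To see the iterated-integral formula at work, the sketch below approximates it with nested midpoint sums in plain Python. The region (the triangle $0 \leq x \leq 1$, $0 \leq y \leq x$), the integrand $x + y$, and the helper name `double_integral` are all illustrative assumptions; the exact value for this choice is $\int_0^1 \tfrac{3}{2}x^2 \, dx = \tfrac{1}{2}$.

```python
def double_integral(f, a, b, g1, g2, n=200):
    """Midpoint-rule approximation of the iterated integral
    int_a^b int_{g1(x)}^{g2(x)} f(x, y) dy dx."""
    hx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * hx            # midpoint of the i-th x-strip
        lo, hi = g1(x), g2(x)             # y-bounds depend on x
        hy = (hi - lo) / n
        inner = sum(f(x, lo + (j + 0.5) * hy) for j in range(n)) * hy
        total += inner * hx
    return total

# Triangle D = {0 <= x <= 1, 0 <= y <= x}
volume = double_integral(lambda x, y: x + y, 0.0, 1.0, lambda x: 0.0, lambda x: x)
area = double_integral(lambda x, y: 1.0, 0.0, 1.0, lambda x: 0.0, lambda x: x)
print(f"volume under z = x + y: {volume:.6f}")   # close to 0.5
print(f"area of D:              {area:.6f}")     # close to 0.5
```

Integrating the constant function 1 recovers the area of the triangle, matching the second interpretation above.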

Triple Integral

\iiint_V f(x,y,z) \, dV = \int_a^b \int_{g_1(x)}^{g_2(x)} \int_{h_1(x,y)}^{h_2(x,y)} f(x,y,z) \, dz \, dy \, dx

Here,

$V$ =Volume region in 3D space
$dV$ =Volume element dx dy dz

ℹ️ Polar Coordinates

For circular regions, use polar coordinates: $x = r\cos\theta$ , $y = r\sin\theta$ .

\iint_D f(x,y) \, dx \, dy = \int_\alpha^\beta \int_0^R f(r\cos\theta, r\sin\theta) \, r \, dr \, d\theta

The factor $r$ is the Jacobian determinant of the polar coordinate transformation.

Change of Variables

Definition: Jacobian Determinant

When changing variables in a multiple integral, the Jacobian determinant accounts for the distortion of area/volume caused by the coordinate transformation. For a transformation $(x, y) = T(u, v)$, the area element transforms as $dx \, dy = |\det(J_T)| \, du \, dv$.

Jacobian Determinant

\det(J_T) = \det \begin{bmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{bmatrix}

Here,

$J_T$ =Jacobian matrix of the transformation T
$|\det(J_T)|$ =Absolute value of Jacobian determinant

Change of Variables Formula (2D)

\iint_{T(D)} f(x,y) \, dx \, dy = \iint_D f(T(u,v)) \, |\det(J_T)| \, du \, dv

Here,

$T$ =Transformation from (u,v) to (x,y)
$D$ =Region in the (u,v) plane
$T(D)$ =Transformed region in the (x,y) plane

Common transformations:

Transformation	$x$	$y$	$z$	$|\det(J)|$
Polar	$r\cos\theta$	$r\sin\theta$	n/a	$r$
Cylindrical	$r\cos\theta$	$r\sin\theta$	$z$	$r$
Spherical	$\rho\sin\phi\cos\theta$	$\rho\sin\phi\sin\theta$	$\rho\cos\phi$	$\rho^2\sin\phi$
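
The determinants in the table can be derived symbolically rather than memorized. The sketch below uses SymPy's `Matrix.jacobian`; for the spherical case it assumes the common convention $z = \rho\cos\phi$ with $\phi$ measured from the $z$-axis and the variable order $(\rho, \phi, \theta)$.

```python
import sympy as sp

r, theta = sp.symbols("r theta", positive=True)
rho, phi = sp.symbols("rho phi", positive=True)

# Polar: (r, theta) -> (x, y)
polar = sp.Matrix([r * sp.cos(theta), r * sp.sin(theta)])
det_polar = sp.simplify(polar.jacobian([r, theta]).det())

# Spherical: (rho, phi, theta) -> (x, y, z), with phi measured from the z-axis
spherical = sp.Matrix([
    rho * sp.sin(phi) * sp.cos(theta),
    rho * sp.sin(phi) * sp.sin(theta),
    rho * sp.cos(phi),
])
det_spherical = sp.simplify(spherical.jacobian([rho, phi, theta]).det())

print(det_polar)      # simplifies to r
print(det_spherical)  # simplifies to rho**2*sin(phi)
```

Swapping the order of two variables flips the sign of the determinant, which is one reason the change of variables formula uses the absolute value.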

Theorem: Inverse Function Theorem

If $T: \mathbb{R}^n \to \mathbb{R}^n$ is continuously differentiable and $\det(J_T(\vec{x}_0)) \neq 0$ , then $T$ is locally invertible near $\vec{x}_0$ and the Jacobian of $T^{-1}$ is $J_{T^{-1}}(T(\vec{x}_0)) = J_T(\vec{x}_0)^{-1}$ .
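
The Jacobian identity in the theorem can be checked on the polar map, whose inverse is known in closed form. In this sketch the evaluation point `(r0, theta0)` is arbitrary (any point with $r \neq 0$ works), and both Jacobians are written out by hand.

```python
import numpy as np

def jacobian_T(r, theta):
    """Jacobian of T(r, theta) = (r cos(theta), r sin(theta))."""
    return np.array([[np.cos(theta), -r * np.sin(theta)],
                     [np.sin(theta),  r * np.cos(theta)]])

def jacobian_T_inv(x, y):
    """Jacobian of T^{-1}(x, y) = (sqrt(x^2 + y^2), atan2(y, x))."""
    r2 = x**2 + y**2
    r = np.sqrt(r2)
    return np.array([[ x / r,  y / r],
                     [-y / r2, x / r2]])

r0, theta0 = 2.0, 0.7                       # arbitrary point with r0 != 0
x0, y0 = r0 * np.cos(theta0), r0 * np.sin(theta0)

J = jacobian_T(r0, theta0)
J_inv = jacobian_T_inv(x0, y0)

print(np.linalg.det(J))                     # equals r0 up to rounding, so nonzero
print(np.allclose(J_inv, np.linalg.inv(J))) # Jacobian of the inverse = inverse of the Jacobian
```

At $r = 0$ the determinant vanishes and the theorem gives no guarantee, which matches the fact that the angle is undefined at the origin.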

Python Implementation

import numpy as np
import sympy as sp

# ============================================
# 1. Partial Derivatives with SymPy
# ============================================
x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)

# Partial derivatives
df_dx = sp.diff(f, x)
df_dy = sp.diff(f, y)
print(f"f = {f}")
print(f"∂f/∂x = {df_dx}")      # 2*x*y
print(f"∂f/∂y = {df_dy}")      # x**2 + cos(y)

# Gradient
grad_f = [df_dx, df_dy]
print(f"∇f = {grad_f}")

# ============================================
# 2. Hessian Matrix with SymPy
# ============================================
f_hess = x**4 + y**4 - 4*x*y + 1
H = sp.hessian(f_hess, (x, y))
print(f"\nHessian of {f_hess}:")
sp.pprint(H)

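# Evaluate the Hessian at the critical point (1, 1) before taking eigenvalues
H_at_11 = H.subs({x: 1, y: 1})
print(f"Eigenvalues at (1,1): {H_at_11.eigenvals()}")  # eigenvalues 8 and 16, each with multiplicity 1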

# ============================================
# 3. Gradient Descent Implementation
# ============================================
def gradient_descent(grad_f, x0, lr=0.01, max_iter=1000, tol=1e-8):
    """Simple gradient descent optimization."""
    x = np.array(x0, dtype=float)
    history = [x.copy()]
    
    for i in range(max_iter):
        g = np.array(grad_f(x), dtype=float)
        if np.linalg.norm(g) < tol:
            print(f"Converged in {i} iterations")
            break
        x = x - lr * g
        history.append(x.copy())
    
    return x, np.array(history)

# Minimize f(x,y) = x^2 + y^2 - 2x - 4y + 5
# (named f_quad so it does not shadow the SymPy expression f defined above)
f_quad = lambda v: v[0]**2 + v[1]**2 - 2*v[0] - 4*v[1] + 5
grad = lambda v: np.array([2*v[0]-2, 2*v[1]-4])

x_min, hist = gradient_descent(grad, [5.0, 5.0], lr=0.1)
print(f"\nMinimum at: {x_min}")  # Should be [1, 2]

# ============================================
# 4. Numerical Jacobian
# ============================================
def numerical_jacobian(f, x, h=1e-7):
    """Compute Jacobian matrix numerically using central differences."""
    n = len(x)
    m = len(f(x))
    J = np.zeros((m, n))
    for j in range(n):
        x_plus = x.copy()
        x_minus = x.copy()
        x_plus[j] += h
        x_minus[j] -= h
        J[:, j] = (f(x_plus) - f(x_minus)) / (2 * h)
    return J

# Example: f(x,y) = [x^2 + y, x*y^2]
f_vec = lambda v: np.array([v[0]**2 + v[1], v[0]*v[1]**2])
x_point = np.array([1.0, 2.0])
J = numerical_jacobian(f_vec, x_point)
print(f"\nJacobian at (1,2):\n{J}")
# Expected: [[2, 1], [4, 4]]

# ============================================
# 5. Numerical Hessian
# ============================================
def numerical_hessian(f, x, h=1e-5):
    """Compute Hessian matrix numerically."""
    n = len(x)
    H = np.zeros((n, n))
    
    for i in range(n):
        for j in range(n):
            x_pp = x.copy()
            x_pm = x.copy()
            x_mp = x.copy()
            x_mm = x.copy()
            
            x_pp[i] += h; x_pp[j] += h
            x_pm[i] += h; x_pm[j] -= h
            x_mp[i] -= h; x_mp[j] += h
            x_mm[i] -= h; x_mm[j] -= h
            
            H[i, j] = (f(x_pp) - f(x_pm) - f(x_mp) + f(x_mm)) / (4 * h**2)
    
    return H

# Test on f(x,y) = x^4 + y^4 - 4xy + 1
f_test = lambda v: v[0]**4 + v[1]**4 - 4*v[0]*v[1] + 1
H_num = numerical_hessian(f_test, np.array([1.0, 1.0]))
print(f"\nNumerical Hessian at (1,1):\n{H_num}")

# ============================================
# 6. Double Integration with SciPy
# ============================================
from scipy import integrate

def integrand(y, x):
    return np.exp(-(x**2 + y**2))

# Integrate over the disk of radius 2 centered at the origin
result, error = integrate.dblquad(
    integrand,
    -2, 2,           # x bounds
    lambda x: -np.sqrt(4-x**2),  # y lower
    lambda x: np.sqrt(4-x**2)    # y upper
)
print(f"\nDouble integral over circle: {result:.6f}")
print(f"Expected (≈ π(1-e^(-4))): {np.pi*(1-np.exp(-4)):.6f}")

Applications in AI/ML

ℹ️ Optimization Landscapes

The loss function $L(\theta)$ in machine learning is a scalar function of the parameter vector $\theta \in \mathbb{R}^d$ . The geometry of this function — its gradient, curvature, and critical points — directly determines the behavior of training algorithms.

1. Gradient Descent: The most fundamental optimization algorithm uses $\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t)$ . The gradient $\nabla L$ provides the direction of steepest descent.

2. Second-Order Methods: Newton's method uses the Hessian and, near a minimum where the Hessian is positive definite, converges quadratically:

\theta_{t+1} = \theta_t - H^{-1}(\theta_t) \nabla L(\theta_t)

For plain gradient descent, the Hessian's eigenvalues govern the convergence rate: directions with eigenvalues close to zero converge slowly, and a large condition number (ratio of largest to smallest eigenvalue) signals ill-conditioning.

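3. Natural Gradient: The Fisher information matrix (the expected Hessian of the negative log-likelihood under the model's own distribution) measures how the model's predictive distribution changes with the parameters. Preconditioning the gradient with its inverse gives updates that are, to first order, independent of how the model is parameterized.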

4. Saddle Points in High Dimensions: In high-dimensional loss landscapes, saddle points are expected to vastly outnumber local minima. An indefinite Hessian (eigenvalues of both signs) identifies a saddle point; first-order methods can slow down dramatically near such points because the gradient is close to zero there.

5. Normalizing Flows: Use the change of variables formula with the Jacobian determinant to transform simple distributions into complex ones:

\log p_X(x) = \log p_Z(f^{-1}(x)) + \log |\det(J_{f^{-1}}(x))|

6. Backpropagation: Each layer's Jacobian matrix propagates gradients through the chain rule. Vanishing or exploding gradients arise when the singular values of the layer Jacobians are consistently below or above 1, so that their product shrinks or grows geometrically with depth.
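
The normalizing-flow formula in item 5 can be verified on the simplest possible flow. The sketch below assumes a hypothetical one-dimensional affine map $x = f(z) = az + b$ with a standard normal base distribution, so the resulting density must match $\mathcal{N}(b, a^2)$; the values of `a` and `b` are arbitrary.

```python
import math

def log_standard_normal(z):
    """Log-density of the base distribution Z ~ N(0, 1)."""
    return -0.5 * z**2 - 0.5 * math.log(2 * math.pi)

# Hypothetical affine flow x = f(z) = a*z + b, so f^{-1}(x) = (x - b) / a
a, b = 2.0, 1.0

def log_p_x(x):
    z = (x - b) / a                     # f^{-1}(x)
    log_det_inv = math.log(abs(1 / a))  # log |det J_{f^{-1}}(x)|; here the Jacobian is the scalar 1/a
    return log_standard_normal(z) + log_det_inv

def log_normal(x, mean, std):
    """Closed-form log-density of N(mean, std^2), used as a reference."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)

for x in (-1.0, 1.0, 3.5):
    print(x, log_p_x(x), log_normal(x, mean=b, std=abs(a)))
```

The two columns agree because stretching the base variable by $a$ spreads the same probability mass over a region $|a|$ times larger, which is exactly what the log-determinant term subtracts.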

Common Mistakes

📋Common Mistakes in Multivariable Calculus

Mistake	Correct Approach
Forgetting chain rule: $\frac{d}{dx}f(g(x)) = f'(g(x))$	$\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$
Ignoring mixed partials	Verify $f_{xy} = f_{yx}$ ; this holds when the second partial derivatives are continuous
Wrong integration order	Check bounds: $\int_a^b \int_{g_1(x)}^{g_2(x)} dy \, dx$ vs $\int_c^d \int_{h_1(y)}^{h_2(y)} dx \, dy$
Missing Jacobian determinant	Always include $|\det(J)|$ when changing variables
Confusing $\nabla f$ with $df$	$\nabla f$ is a vector; $df$ is a differential (covector)
Not normalizing direction vector	$\vec{u}$ must satisfy $\|\vec{u}\| = 1$ for directional derivative
Assuming Hessian is positive definite	Check eigenvalues, not just diagonal entries
Forgetting absolute value in Jacobian	Use $|\det(J)|$ for area/volume element

Interview Questions

📝Question 1: Gradient Properties

Q: Why is the gradient perpendicular to level curves?

💡Answer

A level curve is defined by $f(\vec{x}) = c$ . Taking the total differential: $\nabla f \cdot d\vec{x} = 0$ . Since $d\vec{x}$ is tangent to the level curve, $\nabla f$ must be perpendicular to it. This is because the directional derivative along the level curve is zero (the function value doesn't change), so $\nabla f \cdot \vec{t} = 0$ for any tangent vector $\vec{t}$ .

📝Question 2: Hessian in Optimization

Q: Why can't we always use Newton's method with the Hessian?

💡Answer

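Three main reasons: (1) Computing and storing the $n \times n$ Hessian requires $O(n^2)$ memory, infeasible for millions of parameters. (2) Solving the Newton system (or inverting the Hessian) costs $O(n^3)$ with direct methods. (3) Where the Hessian is not positive definite, as near saddle points, the Newton step is not guaranteed to be a descent direction, and the iteration is attracted to any nearby critical point, including saddles and maxima. Quasi-Newton methods such as L-BFGS address the first two issues by building a low-memory approximation of the inverse Hessian from recent gradients, and with a suitable line search they keep that approximation positive definite so each step remains a descent direction.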
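
For contrast, here is what Newton's method buys when the Hessian is small enough to afford. The sketch assumes a hypothetical two-parameter quadratic loss with condition number 100; the step size `lr` and iteration count are illustrative choices.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic: f(theta) = 0.5 * theta^T A theta
A = np.diag([1.0, 100.0])          # Hessian; condition number 100

def grad(theta):
    return A @ theta

theta0 = np.array([1.0, 1.0])

# Newton step: solve H d = grad instead of forming H^{-1} explicitly
theta_newton = theta0 - np.linalg.solve(A, grad(theta0))

# Gradient descent with a step size limited by the largest eigenvalue (lr < 2/100)
theta_gd = theta0.copy()
lr = 0.01
for _ in range(100):
    theta_gd = theta_gd - lr * grad(theta_gd)

print("Newton after 1 step:  ", theta_newton)
print("GD after 100 steps:   ", theta_gd)
```

Newton lands on the minimizer in one step because the loss is exactly quadratic, while gradient descent is still far from zero along the low-curvature direction. The `np.linalg.solve` call is the $O(n^3)$ operation that becomes prohibitive at scale.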

📝Question 3: Jacobian in Neural Networks

Q: How does the Jacobian relate to backpropagation?

💡Answer

Backpropagation applies the chain rule: if layer $l$ computes $\vec{a}^l = \sigma(W^l \vec{a}^{l-1} + \vec{b}^l)$, then $\frac{\partial L}{\partial \vec{a}^{l-1}} = (W^l)^T \text{diag}(\sigma') \frac{\partial L}{\partial \vec{a}^l}$. The Jacobian of layer $l$ is $J_l = \text{diag}(\sigma') W^l$, and the backward pass multiplies by its transpose. If the singular values of the $J_l$ are consistently $< 1$, gradients vanish as they pass through many layers; if consistently $> 1$, they explode.
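
The backward formula can be checked against finite differences on a tiny layer. Everything concrete in this sketch is an assumption made for the test: the layer sizes, the random parameters, the `tanh` activation, and the toy loss $L = \tfrac{1}{2}\|\vec{a}^l\|^2$ (chosen so that $\partial L / \partial \vec{a}^l = \vec{a}^l$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes and parameters
W = rng.standard_normal((3, 4))
b = rng.standard_normal(3)
a_prev = rng.standard_normal(4)

def forward(a_in):
    return np.tanh(W @ a_in + b)              # a^l = sigma(W a^{l-1} + b), sigma = tanh

def loss(a_in):
    return 0.5 * np.sum(forward(a_in) ** 2)   # toy loss, so dL/da^l = a^l

a = forward(a_prev)
sigma_prime = 1.0 - a**2                      # tanh'(z) = 1 - tanh(z)^2
dL_da = a

# Backprop formula: W^T diag(sigma') dL/da^l
grad_backprop = W.T @ (sigma_prime * dL_da)

# Independent check with central differences
h = 1e-6
grad_numeric = np.zeros_like(a_prev)
for i in range(a_prev.size):
    e = np.zeros_like(a_prev)
    e[i] = h
    grad_numeric[i] = (loss(a_prev + e) - loss(a_prev - e)) / (2 * h)

# Layer Jacobian (3 x 4) and its singular values
J = np.diag(sigma_prime) @ W
print(np.allclose(grad_backprop, grad_numeric, atol=1e-6))
print(np.linalg.svd(J, compute_uv=False))
```

Note that `J` is not square here, so it has no eigenvalues; its singular values are what bound how much the backward pass can shrink or amplify a gradient.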

📝Question 4: Directional Derivative

Q: In what direction does $f$ increase most rapidly at a point?

💡Answer

The direction of most rapid increase is the gradient direction $\nabla f / \|\nabla f\|$ . The rate of increase is $\|\nabla f\|$ . This follows from the Cauchy-Schwarz inequality: $D_{\vec{u}} f = \nabla f \cdot \vec{u} \leq \|\nabla f\| \cdot \|\vec{u}\| = \|\nabla f\|$ , with equality when $\vec{u} = \nabla f / \|\nabla f\|$ .

📝Question 5: Saddle Points

Q: Why are saddle points more common than local minima in high dimensions?

💡Answer

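A critical point is a saddle point if the Hessian has at least one positive and one negative eigenvalue. In $d$ dimensions, a minimum requires all $d$ eigenvalues to be positive, and the probability of that decreases rapidly with $d$. Under the crude heuristic that each eigenvalue's sign is an independent fair coin flip, the probability of positive definiteness is $2^{-d}$; for Gaussian random symmetric matrices the eigenvalues repel each other and the probability decays even faster. Thus, in high dimensions (e.g., $d = 10^6$), this argument suggests that virtually all critical points are saddle points, with the caveat that Hessians of real loss functions are not random matrices.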
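
A small Monte Carlo experiment gives a feel for how quickly positive definiteness becomes rare. This sketch models Hessians as Gaussian random symmetric matrices, which is a simplifying assumption and not a claim about real loss landscapes; the sample count and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 4000

fractions = {}
for d in (1, 2, 3, 4):
    G = rng.standard_normal((n_samples, d, d))
    A = (G + G.transpose(0, 2, 1)) / 2          # batch of random symmetric matrices
    eig = np.linalg.eigvalsh(A)                 # eigenvalues, shape (n_samples, d)
    fractions[d] = np.mean(np.all(eig > 0, axis=1))
    print(f"d = {d}: positive definite fraction = {fractions[d]:.4f}, 2^-d = {2.0**-d:.4f}")
```

For $d = 1$ the fraction is close to one half, and for larger $d$ it drops at least as fast as the $2^{-d}$ reference column, since positive diagonal entries are necessary but not sufficient.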

📝Question 6: Taylor Series

Q: When does the second-order Taylor approximation underestimate the true function value?

💡Answer

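The second-order approximation $f(\vec{x}_0 + \vec{h}) \approx f(\vec{x}_0) + \nabla f^T \vec{h} + \frac{1}{2}\vec{h}^T H \vec{h}$ underestimates exactly when the remainder is positive. In 1D the Lagrange form is $R_2 = \frac{1}{6} f'''(\xi) h^3$ for some $\xi$ between $x_0$ and $x_0 + h$, so the sign depends on both the third derivative and the sign of $h$. For example, for $f(x) = e^x$ at $x_0 = 0$ the approximation underestimates for $h > 0$ and overestimates for $h < 0$. The approximation is exact for quadratic functions, whose third derivatives vanish. In several variables the same reasoning applies along the segment from $\vec{x}_0$ to $\vec{x}_0 + \vec{h}$: the sign of the third-order directional derivative along $\vec{h}$ decides.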

Practice Problems

📝Problem 1: Gradient and Directional Derivative

Compute the gradient of $f(x,y,z) = x^2 y + yz^2$ at the point $(1,2,3)$ and find the directional derivative in the direction of $\vec{v} = (1,1,1)$ .

💡Solution

\nabla f = (2xy, \; x^2 + z^2, \; 2yz)

At $(1,2,3)$ : $\nabla f = (4, \; 10, \; 12)$

Normalize: $\vec{u} = \frac{1}{\sqrt{3}}(1,1,1)$

D_{\vec{u}} f = \nabla f \cdot \vec{u} = \frac{1}{\sqrt{3}}(4 + 10 + 12) = \frac{26}{\sqrt{3}} \approx 15.01

📝Problem 2: Hessian Classification

Classify the critical points of $f(x,y) = x^3 - 3xy^2 + y^3$ .

💡Solution

\nabla f = (3x^2 - 3y^2, \; -6xy + 3y^2) = (0,0)

From first equation: $x^2 = y^2 \implies x = \pm y$

If $x = y$ : $-6y^2 + 3y^2 = -3y^2 = 0 \implies y = 0 \implies (0,0)$

If $x = -y$ : $6y^2 + 3y^2 = 9y^2 = 0 \implies y = 0 \implies (0,0)$

H = \begin{bmatrix} 6x & -6y \\ -6y & -6x + 6y \end{bmatrix}

At $(0,0)$ : $H = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$

The Hessian is the zero matrix, so the test is inconclusive. Higher-order analysis shows $(0,0)$ is a saddle point: along $y = 0$ the function reduces to $f(x, 0) = x^3$, which takes both positive and negative values in every neighborhood of the origin.

📝Problem 3: Change of Variables

Evaluate $\iint_D (x^2 + y^2) \, dA$ where $D$ is the disk $x^2 + y^2 \leq 4$ using polar coordinates.

💡Solution

Convert to polar: $x = r\cos\theta$ , $y = r\sin\theta$ , $x^2 + y^2 = r^2$ , $dA = r \, dr \, d\theta$

\int_0^{2\pi} \int_0^2 r^2 \cdot r \, dr \, d\theta = \int_0^{2\pi} d\theta \int_0^2 r^3 \, dr

= 2\pi \cdot \left[\frac{r^4}{4}\right]_0^2 = 2\pi \cdot \frac{16}{4} = 8\pi \approx 25.13

📝Problem 4: Taylor Expansion

Find the second-order Taylor approximation of $f(x,y) = e^{x+y}$ at $(0,0)$ and evaluate at $(0.1, 0.1)$ .

💡Solution

At $(0,0)$ : $f = 1$ , $\nabla f = (1,1)$ , $H = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$

f(\vec{h}) \approx 1 + (h_1 + h_2) + \frac{1}{2}(h_1^2 + 2h_1 h_2 + h_2^2)

At $(0.1, 0.1)$ : $f \approx 1 + 0.2 + \frac{1}{2}(0.01 + 0.02 + 0.01) = 1 + 0.2 + 0.02 = 1.22$

Exact: $e^{0.2} \approx 1.2214$ (error $\approx 0.0014$ )

📝Problem 5: Jacobian Determinant

Compute the Jacobian determinant of the transformation $x = u^2 - v^2$ , $y = 2uv$ and use it to evaluate $\iint_R (x+y) \, dx \, dy$ where $R$ is the region bounded by $x = 0$ , $x = 1$ , $y = 0$ , $y = 1$ .

💡Solution

\frac{\partial(x,y)}{\partial(u,v)} = \det \begin{bmatrix} 2u & -2v \\ 2v & 2u \end{bmatrix} = 4u^2 + 4v^2 = 4(u^2+v^2)

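Treating $(u, v)$ as the complex number $u + iv$, the transformation is the squaring map, so the first quadrant of the $(x, y)$ plane corresponds to the wedge $0 \leq v \leq u$. The sides of the square $R$ pull back as follows: $y = 0$ becomes $v = 0$, $x = 0$ becomes $u = v$, $x = 1$ becomes the hyperbola $u^2 - v^2 = 1$, and $y = 1$ becomes the hyperbola $2uv = 1$. Call the region they enclose $D$.

\iint_R (x+y) \, dx \, dy = \iint_D (u^2 - v^2 + 2uv) \cdot 4(u^2+v^2) \, du \, dv

Because $R$ is the unit square, the left-hand side can be evaluated directly as a check: $\int_0^1 \int_0^1 (x + y) \, dx \, dy = \frac{1}{2} + \frac{1}{2} = 1$. The transformed integral over $D$ must therefore also equal $1$.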

Quick Reference

📋Multivariable Calculus Quick Reference

Concept	Formula	Key Property
Gradient	$\nabla f = \left(\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}\right)$	Direction of steepest ascent
Directional Derivative	$D_{\vec{u}} f = \nabla f \cdot \vec{u}$ , $\|\vec{u}\| = 1$	Rate of change in direction $\vec{u}$
Hessian	$H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$	Symmetric matrix; determines curvature
Taylor (2nd order)	$f(\vec{x}_0 + \vec{h}) \approx f + \nabla f^T \vec{h} + \frac{1}{2}\vec{h}^T H \vec{h}$	Quadratic approximation
Critical point	$\nabla f(\vec{x}_0) = \vec{0}$	Candidate for min/max/saddle
Min	$H \succ 0$ (all $\lambda_i > 0$ )	Convex curvature
Max	$H \prec 0$ (all $\lambda_i < 0$ )	Concave curvature
Saddle	$H$ indefinite (mixed-sign eigenvalues)	Curves up in some directions, down in others
Double integral	$\iint_D f(x,y) \, dA = \int \int f(x,y) \, dy \, dx$	Volume under surface
Jacobian	$J_{ij} = \frac{\partial f_i}{\partial x_j}$	Local linear approximation of $\vec{f}$
Jacobian det.	$|\det(J)|$	Area/volume scaling factor
Polar coords	$x = r\cos\theta$ , $y = r\sin\theta$ , $dA = r \, dr \, d\theta$	Use for circular regions

Cross-References

Linear Algebra: Eigenvalues and Eigenvectors — Essential for Hessian analysis and classification of critical points
Optimization: Calculus Optimization — Direct application of multivariable gradients
Probability: Probability Distributions — Joint distributions and expectations via double integrals
Calculus I: Calculus Derivatives — Foundation for partial derivatives and gradients
Calculus II: Calculus Integrals — Foundation for multiple integrals
Matrix Calculus: Matrix Calculus — Efficient computation of gradients in vectorized form