Derivatives and Differentiation
Why It Matters
Derivatives measure the instantaneous rate of change of a function. In machine learning, every gradient descent update, from simple linear regression to training large language models, relies on computing derivatives of a loss function with respect to model parameters. Understanding differentiation is therefore a core skill for building and optimizing ML systems.
What is a Derivative
Definition: Derivative
The derivative of a function $f$ at a point $x$ is defined as the limit of the difference quotient as the increment $h$ approaches zero. Geometrically, the derivative represents the slope of the tangent line to the curve at that point.
Limit Definition of the Derivative
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
Here,
- $f'(x)$ = The derivative of $f$ with respect to $x$
- $h$ = A small increment in $x$ that approaches zero
- $f(x+h) - f(x)$ = The change in function value over the interval
- $\frac{f(x+h)-f(x)}{h}$ = The difference quotient (average rate of change)
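To see the limit at work, the sketch below evaluates the difference quotient for shrinking values of $h$. The test function $f(x) = x^2$ and the point $x = 3$ are illustrative choices, not taken from the text above; the exact derivative there is 6.

```python
def difference_quotient(f, x, h):
    """Average rate of change of f over the interval [x, x + h]."""
    return (f(x + h) - f(x)) / h


# Illustrative choice: f(x) = x**2 at x = 3, where f'(3) = 6.
f = lambda x: x ** 2

quotients = [difference_quotient(f, 3.0, 10.0 ** -k) for k in range(1, 6)]
for k, q in zip(range(1, 6), quotients):
    print(f"h = 1e-{k}: difference quotient = {q:.6f}")
```

For this particular function the quotient equals $6 + h$, so each printed value sits closer to 6 than the one before.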
Geometric Meaning
The derivative $f'(a)$ gives the slope of the line tangent to $y = f(x)$ at the point $(a, f(a))$. If $f'(a) > 0$, the function is increasing at $a$. If $f'(a) < 0$, it is decreasing. If $f'(a) = 0$, the tangent is horizontal, which makes $a$ a candidate for a local maximum or minimum.
Differentiability vs Continuity
If $f$ is differentiable at $a$, then $f$ is continuous at $a$. The converse is not true: $f(x) = |x|$ is continuous at $x = 0$ but not differentiable there (the left and right derivatives differ).
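A quick numerical check of a continuous but non-differentiable function: the sketch below uses the absolute value function at 0 as the example and compares the one-sided difference quotients. The helper name `one_sided_quotient` and the step size are assumptions made for illustration.

```python
def one_sided_quotient(f, a, h):
    """Difference quotient of f at a; the sign of h selects the side."""
    return (f(a + h) - f(a)) / h


# abs is continuous at 0, but its one-sided slopes disagree there.
right = one_sided_quotient(abs, 0.0, 1e-6)    # approach from the right (h > 0)
left = one_sided_quotient(abs, 0.0, -1e-6)    # approach from the left (h < 0)

print(f"right-hand slope: {right}, left-hand slope: {left}")
```

Because the two one-sided slopes differ, the two-sided limit that defines the derivative does not exist at 0.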
Basic Derivative Rules
The following rules allow us to differentiate complex functions by breaking them into simpler parts.
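| Rule | Formula | Example |
|---|---|---|
| Constant | $\frac{d}{dx}[c] = 0$ | $\frac{d}{dx}[7] = 0$ |
| Constant Multiple | $\frac{d}{dx}[c\,f(x)] = c\,f'(x)$ | $\frac{d}{dx}[3x^2] = 6x$ |
| Power Rule | $\frac{d}{dx}[x^n] = n x^{n-1}$ | $\frac{d}{dx}[x^4] = 4x^3$ |
| Sum Rule | $(f + g)' = f' + g'$ | $\frac{d}{dx}[x^2 + \sin x] = 2x + \cos x$ |
| Difference Rule | $(f - g)' = f' - g'$ | $\frac{d}{dx}[x^3 - e^x] = 3x^2 - e^x$ |
| Product Rule | $(fg)' = f'g + fg'$ | $\frac{d}{dx}[x e^x] = e^x + x e^x$ |
| Quotient Rule | $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ | $\frac{d}{dx}\left[\frac{x}{x+1}\right] = \frac{1}{(x+1)^2}$ |
| Chain Rule | $\frac{d}{dx}[f(g(x))] = f'(g(x))\,g'(x)$ | $\frac{d}{dx}[\sin(x^2)] = 2x\cos(x^2)$ |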
Product Rule Mnemonic
Remember: "derivative of the first times the second, plus the first times the derivative of the second." Or in symbols: $(fg)' = f'g + fg'$.
Derivatives of Common Functions
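| Function | Derivative | Notes |
|---|---|---|
| $x^n$ | $n x^{n-1}$ | Power rule, for any real $n$ |
| $e^x$ | $e^x$ | Its own derivative |
| $a^x$ | $a^x \ln a$ | General exponential |
| $\ln x$ | $\frac{1}{x}$ | Logarithmic derivative |
| $\log_a x$ | $\frac{1}{x \ln a}$ | Change of base |
| $\sin x$ | $\cos x$ | Cyclic pattern |
| $\cos x$ | $-\sin x$ | Note the negative sign |
| $\tan x$ | $\sec^2 x$ | From quotient rule on $\frac{\sin x}{\cos x}$ |
| $\arcsin x$ | $\frac{1}{\sqrt{1 - x^2}}$ | Inverse trig |
| $\arctan x$ | $\frac{1}{1 + x^2}$ | Inverse trig |
| $\sqrt{x}$ | $\frac{1}{2\sqrt{x}}$ | Power rule with $n = \frac{1}{2}$ |
| $\sigma(x) = \frac{1}{1 + e^{-x}}$ | $\sigma(x)(1 - \sigma(x))$ | Sigmoid (ML critical) |
| $\tanh x$ | $1 - \tanh^2 x$ | Hyperbolic tangent |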
ML Sigmoid Derivative
The sigmoid function $\sigma(x) = \frac{1}{1 + e^{-x}}$ has the compact derivative $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. This means once you compute $\sigma(x)$ during the forward pass, you can reuse it in the backward pass, a key efficiency in backpropagation.
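The sigmoid identity follows from the chain rule in a few lines. The derivation below is a worked sketch that writes the sigmoid as $(1 + e^{-x})^{-1}$; the intermediate grouping is one of several equivalent routes.

```latex
\begin{aligned}
\sigma(x)  &= \left(1 + e^{-x}\right)^{-1} \\
\sigma'(x) &= -\left(1 + e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right)
            = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
            = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
```

The last step uses $1 - \sigma(x) = \frac{e^{-x}}{1 + e^{-x}}$.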
Higher-Order Derivatives
Definition: Higher-Order Derivatives
The second derivative $f''(x)$ is the derivative of the derivative $f'(x)$. In general, the $n$-th derivative is denoted $f^{(n)}(x)$.
| Notation | Meaning |
|---|---|
| $f'(x)$ | First derivative: rate of change |
| $f''(x)$ | Second derivative: concavity / acceleration |
| $f'''(x)$ | Third derivative: rate of change of acceleration |
| $f^{(n)}(x)$ | $n$-th derivative |
| $\frac{d^2 y}{dx^2}$ | Leibniz notation for second derivative |
Theorem: Second Derivative Test
If $f'(c) = 0$ and $f''(c)$ exists:
- If $f''(c) > 0$, then $f$ has a local minimum at $c$.
- If $f''(c) < 0$, then $f$ has a local maximum at $c$.
- If $f''(c) = 0$, the test is inconclusive.
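The test can be applied mechanically with SymPy. The sketch below classifies the critical points of $f(x) = x^4 - 2x^2$, a function chosen here purely for illustration.

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = x**4 - 2*x**2              # illustrative function, not from the text

f1 = sp.diff(f, x)             # first derivative
f2 = sp.diff(f, x, 2)          # second derivative

classification = {}
for c in sp.solve(f1, x):      # critical points where f'(c) = 0
    curvature = f2.subs(x, c)
    if curvature > 0:
        classification[c] = "local minimum"
    elif curvature < 0:
        classification[c] = "local maximum"
    else:
        classification[c] = "inconclusive"

for c, kind in classification.items():
    print(f"x = {c}: {kind}")
```

When the result is "inconclusive", fall back to the first derivative test or inspect higher-order derivatives.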
Taylor Series Expansion
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}\,(x - a)^n$$
Here,
- $f^{(n)}(a)$ = The $n$-th derivative evaluated at $a$
- $n!$ = Factorial of $n$
- $a$ = The point about which the expansion is made
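As a concrete instance, every derivative of $e^x$ at $a = 0$ equals 1, so the series reduces to $\sum_n x^n / n!$. The sketch below sums the first few terms; the function name `taylor_exp` and the term counts are illustrative assumptions.

```python
import math


def taylor_exp(x, terms):
    """Partial sum of the Taylor series of e**x about a = 0."""
    return sum(x ** n / math.factorial(n) for n in range(terms))


for terms in (2, 4, 8, 12):
    approx = taylor_exp(1.0, terms)
    print(f"{terms:2d} terms: {approx:.8f} (error {abs(approx - math.e):.2e})")
```

Each added term shrinks the error, which is why a handful of low-order terms is often enough near the expansion point.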
Concavity and Inflection Points
- $f''(x) > 0$ on an interval means $f$ is concave up (shaped like a cup).
- $f''(x) < 0$ on an interval means $f$ is concave down (shaped like a frown).
- An inflection point occurs where $f''$ changes sign.
Implicit Differentiation
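Definition: Implicit Differentiation
When $y$ is defined implicitly by an equation $F(x, y) = 0$ rather than explicitly as $y = f(x)$, we differentiate both sides with respect to $x$, treating $y$ as a function of $x$ and applying the chain rule to every term that contains $y$.
Implicit Derivative
$$\frac{dy}{dx} = -\frac{\partial F / \partial x}{\partial F / \partial y}$$
Here,
- $F(x, y) = 0$ = The implicit function
- $\frac{\partial F}{\partial x}$ = Partial derivative of $F$ with respect to $x$
- $\frac{\partial F}{\partial y}$ = Partial derivative of $F$ with respect to $y$ (must be non-zero)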
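Implicit Differentiation Example
Problem: Find $\frac{dy}{dx}$ for $x^2 + y^2 = 25$ (a circle of radius 5).
Solution
Differentiate both sides with respect to $x$:
$$2x + 2y\,\frac{dy}{dx} = 0 \implies \frac{dy}{dx} = -\frac{x}{y}$$
At the point $(x_0, y_0)$ on the circle with $y_0 \neq 0$: $\frac{dy}{dx} = -\frac{x_0}{y_0}$. For example, at $(3, 4)$ the slope is $-\frac{3}{4}$.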
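The same result can be cross-checked symbolically. The sketch below uses SymPy's `idiff` helper on the circle equation and compares it with the $-F_x / F_y$ formula; the evaluation point $(3, 4)$ is an illustrative choice of a point on the circle.

```python
import sympy as sp

x, y = sp.symbols('x y')
circle = x**2 + y**2 - 25                  # F(x, y) = 0 for the circle of radius 5

slope = sp.idiff(circle, y, x)             # dy/dx by implicit differentiation
formula = -sp.diff(circle, x) / sp.diff(circle, y)   # -F_x / F_y

slope_at_point = slope.subs({x: 3, y: 4})  # illustrative point on the circle

print(f"dy/dx = {slope}")
print(f"slope at (3, 4) = {slope_at_point}")
```

Both routes give the same expression, which is a useful sanity check when differentiating by hand.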
Logarithmic Differentiation
Definition: Logarithmic Differentiation
A technique for differentiating functions of the form $f(x)^{g(x)}$ or products/quotients of many factors. Take the natural logarithm of both sides, then differentiate implicitly.
Logarithmic Differentiation Technique
$$\frac{d}{dx}\left[\ln y\right] = \frac{y'}{y} \implies y' = y \cdot \frac{d}{dx}\left[\ln y\right]$$
Here,
- $y$ = The original function
- $y'$ = Its derivative
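Logarithmic Differentiation Example
Problem: Find $\frac{dy}{dx}$ for $y = x^x$ (with $x > 0$).
Solution
Let $y = x^x$. Take the natural log of both sides:
$$\ln y = x \ln x$$
Differentiate both sides:
$$\frac{y'}{y} = \ln x + 1 \implies y' = x^x(\ln x + 1)$$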
When to Use
Logarithmic differentiation is most useful when:
- The function is a product or quotient of many factors
- The function has the form $f(x)^{g(x)}$ where both base and exponent depend on $x$
- The function involves radicals combined with other operations
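A variable base with a variable exponent is the textbook case for this technique. The sketch below carries out the steps for $y = x^x$ (an illustrative function, with $x > 0$ assumed) and compares the result with SymPy's direct derivative.

```python
import sympy as sp

x = sp.Symbol('x', positive=True)   # assume x > 0 so that ln(x) is defined
y = x**x

# Step 1: take logs by hand, ln(y) = x * ln(x).
log_y = x * sp.log(x)

# Step 2: differentiate ln(y); the left side becomes y'/y.
dlog_y = sp.diff(log_y, x)

# Step 3: multiply by y to recover y'.
dy = y * dlog_y

print(f"y' via logarithmic differentiation: {dy}")
print(f"y' via sp.diff:                     {sp.diff(y, x)}")
```

The two expressions agree, which confirms that the manual log step was applied correctly.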
Mean Value Theorem
Theorem: Mean Value Theorem (MVT)
If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists at least one point $c \in (a, b)$ such that:
$$f'(c) = \frac{f(b) - f(a)}{b - a}$$
Intuition
The MVT guarantees that at some point, the instantaneous rate of change (derivative) equals the average rate of change over the interval. Geometrically, there is a point where the tangent line is parallel to the secant line connecting the endpoints.
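The point whose existence the theorem guarantees can be found explicitly for simple functions. The sketch below does so for $f(x) = x^3$ on $[0, 2]$; both the function and the interval are illustrative assumptions.

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = x**3            # illustrative function
a, b = 0, 2         # illustrative interval

# Slope of the secant line through the endpoints.
secant_slope = (f.subs(x, b) - f.subs(x, a)) / (b - a)

# Solve f'(c) = secant slope and keep solutions inside (a, b).
candidates = sp.solve(sp.Eq(sp.diff(f, x), secant_slope), x)
points = [c for c in candidates if a < c < b]

print(f"secant slope = {secant_slope}, MVT point(s): {points}")
```

Here the equation $3c^2 = 4$ has two real solutions, but only the positive one lies inside the open interval.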
Theorem: Rolle's Theorem (Special Case of MVT)
If $f$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $f(a) = f(b)$, then there exists at least one $c \in (a, b)$ such that $f'(c) = 0$.
Important Corollary
If $f'(x) = 0$ for all $x$ in an interval, then $f$ is constant on that interval. This is used to prove that any two antiderivatives of the same function differ only by a constant.
Applications
Related Rates
Definition: Related Rates
When two or more quantities change with respect to time and are related by an equation, we can differentiate that equation with respect to $t$ to find relationships between their rates of change.
Related Rates: Expanding Sphere
Problem: A sphere's radius $r$ increases at a rate of $\frac{dr}{dt}$ cm/s. How fast is the volume increasing at the instant when the radius is $r$ cm?
Solution
$$V = \frac{4}{3}\pi r^3 \implies \frac{dV}{dt} = 4\pi r^2\,\frac{dr}{dt} \ \text{cm}^3/\text{s}$$
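The sphere calculation can be reproduced symbolically by treating the radius as a function of time. The numbers plugged in at the end (radius 5 cm, growth rate 2 cm/s) are hypothetical values chosen for this sketch only.

```python
import sympy as sp

t = sp.Symbol('t')
r = sp.Function('r')(t)                  # radius as an unknown function of time
V = sp.Rational(4, 3) * sp.pi * r**3     # volume of a sphere

dV_dt = sp.diff(V, t)                    # chain rule gives 4*pi*r**2 * dr/dt

# Hypothetical values: dr/dt = 2 cm/s at the instant when r = 5 cm.
# Substitute the derivative first, then the radius.
rate = dV_dt.subs(sp.Derivative(r, t), 2).subs(r, 5)

print(f"dV/dt = {dV_dt}")
print(f"rate at r = 5, dr/dt = 2: {rate} cm^3/s")
```

Note the units: a volume rate is in cubic centimetres per second, not centimetres per second.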
Optimization
Optimization Procedure
To find the absolute maximum or minimum of a continuous function $f$ on a closed interval $[a, b]$:
- Find all critical points in $(a, b)$: solve $f'(x) = 0$ and identify where $f'$ is undefined.
- Evaluate $f$ at each critical point.
- Evaluate $f$ at the endpoints $a$ and $b$.
- The largest value is the absolute maximum; the smallest is the absolute minimum.
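The procedure above translates almost line for line into code. The sketch below applies it to $f(x) = x^3 - 3x$ on $[-2, 3]$, an illustrative polynomial whose derivative is defined everywhere, so only the zeros of $f'$ need to be collected.

```python
import sympy as sp

x = sp.Symbol('x', real=True)
f = x**3 - 3*x          # illustrative function
a, b = -2, 3            # illustrative closed interval

# Step 1: critical points inside the interval (f' is defined everywhere here).
critical = [c for c in sp.solve(sp.diff(f, x), x) if a <= c <= b]

# Steps 2 and 3: evaluate f at the critical points and at both endpoints.
candidates = critical + [a, b]
values = {c: f.subs(x, c) for c in candidates}

# Step 4: compare the candidate values.
x_max = max(values, key=values.get)
x_min = min(values, key=values.get)

print(f"absolute maximum {values[x_max]} at x = {x_max}")
print(f"absolute minimum {values[x_min]} at x = {x_min}")
```

In this example the maximum occurs at an endpoint, which is why the endpoints must always be included among the candidates.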
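Optimization: Minimum Surface Area
Problem: A box with a square base and no top has volume $V$. Find the dimensions that minimize surface area.
Solution
Let $x$ = side of square base, $h$ = height.
Constraint: $x^2 h = V \implies h = \frac{V}{x^2}$
Surface area: $S(x) = x^2 + 4xh = x^2 + \frac{4V}{x}$
$S'(x) = 2x - \frac{4V}{x^2} = 0 \implies x^3 = 2V \implies x = \sqrt[3]{2V}$
$S''(x) = 2 + \frac{8V}{x^3} > 0$ (confirms minimum)
Dimensions: $x = \sqrt[3]{2V}$, $h = \frac{V}{x^2} = \frac{x}{2}$.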
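The open-box problem can be checked with SymPy for a concrete volume. The value `V = 32` below is a hypothetical choice made for this sketch; any positive volume works the same way.

```python
import sympy as sp

x = sp.Symbol('x', positive=True)   # side of the square base
V = 32                              # hypothetical volume, chosen for illustration

# Open-top box: base area x**2 plus four sides of area x*h, with h = V / x**2.
S = x**2 + 4 * V / x

critical = sp.solve(sp.diff(S, x), x)          # solve S'(x) = 0
x_opt = critical[0]
h_opt = V / x_opt**2
second = sp.diff(S, x, 2).subs(x, x_opt)       # positive means a minimum

print(f"x = {x_opt}, h = {h_opt}, S''(x) = {second}")
```

The optimal height comes out as half the base side, matching the general relation between height and base for an open-top box of minimal surface area.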
Linear Approximation
Linear Approximation
$$f(x) \approx f(a) + f'(a)\,(x - a)$$
Here,
- $f(a)$ = Function value at the known point $a$
- $f'(a)$ = Derivative at $a$ (slope of tangent line)
- $x - a$ = Small displacement from $a$
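A short sketch of the tangent-line approximation in use: estimating $\sqrt{4.1}$ from the known value $\sqrt{4} = 2$. The helper name `linearize` and the choice of function are assumptions for illustration.

```python
import math


def linearize(f, df, a):
    """Return the tangent-line approximation of f about the point a."""
    return lambda x: f(a) + df(a) * (x - a)


# f(x) = sqrt(x) with derivative 1 / (2 * sqrt(x)), expanded about a = 4.
L = linearize(math.sqrt, lambda x: 0.5 / math.sqrt(x), 4.0)

approx = L(4.1)
exact = math.sqrt(4.1)
print(f"linear approximation: {approx:.6f}, exact: {exact:.6f}")
```

The approximation is accurate to about three decimal places here and degrades as $x$ moves further from $a$.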
Python Implementation
Symbolic Differentiation with SymPy
```python
import sympy as sp

x = sp.Symbol('x')

# Symbolic derivatives
f = sp.ln(sp.sin(x))
print(f"Derivative of ln(sin(x)): {sp.diff(f, x)}")

g = x**2 * sp.exp(x)
print(f"Derivative of x^2 * e^x: {sp.diff(g, x)}")

# Higher-order derivatives
h = sp.sin(x)
print(f"Second derivative of sin(x): {sp.diff(h, x, 2)}")
print(f"Third derivative of sin(x): {sp.diff(h, x, 3)}")

# Partial derivatives (multivariate)
y = sp.Symbol('y')
f_multi = x**2 * y + sp.sin(x * y)
print(f"∂f/∂x = {sp.diff(f_multi, x)}")
print(f"∂f/∂y = {sp.diff(f_multi, y)}")
```
Numerical Differentiation
```python
import numpy as np


def numerical_derivative(f, x, h=1e-5):
    """Central difference approximation of f'(x); truncation error is O(h**2)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def numerical_second_derivative(f, x, h=1e-4):
    """Second derivative via the three-point central difference formula."""
    # Round-off error grows like 1/h**2 here, so use a larger step
    # than for the first derivative.
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h ** 2)


# Examples
f = np.sin
print(f"sin'(0.5) numerical: {numerical_derivative(f, 0.5):.6f}")
print(f"sin'(0.5) exact: {np.cos(0.5):.6f}")

g = lambda x: x**3 - 2*x + 1
print(f"g''(1) numerical: {numerical_second_derivative(g, 1):.6f}")
print(f"g''(1) exact: {6 * 1:.6f}")
```
Sigmoid and Its Derivative (ML)
```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def sigmoid_derivative(x):
    s = sigmoid(x)          # reuse the forward value
    return s * (1 - s)


# Gradient check
x = 0.5
print(f"Sigmoid({x}) = {sigmoid(x):.6f}")
print(f"Sigmoid'({x}) = {sigmoid_derivative(x):.6f}")

# Verify numerically with a central difference
h = 1e-7
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(f"Numerical approx: {numerical:.6f}")
```
Applications in AI/ML
Gradient Descent
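Gradient Descent Update Rule
$$\theta_{t+1} = \theta_t - \eta\,\nabla_\theta L(\theta_t)$$
Here,
- $\theta_t$ = Model parameters at iteration $t$
- $\eta$ = Learning rate (step size)
- $\nabla_\theta L(\theta_t)$ = Gradient of the loss function
- $L$ = Loss function to minimize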
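The update rule can be sketched in a few lines for a one-parameter problem. The quadratic loss $(\theta - 3)^2$, the learning rate, and the step count below are illustrative assumptions, chosen so the minimizer $\theta = 3$ is known in advance.

```python
def gradient_descent(grad, theta0, lr=0.1, steps=100):
    """Repeatedly step against the gradient, starting from theta0."""
    theta = theta0
    for _ in range(steps):
        theta = theta - lr * grad(theta)   # theta_{t+1} = theta_t - lr * gradient
    return theta


# Illustrative loss L(theta) = (theta - 3)**2 with gradient 2 * (theta - 3).
theta_star = gradient_descent(lambda th: 2 * (th - 3), theta0=0.0)
print(f"estimated minimizer: {theta_star:.6f}")
```

With this learning rate the distance to the minimizer shrinks by a factor of 0.8 per step; a rate that is too large would make the iterates diverge instead.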
Why Derivatives Matter in ML
Every step of training a neural network involves:
- Forward pass: Compute the prediction using the current parameters.
- Compute loss: Measure how wrong the prediction is.
- Backward pass: Compute derivatives (gradients) of the loss with respect to every parameter using the chain rule.
- Update parameters: Move each parameter in the direction that reduces the loss, scaled by the learning rate.
Chain Rule in Backpropagation
For a simple neural network layer $z = Wx + b$, $a = \sigma(z)$, loss $L = \frac{1}{2}\lVert a - y \rVert^2$:
```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


# Forward pass
x = np.array([1.0, 2.0])
W = np.array([[0.5, 0.3], [0.2, 0.7]])
b = np.array([0.1, 0.1])
z = W @ x + b
a = sigmoid(z)

# Backward pass via chain rule
y_true = np.array([1.0, 0.0])
dL_da = a - y_true            # ∂L/∂a
da_dz = a * (1 - a)           # ∂a/∂z (sigmoid derivative, elementwise)
dL_dz = dL_da * da_dz         # ∂L/∂z
dL_dW = np.outer(dL_dz, x)    # ∂L/∂W, same shape as W
dL_db = dL_dz                 # ∂L/∂b
```
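A finite-difference gradient check is a common way to confirm a hand-derived backward pass. The sketch below repeats the layer above and assumes the loss is half the squared error, which is what the gradient `a - y_true` implies; the `loss` helper and the step size `eps` are assumptions of this sketch.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def loss(W, b, x, y):
    """Assumed loss: half the squared error of the sigmoid layer output."""
    a = sigmoid(W @ x + b)
    return 0.5 * np.sum((a - y) ** 2)


x = np.array([1.0, 2.0])
W = np.array([[0.5, 0.3], [0.2, 0.7]])
b = np.array([0.1, 0.1])
y_true = np.array([1.0, 0.0])

# Analytic gradient from the chain rule.
a = sigmoid(W @ x + b)
dL_dW = np.outer((a - y_true) * a * (1 - a), x)

# Numerical gradient: perturb one weight at a time (central difference).
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        W_plus, W_minus = W.copy(), W.copy()
        W_plus[i, j] += eps
        W_minus[i, j] -= eps
        numeric[i, j] = (loss(W_plus, b, x, y_true) - loss(W_minus, b, x, y_true)) / (2 * eps)

max_diff = np.max(np.abs(numeric - dL_dW))
print(f"largest gap between analytic and numerical gradient: {max_diff:.2e}")
```

A gap many orders of magnitude smaller than the gradient entries indicates that the analytic derivation is consistent with the loss.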
Rate of Change in Feature Engineering
- Velocity: First derivative of position w.r.t. time.
- Acceleration: Second derivative.
- Jerk: Third derivative.
- In time-series ML, these derivatives (computed numerically) become features.
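Numerically computed derivatives can be turned into feature columns with `np.gradient`. The sketch below uses a synthetic position signal $t^2$ (an assumption for illustration) so the expected velocity $2t$ and acceleration $2$ are known.

```python
import numpy as np

# Synthetic signal: position = t**2 sampled on a regular time grid.
t = np.linspace(0.0, 2.0, 21)
position = t ** 2

velocity = np.gradient(position, t)        # first derivative w.r.t. time
acceleration = np.gradient(velocity, t)    # second derivative w.r.t. time

# Stack the signal and its derivatives as feature columns.
features = np.column_stack([position, velocity, acceleration])
print(features[:5])
```

Values at the first and last samples come from one-sided differences and are less accurate than interior values, so edge rows are often trimmed before training.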
Common Mistakes
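| Mistake | Incorrect | Correct | Explanation |
|---|---|---|---|
| Forgetting chain rule | $\frac{d}{dx}\sin(2x) = \cos(2x)$ | $\frac{d}{dx}\sin(2x) = 2\cos(2x)$ | Must multiply by inner derivative |
| Product rule order | $(fg)' = f'g'$ | $(fg)' = f'g + fg'$ | Product rule is NOT just product of derivatives |
| Quotient rule sign | $\left(\frac{f}{g}\right)' = \frac{f'g + fg'}{g^2}$ | $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$ | Numerator uses subtraction |
| Power rule on $e^x$ | $\frac{d}{dx}e^x = x e^{x-1}$ | $\frac{d}{dx}e^x = e^x$ | Exponential rule differs from power rule |
| $\ln(g(x))$ derivative | $\frac{d}{dx}\ln(g(x)) = \frac{1}{g(x)}$ | $\frac{d}{dx}\ln(g(x)) = \frac{g'(x)}{g(x)}$ | Common, but requires chain rule |
| Derivative of constant | $\frac{d}{dx}[5] = 5$ | $\frac{d}{dx}[5] = 0$ | Constants have zero rate of change |
Double-Check Your Work
After computing a derivative, verify by plugging in simple values. For example, if you think $\frac{d}{dx}[x^2] = 2x$, test at $x = 1$: the slope of $y = x^2$ at $x = 1$ should be 2, which matches. Always sanity-check!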
Interview Questions
Q1: What is the derivative of $e^{x^2}$?
Answer
Using the chain rule with outer function $e^u$ and inner function $u = x^2$:
$$\frac{d}{dx}\,e^{x^2} = e^{x^2} \cdot 2x = 2x\,e^{x^2}$$
Q2: Explain the product rule in plain English. When do you use it?
Answer
The product rule states that the derivative of a product $f(x)g(x)$ is $f'(x)g(x) + f(x)g'(x)$. Intuitively, when differentiating a product, either factor could be changing, so you must account for the rate of change of each factor while holding the other constant. Use it whenever two functions of $x$ are multiplied together.
Q3: Why is the sigmoid derivative so important in neural networks?
Answer
The sigmoid derivative is used during backpropagation to compute how the loss changes with respect to each weight. Its key property is that it can be computed using only the output of the forward pass ($\sigma'(x) = \sigma(x)(1 - \sigma(x))$), so the backward pass needs no additional exponential evaluation. This makes backpropagation through sigmoid layers very efficient. However, for large $|x|$, $\sigma'(x) \approx 0$, causing the vanishing gradient problem.
Q4: What is the difference between $\frac{d}{dx}x^n$ and $\frac{d}{dx}x^x$?
Answer
- $\frac{d}{dx}x^n = n x^{n-1}$ (power rule, $n$ is constant)
- $\frac{d}{dx}x^x = x^x(\ln x + 1)$ (requires logarithmic differentiation, both base and exponent depend on $x$)
The power rule applies only when the exponent is a constant. When both base and exponent are variables, you must use logarithmic differentiation.
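Q5: Prove that if $f'(x) = 0$ for all $x$ in an interval, then $f$ is constant.
Answer
By the Mean Value Theorem, for any $a, b$ in the interval with $a < b$, there exists $c \in (a, b)$ such that $f'(c) = \frac{f(b) - f(a)}{b - a}$. Since $f'(c) = 0$, we get $f(b) - f(a) = 0$, so $f(a) = f(b)$. Since $a$ and $b$ were arbitrary, $f$ is constant on the interval.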
Q6: What are the conditions for a critical point $c$ to be a local minimum?
Answer
For a function $f$ continuous at $c$:
- First derivative test: $f'(c) = 0$ (or undefined) and $f'$ changes from negative to positive at $c$.
- Second derivative test: $f'(c) = 0$ and $f''(c) > 0$.
- Higher-order test: If the first non-zero derivative at $c$ is of even order $n$ and $f^{(n)}(c) > 0$, then $c$ is a local minimum.
Practice Problems
Problem 1: Chain Rule
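Find the Derivative
Problem: Find $\frac{d}{dx}\left[(3x^2 + 1)^5\right]$.
Solution
Let $u = 3x^2 + 1$, so $y = u^5$.
$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = 5u^4 \cdot 6x = 30x\,(3x^2 + 1)^4$$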
Problem 2: Product Rule
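Find the Derivative
Problem: Find $\frac{d}{dx}\left[x^2 e^x \sin x\right]$.
Solution
Apply the product rule to $f = x^2$, $g = e^x$, $h = \sin x$:
$$(fgh)' = f'gh + fg'h + fgh' = 2x\,e^x \sin x + x^2 e^x \sin x + x^2 e^x \cos x$$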
Problem 3: Implicit Differentiation
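Find the Derivative
Problem: Find $\frac{dy}{dx}$ for $x^3 + y^3 = 6xy$.
Solution
Differentiate both sides:
$$3x^2 + 3y^2\,\frac{dy}{dx} = 6y + 6x\,\frac{dy}{dx} \implies \frac{dy}{dx} = \frac{2y - x^2}{y^2 - 2x}$$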
Problem 4: Optimization
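Find the Maximum
Problem: Find the maximum area of a rectangle inscribed in a semicircle of radius $r$.
Solution
Place the semicircle on the coordinate plane: $y = \sqrt{r^2 - x^2}$, $-r \le x \le r$.
Rectangle dimensions: width $2x$, height $\sqrt{r^2 - x^2}$, so the area is $A(x) = 2x\sqrt{r^2 - x^2}$ for $0 \le x \le r$.
Set $A'(x) = 0$:
$$A'(x) = 2\sqrt{r^2 - x^2} - \frac{2x^2}{\sqrt{r^2 - x^2}} = \frac{2(r^2 - 2x^2)}{\sqrt{r^2 - x^2}} = 0 \implies x = \frac{r}{\sqrt{2}}$$
Maximum area $= 2 \cdot \frac{r}{\sqrt{2}} \cdot \frac{r}{\sqrt{2}} = r^2$.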
Problem 5: Logarithmic Differentiation
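Find the Derivative
Problem: Find $\frac{dy}{dx}$ for $y = x^{\sin x}$ (with $x > 0$).
Solution
Let $y = x^{\sin x}$. Take ln: $\ln y = \sin x \, \ln x$.
Differentiate:
$$\frac{y'}{y} = \cos x \, \ln x + \frac{\sin x}{x} \implies y' = x^{\sin x}\left(\cos x \, \ln x + \frac{\sin x}{x}\right)$$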
Quick Reference
Key Takeaways
- Derivative Definition: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ measures the instantaneous rate of change.
- Power Rule: $\frac{d}{dx}x^n = n x^{n-1}$, the most fundamental differentiation tool.
- Product Rule: $(fg)' = f'g + fg'$, for products of functions.
- Quotient Rule: $\left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}$, for ratios of functions.
- Chain Rule: $\frac{d}{dx}f(g(x)) = f'(g(x))\,g'(x)$, the foundation of backpropagation.
- Second Derivative: $f''(x)$ measures concavity and acceleration; used in optimization.
- ML Connection: Gradient descent uses derivatives to minimize loss.
- Sigmoid Derivative: $\sigma'(x) = \sigma(x)(1 - \sigma(x))$, efficient because it reuses the forward pass output.
- Taylor Series: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x - a)^n$, approximates functions using derivatives.
- Common Pitfall: Always apply the chain rule to composite functions; forgetting it is one of the most common mistakes.
Cross-References
- Limits: Understanding limits is a prerequisite for the derivative definition → Limits and Continuity
- Chain Rule: In-depth coverage of the chain rule and implicit differentiation → Chain Rule and Implicit Differentiation
- Partial Derivatives: Derivatives with multiple variables → Partial Derivatives
- Multivariable Calculus: Gradients, Jacobians, and Hessians → Multivariable Calculus
- Taylor Series: Polynomial approximations using derivatives → Taylor Series
- Optimization: Gradient descent and convex optimization → Optimization Fundamentals
- Linear Algebra: Matrix calculus for neural network derivatives → Matrix Calculus
- Numerical Methods: Numerical differentiation and integration → Numerical Methods