Calculus

Derivatives and Differentiation

Master derivatives, rules of differentiation, and their applications in optimization and machine learning.


ℹ️ Why It Matters

Derivatives measure the instantaneous rate of change of a function. In machine learning, every gradient descent update, from simple linear regression to training large language models, relies on computing derivatives of a loss function with respect to model parameters. Differentiation is therefore a core skill for building and optimizing ML systems.


What is a Derivative

Definition: Derivative

The derivative of a function $f(x)$ at a point $x$ is defined as the limit of the difference quotient as the increment $h$ approaches zero. Geometrically, the derivative represents the slope of the tangent line to the curve at that point.

Limit Definition of the Derivative

$$f'(x) = \lim_{h \to 0}\frac{f(x+h) - f(x)}{h}$$

Here,

  • $f'(x)$ = The derivative of $f$ with respect to $x$
  • $h$ = A small increment in $x$ that is taken to zero in the limit
  • $f(x+h) - f(x)$ = The change in function value over the interval
  • $\frac{f(x+h)-f(x)}{h}$ = The difference quotient (average rate of change)
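To make the limit concrete, here is a minimal sketch that evaluates the difference quotient of $f(x) = x^2$ at $x = 3$ for shrinking $h$. The function names and the choice of test function are illustrative assumptions; the exact derivative there is $2 \cdot 3 = 6$.

```python
def difference_quotient(f, x, h):
    """Average rate of change of f over the interval [x, x + h]."""
    return (f(x + h) - f(x)) / h


def square(x):
    return x ** 2


# For f(x) = x^2 the quotient equals 2x + h, so it approaches 6 at x = 3.
quotients = [difference_quotient(square, 3.0, 10.0 ** -k) for k in range(1, 6)]
for k, q in enumerate(quotients, start=1):
    print(f"h = 1e-{k}: difference quotient = {q:.6f}")
```

Each tenfold reduction in $h$ moves the quotient about ten times closer to 6, which is the behavior the limit definition describes.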

💡 Geometric Meaning

The derivative $f'(x_0)$ gives the slope of the line tangent to $y = f(x)$ at the point $(x_0, f(x_0))$. If $f'(x_0) > 0$, the function is increasing at $x_0$. If $f'(x_0) < 0$, it is decreasing. If $f'(x_0) = 0$, the tangent is horizontal, which makes $x_0$ a candidate for a local maximum or minimum.

⚠️ Differentiability vs Continuity

If $f$ is differentiable at $a$, then $f$ is continuous at $a$. The converse is not true: $f(x) = |x|$ is continuous at $x = 0$ but not differentiable there (the left and right derivatives differ).
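The mismatch between the one-sided derivatives of $|x|$ at $0$ can be seen numerically. This is a small sketch; the helper name `one_sided_quotient` and the step size are illustrative assumptions.

```python
def one_sided_quotient(f, a, h):
    """Difference quotient of f at a with signed step h (h < 0 approaches from the left)."""
    return (f(a + h) - f(a)) / h


right_slope = one_sided_quotient(abs, 0.0, 1e-6)   # approaching 0 from the right
left_slope = one_sided_quotient(abs, 0.0, -1e-6)   # approaching 0 from the left

print(f"right-hand slope: {right_slope}")
print(f"left-hand slope:  {left_slope}")
```

The two slopes stay at $+1$ and $-1$ no matter how small the step is, so the two-sided limit in the definition does not exist at $x = 0$.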


Basic Derivative Rules

The following rules allow us to differentiate complex functions by breaking them into simpler parts.

| Rule | Formula | Example |
|---|---|---|
| Constant | $\frac{d}{dx}[c] = 0$ | $\frac{d}{dx}[7] = 0$ |
| Constant Multiple | $\frac{d}{dx}[cf(x)] = c \cdot f'(x)$ | $\frac{d}{dx}[3x^2] = 6x$ |
| Power Rule | $\frac{d}{dx}x^n = nx^{n-1}$ | $\frac{d}{dx}x^5 = 5x^4$ |
| Sum Rule | $\frac{d}{dx}[f(x) + g(x)] = f'(x) + g'(x)$ | $\frac{d}{dx}[x^2 + \sin x] = 2x + \cos x$ |
| Difference Rule | $\frac{d}{dx}[f(x) - g(x)] = f'(x) - g'(x)$ | $\frac{d}{dx}[x^3 - e^x] = 3x^2 - e^x$ |
| Product Rule | $\frac{d}{dx}[f(x)g(x)] = f'(x)g(x) + f(x)g'(x)$ | $\frac{d}{dx}[x \cdot e^x] = e^x + xe^x$ |
| Quotient Rule | $\frac{d}{dx}\!\left[\frac{f(x)}{g(x)}\right] = \frac{f'g - fg'}{g^2}$ | $\frac{d}{dx}\!\left[\frac{x}{x^2+1}\right] = \frac{1-x^2}{(x^2+1)^2}$ |
| Chain Rule | $\frac{d}{dx}f(g(x)) = f'(g(x)) \cdot g'(x)$ | $\frac{d}{dx}\sin(3x) = 3\cos(3x)$ |

💡 Product Rule Mnemonic

Remember: "derivative of the first times the second, plus the first times the derivative of the second." In symbols: $f'g + fg'$.
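As a sanity check on the rules above, the sketch below compares the product, quotient, and chain rule examples from the table against a central-difference estimate. The helper `central_difference`, the step size, and the test point `x0 = 0.7` are illustrative choices, not part of the lesson.

```python
import math


def central_difference(f, x, h=1e-5):
    """Numerical estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


# (label, function, derivative claimed by the rule)
examples = [
    ("product: x * e^x", lambda x: x * math.exp(x), lambda x: math.exp(x) + x * math.exp(x)),
    ("quotient: x / (x^2 + 1)", lambda x: x / (x ** 2 + 1), lambda x: (1 - x ** 2) / (x ** 2 + 1) ** 2),
    ("chain: sin(3x)", lambda x: math.sin(3 * x), lambda x: 3 * math.cos(3 * x)),
]

x0 = 0.7
errors = {}
for label, f, claimed_derivative in examples:
    errors[label] = abs(central_difference(f, x0) - claimed_derivative(x0))
    print(f"{label:26s} |numerical - formula| = {errors[label]:.2e}")
```

A tiny discrepancy is expected from the finite step; a large one would indicate a misapplied rule.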


Derivatives of Common Functions

| Function $f(x)$ | Derivative $f'(x)$ | Notes |
|---|---|---|
| $x^n$ | $nx^{n-1}$ | Power rule, for any real $n$ |
| $e^x$ | $e^x$ | Its own derivative |
| $a^x$ | $a^x \ln a$ | General exponential |
| $\ln x$ | $\frac{1}{x}$ | Logarithmic derivative |
| $\log_a x$ | $\frac{1}{x \ln a}$ | Change of base |
| $\sin x$ | $\cos x$ | Cyclic pattern |
| $\cos x$ | $-\sin x$ | Note the negative sign |
| $\tan x$ | $\sec^2 x$ | From quotient rule on $\sin/\cos$ |
| $\csc x$ | $-\csc x \cot x$ | |
| $\sec x$ | $\sec x \tan x$ | |
| $\cot x$ | $-\csc^2 x$ | |
| $\arcsin x$ | $\frac{1}{\sqrt{1-x^2}}$ | Inverse trig |
| $\arctan x$ | $\frac{1}{1+x^2}$ | Inverse trig |
| $\sqrt{x}$ | $\frac{1}{2\sqrt{x}}$ | Power rule with $n=\frac{1}{2}$ |
| $\sigma(x)$ | $\sigma(x)(1-\sigma(x))$ | Sigmoid (ML critical) |
| $\tanh(x)$ | $1 - \tanh^2(x)$ | Hyperbolic tangent |

💡 ML Sigmoid Derivative

The sigmoid function $\sigma(x) = \frac{1}{1+e^{-x}}$ has the compact derivative $\sigma'(x) = \sigma(x)(1-\sigma(x))$. This means that once you compute $\sigma(x)$ during the forward pass, you can reuse it in the backward pass, which is a key efficiency in backpropagation.
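For completeness, one way to derive this identity is to write the sigmoid as a power and apply the chain rule. The last step uses $1 - \sigma(x) = \frac{e^{-x}}{1+e^{-x}}$:

```latex
\begin{aligned}
\sigma(x)  &= (1 + e^{-x})^{-1} \\
\sigma'(x) &= -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^{2}} \\
           &= \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
\end{aligned}
```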


Higher-Order Derivatives

Definition: Higher-Order Derivatives

The second derivative $f''(x) = \frac{d^2}{dx^2}f(x)$ is the derivative of the derivative. In general, the $n$-th derivative is denoted $f^{(n)}(x)$.

| Notation | Meaning |
|---|---|
| $f'(x)$ | First derivative: rate of change |
| $f''(x)$ | Second derivative: concavity / acceleration |
| $f'''(x)$ | Third derivative: rate of change of acceleration |
| $f^{(n)}(x)$ | $n$-th derivative |
| $\frac{d^2y}{dx^2}$ | Leibniz notation for second derivative |

Theorem: Second Derivative Test

If $f'(c) = 0$ and $f''(c)$ exists:

  • If $f''(c) > 0$, then $f$ has a local minimum at $c$.
  • If $f''(c) < 0$, then $f$ has a local maximum at $c$.
  • If $f''(c) = 0$, the test is inconclusive.
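The three cases of the test can be sketched in a few lines. The example function $f(x) = x^3 - 3x$, with $f'(x) = 3x^2 - 3$ and $f''(x) = 6x$, is an illustrative assumption; its critical points are $x = \pm 1$.

```python
def classify_critical_point(second_derivative_value):
    """Apply the second derivative test at a point where f'(c) = 0."""
    if second_derivative_value > 0:
        return "local minimum"
    if second_derivative_value < 0:
        return "local maximum"
    return "inconclusive"


def f_second(x):
    # Second derivative of the example f(x) = x^3 - 3x
    return 6 * x


results = {c: classify_critical_point(f_second(c)) for c in (-1.0, 1.0)}
for c, verdict in results.items():
    print(f"c = {c:+.0f}: f''(c) = {f_second(c):+.0f} -> {verdict}")

# For g(x) = x^4 at c = 0, g''(0) = 0, so the test cannot decide.
flat_case = classify_critical_point(0.0)
print(f"g(x) = x^4 at c = 0 -> {flat_case}")
```

The last case shows why "inconclusive" matters: $x^4$ does have a minimum at $0$, but the second derivative alone cannot confirm it.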

Taylor Series Expansion

$$f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(a)}{n!}(x-a)^n = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots$$

Here,

  • $f^{(n)}(a)$ = The $n$-th derivative evaluated at $a$
  • $n!$ = Factorial of $n$
  • $a$ = The point about which the expansion is made
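As an illustration of how the partial sums behave, the sketch below expands $e^x$ about $a = 0$, where every derivative equals $1$, and evaluates the truncated series at $x = 1$. The function name `taylor_exp` and the term counts are illustrative assumptions.

```python
import math


def taylor_exp(x, terms):
    """Partial sum of the Taylor series of e^x about a = 0, using `terms` terms."""
    return sum(x ** n / math.factorial(n) for n in range(terms))


for terms in (2, 4, 8, 12):
    approx = taylor_exp(1.0, terms)
    print(f"{terms:2d} terms: {approx:.10f}  (error {abs(approx - math.e):.2e})")
```

With two terms the sum is just the tangent-line value $1 + x$; each additional term uses one more derivative and tightens the approximation.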

ℹ️ Concavity and Inflection Points

  • $f''(x) > 0$ on an interval means $f$ is concave up (shaped like a cup).
  • $f''(x) < 0$ on an interval means $f$ is concave down (shaped like a frown).
  • An inflection point occurs where $f''(x)$ changes sign.

Implicit Differentiation

Definition: Implicit Differentiation

When $y$ is defined implicitly by an equation $F(x, y) = 0$ rather than explicitly as $y = f(x)$, we differentiate both sides with respect to $x$, treating $y$ as a function of $x$ and applying the chain rule to every $y$ term.

Implicit Derivative

$$\frac{dy}{dx} = -\frac{F_x}{F_y}$$

Here,

  • $F(x, y)$ = The implicit function
  • $F_x$ = Partial derivative of $F$ with respect to $x$
  • $F_y$ = Partial derivative of $F$ with respect to $y$ (must be nonzero at the point)

📝Implicit Differentiation Example

Problem: Find $\frac{dy}{dx}$ for $x^2 + y^2 = 25$ (a circle of radius 5).

💡Solution

Differentiate both sides with respect to $x$:

$$\frac{d}{dx}[x^2] + \frac{d}{dx}[y^2] = \frac{d}{dx}[25]$$

$$2x + 2y\frac{dy}{dx} = 0$$

$$\frac{dy}{dx} = -\frac{x}{y}$$

At the point $(3, 4)$: $\frac{dy}{dx} = -\frac{3}{4}$.
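One way to check this result is to solve for the upper half of the circle explicitly, $y = \sqrt{25 - x^2}$, and differentiate that numerically. This is a sketch under that assumption; the helper names are illustrative.

```python
import math


def upper_circle(x):
    """Explicit form of the upper half of x^2 + y^2 = 25."""
    return math.sqrt(25 - x ** 2)


def central_difference(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 3.0
y0 = upper_circle(x0)                 # the point (3, 4) lies on the upper half
implicit_slope = -x0 / y0             # dy/dx = -x/y from implicit differentiation
numerical_slope = central_difference(upper_circle, x0)

print(f"implicit formula: {implicit_slope:.6f}")
print(f"numerical check:  {numerical_slope:.6f}")
```

The implicit formula has the advantage that it also covers the lower half of the circle, where the explicit form would need a different sign.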


Logarithmic Differentiation

Definition: Logarithmic Differentiation

A technique for differentiating functions of the form $f(x)^{g(x)}$ or products/quotients of many factors. Take the natural logarithm of both sides, then differentiate implicitly.

Logarithmic Differentiation Technique

$$\frac{d}{dx}\ln|f(x)| = \frac{f'(x)}{f(x)}$$

Here,

  • $f(x)$ = The original function
  • $f'(x)$ = Its derivative

📝Logarithmic Differentiation Example

Problem: Find $\frac{d}{dx}[x^x]$ for $x > 0$.

💡Solution

Let $y = x^x$. Take the natural log of both sides:

$$\ln y = x \ln x$$

Differentiate both sides:

$$\frac{1}{y}\frac{dy}{dx} = \ln x + 1$$

$$\frac{dy}{dx} = x^x(\ln x + 1)$$
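The formula $x^x(\ln x + 1)$ can be spot-checked against a numerical derivative. The sketch below does this at $x = 2$; the test point and helper names are illustrative assumptions.

```python
import math


def x_to_the_x(x):
    return x ** x


def derivative_formula(x):
    """Result of logarithmic differentiation: x^x * (ln x + 1)."""
    return x ** x * (math.log(x) + 1)


def central_difference(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)


x0 = 2.0
formula_value = derivative_formula(x0)
numerical_value = central_difference(x_to_the_x, x0)

print(f"formula:   {formula_value:.6f}")
print(f"numerical: {numerical_value:.6f}")
```

Applying the power rule by mistake would give $x \cdot x^{x-1} = 4$ at this point, which is visibly different from both values printed above.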

💡 When to Use

Logarithmic differentiation is most useful when:

  • The function is a product or quotient of many factors
  • The function has the form $f(x)^{g(x)}$, where both base and exponent depend on $x$
  • The function involves radicals combined with other operations

Mean Value Theorem

Theorem: Mean Value Theorem (MVT)

If $f$ is continuous on $[a, b]$ and differentiable on $(a, b)$, then there exists at least one point $c \in (a, b)$ such that:

$$f'(c) = \frac{f(b) - f(a)}{b - a}$$

ℹ️ Intuition

The MVT guarantees that at some point, the instantaneous rate of change (derivative) equals the average rate of change over the interval. Geometrically, there is a point where the tangent line is parallel to the secant line connecting the endpoints.
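The theorem only asserts that such a point exists, but for a concrete function it can be located. The sketch below uses bisection to find $c$ for the illustrative choice $f(x) = x^3$ on $[0, 3]$, where the average slope is $9$ and $f'(c) = 3c^2 = 9$ gives $c = \sqrt{3}$. The function `find_mvt_point` is a hypothetical helper and assumes $f'(x)$ minus the average slope changes sign between the endpoints.

```python
def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


def find_mvt_point(f, f_prime, a, b, tol=1e-10):
    """Bisection on g(c) = f'(c) - average slope; assumes g(a) and g(b) differ in sign."""
    average_slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    g_lo = f_prime(lo) - average_slope
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = f_prime(mid) - average_slope
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid   # root lies in the right half
        else:
            hi = mid                # root lies in the left half
    return (lo + hi) / 2


c = find_mvt_point(f, f_prime, 0.0, 3.0)
print(f"c = {c:.8f}, f'(c) = {f_prime(c):.8f}, average slope = {(f(3.0) - f(0.0)) / 3.0:.8f}")
```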

Theorem: Rolle's Theorem (Special Case of MVT)

If $f$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $f(a) = f(b)$, then there exists at least one $c \in (a, b)$ such that $f'(c) = 0$.

⚠️ Important Corollary

If $f'(x) = 0$ for all $x$ in an interval, then $f$ is constant on that interval. This is used to prove the uniqueness of antiderivatives up to an additive constant.


Applications

Related Rates

Definition: Related Rates

When two or more quantities change with respect to time and are related by an equation, we can differentiate that equation with respect to $t$ to find relationships between their rates of change.

📝Related Rates: Expanding Sphere

Problem: A sphere's radius increases at $\frac{dr}{dt} = 3$ cm/s. How fast is the volume increasing when $r = 5$ cm?

💡Solution

$$V = \frac{4}{3}\pi r^3$$

$$\frac{dV}{dt} = 4\pi r^2 \frac{dr}{dt} = 4\pi(25)(3) = 300\pi \text{ cm}^3/\text{s}$$
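The answer can be cross-checked by writing the volume as a function of time and differentiating numerically. The sketch assumes, for illustration only, that the radius is $r(t) = 5 + 3t$, so that $r = 5$ cm at $t = 0$.

```python
import math


def radius(t):
    # Assumption for this check: r = 5 cm at t = 0, growing at 3 cm/s
    return 5.0 + 3.0 * t


def volume(t):
    return 4.0 / 3.0 * math.pi * radius(t) ** 3


def central_difference(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)


numerical_rate = central_difference(volume, 0.0)
exact_rate = 300 * math.pi

print(f"numerical dV/dt: {numerical_rate:.4f} cm^3/s")
print(f"300 * pi:        {exact_rate:.4f} cm^3/s")
```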

Optimization

💡 Optimization Procedure

To find the absolute maximum or minimum of $f(x)$ on $[a, b]$:

  1. Find all critical points: solve $f'(x) = 0$ and identify where $f'(x)$ is undefined.
  2. Evaluate $f$ at each critical point.
  3. Evaluate $f$ at the endpoints $a$ and $b$.
  4. The largest value is the absolute maximum; the smallest is the absolute minimum.
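The procedure above can be sketched for a concrete case. The example function $f(x) = x^3 - 3x$ on $[0, 3]$ is an illustrative assumption; its derivative $3x^2 - 3$ vanishes at $x = \pm 1$, and only $x = 1$ lies inside the interval.

```python
def f(x):
    return x ** 3 - 3 * x


a, b = 0.0, 3.0

# Step 1: critical points of f (roots of 3x^2 - 3), restricted to the open interval
critical_points = [c for c in (-1.0, 1.0) if a < c < b]

# Steps 2 and 3: evaluate f at the critical points and at both endpoints
candidates = critical_points + [a, b]
values = {x: f(x) for x in candidates}

# Step 4: pick the largest and smallest values
x_max = max(candidates, key=f)
x_min = min(candidates, key=f)

print(f"candidates and values: {values}")
print(f"absolute maximum f({x_max}) = {f(x_max)}")
print(f"absolute minimum f({x_min}) = {f(x_min)}")
```

Here the maximum sits at an endpoint rather than at a critical point, which is why step 3 cannot be skipped.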

📝Optimization: Minimum Surface Area

Problem: A box with a square base and no top has volume $V = 32$. Find the dimensions that minimize surface area.

💡Solution

Let $x$ = side of square base, $h$ = height.

Constraint: $x^2 h = 32 \implies h = \frac{32}{x^2}$

Surface area: $S = x^2 + 4xh = x^2 + \frac{128}{x}$

$$S'(x) = 2x - \frac{128}{x^2} = 0 \implies x^3 = 64 \implies x = 4$$

$S''(x) = 2 + \frac{256}{x^3} > 0$ for all $x > 0$ (confirms minimum)

Dimensions: $x = 4$, $h = \frac{32}{4^2} = 2$.

Linear Approximation

Linear Approximation

$$f(x) \approx f(a) + f'(a)(x - a)$$

Here,

  • $f(a)$ = Function value at the known point $a$
  • $f'(a)$ = Derivative at $a$ (slope of tangent line)
  • $x - a$ = Small displacement from $a$
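A quick worked instance: estimating $\sqrt{4.1}$ from the known point $a = 4$, where $f(a) = 2$ and $f'(a) = \frac{1}{2\sqrt{4}} = \frac{1}{4}$. The function name and the choice of example are illustrative assumptions.

```python
import math


def linear_approximation(f_a, f_prime_a, a, x):
    """Tangent-line estimate f(a) + f'(a) * (x - a)."""
    return f_a + f_prime_a * (x - a)


a = 4.0
approx = linear_approximation(math.sqrt(a), 1 / (2 * math.sqrt(a)), a, 4.1)
exact = math.sqrt(4.1)

print(f"linear approximation: {approx:.6f}")
print(f"math.sqrt(4.1):       {exact:.6f}")
print(f"absolute error:       {abs(approx - exact):.2e}")
```

The estimate is slightly above the true value because $\sqrt{x}$ is concave down, so its tangent line lies above the curve.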

Python Implementation

Symbolic Differentiation with SymPy

import sympy as sp

x = sp.Symbol('x')

# Symbolic derivatives
f = sp.ln(sp.sin(x))
print(f"Derivative of ln(sin(x)): {sp.diff(f, x)}")

g = x**2 * sp.exp(x)
print(f"Derivative of x^2 * e^x: {sp.diff(g, x)}")

# Higher-order derivatives
h = sp.sin(x)
print(f"Second derivative of sin(x): {sp.diff(h, x, 2)}")
print(f"Third derivative of sin(x):  {sp.diff(h, x, 3)}")

# Partial derivatives (multivariate)
y = sp.Symbol('y')
f_multi = x**2 * y + sp.sin(x * y)
print(f"∂f/∂x = {sp.diff(f_multi, x)}")
print(f"∂f/∂y = {sp.diff(f_multi, y)}")

Numerical Differentiation

import numpy as np

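def numerical_derivative(f, x, h=1e-5):
    """Central difference approximation of f'(x), with O(h^2) truncation error."""
    # h = 1e-5 balances truncation error against floating-point round-off
    return (f(x + h) - f(x - h)) / (2 * h)

def numerical_second_derivative(f, x, h=1e-4):
    """Second derivative via the three-point central difference."""
    # Round-off grows like 1/h^2 here, so a larger step is used than for f'
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h ** 2)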

# Examples
f = np.sin
print(f"sin'(0.5) numerical:  {numerical_derivative(f, 0.5):.6f}")
print(f"sin'(0.5) exact:      {np.cos(0.5):.6f}")

g = lambda x: x**3 - 2*x + 1
print(f"g''(1) numerical:    {numerical_second_derivative(g, 1):.6f}")
print(f"g''(1) exact:        {6 * 1:.6f}")

Sigmoid and Its Derivative (ML)

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Gradient check
x = 0.5
print(f"Sigmoid({x}) = {sigmoid(x):.6f}")
print(f"Sigmoid'({x}) = {sigmoid_derivative(x):.6f}")

# Verify numerically
h = 1e-7
numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(f"Numerical approx:     {numerical:.6f}")

Applications in AI/ML

Gradient Descent

Gradient Descent Update Rule

$$\theta_{t+1} = \theta_t - \eta \nabla_\theta \mathcal{L}(\theta_t)$$

Here,

  • $\theta_t$ = Model parameters at iteration $t$
  • $\eta$ = Learning rate (step size)
  • $\nabla_\theta \mathcal{L}$ = Gradient of the loss function with respect to the parameters
  • $\mathcal{L}$ = Loss function to minimize

ℹ️ Why Derivatives Matter in ML

Every step of training a neural network involves:

  1. Forward pass: Compute the prediction using the current parameters.
  2. Compute loss: Measure how wrong the prediction is.
  3. Backward pass: Compute derivatives (gradients) of the loss with respect to every parameter using the chain rule.
  4. Update parameters: Move each parameter in the direction that reduces the loss, scaled by the learning rate.
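The four steps above can be sketched for the smallest possible model. Everything specific here is a hypothetical choice made for illustration: a one-parameter model `prediction = theta * x`, a single data point `(x, target) = (2, 6)`, squared-error loss, and a learning rate of `0.1`. The minimizing parameter is `theta = 3`.

```python
x, target = 2.0, 6.0    # hypothetical training example
theta = 0.0             # initial parameter
eta = 0.1               # learning rate

losses = []
for step in range(20):
    prediction = theta * x                   # 1. forward pass
    loss = (prediction - target) ** 2        # 2. compute loss
    grad = 2 * (prediction - target) * x     # 3. backward pass: dL/dtheta via the chain rule
    theta = theta - eta * grad               # 4. update parameters
    losses.append(loss)

print(f"final theta = {theta:.6f}")
print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```

With these numbers each update shrinks the distance to `theta = 3` by a constant factor; a learning rate that is too large would make the same loop diverge.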

Chain Rule in Backpropagation

For a simple neural network layer $z = Wx + b$, $a = \sigma(z)$, loss $\mathcal{L}$:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Forward pass
x = np.array([1.0, 2.0])
W = np.array([[0.5, 0.3], [0.2, 0.7]])
b = np.array([0.1, 0.1])
z = W @ x + b
a = sigmoid(z)

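# Backward pass via chain rule
# Assumption: squared-error loss L = 0.5 * sum((a - y_true) ** 2), so ∂L/∂a = a - y_true
y_true = np.array([1.0, 0.0])
dL_da = a - y_true              # ∂L/∂a
da_dz = a * (1 - a)            # ∂a/∂z (sigmoid derivative)
dL_dz = dL_da * da_dz          # ∂L/∂z = ∂L/∂a · ∂a/∂z
dL_dW = np.outer(dL_dz, x)     # ∂L/∂W (since ∂z_i/∂W_ij = x_j)
dL_db = dL_dz                  # ∂L/∂b (since ∂z/∂b = 1)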

Rate of Change in Feature Engineering

  • Velocity: First derivative of position w.r.t. time.
  • Acceleration: Second derivative.
  • Jerk: Third derivative.
  • In time-series ML, these derivatives (computed numerically) become features.
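A minimal sketch of how such features might be computed from sampled data with first-order finite differences. The position samples below are hypothetical (they follow $t^2$ at unit time steps), and `finite_differences` is an illustrative helper rather than a library function.

```python
def finite_differences(values, dt=1.0):
    """Forward differences between consecutive samples, divided by the time step."""
    return [(later - earlier) / dt for earlier, later in zip(values, values[1:])]


position = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]    # hypothetical samples of t^2 at t = 0..5

velocity = finite_differences(position)        # approximates the first derivative
acceleration = finite_differences(velocity)    # approximates the second derivative
jerk = finite_differences(acceleration)        # approximates the third derivative

print(f"velocity:     {velocity}")
print(f"acceleration: {acceleration}")
print(f"jerk:         {jerk}")
```

Each differencing step shortens the series by one sample, so the derived feature columns need to be aligned or padded before they are combined with the original signal.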

Common Mistakes

| Mistake | Incorrect | Correct | Explanation |
|---|---|---|---|
| Forgetting chain rule | $\frac{d}{dx}\sin(x^2) = \cos(x^2)$ | $\frac{d}{dx}\sin(x^2) = 2x\cos(x^2)$ | Must multiply by inner derivative |
| Product rule | $(fg)' = f'g'$ | $(fg)' = f'g + fg'$ | The derivative of a product is not the product of the derivatives |
| Quotient rule sign | $(\frac{f}{g})' = \frac{f'g + fg'}{g^2}$ | $(\frac{f}{g})' = \frac{f'g - fg'}{g^2}$ | Numerator uses subtraction |
| Power rule on $e^x$ | $\frac{d}{dx}e^x = xe^{x-1}$ | $\frac{d}{dx}e^x = e^x$ | Exponential rule differs from power rule |
| $\ln$ of a composite | $\frac{d}{dx}\ln(g(x)) = \frac{1}{g(x)}$ | $\frac{d}{dx}\ln(g(x)) = \frac{g'(x)}{g(x)}$ | $\frac{d}{dx}\ln(x) = \frac{1}{x}$ holds only for a bare $x$; a composite argument requires the chain rule |
| Derivative of constant | $\frac{d}{dx}[c] = c$ | $\frac{d}{dx}[c] = 0$ | Constants have zero rate of change |

⚠️ Double-Check Your Work

After computing a derivative, verify it by plugging in simple values. For example, if you think $\frac{d}{dx}x^2 = 2x$, test at $x=1$: the slope of $x^2$ at $x=1$ should be 2, and the difference quotient $\frac{(1.001)^2 - 1^2}{0.001} = 2.001$ agrees.


Interview Questions

Q1: What is the derivative of $e^{x^2}$?

💡Answer

Using the chain rule with outer function $e^u$ and inner function $u = x^2$:

$$\frac{d}{dx}e^{x^2} = e^{x^2} \cdot 2x = 2xe^{x^2}$$

Q2: Explain the product rule in plain English. When do you use it?

💡Answer

The product rule states that the derivative of a product $f(x) \cdot g(x)$ is $f'(x)g(x) + f(x)g'(x)$. Intuitively, when differentiating a product, either factor could be changing, so you must account for the rate of change of each factor while holding the other constant. Use it whenever two functions of $x$ are multiplied together.

Q3: Why is the sigmoid derivative so important in neural networks?

💡Answer

The sigmoid derivative $\sigma'(x) = \sigma(x)(1-\sigma(x))$ is used during backpropagation to compute how the loss changes with respect to each weight. Its key property is that it can be computed from the forward-pass output $\sigma(x)$ alone, so the backward pass needs only one subtraction and one multiplication rather than another exponential. This makes backpropagation through sigmoid layers cheap. However, for large $|x|$, $\sigma'(x) \approx 0$, which causes the vanishing gradient problem.

Q4: What is the difference between $\frac{d}{dx}[x^x]$ and $\frac{d}{dx}[x^n]$?

💡Answer

  • $\frac{d}{dx}[x^n] = nx^{n-1}$ (power rule, $n$ is constant)
  • $\frac{d}{dx}[x^x] = x^x(\ln x + 1)$ (requires logarithmic differentiation, both base and exponent depend on $x$)

The power rule applies only when the exponent is a constant. When both base and exponent are variables, you must use logarithmic differentiation.

Q5: Prove that if $f'(x) = 0$ for all $x$ in an interval, then $f$ is constant.

💡Answer

By the Mean Value Theorem, for any $a, b$ in the interval with $a < b$, there exists $c \in (a, b)$ such that $f(b) - f(a) = f'(c)(b - a)$. Since $f'(c) = 0$, we get $f(b) - f(a) = 0$, so $f(b) = f(a)$. Since $a$ and $b$ were arbitrary, $f$ is constant on the interval.

Q6: What are the conditions for a critical point to be a local minimum?

💡Answer

For a function $f$ continuous at $c$:

  1. First derivative test: $f'(c) = 0$ (or undefined) and $f'$ changes from negative to positive at $c$.
  2. Second derivative test: $f'(c) = 0$ and $f''(c) > 0$.
  3. Higher-order test: If the first non-zero derivative at $c$ is of even order $n$ and $f^{(n)}(c) > 0$, then $c$ is a local minimum.

Practice Problems

Problem 1: Chain Rule

📝Find the Derivative

Problem: Find $\frac{d}{dx}\ln(\sin x)$.

💡Solution

Let $u = \sin x$, so $y = \ln u$.

$$\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = \frac{1}{\sin x} \cdot \cos x = \cot x$$

Problem 2: Product Rule

📝Find the Derivative

Problem: Find $\frac{d}{dx}[x^2 e^x \sin x]$.

💡Solution

Apply the product rule to $f = x^2$, $g = e^x$, $h = \sin x$:

$$(fgh)' = f'gh + fg'h + fgh'$$

$$= 2x \cdot e^x \cdot \sin x + x^2 \cdot e^x \cdot \sin x + x^2 \cdot e^x \cdot \cos x$$

$$= xe^x(2\sin x + x\sin x + x\cos x)$$

Problem 3: Implicit Differentiation

📝Find the Derivative

Problem: Find $\frac{dy}{dx}$ for $\cos(xy) = x + y$.

💡Solution

Differentiate both sides:

$$-\sin(xy) \cdot \left(y + x\frac{dy}{dx}\right) = 1 + \frac{dy}{dx}$$

$$-y\sin(xy) - x\sin(xy)\frac{dy}{dx} = 1 + \frac{dy}{dx}$$

$$\frac{dy}{dx}\left(-x\sin(xy) - 1\right) = 1 + y\sin(xy)$$

$$\frac{dy}{dx} = -\frac{1 + y\sin(xy)}{1 + x\sin(xy)}$$

Problem 4: Optimization

📝Find the Maximum

Problem: Find the maximum area of a rectangle inscribed in a semicircle of radius $r$.

💡Solution

Place the semicircle on the coordinate plane: $x^2 + y^2 = r^2$, $y \geq 0$.

Rectangle dimensions: width $= 2x$, height $= y = \sqrt{r^2 - x^2}$

$$A(x) = 2x\sqrt{r^2 - x^2}$$

$$A'(x) = 2\sqrt{r^2 - x^2} + 2x \cdot \frac{-x}{\sqrt{r^2 - x^2}} = \frac{2(r^2 - 2x^2)}{\sqrt{r^2 - x^2}}$$

Set $A'(x) = 0$: $r^2 = 2x^2 \implies x = \frac{r}{\sqrt{2}}$

Maximum area $= 2 \cdot \frac{r}{\sqrt{2}} \cdot \frac{r}{\sqrt{2}} = r^2$.

Problem 5: Logarithmic Differentiation

📝Find the Derivative

Problem: Find $\frac{d}{dx}[(\sin x)^{\cos x}]$ for $\sin x > 0$.

💡Solution

Let $y = (\sin x)^{\cos x}$. Take the natural log: $\ln y = \cos x \cdot \ln(\sin x)$.

Differentiate:

$$\frac{1}{y}\frac{dy}{dx} = -\sin x \cdot \ln(\sin x) + \cos x \cdot \frac{\cos x}{\sin x}$$

$$\frac{dy}{dx} = (\sin x)^{\cos x}\left[-\sin x \ln(\sin x) + \frac{\cos^2 x}{\sin x}\right]$$

Quick Reference

📋Key Takeaways

  • Derivative Definition: $f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$ measures the instantaneous rate of change.
  • Power Rule: $\frac{d}{dx}x^n = nx^{n-1}$, the most fundamental differentiation tool.
  • Product Rule: $(fg)' = f'g + fg'$, for products of functions.
  • Quotient Rule: $(\frac{f}{g})' = \frac{f'g - fg'}{g^2}$, for ratios of functions.
  • Chain Rule: $(f(g(x)))' = f'(g(x)) \cdot g'(x)$, the foundation of backpropagation.
  • Second Derivative: $f''(x)$ measures concavity and acceleration; used in optimization.
  • ML Connection: Gradient descent $\theta \leftarrow \theta - \eta \nabla\mathcal{L}$ uses derivatives to minimize loss.
  • Sigmoid Derivative: $\sigma'(x) = \sigma(x)(1-\sigma(x))$, efficient because it reuses the forward pass output.
  • Taylor Series: $f(x) = \sum \frac{f^{(n)}(a)}{n!}(x-a)^n$ approximates functions using derivatives.
  • Common Pitfall: Always apply the chain rule to composite functions. Forgetting it is the #1 mistake.
