Master definite and indefinite integrals, Fundamental Theorem of Calculus, integration techniques, numerical methods, and applications in probability and machine learning.
Integration · Lesson 28 of 100 · Free Course
Integration Fundamentals
Note: Why It Matters
Integration is the reverse operation of differentiation and one of the two pillars of calculus. While derivatives measure rates of change, integrals accumulate quantities: areas under curves, volumes of solids, total probability, and expected values. In machine learning, integration is indispensable: probability density functions are normalized so that they integrate to 1, expected values are computed as integrals, Bayesian inference requires marginalizing over parameters or latent variables, and variational inference approximates intractable integrals. Every time you compute the probability that a continuous random variable falls in a range, you are evaluating an integral. Every time you estimate an expectation by sampling from a distribution, you are approximating one. Mastering integration means understanding the mathematical foundation behind probability, statistics, and the training of probabilistic models.
What is an Integral
Definition: Indefinite Integral (Antiderivative)
The indefinite integral of a function $f(x)$ is the family of functions $F(x)$ whose derivative is $f(x)$. It is written $\int f(x)\,dx = F(x) + C$, where $C$ is an arbitrary constant of integration: any two antiderivatives of $f$ on an interval differ only by a constant.
Definition: Definite Integral
The definite integral of $f(x)$ from $a$ to $b$ is the signed area between the curve $y = f(x)$ and the x-axis over the interval $[a, b]$. It is defined as the limit of Riemann sums:
$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \sum_{i=1}^{n} f(x_i^*)\,\Delta x, \qquad \Delta x = \frac{b - a}{n},$$
where $x_i^*$ is any sample point in the $i$-th subinterval.
An indefinite integral produces a family of functions (F(x)+C), while a definite integral produces a number (the signed area). The Fundamental Theorem of Calculus connects them: the definite integral equals the antiderivative evaluated at the bounds.
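The Riemann-sum definition can be tried out numerically. The sketch below is illustrative only: the helper name `riemann_sum` and the test integrand $x^2$ on $[0, 1]$ are assumptions chosen for this example, not part of the lesson.

```python
import numpy as np

def riemann_sum(f, a, b, n, rule="midpoint"):
    """Approximate the integral of f over [a, b] with n equal-width rectangles."""
    dx = (b - a) / n
    left_edges = a + dx * np.arange(n)
    # Where inside each subinterval the sample point x_i* is taken.
    offsets = {"left": 0.0, "midpoint": 0.5, "right": 1.0}
    sample_points = left_edges + offsets[rule] * dx
    return float(np.sum(f(sample_points)) * dx)

# Illustrative integrand: x^2 on [0, 1], whose exact integral is 1/3.
for n in (10, 100, 1000):
    approx = riemann_sum(lambda x: x**2, 0.0, 1.0, n)
    print(f"n={n:5d}  midpoint sum={approx:.8f}  error={abs(approx - 1/3):.2e}")
```

As `n` grows the sums settle toward a single number, which is the sense in which a definite integral is a limit rather than a function.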
Warning: Not All Functions Have Elementary Antiderivatives
Many common functions, such as $e^{-x^2}$, $\frac{\sin x}{x}$, and $\sqrt{\sin x}$, do not have closed-form antiderivatives in terms of elementary functions. For these, we rely on numerical integration or special functions (e.g., the error function $\operatorname{erf}(x)$).
Fundamental Theorem of Calculus
Theorem: Fundamental Theorem of Calculus (Part 1)
If $f$ is continuous on $[a, b]$ and $F(x) = \int_a^x f(t)\,dt$, then $F$ is differentiable on $(a, b)$ and $F'(x) = f(x)$. In other words, differentiation undoes integration.
Theorem: Fundamental Theorem of Calculus (Part 2)
If $f$ is continuous on $[a, b]$ and $F$ is any antiderivative of $f$ (i.e., $F'(x) = f(x)$), then:
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
Here,
$f(x)$: The integrand, the function being integrated
$F(x)$: An antiderivative of $f$, where $F'(x) = f(x)$
$a$: The lower limit of integration
$b$: The upper limit of integration
$F(b) - F(a)$: The net change of $F$ over $[a, b]$
Note: Intuition
Part 1 says that if you build a function by integrating $f$ from a fixed point $a$ to a variable point $x$, the rate at which this accumulated area grows is exactly $f(x)$. Part 2 says that to compute the accumulated area, you only need any antiderivative: evaluate it at the top bound and subtract its value at the bottom bound. This transforms the hard problem of summing infinitely many infinitesimal contributions into the easy problem of evaluating a function at two points.
Example: Applying the Fundamental Theorem
Problem: Compute $\int_0^2 (3x^2 + 2x)\,dx$.
Solution
Find an antiderivative: $F(x) = x^3 + x^2$ (since $F'(x) = 3x^2 + 2x$).
Evaluate at the bounds: $F(2) - F(0) = (8 + 4) - (0 + 0) = 12$.
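A quick symbolic check of this example, assuming SymPy is available. The variable names are illustrative; note that `sp.integrate` returns one antiderivative and leaves out the constant $C$.

```python
import sympy as sp

x = sp.Symbol("x")
f = 3 * x**2 + 2 * x

F = sp.integrate(f, x)                    # one antiderivative (no "+ C")
net_change = F.subs(x, 2) - F.subs(x, 0)  # Part 2: F(b) - F(a)
direct = sp.integrate(f, (x, 0, 2))       # definite integral computed directly

print(f"F(x) = {F}")
print(f"F(2) - F(0) = {net_change}, direct definite integral = {direct}")
```

Both routes should agree with the hand computation above.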
Non-negativity
If $f(x) \ge 0$ on $[a, b]$, then $\int_a^b f(x)\,dx \ge 0$
Non-negative functions have non-negative integrals
Comparison
If $f(x) \le g(x)$ on $[a, b]$, then $\int_a^b f(x)\,dx \le \int_a^b g(x)\,dx$
Inequality preserved under integration
Triangle Inequality
$\left|\int_a^b f(x)\,dx\right| \le \int_a^b |f(x)|\,dx$
The integral of the absolute value bounds the absolute value of the integral
Basic Integration Rules
The following rules allow us to integrate complex functions by breaking them into simpler parts.
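| Rule | Formula | Example |
| --- | --- | --- |
| Constant | $\int c\,dx = cx + C$ | $\int 5\,dx = 5x + C$ |
| Power Rule | $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ ($n \ne -1$) | $\int x^3\,dx = \frac{x^4}{4} + C$ |
| Reciprocal | $\int \frac{1}{x}\,dx = \ln\lvert x\rvert + C$ | $\int \frac{1}{x}\,dx = \ln\lvert x\rvert + C$ |
| Exponential | $\int e^x\,dx = e^x + C$ | $\int e^x\,dx = e^x + C$ |
| General Exponential | $\int a^x\,dx = \frac{a^x}{\ln a} + C$ | $\int 3^x\,dx = \frac{3^x}{\ln 3} + C$ |
| Sine | $\int \sin x\,dx = -\cos x + C$ | $\int \sin x\,dx = -\cos x + C$ |
| Cosine | $\int \cos x\,dx = \sin x + C$ | $\int \cos x\,dx = \sin x + C$ |
| Secant Squared | $\int \sec^2 x\,dx = \tan x + C$ | $\int \sec^2 x\,dx = \tan x + C$ |
| Cosecant Squared | $\int \csc^2 x\,dx = -\cot x + C$ | $\int \csc^2 x\,dx = -\cot x + C$ |
| Secant-Tangent | $\int \sec x\tan x\,dx = \sec x + C$ | $\int \sec x\tan x\,dx = \sec x + C$ |
| Cosecant-Cotangent | $\int \csc x\cot x\,dx = -\csc x + C$ | $\int \csc x\cot x\,dx = -\csc x + C$ |
| Inverse Sine | $\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C$ | $\int \frac{1}{\sqrt{1-x^2}}\,dx = \arcsin x + C$ |
| Inverse Tangent | $\int \frac{1}{1+x^2}\,dx = \arctan x + C$ | $\int \frac{1}{1+x^2}\,dx = \arctan x + C$ |
| Hyperbolic Sine | $\int \sinh x\,dx = \cosh x + C$ | $\int \sinh x\,dx = \cosh x + C$ |
| Hyperbolic Cosine | $\int \cosh x\,dx = \sinh x + C$ | $\int \cosh x\,dx = \sinh x + C$ |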
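Tip: Power Rule Exception
The power rule $\int x^n\,dx = \frac{x^{n+1}}{n+1} + C$ fails when $n = -1$ because the denominator $n + 1$ becomes zero. In that case, $\int x^{-1}\,dx = \int \frac{1}{x}\,dx = \ln|x| + C$, since $\frac{d}{dx}\ln|x| = \frac{1}{x}$.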
Integration by Substitution
Definition: Integration by Substitution (u-substitution)
The substitution rule is the reverse of the chain rule for differentiation. If $u = g(x)$ is a differentiable function whose range is an interval, and $f$ is continuous on that interval, then:
Substitution Rule
$$\int f(g(x))\,g'(x)\,dx = \int f(u)\,du \quad \text{where } u = g(x)$$
Here,
$u = g(x)$: The substitution, the inner function
$du = g'(x)\,dx$: The differential of $u$
$f(g(x))\,g'(x)$: The original integrand with chain rule factor
Tip: When to Use Substitution
Use substitution when the integrand contains a function and its derivative (or a constant multiple of it). Look for an "inner function" $g(x)$ whose derivative $g'(x)$ appears as a factor. Common patterns: $\int \cos(\sin x)\cos x\,dx$, $\int x e^{x^2}\,dx$, $\int \frac{x}{x^2+1}\,dx$.
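Substitution Example 1
Problem: Compute $\int 2x\cos(x^2)\,dx$.
Solution
Let $u = x^2$, so $du = 2x\,dx$.
$\int 2x\cos(x^2)\,dx = \int \cos(u)\,du = \sin(u) + C = \sin(x^2) + C$.
Substitution Example 2
Problem: Compute $\int_0^{\pi/2} \sin^3(x)\cos(x)\,dx$.
Solution
Let $u = \sin(x)$, so $du = \cos(x)\,dx$.
When $x = 0$, $u = 0$. When $x = \pi/2$, $u = 1$.
$\int_0^1 u^3\,du = \left[\frac{u^4}{4}\right]_0^1 = \frac{1}{4}$.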
Warning: Don't Forget to Change the Bounds
When performing substitution on a definite integral, you must either change the limits of integration to match the new variable or back-substitute before evaluating. Forgetting to update the bounds is one of the most common errors.
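To see what goes wrong when the bounds are not updated, the sketch below evaluates $\int_0^{\pi/2}\sin^3(x)\cos(x)\,dx$ three ways with `scipy.integrate.quad`. The variable names are illustrative, and the third computation is the deliberate mistake.

```python
import numpy as np
from scipy import integrate

# Original integral in x: sin^3(x) * cos(x) on [0, pi/2].
in_x, _ = integrate.quad(lambda x: np.sin(x)**3 * np.cos(x), 0.0, np.pi / 2)

# After u = sin(x): integrand u^3, bounds mapped to u(0) = 0 and u(pi/2) = 1.
in_u, _ = integrate.quad(lambda u: u**3, 0.0, 1.0)

# The mistake: new integrand, but still the old x-bounds.
wrong, _ = integrate.quad(lambda u: u**3, 0.0, np.pi / 2)

print(f"in x: {in_x:.6f}   in u (new bounds): {in_u:.6f}   old bounds: {wrong:.6f}")
```

The first two values agree at $1/4$; the third does not, because it integrates $u^3$ over an interval that $u$ never covers.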
Integration by Parts
Definition: Integration by Parts
Integration by parts is the reverse of the product rule for differentiation. It is used to integrate products of functions:
Integration by Parts
$$\int u\,dv = uv - \int v\,du$$
Here,
$u$: The part you differentiate (choose something that simplifies)
$dv$: The part you integrate (choose something easy to integrate)
$du$: The derivative of $u$
$v$: The antiderivative of $dv$
Definite Integral Version
$$\int_a^b u\,dv = [uv]_a^b - \int_a^b v\,du$$
Here,
$[uv]_a^b$: The boundary term, $u(b)v(b) - u(a)v(a)$
$\int_a^b v\,du$: The remaining integral (often simpler)
Tip: LIATE Rule for Choosing u
A helpful heuristic for choosing $u$: Logarithmic → Inverse trig → Algebraic → Trigonometric → Exponential. Choose $u$ as whichever comes first in this list. The remaining factor becomes $dv$.
Integration by Parts Example 1
Problem: Compute $\int x e^x\,dx$.
Solution
Let $u = x$ (algebraic), $dv = e^x\,dx$ (exponential). Then $du = dx$, $v = e^x$.
$\int x e^x\,dx = x e^x - \int e^x\,dx = x e^x - e^x + C = (x - 1)e^x + C$.
When integration by parts returns you to the original integral (as happens with $\int e^x \sin x\,dx$ after two applications), you can solve for the integral algebraically. This "cyclic" pattern occurs with products of exponentials and trigonometric functions.
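For the cyclic case $I = \int e^x \sin x\,dx$, applying parts twice gives $I = e^x\sin x - e^x\cos x - I$, so $I = \tfrac{1}{2}e^x(\sin x - \cos x) + C$. The following SymPy sketch checks that closed form by differentiating it; the names `by_hand`, `residual`, and `difference` are illustrative.

```python
import sympy as sp

x = sp.Symbol("x")
integrand = sp.exp(x) * sp.sin(x)

# Closed form obtained by applying parts twice and solving for the integral.
by_hand = sp.exp(x) * (sp.sin(x) - sp.cos(x)) / 2

# Differentiating the candidate antiderivative should return the integrand.
residual = sp.simplify(sp.diff(by_hand, x) - integrand)

# SymPy's own antiderivative may differ from by_hand by at most a constant.
difference = sp.simplify(sp.integrate(integrand, x) - by_hand)

print(f"d/dx(by_hand) - integrand = {residual}")
print(f"SymPy result - by_hand    = {difference}")
```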
Common Integrals
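| Integral | Result | Notes |
| --- | --- | --- |
| $\int x^n\,dx$ | $\frac{x^{n+1}}{n+1} + C$ | $n \ne -1$ |
| $\int \frac{1}{x}\,dx$ | $\ln\lvert x\rvert + C$ | The $n = -1$ case |
| $\int e^x\,dx$ | $e^x + C$ | Its own antiderivative |
| $\int a^x\,dx$ | $\frac{a^x}{\ln a} + C$ | $a > 0$, $a \ne 1$ |
| $\int \sin x\,dx$ | $-\cos x + C$ | |
| $\int \cos x\,dx$ | $\sin x + C$ | |
| $\int \sec^2 x\,dx$ | $\tan x + C$ | |
| $\int \csc^2 x\,dx$ | $-\cot x + C$ | |
| $\int \sec x\tan x\,dx$ | $\sec x + C$ | |
| $\int \csc x\cot x\,dx$ | $-\csc x + C$ | |
| $\int \frac{1}{\sqrt{1-x^2}}\,dx$ | $\arcsin x + C$ | |
| $\int \frac{1}{1+x^2}\,dx$ | $\arctan x + C$ | |
| $\int \tan x\,dx$ | $\ln\lvert \sec x\rvert + C$ | Rewrite as $\int \frac{\sin x}{\cos x}\,dx$ |
| $\int \sec x\,dx$ | $\ln\lvert \sec x + \tan x\rvert + C$ | |
| $\int \frac{1}{x^2+a^2}\,dx$ | $\frac{1}{a}\arctan\frac{x}{a} + C$ | |
| $\int \frac{1}{\sqrt{a^2-x^2}}\,dx$ | $\arcsin\frac{x}{a} + C$ | |
| $\int \frac{1}{x^2-a^2}\,dx$ | $\frac{1}{2a}\ln\left\lvert \frac{x-a}{x+a}\right\rvert + C$ | Partial fractions |
| $\int \frac{1}{\sqrt{x^2 \pm a^2}}\,dx$ | $\ln\lvert x + \sqrt{x^2 \pm a^2}\rvert + C$ | |
| $\int \sinh x\,dx$ | $\cosh x + C$ | |
| $\int \cosh x\,dx$ | $\sinh x + C$ | |
| $\int \operatorname{sech}^2 x\,dx$ | $\tanh x + C$ | |
| $\int \operatorname{csch}^2 x\,dx$ | $-\coth x + C$ | |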
Improper Integrals
Definition: Improper Integral
An improper integral is an integral where either the interval of integration is infinite or the integrand has an infinite discontinuity (vertical asymptote) within the interval. We evaluate it as a limit, for example $\int_a^{\infty} f(x)\,dx = \lim_{b \to \infty} \int_a^b f(x)\,dx$.
An improper integral converges if the limit exists and is finite. It diverges if the limit does not exist or is infinite. When an integral is improper at more than one point (for instance, over $(-\infty, \infty)$), split it into parts; each part must converge independently for the full integral to converge.
The p-test: $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges if and only if $p > 1$. This is a quick way to determine convergence for power-type integrands. For example, $\int_1^{\infty} \frac{1}{x^{1.01}}\,dx$ converges (just barely), while $\int_1^{\infty} \frac{1}{x^{0.99}}\,dx$ diverges.
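The "just barely" behaviour is easier to appreciate with numbers. The sketch below uses the closed form $\int_1^b x^{-p}\,dx = \frac{b^{1-p} - 1}{1 - p}$ (valid for $p \ne 1$) and lets the upper limit $b$ grow. The helper name `tail_integral` and the chosen values of $b$ are assumptions for illustration.

```python
def tail_integral(p, b):
    """Integral of 1/x^p from 1 to b, from the closed form (requires p != 1)."""
    return (b ** (1.0 - p) - 1.0) / (1.0 - p)

for p in (1.01, 0.99):
    values = [tail_integral(p, 10.0 ** k) for k in (2, 6, 12, 24, 300)]
    print(f"p = {p}: " + ", ".join(f"{v:.3f}" for v in values))
```

For $p = 1.01$ the partial integrals creep up toward the finite limit $1/(p-1) = 100$, while for $p = 0.99$ they keep growing without bound, even though the two sequences look similar for moderate $b$.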
When an antiderivative cannot be found in closed form, or when the integrand is defined only by data points, we approximate the integral numerically.
Trapezoidal Rule
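Definition: Trapezoidal Rule
Approximates the area under the curve by dividing the interval into $n$ subintervals of width $h = \frac{b-a}{n}$ and approximating each strip as a trapezoid:
$$\int_a^b f(x)\,dx \approx \frac{h}{2}\left[f(x_0) + 2\sum_{i=1}^{n-1} f(x_i) + f(x_n)\right]$$
The error is proportional to $h^2$ (second-order method).
Definition: Simpson's Rule
Approximates the area using parabolic arcs instead of straight lines. Requires an even number of subintervals. The error is proportional to $h^4$ (fourth-order method), making it significantly more accurate than the trapezoidal rule for smooth functions.
$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 4\sum_{i\ \text{odd}} f(x_i) + 2\sum_{i\ \text{even}} f(x_i) + f(x_n)\right]$$
Here,
$h = \frac{b-a}{n}$: Width of each subinterval ($n$ must be even)
$x_i = a + ih$: The $i$-th grid point
$4\sum_{i\ \text{odd}}$: Weight 4 for odd-indexed points
$2\sum_{i\ \text{even}}$: Weight 2 for even-indexed points (except endpoints)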
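Both rules are short to write out by hand. The sketch below assumes the standard composite formulas (weights 1, 2, ..., 2, 1 scaled by $h/2$ for trapezoids; weights 1, 4, 2, ..., 4, 1 scaled by $h/3$ for Simpson). The function names and the test integrand $\sin x$ on $[0, \pi]$ are illustrative choices.

```python
import numpy as np

def trapezoid_rule(f, a, b, n):
    """Composite trapezoidal rule with n equal subintervals."""
    y = f(np.linspace(a, b, n + 1))
    h = (b - a) / n
    return float(h * (0.5 * y[0] + np.sum(y[1:-1]) + 0.5 * y[-1]))

def simpson_rule(f, a, b, n):
    """Composite Simpson's rule; n must be even."""
    if n % 2 != 0:
        raise ValueError("Simpson's rule needs an even number of subintervals")
    y = f(np.linspace(a, b, n + 1))
    h = (b - a) / n
    odd = np.sum(y[1:-1:2])    # interior points with odd index: 1, 3, ..., n-1
    even = np.sum(y[2:-1:2])   # interior points with even index: 2, 4, ..., n-2
    return float(h / 3.0 * (y[0] + 4.0 * odd + 2.0 * even + y[-1]))

# Exact value of the integral of sin(x) over [0, pi] is 2.
for n in (10, 20, 40):
    t_err = abs(trapezoid_rule(np.sin, 0.0, np.pi, n) - 2.0)
    s_err = abs(simpson_rule(np.sin, 0.0, np.pi, n) - 2.0)
    print(f"n={n:3d}  trapezoid error={t_err:.2e}  Simpson error={s_err:.2e}")
```

Doubling `n` halves $h$, so the trapezoid error should shrink by roughly a factor of 4 and the Simpson error by roughly a factor of 16, matching the $h^2$ and $h^4$ orders.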
Gaussian Quadrature
Definition: Gaussian Quadrature
A higher-order numerical integration method that chooses both the nodes and the weights optimally. An $n$-point Gaussian quadrature rule integrates polynomials of degree up to $2n - 1$ exactly. It is particularly efficient for smooth functions.
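The exactness claim can be checked with Gauss-Legendre nodes from `numpy.polynomial.legendre.leggauss`, which returns nodes and weights on $[-1, 1]$. The wrapper `gauss_legendre` below, including its affine rescaling to $[a, b]$, is a minimal sketch written for this example.

```python
import numpy as np

def gauss_legendre(f, a, b, n):
    """n-point Gauss-Legendre quadrature of f over [a, b]."""
    nodes, weights = np.polynomial.legendre.leggauss(n)  # defined on [-1, 1]
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (a + b)
    # Map the nodes from [-1, 1] to [a, b]; the Jacobian is half_width.
    return float(half_width * np.sum(weights * f(midpoint + half_width * nodes)))

n = 3  # exact for polynomials of degree up to 2n - 1 = 5
for degree in (5, 6):
    approx = gauss_legendre(lambda x: x**degree, 0.0, 1.0, n)
    exact = 1.0 / (degree + 1)
    print(f"degree {degree}: error = {abs(approx - exact):.2e}")
```

With three nodes the degree-5 monomial is integrated to round-off accuracy, while degree 6 shows a visible error.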
| Method | Order of Accuracy | Best For | Nodes Required |
| --- | --- | --- | --- |
| Trapezoidal | $O(h^2)$ | Rough data, quick estimates | Uniform grid |
| Simpson's | $O(h^4)$ | Smooth functions, moderate precision | Uniform grid (even $n$) |
| Gaussian | $O(h^{2n})$ | High-precision integration of smooth functions | Optimally placed nodes |
| Monte Carlo | $O(1/\sqrt{n})$ | High-dimensional integrals | Random samples |
Tip: Monte Carlo Integration in High Dimensions
For integrals over high-dimensional spaces (common in Bayesian inference and physics simulations), grid-based methods suffer from the curse of dimensionality: the number of grid points grows exponentially with dimension. Monte Carlo integration converges at rate $O(1/\sqrt{n})$ in the number of samples $n$ regardless of dimension, which usually makes it the practical choice for high-dimensional integrals.
Python Implementation
Basic Integration with scipy.integrate
import numpy as np
from scipy import integrate

# Define the function to integrate
def f(x):
    return x**2 * np.exp(-x)

# Compute the integral from 0 to infinity (improper integral)
result, error = integrate.quad(f, 0, np.inf)
print("Integral of x^2 * exp(-x) from 0 to inf:")
print(f"  Result: {result:.6f}")
print(f"  Error estimate: {error:.2e}")
print("  Exact (2! = 2): 2.000000")

# Definite integral of sin(x) from 0 to pi
result, error = integrate.quad(np.sin, 0, np.pi)
print(f"\nIntegral of sin(x) from 0 to pi: {result:.6f} (exact: 2)")
Numerical Methods Comparison
import numpy as np
from scipy import integrate

# Test function f(x) = x^2 on [0, 1]; the exact integral is 1/3
f = lambda x: x**2
a, b = 0, 1
exact = 1/3

# Trapezoidal rule (scipy.integrate.trapezoid; np.trapz is deprecated in NumPy 2.0)
for n in [10, 100, 1000]:
    x = np.linspace(a, b, n + 1)
    trap_result = integrate.trapezoid(f(x), x=x)
    print(f"Trapezoidal (n={n:4d}): {trap_result:.8f}  error: {abs(trap_result - exact):.2e}")
print()

# Simpson's rule (exact for polynomials up to degree 3, so only round-off remains here)
for n in [10, 100, 1000]:
    x = np.linspace(a, b, n + 1)
    simp_result = integrate.simpson(f(x), x=x)
    print(f"Simpson's   (n={n:4d}): {simp_result:.8f}  error: {abs(simp_result - exact):.2e}")
print()

# Gaussian quadrature: fixed_quad applies n-point Gauss-Legendre and returns (value, None)
for n in [5, 10, 20]:
    result, _ = integrate.fixed_quad(f, a, b, n=n)
    print(f"Gauss quad  (n={n:4d}): {result:.8f}  error: {abs(result - exact):.2e}")

# scipy.integrate.quad (adaptive); the second return value is an error estimate
result, error = integrate.quad(f, a, b)
print(f"\nAdaptive quad: {result:.8f}  estimated error: {error:.2e}")
Symbolic Integration with SymPy
import sympy as sp
x = sp.Symbol('x')
# Symbolic indefinite integral
f = x**2 * sp.exp(x)
F = sp.integrate(f, x)
print(f"Indefinite integral of x^2 * e^x: {F}")
# Definite integral
result = sp.integrate(x**2, (x, 0, 1))
print(f"Definite integral of x^2 from 0 to 1: {result}")
# Improper integral
result = sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo))
print(f"Integral of e^(-x^2) from -inf to inf: {result}")
print(f" = sqrt(pi) = {sp.sqrt(sp.pi)}")
# Verify Fundamental Theorem
F = sp.integrate(sp.sin(x), x)
print(f"\nAntiderivative of sin(x): {F}")
print(f"FTC: F(pi) - F(0) = {F.subs(x, sp.pi) - F.subs(x, 0)}")
High-Dimensional Integration (Monte Carlo)
import numpy as np

def monte_carlo_integrate(f, bounds, n_samples=100_000, seed=None):
    """Monte Carlo integration over an axis-aligned box in any dimension."""
    rng = np.random.default_rng(seed)
    low = np.array([b[0] for b in bounds], dtype=float)
    high = np.array([b[1] for b in bounds], dtype=float)
    # Draw uniform samples inside the box, one row per sample
    samples = rng.uniform(low, high, size=(n_samples, len(bounds)))
    values = np.array([f(s) for s in samples])
    volume = np.prod(high - low)
    estimate = volume * np.mean(values)
    # Standard error of the estimate; it is scaled by the volume as well
    std_err = volume * np.std(values, ddof=1) / np.sqrt(n_samples)
    return estimate, std_err

# Example: integral of x^2 + y^2 over [0,1] x [0,1]
f = lambda p: p[0]**2 + p[1]**2
result, error = monte_carlo_integrate(f, [(0, 1), (0, 1)])
exact = 2/3  # 1/3 from the x^2 term plus 1/3 from the y^2 term
print(f"Monte Carlo estimate: {result:.6f} +/- {error:.6f}")
print(f"Exact value: {exact:.6f}")
Applications in AI/ML
Probability Density Functions
Note: Integration and Probability
The entire foundation of continuous probability rests on integration. A probability density function $f_X(x)$ must satisfy $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ (normalization), and the probability of any event is computed as an integral of the density.
Probability from PDF
$$P(a \le X \le b) = \int_a^b f_X(x)\,dx$$
Here,
$f_X(x)$: The probability density function of the random variable $X$
$a, b$: The interval bounds
$P(a \le X \le b)$: The probability that $X$ falls in $[a, b]$
Normalization Condition
$$\int_{-\infty}^{\infty} f_X(x)\,dx = 1$$
Here,
$f_X(x)$: A valid PDF must integrate to 1 over its support
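Both conditions can be checked numerically for a concrete density. The sketch below uses the standard normal PDF from `scipy.stats.norm` as an illustrative choice and compares the integral over $[-1, 1]$ with the same probability obtained from the CDF.

```python
import numpy as np
from scipy import integrate, stats

pdf = stats.norm(loc=0.0, scale=1.0).pdf  # standard normal density

# Normalization: the density integrates to 1 over the whole real line.
total, _ = integrate.quad(pdf, -np.inf, np.inf)

# P(-1 <= X <= 1) as an integral of the density.
prob, _ = integrate.quad(pdf, -1.0, 1.0)

# The same probability from the cumulative distribution function.
via_cdf = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)

print(f"total mass = {total:.8f}")
print(f"P(-1 <= X <= 1) = {prob:.8f} (from CDF: {via_cdf:.8f})")
```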
Expected Values and Moments
Expected Value
$$E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$$
Here,
$E[X]$: The expected value (mean) of $X$
$x\,f_X(x)$: Each value weighted by its probability density
$E[g(X)] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx$: The expected value of $g(X)$, a density-weighted average of $g$
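As a small illustration, the moments of an exponential distribution can be computed directly from the integral definition. The rate `lam = 2.0` and the helper `expectation` are assumptions for this sketch; the known closed forms are $E[X] = 1/\lambda$ and $\operatorname{Var}(X) = 1/\lambda^2$.

```python
import numpy as np
from scipy import integrate

lam = 2.0  # illustrative rate parameter

def pdf(x):
    """Density of Exp(lam) on x >= 0."""
    return lam * np.exp(-lam * x)

def expectation(g):
    """E[g(X)] as the integral of g(x) * pdf(x) over the support [0, inf)."""
    value, _ = integrate.quad(lambda x: g(x) * pdf(x), 0.0, np.inf)
    return value

mean = expectation(lambda x: x)
second_moment = expectation(lambda x: x**2)
variance = second_moment - mean**2

print(f"E[X] = {mean:.6f}, E[X^2] = {second_moment:.6f}, Var(X) = {variance:.6f}")
```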
Bayesian Inference
Note: Integrals in Bayesian Methods
Bayesian inference is built on integration. The posterior distribution requires the evidence (marginal likelihood), which is an integral: $p(\theta \mid \text{data}) = \frac{p(\text{data} \mid \theta)\,p(\theta)}{\int p(\text{data} \mid \theta)\,p(\theta)\,d\theta}$. This integral is often intractable, which is why methods like MCMC, variational inference, and the Laplace approximation exist: each of them approximates or sidesteps this integral.
Evidence (Marginal Likelihood)
$$p(\text{data}) = \int p(\text{data} \mid \theta)\,p(\theta)\,d\theta$$
Here,
$p(\text{data} \mid \theta)$: The likelihood of the data given parameters
$p(\theta)$: The prior distribution over parameters
$p(\text{data})$: The evidence, obtained by integrating over all parameter values
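In one dimension the evidence integral is tractable, which makes it a useful sanity check. The sketch below assumes a hypothetical coin-flip data set (7 heads in a specific sequence of 10 flips) with a Beta(2, 2) prior; for this conjugate pair the evidence has the closed form $B(a + k,\, b + n - k) / B(a, b)$.

```python
from scipy import integrate, special, stats

k, n = 7, 10      # hypothetical data: 7 successes in 10 Bernoulli trials
a, b = 2.0, 2.0   # hypothetical Beta prior parameters

def likelihood(theta):
    """Probability of one specific sequence with k successes out of n."""
    return theta**k * (1.0 - theta)**(n - k)

def prior(theta):
    return stats.beta.pdf(theta, a, b)

# Evidence: integrate likelihood * prior over the parameter range [0, 1].
evidence, _ = integrate.quad(lambda t: likelihood(t) * prior(t), 0.0, 1.0)
closed_form = special.beta(a + k, b + n - k) / special.beta(a, b)

# Dividing by the evidence turns likelihood * prior into a normalized posterior.
posterior_mass, _ = integrate.quad(
    lambda t: likelihood(t) * prior(t) / evidence, 0.0, 1.0
)

print(f"evidence (quad) = {evidence:.6e}, closed form = {closed_form:.6e}")
print(f"posterior integrates to {posterior_mass:.6f}")
```

With many parameters the same integral has no such shortcut, which is the situation the approximate methods above are designed for.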
Common Probability Integrals
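| Distribution | PDF | Key Integral |
| --- | --- | --- |
| Normal $N(\mu, \sigma^2)$ | $\frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ | $\int_{-\infty}^{\infty} f(x)\,dx = 1$ |
| Standard Normal | $\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$ | $\int_{-\infty}^{\infty} \frac{e^{-x^2/2}}{\sqrt{2\pi}}\,dx = 1$ |
| Exponential $\text{Exp}(\lambda)$ | $\lambda e^{-\lambda x}$, $x \ge 0$ | $\int_0^{\infty} \lambda e^{-\lambda x}\,dx = 1$ |
| Beta $\text{Beta}(\alpha, \beta)$ | $\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$ | $\int_0^1 f(x)\,dx = 1$ |
| Gamma $\text{Gamma}(\alpha, \beta)$ | $\frac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$ | $\int_0^{\infty} f(x)\,dx = 1$ |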
Common Mistakes
| Mistake | Incorrect | Correct | Explanation |
| --- | --- | --- | --- |
| Forgetting the constant of integration | $\int x^2\,dx = \frac{x^3}{3}$ | $\int x^2\,dx = \frac{x^3}{3} + C$ | Always include $+C$ for indefinite integrals |
| Wrong sign on trig integral | $\int \sin x\,dx = \cos x + C$ | $\int \sin x\,dx = -\cos x + C$ | The integral of sine is negative cosine |
| Power rule on $1/x$ | $\int \frac{1}{x}\,dx = \frac{x^0}{0} + C$ | $\int \frac{1}{x}\,dx = \ln\lvert x\rvert + C$ | Power rule fails at $n = -1$; use $\ln\lvert x\rvert$ |
| Not changing bounds on substitution | Evaluate with old bounds after u-sub | Update bounds when substituting | If $u = g(x)$, new bounds are $g(a)$ and $g(b)$ |
| Dropping absolute value in ln | $\int \frac{1}{x}\,dx = \ln(x) + C$ | $\int \frac{1}{x}\,dx = \ln\lvert x\rvert + C$ | $\frac{1}{x}$ is defined for $x < 0$ too |
| Confusing integration with differentiation rules | Treating integrals like derivatives | Integration follows different rules | E.g., $\int \sin x\cos x\,dx = \tfrac{1}{2}\sin^2 x + C$, not $\sin x \cdot \sin x$ |
| Forgetting boundary term in integration by parts | $\int u\,dv = -\int v\,du$ | $\int u\,dv = uv - \int v\,du$ | The $uv$ term is essential |
| Divergent improper integrals | Assuming $\int_1^{\infty} \frac{1}{x^p}\,dx$ always converges | Converges only for $p > 1$ | Check convergence before evaluating |
| Wrong $du$ in substitution | Forgetting to include the derivative | $du = g'(x)\,dx$, not just $du = dx$ | The differential $du$ must include the derivative |
| Splitting integrals incorrectly | $\int_{-1}^{1} \frac{1}{x}\,dx = 0$ (by symmetry) | $\int_{-1}^{1} \frac{1}{x}\,dx$ diverges | The integrand has a singularity at $x = 0$ |
Warning: Double-Check Your Work
After computing an integral, verify by differentiating your answer. If $\int f(x)\,dx = F(x) + C$, then $F'(x)$ should equal $f(x)$. For definite integrals, check boundary values and sign consistency.
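The differentiate-to-verify habit is easy to automate. The helper `is_antiderivative` below is a hypothetical convenience function written for this sketch; it relies on SymPy simplifying the difference $F'(x) - f(x)$ to zero, which works for simple expressions like these but is not guaranteed for every input.

```python
import sympy as sp

x = sp.Symbol("x")

def is_antiderivative(F, f):
    """Return True if dF/dx simplifies to f."""
    return sp.simplify(sp.diff(F, x) - f) == 0

# The sign error from the table of mistakes is caught immediately.
print(is_antiderivative(-sp.cos(x), sp.sin(x)))       # correct antiderivative
print(is_antiderivative(sp.cos(x), sp.sin(x)))        # wrong sign
print(is_antiderivative(sp.sin(x)**2 / 2, sp.sin(x) * sp.cos(x)))
```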
Interview Questions
Q1: State the Fundamental Theorem of Calculus and explain its significance.
Answer
The Fundamental Theorem of Calculus has two parts:
Part 1: If $F(x) = \int_a^x f(t)\,dt$ and $f$ is continuous, then $F'(x) = f(x)$. This shows differentiation undoes integration.
Part 2: If $F$ is any antiderivative of $f$, then $\int_a^b f(x)\,dx = F(b) - F(a)$. This allows us to compute definite integrals using antiderivatives.
Significance: It connects the two branches of calculus (differential and integral), transforms the problem of computing areas into evaluating antiderivatives, and provides the theoretical foundation for the entire field.
Q2: When would you use integration by parts vs. substitution?
Answer
Substitution is used when the integrand contains a composition of functions where the inner function's derivative appears as a factor (reverse of the chain rule). Example: $\int 2x\cos(x^2)\,dx$.
Integration by parts is used for products of different types of functions (reverse of the product rule). Example: $\int x e^x\,dx$.
A good heuristic: if you see an "inner function times its derivative" pattern, use substitution. If you see a product of different function types (algebraic × exponential, algebraic × trig, etc.), use integration by parts. The LIATE rule helps choose $u$.
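Q3: Why does $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$? Why is this important in ML?
Answer
This is the Gaussian integral. It cannot be computed by finding an antiderivative (since $e^{-x^2}$ has no elementary antiderivative). The classic proof squares the integral and converts to polar coordinates:
$$I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = 2\pi \cdot \frac{1}{2} = \pi, \quad \text{so } I = \sqrt{\pi}.$$
Importance in ML: This integral gives the normalization constant for the Gaussian distribution, one of the most widely used distributions in statistics and ML. It ensures $\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/(2\sigma^2)}\,dx = 1$, which is required for any valid probability distribution.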
Q4: Explain the difference between convergence and divergence for improper integrals. Give an example of each.
Answer
An improper integral converges if the limit exists and is finite; it diverges if the limit does not exist or is infinite.
Convergent example: $\int_1^{\infty} \frac{1}{x^2}\,dx = \lim_{b \to \infty} \left[-\frac{1}{x}\right]_1^b = 1$. The area is finite despite the infinite interval.
Divergent example: $\int_1^{\infty} \frac{1}{x}\,dx = \lim_{b \to \infty} \ln b = \infty$. The area grows without bound.
Key insight: The p-test says $\int_1^{\infty} \frac{1}{x^p}\,dx$ converges iff $p > 1$. Near a singularity the condition flips: $\int_0^1 \frac{1}{x^p}\,dx$ converges iff $p < 1$.
Q5: How is numerical integration used when analytical solutions are unavailable?
Answer
When the antiderivative cannot be expressed in closed form (e.g., $e^{-x^2}$, $\frac{\sin x}{x}$, or integrands defined only by data), we use numerical methods:
Trapezoidal rule: Approximate area with trapezoids. Simple, $O(h^2)$ convergence. Good for rough data.
Simpson's rule: Use parabolic arcs. $O(h^4)$ convergence. Better for smooth functions.
Gaussian quadrature: Optimally placed nodes. Integrates polynomials of degree up to $2n - 1$ exactly with $n$ points.
Monte Carlo integration: Random sampling. Converges at only $O(1/\sqrt{n})$, but the rate does not depend on dimension, so it is usually the practical method for high-dimensional integrals (critical in Bayesian inference).
Adaptive quadrature: Automatically refines the grid where the integrand is difficult. scipy.integrate.quad uses this.
Q6: What role does integration play in training probabilistic models?
Answer
Integration is essential in several aspects of probabilistic model training:
Normalization: Every PDF must integrate to 1. For complex models, this constant is often intractable.
Marginalization: To compute $p(x) = \int p(x, \theta)\,d\theta$, we integrate out latent variables.
Evidence computation: $p(\text{data}) = \int p(\text{data} \mid \theta)\,p(\theta)\,d\theta$ is needed for model comparison and Bayesian model selection.
Expected loss: $E[L] = \int L(\theta)\,p(\theta \mid \text{data})\,d\theta$, the loss averaged over the posterior.
Variational inference: Approximates intractable integrals with tractable ones by minimizing KL divergence.
MCMC: Draws samples from posteriors by constructing Markov chains whose stationary distribution is the target, which is a way to estimate integrals via sampling.
Q7: Prove that $\int_0^1 x^n\,dx = \frac{1}{n+1}$ using the definition of the integral.
Answer
Using the Riemann sum with a uniform partition into $n$ subintervals of width $\Delta x = 1/n$: