Root Finding
ℹ️ Why It Matters
Root finding is one of the oldest and most fundamental problems in numerical computation. Every optimization algorithm reduces to finding where the gradient equals zero. Engineering simulations solve nonlinear equations at every timestep. Chemical equilibrium, financial pricing, and machine learning all depend on finding zeros of functions that cannot be solved analytically.
💡 Practical Impact
Finding the root of f(x) = 0 is equivalent to minimizing f(x)². Newton's method for root finding generalizes to Newton's method for optimization. The secant method is the backbone of quasi-Newton optimizers like BFGS used in almost every ML framework.
Definitions
DfRoot (Zero) of a Function
A root (or zero) of a function f is a value x* such that f(x*) = 0. Finding roots means solving the equation f(x) = 0, which may be transcendental (e.g., cos(x) = x), polynomial, or any other nonlinear equation.
DfBracket
A bracket is an interval [a, b] where f(a) and f(b) have opposite signs. By the Intermediate Value Theorem, if f is continuous on [a, b] and f(a)·f(b) < 0, then at least one root exists in (a, b).
DfConvergence Order
An iterative root-finding method converges with order p if |x_{n+1} − x*| ≤ C|x_n − x*|^p for some constant C. Order 1 is linear, order 2 is quadratic, order 1.618 (golden ratio) is superlinear (secant method).
DfSimple Root vs Multiple Root
A root x* of multiplicity m satisfies f(x*) = f'(x*) = ... = f^(m-1)(x*) = 0 but f^(m)(x*) ≠ 0. For simple roots (m = 1), Newton's method converges quadratically. For multiple roots, convergence degrades to linear unless modified.
DfFixed Point
A fixed point of a function g is a value x* such that g(x*) = x*. Finding a root of f(x) = 0 is equivalent to finding a fixed point of g(x) = x − f(x), or any g(x) = x + c·f(x) for suitable c.
Formulas
Bisection Method
Here,
- =Current bracket endpoints with f(a)·f(b) < 0
- =Midpoint of the bracket
Bisection Error Bound
Here,
- =Approximation after n iterations
- =True root
- =Initial bracket
Newton-Raphson Method
Here,
- =Current approximation
- =Function value at x_n
- =Derivative at x_n (must be nonzero)
Newton's Method for Square Root
Here,
- =Number whose square root is sought
- =Current approximation of √a
Secant Method
Here,
- =Two most recent approximations
- =Function values at those points
Fixed-Point Iteration
Here,
- =Fixed-point function (rewriting f(x) = 0 as x = g(x))
- =Current approximation
Fixed-Point Convergence Condition
Here,
- =Derivative of the fixed-point function
- =The fixed point
Modified Newton for Multiple Roots
Here,
- =Multiplicity of the root (known or estimated)
Bisection Method
ℹ️ How Bisection Works
Bisection is the simplest guaranteed-convergence method. Start with a bracket [a, b] where f changes sign. Compute the midpoint c = (a+b)/2. If f(c) has the same sign as f(a), the root is in [c, b]; otherwise it is in [a, c]. Repeat until the interval is smaller than the desired tolerance.
Properties
- Guaranteed convergence: Always finds a root if one exists in the bracket
- Linear convergence: Error halved each iteration (order 1)
- Error bound: |x_n − x*| ≤ (b − a) / 2^n
- Derivative-free: Only requires function evaluations
- Drawback: Slow; cannot find roots without a bracket; cannot find multiple roots simultaneously
Newton-Raphson Method
ℹ️ How Newton's Method Works
Newton's method uses the tangent line at x_n to find the next approximation. The tangent line crosses the x-axis at x_{n+1} = x_n − f(x_n)/f'(x_n). This is equivalent to one step of Newton's method for finding the root of the linear approximation.
Properties
- Quadratic convergence: Error squared each iteration near a simple root (order 2)
- Requires derivative: f'(x) must be known and computable
- Requires good initial guess: Can diverge if x_0 is far from the root
- Division by zero: Fails when f'(x_n) = 0
- Multiple roots: Convergence degrades to linear for roots with multiplicity > 1
Secant Method
ℹ️ How the Secant Method Works
The secant method approximates the derivative using two previous points: f'(x_n) ≈ (f(x_n) − f(x_{n-1})) / (x_n − x_{n-1}). Substituting into Newton's formula eliminates the need for derivatives. It requires two initial guesses but converges faster than bisection.
Properties
- Superlinear convergence: Order φ ≈ 1.618 (golden ratio)
- No derivative needed: Uses finite difference approximation
- Two initial points: Requires x_0 and x_1
- Not guaranteed to converge: Can diverge or cycle if initial guesses are poor
- No bracket required: Unlike bisection, does not need a sign change
Fixed-Point Iteration
ℹ️ How Fixed-Point Iteration Works
Rewrite f(x) = 0 as x = g(x) for some function g. Starting from x_0, compute x_{n+1} = g(x_n). Convergence requires |g'(x*)| < 1 near the fixed point. The closer |g'(x*)| is to 0, the faster the convergence.
Choosing g(x)
For f(x) = x³ − x − 2 = 0, possible reformulations:
- g₁(x) = x³ − 2 → g₁'(x) = 3x², |g₁'(1.5)| = 6.75 > 1 (diverges)
- g₂(x) = (x + 2)^(1/3) → g₂'(x) = 1/[3(x+2)^(2/3)], |g₂'(1.5)| = 0.21 < 1 (converges)
- g₃(x) = x − f(x)/f'(x) → This is Newton's method
Examples
📝Finding a Root of x³ − x − 2 = 0
This polynomial has one real root near x ≈ 1.5214. We will apply all four methods and compare convergence.
💡Implementation: All Four Methods Compared
import numpy as np
def bisection(f, a, b, tol=1e-10, max_iter=100):
"""Bisection method with bracket validation."""
if f(a) * f(b) > 0:
raise ValueError("No sign change in [a, b]")
history = []
for i in range(max_iter):
c = (a + b) / 2
fc = f(c)
history.append((i, c, abs(fc)))
if abs(fc) < tol or (b - a) / 2 < tol:
return c, history
if f(a) * fc < 0:
b = c
else:
a = c
return (a + b) / 2, history
def newton(f, df, x0, tol=1e-10, max_iter=100):
"""Newton-Raphson method."""
history = []
x = x0
for i in range(max_iter):
fx = f(x)
dfx = df(x)
history.append((i, x, abs(fx)))
if abs(fx) < tol:
return x, history
if abs(dfx) < 1e-15:
raise RuntimeError("Derivative near zero; method fails")
x = x - fx / dfx
return x, history
def secant(f, x0, x1, tol=1e-10, max_iter=100):
"""Secant method."""
history = []
f0, f1 = f(x0), f(x1)
for i in range(max_iter):
history.append((i, x1, abs(f1)))
if abs(f1) < tol:
return x1, history
if abs(f1 - f0) < 1e-15:
raise RuntimeError("Function values too close")
x_new = x1 - f1 * (x1 - x0) / (f1 - f0)
x0, f0 = x1, f1
x1 = x1 - f1 * (x1 - x0) / (f1 - f0) if False else x_new
f1 = f(x1)
return x1, history
def fixed_point(g, x0, tol=1e-10, max_iter=100):
"""Fixed-point iteration."""
history = []
x = x0
for i in range(max_iter):
x_new = g(x)
history.append((i, x, abs(x_new - x)))
if abs(x_new - x) < tol:
return x_new, history
x = x_new
return x, history
# Define the function and its derivative
f = lambda x: x**3 - x - 2
df = lambda x: 3*x**2 - 1
# Fixed-point function (convergent)
g = lambda x: (x + 2)**(1/3)
print("Root Finding: x^3 - x - 2 = 0")
print("=" * 50)
# Bisection
root_bis, hist_bis = bisection(f, 1, 2)
print(f"Bisection: x = {root_bis:.12f} ({len(hist_bis)} iterations)")
# Newton
root_new, hist_new = newton(f, df, 1.5)
print(f"Newton: x = {root_new:.12f} ({len(hist_new)} iterations)")
# Secant
root_sec, hist_sec = secant(f, 1.0, 2.0)
print(f"Secant: x = {root_sec:.12f} ({len(hist_sec)} iterations)")
# Fixed-point
root_fp, hist_fp = fixed_point(g, 1.5)
print(f"Fixed-pt: x = {root_fp:.12f} ({len(hist_fp)} iterations)")
# Verify
print(f"\nVerification: f({root_new:.12f}) = {f(root_new):.2e}")
📝Convergence Rate Comparison
The convergence order determines how many iterations are needed for a given accuracy. Quadratic convergence (Newton) roughly doubles the number of correct digits each iteration. Linear convergence (bisection) adds a constant number of digits.
💡Convergence Rate Demonstration
import numpy as np
def convergence_analysis():
"""Compare convergence rates of different methods."""
f = lambda x: x**2 - 2 # Root is sqrt(2)
df = lambda x: 2*x
exact = np.sqrt(2)
# Newton's method (quadratic)
x = 1.5
print("Newton's Method (quadratic convergence):")
for i in range(6):
error = abs(x - exact)
print(f" Iter {i}: x = {x:.15f}, error = {error:.2e}")
x = x - f(x) / df(x)
print()
# Secant method (superlinear, order ~1.618)
x0, x1 = 1.0, 2.0
print("Secant Method (superlinear convergence):")
for i in range(6):
error = abs(x1 - exact)
print(f" Iter {i}: x = {x1:.15f}, error = {error:.2e}")
x_new = x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0))
x0, x1 = x1, x_new
print()
# Bisection (linear)
a, b = 1.0, 2.0
print("Bisection Method (linear convergence):")
for i in range(6):
c = (a + b) / 2
error = abs(c - exact)
print(f" Iter {i}: x = {c:.15f}, error = {error:.2e}")
if f(a) * f(c) < 0:
b = c
else:
a = c
convergence_analysis()
📝Newton's Method for Matrix Square Root
Newton's method extends to matrix equations. To find X such that X² = A (matrix square root), the iteration is X_{n+1} = 0.5(X_n + A·X_n⁻¹).
💡Matrix Square Root via Newton's Method
import numpy as np
def matrix_sqrt_newton(A, tol=1e-10, max_iter=100):
"""Newton's method for matrix square root: X² = A."""
X = A.copy().astype(float)
for i in range(max_iter):
X_next = 0.5 * (X + A @ np.linalg.inv(X))
if np.linalg.norm(X_next - X) < tol:
return X_next, i + 1
X = X_next
return X, max_iter
# Example
A = np.array([[2.0, 1.0],
[1.0, 3.0]])
X, iters = matrix_sqrt_newton(A)
print(f"Matrix square root:\n{X}")
print(f"\nVerification X²:\n{X @ X}")
print(f"\nOriginal A:\n{A}")
print(f"\nError: {np.linalg.norm(X @ X - A):.2e}")
print(f"Iterations: {iters}")
Theorems
ThIntermediate Value Theorem (Basis for Bisection)
If f is continuous on [a, b] and f(a) and f(b) have opposite signs (i.e., f(a)·f(b) < 0), then there exists at least one c ∈ (a, b) such that f(c) = 0.
ThNewton's Method Quadratic Convergence
If f is twice continuously differentiable, x* is a simple root (f'(x*) ≠ 0), and x_0 is sufficiently close to x*, then Newton's method converges quadratically: |x_{n+1} − x*| ≤ C|x_n − x*|² where C = |f''(x*)| / (2|f'(x*)|).
ThSecant Method Superlinear Convergence
The secant method converges superlinearly with order p = (1 + √5)/2 ≈ 1.618 (the golden ratio). The error satisfies: |e_{n+1}| ≤ C|e_n|^p where e_n = x_n − x*. This is faster than linear but slower than quadratic.
ThFixed-Point Convergence Theorem
If g is continuously differentiable on [a, b] and |g'(x)| ≤ L < 1 for all x ∈ [a, b], and g maps [a, b] into itself, then the fixed-point iteration x_{n+1} = g(x_n) converges to the unique fixed point x* for any x_0 ∈ [a, b].
ThContractive Mapping Theorem (Banach Fixed-Point Theorem)
If g is a contraction mapping on a complete metric space (i.e., |g(x) − g(y)| ≤ L|x − y| for some L < 1), then g has a unique fixed point x*, and the iteration converges to x* for any starting point.
Python Implementation
import numpy as np
from typing import Callable, Tuple, List, Optional
class RootFinder:
"""Comprehensive root-finding toolkit."""
@staticmethod
def bisection(f: Callable, a: float, b: float,
tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, List]:
"""Bisection method with full history."""
if f(a) * f(b) > 0:
raise ValueError("No sign change in [a, b]")
if f(a) * f(b) == 0:
return a if f(a) == 0 else b, []
history = []
for i in range(max_iter):
c = (a + b) / 2
fc = f(c)
history.append({'iter': i, 'x': c, 'f(x)': fc,
'interval': (a, b), 'width': b - a})
if abs(fc) < tol or (b - a) / 2 < tol:
return c, history
if f(a) * fc < 0:
b = c
else:
a = c
return (a + b) / 2, history
@staticmethod
def newton(f: Callable, df: Callable, x0: float,
tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, List]:
"""Newton-Raphson method."""
history = []
x = x0
for i in range(max_iter):
fx = f(x)
dfx = df(x)
history.append({'iter': i, 'x': x, 'f(x)': fx, 'df(x)': dfx})
if abs(fx) < tol:
return x, history
if abs(dfx) < 1e-15:
raise RuntimeError(f"Derivative near zero at x={x}")
x = x - fx / dfx
return x, history
@staticmethod
def secant(f: Callable, x0: float, x1: float,
tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, List]:
"""Secant method."""
history = []
f0, f1 = f(x0), f(x1)
for i in range(max_iter):
history.append({'iter': i, 'x': x1, 'f(x)': f1})
if abs(f1) < tol:
return x1, history
if abs(f1 - f0) < 1e-15:
raise RuntimeError("Denominator too small")
x_new = x1 - f1 * (x1 - x0) / (f1 - f0)
x0, f0 = x1, f1
x1 = x_new
f1 = f(x1)
return x1, history
@staticmethod
def fixed_point(g: Callable, x0: float,
tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, List]:
"""Fixed-point iteration."""
history = []
x = x0
for i in range(max_iter):
x_new = g(x)
history.append({'iter': i, 'x': x, 'x_new': x_new,
'diff': abs(x_new - x)})
if abs(x_new - x) < tol:
return x_new, history
x = x_new
return x, history
@staticmethod
def brentq(f: Callable, a: float, b: float,
tol: float = 1e-10, max_iter: int = 100) -> Tuple[float, List]:
"""Brent's method — combines bisection with inverse quadratic interpolation."""
if f(a) * f(b) > 0:
raise ValueError("No sign change")
fa, fb = f(a), f(b)
history = []
for i in range(max_iter):
if abs(fb) < tol:
return b, history
# Try inverse quadratic interpolation
c, fc = a, fa
d = b - a
e = d
if abs(fa) > abs(fb):
a, b = b, a
fa, fb = fb, fa
m = 0.5 * (a - b)
if abs(m) < tol or fb == 0:
return b, history
if abs(e) >= tol and abs(fa) > abs(fb):
s = fb / fa
if a == c:
# Secant
p = 2 * m * s
q = 1 - s
else:
# Inverse quadratic
q = fa / fc
r = fb / fc
p = s * (2 * m * q * (q - r) - (b - a) * (r - 1))
q = (q - 1) * (r - 1) * (s - 1)
if p > 0:
q = -q
else:
p = -p
if 2 * p < min(3 * m * q - tol * abs(q),
abs(e * q)):
e = d
d = p / q
else:
d = m
e = m
else:
d = m
e = m
a, fa = b, fb
if abs(d) > tol:
b = b + d
else:
b = b - tol if m > 0 else b + tol
fb = f(b)
history.append({'iter': i, 'x': b, 'f(x)': fb})
return b, history
def demonstrate_methods():
"""Compare all methods on several test functions."""
print("=" * 60)
print("ROOT FINDING METHOD COMPARISON")
print("=" * 60)
tests = [
("x^3 - x - 2 = 0",
lambda x: x**3 - x - 2,
lambda x: 3*x**2 - 1,
(1, 2), 1.5),
("cos(x) - x = 0",
lambda x: np.cos(x) - x,
lambda x: -np.sin(x) - 1,
(0, 1), 0.5),
("e^x - 3 = 0",
lambda x: np.exp(x) - 3,
lambda x: np.exp(x),
(0, 2), 1.0),
]
finder = RootFinder()
for name, f, df, bracket, x0 in tests:
print(f"\n--- {name} ---")
root, _ = finder.bisection(f, bracket[0], bracket[1])
print(f" Bisection: {root:.12f}")
root, _ = finder.newton(f, df, x0)
print(f" Newton: {root:.12f}")
root, _ = finder.secant(f, bracket[0], bracket[1])
print(f" Secant: {root:.12f}")
demonstrate_methods()
Applications in AI/ML
ℹ️ Root Finding in Machine Learning
Root finding is foundational to optimization. Every gradient-based optimizer finds where ∇f(x) = 0. Newton's method for optimization uses the Hessian: x_{n+1} = x_n − H⁻¹∇f(x_n). The secant method generalizes to quasi-Newton methods (BFGS, L-BFGS) that approximate the Hessian.
Optimization Connection
| Root Finding | Optimization Equivalent |
|---|---|
| Find x where f(x) = 0 | Find x where ∇f(x) = 0 |
| Newton: x − f/f' | Newton: x − H⁻¹∇f |
| Secant approximation | BFGS Hessian approximation |
| Bisection | Golden section search |
Specific Applications
- Logistic regression: Solving for weights requires finding roots of the gradient
- Newton's method in neural networks: Second-order optimizers use Hessian inverse
- Brent's method in scipy.optimize: Used for line search in L-BFGS
- Fixed-point iteration: Expectation-Maximization (EM) algorithm is a fixed-point iteration
- Eigenvalue problems: Finding eigenvalues reduces to finding roots of det(A − λI) = 0
Common Mistakes
| Mistake | Problem | Solution |
|---|---|---|
| No bracket for bisection | May miss the root entirely | Verify f(a)·f(b) < 0 before starting |
| Starting Newton far from root | Method may diverge | Use bisection to get close, then switch to Newton |
| Ignoring f'(x) = 0 in Newton | Division by zero crashes | Check |
| Using fixed-point iteration blindly | May diverge if | g'(x*) |
| Not checking convergence | May return early or iterate forever | Use both |
| Assuming unique root | May find wrong root | Use bracketing methods to isolate roots first |
| Too tight tolerance | More iterations than needed | Set tolerance relative to problem scale |
| Ignoring floating point issues | Cancellation in secant method | Use robust methods like Brent's when possible |
Interview Questions
Q1: When would you use bisection vs Newton's method? A: Use bisection when you need guaranteed convergence and have a bracket, or when derivatives are unavailable. Use Newton's when you have a good initial guess and the derivative, for faster convergence. In practice, use a hybrid: bisection to bracket, Newton to refine.
Q2: Why does Newton's method fail for multiple roots? A: At a root of multiplicity m, f'(x*) = 0, so the tangent line is nearly horizontal. The iteration becomes x_{n+1} = x_n − m·f(x_n)/f'(x_n), converging linearly instead of quadratically. The modification requires knowing m, which is often unknown.
Q3: Explain the secant method's convergence order. A: The secant method has order p = (1+√5)/2 ≈ 1.618. This is proven by analyzing the error recurrence e_{n+1} ≈ C·e_n·e_{n-1}, which leads to the characteristic equation p² = p + 1, giving the golden ratio. It is faster than linear (bisection) but slower than quadratic (Newton).
Q4: How does Brent's method combine bisection and interpolation? A: Brent's method uses inverse quadratic interpolation when three points are available and the interpolation step stays within the current bracket. If the interpolation step would leave the bracket or fails to reduce the error sufficiently, it falls back to bisection. This gives the robustness of bisection with the speed of higher-order methods.
Q5: Give an example where fixed-point iteration diverges. A: For x² − 2 = 0, rewriting as g(x) = x² + 2 gives |g'(√2)| = 2√2 ≈ 2.83 > 1, so iteration diverges. Rewriting as g(x) = 2/x gives |g'(√2)| = 2/2 = 1, which is borderline. Only g(x) = (x + 2/x)/2 (Newton's method) has |g'(√2)| = 0, converging fastest.
Q6: How is root finding used in line search for optimization? A: Line search finds α that minimizes f(x + αd). This requires finding where d/dα f(x + αd) = 0, a one-dimensional root-finding problem. Brent's method is commonly used because it combines guaranteed convergence with speed. Wolfe conditions provide stopping criteria that ensure sufficient decrease.
Practice Problems
📝Problem 1: Find All Roots of a Polynomial
Find all real roots of x⁴ − 5x² + 4 = 0 using a combination of bracketing and Newton's method.
💡Solution
import numpy as np
def find_all_roots(f, df, search_range=(-10, 10), n_points=1000):
"""Find all roots by scanning for sign changes then refining with Newton."""
roots = []
x_vals = np.linspace(search_range[0], search_range[1], n_points)
f_vals = [f(x) for x in x_vals]
for i in range(len(f_vals) - 1):
if f_vals[i] * f_vals[i+1] < 0:
# Sign change found — use bisection to bracket, then Newton
a, b = x_vals[i], x_vals[i+1]
c = (a + b) / 2
for _ in range(50):
fc = f(c)
if abs(fc) < 1e-12:
break
if f(a) * fc < 0:
b = c
else:
a = c
c = (a + b) / 2
# Refine with Newton
x = c
for _ in range(20):
x = x - f(x) / df(x)
roots.append(x)
return roots
f = lambda x: x**4 - 5*x**2 + 4
df = lambda x: 4*x**3 - 10*x
roots = find_all_roots(f, df)
print(f"Found roots: {[f'{r:.6f}' for r in roots]}")
print(f"Values: {[f'{f(r):.2e}' for r in roots]}")
# Expected: ±1, ±2
📝Problem 2: Newton's Method with Backtracking
Implement Newton's method with backtracking line search to ensure convergence even with poor initial guesses.
💡Solution
import numpy as np
def newton_backtracking(f, df, x0, alpha=0.5, beta=0.5,
tol=1e-10, max_iter=100):
"""Newton's method with backtracking line search."""
x = x0
for i in range(max_iter):
fx = f(x)
dfx = df(x)
if abs(fx) < tol:
return x, i
# Newton direction
dx = -fx / dfx
# Backtracking line search
t = 1.0
while abs(f(x + t * dx)) > abs(fx) - 1e-4 * t * abs(dfx * dx):
t *= beta
if t < 1e-15:
break
x = x + t * dx
return x, max_iter
# Test on a challenging function
f = lambda x: np.log(x) - 1 # Root at e
df = lambda x: 1/x
root, iters = newton_backtracking(f, df, x0=10.0) # Far from root
print(f"Root: {root:.10f} (e = {np.e:.10f})")
print(f"Iterations: {iters}")
Quick Reference
📋Root Finding Quick Reference
| Method | Convergence | Derivative | Bracket | Initial Guesses |
|---|---|---|---|---|
| Bisection | Linear (order 1) | No | Required | 1 (bracket) |
| Newton | Quadratic (order 2) | Yes | No | 1 |
| Secant | Superlinear (order 1.618) | No | No | 2 |
| Fixed-point | Linear (order depends on g') | No | No | 1 |
| Brent's | Superlinear + robust | No | Required | 1 (bracket) |
Error bounds:
- Bisection after n iterations: |error| ≤ (b − a) / 2^n
- Newton near simple root: |e_{n+1}| ≤ C|e_n|²
- Secant: |e_{n+1}| ≤ C|e_n|^1.618
When to use each:
- Bisection: Guaranteed convergence, no derivatives, have bracket
- Newton: Fast convergence, derivative available, good initial guess
- Secant: No derivative, faster than bisection, can risk divergence
- Brent's: Production use — robust + fast
Cross-References
- 086-Floating-Point — Rounding errors affect convergence criteria; machine epsilon determines achievable tolerance
- 088-Integration — Quadrature methods use function evaluations subject to floating point error
- 090-Error Analysis — Condition numbers determine sensitivity of root locations to perturbations in f
- Optimization — Root finding of ∇f = 0 is the foundation of optimization; Newton's method generalizes to Newton's method for optimization
- Linear Algebra — Eigenvalue computation requires finding roots of the characteristic polynomial
- Differential Equations — Implicit ODE solvers require solving nonlinear equations at each timestep