Lagrange Multipliers

ℹ️ Why It Matters

Many machine learning problems are inherently constrained: support vector machines maximize a margin subject to classification constraints, neural network training may impose fairness requirements, and regularization adds penalty constraints on model weights. Lagrange multipliers provide the mathematical framework for solving these constrained optimization problems by converting them into unconstrained ones that are far easier to solve with gradient-based methods.

In real-world ML pipelines, unconstrained optimization is the exception, not the rule. Every time you add a constraint — a budget on inference time, a fairness requirement across demographic groups, a maximum norm on weight vectors — you enter the domain of constrained optimization. Lagrange multipliers are the foundational tool that makes these problems tractable, and understanding them is essential for anyone building or analyzing ML systems.

What are Lagrange Multipliers

DfLagrange Multipliers

Lagrange multipliers are a strategy for finding the local maxima and minima of a function subject to equality constraints. The core idea is to introduce a new variable, the Lagrange multiplier $\lambda$ , for each constraint, and then solve a system of equations derived from the gradients of the objective function and the constraints.

Lagrange Multiplier Condition

\nabla f(\vec{x}) = \lambda \nabla g(\vec{x}), \quad g(\vec{x}) = 0

Here,

$f(\vec{x})$ =The objective function to optimize
$g(\vec{x}) = 0$ =The equality constraint
$\lambda$ =The Lagrange multiplier (a scalar)
$\nabla f$ =The gradient of the objective function
$\nabla g$ =The gradient of the constraint function

💡 Intuition

At the constrained optimum, the gradient of the objective must be parallel to the gradient of the constraint. If they were not parallel, there would be a direction along the constraint surface that could improve the objective, contradicting optimality. The multiplier $\lambda$ quantifies how much the optimal value changes when the constraint is relaxed.

Geometric Interpretation

ℹ️ Level Curves and Constraint Surfaces

Consider a 2D problem where we want to optimize $f(x, y)$ subject to $g(x, y) = 0$ . The constraint $g(x, y) = 0$ defines a curve in the plane. The level curves of $f$ (contours where $f$ is constant) form a family of curves. The constrained optimum occurs at the level curve that is tangent to the constraint curve — any level curve that crosses the constraint curve would not be optimal because nearby points on the constraint curve would yield higher or lower values of $f$ .

At the point of tangency:

$\nabla f$ is perpendicular to the level curve of $f$
$\nabla g$ is perpendicular to the constraint curve $g = 0$
Since both gradients are perpendicular to the same tangent line, they must be parallel
Therefore $\nabla f = \lambda \nabla g$ for some scalar $\lambda$

If $\lambda > 0$ , increasing the constraint (relaxing it in the direction of $\nabla g$ ) increases $f$ . If $\lambda < 0$ , relaxing the constraint decreases $f$ . This makes $\lambda$ the sensitivity of the optimal value to the constraint.

Method of Lagrange Multipliers

ThStep-by-Step Method

To optimize $f(\vec{x})$ subject to $g(\vec{x}) = 0$ :

Step 1. Write down the Lagrangian function:

\mathcal{L}(\vec{x}, \lambda) = f(\vec{x}) - \lambda \, g(\vec{x})

Step 2. Take partial derivatives of $\mathcal{L}$ with respect to each variable $x_i$ and $\lambda$ :

\frac{\partial \mathcal{L}}{\partial x_i} = \frac{\partial f}{\partial x_i} - \lambda \frac{\partial g}{\partial x_i} = 0

\frac{\partial \mathcal{L}}{\partial \lambda} = -g(\vec{x}) = 0

Step 3. Solve the resulting system of $n + 1$ equations (where $n$ is the number of variables) for the $n + 1$ unknowns: the $n$ components of $\vec{x}$ and $\lambda$ .

Step 4. Evaluate $f$ at each solution to identify maxima and minima.

⚠️ Important Caveats

The method finds candidates for optima; you must verify which are maxima, minima, or saddle points.
The constraint gradient $\nabla g$ must be nonzero at the solution (regularity condition).
For multiple constraints, add a multiplier per constraint.

Single Constraint Examples

📝Example 1: Rectangle of Maximum Area

Problem: Find the rectangle with the largest area that can be inscribed in a circle of radius $R$ .

💡Solution

Objective: Maximize $f(x, y) = (2x)(2y) = 4xy$ (area of rectangle with half-widths $x, y$ ).

Constraint: $g(x, y) = x^2 + y^2 - R^2 = 0$ (rectangle must lie on the circle).

Lagrangian: $\mathcal{L} = 4xy - \lambda(x^2 + y^2 - R^2)$

System of equations:

\frac{\partial \mathcal{L}}{\partial x} = 4y - 2\lambda x = 0 \implies \lambda = \frac{2y}{x}

\frac{\partial \mathcal{L}}{\partial y} = 4x - 2\lambda y = 0 \implies \lambda = \frac{2x}{y}

\frac{\partial \mathcal{L}}{\partial \lambda} = -(x^2 + y^2 - R^2) = 0

Solving: From the first two equations: $\frac{2y}{x} = \frac{2x}{y} \implies y^2 = x^2 \implies y = x$ (taking positive values).

Substituting into the constraint: $x^2 + x^2 = R^2 \implies x = \frac{R}{\sqrt{2}}$ , $y = \frac{R}{\sqrt{2}}$ .

Maximum area: $4xy = 4 \cdot \frac{R}{\sqrt{2}} \cdot \frac{R}{\sqrt{2}} = 2R^2$ .

The optimal rectangle is a square with side length $R\sqrt{2}$ .

📝Example 2: Closest Point on a Plane

Problem: Find the point on the plane $2x + y - 2z = 4$ closest to the origin.

💡Solution

Objective: Minimize $f(x, y, z) = x^2 + y^2 + z^2$ (squared distance from origin).

Constraint: $g(x, y, z) = 2x + y - 2z - 4 = 0$ .

Lagrangian: $\mathcal{L} = x^2 + y^2 + z^2 - \lambda(2x + y - 2z - 4)$

System:

\frac{\partial \mathcal{L}}{\partial x} = 2x - 2\lambda = 0 \implies x = \lambda

\frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0 \implies y = \frac{\lambda}{2}

\frac{\partial \mathcal{L}}{\partial z} = 2z + 2\lambda = 0 \implies z = -\lambda

2x + y - 2z = 4

Solving: Substituting: $2\lambda + \frac{\lambda}{2} + 2\lambda = 4 \implies \frac{9\lambda}{2} = 4 \implies \lambda = \frac{8}{9}$ .

Solution: $x = \frac{8}{9}$ , $y = \frac{4}{9}$ , $z = -\frac{8}{9}$ .

Minimum distance: $\sqrt{\left(\frac{8}{9}\right)^2 + \left(\frac{4}{9}\right)^2 + \left(\frac{8}{9}\right)^2} = \frac{4\sqrt{5}}{3}$ .

Multiple Constraints

Lagrange Multipliers with Multiple Equality Constraints

\mathcal{L}(\vec{x}, \lambda_1, \ldots, \lambda_m) = f(\vec{x}) - \sum_{j=1}^{m} \lambda_j \, g_j(\vec{x})

Here,

$f(\vec{x})$ =The objective function
$g_j(\vec{x}) = 0$ =The j-th equality constraint (m total)
$\lambda_j$ =The Lagrange multiplier for the j-th constraint

ThMultiple Constraints System

For $n$ variables and $m$ equality constraints (with $m < n$ ), solve:

\nabla f = \sum_{j=1}^{m} \lambda_j \nabla g_j, \quad g_j(\vec{x}) = 0 \text{ for } j = 1, \ldots, m

This gives $n + m$ equations (from the gradient condition and constraints) for $n + m$ unknowns (the $n$ components of $\vec{x}$ and the $m$ multipliers $\lambda_j$ ).

📝Example 3: Two Constraints

Problem: Optimize $f(x, y, z) = x + 2y + 3z$ subject to $g_1: x^2 + y^2 + z^2 = 1$ and $g_2: x + y + z = 0$ .

💡Solution

Lagrangian: $\mathcal{L} = x + 2y + 3z - \lambda_1(x^2 + y^2 + z^2 - 1) - \lambda_2(x + y + z)$

System:

\frac{\partial \mathcal{L}}{\partial x} = 1 - 2\lambda_1 x - \lambda_2 = 0

\frac{\partial \mathcal{L}}{\partial y} = 2 - 2\lambda_1 y - \lambda_2 = 0

\frac{\partial \mathcal{L}}{\partial z} = 3 - 2\lambda_1 z - \lambda_2 = 0

x^2 + y^2 + z^2 = 1, \quad x + y + z = 0

Solving: Subtracting equations pairwise:

(1 - 2\lambda_1 x) - (2 - 2\lambda_1 y) = 0 \implies 2\lambda_1(y - x) = 1 \implies y - x = \frac{1}{2\lambda_1}

(2 - 2\lambda_1 y) - (3 - 2\lambda_1 z) = 0 \implies 2\lambda_1(z - y) = 1 \implies z - y = \frac{1}{2\lambda_1}

So $y - x = z - y$ , meaning $2y = x + z$ . Combined with $x + y + z = 0$ : $3y = 0 \implies y = 0$ , so $z = -x$ .

From the sphere constraint: $x^2 + 0 + x^2 = 1 \implies x = \pm\frac{1}{\sqrt{2}}$ .

Maximum: $f\left(\frac{1}{\sqrt{2}}, 0, -\frac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}} + 0 - \frac{3}{\sqrt{2}} = -\sqrt{2}$ .

Minimum: $f\left(-\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right) = -\frac{1}{\sqrt{2}} + 0 + \frac{3}{\sqrt{2}} = \sqrt{2}$ .

KKT Conditions

DfKarush-Kuhn-Tucker (KKT) Conditions

The KKT conditions generalize Lagrange multipliers to handle inequality constraints. For the problem of minimizing $f(\vec{x})$ subject to $g_i(\vec{x}) \leq 0$ (inequality) and $h_j(\vec{x}) = 0$ (equality), the KKT conditions are necessary conditions for optimality (under constraint qualifications).

ThKKT Conditions

For a minimization problem with inequality constraints $g_i(\vec{x}) \leq 0$ and equality constraints $h_j(\vec{x}) = 0$ , at the optimal point $\vec{x}^*$ :

1. Stationarity:

\nabla f(\vec{x}^*) + \sum_{i} \lambda_i \nabla g_i(\vec{x}^*) + \sum_{j} \mu_j \nabla h_j(\vec{x}^*) = \vec{0}

2. Primal Feasibility:

g_i(\vec{x}^*) \leq 0 \quad \forall \, i, \qquad h_j(\vec{x}^*) = 0 \quad \forall \, j

3. Dual Feasibility:

\lambda_i \geq 0 \quad \forall \, i

4. Complementary Slackness:

\lambda_i \, g_i(\vec{x}^*) = 0 \quad \forall \, i

💡 Complementary Slackness Intuition

Complementary slackness means that for each inequality constraint, either the constraint is active (binding, $g_i = 0$ ) or its multiplier is zero ( $\lambda_i = 0$ ). An inactive constraint ( $g_i < 0$ ) has no influence on the optimum, so its multiplier must be zero. Only active constraints "pull" on the solution, and their multipliers indicate how strongly they pull.

⚠️ Minimization vs Maximization Sign Convention

Different textbooks use different sign conventions. Some write $\nabla f + \sum \lambda_i \nabla g_i = 0$ for minimization (as above), while others write $\nabla f = \sum \lambda_i \nabla g_i$ for maximization. Always check which convention is being used. The sign of $\lambda_i$ must be consistent with the convention and the stationarity condition.

Saddle Points

ℹ️ When Lagrange Multipliers Find Saddle Points

The Lagrange multiplier method finds all stationary points of the Lagrangian, which include maxima, minima, and saddle points. The method itself does not distinguish between these. To classify a candidate solution, you must use the second-order conditions.

ThSecond-Order Conditions for Constrained Optimization

Let $L = \nabla^2_{\vec{x}\vec{x}} \mathcal{L}$ be the Hessian of the Lagrangian with respect to $\vec{x}$ , and $J_g$ be the Jacobian of the constraints.

For a constrained minimum: $L$ must be positive definite on the tangent space of the active constraints.

For a constrained maximum: $L$ must be negative definite on the tangent space of the active constraints.

If $L$ is indefinite on the tangent space, the point is a constrained saddle point.

📝Saddle Point Example

Problem: Optimize $f(x, y) = xy$ subject to $g(x, y) = x^2 + y^2 - 1 = 0$ .

💡Solution

Lagrangian: $\mathcal{L} = xy - \lambda(x^2 + y^2 - 1)$

System:

\frac{\partial \mathcal{L}}{\partial x} = y - 2\lambda x = 0

\frac{\partial \mathcal{L}}{\partial y} = x - 2\lambda y = 0

x^2 + y^2 = 1

From the first two equations: $y = 2\lambda x$ and $x = 2\lambda y$ , so $x = 4\lambda^2 x$ , giving $\lambda = \pm\frac{1}{2}$ .

If $\lambda = \frac{1}{2}$ : $y = x$ , so $2x^2 = 1 \implies x = \pm\frac{1}{\sqrt{2}}$ .
- $(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ : $f = \frac{1}{2}$ (maximum)
- $(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ : $f = \frac{1}{2}$ (maximum)
If $\lambda = -\frac{1}{2}$ : $y = -x$ , so $2x^2 = 1 \implies x = \pm\frac{1}{\sqrt{2}}$ .
- $(x, y) = \left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ : $f = -\frac{1}{2}$ (minimum)
- $(x, y) = \left(-\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ : $f = -\frac{1}{2}$ (minimum)

All four points are genuine extrema (no saddle points on this constraint set). However, without checking second-order conditions, we would not know which are maxima and which are minima.

⚠️ Saddle Points in Practice

In ML, saddle points are common in non-convex optimization landscapes. While Lagrange multipliers can identify them analytically, in practice with high-dimensional neural networks, saddle points are far more prevalent than true local minima. Second-order methods (like Newton's method) can escape saddle points by following negative curvature directions.

Python Implementation

import numpy as np
from scipy.optimize import minimize, minimize_scalar

# ============================================
# Example 1: Single Equality Constraint
# Minimize x^2 + y^2 subject to x + y = 1
# ============================================

def lagrange_single_constraint():
    f = lambda x: x[0]**2 + x[1]**2
    constraint = {'type': 'eq', 'fun': lambda x: x[0] + x[1] - 1}
    
    result = minimize(f, x0=[0.5, 0.5], constraints=constraint)
    print(f"Optimal point: x = {result.x[0]:.4f}, y = {result.x[1]:.4f}")
    print(f"Optimal value: {result.fun:.4f}")
    print(f"Multiplier (approx): lambda = {2 * result.x[0]:.4f}")
    # Analytical solution: x = y = 1/2, lambda = 1

lagrange_single_constraint()

# ============================================
# Example 2: Minimize distance to a plane
# Minimize x^2 + y^2 + z^2 subject to 2x + y - 2z = 4
# ============================================

def closest_point_on_plane():
    f = lambda x: x[0]**2 + x[1]**2 + x[2]**2
    constraint = {'type': 'eq', 'fun': lambda x: 2*x[0] + x[1] - 2*x[2] - 4}
    
    result = minimize(f, x0=[1, 1, 1], constraints=constraint)
    print(f"\nClosest point: ({result.x[0]:.4f}, {result.x[1]:.4f}, {result.x[2]:.4f})")
    print(f"Distance: {np.sqrt(result.fun):.4f}")
    # Analytical: (8/9, 4/9, -8/9), distance = 4*sqrt(5)/3

closest_point_on_plane()

# ============================================
# Example 3: Multiple constraints
# Minimize x^2 + y^2 + z^2 subject to x+y+z=1 and x-y=0
# ============================================

def multiple_constraints():
    f = lambda x: x[0]**2 + x[1]**2 + x[2]**2
    constraints = [
        {'type': 'eq', 'fun': lambda x: x[0] + x[1] + x[2] - 1},
        {'type': 'eq', 'fun': lambda x: x[0] - x[1]}
    ]
    
    result = minimize(f, x0=[1/3, 1/3, 1/3], constraints=constraints)
    print(f"\nOptimal point: ({result.x[0]:.4f}, {result.x[1]:.4f}, {result.x[2]:.4f})")
    print(f"Optimal value: {result.fun:.4f}")

multiple_constraints()

# ============================================
# Example 4: KKT with inequality constraints
# Minimize x^2 + y^2 subject to x + y >= 1
# ============================================

def kkt_inequality():
    f = lambda x: x[0]**2 + x[1]**2
    # Convert >= to <=: -(x + y - 1) <= 0
    constraint = {'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}
    
    result = minimize(f, x0=[0, 0], constraints=constraint)
    print(f"\nKKT optimal point: ({result.x[0]:.4f}, {result.x[1]:.4f})")
    print(f"Optimal value: {result.fun:.4f}")
    # Solution: x = y = 0.5, value = 0.5

kkt_inequality()

Applications in AI/ML

Support Vector Machines (SVM)

ℹ️ SVM and Lagrange Multipliers

The SVM primal problem maximizes the margin $\frac{2}{\|w\|}$ subject to classification constraints. Lagrange multipliers transform this into the dual problem, where each multiplier $\alpha_i$ corresponds to a training point. Points with $\alpha_i > 0$ are the support vectors — the only points that matter for the decision boundary.

SVM Dual Formulation

\max_{\alpha} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \, \vec{x}_i^T \vec{x}_j

Here,

$\alpha_i$ =Lagrange multiplier for the i-th training point
$y_i \in \{-1, +1\}$ =Class label of the i-th point
$\vec{x}_i$ =Feature vector of the i-th point

Regularization as Constraint

💡 L1 and L2 Regularization

Ridge regression (L2) minimizes $\|y - Xw\|^2$ subject to $\|w\|_2^2 \leq t$ . The Lagrangian form is $\|y - Xw\|^2 + \lambda \|w\|_2^2$ . Lasso (L1) uses $\|w\|_1 \leq t$ , leading to $\|y - Xw\|^2 + \lambda \|w\|_1$ . The multiplier $\lambda$ controls the regularization strength — it is the sensitivity of the loss to the constraint radius $t$ .

Fairness Constraints

ℹ️ Fairness in ML via Lagrange Multipliers

Modern fairness-aware ML adds constraints like demographic parity ( $P(\hat{Y}=1|A=0) = P(\hat{Y}=1|A=1)$ ) or equalized odds. These are incorporated as equality or inequality constraints in the optimization problem, and Lagrange multipliers (or KKT conditions) provide the mechanism for solving the resulting constrained problem. The multipliers quantify the "cost of fairness" — how much accuracy is sacrificed per unit of fairness enforcement.

Portfolio Optimization

In finance-inspired ML, the Markowitz mean-variance portfolio optimizes return subject to risk constraints. Lagrange multipliers solve this, and the resulting efficient frontier maps out the tradeoff between risk and return — parameterized by $\lambda$ .

Common Mistakes

Mistake	Why It's Wrong	Correct Approach
Forgetting the constraint equation	Solving $\nabla f = \lambda \nabla g$ alone gives $n$ equations for $n+1$ unknowns	Always include $g(\vec{x}) = 0$ as an equation
Ignoring the sign of $\lambda$	$\lambda > 0$ vs $\lambda < 0$ indicates whether relaxing the constraint helps or hurts	Track the sign; it has physical meaning
Confusing $\mathcal{L} = f + \lambda g$ with $\mathcal{L} = f - \lambda g$	Both conventions work, but mixing them leads to sign errors in $\lambda$	Pick one convention and be consistent
Assuming all critical points are extrema	Lagrange finds stationary points of $\mathcal{L}$ , including saddle points	Verify with second-order conditions
Forgetting complementary slackness (KKT)	With inequality constraints, assuming all $\lambda_i > 0$ is wrong	Check: either $\lambda_i = 0$ or $g_i = 0$
Not checking constraint qualification	If $\nabla g = 0$ at the candidate, the method may fail	Verify regularity (e.g., $\nabla g \neq 0$ )
Using Lagrange for unconstrained problems	Lagrange multipliers are for constrained optimization	Use gradient-based methods for unconstrained problems
Ignoring boundary effects	The Lagrangian method finds interior critical points, not boundary solutions	Check boundary conditions separately if the domain is bounded

Interview Questions

Q1: What is the geometric interpretation of a Lagrange multiplier?

💡Answer

The Lagrange multiplier $\lambda$ measures the rate of change of the optimal value of the objective function with respect to relaxation of the constraint. Specifically, if the constraint is $g(\vec{x}) = c$ , then $\frac{df^*}{dc} = \lambda$ , where $f^*$ is the optimal value. Geometrically, at the constrained optimum, the gradient of the objective is parallel to the gradient of the constraint, and $\lambda$ is the proportionality constant. A large $|\lambda|$ means the constraint is very "expensive" — relaxing it would significantly improve the objective.

Q2: Why does the Lagrange multiplier method work? What is the intuition behind $\nabla f = \lambda \nabla g$ ?

💡Answer

At the constrained optimum, there is no feasible direction that improves the objective. Any movement along the constraint surface $g = 0$ must have zero first-order change in $f$ . The tangent space to the constraint consists of all directions $\vec{v}$ such that $\nabla g \cdot \vec{v} = 0$ . For $f$ to have zero change in all such directions, $\nabla f$ must be orthogonal to this tangent space, meaning $\nabla f$ is parallel to $\nabla g$ . Hence $\nabla f = \lambda \nabla g$ .

Q3: How do Lagrange multipliers relate to the dual problem in SVM?

💡Answer

The SVM primal maximizes the margin subject to classification constraints $y_i(w^T x_i + b) \geq 1$ . Introducing Lagrange multipliers $\alpha_i$ for each constraint and applying the KKT conditions yields the dual problem, which depends only on dot products $\vec{x}_i^T \vec{x}_j$ . The complementary slackness condition implies $\alpha_i > 0$ only for support vectors (points exactly on the margin boundary). The dual formulation enables the kernel trick, allowing nonlinear classification via implicit feature mapping.

Q4: What is complementary slackness and why does it matter?

💡Answer

Complementary slackness states that for each inequality constraint $g_i \leq 0$ , the product $\lambda_i g_i = 0$ . This means either the constraint is active ( $g_i = 0$ , binding) or the multiplier is zero ( $\lambda_i = 0$ , inactive). This matters because it identifies which constraints actually influence the solution. In SVMs, it tells us that only support vectors ( $\alpha_i > 0$ ) lie on the margin boundary; all other points have $\alpha_i = 0$ and can be removed without changing the solution.

Q5: When might Lagrange multipliers fail or give misleading results?

💡Answer

Lagrange multipliers may fail when: (1) the constraint qualification is violated (e.g., $\nabla g = 0$ at the candidate point), meaning the constraint surface is degenerate; (2) the problem is non-smooth (e.g., L1 regularization involves a non-differentiable norm); (3) the problem is non-convex, and the method finds saddle points rather than global optima; (4) the domain is not open (boundary solutions may not satisfy $\nabla f = \lambda \nabla g$ ). In such cases, subgradient methods, penalty methods, or interior-point methods are preferred.

Q6: Explain the relationship between Lagrange multipliers and sensitivity analysis.

💡Answer

The Lagrange multiplier $\lambda$ is the shadow price of the constraint. If the constraint is $g(\vec{x}) = c$ , then $f^*(c + \epsilon) \approx f^*(c) + \lambda \epsilon$ for small $\epsilon$ . In ML, this means $\lambda$ tells you how much the loss would decrease if you slightly relaxed a constraint (e.g., increasing the allowed model complexity, loosening a fairness requirement, or expanding a resource budget). This makes multipliers valuable for resource allocation and cost-benefit analysis.

Practice Problems

Problem 1: Box Optimization

📝Maximum Volume Box

Problem: Find the dimensions of a rectangular box (with no lid) of maximum volume given a surface area of $12 \, \text{cm}^2$ .

💡Solution

Let $x, y$ be the base dimensions and $z$ the height.

Objective: Maximize $f = xyz$ .

Constraint: $g = xy + 2xz + 2yz - 12 = 0$ .

Lagrangian: $\mathcal{L} = xyz - \lambda(xy + 2xz + 2yz - 12)$ .

System:

yz = \lambda(y + 2z), \quad xz = \lambda(x + 2z), \quad xy = \lambda(2x + 2y)

From the first two: $\frac{yz}{y+2z} = \frac{xz}{x+2z} \implies y(x+2z) = x(y+2z) \implies 2yz = 2xz \implies y = x$ .

From the first and third: $\frac{yz}{y+2z} = \frac{xy}{2x+2y}$ . With $y = x$ : $\frac{xz}{x+2z} = \frac{x^2}{4x} = \frac{x}{4}$ .

So $4xz = x(x + 2z) \implies 4z = x + 2z \implies x = 2z$ .

Substituting into constraint: $(2z)(2z) + 2(2z)z + 2(2z)z = 4z^2 + 4z^2 + 4z^2 = 12z^2 = 12 \implies z = 1$ .

Dimensions: $x = 2$ , $y = 2$ , $z = 1$ . Maximum volume: $V = 4$ .

Problem 2: Nearest Point on an Ellipsoid

📝Closest Point on Ellipsoid

Problem: Find the point on the ellipsoid $\frac{x^2}{4} + \frac{y^2}{9} + z^2 = 1$ closest to the origin.

💡Solution

Objective: Minimize $f = x^2 + y^2 + z^2$ .

Constraint: $g = \frac{x^2}{4} + \frac{y^2}{9} + z^2 - 1 = 0$ .

Lagrangian: $\mathcal{L} = x^2 + y^2 + z^2 - \lambda\left(\frac{x^2}{4} + \frac{y^2}{9} + z^2 - 1\right)$ .

System:

2x = \lambda \cdot \frac{x}{2} \implies x\left(2 - \frac{\lambda}{2}\right) = 0

2y = \lambda \cdot \frac{2y}{9} \implies y\left(2 - \frac{2\lambda}{9}\right) = 0

2z = \lambda \cdot 2z \implies z(2 - 2\lambda) = 0

Either each variable is zero or the multiplier equals the bracketed coefficient. Testing combinations:

If $x \neq 0$ : $\lambda = 4$ . Then $2 - \frac{2(4)}{9} = \frac{10}{9} \neq 0$ , so $y = 0$ . And $2 - 8 = -6 \neq 0$ , so $z = 0$ . From constraint: $\frac{x^2}{4} = 1 \implies x = \pm 2$ .

If $y \neq 0$ : $\lambda = 9$ . Then $2 - \frac{9}{2} = -\frac{5}{2} \neq 0$ , so $x = 0$ . And $2 - 18 = -16 \neq 0$ , so $z = 0$ . From constraint: $\frac{y^2}{9} = 1 \implies y = \pm 3$ .

If $z \neq 0$ : $\lambda = 1$ . Then $2 - \frac{1}{2} \neq 0$ , so $x = 0$ . And $2 - \frac{2}{9} \neq 0$ , so $y = 0$ . From constraint: $z^2 = 1 \implies z = \pm 1$ .

Candidates and values:

$(\pm 2, 0, 0)$ : $f = 4$
$(0, \pm 3, 0)$ : $f = 9$
$(0, 0, \pm 1)$ : $f = 1$

Closest point: $(0, 0, \pm 1)$ with distance $1$ .

Problem 3: Constrained Exponential

📝Maximum of Exponential on a Circle

Problem: Find the maximum of $f(x, y) = e^{x+y}$ subject to $x^2 + y^2 = 1$ .

💡Solution

Lagrangian: $\mathcal{L} = e^{x+y} - \lambda(x^2 + y^2 - 1)$ .

System:

e^{x+y} = 2\lambda x, \quad e^{x+y} = 2\lambda y, \quad x^2 + y^2 = 1

From the first two: $2\lambda x = 2\lambda y$ . If $\lambda \neq 0$ : $x = y$ .

From constraint: $2x^2 = 1 \implies x = y = \pm\frac{1}{\sqrt{2}}$ .

$(x, y) = \left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ : $f = e^{\sqrt{2}} \approx 4.113$ (maximum)
$(x, y) = \left(-\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$ : $f = e^{-\sqrt{2}} \approx 0.243$ (minimum)

Maximum value: $e^{\sqrt{2}}$ at $\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$ .

Problem 4: KKT with Two Inequality Constraints

📝KKT Problem

Problem: Minimize $f(x, y) = (x-1)^2 + (y-1)^2$ subject to $x + y \leq 1$ and $x \leq 0$ .

💡Solution

Lagrangian: $\mathcal{L} = (x-1)^2 + (y-1)^2 + \lambda_1(x + y - 1) + \lambda_2(x)$ .

KKT Conditions:

$\frac{\partial \mathcal{L}}{\partial x} = 2(x-1) + \lambda_1 + \lambda_2 = 0$
$\frac{\partial \mathcal{L}}{\partial y} = 2(y-1) + \lambda_1 = 0$
$x + y \leq 1$ , $x \leq 0$ (primal feasibility)
$\lambda_1 \geq 0$ , $\lambda_2 \geq 0$ (dual feasibility)
$\lambda_1(x + y - 1) = 0$ , $\lambda_2 x = 0$ (complementary slackness)

Case 1: Both constraints inactive ( $\lambda_1 = \lambda_2 = 0$ ). Then $2(x-1) = 0$ and $2(y-1) = 0$ , giving $x = 1, y = 1$ . But $x + y = 2 > 1$ , violating feasibility. Infeasible.

Case 2: Only first constraint active ( $\lambda_1 > 0, \lambda_2 = 0, x + y = 1$ ). From conditions 1 and 2: $2(x-1) + \lambda_1 = 0$ and $2(y-1) + \lambda_1 = 0$ . So $x - 1 = y - 1 \implies x = y$ . With $x + y = 1$ : $x = y = \frac{1}{2}$ . Check $x \leq 0$ : $\frac{1}{2} > 0$ , violating feasibility. Infeasible.

Case 3: Only second constraint active ( $\lambda_2 > 0, \lambda_1 = 0, x = 0$ ). From condition 2: $2(y-1) = 0 \implies y = 1$ . Check $x + y = 1 \leq 1$ : feasible. Check $x = 0 \leq 0$ : feasible. From condition 1: $2(0-1) + 0 + \lambda_2 = 0 \implies \lambda_2 = 2 > 0$ . Valid. Solution: $(x, y) = (0, 1)$ , $f = (0-1)^2 + (1-1)^2 = 1$ .

Case 4: Both constraints active ( $x = 0$ , $x + y = 1 \implies y = 1$ ). Same as Case 3.

Optimal solution: $(x, y) = (0, 1)$ with $f^* = 1$ and $\lambda_1 = 0$ , $\lambda_2 = 2$ .

Quick Reference

📋Key Takeaways

Lagrange Multiplier Method: To optimize $f(\vec{x})$ subject to $g(\vec{x}) = 0$ , solve $\nabla f = \lambda \nabla g$ and $g(\vec{x}) = 0$ simultaneously, where $\lambda$ is the multiplier.
Geometric Interpretation: At the optimum, the gradient of the objective is parallel to the gradient of the constraint; $\lambda$ measures the sensitivity of the optimal value to constraint changes.
Multiple Constraints: For $m$ equality constraints, use $m$ multipliers: $\nabla f = \sum_j \lambda_j \nabla g_j$ .
KKT Conditions: For inequality constraints $g_i(\vec{x}) \leq 0$ , the conditions are: stationarity ( $\nabla f + \sum \lambda_i \nabla g_i = 0$ ), primal feasibility ( $g_i \leq 0$ ), dual feasibility ( $\lambda_i \geq 0$ ), and complementary slackness ( $\lambda_i g_i = 0$ ).
Complementary Slackness: Either a constraint is active ( $g_i = 0$ ) or its multiplier is zero ( $\lambda_i = 0$ ) — never both non-zero simultaneously.
SVM Connection: Support vector machines maximize margin $\frac{2}{\|w\|}$ subject to $y_i(w^T x_i + b) \geq 1$ , solved via Lagrange multipliers leading to the dual formulation where only support vectors have $\alpha_i > 0$ .
Regularization: L2 regularization $\lambda\|w\|_2^2$ is the Lagrangian penalty for constraining $\|w\|_2^2 \leq t$ ; the multiplier $\lambda$ trades off fit vs. complexity.
Second-Order Conditions: Use the bordered Hessian to classify constrained optima as minima, maxima, or saddle points.
Python: Use scipy.optimize.minimize() with constraints= parameter for equality and inequality constraints.
Sensitivity Analysis: $\lambda$ quantifies how much the optimal objective changes per unit change in the constraint — useful for resource allocation and cost-benefit analysis.

Cross-References

Optimization Fundamentals: Core optimization theory and unconstrained methods → Optimization Fundamentals
Constrained Optimization: KKT conditions, penalty methods, and interior-point methods → Constrained Optimization
Partial Derivatives: Gradients and partial derivatives needed for the Lagrangian → Partial Derivatives
Multivariable Calculus: Gradients, Jacobians, and Hessians in higher dimensions → Multivariable Calculus
Gradient Descent: Gradient-based optimization that complements Lagrange methods → Gradient Descent
Linear Algebra Norms: Norms used in regularization constraints → Linear Algebra Norms
Matrix Calculus: Derivatives of matrix-valued functions in SVM derivations → Matrix Calculus
Probability Foundations: Probability constraints in ML fairness → Probability Foundations

Lagrange Multipliers