Quadratic Programming
💡 Why It Matters
Quadratic programming (QP) is the bridge between linear programming and general nonlinear optimization. Nearly every machine learning model with a quadratic loss function—linear regression, SVMs, logistic regression regularization, portfolio optimization, and model predictive control—reduces to solving a QP. Mastering QP gives you the foundation for understanding how convex optimization solvers work under the hood.
ℹ️ Core Insight
A QP generalizes linear programming by allowing a quadratic objective while keeping constraints linear. If the quadratic term is positive semi-definite, the problem is convex and any local minimum is a global minimum.
Quadratic Programming: Standard Form
DfQuadratic Program (Standard Form)
Quadratic Programming — Standard Form
Here,
- =Symmetric n×n matrix defining the quadratic curvature
- =Linear cost vector in ℝⁿ
- =Decision variable vector
- =Inequality constraint matrix (m×n)
- =Inequality constraint right-hand side
- =Equality constraint matrix (p×n)
- =Equality constraint right-hand side
⚠️ Convention
The factor ½ in front of xᵀQx is conventional—it simplifies the gradient to Qx + c rather than 2Qx + c. Always check whether a formulation uses this convention or not.
Positive Definite Hessian
ThConvexity of QP via Positive Definite Hessian
Hessian of the QP Objective
Here,
- =The Hessian matrix — constant for QP, unlike general nonlinear problems
- =Linear term determines the gradient at x = 0
💡 Why This Matters
Because the Hessian is constant, second-order methods (Newton's method) converge in a single step for unconstrained QPs. For constrained QPs, this structure enables efficient interior-point and active-set algorithms.
Active Set Methods
DfActive Set Method
Active Set Iteration
Here,
- =Search direction from solving the equality-constrained QP with current active set
- =Step length, typically 1 (full step) or limited by blocking constraints
ℹ️ Complexity
Active set methods are efficient for small to medium QPs (n < 10,000). The number of iterations is bounded by the number of constraints combinatorially, but in practice far fewer iterations are needed.
Interior Point Methods
Interior Point (Barrier) Formulation
Here,
- =Barrier parameter — drives log-barrier term; decreased toward 0 in the central path
- =Slack variables converting inequalities to equalities
- =Log-barrier enforces sᵢ > 0; approaches boundary as μ → 0
Central Path and Newton Step
Here,
- =Diagonal matrix of slack variables s
- =Diagonal matrix of Lagrange multipliers λ
- =Vector of all ones
- =Barrier parameter (decreased each iteration)
💡 Practical Performance
Interior point methods solve QPs in O(n³·L) time where L is a logarithmic complexity measure. Modern implementations (MOSEK, Gurobi) are highly optimized and handle QPs with millions of variables.
Wolfe Dual
Lagrangian and Wolfe Dual of QP
Here,
- =Lagrange multipliers for inequality constraints (λ ≥ 0)
- =Lagrange multipliers for equality constraints (unrestricted)
Wolfe Dual Problem
Here,
- =Substitution variable: z = Qx + c + Aᵀλ + A_{eq}ᵀν = 0 at optimality
- =Inverse of Q — requires Q to be non-singular (positive definite)
ℹ️ Dual Insight
The Wolfe dual converts the QP into another QP in dual variables. Strong duality holds when Q ≻ 0 and constraints satisfy Slater's condition, meaning the primal and dual optima are equal.
Ridge Regression as QP
Ridge Regression — QP Formulation
Here,
- =n×d feature matrix
- =n×1 target vector
- =Regularization strength (controls bias-variance tradeoff)
- =Weight vector to optimize
Ridge Regression — Closed-Form Solution
Here,
- =Modified Gram matrix — always invertible for λ > 0
- =Regularization term ensures positive definiteness even if XᵀX is singular
💡 QP Structure
Ridge regression is a QP with Q = XᵀX + λI (positive definite), c = -Xᵀy, and no constraints. The regularization term λI shifts the Hessian eigenvalues away from zero, making the problem strictly convex and numerically stable.
SVM as QP
SVM Primal (Hard Margin)
Here,
- =Weight vector — ||w||⁻¹ is the margin width
- =Bias / decision threshold
- =Class label: +1 or -1
- =Feature vector of sample i
SVM Primal (Soft Margin)
Here,
- =Penalty parameter — trades off margin size vs. misclassification
- =Slack variable for sample i (allows soft-margin violations)
SVM Dual
Here,
- =Dual variables — nonzero values identify support vectors
- =Upper bound on each αᵢ (box constraint)
- =Kernel trick replaces this with K(xᵢ, xⱼ) for nonlinear SVM
ℹ️ Key Insight
The SVM dual is a QP with Q_ij = yᵢyⱼxᵢᵀxⱼ. Only support vectors (αᵢ > 0) contribute to the decision function. This sparsity is why SVMs are memory-efficient at prediction time.
Python Implementation: cvxpy
import cvxpy as cp
import numpy as np
# Generate data for a QP: min 0.5*x'*Q*x + c'*x
np.random.seed(42)
n = 5
Q = np.random.randn(n, n)
Q = Q.T @ Q # Make positive definite
c = np.random.randn(n)
# Decision variable
x = cp.Variable(n)
# Objective
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)
# Constraints
A = np.random.randn(3, n)
b = np.ones(3)
constraints = [A @ x <= b, cp.sum(x) == 1]
# Solve
prob = cp.Problem(objective, constraints)
prob.solve()
print(f"Optimal value: {prob.value:.4f}")
print(f"Optimal x: {x.value}")
print(f"Status: {prob.status}")
print(f"Solver: {prob.solver_stats.solver_name}")
Python Implementation: scipy
from scipy.optimize import minimize
import numpy as np
np.random.seed(42)
n = 5
Q = np.random.randn(n, n)
Q = Q.T @ Q
c = np.random.randn(n)
# Unconstrained QP via scipy
def objective(x):
return 0.5 * x @ Q @ x + c @ x
def gradient(x):
return Q @ x + c
def hessian(x):
return Q
x0 = np.zeros(n)
result = minimize(objective, x0, jac=gradient, hess=hessian, method='trust-constr')
print(f"Optimal value: {result.fun:.4f}")
print(f"Optimal x: {result.x}")
print(f"Converged: {result.success}")
# Constrained QP with bounds
from scipy.optimize import minimize
bounds = [(0, 1) for _ in range(n)]
eq_constraint = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}
result2 = minimize(objective, x0, jac=gradient, hess=hessian,
method='SLSQP', bounds=bounds, constraints=[eq_constraint])
print(f"Constrained optimal: {result2.fun:.4f}")
Applications in AI / Machine Learning
💡 Applications
QP appears throughout machine learning and AI — not just in SVMs but anywhere a quadratic loss is minimized subject to constraints.
📝Application: Model Predictive Control (MPC)
MPC solves a QP at each timestep to find optimal control actions over a finite horizon. The quadratic cost penalizes deviations from reference trajectories, and constraints enforce actuator limits and safety bounds. Real-time QP solvers ( qpOASES, HPIPM ) enable MPC for autonomous vehicles and robotics.
📝Application: Portfolio Optimization (Markowitz)
Markowitz Mean-Variance Optimization
Here,
- =Portfolio weight vector
- =Covariance matrix of asset returns
- =Expected return vector
- =Minimum required return
The covariance matrix Σ is positive semi-definite, making this a QP. The efficient frontier is obtained by varying r_{min}.
📝Application: Least Squares and Fitting
Any least squares problem min ‖Ax - b‖² is a QP with Q = AᵀA, c = -Aᵀb. Non-negative least squares (NNLS) adds x ≥ 0 constraints. Basis pursuit and compressed sensing add L1 terms to create QPs or second-order cone programs.
Common Mistakes
| Mistake | Why It Fails | Correct Approach |
|---|---|---|
| Using a non-symmetric Q | Gradient is ½(Q + Qᵀ)x, not Qx | Symmetrize: Q ← ½(Q + Qᵀ) before solving |
| Ignoring positive semi-definiteness | Non-convex QP may have many local minima | Verify Q ≽ 0 via eigenvalue decomposition; use global solver if not |
| Confusing ½ convention | Gradient off by factor of 2 | Always verify: ∇(½xᵀQx) = Qx, not 2Qx |
| Not scaling constraints | Poorly scaled A, b cause numerical issues | Normalize rows of A so ‖aᵢ‖ ≈ 1 |
| Assuming unbounded feasible region | QP may be infeasible or unbounded | Check feasibility before solving; add bounds |
| Using QP for non-quadratic objectives | QP solver assumes quadratic curvature | Use nonlinear solver (IPOPT, SNOPT) for general NLP |
| Ignoring warm-starting | Repeated QPs (MPC, online learning) waste time | Use previous solution as warm start for next solve |
Interview Questions
📝Q: Why must Q be positive semi-definite for convex QP?
If Q has a negative eigenvalue λ with eigenvector v, then f(tv) = ½t²λ‖v‖² + t·cᵀv → -∞ as t → ∞ along v. The problem becomes unbounded with no finite minimum. Positive semi-definiteness ensures the quadratic bowl curves upward in every direction.
📝Q: What is the difference between active-set and interior-point methods?
Active-set methods work on the boundary of the feasible region by identifying which constraints are active. Interior-point methods stay strictly inside the feasible region and approach the boundary via a central path. Active-set is better for warm-starting; interior-point has better worst-case complexity.
📝Q: How does the kernel trick relate to QP?
The SVM dual is a QP involving dot products xᵢᵀxⱼ. The kernel trick replaces xᵢᵀxⱼ with K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ), solving an infinite-dimensional QP without explicitly computing φ. Common kernels: RBF, polynomial, sigmoid.
📝Q: Why is ridge regression a QP with no constraints?
Ridge regression minimizes ½‖Xw - y‖² + (λ/2)‖w‖², which expands to ½wᵀ(XᵀX + λI)w - (Xᵀy)Tw + ½yᵀy. This is a QP with Q = XᵀX + λI ≻ 0 and no constraints. The positive definiteness of Q guarantees a unique global minimum.
Practice Problems
📝Problem 1: Formulate as QP
Rewrite the following as a standard QP: minimize f(x₁, x₂) = 3x₁² + 2x₁x₂ + x₂² - 4x₁ + 5x₂ subject to x₁ + x₂ ≤ 2 and x₁, x₂ ≥ 0.
💡Solution 1
Identify Q and c from the quadratic and linear terms:
f(x) = ½xᵀQx + cᵀx where Q = [[6, 2], [2, 2]], c = [-4, 5]ᵀ
Constraints in standard form: A = [[1, 1]], b = [2], with lower bounds x ≥ 0.
📝Problem 2: Solve Ridge Regression as QP
Given X = [[1, 0], [0, 1], [1, 1]] and y = [1, 2, 3]ᵀ with λ = 0.5, find the ridge regression solution using the QP formulation.
💡Solution 2
Compute Q = XᵀX + λI = [[2, 1], [1, 2]] + [[0.5, 0], [0, 0.5]] = [[2.5, 1], [1, 2.5]]
c = -Xᵀy = -[4, 4]ᵀ
Solve Qw = -c: w* = Q⁻¹(-c) = [[2.5, 1], [1, 2.5]]⁻¹ · [4, 4]ᵀ
det(Q) = 6.25 - 1 = 5.25
w* = (1/5.25)[[2.5, -1], [-1, 2.5]] · [4, 4]ᵀ = [10/5.25, 6/5.25]ᵀ ≈ [1.905, 1.143]ᵀ
📝Problem 3: Dual of a Simple QP
Derive the dual of: min ½x² s.t. x ≥ 1.
💡Solution 3
Lagrangian: L(x, λ) = ½x² - λ(x - 1) for λ ≥ 0.
∂L/∂x = 0 ⟹ x* = λ
Dual function: g(λ) = L(λ, λ) = ½λ² - λ² + λ = -½λ² + λ
Dual problem: max -½λ² + λ s.t. λ ≥ 0
Optimal: λ* = 1, x* = 1. Strong duality holds (Slater's condition satisfied).
📝Problem 4: SVM Dual Verification
For a 2D binary classification with support vectors x₁ = [1, 0], y₁ = 1 and x₂ = [0, 1], y₂ = -1, verify that α₁ = α₂ = 1 is optimal for C = 2.
💡Solution 4
Dual objective: α₁ + α₂ - ½(α₁²·1·1·1 + 2α₁α₂·(-1)·(0) + α₂²·1·1·1) = α₁ + α₂ - ½(α₁² + α₂²)
Gradient w.r.t. α₁: 1 - α₁ = 0 ⟹ α₁ = 1 Gradient w.r.t. α₂: 1 - α₂ = 0 ⟹ α₂ = 1
Both satisfy 0 ≤ αᵢ ≤ C = 2, and α₁y₁ + α₂y₂ = 1 - 1 = 0 ✓
Quick Reference
📋QP Standard Form
where Q is symmetric positive semi-definite (Q ≽ 0).
📋Key Properties
- Convexity: Q ≽ 0 ⟹ convex QP; Q ≻ 0 ⟹ strictly convex, unique minimum
- Gradient: ∇f = Qx + c; Hessian: ∇²f = Q (constant)
- Duality gap is zero when Slater's condition holds (Q ≻ 0, feasible interior exists)
- Time complexity: O(n³) per iteration for interior point; active set bounded by constraint count
📋Solver Selection
- cvxpy: High-level modeling language; delegates to OSQP, ECOS, SCS
- scipy.optimize: minimize() with method='trust-constr' or 'SLSQP' for QPs
- OSQP: Operator Splitting QP; fast for large sparse QPs
- Gurobi / MOSEK: Commercial solvers; best performance for large-scale QPs
- qpOASES: Real-time QP solver for embedded MPC
📋ML / AI Connections
- SVM training ⟹ QP (primal or dual form)
- Ridge regression ⟹ unconstrained QP with Q = XᵀX + λI
- Portfolio optimization ⟹ QP with covariance matrix
- Model predictive control ⟹ repeated QP at each timestep
- LASSO (L1) is not a QP but a second-order cone program
Cross-References
ℹ️ Related Topics
- Convex Optimization — QP is a special case of convex optimization; see module on convex analysis
- Linear Programming — LP is QP with Q = 0; simplex and interior-point methods generalize
- KKT Conditions — Optimality conditions for constrained QP; see Lagrange duality
- Newton's Method — Second-order unconstrained optimization; QP is the model subproblem
- SVM Theory — SVM dual derivation relies on QP duality; see kernel methods module
- Markowitz Portfolio Theory — Mean-variance optimization is a QP; see quantitative finance module
- Model Predictive Control — Real-time QP solving; see control systems module
- Regularization — Ridge (L2) and Lasso (L1) penalties in statistical learning