Quadratic Programming

💡 Why It Matters

Quadratic programming (QP) is the bridge between linear programming and general nonlinear optimization. Nearly every machine learning model with a quadratic loss function—linear regression, SVMs, logistic regression regularization, portfolio optimization, and model predictive control—reduces to solving a QP. Mastering QP gives you the foundation for understanding how convex optimization solvers work under the hood.

ℹ️ Core Insight

A QP generalizes linear programming by allowing a quadratic objective while keeping constraints linear. If the quadratic term is positive semi-definite, the problem is convex and any local minimum is a global minimum.

Quadratic Programming: Standard Form

DfQuadratic Program (Standard Form)

Quadratic Programming — Standard Form

\min_{x \in \mathbb{R}^n} \quad \frac{1}{2} x^T Q x + c^T x \quad \text{s.t.} \quad Ax \leq b, \quad A_{eq} x = b_{eq}

Here,

$Q$ =Symmetric n×n matrix defining the quadratic curvature
$c$ =Linear cost vector in ℝⁿ
$x$ =Decision variable vector
$A$ =Inequality constraint matrix (m×n)
$b$ =Inequality constraint right-hand side
$A_{eq}$ =Equality constraint matrix (p×n)
$b_{eq}$ =Equality constraint right-hand side

⚠️ Convention

The factor ½ in front of xᵀQx is conventional—it simplifies the gradient to Qx + c rather than 2Qx + c. Always check whether a formulation uses this convention or not.

Positive Definite Hessian

ThConvexity of QP via Positive Definite Hessian

Hessian of the QP Objective

\nabla^2 f(x) = Q \quad \text{and} \quad \nabla f(x) = Qx + c

Here,

$Q$ =The Hessian matrix — constant for QP, unlike general nonlinear problems
$c$ =Linear term determines the gradient at x = 0

💡 Why This Matters

Because the Hessian is constant, second-order methods (Newton's method) converge in a single step for unconstrained QPs. For constrained QPs, this structure enables efficient interior-point and active-set algorithms.

Active Set Methods

DfActive Set Method

Active Set Iteration

x_{k+1} = x_k + \alpha_k d_k, \quad \text{where } d_k \text{ solves the QP restricted to the guessed active set}

Here,

$d_k$ =Search direction from solving the equality-constrained QP with current active set
$\alpha_k$ =Step length, typically 1 (full step) or limited by blocking constraints

ℹ️ Complexity

Active set methods are efficient for small to medium QPs (n < 10,000). The number of iterations is bounded by the number of constraints combinatorially, but in practice far fewer iterations are needed.

Interior Point Methods

Interior Point (Barrier) Formulation

\min_{x,s} \quad \frac{1}{2}x^TQx + c^Tx - \mu \sum_{i=1}^{m} \ln(s_i) \quad \text{s.t.} \quad Ax + s = b, \quad s > 0

Here,

$\mu$ =Barrier parameter — drives log-barrier term; decreased toward 0 in the central path
$s$ =Slack variables converting inequalities to equalities
$\ln(s_i)$ =Log-barrier enforces sᵢ > 0; approaches boundary as μ → 0

Central Path and Newton Step

\begin{bmatrix} Q & A^T \\ A & -\text{diag}(S)^{-1}E \end{bmatrix} \begin{bmatrix} \Delta x \\ \Delta s \end{bmatrix} = -\begin{bmatrix} Qx + c + A^T\lambda \\ Ax + s - b \\ S\Lambda e - \mu e \end{bmatrix}

Here,

$S$ =Diagonal matrix of slack variables s
$\Lambda$ =Diagonal matrix of Lagrange multipliers λ
$e$ =Vector of all ones
$\mu$ =Barrier parameter (decreased each iteration)

💡 Practical Performance

Interior point methods solve QPs in O(n³·L) time where L is a logarithmic complexity measure. Modern implementations (MOSEK, Gurobi) are highly optimized and handle QPs with millions of variables.

Wolfe Dual

Lagrangian and Wolfe Dual of QP

L(x, \lambda, \nu) = \frac{1}{2}x^TQx + c^Tx + \lambda^T(Ax - b) + \nu^T(A_{eq}x - b_{eq})

Here,

$\lambda$ =Lagrange multipliers for inequality constraints (λ ≥ 0)
$\nu$ =Lagrange multipliers for equality constraints (unrestricted)

Wolfe Dual Problem

\max_{\lambda \geq 0, \nu} \quad -\frac{1}{2}z^TQ^{-1}z - b^T\lambda - b_{eq}^T\nu \quad \text{s.t.} \quad Qx + c + A^T\lambda + A_{eq}^T\nu = 0 \quad \Rightarrow \quad z = -(c + A^T\lambda + A_{eq}^T\nu)

Here,

$z$ =Substitution variable: z = Qx + c + Aᵀλ + A_{eq}ᵀν = 0 at optimality
$Q^{-1}$ =Inverse of Q — requires Q to be non-singular (positive definite)

ℹ️ Dual Insight

The Wolfe dual converts the QP into another QP in dual variables. Strong duality holds when Q ≻ 0 and constraints satisfy Slater's condition, meaning the primal and dual optima are equal.

Ridge Regression as QP

Ridge Regression — QP Formulation

\min_{w} \quad \frac{1}{2}\|Xw - y\|^2 + \frac{\lambda}{2}\|w\|^2 = \frac{1}{2}w^T(X^TX + \lambda I)w - (X^Ty)^Tw + \frac{1}{2}y^Ty

Here,

$X$ =n×d feature matrix
$y$ =n×1 target vector
$\lambda$ =Regularization strength (controls bias-variance tradeoff)
$w$ =Weight vector to optimize

Ridge Regression — Closed-Form Solution

w^* = (X^TX + \lambda I)^{-1} X^Ty

Here,

$X^TX + \lambda I$ =Modified Gram matrix — always invertible for λ > 0
$\lambda I$ =Regularization term ensures positive definiteness even if XᵀX is singular

💡 QP Structure

Ridge regression is a QP with Q = XᵀX + λI (positive definite), c = -Xᵀy, and no constraints. The regularization term λI shifts the Hessian eigenvalues away from zero, making the problem strictly convex and numerically stable.

SVM as QP

SVM Primal (Hard Margin)

\min_{w,b} \quad \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^Tx_i + b) \geq 1, \quad i = 1,\ldots,n

Here,

$w$ =Weight vector — ||w||⁻¹ is the margin width
$b$ =Bias / decision threshold
$y_i$ =Class label: +1 or -1
$x_i$ =Feature vector of sample i

SVM Primal (Soft Margin)

\min_{w,b,\xi} \quad \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \quad \text{s.t.} \quad y_i(w^Tx_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0

Here,

$C$ =Penalty parameter — trades off margin size vs. misclassification
$\xi_i$ =Slack variable for sample i (allows soft-margin violations)

SVM Dual

\max_{\alpha} \quad \sum_{i=1}^{n}\alpha_i - \frac{1}{2}\sum_{i,j}\alpha_i\alpha_j y_i y_j x_i^Tx_j \quad \text{s.t.} \quad 0 \leq \alpha_i \leq C, \quad \sum_{i}\alpha_i y_i = 0

Here,

$\alpha_i$ =Dual variables — nonzero values identify support vectors
$C$ =Upper bound on each αᵢ (box constraint)
$x_i^Tx_j$ =Kernel trick replaces this with K(xᵢ, xⱼ) for nonlinear SVM

ℹ️ Key Insight

The SVM dual is a QP with Q_ij = yᵢyⱼxᵢᵀxⱼ. Only support vectors (αᵢ > 0) contribute to the decision function. This sparsity is why SVMs are memory-efficient at prediction time.

Python Implementation: cvxpy

import cvxpy as cp
import numpy as np

# Generate data for a QP: min 0.5*x'*Q*x + c'*x
np.random.seed(42)
n = 5
Q = np.random.randn(n, n)
Q = Q.T @ Q  # Make positive definite
c = np.random.randn(n)

# Decision variable
x = cp.Variable(n)

# Objective
objective = cp.Minimize(0.5 * cp.quad_form(x, Q) + c @ x)

# Constraints
A = np.random.randn(3, n)
b = np.ones(3)
constraints = [A @ x <= b, cp.sum(x) == 1]

# Solve
prob = cp.Problem(objective, constraints)
prob.solve()

print(f"Optimal value: {prob.value:.4f}")
print(f"Optimal x: {x.value}")
print(f"Status: {prob.status}")
print(f"Solver: {prob.solver_stats.solver_name}")

Python Implementation: scipy

from scipy.optimize import minimize
import numpy as np

np.random.seed(42)
n = 5
Q = np.random.randn(n, n)
Q = Q.T @ Q
c = np.random.randn(n)

# Unconstrained QP via scipy
def objective(x):
    return 0.5 * x @ Q @ x + c @ x

def gradient(x):
    return Q @ x + c

def hessian(x):
    return Q

x0 = np.zeros(n)
result = minimize(objective, x0, jac=gradient, hess=hessian, method='trust-constr')

print(f"Optimal value: {result.fun:.4f}")
print(f"Optimal x: {result.x}")
print(f"Converged: {result.success}")

# Constrained QP with bounds
from scipy.optimize import minimize
bounds = [(0, 1) for _ in range(n)]
eq_constraint = {'type': 'eq', 'fun': lambda x: np.sum(x) - 1}

result2 = minimize(objective, x0, jac=gradient, hess=hessian,
                   method='SLSQP', bounds=bounds, constraints=[eq_constraint])
print(f"Constrained optimal: {result2.fun:.4f}")

Applications in AI / Machine Learning

💡 Applications

QP appears throughout machine learning and AI — not just in SVMs but anywhere a quadratic loss is minimized subject to constraints.

📝Application: Model Predictive Control (MPC)

MPC solves a QP at each timestep to find optimal control actions over a finite horizon. The quadratic cost penalizes deviations from reference trajectories, and constraints enforce actuator limits and safety bounds. Real-time QP solvers ( qpOASES, HPIPM ) enable MPC for autonomous vehicles and robotics.

📝Application: Portfolio Optimization (Markowitz)

Markowitz Mean-Variance Optimization

\min_w \quad w^T\Sigma w \quad \text{s.t.} \quad w^T\mu \geq r_{min}, \quad w^T\mathbf{1} = 1, \quad w \geq 0

Here,

$w$ =Portfolio weight vector
$\Sigma$ =Covariance matrix of asset returns
$\mu$ =Expected return vector
$r_{min}$ =Minimum required return

The covariance matrix Σ is positive semi-definite, making this a QP. The efficient frontier is obtained by varying r_{min}.

📝Application: Least Squares and Fitting

Any least squares problem min ‖Ax - b‖² is a QP with Q = AᵀA, c = -Aᵀb. Non-negative least squares (NNLS) adds x ≥ 0 constraints. Basis pursuit and compressed sensing add L1 terms to create QPs or second-order cone programs.

Common Mistakes

Mistake	Why It Fails	Correct Approach
Using a non-symmetric Q	Gradient is ½(Q + Qᵀ)x, not Qx	Symmetrize: Q ← ½(Q + Qᵀ) before solving
Ignoring positive semi-definiteness	Non-convex QP may have many local minima	Verify Q ≽ 0 via eigenvalue decomposition; use global solver if not
Confusing ½ convention	Gradient off by factor of 2	Always verify: ∇(½xᵀQx) = Qx, not 2Qx
Not scaling constraints	Poorly scaled A, b cause numerical issues	Normalize rows of A so ‖aᵢ‖ ≈ 1
Assuming unbounded feasible region	QP may be infeasible or unbounded	Check feasibility before solving; add bounds
Using QP for non-quadratic objectives	QP solver assumes quadratic curvature	Use nonlinear solver (IPOPT, SNOPT) for general NLP
Ignoring warm-starting	Repeated QPs (MPC, online learning) waste time	Use previous solution as warm start for next solve

Interview Questions

📝Q: Why must Q be positive semi-definite for convex QP?

If Q has a negative eigenvalue λ with eigenvector v, then f(tv) = ½t²λ‖v‖² + t·cᵀv → -∞ as t → ∞ along v. The problem becomes unbounded with no finite minimum. Positive semi-definiteness ensures the quadratic bowl curves upward in every direction.

📝Q: What is the difference between active-set and interior-point methods?

Active-set methods work on the boundary of the feasible region by identifying which constraints are active. Interior-point methods stay strictly inside the feasible region and approach the boundary via a central path. Active-set is better for warm-starting; interior-point has better worst-case complexity.

📝Q: How does the kernel trick relate to QP?

The SVM dual is a QP involving dot products xᵢᵀxⱼ. The kernel trick replaces xᵢᵀxⱼ with K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ), solving an infinite-dimensional QP without explicitly computing φ. Common kernels: RBF, polynomial, sigmoid.

📝Q: Why is ridge regression a QP with no constraints?

Ridge regression minimizes ½‖Xw - y‖² + (λ/2)‖w‖², which expands to ½wᵀ(XᵀX + λI)w - (Xᵀy)Tw + ½yᵀy. This is a QP with Q = XᵀX + λI ≻ 0 and no constraints. The positive definiteness of Q guarantees a unique global minimum.

Practice Problems

📝Problem 1: Formulate as QP

Rewrite the following as a standard QP: minimize f(x₁, x₂) = 3x₁² + 2x₁x₂ + x₂² - 4x₁ + 5x₂ subject to x₁ + x₂ ≤ 2 and x₁, x₂ ≥ 0.

💡Solution 1

Identify Q and c from the quadratic and linear terms:

f(x) = ½xᵀQx + cᵀx where Q = [[6, 2], [2, 2]], c = [-4, 5]ᵀ

Constraints in standard form: A = [[1, 1]], b = [2], with lower bounds x ≥ 0.

📝Problem 2: Solve Ridge Regression as QP

Given X = [[1, 0], [0, 1], [1, 1]] and y = [1, 2, 3]ᵀ with λ = 0.5, find the ridge regression solution using the QP formulation.

💡Solution 2

Compute Q = XᵀX + λI = [[2, 1], [1, 2]] + [[0.5, 0], [0, 0.5]] = [[2.5, 1], [1, 2.5]]

c = -Xᵀy = -[4, 4]ᵀ

Solve Qw = -c: w* = Q⁻¹(-c) = [[2.5, 1], [1, 2.5]]⁻¹ · [4, 4]ᵀ

det(Q) = 6.25 - 1 = 5.25

w* = (1/5.25)[[2.5, -1], [-1, 2.5]] · [4, 4]ᵀ = [10/5.25, 6/5.25]ᵀ ≈ [1.905, 1.143]ᵀ

📝Problem 3: Dual of a Simple QP

Derive the dual of: min ½x² s.t. x ≥ 1.

💡Solution 3

Lagrangian: L(x, λ) = ½x² - λ(x - 1) for λ ≥ 0.

∂L/∂x = 0 ⟹ x* = λ

Dual function: g(λ) = L(λ, λ) = ½λ² - λ² + λ = -½λ² + λ

Dual problem: max -½λ² + λ s.t. λ ≥ 0

Optimal: λ* = 1, x* = 1. Strong duality holds (Slater's condition satisfied).

📝Problem 4: SVM Dual Verification

For a 2D binary classification with support vectors x₁ = [1, 0], y₁ = 1 and x₂ = [0, 1], y₂ = -1, verify that α₁ = α₂ = 1 is optimal for C = 2.

💡Solution 4

Dual objective: α₁ + α₂ - ½(α₁²·1·1·1 + 2α₁α₂·(-1)·(0) + α₂²·1·1·1) = α₁ + α₂ - ½(α₁² + α₂²)

Gradient w.r.t. α₁: 1 - α₁ = 0 ⟹ α₁ = 1 Gradient w.r.t. α₂: 1 - α₂ = 0 ⟹ α₂ = 1

Both satisfy 0 ≤ αᵢ ≤ C = 2, and α₁y₁ + α₂y₂ = 1 - 1 = 0 ✓

Quick Reference

📋QP Standard Form

\min_x \; \frac{1}{2}x^TQx + c^Tx \quad \text{s.t.} \; Ax \leq b, \; A_{eq}x = b_{eq}

where Q is symmetric positive semi-definite (Q ≽ 0).

📋Key Properties

Convexity: Q ≽ 0 ⟹ convex QP; Q ≻ 0 ⟹ strictly convex, unique minimum
Gradient: ∇f = Qx + c; Hessian: ∇²f = Q (constant)
Duality gap is zero when Slater's condition holds (Q ≻ 0, feasible interior exists)
Time complexity: O(n³) per iteration for interior point; active set bounded by constraint count

📋Solver Selection

cvxpy: High-level modeling language; delegates to OSQP, ECOS, SCS
scipy.optimize: minimize() with method='trust-constr' or 'SLSQP' for QPs
OSQP: Operator Splitting QP; fast for large sparse QPs
Gurobi / MOSEK: Commercial solvers; best performance for large-scale QPs
qpOASES: Real-time QP solver for embedded MPC

📋ML / AI Connections

SVM training ⟹ QP (primal or dual form)
Ridge regression ⟹ unconstrained QP with Q = XᵀX + λI
Portfolio optimization ⟹ QP with covariance matrix
Model predictive control ⟹ repeated QP at each timestep
LASSO (L1) is not a QP but a second-order cone program

Cross-References

ℹ️ Related Topics

Convex Optimization — QP is a special case of convex optimization; see module on convex analysis
Linear Programming — LP is QP with Q = 0; simplex and interior-point methods generalize
KKT Conditions — Optimality conditions for constrained QP; see Lagrange duality
Newton's Method — Second-order unconstrained optimization; QP is the model subproblem
SVM Theory — SVM dual derivation relies on QP duality; see kernel methods module
Markowitz Portfolio Theory — Mean-variance optimization is a QP; see quantitative finance module
Model Predictive Control — Real-time QP solving; see control systems module
Regularization — Ridge (L2) and Lasso (L1) penalties in statistical learning