Support Vector Machines — Complete Guide
A support vector machine (SVM) finds the hyperplane that maximizes the margin between classes. It's one of the most theoretically elegant ML algorithms.
Maximum Margin Classifier
Goal: Find the hyperplane that maximizes the MARGIN
Margin = width of the gap between the classes (twice the distance from the hyperplane to the nearest point)
Class -1 Margin Class +1
● ◆
● ┌─────────────┐ ◆
● │ Decision │ ◆
● │ Boundary │ ◆
│ │ ◆
←───────────┼─────────────┼───────────→
│ │
● │ │ ◆
● │ │
● └─────────────┘
●
Support Vectors: The closest points to the boundary
(those ● and ◆ that define the margin)
The Math
Decision function: f(x) = w·x + b
Classification: predict sign(f(x))
f(x) > 0 → Class +1
f(x) < 0 → Class -1
Margin boundaries (where the support vectors lie): f(x) = +1 and f(x) = -1
Margin = 2 / ||w||
Maximize margin = Minimize ||w||²/2
Subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i
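This is a QUADRATIC PROGRAMMING problem (shown here in its hard-margin form; the soft-margin version used in practice adds slack variables weighted by C)
It is usually solved in its dual form using Lagrange multipliers αᵢ; the training points with αᵢ > 0 are the support vectors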
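To make the constraint yᵢ(w·xᵢ + b) ≥ 1 and the margin formula 2 / ||w|| concrete, here is a minimal sketch in plain Python. The toy points and the hyperplane (w, b) are assumptions picked by hand for illustration, not the output of a solver.

```python
import math

# Toy 2D data (an assumption for illustration): two well-separated groups.
X = [(0.0, 0.0), (0.0, 1.0), (-1.0, 0.5), (2.0, 0.0), (2.0, 1.0), (3.0, 0.5)]
y = [-1, -1, -1, +1, +1, +1]

# Hand-picked hyperplane x1 = 1, scaled so the closest points give |f(x)| = 1.
w = (1.0, 0.0)
b = -1.0


def decision_function(x):
    """f(x) = w·x + b"""
    return w[0] * x[0] + w[1] * x[1] + b


# y_i * f(x_i) for every training point; the constraint requires each to be >= 1.
functional_margins = [yi * decision_function(xi) for xi, yi in zip(X, y)]
constraints_hold = all(m >= 1.0 for m in functional_margins)

# Points that meet the constraint with equality sit on the margin boundaries.
support_vectors = [
    xi for xi, m in zip(X, functional_margins) if math.isclose(m, 1.0)
]

# Width of the gap between the two margin boundaries.
margin_width = 2.0 / math.hypot(*w)

print("constraints hold:", constraints_hold)
print("support vectors:", support_vectors)
print("margin width:", margin_width)
```

The points at x1 = 0 and x1 = 2 satisfy the constraint with equality, so they act as the support vectors. The two points further out could be moved or removed without changing the boundary.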
Kernel Trick
Problem: Non-linear data can't be separated by a line
Solution: Map to higher dimension where it IS linear
Original space (2D): Higher dimension (3D):
┌─────────────────┐ ┌─────────────────┐
│ ● ● ● │ │ ● ● ● │
│ ╱ ╲ ◆ ◆ ◆ │ → │ ◆ ◆ ◆ │
│ ● ● ● │ │ ● ● ● │
└─────────────────┘ └─────────────────┘
Not linearly separable Now linearly separable!
Kernel functions compute dot product in higher dimension
WITHOUT explicitly computing the transformation!
K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ)
Common kernels:
├─ Linear: K(x,y) = x·y
├─ Polynomial: K(x,y) = (x·y + c)ᵈ
├─ RBF (Gaussian): K(x,y) = exp(-γ||x-y||²)
└─ Sigmoid: K(x,y) = tanh(αx·y + c)
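The identity K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ) can be checked by hand for a small case. The sketch below assumes the polynomial kernel with c = 1 and d = 2 on 2D inputs, where the explicit feature map φ has six components. The helper names (dot, phi_poly2, and so on) are made up for this example.

```python
import math


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def poly_kernel(x, y, c=1.0, d=2):
    """Polynomial kernel K(x, y) = (x·y + c)^d."""
    return (dot(x, y) + c) ** d


def phi_poly2(x):
    """Explicit feature map matching (x·y + 1)^2 for 2D inputs."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r2 * x1 * x2, r2 * x1, r2 * x2, 1.0)


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * sq_dist)


a = (1.0, 2.0)
b = (3.0, -1.0)

k_implicit = poly_kernel(a, b)                 # computed in the original 2D space
k_explicit = dot(phi_poly2(a), phi_poly2(b))   # computed in the 6D feature space

print("kernel value:       ", k_implicit)
print("explicit dot product:", k_explicit)
print("RBF of a point with itself:", rbf_kernel(a, a))
```

Both routes give the same number, but the kernel never builds the six-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so the kernel form is the only practical option.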
Python Implementation
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Generate data
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # fixed seed for a reproducible split
# Scale (important for SVM!)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Linear SVM
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
print(f"Linear: {svm_linear.score(X_test, y_test):.3f}")
# RBF SVM
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(X_train, y_train)
print(f"RBF: {svm_rbf.score(X_test, y_test):.3f}")
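The script above uses fixed values for C and gamma. One possible way to choose them is a cross-validated grid search, sketched below with scikit-learn's GridSearchCV and make_pipeline. The grid values are illustrative assumptions, not recommendations.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scaling lives inside the pipeline, so each CV fold fits its own scaler
# and no statistics leak from the validation fold into training.
pipeline = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names each step after its lowercased class name ("svc").
# The candidate values below are assumptions chosen for illustration.
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.1, 1.0],
}

search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline means the held-out test split is touched only once, for the final score.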
Key Takeaways
- SVM finds the maximum margin decision boundary
- Support vectors are the critical points defining the boundary
- Kernel trick enables nonlinear classification without explicit transformation
- RBF is the default kernel in scikit-learn's SVC and a good first choice for non-linear problems
- Scale your features — SVM is sensitive to feature magnitudes
- SVMs work well for high-dimensional data, including cases with more features than samples
- C parameter controls regularization (small C = more regularization)
- Kernel SVMs are slow to train on large datasets: training time grows between O(n²) and O(n³) in the number of samples n
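The takeaways about support vectors and C can be observed directly. The sketch below assumes the same make_moons data as the implementation section and counts support vectors for a few arbitrarily chosen values of C. A smaller C typically leaves more points on or inside the margin, so more of them become support vectors.

```python
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X = StandardScaler().fit_transform(X)

# Map each C value to the total number of support vectors after fitting.
support_counts = {}
for C in (0.01, 1.0, 100.0):  # values chosen only for illustration
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    # n_support_ holds the per-class counts; sum them for the total.
    support_counts[C] = int(model.n_support_.sum())
    print(f"C={C:>6}: {support_counts[C]} support vectors")
```

Prediction cost for a kernel SVM grows with the number of support vectors, so the choice of C affects inference speed as well as accuracy.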