Support Vector Machines — Complete Guide

A support vector machine (SVM) finds the hyperplane that separates the classes with the largest possible margin. It's one of the most theoretically elegant ML algorithms.


Maximum Margin Classifier

Goal: Find the hyperplane that maximizes the MARGIN

Margin = width of the gap between the classes
(twice the distance from the hyperplane to the nearest point)

    Class -1          Margin          Class +1
        ●                               ◆
          ●     ┌─────────────┐       ◆
            ●   │  Decision   │     ◆
              ● │  Boundary   │   ◆
                │             │ ◆
    ←───────────┼─────────────┼───────────→
                │             │
              ● │             │ ◆
            ●   │             │
          ●     └─────────────┘
        ●

Support Vectors: The closest points to the boundary
(those ● and ◆ that define the margin)
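
To make the idea of support vectors concrete, here is a minimal sketch using scikit-learn's SVC. The six toy points and the large C value (used to approximate a hard margin) are assumptions chosen for illustration, not part of the lesson's dataset.

```python
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (toy data for illustration)
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin classifier
clf = SVC(kernel="linear", C=1e6).fit(X, y)

support_idx = clf.support_          # indices of the support vectors in X
support_pts = clf.support_vectors_  # the support vectors themselves

print("Support vector indices:", support_idx)
print("Support vectors:")
print(support_pts)
```

Only the points closest to the other class show up as support vectors. Points further from the boundary, such as (0, 0), could be removed without changing the fitted hyperplane.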

The Math

Decision function: f(x) = w·x + b

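Classification (any point x):
f(x) > 0 → Class +1
f(x) < 0 → Class -1

Margin constraints (training points):
f(xᵢ) ≥ +1 for points with yᵢ = +1
f(xᵢ) ≤ -1 for points with yᵢ = -1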

Margin = 2 / ||w||

Maximize margin = Minimize ||w||²/2
Subject to: yᵢ(w·xᵢ + b) ≥ 1 for all i

This is a convex QUADRATIC PROGRAMMING problem
Usually solved in its dual form using Lagrange multipliers
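
The quantities above can be checked numerically. The sketch below fits a linear SVC on five hand-picked points (an assumption for illustration) whose maximum-margin hyperplane is the vertical line x₁ = 0, then recovers w, b, the margin 2 / ||w||, and the constraint values yᵢ(w·xᵢ + b). A large C stands in for the hard-margin problem.

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: the closest points of the two classes sit at x1 = -1 and x1 = +1
X = np.array([[-2.0, 0.0], [-1.0, 1.0], [-1.0, -1.0],
              [1.0, 0.0], [3.0, 1.0]])
y = np.array([-1, -1, -1, 1, 1])

# A very large C approximates the hard-margin formulation
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]        # weight vector (available for the linear kernel)
b = clf.intercept_[0]   # bias term
f = X @ w + b           # decision function f(x) = w·x + b

margin = 2.0 / np.linalg.norm(w)   # Margin = 2 / ||w||
functional_margins = y * f         # y_i (w·x_i + b), should be >= 1
predictions = np.sign(f)           # predicted class follows the sign of f(x)

print("w =", w, " b =", b)
print("margin =", margin)
print("y_i * f(x_i) =", functional_margins)
```

For this data the gap between the classes is 2 units wide, so the solver should return ||w|| close to 1, up to its numerical tolerance. The support vectors are the points where yᵢ(w·xᵢ + b) is (approximately) exactly 1.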

Kernel Trick

Problem: Some data can't be separated by a line (or hyperplane)

Solution: Map to a higher-dimensional space where it IS linearly separable

Original space (2D):          Higher dimension (3D):
┌─────────────────┐          ┌─────────────────┐
│ ● ● ●           │          │ ● ● ●           │
│     ╱ ╲ ◆ ◆ ◆  │    →     │       ◆ ◆ ◆    │
│ ● ● ●           │          │ ● ● ●           │
└─────────────────┘          └─────────────────┘
Not linearly separable       Now linearly separable!

Kernel functions compute the dot product in the higher-dimensional space
WITHOUT explicitly computing the transformation φ!

K(xᵢ, xⱼ) = φ(xᵢ)·φ(xⱼ)
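
A small worked example of this identity, assuming the homogeneous degree-2 polynomial kernel K(x, y) = (x·y)² on 2D inputs. For that kernel, one valid feature map is φ(x) = (x₁², √2·x₁x₂, x₂²). The function names `phi` and `poly2_kernel` are made up for this sketch.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map: 2D input -> 3D output."""
    x1, x2 = v
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly2_kernel(a, b):
    """Homogeneous degree-2 polynomial kernel: (a·b)^2, computed in 2D."""
    return np.dot(a, b) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

explicit = np.dot(phi(x), phi(z))  # dot product after mapping to 3D
implicit = poly2_kernel(x, z)      # same value without ever leaving 2D

print(explicit, implicit)
```

Both numbers agree: x·z = 1, so the kernel value is 1, and the 3D dot product is 9 - 12 + 4 = 1. For kernels like RBF the implicit feature space is infinite-dimensional, so the shortcut is the only practical option.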

Common kernels:
├─ Linear: K(x,y) = x·y
├─ Polynomial: K(x,y) = (x·y + c)ᵈ
├─ RBF (Gaussian): K(x,y) = exp(-γ||x-y||²)
└─ Sigmoid: K(x,y) = tanh(αx·y + c)
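
The four formulas can be written directly in NumPy and compared against scikit-learn's pairwise kernel helpers. This is a sketch under assumed parameter values (c, d, γ, α below are arbitrary). Note that scikit-learn calls the sigmoid slope `gamma` rather than α, and its polynomial kernel also has a `gamma` factor, set to 1 here to match the formula above.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))   # 5 random samples with 3 features

c, d, gamma, alpha = 1.0, 3, 0.5, 0.1   # arbitrary illustrative values

G = X @ X.T                   # all pairwise dot products x·y
sq_norms = np.sum(X**2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * G   # ||x - y||²

K_linear = G                          # x·y
K_poly = (G + c) ** d                 # (x·y + c)^d
K_rbf = np.exp(-gamma * sq_dists)     # exp(-γ||x - y||²)
K_sigmoid = np.tanh(alpha * G + c)    # tanh(α x·y + c)

# Compare with scikit-learn's implementations
print(np.allclose(K_linear, linear_kernel(X)))
print(np.allclose(K_poly, polynomial_kernel(X, degree=d, gamma=1.0, coef0=c)))
print(np.allclose(K_rbf, rbf_kernel(X, gamma=gamma)))
print(np.allclose(K_sigmoid, sigmoid_kernel(X, gamma=alpha, coef0=c)))
```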

Python Implementation

from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate data
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
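X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42  # fixed seed so the split is reproducible
)

# Scale (important for SVM!); fit the scaler on the training set only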
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Linear SVM
svm_linear = SVC(kernel='linear', C=1.0)
svm_linear.fit(X_train, y_train)
print(f"Linear: {svm_linear.score(X_test, y_test):.3f}")

# RBF SVM
svm_rbf = SVC(kernel='rbf', C=1.0, gamma='scale')
svm_rbf.fit(X_train, y_train)
print(f"RBF: {svm_rbf.score(X_test, y_test):.3f}")
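
C and gamma were fixed by hand above. One common way to choose them is a cross-validated grid search, sketched below with a scikit-learn pipeline so the scaler is refit inside each fold. The grid values are arbitrary assumptions for illustration, not recommended settings.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Scaling lives inside the pipeline, so each CV fold scales on its own training part
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))

# make_pipeline names the SVC step "svc", hence the "svc__" prefix
param_grid = {
    "svc__C": [0.1, 1.0, 10.0],
    "svc__gamma": ["scale", 0.1, 1.0],
}
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"CV accuracy: {search.best_score_:.3f}")
print(f"Test accuracy: {search.score(X_test, y_test):.3f}")
```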

Key Takeaways

  1. SVM finds the maximum margin decision boundary
  2. Support vectors are the critical points defining the boundary
  3. Kernel trick enables nonlinear classification without explicit transformation
  4. RBF is the default kernel in scikit-learn's SVC and a good first choice for nonlinear problems
  5. Scale your features — SVM is sensitive to feature magnitudes
  6. SVMs work well for high-dimensional data
  7. C parameter controls regularization (small C = more regularization)
  8. Kernel SVMs are slow for large datasets: training time grows roughly between O(n²) and O(n³) in the number of samples n
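
Takeaway 7 can be observed by counting support vectors as C changes. The sketch below reuses the moons data from the implementation section. The three C values are arbitrary assumptions, and exact counts will depend on the data and the scikit-learn version.

```python
from sklearn.datasets import make_moons
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X = StandardScaler().fit_transform(X)

n_support = {}
for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    # n_support_ holds the number of support vectors per class
    n_support[C] = int(clf.n_support_.sum())
    print(f"C={C:>6}: {n_support[C]} support vectors")
```

With a small C (strong regularization) the margin is wide and many points fall inside it, so many of them become support vectors. With a large C the model fits the training data more tightly and typically relies on fewer support vectors.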
