Naive Bayes — Complete Guide for Classification

Naive Bayes applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class. Real data rarely satisfies this assumption, yet the classifier often performs well in practice, especially on text.


Bayes' Theorem

P(Class|Features) = P(Features|Class) × P(Class) / P(Features)

P(Spam|free money) = P(free money|Spam) × P(Spam) / P(free money)

Where:
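P(Class) = prior probability of the class before seeing any features
P(Features|Class) = likelihood of the features within that class
P(Features) = evidence (identical for every class, so it can be dropped when comparing classes)
P(Class|Features) = posterior probability, the quantity used for prediction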
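
To make the four terms concrete, here is a small worked calculation of the spam posterior. All four input numbers are hypothetical and chosen only for illustration; the evidence is obtained with the law of total probability over the two classes.

```python
# Hypothetical inputs, not estimated from any real data set.
p_spam = 0.3                 # prior P(Spam)
p_ham = 1.0 - p_spam         # prior P(Not spam)
p_words_given_spam = 0.20    # likelihood P("free money" | Spam)
p_words_given_ham = 0.01     # likelihood P("free money" | Not spam)

# Evidence P("free money"), summed over both classes.
p_words = p_words_given_spam * p_spam + p_words_given_ham * p_ham

# Posterior from Bayes' theorem.
p_spam_given_words = p_words_given_spam * p_spam / p_words
print(f"P(Spam | free money) = {p_spam_given_words:.3f}")  # 0.896
```

Even with a prior of only 0.3, the posterior is high because the phrase is twenty times more likely under the spam class in this made-up setup.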

The "Naive" Assumption

Assumption: Features are INDEPENDENT given the class

P(x₁, x₂, ..., xₙ|Class) = P(x₁|Class) × P(x₂|Class) × ... × P(xₙ|Class)

Example (Email Spam):
P("free", "money"|Spam) ≈ P("free"|Spam) × P("money"|Spam)

This is NAIVE because "free" and "money" are correlated in real email!
But it makes computation tractable, and classification only needs the correct class to score highest, not exact probabilities.
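
The factorization can be sketched in a few lines. The per-word likelihoods and priors below are hypothetical placeholders; the sketch sums logarithms instead of multiplying probabilities, a common way to avoid numerical underflow when many features are involved.

```python
import math

# Hypothetical per-word likelihoods P(word | class) and class priors.
likelihoods = {
    "spam": {"free": 0.05, "money": 0.04},
    "ham": {"free": 0.005, "money": 0.002},
}
priors = {"spam": 0.4, "ham": 0.6}

def log_score(words, cls):
    """log P(cls) + sum of log P(word | cls): the naive factorization in log space."""
    return math.log(priors[cls]) + sum(math.log(likelihoods[cls][w]) for w in words)

words = ["free", "money"]
scores = {cls: log_score(words, cls) for cls in priors}
prediction = max(scores, key=scores.get)  # class with the highest unnormalized score
print(scores, prediction)
```

The scores are unnormalized: the evidence term is the same for both classes, so it does not affect which class wins.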

Variants

Gaussian Naive Bayes:
├─ Features are continuous (real numbers)
├─ Assumes Gaussian distribution per feature
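├─ P(xᵢ|Class) = (1/√(2πσ²)) × exp(-(xᵢ-μ)²/(2σ²))
├─ μ and σ² are estimated separately for each feature within each class
└─ Use for: Continuous features that are roughly bell-shaped within each class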
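
As a rough illustration of the Gaussian variant, the sketch below estimates a mean and variance per class for a single feature and compares the resulting densities for a new value. The data (heights in cm), the class names, and the equal-prior assumption are all hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical training values for one continuous feature, grouped by class.
samples = {"A": [150.0, 155.0, 160.0], "B": [175.0, 180.0, 185.0]}

params = {}
for cls, values in samples.items():
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # maximum-likelihood variance
    params[cls] = (mean, var)

x = 158.0
likelihood = {cls: gaussian_pdf(x, *params[cls]) for cls in params}
prediction = max(likelihood, key=likelihood.get)  # equal priors assumed
print(params, prediction)
```

With several features, the per-feature densities would be multiplied together (or their logs summed) for each class.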

Multinomial Naive Bayes:
├─ Features are counts/frequencies
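├─ P(xᵢ|Class c) = (N_ci + α) / (N_c + αn)
├─ N_ci = count of feature i in class c, N_c = total count of all features in class c, n = number of features
├─ α = smoothing parameter (α = 1 is Laplace smoothing)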
└─ Use for: Text classification (word counts)
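
The smoothed estimate can be computed by hand: add α to each word count in the class, and add α times the vocabulary size to the class total. The tiny tokenized corpus and vocabulary below are hypothetical and only meant to show the arithmetic.

```python
from collections import Counter

# Hypothetical spam documents, already tokenized.
spam_docs = [["win", "free", "money"], ["free", "prize"]]
vocabulary = ["win", "free", "money", "prize", "meeting"]  # "meeting" never occurs in spam

alpha = 1.0  # alpha = 1 is Laplace (add-one) smoothing
counts = Counter(word for doc in spam_docs for word in doc)
total = sum(counts.values())  # total word count in the spam class

# (count of word in class + alpha) / (total words in class + alpha * vocabulary size)
p_word_given_spam = {
    w: (counts[w] + alpha) / (total + alpha * len(vocabulary)) for w in vocabulary
}
print(p_word_given_spam)
```

Note that "meeting" receives a small nonzero probability even though it never appears in the spam documents, and the estimates still sum to 1 over the vocabulary.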

Bernoulli Naive Bayes:
├─ Features are binary (0/1)
├─ P(xᵢ=1|Class) = count(xᵢ=1 and Class) / count(Class)
└─ Use for: Binary features, binary text (word present/absent)
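
A minimal sketch of the Bernoulli variant follows. The binary matrix and labels are hypothetical. One assumption goes beyond the plain count ratio: the estimate is smoothed with α in the numerator and 2α in the denominator (one α per possible outcome) so that no probability is exactly 0 or 1. The key difference from the count-based variant is that an absent feature contributes a factor of 1 - p rather than being ignored.

```python
# Hypothetical binary features: rows are documents, columns are ["free", "meeting"].
X = [[1, 0], [1, 0], [0, 1], [1, 1]]
y = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

def bernoulli_params(X, y, cls, alpha=1.0):
    """Smoothed P(x_i = 1 | cls) for every feature (smoothing is an added assumption)."""
    rows = [row for row, label in zip(X, y) if label == cls]
    n_rows = len(rows)
    n_features = len(X[0])
    return [
        (sum(row[i] for row in rows) + alpha) / (n_rows + 2 * alpha)
        for i in range(n_features)
    ]

def likelihood(x, probs):
    """Present features contribute p, absent features contribute 1 - p."""
    result = 1.0
    for xi, p in zip(x, probs):
        result *= p if xi == 1 else 1.0 - p
    return result

spam_probs = bernoulli_params(X, y, cls=1)
ham_probs = bernoulli_params(X, y, cls=0)
x_new = [1, 0]  # contains "free", does not contain "meeting"
print(likelihood(x_new, spam_probs), likelihood(x_new, ham_probs))
```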

Text Classification Example

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample data
emails = [
    "Win free money now", "Claim your prize",
    "Meeting tomorrow at 10", "Project update",
    "Buy discount pills", "Special offer just for you",
    "Team lunch Friday", "Quarterly report attached"
]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1=spam, 0=not spam

# Vectorize
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train
model = MultinomialNB(alpha=1.0)  # Laplace smoothing
model.fit(X, labels)

# Predict
new_emails = ["Free prize waiting for you"]
X_new = vectorizer.transform(new_emails)
print(f"Prediction: {model.predict(X_new)[0]}")  # 1 (spam)
print(f"Probabilities: {model.predict_proba(X_new)[0]}")  # order: [P(not spam), P(spam)]
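
To see what the fitted model does internally, the sketch below recomputes the class probabilities for the same new email by hand and compares them with scikit-learn's output. It repeats the sample data so that it runs on its own; the manual part assumes the smoothed word-count estimate with α = 1 and priors taken from the class frequencies.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "Win free money now", "Claim your prize",
    "Meeting tomorrow at 10", "Project update",
    "Buy discount pills", "Special offer just for you",
    "Team lunch Friday", "Quarterly report attached",
]
labels = np.array([1, 1, 0, 0, 1, 1, 0, 0])  # 1=spam, 0=not spam

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails).toarray()  # dense copy keeps the manual math simple
model = MultinomialNB(alpha=1.0).fit(X, labels)

x_new = vectorizer.transform(["Free prize waiting for you"]).toarray()
sklearn_proba = model.predict_proba(x_new)[0]

# Manual computation: log prior + sum over words of count * log P(word | class).
alpha = 1.0
n_features = X.shape[1]
log_joint = []
for cls in (0, 1):
    class_counts = X[labels == cls].sum(axis=0)
    word_probs = (class_counts + alpha) / (class_counts.sum() + alpha * n_features)
    prior = np.mean(labels == cls)
    log_joint.append(np.log(prior) + x_new[0] @ np.log(word_probs))

# Normalize in a numerically stable way to obtain posterior probabilities.
log_joint = np.array(log_joint)
manual_proba = np.exp(log_joint - log_joint.max())
manual_proba /= manual_proba.sum()

print("sklearn:", sklearn_proba)
print("manual: ", manual_proba)
```

The word "waiting" is not in the training vocabulary, so the vectorizer drops it and it has no effect on either computation.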

When to Use

Naive Bayes is EXCELLENT for:
├─ Text classification (spam, sentiment)
├─ High-dimensional data
├─ Small training sets
├─ Multi-class problems
└─ When speed matters

Naive Bayes is POOR for:
├─ Correlated features
├─ Continuous features with complex distributions
├─ When probability estimates need to be accurate
└─ When feature interactions matter
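
The effect of correlated features on the probability estimates can be shown with a small calculation. The sketch assumes a hypothetical extreme case: one feature is duplicated, so the copies carry no new information, yet the independence assumption counts the same evidence once per copy. The priors and likelihoods are made up for illustration.

```python
def posterior(prior_spam, lik_spam, lik_ham, copies):
    """P(spam | feature) when the same feature is counted `copies` times."""
    spam = prior_spam * lik_spam ** copies
    ham = (1.0 - prior_spam) * lik_ham ** copies
    return spam / (spam + ham)

# Hypothetical values: the feature is mildly more likely under spam.
once = posterior(0.5, 0.6, 0.4, copies=1)
thrice = posterior(0.5, 0.6, 0.4, copies=3)
print(f"counted once: {once:.3f}, counted three times: {thrice:.3f}")  # 0.600 and 0.771
```

The predicted class is unchanged, but the posterior drifts toward the extremes, which is why the ranking of classes is usually more trustworthy than the probabilities themselves.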

Key Takeaways

  1. Naive Bayes applies Bayes' theorem with independence assumption
  2. Gaussian for continuous, Multinomial for counts, Bernoulli for binary
  3. Fast: training is a single pass over the data, linear in the number of samples and features
  4. Great baseline for text classification
  5. Laplace smoothing prevents zero probabilities for features never seen with a class
  6. Works surprisingly well despite the naive assumption
  7. Probabilistic output enables threshold tuning, although the probabilities are often poorly calibrated
  8. Scales well: cost grows linearly with the number of features
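
Threshold tuning, mentioned in takeaway 7, can be sketched as follows. The probabilities and true labels are hypothetical stand-ins for a classifier's output on a validation set; the helper `precision_recall` is written here for illustration and is not a library function.

```python
# Hypothetical spam probabilities and true labels (1 = spam).
probs = [0.95, 0.80, 0.55, 0.40, 0.10]
truth = [1, 1, 0, 1, 0]

def precision_recall(probs, truth, threshold):
    """Precision and recall when predicting spam for probabilities >= threshold."""
    preds = [1 if p >= threshold else 0 for p in probs]
    tp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, truth) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for threshold in (0.3, 0.5, 0.7):
    print(threshold, precision_recall(probs, truth, threshold))
```

Raising the threshold trades recall for precision, which is useful when flagging a legitimate email as spam is costlier than missing a spam message.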
