Naive Bayes — Complete Guide

Naive Bayes applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class. Despite this simplification, it often performs well in practice, particularly on text.

Bayes' Theorem

P(Class|Features) = P(Features|Class) × P(Class) / P(Features)

P(Spam|free money) = P(free money|Spam) × P(Spam) / P(free money)

Where:
P(Class) = prior probability
P(Features|Class) = likelihood
P(Features) = evidence
P(Class|Features) = posterior probability
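
To make the four terms concrete, here is a small worked calculation of the spam posterior. All numbers are hypothetical and chosen only for illustration; the evidence P(Features) is obtained by summing over both classes.

```python
# Hypothetical probabilities, for illustration only.
p_spam = 0.4                  # prior P(Spam)
p_ham = 1.0 - p_spam          # prior P(Not spam)
p_phrase_given_spam = 0.05    # likelihood P("free money" | Spam)
p_phrase_given_ham = 0.002    # likelihood P("free money" | Not spam)

# Evidence: P("free money"), summed over both classes.
p_phrase = p_phrase_given_spam * p_spam + p_phrase_given_ham * p_ham

# Posterior: P(Spam | "free money")
posterior_spam = p_phrase_given_spam * p_spam / p_phrase
print(f"P(Spam | 'free money') = {posterior_spam:.3f}")
```

Even with a prior below 0.5, the posterior is high here because the phrase is assumed to be far more likely in spam than in legitimate mail.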

The "Naive" Assumption

Assumption: Features are INDEPENDENT given the class

P(x₁, x₂, ..., xₙ|Class) = P(x₁|Class) × P(x₂|Class) × ... × P(xₙ|Class)

Example (Email Spam):
P("free", "money"|Spam) ≈ P("free"|Spam) × P("money"|Spam)

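This is NAIVE because "free" and "money" are correlated: seeing one makes the other more likely.
But the assumption keeps computation tractable (one probability per feature per class), and classification only requires the correct class to score highest, not an exact probability.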
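
The factorization above can be sketched in a few lines. The per-word likelihoods and priors below are hypothetical, and summing logarithms instead of multiplying raw probabilities is a common implementation choice (it avoids numerical underflow with many features), not something the formula itself requires.

```python
import math

# Hypothetical per-word likelihoods P(word | class) and class priors.
likelihoods = {
    "spam": {"free": 0.05, "money": 0.04},
    "ham": {"free": 0.005, "money": 0.01},
}
priors = {"spam": 0.4, "ham": 0.6}

def log_joint(words, cls):
    """Return log P(cls) + sum of log P(word | cls) under the naive assumption."""
    score = math.log(priors[cls])
    for word in words:
        score += math.log(likelihoods[cls][word])
    return score

words = ["free", "money"]
scores = {cls: log_joint(words, cls) for cls in priors}
predicted = max(scores, key=scores.get)  # class with the highest joint score
print(scores, predicted)
```

The evidence term P(Features) is omitted because it is the same for every class and does not change which class scores highest.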

Variants

Gaussian Naive Bayes:
├─ Features are continuous (real numbers)
├─ Assumes each feature is Gaussian within each class
├─ P(xᵢ|Class) = (1/√(2πσ²)) × exp(-(xᵢ-μ)²/(2σ²)), with μ and σ² estimated per feature and per class
└─ Use for: Continuous features (e.g. measurements, sensor readings)
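
As a rough sketch of the Gaussian likelihood, the snippet below estimates a mean and variance for one feature in each class and evaluates the density at a new value. The feature (message length) and its values are hypothetical, and using the population variance is an assumption made here for simplicity.

```python
import math
from statistics import fmean, pvariance

def gaussian_pdf(x, mu, var):
    """Gaussian density with mean mu and variance var, evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical values of one continuous feature (message length) per class.
lengths = {
    "spam": [120.0, 150.0, 130.0, 160.0],
    "ham": [40.0, 60.0, 55.0, 45.0],
}

# One (mean, variance) pair per class for this feature.
params = {cls: (fmean(vals), pvariance(vals)) for cls, vals in lengths.items()}

x = 140.0
likelihood = {cls: gaussian_pdf(x, mu, var) for cls, (mu, var) in params.items()}
print(params)
print(likelihood)
```

With several features, the same calculation is repeated per feature and the resulting densities are multiplied (or their logs summed) for each class.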

Multinomial Naive Bayes:
├─ Features are counts/frequencies
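├─ P(xᵢ|Class) = (count(xᵢ, Class) + α) / (count(all features, Class) + αn)
├─ α = smoothing parameter (α = 1 is Laplace smoothing); n = number of distinct features (vocabulary size)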
└─ Use for: Text classification (word counts)
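
The smoothed estimate can be computed by hand on a toy corpus. In this sketch the corpus is hypothetical, tokenization is a plain whitespace split, `alpha` is the smoothing parameter, and `n` is the vocabulary size.

```python
from collections import Counter

# Tiny hypothetical corpus, tokenized by whitespace.
docs = {
    "spam": ["win free money now", "free prize"],
    "ham": ["meeting tomorrow", "project update"],
}

# Word counts per class.
counts = {cls: Counter(w for d in texts for w in d.split()) for cls, texts in docs.items()}
vocab = sorted(set().union(*counts.values()))
n = len(vocab)   # number of distinct words across all classes
alpha = 1.0      # Laplace smoothing

def word_prob(word, cls):
    """Smoothed estimate of P(word | cls)."""
    total = sum(counts[cls].values())
    return (counts[cls][word] + alpha) / (total + alpha * n)

print(word_prob("free", "spam"))  # "free" appears twice in spam
print(word_prob("free", "ham"))   # never seen in ham, but still non-zero
```

Without smoothing, a word never seen in a class would get probability zero and wipe out the whole product for that class.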

Bernoulli Naive Bayes:
├─ Features are binary (0/1)
├─ P(xᵢ=1|Class) = count(xᵢ=1 and Class) / count(Class)
└─ Use for: Binary features, binary text (word present/absent)
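
One way to see how the Bernoulli variant differs from the multinomial one: absent words also contribute, through the factor 1 - P(xᵢ=1|Class). The sketch below uses the unsmoothed estimate given above on hypothetical binary vectors; a practical implementation would normally add smoothing to avoid zero probabilities.

```python
# Hypothetical binary document vectors; columns are ["free", "meeting"],
# 1 = word present, 0 = word absent.
spam_rows = [[1, 0], [1, 0], [0, 0], [1, 1]]
ham_rows = [[0, 1], [0, 1], [1, 1], [0, 0]]

def presence_probs(rows):
    """Estimate P(x_i = 1 | class) as the fraction of documents containing word i."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def bernoulli_likelihood(x, probs):
    """P(x | class): present words contribute p, absent words contribute (1 - p)."""
    result = 1.0
    for xi, p in zip(x, probs):
        result *= p if xi == 1 else (1.0 - p)
    return result

x = [1, 0]  # contains "free", does not contain "meeting"
lik_spam = bernoulli_likelihood(x, presence_probs(spam_rows))
lik_ham = bernoulli_likelihood(x, presence_probs(ham_rows))
print(lik_spam, lik_ham)
```

Here the absence of "meeting" counts as evidence for spam, which a count-based multinomial model would ignore.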

Text Classification Example

from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer

# Sample data
emails = [
    "Win free money now", "Claim your prize",
    "Meeting tomorrow at 10", "Project update",
    "Buy discount pills", "Special offer just for you",
    "Team lunch Friday", "Quarterly report attached"
]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1=spam, 0=not spam

# Vectorize
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Train
model = MultinomialNB(alpha=1.0)  # Laplace smoothing
model.fit(X, labels)

# Predict
new_emails = ["Free prize waiting for you"]
X_new = vectorizer.transform(new_emails)
print(f"Prediction: {model.predict(X_new)[0]}")  # 1 (spam)
print(f"Probabilities: {model.predict_proba(X_new)[0]}")
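
A possible variation on the example above bundles the vectorizer and the classifier into a single scikit-learn pipeline, so raw strings can be passed straight to `fit` and `predict`. This is a sketch on the same toy data, not a tuned model; the second test message is made up for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "Win free money now", "Claim your prize",
    "Meeting tomorrow at 10", "Project update",
    "Buy discount pills", "Special offer just for you",
    "Team lunch Friday", "Quarterly report attached",
]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1=spam, 0=not spam

# The pipeline applies CountVectorizer, then MultinomialNB, in one object.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
pipeline.fit(emails, labels)

predictions = pipeline.predict(["Free prize waiting for you", "Team meeting Friday"])
print(predictions)
```

Keeping both steps in one object makes it harder to accidentally vectorize new text with a different vocabulary than the one used in training.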

When to Use

Naive Bayes is EXCELLENT for:
├─ Text classification (spam, sentiment)
├─ High-dimensional data
├─ Small training sets
├─ Multi-class problems
└─ When speed matters

Naive Bayes is POOR for:
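├─ Strongly correlated features (the same evidence is counted more than once)
├─ Continuous features with complex (e.g. multimodal or heavily skewed) distributions
├─ When well-calibrated probability estimates are needed (outputs tend to be pushed toward 0 or 1)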
└─ When feature interactions matter
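
The effect of correlated features on the probability estimates can be illustrated with a toy calculation: if one feature is duplicated (a perfectly correlated copy), the model multiplies its likelihood in again each time. The numbers and the `posterior_spam` helper below are hypothetical.

```python
def posterior_spam(prior_spam, lik_spam, lik_ham, copies=1):
    """Posterior P(Spam | feature) when the same feature is counted `copies` times."""
    numerator = prior_spam * lik_spam ** copies
    denominator = numerator + (1.0 - prior_spam) * lik_ham ** copies
    return numerator / denominator

# One weakly informative feature: P(x | Spam) = 0.6, P(x | Not spam) = 0.4.
once = posterior_spam(0.5, 0.6, 0.4, copies=1)
thrice = posterior_spam(0.5, 0.6, 0.4, copies=3)
print(f"counted once: {once:.3f}, counted three times: {thrice:.3f}")
```

The predicted class is the same in both cases, but the stated confidence grows although no new information was added, which is why the ranking of classes is often more trustworthy than the probabilities themselves.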

Key Takeaways

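Naive Bayes applies Bayes' theorem with a conditional independence assumption
Gaussian for continuous, Multinomial for counts, Bernoulli for binary
Fast: training is a single counting pass, linear in the number of samples and features
Great baseline for text classification
Laplace smoothing prevents zero probabilities for features never seen with a class
Works surprisingly well despite the naive assumption
Probabilistic output enables threshold tuning, though the probabilities are often poorly calibrated
Scales well to high-dimensional data: cost grows linearly with the number of features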
