Naive Bayes — Complete Guide
Naive Bayes applies Bayes' theorem with the "naive" assumption that features are conditionally independent given the class. Despite this simplification, it often classifies well in practice, especially on text.
Bayes' Theorem
P(Class|Features) = P(Features|Class) × P(Class) / P(Features)
P(Spam|free money) = P(free money|Spam) × P(Spam) / P(free money)
Where:
P(Class) = prior probability
P(Features|Class) = likelihood
P(Features) = evidence
P(Class|Features) = posterior probability
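As a worked instance of the formula, the short sketch below plugs numbers into Bayes' theorem for the spam case. All probabilities are hypothetical values chosen only to make the arithmetic easy to follow; the evidence term is expanded with the law of total probability over the two classes.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
p_spam = 0.4                  # prior P(Spam)
p_ham = 1.0 - p_spam          # prior P(Not spam)
p_phrase_given_spam = 0.25    # likelihood P("free money" | Spam)
p_phrase_given_ham = 0.01     # likelihood P("free money" | Not spam)

# Evidence P("free money") via the law of total probability.
p_phrase = p_phrase_given_spam * p_spam + p_phrase_given_ham * p_ham

# Posterior P(Spam | "free money") from Bayes' theorem.
p_spam_given_phrase = p_phrase_given_spam * p_spam / p_phrase

print(f"Evidence:  {p_phrase:.3f}")
print(f"Posterior: {p_spam_given_phrase:.3f}")
```

With these made-up inputs the posterior comes out at roughly 0.94, even though the prior was only 0.4, because the phrase is 25 times more likely under the spam class.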
The "Naive" Assumption
Assumption: Features are INDEPENDENT given the class
P(x₁, x₂, ..., xₙ|Class) = P(x₁|Class) × P(x₂|Class) × ... × P(xₙ|Class)
Example (Email Spam):
P("free", "money"|Spam) ≈ P("free"|Spam) × P("money"|Spam)
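This is NAIVE because "free" and "money" tend to appear together, so they are not independent!
But it reduces estimation to one simple distribution per feature per class, and classification often still works well because only the ranking of the class scores has to be right.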
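The factorization can be sketched in a few lines of plain Python. The per-word likelihoods and priors below are hypothetical, and the function names are illustrative only. The sketch sums log-probabilities instead of multiplying raw probabilities, a common way to avoid numerical underflow when there are many features.

```python
import math

# Hypothetical per-word likelihoods P(word | class) and priors P(class).
word_likelihood = {
    "spam": {"free": 0.20, "money": 0.10},
    "ham":  {"free": 0.02, "money": 0.01},
}
prior = {"spam": 0.5, "ham": 0.5}


def log_joint(words, cls):
    """log P(class) + sum of log P(word | class): the naive factorization."""
    return math.log(prior[cls]) + sum(math.log(word_likelihood[cls][w]) for w in words)


def posterior(words):
    """Normalize the per-class joint scores into posterior probabilities."""
    scores = {cls: log_joint(words, cls) for cls in prior}
    top = max(scores.values())  # subtract the max before exponentiating (log-sum-exp)
    log_evidence = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return {cls: math.exp(s - log_evidence) for cls, s in scores.items()}


post = posterior(["free", "money"])
print(post)
```

Only one number per word per class is needed, instead of a probability for every possible combination of words.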
Variants
Gaussian Naive Bayes:
├─ Features are continuous (real numbers)
├─ Assumes Gaussian distribution per feature
├─ P(xᵢ|Class) = (1/√(2πσ²)) × exp(-(xᵢ-μ)²/(2σ²))
├─ μ, σ² = mean and variance of feature i within the class
└─ Use for: Continuous, roughly bell-shaped features
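A minimal sketch of the Gaussian likelihood, using only the standard library. The feature values are hypothetical, and fitting with the population variance is an assumption of this sketch, not a requirement of the method.

```python
import math
import statistics


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Hypothetical values of one continuous feature, grouped by class.
feature_by_class = {"A": [1.0, 2.0, 3.0], "B": [6.0, 7.0, 8.0]}

# Fit one (mean, variance) pair per class; population variance is assumed here.
params = {
    cls: (statistics.mean(values), statistics.pvariance(values))
    for cls, values in feature_by_class.items()
}

x = 2.5
likelihood = {cls: gaussian_pdf(x, mean, var) for cls, (mean, var) in params.items()}
print(likelihood)
```

The value 2.5 sits close to the mean of class A and far from the mean of class B, so its likelihood under A is much larger.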
Multinomial Naive Bayes:
├─ Features are counts/frequencies
├─ P(xᵢ|Class) = (count(xᵢ, Class) + α) / (count(all features, Class) + α·n)
├─ α = smoothing parameter (α = 1 is Laplace smoothing), n = number of features
└─ Use for: Text classification (word counts)
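The smoothed estimate for count features can be sketched as follows. The word counts are hypothetical and `smoothed_likelihoods` is an illustrative helper, not a library function.

```python
def smoothed_likelihoods(counts, alpha=1.0):
    """Return P(word | class) = (count + alpha) / (total + alpha * n) for each word."""
    n = len(counts)                 # vocabulary size
    total = sum(counts.values())    # all word occurrences in this class
    return {word: (c + alpha) / (total + alpha * n) for word, c in counts.items()}


# Hypothetical word counts over all spam emails; "meeting" never appears in spam.
spam_counts = {"free": 3, "money": 2, "meeting": 0}

probs = smoothed_likelihoods(spam_counts, alpha=1.0)
print(probs)  # {'free': 0.5, 'money': 0.375, 'meeting': 0.125}
```

Without smoothing, "meeting" would get probability 0 under the spam class, and any email containing it could never be classified as spam no matter what else it said.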
Bernoulli Naive Bayes:
├─ Features are binary (0/1)
├─ P(xᵢ=1|Class) = count(xᵢ=1 and Class) / count(Class)
└─ Use for: Binary features, binary text (word present/absent)
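A small sketch of the Bernoulli variant follows, with hypothetical documents and illustrative function names. Unlike the count-based model, words that are absent from a document also contribute to the likelihood, through the factor 1 - p. The sketch adds a smoothing term α (with 2α in the denominator, one α per outcome) as an assumption; setting `alpha=0.0` recovers the unsmoothed ratio count(xᵢ=1 and Class) / count(Class).

```python
def bernoulli_params(docs, vocab, alpha=1.0):
    """Estimate P(word present | class) from the documents of a single class."""
    n_docs = len(docs)
    return {
        w: (sum(w in doc for doc in docs) + alpha) / (n_docs + 2 * alpha)
        for w in vocab
    }


def bernoulli_likelihood(doc, params):
    """P(doc | class): present words contribute p, absent words contribute 1 - p."""
    result = 1.0
    for word, p in params.items():
        result *= p if word in doc else (1.0 - p)
    return result


# Hypothetical spam emails, each reduced to its set of distinct words.
vocab = ["free", "money", "meeting"]
spam_docs = [{"free", "money"}, {"free"}]

params = bernoulli_params(spam_docs, vocab)     # free 0.75, money 0.5, meeting 0.25
print(bernoulli_likelihood({"free"}, params))   # 0.75 * 0.5 * 0.75 = 0.28125
```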
Text Classification Example
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
# Sample data
emails = [
"Win free money now", "Claim your prize",
"Meeting tomorrow at 10", "Project update",
"Buy discount pills", "Special offer just for you",
"Team lunch Friday", "Quarterly report attached"
]
labels = [1, 1, 0, 0, 1, 1, 0, 0] # 1=spam, 0=not spam
# Vectorize
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)
# Train
model = MultinomialNB(alpha=1.0) # alpha=1.0 is Laplace (add-one) smoothing
model.fit(X, labels)
# Predict
new_emails = ["Free prize waiting for you"]
X_new = vectorizer.transform(new_emails)
print(f"Prediction: {model.predict(X_new)[0]}") # 1 (spam)
print(f"Probabilities: {model.predict_proba(X_new)[0]}")
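To make the prediction above less of a black box, the following sketch recomputes the spam probability for the same training emails by hand, using only the standard library. The tokenizer is an approximation of CountVectorizer's default behavior (lowercasing, tokens of two or more word characters), and the helper names are illustrative. Under those assumptions the result should agree with `predict_proba` up to floating-point error.

```python
import math
import re
from collections import Counter

emails = [
    "Win free money now", "Claim your prize",
    "Meeting tomorrow at 10", "Project update",
    "Buy discount pills", "Special offer just for you",
    "Team lunch Friday", "Quarterly report attached",
]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1=spam, 0=not spam


def tokenize(text):
    # Approximates CountVectorizer's defaults: lowercase, tokens of 2+ word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


vocab = {w for e in emails for w in tokenize(e)}
counts = {c: Counter() for c in set(labels)}
for text, label in zip(emails, labels):
    counts[label].update(tokenize(text))


def log_score(text, cls, alpha=1.0):
    """log P(class) + sum of log smoothed P(word | class) over in-vocabulary words."""
    total = sum(counts[cls].values())
    score = math.log(labels.count(cls) / len(labels))
    for w in tokenize(text):
        if w in vocab:  # words never seen in training are ignored
            score += math.log((counts[cls][w] + alpha) / (total + alpha * len(vocab)))
    return score


scores = {c: log_score("Free prize waiting for you", c) for c in (0, 1)}
p_spam = 1.0 / (1.0 + math.exp(scores[0] - scores[1]))  # two-class normalization
print(f"P(spam) = {p_spam:.4f}")
```

Four of the five words ("free", "prize", "for", "you") occur only in spam emails, and "waiting" is outside the vocabulary, which is why the spam probability ends up at about 0.92.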
When to Use
Naive Bayes is EXCELLENT for:
├─ Text classification (spam, sentiment)
├─ High-dimensional data
├─ Small training sets
├─ Multi-class problems
└─ When speed matters
Naive Bayes is POOR for:
├─ Correlated features
├─ Continuous features with complex distributions
├─ When well-calibrated probabilities are needed (outputs are often pushed toward 0 or 1)
└─ When feature interactions matter
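The effect of correlated features can be illustrated with an extreme case: the same word entered as several separate features. The likelihoods below are hypothetical. Because the model treats each copy as fresh, independent evidence, the posterior drifts toward certainty even though no new information was added.

```python
def posterior_spam(n_copies, p_word_spam=0.2, p_word_ham=0.1, prior_spam=0.5):
    """Posterior P(spam) when the same word is counted as n independent features."""
    spam = prior_spam * p_word_spam ** n_copies
    ham = (1.0 - prior_spam) * p_word_ham ** n_copies
    return spam / (spam + ham)


for n in (1, 2, 3):
    print(n, round(posterior_spam(n), 3))
```

With these numbers the posterior rises from about 0.667 with one copy to 0.8 with two and about 0.889 with three. The predicted class stays the same, which is why classification accuracy often survives correlated features while the probability estimates do not.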
Key Takeaways
- Naive Bayes applies Bayes' theorem with a conditional independence assumption
- Gaussian for continuous, Multinomial for counts, Bernoulli for binary
- Fast: training is a single pass over the data, linear in the number of samples and features
- Great baseline for text classification
- Laplace smoothing prevents zero probabilities for feature values never seen with a class during training
- Works surprisingly well despite the naive assumption
- Probabilistic output enables threshold tuning, though the probabilities are often poorly calibrated
- Scales well to high-dimensional data: cost grows linearly with the number of features
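Threshold tuning can be sketched as follows. The probabilities are hypothetical stand-ins for the spam column of a fitted model's `predict_proba` output, and `precision_recall` is an illustrative helper. Raising the threshold trades recall for precision, which is useful when false spam flags are more costly than missed spam.

```python
def precision_recall(probs, labels, threshold):
    """Precision and recall for the positive class at a given decision threshold."""
    preds = [int(p >= threshold) for p in probs]
    tp = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in zip(labels, preds) if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in zip(labels, preds) if y == 1 and yhat == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical P(spam) values, standing in for model.predict_proba(X)[:, 1].
spam_probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
true_labels = [1, 1, 0, 1, 0, 0]

for t in (0.35, 0.5, 0.7):
    p, r = precision_recall(spam_probs, true_labels, t)
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
```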