Recommendation Systems — Collaborative & Content-Based Filtering

Recommendation Systems — Complete Guide

Recommendation systems predict what users will like from their past behavior, from item attributes, or from both.


Types

Content-Based:
├─ Recommend items similar to what user liked
├─ Uses item features
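├─ Handles new items (their features are enough), but still needs some history for each user
└─ Filter bubble problem: tends to recommend more of what the user has already seen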
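
To make the content-based idea concrete, here is a minimal sketch that builds a user profile from the features of liked items and ranks unseen items by cosine similarity to that profile. The item catalogue, the genre weights, and the function names are illustrative assumptions, not part of any library.

```python
import math

# Hypothetical item features: genre weights per item (illustrative data only).
ITEM_FEATURES = {
    "Alien":        {"sci-fi": 1.0, "horror": 1.0},
    "Blade Runner": {"sci-fi": 1.0, "noir": 1.0},
    "Notting Hill": {"romance": 1.0, "comedy": 1.0},
    "The Thing":    {"horror": 1.0, "sci-fi": 0.5},
}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_profile(liked_items):
    """Sum the feature vectors of the liked items into one user profile."""
    profile = {}
    for item in liked_items:
        for feature, weight in ITEM_FEATURES[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight
    return profile

def recommend_content_based(liked_items, top_n=2):
    """Rank items the user has not liked yet by similarity to the profile."""
    profile = build_profile(liked_items)
    scores = {
        item: cosine(profile, features)
        for item, features in ITEM_FEATURES.items()
        if item not in liked_items
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

recommendations = recommend_content_based(["Alien"])
print(recommendations)
```

Note that no other user's data is involved: a brand-new item can be recommended as soon as it has features, which is why this approach copes with new items but stays close to what the user already likes.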

Collaborative Filtering:
├─ Recommend based on similar users
├─ Uses user-item interactions
├─ No feature engineering needed
└─ Cold-start problem for new users/items

Hybrid:
├─ Combines both approaches
├─ Each approach offsets the other's weakness (e.g. item features cover items with no interactions yet)
└─ Many production systems are hybrids
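
One simple way to combine the two approaches is a weighted blend of their scores. The sketch below assumes both scorers already return values normalized to [0, 1]; the function name, the `alpha` weight, and the sample scores are hypothetical, and real systems often use more elaborate schemes.

```python
def hybrid_scores(cf_scores, content_scores, alpha=0.7):
    """Blend two score dicts; alpha is the weight on collaborative scores.

    An item missing from one source gets 0.0 from that source, so content
    scores can still surface a new item that has no interactions yet.
    """
    items = cf_scores.keys() | content_scores.keys()
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * content_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical, already-normalized scores in [0, 1].
cf = {"A": 0.9, "B": 0.4}
content = {"A": 0.2, "B": 0.8, "C": 0.9}  # "C" is new: no collaborative score

blended = hybrid_scores(cf, content, alpha=0.5)
ranking = sorted(blended, key=blended.get, reverse=True)
print(ranking)
```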

Collaborative Filtering

User-Based:
"Users similar to you also liked..."
Similarity: cosine similarity between user vectors
Prediction: weighted average of similar users' ratings
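
A minimal sketch of user-based prediction, assuming a small in-memory ratings dictionary (the data and function names are made up for illustration). Similarity is computed over the items two users have both rated, and the prediction is the similarity-weighted average of the neighbours' ratings.

```python
import math

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "D": 5},
    "cat": {"A": 1, "B": 5, "D": 2},
}

def cosine(u, v):
    """Cosine similarity restricted to the keys both vectors share."""
    common = u.keys() & v.keys()
    if not common:
        return 0.0
    dot = sum(u[k] * v[k] for k in common)
    norm_u = math.sqrt(sum(u[k] ** 2 for k in common))
    norm_v = math.sqrt(sum(v[k] ** 2 for k in common))
    return dot / (norm_u * norm_v)

def predict_user_based(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)
        num += sim * other_ratings[item]
        den += abs(sim)
    return num / den if den else None  # None: no neighbour rated the item

user_based_prediction = predict_user_based("ann", "D", RATINGS)
print(round(user_based_prediction, 2))
```

Here ann's tastes match bob's exactly on the shared items, so bob's rating of D dominates the estimate. Cosine on raw positive ratings is never negative, so a common refinement is to subtract each user's mean rating first.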

Item-Based:
"Items similar to what you liked..."
Similarity: cosine similarity between item vectors
Prediction: weighted average of similar items' ratings
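
The item-based variant can be sketched the same way by flipping the matrix: each item becomes a vector of the ratings it received, and the prediction averages the user's own ratings of other items, weighted by how similar those items are to the target. The data and names below are illustrative assumptions.

```python
import math
from collections import defaultdict

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 5, "B": 3, "D": 5},
    "cat": {"A": 1, "B": 5, "D": 2},
}

def invert(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    by_item = defaultdict(dict)
    for user, items in ratings.items():
        for item, rating in items.items():
            by_item[item][user] = rating
    return by_item

def cosine(a, b):
    """Cosine similarity restricted to the users who rated both items."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(a[k] ** 2 for k in common))
    norm_b = math.sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)

def predict_item_based(user, target, ratings):
    """Average the user's own ratings, weighted by item-item similarity."""
    by_item = invert(ratings)
    num = den = 0.0
    for item, rating in ratings[user].items():
        if item == target:
            continue
        sim = cosine(by_item[target], by_item[item])
        num += sim * rating
        den += abs(sim)
    return num / den if den else None

item_based_prediction = predict_item_based("ann", "D", RATINGS)
print(round(item_based_prediction, 2))
```

Item-item similarities tend to change more slowly than user-user similarities, so in practice they are often precomputed and cached.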

Matrix Factorization:
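├─ Decompose the user-item matrix into low-dimensional user and item latent factors
├─ Factors are learned with SVD-style gradient descent, ALS, or a neural network
├─ The Netflix Prize-winning ensemble relied heavily on this approach
└─ Handles sparse matrices well because it fits only the observed ratings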
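
Example with the Surprise library (ratings_df is assumed to be a pandas DataFrame with userId, itemId, and rating columns):

from surprise import SVD, Dataset, Reader
from surprise.model_selection import cross_validate

# Load data; load_from_df expects the columns in the order user, item, rating
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(ratings_df[['userId', 'itemId', 'rating']], reader)

# Train and evaluate SVD with 5-fold cross-validation
model = SVD(n_factors=50, random_state=42)
results = cross_validate(model, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)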
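
To show what a factorization model learns, here is a simplified stdlib-only sketch that fits user and item factors by stochastic gradient descent on the observed ratings. It is not Surprise's implementation (which also learns per-user and per-item bias terms); the data, hyperparameters, and function names are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical (user, item, rating) triples on a 1-5 scale.
RATINGS = [
    ("ann", "A", 5), ("ann", "B", 3), ("ann", "C", 4),
    ("bob", "A", 5), ("bob", "B", 3), ("bob", "D", 5),
    ("cat", "A", 1), ("cat", "B", 5), ("cat", "D", 2),
]

def train_mf(ratings, n_factors=2, lr=0.05, reg=0.02, n_epochs=300, seed=0):
    """Fit rating ~ global_mean + dot(P[user], Q[item]) with plain SGD."""
    rng = random.Random(seed)
    mean = sum(r for _, _, r in ratings) / len(ratings)
    user_ids = sorted({u for u, _, _ in ratings})
    item_ids = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in user_ids}
    Q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in item_ids}
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mean + sum(p * q for p, q in zip(P[u], Q[i])))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return mean, P, Q

def predict(model, user, item):
    """Predicted rating for a (user, item) pair seen during training."""
    mean, P, Q = model
    return mean + sum(p * q for p, q in zip(P[user], Q[item]))

def rmse(predict_fn, ratings):
    """Root mean squared error of predict_fn over the given ratings."""
    return math.sqrt(
        sum((r - predict_fn(u, i)) ** 2 for u, i, r in ratings) / len(ratings)
    )

model = train_mf(RATINGS)
global_mean = model[0]
baseline_rmse = rmse(lambda u, i: global_mean, RATINGS)
trained_rmse = rmse(lambda u, i: predict(model, u, i), RATINGS)
print(f"baseline={baseline_rmse:.3f} trained={trained_rmse:.3f}")
```

The loop only ever touches observed ratings, which is why missing entries in the matrix cost nothing. The comparison above is on training data, so it shows that the factors fit, not that they generalize.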

Evaluation

Metrics:
RMSE: √(Σ(predicted - actual)² / n)
MAE: Σ|predicted - actual| / n
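
The two formulas above translate directly into code. This is a small sketch with made-up prediction and rating lists:

```python
import math

def rmse(predicted, actual):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

def mae(predicted, actual):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical predictions and true ratings.
predicted = [4.0, 3.0, 5.0, 2.0]
actual = [5.0, 3.0, 4.0, 4.0]

rmse_value = rmse(predicted, actual)
mae_value = mae(predicted, actual)
print(round(rmse_value, 4), mae_value)
```

RMSE is never smaller than MAE on the same data; a large gap between the two suggests a few big misses rather than many small ones.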

Precision@K: Of top K recommended, how many relevant?
Recall@K: Of all relevant items, how many in top K?
MAP: Mean Average Precision across users
NDCG: Normalized Discounted Cumulative Gain
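
The ranking metrics can be sketched for a single user as follows, assuming binary relevance (an item is either relevant or not). The function names and sample lists are illustrative; average precision in particular has several normalization conventions, and this sketch divides by min(number of relevant items, K). MAP is then the mean of average precision over all users.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

def average_precision(recommended, relevant, k):
    """Mean of precision values taken at each rank where a hit occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(len(relevant), k)

def ndcg_at_k(recommended, relevant, k):
    """DCG of the ranking divided by the DCG of an ideal ranking."""
    dcg = sum(
        1 / math.log2(rank + 1)
        for rank, item in enumerate(recommended[:k], start=1)
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical ranked recommendations and ground-truth relevant items.
recommended = ["A", "B", "C", "D", "E"]
relevant = {"A", "C", "F"}

p_at_5 = precision_at_k(recommended, relevant, 5)
r_at_5 = recall_at_k(recommended, relevant, 5)
ap_at_5 = average_precision(recommended, relevant, 5)
ndcg_at_5 = ndcg_at_k(recommended, relevant, 5)
print(p_at_5, round(r_at_5, 3), round(ap_at_5, 3), round(ndcg_at_5, 3))
```

Precision and recall ignore where in the top K the hits fall; average precision and NDCG reward placing relevant items nearer the top.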

Offline vs Online:
Offline: Split data, evaluate metrics
Online: A/B test, measure CTR, engagement
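
Both evaluation modes can be sketched in a few lines: a seeded random holdout split for offline evaluation, and a click-through-rate comparison for an A/B test. The helper names and the click and impression counts are hypothetical, and a real A/B analysis would also check statistical significance before acting on the lift.

```python
import random

def train_test_split(ratings, test_fraction=0.2, seed=42):
    """Shuffle the ratings and hold out a fraction for offline evaluation."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def click_through_rate(clicks, impressions):
    """Clicks divided by impressions; 0.0 when nothing was shown."""
    return clicks / impressions if impressions else 0.0

# Offline: hypothetical (user, item, rating) triples.
ratings = [(f"u{n % 3}", f"i{n}", 1 + n % 5) for n in range(10)]
train, test = train_test_split(ratings)

# Online: hypothetical counts from a control and a treatment group.
ctr_control = click_through_rate(clicks=120, impressions=4000)
ctr_treatment = click_through_rate(clicks=150, impressions=4000)
relative_lift = (ctr_treatment - ctr_control) / ctr_control
print(len(train), len(test), round(relative_lift, 2))
```

A random split is the simplest option; when timestamps are available, splitting by time avoids training on interactions that happened after the ones being predicted.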

Key Takeaways

  1. Collaborative filtering uses user behavior patterns
  2. Content-based uses item features
  3. Matrix factorization handles sparse data well
  4. Cold-start (new users or items with no interactions) is a central challenge for collaborative filtering
  5. Hybrid approaches combine both methods
  6. Implicit feedback (clicks, views) is easier to collect than explicit ratings, but noisier
  7. Deep learning models (NeuMF, Transformer-based) can improve performance when enough interaction data is available
  8. Evaluation requires both offline metrics and A/B testing
