Recommendation Systems
๐ก Recommendation systems predict user preferences and suggest items. They power Netflix, Spotify, Amazon, and countless other platforms. This lesson covers collaborative filtering, content-based methods, matrix factorization, and hybrid approaches.
1. Types of Recommendation Systems
Recommendation Systems
โ
โโโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโโ
โ โ โ
Collaborative Content-Based Hybrid
Filtering Filtering Methods
โ โ โ
โโโโโโโโโดโโโโโโโโ โ โโโโโโดโโโโโ
โ โ โ โ โ
User-Based Item-Based โ Weighted Feature
CF CF โ CF Combination
โ โ โ โ
Matrix Factorization โ Neural CF
(SVD, ALS, NMF) โ (Deep Learning)
DfRecommendation System
A recommendation system is an information filtering system that predicts the preference or rating a user would give to an item. It aims to present items (products, content, services) that are most likely relevant to a user based on historical interactions and item characteristics.
2. Collaborative Filtering
User-Based CF
DfUser-Based Collaborative Filtering
Users who agreed in the past will agree in the future. Predicts a user's rating for an item based on the ratings of similar users.
User-Based CF Prediction
Here,
- =Predicted rating for user u, item i
- =Average rating of user u
- =Set of users similar to u
- =Rating of user v for item i
- =Similarity between users u and v
Item-Based CF
Idea: Items rated similarly by users are similar.
Item-Based CF Prediction
Here,
- =Predicted rating for user u, item i
- =Set of items similar to item i
- =Rating of user u for item j
- =Similarity between items i and j
Similarity Metrics
Cosine Similarity
Here,
- =Rating vector of user u
- =Rating vector of user v
Pearson Correlation:
Pearson Correlation
Here,
- =Rating of user u for item i
- =Average rating of user u
- =Rating of user v for item i
- =Average rating of user v
Jaccard Similarity (for implicit feedback):
Jaccard Similarity
Here,
- =Set of items user u interacted with
- =Set of items user v interacted with
โน๏ธ Cosine vs Pearson
Cosine similarity measures the angle between rating vectors, while Pearson correlation first centers the ratings by subtracting mean values. Pearson is preferred when users have different rating biases (e.g., one user consistently rates higher than another).
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
class UserBasedCF:
def __init__(self, ratings_matrix):
"""
ratings_matrix: (n_users, n_items) sparse or dense matrix
"""
self.ratings = np.array(ratings_matrix)
self.user_sim = cosine_similarity(self.ratings)
np.fill_diagonal(self.user_sim, 0) # exclude self-similarity
def predict(self, user_id, item_id, k=10):
# Find k most similar users who rated this item
rated_mask = self.ratings[:, item_id] > 0
similarities = self.user_sim[user_id] * rated_mask
# Top-k similar users
top_k = np.argsort(similarities)[-k:]
top_sims = similarities[top_k]
top_ratings = self.ratings[top_k, item_id]
# Weighted average
if np.sum(np.abs(top_sims)) == 0:
return np.mean(self.ratings[user_id][self.ratings[user_id] > 0])
pred = np.dot(top_sims, top_ratings) / np.sum(np.abs(top_sims))
return np.clip(pred, 1, 5) # clamp to rating range
def recommend(self, user_id, n=5, k=10):
unrated = np.where(self.ratings[user_id] == 0)[0]
predictions = [(item, self.predict(user_id, item, k)) for item in unrated]
predictions.sort(key=lambda x: x[1], reverse=True)
return predictions[:n]
# Example
ratings = np.array([
[5, 3, 0, 1, 4],
[4, 0, 0, 1, 2],
[1, 1, 0, 5, 4],
[0, 0, 5, 4, 0],
[0, 4, 4, 0, 3],
])
cf = UserBasedCF(ratings)
print("Prediction (user 0, item 2):", cf.predict(0, 2))
print("Recommendations for user 0:", cf.recommend(0, n=3))
๐User-Based CF Prediction
Given users 0, 1, 2 with ratings [5,3,0,1,4], [4,0,0,1,2], [1,1,0,5,4]:
For user 0 predicting item 2:
- Similar users who rated item 2: user 2 (similarity = 0.82)
- User 2's rating for item 2 = 0
- Prediction: 3.0 + 0.82 ร (0 - 2.75) / 0.82 = 3.0 - 2.75 = 0.25
- Clamped to range: 1.0 (minimum rating)
3. Matrix Factorization
SVD-Based Approach
DfMatrix Factorization
Decompose the user-item matrix into low-rank latent factor matrices, where each user and item is represented by a dense vector of latent factors.
Matrix Factorization Decomposition
Here,
- =User-item ratings matrix (m ร n)
- =User latent factors matrix (m ร k)
- =Item latent factors matrix (n ร k)
- =Number of latent dimensions
Rating Prediction via Latent Factors
Here,
- =Predicted rating for user u, item i
- =User u's latent factor vector
- =Item i's latent factor vector
- =Number of latent dimensions
User-Item Matrix R (5 users ร 5 items)
โโโโโโฌโโโโโฌโโโโโฌโโโโโฌโโโโโ
โ 5 โ 3 โ ? โ 1 โ 4 โ Decompose into:
โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค
โ 4 โ ? โ ? โ 1 โ 2 โ P (5ร2) Q (5ร2)
โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค โโโโโโ โโโโโโ
โ 1 โ 1 โ ? โ 5 โ 4 โ โpโโโ โqโโโ
โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค โpโโโ ร โqโโโ
โ ? โ ? โ 5 โ 4 โ ? โ โpโโโ ร โqโโโ
โโโโโโผโโโโโผโโโโโผโโโโโผโโโโโค โpโโโ ร โqโโโ
โ ? โ 4 โ 4 โ ? โ 3 โ โpโ
โโ ร โqโ
โโ
โโโโโโดโโโโโดโโโโโดโโโโโดโโโโโ โโโโโโ โโโโโโ
? = predicted: rฬแตคแตข = pแตค ยท qแตข
Training with SGD
Matrix Factorization Loss Function
Here,
- =Actual rating for user u, item i
- =User latent factors
- =Item latent factors
- =Regularization parameter
- =Set of observed ratings
Gradient updates:
User Factor Update
Here,
- =Prediction error: r_{ui} - rฬ_{ui}
- =Learning rate
- =Regularization parameter
Item Factor Update
Here,
- =Prediction error: r_{ui} - rฬ_{ui}
- =Learning rate
- =Regularization parameter
๐ก SGD vs ALS
SGD updates all factors simultaneously but may get stuck in local minima. ALS alternates between fixing P and solving for Q (and vice versa), leading to a closed-form solution per step. ALS is more parallelizable and commonly used in production systems.
class MatrixFactorization:
def __init__(self, n_users, n_items, n_factors=20, lr=0.01, reg=0.02):
self.P = np.random.normal(0, 0.1, (n_users, n_factors))
self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
self.lr = lr
self.reg = reg
def fit(self, ratings, n_epochs=100):
"""
ratings: list of (user, item, rating) tuples
"""
for epoch in range(n_epochs):
np.random.shuffle(ratings)
total_error = 0
for u, i, r in ratings:
pred = np.dot(self.P[u], self.Q[i])
error = r - pred
total_error += error ** 2
# Update factors
p_old = self.P[u].copy()
self.P[u] += self.lr * (error * self.Q[i] - self.reg * self.P[u])
self.Q[i] += self.lr * (error * p_old - self.reg * self.Q[i])
rmse = np.sqrt(total_error / len(ratings))
if (epoch + 1) % 20 == 0:
print(f"Epoch {epoch+1:3d} | RMSE: {rmse:.4f}")
def predict(self, user, item):
return np.clip(np.dot(self.P[user], self.Q[item]), 1, 5)
def recommend(self, user, n=5):
scores = np.dot(self.P[user], self.Q.T)
top_items = np.argsort(scores)[::-1][:n]
return [(item, scores[item]) for item in top_items]
# Example
ratings_data = [
(0, 0, 5), (0, 1, 3), (0, 3, 1), (0, 4, 4),
(1, 0, 4), (1, 3, 1), (1, 4, 2),
(2, 0, 1), (2, 1, 1), (2, 3, 5), (2, 4, 4),
(3, 2, 5), (3, 3, 4),
(4, 1, 4), (4, 2, 4), (4, 4, 3),
]
mf = MatrixFactorization(n_users=5, n_items=5, n_factors=10, lr=0.01, reg=0.02)
mf.fit(ratings_data, n_epochs=100)
print("\nPredictions:")
for u in range(5):
for i in range(5):
if ratings[u][i] == 0:
print(f" User {u}, Item {i}: {mf.predict(u, i):.2f}")
ALS (Alternating Least Squares)
Fixes one factor matrix and solves for the other:
ALS Closed-Form Solution
Here,
- =Rows of Q corresponding to items rated by u
- =Regularization parameter
- =Identity matrix
- =User u's ratings vector
ThALS Convergence
Alternating Least Squares converges to a local minimum because each subproblem (fixing P, solving for Q or vice versa) has a closed-form solution that monotonically decreases the objective function.
class ALS:
def __init__(self, n_users, n_items, n_factors=20, reg=0.1):
self.P = np.random.normal(0, 0.1, (n_users, n_factors))
self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
self.reg = reg
def fit(self, ratings_matrix, n_iterations=15):
"""
ratings_matrix: (n_users, n_items) with 0 for missing
"""
for iteration in range(n_iterations):
# Fix Q, solve for P
for u in range(len(ratings_matrix)):
idx = np.where(ratings_matrix[u] > 0)[0]
if len(idx) == 0:
continue
Q_u = self.Q[idx]
r_u = ratings_matrix[u, idx]
self.P[u] = np.linalg.solve(
Q_u.T @ Q_u + self.reg * np.eye(Q_u.shape[1]),
Q_u.T @ r_u
)
# Fix P, solve for Q
for i in range(ratings_matrix.shape[1]):
idx = np.where(ratings_matrix[:, i] > 0)[0]
if len(idx) == 0:
continue
P_i = self.P[idx]
r_i = ratings_matrix[idx, i]
self.Q[i] = np.linalg.solve(
P_i.T @ P_i + self.reg * np.eye(P_i.shape[1]),
P_i.T @ r_i
)
def predict(self):
return self.P @ self.Q.T
als = ALS(n_users=5, n_items=5, n_factors=10, reg=0.1)
als.fit(ratings)
predicted = als.predict()
print("ALS predicted ratings:\n", np.round(predicted, 2))
4. Content-Based Filtering
Item Feature Representation
Content-Based Prediction
Here,
- =Feature vector of item i
- =User preference weights
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
class ContentBasedRecommender:
def __init__(self):
self.tfidf = TfidfVectorizer(stop_words='english')
self.cosine_sim = None
self.items = None
def fit(self, items_df, content_column='description'):
"""items_df: DataFrame with 'id' and content_column"""
self.items = items_df.reset_index(drop=True)
# Build TF-IDF matrix
tfidf_matrix = self.tfidf.fit_transform(self.items[content_column])
# Compute cosine similarity
self.cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
def get_recommendations(self, item_id, n=10):
idx = self.items[self.items['id'] == item_id].index[0]
# Get similarity scores
sim_scores = list(enumerate(self.cosine_sim[idx]))
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
# Exclude itself
sim_scores = sim_scores[1:n+1]
item_indices = [s[0] for s in sim_scores]
return self.items.iloc[item_indices][['id', 'title', 'genres']]
# Example
movies = pd.DataFrame({
'id': [1, 2, 3, 4, 5],
'title': ['The Matrix', 'Inception', 'Toy Story', 'The Dark Knight', 'Finding Nemo'],
'genres': ['Action Sci-Fi', 'Action Sci-Fi Thriller', 'Animation Comedy',
'Action Crime Thriller', 'Animation Comedy Adventure'],
'description': [
'A computer hacker learns about the true nature of reality',
'A thief who steals secrets through dream-sharing technology',
'A cowboy doll is threatened by a new spaceman toy',
'Batman faces the Joker, a criminal mastermind',
'A clownfish searches for his missing son',
]
})
rec = ContentBasedRecommender()
rec.fit(movies, 'description')
print(rec.get_recommendations(1, n=3))
๐TF-IDF Content Similarity
For two movies with descriptions:
- Movie 1: "A computer hacker learns about reality"
- Movie 2: "A thief steals secrets through dreams"
TF-IDF vectors will have high overlap on words like "computer" vs "dreams" if they appear frequently in the corpus. Cosine similarity measures the angle between these vectors, giving higher scores to semantically similar content.
5. Neural Collaborative Filtering
Deep learning approach that learns non-linear user-item interactions:
Neural Collaborative Filtering
Here,
- =User latent factors
- =Item latent factors
- =User side information
- =Item side information
- =Neural network function
โน๏ธ Neural CF Advantages
Neural CF can capture non-linear and complex interactions that linear matrix factorization cannot. It combines the expressive power of neural networks with the interpretability of latent factor models.
import torch
import torch.nn as nn
class NeuralCF(nn.Module):
def __init__(self, n_users, n_items, embed_dim=64, hidden_dims=[128, 64, 32]):
super().__init__()
self.user_embed = nn.Embedding(n_users, embed_dim)
self.item_embed = nn.Embedding(n_items, embed_dim)
layers = []
input_dim = embed_dim * 2
for h_dim in hidden_dims:
layers.extend([
nn.Linear(input_dim, h_dim),
nn.ReLU(),
nn.BatchNorm1d(h_dim),
nn.Dropout(0.2),
])
input_dim = h_dim
layers.append(nn.Linear(input_dim, 1))
self.mlp = nn.Sequential(*layers)
def forward(self, user_ids, item_ids):
user_emb = self.user_embed(user_ids)
item_emb = self.item_embed(item_ids)
x = torch.cat([user_emb, item_emb], dim=-1)
return self.mlp(x).squeeze()
# Training
model = NeuralCF(n_users=1000, n_items=5000)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Example training loop
for epoch in range(20):
model.train()
for batch_users, batch_items, batch_ratings in train_loader:
optimizer.zero_grad()
predictions = model(batch_users, batch_items)
loss = criterion(predictions, batch_ratings.float())
loss.backward()
optimizer.step()
๐Neural CF Architecture
For n_users=1000, n_items=5000, embed_dim=64:
- User embedding: (1000, 64) โ lookup user_id โ (64,)
- Item embedding: (5000, 64) โ lookup item_id โ (64,)
- Concatenate: (128,)
- MLP layers: 128 โ 64 โ 32 โ 1
- Output: scalar rating prediction
Total parameters: ~320K (embeddings) + ~9K (MLP)
6. Evaluation Metrics
Rating Prediction Metrics
RMSE (Root Mean Squared Error):
Root Mean Squared Error
Here,
- =Test set of (user, item) pairs
- =Actual rating
- =Predicted rating
MAE (Mean Absolute Error):
Mean Absolute Error
Here,
- =Test set of (user, item) pairs
- =Actual rating
- =Predicted rating
Ranking Metrics
Precision@K
Here,
- =Set of top-K recommended items
- =Set of relevant items
Recall@K
Here,
- =Set of top-K recommended items
- =Set of relevant items
NDCG@K (Normalized Discounted Cumulative Gain):
Normalized Discounted Cumulative Gain
Here,
- =Relevance score of item at position i
- =Number of recommendations to evaluate
- =Ideal DCG for the best possible ranking
MAP@K (Mean Average Precision):
Mean Average Precision
Here,
- =Number of relevant items
- =Number of recommendations
- =Precision at position k
- =Relevance indicator at position k
โน๏ธ Choosing Metrics
RMSE/MAE measure rating prediction accuracy but don't capture ranking quality. Precision@K and Recall@K measure binary relevance. NDCG@K accounts for position-dependent relevance. Use NDCG@K when item ordering matters.
import numpy as np
class RecommenderEvaluator:
@staticmethod
def rmse(predictions, actuals):
return np.sqrt(np.mean((np.array(predictions) - np.array(actuals)) ** 2))
@staticmethod
def mae(predictions, actuals):
return np.mean(np.abs(np.array(predictions) - np.array(actuals)))
@staticmethod
def precision_at_k(recommended, relevant, k):
recommended = recommended[:k]
relevant_set = set(relevant)
hits = len([r for r in recommended if r in relevant_set])
return hits / k
@staticmethod
def recall_at_k(recommended, relevant, k):
recommended = recommended[:k]
relevant_set = set(relevant)
hits = len([r for r in recommended if r in relevant_set])
return hits / len(relevant) if relevant else 0
@staticmethod
def ndcg_at_k(recommended, relevant_scores, k):
"""
recommended: list of item ids in recommended order
relevant_scores: dict {item_id: relevance_score}
"""
dcg = 0.0
for i, item in enumerate(recommended[:k]):
rel = relevant_scores.get(item, 0)
dcg += (2 ** rel - 1) / np.log2(i + 2)
# Ideal DCG
ideal = sorted(relevant_scores.values(), reverse=True)[:k]
idcg = sum((2 ** r - 1) / np.log2(i + 2) for i, r in enumerate(ideal))
return dcg / idcg if idcg > 0 else 0.0
@staticmethod
def mean_average_precision(user_recommendations, user_relevant, k=10):
aps = []
for rec, rel in zip(user_recommendations, user_relevant):
rel_set = set(rel)
hits = 0
precision_sum = 0.0
for i, item in enumerate(rec[:k]):
if item in rel_set:
hits += 1
precision_sum += hits / (i + 1)
ap = precision_sum / min(k, len(rel)) if rel else 0
aps.append(ap)
return np.mean(aps)
# Example evaluation
evaluator = RecommenderEvaluator()
predictions = [4.2, 3.8, 5.0, 2.1, 4.5]
actuals = [4, 4, 5, 2, 5]
print(f"RMSE: {evaluator.rmse(predictions, actuals):.4f}")
print(f"MAE: {evaluator.mae(predictions, actuals):.4f}")
recommended = [1, 5, 3, 7, 2, 8, 10, 4, 6, 9]
relevant = [1, 3, 5, 7]
print(f"Precision@5: {evaluator.precision_at_k(recommended, relevant, 5):.4f}")
print(f"Recall@5: {evaluator.recall_at_k(recommended, relevant, 5):.4f}")
relevant_scores = {1: 3, 3: 2, 5: 3, 7: 1}
print(f"NDCG@5: {evaluator.ndcg_at_k(recommended, relevant_scores, 5):.4f}")
๐NDCG Calculation
For recommended items [A, B, C] with relevance scores {A: 3, B: 2, C: 1}:
- DCG@3 = (2^3-1)/log2(2) + (2^2-1)/log2(3) + (2^1-1)/log2(4) = 7/1 + 3/1.585 + 1/2 = 7 + 1.89 + 0.5 = 9.39
- IDCG@3 (ideal order [A, B, C]) = same = 9.39
- NDCG@3 = 9.39 / 9.39 = 1.0
For a worse ranking [C, A, B]: DCG = (2^1-1)/1 + (2^3-1)/1.585 + (2^2-1)/2 = 1 + 4.42 + 1.5 = 6.92 NDCG = 6.92 / 9.39 = 0.74
7. Hybrid Methods
Weighted Hybrid
Weighted Hybrid Prediction
Here,
- =Weight for collaborative filtering
- =CF prediction
- =Content-based prediction
Feature-Combination Hybrid
Combine user and item features with collaborative signals:
class HybridRecommender(nn.Module):
def __init__(self, n_users, n_items, user_features_dim, item_features_dim, embed_dim=64):
super().__init__()
self.user_embed = nn.Embedding(n_users, embed_dim)
self.item_embed = nn.Embedding(n_items, embed_dim)
self.user_features_fc = nn.Linear(user_features_dim, embed_dim)
self.item_features_fc = nn.Linear(item_features_dim, embed_dim)
self.classifier = nn.Sequential(
nn.Linear(embed_dim * 4, 128),
nn.ReLU(),
nn.Dropout(0.3),
nn.Linear(128, 1),
)
def forward(self, user_ids, item_ids, user_features, item_features):
user_emb = self.user_embed(user_ids)
item_emb = self.item_embed(item_ids)
user_feat = self.user_features_fc(user_features)
item_feat = self.item_features_fc(item_features)
x = torch.cat([user_emb, item_emb, user_feat, item_feat], dim=-1)
return self.classifier(x).squeeze()
๐ก Hybrid System Design
In practice, hybrid systems often use a two-stage approach: first generate candidates using fast methods (CF or content-based), then re-rank using a more complex model (neural hybrid). This balances efficiency with recommendation quality.
8. Cold Start Problem
New User (No History)
- Use content-based recommendations (demographics, popular items)
- Ask for explicit preferences (onboarding questionnaire)
- Use bandit algorithms to explore quickly
New Item (No Interactions)
- Use content features to find similar existing items
- Popularity-based initial exposure
- Knowledge-based recommendations (item attributes)
DfCold Start Problem
The cold start problem occurs when a recommendation system cannot make accurate predictions for new users or new items due to insufficient historical interaction data.
9. Key Takeaways
๐Summary: Recommendation Systems
- Collaborative filtering leverages user behavior patterns โ no content features needed
- Matrix factorization (SVD, ALS) compresses sparse user-item matrices into dense latent factors
- Content-based methods use item features โ handles cold start for new items
- Neural CF learns non-linear interactions between users and items
- Hybrid methods combine CF and CB for best of both worlds
- Evaluation: use RMSE/MAE for rating prediction, Precision@K/NDCG@K for ranking
- Cold start remains the biggest challenge โ hybrid approaches help
- The choice of similarity metric (cosine vs Pearson) depends on user rating biases
- Regularization is critical to prevent overfitting in matrix factorization
10. Practice Exercises
Exercise 1: Build a Collaborative Filter
# TODO: Implement item-based CF on MovieLens dataset
# Compare cosine vs Pearson similarity
# Evaluate with RMSE on held-out test set
# Target: RMSE < 0.9
Exercise 2: Matrix Factorization from Scratch
# TODO: Implement SGD-based matrix factorization
# Tune: learning rate, regularization, latent factors
# Plot training RMSE vs epochs
# Compare with ALS implementation
Exercise 3: Content-Based + Hybrid
# TODO: Build a content-based recommender for books
# Use: title, author, genre, description as features
# Combine with CF using weighted hybrid
# Find optimal alpha
Exercise 4: Deep Learning Recommender
# TODO: Implement Neural Collaborative Filtering
# Use: embedding layers + MLP
# Compare with matrix factorization baseline
# Experiment with architecture (layers, dropout)