Recommendation Systems

💡 Recommendation systems predict user preferences and suggest items. They power Netflix, Spotify, Amazon, and countless other platforms. This lesson covers collaborative filtering, content-based methods, matrix factorization, and hybrid approaches.

1. Types of Recommendation Systems

Architecture Diagram

                    Recommendation Systems
                            │
            ┌───────────────┼───────────────┐
            │               │               │
    Collaborative     Content-Based      Hybrid
       Filtering          Filtering       Methods
            │               │               │
    ┌───────┴───────┐      │          ┌────┴────┐
    │               │      │          │         │
  User-Based    Item-Based   │    Weighted  Feature
    CF             CF        │      CF     Combination
    │               │      │          │
  Matrix Factorization      │    Neural CF
  (SVD, ALS, NMF)          │    (Deep Learning)

DfRecommendation System

A recommendation system is an information filtering system that predicts the preference or rating a user would give to an item. It aims to present items (products, content, services) that are most likely relevant to a user based on historical interactions and item characteristics.

2. Collaborative Filtering

User-Based CF

DfUser-Based Collaborative Filtering

Users who agreed in the past will agree in the future. Predicts a user's rating for an item based on the ratings of similar users.

User-Based CF Prediction

\hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in N(u)} \text{sim}(u,v) \cdot (r_{vi} - \bar{r}_v)}{\sum_{v \in N(u)} |\text{sim}(u,v)|}

Here,

$\hat{r}_{ui}$ =Predicted rating for user u, item i
$\bar{r}_u$ =Average rating of user u
$N(u)$ =Set of users similar to u
$r_{vi}$ =Rating of user v for item i
$sim(u,v)$ =Similarity between users u and v

Item-Based CF

Idea: Items rated similarly by users are similar.

Item-Based CF Prediction

\hat{r}_{ui} = \frac{\sum_{j \in N(i)} \text{sim}(i,j) \cdot r_{uj}}{\sum_{j \in N(i)} |\text{sim}(i,j)|}

Here,

$\hat{r}_{ui}$ =Predicted rating for user u, item i
$N(i)$ =Set of items similar to item i
$r_{uj}$ =Rating of user u for item j
$sim(i,j)$ =Similarity between items i and j

Similarity Metrics

Cosine Similarity

\text{cos}(u, v) = \frac{\mathbf{r}_u \cdot \mathbf{r}_v}{\|\mathbf{r}_u\| \|\mathbf{r}_v\|}

Here,

$\mathbf{r}_u$ =Rating vector of user u
$\mathbf{r}_v$ =Rating vector of user v

Pearson Correlation:

Pearson Correlation

\text{pearson}(u, v) = \frac{\sum_{i}(r_{ui} - \bar{r}_u)(r_{vi} - \bar{r}_v)}{\sqrt{\sum_i (r_{ui} - \bar{r}_u)^2} \sqrt{\sum_i (r_{vi} - \bar{r}_v)^2}}

Here,

$r_{ui}$ =Rating of user u for item i
$\bar{r}_u$ =Average rating of user u
$r_{vi}$ =Rating of user v for item i
$\bar{r}_v$ =Average rating of user v

Jaccard Similarity (for implicit feedback):

Jaccard Similarity

\text{jaccard}(u, v) = \frac{|I_u \cap I_v|}{|I_u \cup I_v|}

Here,

$I_u$ =Set of items user u interacted with
$I_v$ =Set of items user v interacted with

ℹ️ Cosine vs Pearson

Cosine similarity measures the angle between rating vectors, while Pearson correlation first centers the ratings by subtracting mean values. Pearson is preferred when users have different rating biases (e.g., one user consistently rates higher than another).

import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

class UserBasedCF:
    def __init__(self, ratings_matrix):
        """
        ratings_matrix: (n_users, n_items) sparse or dense matrix
        """
        self.ratings = np.array(ratings_matrix)
        self.user_sim = cosine_similarity(self.ratings)
        np.fill_diagonal(self.user_sim, 0)  # exclude self-similarity

    def predict(self, user_id, item_id, k=10):
        # Find k most similar users who rated this item
        rated_mask = self.ratings[:, item_id] > 0
        similarities = self.user_sim[user_id] * rated_mask

        # Top-k similar users
        top_k = np.argsort(similarities)[-k:]
        top_sims = similarities[top_k]
        top_ratings = self.ratings[top_k, item_id]

        # Weighted average
        if np.sum(np.abs(top_sims)) == 0:
            return np.mean(self.ratings[user_id][self.ratings[user_id] > 0])

        pred = np.dot(top_sims, top_ratings) / np.sum(np.abs(top_sims))
        return np.clip(pred, 1, 5)  # clamp to rating range

    def recommend(self, user_id, n=5, k=10):
        unrated = np.where(self.ratings[user_id] == 0)[0]
        predictions = [(item, self.predict(user_id, item, k)) for item in unrated]
        predictions.sort(key=lambda x: x[1], reverse=True)
        return predictions[:n]

# Example
ratings = np.array([
    [5, 3, 0, 1, 4],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 0, 5, 4, 0],
    [0, 4, 4, 0, 3],
])

cf = UserBasedCF(ratings)
print("Prediction (user 0, item 2):", cf.predict(0, 2))
print("Recommendations for user 0:", cf.recommend(0, n=3))

📝User-Based CF Prediction

Given users 0, 1, 2 with ratings [5,3,0,1,4], [4,0,0,1,2], [1,1,0,5,4]:

For user 0 predicting item 2:

Similar users who rated item 2: user 2 (similarity = 0.82)
User 2's rating for item 2 = 0
Prediction: 3.0 + 0.82 × (0 - 2.75) / 0.82 = 3.0 - 2.75 = 0.25
Clamped to range: 1.0 (minimum rating)

3. Matrix Factorization

SVD-Based Approach

DfMatrix Factorization

Decompose the user-item matrix into low-rank latent factor matrices, where each user and item is represented by a dense vector of latent factors.

Matrix Factorization Decomposition

R \approx U \Sigma V^T = P Q^T

Here,

$R$ =User-item ratings matrix (m × n)
$P$ =User latent factors matrix (m × k)
$Q$ =Item latent factors matrix (n × k)
$k$ =Number of latent dimensions

Rating Prediction via Latent Factors

\hat{r}_{ui} = \mathbf{p}_u \cdot \mathbf{q}_i = \sum_{f=1}^{k} p_{uf} q_{if}

Here,

$\hat{r}_{ui}$ =Predicted rating for user u, item i
$\mathbf{p}_u$ =User u's latent factor vector
$\mathbf{q}_i$ =Item i's latent factor vector
$k$ =Number of latent dimensions

Architecture Diagram

User-Item Matrix R (5 users × 5 items)
┌────┬────┬────┬────┬────┐
│ 5  │ 3  │  ? │ 1  │ 4  │    Decompose into:
├────┼────┼────┼────┼────┤
│ 4  │  ? │  ? │ 1  │ 2  │    P (5×2)          Q (5×2)
├────┼────┼────┼────┼────┤    ┌────┐           ┌────┐
│ 1  │ 1  │  ? │ 5  │ 4  │    │p₁₁│           │q₁₁│
├────┼────┼────┼────┼────┤    │p₂₁│    ×      │q₂₁│
│  ? │  ? │ 5  │ 4  │  ? │    │p₃₁│    ×      │q₃₁│
├────┼────┼────┼────┼────┤    │p₄₁│    ×      │q₄₁│
│  ? │ 4  │ 4  │  ? │ 3  │    │p₅₁│    ×      │q₅₁│
└────┴────┴────┴────┴────┘    └────┘           └────┘

? = predicted: r̂ᵤᵢ = pᵤ · qᵢ

Training with SGD

Matrix Factorization Loss Function

\min_{P, Q} \sum_{(u,i) \in \mathcal{K}} \left( r_{ui} - \mathbf{p}_u \cdot \mathbf{q}_i \right)^2 + \lambda \left( \|\mathbf{p}_u\|^2 + \|\mathbf{q}_i\|^2 \right)

Here,

$r_{ui}$ =Actual rating for user u, item i
$\mathbf{p}_u$ =User latent factors
$\mathbf{q}_i$ =Item latent factors
$\lambda$ =Regularization parameter
$\mathcal{K}$ =Set of observed ratings

Gradient updates:

User Factor Update

\mathbf{p}_u \leftarrow \mathbf{p}_u + \gamma \left( e_{ui} \cdot \mathbf{q}_i - \lambda \mathbf{p}_u \right)

Here,

$e_{ui}$ =Prediction error: r_{ui} - r̂_{ui}
$\gamma$ =Learning rate
$\lambda$ =Regularization parameter

Item Factor Update

\mathbf{q}_i \leftarrow \mathbf{q}_i + \gamma \left( e_{ui} \cdot \mathbf{p}_u - \lambda \mathbf{q}_i \right)

Here,

$e_{ui}$ =Prediction error: r_{ui} - r̂_{ui}
$\gamma$ =Learning rate
$\lambda$ =Regularization parameter

💡 SGD vs ALS

SGD updates all factors simultaneously but may get stuck in local minima. ALS alternates between fixing P and solving for Q (and vice versa), leading to a closed-form solution per step. ALS is more parallelizable and commonly used in production systems.

class MatrixFactorization:
    def __init__(self, n_users, n_items, n_factors=20, lr=0.01, reg=0.02):
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
        self.lr = lr
        self.reg = reg

    def fit(self, ratings, n_epochs=100):
        """
        ratings: list of (user, item, rating) tuples
        """
        for epoch in range(n_epochs):
            np.random.shuffle(ratings)
            total_error = 0

            for u, i, r in ratings:
                pred = np.dot(self.P[u], self.Q[i])
                error = r - pred
                total_error += error ** 2

                # Update factors
                p_old = self.P[u].copy()
                self.P[u] += self.lr * (error * self.Q[i] - self.reg * self.P[u])
                self.Q[i] += self.lr * (error * p_old - self.reg * self.Q[i])

            rmse = np.sqrt(total_error / len(ratings))
            if (epoch + 1) % 20 == 0:
                print(f"Epoch {epoch+1:3d} | RMSE: {rmse:.4f}")

    def predict(self, user, item):
        return np.clip(np.dot(self.P[user], self.Q[item]), 1, 5)

    def recommend(self, user, n=5):
        scores = np.dot(self.P[user], self.Q.T)
        top_items = np.argsort(scores)[::-1][:n]
        return [(item, scores[item]) for item in top_items]

# Example
ratings_data = [
    (0, 0, 5), (0, 1, 3), (0, 3, 1), (0, 4, 4),
    (1, 0, 4), (1, 3, 1), (1, 4, 2),
    (2, 0, 1), (2, 1, 1), (2, 3, 5), (2, 4, 4),
    (3, 2, 5), (3, 3, 4),
    (4, 1, 4), (4, 2, 4), (4, 4, 3),
]

mf = MatrixFactorization(n_users=5, n_items=5, n_factors=10, lr=0.01, reg=0.02)
mf.fit(ratings_data, n_epochs=100)
print("\nPredictions:")
for u in range(5):
    for i in range(5):
        if ratings[u][i] == 0:
            print(f"  User {u}, Item {i}: {mf.predict(u, i):.2f}")

ALS (Alternating Least Squares)

Fixes one factor matrix and solves for the other:

ALS Closed-Form Solution

\mathbf{p}_u = (Q_{I_u}^T Q_{I_u} + \lambda I)^{-1} Q_{I_u}^T \mathbf{r}_u

Here,

$Q_{I_u}$ =Rows of Q corresponding to items rated by u
$\lambda$ =Regularization parameter
$I$ =Identity matrix
$\mathbf{r}_u$ =User u's ratings vector

ThALS Convergence

Alternating Least Squares converges to a local minimum because each subproblem (fixing P, solving for Q or vice versa) has a closed-form solution that monotonically decreases the objective function.

class ALS:
    def __init__(self, n_users, n_items, n_factors=20, reg=0.1):
        self.P = np.random.normal(0, 0.1, (n_users, n_factors))
        self.Q = np.random.normal(0, 0.1, (n_items, n_factors))
        self.reg = reg

    def fit(self, ratings_matrix, n_iterations=15):
        """
        ratings_matrix: (n_users, n_items) with 0 for missing
        """
        for iteration in range(n_iterations):
            # Fix Q, solve for P
            for u in range(len(ratings_matrix)):
                idx = np.where(ratings_matrix[u] > 0)[0]
                if len(idx) == 0:
                    continue
                Q_u = self.Q[idx]
                r_u = ratings_matrix[u, idx]
                self.P[u] = np.linalg.solve(
                    Q_u.T @ Q_u + self.reg * np.eye(Q_u.shape[1]),
                    Q_u.T @ r_u
                )

            # Fix P, solve for Q
            for i in range(ratings_matrix.shape[1]):
                idx = np.where(ratings_matrix[:, i] > 0)[0]
                if len(idx) == 0:
                    continue
                P_i = self.P[idx]
                r_i = ratings_matrix[idx, i]
                self.Q[i] = np.linalg.solve(
                    P_i.T @ P_i + self.reg * np.eye(P_i.shape[1]),
                    P_i.T @ r_i
                )

    def predict(self):
        return self.P @ self.Q.T

als = ALS(n_users=5, n_items=5, n_factors=10, reg=0.1)
als.fit(ratings)
predicted = als.predict()
print("ALS predicted ratings:\n", np.round(predicted, 2))

4. Content-Based Filtering

Item Feature Representation

Content-Based Prediction

\hat{r}_{ui} = \mathbf{w}_u \cdot \mathbf{x}_i

Here,

$\mathbf{x}_i$ =Feature vector of item i
$\mathbf{w}_u$ =User preference weights

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

class ContentBasedRecommender:
    def __init__(self):
        self.tfidf = TfidfVectorizer(stop_words='english')
        self.cosine_sim = None
        self.items = None

    def fit(self, items_df, content_column='description'):
        """items_df: DataFrame with 'id' and content_column"""
        self.items = items_df.reset_index(drop=True)

        # Build TF-IDF matrix
        tfidf_matrix = self.tfidf.fit_transform(self.items[content_column])

        # Compute cosine similarity
        self.cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

    def get_recommendations(self, item_id, n=10):
        idx = self.items[self.items['id'] == item_id].index[0]

        # Get similarity scores
        sim_scores = list(enumerate(self.cosine_sim[idx]))
        sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

        # Exclude itself
        sim_scores = sim_scores[1:n+1]
        item_indices = [s[0] for s in sim_scores]

        return self.items.iloc[item_indices][['id', 'title', 'genres']]

# Example
movies = pd.DataFrame({
    'id': [1, 2, 3, 4, 5],
    'title': ['The Matrix', 'Inception', 'Toy Story', 'The Dark Knight', 'Finding Nemo'],
    'genres': ['Action Sci-Fi', 'Action Sci-Fi Thriller', 'Animation Comedy',
               'Action Crime Thriller', 'Animation Comedy Adventure'],
    'description': [
        'A computer hacker learns about the true nature of reality',
        'A thief who steals secrets through dream-sharing technology',
        'A cowboy doll is threatened by a new spaceman toy',
        'Batman faces the Joker, a criminal mastermind',
        'A clownfish searches for his missing son',
    ]
})

rec = ContentBasedRecommender()
rec.fit(movies, 'description')
print(rec.get_recommendations(1, n=3))

📝TF-IDF Content Similarity

For two movies with descriptions:

Movie 1: "A computer hacker learns about reality"
Movie 2: "A thief steals secrets through dreams"

TF-IDF vectors will have high overlap on words like "computer" vs "dreams" if they appear frequently in the corpus. Cosine similarity measures the angle between these vectors, giving higher scores to semantically similar content.

5. Neural Collaborative Filtering

Deep learning approach that learns non-linear user-item interactions:

Neural Collaborative Filtering

\hat{y}_{ui} = f(\mathbf{p}_u, \mathbf{q}_i, \mathbf{x}_u, \mathbf{x}_i)

Here,

$\mathbf{p}_u$ =User latent factors
$\mathbf{q}_i$ =Item latent factors
$\mathbf{x}_u$ =User side information
$\mathbf{x}_i$ =Item side information
$f$ =Neural network function

ℹ️ Neural CF Advantages

Neural CF can capture non-linear and complex interactions that linear matrix factorization cannot. It combines the expressive power of neural networks with the interpretability of latent factor models.

import torch
import torch.nn as nn

class NeuralCF(nn.Module):
    def __init__(self, n_users, n_items, embed_dim=64, hidden_dims=[128, 64, 32]):
        super().__init__()
        self.user_embed = nn.Embedding(n_users, embed_dim)
        self.item_embed = nn.Embedding(n_items, embed_dim)

        layers = []
        input_dim = embed_dim * 2
        for h_dim in hidden_dims:
            layers.extend([
                nn.Linear(input_dim, h_dim),
                nn.ReLU(),
                nn.BatchNorm1d(h_dim),
                nn.Dropout(0.2),
            ])
            input_dim = h_dim
        layers.append(nn.Linear(input_dim, 1))
        self.mlp = nn.Sequential(*layers)

    def forward(self, user_ids, item_ids):
        user_emb = self.user_embed(user_ids)
        item_emb = self.item_embed(item_ids)
        x = torch.cat([user_emb, item_emb], dim=-1)
        return self.mlp(x).squeeze()

# Training
model = NeuralCF(n_users=1000, n_items=5000)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Example training loop
for epoch in range(20):
    model.train()
    for batch_users, batch_items, batch_ratings in train_loader:
        optimizer.zero_grad()
        predictions = model(batch_users, batch_items)
        loss = criterion(predictions, batch_ratings.float())
        loss.backward()
        optimizer.step()

📝Neural CF Architecture

For n_users=1000, n_items=5000, embed_dim=64:

User embedding: (1000, 64) → lookup user_id → (64,)
Item embedding: (5000, 64) → lookup item_id → (64,)
Concatenate: (128,)
MLP layers: 128 → 64 → 32 → 1
Output: scalar rating prediction

Total parameters: ~320K (embeddings) + ~9K (MLP)

6. Evaluation Metrics

Rating Prediction Metrics

RMSE (Root Mean Squared Error):

Root Mean Squared Error

\text{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} (r_{ui} - \hat{r}_{ui})^2}

Here,

$\mathcal{T}$ =Test set of (user, item) pairs
$r_{ui}$ =Actual rating
$\hat{r}_{ui}$ =Predicted rating

MAE (Mean Absolute Error):

Mean Absolute Error

\text{MAE} = \frac{1}{|\mathcal{T}|} \sum_{(u,i) \in \mathcal{T}} |r_{ui} - \hat{r}_{ui}|

Here,

$\mathcal{T}$ =Test set of (user, item) pairs
$r_{ui}$ =Actual rating
$\hat{r}_{ui}$ =Predicted rating

Ranking Metrics

Precision@K

\text{Precision@K} = \frac{|\text{Top-K} \cap \text{Relevant}|}{K}

Here,

$Top-K$ =Set of top-K recommended items
$Relevant$ =Set of relevant items

Recall@K

\text{Recall@K} = \frac{|\text{Top-K} \cap \text{Relevant}|}{|\text{Relevant}|}

Here,

$Top-K$ =Set of top-K recommended items
$Relevant$ =Set of relevant items

NDCG@K (Normalized Discounted Cumulative Gain):

Normalized Discounted Cumulative Gain

\text{NDCG@K} = \frac{\text{DCG@K}}{\text{IDCG@K}} \qquad \text{DCG@K} = \sum_{i=1}^{K} \frac{2^{r_i} - 1}{\log_2(i+1)}

Here,

$r_i$ =Relevance score of item at position i
$K$ =Number of recommendations to evaluate
$IDCG@K$ =Ideal DCG for the best possible ranking

MAP@K (Mean Average Precision):

Mean Average Precision

\text{AP@K} = \frac{1}{\min(K, m)} \sum_{k=1}^{K} \text{precision}(k) \cdot \text{rel}(k)

Here,

$m$ =Number of relevant items
$K$ =Number of recommendations
$precision(k)$ =Precision at position k
$rel(k)$ =Relevance indicator at position k

ℹ️ Choosing Metrics

RMSE/MAE measure rating prediction accuracy but don't capture ranking quality. Precision@K and Recall@K measure binary relevance. NDCG@K accounts for position-dependent relevance. Use NDCG@K when item ordering matters.

import numpy as np

class RecommenderEvaluator:
    @staticmethod
    def rmse(predictions, actuals):
        return np.sqrt(np.mean((np.array(predictions) - np.array(actuals)) ** 2))

    @staticmethod
    def mae(predictions, actuals):
        return np.mean(np.abs(np.array(predictions) - np.array(actuals)))

    @staticmethod
    def precision_at_k(recommended, relevant, k):
        recommended = recommended[:k]
        relevant_set = set(relevant)
        hits = len([r for r in recommended if r in relevant_set])
        return hits / k

    @staticmethod
    def recall_at_k(recommended, relevant, k):
        recommended = recommended[:k]
        relevant_set = set(relevant)
        hits = len([r for r in recommended if r in relevant_set])
        return hits / len(relevant) if relevant else 0

    @staticmethod
    def ndcg_at_k(recommended, relevant_scores, k):
        """
        recommended: list of item ids in recommended order
        relevant_scores: dict {item_id: relevance_score}
        """
        dcg = 0.0
        for i, item in enumerate(recommended[:k]):
            rel = relevant_scores.get(item, 0)
            dcg += (2 ** rel - 1) / np.log2(i + 2)

        # Ideal DCG
        ideal = sorted(relevant_scores.values(), reverse=True)[:k]
        idcg = sum((2 ** r - 1) / np.log2(i + 2) for i, r in enumerate(ideal))

        return dcg / idcg if idcg > 0 else 0.0

    @staticmethod
    def mean_average_precision(user_recommendations, user_relevant, k=10):
        aps = []
        for rec, rel in zip(user_recommendations, user_relevant):
            rel_set = set(rel)
            hits = 0
            precision_sum = 0.0
            for i, item in enumerate(rec[:k]):
                if item in rel_set:
                    hits += 1
                    precision_sum += hits / (i + 1)
            ap = precision_sum / min(k, len(rel)) if rel else 0
            aps.append(ap)
        return np.mean(aps)

# Example evaluation
evaluator = RecommenderEvaluator()

predictions = [4.2, 3.8, 5.0, 2.1, 4.5]
actuals = [4, 4, 5, 2, 5]

print(f"RMSE: {evaluator.rmse(predictions, actuals):.4f}")
print(f"MAE:  {evaluator.mae(predictions, actuals):.4f}")

recommended = [1, 5, 3, 7, 2, 8, 10, 4, 6, 9]
relevant = [1, 3, 5, 7]
print(f"Precision@5: {evaluator.precision_at_k(recommended, relevant, 5):.4f}")
print(f"Recall@5:    {evaluator.recall_at_k(recommended, relevant, 5):.4f}")

relevant_scores = {1: 3, 3: 2, 5: 3, 7: 1}
print(f"NDCG@5:      {evaluator.ndcg_at_k(recommended, relevant_scores, 5):.4f}")

📝NDCG Calculation

For recommended items [A, B, C] with relevance scores {A: 3, B: 2, C: 1}:

DCG@3 = (2^3-1)/log2(2) + (2^2-1)/log2(3) + (2^1-1)/log2(4) = 7/1 + 3/1.585 + 1/2 = 7 + 1.89 + 0.5 = 9.39
IDCG@3 (ideal order [A, B, C]) = same = 9.39
NDCG@3 = 9.39 / 9.39 = 1.0

For a worse ranking [C, A, B]: DCG = (2^1-1)/1 + (2^3-1)/1.585 + (2^2-1)/2 = 1 + 4.42 + 1.5 = 6.92 NDCG = 6.92 / 9.39 = 0.74

7. Hybrid Methods

Weighted Hybrid

Weighted Hybrid Prediction

\hat{r}_{ui} = \alpha \cdot \hat{r}_{ui}^{\text{CF}} + (1-\alpha) \cdot \hat{r}_{ui}^{\text{CB}}

Here,

$\alpha$ =Weight for collaborative filtering
$\hat{r}_{ui}^{\text{CF}}$ =CF prediction
$\hat{r}_{ui}^{\text{CB}}$ =Content-based prediction

Feature-Combination Hybrid

Combine user and item features with collaborative signals:

class HybridRecommender(nn.Module):
    def __init__(self, n_users, n_items, user_features_dim, item_features_dim, embed_dim=64):
        super().__init__()
        self.user_embed = nn.Embedding(n_users, embed_dim)
        self.item_embed = nn.Embedding(n_items, embed_dim)
        self.user_features_fc = nn.Linear(user_features_dim, embed_dim)
        self.item_features_fc = nn.Linear(item_features_dim, embed_dim)

        self.classifier = nn.Sequential(
            nn.Linear(embed_dim * 4, 128),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(128, 1),
        )

    def forward(self, user_ids, item_ids, user_features, item_features):
        user_emb = self.user_embed(user_ids)
        item_emb = self.item_embed(item_ids)
        user_feat = self.user_features_fc(user_features)
        item_feat = self.item_features_fc(item_features)

        x = torch.cat([user_emb, item_emb, user_feat, item_feat], dim=-1)
        return self.classifier(x).squeeze()

💡 Hybrid System Design

In practice, hybrid systems often use a two-stage approach: first generate candidates using fast methods (CF or content-based), then re-rank using a more complex model (neural hybrid). This balances efficiency with recommendation quality.

8. Cold Start Problem

New User (No History)

Use content-based recommendations (demographics, popular items)
Ask for explicit preferences (onboarding questionnaire)
Use bandit algorithms to explore quickly

New Item (No Interactions)

Use content features to find similar existing items
Popularity-based initial exposure
Knowledge-based recommendations (item attributes)

DfCold Start Problem

The cold start problem occurs when a recommendation system cannot make accurate predictions for new users or new items due to insufficient historical interaction data.

9. Key Takeaways

📋Summary: Recommendation Systems

Collaborative filtering leverages user behavior patterns — no content features needed
Matrix factorization (SVD, ALS) compresses sparse user-item matrices into dense latent factors
Content-based methods use item features — handles cold start for new items
Neural CF learns non-linear interactions between users and items
Hybrid methods combine CF and CB for best of both worlds
Evaluation: use RMSE/MAE for rating prediction, Precision@K/NDCG@K for ranking
Cold start remains the biggest challenge — hybrid approaches help
The choice of similarity metric (cosine vs Pearson) depends on user rating biases
Regularization is critical to prevent overfitting in matrix factorization

10. Practice Exercises

Exercise 1: Build a Collaborative Filter

# TODO: Implement item-based CF on MovieLens dataset
# Compare cosine vs Pearson similarity
# Evaluate with RMSE on held-out test set
# Target: RMSE < 0.9

Exercise 2: Matrix Factorization from Scratch

# TODO: Implement SGD-based matrix factorization
# Tune: learning rate, regularization, latent factors
# Plot training RMSE vs epochs
# Compare with ALS implementation

Exercise 3: Content-Based + Hybrid

# TODO: Build a content-based recommender for books
# Use: title, author, genre, description as features
# Combine with CF using weighted hybrid
# Find optimal alpha

Exercise 4: Deep Learning Recommender

# TODO: Implement Neural Collaborative Filtering
# Use: embedding layers + MLP
# Compare with matrix factorization baseline
# Experiment with architecture (layers, dropout)

Recommendation Systems