Embedding Models

[Figure: Embedding Space Visualization. Points shown: cat, dog, kitten, car, truck, happy, joy, grouped into semantic clusters. Cosine similarity: sim(A, B) = (A · B) / (||A|| * ||B||). High similarity: cat, kitten (score 0.89). Low similarity: cat, car (score 0.12).]

What are Embeddings?

Embeddings are dense vector representations of text, images, or other data in a continuous vector space. Similar items are positioned close together, enabling semantic search and similarity comparisons.

How Embeddings Work

from sentence_transformers import SentenceTransformer
import numpy as np

# Load an embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "The cat sat on the mat",
    "A kitten rested on the rug",
    "The car drove down the road"
]

embeddings = model.encode(sentences)

# Calculate similarity
def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Similar sentences have high similarity
sim_01 = cosine_similarity(embeddings[0], embeddings[1])  # High
sim_02 = cosine_similarity(embeddings[0], embeddings[2])  # Low

print(f"Cat-Kitten similarity: {sim_01:.3f}")
print(f"Cat-Car similarity: {sim_02:.3f}")
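The same similarity function can rank documents against a query, which is the core of semantic search. The following is a minimal, dependency-free sketch: the three-dimensional vectors are hand-made stand-ins for what `model.encode(...)` would return (real embeddings from this model have 384 dimensions), and `top_k` is a hypothetical helper name, not part of any library.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Assumption: toy vectors standing in for real model output.
doc_vecs = [
    [0.90, 0.80, 0.10],  # "The cat sat on the mat"
    [0.85, 0.90, 0.05],  # "A kitten rested on the rug"
    [0.10, 0.00, 0.95],  # "The car drove down the road"
]
query_vec = [0.80, 0.90, 0.10]  # stand-in for an encoded query about cats

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(f"doc {idx}: {score:.3f}")
```

With real embeddings, `doc_vecs` would be computed once and stored, and only the query would be encoded at search time. A linear scan like this is fine for small collections; larger ones typically use an approximate nearest-neighbor index instead.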

Popular Embedding Models

Embedding Models Comparison

OpenAI Ada: 1536 dimensions. Pros: high quality. Cons: API cost.
Cohere Embed: 1024 dimensions. Pros: multilingual. Cons: API dependency.
Sentence-BERT: 384-768 dimensions. Pros: open source. Cons: English focused.
E5/Mistral: 1024 dimensions. Pros: state of the art. Cons: large model size.
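Dimension count also affects storage, which can matter when choosing between these models. As a rough sketch, the function below estimates the raw size of a vector collection. It assumes uncompressed 32-bit floats (4 bytes per value) and ignores any index or metadata overhead; `index_size_mb` is a hypothetical helper name and the one-million-vector corpus is an arbitrary illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as uncompressed float32

def index_size_mb(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings, in mebibytes (no index overhead)."""
    return num_vectors * dimensions * bytes_per_value / (1024 ** 2)

# Dimension counts taken from the comparison above.
for dims in (384, 768, 1024, 1536):
    size = index_size_mb(1_000_000, dims)
    print(f"{dims:>4} dims x 1M vectors: {size:,.0f} MB")
```

Storage grows linearly with dimension, so under these assumptions a 1536-dimensional model needs four times the space of a 384-dimensional one for the same corpus.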

Fine-tuning Embeddings

from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

def fine_tune_embeddings(train_pairs, model_name='all-MiniLM-L6-v2'):
    model = SentenceTransformer(model_name)

    # Prepare training data
    train_examples = [
        InputExample(texts=[pair['text1'], pair['text2']],
                     label=float(pair['similarity']))  # CosineSimilarityLoss expects float labels
        for pair in train_pairs
    ]

    train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
    train_loss = losses.CosineSimilarityLoss(model)

    model.fit(
        train_objectives=[(train_dataloader, train_loss)],
        epochs=3,
        warmup_steps=100
    )

    return model
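The function above expects `train_pairs` to be a list of dicts with the keys `text1`, `text2`, and `similarity`. The sketch below shows one way such a list might be built. It assumes the raw labels are human ratings on a 0 to 5 scale that are rescaled to the 0 to 1 range; that scale, the example ratings, and the helper name `make_train_pairs` are all illustrative assumptions, not something the lesson specifies.

```python
def make_train_pairs(rows, max_score=5.0):
    """Convert (text1, text2, score) tuples into the dict format used above.

    Assumption: scores lie in [0, max_score] and are rescaled to [0, 1].
    """
    pairs = []
    for text1, text2, score in rows:
        if not 0 <= score <= max_score:
            raise ValueError(f"score {score} outside [0, {max_score}]")
        pairs.append({
            "text1": text1,
            "text2": text2,
            "similarity": float(score) / max_score,
        })
    return pairs

# Hypothetical labeled rows; the ratings are made up for illustration.
rows = [
    ("The cat sat on the mat", "A kitten rested on the rug", 4.0),
    ("The cat sat on the mat", "The car drove down the road", 0.5),
]
train_pairs = make_train_pairs(rows)
print(train_pairs[0])

# With sentence-transformers installed, the result could then be passed on:
# model = fine_tune_embeddings(train_pairs)
```

Two pairs are far too few for real training. Note also that `warmup_steps=100` in the function above only makes sense when the dataset yields well over 100 training steps.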

Summary

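Embeddings map text, images, and other data to vectors whose distances reflect meaning, which makes them the basis for semantic search and retrieval systems. This lesson covered generating embeddings, comparing them with cosine similarity, choosing among popular models, and fine-tuning a model on labeled sentence pairs.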

Next: We'll explore text generation models.
