Introduction
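Word embeddings represent each word as a dense, real-valued vector, typically with tens to hundreds of dimensions. Words that appear in similar contexts end up with nearby vectors, so distances and directions in the vector space reflect semantic relationships.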
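Closeness between two embedding vectors is commonly measured with cosine similarity. The following is a minimal sketch of that measure in plain Python; the three-dimensional vectors in `toy_vectors` are invented for illustration and do not come from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, made up for this example only.
toy_vectors = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(toy_vectors["cat"], toy_vectors["dog"])
cat_car = cosine_similarity(toy_vectors["cat"], toy_vectors["car"])
print(f"cat vs dog: {cat_dog:.3f}")
print(f"cat vs car: {cat_car:.3f}")
```

With these made-up values, "cat" scores much closer to "dog" than to "car", which is the kind of pattern trained embeddings are expected to show for related words.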
Using Pre-trained Embeddings
import gensim.downloader as api
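# Download pre-trained 100-dimensional GloVe vectors (cached locally after the first call)
model = api.load("glove-wiki-gigaword-100")
# Find similar words: returns a list of (word, cosine similarity) pairs, 10 by default
print(model.most_similar("king"))
print(model.most_similar("cat", topn=5))
# Word analogies: king - man + woman; [0] keeps the top (word, similarity) pair
print(model.most_similar(positive=["king", "woman"], negative=["man"])[0])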
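The analogy query can be read as vector arithmetic: add the positive vectors, subtract the negative ones, and return the nearest remaining word. The sketch below illustrates that idea on hypothetical two-dimensional vectors (axis 0 standing in for "royalty", axis 1 for "gender"). The `analogy` helper and `toy_vectors` are assumptions made for this example, and gensim's own implementation differs in details such as vector normalization.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Hypothetical 2-D embeddings, made up for this example only.
toy_vectors = {
    "king":   [1.0, 1.0],
    "queen":  [1.0, -1.0],
    "man":    [0.1, 1.0],
    "woman":  [0.1, -1.0],
    "prince": [0.8, 0.9],
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(vectors[w], target))

# "man is to king as woman is to ?"
best = analogy("man", "king", "woman", toy_vectors)
print(best)  # queen
```

Excluding the query words from the candidates matters: without that step, the nearest vector to the target is often one of the inputs themselves.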
Word2Vec
from gensim.models import Word2Vec
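# Toy corpus: each sentence is a list of tokens
sentences = [["cat", "sat", "on", "mat"], ["dog", "ran", "fast"]]
# min_count=1 keeps every word; the default (5) would discard this whole vocabulary
model = Word2Vec(sentences, vector_size=100, window=5, min_count=1)
vector = model.wv["cat"]  # NumPy array of shape (100,)
# A corpus this small cannot produce meaningful neighbors; use a real corpus in practice
similar = model.wv.most_similar("cat")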
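The `window` parameter controls how far from each word the model looks for context. As a rough illustration, the sketch below enumerates the (center, context) pairs that a skip-gram style model would see for the toy corpus. The `skipgram_pairs` helper is hypothetical and deliberately simplified: gensim trains CBOW by default (`sg=0`), and its real training loop also varies the effective window size and samples negatives, none of which is shown here.

```python
def skipgram_pairs(sentences, window):
    """Yield (center, context) pairs for every word within `window` positions."""
    for sentence in sentences:
        for i, center in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own context
                    yield center, sentence[j]

sentences = [["cat", "sat", "on", "mat"], ["dog", "ran", "fast"]]
pairs = list(skipgram_pairs(sentences, window=1))
print(len(pairs))
print(pairs[:4])
```

Pairs never cross sentence boundaries, which is why the corpus is passed as a list of tokenized sentences rather than one long token stream.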
Embedding Layer in Keras
from tensorflow.keras import layers
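# The sequence length is inferred from the input; the input_length argument is deprecated in Keras 3
embedding = layers.Embedding(input_dim=10000, output_dim=128)
# Input: (batch_size, 100) integer token IDs in [0, 10000)
# Output: (batch_size, 100, 128)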
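Conceptually, an embedding layer is a lookup table with one row of weights per token ID. The sketch below mimics that lookup in plain Python to show where the extra output dimension comes from. It is not Keras code: `table` and `embed` are hypothetical names, and the sizes are shrunk from 10000 and 128 to 10 and 4 for readability.

```python
import random

random.seed(0)
vocab_size, embedding_dim = 10, 4  # tiny stand-ins for 10000 and 128

# One row per token ID; random here, whereas a real layer learns these weights.
table = [
    [random.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(vocab_size)
]

def embed(batch):
    """Replace every token ID with its row from the table."""
    return [[table[token_id] for token_id in sequence] for sequence in batch]

batch = [[1, 5, 5], [0, 9, 2]]  # shape (batch_size=2, sequence_length=3)
output = embed(batch)
shape = (len(output), len(output[0]), len(output[0][0]))
print(shape)  # (2, 3, 4)
```

The same token ID always maps to the same vector, so the two occurrences of ID 5 in the first sequence produce identical rows.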
Practice Problems
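- Load pre-trained word vectors and look up the nearest neighbors of a few words
- Train a Word2Vec model on a small corpus of your own
- Explore word relationships using analogies such as king - man + woman
- Use an embedding layer as the first layer of a neural network
- Visualize word vectors in two dimensions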