Introduction
Named Entity Recognition (NER) finds spans of text that refer to named entities and classifies them into categories such as persons, organizations, and locations.
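NER annotations are commonly represented in two ways: character-offset spans (start, end, label) and per-token BIO tags. In BIO tagging, B- marks the first token of an entity, I- marks a continuation token, and O marks a token outside any entity. The sketch below converts spans into BIO tags. It assumes whitespace tokenization and non-overlapping spans, and `spans_to_bio` is a hypothetical helper name. Real tokenizers split punctuation and subwords differently.

```python
def spans_to_bio(text, spans):
    """Convert character-offset entity spans to BIO tags on whitespace tokens.

    spans: list of (start_char, end_char, label) tuples (assumed non-overlapping).
    Returns a list of (token, tag) pairs.
    """
    tagged = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # character offset of this token
        end = start + len(token)
        pos = end
        tag = "O"
        for s, e, label in spans:
            if start >= s and end <= e:
                # The first token of a span gets B-, later tokens get I-
                tag = ("B-" if start == s else "I-") + label
                break
        tagged.append((token, tag))
    return tagged

text = "Elon Musk is CEO of SpaceX"
spans = [(0, 9, "PER"), (20, 26, "ORG")]
print(spans_to_bio(text, spans))
# [('Elon', 'B-PER'), ('Musk', 'I-PER'), ('is', 'O'), ('CEO', 'O'), ('of', 'O'), ('SpaceX', 'B-ORG')]
```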
spaCy NER
import spacy
# Download the model first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
# Apple -> ORG
# U.K. -> GPE
# $1 billion -> MONEY
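spaCy ships its own visualizer, displaCy (`displacy.render(doc, style="ent")`), which renders entities as HTML. For a quick plain-text check, a small helper can wrap each span inline instead. This is a minimal sketch: `render_entities` is a hypothetical helper, and it assumes the (start_char, end_char, label) spans do not overlap.

```python
def render_entities(text, spans):
    """Wrap each (start_char, end_char, label) span as [entity LABEL] in the text."""
    out = []
    last = 0
    for start, end, label in sorted(spans):
        out.append(text[last:start])                 # text before the entity
        out.append(f"[{text[start:end]} {label}]")   # the entity and its label
        last = end
    out.append(text[last:])                          # text after the last entity
    return "".join(out)

text = "Apple is looking at buying U.K. startup for $1 billion."
spans = [(0, 5, "ORG"), (27, 31, "GPE"), (44, 54, "MONEY")]
print(render_entities(text, spans))
# [Apple ORG] is looking at buying [U.K. GPE] startup for [$1 billion MONEY].
```

With a spaCy `doc`, the spans can be collected as `[(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]`.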
Custom NER with spaCy
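# Custom NER in spaCy (v3 training API)
import random
import spacy
from spacy.training import Example
# Hypothetical toy training data: (text, {"entities": [(start_char, end_char, label)]})
training_data = [
    ("I bought an iPhone at the Web Summit", {"entities": [(12, 18, "PRODUCT"), (26, 36, "EVENT")]}),
]
nlp = spacy.blank('en')
ner = nlp.add_pipe('ner')
# Add labels
ner.add_label('PRODUCT')
ner.add_label('EVENT')
# Wrap raw annotations in Example objects, then initialize the model weights
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in training_data]
optimizer = nlp.initialize(lambda: examples)  # replaces the deprecated v2 nlp.begin_training()
# Train
for _ in range(10):
    random.shuffle(examples)
    losses = {}
    for example in examples:
        nlp.update([example], sgd=optimizer, losses=losses)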
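The `{"entities": [(start_char, end_char, label)]}` annotations used for training are easy to get wrong when offsets are counted by hand. A hypothetical helper like `make_annotation` can compute them from (phrase, label) pairs; this sketch assumes the phrases are listed in the order they appear in the text.

```python
def make_annotation(text, entities):
    """Build a spaCy-style training annotation from (phrase, label) pairs.

    Searches left to right, so repeated phrases map to successive occurrences.
    Returns {"entities": [(start_char, end_char, label), ...]}.
    """
    found = []
    pos = 0
    for phrase, label in entities:
        start = text.find(phrase, pos)
        if start == -1:
            raise ValueError(f"{phrase!r} not found in {text!r}")
        end = start + len(phrase)
        found.append((start, end, label))
        pos = end  # continue searching after this entity
    return {"entities": found}

annotation = make_annotation("I bought an iPhone at the Web Summit",
                             [("iPhone", "PRODUCT"), ("Web Summit", "EVENT")])
print(annotation)
# {'entities': [(12, 18, 'PRODUCT'), (26, 36, 'EVENT')]}
```

spaCy also expects each span to line up with token boundaries; `doc.char_span(start, end, label=label)` returns `None` for a misaligned span, which makes a quick sanity check before training.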
Transformers NER
from transformers import pipeline
# Use pretrained NER
ner_pipeline = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
result = ner_pipeline("Elon Musk is CEO of SpaceX")
print(result)
# [{'entity_group': 'PER', 'word': 'Elon Musk', ...},
# {'entity_group': 'ORG', 'word': 'SpaceX', ...}]
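With `aggregation_strategy="simple"`, the pipeline merges per-token predictions into whole entities (`entity_group`). The idea can be sketched on word-level BIO tags: a `B-` tag starts a new entity, a following `I-` tag of the same type extends it, and `O` ends it. This is a simplified illustration, not the library's implementation; the real aggregation also deals with subword pieces and scores, and `group_bio` is a hypothetical name.

```python
def group_bio(tokens):
    """Merge (word, BIO tag) pairs into (entity_text, entity_type) groups."""
    groups = []
    current_words, current_type = [], None
    for word, tag in tokens:
        if tag == "O":
            # An O tag closes any open entity
            if current_words:
                groups.append((" ".join(current_words), current_type))
            current_words, current_type = [], None
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or ent_type != current_type:
            # B- (or an I- of a different type) starts a new entity
            if current_words:
                groups.append((" ".join(current_words), current_type))
            current_words, current_type = [word], ent_type
        else:
            current_words.append(word)  # I- of the same type extends it
    if current_words:
        groups.append((" ".join(current_words), current_type))
    return groups

tagged = [("Elon", "B-PER"), ("Musk", "I-PER"), ("is", "O"),
          ("CEO", "O"), ("of", "O"), ("SpaceX", "B-ORG")]
print(group_bio(tagged))
# [('Elon Musk', 'PER'), ('SpaceX', 'ORG')]
```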
Fine-tune NER
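from transformers import AutoModelForTokenClassification, AutoTokenizer
# CoNLL-2003-style BIO tag set: "O" plus B-/I- tags for 4 entity types = 9 labels
label_list = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]
model = AutoModelForTokenClassification.from_pretrained(
    'bert-base-cased',  # cased checkpoint: capitalization is a strong entity cue
    num_labels=len(label_list),  # number of BIO tags, not number of entity types
    id2label=dict(enumerate(label_list)),
    label2id={label: i for i, label in enumerate(label_list)},
)
tokenizer = AutoTokenizer.from_pretrained('bert-base-cased')
# Training with custom dataset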
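When fine-tuning, labels are usually given per word, but the tokenizer may split a word into several subword tokens and also adds special tokens such as `[CLS]` and `[SEP]`. A common approach keeps the label on the first subword of each word and assigns -100 everywhere else, a value that PyTorch's cross-entropy loss ignores by default. The sketch below assumes a `word_ids` list of the kind returned by a fast tokenizer's `word_ids()` method; the input values and the `align_labels` name are hypothetical.

```python
def align_labels(word_ids, word_labels):
    """Map word-level label ids onto subword tokens.

    word_ids: per-token word index, None for special tokens ([CLS], [SEP]).
    word_labels: one label id per word.
    Only the first subword of each word keeps its label; special tokens and
    continuation subwords get -100 so the loss ignores them.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(-100)                   # special token
        elif word_id != previous:
            aligned.append(word_labels[word_id])   # first subword of a word
        else:
            aligned.append(-100)                   # continuation subword
        previous = word_id
    return aligned

# Hypothetical input: 6 words, the last one split into two subwords
word_ids = [None, 0, 1, 2, 3, 4, 5, 5, None]
word_labels = [1, 2, 0, 0, 0, 3]  # hypothetical ids: 0=O, 1=B-PER, 2=I-PER, 3=B-ORG
print(align_labels(word_ids, word_labels))
# [-100, 1, 2, 0, 0, 0, 3, -100, -100]
```

With a fast tokenizer, the list can come from something like `tokenizer(words, is_split_into_words=True).word_ids()`.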
Practice Problems
- Extract entities with spaCy
- Visualize entities
- Use transformers NER
- Fine-tune NER model
- Build custom entity types