Hugging Face Complete Guide — Open Source ML Platform, Transformers & Model Hub
Hugging Face is the GitHub of machine learning — a platform hosting over 500,000 models, 100,000 datasets, and thousands of demo apps. It's the central hub for the open-source AI ecosystem.
What is Hugging Face?
Hugging Face provides tools, libraries, and a platform for building, sharing, and deploying machine learning models. Think of it as:
- GitHub for ML models (Model Hub)
- PyPI for ML libraries (transformers, diffusers)
- Vercel for ML demos (Spaces)
Architecture Diagram
```text
Hugging Face Ecosystem:
+-------------------------------------------------+
| Model Hub (500K+ models)                        |
|   +- LLMs (Llama, Mistral, Qwen)                |
|   +- Image models (SD, Flux)                    |
|   +- Audio models (Whisper, Bark)               |
|   +- Multimodal (CLIP, LLaVA)                   |
|   +- Domain-specific (biomed, finance)          |
+-------------------------------------------------+
| Libraries                                       |
|   +- transformers (NLP, vision, audio)          |
|   +- diffusers (image generation)               |
|   +- datasets (data loading)                    |
|   +- tokenizers (fast tokenization)             |
|   +- accelerate (distributed training)          |
+-------------------------------------------------+
| Spaces (100K+ demo apps)                        |
|   +- Gradio apps                                |
|   +- Streamlit apps                             |
|   +- Static hosting                             |
+-------------------------------------------------+
| Inference API (hosted model inference)          |
| Enterprise Hub (private model hosting)          |
+-------------------------------------------------+
```
Transformers Library Deep Dive
Core Architecture
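```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Universal model loading: the Auto classes pick the architecture from the checkpoint's config.
# This Llama checkpoint is gated (accept its license on the Hub first) and needs
# roughly 140 GB of accelerator memory in bfloat16.
model_name = "meta-llama/Llama-3.3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # half-precision weights
    device_map="auto",           # spread layers over available devices (requires accelerate)
)

# Generate text (inputs go to the device holding the model's first layers)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```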
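How do the Auto classes know which architecture to build? Roughly, `from_pretrained` reads the checkpoint's `config.json`, looks at its `model_type` field, and uses that key to choose a concrete model class. The sketch below imitates that lookup with a small hand-written registry; the registry contents and the `resolve_causal_lm_class` name are hypothetical simplifications, not transformers' internal code, whose real mappings are far larger and loaded lazily.

```python
import json

# Hypothetical, heavily simplified registry: model_type -> causal-LM class name.
# The real transformers mappings cover many more architectures.
CAUSAL_LM_REGISTRY = {
    "llama": "LlamaForCausalLM",
    "mistral": "MistralForCausalLM",
    "qwen2": "Qwen2ForCausalLM",
    "gpt2": "GPT2LMHeadModel",
}


def resolve_causal_lm_class(config_json: str) -> str:
    """Pick a class name from the contents of a checkpoint's config.json."""
    config = json.loads(config_json)
    model_type = config.get("model_type")
    if model_type not in CAUSAL_LM_REGISTRY:
        raise ValueError(f"Unrecognized model_type: {model_type!r}")
    return CAUSAL_LM_REGISTRY[model_type]


if __name__ == "__main__":
    # Trimmed-down config.json in the shape Llama checkpoints use
    example = '{"model_type": "llama", "hidden_size": 8192, "num_hidden_layers": 80}'
    print(resolve_causal_lm_class(example))  # LlamaForCausalLM
```

Keying the dispatch on a single config field is what lets one loading call work across checkpoints from different model families.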
Pipeline API — One-Line ML
```python
from transformers import pipeline

# Text generation
generator = pipeline("text-generation", model="gpt2")
generator("Once upon a time")

# Sentiment analysis (no model given, so a default checkpoint is downloaded)
classifier = pipeline("sentiment-analysis")
classifier("I love this product!")  # e.g. [{'label': 'POSITIVE', 'score': 0.99...}]

# Named entity recognition, merging sub-word tokens into whole entities
ner = pipeline("ner", aggregation_strategy="simple")
ner("Hugging Face is based in New York")

# Question answering (extracts the answer span from the context)
qa = pipeline("question-answering")
qa(question="What is the capital?", context="France's capital is Paris")

# Translation
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
translator("Hello, how are you?")

# Image classification (local path or URL of an image)
img_classifier = pipeline("image-classification")
img_classifier("cat.jpg")

# Audio transcription (decoding audio files requires ffmpeg)
transcriber = pipeline("automatic-speech-recognition")
transcriber("audio.wav")
```
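Each pipeline bundles three steps: preprocessing (such as tokenization), a forward pass through the model, and postprocessing into readable output. For a single-label text classifier, the last step typically applies a softmax to the raw logits and maps the winning index to a name through the model config's `id2label`. The sketch below reproduces only that postprocessing step in plain Python; the logits and the `postprocess` name are made up for illustration.

```python
import math


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def postprocess(logits, id2label):
    """Turn one example's logits into the [{'label', 'score'}] shape pipelines return."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": id2label[best], "score": probs[best]}]


if __name__ == "__main__":
    # Hypothetical logits from a two-class sentiment model
    id2label = {0: "NEGATIVE", 1: "POSITIVE"}
    print(postprocess([-1.0, 2.0], id2label))  # label 'POSITIVE', score about 0.9526
```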
Model Hub
Finding Models
Model Hub Features:
1. Search by task
- Text generation
- Image classification
- Object detection
- Speech recognition
- And dozens of other tasks
2. Search by framework
- PyTorch
- TensorFlow
- JAX
- ONNX
- Core ML
3. Search by language
- English, Chinese, French, etc.
- Multilingual models
- Code-specific models
4. Filters and sorting
- Filter by model size and license type
- Sort by downloads, last updated, or trending
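The same filters can be applied programmatically with the `huggingface_hub` client library's `HfApi.list_models`. The sketch below keeps the query in a plain dict so it can be inspected without network access, then passes it to `list_models`; the helper names `build_query` and `search_models` are hypothetical, and the accepted keyword arguments can vary between `huggingface_hub` versions, so check the client documentation for the version you have installed.

```python
def build_query(task, library=None, language=None, limit=5):
    """Hypothetical helper: map Hub UI filters onto list_models keyword arguments."""
    query = {"filter": task, "sort": "downloads", "direction": -1, "limit": limit}
    if library is not None:
        query["library"] = library    # framework tag, e.g. "pytorch", "onnx"
    if language is not None:
        query["language"] = language  # language tag, e.g. "en", "fr"
    return query


def search_models(**kwargs):
    # Imported lazily so build_query works without the client installed
    from huggingface_hub import HfApi

    return [(m.id, m.downloads) for m in HfApi().list_models(**build_query(**kwargs))]


if __name__ == "__main__":
    # Text-generation models with PyTorch weights tagged English, sorted by downloads (needs network)
    for model_id, downloads in search_models(task="text-generation", library="pytorch", language="en"):
        print(model_id, downloads)
```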
Model Card
Model repositories usually include a Model Card covering:
- Model description
- Intended use
- Training data
- Evaluation results
- Limitations
- Bias analysis
- Code examples
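On the Hub, a Model Card is stored as the repository's `README.md`: an optional YAML metadata block between `---` lines (license, language, tags, and so on) followed by free-form Markdown. `huggingface_hub` can parse cards with `ModelCard.load(repo_id)`; the plain-Python sketch below only illustrates the file layout by splitting the metadata block from the body. The `split_model_card` name and the sample card are made up for illustration, and real code should parse the metadata with a YAML library or `ModelCard`.

```python
def split_model_card(readme: str):
    """Return (metadata_block, body) for a README with optional '---' front matter."""
    lines = readme.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", readme  # no metadata block
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            meta = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return meta, body
    raise ValueError("Unterminated metadata block")


# Hypothetical sample card
SAMPLE_CARD = """---
license: apache-2.0
language: en
tags:
- text-classification
---
# My Model

## Intended use
Sentiment classification of English product reviews.
"""

if __name__ == "__main__":
    meta, body = split_model_card(SAMPLE_CARD)
    print(meta)
    print(body.splitlines()[0])  # # My Model

    # With huggingface_hub installed and network access:
    # from huggingface_hub import ModelCard
    # card = ModelCard.load("gpt2")
    # print(card.data.to_dict(), card.text[:200])
```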
Diffusers Library
```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load model (SDXL checkpoints need the XL pipeline class)
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Generate image
image = pipe("A beautiful sunset over mountains").images[0]
image.save("sunset.png")

# With parameters
image = pipe(
    "A cat in space",
    negative_prompt="blurry, low quality",  # concepts to steer away from
    num_inference_steps=50,                 # more denoising steps: slower, often finer detail
    guidance_scale=7.5,                     # how strongly to follow the prompt
    width=1024,
    height=1024,
).images[0]
image.save("cat_in_space.png")
```
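The `guidance_scale` parameter controls classifier-free guidance. At each denoising step the model predicts noise twice, once without the prompt (or with the negative prompt, when one is given) and once with it, and the two predictions are combined as `uncond + scale * (cond - uncond)`. The sketch below applies that formula to plain Python lists standing in for noise tensors; in diffusers the same arithmetic runs on batched tensors inside the pipeline, and a scale of 1.0 reduces to the prompt-conditioned prediction alone.

```python
def classifier_free_guidance(noise_uncond, noise_cond, guidance_scale):
    """Combine unconditional and prompt-conditioned noise predictions element-wise."""
    if len(noise_uncond) != len(noise_cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    uncond = [0.0, 0.5, -0.2]  # toy values, not real model outputs
    cond = [0.2, 0.5, 0.0]
    print(classifier_free_guidance(uncond, cond, 1.0))  # equals cond
    print(classifier_free_guidance(uncond, cond, 7.5))  # pushed further toward the prompt
```

Larger scales follow the prompt more literally, which is why values well above 1 are common, while very large values tend to produce oversaturated images.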
Datasets Library
```python
from datasets import load_dataset

# Load any dataset from the Hub by name
dataset = load_dataset("imdb")  # Movie reviews labeled positive/negative
print(dataset)
# DatasetDict({
#     train: Dataset({features: ['text', 'label'], num_rows: 25000})
#     test: Dataset({features: ['text', 'label'], num_rows: 25000})
#     unsupervised: Dataset({features: ['text', 'label'], num_rows: 50000})
# })  (output abridged)

# Access data
print(dataset["train"][0])
# e.g. {'text': '...', 'label': 0}  (0 = negative, 1 = positive, -1 in "unsupervised")

# Filter, map, shuffle (each call applies to every split of the DatasetDict)
dataset = dataset.filter(lambda x: x["label"] == 1)
dataset = dataset.map(lambda x: {"text": x["text"].lower()})
dataset = dataset.shuffle(seed=42)
```
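With `batched=True`, `Dataset.map` hands your function a batch in columnar form, a dict mapping each column name to a list of values, instead of one row at a time; that is why a tokenizer call can receive `examples["text"]` as a whole list. The plain-Python sketch below mimics that calling convention on an in-memory dict of columns; `map_batched` is a hypothetical stand-in, not the library's implementation, which also handles Arrow storage, caching, and multiprocessing.

```python
def map_batched(columns, fn, batch_size=1000):
    """Apply fn to column-wise batches and concatenate the returned columns."""
    if not columns:
        return {}
    n_rows = len(next(iter(columns.values())))
    out = {}
    for start in range(0, n_rows, batch_size):
        # Slice every column to build one batch: {"text": [...], "label": [...]}
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        # Returned columns replace same-named columns; others are kept
        merged = {**batch, **fn(batch)}
        for name, values in merged.items():
            out.setdefault(name, []).extend(values)
    return out


def lowercase(batch):
    # Receives lists, so it handles the whole batch in one call
    return {"text": [t.lower() for t in batch["text"]]}


if __name__ == "__main__":
    data = {"text": ["Great FILM", "Dull Plot", "LOVED it"], "label": [1, 0, 1]}
    print(map_batched(data, lowercase, batch_size=2))
    # {'text': ['great film', 'dull plot', 'loved it'], 'label': [1, 0, 1]}
```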
Spaces — Deploy ML Demos
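```python
# app.py: a minimal Gradio demo
import gradio as gr
from transformers import pipeline

# Load the model once at startup instead of on every request
classifier = pipeline("image-classification")

def classify(image):
    result = classifier(image)
    return {r["label"]: r["score"] for r in result}

demo = gr.Interface(
    fn=classify,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=3),
)
demo.launch()
```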
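A Space is itself a git repository on the Hub. For a Gradio Space it typically holds `app.py`, a `requirements.txt` listing extra Python packages, and a `README.md` whose YAML header tells the Hub which SDK to run. The sketch below generates the README and requirements files and shows, commented out, an upload with `huggingface_hub`'s `create_repo` and `upload_folder`; the helper names, the Space title, and the `your-username/image-classifier` repo id are placeholders.

```python
from pathlib import Path


def space_readme(title, sdk="gradio", app_file="app.py"):
    """Build README.md with the YAML header the Hub reads to configure a Space."""
    return (
        "---\n"
        f"title: {title}\n"
        f"sdk: {sdk}\n"
        f"app_file: {app_file}\n"
        "---\n\n"
        f"# {title}\n"
    )


def write_space_files(folder, title, requirements):
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    (folder / "README.md").write_text(space_readme(title))
    (folder / "requirements.txt").write_text("\n".join(requirements) + "\n")
    # app.py (the Gradio demo) is assumed to be copied into the folder separately
    return sorted(p.name for p in folder.iterdir())


if __name__ == "__main__":
    print(write_space_files("my_space", "Image Classifier", ["transformers", "torch"]))

    # Deploy step (needs huggingface_hub, network access, and a logged-in token):
    # from huggingface_hub import create_repo, upload_folder
    # repo_id = "your-username/image-classifier"  # placeholder
    # create_repo(repo_id, repo_type="space", space_sdk="gradio", exist_ok=True)
    # upload_folder(folder_path="my_space", repo_id=repo_id, repo_type="space")
```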
Fine-Tuning with Hugging Face
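```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a padding token

# Tokenize dataset (padding is left to the data collator)
def tokenize(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512)

dataset = load_dataset("imdb")
dataset = dataset.map(tokenize, batched=True, remove_columns=["text", "label"])

# Causal LM objective: labels are a copy of input_ids with padding masked out
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

# Training arguments
args = TrainingArguments(
    output_dir="./results",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=2e-5,
    eval_strategy="epoch",  # named evaluation_strategy in older transformers releases
)

# Train
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    data_collator=collator,
)
trainer.train()
```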
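For causal language modeling, transformers' `DataCollatorForLanguageModeling(tokenizer, mlm=False)` sets `labels` to a copy of `input_ids` and replaces padding positions with `-100`, the index PyTorch's cross-entropy loss ignores. The model then shifts the labels by one position internally, so the prediction made at each position (which sees all earlier tokens) is scored against the next token. The plain-Python sketch below spells out both steps on a toy sequence; the token ids and pad id are arbitrary, and `make_clm_labels` / `next_token_pairs` are illustrative names, not library functions.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def make_clm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking padding so it adds nothing to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


def next_token_pairs(input_ids, labels):
    """List (visible context, target) pairs after the one-position shift."""
    return [
        (tuple(input_ids[: i + 1]), labels[i + 1])
        for i in range(len(input_ids) - 1)
        if labels[i + 1] != IGNORE_INDEX
    ]


if __name__ == "__main__":
    pad = 0  # arbitrary pad id for the toy example
    ids = [15, 42, 7, 99, pad, pad]
    labels = make_clm_labels(ids, pad)
    print(labels)                         # [15, 42, 7, 99, -100, -100]
    print(next_token_pairs(ids, labels))  # [((15,), 42), ((15, 42), 7), ((15, 42, 7), 99)]
```

Because GPT-2's pad token is set to its end-of-sequence token in the example above, genuine end-of-sequence tokens are masked the same way as padding.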
Pricing
| Service | Price (USD) |
|---|---|
| Model Hub | Free (public models) |
| Spaces (basic) | Free |
| Spaces (GPU) | $0.60/hr (T4) |
| Inference API | Free tier + pay-per-use |
| Enterprise Hub | $20/user/month |
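Hourly GPU rates add up quickly for a Space that stays running. The sketch below turns an hourly rate into a monthly estimate, using the T4 figure from the table as input; the 30-day month and the usage patterns are simplifying assumptions, and actual charges depend on current prices and on how long the hardware really runs.

```python
def monthly_cost(hourly_rate, hours_per_day=24.0, days=30):
    """Estimate a month's cost for hardware priced per hour."""
    return hourly_rate * hours_per_day * days


if __name__ == "__main__":
    always_on = monthly_cost(0.60)                                # about $432
    workday_only = monthly_cost(0.60, hours_per_day=8, days=22)  # about $105.60
    print(f"Always on: ${always_on:.2f}, 8h x 22 days: ${workday_only:.2f}")
```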
Key Takeaways
- Hugging Face is the central hub for open-source ML
- transformers covers a wide range of text, vision, audio, and multimodal tasks
- 500K+ models available on Model Hub
- Pipeline API enables one-line ML inference
- Diffusers library for image generation
- Spaces for deploying ML demos
- Datasets library for easy data loading
- Fine-tuning made easy with Trainer API
- Auto classes automatically detect model architecture
- Free tiers for the Model Hub, basic Spaces, and the Inference API; GPU Spaces and Enterprise Hub are paid
Further Reading
- Hugging Face Course: https://huggingface.co/learn
- Transformers Docs: https://huggingface.co/docs/transformers
- Model Hub: https://huggingface.co/models