Design AI Content Moderation System
Building multi-modal content moderation for billions of posts with high accuracy
Interview Question
"Design an AI content moderation system like Facebook or YouTube that can analyze text, images, and videos in real-time, detect policy violations with high accuracy, and handle millions of content submissions daily while maintaining low latency."
Difficulty: Hard | Frequently asked at Meta, Google/YouTube, TikTok, Twitter
1. Requirements Gathering
Functional Requirements
- Multi-modal Analysis: Analyze text, images, and videos
- Real-time Detection: Classify content in real-time as it's uploaded
- Policy Compliance: Enforce complex community guidelines
- Human Review: Escalate uncertain cases to human moderators
- Appeals Process: Allow users to appeal moderation decisions
- Explainability: Provide reasons for moderation decisions
- Continuous Learning: Adapt to new types of violations
Non-Functional Requirements
- Latency: end-to-end decision in < 500ms for text, < 2s for images, < 10s for videos
- Throughput: 500M+ content items/day (about 5,800 items/second on average), with headroom for peak traffic
- Recall: > 95% for high-severity violations
- Precision: > 99% (minimize false positives)
- Availability: 99.99% uptime
- Scale: 2B+ users, 500M+ daily posts
- Multilingual: Support 100+ languages
Scale Perspective: Facebook serves over 2B daily active users, who publish 500M+ posts per day. YouTube receives 500+ hours of video per minute. Content moderation must handle this volume while maintaining accuracy and low latency.
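A quick back-of-envelope calculation turns these daily figures into per-second rates. The sketch below uses the 500M posts/day and 500 hours of video per minute quoted above; the 3x peak-to-average factor is an assumption chosen only for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def average_rate_per_second(items_per_day: float) -> float:
    """Convert a daily volume into an average per-second rate."""
    return items_per_day / SECONDS_PER_DAY

daily_posts = 500_000_000      # from the scale figures above
peak_factor = 3                # assumption: peak traffic is about 3x the average

avg_posts_per_s = average_rate_per_second(daily_posts)
peak_posts_per_s = avg_posts_per_s * peak_factor

video_hours_per_minute = 500   # from the scale figures above
video_hours_per_day = video_hours_per_minute * 60 * 24

print(f"average posts/s: {avg_posts_per_s:,.0f}")
print(f"peak posts/s (assumed {peak_factor}x): {peak_posts_per_s:,.0f}")
print(f"video hours/day: {video_hours_per_day:,}")
```

The average works out to roughly 5,800 posts per second, which is the number to size queues and model-serving capacity against before adding peak headroom.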
2. High-Level Architecture Overview
┌─────────────────────────────────────────────────────────────────────────────┐
│                               CONTENT UPLOAD                                │
│         Mobile Apps │ Web Clients │ APIs │ Live Streaming │ Stories         │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                              CONTENT INGESTION                              │
│   Message Queue → Content Storage → Metadata Extraction → Pre-processing    │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
              ┌────────────────────────┼───────────────────────┐
              ▼                        ▼                       ▼
  ┌────────────────────────┐  ┌────────────────┐  ┌────────────────────────┐
  │ TEXT ANALYSIS          │  │ IMAGE ANALYSIS │  │ VIDEO ANALYSIS         │
  │ (NLP Models)           │  │ (CV Models)    │  │ (3D CNN + NLP)         │
  │ (< 100ms)              │  │ (< 500ms)      │  │ (< 5s)                 │
  └────────────────────────┘  └────────────────┘  └────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                               DECISION ENGINE                               │
│ Score Aggregation → Policy Rules → Confidence Threshold → Action Selection  │
└─────────────────────────────────────────────────────────────────────────────┘
                                       │
              ┌────────────────────────┼───────────────────────┐
              ▼                        ▼                       ▼
  ┌────────────────────────┐  ┌────────────────┐  ┌────────────────────────┐
  │ AUTO-APPROVE           │  │ HUMAN REVIEW   │  │ AUTO-REMOVE            │
  │ (High confidence safe) │  │ (Uncertain)    │  │ (High confidence bad)  │
  └────────────────────────┘  └────────────────┘  └────────────────────────┘
                                       │
                                       ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                FEEDBACK LOOP                                │
│   Moderator Decisions → User Appeals → Policy Updates → Model Retraining    │
└─────────────────────────────────────────────────────────────────────────────┘
Key Insight: Content moderation requires multi-modal analysis. Text, images, and videos each need specialized models. The decision engine must combine signals from all modalities and apply complex policy rules.
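The three-way routing in the diagram (auto-approve, human review, auto-remove) can be sketched as a pair of confidence thresholds applied to the per-modality scores. This is a minimal illustration, not a full policy engine: the `Thresholds` values, the `route` function, and the choice to aggregate by taking the worst score are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class Thresholds:
    auto_remove: float = 0.95   # hypothetical: at or above this, remove automatically
    auto_approve: float = 0.05  # hypothetical: at or below this, approve automatically

def route(scores: Dict[str, float], thresholds: Thresholds = Thresholds()) -> str:
    """Map per-modality violation scores to one of the three actions."""
    if not scores:
        raise ValueError("at least one modality score is required")
    worst = max(scores.values())  # conservative aggregation: trust the worst signal
    if worst >= thresholds.auto_remove:
        return "auto_remove"
    if worst <= thresholds.auto_approve:
        return "auto_approve"
    return "human_review"

print(route({"text": 0.02, "image": 0.01}))
print(route({"text": 0.10, "image": 0.60}))
print(route({"text": 0.99}))
```

Widening the gap between the two thresholds sends more content to human review; narrowing it increases automation at the cost of more automated mistakes.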
3. Data Pipeline Design
3.1 Content Data Model
from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime


@dataclass
class ContentItem:
    content_id: str
    user_id: str
    content_type: str  # text, image, video, live
    timestamp: datetime
    # Text content
    text: Optional[str]
    # Image content
    image_url: Optional[str]
    image_hashes: Optional[List[str]]
    # Video content
    video_url: Optional[str]
    video_duration: Optional[float]
    thumbnail_url: Optional[str]
    # Metadata
    language: str
    device_type: str
    location: Optional[Dict]
    # Moderation status
    moderation_status: str  # pending, approved, rejected, under_review
    confidence_score: Optional[float]
    violation_types: Optional[List[str]]


@dataclass
class ModerationDecision:
    content_id: str
    decision: str  # approve, reject, escalate
    confidence: float
    violation_types: List[str]
    explanation: str
    moderator_id: Optional[str]
    timestamp: datetime
    appeal_status: Optional[str]
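The `moderation_status` values above imply a small state machine, with appeals reopening a closed case. The transition table below is a hypothetical sketch; which moves a real system allows is a policy decision that the data model does not specify.

```python
from typing import Dict, Set

# Hypothetical transition table over the status values used in ContentItem.
ALLOWED_TRANSITIONS: Dict[str, Set[str]] = {
    "pending": {"approved", "rejected", "under_review"},
    "under_review": {"approved", "rejected"},
    "rejected": {"under_review"},   # a user appeal reopens the case
    "approved": {"under_review"},   # a later user report reopens the case
}

def transition(current: str, new: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new

status = transition("pending", "rejected")
status = transition(status, "under_review")   # appeal filed
status = transition(status, "approved")       # moderator overturns the decision
print(status)
```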
3.2 Multi-Modal Feature Extraction
class MultiModalFeatureExtractor:
    def __init__(self):
        self.text_analyzer = TextAnalyzer()
        self.image_analyzer = ImageAnalyzer()
        self.video_analyzer = VideoAnalyzer()

    async def extract_features(self, content: ContentItem) -> Dict:
        features = {}
        if content.text:
            text_features = await self.text_analyzer.extract(content.text)
            features['text'] = text_features
        if content.image_url:
            image_features = await self.image_analyzer.extract(content.image_url)
            features['image'] = image_features
        if content.video_url:
            video_features = await self.video_analyzer.extract(content.video_url)
            features['video'] = video_features
        # Cross-modal features
        if 'text' in features and 'image' in features:
            features['text_image_match'] = self.compute_cross_modal_match(
                features['text'], features['image']
            )
        return features


class TextAnalyzer:
    async def extract(self, text: str) -> Dict:
        return {
            'toxicity_score': await self.predict_toxicity(text),
            'hate_speech_score': await self.predict_hate_speech(text),
            'spam_score': await self.predict_spam(text),
            'language': await self.detect_language(text),
            'sentiment': await self.analyze_sentiment(text),
            'entities': await self.extract_entities(text),
            'topic': await self.classify_topic(text)
        }


class ImageAnalyzer:
    async def extract(self, image_url: str) -> Dict:
        image = await self.load_image(image_url)
        return {
            'nsfw_score': await self.predict_nsfw(image),
            'violence_score': await self.predict_violence(image),
            'gore_score': await self.predict_gore(image),
            'face_count': await self.detect_faces(image),
            'ocr_text': await self.extract_text_from_image(image),
            'objects': await self.detect_objects(image),
            'scene': await self.classify_scene(image)
        }
Critical Design Considerations:
- Multi-modal fusion: Text and image together can be harmful even when each is benign on its own
- Context: Same image can be benign or harmful depending on context
- Adversarial attacks: Users try to evade detection with subtle modifications
- Cultural sensitivity: Different regions have different standards
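One common defense against subtle image modifications is to compare perceptual hashes (such as those stored in `image_hashes`) by Hamming distance rather than exact equality. The sketch below assumes 64-bit hashes encoded as hex strings and computed upstream; the hash values and the 6-bit tolerance are made up for illustration.

```python
from typing import Iterable, Optional

def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two equal-length hex-encoded hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")

def match_known_bad(candidate: str, known_bad: Iterable[str], max_bits: int = 6) -> Optional[str]:
    """Return the first known-bad hash within max_bits of the candidate, if any."""
    for bad in known_bad:
        if hamming_distance(candidate, bad) <= max_bits:
            return bad
    return None

known_bad_hashes = ["ffd8a1c3e0f07f3c"]   # made-up example value
slightly_edited = "ffd8a1c3e0f07f3d"      # differs from the entry above in one bit
unrelated = "0123456789abcdef"

print(match_known_bad(slightly_edited, known_bad_hashes))
print(match_known_bad(unrelated, known_bad_hashes))
```

A linear scan like this only suits small lists; at the volumes discussed here, the lookup would need an index built for near-duplicate search.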
4. Model Selection and Training
4.1 Multi-Modal Architecture
class ContentModerationModel:
    def __init__(self):
        self.text_model = TextClassificationModel()
        self.image_model = ImageClassificationModel()
        self.video_model = VideoClassificationModel()
        self.fusion_model = MultiModalFusionModel()

    async def predict(self, content: ContentItem) -> Dict:
        predictions = {}
        if content.text:
            text_pred = await self.text_model.predict(content.text)
            predictions['text'] = text_pred
        if content.image_url:
            image_pred = await self.image_model.predict(content.image_url)
            predictions['image'] = image_pred
        if content.video_url:
            video_pred = await self.video_model.predict(content.video_url)
            predictions['video'] = video_pred
        # Nothing to classify: fail loudly instead of raising IndexError below
        if not predictions:
            raise ValueError("content has no text, image, or video to classify")
        # Multi-modal fusion
        if len(predictions) > 1:
            return await self.fusion_model.fuse(predictions)
        # Single modality: return its prediction directly
        return next(iter(predictions.values()))
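The fusion step is left abstract above. One simple option, not necessarily what `MultiModalFusionModel` does, is late fusion with a noisy-OR: treat each modality's score as an independent probability and compute the chance that at least one modality signals a violation. The function names and the input shape are assumptions for this sketch.

```python
from typing import Dict

def noisy_or(scores: Dict[str, float]) -> float:
    """Probability that at least one modality signals a violation,
    treating the per-modality scores as independent probabilities."""
    p_all_clean = 1.0
    for p in scores.values():
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must be probabilities in [0, 1]")
        p_all_clean *= (1.0 - p)
    return 1.0 - p_all_clean

def late_fusion(predictions: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Fuse {modality: {violation_type: score}} into {violation_type: score}."""
    violation_types = {v for scores in predictions.values() for v in scores}
    return {
        v: noisy_or({m: scores[v] for m, scores in predictions.items() if v in scores})
        for v in sorted(violation_types)
    }

fused = late_fusion({
    "text": {"hate_speech": 0.5, "spam": 0.1},
    "image": {"hate_speech": 0.5, "nsfw": 0.2},
})
print(fused)
```

Two moderately suspicious signals (0.5 each) fuse to 0.75 here, which captures the idea that modalities reinforce each other. A learned fusion model can go further and catch combinations where each modality looks benign alone.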
4.2 Training Strategy
class ModerationTrainer:
    def __init__(self, model):
        # Model being trained; must expose fit() and predict()
        self.model = model

    def train_with_hard_examples(self, train_data, hard_examples):
        """Train with focus on hard examples (both arguments are lists)"""
        # Standard training
        self.model.fit(train_data)
        # Hard negative mining: examples the current model gets wrong
        hard_negatives = self.mine_hard_negatives(train_data)
        # Re-train with emphasis on hard examples (oversampled 3x)
        combined_data = train_data + (hard_examples + hard_negatives) * 3
        self.model.fit(combined_data)

    def active_learning(self, unlabeled_data, budget=100):
        """Select most informative examples for labeling"""
        uncertainties = []
        for item in unlabeled_data:
            pred = self.model.predict(item)
            uncertainty = self.compute_uncertainty(pred)
            uncertainties.append((item, uncertainty))
        # Select most uncertain examples
        uncertainties.sort(key=lambda x: x[1], reverse=True)
        return [item for item, _ in uncertainties[:budget]]
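The `compute_uncertainty` helper is not defined above. A common choice for a binary violation score is the entropy of the predicted probability, which peaks at 0.5. The sketch below assumes each prediction is a single probability; `select_for_labeling` is a hypothetical stand-in for the selection step.

```python
import math
from typing import List, Sequence, Tuple

def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction; maximal at p = 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def select_for_labeling(scored: Sequence[Tuple[str, float]], budget: int) -> List[str]:
    """Pick the `budget` item ids whose violation probability is most uncertain."""
    ranked = sorted(scored, key=lambda pair: binary_entropy(pair[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:budget]]

scored_items = [("a", 0.02), ("b", 0.48), ("c", 0.97), ("d", 0.65)]
print(select_for_labeling(scored_items, budget=2))
```

Items "b" and "d" are selected because their scores sit closest to 0.5, where a human label is most informative.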
Training Best Practices:
- Use focal loss for class imbalance
- Implement hard negative mining
- Use active learning for efficient labeling
- Regular retraining with new violation types
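Focal loss, the first practice in the list, down-weights examples the model already classifies confidently so that the rare violating class contributes more to training. Below is a minimal single-example sketch in plain Python; the defaults `gamma=2.0` and `alpha=0.25` are commonly used values, and a real training loop would use a vectorized version in its ML framework.

```python
import math

def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example.

    p: predicted probability of the positive (violating) class
    y: true label, 1 for violating and 0 for benign
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)        # avoid log(0)
    p_t = p if y == 1 else 1.0 - p         # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(p=0.05, y=0)   # confidently correct benign example
hard = focal_loss(p=0.05, y=1)   # missed violation
print(f"easy: {easy:.6f}  hard: {hard:.6f}")
```

With `gamma=0` and `alpha=1` the expression reduces to ordinary cross-entropy for a positive example, which is a handy sanity check.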
5. Serving Architecture
5.1 Real-time Moderation Pipeline
Content Upload → Message Queue → Parallel Processing → Decision Engine → Action
                       │                  │                   │            │
                       ▼                  ▼                   ▼            ▼
                  ┌─────────┐        ┌─────────┐         ┌─────────┐  ┌──────────┐
                  │  Kafka  │        │  Text   │         │  Score  │  │ Approve  │
                  │  Queue  │        │  Image  │         │ Fusion  │  │ Reject   │
                  │         │        │  Video  │         │         │  │ Escalate │
                  └─────────┘        └─────────┘         └─────────┘  └──────────┘
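The "Parallel Processing" stage can be sketched with `asyncio.gather`, giving each modality its own timeout so a slow video model does not block the text and image results. The analyzer functions below are stand-ins that sleep instead of calling a model, and the timeout values and URLs are assumptions for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Analyzer = Callable[[str], Awaitable[Dict[str, float]]]

async def analyze_text(payload: str) -> Dict[str, float]:
    await asyncio.sleep(0.01)   # stand-in for a model call
    return {"toxicity": 0.1}

async def analyze_image(payload: str) -> Dict[str, float]:
    await asyncio.sleep(0.01)   # stand-in for a model call
    return {"nsfw": 0.2}

async def analyze_slow_video(payload: str) -> Dict[str, float]:
    await asyncio.sleep(1.0)    # deliberately exceeds its budget below
    return {"violence": 0.3}

async def run_parallel(jobs: Dict[str, tuple]) -> Dict[str, object]:
    """Run every (analyzer, payload, timeout_s) job concurrently.
    A job that times out yields None instead of failing the whole request."""
    async def guarded(analyzer: Analyzer, payload: str, timeout_s: float):
        try:
            return await asyncio.wait_for(analyzer(payload), timeout=timeout_s)
        except asyncio.TimeoutError:
            return None
    results = await asyncio.gather(*(guarded(*job) for job in jobs.values()))
    return dict(zip(jobs.keys(), results))

results = asyncio.run(run_parallel({
    "text": (analyze_text, "some caption", 0.1),
    "image": (analyze_image, "https://example.com/img.jpg", 0.5),
    "video": (analyze_slow_video, "https://example.com/vid.mp4", 0.05),
}))
print(results)
```

A `None` result is a signal for the decision engine: it might hold the content as pending or escalate it rather than approve on partial evidence.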
5.2 Human-in-the-Loop
import asyncio


class HumanInTheLoop:
    def __init__(self):
        # Minimum model confidence for an automated decision
        self.uncertainty_threshold = 0.7
        self.priority_queue = asyncio.PriorityQueue()

    async def should_escalate(self, prediction):
        # Check confidence
        if prediction['confidence'] < self.uncertainty_threshold:
            return True
        # Check violation type: high-risk categories always get human review
        high_risk_violations = ['terrorism', 'child_exploitation', 'imminent_harm']
        if any(v in prediction['violation_types'] for v in high_risk_violations):
            return True
        return False

    async def assign_to_moderator(self, content, prediction):
        # Determine priority (lower value is served first)
        priority = self.compute_priority(prediction)
        # Find available moderator with expertise
        moderator = await self.find_moderator(prediction['violation_types'])
        # Enqueue; content_id breaks ties between equal priorities
        await self.priority_queue.put((priority, content.content_id, moderator))
        return moderator
Human Review Tips:
- Prioritize high-severity violations
- Match moderator expertise to violation type
- Provide clear context and explanations
- Track moderator performance and wellbeing
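Prioritizing high-severity violations can be sketched with a min-heap keyed on severity and model confidence. The severity weights, the `compute_priority` formula, and the `ReviewQueue` class are hypothetical; a real queue would also account for waiting time and moderator expertise.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical severity weights; higher means more urgent.
SEVERITY: Dict[str, int] = {
    "child_exploitation": 100,
    "terrorism": 90,
    "imminent_harm": 90,
    "hate_speech": 50,
    "spam": 10,
}

def compute_priority(violation_types: List[str], confidence: float) -> float:
    """Lower value = reviewed sooner (heapq is a min-heap)."""
    severity = max((SEVERITY.get(v, 1) for v in violation_types), default=1)
    return -severity * confidence

class ReviewQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[float, int, str]] = []
        self._counter = itertools.count()   # tie-breaker keeps FIFO order

    def push(self, content_id: str, violation_types: List[str], confidence: float) -> None:
        priority = compute_priority(violation_types, confidence)
        heapq.heappush(self._heap, (priority, next(self._counter), content_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

queue = ReviewQueue()
queue.push("post-1", ["spam"], 0.6)
queue.push("post-2", ["terrorism"], 0.5)
queue.push("post-3", ["hate_speech"], 0.6)
print([queue.pop() for _ in range(3)])
```

The suspected terrorism post is reviewed first even though it arrived second and has the lowest confidence, because severity dominates the ordering.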
6. Monitoring and Observability
6.1 Key Metrics
class ModerationMetrics:
    QUALITY_METRICS = ['precision', 'recall', 'f1_score', 'false_positive_rate', 'false_negative_rate']
    OPERATIONAL_METRICS = ['latency_p50', 'latency_p99', 'throughput', 'queue_depth']
    BUSINESS_METRICS = ['user_satisfaction', 'appeal_rate', 'overturn_rate']
    SAFETY_METRICS = ['high_severity_recall', 'time_to_detection', 'recidivism_rate']
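The quality metrics listed above all derive from confusion-matrix counts. The sketch below shows the arithmetic; the counts in the example are made up and "positive" means "flagged as violating".

```python
from typing import Dict

def quality_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute the quality metrics from confusion-matrix counts."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "f1_score": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }

# Made-up counts for 100,000 items, of which 1,000 are violating.
metrics = quality_metrics(tp=950, fp=10, tn=98_990, fn=50)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because violations are rare, the false positive rate looks tiny even when precision is imperfect, which is why precision and recall are tracked separately from raw accuracy.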
7. Scale Considerations and Trade-offs
7.1 Horizontal Scaling
- Content Volume: Partition by content type and user region
- Model Serving: GPU instances for CV models, CPU for NLP
- Storage: Distributed object storage for content
- Queue: Kafka with partitioning by content type
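Partitioning by content type and region can be sketched as a routing key with a stable hash inside each group. The key format, `partition_for`, and the group size of 8 are assumptions for illustration. `hashlib` is used because Python's built-in `hash()` for strings is randomized per process and so would not route consistently across workers.

```python
import hashlib

def partition_for(content_type: str, region: str, content_id: str,
                  partitions_per_group: int = 8) -> str:
    """Build a routing key: one partition group per (content type, region),
    with items spread inside the group by a stable hash of the content id."""
    digest = hashlib.sha256(content_id.encode("utf-8")).digest()
    slot = int.from_bytes(digest[:8], "big") % partitions_per_group
    return f"{content_type}.{region}.{slot}"

key = partition_for("image", "eu", "post-123")
print(key)
```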
7.2 Cost vs Performance Trade-offs
| Dimension | Option A (Cost Optimized) | Option B (Performance Optimized) |
|---|---|---|
| Model Complexity | Lightweight models | Ensemble of heavy models |
| Human Review | Minimal human review | Extensive human review |
| Latency | Batch processing | Real-time processing |
| Accuracy | Accept some false negatives | Minimize all errors |
8. Advanced Topics
8.1 Adversarial Robustness
class AdversarialRobustness:
    def detect_evasion(self, content):
        # Check for image obfuscation (ContentItem exposes image_url)
        if content.image_url and self.detect_obfuscated_image(content.image_url):
            return True
        # Check for text obfuscation
        if content.text and self.detect_obfuscated_text(content.text):
            return True
        # Check for encoding tricks
        if self.detect_encoding_tricks(content):
            return True
        return False
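As one concrete instance of the text-obfuscation check, the sketch below normalizes Unicode look-alikes, strips zero-width characters, and undoes simple character substitutions before comparing with the original. The substitution table is a small hypothetical sample, not an exhaustive list.

```python
import unicodedata

# Hypothetical substitution table for common character swaps.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}

def normalize_text(text: str) -> str:
    """Undo common obfuscation so that downstream classifiers see canonical text."""
    text = unicodedata.normalize("NFKC", text)                  # fold full-width and styled characters
    text = "".join(ch for ch in text if ch not in ZERO_WIDTH)   # drop invisible separators
    return text.lower().translate(LEET_MAP)

def looks_obfuscated(text: str) -> bool:
    """Flag text whose normalized form differs from its plain lowercase form."""
    return normalize_text(text) != text.lower()

print(normalize_text("fr\u200bee m0n3y"))
print(looks_obfuscated("free money"))
```

This heuristic also flags ordinary digits and symbols, so its output is better used as one feature for the classifier than as a verdict. Running the toxicity and spam models on the normalized text is usually the more valuable step.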
8.2 Cross-Modal Understanding
class CrossModalAnalyzer:
    def analyze_combined(self, text, image):
        # Text-image consistency
        consistency = self.compute_consistency(text, image)
        # Combined harmfulness
        combined_harm = self.compute_combined_harm(text, image)
        # Context understanding
        context = self.understand_context(text, image)
        return {
            'consistency': consistency,
            'combined_harm': combined_harm,
            'context': context
        }
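One way to approximate `compute_consistency` is cosine similarity between text and image embeddings. This sketch assumes both embeddings come from a joint text-image encoder that maps the two modalities into a shared vector space; the 3-dimensional vectors are made up for illustration.

```python
import math
from typing import Sequence

def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

# Made-up embeddings; real ones would come from the assumed encoder.
caption_embedding = [0.9, 0.1, 0.0]
matching_image = [0.8, 0.2, 0.1]
unrelated_image = [0.0, 0.1, 0.9]

print(cosine_similarity(caption_embedding, matching_image))
print(cosine_similarity(caption_embedding, unrelated_image))
```

Low consistency is not a violation in itself, but a caption that does not describe its image is a useful cue to run the combined-harm model or escalate.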
9. Implementation Roadmap
Phase 1: Basic Moderation (Weeks 1-4)
- Text classification model
- Basic image classification
- Simple rule engine
Phase 2: Multi-Modal (Weeks 5-8)
- Video analysis
- Cross-modal fusion
- Human review system
Phase 3: Advanced Features (Weeks 9-12)
- Adversarial robustness
- Active learning
- Appeals process
Phase 4: Optimization (Weeks 13-16)
- Latency optimization
- Cost optimization
- Global deployment
10. Summary and Key Takeaways
Architecture Recap
- Multi-modal analysis: Text, image, and video models
- Fusion model: Combines signals from all modalities
- Human-in-the-loop: For uncertain cases
- Feedback loop: Continuous improvement
Key Metrics
- Recall: > 95% for high-severity violations
- Precision: > 99% to minimize false positives
- Latency: < 500ms for text, < 2s for images, < 10s for videos
Common Interview Mistakes
- Not discussing multi-modal analysis
- Ignoring adversarial robustness
- Forgetting about human review
- Not considering cultural sensitivity
Final Interview Tip: Emphasize the balance between automation and human review. Discuss how you'd handle adversarial attacks and cultural differences. Show understanding of both ML techniques and policy requirements.