Facebook, YouTube, TikTok, Twitter

Design AI Content Moderation System

Building multi-modal content moderation for billions of posts with high accuracy

Interview Question

"Design an AI content moderation system like Facebook or YouTube that can analyze text, images, and videos in real-time, detect policy violations with high accuracy, and handle millions of content submissions daily while maintaining low latency."

Difficulty: Hard | Frequently asked at Meta, Google/YouTube, TikTok, Twitter

1. Requirements Gathering

Functional Requirements

Multi-modal Analysis: Analyze text, images, and videos
Real-time Detection: Classify content in real-time as it's uploaded
Policy Compliance: Enforce complex community guidelines
Human Review: Escalate uncertain cases to human moderators
Appeals Process: Allow users to appeal moderation decisions
Explainability: Provide reasons for moderation decisions
Continuous Learning: Adapt to new types of violations

Non-Functional Requirements

Latency: < 500ms for text, < 2s for images, < 10s for videos
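Throughput: 10M+ content items/day for an initial deployment, 1000+ items/second at peak; at the 500M+ posts/day scale listed below, the average alone is about 5,800 items/second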
Recall: > 95% for high-severity violations
Precision: > 99% (minimize false positives)
Availability: 99.99% uptime
Scale: 2B+ users, 500M+ daily posts
Multilingual: Support 100+ languages

ℹ️

Scale Perspective: Facebook has over 2B daily active users and 500M+ posts per day. YouTube receives 500+ hours of video per minute. Content moderation must handle this volume while maintaining accuracy and low latency.
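
To make these volumes concrete, a quick back-of-envelope calculation converts the daily totals into per-second rates. This is only a sketch: the function names are made up for illustration, and the peak factor of 3 is an assumed value, not a figure reported by any platform.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_per_second(items_per_day: int) -> float:
    """Average arrival rate implied by a daily volume."""
    return items_per_day / SECONDS_PER_DAY


def peak_rate_per_second(items_per_day: int, peak_factor: float = 3.0) -> float:
    """Peak rate, assuming peak traffic is `peak_factor` times the average (assumption)."""
    return average_rate_per_second(items_per_day) * peak_factor


daily_posts = 500_000_000
avg_rate = average_rate_per_second(daily_posts)
peak_rate = peak_rate_per_second(daily_posts)
print(f"average: {avg_rate:,.0f} posts/s, assumed peak: {peak_rate:,.0f} posts/s")

# 500 hours of video per minute, expressed as seconds of video
# arriving per wall-clock second.
video_hours_per_minute = 500
video_seconds_per_second = video_hours_per_minute * 3600 / 60
print(f"video backlog grows by {video_seconds_per_second:,.0f} s of footage per second")
```

The video figure is the more demanding one: analysis capacity has to keep up with tens of thousands of seconds of footage arriving every second, which is why frame sampling and cheap pre-filters matter.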

2. High-Level Architecture Overview

Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│                         CONTENT UPLOAD                                      │
│  Mobile Apps │ Web Clients │ APIs │ Live Streaming │ Stories                │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         CONTENT INGESTION                                   │
│  Message Queue │ Content Storage │ Metadata Extraction │ Pre-processing    │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌────────────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│  TEXT ANALYSIS          │ │ IMAGE ANALYSIS│ │ VIDEO ANALYSIS       │
│  (NLP Models)           │ │ (CV Models)   │ │ (3D CNN + NLP)       │
│  (< 100ms)              │ │ (< 500ms)     │ │ (< 5s)               │
└────────────────────────┘ └───────────────┘ └──────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        DECISION ENGINE                                       │
│  Score Aggregation │ Policy Rules │ Confidence Threshold │ Action Selection │
└─────────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌────────────────────────┐ ┌───────────────┐ ┌──────────────────────┐
│  AUTO-APPROVE          │ │ HUMAN REVIEW  │ │ AUTO-REMOVE          │
│  (High confidence safe)│ │ (Uncertain)   │ │ (High confidence bad)│
└────────────────────────┘ └───────────────┘ └──────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                        FEEDBACK LOOP                                         │
│  Moderator Decisions │ User Appeals │ Policy Updates │ Model Retraining     │
└─────────────────────────────────────────────────────────────────────────────┘

💡

Key Insight: Content moderation requires multi-modal analysis. Text, images, and videos each need specialized models. The decision engine must combine signals from all modalities and apply complex policy rules.
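
The three-way split in the diagram (auto-approve, human review, auto-remove) can be sketched as a pair of thresholds on the aggregated score. The threshold values and the choice of `max` as the aggregation are assumptions for illustration; a production engine would tune them per policy and severity.

```python
from typing import Dict

# Hypothetical thresholds; real values would be tuned per violation type.
REMOVE_THRESHOLD = 0.95
APPROVE_THRESHOLD = 0.10


def decide(modality_scores: Dict[str, float]) -> str:
    """Map per-modality violation probabilities to an action.

    Uses the maximum score across modalities, so a single clearly
    violating modality is enough to block auto-approval.
    """
    if not modality_scores:
        # Nothing was scored, so do not auto-approve blindly.
        return "human_review"
    worst = max(modality_scores.values())
    if worst >= REMOVE_THRESHOLD:
        return "auto_remove"
    if worst <= APPROVE_THRESHOLD:
        return "auto_approve"
    return "human_review"


print(decide({"text": 0.02, "image": 0.05}))
print(decide({"text": 0.40, "image": 0.05}))
print(decide({"text": 0.01, "image": 0.99}))
```

Widening the gap between the two thresholds sends more content to moderators, which is the main lever for trading review cost against error rate.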

3. Data Pipeline Design

3.1 Content Data Model

from dataclasses import dataclass
from typing import List, Dict, Optional
from datetime import datetime

@dataclass
class ContentItem:
    content_id: str
    user_id: str
    content_type: str  # text, image, video, live
    timestamp: datetime
    
    # Text content
    text: Optional[str]
    
    # Image content
    image_url: Optional[str]
    image_hashes: Optional[List[str]]
    
    # Video content
    video_url: Optional[str]
    video_duration: Optional[float]
    thumbnail_url: Optional[str]
    
    # Metadata
    language: str
    device_type: str
    location: Optional[Dict]
    
    # Moderation status
    moderation_status: str  # pending, approved, rejected, under_review
    confidence_score: Optional[float]
    violation_types: Optional[List[str]]

@dataclass
class ModerationDecision:
    content_id: str
    decision: str  # approve, reject, escalate
    confidence: float
    violation_types: List[str]
    explanation: str
    moderator_id: Optional[str]
    timestamp: datetime
    appeal_status: Optional[str]

3.2 Multi-Modal Feature Extraction

class MultiModalFeatureExtractor:
    def __init__(self):
        self.text_analyzer = TextAnalyzer()
        self.image_analyzer = ImageAnalyzer()
        self.video_analyzer = VideoAnalyzer()
    
    async def extract_features(self, content: ContentItem) -> Dict:
        features = {}
        
        if content.text:
            text_features = await self.text_analyzer.extract(content.text)
            features['text'] = text_features
        
        if content.image_url:
            image_features = await self.image_analyzer.extract(content.image_url)
            features['image'] = image_features
        
        if content.video_url:
            video_features = await self.video_analyzer.extract(content.video_url)
            features['video'] = video_features
        
        # Cross-modal features
        if 'text' in features and 'image' in features:
            features['text_image_match'] = self.compute_cross_modal_match(
                features['text'], features['image']
            )
        
        return features

class TextAnalyzer:
    async def extract(self, text: str) -> Dict:
        return {
            'toxicity_score': await self.predict_toxicity(text),
            'hate_speech_score': await self.predict_hate_speech(text),
            'spam_score': await self.predict_spam(text),
            'language': await self.detect_language(text),
            'sentiment': await self.analyze_sentiment(text),
            'entities': await self.extract_entities(text),
            'topic': await self.classify_topic(text)
        }

class ImageAnalyzer:
    async def extract(self, image_url: str) -> Dict:
        image = await self.load_image(image_url)
        return {
            'nsfw_score': await self.predict_nsfw(image),
            'violence_score': await self.predict_violence(image),
            'gore_score': await self.predict_gore(image),
            'face_count': await self.detect_faces(image),
            'ocr_text': await self.extract_text_from_image(image),
            'objects': await self.detect_objects(image),
            'scene': await self.classify_scene(image)
        }

⚠️

Critical Design Considerations:

Multi-modal fusion: Text and an image can be harmful together even when each is benign on its own
Context: The same image can be benign or harmful depending on context
Adversarial attacks: Users try to evade detection with subtle modifications
Cultural sensitivity: Different regions have different standards
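
Subtle modifications defeat exact-match hashing, which is one reason the data model carries `image_hashes`. A common mitigation is to compare perceptual hashes by Hamming distance so that near-duplicates of known violating images still match. The sketch below assumes the hashes are hex strings of equal length computed elsewhere; the function names and the 8-bit tolerance are illustrative assumptions.

```python
from typing import Iterable


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Number of differing bits between two hex-encoded hashes of equal length."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def matches_known_violation(
    image_hash: str, known_bad: Iterable[str], max_distance: int = 8
) -> bool:
    """True if the hash is within `max_distance` bits of any known violating hash."""
    return any(hamming_distance(image_hash, bad) <= max_distance for bad in known_bad)


known_bad_hashes = ["fffffffffffffffe"]
print(matches_known_violation("ffffffffffffffff", known_bad_hashes))  # 1 bit differs
print(matches_known_violation("0000000000000000", known_bad_hashes))  # 63 bits differ
```

The linear scan is only for readability; at large scale the lookup would need an index built for approximate matching.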

4. Model Selection and Training

4.1 Multi-Modal Architecture

class ContentModerationModel:
    def __init__(self):
        self.text_model = TextClassificationModel()
        self.image_model = ImageClassificationModel()
        self.video_model = VideoClassificationModel()
        self.fusion_model = MultiModalFusionModel()
    
    async def predict(self, content: ContentItem) -> Dict:
        predictions = {}
        
        if content.text:
            text_pred = await self.text_model.predict(content.text)
            predictions['text'] = text_pred
        
        if content.image_url:
            image_pred = await self.image_model.predict(content.image_url)
            predictions['image'] = image_pred
        
        if content.video_url:
            video_pred = await self.video_model.predict(content.video_url)
            predictions['video'] = video_pred
        
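        # Nothing to score (no text, image, or video): fail with a clear error
        if not predictions:
            raise ValueError(f"content {content.content_id} has no analyzable modality")
        
        # Multi-modal fusion
        if len(predictions) > 1:
            return await self.fusion_model.fuse(predictions)
        
        # Single modality: return its prediction directly
        return next(iter(predictions.values()))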
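
The fusion step is left abstract above. One simple baseline, shown here as an assumption rather than as what `MultiModalFusionModel` does, is noisy-OR late fusion: treat each modality score as an independent violation probability and compute the chance that at least one fires. The optional per-modality weights are hypothetical.

```python
from typing import Dict, Optional


def noisy_or_fusion(
    scores: Dict[str, float], weights: Optional[Dict[str, float]] = None
) -> float:
    """Combine per-modality violation probabilities with a noisy-OR.

    fused = 1 - prod(1 - w_i * p_i), with each weight w_i in [0, 1].
    """
    weights = weights or {}
    prob_all_clean = 1.0
    for modality, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"score for {modality} must be in [0, 1]")
        weight = weights.get(modality, 1.0)
        prob_all_clean *= 1.0 - weight * score
    return 1.0 - prob_all_clean


fused = noisy_or_fusion({"text": 0.5, "image": 0.5})
print(f"fused score: {fused:.2f}")
```

With full weights the fused score is never lower than the strongest single modality, which suits safety-critical categories. A learned fusion model can additionally capture cases where text and image are only harmful in combination, which noisy-OR cannot.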

4.2 Training Strategy

class ModerationTrainer:
    def __init__(self, model):
        # Single model being trained; must expose fit() and predict()
        self.model = model
    
    def train_with_hard_examples(self, train_data, hard_examples):
        """Train with focus on hard examples"""
        # Standard training
        self.model.fit(train_data)
        
        # Hard negative mining: use the supplied examples, else mine them
        hard_negatives = list(hard_examples or self.mine_hard_negatives(train_data))
        
        # Re-train with emphasis on hard examples (repeating the list oversamples 3x)
        combined_data = list(train_data) + hard_negatives * 3
        self.model.fit(combined_data)
    
    def active_learning(self, unlabeled_data, budget=100):
        """Select most informative examples for labeling"""
        uncertainties = []
        for item in unlabeled_data:
            pred = self.model.predict(item)
            uncertainty = self.compute_uncertainty(pred)
            uncertainties.append((item, uncertainty))
        
        # Select most uncertain examples
        uncertainties.sort(key=lambda x: x[1], reverse=True)
        return [item for item, _ in uncertainties[:budget]]
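
The `compute_uncertainty` helper is not defined above. One common choice, assumed here for illustration, is the Shannon entropy of the predicted class probabilities: it is highest when the model splits its probability evenly between classes. The function names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probabilities: Sequence[float]) -> float:
    """Shannon entropy in bits of a class-probability vector."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0.0)


def select_most_uncertain(
    predictions: Dict[str, Sequence[float]], budget: int
) -> List[str]:
    """Return the ids of the `budget` items with the highest entropy."""
    ranked = sorted(
        predictions, key=lambda item_id: prediction_entropy(predictions[item_id]), reverse=True
    )
    return ranked[:budget]


pool = {
    "post_a": [0.99, 0.01],  # confident
    "post_b": [0.50, 0.50],  # maximally uncertain
    "post_c": [0.80, 0.20],
}
to_label = select_most_uncertain(pool, budget=2)
print(to_label)
```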

ℹ️

Training Best Practices:

Use focal loss for class imbalance
Implement hard negative mining
Use active learning for efficient labeling
Regular retraining with new violation types
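
Focal loss addresses class imbalance by down-weighting examples the model already classifies well, so the rare violating examples contribute more to the gradient. A minimal scalar sketch of the binary form follows; a real training loop would use a vectorized version in the chosen ML framework. The defaults `gamma=2.0` and `alpha=0.25` follow the values commonly used with this loss.

```python
import math


def binary_focal_loss(
    p: float, y: int, gamma: float = 2.0, alpha: float = 0.25, eps: float = 1e-7
) -> float:
    """Focal loss for one example.

    p: predicted probability of the positive (violating) class
    y: true label, 1 for violating and 0 for benign
    FL = -alpha_t * (1 - p_t)^gamma * log(p_t)
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident and correct
hard = binary_focal_loss(0.1, 1)  # confident and wrong
print(f"easy example: {easy:.5f}, hard example: {hard:.5f}")
```

Setting `gamma=0` recovers alpha-weighted cross-entropy, which is a useful sanity check when wiring the loss into a trainer.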

5. Serving Architecture

5.1 Real-time Moderation Pipeline

Architecture Diagram

Content Upload → Message Queue → Parallel Processing → Decision Engine → Action
                    │                │                    │              │
                    ▼                ▼                    ▼              ▼
               ┌─────────┐    ┌─────────┐          ┌─────────┐    ┌─────────┐
               │ Kafka   │    │ Text    │          │ Score   │    │ Approve │
               │ Queue   │    │ Image   │          │ Fusion  │    │ Reject  │
               │         │    │ Video   │          │         │    │ Escalate│
               └─────────┘    └─────────┘          └─────────┘    └─────────┘
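
The parallel-processing stage can be sketched with `asyncio`: run each modality analyzer concurrently and give each a latency budget, so a slow video model does not hold up the text verdict. The analyzer coroutines and the `run_analyzers` helper are stand-ins invented for this example.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional


async def run_analyzers(
    analyzers: Dict[str, Callable[[], Awaitable[float]]], timeout_s: float
) -> Dict[str, Optional[float]]:
    """Run all analyzers concurrently; a timed-out analyzer yields None."""

    async def guarded(name: str, analyzer: Callable[[], Awaitable[float]]):
        try:
            return name, await asyncio.wait_for(analyzer(), timeout=timeout_s)
        except asyncio.TimeoutError:
            return name, None

    results = await asyncio.gather(*(guarded(n, a) for n, a in analyzers.items()))
    return dict(results)


async def fast_text() -> float:
    await asyncio.sleep(0.01)  # stand-in for a quick NLP model call
    return 0.2


async def slow_video() -> float:
    await asyncio.sleep(1.0)  # stand-in for a slow video model call
    return 0.9


scores = asyncio.run(
    run_analyzers({"text": fast_text, "video": slow_video}, timeout_s=0.1)
)
print(scores)
```

A `None` score should be treated as "unknown" rather than "safe"; for example, the item can stay pending or be escalated until the slow analyzer finishes.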

5.2 Human-in-the-Loop

import asyncio
import itertools

class HumanInTheLoop:
    def __init__(self):
        # Predictions below this confidence go to human review
        self.uncertainty_threshold = 0.7
        self.priority_queue = asyncio.PriorityQueue()
        # Tie-breaker so equal-priority entries never compare content objects
        self._sequence = itertools.count()
    
    async def should_escalate(self, prediction):
        # Check confidence
        if prediction['confidence'] < self.uncertainty_threshold:
            return True
        
        # Check violation type: high-risk categories always get human eyes
        high_risk_violations = ['terrorism', 'child_exploitation', 'imminent_harm']
        if any(v in prediction['violation_types'] for v in high_risk_violations):
            return True
        
        return False
    
    async def assign_to_moderator(self, content, prediction):
        # Determine priority (lower value is dequeued first)
        priority = self.compute_priority(prediction)
        
        # Find available moderator with expertise
        moderator = await self.find_moderator(prediction['violation_types'])
        
        # Assign
        await self.priority_queue.put((priority, next(self._sequence), content, moderator))
        
        return moderator

💡

Human Review Tips:

Prioritize high-severity violations
Match moderator expertise to violation type
Provide clear context and explanations
Track moderator performance and wellbeing
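
Severity-first prioritization can be sketched with a heap keyed on a severity rank. This is an in-memory, synchronous stand-in for a real review queue; the `ReviewQueue` class and the severity ranks are hypothetical and would come from policy in practice.

```python
import heapq
import itertools
from typing import Iterable, List, Tuple

# Hypothetical severity ranks: lower number means reviewed sooner.
SEVERITY_RANK = {
    "child_exploitation": 0,
    "terrorism": 0,
    "imminent_harm": 0,
    "hate_speech": 1,
    "nsfw": 2,
    "spam": 3,
}
DEFAULT_RANK = 2


class ReviewQueue:
    """Priority queue that serves the most severe items first, FIFO within a rank."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # preserves insertion order on ties

    def push(self, content_id: str, violation_types: Iterable[str]) -> None:
        rank = min(
            (SEVERITY_RANK.get(v, DEFAULT_RANK) for v in violation_types),
            default=DEFAULT_RANK,
        )
        heapq.heappush(self._heap, (rank, next(self._counter), content_id))

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


queue = ReviewQueue()
queue.push("post_a", ["spam"])
queue.push("post_b", ["terrorism"])
queue.push("post_c", ["spam"])
order = [queue.pop() for _ in range(3)]
print(order)
```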

6. Monitoring and Observability

6.1 Key Metrics

class ModerationMetrics:
    QUALITY_METRICS = ['precision', 'recall', 'f1_score', 'false_positive_rate', 'false_negative_rate']
    OPERATIONAL_METRICS = ['latency_p50', 'latency_p99', 'throughput', 'queue_depth']
    BUSINESS_METRICS = ['user_satisfaction', 'appeal_rate', 'overturn_rate']
    SAFETY_METRICS = ['high_severity_recall', 'time_to_detection', 'recidivism_rate']
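
The quality metrics all derive from the four confusion-matrix counts. The sketch below computes them for a made-up evaluation set; the counts are illustrative, not measured results.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute the quality metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Illustrative counts: 1,000 violating items and 9,000 benign items.
metrics = classification_metrics(tp=950, fp=5, fn=50, tn=8995)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because violations are rare, a tiny false positive rate can still mean many wrongly removed posts in absolute terms, so precision and the appeal overturn rate are worth tracking alongside it.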

7. Scale Considerations and Trade-offs

7.1 Horizontal Scaling

Content Volume: Partition by content type and user region
Model Serving: GPU instances for CV models, CPU for NLP
Storage: Distributed object storage for content
Queue: Kafka with partitioning by content type
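
Partitioning by content type and region amounts to mapping a composite key to a partition number. The sketch below shows only that mapping, not any Kafka client API; the function name and key format are assumptions. It uses `hashlib` because Python's built-in `hash()` for strings is randomized per process and would route the same key differently across workers.

```python
import hashlib


def partition_for(content_type: str, region: str, num_partitions: int) -> int:
    """Map a (content type, region) pair to a stable partition index."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    key = f"{content_type}:{region}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then wrap to the partition count.
    return int.from_bytes(digest[:8], "big") % num_partitions


partition = partition_for("video", "eu-west", num_partitions=12)
print(f"video/eu-west -> partition {partition}")
```

A coarse key like this keeps similar work together but can create hot partitions when one content type or region dominates; adding the content id to the key spreads load at the cost of locality.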

7.2 Cost vs Performance Trade-offs

Dimension        | Option A (Cost Optimized)   | Option B (Performance Optimized)
-----------------|-----------------------------|---------------------------------
Model Complexity | Lightweight models          | Ensemble of heavy models
Human Review     | Minimal human review        | Extensive human review
Latency          | Batch processing            | Real-time processing
Accuracy         | Accept some false negatives | Minimize all errors
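
A model cascade is one way to sit between the two columns: score everything with a lightweight model and call the heavy model only when the cheap score is inconclusive. The sketch below assumes both models are callables returning a violation probability; the `low` and `high` cut-offs are illustrative.

```python
from typing import Callable, Tuple


def cascade_score(
    item: str,
    cheap_model: Callable[[str], float],
    heavy_model: Callable[[str], float],
    low: float = 0.05,
    high: float = 0.95,
) -> Tuple[float, str]:
    """Return (score, which model produced it)."""
    score = cheap_model(item)
    if score <= low or score >= high:
        # The cheap model is confident either way; skip the expensive call.
        return score, "cheap"
    return heavy_model(item), "heavy"


# Toy stand-ins for real models.
def toy_cheap(text: str) -> float:
    return 0.01 if "hello" in text else 0.5


def toy_heavy(text: str) -> float:
    return 0.8


print(cascade_score("hello world", toy_cheap, toy_heavy))
print(cascade_score("ambiguous post", toy_cheap, toy_heavy))
```

The share of traffic that reaches the heavy model determines GPU cost, so the cut-offs are worth tuning against both accuracy and budget.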

8. Advanced Topics

8.1 Adversarial Robustness

class AdversarialRobustness:
    def detect_evasion(self, content):
        # Check for image obfuscation (ContentItem stores image_url, not image)
        if content.image_url and self.detect_obfuscated_image(content.image_url):
            return True
        
        # Check for text obfuscation
        if content.text and self.detect_obfuscated_text(content.text):
            return True
        
        # Check for encoding tricks
        if self.detect_encoding_tricks(content):
            return True
        
        return False
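
Text obfuscation checks such as `detect_obfuscated_text` usually start from normalization: undo common tricks, then compare or re-classify the cleaned text. The sketch below handles Unicode look-alikes, zero-width characters, simple leetspeak, and separators inserted between letters. The substitution table is a small illustrative sample, not a complete list.

```python
import re
import unicodedata

# Zero-width and invisible characters often inserted to split keywords.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Small illustrative leetspeak table (assumption, far from exhaustive).
LEET = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)


def normalize_text(text: str) -> str:
    """Reduce common obfuscations to plain lowercase text."""
    text = unicodedata.normalize("NFKC", text)  # fold full-width and compatibility forms
    text = text.translate(ZERO_WIDTH)           # drop invisible characters
    text = text.lower().translate(LEET)         # undo simple character swaps
    # Remove separators placed between letters, e.g. "s.p.a.m" -> "spam".
    return re.sub(r"(?<=[a-z])[.\-_*]+(?=[a-z])", "", text)


def looks_obfuscated(text: str) -> bool:
    """Flag text whose normalized form differs from its plain lowercase form."""
    return normalize_text(text) != text.lower()


print(normalize_text("FR3E M0NEY"))
print(normalize_text("S.P.A.M"))
```

Normalization is lossy: it also rewrites legitimate numbers and punctuation, so the cleaned text works best as an extra classifier input next to the original rather than as a replacement.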

8.2 Cross-Modal Understanding

class CrossModalAnalyzer:
    def analyze_combined(self, text, image):
        # Text-image consistency
        consistency = self.compute_consistency(text, image)
        
        # Combined harmfulness
        combined_harm = self.compute_combined_harm(text, image)
        
        # Context understanding
        context = self.understand_context(text, image)
        
        return {
            'consistency': consistency,
            'combined_harm': combined_harm,
            'context': context
        }
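
One plausible way to implement `compute_consistency` is cosine similarity between a text embedding and an image embedding. That approach is an assumption here: it requires both embeddings to come from a model that maps text and images into a shared vector space, which is outside this sketch. The vectors below are toy values.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for zero vectors; treat as unrelated
    return dot / (norm_a * norm_b)


text_embedding = [0.2, 0.7, 0.1]    # toy values
image_embedding = [0.25, 0.6, 0.2]  # toy values
consistency = cosine_similarity(text_embedding, image_embedding)
print(f"text-image consistency: {consistency:.3f}")
```

Low consistency is not a violation by itself, but a caption that does not match its image is a useful signal for misleading or bait-style content.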

9. Implementation Roadmap

Phase 1: Basic Moderation (Weeks 1-4)

Text classification model
Basic image classification
Simple rule engine

Phase 2: Multi-Modal (Weeks 5-8)

Video analysis
Cross-modal fusion
Human review system

Phase 3: Advanced Features (Weeks 9-12)

Adversarial robustness
Active learning
Appeals process

Phase 4: Optimization (Weeks 13-16)

Latency optimization
Cost optimization
Global deployment

10. Summary and Key Takeaways

Architecture Recap

Multi-modal analysis: Text, image, and video models
Fusion model: Combines signals from all modalities
Human-in-the-loop: For uncertain cases
Feedback loop: Continuous improvement

Key Metrics

Recall: > 95% for high-severity violations
Precision: > 99% to minimize false positives
Latency: < 500ms for text, < 2s for images, < 10s for videos

Common Interview Mistakes

Not discussing multi-modal analysis
Ignoring adversarial robustness
Forgetting about human review
Not considering cultural sensitivity

ℹ️

Final Interview Tip: Emphasize the balance between automation and human review. Discuss how you'd handle adversarial attacks and cultural differences. Show understanding of both ML techniques and policy requirements.
