ML System Design — Complete Guide
ML system design combines software engineering with ML to build reliable, scalable production systems.
ML System Architecture
Data Layer:
├─ Data collection (streams, batches)
├─ Feature store (shared features for training and serving)
├─ Data lake/warehouse
└─ Data quality monitoring
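As an illustration of the data quality monitoring item, the sketch below computes per-column null rates and out-of-range counts for one incoming batch. The function name `check_batch`, the 5% null-rate threshold, and the example columns `age` and `amount` are hypothetical choices, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    null_rate: dict      # column -> fraction of missing values
    out_of_range: dict   # column -> count of values outside the allowed range
    passed: bool


def check_batch(rows: list, ranges: dict, max_null_rate: float = 0.05) -> QualityReport:
    """Check each column listed in `ranges` for missing and out-of-range values."""
    n = len(rows)
    null_rate, out_of_range = {}, {}
    for col, (lo, hi) in ranges.items():
        values = [row.get(col) for row in rows]
        null_rate[col] = sum(v is None for v in values) / n if n else 0.0
        out_of_range[col] = sum(1 for v in values if v is not None and not lo <= v <= hi)
    # The batch passes only if every column is under the null threshold and fully in range.
    passed = (all(r <= max_null_rate for r in null_rate.values())
              and not any(out_of_range.values()))
    return QualityReport(null_rate, out_of_range, passed)


batch = [{"age": 34, "amount": 12.5}, {"age": None, "amount": 80.0}, {"age": 210, "amount": 5.0}]
report = check_batch(batch, {"age": (0, 120), "amount": (0, 10_000)})
print(report.passed)  # False: age is 1/3 null and has one value above 120
```

In a real pipeline a failing report would typically block the batch or raise an alert rather than just print.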
Training Layer:
├─ Experiment tracking
├─ Model training (distributed)
├─ Model evaluation
└─ Model registry
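To make the model registry item concrete, here is a minimal in-memory sketch, assuming a registry only needs to assign version numbers, store metrics, and track which version holds a deployment stage. The class names, stage labels, artifact URIs, and metric values are all hypothetical placeholders; production registries persist this state and expose their own APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_uri: str
    metrics: dict
    stage: str = "None"  # illustrative labels: None / Staging / Production / Archived


class ModelRegistry:
    """Hypothetical in-memory registry: every register() call creates a new version number."""

    def __init__(self) -> None:
        self._versions: dict = {}  # model name -> list of ModelVersion, index = version - 1

    def register(self, name: str, artifact_uri: str, metrics: dict) -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        mv = ModelVersion(name, len(versions) + 1, artifact_uri, dict(metrics))
        versions.append(mv)
        return mv

    def promote(self, name: str, version: int, stage: str = "Production") -> None:
        # Only one version per stage: archive whichever version currently holds it.
        for mv in self._versions[name]:
            if mv.stage == stage:
                mv.stage = "Archived"
        self._versions[name][version - 1].stage = stage

    def get(self, name: str, stage: str = "Production") -> Optional[ModelVersion]:
        for mv in self._versions.get(name, []):
            if mv.stage == stage:
                return mv
        return None


registry = ModelRegistry()
registry.register("fraud-clf", "s3://models/fraud/1", {"auc": 0.91})  # placeholder metrics
registry.register("fraud-clf", "s3://models/fraud/2", {"auc": 0.93})
registry.promote("fraud-clf", 2)
print(registry.get("fraud-clf").version)  # 2
```

Because versions are never overwritten, rolling back is just promoting an earlier version number.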
Serving Layer:
├─ Real-time inference (API)
├─ Batch prediction
├─ Edge deployment
└─ A/B testing
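For the A/B testing item, one common approach is deterministic hash-based assignment, so a user sees the same model variant on every request without storing assignments. The sketch below assumes a SHA-256 hash of the experiment name plus user ID and a 10% treatment share; the function name and experiment name are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically bucket a user; the same (experiment, user) pair always gets the same arm."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 32 bits of the hash mapped to [0, 1): roughly uniform across users.
    bucket = int(digest[:8], 16) / 2**32
    return "treatment" if bucket < treatment_share else "control"


arms = [assign_variant(f"user-{i}", "ranker-v2") for i in range(10_000)]
print(arms.count("treatment") / len(arms))  # close to 0.1
```

Including the experiment name in the hash reshuffles users independently per experiment, so the same users do not always land in treatment across tests.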
Monitoring Layer:
├─ Model performance
├─ Data drift
├─ Latency/throughput
└─ Alerting
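The guide does not name a specific data drift metric; one widely used option is the Population Stability Index, PSI = Σ (a_i − e_i) · ln(a_i / e_i), where e_i and a_i are the fractions of reference and current values in bin i. The sketch below assumes quantile bins taken from the reference sample and a small floor for empty bins. A commonly cited rule of thumb treats PSI above roughly 0.25 as significant drift, but thresholds should be tuned per feature.

```python
import math
from bisect import bisect_right


def psi(reference: list, current: list, n_bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` against a `reference` sample."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles, so each bin holds ~1/n_bins of the reference.
    edges = [ref_sorted[len(ref_sorted) * k // n_bins] for k in range(1, n_bins)]

    def proportions(sample: list) -> list:
        counts = [0] * n_bins
        for x in sample:
            counts[bisect_right(edges, x)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(sample), eps) for c in counts]

    expected, actual = proportions(reference), proportions(current)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


reference = [i / 1000 for i in range(1000)]   # stand-in for training-time feature values
shifted = [x + 0.5 for x in reference]        # stand-in for production values after drift
print(psi(reference, reference))        # 0.0
print(psi(reference, shifted) > 0.25)   # True
```

An alerting rule could then fire when any monitored feature's PSI crosses its threshold.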
Feature Store
Feature Store: Central repository for ML features
Benefits:
├─ Consistent features between training and serving (avoids training-serving skew)
├─ Feature reuse across models
├─ Low-latency feature serving
└─ Feature versioning
Tools:
├─ Feast (open source)
├─ Tecton (managed)
├─ Hopsworks (open source)
└─ Databricks Feature Store
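The consistency benefit comes from having one write path feed both reads: training fetches the value that was known at each example's timestamp (a point-in-time lookup, which prevents leaking future data into training), while serving fetches the latest value. The toy class below is a hypothetical sketch of that idea only; it does not mirror the API of Feast, Tecton, Hopsworks, or Databricks, and the feature name `txn_count_7d` is made up.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Optional


class InMemoryFeatureStore:
    """Toy feature store: one write path feeds both offline (training) and online (serving) reads."""

    def __init__(self) -> None:
        # (entity_id, feature) -> list of (event_time, value), kept sorted by time
        self._log = defaultdict(list)

    def write(self, entity_id: str, feature: str, event_time: float, value: float) -> None:
        rows = self._log[(entity_id, feature)]
        rows.append((event_time, value))
        rows.sort()  # keep timestamps ordered for binary search

    def get_online(self, entity_id: str, feature: str) -> Optional[float]:
        """Serving path: latest known value."""
        rows = self._log.get((entity_id, feature))
        return rows[-1][1] if rows else None

    def get_as_of(self, entity_id: str, feature: str, as_of: float) -> Optional[float]:
        """Training path: the value known at `as_of`, never a later one."""
        rows = self._log.get((entity_id, feature), [])
        times = [t for t, _ in rows]
        i = bisect_right(times, as_of)
        return rows[i - 1][1] if i else None


store = InMemoryFeatureStore()
store.write("user-42", "txn_count_7d", event_time=100.0, value=3)
store.write("user-42", "txn_count_7d", event_time=200.0, value=7)
print(store.get_online("user-42", "txn_count_7d"))               # 7
print(store.get_as_of("user-42", "txn_count_7d", as_of=150.0))   # 3
```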
Real-Time vs Batch
Real-time:
├─ Low latency (typically a sub-100 ms target per request)
├─ Request-response pattern
├─ Use for: Recommendations, fraud detection
├─ Tools: TensorFlow Serving, Triton, BentoML
└─ Infrastructure: Kubernetes, auto-scaling
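A latency target is usually checked against a tail percentile rather than the mean, since averages hide slow requests. The sketch below times a single request-response prediction and tests whether the 99th-percentile latency fits the 100 ms budget mentioned above; the choice of p99, the nearest-rank percentile method, and the stand-in model are assumptions for illustration.

```python
import math
import time
from typing import Any, Callable


def timed_predict(model_fn: Callable[[Any], Any], features: Any) -> tuple:
    """Run one request-response prediction and return (prediction, latency in ms)."""
    start = time.perf_counter()
    prediction = model_fn(features)
    return prediction, (time.perf_counter() - start) * 1000.0


def percentile(latencies_ms: list, p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def meets_slo(latencies_ms: list, budget_ms: float = 100.0, p: float = 99.0) -> bool:
    """True if the p-th percentile latency fits within the budget."""
    return percentile(latencies_ms, p) <= budget_ms


# Usage with a stand-in model (a simple sum) and synthetic latency samples.
pred, ms = timed_predict(lambda x: sum(x), [2, 5, 3])
samples = [20.0] * 98 + [150.0, 180.0]
print(meets_slo(samples))  # False: p99 is 150 ms
```

Note that the two slow requests fail the p99 check even though the mean latency here is only 23.1 ms.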
Batch:
├─ Process millions of records
├─ Scheduled (hourly, daily)
├─ Use for: Report generation, email campaigns
├─ Tools: Spark (processing), Airflow (scheduling), dbt (SQL transformations)
└─ Infrastructure: Data lake, warehouse
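The core of a batch prediction job is scoring records in fixed-size chunks so memory stays bounded no matter how many records a run processes. The single-process sketch below assumes a model that accepts a list of rows and returns one score per row; the function names and chunk size are hypothetical. Distributed engines such as Spark apply the same idea per partition across many workers.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator


def chunked(records: Iterable[Any], size: int) -> Iterator[list]:
    """Yield consecutive lists of at most `size` items without loading everything into memory."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def batch_score(records: Iterable[dict],
                predict_batch: Callable[[list], list],
                chunk_size: int = 10_000) -> Iterator[dict]:
    """Score records one chunk at a time so memory use is bounded by chunk_size."""
    for chunk in chunked(records, chunk_size):
        scores = predict_batch(chunk)  # one vectorized model call per chunk
        for record, score in zip(chunk, scores):
            yield {"id": record["id"], "score": score}


# Usage: a generator stands in for a table scan; the "model" doubles a feature.
records = ({"id": i, "x": float(i)} for i in range(25_000))
results = list(batch_score(records, lambda rows: [r["x"] * 2 for r in rows]))
print(len(results), results[-1])  # 25000 {'id': 24999, 'score': 49998.0}
```

A scheduler such as Airflow would then run a job like this hourly or daily and write the scores back to the warehouse.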
Key Takeaways
- ML systems span four layers: data, training, serving, and monitoring
- Feature stores ensure consistency between training and serving
- Real-time serving typically targets sub-100 ms latency
- Batch prediction handles scheduled, offline processing of large datasets
- Model registries version and track models
- Monitoring detects data drift and performance degradation
- A/B testing validates model updates
- Scalable serving commonly relies on container orchestration such as Kubernetes with auto-scaling