Model Deployment — Complete Guide
Deploying ML models to production typically involves a serving API, a containerized runtime, monitoring, and a plan for scaling.
Deployment Options
REST API:
├─ FastAPI / Flask
├─ JSON input/output
├─ Easy to integrate
└─ Good for most use cases
Docker Container:
├─ Reproducible environment
├─ Deploy anywhere
├─ Scale with Kubernetes
└─ Production standard
Serverless:
├─ AWS Lambda / GCP Functions
├─ Auto-scaling
├─ Pay per request
└─ Good for sporadic traffic
Edge:
├─ ONNX Runtime
├─ TensorFlow Lite
├─ Core ML (Apple)
└─ Low latency, offline
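For the serverless option listed above, a handler might look like the following minimal sketch. It assumes an API Gateway style proxy event, where the JSON request arrives as a string under `body`. `_StubModel` is a hypothetical stand-in for a real model so the example runs on its own, and the `features` field name mirrors the FastAPI example in this guide.

```python
import json


class _StubModel:
    """Hypothetical stand-in for a real model: predicts the sum of each row."""

    def predict(self, rows):
        return [sum(row) for row in rows]


# Created once at import time so warm invocations reuse it.
# A real function might load an artifact here instead (assumption),
# e.g. joblib.load("model.pkl").
MODEL = _StubModel()

JSON_HEADERS = {"Content-Type": "application/json"}


def lambda_handler(event, context):
    """Entry point with the (event, context) signature AWS Lambda calls."""
    try:
        payload = json.loads(event.get("body") or "{}")
        raw = payload["features"]
        if not isinstance(raw, list):
            raise TypeError("features must be a list")
        features = [float(x) for x in raw]
    except (KeyError, TypeError, ValueError):
        # Malformed JSON, missing key, or non-numeric values all end up here.
        return {
            "statusCode": 400,
            "headers": JSON_HEADERS,
            "body": json.dumps({"error": "expected a JSON body with a 'features' list of numbers"}),
        }

    prediction = MODEL.predict([features])[0]
    return {
        "statusCode": 200,
        "headers": JSON_HEADERS,
        "body": json.dumps({"prediction": prediction}),
    }
```

Keeping the model at module level rather than inside the handler is the main design choice here: it avoids reloading the model on every request, at the cost of a slower first (cold) invocation.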
FastAPI Deployment
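from fastapi import FastAPI
from pydantic import BaseModel
import joblib

app = FastAPI()
# Load the model once at startup, not on every request.
model = joblib.load("model.pkl")

class PredictionRequest(BaseModel):
    features: list[float]  # one row of input features

@app.post("/predict")
def predict(request: PredictionRequest):
    # scikit-learn style models expect a 2D input: one row per sample.
    prediction = model.predict([request.features])
    # Assumes predict() returns a NumPy array; tolist() converts NumPy
    # scalars to plain Python types so the response serializes to JSON.
    return {"prediction": prediction.tolist()[0]}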
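The endpoint above can be exercised with a small client. The sketch below uses only the standard library. The base URL `http://localhost:8000` and the helper names `build_predict_request` and `request_prediction` are assumptions for illustration, chosen to match the port used in the Docker section.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for the /predict endpoint."""
    body = json.dumps({"features": [float(x) for x in features]}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_prediction(features, base_url="http://localhost:8000", timeout=5.0):
    """Send the request and return the 'prediction' field of the JSON response."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["prediction"]
```

With the server running, `request_prediction([5.1, 3.5, 1.4, 0.2])` would return whatever the loaded model predicts for that row. Separating request construction from sending makes the payload easy to inspect in tests without a live server.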
Docker
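FROM python:3.11-slim
WORKDIR /app
# Copy requirements first so the dependency layer is cached between builds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code and model artifact (main.py, model.pkl).
COPY . .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]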
Key Takeaways
- FastAPI is a strong default for ML APIs: it validates requests with pydantic and generates API docs automatically
- Docker ensures reproducible deployments
- Kubernetes scales containerized services horizontally by adding replicas as request volume grows
- ONNX enables cross-platform deployment
- Monitor latency, errors, and data drift in production
- A/B test new models before full rollout
- Version your models for rollback capability
- Load test before production deployment
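The A/B testing and versioning points above can be combined in a small routing function. The following is a minimal sketch of one common approach, deterministic hash bucketing, and not the only way to split traffic. The names `assign_variant` and `MODEL_VERSIONS`, the salt value, and the artifact file names are all hypothetical.

```python
import hashlib

# Hypothetical mapping from variant to a versioned model artifact.
MODEL_VERSIONS = {
    "stable": "model-v1.pkl",
    "candidate": "model-v2.pkl",
}


def assign_variant(user_id: str, candidate_percent: int, salt: str = "model-ab-v1") -> str:
    """Return 'candidate' for roughly candidate_percent% of users, else 'stable'.

    The same user_id always maps to the same variant for a given salt,
    so a user does not flip between models from one request to the next.
    """
    if not 0 <= candidate_percent <= 100:
        raise ValueError("candidate_percent must be between 0 and 100")
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return "candidate" if bucket < candidate_percent else "stable"


def model_path_for(user_id: str, candidate_percent: int) -> str:
    """Pick the versioned artifact to serve for this user."""
    return MODEL_VERSIONS[assign_variant(user_id, candidate_percent)]
```

Setting `candidate_percent` to 0 sends every user back to the stable version, which gives a simple rollback switch. Raising it gradually (for example 5, 25, 100) turns the same function into a staged rollout.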