MLOps: Experiment Tracking and MLflow
MLOps applies DevOps principles to machine learning: it automates model training, deployment, and monitoring so that production systems stay reliable.
MLOps Lifecycle
1. Why Experiment Tracking?
In production ML, you need to answer:
- What did we train? (data version, hyperparameters, code version)
- How did it perform? (metrics, artifacts, evaluation results)
- Why did we choose this model? (comparison, ablation studies)
- Can we reproduce it? (environment, dependencies, random seeds)
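The questions above can be answered later only if the relevant facts are captured when the run starts. The sketch below shows one possible shape for such a record, using only the standard library. `file_fingerprint` and `run_context` are hypothetical helper names (not MLflow functions), and hashing a single data file is an assumed stand-in for whatever data versioning the project actually uses.

```python
import hashlib
import platform
import sys


def file_fingerprint(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def run_context(data_path, params, seed):
    """Collect the facts needed to explain and reproduce a run."""
    return {
        "data_sha256": file_fingerprint(data_path),  # which data we trained on
        "params": dict(params),                      # which hyperparameters we used
        "random_seed": seed,                         # needed to reproduce the run
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }
```

The resulting values could be logged as parameters or tags on the run, or written to a JSON file and logged as an artifact.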
2. MLflow Architecture
3. MLflow Tracking API
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

# X_train, X_test, y_train, and y_test are assumed to be prepared beforehand
mlflow.set_experiment("churn-prediction")

with mlflow.start_run(run_name="rf-v2-depth-10"):
    # Log parameters
    params = {"n_estimators": 200, "max_depth": 10, "min_samples_split": 5}
    mlflow.log_params(params)

    # Train model
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log metrics
    y_pred = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, y_pred))
    mlflow.log_metric("f1_score", f1_score(y_test, y_pred, average="weighted"))

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts (both files must already exist on disk)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.log_artifact("feature_importance.png")
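`mlflow.log_params` takes a dictionary of parameter names and values, and each value is stored as a string, so a nested training config would end up as one opaque parameter per top-level key. One way around this is to flatten the config first. `flatten_params` below is a hypothetical helper introduced for illustration, and the dotted-key naming is an assumed convention, not something MLflow requires.

```python
def flatten_params(config, prefix=""):
    """Flatten a nested dict into dotted keys: {"model": {"depth": 3}} -> {"model.depth": 3}."""
    flat = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested sections, carrying the dotted prefix along
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


config = {
    "model": {"n_estimators": 200, "max_depth": 10},
    "data": {"test_size": 0.2},
    "seed": 42,
}
flat = flatten_params(config)
# flat can now be passed to mlflow.log_params(flat) inside an active run
```

With flat keys, each hyperparameter becomes its own column when runs are compared side by side.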
4. Model Registry
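import mlflow

# Register model (replace <run_id> with the ID of the run that logged it)
model_uri = "runs:/<run_id>/model"
result = mlflow.register_model(model_uri, "churn-predictor")

# Transition to staging
# Note: recent MLflow releases deprecate stage transitions in favor of model version aliases
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="churn-predictor",
    version=result.version,  # use the version just registered instead of hard-coding 1
    stage="Staging",
)

# Transition to production
client.transition_model_version_stage(
    name="churn-predictor",
    version=result.version,
    stage="Production",
)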
Model Stages
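MLflow's registry stages are `None`, `Staging`, `Production`, and `Archived`. Once a version sits in a stage, consumers can refer to it by stage instead of by version number through a `models:/<name>/<stage>` URI. The sketch below assumes the `churn-predictor` model from the registry example has a version in `Production`; `stage_uri` and `load_stage_model` are hypothetical helper names, and since recent MLflow releases deprecate stages in favor of aliases, it should be read as an illustration of the stage-based workflow only.

```python
VALID_STAGES = ("None", "Staging", "Production", "Archived")


def stage_uri(name, stage):
    """Build a registry URI that resolves to the latest version in a stage."""
    if stage not in VALID_STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {VALID_STAGES}")
    return f"models:/{name}/{stage}"


def load_stage_model(name, stage="Production"):
    """Load the model currently in the given stage as a generic pyfunc model."""
    import mlflow.pyfunc  # imported lazily so stage_uri works without MLflow installed

    return mlflow.pyfunc.load_model(stage_uri(name, stage))
```

A serving job could then call `load_stage_model("churn-predictor")` and use the returned object's `predict` method, without needing to know which version number is live.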
5. Reproducibility Best Practices
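import os
import random
import subprocess
import sys
import mlflow
import numpy as np
import sklearn

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    # Only child processes see this; hashing in the current interpreter is fixed at startup
    os.environ["PYTHONHASHSEED"] = str(seed)

def get_git_commit():  # assumes the code runs inside a git checkout
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

# Log everything (inside an active run)
mlflow.log_param("python_version", sys.version)
mlflow.log_param("sklearn_version", sklearn.__version__)
mlflow.log_param("random_seed", 42)
mlflow.set_tag("git_commit", get_git_commit())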
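Logging a couple of version numbers by hand covers only part of the environment. A fuller snapshot can be taken with the standard library's `importlib.metadata`. In the sketch below, `installed_packages` and `write_snapshot` are hypothetical helpers, and `requirements.lock.txt` is an assumed file name.

```python
from importlib import metadata


def installed_packages():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        meta = dist.metadata
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:  # skip broken installs with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


def write_snapshot(path="requirements.lock.txt"):
    """Write the snapshot to a file so it can be stored alongside the run."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(installed_packages()) + "\n")
    return path
```

Inside an active run, `mlflow.log_artifact(write_snapshot())` would attach the snapshot to the run, which makes it possible to rebuild a matching environment later.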
6. Comparison: Tracking Tools
| Feature | MLflow | Weights and Biases | Neptune.ai | Comet ML |
|---|---|---|---|---|
| Open Source | Yes | No | No | No |
| Self-hosted | Yes | Yes | Yes | Yes |
| Model Registry | Yes | Yes | Yes | Yes |
| Hyperparameter Search | No (integrate) | Yes | No | Yes |
| Visualization | Basic | Advanced | Advanced | Advanced |
| Cost | Free | Free tier + paid | Paid | Free tier + paid |
7. Automated Pipeline Integration
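# In CI/CD pipeline
# train, evaluate, test_data, and THRESHOLD are project-specific and defined elsewhere
def train_and_track(config):
    with mlflow.start_run() as run:
        mlflow.log_params(config)
        model = train(config)
        metrics = evaluate(model, test_data)
        mlflow.log_metrics(metrics)
        # Log and register only models that clear the quality bar
        if metrics["f1"] > THRESHOLD:
            mlflow.sklearn.log_model(model, "model")
            mlflow.register_model(f"runs:/{run.info.run_id}/model", "production-model")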
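In a CI/CD job, the threshold check usually also has to tell the pipeline whether to proceed. A minimal sketch of such a gate follows, kept free of MLflow calls so it can be unit-tested on its own. `passes_gate`, the metric names, and the threshold values are all hypothetical.

```python
def passes_gate(metrics, thresholds):
    """Return (ok, failures), where failures lists every metric below its minimum."""
    failures = []
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: missing")
        elif value < minimum:
            failures.append(f"{name}: {value:.3f} < {minimum:.3f}")
    return (not failures, failures)


ok, failures = passes_gate(
    {"f1": 0.81, "accuracy": 0.90},
    {"f1": 0.85, "accuracy": 0.88},
)
# ok is False here because f1 is below its minimum; a CI script could print
# the failures and exit with a non-zero status to skip model registration
```

Checking several metrics and reporting every failure at once tends to be more useful in CI logs than stopping at the first metric that falls short.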
Key Takeaways
- Track everything: parameters, metrics, code version, data version, environment
- Model Registry: formalize model lifecycle with stage transitions
- Reproducibility: seed everything, version data, pin dependencies
- Automate: integrate tracking into CI/CD pipelines