MLOps: Experiment Tracking and MLflow

MLOps applies DevOps principles to machine learning — automating model training, deployment, and monitoring for reliable production systems.

[Figure: Experiment tracking flow (Define → Train → Log → Compare → Ship) and MLflow components (Tracking, Projects, Models, Registry). Parameters + Metrics + Artifacts → Reproducibility.]

MLOps Lifecycle

1. Why Experiment Tracking?

In production ML, you need to answer:

What did we train? (data version, hyperparameters, code version)
How did it perform? (metrics, artifacts, evaluation results)
Why did we choose this model? (comparison, ablation studies)
Can we reproduce it? (environment, dependencies, random seeds)
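The questions above map onto a small record captured for every run. The following is a minimal sketch using only the standard library; `fingerprint`, `build_run_record`, the field names, the inline dataset bytes, and the metric value are illustrative assumptions, not part of MLflow. The "why" question is answered by comparing several such records rather than by any single one.

```python
import hashlib
import json
import platform


def fingerprint(data: bytes) -> str:
    """Return a short SHA-256 digest that can serve as a data version."""
    return hashlib.sha256(data).hexdigest()[:12]


def build_run_record(data: bytes, params: dict, metrics: dict, seed: int) -> dict:
    """Collect the answers to the tracking questions in one serializable dict."""
    return {
        "what": {"data_version": fingerprint(data), "params": params},
        "how": {"metrics": metrics},
        "reproduce": {"python": platform.python_version(), "seed": seed},
    }


record = build_run_record(
    data=b"customer_id,churned\n1,0\n2,1\n",  # stand-in for a real dataset file
    params={"n_estimators": 200, "max_depth": 10},
    metrics={"f1_score": 0.87},  # placeholder value for illustration only
    seed=42,
)
print(json.dumps(record, indent=2))
```

A tracking server such as MLflow stores the same kind of information, with a UI and an API on top.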

[Figure: model performance decay over time (data drift)]

2. MLflow Architecture

3. MLflow Tracking API

import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score

mlflow.set_experiment("churn-prediction")

with mlflow.start_run(run_name="rf-v2-depth-10"):
    # Log parameters
    params = {"n_estimators": 200, "max_depth": 10, "min_samples_split": 5}
    mlflow.log_params(params)

    # Train model (X_train, X_test, y_train, y_test are assumed to be prepared earlier)
    model = RandomForestClassifier(**params, random_state=42)
    model.fit(X_train, y_train)

    # Log metrics
    y_pred = model.predict(X_test)
    mlflow.log_metric("accuracy", accuracy_score(y_test, y_pred))
    mlflow.log_metric("f1_score", f1_score(y_test, y_pred, average="weighted"))

    # Log model
    mlflow.sklearn.log_model(model, "model")

    # Log artifacts (both files must already exist on disk, e.g. saved from a plotting step)
    mlflow.log_artifact("confusion_matrix.png")
    mlflow.log_artifact("feature_importance.png")
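Once several runs are logged, the "Compare" step amounts to ranking them by a metric. In MLflow this is usually done in the UI or with `mlflow.search_runs`; the sketch below shows only the underlying idea on plain dictionaries. The `best_run` helper, the run names other than `rf-v2-depth-10`, and all metric values are hypothetical.

```python
def best_run(runs: list, metric: str) -> dict:
    """Return the run with the highest value for `metric`, skipping runs that lack it."""
    scored = [run for run in runs if metric in run["metrics"]]
    if not scored:
        raise ValueError(f"no run logged metric {metric!r}")
    return max(scored, key=lambda run: run["metrics"][metric])


# Hypothetical runs with placeholder metric values.
runs = [
    {"name": "rf-v1-depth-5", "params": {"max_depth": 5}, "metrics": {"f1_score": 0.81}},
    {"name": "rf-v2-depth-10", "params": {"max_depth": 10}, "metrics": {"f1_score": 0.86}},
    {"name": "rf-v3-no-eval", "params": {"max_depth": 20}, "metrics": {}},
]

winner = best_run(runs, "f1_score")
print(winner["name"], winner["params"])
```

Runs that never logged the metric are skipped rather than treated as zero, so a failed evaluation cannot win or lose the comparison silently.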

4. Model Registry

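# Register model (replace <run_id> with the ID of the run that logged the model)
model_uri = "runs:/<run_id>/model"
result = mlflow.register_model(model_uri, "churn-predictor")

# Transition to staging
# Note: stage transitions are deprecated as of MLflow 2.9 in favor of model version aliases.
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="churn-predictor",
    version=result.version,  # use the version just registered instead of hard-coding 1
    stage="Staging"
)

# Transition to production
client.transition_model_version_stage(
    name="churn-predictor",
    version=result.version,
    stage="Production"
)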
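A team that wants promotions to follow a fixed order can check each transition before calling the registry. The sketch below assumes such a policy: the four stage names are MLflow's, but the `ALLOWED` table and the `check_transition` helper are hypothetical and describe a team convention, not registry behavior.

```python
# Assumed promotion policy: which target stages are reachable from each stage.
ALLOWED = {
    "None": {"Staging", "Archived"},
    "Staging": {"Production", "Archived"},
    "Production": {"Archived"},
    "Archived": {"Staging"},
}


def check_transition(current: str, target: str) -> None:
    """Raise ValueError if moving from `current` to `target` violates the policy."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"transition {current!r} -> {target!r} is not allowed")


check_transition("None", "Staging")        # passes silently
check_transition("Staging", "Production")  # passes silently
```

Under this policy a model cannot jump from "None" straight to "Production"; it has to pass through "Staging" first.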

Model Stages

5. Reproducibility Best Practices

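import os
import random
import subprocess
import sys

import mlflow
import numpy as np
import sklearn

def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    # Takes effect only for subprocesses; set it before launching Python to affect this one.
    os.environ["PYTHONHASHSEED"] = str(seed)

def get_git_commit():
    """Return the current commit hash (assumes the code runs inside a git checkout)."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

# Log everything
mlflow.log_param("python_version", sys.version)
mlflow.log_param("sklearn_version", sklearn.__version__)
mlflow.log_param("random_seed", 42)
mlflow.set_tag("git_commit", get_git_commit())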
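Two of these practices are easy to verify in isolation: seeding makes random draws repeatable, and installed package versions can be read programmatically instead of typed by hand. The following is a minimal standard-library sketch; `sample_with_seed` and `installed_versions` are hypothetical helper names, and the package list is only an example.

```python
import random
from importlib import metadata


def sample_with_seed(seed: int, n: int = 3) -> list:
    """Seed the stdlib RNG, then draw n floats."""
    random.seed(seed)
    return [random.random() for _ in range(n)]


def installed_versions(packages: list) -> dict:
    """Map each package name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


# The same seed yields the same sequence of draws.
assert sample_with_seed(42) == sample_with_seed(42)

# The resulting dict could be passed to a tracker, for example as run parameters.
print(installed_versions(["scikit-learn", "mlflow", "numpy"]))
```

Recording the versions that were actually installed catches the case where the environment differs from what a requirements file says.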

6. Comparison: Tracking Tools

Feature	MLflow	Weights and Biases	Neptune.ai	Comet ML
Open Source	Yes	No	No	No
Self-hosted	Yes	Yes	Yes	Yes
Model Registry	Yes	Yes	Yes	Yes
Hyperparameter Search	No (integrate)	Yes	No	Yes
Visualization	Basic	Advanced	Advanced	Advanced
Cost	Free	Free tier + paid	Paid	Free tier + paid

7. Automated Pipeline Integration

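# In CI/CD pipeline
# train, evaluate, test_data, and THRESHOLD are assumed to be defined by the project.
def train_and_track(config):
    with mlflow.start_run() as run:
        mlflow.log_params(config)
        model = train(config)
        metrics = evaluate(model, test_data)
        mlflow.log_metrics(metrics)

        # Register only models that clear the quality bar
        if metrics["f1"] > THRESHOLD:
            mlflow.sklearn.log_model(model, "model")
            mlflow.register_model(f"runs:/{run.info.run_id}/model", "production-model")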
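Pulling the registration decision out into a pure function makes the gate testable in CI without training a model or contacting a tracking server. The sketch below assumes a hypothetical `should_register` helper; the optional comparison against a baseline F1 score is an added assumption that goes beyond the threshold check shown above.

```python
from typing import Optional


def should_register(metrics: dict, threshold: float,
                    baseline_f1: Optional[float] = None) -> bool:
    """Decide whether a freshly trained model is good enough to register."""
    f1 = metrics.get("f1")
    if f1 is None:
        return False  # evaluation did not produce the gating metric
    if f1 <= threshold:
        return False  # below the absolute quality bar
    if baseline_f1 is not None and f1 <= baseline_f1:
        return False  # no better than the currently deployed model
    return True


# Placeholder values for illustration.
print(should_register({"f1": 0.90}, threshold=0.80))
print(should_register({"f1": 0.90}, threshold=0.80, baseline_f1=0.95))
```

Inside the pipeline, the `if metrics["f1"] > THRESHOLD` check could then be replaced by a call to this function.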

Key Takeaways

Track everything: parameters, metrics, code version, data version, environment
Model Registry: formalize model lifecycle with stage transitions
Reproducibility: seed everything, version data, pin dependencies
Automate: integrate tracking into CI/CD pipelines
