Feature Stores — Complete Guide
Feature stores are centralized repositories for ML features, ensuring consistency between training and serving.
Why Feature Stores?
Without a feature store:
├─ Feature engineering duplicated across teams
├─ Training-serving skew (the same feature computed differently in training and in serving)
├─ No feature reuse
└─ Difficult to discover existing features
With a feature store:
├─ Single source of truth for features
├─ Consistent features in training and serving
├─ Feature sharing and reuse
└─ Feature discovery and documentation
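One way to picture the "single source of truth" idea is a registry in which each feature is defined once and both the training job and the serving path call the same definition. The sketch below is illustrative only: `FEATURE_REGISTRY`, `feature`, `compute_features`, and the `order_value_per_item` feature are hypothetical names, not part of any real feature store API.

```python
from typing import Callable, Dict, List

# Hypothetical registry: feature name -> transformation over a raw record.
FEATURE_REGISTRY: Dict[str, Callable[[dict], float]] = {}


def feature(name: str):
    """Register a transformation under a discoverable name."""
    def decorator(fn: Callable[[dict], float]) -> Callable[[dict], float]:
        FEATURE_REGISTRY[name] = fn
        return fn
    return decorator


@feature("order_value_per_item")
def order_value_per_item(raw: dict) -> float:
    # The empty-order edge case is handled once, so training and serving agree on it.
    items = raw["item_count"]
    return raw["order_total"] / items if items else 0.0


def compute_features(raw: dict, names: List[str]) -> Dict[str, float]:
    """Called by both the training job and the serving path."""
    return {name: FEATURE_REGISTRY[name](raw) for name in names}


record = {"order_total": 90.0, "item_count": 3}
training_row = compute_features(record, ["order_value_per_item"])
serving_row = compute_features(record, ["order_value_per_item"])
```

Because both rows come from the same registered function, they cannot drift apart the way two independently written implementations can. The registry keys also double as a basic catalogue for feature discovery.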
Architecture
Offline Store (batch features):
├─ Data lake / warehouse
├─ Historical features for training
├─ High throughput, high latency (bulk scans rather than per-request lookups)
└─ Tools: BigQuery, Snowflake, or data-lake files processed with Spark
Online Store (real-time features):
├─ Low-latency key-value database
├─ Latest feature values for serving
├─ Low latency per lookup; built for point reads by entity key, not bulk scans
└─ Tools: Redis, Cassandra, DynamoDB
Feature Pipeline:
├─ Compute features from raw data
├─ Write to offline + online stores
├─ Schedule (batch) or stream (real-time)
└─ Tools: Feast, Tecton, Hopsworks
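The three components above can be sketched with in-memory stand-ins. This is a toy model under stated assumptions, not how Feast, Tecton, or Hopsworks are implemented: `TinyFeatureStore`, `run_pipeline`, and the purchase-event schema are hypothetical. A list plays the role of the warehouse and a dict plays the role of the key-value store.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


class TinyFeatureStore:
    def __init__(self) -> None:
        # Offline store stand-in: append-only history of (timestamp, features) per entity.
        self.offline: Dict[str, List[Tuple[int, Dict[str, Any]]]] = {}
        # Online store stand-in: only the latest features per entity.
        self.online: Dict[str, Dict[str, Any]] = {}
        self._online_ts: Dict[str, int] = {}

    def write(self, entity_id: str, ts: int, features: Dict[str, Any]) -> None:
        """Write one feature row to both stores."""
        self.offline.setdefault(entity_id, []).append((ts, dict(features)))
        # Late-arriving rows must not overwrite fresher online values.
        if entity_id not in self._online_ts or ts >= self._online_ts[entity_id]:
            self.online[entity_id] = dict(features)
            self._online_ts[entity_id] = ts

    def get_online(self, entity_id: str) -> Optional[Dict[str, Any]]:
        """Serving path: single point lookup by entity key."""
        return self.online.get(entity_id)

    def get_history(self, entity_id: str) -> List[Tuple[int, Dict[str, Any]]]:
        """Training path: full history, ordered by timestamp."""
        return sorted(self.offline.get(entity_id, []), key=lambda row: row[0])


def run_pipeline(store: TinyFeatureStore, raw_events: List[dict]) -> None:
    """Feature pipeline: compute running aggregates from raw events and write them."""
    counts: Dict[str, int] = defaultdict(int)
    totals: Dict[str, float] = defaultdict(float)
    for event in sorted(raw_events, key=lambda e: e["ts"]):
        user = event["user_id"]
        counts[user] += 1
        totals[user] += event["amount"]
        store.write(user, event["ts"],
                    {"purchase_count": counts[user], "total_spend": totals[user]})


events = [
    {"user_id": "u1", "ts": 1, "amount": 20.0},
    {"user_id": "u2", "ts": 3, "amount": 5.0},
    {"user_id": "u1", "ts": 5, "amount": 30.0},
]
store = TinyFeatureStore()
run_pipeline(store, events)
latest_u1 = store.get_online("u1")
history_u1 = store.get_history("u1")
```

Here `latest_u1` holds only the most recent aggregates for `u1`, while `history_u1` keeps every timestamped row, which is what a training job needs to rebuild features as of a past moment.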
Key Takeaways
- Feature stores ensure training-serving consistency
- Offline store for batch features (training)
- Online store for real-time features (serving)
- Feast is a widely used open-source feature store
- Feature stores enable feature reuse across teams
- Feature pipelines compute and update features
- Feature stores reduce time to production, since new models can start from features that are already computed and served
- Point-in-time correctness (joining each training example only to feature values available at its timestamp) prevents data leakage
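
Point-in-time correctness means that each training example sees only the feature value that was known at the moment its label was observed. A minimal sketch using only the standard library follows. `point_in_time_value` and the sample data are hypothetical, and it assumes the feature history is already sorted by timestamp. A value written exactly at the label timestamp is treated as available here, which is a design choice; a stricter pipeline might require it to be strictly earlier.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def point_in_time_value(history: List[Tuple[int, float]], label_ts: int) -> Optional[float]:
    """Return the latest feature value with timestamp <= label_ts, or None if none exists.

    `history` must be sorted by timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, label_ts)
    return history[idx - 1][1] if idx > 0 else None


# (timestamp, feature value) rows as they were written over time.
feature_history = [(10, 1.0), (20, 2.0), (30, 3.0)]
# (label timestamp, label) pairs for the training set.
labels = [(5, 0), (20, 1), (25, 0), (40, 1)]

training_rows = [(point_in_time_value(feature_history, ts), y) for ts, y in labels]
# training_rows == [(None, 0), (2.0, 1), (2.0, 0), (3.0, 1)]
```

The label at timestamp 25 is paired with the value written at 20, not the later value written at 30. Joining to the latest value instead would leak future information into training.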