Data Products: Building & Operating Data-as-a-Product
Difficulty: Staff Level | Companies: LinkedIn, Uber, Netflix, Airbnb, Stripe
1. What is a Data Product?
A data product is a dataset that is treated as a product, with clear ownership, SLAs, documentation, and consumers.
Architecture Diagram
Data Product Components:
├── Schema & Documentation
├── SLAs (freshness, completeness, availability)
├── Quality Checks (automated validation)
├── Access Controls (who can use it)
├── Lineage (where it comes from)
├── Versioning (breaking changes)
└── Monitoring (health dashboards)
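The SLA component is the easiest one to make concrete. The following is a minimal sketch of freshness and completeness checks. The function names `is_fresh` and `completeness_pct` and their parameters are illustrative assumptions, not part of any particular platform.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated: datetime, freshness_minutes: int, now: datetime) -> bool:
    """Return True if the last successful update falls within the freshness SLA."""
    return now - last_updated <= timedelta(minutes=freshness_minutes)


def completeness_pct(rows_received: int, rows_expected: int) -> float:
    """Percentage of expected rows that arrived (0.0 when nothing was expected)."""
    if rows_expected <= 0:
        return 0.0
    return 100.0 * rows_received / rows_expected


# Fixed timestamp so the example is deterministic.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(now - timedelta(minutes=10), 15, now))  # inside a 15-minute SLA
print(is_fresh(now - timedelta(minutes=20), 15, now))  # outside a 15-minute SLA
print(completeness_pct(999, 1000))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the check easy to test.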
2. Data Product Schema
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DataProduct:
    name: str
    domain: str
    description: str
    owner: str
    team: str
    schema: Dict[str, str]       # column name -> logical type
    sla: Dict[str, Any]          # e.g. freshness_minutes, availability, completeness
    quality_checks: List[str]
    access_pattern: str          # "batch", "streaming", "api"
    version: str = "1.0.0"
    status: str = "active"
    consumers: List[str] = field(default_factory=list)
    tags: List[str] = field(default_factory=list)

    def to_contract(self) -> Dict[str, Any]:
        """Return the consumer-facing contract for this product."""
        return {
            "product": self.name,
            "version": self.version,
            "schema": self.schema,
            "sla": self.sla,
            "quality": self.quality_checks,
            "owner": f"{self.team}/{self.owner}",
            "status": self.status,
        }
# Example
orders_product = DataProduct(
    name="orders_fact",
    domain="commerce",
    description="All customer orders with line items",
    owner="alice",
    team="checkout-team",
    schema={"order_id": "string", "user_id": "string", "amount": "decimal", "status": "string"},
    sla={"freshness_minutes": 15, "availability": 99.95, "completeness": 99.9},
    quality_checks=["not_null(order_id)", "unique(order_id)", "positive(amount)"],
    access_pattern="batch",
    tags=["finance", "core", "pii"],
)
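The `quality_checks` entries are stored as plain strings, so something has to interpret them. One possible approach, sketched here under assumptions, is to parse each string as `name(column)` and dispatch to a small registry of predicates. The `CHECKS` registry, the `run_checks` function, and the sample rows are hypothetical. A real platform would more likely delegate to a dedicated validation tool.

```python
import re
from decimal import Decimal
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical registry: check name -> predicate over (rows, column).
CHECKS: Dict[str, Callable[[List[Row], str], bool]] = {
    "not_null": lambda rows, col: all(r.get(col) is not None for r in rows),
    "unique": lambda rows, col: len({r.get(col) for r in rows}) == len(rows),
    # Nulls are skipped here; not_null is responsible for catching them.
    "positive": lambda rows, col: all(r[col] > 0 for r in rows if r.get(col) is not None),
}

# Matches strings of the form "check_name(column_name)".
CHECK_PATTERN = re.compile(r"^(\w+)\((\w+)\)$")


def run_checks(rows: List[Row], checks: List[str]) -> Dict[str, bool]:
    """Evaluate each check string against the rows and report pass/fail."""
    results: Dict[str, bool] = {}
    for check in checks:
        match = CHECK_PATTERN.match(check)
        if match is None or match.group(1) not in CHECKS:
            raise ValueError(f"Unsupported check: {check!r}")
        name, column = match.groups()
        results[check] = CHECKS[name](rows, column)
    return results


# Illustrative sample data; the second row violates positive(amount).
rows = [
    {"order_id": "o1", "amount": Decimal("19.99")},
    {"order_id": "o2", "amount": Decimal("-5.00")},
]
results = run_checks(rows, ["not_null(order_id)", "unique(order_id)", "positive(amount)"])
print(results)
```

Raising on an unknown check name, instead of silently skipping it, prevents a typo in the contract from disabling validation.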
3. Product Metrics
class ProductMetrics:
    def __init__(self, product: DataProduct):
        self.product = product

    def compute_health_score(self, metrics: dict) -> float:
        """Score from 0 to 100; each dimension earns its full weight or nothing."""
        weights = {"freshness": 25, "completeness": 25, "usage": 20, "quality": 30}
        score = 0.0
        if metrics.get("freshness_ok"):
            score += weights["freshness"]
        if metrics.get("completeness", 0) >= 0.99:
            score += weights["completeness"]
        if metrics.get("daily_queries", 0) > 0:
            score += weights["usage"]
        if metrics.get("quality_score", 0) >= 0.95:
            score += weights["quality"]
        return score

    def adoption_rate(self, total_users: int) -> float:
        """Fraction of potential users who are registered consumers."""
        return len(self.product.consumers) / total_users if total_users > 0 else 0.0
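A worked example may help make the scoring concrete. The sketch below restates the same weights and thresholds as a standalone function so that it runs on its own. The name `health_score` and the sample metrics are illustrative assumptions.

```python
# Weights and thresholds mirror compute_health_score above.
WEIGHTS = {"freshness": 25, "completeness": 25, "usage": 20, "quality": 30}


def health_score(metrics: dict) -> float:
    """Sum the weights of every dimension that meets its threshold."""
    passed = {
        "freshness": bool(metrics.get("freshness_ok")),
        "completeness": metrics.get("completeness", 0) >= 0.99,
        "usage": metrics.get("daily_queries", 0) > 0,
        "quality": metrics.get("quality_score", 0) >= 0.95,
    }
    return float(sum(WEIGHTS[name] for name, ok in passed.items() if ok))


# A fresh, complete, high-quality product that nobody queries.
unused_but_healthy = {
    "freshness_ok": True,
    "completeness": 0.995,
    "daily_queries": 0,
    "quality_score": 0.97,
}
print(health_score(unused_but_healthy))  # 25 + 25 + 0 + 30 = 80.0
```

Each dimension is all-or-nothing, so a completeness of 0.989 earns zero of the 25 completeness points. The metrics here are fractions (0.99), whereas the example SLA in section 2 expresses completeness as a percentage (99.9), so convert units before comparing the two.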
Key Insight: Treat data like a product. If nobody uses it, delete it. If many people use it, invest in it.
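This rule of thumb can be sketched as a simple lifecycle classifier. Everything here is an assumption for illustration: the `UsageStats` type, the 90-day lookback, the five-consumer investment threshold, and the product names other than `orders_fact`.

```python
from dataclasses import dataclass


@dataclass
class UsageStats:
    name: str
    consumers: int          # number of registered consuming teams
    queries_last_90d: int   # total queries in the lookback window


def lifecycle_action(stats: UsageStats, invest_min_consumers: int = 5) -> str:
    """Map usage to a recommendation; the thresholds are illustrative only."""
    if stats.consumers == 0 and stats.queries_last_90d == 0:
        return "retire"
    if stats.consumers >= invest_min_consumers:
        return "invest"
    return "maintain"


catalog = [
    UsageStats("orders_fact", consumers=12, queries_last_90d=48000),
    UsageStats("legacy_clickstream", consumers=0, queries_last_90d=0),
    UsageStats("returns_daily", consumers=2, queries_last_90d=310),
]
for stats in catalog:
    print(stats.name, "->", lifecycle_action(stats))
```

In practice a "retire" result would probably start a deprecation notice period, not an immediate deletion, since usage logs can miss consumers such as scheduled exports.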
Follow-Up Questions
- How would you measure the success of a data product?
- Design a data product marketplace for internal teams.
- How do you handle versioning and backward compatibility?
- Design a self-serve portal for creating new data products.
- How would you handle data product retirement?