Data Contracts & Schema Evolution
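This advanced lesson covers data contracts: formal, versioned agreements between a data producer and its consumers that specify a dataset's schema, quality rules, and SLA. It also shows how to validate incoming data against such a contract.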
Advanced Concepts
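At senior level, data engineers must balance technical excellence with business impact, team productivity, and system reliability. A data contract supports that balance by making a dataset's owner, schema, quality rules, and SLA explicit, so that a breaking change is caught at the producer rather than discovered in downstream pipelines.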
Implementation
# Data contract: a schema plus quality rules that a pandas DataFrame must satisfy
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DataContract:
    """Formal contract between data producer and consumer."""

    name: str
    version: str
    owner: str
    schema: dict  # column name -> expected pandas dtype string, e.g. "int64"
    quality_rules: List[dict]
    sla_hours: int

    def validate(self, data) -> Tuple[bool, List[str]]:
        """Validate a pandas DataFrame against the contract.

        Returns (is_valid, errors).
        """
        errors = []
        # Schema validation: each contracted column must exist with the expected dtype
        for field, dtype in self.schema.items():
            if field not in data.columns:
                errors.append(f"Missing required field: {field}")
            elif data[field].dtype != dtype:
                errors.append(
                    f"Wrong type for {field}: expected {dtype}, got {data[field].dtype}"
                )
        # Quality rules
        for rule in self.quality_rules:
            column = rule["column"]
            if column not in data.columns:
                # Report the problem instead of raising KeyError on a missing column
                errors.append(f"Cannot apply {rule['type']} rule: missing column {column}")
                continue
            if rule["type"] == "not_null":
                nulls = data[column].isnull().sum()
                if nulls > 0:
                    errors.append(f"Null values found in {column}: {nulls}")
            elif rule["type"] == "unique":
                dupes = data[column].duplicated().sum()
                if dupes > 0:
                    errors.append(f"Duplicate values in {column}: {dupes}")
        return len(errors) == 0, errors
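# Usage
contract = DataContract(
    name="orders",
    version="2.0.0",
    owner="data-platform-team",
    schema={"order_id": "int64", "amount": "float64"},
    quality_rules=[
        {"type": "not_null", "column": "order_id"},
        {"type": "unique", "column": "order_id"},
    ],
    sla_hours=4,
)

# Validate a sample DataFrame (illustrative data; requires pandas)
import pandas as pd

orders = pd.DataFrame({"order_id": [1, 2, 2], "amount": [9.99, 20.0, 5.5]})
is_valid, errors = contract.validate(orders)
print(is_valid, errors)  # is_valid is False: order_id 2 appears twice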
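The contract above carries a version string, but the lesson does not show how that version should change when the schema evolves. The following is a minimal sketch of one possible policy, not a prescribed method. It assumes semantic versioning, where removing or retyping a field is a breaking change (major bump), adding a field is additive (minor bump), and anything else is a patch. The helper names classify_schema_change and bump_version, and the added currency column, are hypothetical and used only for illustration.

```python
from typing import Dict, List, Tuple


def classify_schema_change(
    old: Dict[str, str], new: Dict[str, str]
) -> Tuple[str, List[str]]:
    """Compare two schema dicts and return (bump, reasons).

    Assumed policy (not from the lesson): removed or retyped fields are
    breaking ("major"), added fields are additive ("minor"), and an
    unchanged schema is a "patch".
    """
    reasons: List[str] = []
    breaking = False
    for field, dtype in old.items():
        if field not in new:
            breaking = True
            reasons.append(f"removed field: {field}")
        elif new[field] != dtype:
            breaking = True
            reasons.append(f"retyped field: {field} ({dtype} -> {new[field]})")
    added = [field for field in new if field not in old]
    reasons.extend(f"added field: {field}" for field in added)
    if breaking:
        return "major", reasons
    if added:
        return "minor", reasons
    return "patch", reasons


def bump_version(version: str, bump: str) -> str:
    """Apply a major/minor/patch bump to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


# Hypothetical evolution of the orders schema: one new column is added
old_schema = {"order_id": "int64", "amount": "float64"}
new_schema = {"order_id": "int64", "amount": "float64", "currency": "object"}

bump, reasons = classify_schema_change(old_schema, new_schema)
next_version = bump_version("2.0.0", bump)
print(bump, next_version, reasons)
```

Under this assumed policy, adding a column is a minor bump, so consumers pinned to major version 2 keep working. Dropping amount or changing its dtype would instead be classified as major. Real teams may adopt stricter or looser rules, for example treating a newly added required column as breaking.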
Career Pathways
Senior data engineers move into Staff Engineer, Data Platform Lead, or Head of Data Engineering roles. Building expertise in Data Contracts & Schema Evolution accelerates that journey.
Summary
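A data contract pairs a versioned schema with explicit quality rules and an SLA, and validating data against it catches problems before they reach consumers. Mastering advanced topics like Data Contracts & Schema Evolution separates senior data engineers from mid-level engineers.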