Data Mesh on Azure: Domain Ownership & Data Products
Decentralized data architecture with domain-oriented ownership and a self-serve data platform
Data Mesh Architecture
┌─────────────────────────────────────────────────────────────────────┐
│                   DATA MESH ARCHITECTURE ON AZURE                   │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   DOMAIN 1: SALES                         DOMAIN 2: MARKETING       │
│   ┌─────────────────────┐                 ┌─────────────────────┐   │
│   │ Data Product        │                 │ Data Product        │   │
│   │ ┌─────────────────┐ │                 │ ┌─────────────────┐ │   │
│   │ │ Raw: ADLS       │ │                 │ │ Raw: ADLS       │ │   │
│   │ │ Curated: Delta  │ │                 │ │ Curated: Delta  │ │   │
│   │ │ API: Synapse    │ │                 │ │ API: Synapse    │ │   │
│   │ └─────────────────┘ │                 │ └─────────────────┘ │   │
│   │ Owner: Sales Eng    │                 │ Owner: Mktg Eng     │   │
│   │ SLA: 99.9%          │                 │ SLA: 99.5%          │   │
│   └──────────┬──────────┘                 └──────────┬──────────┘   │
│              │                                       │              │
│              └───────────────────┬───────────────────┘              │
│                                  │                                  │
│                                  ▼                                  │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │                     SELF-SERVE PLATFORM                     │   │
│   │                                                             │   │
│   │  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │   │
│   │  │ Data Lake     │   │ Compute       │   │ Governance    │  │   │
│   │  │ (ADLS Gen2)   │   │ (Synapse/     │   │ (Purview)     │  │   │
│   │  │               │   │ Databricks)   │   │               │  │   │
│   │  └───────────────┘   └───────────────┘   └───────────────┘  │   │
│   │                                                             │   │
│   │  ┌───────────────┐   ┌───────────────┐   ┌───────────────┐  │   │
│   │  │ Identity      │   │ Monitoring    │   │ Discovery     │  │   │
│   │  │ (Azure AD)    │   │ (Monitor)     │   │ (Purview)     │  │   │
│   │  └───────────────┘   └───────────────┘   └───────────────┘  │   │
│   └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│   FEDERATED COMPUTATIONAL GOVERNANCE                                │
│   ┌─────────────────────────────────────────────────────────────┐   │
│   │ • Global data standards (naming, schema, quality)           │   │
│   │ • Cross-domain data contracts                               │   │
│   │ • Interoperability protocols                                │   │
│   │ • Data product certification                                │   │
│   └─────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────┘
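Governance is "computational" when the global standards in the bottom box are enforced by code that every domain's product passes through, rather than by a central review board. A minimal sketch in plain Python, assuming a hypothetical rule set (kebab-case product names, MAJOR.MINOR.PATCH versions, an owner email, and an SLA with availability and freshness) checked against a descriptor shaped like the data product template. The function names governance_violations and certify are illustrative, not part of Purview or any Azure API.

```python
import re

# Hypothetical global standards; a real platform team would publish its own.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(-[a-z0-9]+)*$")  # kebab-case
SEMVER_PATTERN = re.compile(r"^\d+\.\d+\.\d+$")               # MAJOR.MINOR.PATCH
REQUIRED_SLA_KEYS = {"availability", "freshness"}


def governance_violations(product: dict) -> list[str]:
    """Return every global-standard violation found in one data product descriptor."""
    violations = []
    name = product.get("name", "")
    if not NAME_PATTERN.match(name):
        violations.append(f"name '{name}' is not kebab-case")
    if not SEMVER_PATTERN.match(product.get("version", "")):
        violations.append("version is not MAJOR.MINOR.PATCH")
    if "@" not in product.get("owner", ""):
        violations.append("owner must be a contact email")
    missing = REQUIRED_SLA_KEYS - set(product.get("sla", {}))
    if missing:
        violations.append("sla is missing: " + ", ".join(sorted(missing)))
    return violations


def certify(product: dict) -> bool:
    """A product is certified only when it passes every global rule."""
    return not governance_violations(product)


sales = {
    "name": "sales-transactions",
    "version": "2.1.0",
    "owner": "sales-data-team@company.com",
    "sla": {"availability": "99.9%", "freshness": "1 hour"},
}
print(certify(sales))  # True
print(len(governance_violations({"name": "Sales_Txn", "version": "2.1"})))  # 4
```

Because the rules are ordinary code, the platform team can run them in every domain's CI pipeline and mark a product certified only when the violation list is empty.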
Data Product Template
{
"dataProduct": {
"name": "sales-transactions",
"domain": "sales",
"version": "2.1.0",
"owner": "sales-data-team@company.com",
"sla": {
"availability": "99.9%",
"freshness": "1 hour",
"latency": "< 100ms"
},
"dataAssets": [
{
"name": "fact_sales",
"type": "delta-table",
"location": "abfss://sales@stdatalake001.dfs.core.windows.net/curated/fact_sales",
"schema": "schema/fact_sales.json",
"quality": {
"completeness": 99.5,
"accuracy": 99.9
}
}
],
"interfaces": [
{
"type": "sql-endpoint",
"connection": "syn-workspace.sql.azuresynapse.net",
"database": "sales_analytics"
},
{
"type": "rest-api",
"endpoint": "https://api.company.com/sales/v2"
}
],
"discovery": {
"purviewCollection": "SalesData",
"tags": ["transactions", "revenue", "daily"]
}
}
}
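The SLA values in the template are free-form strings, so a monitoring job has to parse them before it can alert on them. A sketch, assuming only the two formats shown above (a percentage such as "99.9%" and "<number> <unit>" durations) and a 30-day month for the error budget; the helper names and the trimmed descriptor are illustrative.

```python
import json
from datetime import timedelta

# Trimmed copy of the template above, keeping only the fields this sketch reads.
DESCRIPTOR = """
{"dataProduct": {"name": "sales-transactions",
                 "sla": {"availability": "99.9%", "freshness": "1 hour"}}}
"""

# Hypothetical mapping from unit words to timedelta keyword arguments.
UNITS = {"minute": "minutes", "minutes": "minutes",
         "hour": "hours", "hours": "hours",
         "day": "days", "days": "days"}


def parse_availability(text: str) -> float:
    """'99.9%' -> 0.999"""
    return float(text.strip().rstrip("%")) / 100


def parse_freshness(text: str) -> timedelta:
    """'1 hour' -> timedelta(hours=1); '15 minutes' -> timedelta(minutes=15)"""
    amount, unit = text.split()
    return timedelta(**{UNITS[unit.lower()]: float(amount)})


def error_budget(availability: float, period: timedelta = timedelta(days=30)) -> timedelta:
    """Downtime allowed in one period before the availability SLA is breached."""
    return period * (1 - availability)


sla = json.loads(DESCRIPTOR)["dataProduct"]["sla"]
print(parse_freshness(sla["freshness"]))                      # 1:00:00
print(error_budget(parse_availability(sla["availability"])))  # 0:43:12
```

At 99.9% the sales product may be unavailable for about 43 minutes per 30-day month; the marketing product's 99.5% allows 3.6 hours.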
Domain-Specific Data Pipelines
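# Sales domain data product pipeline
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Delta Lake session settings for open-source Spark (delta-spark package).
# Databricks and Synapse Spark pools ship with Delta preconfigured, so these can be dropped there.
spark = SparkSession.builder \
    .appName("sales-data-product") \
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension") \
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog") \
    .getOrCreate()

# Ingest raw data from the domain's landing zone
raw_df = spark.read \
    .format("parquet") \
    .load("abfss://raw@stdatalake001.dfs.core.windows.net/sales/")

# Apply domain transformations: drop non-positive sales, derive revenue, aggregate
curated_df = raw_df \
    .filter(F.col("amount") > 0) \
    .withColumn("revenue", F.col("quantity") * F.col("unit_price")) \
    .groupBy("sale_date", "product_category", "region") \
    .agg(
        F.sum("revenue").alias("total_revenue"),
        F.count("*").alias("transaction_count")
    )

# Publish the curated table in Delta format as the data product's storage layer
curated_df.write \
    .format("delta") \
    .mode("overwrite") \
    .save("abfss://sales@stdatalake001.dfs.core.windows.net/curated/fact_sales")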
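The descriptor declares a completeness target of 99.5, but the pipeline writes without checking it. One option is a quality gate that runs before the write. A plain-Python sketch, assuming completeness means the percentage of rows in which every required column is non-null; the column list and function names are hypothetical.

```python
# Hypothetical columns the curated product promises to populate.
REQUIRED_COLUMNS = ("sale_date", "product_category", "region", "total_revenue")


def completeness_pct(rows: list[dict], required: tuple = REQUIRED_COLUMNS) -> float:
    """Percentage of rows in which every required column is present and non-null."""
    if not rows:
        return 0.0  # treat an empty output as incomplete rather than trivially perfect
    complete = sum(all(row.get(col) is not None for col in required) for row in rows)
    return 100.0 * complete / len(rows)


def passes_quality_gate(rows: list[dict], target_pct: float) -> bool:
    """True when completeness meets the descriptor's target (e.g. 99.5)."""
    return completeness_pct(rows) >= target_pct


sample = [
    {"sale_date": "2024-01-01", "product_category": "toys", "region": "EU", "total_revenue": 120.0},
    {"sale_date": "2024-01-01", "product_category": "books", "region": None, "total_revenue": 80.0},
]
print(completeness_pct(sample))           # 50.0
print(passes_quality_gate(sample, 99.5))  # False
```

In Spark, the same ratio could be computed on curated_df by counting rows where all required columns are non-null and dividing by the total row count, then failing the job before the write when it falls short.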
Pro Tip: Each data product should be independently deployable, discoverable via Purview, and have clear SLAs and data contracts.
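A clear SLA only helps if the platform checks it. A minimal sketch of a freshness monitor, assuming each product exposes the time of its last successful write (for a Delta table, the timestamp of the latest commit in the table history is one possible source). The product names and catalog dictionary are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(last_updated: datetime, freshness_sla: timedelta,
             now: Optional[datetime] = None) -> bool:
    """True if the product was last written no longer ago than its freshness SLA."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_updated <= freshness_sla


def stale_products(products: dict, now: datetime) -> list[str]:
    """Sorted names of products currently breaching their freshness SLA.

    products maps name -> (last successful write, freshness SLA).
    """
    return sorted(name for name, (updated, sla) in products.items()
                  if not is_fresh(updated, sla, now))


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
products = {  # hypothetical catalog entries
    "sales-transactions": (now - timedelta(minutes=40), timedelta(hours=1)),
    "marketing-campaigns": (now - timedelta(hours=5), timedelta(hours=4)),
}
print(stale_products(products, now))  # ['marketing-campaigns']
```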
Interview Questions
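Q1: How does Data Mesh differ from traditional data warehousing? A: In a traditional warehouse, a central data team owns all data. In Data Mesh, domain teams own their data and publish it as products. Benefits: scalability (no central-team bottleneck), domain expertise, and faster time-to-market. Challenges: cross-domain governance and data consistency.
Q2: What are the four principles of Data Mesh? A: 1) Domain ownership, 2) Data as a product, 3) Self-serve data platform, 4) Federated computational governance. Domain ownership moves data work to the teams that know the data; data as a product keeps shared data discoverable and trustworthy; the self-serve platform keeps each domain from rebuilding the same infrastructure; federated computational governance keeps autonomous domains interoperable and compliant through automated, globally agreed policies.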
Q3: How do you implement cross-domain data sharing in Data Mesh? A: Define data contracts between domains, use standardized APIs (Synapse SQL endpoints), implement data product discovery via Purview, and establish governance policies for data quality and access.
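At its simplest, a data contract is the set of columns and types a consumer depends on. A sketch of a compatibility check a producing domain might run in CI before publishing a new schema version, assuming schemas are plain column-to-type mappings (real contracts usually also carry semantics, SLAs, and quality rules). The contract and schema versions below are hypothetical.

```python
def contract_breaks(contract: dict[str, str], producer_schema: dict[str, str]) -> list[str]:
    """List the ways a producer schema violates a consumer's contract.

    Adding new columns is allowed; removing or retyping a contracted column is not.
    """
    breaks = []
    for column, expected_type in contract.items():
        actual_type = producer_schema.get(column)
        if actual_type is None:
            breaks.append(f"missing column: {column}")
        elif actual_type != expected_type:
            breaks.append(f"type change on {column}: {expected_type} -> {actual_type}")
    return breaks


# Hypothetical contract the marketing domain holds on the sales fact_sales product.
marketing_contract = {"sale_date": "date", "region": "string", "total_revenue": "decimal(18,2)"}

sales_v2_1 = {"sale_date": "date", "product_category": "string", "region": "string",
              "total_revenue": "decimal(18,2)", "transaction_count": "bigint"}
sales_v3_0 = {"sale_date": "date", "product_category": "string",
              "total_revenue": "double", "transaction_count": "bigint"}

print(contract_breaks(marketing_contract, sales_v2_1))  # []
print(contract_breaks(marketing_contract, sales_v3_0))
# ['missing column: region', 'type change on total_revenue: decimal(18,2) -> double']
```

A non-empty result marks a breaking change. Under semantic versioning, that calls for a new major version of the data product (for example 3.0.0 after 2.1.0) and advance notice to consuming domains.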