πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Synapse Serverless: External Tables & Lake Databases

Azure Data EngineeringSynapse Serverless⭐ Premium

Advertisement

Synapse Serverless: External Tables & Lake Databases

Query data in-place from ADLS Gen2 with Synapse Serverless SQL Pool

Serverless Architecture

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    SYNAPSE SERVERLESS ARCHITECTURE                    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚                    SYNAPSE WORKSPACE                          β”‚   β”‚
β”‚  β”‚                                                               β”‚   β”‚
β”‚  β”‚  SERVERLESS SQL POOL                                          β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”‚   β”‚
β”‚  β”‚  β”‚ β€’ Pay-per-TB-scanned pricing                         β”‚    β”‚   β”‚
β”‚  β”‚  β”‚ β€’ No infrastructure management                       β”‚    β”‚   β”‚
β”‚  β”‚  β”‚ β€’ Auto-scale from 0 to max                           β”‚    β”‚   β”‚
β”‚  β”‚  β”‚ β€’ Query data in-place (no loading)                   β”‚    β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β”‚   β”‚
β”‚  β”‚                                                               β”‚   β”‚
β”‚  β”‚  CONNECT TO:                                                  β”‚   β”‚
β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚   β”‚
β”‚  β”‚  β”‚ ADLS Gen2    β”‚  β”‚ Blob Storage β”‚  β”‚ Delta Lake   β”‚      β”‚   β”‚
β”‚  β”‚  β”‚ (Parquet)    β”‚  β”‚ (CSV/JSON)   β”‚  β”‚              β”‚      β”‚   β”‚
β”‚  β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                     β”‚
β”‚  COST MODEL:                                                        β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ $5.00 per TB scanned (first 5 TB/month free)                β”‚   β”‚
β”‚  β”‚                                                              β”‚   β”‚
β”‚  β”‚ Example: Query 100 GB Parquet file                          β”‚   β”‚
β”‚  β”‚ Cost: 0.1 GB Γ— $5.00/GB = $0.50 per query                  β”‚   β”‚
β”‚  β”‚                                                              β”‚   β”‚
β”‚  β”‚ Optimization: Use partitioning to scan only relevant data    β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

External Table Setup

-- Create database-scoped credential
CREATE DATABASE SCOPED CREDENTIAL ManagedIdentityCredential
WITH IDENTITY = 'Managed Identity';

-- Create external data source
CREATE EXTERNAL DATA SOURCE ADLSDataSource
WITH (
    LOCATION = 'https://stdatalake001.dfs.core.windows.net',
    CREDENTIAL = ManagedIdentityCredential
);

-- Create external file format for Parquet
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
    FORMAT_TYPE = PARQUET,
    DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);

-- Create external table
CREATE EXTERNAL TABLE raw.Sales
WITH (
    LOCATION = 'raw/sales/',
    DATA_SOURCE = ADLSDataSource,
    FILE_FORMAT = ParquetFormat
)
AS
SELECT * FROM OPENROWSET(
    BULK 'raw/sales/**/*.parquet',
    FORMAT = 'PARQUET'
) WITH (
    sale_id BIGINT,
    customer_id INT,
    sale_date DATE,
    quantity INT,
    unit_price DECIMAL(18,2),
    total_amount DECIMAL(18,2)
) AS Sales;

-- Create view for curated data
CREATE VIEW curated.vw_FactSales
AS
SELECT
    s.sale_id,
    s.customer_id,
    s.sale_date,
    s.quantity,
    s.total_amount,
    c.customer_name,
    c.segment
FROM raw.Sales s
LEFT JOIN raw.Customers c ON s.customer_id = c.customer_id;

Lake Database Pattern

-- Create lake database (schema)
CREATE SCHEMA analytics;
GO

-- Create statistics for better query plans
CREATE STATISTICS STAT_Sales_SaleDate
ON raw.Sales(sale_date);

-- Create view for time-based queries
CREATE VIEW analytics.vw_DailySales
AS
SELECT
    sale_date,
    COUNT(*) AS transactions,
    SUM(total_amount) AS revenue,
    AVG(total_amount) AS avg_order_value
FROM raw.Sales
GROUP BY sale_date;

ℹ️

Pro Tip: Use partitioning to reduce costs. If your data is partitioned by date, add date filters to queries to scan only relevant partitions instead of the entire dataset.

Interview Questions

Q1: When would you use Serverless vs Dedicated SQL Pool? A: Serverless for ad-hoc exploration, prototyping, and pay-per-use scenarios. Dedicated for production dashboards with SLA requirements and predictable performance.

Q2: How do external tables work in Serverless? A: External tables define schema over files in ADLS Gen2. Queries push down computation to the storage layer, scanning only required columns and partitions. No data is loaded into Synapse.

Q3: What are the limitations of Serverless SQL Pool? A: 1) No stored procedures, 2) Limited DML support, 3) No cross-database queries, 4) No result set caching, 5) Performance depends on file format and partitioning.

Advertisement