Synapse Serverless: External Tables & Lake Databases
Query data in-place from ADLS Gen2 with Synapse Serverless SQL Pool
Serverless Architecture
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SYNAPSE SERVERLESS ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β SYNAPSE WORKSPACE β β
β β β β
β β SERVERLESS SQL POOL β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β’ Pay-per-TB-scanned pricing β β β
β β β β’ No infrastructure management β β β
β β β β’ Auto-scale from 0 to max β β β
β β β β’ Query data in-place (no loading) β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β
β β CONNECT TO: β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β β β ADLS Gen2 β β Blob Storage β β Delta Lake β β β
β β β (Parquet) β β (CSV/JSON) β β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β COST MODEL: β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β $5.00 per TB scanned (first 5 TB/month free) β β
β β β β
β β Example: Query 100 GB Parquet file β β
β β Cost: 0.1 GB Γ $5.00/GB = $0.50 per query β β
β β β β
β β Optimization: Use partitioning to scan only relevant data β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
External Table Setup
-- Create database-scoped credential
CREATE DATABASE SCOPED CREDENTIAL ManagedIdentityCredential
WITH IDENTITY = 'Managed Identity';
-- Create external data source
CREATE EXTERNAL DATA SOURCE ADLSDataSource
WITH (
LOCATION = 'https://stdatalake001.dfs.core.windows.net',
CREDENTIAL = ManagedIdentityCredential
);
-- Create external file format for Parquet
CREATE EXTERNAL FILE FORMAT ParquetFormat
WITH (
FORMAT_TYPE = PARQUET,
DATA_COMPRESSION = 'org.apache.hadoop.io.compress.SnappyCodec'
);
-- Create external table
CREATE EXTERNAL TABLE raw.Sales
WITH (
LOCATION = 'raw/sales/',
DATA_SOURCE = ADLSDataSource,
FILE_FORMAT = ParquetFormat
)
AS
SELECT * FROM OPENROWSET(
BULK 'raw/sales/**/*.parquet',
FORMAT = 'PARQUET'
) WITH (
sale_id BIGINT,
customer_id INT,
sale_date DATE,
quantity INT,
unit_price DECIMAL(18,2),
total_amount DECIMAL(18,2)
) AS Sales;
-- Create view for curated data
CREATE VIEW curated.vw_FactSales
AS
SELECT
s.sale_id,
s.customer_id,
s.sale_date,
s.quantity,
s.total_amount,
c.customer_name,
c.segment
FROM raw.Sales s
LEFT JOIN raw.Customers c ON s.customer_id = c.customer_id;
Lake Database Pattern
-- Create lake database (schema)
CREATE SCHEMA analytics;
GO
-- Create statistics for better query plans
CREATE STATISTICS STAT_Sales_SaleDate
ON raw.Sales(sale_date);
-- Create view for time-based queries
CREATE VIEW analytics.vw_DailySales
AS
SELECT
sale_date,
COUNT(*) AS transactions,
SUM(total_amount) AS revenue,
AVG(total_amount) AS avg_order_value
FROM raw.Sales
GROUP BY sale_date;
βΉοΈ
Pro Tip: Use partitioning to reduce costs. If your data is partitioned by date, add date filters to queries to scan only relevant partitions instead of the entire dataset.
Interview Questions
Q1: When would you use Serverless vs Dedicated SQL Pool? A: Serverless for ad-hoc exploration, prototyping, and pay-per-use scenarios. Dedicated for production dashboards with SLA requirements and predictable performance.
Q2: How do external tables work in Serverless? A: External tables define schema over files in ADLS Gen2. Queries push down computation to the storage layer, scanning only required columns and partitions. No data is loaded into Synapse.
Q3: What are the limitations of Serverless SQL Pool? A: 1) No stored procedures, 2) Limited DML support, 3) No cross-database queries, 4) No result set caching, 5) Performance depends on file format and partitioning.