πŸŽ‰ 75% of content is free forever β€” Unlock Premium from $10/mo β†’
CW
Search courses…
πŸ’Ό Servicesℹ️ Aboutβœ‰οΈ ContactView Pricing Plansfrom $10

Azure Stream Analytics: Windowing, Temporal Joins & Inputs

Azure Data EngineeringAzure Stream Analytics⭐ Premium

Advertisement

Azure Stream Analytics: Windowing, Temporal Joins & Inputs

Real-time stream processing with SQL-like queries, windowing functions, and anomaly detection

Stream Analytics Architecture

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    STREAM ANALYTICS ARCHITECTURE                     β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                     β”‚
β”‚  INPUTS                 QUERY                 OUTPUTS               β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ Event Hubs   │─────>β”‚              │────>β”‚ Cosmos DB    β”‚     β”‚
β”‚  β”‚ (Streaming)  β”‚      β”‚              β”‚     β”‚ (Real-time)  β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚              β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                        β”‚              β”‚                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚   Stream     β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ IoT Hub      │─────>β”‚   Analytics  │────>β”‚ ADLS Gen2    β”‚     β”‚
β”‚  β”‚ (Device Telemetry) β”‚   Job        β”‚     β”‚ (Storage)    β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚              β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                        β”‚              β”‚                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚              β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ Blob Storage │─────>β”‚              │────>β”‚ Power BI     β”‚     β”‚
β”‚  β”‚ (Reference)  β”‚      β”‚              β”‚     β”‚ (Dashboard)  β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β”‚              β”‚     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                        β”‚              β”‚                           β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”      β”‚              β”‚     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ SQL Database │─────>β”‚              │────>β”‚ Function App β”‚     β”‚
β”‚  β”‚ (Lookup)     β”‚      β”‚              β”‚     β”‚ (Actions)    β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜      β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                                                                     β”‚
β”‚  SCALING UNITS:                                                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ β€’ Streaming Units (SU): 1 SU = 1 MB/s ingress              β”‚   β”‚
β”‚  β”‚ β€’ SU ranges: 1, 3, 6, 12, 24, 48, 96, 120 (varies)        β”‚   β”‚
β”‚  β”‚ β€’ Min 3 SU for partitioned queries                          β”‚   β”‚
β”‚  β”‚ β€’ Max 120 SU per job                                        β”‚   β”‚
β”‚  β”‚ β€’ 6 SU = 1 vCPU, 2 GB RAM                                  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Windowing Functions

Architecture Diagram
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    WINDOWING FUNCTIONS                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚  TUMBLING WINDOW                                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Events: A B C D E F G H I J                             β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ [A B C] [D E F] [G H I] [J ...]                         β”‚   β”‚
β”‚  β”‚  Window1   Window2   Window3   Window4                   β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ β€’ Fixed size, no overlap                                  β”‚   β”‚
β”‚  β”‚ β€’ Each event in exactly one window                        β”‚   β”‚
β”‚  β”‚ β€’ Example: Count events every 5 minutes                  β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚  HOPPING WINDOW                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Events: A B C D E F G H I J                             β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ [A B C D E]                                              β”‚   β”‚
β”‚  β”‚     [C D E F G]                                          β”‚   β”‚
β”‚  β”‚         [E F G H I]                                      β”‚   β”‚
β”‚  β”‚             [G H I J ...]                                β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ β€’ Fixed size with overlap                                 β”‚   β”‚
β”‚  β”‚ β€’ Each event in multiple windows                          β”‚   β”‚
β”‚  β”‚ β€’ Example: Rolling 10-min average, hop every 5 min       β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚  SLIDING WINDOW                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Events: A B C D E F G H I J                             β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ [A] [AB] [ABC] [ABCD] [ABCDE] ...                       β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ β€’ Window defined by time range                            β”‚   β”‚
β”‚  β”‚ β€’ Updates with every new event                            β”‚   β”‚
β”‚  β”‚ β€’ Example: Distinct count over last 5 minutes            β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β”‚                                                                 β”‚
β”‚  SESSION WINDOW                                                 β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚  β”‚ Events: A B C D E F G H I J                             β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ [A B C D E]     [F G H I]     [J ...]                   β”‚   β”‚
β”‚  β”‚   (gap > 5min)    (gap > 5min)                           β”‚   β”‚
β”‚  β”‚                                                           β”‚   β”‚
β”‚  β”‚ β€’ Dynamic size based on event activity                    β”‚   β”‚
β”‚  β”‚ β€’ Closes when gap exceeds timeout                         β”‚   β”‚
β”‚  β”‚ β€’ Example: User session analysis                         β”‚   β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Stream Analytics Query Examples

-- Tumbling Window: Count events every 5 minutes
SELECT
    System.Timestamp() AS window_end,
    COUNT(*) AS event_count,
    AVG(temperature) AS avg_temperature,
    MAX(temperature) AS max_temperature,
    MIN(temperature) AS min_temperature
INTO [cosmos-output]
FROM [eventhub-input] PARTITION BY device_id
GROUP BY
    TumblingWindow(minute, 5),
    device_id;

-- Hopping Window: Rolling 10-minute average
SELECT
    System.Timestamp() AS window_end,
    device_id,
    AVG(temperature) AS rolling_avg_temp
INTO [powerbi-output]
FROM [eventhub-input]
GROUP BY
    HoppingWindow(minute, 10, 5),  -- 10-min window, 5-min hop
    device_id;

-- Sliding Window: Detect anomalies
SELECT
    System.Timestamp() AS window_end,
    device_id,
    AVG(temperature) AS avg_temp,
    STDEV(temperature) AS std_temp,
    CASE
        WHEN AVG(temperature) > (SELECT AVG(temperature) + 3 * STDEV(temperature)
                                  FROM [eventhub-input]
                                  WHERE device_id = Input.device_id
                                  TIMESTAMP BY event_time
                                  AND DATEDIFF(minute, System.Timestamp(), event_time) BETWEEN -60 AND 0)
        THEN 'ANOMALY'
        ELSE 'NORMAL'
    END AS status
INTO [alert-output]
FROM [eventhub-input] PARTITION BY device_id
GROUP BY
    TumblingWindow(minute, 1),
    device_id;

-- Temporal Join: Enrich with reference data
SELECT
    e.device_id,
    e.temperature,
    e.event_time,
    r.device_location,
    r.device_type,
    r.threshold
INTO [enriched-output]
FROM [eventhub-input] e
JOIN [reference-input] r
    ON e.device_id = r.device_id
    AND DATEDIFF(minute, e, r) BETWEEN -5 AND 5;

-- Session Window: User activity
SELECT
    device_id,
    MIN(event_time) AS session_start,
    MAX(event_time) AS session_end,
    COUNT(*) AS events_in_session
INTO [session-output]
FROM [eventhub-input]
GROUP BY
    SessionWindow(minute, 10),  -- 10-min timeout
    device_id;

Anomaly Detection

-- Built-in anomaly detection
SELECT
    device_id,
    System.Timestamp() AS window_end,
    AVG(temperature) AS avg_temp,
    AnomalyDetection_SpikeAndDip(
        temperature, 95, 100,
        DATEDIFF(minute, System.Timestamp(), event_time)
    ) AS spike_result
INTO [anomaly-output]
FROM [eventhub-input]
GROUP BY
    TumblingWindow(minute, 5),
    device_id;

-- CNTK-based anomaly detection
SELECT
    device_id,
    System.Timestamp() AS window_end,
    AnomalyDetection_ChangePoint(
        temperature, 95, 200,
        DATEDIFF(minute, System.Timestamp(), event_time)
    ) AS change_point_result
INTO [changepoint-output]
FROM [eventhub-input]
GROUP BY
    TumblingWindow(minute, 10),
    device_id;

ℹ️

Pro Tip: Use PARTITION BY to parallelize queries across device/entity partitions. Each partition processes independently, improving throughput and reducing latency.

Interview Questions

Q1: Explain the difference between Tumbling and Hopping windows. A: Tumbling windows are fixed-size, non-overlapping (each event in one window). Hopping windows are fixed-size with overlap (each event in multiple windows). Use tumbling for periodic aggregations; hopping for rolling calculations.

Q2: How do you handle late-arriving data in Stream Analytics? A: Use the Late Arrival Policy setting to define how long to accept late events (up to 5 days). Late events update window results retroactively. Use Watermark Delay to control when windows close.

Q3: When would you use Stream Analytics vs Azure Functions for real-time processing? A: Stream Analytics for SQL-like windowed aggregations, temporal joins, and complex event processing. Azure Functions for custom logic, API calls, and simple transformations. Stream Analytics is better for continuous analytics; Functions for event-driven actions.

Advertisement