π Data Pipeline Monitoring
Master CloudWatch, CloudTrail, and audit patterns for data pipeline monitoring.
Module: AWS Data Engineering β’ Topic 29 of 65 β’ Premium Content
Monitoring Architecture
Architecture Diagram
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β DATA PIPELINE MONITORING β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β CLOUDWATCH METRICS β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β β β Glue Jobs β β Kinesis β β Lambda β β β
β β β β’ DPU Hours β β β’ IteratorAgeβ β β’ Invocationsβ β β
β β β β’ Job Durationβ β β’ Throughput β β β’ Duration β β β
β β β β’ Errors β β β’ Errors β β β’ Errors β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β CLOUDWATCH ALARMS β β
β β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Alarm: HighIteratorAge β β β
β β β Metric: Kinesis.IteratorAgeMilliseconds > 300000 (5 min) β β β
β β β Action: SNS β Email + Lambda β Auto-scale β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Alarm: GlueJobFailure β β β
β β β Metric: Glue.Jobs.Failed > 0 β β β
β β β Action: SNS β Slack + PagerDuty β β β
β β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β CLOUDTRAIL AUDIT β β
β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β β β S3 Access β β Glue API β β IAM Changes β β β
β β β Logs β β Calls β β β β β
β β ββββββββββββββββ ββββββββββββββββ ββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Custom CloudWatch Metrics
import boto3
import time
cloudwatch = boto3.client('cloudwatch')
def put_pipeline_metric(pipeline_name, status, records_processed, duration):
"""Put custom metrics for pipeline monitoring."""
cloudwatch.put_metric_data(
Namespace='DataPipeline',
MetricData=[
{
'MetricName': 'RecordsProcessed',
'Dimensions': [
{'Name': 'PipelineName', 'Value': pipeline_name},
{'Name': 'Status', 'Value': status}
],
'Value': records_processed,
'Unit': 'Count'
},
{
'MetricName': 'PipelineDuration',
'Dimensions': [
{'Name': 'PipelineName', 'Value': pipeline_name}
],
'Value': duration,
'Unit': 'Seconds'
},
{
'MetricName': 'PipelineStatus',
'Dimensions': [
{'Name': 'PipelineName', 'Value': pipeline_name}
],
'Value': 1 if status == 'SUCCESS' else 0,
'Unit': 'None'
}
]
)
CloudWatch Logs Insights
-- Query Glue job logs
fields @timestamp, @message
| filter @message like /ERROR/
| sort @timestamp desc
| limit 50
-- Query Lambda function logs
fields @timestamp, @duration, @requestId
| stats avg(@duration) as avgDuration, max(@duration) as maxDuration by bin(5m)
-- Query VPC Flow Logs
fields srcAddr, dstAddr, srcPort, dstPort, bytes
| filter action = 'REJECT'
| stats sum(bytes) as totalBytes by srcAddr
| sort totalBytes desc
| limit 20
Interview Q&A
Q1: What CloudWatch metric indicates Kinesis is falling behind?
Answer: IteratorAgeMilliseconds. If this exceeds your SLA threshold, the consumer is not keeping up with the stream.
Q2: How do you monitor Glue job performance?
Answer: Track DPU hours, job duration, rows processed, and error rates. Use CloudWatch Logs for detailed Spark UI logs.
Q3: What is the purpose of CloudTrail in data pipelines?
Answer: CloudTrail records API calls for auditing. Track who accessed what data, when, and from where.
Summary
- Metrics: CloudWatch for pipeline-specific metrics
- Alarms: Proactive alerting on anomalies
- Logs: CloudWatch Logs Insights for log analysis
- Audit: CloudTrail for API call tracking
- Dashboards: CloudWatch Dashboards for visualization