Snowpipe: Automated Data Ingestion & Streaming
Architecture Diagram 1: Snowpipe Ingestion Pipeline
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β SNOWPIPE INGESTION ARCHITECTURE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β DATA SOURCES β β
β β β β
β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββββββββββ β β
β β β Cloud β β IoT β β Kafka β β Application β β β
β β β Storage β β Devices β β Streams β β Events β β β
β β β (S3/ β β Sensors β β Topics β β (Webhooks) β β β
β β β Azure/ β β β β β β β β β
β β β GCS) β β β β β β β β β
β β ββββββββββββ ββββββββββββ ββββββββββββ ββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β NOTIFICATION LAYER β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β S3 Event Notification / Azure Event Grid / GCS Triggers β β β
β β β β β β
β β β Event Types: β β β
β β β β’ ObjectCreated β’ ObjectRemoved β’ ObjectTagging β β β
β β β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β SQS Queue / Event Hub / Pub/Sub β β β β
β β β β (Message buffering and delivery guarantee) β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β SNOWPIPE SERVICE (Serverless) β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β βββββββββββββββ βββββββββββββββ βββββββββββββββββββββββ β β β
β β β β File β β Data β β Load β β β β
β β β β Format β β Validation β β Optimization β β β β
β β β β Detection β β β β β β β β
β β β βββββββββββββββ βββββββββββββββ βββββββββββββββββββββββ β β β
β β β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β COPY INTO Engine (Shared across all Snowpipe) β β β β
β β β β β β β β
β β β β βββββββββββ βββββββββββ βββββββββββ ββββββββββββ β β β
β β β β β Parser β βTransformβ βCompress β βWrite ββ β β β
β β β β β β β β β β βMicro- ββ β β β
β β β β β β β β β β βPartitionββ β β β
β β β β βββββββββββ βββββββββββ βββββββββββ ββββββββββββ β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β MONITORING & HISTORY β β β
β β β β β β
β β β COPY_HISTORY β LOAD_HISTORY β PIPE_USAGE_HISTORY β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β TARGET TABLES β β
β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ ββββββββββββ β β
β β β Table 1 β β Table 2 β β Table 3 β β Table N β β β
β β β (Raw) β β (Staged) β β (Curated) β β β β β
β β β β β β β β β β β β
β β β Micro-parts β β Micro-parts β β Micro-parts β β β β β
β β βββββββββββββββ βββββββββββββββ βββββββββββββββ ββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Architecture Diagram 2: File Format Processing Pipeline
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β FILE FORMAT PROCESSING PIPELINE β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β INPUT FILE: sales_2024_01_15.csv β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β Raw CSV Content: β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β id,name,amount,region,date β β β
β β β 1,John Smith,1500.50,US,2024-01-15 β β β
β β β 2,Jane Doe,2300.75,EU,2024-01-15 β β β
β β β 3,Bob Johnson,890.25,ASIA,2024-01-15 β β β
β β β ... β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β File Size: 500 MB (uncompressed) β β
β β Rows: ~2,500,000 β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β STAGE 1: FORMAT DETECTION & VALIDATION β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β File Format Configuration: β β β
β β β β β β
β β β TYPE = CSV β β β
β β β FIELD_DELIMITER = ',' β β β
β β β SKIP_HEADER = 1 β β β
β β β NULL_IF = ('', 'NULL', 'null') β β β
β β β DATE_FORMAT = 'YYYY-MM-DD' β β β
β β β TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS' β β β
β β β ENCODING = 'UTF-8' β β β
β β β β β β
β β β Validation Checks: β β β
β β β β Header row exists and matches expected columns β β β
β β β β Data types match column definitions β β β
β β β β No malformed rows β β β
β β β β Encoding is valid β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β STAGE 2: DATA TRANSFORMATION β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Transformations Applied: β β β
β β β β β β
β β β 1. Type Casting: β β β
β β β '1500.50' β DECIMAL(10,2) β β β
β β β '2024-01-15' β DATE β β β
β β β β β β
β β β 2. Null Handling: β β β
β β β '' β NULL β β β
β β β 'NULL' β NULL β β β
β β β β β β
β β β 3. Column Mapping: β β β
β β β CSV Column 1 β Table Column 'ID' β β β
β β β CSV Column 2 β Table Column 'NAME' β β β
β β β CSV Column 3 β Table Column 'AMOUNT' β β β
β β β CSV Column 4 β Table Column 'REGION' β β β
β β β CSV Column 5 β Table Column 'TRANSACTION_DATE' β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β STAGE 3: MICRO-PARTITION CREATION β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Output Micro-Partitions: β β β
β β β β β β
β β β βββββββββββ βββββββββββ βββββββββββ βββββββββββ ββββββββ β β β
β β β β MP_001 β β MP_002 β β MP_003 β β MP_004 β β .. β β β β
β β β β 125 MB β β 128 MB β β 130 MB β β 117 MB β β β β β β
β β β β 625K β β 640K β β 650K β β 585K β β β β β β
β β β β rows β β rows β β rows β β rows β β β β β β
β β β βββββββββββ βββββββββββ βββββββββββ βββββββββββ ββββββββ β β β
β β β β β β
β β β Total: 125 MB compressed (from 500 MB) = 4:1 ratio β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β STAGE 4: LOADED FILE TRACKING β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β COPY_HISTORY: β β β
β β β ββββββββββββββββββββ¬βββββββββββββ¬βββββββββββ¬ββββββββββββββ β β β
β β β β File Name β Rows Loadedβ Errors β Status β β β β
β β β ββββββββββββββββββββΌβββββββββββββΌβββββββββββΌββββββββββββββ€ β β β
β β β β sales_2024_01_15β 2,500,000 β 0 β LOADED β β β β
β β β β .csv β β β β β β β
β β β ββββββββββββββββββββ΄βββββββββββββ΄βββββββββββ΄ββββββββββββββ β β β
β β β β β β
β β β This file will NOT be reloaded (idempotent ingestion) β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Architecture Diagram 3: Auto-Ingest Configuration Flow
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
β AUTO-INGEST CONFIGURATION FLOW β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ€
β β
β STEP 1: CREATE STORAGE INTEGRATION β
β βββββββββββββββββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β Snowflake βββββββββββββββββββββββββββββββββββββββββββββββββββΆ β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Storage Integration: s3_ingestion β β β
β β β β β β
β β β Type: EXTERNAL_STAGE β β β
β β β Provider: AWS β β β
β β β Storage AWS Role: arn:aws:iam::123:role/snowflake_role β β β
β β β Enabled: TRUE β β β
β β β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β Snowflake generates IAM policy: β β β β
β β β β β β β β
β β β β { β β β β
β β β β "Effect": "Allow", β β β β
β β β β "Action": [ β β β β
β β β β "s3:GetObject", β β β β
β β β β "s3:GetObjectVersion" β β β β
β β β β ], β β β β
β β β β "Resource": "arn:aws:s3:::bucket/*" β β β β
β β β β } β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β STEP 2: CREATE PIPE β
β βββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Pipe: sales_pipe β β β
β β β β β β
β β β CREATE PIPE sales_pipe β β β
β β β AUTO_INGEST = TRUE β β β
β β β INTEGRATION = 's3_ingestion' β β β
β β β AS β β β
β β β COPY INTO sales_raw β β β
β β β FROM '@s3_ingestion/data/' β β β
β β β FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1) β β β
β β β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β Pipe returns notification channel: β β β β
β β β β β β β β
β β β β SQS Queue: arn:aws:sqs:us-east-1:123456:myQueue β β β β
β β β β (Configure in S3 bucket event notification) β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β STEP 3: CONFIGURE S3 EVENT NOTIFICATION β
β ββββββββββββββββββββββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β S3 Bucket Configuration: β β
β β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β β β Bucket: data-lake-sales β β β
β β β β β β
β β β Event Notification: β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β β β { β β β β
β β β β "LambdaFunctionConfigurations": [ β β β β
β β β β { β β β β
β β β β "Event": "s3:ObjectCreated:*", β β β β
β β β β "Prefix": "data/", β β β β
β β β β "Suffix": ".csv", β β β β
β β β β "QueueConfigurations": [ β β β β
β β β β { β β β β
β β β β "QueueArn": "arn:aws:sqs:...:snowflake_queue"β β β β
β β β β } β β β β
β β β β ] β β β β
β β β β } β β β β
β β β β ] β β β β
β β β β } β β β β
β β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β β
β β ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β
β βΌ β
β STEP 4: INGESTION FLOW β
β ββββββββββββββββββββββ β
β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β β β
β β File Upload β S3 Event β SQS Queue β Snowpipe β COPY INTO β β
β β β β
β β ββββββββββ ββββββββββ ββββββββββ ββββββββββ ββββββββ β β
β β β Upload βββββΆβ Event βββββΆβ Queue βββββΆβDetect βββββΆβ Load β β β
β β β File β β Fire β β Buffer β β New β β Data β β β
β β ββββββββββ ββββββββββ ββββββββββ ββββββββββ ββββββββ β β
β β β β
β β Typical latency: 1-3 minutes end-to-end β β
β βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
Snowpipe is Snowflake's serverless, continuous data ingestion service that automatically loads data from cloud storage into tables. It operates on a micro-batch model (1β3 minute latency), handling format detection, validation, transformation, compression, and micro-partition creation without manual orchestration.
Idempotent ingestion ensures loading the same file multiple times produces identical results. Snowpipe tracks loaded files in COPY_HISTORY and automatically excludes duplicates, enabling safe retries and data consistency even with duplicate events.
Snowpipe charges per file processed (serverless pricing). Use Parquet for analytics workloads (built-in compression, schema info, predicate pushdown). CSV is simpler but lacks these benefits. JSON supports semi-structured data with parsing overhead.
- Create Storage Integration: Define IAM roles for Snowflake access to cloud storage
- Create Pipe: Specify ingestion logic (COPY INTO statement, file format, target table)
- Configure Event Notifications: Set up S3/Azure/GCS triggers to route file events to Snowpipe
- Test Ingestion: Upload sample files and verify data loads correctly
- Monitor: Check PIPE_STATUS for pending files, COPY_HISTORY for load results
- Alert: Set up alerts for pipe lag (pending files > 100) and error rates (> 1%)
- Serverless: No warehouse management β Snowflake handles compute
- Latency: 1β3 minutes end-to-end (file upload β queryable)
- Idempotent: Duplicate files are automatically excluded
- File sizing: Target 100β250 MB compressed for optimal parallelism
- Monitoring: Use PIPE_STATUS, COPY_HISTORY, and PIPE_ERR views
Detailed Explanation
Snowpipe Fundamentals
Snowpipe is Snowflake's serverless, continuous data ingestion service that automates the loading of data from cloud storage into Snowflake tables. Unlike bulk COPY operations that require explicit execution, Snowpipe operates continuously, monitoring designated storage locations for new files and loading them automatically as they arrive. This architecture eliminates the need for manual intervention or custom ETL orchestration for simple data loading scenarios.
The service operates on a micro-batch processing model, typically detecting and loading new files within 1-3 minutes of their arrival in cloud storage. This near-real-time capability makes Snowpipe suitable for use cases requiring low-latency data availability without the complexity of streaming architectures. Snowpipe handles all aspects of the ingestion process, including file format detection, data validation, transformation, compression, and micro-partition creation.
Auto-Ingest Architecture
Auto-ingest is the recommended approach for production Snowpipe implementations. It leverages cloud-native event notification systems (S3 Event Notifications, Azure Event Grid, GCS Triggers) to detect new files and automatically trigger the ingestion process. This event-driven architecture provides several advantages over polling-based approaches, including lower latency, reduced cloud storage API costs, and elimination of polling overhead.
The auto-ingest configuration requires three components: a storage integration that defines Snowflake's access permissions to cloud storage, a pipe that specifies the ingestion logic and target table, and cloud-side event notifications that route file creation events to Snowpipe. The storage integration uses cloud IAM roles to grant Snowflake read access to specific storage locations without exposing credentials.
File Format Optimization
Snowpipe supports multiple file formats, each optimized for different data characteristics. CSV files are suitable for simple, flat data structures but lack built-in schema information and compression. JSON files support semi-structured data with nested structures but incur parsing overhead. Parquet and ORC formats provide columnar storage with built-in compression, schema evolution, and predicate pushdown capabilities, making them ideal for analytics workloads.
File size optimization is critical for Snowpipe performance. Snowflake recommends file sizes between 100MB and 250MB compressed. Smaller files create overhead from processing many micro-partitions, while larger files can cause timeout issues and reduce parallelism. The optimal file size depends on data complexity, compression ratio, and ingestion velocity requirements.
Error Handling and Idempotency
Snowpipe implements comprehensive error handling with automatic retry logic for transient failures. When a file fails to load due to network issues, format errors, or constraint violations, Snowpipe retries the operation with exponential backoff. Persistent errors are logged to the PIPE_ERR table, providing detailed error messages and file information for troubleshooting.
A key design principle of Snowpipe is idempotencyβloading the same file multiple times produces the same result as loading it once. Snowpipe tracks all successfully loaded files in the COPY_HISTORY table and automatically excludes them from subsequent loads. This behavior enables safe retries and ensures data consistency even in the presence of duplicate events.
Monitoring and Alerting
Snowpipe provides comprehensive monitoring through several system views and tables. The COPY_HISTORY table tracks all completed load operations with detailed metrics including rows loaded, bytes processed, and error counts. The PIPE_STATUS function provides real-time information about pipe health, including pending file counts and last error details. The LOAD_HISTORY table provides historical information about file loading patterns.
Alerting can be configured using Snowflake's alerting features or external monitoring tools. Common alerts include pipe lag (pending files growing), error rates exceeding thresholds, and ingestion latency exceeding SLA requirements. Integration with cloud monitoring services (CloudWatch, Azure Monitor, Stackdriver) enables centralized alerting across the entire data pipeline.
Key Concepts Table
| Component | Purpose | Configuration | Cost Model |
|---|---|---|---|
| Storage Integration | Cloud access permissions | IAM roles, external stages | Free |
| Pipe | Ingestion logic definition | SQL COPY statement | Free (compute charged separately) |
| Auto-Ingest | Event-driven loading | Cloud event notifications | Serverless (per file processed) |
| COPY_HISTORY | Load operation tracking | Automatic | Free |
| PIPE_ERR | Error logging | Automatic | Free |
| File Format | Compression | Schema Support | Best For |
|---|---|---|---|
| CSV | External | None | Simple flat data |
| JSON | External | Full nesting | Semi-structured data |
| Parquet | Built-in (Snappy/ZSTD) | Full schema | Analytics workloads |
| ORC | Built-in (Zlib/ZSTD) | Full schema | Hive ecosystem |
| Avro | Built-in (Deflate) | Full schema | Schema evolution |
| Metric | Target | Warning | Critical |
|---|---|---|---|
| Ingestion Latency | < 3 min | 3-5 min | > 5 min |
| File Size | 100-250 MB | 50-100 MB | < 50 MB or > 500 MB |
| Error Rate | < 0.1% | 0.1-1% | > 1% |
| Pipe Lag | < 10 files | 10-100 files | > 100 files |
| Daily Throughput | Per SLA | 80-100% of SLA | > 100% of SLA |
Code Examples
-- Example 1: Create storage integration for S3
CREATE STORAGE INTEGRATION s3_sales_integration
TYPE = EXTERNAL_STAGE
ENABLED = TRUE
STORAGE_PROVIDER = S3
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeAccessRole'
STORAGE_ALLOWED_LOCATIONS = ('s3://data-lake-sales/data/', 's3://data-lake-sales/archive/');
-- Example 2: Create pipe with auto-ingest
CREATE PIPE sales_auto_ingest
AUTO_INGEST = TRUE
INTEGRATION = 's3_sales_integration'
ERROR_INTEGRATION = 's3_error_integration'
COMMENT = 'Auto-ingest sales data from S3'
AS
COPY INTO sales_raw
FROM '@s3_sales_integration/data/'
FILE_FORMAT = (
TYPE = CSV
SKIP_HEADER = 1
NULL_IF = ('', 'NULL', 'null')
FIELD_OPTIONALLY_ENCLOSED_BY = '"'
TRIM_SPACE = TRUE
ERROR_ON_COLUMN_COUNT_MISMATCH = TRUE
)
ON_ERROR = 'CONTINUE';
-- Example 3: Create pipe for JSON data
CREATE PIPE json_events_pipe
AUTO_INGEST = TRUE
INTEGRATION = 's3_events_integration'
AS
COPY INTO events_raw
FROM '@s3_events_integration/events/'
FILE_FORMAT = (
TYPE = JSON
STRIP_OUTER_ARRAY = TRUE
IGNORE_UTF8_ERRORS = TRUE
)
ON_ERROR = 'SKIP_FILE';
-- Example 4: Monitor pipe status
SELECT *
FROM TABLE(INFORMATION_SCHEMA.PIPE_STATUS('sales_auto_ingest'));
-- Check pending files
SELECT
pipe_name,
pending_file_count,
last_loaded_file,
last_loaded_time,
error_count
FROM TABLE(INFORMATION_SCHEMA.PIPE_STATUS('sales_auto_ingest'));
-- Example 5: Query copy history
SELECT
pipe_name,
file_name,
file_size,
row_count,
error_count,
status,
load_time
FROM TABLE(INFORMATION_SCHEMA.COPY_HISTORY(
TABLE_NAME => 'sales_raw',
START_TIME => DATEADD(day, -7, CURRENT_TIMESTAMP())
))
ORDER BY load_time DESC;
-- Example 6: Analyze load errors
SELECT
file_name,
error_code,
error_message,
row_number,
column_number,
loaded_time
FROM sales_raw$errors
ORDER BY loaded_time DESC
LIMIT 100;
-- Example 7: Create file format for semi-structured data
CREATE OR REPLACE FILE FORMAT json_format
TYPE = JSON
STRIP_OUTER_ARRAY = FALSE
STRIP_NULL_VALUES = FALSE
IGNORE_UTF8_ERRORS = FALSE
COMMENT = 'Standard JSON format for event data';
CREATE OR REPLACE FILE FORMAT parquet_format
TYPE = PARQUET
TRIM_SPACE = TRUE
IGNORE_UTF8_ERRORS = FALSE
COMMENT = 'Parquet format for analytics data';
-- Example 8: Advanced pipe configuration
CREATE PIPE analytics_ingest
AUTO_INGEST = TRUE
INTEGRATION = 'azure_analytics_integration'
PATTERN = '.*sales_[0-9]{8}\\.csv$'
ERROR_INTEGRATION = 'azure_error_queue'
COMMENT = 'Ingest sales data with pattern matching'
AS
COPY INTO analytics.sales_fact
FROM @azure_analytics_integration/raw_data/
FILE_FORMAT = (
TYPE = CSV
SKIP_HEADER = 1
NULL_IF = ('NA', 'N/A', '')
FIELD_DELIMITER = '|'
DATE_FORMAT = 'YYYY-MM-DD'
TIMESTAMP_FORMAT = 'YYYY-MM-DD HH24:MI:SS.FF3'
)
VALIDATION_MODE = RETURN_ERRORS
ON_ERROR = 'SKIP_FILE';
-- Example 9: Create alert for pipe lag
CREATE OR REPLACE ALERT pipe_lag_alert
WAREHOUSE = 'monitoring_wh'
SCHEDULE = 'USING CRON 0 */4 * * * America/New_York'
IF (
SELECT COUNT(*)
FROM TABLE(INFORMATION_SCHEMA.PIPE_STATUS('sales_auto_ingest'))
WHERE pending_file_count > 100
) > 0
THEN
CALL SYSTEM$SEND_EMAIL(
'alert_channel',
'data-team@company.com',
'Snowpipe Lag Alert',
'The sales_auto_ingest pipe has excessive pending files.'
);
Performance Metrics
| Metric | Value | Description |
|---|---|---|
| Typical Latency | 1-3 minutes | End-to-end from file upload to queryable |
| Max File Size | 5 GB | Maximum supported file size |
| Recommended Size | 100-250 MB | Optimal for parallel processing |
| Max Pipes per Account | 100 | Default limit (can be increased) |
| Max Integration per Pipe | 1 | One integration per pipe |
| Daily Quota | Unlimited | No limit on daily ingestion |
Best Practices
-
Optimize file sizes: Target 100-250 MB compressed files for optimal parallelism and processing efficiency.
-
Use Parquet for analytics: Prefer Parquet over CSV for analytics workloads due to built-in compression, schema information, and predicate pushdown.
-
Implement monitoring: Set up alerts for pipe lag, error rates, and ingestion latency to catch issues proactively.
-
Handle errors gracefully: Use ON_ERROR = 'CONTINUE' for non-critical data and 'SKIP_FILE' for malformed files. Review PIPE_ERR tables regularly.
-
Use pattern matching: Configure PATTERN parameters to filter specific file types and avoid accidental ingestion of unrelated files.
-
Implement idempotent loads: Leverage Snowpipe's automatic deduplication to safely handle retries and duplicate events.
-
Separate concerns: Create dedicated pipes for different data sources, file types, or target tables to simplify monitoring and error isolation.
-
Monitor pipe health: Regularly check PIPE_STATUS for pending file counts and error conditions. Address issues before they impact SLAs.
-
Test with small files: Validate file format configurations and error handling with sample files before production deployment.
-
Plan for scale: Design ingestion architecture to handle peak loads by using multiple pipes and optimizing cloud storage configurations.
See Also
- PySpark Iceberg - Iceberg table ingestion patterns
- Delta Lake on Databricks - Delta Lake ingestion comparison
- Data Warehouse Concepts - Data warehouse design principles