BigQuery Omni Architecture
Implementation
Querying AWS S3 Data
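-- Create external connection to Azure
-- (illustrative syntax; in practice connections are usually created with the bq CLI, the Cloud console, or the API)
CREATE EXTERNAL CONNECTION `azure_connection`
OPTIONS (
connection_type = 'AZURE',
connection_properties = '{"azure_properties": {"tenant_id": "...", "federated_application_client_id": "..."}}'
);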
-- Or for AWS
CREATE EXTERNAL CONNECTION `aws_connection`
OPTIONS (
connection_type = 'AWS',
connection_properties = '{"aws_properties": {"role_arn": "arn:aws:iam::123456789:role/bigquery-role"}}'
);
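-- Create external table for S3 data
-- (the dataset and the connection must live in the same BigQuery Omni region, e.g. aws-us-east-1)
CREATE OR REPLACE EXTERNAL TABLE `project.dataset.aws_sales`
WITH CONNECTION `aws-us-east-1.aws_connection`
OPTIONS (
format = 'PARQUET',
-- BigQuery URIs accept a single * wildcard
uris = ['s3://my-aws-bucket/sales/*']
);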
-- Query across GCS and S3
SELECT 'GCS' as source, COUNT(*) as cnt
FROM `project.dataset.gcs_sales`
UNION ALL
SELECT 'S3' as source, COUNT(*) as cnt
FROM `project.dataset.aws_sales`;
Best Practice: Use BigQuery Omni for multi-cloud analytics when data must remain in its original location. For frequent cross-cloud queries, consider consolidating the data in Google Cloud (GCS or native BigQuery tables) for better performance. Use authorized datasets to share data across projects.
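As a rough sketch of the consolidation option, a cross-cloud `LOAD DATA` statement can copy the S3 files into a native BigQuery table. The destination table name `sales_native` is hypothetical, and the connection is assumed to live in the `aws-us-east-1` Omni region; adjust both to your setup.

```sql
-- Copy Parquet files from S3 into a native BigQuery table.
-- `sales_native` is a hypothetical destination table.
-- The connection is assumed to exist in the aws-us-east-1 Omni region.
LOAD DATA INTO `project.dataset.sales_native`
FROM FILES (
  format = 'PARQUET',
  uris = ['s3://my-aws-bucket/sales/*']
)
WITH CONNECTION `aws-us-east-1.aws_connection`;
```

Once loaded, repeated queries read from BigQuery storage instead of scanning S3 each time, at the cost of a one-time cross-cloud transfer.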
Common Interview Questions
Q1: What is BigQuery Omni?
Answer: BigQuery Omni lets you query data stored in Amazon S3 and Azure Blob Storage directly from BigQuery. Queries run on BigQuery compute deployed in the same AWS or Azure region as the data, so you get cross-cloud analytics without moving the data.
Q2: When would you use BigQuery Omni vs. data replication?
Answer: Use Omni when data must remain in its original cloud location (compliance, latency). Use replication when you need frequent, low-latency queries on the data. Omni is better for occasional cross-cloud queries; replication for analytics-heavy workloads.
Q3: What are the cost implications of BigQuery Omni?
Answer: Omni bills for compute in the remote region plus any data transferred across clouds. Queries scan the data where it is stored, so for frequently queried data, consolidating to GCS may be more cost-effective. Factor data transfer costs into the decision.
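One way to keep an eye on these costs is to inspect recent job metadata. The query below is a minimal sketch: the region qualifier `region-us` is a placeholder for wherever your jobs run, and it assumes the `transferred_bytes` column of `INFORMATION_SCHEMA.JOBS` is populated for your cross-cloud jobs.

```sql
-- List the last 7 days of jobs with their compute and transfer footprint.
-- `region-us` is a placeholder; use the region where your jobs execute.
SELECT
  job_id,
  creation_time,
  total_bytes_processed,
  total_slot_ms,
  transferred_bytes  -- assumed to be set for cross-cloud jobs
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY creation_time DESC;
```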
Q4: What formats does BigQuery Omni support?
Answer: Parquet, ORC, Avro, JSON, and CSV. Parquet and ORC are recommended for best performance due to columnar storage and predicate pushdown support.
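To illustrate why columnar formats help, the query below reads only the columns it names and filters on one of them, which is the access pattern where Parquet and ORC tend to pay off. The column names `region`, `amount`, and `order_date` are hypothetical; the external table is the `aws_sales` table defined earlier.

```sql
-- Select a few columns and filter early.
-- Column names are hypothetical placeholders for the sales schema.
SELECT
  region,
  SUM(amount) AS total_amount
FROM `project.dataset.aws_sales`
WHERE order_date >= DATE '2024-01-01'
GROUP BY region;
```

With CSV or JSON, the same query would generally have to read every field of every row before filtering.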
Q5: How do you set up BigQuery Omni?
Answer: 1) Create external connection to AWS/Azure, 2) Grant BigQuery access to remote storage, 3) Create external tables pointing to remote data, 4) Query using standard BigQuery SQL, 5) Monitor cross-cloud data transfer costs.