BigQuery Omni: Multi-Cloud Analytics

Master BigQuery Omni for multi-cloud analytics including AWS S3, Azure Blob, cross-cloud queries, and hybrid architectures.

16 min read · Advanced

BigQuery Omni Architecture

📊 BigQuery Architecture for Data Engineering

Interview Tip: BigQuery separates storage and compute. Compute is billed either on demand (by bytes scanned) or by capacity (by slots reserved), and storage is billed separately. Partition and cluster tables so that queries read less data and cost less.
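To make the cost effect of partitioning concrete, here is a minimal Python sketch of the on-demand billing arithmetic. The per-TiB price, the table size, and the partition counts are hypothetical placeholders, not published rates, and the sketch assumes equally sized partitions.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_cost(bytes_scanned: int, price_per_tib: float) -> float:
    """Cost of one on-demand query: scanned volume times the per-TiB rate."""
    return bytes_scanned / TIB * price_per_tib


def pruned_bytes(table_bytes: int, partitions_total: int, partitions_read: int) -> int:
    """Bytes scanned when a filter keeps only some equally sized partitions."""
    if partitions_total <= 0 or not 0 <= partitions_read <= partitions_total:
        raise ValueError("partitions_read must be between 0 and partitions_total")
    return table_bytes * partitions_read // partitions_total


# Hypothetical inputs: a 10 TiB table with 365 daily partitions.
TABLE_BYTES = 10 * TIB
PRICE_PER_TIB = 5.0  # placeholder rate, not a published price

full_scan = on_demand_cost(TABLE_BYTES, PRICE_PER_TIB)
one_week = on_demand_cost(pruned_bytes(TABLE_BYTES, 365, 7), PRICE_PER_TIB)

print(f"full scan: {full_scan:.2f}")
print(f"7 of 365 partitions: {one_week:.2f}")
```

A filter on the partition column that keeps one week of data scans roughly 2% of the table here, and the on-demand charge shrinks in proportion.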

Implementation

Querying AWS S3 Data

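-- Create external connection to Azure
-- (illustrative syntax: connections are normally created in the console,
--  with `bq mk --connection`, or through the API)
CREATE EXTERNAL CONNECTION `azure_connection`
OPTIONS (
  connection_type = 'AZURE',
  connection_properties = '{"azure_properties": {"tenant_id": "...", "federated_application_client_id": "..."}}'
);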

-- Or for AWS
CREATE EXTERNAL CONNECTION `aws_connection`
OPTIONS (
  connection_type = 'AWS',
  connection_properties = '{"aws_properties": {"role_arn": "arn:aws:iam::123456789012:role/bigquery-role"}}'
);

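-- Create external table for S3 data
-- (the dataset and the connection must be in the same BigQuery Omni location, e.g. aws-us-east-1)
CREATE OR REPLACE EXTERNAL TABLE `project.dataset.aws_sales`
WITH CONNECTION `aws-us-east-1.aws_connection`
OPTIONS (
  format = 'PARQUET',
  uris = ['s3://my-aws-bucket/sales/*']  -- only one * wildcard per URI
);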

-- Query across GCS and S3
SELECT 'GCS' as source, COUNT(*) as cnt
FROM `project.dataset.gcs_sales`
UNION ALL
SELECT 'S3' as source, COUNT(*) as cnt
FROM `project.dataset.aws_sales`;

Best Practice: Use BigQuery Omni for multi-cloud analytics when data must remain in its original location. For frequent cross-cloud queries, consider consolidating data to GCS for better performance. Use authorized datasets to share data across projects.

Common Interview Questions

Q1: What is BigQuery Omni?

Answer: BigQuery Omni allows querying data stored in AWS S3 and Azure Blob Storage directly from BigQuery. It uses compute capacity in the same region as the data, providing cross-cloud analytics without data movement.

Q2: When would you use BigQuery Omni vs. data replication?

Answer: Use Omni when data must remain in its original cloud location (compliance, latency). Use replication when you need frequent, low-latency queries on the data. Omni is better for occasional cross-cloud queries; replication for analytics-heavy workloads.

Q3: What are the cost implications of BigQuery Omni?

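Answer: Omni bills compute for queries that run in the AWS or Azure region where the data lives, and moving data or query results between clouds adds transfer charges. Every query scans the data in place, so for frequent queries consolidating to GCS may be more cost-effective. Weigh the one-time transfer cost against the recurring per-query cost when deciding.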
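The trade-off in this answer can be sketched as a break-even calculation. This is a simplified model: every price below is a hypothetical parameter, not a published rate, and it ignores GCS storage charges and the cost of keeping a replica in sync.

```python
import math
from typing import Optional


def query_in_place_cost(queries: int, tib_per_query: float, omni_price_per_tib: float) -> float:
    """Total cost of running every query against the remote data through Omni."""
    return queries * tib_per_query * omni_price_per_tib


def replicate_cost(dataset_tib: float, transfer_price_per_tib: float,
                   queries: int, tib_per_query: float, local_price_per_tib: float) -> float:
    """One-time transfer of the dataset plus the cost of querying the local copy."""
    return dataset_tib * transfer_price_per_tib + queries * tib_per_query * local_price_per_tib


def break_even_queries(dataset_tib: float, transfer_price_per_tib: float, tib_per_query: float,
                       omni_price_per_tib: float, local_price_per_tib: float) -> Optional[int]:
    """Smallest query count at which replication is no more expensive than Omni.

    Returns None when local queries are not cheaper, so replication never pays off.
    """
    saving_per_query = tib_per_query * (omni_price_per_tib - local_price_per_tib)
    if saving_per_query <= 0:
        return None
    return math.ceil(dataset_tib * transfer_price_per_tib / saving_per_query)


# Hypothetical scenario: 20 TiB dataset, each query scans 0.5 TiB.
n = break_even_queries(dataset_tib=20, transfer_price_per_tib=90.0, tib_per_query=0.5,
                       omni_price_per_tib=8.0, local_price_per_tib=6.0)
print("break-even query count:", n)
```

Below the break-even count, querying in place is cheaper under these assumptions; above it, the one-time transfer has paid for itself.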

Q4: What formats does BigQuery Omni support?

Answer: Parquet, ORC, Avro, JSON, and CSV. Parquet and ORC are recommended for best performance due to columnar storage and predicate pushdown support.

Q5: How do you set up BigQuery Omni?

Answer: 1) Create external connection to AWS/Azure, 2) Grant BigQuery access to remote storage, 3) Create external tables pointing to remote data, 4) Query using standard BigQuery SQL, 5) Monitor cross-cloud data transfer costs.
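Step 3 can be sketched as a small Python helper that renders the external table DDL and catches common mismatches before the statement is submitted. The helper name, the checks, and the sample values are hypothetical. It assumes Omni locations are prefixed with the cloud name (for example `aws-us-east-1`) and that S3 and Azure URIs start with `s3://` and `azure://` respectively.

```python
SUPPORTED_FORMATS = {"PARQUET", "ORC", "AVRO", "JSON", "CSV"}  # formats listed in Q4

# Assumption: the Omni location prefix names the cloud, and each cloud has its own URI scheme.
URI_SCHEME_BY_CLOUD = {"aws": "s3://", "azure": "azure://"}


def external_table_ddl(table: str, connection: str, uri: str, file_format: str = "PARQUET") -> str:
    """Render a CREATE EXTERNAL TABLE statement after basic consistency checks.

    `connection` is expected in the form "<location>.<connection_id>".
    """
    location, sep, connection_id = connection.partition(".")
    cloud = location.split("-", 1)[0]
    if not sep or not connection_id or cloud not in URI_SCHEME_BY_CLOUD:
        raise ValueError(f"not an Omni connection: {connection!r}")
    if not uri.startswith(URI_SCHEME_BY_CLOUD[cloud]):
        raise ValueError(f"URI {uri!r} does not match cloud {cloud!r}")
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format!r}")
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        "OPTIONS (\n"
        f"  format = '{fmt}',\n"
        f"  uris = ['{uri}']\n"
        ");"
    )


# Hypothetical table, connection, and bucket names.
ddl = external_table_ddl(
    table="project.dataset.aws_sales",
    connection="aws-us-east-1.aws_connection",
    uri="s3://my-aws-bucket/sales/*",
)
print(ddl)
```

Validating the location and URI scheme up front surfaces a mismatched connection as a local error instead of a failed query job.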
