Advanced LookML Patterns
# Advanced explore with fanout prevention
explore: orders {
label: "Sales Orders"
group_label: "Sales"
# Join with fanout prevention
join: order_items {
type: left_outer
sql_on: ${orders.order_id} = ${order_items.order_id} ;;
relationship: one_to_many
}
# Aggregate awareness for performance
aggregate_table: daily_orders {
query: {
dimensions: [order_date_date]
measures: [order_count, total_revenue]
}
materialization: {
datagroup_trigger: daily_etl
}
}
# Always filter for performance
always_filter: {
filters: [order_date_date: "30 days"]
}
# Access control
access_filter: {
field: customers.region
user_attribute: allowed_regions
}
# Conditionally join
join: promotions {
type: left_outer
sql_on: ${orders.promotion_id} = ${promotions.promotion_id} AND {% condition promotions.promotion_date %} ${promotions.start_date} {% endcondition %} ;;
relationship: many_to_one
}
}
Persistent Derived Tables (PDTs)
view: user_lifetime_value {
derived_table: {
sql:
SELECT
user_id,
SUM(amount) as ltv,
COUNT(*) as order_count,
MIN(order_date) as first_order_date,
MAX(order_date) as last_order_date
FROM `project.analytics.orders`
WHERE status = 'completed'
GROUP BY 1
;;
datagroup_trigger: daily_etl
distribution: "user_id"
sortkeys: ["user_id"]
partition_keys: ["first_order_date"]
cluster_keys: ["user_id"]
}
dimension: user_id {
type: string
primary_key: yes
sql: ${TABLE}.user_id ;;
}
dimension: ltv {
type: number
sql: ${TABLE}.ltv ;;
value_format_name: usd
}
dimension: ltv_tier {
type: tier
tiers: [0, 100, 500, 1000, 5000]
style: integer
sql: ${ltv} ;;
}
measure: total_ltv {
type: sum
sql: ${ltv} ;;
value_format_name: usd
}
}
β¨
Best Practice: Use aggregate tables for dashboard performance. Implement datagroups for PDT refresh scheduling. Use always_filter to prevent full-table scans. Document all explores and fields. Test LookML changes in development mode before deploying.
Common Interview Questions
Q1: What is aggregate awareness in Looker?
Answer: Aggregate awareness automatically routes queries to the most efficient pre-computed aggregate table. Looker selects the smallest aggregate that satisfies the query, improving performance and reducing costs.
Q2: What is a datagroup in Looker?
Answer: Datagroups define refresh policies for PDTs. They can trigger on schedule, SQL query results, or custom triggers. Datagroups ensure PDTs are refreshed at appropriate intervals based on data freshness requirements.
Q3: How do you prevent fanout in Looker joins?
Answer: Use symmetric aggregates (count_distinct, sum_distinct) to prevent double-counting. Use join_types that limit row multiplication. Consider creating derived tables that pre-aggregate data before joining.
Q4: What is the difference between derived tables and PDTs?
Answer: Derived tables are virtual tables computed at query time. PDTs are materialized in the database and refreshed on a schedule. PDTs improve performance for expensive queries but require storage and refresh time.
Q5: How do you optimize Looker for large datasets?
Answer: 1) Use aggregate tables for dashboards, 2) Implement PDTs for expensive queries, 3) Use always_filter to prevent full-table scans, 4) Optimize LookML joins, 5) Use BigQuery materialized views, 6) Implement caching.