Extended Blending Groups
Use extended blending groups to treat the same logical dimension across multiple tables as one for querying.
Overview
When the same business dimension (e.g. Order Date, Ship Date, Activity Date) exists in several tables with different column names or grains, you can assign a shared extended_blend_group on each. The planner can then blend these dimensions so that a single query can group or filter by that concept across tables.
For the basic YAML setup, see Extended blending.
When to Use Extended Blending
- Cross-table date alignment — e.g. “Order Date” in
ordersand “Order Date” inshipmentsorreturnsso users can filter by one order-date concept. - Shared attributes in a star or snowflake — The same dimension name and meaning in multiple fact tables; you want one blended dimension instead of one per table.
- Multi-fact reporting — Dashboards or queries that slice by a common dimension (date, customer, product) across several fact tables.
Setting Up extended_blend_group
Use the same extended_blend_group value on each dimension that should blend:
# tbl.orders.yml
fields:
- type: dimension
name: Order Date
data_type: date
extended_blend_group: order_date_blend
grains:
- day
- week
- month
- quarter
- year
expression:
sql: order_date
# tbl.shipments.yml
fields:
- type: dimension
name: Order Date
data_type: date
extended_blend_group: order_date_blend
grains:
- day
- week
- month
- quarter
- year
expression:
sql: ship_date
The string is case-insensitive. Only dimensions in the same group blend together.
Example: Multiple Fact Tables
Sales, returns, and inventory snapshots each have a date. Define one blend group for “activity date”:
# tbl.sales.yml
fields:
- type: dimension
name: Activity Date
data_type: date
extended_blend_group: activity_date
grains:
- day
- week
- month
- quarter
- year
expression:
sql: sale_date
# tbl.returns.yml
fields:
- type: dimension
name: Activity Date
data_type: date
extended_blend_group: activity_date
grains:
- day
- week
- month
- quarter
- year
expression:
sql: return_date
# tbl.inventory.yml
fields:
- type: dimension
name: Activity Date
data_type: date
extended_blend_group: activity_date
grains:
- day
- week
- month
- quarter
- year
expression:
sql: snapshot_date
Queries that use “Activity Date” can then apply it across sales, returns, and inventory according to the relationships and paths in the model.
Requirements and Best Practices
- Same
data_type— All dimensions in a group should have the same type (e.g. alldateor alldate_time). - Compatible grains — For
date/date_time, use grains that make sense for all members (e.g. day, week, month, quarter, year). - Same universe/branch — Blending applies within a branch; dimensions in different branches do not blend.
- Meaningful group names — Use a group name that reflects the business concept (e.g.
order_date_blend,activity_date). - Avoid over-blending — Only put dimensions in a group when they truly represent the same concept; mixing different semantics can produce confusing results.
How automatic data blending works (developer view)
Extended blend groups are one part of Strata’s automatic data blending system. In industry terms, this is sometimes called data blending or multipass SQL; Strata uses the terms automatic data blending and extended blend groups. At a high level:
- There is a single logical key for a concept like “activity date” in the semantic layer.
- Each fact table exposes one or more physical columns that implement that key, and you tie those columns together using a shared
extended_blend_group. - At query time, the planner knows which physical column to use for each fact, and can safely aggregate and merge results on the logical key.
When a query asks for measures from multiple facts that share a blend group:
- Universe selection picks the relevant universes/facts.
- For each fact/universe, the planner builds a node in a query graph that aggregates that fact to the lowest common dimensionality (for example, by the blended date and any other shared dimensions).
- The planner then adds a final merge node that joins these aggregate nodes on the blended dimensions.
The generated SQL typically uses Common Table Expressions (CTEs) to represent this graph:
WITH
store_sales_agg AS (
SELECT activity_date, SUM(store_amount) AS store_revenue
FROM store_sales
GROUP BY activity_date
),
web_sales_agg AS (
SELECT activity_date, SUM(web_amount) AS web_revenue
FROM web_sales
GROUP BY activity_date
),
final AS (
SELECT
COALESCE(s.activity_date, w.activity_date) AS activity_date,
store_revenue,
web_revenue
FROM store_sales_agg s
FULL OUTER JOIN web_sales_agg w
ON s.activity_date = w.activity_date
)
SELECT *
FROM final;
You do not need to write this SQL yourself—the semantic model (relationships, grains, and extended_blend_group values) gives the planner enough information to construct it automatically.
Relation to Relationships and Paths
Blending does not replace relationships. Tables still need proper one_to_many, many_to_one, or one_to_one relationships. Extended blend groups tell the planner how to treat dimensions that live in different tables as one for grouping and filtering when paths exist.
Next Steps
- Extended blending — YAML and
extended_blend_groupon dimensions - Dimensions
- Relationships