TPC-DS Tutorial
Complete walkthrough of the TPC-DS benchmark semantic model.
Overview
The TPC-DS Tutorial is a nearly complete model of the TPC-DS benchmark system. It demonstrates best practices for modeling a complex data warehouse in Strata.
Project Structure
tpcds-tutorial/
├── project.yml # Project configuration
├── datasources.yml # Datasource connection
├── models/ # Semantic model files
│ ├── catalog/ # Catalog channel models
│ ├── common/ # Shared dimension tables
│ ├── inventory/ # Inventory models
│ └── store/ # Store channel models
├── migrations/ # Schema migrations
└── tests/ # Query validation tests
Key Components
Datasource Configuration
tpcds:
adapter: postgres
name: POSTGRES
description: Postgres datasource
tier: warm
query_timeout: 3600
host: localhost
port: 5432
database: tpcds
schema: public
ssl: false
username: tpcds
Table Models
Fact Tables
Store Sales (models/store/tbl.store_sales.yml):
name: Store Sales
physical_name: store_sales
datasource: tpcds
cost: 100 # Fact table: higher cost
fields:
- type: dimension
name: Store Ticket number
data_type: integer
expression:
primary_key: true
sql: ss_ticket_number
- type: measure
name: Store Quantity
data_type: integer
expression:
sql: sum(ss_quantity)
- type: measure
name: Store Sales Price
data_type: decimal
expression:
sql: sum(ss_sales_price)
Dimension Tables
Store (models/store/tbl.store.yml):
name: Store
physical_name: store
datasource: tpcds
cost: 10 # Dimension table: lower cost
fields:
- type: dimension
name: Store Name
data_type: string
expression:
sql: s_store_name
Relationships
Store Relationships (models/store/rel.store.yml):
datasource: tpcds
# Store Sales to Date Dimension
store_sales_sold_date:
left: Store Sales
right: Date
sql: left.ss_sold_date_sk = right.d_date_sk
cardinality: many_to_one
# Store Sales to Customer Dimension
store_sales_customer:
left: Store Sales
right: Customer
sql: left.ss_customer_sk = right.c_customer_sk
cardinality: many_to_one
# Store Sales to Store Dimension
store_sales_store:
left: Store Sales
right: Store
sql: left.ss_store_sk = right.s_store_sk
cardinality: many_to_one
Key Patterns
1. Cost Configuration
- Dimension tables:
cost: 10(lower cost = preferred) - Fact tables:
cost: 100(higher cost = used when needed)
2. Primary Keys
Mark unique identifiers as primary keys:
- type: dimension
name: Store Ticket number
expression:
primary_key: true
sql: ss_ticket_number
3. Measure Aggregation
All measures include aggregation:
- type: measure
name: Store Quantity
expression:
sql: sum(ss_quantity)
4. Relationship Cardinality
Most relationships are many_to_one:
store_sales_store:
left: Store Sales
right: Store
sql: left.ss_store_sk = right.s_store_sk
cardinality: many_to_one
Testing
Test files validate SQL generation:
name: Store Sales Positive Values
projections:
- Store Sales Price
- Store Quantity
assert_sql: "SELECT sum(T0.\"ss_quantity\") AS \"Store Quantity\", sum(T0.\"ss_sales_price\") AS \"Store Sales Price\" FROM store_sales T0"
Migrations
Example rename migration:
- type: rename
hook: pre
entity: dimension
from: Store Name
to: Store Location Name
Lessons Learned
- Organize by domain - Group related tables (catalog, store, common)
- Use appropriate costs - Lower for dimensions, higher for facts
- Mark primary keys - Helps with query optimization
- Define relationships - Enable cross-table queries
- Test queries - Validate SQL generation