Semantic Layer Concepts
A glossary of Atlas-specific terms and how the semantic layer pieces fit together.
The semantic layer is how you teach Atlas about your data — a set of YAML files in semantic/ that describe your tables, columns, relationships, business terms, and key metrics. The agent reads these files before writing any SQL. This page defines the key terms and explains how the pieces connect.
For the full YAML field reference and generation commands, see Semantic Layer.
Glossary
Entity
A YAML file that describes one database table or view. Each entity lives in semantic/entities/ and tells the agent what columns exist, what they mean, how to join to other tables, and what questions this table can answer.
Entities have a type that reflects their role in the semantic layer:
| Type | Purpose | Example |
|---|---|---|
dimension_table | Lookup or reference data — attributes you group and filter by | Customers, Products, Regions |
fact_table | Events or transactions — things you count and aggregate | Orders, Logins, Payments |
view | A database view — simplified profile with no PKs, FKs, or measures | monthly_summary |
materialized_view | A PostgreSQL materialized view — treated like a view but refreshable | daily_stats |
atlas init assigns fact_table to all non-view tables by default. Set dimension_table manually for lookup/reference tables, or use --enrich to let the LLM classify them.
Grain
A short phrase on an entity that describes what one row represents. Grain helps the agent avoid double-counting and understand cardinality.
grain: one row per subscription accountIf the grain is "one row per order line item" and a user asks "how many orders?", the agent knows it needs COUNT(DISTINCT order_id), not COUNT(*).
Dimension
A column or expression on an entity that users can filter, group, or select. Dimensions describe attributes of the data — who, what, where, when. Each dimension has a name, sql expression, type, and description.
dimensions:
- name: plan
sql: plan
type: string
description: Subscription tier (Free, Starter, Pro, Enterprise)
sample_values: [Enterprise, Pro, Starter, Free]A dimension marked primary_key: true identifies the unique row. A dimension marked virtual: true is computed from an expression, not a raw database column (see Virtual dimension).
Supported types: string, number, integer, real, numeric, date, boolean, timestamp, text
Virtual dimension
A computed dimension defined by a SQL expression — typically a CASE statement, date extraction, or formula. Virtual dimensions don't map to a single raw column; they're derived at query time.
dimensions:
- name: monthly_value_bucket
sql: |-
CASE
WHEN monthly_value = 0 THEN 'Free'
WHEN monthly_value < 100 THEN 'Low'
WHEN monthly_value < 1000 THEN 'Mid'
ELSE 'High'
END
type: string
description: MRR tier bucket — Free / Low / Mid / High
virtual: true
sample_values: [Free, Low, Mid, High]Use virtual dimensions for common groupings (revenue tiers, date buckets, status categories) that users ask about frequently but don't exist as raw columns.
Virtual dimensions can also be defined in a separate virtual_dimensions section of the entity YAML. Both approaches are functionally identical — the agent reads the raw YAML either way. The demo dataset and atlas init both use the inline virtual: true style.
Measure
A pre-defined aggregation on an entity. Measures tell the agent how to summarize a column — count it, sum it, average it. Unlike dimensions (which describe individual rows), measures describe aggregate values across rows.
measures:
- name: total_mrr
sql: monthly_value
type: sum
description: Total Monthly Recurring RevenueAggregation types: count_distinct, sum, avg, min, max
Without measures, the agent might SUM a column that should be counted, or AVG one that should be summed.
Join
A relationship between two entities, defined by foreign key columns and cardinality. Joins tell the agent how to combine tables correctly.
joins:
- target_entity: Companies
relationship: many_to_one
join_columns:
from: company_id
to: id
description: Each account belongs to one company (customer)Relationship types: many_to_one, one_to_many, one_to_one, many_to_many
When a question spans multiple entities (e.g., "MRR by industry" requires joining accounts to companies), the agent uses these definitions to generate the correct JOIN clause.
Cross-source join
A join between entities in different datasources. Used in multi-datasource setups where tables live in separate databases.
cross_source_joins:
- source: warehouse
target_table: events
on: "orders.id = events.order_id"
relationship: one_to_many
description: Join orders to event stream in warehouseCross-source joins require a multi-datasource configuration.
Query pattern
A pre-written SQL query for a common question. When a user asks something that matches a query pattern, the agent can use it directly instead of generating SQL from scratch.
query_patterns:
- name: churn_rate_by_plan
description: Churn rate by plan type
sql: |-
SELECT plan,
COUNT(*) FILTER (WHERE status = 'Churned') AS churned,
COUNT(*) AS total,
ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'Churned') / COUNT(*), 1) AS churn_pct
FROM accounts
GROUP BY plan
ORDER BY churn_pct DESCQuery patterns are especially valuable for complex queries involving window functions, CTEs, or non-obvious filter conditions that the agent might get wrong.
Sample values
An array of example values on a dimension. Sample values help the agent understand the domain — what a column actually contains — so it can write correct WHERE clauses without guessing.
- name: status
sql: status
type: string
description: Current billing status
sample_values: [Active, Churned, Inactive, Suspended]Without sample values, the agent might filter on WHERE status = 'active' (lowercase) when the actual data uses 'Active' (title case). Sample values eliminate this class of error.
Metric
An authoritative, named business KPI defined in semantic/metrics/*.yml. Unlike measures (which are building blocks on a single entity), metrics are standalone definitions with exact SQL that the agent uses as-is.
metrics:
- id: churn_rate
label: Churn Rate
description: Percentage of all accounts that have churned
type: derived
unit: percent
sql: |-
SELECT ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'Churned') / COUNT(*), 1) AS churn_rate
FROM accounts
aggregation: ratio
objective: minimizeMetric types:
| Type | Description | Example |
|---|---|---|
atomic | Base metric directly from data | Total MRR, Active Accounts |
derived | Calculated from an expression or ratio | Churn Rate, ARPA |
breakdown | A metric split by one or more dimensions | MRR by Plan, Churn by Region |
Metric SQL is authoritative — the agent executes it verbatim. If a metric's SQL is wrong, the agent will produce wrong results. Treat metric files as source code.
Glossary term
A business term defined in semantic/glossary.yml. Glossary terms resolve ambiguity — when the same word means different things in different tables, the glossary tells the agent to ask for clarification.
Each term is marked defined (unambiguous) or ambiguous (needs clarification):
terms:
MRR:
status: defined
definition: >
Monthly Recurring Revenue. The sum of monthly_value across all active accounts.
tables:
- accounts
revenue:
status: ambiguous
note: >
Could mean company revenue (companies.revenue — annual total revenue)
or account revenue (accounts.monthly_value — our MRR). Clarify which.
possible_mappings:
- companies.revenue
- accounts.monthly_valueWhen a user asks about "revenue" and the glossary marks it as ambiguous, the agent asks "Do you mean company revenue or account MRR?" rather than guessing.
Catalog
An index file (semantic/catalog.yml) that gives the agent a high-level map of the entire semantic layer — every entity's purpose, use cases, and common questions. The agent reads the catalog first to decide which entity YAML files are relevant to a question.
version: "1.0"
name: Acme Analytics
description: Sales and customer analytics
entities:
- name: Accounts
file: entities/accounts.yml
grain: one row per subscription account
description: Subscription accounts with plan, status, and MRR
use_for:
- MRR and revenue analysis
- Churn and retention analysis
common_questions:
- What is total MRR by plan type?
- How many accounts are churned vs active?
glossary: glossary.yml
metrics:
- file: metrics/accounts.yml
description: Subscription and revenue metricsHow the pieces fit together
The semantic layer isn't just documentation — it's the runtime context the agent uses to answer questions. Here's the flow:
User asks a question
↓
1. Agent reads catalog.yml
→ Finds which entities are relevant (by use_for, common_questions)
↓
2. Agent reads entity YAML files
→ Learns columns, types, sample values, joins, query patterns
↓
3. Agent checks glossary.yml
→ If any term is ambiguous, asks the user for clarification
↓
4. Agent checks metrics/*.yml
→ If the question matches a defined metric, uses that SQL exactly
↓
5. Agent writes SQL
→ Uses dimensions, measures, joins, and virtual dimensions from the entity
→ Query is validated against the table whitelist (only entity tables allowed)
↓
6. SQL passes through validation pipeline
→ See SQL Validation for details
↓
7. Results returned to userEach layer of the semantic layer serves a distinct purpose:
| File | The agent uses it to... |
|---|---|
catalog.yml | Decide which entities to read for a given question |
entities/*.yml | Understand columns, types, relationships, and pre-built queries |
glossary.yml | Resolve ambiguous terms before writing SQL |
metrics/*.yml | Execute authoritative SQL for defined KPIs |
Annotated entity YAML
Here's an annotated entity file based on the Atlas demo dataset (semantic/entities/accounts.yml), showing how the concepts above appear in practice. Some fields are simplified for clarity — see the Semantic Layer reference for the full field reference.
name: Accounts # Display name
type: fact_table # dimension_table | fact_table | view | materialized_view
table: accounts # Database table (can be schema-qualified: analytics.accounts)
grain: one row per subscription account # What one row represents
description: | # Multi-line context for the agent
Subscription accounts with plan tier, billing status, and MRR contribution.
dimensions:
- name: id # Primary key
sql: id
type: number
description: Unique account identifier
primary_key: true
- name: company_id # Foreign key (used by joins below)
sql: company_id
type: number
description: Which customer owns this account
- name: plan # String with sample_values for correct WHERE clauses
sql: plan
type: string
description: Subscription tier (Free, Starter, Pro, Enterprise)
sample_values: [Enterprise, Pro, Starter, Free]
- name: status # Ambiguous term — glossary flags for clarification
sql: status
type: string
description: Billing status (Active, Inactive, Suspended, Churned)
sample_values: [Active, Churned, Inactive, Suspended]
- name: monthly_value # Numeric — no sample_values needed
sql: monthly_value
type: number
description: Monthly Recurring Revenue (MRR) in dollars
- name: monthly_value_bucket # Virtual dimension — CASE expression
sql: |-
CASE
WHEN monthly_value = 0 THEN 'Free'
WHEN monthly_value < 100 THEN 'Low'
WHEN monthly_value < 1000 THEN 'Mid'
ELSE 'High'
END
type: string
description: MRR tier bucket
virtual: true
sample_values: [Free, Low, Mid, High]
- name: contract_start_year # Virtual dimension — date extraction
sql: EXTRACT(YEAR FROM contract_start)
type: number
description: Contract start year for cohort analysis
virtual: true
measures:
- name: total_mrr # SUM aggregation
sql: monthly_value
type: sum
description: Total Monthly Recurring Revenue
- name: account_count # COUNT_DISTINCT aggregation
sql: id
type: count_distinct
description: Number of unique accounts
joins:
- target_entity: Companies # many_to_one | one_to_many | one_to_one | many_to_many
relationship: many_to_one
join_columns:
from: company_id # Local FK
to: id # Target PK
use_cases: # Read by the catalog for routing
- MRR analysis by plan, status, and time period
- Churn and retention analysis
query_patterns:
- name: mrr_by_plan # Pre-written SQL for a common question
description: Total MRR by plan tier
sql: |-
SELECT plan, SUM(monthly_value) AS total_mrr, COUNT(*) AS account_count
FROM accounts WHERE status = 'Active'
GROUP BY plan ORDER BY total_mrr DESCComparison with other tools
If you've used other semantic layer or metrics tools, here's how Atlas terminology maps:
| Concept | Atlas | Cube | dbt | Looker (LookML) |
|---|---|---|---|---|
| Table description file | Entity YAML | Cube (model file) | Model (.sql + .yml) | View / Explore |
| Column / attribute | Dimension | Dimension | Column (in .yml) | Dimension |
| Computed column | Virtual dimension | Dimension (with sql) | SQL expression in model | Dimension (with sql) |
| Aggregation | Measure | Measure | Metric (semantic layer) | Measure |
| Standalone KPI | Metric | — (use measures) | Metric (MetricFlow) | — (use measures) |
| Table relationship | Join | Join | Relationship (.yml) | Explore join |
| Pre-written query | Query pattern | — | — | — |
| Business term glossary | Glossary (glossary.yml) | — | — | — |
| Entity index | Catalog (catalog.yml) | — | — | — |
| Example values | Sample values | — | — | — |
| Row granularity | Grain | — (implied) | Grain (in models) | — |
Atlas's glossary, query patterns, sample values, and catalog don't have direct equivalents in other tools. They exist because Atlas is agent-first — an LLM needs more context than a BI tool to write correct SQL. Sample values prevent case-sensitivity errors, query patterns encode tribal knowledge, and the glossary resolves ambiguity that a human analyst would handle intuitively.
Related
- Semantic Layer reference — full YAML field reference and generation commands
- Quick Start — get Atlas running and generate your first semantic layer
- SQL Validation — how the semantic layer powers the table whitelist
- Schema Evolution — detecting and resolving drift between your database and semantic layer
- CLI Reference —
atlas init,atlas diff, and other semantic layer commands