Semantic Layer Concepts

A glossary of Atlas-specific terms and how the semantic layer pieces fit together.

The semantic layer is how you teach Atlas about your data — a set of YAML files in semantic/ that describe your tables, columns, relationships, business terms, and key metrics. The agent reads these files before writing any SQL. This page defines the key terms and explains how the pieces connect.

For the full YAML field reference and generation commands, see Semantic Layer.

Glossary

Entity

A YAML file that describes one database table or view. Each entity lives in semantic/entities/ and tells the agent what columns exist, what they mean, how to join to other tables, and what questions this table can answer.

Entities have a type that reflects their role in the semantic layer:

Type	Purpose	Example
`dimension_table`	Lookup or reference data — attributes you group and filter by	Customers, Products, Regions
`fact_table`	Events or transactions — things you count and aggregate	Orders, Logins, Payments
`view`	A database view — simplified profile with no PKs, FKs, or measures	`monthly_summary`
`materialized_view`	A PostgreSQL materialized view — treated like a view but refreshable	`daily_stats`

atlas init assigns fact_table to all non-view tables by default. Set dimension_table manually for lookup/reference tables, or use --enrich to let the LLM classify them.

Grain

A short phrase on an entity that describes what one row represents. Grain helps the agent avoid double-counting and understand cardinality.

grain: one row per subscription account

If the grain is "one row per order line item" and a user asks "how many orders?", the agent knows it needs COUNT(DISTINCT order_id), not COUNT(*).

Dimension

A column or expression on an entity that users can filter, group, or select. Dimensions describe attributes of the data — who, what, where, when. Each dimension has a name, sql expression, type, and description.

dimensions:
  - name: plan
    sql: plan
    type: string
    description: Subscription tier (Free, Starter, Pro, Enterprise)
    sample_values: [Enterprise, Pro, Starter, Free]

A dimension marked primary_key: true identifies the unique row. A dimension marked virtual: true is computed from an expression, not a raw database column (see Virtual dimension).

Supported types: string, number, integer, real, numeric, date, boolean, timestamp, text

Virtual dimension

A computed dimension defined by a SQL expression — typically a CASE statement, date extraction, or formula. Virtual dimensions don't map to a single raw column; they're derived at query time.

dimensions:
  - name: monthly_value_bucket
    sql: |-
      CASE
        WHEN monthly_value = 0 THEN 'Free'
        WHEN monthly_value < 100 THEN 'Low'
        WHEN monthly_value < 1000 THEN 'Mid'
        ELSE 'High'
      END
    type: string
    description: MRR tier bucket — Free / Low / Mid / High
    virtual: true
    sample_values: [Free, Low, Mid, High]

Use virtual dimensions for common groupings (revenue tiers, date buckets, status categories) that users ask about frequently but don't exist as raw columns.

Virtual dimensions can also be defined in a separate virtual_dimensions section of the entity YAML. Both approaches are functionally identical — the agent reads the raw YAML either way. The demo dataset and atlas init both use the inline virtual: true style.

Measure

A pre-defined aggregation on an entity. Measures tell the agent how to summarize a column — count it, sum it, average it. Unlike dimensions (which describe individual rows), measures describe aggregate values across rows.

measures:
  - name: total_mrr
    sql: monthly_value
    type: sum
    description: Total Monthly Recurring Revenue

Aggregation types: count_distinct, sum, avg, min, max

Without measures, the agent might SUM a column that should be counted, or AVG one that should be summed.

Join

A relationship between two entities, defined by foreign key columns and cardinality. Joins tell the agent how to combine tables correctly.

joins:
  - target_entity: Companies
    relationship: many_to_one
    join_columns:
      from: company_id
      to: id
    description: Each account belongs to one company (customer)

Relationship types: many_to_one, one_to_many, one_to_one, many_to_many

When a question spans multiple entities (e.g., "MRR by industry" requires joining accounts to companies), the agent uses these definitions to generate the correct JOIN clause.

Cross-source join

A join between entities in different datasources. Used in multi-datasource setups where tables live in separate databases.

cross_source_joins:
  - source: warehouse
    target_table: events
    on: "orders.id = events.order_id"
    relationship: one_to_many
    description: Join orders to event stream in warehouse

Cross-source joins require a multi-datasource configuration.

Query pattern

A pre-written SQL query for a common question. When a user asks something that matches a query pattern, the agent can use it directly instead of generating SQL from scratch.

query_patterns:
  - name: churn_rate_by_plan
    description: Churn rate by plan type
    sql: |-
      SELECT plan,
             COUNT(*) FILTER (WHERE status = 'Churned') AS churned,
             COUNT(*) AS total,
             ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'Churned') / COUNT(*), 1) AS churn_pct
      FROM accounts
      GROUP BY plan
      ORDER BY churn_pct DESC

Query patterns are especially valuable for complex queries involving window functions, CTEs, or non-obvious filter conditions that the agent might get wrong.

Sample values

An array of example values on a dimension. Sample values help the agent understand the domain — what a column actually contains — so it can write correct WHERE clauses without guessing.

- name: status
  sql: status
  type: string
  description: Current billing status
  sample_values: [Active, Churned, Inactive, Suspended]

Without sample values, the agent might filter on WHERE status = 'active' (lowercase) when the actual data uses 'Active' (title case). Sample values eliminate this class of error.

Metric

An authoritative, named business KPI defined in semantic/metrics/*.yml. Unlike measures (which are building blocks on a single entity), metrics are standalone definitions with exact SQL that the agent uses as-is.

metrics:
  - id: churn_rate
    label: Churn Rate
    description: Percentage of all accounts that have churned
    type: derived
    unit: percent
    sql: |-
      SELECT ROUND(100.0 * COUNT(*) FILTER (WHERE status = 'Churned') / COUNT(*), 1) AS churn_rate
      FROM accounts
    aggregation: ratio
    objective: minimize

Metric types:

Type	Description	Example
`atomic`	Base metric directly from data	Total MRR, Active Accounts
`derived`	Calculated from an expression or ratio	Churn Rate, ARPA
`breakdown`	A metric split by one or more dimensions	MRR by Plan, Churn by Region

Metric SQL is authoritative — the agent executes it verbatim. If a metric's SQL is wrong, the agent will produce wrong results. Treat metric files as source code.

Glossary term

A business term defined in semantic/glossary.yml. Glossary terms resolve ambiguity — when the same word means different things in different tables, the glossary tells the agent to ask for clarification.

Each term is marked defined (unambiguous) or ambiguous (needs clarification):

terms:
  MRR:
    status: defined
    definition: >
      Monthly Recurring Revenue. The sum of monthly_value across all active accounts.
    tables:
      - accounts

  revenue:
    status: ambiguous
    note: >
      Could mean company revenue (companies.revenue — annual total revenue)
      or account revenue (accounts.monthly_value — our MRR). Clarify which.
    possible_mappings:
      - companies.revenue
      - accounts.monthly_value

When a user asks about "revenue" and the glossary marks it as ambiguous, the agent asks "Do you mean company revenue or account MRR?" rather than guessing.

Catalog

An index file (semantic/catalog.yml) that gives the agent a high-level map of the entire semantic layer — every entity's purpose, use cases, and common questions. The agent reads the catalog first to decide which entity YAML files are relevant to a question.

version: "1.0"
name: Acme Analytics
description: Sales and customer analytics
entities:
  - name: Accounts
    file: entities/accounts.yml
    grain: one row per subscription account
    description: Subscription accounts with plan, status, and MRR
    use_for:
      - MRR and revenue analysis
      - Churn and retention analysis
    common_questions:
      - What is total MRR by plan type?
      - How many accounts are churned vs active?
glossary: glossary.yml
metrics:
  - file: metrics/accounts.yml
    description: Subscription and revenue metrics

How the pieces fit together

The semantic layer isn't just documentation — it's the runtime context the agent uses to answer questions. Here's the flow:

User asks a question
       ↓
1. Agent reads catalog.yml
   → Finds which entities are relevant (by use_for, common_questions)
       ↓
2. Agent reads entity YAML files
   → Learns columns, types, sample values, joins, query patterns
       ↓
3. Agent checks glossary.yml
   → If any term is ambiguous, asks the user for clarification
       ↓
4. Agent checks metrics/*.yml
   → If the question matches a defined metric, uses that SQL exactly
       ↓
5. Agent writes SQL
   → Uses dimensions, measures, joins, and virtual dimensions from the entity
   → Query is validated against the table whitelist (only entity tables allowed)
       ↓
6. SQL passes through validation pipeline
   → See SQL Validation for details
       ↓
7. Results returned to user

Each layer of the semantic layer serves a distinct purpose:

File	The agent uses it to...
`catalog.yml`	Decide which entities to read for a given question
`entities/*.yml`	Understand columns, types, relationships, and pre-built queries
`glossary.yml`	Resolve ambiguous terms before writing SQL
`metrics/*.yml`	Execute authoritative SQL for defined KPIs

Annotated entity YAML

Here's an annotated entity file based on the Atlas demo dataset (semantic/entities/accounts.yml), showing how the concepts above appear in practice. Some fields are simplified for clarity — see the Semantic Layer reference for the full field reference.

name: Accounts                         # Display name
type: fact_table                        # dimension_table | fact_table | view | materialized_view
table: accounts                         # Database table (can be schema-qualified: analytics.accounts)
grain: one row per subscription account # What one row represents
description: |                          # Multi-line context for the agent
  Subscription accounts with plan tier, billing status, and MRR contribution.

dimensions:
  - name: id                           # Primary key
    sql: id
    type: number
    description: Unique account identifier
    primary_key: true

  - name: company_id                   # Foreign key (used by joins below)
    sql: company_id
    type: number
    description: Which customer owns this account

  - name: plan                         # String with sample_values for correct WHERE clauses
    sql: plan
    type: string
    description: Subscription tier (Free, Starter, Pro, Enterprise)
    sample_values: [Enterprise, Pro, Starter, Free]

  - name: status                       # Ambiguous term — glossary flags for clarification
    sql: status
    type: string
    description: Billing status (Active, Inactive, Suspended, Churned)
    sample_values: [Active, Churned, Inactive, Suspended]

  - name: monthly_value                # Numeric — no sample_values needed
    sql: monthly_value
    type: number
    description: Monthly Recurring Revenue (MRR) in dollars

  - name: monthly_value_bucket         # Virtual dimension — CASE expression
    sql: |-
      CASE
        WHEN monthly_value = 0 THEN 'Free'
        WHEN monthly_value < 100 THEN 'Low'
        WHEN monthly_value < 1000 THEN 'Mid'
        ELSE 'High'
      END
    type: string
    description: MRR tier bucket
    virtual: true
    sample_values: [Free, Low, Mid, High]

  - name: contract_start_year          # Virtual dimension — date extraction
    sql: EXTRACT(YEAR FROM contract_start)
    type: number
    description: Contract start year for cohort analysis
    virtual: true

measures:
  - name: total_mrr                    # SUM aggregation
    sql: monthly_value
    type: sum
    description: Total Monthly Recurring Revenue

  - name: account_count                # COUNT_DISTINCT aggregation
    sql: id
    type: count_distinct
    description: Number of unique accounts

joins:
  - target_entity: Companies           # many_to_one | one_to_many | one_to_one | many_to_many
    relationship: many_to_one
    join_columns:
      from: company_id                 # Local FK
      to: id                           # Target PK

use_cases:                             # Read by the catalog for routing
  - MRR analysis by plan, status, and time period
  - Churn and retention analysis

query_patterns:
  - name: mrr_by_plan                  # Pre-written SQL for a common question
    description: Total MRR by plan tier
    sql: |-
      SELECT plan, SUM(monthly_value) AS total_mrr, COUNT(*) AS account_count
      FROM accounts WHERE status = 'Active'
      GROUP BY plan ORDER BY total_mrr DESC

Comparison with other tools

If you've used other semantic layer or metrics tools, here's how Atlas terminology maps:

Concept	Atlas	Cube	dbt	Looker (LookML)
Table description file	Entity YAML	Cube (model file)	Model (`.sql` + `.yml`)	View / Explore
Column / attribute	Dimension	Dimension	Column (in `.yml`)	Dimension
Computed column	Virtual dimension	Dimension (with `sql`)	SQL expression in model	Dimension (with `sql`)
Aggregation	Measure	Measure	Metric (semantic layer)	Measure
Standalone KPI	Metric	— (use measures)	Metric (MetricFlow)	— (use measures)
Table relationship	Join	Join	Relationship (`.yml`)	Explore join
Pre-written query	Query pattern	—	—	—
Business term glossary	Glossary (`glossary.yml`)	—	—	—
Entity index	Catalog (`catalog.yml`)	—	—	—
Example values	Sample values	—	—	—
Row granularity	Grain	— (implied)	Grain (in models)	—

Atlas's glossary, query patterns, sample values, and catalog don't have direct equivalents in other tools. They exist because Atlas is agent-first — an LLM needs more context than a BI tool to write correct SQL. Sample values prevent case-sensitivity errors, query patterns encode tribal knowledge, and the glossary resolves ambiguity that a human analyst would handle intuitively.

Semantic Layer reference — full YAML field reference and generation commands
Quick Start — get Atlas running and generate your first semantic layer
SQL Validation — how the semantic layer powers the table whitelist
Schema Evolution — detecting and resolving drift between your database and semantic layer
CLI Reference — atlas init, atlas diff, and other semantic layer commands

Semantic Layer Concepts

On this page