Demo Datasets

Self-hosted only

Demo datasets are for self-hosted local development and evaluation. On app.useatlas.dev, the onboarding wizard offers a demo dataset option when you create your workspace — no CLI needed.

Atlas ships with three demo datasets for evaluation and development. Each targets a different use case — pick the one that matches your goal.

Quick Comparison

	Simple	Cybersec	E-commerce
Company	—	Sentinel Security	NovaMart
Domain	CRM (companies, people, accounts)	B2B cybersecurity SaaS	DTC home goods brand + marketplace
Tables	3	62	52
Rows	~330	~500K	~480K
Database	Postgres only	Postgres only	Postgres only
Tech debt patterns	None	4 patterns	4 patterns
Schema evolution	No	No	5 instances (old + new columns coexist)
Best for	Quick start, tutorials	Realistic evaluation, profiler testing	Universally understood domain, production-scale evaluation
Load time	~5 seconds	~30 seconds	~30 seconds
CLI flag	`--demo` or `--demo simple`	`--demo cybersec`	`--demo ecommerce`

Which dataset should I choose?

Use Simple when you want a fast setup — three clean tables, no ambiguity, perfect for tutorials and first-run evaluation.

Use Cybersec when you want to see how Atlas handles a real-world B2B SaaS database with messy data. It includes missing FK constraints, abandoned tables, inconsistent enums, and denormalized reporting tables, which makes it a good fit for testing the profiler and evaluating agent reasoning on complex schemas.

Use E-commerce when you want a universally understood domain (orders, products, customers) at production scale. It includes the same four tech debt patterns as cybersec, plus five schema evolution artifacts where old and new columns coexist, which makes it a good fit for demos to non-technical stakeholders who already understand retail data.

Simple Demo (default)

Three clean tables: companies (50), people (~200), accounts (80). No tech debt, no ambiguity.

Note: Bare `--demo` (with no argument) defaults to simple. If you already ran `bun run db:up`, the simple demo data is already seeded, so just run `bun run atlas -- init` (without `--demo`) to profile it. Use the `--demo` flag when you have not run `db:up` or want to explicitly re-seed.

# Option A: Using db:up (already seeds simple demo)
bun run db:up
bun run atlas -- init

# Option B: Explicit seed (without db:up, or to re-seed)
bun run atlas -- init --demo

Simple demo questions

Try these in the Atlas chat UI to exercise different patterns:

Aggregation:

"How many companies are there by industry?"
"Which industries have the most accounts?"

Joins:

"Who are the top 5 people by account value?"
"Show me all people at companies in the Technology industry"

Filtering:

"List all companies with more than 3 accounts"
"Which people are associated with accounts created in the last year?"

Cybersec Demo: Sentinel Security

A 62-table B2B cybersecurity SaaS company database. ~500K rows spanning 2019-2025. Covers vulnerability management, threat detection, compliance, billing, and reporting.

Loading the cybersec dataset

Requires PostgreSQL (uses GENERATE_SERIES for data generation). Note that `bun run db:up` seeds only the simple demo; `--demo cybersec` then loads the cybersec dataset on top of it.

# Start local Postgres + sandbox sidecar (if not already running)
bun run db:up

# Load cybersec demo and generate semantic layer
bun run atlas -- init --demo cybersec

# Start Atlas (containers already running from db:up)
bun run dev

To reset and reload from scratch:

bun run db:reset
bun run atlas -- init --demo cybersec

Cybersec demo questions

Try these in the Atlas chat UI to exercise different patterns:

Basic aggregation:

"How many vulnerabilities by severity?"
"What's the total invoice amount by organization?"
"How many scans ran in the last 30 days?"

Joins:

"Which organizations have the most critical scan results?"
"Show me the top 10 users by number of alerts acknowledged"
"Which compliance frameworks have the most failing controls?"

Time series:

"What's the trend in critical vulnerabilities over the past 6 months?"
"Show me monthly invoice totals"

Aggregation + filtering:

"What's the average time to remediate by severity level?"
"Alert noise ratio: what percentage of alerts become incidents?"

Tech debt discovery (exercises profiler warnings):

"Break down organizations by industry" (surfaces enum inconsistency via profiler note)
"Show me scan results for assets that no longer exist" (orphan rows)
"What tables exist that look abandoned?" (agent reads profiler_notes)
"Compare scan_results_denormalized with scan_results" (denormalized flag)

Cybersec tech debt patterns

The cybersec dataset was designed to include four real-world tech debt patterns that the profiler detects automatically:

1. Missing FK Constraints

Eight *_id columns reference other tables but lack FOREIGN KEY constraints. The profiler infers these from naming conventions and marks them with inferred: true in the generated YAML.

Column	Should reference
`scan_results.asset_id`	`assets.id`
`scan_results.vulnerability_id`	`vulnerabilities.id`
`scan_results.scan_id`	`scans.id`
`agent_heartbeats.agent_id`	`agents.id`
`alerts.incident_id`	`incidents.id`
`api_requests.user_id`	`users.id`
`invoice_line_items.subscription_id`	`subscriptions.id`
`vulnerability_instances.scan_result_id`	`scan_results.id`
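
To make the idea of naming-convention inference concrete, here is a minimal Python sketch. It is not Atlas's actual profiler logic: the `pluralize` and `infer_fk_target` helpers, the pluralization rules, and the `tenant_id` column are all assumptions made for this illustration. The sketch strips the `_id` suffix, pluralizes the stem, and accepts the result only if a table with that name exists.

```python
from typing import Optional


def pluralize(stem: str) -> str:
    """Naive English pluralization; enough for the column names in the table above."""
    if len(stem) > 1 and stem.endswith("y") and stem[-2] not in "aeiou":
        return stem[:-1] + "ies"
    if stem.endswith(("s", "x", "ch", "sh")):
        return stem + "es"
    return stem + "s"


def infer_fk_target(column: str, tables: set[str]) -> Optional[str]:
    """Return 'table.id' when a *_id column name maps to a known table, else None."""
    if not column.endswith("_id"):
        return None
    candidate = pluralize(column[: -len("_id")])
    return f"{candidate}.id" if candidate in tables else None


# Table names taken from the "Should reference" column above.
tables = {"assets", "vulnerabilities", "scans", "agents", "incidents",
          "users", "subscriptions", "scan_results"}

# tenant_id is a hypothetical column with no matching table.
for col in ["asset_id", "vulnerability_id", "scan_result_id", "tenant_id"]:
    print(col, "->", infer_fk_target(col, tables))
# asset_id -> assets.id
# vulnerability_id -> vulnerabilities.id
# scan_result_id -> scan_results.id
# tenant_id -> None
```

A name-based heuristic like this can only suggest a relationship, which is presumably why inferred links are marked with inferred: true rather than treated as real constraints.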

2. Abandoned Tables

Six tables match legacy/temp naming patterns and have no inbound foreign keys:

old_scan_results_v2 -- abandoned schema migration
temp_asset_import_2024 -- one-time CSV import artifact
feature_flags_legacy -- replaced by LaunchDarkly
notifications_backup -- migration backup
user_sessions_archive -- old session system
legacy_risk_scores -- old risk scoring algorithm

The profiler flags these with possibly_abandoned and prepends a warning in use_cases.
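
The two conditions described above (a legacy-looking name and no inbound foreign keys) can be sketched as a small Python check. The regular expression and the `possibly_abandoned` function are hypothetical, chosen only so that the six tables listed above match. The patterns Atlas actually uses may differ.

```python
import re

# Hypothetical name markers; each must be a whole underscore-delimited word.
LEGACY_NAME = re.compile(r"(^|_)(old|temp|tmp|legacy|backup|archive)(_|$)")


def possibly_abandoned(table: str, inbound_fk_count: int) -> bool:
    """Flag a table only when its name looks legacy AND nothing references it."""
    return inbound_fk_count == 0 and LEGACY_NAME.search(table) is not None


# Table name -> number of inbound foreign keys (counts here are made up).
candidates = {
    "old_scan_results_v2": 0,
    "feature_flags_legacy": 0,
    "notifications_backup": 0,
    "scan_results": 0,   # ordinary name, so not flagged
    "legacy_plans": 2,   # hypothetical table that is still referenced
}

flagged = [name for name, fks in candidates.items() if possibly_abandoned(name, fks)]
print(flagged)
# ['old_scan_results_v2', 'feature_flags_legacy', 'notifications_backup']
```

Requiring both signals keeps a table such as the hypothetical `legacy_plans` out of the list while other tables still depend on it.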

3. Inconsistent Enums

Some text columns store the same value under several spellings that differ in case (and, for industry, in abbreviation):

organizations.industry: 'Technology', 'tech', 'Tech', 'TECHNOLOGY'
compliance_findings.status: 'pass', 'Pass', 'PASS'

The profiler detects these and adds LOWER() guidance in the glossary.
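
The effect of the LOWER() guidance can be seen in a small self-contained sketch. It uses Python's built-in sqlite3 module as a stand-in for Postgres so it runs without a database. The table here is a one-column toy version of organizations, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, industry TEXT)")
conn.executemany(
    "INSERT INTO organizations (industry) VALUES (?)",
    [("Technology",), ("tech",), ("Tech",), ("TECHNOLOGY",)],
)

# Raw grouping: every spelling becomes its own bucket.
raw = conn.execute(
    "SELECT industry, COUNT(*) FROM organizations GROUP BY industry ORDER BY industry"
).fetchall()

# Case-folded grouping: spellings that differ only by case are merged.
folded = conn.execute(
    "SELECT LOWER(industry) AS industry, COUNT(*) "
    "FROM organizations GROUP BY LOWER(industry) ORDER BY 1"
).fetchall()

print(len(raw))  # 4
print(folded)    # [('tech', 2), ('technology', 2)]
```

Note that case folding alone still leaves 'tech' and 'technology' as separate groups. Merging abbreviations would need an explicit mapping on top of LOWER().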

4. Denormalized Tables

Four reporting/cache tables duplicate data from other tables:

scan_results_denormalized -- pre-joined scan results
daily_scan_stats -- daily rollup
monthly_vulnerability_summary -- monthly aggregates
executive_dashboard_cache -- pre-computed dashboard data

The profiler flags these with possibly_denormalized.

Cybersec schema overview

Table groups:

Core Business (7 tables): organizations, users, teams, roles
Billing (6 tables): plans, subscriptions, invoices
Asset Management (6 tables): assets, agents, agent_heartbeats
Vulnerability Management (7 tables): vulnerabilities, scans, scan_results
Threat & Incident (6 tables): incidents, alerts
Threat Intelligence (3 tables): threat_feeds, IOCs, threat_actors
Compliance (4 tables): frameworks, controls, assessments, findings
Product Usage (5 tables): API keys, requests, feature usage, login events
Reporting (5 tables): denormalized/rollup tables
Reports & Dashboards (4 tables): saved reports, dashboards
Integration & Audit (3 tables): integrations, audit_log
Legacy (6 tables): abandoned tables

E-commerce Demo: NovaMart

A 52-table DTC (direct-to-consumer) home goods brand database. ~480K rows spanning 2020-2025. NovaMart was founded during the pandemic, started with bedding, expanded to kitchen/bath/outdoor, and launched a small marketplace in 2022.

Loading the e-commerce dataset

Requires PostgreSQL (uses GENERATE_SERIES for data generation). Note that `bun run db:up` seeds only the simple demo; `--demo ecommerce` then loads the e-commerce dataset on top of it.

# Start local Postgres + sandbox sidecar (if not already running)
bun run db:up

# Load e-commerce demo and generate semantic layer
bun run atlas -- init --demo ecommerce

# Start Atlas (containers already running from db:up)
bun run dev

To reset and reload from scratch:

bun run db:reset
bun run atlas -- init --demo ecommerce

E-commerce demo questions

Try these in the Atlas chat UI to exercise different patterns:

Sales & revenue:

"What's the monthly revenue trend since launch?"
"Top 10 products by total revenue"
"Average order value by customer segment"
"Revenue breakdown: own products vs marketplace"

Customer analytics:

"How many customers are in each loyalty tier?"
"What's the customer retention rate by cohort?"
"Breakdown of new vs returning customers per month"
"Average customer lifetime value by acquisition source"

Operations:

"Average delivery time by carrier"
"Return rate by product category"
"Top reasons for returns"
"Shipping cost per order over time"

Marketing:

"Which UTM sources drive the most revenue?"
"Email campaign conversion rates"
"Promo code usage rate by campaign"

Tech debt discovery (exercises profiler warnings):

"Why are there two price fields on products?" (schema evolution)
"Break down customers by acquisition source" (surfaces enum inconsistency)
"What tables look abandoned?" (agent reads profiler_notes)
"Compare orders_denormalized with orders" (denormalized flag)

E-commerce tech debt patterns

The e-commerce dataset includes the same four tech debt patterns as the cybersec demo (missing FK constraints, abandoned tables, inconsistent enums, denormalized tables). E-commerce-specific examples:

19 missing FK constraints -- plus ~1.5% of payments reference nonexistent orders (orphaned from deleted test orders)
4 abandoned tables -- old_orders_v1, temp_product_import_2023, legacy_analytics_events, payment_methods_backup
Inconsistent enums -- e.g. customers.acquisition_source: 'Google'/'google'/'GOOGLE'; loyalty_accounts.tier: 'Gold'/'gold'/'GOLD'
5 denormalized tables -- orders_denormalized, daily_sales_summary, monthly_revenue_summary, product_performance_cache, customer_ltv_cache
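
Orphaned rows like the payments in the first item above are typically found with an anti-join. The sketch below uses Python's built-in sqlite3 module as a stand-in for Postgres. The `payments.order_id` column name and the sample rows are assumptions for illustration, not taken from the NovaMart schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders   (id INTEGER PRIMARY KEY);
    -- No FOREIGN KEY on order_id, mirroring the missing-constraint pattern.
    CREATE TABLE payments (id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders (id) VALUES (1), (2);
    INSERT INTO payments (id, order_id) VALUES (10, 1), (11, 2), (12, 999);
""")

# Anti-join: keep payments whose order_id matches no row in orders.
orphans = conn.execute("""
    SELECT p.id, p.order_id
    FROM payments AS p
    LEFT JOIN orders AS o ON o.id = p.order_id
    WHERE o.id IS NULL
""").fetchall()

print(orphans)  # [(12, 999)]
```

Because an inner join silently drops these rows, revenue totals computed from payments alone and from payments joined to orders can disagree, which is a useful thing to probe when evaluating agent answers.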

Schema Evolution Artifacts

The dataset includes five schema evolution instances where old and new columns coexist:

Table	Old column	New column	Issue
`products`	`price` (dollars)	`price_cents` (cents)	~40% NULL price_cents
`customers`	`phone`	`mobile_phone`	~15% NULL mobile_phone (all post-2022 customers have it)
`shipments`	`carrier` (text)	`carrier_id` (integer)	~60% NULL carrier_id
`orders`	`shipping_cost`	--	dollars pre-2023-06, cents after
`product_reviews`	`rating` (int)	`rating_decimal` (numeric)	~70% NULL rating_decimal
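
Two of these artifacts call for different handling in queries: a fallback between old and new columns, and a unit conversion inside one column. The sketch below shows both, using Python's built-in sqlite3 module as a stand-in for Postgres. The `orders.created_at` column, the exact cutoff date of 2023-06-01, the column types, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, price_cents INTEGER);
    INSERT INTO products VALUES (1, 19.99, NULL), (2, 5.00, 500);

    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, shipping_cost REAL);
    INSERT INTO orders VALUES (1, '2023-01-15', 4.99), (2, '2023-09-01', 499);
""")

# Old + new column: prefer price_cents, fall back to price converted to cents.
prices = conn.execute("""
    SELECT id, COALESCE(price_cents, CAST(ROUND(price * 100) AS INTEGER)) AS cents
    FROM products ORDER BY id
""").fetchall()

# One column, two units: convert the older dollar values to cents.
shipping = conn.execute("""
    SELECT id,
           CASE WHEN created_at < '2023-06-01'
                THEN CAST(ROUND(shipping_cost * 100) AS INTEGER)
                ELSE CAST(shipping_cost AS INTEGER)
           END AS shipping_cents
    FROM orders ORDER BY id
""").fetchall()

print(prices)    # [(1, 1999), (2, 500)]
print(shipping)  # [(1, 499), (2, 499)]
```

The `shipping_cost` case is the harder one to detect, since no second column hints at the change. Averaging the raw column across the cutoff would mix dollars and cents.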

E-commerce schema overview

Table groups:

Core Commerce (6 tables): customers, addresses, segments, loyalty
Product Catalog (7 tables): products, variants, images, tags, inventory
Marketplace (4 tables): sellers, applications, payouts, performance
Orders & Transactions (7 tables): orders, items, events, payments, refunds, gift cards
Shipping & Fulfillment (5 tables): shipments, carriers, returns
Marketing & Promotions (5 tables): promotions, email campaigns, UTM tracking
Reviews (3 tables): product reviews, responses, helpfulness
Reporting (5 tables): denormalized/rollup/cache tables
Site Analytics (3 tables): page views, cart events, search queries
Internal / Ops (3 tables): admin users, audit log, settings
Legacy (4 tables): abandoned tables
