Demo Datasets
Pre-built demo datasets for evaluation and development.
Self-hosted only
Demo datasets are for self-hosted local development and evaluation. On app.useatlas.dev, the onboarding wizard offers a demo dataset option when you create your workspace — no CLI needed.
Atlas ships with three demo datasets for evaluation and development. Each targets a different use case — pick the one that matches your goal.
Quick Comparison
| | Simple | Cybersec | E-commerce |
|---|---|---|---|
| Company | — | Sentinel Security | NovaMart |
| Domain | CRM (companies, people, accounts) | B2B cybersecurity SaaS | DTC home goods brand + marketplace |
| Tables | 3 | 62 | 52 |
| Rows | ~330 | ~500K | ~480K |
| Database | Postgres only | Postgres only | Postgres only |
| Tech debt patterns | None | 4 patterns | 4 patterns |
| Schema evolution | No | No | 5 instances (old + new columns coexist) |
| Best for | Quick start, tutorials | Realistic evaluation, profiler testing | Universally understood domain, production-scale evaluation |
| Load time | ~5 seconds | ~30 seconds | ~30 seconds |
| CLI flag | --demo or --demo simple | --demo cybersec | --demo ecommerce |
Which dataset should I choose?
Use Simple when you want a fast setup — three clean tables, no ambiguity, perfect for tutorials and first-run evaluation.
Use Cybersec when you want to see how Atlas handles a real-world B2B SaaS database with messy data. Includes missing FK constraints, abandoned tables, inconsistent enums, and denormalized reporting tables. Good for testing the profiler and evaluating agent reasoning on complex schemas.
Use E-commerce when you want a universally understood domain (orders, products, customers) at production scale. Includes the same four tech debt patterns as cybersec, plus five schema evolution artifacts where old and new columns coexist. Good for demos to non-technical stakeholders who already understand retail data.
Simple Demo (default)
Three clean tables: companies (50), people (~200), accounts (80). No tech debt, no ambiguity.
Note: Bare --demo (with no argument) defaults to simple. If you already ran bun run db:up, the simple demo data is already seeded, so just run bun run atlas -- init (without --demo) to profile it. The --demo flag is for when you have not run db:up or want to explicitly re-seed.
# Option A: Using db:up (already seeds simple demo)
bun run db:up
bun run atlas -- init
# Option B: Explicit seed (without db:up, or to re-seed)
bun run atlas -- init --demo
Simple demo questions
Try these in the Atlas chat UI to exercise different patterns:
Aggregation:
- "How many companies are there by industry?"
- "Which industries have the most accounts?"
Joins:
- "Who are the top 5 people by account value?"
- "Show me all people at companies in the Technology industry"
Filtering:
- "List all companies with more than 3 accounts"
- "Which people are associated with accounts created in the last year?"
Cybersec Demo: Sentinel Security
A 62-table B2B cybersecurity SaaS company database. ~500K rows spanning 2019-2025. Covers vulnerability management, threat detection, compliance, billing, and reporting.
Loading the cybersec dataset
Requires PostgreSQL (uses GENERATE_SERIES for data generation). Note that bun run db:up seeds only the simple demo; the --demo cybersec flag loads the cybersec dataset on top of it.
# Start local Postgres + sandbox sidecar (if not already running)
bun run db:up
# Load cybersec demo and generate semantic layer
bun run atlas -- init --demo cybersec
# Start Atlas (containers already running from db:up)
bun run dev
To reset and reload from scratch:
bun run db:reset
bun run atlas -- init --demo cybersec
Cybersec demo questions
Try these in the Atlas chat UI to exercise different patterns:
Basic aggregation:
- "How many vulnerabilities by severity?"
- "What's the total invoice amount by organization?"
- "How many scans ran in the last 30 days?"
Joins:
- "Which organizations have the most critical scan results?"
- "Show me the top 10 users by number of alerts acknowledged"
- "Which compliance frameworks have the most failing controls?"
Time series:
- "What's the trend in critical vulnerabilities over the past 6 months?"
- "Show me monthly invoice totals"
Aggregation + filtering:
- "What's the average time to remediate by severity level?"
- "Alert noise ratio: what percentage of alerts become incidents?"
Tech debt discovery (exercises profiler warnings):
- "Break down organizations by industry" (surfaces enum inconsistency via profiler note)
- "Show me scan results for assets that no longer exist" (orphan rows)
- "What tables exist that look abandoned?" (agent reads profiler_notes)
- "Compare scan_results_denormalized with scan_results" (denormalized flag)
Cybersec tech debt patterns
The cybersec dataset was designed to include four real-world tech debt patterns that the profiler detects automatically:
1. Missing FK Constraints
Eight *_id columns reference other tables but lack FOREIGN KEY constraints. The profiler infers these from naming conventions and marks them with inferred: true in the generated YAML.
| Column | Should reference |
|---|---|
| scan_results.asset_id | assets.id |
| scan_results.vulnerability_id | vulnerabilities.id |
| scan_results.scan_id | scans.id |
| agent_heartbeats.agent_id | agents.id |
| alerts.incident_id | incidents.id |
| api_requests.user_id | users.id |
| invoice_line_items.subscription_id | subscriptions.id |
| vulnerability_instances.scan_result_id | scan_results.id |
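Because nothing enforces these references, rows can point at parents that no longer exist. A minimal sketch of an orphan check, using an in-memory SQLite database as a stand-in for the Postgres demo: the find_orphans helper, the hostname and severity columns, and the sample rows are all hypothetical, and this is not how the Atlas profiler itself works.

```python
import sqlite3


def find_orphans(conn, child_table, child_column, parent_table, parent_column="id"):
    """Return child rows whose reference has no matching parent row.

    Identifiers are interpolated into the SQL, so pass trusted names only.
    Assumes the child table has an `id` column to order by.
    """
    query = f"""
        SELECT c.*
        FROM {child_table} AS c
        LEFT JOIN {parent_table} AS p
          ON c.{child_column} = p.{parent_column}
        WHERE c.{child_column} IS NOT NULL
          AND p.{parent_column} IS NULL
        ORDER BY c.id
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE assets (id INTEGER PRIMARY KEY, hostname TEXT);
    -- No FOREIGN KEY on asset_id, mirroring the missing-constraint pattern.
    CREATE TABLE scan_results (id INTEGER PRIMARY KEY, asset_id INTEGER, severity TEXT);
    INSERT INTO assets VALUES (1, 'web-01'), (2, 'db-01');
    INSERT INTO scan_results VALUES (10, 1, 'high'), (11, 2, 'low'), (12, 99, 'critical');
""")

orphans = find_orphans(conn, "scan_results", "asset_id", "assets")
print(orphans)  # [(12, 99, 'critical')]
```

The same LEFT JOIN ... IS NULL shape applies to any of the column pairs in the table above; only the table and column names change.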
2. Abandoned Tables
Six tables match legacy/temp naming patterns and have no inbound foreign keys:
- old_scan_results_v2 -- abandoned schema migration
- temp_asset_import_2024 -- one-time CSV import artifact
- feature_flags_legacy -- replaced by LaunchDarkly
- notifications_backup -- migration backup
- user_sessions_archive -- old session system
- legacy_risk_scores -- old risk scoring algorithm
The profiler flags these with possibly_abandoned and prepends a warning in use_cases.
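The two signals described here (a legacy-looking name and no inbound foreign keys) can be combined in a few lines. This is an illustrative sketch only: the regular expression, the looks_abandoned helper, and the inbound-key counts are assumptions made for the example, not the profiler's actual rules.

```python
import re

# Hypothetical naming heuristic: a legacy-style token at the start or end of
# the name, or between underscores. Not taken from the Atlas source.
LEGACY_NAME = re.compile(r"(^|_)(old|temp|tmp|legacy|backup|archive)(_|$)")


def looks_abandoned(table, inbound_fk_counts):
    """True when the name matches the heuristic and nothing references the table."""
    name_matches = bool(LEGACY_NAME.search(table))
    return name_matches and inbound_fk_counts.get(table, 0) == 0


# Made-up inbound foreign key counts for a handful of tables.
inbound = {"assets": 3, "scan_results": 1}
candidates = [
    "assets",
    "scan_results",
    "old_scan_results_v2",
    "temp_asset_import_2024",
    "feature_flags_legacy",
    "notifications_backup",
    "user_sessions_archive",
    "legacy_risk_scores",
]

flagged = [t for t in candidates if looks_abandoned(t, inbound)]
print(flagged)
```

Anchoring the tokens on underscores keeps names such as temperature_readings from matching on the temp prefix alone.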
3. Inconsistent Enums
Some text columns have case-inconsistent values:
- organizations.industry: 'Technology', 'tech', 'Tech', 'TECHNOLOGY'
- compliance_findings.status: 'pass', 'Pass', 'PASS'
The profiler detects these and adds LOWER() guidance in the glossary.
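To see what the LOWER() guidance buys you, the sketch below groups the same values with and without normalization. It runs against an in-memory SQLite table as a stand-in for the Postgres demo; the row counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, industry TEXT)")
conn.executemany(
    "INSERT INTO organizations (industry) VALUES (?)",
    [("Technology",), ("tech",), ("Tech",), ("TECHNOLOGY",), ("Technology",)],
)

# Grouping on the raw column treats every casing variant as its own bucket.
raw = conn.execute(
    "SELECT industry, COUNT(*) FROM organizations GROUP BY industry"
).fetchall()

# Grouping on LOWER(industry) merges variants that differ only by case.
normalized = conn.execute(
    """
    SELECT LOWER(industry) AS industry_norm, COUNT(*)
    FROM organizations
    GROUP BY LOWER(industry)
    ORDER BY industry_norm
    """
).fetchall()

print(len(raw))     # 4
print(normalized)   # [('tech', 2), ('technology', 3)]
```

Note that LOWER() only fixes casing: 'tech' and 'technology' remain separate groups, so abbreviations still need an explicit mapping if you want a single bucket.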
4. Denormalized Tables
Four reporting/cache tables duplicate data from other tables:
- scan_results_denormalized -- pre-joined scan results
- daily_scan_stats -- daily rollup
- monthly_vulnerability_summary -- monthly aggregates
- executive_dashboard_cache -- pre-computed dashboard data
The profiler flags these with possibly_denormalized.
Cybersec schema overview
Table groups:
- Core Business (7 tables): organizations, users, teams, roles
- Billing (6 tables): plans, subscriptions, invoices
- Asset Management (6 tables): assets, agents, agent_heartbeats
- Vulnerability Management (7 tables): vulnerabilities, scans, scan_results
- Threat & Incident (6 tables): incidents, alerts
- Threat Intelligence (3 tables): threat_feeds, IOCs, threat_actors
- Compliance (4 tables): frameworks, controls, assessments, findings
- Product Usage (5 tables): API keys, requests, feature usage, login events
- Reporting (5 tables): denormalized/rollup tables
- Reports & Dashboards (4 tables): saved reports, dashboards
- Integration & Audit (3 tables): integrations, audit_log
- Legacy (6 tables): abandoned tables
E-commerce Demo: NovaMart
A 52-table DTC (direct-to-consumer) home goods brand database. ~480K rows spanning 2020-2025. NovaMart was founded during the pandemic, started with bedding, expanded to kitchen/bath/outdoor, and launched a small marketplace in 2022.
Loading the e-commerce dataset
Requires PostgreSQL (uses GENERATE_SERIES for data generation). Note that bun run db:up seeds only the simple demo; the --demo ecommerce flag loads the e-commerce dataset on top of it.
# Start local Postgres + sandbox sidecar (if not already running)
bun run db:up
# Load e-commerce demo and generate semantic layer
bun run atlas -- init --demo ecommerce
# Start Atlas (containers already running from db:up)
bun run dev
To reset and reload from scratch:
bun run db:reset
bun run atlas -- init --demo ecommerce
E-commerce demo questions
Try these in the Atlas chat UI to exercise different patterns:
Sales & revenue:
- "What's the monthly revenue trend since launch?"
- "Top 10 products by total revenue"
- "Average order value by customer segment"
- "Revenue breakdown: own products vs marketplace"
Customer analytics:
- "How many customers are in each loyalty tier?"
- "What's the customer retention rate by cohort?"
- "Breakdown of new vs returning customers per month"
- "Average customer lifetime value by acquisition source"
Operations:
- "Average delivery time by carrier"
- "Return rate by product category"
- "Top reasons for returns"
- "Shipping cost per order over time"
Marketing:
- "Which UTM sources drive the most revenue?"
- "Email campaign conversion rates"
- "Promo code usage rate by campaign"
Tech debt discovery (exercises profiler warnings):
- "Why are there two price fields on products?" (schema evolution)
- "Break down customers by acquisition source" (surfaces enum inconsistency)
- "What tables look abandoned?" (agent reads profiler_notes)
- "Compare orders_denormalized with orders" (denormalized flag)
E-commerce tech debt patterns
The e-commerce dataset includes the same four tech-debt patterns as the cybersec demo (missing FK constraints, abandoned tables, inconsistent enums, denormalized tables). E-commerce-specific examples:
- 19 missing FK constraints -- plus ~1.5% of payments reference nonexistent orders (orphaned from deleted test orders)
- 4 abandoned tables -- old_orders_v1, temp_product_import_2023, legacy_analytics_events, payment_methods_backup
- Inconsistent enums -- e.g. customers.acquisition_source: 'Google'/'google'/'GOOGLE'; loyalty_accounts.tier: 'Gold'/'gold'/'GOLD'
- 5 denormalized tables -- orders_denormalized, daily_sales_summary, monthly_revenue_summary, product_performance_cache, customer_ltv_cache
Schema Evolution Artifacts
The dataset includes five schema evolution instances where old and new columns coexist:
| Table | Old column | New column | Issue |
|---|---|---|---|
| products | price (dollars) | price_cents (cents) | ~40% NULL price_cents |
| customers | phone | mobile_phone | ~15% NULL mobile_phone (all post-2022 customers have it) |
| shipments | carrier (text) | carrier_id (integer) | ~60% NULL carrier_id |
| orders | shipping_cost | -- | dollars pre-2023-06, cents after |
| product_reviews | rating (int) | rating_decimal (numeric) | ~70% NULL rating_decimal |
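Queries over these tables usually need to reconcile the old and new representations. The sketch below shows two common shapes, again on in-memory SQLite rather than the real dataset. It rests on several assumptions: that the old price column is still populated where price_cents is NULL, that orders has a date column (called created_at here, a hypothetical name), and that the unit switch for shipping_cost falls on 2023-06-01.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, price_cents INTEGER);
    INSERT INTO products VALUES (1, 19.99, 1999), (2, 45.00, NULL), (3, 7.50, NULL);

    -- created_at is a hypothetical column name used for this example.
    CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, shipping_cost REAL);
    INSERT INTO orders VALUES (1, '2023-03-15', 5.99), (2, '2023-08-01', 599);
""")

# Prefer the new column; fall back to the old one converted to cents.
prices = conn.execute("""
    SELECT id,
           COALESCE(price_cents, CAST(ROUND(price * 100) AS INTEGER)) AS unified_price_cents
    FROM products
    ORDER BY id
""").fetchall()

# One column, two units: convert rows before the assumed cutover date to cents.
shipping = conn.execute("""
    SELECT id,
           CAST(ROUND(CASE WHEN created_at < '2023-06-01'
                           THEN shipping_cost * 100
                           ELSE shipping_cost END) AS INTEGER) AS shipping_cents
    FROM orders
    ORDER BY id
""").fetchall()

print(prices)    # [(1, 1999), (2, 4500), (3, 750)]
print(shipping)  # [(1, 599), (2, 599)]
```

The COALESCE shape also fits the other old/new column pairs in the table, for example rating and rating_decimal, where no unit conversion is needed.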
E-commerce schema overview
Table groups:
- Core Commerce (6 tables): customers, addresses, segments, loyalty
- Product Catalog (7 tables): products, variants, images, tags, inventory
- Marketplace (4 tables): sellers, applications, payouts, performance
- Orders & Transactions (7 tables): orders, items, events, payments, refunds, gift cards
- Shipping & Fulfillment (5 tables): shipments, carriers, returns
- Marketing & Promotions (5 tables): promotions, email campaigns, UTM tracking
- Reviews (3 tables): product reviews, responses, helpfulness
- Reporting (5 tables): denormalized/rollup/cache tables
- Site Analytics (3 tables): page views, cart events, search queries
- Internal / Ops (3 tables): admin users, audit log, settings
- Legacy (4 tables): abandoned tables