Demo Datasets
Atlas ships with three pre-built demo datasets for evaluation and development.
Quick Comparison
| | Simple | Cybersec | E-commerce |
|---|---|---|---|
| Tables | 3 | 62 | 52 |
| Rows | ~330 | ~500K | ~480K |
| Database | Postgres only | Postgres only | Postgres only |
| Tech debt patterns | None | 4 patterns | 4 patterns |
| Best for | Quick start, tutorials | Realistic evaluation, profiler testing | Universally understood domain, production-scale evaluation |
Use Simple when you want a fast setup. Use Cybersec when you want to see how Atlas handles a real-world B2B SaaS database with messy data. Use E-commerce when you want a universally understood domain (orders, products, customers) at production scale.
Simple Demo (default)
Three clean tables: companies (50), people (~200), accounts (80). No tech debt, no ambiguity.
Note: Bare `--demo` (with no argument) defaults to `simple`. If you already ran `bun run db:up`, the simple demo data is already seeded -- just run `bun run atlas -- init` (without `--demo`) to profile it. The `--demo` flag is for when you have not run `db:up` or want to explicitly re-seed.
```
# Option A: Using db:up (already seeds simple demo)
bun run db:up
bun run atlas -- init

# Option B: Explicit seed (without db:up, or to re-seed)
bun run atlas -- init --demo
```
Suggested Questions
Try these in the Atlas chat UI to exercise different patterns:
Aggregation:
- "How many companies are there by industry?"
- "Which industries have the most accounts?"
Joins:
- "Who are the top 5 people by account value?"
- "Show me all people at companies in the Technology industry"
Filtering:
- "List all companies with more than 3 accounts"
- "Which people are associated with accounts created in the last year?"
Cybersec Demo: Sentinel Security
A 62-table B2B cybersecurity SaaS company database. ~500K rows spanning 2019-2025. Covers vulnerability management, threat detection, compliance, billing, and reporting.
Loading
Requires PostgreSQL (uses `GENERATE_SERIES` for data generation). Note that `bun run db:up` only seeds the simple demo -- the `--demo cybersec` flag seeds the cybersec dataset on top.
```
# Start local Postgres (if not already running)
bun run db:up

# Load cybersec demo and generate semantic layer
bun run atlas -- init --demo cybersec

# Start Atlas
bun run dev
```
To reset and reload from scratch:
```
bun run db:reset
bun run atlas -- init --demo cybersec
```
Suggested Questions
Try these in the Atlas chat UI to exercise different patterns:
Basic aggregation:
- "How many vulnerabilities by severity?"
- "What's the total invoice amount by organization?"
- "How many scans ran in the last 30 days?"
Joins:
- "Which organizations have the most critical scan results?"
- "Show me the top 10 users by number of alerts acknowledged"
- "Which compliance frameworks have the most failing controls?"
Time series:
- "What's the trend in critical vulnerabilities over the past 6 months?"
- "Show me monthly invoice totals"
Aggregation + filtering:
- "What's the average time to remediate by severity level?"
- "Alert noise ratio: what percentage of alerts become incidents?"
Tech debt discovery (exercises profiler warnings):
- "Break down organizations by industry" (surfaces enum inconsistency via profiler note)
- "Show me scan results for assets that no longer exist" (orphan rows)
- "What tables exist that look abandoned?" (agent reads profiler_notes)
- "Compare scan_results_denormalized with scan_results" (denormalized flag)
Tech Debt Patterns
The cybersec dataset was designed to include four real-world tech debt patterns that the profiler detects automatically:
1. Missing FK Constraints
Eight `*_id` columns reference other tables but lack FOREIGN KEY constraints. The profiler infers these from naming conventions and marks them with `inferred: true` in the generated YAML.
| Column | Should reference |
|---|---|
| `scan_results.asset_id` | `assets.id` |
| `scan_results.vulnerability_id` | `vulnerabilities.id` |
| `scan_results.scan_id` | `scans.id` |
| `agent_heartbeats.agent_id` | `agents.id` |
| `alerts.incident_id` | `incidents.id` |
| `api_requests.user_id` | `users.id` |
| `invoice_line_items.subscription_id` | `subscriptions.id` |
| `vulnerability_instances.scan_result_id` | `scan_results.id` |
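To see why the missing constraints matter, here is a minimal sketch of how orphan rows slip into `scan_results` when `asset_id` has no FOREIGN KEY. SQLite stands in for Postgres, and the two-column tables are a simplification, not the actual demo schema:

```python
import sqlite3

# Simplified stand-in tables -- not the real demo schema.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE assets (id INTEGER PRIMARY KEY)")
con.execute("CREATE TABLE scan_results (id INTEGER PRIMARY KEY, asset_id INTEGER)")
con.execute("INSERT INTO assets (id) VALUES (1), (2)")
# asset_id 99 points at an asset that no longer exists --
# with no FK constraint, nothing rejects the insert.
con.executemany(
    "INSERT INTO scan_results (id, asset_id) VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 99)],
)
# The anti-join needed to surface "scan results for assets that no longer exist"
orphans = con.execute(
    """SELECT sr.id FROM scan_results sr
       LEFT JOIN assets a ON a.id = sr.asset_id
       WHERE a.id IS NULL"""
).fetchall()
print(orphans)  # [(3,)]
```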
2. Abandoned Tables
Six tables match legacy/temp naming patterns and have no inbound foreign keys:
- `old_scan_results_v2` -- abandoned schema migration
- `temp_asset_import_2024` -- one-time CSV import artifact
- `feature_flags_legacy` -- replaced by LaunchDarkly
- `notifications_backup` -- migration backup
- `user_sessions_archive` -- old session system
- `legacy_risk_scores` -- old risk scoring algorithm
The profiler flags these with `possibly_abandoned` and prepends a warning in `use_cases`.
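A naming heuristic of this kind could be sketched as follows. The regex patterns below are illustrative assumptions, not Atlas's actual rules:

```python
import re

# Hypothetical legacy/temp naming patterns -- not the profiler's real rule set.
LEGACY_PATTERNS = [
    r"^old_",       # old_scan_results_v2
    r"^temp_",      # temp_asset_import_2024
    r"^legacy_",    # legacy_risk_scores
    r"_legacy$",    # feature_flags_legacy
    r"_backup$",    # notifications_backup
    r"_archive$",   # user_sessions_archive
]

def looks_abandoned(table: str) -> bool:
    """True if the table name matches any legacy/temp pattern."""
    return any(re.search(p, table) for p in LEGACY_PATTERNS)

tables = ["scan_results", "old_scan_results_v2", "user_sessions_archive"]
flagged = [t for t in tables if looks_abandoned(t)]
print(flagged)  # ['old_scan_results_v2', 'user_sessions_archive']
```

In the real profiler this check is combined with the absence of inbound foreign keys, so an actively used table with an unlucky name would not be flagged on naming alone.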
3. Inconsistent Enums
Some text columns have case-inconsistent values:
- `organizations.industry`: 'Technology', 'tech', 'Tech', 'TECHNOLOGY'
- `compliance_findings.status`: 'pass', 'Pass', 'PASS'
The profiler detects these and adds `LOWER()` guidance in the glossary.
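A minimal sketch of why that guidance matters, on a simplified two-column stand-in for `organizations` (SQLite in place of Postgres): a naive GROUP BY splits one industry into four buckets, while LOWER() collapses the case variants:

```python
import sqlite3

# Simplified stand-in table seeded with the case-inconsistent values.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE organizations (id INTEGER PRIMARY KEY, industry TEXT)")
con.executemany(
    "INSERT INTO organizations (industry) VALUES (?)",
    [("Technology",), ("tech",), ("Tech",), ("TECHNOLOGY",)],
)
# Naive GROUP BY treats each spelling as its own industry
naive = con.execute(
    "SELECT industry, COUNT(*) FROM organizations GROUP BY industry"
).fetchall()
print(len(naive))  # 4 buckets for what is really one industry
# LOWER() merges the case variants. Note 'tech' vs 'technology' are still
# distinct strings -- mapping those is what the glossary guidance covers.
normalized = con.execute(
    "SELECT LOWER(industry), COUNT(*) FROM organizations GROUP BY LOWER(industry)"
).fetchall()
print(sorted(normalized))  # [('tech', 2), ('technology', 2)]
```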
4. Denormalized Tables
Four reporting/cache tables duplicate data from other tables:
- `scan_results_denormalized` -- pre-joined scan results
- `daily_scan_stats` -- daily rollup
- `monthly_vulnerability_summary` -- monthly aggregates
- `executive_dashboard_cache` -- pre-computed dashboard data
The profiler flags these with `possibly_denormalized`.
Schema Overview
Table groups:
- Core Business (7 tables): organizations, users, teams, roles
- Billing (6 tables): plans, subscriptions, invoices
- Asset Management (6 tables): assets, agents, agent_heartbeats
- Vulnerability Management (7 tables): vulnerabilities, scans, scan_results
- Threat & Incident (6 tables): incidents, alerts
- Threat Intelligence (3 tables): threat_feeds, IOCs, threat_actors
- Compliance (4 tables): frameworks, controls, assessments, findings
- Product Usage (5 tables): API keys, requests, feature usage, login events
- Reporting (5 tables): denormalized/rollup tables
- Reports & Dashboards (4 tables): saved reports, dashboards
- Integration & Audit (3 tables): integrations, audit_log
- Legacy (6 tables): abandoned tables
E-commerce Demo: NovaMart
A 52-table DTC (direct-to-consumer) home goods brand database. ~480K rows spanning 2020-2025. NovaMart was founded during the pandemic, started with bedding, expanded to kitchen/bath/outdoor, and launched a small marketplace in 2022.
Loading
Requires PostgreSQL (uses `GENERATE_SERIES` for data generation). Note that `bun run db:up` only seeds the simple demo -- the `--demo ecommerce` flag seeds the ecommerce dataset on top.
```
# Start local Postgres (if not already running)
bun run db:up

# Load e-commerce demo and generate semantic layer
bun run atlas -- init --demo ecommerce

# Start Atlas
bun run dev
```
To reset and reload from scratch:
```
bun run db:reset
bun run atlas -- init --demo ecommerce
```
Suggested Questions
Try these in the Atlas chat UI to exercise different patterns:
Sales & revenue:
- "What's the monthly revenue trend since launch?"
- "Top 10 products by total revenue"
- "Average order value by customer segment"
- "Revenue breakdown: own products vs marketplace"
Customer analytics:
- "How many customers are in each loyalty tier?"
- "What's the customer retention rate by cohort?"
- "Breakdown of new vs returning customers per month"
- "Average customer lifetime value by acquisition source"
Operations:
- "Average delivery time by carrier"
- "Return rate by product category"
- "Top reasons for returns"
- "Shipping cost per order over time"
Marketing:
- "Which UTM sources drive the most revenue?"
- "Email campaign conversion rates"
- "Promo code usage rate by campaign"
Tech debt discovery (exercises profiler warnings):
- "Why are there two price fields on products?" (schema evolution)
- "Break down customers by acquisition source" (surfaces enum inconsistency)
- "What tables look abandoned?" (agent reads profiler_notes)
- "Compare orders_denormalized with orders" (denormalized flag)
Tech Debt Patterns
The e-commerce dataset includes the same four tech-debt patterns as the cybersec demo (missing FK constraints, abandoned tables, inconsistent enums, denormalized tables). E-commerce-specific examples:
- 19 missing FK constraints -- plus ~1.5% of payments reference nonexistent orders (orphaned from deleted test orders)
- 4 abandoned tables -- `old_orders_v1`, `temp_product_import_2023`, `legacy_analytics_events`, `payment_methods_backup`
- Inconsistent enums -- e.g. `customers.acquisition_source`: 'Google'/'google'/'GOOGLE'; `loyalty_accounts.tier`: 'Gold'/'gold'/'GOLD'
- 5 denormalized tables -- `orders_denormalized`, `daily_sales_summary`, `monthly_revenue_summary`, `product_performance_cache`, `customer_ltv_cache`
Schema Evolution Artifacts
The dataset includes five schema evolution instances where old and new columns coexist:
| Table | Old column | New column | Issue |
|---|---|---|---|
| `products` | `price` (dollars) | `price_cents` (cents) | ~40% NULL `price_cents` |
| `customers` | `phone` | `mobile_phone` | ~15% NULL `mobile_phone` (all post-2022 customers have it) |
| `shipments` | `carrier` (text) | `carrier_id` (integer) | ~60% NULL `carrier_id` |
| `orders` | `shipping_cost` | -- | dollars pre-2023-06, cents after |
| `product_reviews` | `rating` (int) | `rating_decimal` (numeric) | ~70% NULL `rating_decimal` |
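One way to query across the `price`/`price_cents` split is to COALESCE the new column with the legacy one converted to cents. This is a sketch under assumptions (SQLite stand-in, simplified columns), not the SQL Atlas actually generates:

```python
import sqlite3

# Simplified stand-in for products: legacy dollars column plus partially
# backfilled price_cents (~40% NULL in the real dataset).
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL, price_cents INTEGER)"
)
con.executemany(
    "INSERT INTO products (price, price_cents) VALUES (?, ?)",
    [(19.99, 1999), (24.50, None), (9.99, None)],  # NULLs = pre-migration rows
)
# Prefer price_cents; fall back to the legacy dollar column converted to cents.
rows = con.execute(
    """SELECT id,
              COALESCE(price_cents, CAST(ROUND(price * 100) AS INTEGER)) AS unified_cents
       FROM products"""
).fetchall()
print(rows)  # [(1, 1999), (2, 2450), (3, 999)]
```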
Schema Overview
Table groups:
- Core Commerce (6 tables): customers, addresses, segments, loyalty
- Product Catalog (7 tables): products, variants, images, tags, inventory
- Marketplace (4 tables): sellers, applications, payouts, performance
- Orders & Transactions (7 tables): orders, items, events, payments, refunds, gift cards
- Shipping & Fulfillment (5 tables): shipments, carriers, returns
- Marketing & Promotions (5 tables): promotions, email campaigns, UTM tracking
- Reviews (3 tables): product reviews, responses, helpfulness
- Reporting (5 tables): denormalized/rollup/cache tables
- Site Analytics (3 tables): page views, cart events, search queries
- Internal / Ops (3 tables): admin users, audit log, settings
- Legacy (4 tables): abandoned tables