
Multi-Datasource Routing

Configure multiple databases in a single Atlas deployment and control how the agent routes queries to each datasource.

When your data lives in more than one database — a PostgreSQL application DB, a Snowflake warehouse, a ClickHouse analytics cluster — Atlas can query all of them from a single deployment. This guide explains how the agent selects which datasource to query and how you control that routing.

Prerequisites

  • Atlas installed (bun install)
  • Two or more datasources you want to connect (see Connect Your Data for per-database setup)
  • atlas.config.ts — multi-datasource requires the config file (env vars only support a single datasource)

How routing works

Every datasource in Atlas gets a connection ID — a string key like "default", "warehouse", or "analytics". The agent routes queries through two mechanisms:

  1. Semantic layer partitioning — entity YAMLs are organized by datasource, so the agent knows which tables live where
  2. The connectionId parameter — the agent's executeSQL tool accepts an optional connectionId to target a specific datasource

When the agent reads entity files to understand your data, the directory structure (or explicit connection field in the YAML) tells it which datasource owns each table. When it writes SQL, it includes the connectionId so the query hits the right database.

The default datasource

The datasource keyed as "default" in your config is used whenever connectionId is omitted. For single-datasource deployments, this is the only connection. For multi-datasource setups, it's the fallback.

Resolution order

executeSQL(sql, explanation, connectionId?)

    ├─ connectionId provided → ConnectionRegistry.get(connectionId)
    │                              ├─ found → execute against that datasource
    │                              └─ not found → error: "Unknown connection"

    └─ connectionId omitted  → ConnectionRegistry.getDefault()
                                   ├─ "default" registered → execute against default
                                   └─ fallback → ATLAS_DATASOURCE_URL env var
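The resolution order above can be sketched in code. The class and method names below (ConnectionRegistry, resolve) mirror the diagram but are illustrative stand-ins, not Atlas's internal API; envFallbackUrl stands in for reading ATLAS_DATASOURCE_URL:

```typescript
// Minimal sketch of the resolution order — names are illustrative
// stand-ins, not Atlas's internal implementation.
type Datasource = { id: string; url: string };

class ConnectionRegistry {
  private sources = new Map<string, Datasource>();

  // envFallbackUrl stands in for the ATLAS_DATASOURCE_URL env var
  constructor(private envFallbackUrl?: string) {}

  register(ds: Datasource): void {
    this.sources.set(ds.id, ds);
  }

  // connectionId provided → look it up, error if unknown
  get(connectionId: string): Datasource {
    const ds = this.sources.get(connectionId);
    if (!ds) throw new Error(`Unknown connection: ${connectionId}`);
    return ds;
  }

  // connectionId omitted → "default" if registered, else the env var fallback
  getDefault(): Datasource {
    const ds = this.sources.get("default");
    if (ds) return ds;
    if (!this.envFallbackUrl) throw new Error("No default datasource configured");
    return { id: "default", url: this.envFallbackUrl };
  }
}

function resolve(registry: ConnectionRegistry, connectionId?: string): Datasource {
  return connectionId ? registry.get(connectionId) : registry.getDefault();
}
```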

Configuration

Define named datasources in atlas.config.ts. Each gets its own connection pool, table whitelist, rate limits, and health checks.

// atlas.config.ts — two PostgreSQL databases
import { defineConfig } from "@atlas/api/lib/config";

export default defineConfig({
  datasources: {
    // "default" is used when no connectionId is specified
    default: {
      url: process.env.ATLAS_DATASOURCE_URL!,
      description: "Application database — users, orders, products",
    },
    warehouse: {
      url: process.env.WAREHOUSE_URL!,
      schema: "analytics",
      description: "Data warehouse — aggregated metrics and reports",
      maxConnections: 20,
    },
  },
});
// atlas.config.ts — mixed database types
import { defineConfig } from "@atlas/api/lib/config";
import { clickhousePlugin } from "@useatlas/clickhouse";
import { snowflakePlugin } from "@useatlas/snowflake";

export default defineConfig({
  datasources: {
    default: {
      url: process.env.ATLAS_DATASOURCE_URL!,
      description: "Primary PostgreSQL database",
    },
    // Named datasources for plugin-backed databases
    warehouse: {
      url: process.env.WAREHOUSE_URL!,
      description: "Snowflake data warehouse",
    },
    analytics: {
      url: process.env.CLICKHOUSE_URL!,
      description: "ClickHouse analytics cluster",
    },
  },
  plugins: [
    // Each plugin binds to its named datasource
    snowflakePlugin({ connectionId: "warehouse" }),
    clickhousePlugin({ connectionId: "analytics" }),
  ],
});
// atlas.config.ts — per-datasource rate limiting
import { defineConfig } from "@atlas/api/lib/config";
import { snowflakePlugin } from "@useatlas/snowflake";

export default defineConfig({
  datasources: {
    default: {
      url: process.env.ATLAS_DATASOURCE_URL!,
      description: "Application database",
      rateLimit: {
        queriesPerMinute: 60,  // default
        concurrency: 5,        // default
      },
    },
    warehouse: {
      url: process.env.WAREHOUSE_URL!,
      description: "Snowflake warehouse — expensive queries",
      rateLimit: {
        queriesPerMinute: 20,  // Lower limit for cost control
        concurrency: 2,        // Max 2 concurrent queries
      },
    },
  },
  plugins: [
    snowflakePlugin({ connectionId: "warehouse" }),
  ],
  // Cap total connections across all datasources
  maxTotalConnections: 50,
});

Key points

  • A datasource keyed "default" is recommended — it's the fallback when connectionId is omitted
  • Plugin-based datasources (ClickHouse, Snowflake, DuckDB, BigQuery, Salesforce) require both a datasources entry and a plugins entry with a matching connectionId
  • Each datasource gets independent connection pooling, rate limiting, and health monitoring
  • See Configuration Reference for all datasource fields

Semantic layer organization

The semantic layer tells the agent which tables belong to which datasource. There are two ways to express this:

Per-source subdirectories

Create a subdirectory under semantic/ for each non-default datasource. The directory name must match the connection ID:

semantic/
├── entities/              # "default" datasource
│   ├── users.yml
│   └── orders.yml
├── metrics/               # "default" datasource metrics
├── glossary.yml
├── catalog.yml
├── warehouse/             # "warehouse" datasource
│   ├── entities/
│   │   ├── events.yml
│   │   └── daily_metrics.yml
│   ├── metrics/
│   └── glossary.yml       # Optional per-source glossary
└── analytics/             # "analytics" datasource
    └── entities/
        └── page_views.yml

Atlas discovers per-source subdirectories automatically. Tables in semantic/entities/ belong to the default connection. Tables in semantic/warehouse/entities/ belong to the "warehouse" connection.
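A rough sketch of that inference — the set of non-default IDs and the path-matching logic here are illustrative, not Atlas's actual discovery code:

```typescript
// Illustrative directory-based connection inference. In a real deployment
// the non-default IDs would come from atlas.config.ts, not a hardcoded set.
const KNOWN_CONNECTIONS = new Set(["warehouse", "analytics"]);

function connectionForEntityPath(path: string): string {
  // semantic/<connectionId>/entities/<table>.yml → that connection
  const match = path.match(/^semantic\/([^/]+)\/entities\//);
  if (match && KNOWN_CONNECTIONS.has(match[1])) return match[1];
  // semantic/entities/<table>.yml → the default connection
  return "default";
}
```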

Generate per-source semantic layers

Use the --connection flag to profile a specific datasource:

# Profile the default datasource → writes to semantic/entities/
bun run atlas -- init

# Profile the "warehouse" datasource → writes to semantic/warehouse/entities/
bun run atlas -- init --connection warehouse

Explicit connection field

Alternatively, set the connection field directly in an entity YAML. This overrides directory-based inference:

# semantic/entities/external_events.yml
table: external_events
connection: analytics  # Routes to the "analytics" datasource regardless of file location
description: Clickstream events from ClickHouse
dimensions:
  event_id:
    type: string
    description: Unique event identifier

When any entity uses the connection field or per-source subdirectories exist, Atlas enters partitioned mode — each datasource gets its own isolated table whitelist. Tables from one datasource cannot be queried through another datasource's connection.
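The isolation in partitioned mode amounts to a per-connection table whitelist. A minimal sketch, with example table sets (the check itself is illustrative, not Atlas's internal code):

```typescript
// Sketch of partitioned-mode whitelists: each connection only sees its
// own tables. Table names here are examples.
const whitelists: Record<string, Set<string>> = {
  default: new Set(["users", "orders"]),
  warehouse: new Set(["daily_metrics", "events"]),
};

function assertQueryable(connectionId: string, table: string): void {
  const allowed = whitelists[connectionId];
  if (!allowed || !allowed.has(table)) {
    throw new Error(`table not in whitelist for "${connectionId}": ${table}`);
  }
}
```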


Agent behavior

System prompt adaptation

When multiple datasources are registered, Atlas automatically expands the agent's system prompt to list all available sources:

## Available Data Sources

This environment has 3 database connections:
- **default** (PostgreSQL) — Application database — users, orders, products
- **warehouse** (Snowflake) — Data warehouse — aggregated metrics and reports
- **analytics** (ClickHouse) — ClickHouse analytics cluster

The agent sees each datasource's ID, database type, description, and health status. If a datasource is degraded or unavailable, the system prompt warns the agent.

How the agent picks a datasource

  1. The agent reads entity YAMLs via the explore tool to understand the semantic layer
  2. Entity files indicate which datasource owns each table (via directory or connection field)
  3. When writing SQL, the agent includes connectionId to target the correct datasource
  4. If the user's question only involves tables from one datasource, routing is straightforward
  5. If the question spans datasources, the agent queries each one separately
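The routing decision in the steps above amounts to grouping the referenced tables by owning datasource: one connection means one query, several connections mean one query each. A minimal sketch with a hypothetical table-to-connection map:

```typescript
// Hypothetical table-to-connection map, as derived from the semantic layer
const tableToConnection: Record<string, string> = {
  users: "default",
  orders: "default",
  daily_metrics: "warehouse",
};

// One entry per datasource the question touches → one executeSQL call each
function connectionsNeeded(tables: string[]): Set<string> {
  return new Set(tables.map((t) => tableToConnection[t] ?? "default"));
}
```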

Cross-datasource queries

SQL JOINs across datasources are not supported — each query executes against a single database. When the user asks a question that spans multiple datasources, the agent:

  1. Queries each datasource separately with its own executeSQL call
  2. Combines the results in its response (reasoning over the data from each source)
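A sketch of that combine step — executeSqlStub stands in for the agent's executeSQL tool, and the row shapes are hypothetical:

```typescript
// Illustrative only: correlating results from two per-datasource queries
// in application code, like a hash join performed outside the databases.
type Row = Record<string, string | number>;

function executeSqlStub(connectionId: string): Row[] {
  // Canned results standing in for each datasource's own query
  if (connectionId === "default") {
    return [{ user_id: 1, email: "a@example.com" }];
  }
  return [{ user_id: 1, total_events: 42 }];
}

// Merge rows from two sources on a shared key
function correlateByKey(left: Row[], right: Row[], key: string): Row[] {
  const index = new Map(right.map((r) => [r[key], r] as [string | number, Row]));
  return left.map((l) => ({ ...l, ...(index.get(l[key]) ?? {}) }));
}
```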

You can help the agent by declaring cross-source relationships in your entity YAMLs:

# semantic/entities/users.yml
table: users
description: Application users
cross_source_joins:
  - source: warehouse
    target_table: daily_metrics
    on: users.id = daily_metrics.user_id
    relationship: one_to_many
    description: "User activity metrics from the warehouse"

These hints are surfaced in the agent's system prompt so it knows to query each source separately and correlate the results:

## Cross-Source Relationships
- **default.users** → **warehouse.daily_metrics**: User activity metrics (one_to_many)

Cross-source join hints are informational — they tell the agent which tables are related across datasources, but the agent still needs to execute separate queries and combine results. There is no automatic query federation.


SDK and API usage

SDK

The Atlas SDK sends messages to the chat API endpoint. Datasource routing is handled by the agent — there's no client-side connectionId parameter. The agent automatically determines which datasource to query based on the user's question and the semantic layer.

import { Atlas } from "@useatlas/sdk";

const atlas = new Atlas({ baseUrl: "https://api.your-atlas.com" });

// The agent routes to the correct datasource automatically
const response = await atlas.chat({
  messages: [
    { role: "user", content: "What are our top events by revenue?" }
  ],
});

REST API

When calling the chat API directly, the agent handles routing internally. You don't need to specify a datasource — the agent reads the semantic layer and includes the appropriate connectionId in its tool calls:

curl -X POST https://your-atlas.com/api/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Compare order counts with warehouse metrics"}]}'

Health endpoint

The /api/health endpoint reports the status of each registered datasource:

curl http://localhost:3001/api/health
{
  "status": "ok",
  "checks": {
    "datasource": { "status": "ok" }
  },
  "sources": {
    "default": { "dbType": "postgres", "status": "healthy" },
    "warehouse": { "dbType": "snowflake", "status": "healthy" },
    "analytics": { "dbType": "clickhouse", "status": "degraded" }
  }
}

Troubleshooting

Table not found

Symptom: The agent returns "table not in whitelist" or fails to find a table you know exists.

Causes:

  • The entity YAML is in the wrong directory — check that the file is in semantic/{connectionId}/entities/, not semantic/entities/ (or vice versa)
  • The connection field in the YAML doesn't match a registered datasource ID
  • The table name in the YAML doesn't match the actual database table (case-sensitive)

Fix: Verify the directory structure matches your config. Run atlas validate to check for broken references:

bun run atlas -- validate

Unknown connection error

Symptom: executeSQL returns "Unknown connection: warehouse".

Causes:

  • The datasource isn't registered in atlas.config.ts
  • For plugin datasources, the plugin isn't listed in the plugins array or the connectionId doesn't match
  • The config file has a typo in the datasource key

Fix: Check that the connection ID in your entity YAMLs matches a key in datasources (or a plugin's connectionId):

// These must match:
datasources: {
  warehouse: { url: "..." },  // ← connection ID "warehouse"
},
plugins: [
  snowflakePlugin({ connectionId: "warehouse" }),  // ← same ID
],

Agent queries the wrong datasource

Symptom: The agent runs a query against "default" when it should target "warehouse".

Causes:

  • Entity YAMLs are all in semantic/entities/ without per-source subdirectories or connection fields — Atlas runs in shared mode where all connections see all tables
  • The entity's connection field is missing or set to the wrong value

Fix: Move entity files to per-source subdirectories or add explicit connection fields. Once any entity has a connection field or per-source directories exist, Atlas switches to partitioned mode with isolated whitelists.

Connection failures for one datasource

Symptom: One datasource is healthy but another shows "degraded" or "unhealthy" in the health endpoint.

Causes:

  • Network connectivity issue to that specific database
  • Credentials expired or rotated for that datasource
  • The database is overloaded or down

Fix:

  1. Check the health endpoint: curl http://localhost:3001/api/health
  2. Verify the connection string works directly (e.g., psql "$WAREHOUSE_URL")
  3. Check rate limits — the datasource may be at its concurrency cap

Atlas runs background health checks. A temporarily failing datasource is marked "degraded" first, then "unhealthy" after sustained failures. The agent's system prompt reflects this status so it can warn users.
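The degraded-then-unhealthy progression can be sketched as a simple consecutive-failure counter; the three-failure threshold below is an assumed value for illustration, not Atlas's actual tuning:

```typescript
// Sketch of health-status progression under sustained failures.
// The threshold of 3 consecutive failures is an assumed value.
type HealthStatus = "healthy" | "degraded" | "unhealthy";

class HealthTracker {
  private failures = 0;

  record(ok: boolean): HealthStatus {
    this.failures = ok ? 0 : this.failures + 1;
    if (this.failures === 0) return "healthy";
    return this.failures < 3 ? "degraded" : "unhealthy";
  }
}
```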

Rate limit exceeded

Symptom: executeSQL returns a rate limit error with a retryAfterMs value.

Causes:

  • The datasource's queriesPerMinute or concurrency limit was reached

Fix: Either increase the limits in your config, or wait for the sliding window to reset (60 seconds). The agent will see the retry hint and can try again.

datasources: {
  warehouse: {
    url: "...",
    rateLimit: {
      queriesPerMinute: 100,  // Increase from default 60
      concurrency: 10,        // Increase from default 5
    },
  },
},

For more, see Troubleshooting.

