Atlas

Datasource Plugins

Connect ClickHouse, DuckDB, MySQL, Snowflake, and Salesforce as Atlas query targets.

Atlas supports PostgreSQL and MySQL natively via ATLAS_DATASOURCE_URL. For other databases and APIs, use datasource plugins. Each plugin provides a connection factory, SQL dialect hints for the agent, query validation rules, and health checks.

All datasource plugins follow the same pattern:

  1. Install the database driver (optional peer dependency)
  2. Import the plugin factory in atlas.config.ts
  3. Pass connection credentials via environment variables

Never commit credentials to version control. Use environment variables (process.env.CLICKHOUSE_URL) in atlas.config.ts and add .env to .gitignore.


ClickHouse

Connects to a ClickHouse instance via the HTTP transport provided by @clickhouse/client. The HTTP transport is stateless -- a new client is created per call, so no connection pooling is needed. Every query is executed with readonly: 1 at the ClickHouse settings level, providing database-enforced read-only access.

Installation

bun add @clickhouse/client

Configuration

  Field     Type    Required  Default        Description
  url       string  Yes       --             Connection URL. Must start with clickhouse:// (plain) or clickhouses:// (TLS)
  database  string  No        from URL path  Database name override

The plugin rewrites clickhouse:// to http:// and clickhouses:// to https:// for the HTTP transport internally.
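That rewrite is simple prefix substitution; a minimal sketch (toHttpUrl is an illustrative name, not a plugin export):

```typescript
// Sketch of the scheme rewrite the plugin performs internally.
function toHttpUrl(url: string): string {
  if (url.startsWith("clickhouses://")) {
    // TLS variant maps to HTTPS
    return "https://" + url.slice("clickhouses://".length);
  }
  if (url.startsWith("clickhouse://")) {
    // plain variant maps to HTTP
    return "http://" + url.slice("clickhouse://".length);
  }
  throw new Error("URL must start with clickhouse:// or clickhouses://");
}
```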

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { clickhousePlugin } from "@atlas/plugin-clickhouse-datasource";

export default defineConfig({
  plugins: [
    clickhousePlugin({
      url: process.env.CLICKHOUSE_URL!,
      database: "analytics",
    }),
  ],
});

SQL Dialect

The following hints are injected into the agent's system prompt so it writes valid ClickHouse SQL:

  • Use toStartOfMonth(), toStartOfWeek() for date truncation (not DATE_TRUNC)
  • Use countIf(condition) instead of COUNT(CASE WHEN ... END)
  • Use sumIf(column, condition) instead of SUM(CASE WHEN ... END)
  • Use arrayJoin() to unnest arrays
  • String functions: lower(), upper(), trim(), splitByChar()
  • Do not add FORMAT clauses -- the adapter handles output format automatically
  • ClickHouse is column-oriented -- avoid SELECT * on wide tables

Validation

In addition to the standard SQL validation pipeline (regex guard + AST parse + table whitelist), the plugin blocks ClickHouse-specific admin commands: SYSTEM, KILL, ATTACH, DETACH, RENAME, EXCHANGE, SHOW, DESCRIBE, EXPLAIN, and USE. The AST parser uses PostgreSQL mode (the closest available match in node-sql-parser).
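A minimal sketch of such a keyword guard, layered on top of the standard pipeline (the function name and exact regex are illustrative, not the plugin's actual code):

```typescript
// ClickHouse-specific admin commands blocked beyond the standard pipeline.
const CLICKHOUSE_BLOCKED = [
  "SYSTEM", "KILL", "ATTACH", "DETACH", "RENAME",
  "EXCHANGE", "SHOW", "DESCRIBE", "EXPLAIN", "USE",
];

function hasBlockedClickHouseCommand(sql: string): boolean {
  // Match each keyword as a whole word, case-insensitively.
  return CLICKHOUSE_BLOCKED.some((kw) =>
    new RegExp(`\\b${kw}\\b`, "i").test(sql)
  );
}
```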

Security

ClickHouse enforces read-only access at the query level via the readonly: 1 setting on every request. Statement timeout is set per query via max_execution_time.


DuckDB

Connects to a DuckDB database -- either a file on disk or an in-memory instance. DuckDB runs in-process via @duckdb/node-api, so there is no network connection or connection pooling. The plugin uses lazy initialization with a cached connection promise and automatic retry on transient failures.

Installation

bun add @duckdb/node-api

Configuration

  Field     Type     Required  Default                             Description
  url       string   No        --                                  Connection URL starting with duckdb://. Use duckdb://:memory: for in-memory, duckdb:///absolute/path.duckdb for absolute paths, or duckdb://relative/path.duckdb for relative paths
  path      string   No        --                                  Direct path to a .duckdb file, or :memory: for in-memory. Alternative to url
  readOnly  boolean  No        true for files, false for :memory:  Open the database in read-only mode

Either url or path is required. When both are provided, url takes precedence.
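Under those URL rules, everything after the duckdb:// scheme is the path itself; a minimal sketch (resolveDuckDbPath is an illustrative name, not the plugin's API):

```typescript
// Sketch of duckdb:// URL resolution: the remainder after the scheme
// is the path. ":memory:" stays special, a leading slash means an
// absolute path, anything else is a relative path.
function resolveDuckDbPath(url: string): string {
  const prefix = "duckdb://";
  if (!url.startsWith(prefix)) {
    throw new Error("URL must start with duckdb://");
  }
  return url.slice(prefix.length);
}
```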

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { duckdbPlugin } from "@atlas/plugin-duckdb-datasource";

export default defineConfig({
  plugins: [
    duckdbPlugin({ url: "duckdb://analytics.duckdb" }),
  ],
});

Or with a direct path:

duckdbPlugin({ path: "/data/warehouse.duckdb", readOnly: true })

SQL Dialect

  • DuckDB syntax is similar to PostgreSQL with additional features
  • Use UNNEST() to expand arrays into rows
  • LIST and STRUCT types are natively supported
  • File-reading functions (read_csv_auto, read_parquet, etc.) are blocked for security
  • Use DATE_TRUNC() and DATE_PART() for date operations
  • Use STRING_AGG() for string aggregation
  • Supports window functions, CTEs, and lateral joins
  • In-process engine -- no connection pooling needed

Validation

The plugin blocks DuckDB-specific statements beyond the standard SQL pipeline: PRAGMA, ATTACH, DETACH, INSTALL, EXPORT, IMPORT, CHECKPOINT, SET, DESCRIBE, EXPLAIN, and SHOW. File-reading table functions (read_csv_auto, read_csv, read_parquet, read_json, read_json_auto, read_text, parquet_scan, csv_scan, json_scan) are blocked to prevent host filesystem access. The AST parser uses PostgreSQL mode.
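The file-function block can be sketched as a word-boundary check for a call to any blocked name (usesFileFunction is an illustrative name; the real plugin's matching may differ):

```typescript
// Table functions that would read from the host filesystem.
const BLOCKED_FILE_FUNCTIONS = [
  "read_csv_auto", "read_csv", "read_parquet", "read_json",
  "read_json_auto", "read_text", "parquet_scan", "csv_scan", "json_scan",
];

function usesFileFunction(sql: string): boolean {
  const lowered = sql.toLowerCase();
  // A blocked name immediately followed by "(" indicates a call.
  return BLOCKED_FILE_FUNCTIONS.some((fn) =>
    new RegExp(`\\b${fn}\\s*\\(`).test(lowered)
  );
}
```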

Security

File-based databases default to READ_ONLY access mode, which provides database-enforced read-only access. In-memory databases cannot use read-only mode (nothing to protect), so they rely on SQL validation. DuckDB has no native query timeout API -- timeout is enforced via Promise.race().
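Because DuckDB exposes no native timeout, the query is raced against a timer. A minimal sketch of that pattern (withTimeout is an illustrative helper, not the plugin's API):

```typescript
// Sketch of timeout enforcement via Promise.race.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Query timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that Promise.race only rejects the caller's promise; the underlying in-process query is not cancelled and runs to completion.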

DuckDB runs in the same process as Atlas. File-reading functions are blocked at the validation layer to prevent the agent from reading arbitrary files on the host filesystem.


MySQL

Connects to a MySQL instance via a mysql2/promise connection pool. The pool is cached per plugin instance -- unlike stateless HTTP transports, MySQL pools are heavyweight resources that should be reused.
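The pool caching described here is plain memoization; a sketch with the pool construction abstracted into a factory (cachedPool is an illustrative name, not the plugin's code):

```typescript
// Sketch of per-instance pool caching: the heavyweight resource is
// created on first use and reused on every later call.
function cachedPool<T>(create: () => T): () => T {
  let pool: T | undefined;
  return () => {
    if (pool === undefined) {
      pool = create(); // created once, then reused
    }
    return pool;
  };
}
```

In practice the factory would wrap something like mysql2's createPool (e.g. `cachedPool(() => mysql.createPool(url))`, assuming the mysql2/promise import).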

Installation

bun add mysql2

Configuration

  Field          Type    Required  Default  Description
  url            string  Yes       --       Connection URL. Must start with mysql:// or mysql2://
  poolSize       number  No        10       Maximum pool size (1--500)
  idleTimeoutMs  number  No        30000    Idle connection timeout in milliseconds

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { mysqlPlugin } from "@atlas/plugin-mysql-datasource";

export default defineConfig({
  plugins: [
    mysqlPlugin({
      url: process.env.MYSQL_URL!,
      poolSize: 20,
    }),
  ],
});

Atlas also supports MySQL natively via ATLAS_DATASOURCE_URL (when the URL starts with mysql://). Use this plugin when you need MySQL as an additional datasource alongside the native connection, or when you want explicit pool configuration.

SQL Dialect

  • Use LIMIT before OFFSET (e.g. LIMIT 10 OFFSET 20)
  • Use backtick quoting for identifiers (e.g. `table_name`)
  • Use DATE_FORMAT() instead of TO_CHAR() for date formatting
  • Use IFNULL() instead of COALESCE() for two-argument null handling
  • Use STR_TO_DATE() for string-to-date conversion
  • GROUP_CONCAT() for string aggregation (not STRING_AGG)
  • Use NOW() for current timestamp, CURDATE() for current date

Validation

The plugin uses the standard SQL validation pipeline with MySQL parser mode (auto-detected from dbType: "mysql"). No additional forbidden patterns are needed beyond the base DML/DDL guard.

Security

Every query runs in a session set to READ ONLY mode (SET SESSION TRANSACTION READ ONLY), providing defense-in-depth beyond SQL validation. Per-query timeout is enforced via SET SESSION MAX_EXECUTION_TIME.


Snowflake

Connects to a Snowflake account via the snowflake-sdk callback-based API. The plugin manages a connection pool internally (snowflake.createPool()) and sets a session-level statement timeout and audit tag before each query.

Installation

bun add snowflake-sdk

Configuration

  Field           Type    Required  Default  Description
  url             string  Yes       --       Connection URL in the format snowflake://user:pass@account/database/schema?warehouse=WH&role=ROLE
  maxConnections  number  No        10       Maximum pool connections (1--100)

The URL is parsed into Snowflake SDK connection options:

  URL component   Maps to
  hostname        account (e.g. xy12345 or xy12345.us-east-1)
  username        username
  password        password
  path segment 1  database (optional)
  path segment 2  schema (optional)
  ?warehouse=     warehouse (optional)
  ?role=          role (optional)
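This mapping can be sketched with the WHATWG URL parser, which handles the snowflake:// scheme's authority and query string (parseSnowflakeUrl is an illustrative name, not the plugin's export):

```typescript
// Sketch of parsing a snowflake:// URL into SDK connection options.
function parseSnowflakeUrl(raw: string) {
  const u = new URL(raw);
  // pathname is "/database/schema"; both segments are optional.
  const [database, schema] = u.pathname.split("/").filter(Boolean);
  return {
    account: u.hostname,
    username: decodeURIComponent(u.username),
    password: decodeURIComponent(u.password),
    database,
    schema,
    warehouse: u.searchParams.get("warehouse") ?? undefined,
    role: u.searchParams.get("role") ?? undefined,
  };
}
```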

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { snowflakePlugin } from "@atlas/plugin-snowflake-datasource";

export default defineConfig({
  plugins: [
    snowflakePlugin({
      url: process.env.SNOWFLAKE_URL!,
      // e.g. "snowflake://analyst:pass@xy12345.us-east-1/ANALYTICS/PUBLIC?warehouse=COMPUTE_WH&role=ATLAS_READONLY"
    }),
  ],
});

SQL Dialect

  • Use FLATTEN() for semi-structured data (VARIANT, ARRAY, OBJECT columns)
  • Use PARSE_JSON() to cast strings to semi-structured types
  • Use DATE_TRUNC('month', col) for date truncation (not EXTRACT or dateadd patterns)
  • Use QUALIFY for window function filtering (e.g. QUALIFY ROW_NUMBER() OVER (...) = 1)
  • Use TRY_CAST() for safe type conversions that return NULL on failure
  • Use $$ for dollar-quoted string literals
  • Identifiers are case-insensitive and stored uppercase by default
  • VARIANT type supports semi-structured data -- use :key or ['key'] notation to access fields

Validation

The plugin blocks Snowflake-specific statements beyond the standard SQL pipeline: PUT, GET, LIST, REMOVE, RM, MERGE, SHOW, DESCRIBE, DESC, EXPLAIN, and USE. The AST parser uses Snowflake mode (native support in node-sql-parser).

Security

Snowflake has no session-level read-only mode. Atlas enforces SELECT-only access via SQL validation (regex + AST parsing), but there is no database-level defense-in-depth like ClickHouse's readonly: 1 or MySQL's SET SESSION TRANSACTION READ ONLY.

For defense-in-depth, configure the Snowflake connection with a role that has only SELECT privileges:

CREATE ROLE atlas_readonly;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE atlas_readonly;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE atlas_readonly;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;

Then pass ?role=ATLAS_READONLY in the connection URL.

All Atlas queries are tagged with QUERY_TAG = 'atlas:readonly' for audit trail visibility in Snowflake's QUERY_HISTORY. Statement timeout is enforced via ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS.


Salesforce

Connects to Salesforce via the jsforce library. Unlike the other datasource plugins, Salesforce uses SOQL (Salesforce Object Query Language) instead of SQL. The plugin registers a dedicated querySalesforce agent tool -- the agent uses this tool instead of executeSQL when querying Salesforce data.

Installation

bun add jsforce

Configuration

  Field  Type    Required  Default  Description
  url    string  Yes       --       Connection URL in the format salesforce://user:pass@login.salesforce.com?token=TOKEN

The URL is parsed into Salesforce connection options:

  URL component   Maps to
  hostname        loginUrl (defaults to login.salesforce.com; use test.salesforce.com for sandboxes)
  username        username
  password        password
  ?token=         securityToken (appended to password for login)
  ?clientId=      clientId (for OAuth Connected App)
  ?clientSecret=  clientSecret (for OAuth Connected App)

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { salesforcePlugin } from "@atlas/plugin-salesforce-datasource";

export default defineConfig({
  plugins: [
    salesforcePlugin({
      url: process.env.SALESFORCE_URL!,
      // e.g. "salesforce://analyst@company.com:password@login.salesforce.com?token=SECURITY_TOKEN"
    }),
  ],
});

The querySalesforce Tool

The plugin automatically registers a querySalesforce tool during initialization. The agent uses this tool instead of executeSQL for Salesforce queries. The tool:

  • Validates SOQL structure (SELECT-only, no DML, no semicolons)
  • Checks the object whitelist against the semantic layer
  • Auto-appends a LIMIT clause (respects ATLAS_ROW_LIMIT, default 1000)
  • Enforces query timeout (respects ATLAS_QUERY_TIMEOUT, default 30s)
  • Scrubs sensitive error messages (credentials, session IDs) before returning them to the agent
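The LIMIT-appending step can be sketched as follows (ensureLimit and the exact regex are illustrative; the default of 1000 mirrors ATLAS_ROW_LIMIT's default):

```typescript
// Sketch of auto-appending a LIMIT clause: only add one when the
// query does not already end with a LIMIT.
function ensureLimit(soql: string, rowLimit = 1000): string {
  const trimmed = soql.trim();
  if (/\bLIMIT\s+\d+\s*$/i.test(trimmed)) {
    return trimmed; // caller already set a limit
  }
  return `${trimmed} LIMIT ${rowLimit}`;
}
```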

SOQL Dialect

SOQL is not SQL. The agent receives these hints to write valid SOQL:

  • SOQL queries Salesforce objects, not database tables
  • No JOINs -- use relationship queries instead (e.g. SELECT Account.Name FROM Contact)
  • Parent-to-child: subquery in SELECT (e.g. SELECT Id, (SELECT LastName FROM Contacts) FROM Account)
  • Child-to-parent: dot notation (e.g. SELECT Account.Name FROM Contact)
  • Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX(), COUNT_DISTINCT()
  • GROUP BY and HAVING are supported
  • Date literals: YESTERDAY, TODAY, LAST_WEEK, THIS_MONTH, LAST_N_DAYS:n, etc.
  • No wildcards in field lists -- always list specific fields (no SELECT *)

Validation

Salesforce uses a custom SOQL validation pipeline instead of the standard node-sql-parser-based SQL validation:

  1. Empty check -- reject empty/whitespace queries
  2. Semicolons -- reject statement chaining
  3. Regex mutation guard -- block INSERT, UPDATE, DELETE, UPSERT, MERGE, UNDELETE (string literals are stripped first to prevent false positives)
  4. SELECT-only -- query must start with SELECT
  5. Object whitelist -- objects in FROM clauses must be in the semantic layer (case-insensitive). Parent-to-child relationship subqueries in the SELECT list are skipped since they use relationship names, not object names
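The five steps can be sketched as a single function (names, error strings, and the simplified subquery handling are illustrative, not the plugin's actual code):

```typescript
// Sketch of the SOQL validation pipeline; returns an error string, or
// null when the query passes all five checks.
function validateSoql(soql: string, allowedObjects: string[]): string | null {
  const trimmed = soql.trim();
  if (!trimmed) return "Empty query";                          // 1. empty check
  if (trimmed.includes(";")) return "Semicolons not allowed";  // 2. no chaining
  // 3. strip string literals first, then scan for mutation keywords
  const noStrings = trimmed.replace(/'(?:[^'\\]|\\.)*'/g, "''");
  if (/\b(INSERT|UPDATE|DELETE|UPSERT|MERGE|UNDELETE)\b/i.test(noStrings)) {
    return "Mutating statements are not allowed";
  }
  if (!/^SELECT\b/i.test(trimmed)) return "Only SELECT is allowed"; // 4. SELECT-only
  // 5. whitelist the top-level FROM object; blank out parenthesized
  // relationship subqueries first so their FROMs are skipped
  const noSubqueries = noStrings.replace(/\([^()]*\)/g, "()");
  const m = noSubqueries.match(/\bFROM\s+([A-Za-z0-9_]+)/i);
  if (m && !allowedObjects.some((o) => o.toLowerCase() === m[1].toLowerCase())) {
    return `Object not in semantic layer: ${m[1]}`;
  }
  return null;
}
```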

Security

The jsforce session is stateful and cached -- the plugin handles session expiry by automatically re-authenticating and retrying the query. Error messages are scrubbed for sensitive patterns (passwords, credentials, session IDs, login errors) before being returned to the agent or user.

Salesforce enforces object-level and field-level security server-side based on the connected user's profile and permission sets. Use a dedicated integration user with read-only access to the objects the agent should query.


Comparison

  Feature                ClickHouse             DuckDB                 MySQL              Snowflake            Salesforce
  Driver package         @clickhouse/client     @duckdb/node-api       mysql2             snowflake-sdk        jsforce
  Transport              HTTP                   In-process             TCP pool           SDK pool             REST API
  Query language         ClickHouse SQL         DuckDB SQL             MySQL SQL          Snowflake SQL        SOQL
  Parser dialect         PostgreSQL (closest)   PostgreSQL (closest)   MySQL              Snowflake            Custom (regex)
  Read-only enforcement  readonly: 1 per query  READ_ONLY access mode  READ ONLY session  SQL validation only  SELECT-only validation
  Agent tool             executeSQL             executeSQL             executeSQL         executeSQL           querySalesforce
  Connection caching     No (stateless HTTP)    Yes (in-process)       Yes (pool)         Yes (pool)           Yes (session)
