Atlas

Datasource Plugins

Connect ClickHouse, DuckDB, MySQL, Snowflake, and Salesforce as Atlas query targets.

Atlas supports PostgreSQL and MySQL natively via ATLAS_DATASOURCE_URL. For other databases and APIs, use datasource plugins. Each plugin provides a connection factory, SQL dialect hints for the agent, query validation rules, and health checks.

All datasource plugins follow the same pattern:

  1. Install the database driver (optional peer dependency)
  2. Import the plugin factory in atlas.config.ts
  3. Pass connection credentials via environment variables

Never commit credentials to version control. Use environment variables (process.env.CLICKHOUSE_URL) in atlas.config.ts and add .env to .gitignore.


ClickHouse

Connects to a ClickHouse instance via the HTTP transport provided by @clickhouse/client. The HTTP transport is stateless -- a new client is created per call, so no connection pooling is needed. Every query is executed with readonly: 1 at the ClickHouse settings level, providing database-enforced read-only access.

Installation

bun add @clickhouse/client

Configuration

  Field     Type    Required  Default        Description
  url       string  Yes       --             Connection URL. Must start with clickhouse:// (plain) or clickhouses:// (TLS)
  database  string  No        from URL path  Database name override

The plugin rewrites clickhouse:// to http:// and clickhouses:// to https:// for the HTTP transport internally.
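That rewrite is simple prefix substitution; a minimal sketch (toHttpUrl is an illustrative name, not a plugin export):

```typescript
// Sketch of the scheme rewrite the plugin performs internally.
function toHttpUrl(url: string): string {
  if (url.startsWith("clickhouses://")) {
    // TLS variant maps to HTTPS
    return "https://" + url.slice("clickhouses://".length);
  }
  if (url.startsWith("clickhouse://")) {
    // plain variant maps to HTTP
    return "http://" + url.slice("clickhouse://".length);
  }
  throw new Error("URL must start with clickhouse:// or clickhouses://");
}
```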

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { clickhousePlugin } from "@atlas/plugin-clickhouse-datasource";

export default defineConfig({
  plugins: [
    clickhousePlugin({
      url: process.env.CLICKHOUSE_URL!,
      database: "analytics",
    }),
  ],
});

SQL Dialect

The following hints are injected into the agent's system prompt so it writes valid ClickHouse SQL:

  • Use toStartOfMonth(), toStartOfWeek() for date truncation (not DATE_TRUNC)
  • Use countIf(condition) instead of COUNT(CASE WHEN ... END)
  • Use sumIf(column, condition) instead of SUM(CASE WHEN ... END)
  • Use arrayJoin() to unnest arrays
  • String functions: lower(), upper(), trim(), splitByChar()
  • Do not add FORMAT clauses -- the adapter handles output format automatically
  • ClickHouse is column-oriented -- avoid SELECT * on wide tables

Validation

In addition to the standard SQL validation pipeline (regex guard + AST parse + table whitelist), the plugin blocks ClickHouse-specific admin commands: SYSTEM, KILL, ATTACH, DETACH, RENAME, EXCHANGE, SHOW, DESCRIBE, EXPLAIN, and USE. The AST parser uses PostgreSQL mode (the closest available match in node-sql-parser).
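A minimal sketch of such a keyword guard, layered on top of the standard pipeline (the function name and exact regex are illustrative, not the plugin's actual code):

```typescript
// ClickHouse-specific admin commands blocked beyond the standard pipeline.
const CLICKHOUSE_BLOCKED = [
  "SYSTEM", "KILL", "ATTACH", "DETACH", "RENAME",
  "EXCHANGE", "SHOW", "DESCRIBE", "EXPLAIN", "USE",
];

function hasBlockedClickHouseCommand(sql: string): boolean {
  // Match each keyword as a whole word, case-insensitively.
  return CLICKHOUSE_BLOCKED.some((kw) =>
    new RegExp(`\\b${kw}\\b`, "i").test(sql)
  );
}
```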

Security

ClickHouse enforces read-only access at the query level via the readonly: 1 setting on every request. Statement timeout is set per query via max_execution_time.


DuckDB

Connects to a DuckDB database -- either a file on disk or an in-memory instance. DuckDB runs in-process via @duckdb/node-api, so there is no network connection or connection pooling. The plugin uses lazy initialization with a cached connection promise and automatic retry on transient failures.

Installation

bun add @duckdb/node-api

Configuration

  Field     Type     Required  Default                             Description
  url       string   No        --                                  Connection URL starting with duckdb://. Use duckdb://:memory: for in-memory, duckdb:///absolute/path.duckdb for absolute paths, or duckdb://relative/path.duckdb for relative paths
  path      string   No        --                                  Direct path to a .duckdb file, or :memory: for in-memory. Alternative to url
  readOnly  boolean  No        true for files, false for :memory:  Open the database in read-only mode

Either url or path is required. When both are provided, url takes precedence.
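Under those URL rules, everything after the duckdb:// scheme is the path itself; a minimal sketch (resolveDuckDbPath is an illustrative name, not the plugin's API):

```typescript
// Sketch of duckdb:// URL resolution: the remainder after the scheme
// is the path. ":memory:" stays special, a leading slash means an
// absolute path, anything else is a relative path.
function resolveDuckDbPath(url: string): string {
  const prefix = "duckdb://";
  if (!url.startsWith(prefix)) {
    throw new Error("URL must start with duckdb://");
  }
  return url.slice(prefix.length);
}
```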

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { duckdbPlugin } from "@atlas/plugin-duckdb-datasource";

export default defineConfig({
  plugins: [
    duckdbPlugin({ url: "duckdb://analytics.duckdb" }),
  ],
});

Or with a direct path:

duckdbPlugin({ path: "/data/warehouse.duckdb", readOnly: true })

SQL Dialect

  • DuckDB syntax is similar to PostgreSQL with additional features
  • Use UNNEST() to expand arrays into rows
  • LIST and STRUCT types are natively supported
  • File-reading functions (read_csv_auto, read_parquet, etc.) are blocked for security
  • Use DATE_TRUNC() and DATE_PART() for date operations
  • Use STRING_AGG() for string aggregation
  • Supports window functions, CTEs, and lateral joins
  • In-process engine -- no connection pooling needed

Validation

The plugin blocks DuckDB-specific statements beyond the standard SQL pipeline: PRAGMA, ATTACH, DETACH, INSTALL, EXPORT, IMPORT, CHECKPOINT, SET, DESCRIBE, EXPLAIN, and SHOW. File-reading table functions (read_csv_auto, read_csv, read_parquet, read_json, read_json_auto, read_text, parquet_scan, csv_scan, json_scan) are blocked to prevent host filesystem access. The AST parser uses PostgreSQL mode.
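The file-function block can be sketched as a word-boundary check for a call to any blocked name (usesFileFunction is an illustrative name; the real plugin's matching may differ):

```typescript
// Table functions that would read from the host filesystem.
const BLOCKED_FILE_FUNCTIONS = [
  "read_csv_auto", "read_csv", "read_parquet", "read_json",
  "read_json_auto", "read_text", "parquet_scan", "csv_scan", "json_scan",
];

function usesFileFunction(sql: string): boolean {
  const lowered = sql.toLowerCase();
  // A blocked name immediately followed by "(" indicates a call.
  return BLOCKED_FILE_FUNCTIONS.some((fn) =>
    new RegExp(`\\b${fn}\\s*\\(`).test(lowered)
  );
}
```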

Security

File-based databases default to READ_ONLY access mode, which provides database-enforced read-only access. In-memory databases cannot use read-only mode (nothing to protect), so they rely on SQL validation. DuckDB has no native query timeout API -- timeout is enforced via Promise.race().
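Because DuckDB exposes no native timeout, the query is raced against a timer. A minimal sketch of that pattern (withTimeout is an illustrative helper, not the plugin's API):

```typescript
// Sketch of timeout enforcement via Promise.race.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Query timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Note that Promise.race only rejects the caller's promise; the underlying in-process query is not cancelled and runs to completion.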

DuckDB runs in the same process as Atlas. File-reading functions are blocked at the validation layer to prevent the agent from reading arbitrary files on the host filesystem.


MySQL

Connects to a MySQL instance via a mysql2/promise connection pool. The pool is cached per plugin instance -- unlike stateless HTTP transports, MySQL pools are heavyweight resources that should be reused.
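The pool caching described here is plain memoization; a sketch with the pool construction abstracted into a factory (cachedPool is an illustrative name, not the plugin's code):

```typescript
// Sketch of per-instance pool caching: the heavyweight resource is
// created on first use and reused on every later call.
function cachedPool<T>(create: () => T): () => T {
  let pool: T | undefined;
  return () => {
    if (pool === undefined) {
      pool = create(); // created once, then reused
    }
    return pool;
  };
}
```

In practice the factory would wrap something like mysql2's createPool (e.g. `cachedPool(() => mysql.createPool(url))`, assuming the mysql2/promise import).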

Installation

bun add mysql2

Configuration

  Field          Type    Required  Default  Description
  url            string  Yes       --       Connection URL. Must start with mysql:// or mysql2://
  poolSize       number  No        10       Maximum pool size (1--500)
  idleTimeoutMs  number  No        30000    Idle connection timeout in milliseconds

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { mysqlPlugin } from "@atlas/plugin-mysql-datasource";

export default defineConfig({
  plugins: [
    mysqlPlugin({
      url: process.env.MYSQL_URL!,
      poolSize: 20,
    }),
  ],
});

Atlas also supports MySQL natively via ATLAS_DATASOURCE_URL (when the URL starts with mysql://). Use this plugin when you need MySQL as an additional datasource alongside the native connection, or when you want explicit pool configuration.

SQL Dialect

  • Use LIMIT before OFFSET (e.g. LIMIT 10 OFFSET 20)
  • Use backtick quoting for identifiers (e.g. `table_name`)
  • Use DATE_FORMAT() instead of TO_CHAR() for date formatting
  • Use IFNULL() instead of COALESCE() for two-argument null handling
  • Use STR_TO_DATE() for string-to-date conversion
  • GROUP_CONCAT() for string aggregation (not STRING_AGG)
  • Use NOW() for current timestamp, CURDATE() for current date

Validation

The plugin uses the standard SQL validation pipeline with MySQL parser mode (auto-detected from dbType: "mysql"). No additional forbidden patterns are needed beyond the base DML/DDL guard.

Security

Every query runs in a session set to READ ONLY mode (SET SESSION TRANSACTION READ ONLY), providing defense-in-depth beyond SQL validation. Per-query timeout is enforced via SET SESSION MAX_EXECUTION_TIME.


Snowflake

Connects to a Snowflake account via the snowflake-sdk callback-based API. The plugin manages a connection pool internally (snowflake.createPool()) and sets a session-level statement timeout and audit tag before each query.

Installation

bun add snowflake-sdk

Configuration

  Field           Type    Required  Default  Description
  url             string  Yes       --       Connection URL in the format snowflake://user:pass@account/database/schema?warehouse=WH&role=ROLE
  maxConnections  number  No        10       Maximum pool connections (1--100)

The URL is parsed into Snowflake SDK connection options:

  URL component   Maps to
  hostname        account (e.g. xy12345 or xy12345.us-east-1)
  username        username
  password        password
  path segment 1  database (optional)
  path segment 2  schema (optional)
  ?warehouse=     warehouse (optional)
  ?role=          role (optional)
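This mapping can be sketched with the WHATWG URL parser, which handles the snowflake:// scheme's authority and query string (parseSnowflakeUrl is an illustrative name, not the plugin's export):

```typescript
// Sketch of parsing a snowflake:// URL into SDK connection options.
function parseSnowflakeUrl(raw: string) {
  const u = new URL(raw);
  // pathname is "/database/schema"; both segments are optional.
  const [database, schema] = u.pathname.split("/").filter(Boolean);
  return {
    account: u.hostname,
    username: decodeURIComponent(u.username),
    password: decodeURIComponent(u.password),
    database,
    schema,
    warehouse: u.searchParams.get("warehouse") ?? undefined,
    role: u.searchParams.get("role") ?? undefined,
  };
}
```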

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { snowflakePlugin } from "@atlas/plugin-snowflake-datasource";

export default defineConfig({
  plugins: [
    snowflakePlugin({
      url: process.env.SNOWFLAKE_URL!,
      // e.g. "snowflake://analyst:pass@xy12345.us-east-1/ANALYTICS/PUBLIC?warehouse=COMPUTE_WH&role=ATLAS_READONLY"
    }),
  ],
});

SQL Dialect

  • Use FLATTEN() for semi-structured data (VARIANT, ARRAY, OBJECT columns)
  • Use PARSE_JSON() to cast strings to semi-structured types
  • Use DATE_TRUNC('month', col) for date truncation (not EXTRACT or dateadd patterns)
  • Use QUALIFY for window function filtering (e.g. QUALIFY ROW_NUMBER() OVER (...) = 1)
  • Use TRY_CAST() for safe type conversions that return NULL on failure
  • Use $$ for dollar-quoted string literals
  • Identifiers are case-insensitive and stored uppercase by default
  • VARIANT type supports semi-structured data -- use :key or ['key'] notation to access fields

Validation

The plugin blocks Snowflake-specific statements beyond the standard SQL pipeline: PUT, GET, LIST, REMOVE, RM, MERGE, SHOW, DESCRIBE, DESC, EXPLAIN, and USE. The AST parser uses Snowflake mode (native support in node-sql-parser).

Security

Snowflake has no session-level read-only mode. Atlas enforces SELECT-only access via SQL validation (regex + AST parsing), but there is no database-level defense-in-depth like ClickHouse's readonly: 1 or MySQL's SET SESSION TRANSACTION READ ONLY.

For defense-in-depth, configure the Snowflake connection with a role that has only SELECT privileges:

CREATE ROLE atlas_readonly;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE atlas_readonly;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE atlas_readonly;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE atlas_readonly;

Then pass ?role=ATLAS_READONLY in the connection URL.

All Atlas queries are tagged with QUERY_TAG = 'atlas:readonly' for audit trail visibility in Snowflake's QUERY_HISTORY. Statement timeout is enforced via ALTER SESSION SET STATEMENT_TIMEOUT_IN_SECONDS.


Salesforce

Connects to Salesforce via the jsforce library. Unlike the other datasource plugins, Salesforce uses SOQL (Salesforce Object Query Language) instead of SQL. The plugin registers a dedicated querySalesforce agent tool -- the agent uses this tool instead of executeSQL when querying Salesforce data.

Installation

bun add jsforce

Configuration

  Field  Type    Required  Default  Description
  url    string  Yes       --       Connection URL in the format salesforce://user:pass@login.salesforce.com?token=TOKEN

The URL is parsed into Salesforce connection options:

  URL component   Maps to
  hostname        loginUrl (defaults to login.salesforce.com; use test.salesforce.com for sandboxes)
  username        username
  password        password
  ?token=         securityToken (appended to password for login)
  ?clientId=      clientId (for OAuth Connected App)
  ?clientSecret=  clientSecret (for OAuth Connected App)

Usage

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { salesforcePlugin } from "@atlas/plugin-salesforce-datasource";

export default defineConfig({
  plugins: [
    salesforcePlugin({
      url: process.env.SALESFORCE_URL!,
      // e.g. "salesforce://analyst@company.com:password@login.salesforce.com?token=SECURITY_TOKEN"
    }),
  ],
});

The querySalesforce Tool

The plugin automatically registers a querySalesforce tool during initialization. The agent uses this tool instead of executeSQL for Salesforce queries. The tool:

  • Validates SOQL structure (SELECT-only, no DML, no semicolons)
  • Checks the object whitelist against the semantic layer
  • Auto-appends a LIMIT clause (respects ATLAS_ROW_LIMIT, default 1000)
  • Enforces query timeout (respects ATLAS_QUERY_TIMEOUT, default 30s)
  • Scrubs sensitive error messages (credentials, session IDs) before returning them to the agent
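The LIMIT-appending step can be sketched as follows (ensureLimit and the exact regex are illustrative; the default of 1000 mirrors ATLAS_ROW_LIMIT's default):

```typescript
// Sketch of auto-appending a LIMIT clause: only add one when the
// query does not already end with a LIMIT.
function ensureLimit(soql: string, rowLimit = 1000): string {
  const trimmed = soql.trim();
  if (/\bLIMIT\s+\d+\s*$/i.test(trimmed)) {
    return trimmed; // caller already set a limit
  }
  return `${trimmed} LIMIT ${rowLimit}`;
}
```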

SOQL Dialect

SOQL is not SQL. The agent receives these hints to write valid SOQL:

  • SOQL queries Salesforce objects, not database tables
  • No JOINs -- use relationship queries instead (e.g. SELECT Account.Name FROM Contact)
  • Parent-to-child: subquery in SELECT (e.g. SELECT Id, (SELECT LastName FROM Contacts) FROM Account)
  • Child-to-parent: dot notation (e.g. SELECT Account.Name FROM Contact)
  • Aggregate functions: COUNT(), SUM(), AVG(), MIN(), MAX(), COUNT_DISTINCT()
  • GROUP BY and HAVING are supported
  • Date literals: YESTERDAY, TODAY, LAST_WEEK, THIS_MONTH, LAST_N_DAYS:n, etc.
  • No wildcards in field lists -- always list specific fields (no SELECT *)

Validation

Salesforce uses a custom SOQL validation pipeline instead of the standard node-sql-parser-based SQL validation:

  1. Empty check -- reject empty/whitespace queries
  2. Semicolons -- reject statement chaining
  3. Regex mutation guard -- block INSERT, UPDATE, DELETE, UPSERT, MERGE, UNDELETE (string literals are stripped first to prevent false positives)
  4. SELECT-only -- query must start with SELECT
  5. Object whitelist -- objects in FROM clauses must be in the semantic layer (case-insensitive). Parent-to-child relationship subqueries in the SELECT list are skipped since they use relationship names, not object names
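The five steps can be sketched as a single function (names, error strings, and the simplified subquery handling are illustrative, not the plugin's actual code):

```typescript
// Sketch of the SOQL validation pipeline; returns an error string, or
// null when the query passes all five checks.
function validateSoql(soql: string, allowedObjects: string[]): string | null {
  const trimmed = soql.trim();
  if (!trimmed) return "Empty query";                          // 1. empty check
  if (trimmed.includes(";")) return "Semicolons not allowed";  // 2. no chaining
  // 3. strip string literals first, then scan for mutation keywords
  const noStrings = trimmed.replace(/'(?:[^'\\]|\\.)*'/g, "''");
  if (/\b(INSERT|UPDATE|DELETE|UPSERT|MERGE|UNDELETE)\b/i.test(noStrings)) {
    return "Mutating statements are not allowed";
  }
  if (!/^SELECT\b/i.test(trimmed)) return "Only SELECT is allowed"; // 4. SELECT-only
  // 5. whitelist the top-level FROM object; blank out parenthesized
  // relationship subqueries first so their FROMs are skipped
  const noSubqueries = noStrings.replace(/\([^()]*\)/g, "()");
  const m = noSubqueries.match(/\bFROM\s+([A-Za-z0-9_]+)/i);
  if (m && !allowedObjects.some((o) => o.toLowerCase() === m[1].toLowerCase())) {
    return `Object not in semantic layer: ${m[1]}`;
  }
  return null;
}
```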

Security

The jsforce session is stateful and cached -- the plugin handles session expiry by automatically re-authenticating and retrying the query. Error messages are scrubbed for sensitive patterns (passwords, credentials, session IDs, login errors) before being returned to the agent or user.

Salesforce enforces object-level and field-level security server-side based on the connected user's profile and permission sets. Use a dedicated integration user with read-only access to the objects the agent should query.


Comparison

  Feature                ClickHouse             DuckDB                 MySQL              Snowflake            Salesforce
  Driver package         @clickhouse/client     @duckdb/node-api       mysql2             snowflake-sdk        jsforce
  Transport              HTTP                   In-process             TCP pool           SDK pool             REST API
  Query language         ClickHouse SQL         DuckDB SQL             MySQL SQL          Snowflake SQL        SOQL
  Parser dialect         PostgreSQL (closest)   PostgreSQL (closest)   MySQL              Snowflake            Custom (regex)
  Read-only enforcement  readonly: 1 per query  READ_ONLY access mode  READ ONLY session  SQL validation only  SELECT-only validation
  Agent tool             executeSQL             executeSQL             executeSQL         executeSQL           querySalesforce
  Connection caching     No (stateless HTTP)    Yes (in-process)       Yes (pool)         Yes (pool)           Yes (session)
