SQL Validation Pipeline
How Atlas enforces read-only SQL execution through a multi-layer defense-in-depth pipeline.
Every SQL query the agent writes passes through a multi-layer validation pipeline before it reaches your database. No query can bypass validation — if any layer rejects, the query is blocked.
Pipeline overview
| Layer | Name | Purpose |
|---|---|---|
| 0 | Empty check | Reject empty or whitespace-only input |
| 1 | Regex mutation guard | Quick reject of obvious DML/DDL keywords |
| 2 | AST parse | Full SQL parse — verify single SELECT statement |
| 3 | Table whitelist | Only allow tables defined in the semantic layer |
Layers 0–3 run in validateSQL() before execution. Layers 4–6 below are applied during execution.
Execution-time controls
These are not validation layers — they are applied during query execution in executeSQL().
| Control | Purpose |
|---|---|
| Per-source rate limiting | Atomic concurrency + QPM check via acquireSourceSlot(connId) before the query reaches the database. Sits between validation (layers 0–3) and RLS injection (layer 4). Rejects with a rate-limit error (and an optional retryAfterMs hint) when the source is at capacity. Configurable via rateLimit in atlas.config.ts datasource entries (queriesPerMinute and concurrency fields; defaults 60 and 5 respectively) |
| RLS injection (layer 4) | Inject WHERE clauses based on user claims (when enabled) |
| Auto LIMIT (layer 5) | Append row limit to every query |
| Statement timeout (layer 6) | Per-query execution deadline |
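The rate-limit settings described in the table can be sketched as a datasource entry. This is a minimal illustration assuming a `datasources` map at the top level of atlas.config.ts; only the rateLimit, queriesPerMinute, and concurrency names come from the documentation above, the rest of the shape is an assumption.

```typescript
// atlas.config.ts — illustrative datasource entry with per-source rate limiting.
// Surrounding config shape is assumed; field names match the docs above.
export default {
  datasources: {
    analytics: {
      dbType: "postgresql",
      rateLimit: {
        queriesPerMinute: 120, // default 60
        concurrency: 10,       // default 5
      },
    },
  },
};
```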
The validation pipeline is identical across all supported databases. The only difference is the AST parser mode: node-sql-parser uses PostgresQL mode for PostgreSQL connections and MySQL mode for MySQL connections. Plugin-registered database types (any dbType beyond the two built-in ones) fall back to the PostgreSQL parser with a warning. Plugins can register a custom parserDialect via ConnectionPluginMeta to override the parser mode for any connection.
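The dialect selection above can be sketched as a small lookup. The dialect strings match node-sql-parser's database option values ("PostgresQL", "MySQL"); the function name and plugin-meta shape here are illustrative assumptions, not the actual Atlas internals.

```typescript
// Sketch of parser-dialect selection: plugin override first, then the
// built-in mapping, with a warning + PostgreSQL fallback for unknown types.
type ConnectionPluginMeta = { parserDialect?: string };

function pickParserDialect(dbType: string, plugin?: ConnectionPluginMeta): string {
  if (plugin?.parserDialect) return plugin.parserDialect; // plugin override wins
  if (dbType === "mysql") return "MySQL";
  if (dbType !== "postgresql") {
    console.warn(`Unknown dbType "${dbType}" — falling back to PostgreSQL parser`);
  }
  return "PostgresQL"; // node-sql-parser's spelling for its PostgreSQL dialect
}
```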
Layer 0: Empty check
Rejects empty strings and whitespace-only input. Trailing semicolons are stripped before processing, so a lone ";" is rejected as empty.
-- Blocked
""
" "
";"
Layer 1: Regex mutation guard
A fast first-pass check that rejects queries containing DML or DDL keywords. Comments are stripped before testing so they can't be used to bypass the patterns.
Blocked keywords (all databases):
INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, TRUNCATE
GRANT, REVOKE, EXEC, EXECUTE, CALL
COPY, LOAD, VACUUM, REINDEX, OPTIMIZE
INTO OUTFILE
Additional blocked keywords (MySQL only):
HANDLER, SHOW, DESCRIBE, EXPLAIN, USE
Examples:
-- Blocked: DML
INSERT INTO users (name) VALUES ('alice')
-- Blocked: DDL
DROP TABLE users
-- Blocked: comment bypass attempt (comments stripped before check)
/* harmless */ DELETE FROM users
-- Blocked: keyword inside string literal (known false positive)
SELECT * FROM logs WHERE status = 'DELETE'
The regex guard is intentionally broad — security over usability. Keywords inside string literals (like WHERE status = 'DELETE') are blocked as a false positive. The agent can work around this by reformulating the query (e.g. using a subquery or different filter).
Plugins can register additional forbidden patterns via ConnectionPluginMeta.forbiddenPatterns to add database-specific protection for connection types not covered by the built-in rules.
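The comment-stripping-then-match behavior above can be sketched in a few lines. This is a minimal illustration, not the real validateSQL code: the pattern list is abbreviated, and the word boundaries shown keep identifiers like created_at from matching CREATE.

```typescript
// Sketch of the regex mutation guard: strip comments first so they
// can't hide keywords, then reject on any forbidden keyword.
const FORBIDDEN =
  /\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|GRANT|REVOKE|EXEC|EXECUTE|CALL)\b/i;

function stripComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

function regexGuard(sql: string): boolean {
  return !FORBIDDEN.test(stripComments(sql)); // true = passes this layer
}
```

Note that the string-literal false positive falls out of this design: the guard never tokenizes, so 'DELETE' inside a literal still matches.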
Layer 2: AST parse
The query is fully parsed into an abstract syntax tree using node-sql-parser. The parser mode is auto-detected based on the connection type (PostgreSQL or MySQL). If a plugin has registered a custom parserDialect via ConnectionPluginMeta, that dialect is used instead of the default mapping.
Enforced rules:
- Single statement — No semicolon-separated batches. Exactly one statement allowed.
- SELECT only — The parsed statement type must be select. Any other type (INSERT, UPDATE, CREATE, etc.) is rejected.
- Must parse — If the parser cannot understand the query, it is rejected. A query that passes the regex guard but confuses the parser could be a crafted bypass attempt. The agent can always reformulate into standard SQL.
-- Blocked: multiple statements
SELECT 1; DROP TABLE users
-- Blocked: non-SELECT
CREATE TABLE evil (id INT)
-- Blocked: unparseable (rejected, not allowed through)
SELECT * FROM users UNION WEIRD SYNTAX
-- Allowed: CTEs (WITH clause)
WITH active AS (SELECT * FROM users WHERE active = true)
SELECT * FROM active
CTE names (WITH x AS (...)) are collected during parsing for use in the table whitelist check (layer 3).
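The post-parse rules can be sketched as a check over a node-sql-parser-style AST (an array of statements, each carrying a type field). The parsing step itself is omitted here, and the statement shapes are illustrative assumptions rather than the library's exact output.

```typescript
// Sketch of layer 2's checks on an already-parsed statement list:
// exactly one statement, type "select", and CTE names collected
// for the layer-3 whitelist check.
type Stmt = { type: string; with?: { name: { value: string } }[] };

function checkAst(stmts: Stmt[]): { ok: boolean; cteNames: string[] } {
  if (stmts.length !== 1) return { ok: false, cteNames: [] }; // no batches
  const stmt = stmts[0];
  if (stmt.type !== "select") return { ok: false, cteNames: [] }; // SELECT only
  const cteNames = (stmt.with ?? []).map((c) => c.name.value);
  return { ok: true, cteNames };
}
```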
Layer 3: Table whitelist
Every table referenced in the query must exist in the semantic layer (semantic/entities/*.yml or semantic/{source}/entities/*.yml). Tables not in the whitelist are blocked.
-- Given semantic layer defines: users, orders, products
-- Allowed
SELECT * FROM users JOIN orders ON users.id = orders.user_id
-- Blocked: "secrets" not in semantic layer
SELECT * FROM secrets
-- Blocked: schema-qualified reference requires qualified name in whitelist
SELECT * FROM internal.secrets
Schema-qualified queries: If a query uses schema.table syntax (e.g. analytics.orders), the qualified name must be in the whitelist. Unqualified names cannot bypass schema restrictions.
CTE exclusion: CTE names collected in layer 2 are excluded from the whitelist check — they are temporary result sets defined by the query, not real tables.
Disabling: Set ATLAS_TABLE_WHITELIST=false to disable (not recommended for production).
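The whitelist check reduces to a set-membership test once table references have been extracted from the AST. This sketch assumes names arrive as strings (schema-qualified where the query qualifies them) and skips the extraction step; function and parameter names are illustrative.

```typescript
// Sketch of layer 3: every referenced table must be whitelisted,
// except CTE names collected in layer 2. Returns the offending
// tables; an empty array means the query passes.
function checkWhitelist(
  referenced: string[],
  whitelist: Set<string>,
  cteNames: Set<string>,
): string[] {
  return referenced.filter((t) => !cteNames.has(t) && !whitelist.has(t));
}
```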
Layer 4: RLS injection
When Row-Level Security is enabled, WHERE clauses are injected based on the authenticated user's claims. RLS injection occurs after all validation layers (0–3) pass and after all plugin hooks have run. This ordering ensures plugins cannot strip RLS conditions.
RLS can be configured via environment variables (single-policy shorthand) or declaratively in atlas.config.ts with the rls key (supports multiple policies targeting different tables and claims).
-- Original query
SELECT * FROM orders WHERE status = 'active'
-- After RLS injection (tenant_id from user's JWT org_id claim)
SELECT * FROM orders WHERE status = 'active' AND orders.tenant_id = 'acme-corp'
RLS is fail-closed: if claims are missing or cannot be resolved, the query is blocked entirely.
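The injection above can be illustrated as a string rewrite. The real implementation presumably operates on the AST rather than raw SQL; the function name, parameters, and the naive quoting here are assumptions for illustration only. The fail-closed behavior is the part taken from the docs: a missing claim blocks the query.

```typescript
// Illustrative RLS injection: AND the tenant condition onto an existing
// WHERE clause, or add one. Throws (fail-closed) when the claim is absent.
function injectRls(
  sql: string,
  table: string,
  column: string,
  claim: string | undefined,
): string {
  if (claim === undefined) {
    throw new Error("RLS: required claim missing — query blocked");
  }
  const cond = `${table}.${column} = '${claim.replace(/'/g, "''")}'`;
  return /\bWHERE\b/i.test(sql) ? `${sql} AND ${cond}` : `${sql} WHERE ${cond}`;
}
```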
Layer 5: Auto LIMIT
A LIMIT clause is appended to every query that doesn't already have one. This prevents accidental full-table scans.
-- Input (no LIMIT)
SELECT * FROM users
-- Executed
SELECT * FROM users LIMIT 1000
Default: 1000 rows. Configure with ATLAS_ROW_LIMIT.
If the result set hits the limit, the response includes truncated: true so the agent knows data was cut off.
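The append-unless-present rule is small enough to sketch directly. A minimal illustration, assuming a simple regex test for an existing LIMIT; the real code may inspect the AST instead.

```typescript
// Sketch of layer 5: append a LIMIT unless the query already has one.
// The 1000 default mirrors ATLAS_ROW_LIMIT's documented default.
function autoLimit(sql: string, rowLimit = 1000): string {
  return /\bLIMIT\s+\d+/i.test(sql) ? sql : `${sql} LIMIT ${rowLimit}`;
}
```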
Layer 6: Statement timeout
A per-query execution deadline prevents long-running queries from consuming database resources.
- PostgreSQL: Session-level statement_timeout set on the connection
- MySQL: Session-level timeout variable
Default: 30 seconds (30000ms). Configure with ATLAS_QUERY_TIMEOUT.
Queries that exceed the timeout are terminated by the database and return an error to the agent.
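The session setup can be sketched as the statement issued per connection. statement_timeout is PostgreSQL's native setting (milliseconds); mapping the unnamed MySQL "timeout variable" above to max_execution_time (MySQL's per-session SELECT deadline, also in milliseconds) is an assumption of this sketch.

```typescript
// Sketch of layer 6: build the session-timeout statement for a connection.
// MySQL variable choice (max_execution_time) is an assumption, not from the docs.
function timeoutStatement(dbType: "postgresql" | "mysql", ms = 30000): string {
  return dbType === "postgresql"
    ? `SET statement_timeout = ${ms}`
    : `SET SESSION max_execution_time = ${ms}`;
}
```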
Plugin hooks
When plugins are installed, two hook points extend the pipeline:
- beforeQuery — Plugins can inspect or rewrite SQL before execution. If a plugin rewrites the query, the rewritten SQL goes through layers 0–3 again for re-validation. This prevents plugins from introducing DML or bypassing the whitelist.
- afterQuery — Plugins receive the query results for logging, transformation, or side effects.
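The re-validation rule for beforeQuery can be sketched as a loop over installed plugins. The hook signature and the stand-in validate callback are illustrative assumptions; in Atlas the re-check would run the real layers 0–3.

```typescript
// Sketch of the beforeQuery chain: each plugin may rewrite the SQL,
// and any rewrite must pass validation again before proceeding.
type Plugin = { beforeQuery?: (sql: string) => string };

function runBeforeQuery(
  sql: string,
  plugins: Plugin[],
  validate: (s: string) => boolean, // stand-in for layers 0–3
): string {
  for (const p of plugins) {
    const next = p.beforeQuery ? p.beforeQuery(sql) : sql;
    if (next !== sql && !validate(next)) {
      throw new Error("plugin rewrite failed re-validation"); // no DML sneaks in
    }
    sql = next;
  }
  return sql;
}
```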
Custom validators: For non-SQL datasources (SOQL, GraphQL, MQL), a plugin can provide a customValidator via ConnectionPluginMeta.validate. When a connection has a custom validator registered, it completely replaces the built-in validation layers 0–3 — validateSQL is not called. RLS injection (layer 4) and auto LIMIT (layer 5) are also skipped, since non-SQL languages may not support standard WHERE/LIMIT syntax. The custom validator is responsible for providing equivalent safety guarantees, including any row-level isolation.
Error handling
When a query fails at any validation layer, the response includes a structured error:
{
"success": false,
"error": "Table \"secrets\" is not in the allowed list. Check catalog.yml for available tables."
}
Database errors that might expose connection details or internal state (passwords, connection strings, SSL certificates) are automatically scrubbed before being returned to the agent. The full error is logged server-side.
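Scrubbing can be illustrated as pattern-based redaction over the error message. The patterns below are assumptions for the sketch; the documentation does not specify what the real scrubber matches.

```typescript
// Illustrative error scrubbing: redact things that look like connection
// strings or inline passwords before the error reaches the agent.
function scrubError(message: string): string {
  return message
    .replace(/postgres(ql)?:\/\/\S+/gi, "[redacted connection string]")
    .replace(/password\s*=\s*\S+/gi, "password=[redacted]");
}
```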
All queries — successful and failed — are recorded in the audit log with user attribution, timing, and error details.
Test coverage
The validation pipeline has ~103 unit tests covering edge cases including:
- Comment-based bypass attempts
- CTE name exclusion
- Schema-qualified table references
- MySQL-specific forbidden patterns
- Semicolon injection
- Unparseable SQL rejection
- String literal false positives
See packages/api/src/lib/tools/__tests__/sql.test.ts for the full test suite.
Run the tests: bun run test (isolated per-file runner).