SQL Validation Pipeline
How Atlas enforces read-only SQL execution through a multi-layer defense-in-depth pipeline.
Every SQL query the agent writes passes through a multi-layer validation pipeline before it reaches your database. No query can bypass validation — if any layer rejects, the query is blocked.
Pipeline overview
| Layer | Name | Purpose |
|---|---|---|
| 0 | Empty check | Reject empty or whitespace-only input |
| 1 | Regex mutation guard | Quick reject of obvious DML/DDL keywords |
| 2 | AST parse | Full SQL parse — verify single SELECT statement |
| 3 | Table whitelist | Only allow tables defined in the semantic layer |
Layers 0–3 run in validateSQL() before execution. Layers 4–6 below are applied during execution.
Execution-time controls
These are not validation layers — they are applied during query execution in executeSQL().
| Control | Purpose |
|---|---|
| Per-source rate limiting | Atomic concurrency + QPM check via acquireSourceSlot(connId) before the query reaches the database. Sits between validation (layers 0–3) and RLS injection (layer 4). Rejects with a rate-limit error (and an optional retryAfterMs hint) when the source is at capacity. Configurable via rateLimit in atlas.config.ts datasource entries (queriesPerMinute and concurrency fields; defaults 60 and 5 respectively) |
| RLS injection (layer 4) | Inject WHERE clauses based on user claims (when enabled) |
| Auto LIMIT (layer 5) | Append row limit to every query |
| Statement timeout (layer 6) | Per-query execution deadline |
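The rate-limit settings described in the table can be sketched as a datasource entry. This is a minimal illustration assuming a `datasources` map at the top level of atlas.config.ts; only the rateLimit, queriesPerMinute, and concurrency names come from the documentation above, the rest of the shape is an assumption.

```typescript
// atlas.config.ts — illustrative datasource entry with per-source rate limiting.
// Surrounding config shape is assumed; field names match the docs above.
export default {
  datasources: {
    analytics: {
      dbType: "postgresql",
      rateLimit: {
        queriesPerMinute: 120, // default 60
        concurrency: 10,       // default 5
      },
    },
  },
};
```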
The validation pipeline is identical across all supported databases. The only difference is the AST parser mode: node-sql-parser uses PostgresQL mode for PostgreSQL connections and MySQL mode for MySQL connections. Plugin-registered database types (any dbType beyond the two built-in ones) fall back to the PostgreSQL parser with a warning. Plugins can register a custom parserDialect via ConnectionPluginMeta to override the parser mode for any connection.
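The dialect selection above can be sketched as a small lookup. The dialect strings match node-sql-parser's database option values ("PostgresQL", "MySQL"); the function name and plugin-meta shape here are illustrative assumptions, not the actual Atlas internals.

```typescript
// Sketch of parser-dialect selection: plugin override first, then the
// built-in mapping, with a warning + PostgreSQL fallback for unknown types.
type ConnectionPluginMeta = { parserDialect?: string };

function pickParserDialect(dbType: string, plugin?: ConnectionPluginMeta): string {
  if (plugin?.parserDialect) return plugin.parserDialect; // plugin override wins
  if (dbType === "mysql") return "MySQL";
  if (dbType !== "postgresql") {
    console.warn(`Unknown dbType "${dbType}" — falling back to PostgreSQL parser`);
  }
  return "PostgresQL"; // node-sql-parser's spelling for its PostgreSQL dialect
}
```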
Layer 0: Empty check
Rejects empty strings and whitespace-only input. Trailing semicolons are stripped before processing, so a lone ";" is rejected as empty.
-- Blocked
""
" "
";"
Layer 1: Regex mutation guard
A fast first-pass check that rejects queries containing DML or DDL keywords. Comments are stripped before testing so they can't be used to bypass the patterns.
Blocked keywords (all databases):
INSERT, UPDATE, DELETE, DROP, CREATE, ALTER, TRUNCATE
GRANT, REVOKE, EXEC, EXECUTE, CALL
COPY, LOAD, VACUUM, REINDEX, OPTIMIZE
INTO OUTFILE
Additional blocked keywords (MySQL only):
HANDLER, SHOW, DESCRIBE, EXPLAIN, USE
Examples:
-- Blocked: DML
INSERT INTO users (name) VALUES ('alice')
-- Blocked: DDL
DROP TABLE users
-- Blocked: comment bypass attempt (comments stripped before check)
/* harmless */ DELETE FROM users
-- Blocked: keyword inside string literal (known false positive)
SELECT * FROM logs WHERE status = 'DELETE'
The regex guard is intentionally broad — security over usability. Keywords inside string literals (like WHERE status = 'DELETE') are blocked as a false positive. The agent can work around this by reformulating the query (e.g. using a subquery or different filter).
Plugins can register additional forbidden patterns via ConnectionPluginMeta.forbiddenPatterns to add database-specific protection for connection types not covered by the built-in rules.
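The comment-stripping-then-match behavior above can be sketched in a few lines. This is a minimal illustration, not the real validateSQL code: the pattern list is abbreviated, and the word boundaries shown keep identifiers like created_at from matching CREATE.

```typescript
// Sketch of the regex mutation guard: strip comments first so they
// can't hide keywords, then reject on any forbidden keyword.
const FORBIDDEN =
  /\b(INSERT|UPDATE|DELETE|DROP|CREATE|ALTER|TRUNCATE|GRANT|REVOKE|EXEC|EXECUTE|CALL)\b/i;

function stripComments(sql: string): string {
  return sql.replace(/\/\*[\s\S]*?\*\//g, " ").replace(/--[^\n]*/g, " ");
}

function regexGuard(sql: string): boolean {
  return !FORBIDDEN.test(stripComments(sql)); // true = passes this layer
}
```

Note that the string-literal false positive falls out of this design: the guard never tokenizes, so 'DELETE' inside a literal still matches.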
Layer 2: AST parse
The query is fully parsed into an abstract syntax tree using node-sql-parser. The parser mode is auto-detected based on the connection type (PostgreSQL or MySQL). If a plugin has registered a custom parserDialect via ConnectionPluginMeta, that dialect is used instead of the default mapping.
Enforced rules:
- Single statement — No semicolon-separated batches. Exactly one statement allowed.
- SELECT only — The parsed statement type must be select. Any other type (INSERT, UPDATE, CREATE, etc.) is rejected.
- Must parse — If the parser cannot understand the query, it is rejected. A query that passes the regex guard but confuses the parser could be a crafted bypass attempt. The agent can always reformulate into standard SQL.
-- Blocked: multiple statements
SELECT 1; DROP TABLE users
-- Blocked: non-SELECT
CREATE TABLE evil (id INT)
-- Blocked: unparseable (rejected, not allowed through)
SELECT * FROM users UNION WEIRD SYNTAX
-- Allowed: CTEs (WITH clause)
WITH active AS (SELECT * FROM users WHERE active = true)
SELECT * FROM active
CTE names (WITH x AS (...)) are collected during parsing for use in the table whitelist check (layer 3).
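The post-parse rules can be sketched as a check over a node-sql-parser-style AST (an array of statements, each carrying a type field). The parsing step itself is omitted here, and the statement shapes are illustrative assumptions rather than the library's exact output.

```typescript
// Sketch of layer 2's checks on an already-parsed statement list:
// exactly one statement, type "select", and CTE names collected
// for the layer-3 whitelist check.
type Stmt = { type: string; with?: { name: { value: string } }[] };

function checkAst(stmts: Stmt[]): { ok: boolean; cteNames: string[] } {
  if (stmts.length !== 1) return { ok: false, cteNames: [] }; // no batches
  const stmt = stmts[0];
  if (stmt.type !== "select") return { ok: false, cteNames: [] }; // SELECT only
  const cteNames = (stmt.with ?? []).map((c) => c.name.value);
  return { ok: true, cteNames };
}
```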
Layer 3: Table whitelist
Every table referenced in the query must exist in the semantic layer (semantic/entities/*.yml or semantic/{source}/entities/*.yml). Tables not in the whitelist are blocked.
-- Given semantic layer defines: users, orders, products
-- Allowed
SELECT * FROM users JOIN orders ON users.id = orders.user_id
-- Blocked: "secrets" not in semantic layer
SELECT * FROM secrets
-- Blocked: schema-qualified reference requires qualified name in whitelist
SELECT * FROM internal.secrets
Schema-qualified queries: If a query uses schema.table syntax (e.g. analytics.orders), the qualified name must be in the whitelist. Unqualified names cannot bypass schema restrictions.
CTE exclusion: CTE names collected in layer 2 are excluded from the whitelist check — they are temporary result sets defined by the query, not real tables.
Disabling: Set ATLAS_TABLE_WHITELIST=false to disable (not recommended for production).
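The whitelist check reduces to a set-membership test once table references have been extracted from the AST. This sketch assumes names arrive as strings (schema-qualified where the query qualifies them) and skips the extraction step; function and parameter names are illustrative.

```typescript
// Sketch of layer 3: every referenced table must be whitelisted,
// except CTE names collected in layer 2. Returns the offending
// tables; an empty array means the query passes.
function checkWhitelist(
  referenced: string[],
  whitelist: Set<string>,
  cteNames: Set<string>,
): string[] {
  return referenced.filter((t) => !cteNames.has(t) && !whitelist.has(t));
}
```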
Layer 4: RLS injection
When Row-Level Security is enabled, WHERE clauses are injected based on the authenticated user's claims. RLS injection occurs after all validation layers (0–3) pass and after all plugin hooks have run. This ordering ensures plugins cannot strip RLS conditions.
RLS can be configured via environment variables (single-policy shorthand) or declaratively in atlas.config.ts with the rls key (supports multiple policies targeting different tables and claims).
-- Original query
SELECT * FROM orders WHERE status = 'active'
-- After RLS injection (tenant_id from user's JWT org_id claim)
SELECT * FROM orders WHERE status = 'active' AND orders.tenant_id = 'acme-corp'
RLS is fail-closed: if claims are missing or cannot be resolved, the query is blocked entirely.
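The injection above can be illustrated as a string rewrite. The real implementation presumably operates on the AST rather than raw SQL; the function name, parameters, and the naive quoting here are assumptions for illustration only. The fail-closed behavior is the part taken from the docs: a missing claim blocks the query.

```typescript
// Illustrative RLS injection: AND the tenant condition onto an existing
// WHERE clause, or add one. Throws (fail-closed) when the claim is absent.
function injectRls(
  sql: string,
  table: string,
  column: string,
  claim: string | undefined,
): string {
  if (claim === undefined) {
    throw new Error("RLS: required claim missing — query blocked");
  }
  const cond = `${table}.${column} = '${claim.replace(/'/g, "''")}'`;
  return /\bWHERE\b/i.test(sql) ? `${sql} AND ${cond}` : `${sql} WHERE ${cond}`;
}
```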
Layer 5: Auto LIMIT
A LIMIT clause is appended to every query that doesn't already have one. This prevents accidental full-table scans.
-- Input (no LIMIT)
SELECT * FROM users
-- Executed
SELECT * FROM users LIMIT 1000
Default: 1000 rows. Configure with ATLAS_ROW_LIMIT.
If the result set hits the limit, the response includes truncated: true so the agent knows data was cut off.
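The append-unless-present rule is small enough to sketch directly. A minimal illustration, assuming a simple regex test for an existing LIMIT; the real code may inspect the AST instead.

```typescript
// Sketch of layer 5: append a LIMIT unless the query already has one.
// The 1000 default mirrors ATLAS_ROW_LIMIT's documented default.
function autoLimit(sql: string, rowLimit = 1000): string {
  return /\bLIMIT\s+\d+/i.test(sql) ? sql : `${sql} LIMIT ${rowLimit}`;
}
```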
Layer 6: Statement timeout
A per-query execution deadline prevents long-running queries from consuming database resources.
- PostgreSQL: Session-level statement_timeout set on the connection
- MySQL: Session-level timeout variable
Default: 30 seconds (30000ms). Configure with ATLAS_QUERY_TIMEOUT.
Queries that exceed the timeout are terminated by the database and return an error to the agent.
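The session setup can be sketched as the statement issued per connection. statement_timeout is PostgreSQL's native setting (milliseconds); mapping the unnamed MySQL "timeout variable" above to max_execution_time (MySQL's per-session SELECT deadline, also in milliseconds) is an assumption of this sketch.

```typescript
// Sketch of layer 6: build the session-timeout statement for a connection.
// MySQL variable choice (max_execution_time) is an assumption, not from the docs.
function timeoutStatement(dbType: "postgresql" | "mysql", ms = 30000): string {
  return dbType === "postgresql"
    ? `SET statement_timeout = ${ms}`
    : `SET SESSION max_execution_time = ${ms}`;
}
```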
Plugin hooks
When plugins are installed, two hook points extend the pipeline:
- beforeQuery — Plugins can inspect or rewrite SQL before execution. If a plugin rewrites the query, the rewritten SQL goes through layers 0–3 again for re-validation. This prevents plugins from introducing DML or bypassing the whitelist.
- afterQuery — Plugins receive the query results for logging, transformation, or side effects.
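The re-validation rule for beforeQuery can be sketched as a loop over installed plugins. The hook signature and the stand-in validate callback are illustrative assumptions; in Atlas the re-check would run the real layers 0–3.

```typescript
// Sketch of the beforeQuery chain: each plugin may rewrite the SQL,
// and any rewrite must pass validation again before proceeding.
type Plugin = { beforeQuery?: (sql: string) => string };

function runBeforeQuery(
  sql: string,
  plugins: Plugin[],
  validate: (s: string) => boolean, // stand-in for layers 0–3
): string {
  for (const p of plugins) {
    const next = p.beforeQuery ? p.beforeQuery(sql) : sql;
    if (next !== sql && !validate(next)) {
      throw new Error("plugin rewrite failed re-validation"); // no DML sneaks in
    }
    sql = next;
  }
  return sql;
}
```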
Custom validators: For non-SQL datasources (SOQL, GraphQL, MQL), a plugin can provide a customValidator via ConnectionPluginMeta.validate. When a connection has a custom validator registered, it completely replaces the built-in validation layers 0–3 — validateSQL is not called. RLS injection (layer 4) and auto LIMIT (layer 5) are also skipped, since non-SQL languages may not support standard WHERE/LIMIT syntax. The custom validator is responsible for providing equivalent safety guarantees, including any row-level isolation.
Error handling
When a query fails at any validation layer, the response includes a structured error:
{
"success": false,
"error": "Table \"secrets\" is not in the allowed list. Check catalog.yml for available tables."
}
Database errors that might expose connection details or internal state (passwords, connection strings, SSL certificates) are automatically scrubbed before being returned to the agent. The full error is logged server-side.
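Scrubbing can be illustrated as pattern-based redaction over the error message. The patterns below are assumptions for the sketch; the documentation does not specify what the real scrubber matches.

```typescript
// Illustrative error scrubbing: redact things that look like connection
// strings or inline passwords before the error reaches the agent.
function scrubError(message: string): string {
  return message
    .replace(/postgres(ql)?:\/\/\S+/gi, "[redacted connection string]")
    .replace(/password\s*=\s*\S+/gi, "password=[redacted]");
}
```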
All queries — successful and failed — are recorded in the audit log with user attribution, timing, and error details.
Test coverage
The validation pipeline has ~103 unit tests covering edge cases including:
- Comment-based bypass attempts
- CTE name exclusion
- Schema-qualified table references
- MySQL-specific forbidden patterns
- Semicolon injection
- Unparseable SQL rejection
- String literal false positives
See packages/api/src/lib/tools/__tests__/sql.test.ts for the full test suite.
Run the tests: bun run test (isolated per-file runner).