Atlas vs Raw MCP
Why connecting an AI client directly to your database via MCP is risky — and what a semantic layer adds.
MCP (Model Context Protocol) lets AI clients like Claude Desktop and Cursor call external tools. Database MCP servers like DBHub give the AI direct access to your schema and the ability to run arbitrary SQL.
This works for quick exploration. It does not work for production.
This page explains what you lose when the AI talks directly to your database, and what Atlas adds on top of MCP.
The problem with raw MCP
When you point Claude Desktop at a database MCP server, the AI sees raw schema metadata — column names like dim_user_status_cd, fact_txn_amt, is_del_flg. It has no business context, no guardrails, and no audit trail.
Three things go wrong:
-
The AI guesses. Without descriptions, sample values, or documented relationships, the AI infers what columns mean from their names. It gets it wrong often enough to be dangerous — silently returning incorrect numbers that look plausible.
-
Nothing validates the SQL. The generated query runs directly against the database. There's no mutation guard, no table whitelist, no row-level filtering. A prompt injection or hallucinated query could read sensitive tables or, if the connection has write access, modify data.
-
Nobody knows what happened. No audit log records which user asked which question, what SQL ran, or what data was returned. For compliance-regulated environments, this is a non-starter.
What Atlas adds
Atlas sits between the AI and the database, providing context before query generation and validation before execution.
| Capability | Raw MCP | Atlas |
|---|---|---|
| Business context | Column names only | Entity YAML with descriptions, sample values, joins, metrics, glossary + dynamic learned patterns |
| SQL validation | None | 7-layer pipeline (empty check, regex guard, AST parse, table whitelist, RLS, auto-LIMIT, timeout) |
| Table access control | Full schema visible | Only tables defined in the semantic layer |
| Row-level security | None | Automatic WHERE clause injection based on user identity (multi-column, array, OR-logic) |
| Mutation protection | Depends on DB user permissions | SELECT-only enforcement at the application layer, independent of DB permissions |
| Audit logging | None | Every query logged with user attribution, timing, tables/columns accessed, data classification |
| PII protection | None | PII detection and column masking with role-based strategies |
| Sandboxed execution | None | Explore tool isolated via nsjail, sidecar, or BYOC backends |
| Auth & enterprise | None | SSO/SCIM, custom roles, IP allowlists, approval workflows |
| Admin console | None | Connections, users, plugins, semantic editor, analytics, notebooks, billing |
| Chat integrations | None | 8 platforms via Chat SDK (Slack, Teams, Discord, Telegram, Google Chat, GitHub, Linear, WhatsApp) |
| Data residency | None | 3-region deployment (US, EU, APAC) with misrouting detection |
Side-by-side: the same question, two approaches
Question: "What's our monthly revenue by product?"
Raw MCP
The AI sees the schema and guesses:
-- The AI picks the wrong table — fact_order_lines stores
-- gross amounts before discounts and refunds.
-- It also misses the soft-delete flag, including deleted orders.
SELECT
DATE_TRUNC('month', order_date) AS month,
product_name,
SUM(fact_txn_amt) AS revenue
FROM fact_order_lines
GROUP BY 1, 2
ORDER BY 1 DESCProblems:
fact_txn_amtis gross amount, not net revenue- Deleted rows (
is_del_flg = 1) are included - No LIMIT — could return thousands of rows
- No timeout — a full table scan on a large table could run for minutes
Atlas
The agent reads the semantic layer first, which defines revenue clearly:
# semantic/entities/orders.yml — the agent reads this before writing SQL
table: orders
dimensions:
product_name:
type: string
description: Display name of the product
created_at:
type: timestamp
description: When the order was placed
is_deleted:
type: boolean
description: Soft-delete flag — exclude from all queries
measures:
- name: net_revenue
sql: "total - discount - refund_amount"
type: sum
description: Net revenue after discounts and refundsThe agent generates:
-- Uses the documented measure definition for net revenue.
-- Filters out soft-deleted rows as the entity description instructs.
-- Atlas appends LIMIT 1000 to the query and sets a 30-second statement timeout on the connection.
SELECT
DATE_TRUNC('month', created_at) AS month,
product_name,
SUM(total - discount - refund_amount) AS net_revenue
FROM orders
WHERE is_deleted = false
GROUP BY 1, 2
ORDER BY 1 DESCBefore this query reaches the database, Atlas validates it through the SQL validation pipeline:
- Empty check — rejects empty or whitespace-only input
- Regex guard — confirms no DML/DDL keywords
- AST parse — confirms single SELECT statement
- Table whitelist —
ordersis in the semantic layer, allowed - RLS injection — if configured, injects a WHERE clause like
AND tenant_id = 'acme-corp'based on the user's JWT claims - Auto LIMIT — appends
LIMIT 1000 - Statement timeout — sets 30-second deadline on the database connection
Security implications
The security gap between raw MCP and Atlas is not theoretical. Here are three concrete risks:
No validation = SQL injection surface
MCP servers pass AI-generated SQL directly to the database. If a user crafts a prompt that causes the AI to generate SELECT 1; DROP TABLE users, there's nothing between that SQL and your database except the connection's permission level. Atlas blocks this at multiple layers — the regex guard catches DROP, and the AST parser rejects multi-statement queries.
No RLS = data leakage
In a multi-tenant application, raw MCP has no concept of row-level isolation. The AI can query any row in any table the connection can access. Atlas injects WHERE clauses automatically based on the authenticated user's identity claims — fail-closed, so missing claims block the query entirely.
No audit = compliance gap
Regulations like SOC 2, HIPAA, and GDPR require knowing who accessed what data and when. Raw MCP provides no audit trail. Atlas logs every query — successful and failed — with user attribution, timing, the SQL that ran, and whether results were truncated.
When raw MCP is fine
Raw MCP is a reasonable choice when:
- Local development — You're exploring your own database in Claude Desktop or Cursor, iterating on queries, and don't need guardrails because the data isn't sensitive.
- Read-only on non-sensitive data — The database contains only public or non-regulated data, the connection is strictly read-only at the database level, and you're the only user.
- Prototyping — You're testing whether text-to-SQL is useful for your use case before investing in a semantic layer.
Once any of these change — multiple users, sensitive data, production workloads, compliance requirements — you need the semantic layer, validation, and audit trail that Atlas provides.
Atlas has its own MCP server (stdio + SSE transport) that gives Claude Desktop and Cursor the same validated, context-aware SQL execution as the API. You don't have to choose between MCP compatibility and safety.
The semantic layer advantage
The core difference isn't just validation — it's context. Raw MCP gives the AI a schema. Atlas gives it understanding.
A schema tells the AI that a column named total exists and is numeric. The semantic layer tells it that total is the order total in USD before discounts, that net revenue is calculated as SUM(total - discount - refund_amount), and that rows with is_deleted = true should always be excluded.
This matters because the quality of AI-generated SQL is bounded by the quality of the context the AI receives. Without business context, an AI agent can write syntactically correct SQL that returns confidently wrong answers — and nobody catches it until a decision has already been made on bad data.
A semantic layer is that governance. It's the difference between an AI that can query your database and one that understands your business.
Related
- MCP Server — use Atlas as an MCP server with full validation and semantic context
- SQL Validation Pipeline — the 7-layer defense-in-depth pipeline
- Row-Level Security — automatic tenant isolation
- Semantic Layer — how entity YAML files provide business context
- Atlas vs Alternatives — comparisons with WrenAI, Vanna, and Metabase