
BigQuery

Connect Atlas to Google BigQuery for analytics and data warehousing.

Connects to Google BigQuery via the @google-cloud/bigquery client library. BigQuery is a REST-based service -- each query creates a job, so there is no connection pool to manage. Read-only enforcement relies on the standard SQL validation pipeline (regex guard + AST parse + table whitelist) since BigQuery has no session-level read-only mode.

Installation

bun add @useatlas/bigquery @google-cloud/bigquery

Configuration

// atlas.config.ts
import { defineConfig } from "@atlas/api/lib/config";
import { bigqueryPlugin } from "@useatlas/bigquery";

export default defineConfig({
  plugins: [
    bigqueryPlugin({
      projectId: process.env.GCP_PROJECT_ID!,
      dataset: "analytics",
      location: "US",
      keyFilename: process.env.GOOGLE_APPLICATION_CREDENTIALS,
    }),
  ],
});

Options

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| projectId | string | No | from credentials/ADC | GCP project ID |
| dataset | string | No | -- | Default dataset for unqualified table references |
| location | string | No | -- | Geographic location for query jobs (e.g. US, EU, us-east1) |
| keyFilename | string | No | -- | Path to a service account JSON key file |
| credentials | object | No | -- | Service account credentials object (parsed JSON key contents) |
| costApproval | "auto" \| "threshold" \| "always" | No | "threshold" | Cost approval mode (see Cost Control) |
| costThreshold | number | No | 1.00 | USD threshold for "threshold" mode |

Authentication

The plugin supports three authentication methods, tried in priority order:

  1. Credentials object -- pass the parsed service account JSON key directly via credentials
  2. Key file -- path to a service account JSON key file via keyFilename
  3. Application Default Credentials (ADC) -- automatic in GCP environments (GCE, Cloud Run, GKE) or when GOOGLE_APPLICATION_CREDENTIALS is set

When running locally, either set GOOGLE_APPLICATION_CREDENTIALS to a service account key file path or use gcloud auth application-default login.
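The three methods correspond to the following configurations (a sketch based on the options table above; the GCP_SA_KEY environment variable name is illustrative, not part of the plugin):

```typescript
import { bigqueryPlugin } from "@useatlas/bigquery";

// 1. Credentials object: parsed service account key passed directly.
//    GCP_SA_KEY is a hypothetical env var holding the raw JSON key contents.
bigqueryPlugin({
  projectId: "my-project",
  credentials: JSON.parse(process.env.GCP_SA_KEY!),
});

// 2. Key file: path to a service account JSON key file.
bigqueryPlugin({
  projectId: "my-project",
  keyFilename: "/secrets/sa-key.json",
});

// 3. ADC: omit both options; the client resolves credentials from the
//    environment (attached service account or GOOGLE_APPLICATION_CREDENTIALS).
bigqueryPlugin({
  projectId: "my-project",
});
```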

SQL Dialect

The following hints are injected into the agent's system prompt so it writes valid BigQuery Standard SQL:

  • Use backtick-quoted identifiers for table references: `project.dataset.table`
  • Use DATE_TRUNC(date_expr, MONTH) for date truncation
  • Use TIMESTAMP_TRUNC() for timestamps, DATETIME_TRUNC() for datetimes
  • Use COUNTIF(condition) instead of COUNT(CASE WHEN ... END)
  • Use SAFE_DIVIDE(a, b) to avoid division-by-zero errors
  • Use UNNEST(array_column) to flatten arrays (in FROM or JOIN clause)
  • Use FORMAT_DATE / FORMAT_TIMESTAMP for date formatting
  • Use IFNULL(expr, default) or COALESCE() for null handling
  • Use EXCEPT() and REPLACE() with SELECT * to exclude or transform columns
  • Standard SQL is the default -- do not use Legacy SQL syntax
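A query that follows several of these hints might look like the following (the project, dataset, and table names are hypothetical; note the escaped backticks around the fully qualified table reference):

```typescript
// Combines backtick-quoted identifiers, DATE_TRUNC, COUNTIF, and SAFE_DIVIDE.
const query = `
  SELECT
    DATE_TRUNC(order_date, MONTH) AS month,
    COUNTIF(status = 'refunded') AS refunds,
    SAFE_DIVIDE(SUM(revenue), COUNT(*)) AS avg_revenue
  FROM \`my-project.analytics.orders\`
  GROUP BY month
  ORDER BY month
`;
```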

Cost Control

BigQuery charges per query based on bytes scanned (on-demand pricing: $5 per TB). The plugin runs a free dry-run before every query to estimate cost, then gates execution based on the costApproval mode.

Modes

| Mode | Behavior |
| --- | --- |
| "threshold" (default) | Auto-approve queries under costThreshold (default $1.00). Queries above the threshold are rejected with a cost estimate; the agent surfaces this to the user for approval. |
| "auto" | Execute immediately. The estimated cost and bytes scanned are included in the query result metadata so the agent can mention them. |
| "always" | Every query requires user approval. The agent shows the estimated cost before executing. |

Cost approval configuration examples

// Default: threshold mode at $1.00
bigqueryPlugin({
  projectId: "my-project",
  dataset: "analytics",
})

// Stricter: $0.10 threshold
bigqueryPlugin({
  projectId: "my-project",
  costApproval: "threshold",
  costThreshold: 0.10,
})

// No gates: always execute, log cost in metadata
bigqueryPlugin({
  projectId: "my-project",
  costApproval: "auto",
})

// Maximum control: always require approval
bigqueryPlugin({
  projectId: "my-project",
  costApproval: "always",
})

How it works

  1. Before executing a query, the plugin runs a BigQuery dry run (dryRun: true) to get totalBytesProcessed
  2. Estimated cost is calculated: (bytes / 1 TB) * $5
  3. Based on the approval mode, the query either proceeds (with cost metadata attached) or is rejected with a user-facing message
  4. If the dry run fails (e.g. network issue, permissions), the query proceeds without cost gating — dry-run failures never block execution
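The cost calculation in step 2 can be sketched as a pure function. This is an illustration of the arithmetic, not the plugin's internal code; the dry run returns totalBytesProcessed as a string, and the on-demand rate of $5 per TB (1 TB = 10^12 bytes) follows the pricing stated above:

```typescript
const USD_PER_TB = 5;
const BYTES_PER_TB = 1e12;

// totalBytesProcessed comes from the dry-run job statistics as a string.
function estimateCostUsd(totalBytesProcessed: string | number): number {
  return (Number(totalBytesProcessed) / BYTES_PER_TB) * USD_PER_TB;
}

// A full 1 TB scan costs $5; a 200 GB scan lands exactly at the
// default $1.00 threshold.
estimateCostUsd("1000000000000"); // → 5
estimateCostUsd("200000000000");  // → 1
```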

Cost metadata

When a query proceeds (in "auto" mode or under threshold in "threshold" mode), the tool result includes:

  • estimatedCostUsd — estimated cost in USD
  • bytesScanned — total bytes the query will process

The agent uses this metadata to inform the user about query costs.

Validation

In addition to the standard SQL validation pipeline (regex guard + AST parse + table whitelist), the plugin blocks BigQuery-specific statements: MERGE, EXPORT DATA, DECLARE, SET, BEGIN, ASSERT, and RAISE. The AST parser runs in node-sql-parser's native BigQuery mode, so no dialect translation is needed.
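A minimal sketch of what the regex guard for those statements could look like (a simplification for illustration; the actual guard also has to handle leading comments and multi-statement input):

```typescript
// Statement keywords listed above that must never reach BigQuery.
const BLOCKED_STATEMENTS =
  /^\s*(MERGE|EXPORT\s+DATA|DECLARE|SET|BEGIN|ASSERT|RAISE)\b/i;

// Returns true if the query starts with a blocked statement keyword.
function isBlockedStatement(sql: string): boolean {
  return BLOCKED_STATEMENTS.test(sql.trim());
}
```

A query that passes this guard still goes through the AST parse and table whitelist before execution.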

Security

BigQuery has no session-level read-only mode. Read-only enforcement relies on the Atlas SQL validation pipeline (regex + AST + table whitelist) and the service account's IAM permissions. Grant the service account only the BigQuery Data Viewer and BigQuery Job User roles for least-privilege access.

Troubleshooting

Authentication errors

Ensure the service account JSON key file is valid and accessible. For ADC, run gcloud auth application-default login locally. In GCP environments, verify the compute instance or Cloud Run service has the correct service account attached.

Permission denied

The service account needs at minimum roles/bigquery.dataViewer (to read tables) and roles/bigquery.jobUser (to run queries) on the project. Check IAM permissions in the GCP Console.

Query timeout

BigQuery queries run as async jobs. Increase ATLAS_QUERY_TIMEOUT (default 30000ms) for long-running analytical queries. The plugin passes this as jobTimeoutMs to the BigQuery client.

Dataset not found

If using unqualified table names, ensure the dataset config option is set. Otherwise, fully qualify table references as `project.dataset.table` in your semantic layer entity definitions.
