
Observability

Monitor Atlas with OpenTelemetry tracing and structured logging.

Atlas includes built-in support for OpenTelemetry distributed tracing and structured JSON logging via Pino. Both are zero-overhead when disabled.

Prerequisites

  • Atlas server running (bun run dev)
  • For tracing: an OpenTelemetry-compatible collector (Jaeger, Grafana Tempo, Datadog, etc.)

OpenTelemetry Tracing

Atlas uses the @opentelemetry/api package to create spans around key operations. When the OpenTelemetry SDK is not initialized, the API returns no-op tracers with zero runtime overhead.
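The zero-overhead property comes from the API's no-op fallback: until an SDK registers a real tracer provider, the tracers handed back have span methods that do nothing. A minimal sketch of that pattern (an illustration of the idea, not the actual @opentelemetry/api source):

```typescript
// Sketch of the no-op pattern: span methods exist but do nothing, so
// instrumented code pays only the cost of a few empty method calls.
class NoopSpan {
  setAttribute(_key: string, _value: unknown): this { return this; }
  end(): void {}
}

class NoopTracer {
  startSpan(_name: string): NoopSpan { return new NoopSpan(); }
}

// Instrumented code is written once and behaves identically with a real
// or a no-op tracer; only the side effects differ.
function instrumentedWork(tracer: NoopTracer): number {
  const span = tracer.startSpan("atlas.example");
  span.setAttribute("atlas.example_attr", 1);
  const result = 2 + 2; // the actual work being traced
  span.end();
  return result;
}
```

Because the no-op span is a plain object with empty methods, tracing calls cost almost nothing when no SDK is installed.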

Enabling Tracing

Set OTEL_EXPORTER_OTLP_ENDPOINT to your collector's OTLP HTTP endpoint:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

The API server initializes the OpenTelemetry Node.js SDK on startup, registering a trace exporter that sends spans to {OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces. The API is identified as atlas-api; the web frontend as atlas. Both use the package version from package.json.

When the environment variable is absent, no SDK is initialized and all trace calls are no-ops — zero overhead.
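Put together, startup amounts to a conditional SDK bootstrap. A sketch under the assumption that the standard NodeSDK and OTLP HTTP exporter are used (Atlas's actual startup code, resource attributes, and version wiring are not shown in this doc):

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const endpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT;

// Only initialize the SDK when an endpoint is configured; otherwise the
// @opentelemetry/api calls throughout the codebase stay no-ops.
if (endpoint) {
  const sdk = new NodeSDK({
    serviceName: "atlas-api",
    traceExporter: new OTLPTraceExporter({ url: `${endpoint}/v1/traces` }),
  });
  sdk.start();
}
```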

What Gets Traced

Atlas creates spans at each layer of the request lifecycle, forming a proper parent-child hierarchy:

HTTP Request (http.request)
  └── Agent Loop (atlas.agent)
      ├── Step 1
      │   ├── atlas.explore
      │   └── atlas.explore
      └── Step 2
          └── atlas.sql.execute
  • http.request -- Root span per API request (Hono middleware). Attributes: http.method, http.target, http.status_code
  • atlas.agent -- Full agent loop (one per streamText call). Attributes: atlas.provider, atlas.model, atlas.message_count, atlas.finish_reason, atlas.total_steps, atlas.total_input_tokens, atlas.total_output_tokens
  • atlas.sql.execute -- SQL query execution (the SQL text itself is not recorded, for security). Attributes: db.system, atlas.connection_id, atlas.row_count, atlas.column_count
  • atlas.explore -- Semantic layer file exploration (ls, cat, grep, find). Attributes: atlas.command (truncated), atlas.backend
  • atlas.python.execute -- Python code execution in sandbox. Attributes: code.length

Each span records success/failure status and captures exceptions on error, making it straightforward to trace agent step failures back to specific tool calls.
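The success/failure convention can be sketched as a small wrapper around each traced operation. The Span interface and withSpan helper below are illustrative stand-ins; real code would use @opentelemetry/api's startActiveSpan, recordException, and setStatus, but the shape is the same:

```typescript
// Stand-in for the parts of the OpenTelemetry Span API used here.
interface Span {
  recordException(err: Error): void;
  setStatus(status: { code: "OK" | "ERROR"; message?: string }): void;
  end(): void;
}

// Illustrative helper: run `fn` inside a span, recording success or
// failure, then always end the span.
async function withSpan<T>(span: Span, fn: () => Promise<T>): Promise<T> {
  try {
    const result = await fn();
    span.setStatus({ code: "OK" });
    return result;
  } catch (err) {
    span.recordException(err as Error);
    span.setStatus({ code: "ERROR", message: (err as Error).message });
    throw err; // rethrow so callers still see the failure
  } finally {
    span.end();
  }
}
```

Because the exception is recorded on the innermost span (e.g. atlas.sql.execute) before rethrowing, the error is visible at the exact tool call that failed, not just on the parent atlas.agent span.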

Collector Setup

Any OpenTelemetry-compatible collector works. Here are common setups:

Jaeger

# Run Jaeger with OTLP ingestion (port 16686 = UI, 4318 = OTLP HTTP)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:latest

# Point Atlas at the Jaeger OTLP endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Open http://localhost:16686 to view traces. Search for service atlas-api (API server) or atlas (web frontend).

Grafana Tempo

OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318

Query traces in Grafana's Explore view using the Tempo datasource.

Datadog

Use the Datadog Agent's OTLP ingestion:

OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318

See the Datadog OTLP documentation for agent configuration.

Generic OTLP Collector

Any service that accepts OTLP over HTTP (Honeycomb, Axiom, Signoz, etc.) works by setting the endpoint:

# Set the OTLP endpoint and any required auth headers for your collector
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key"

Atlas uses @opentelemetry/exporter-trace-otlp-http for trace export. Standard OpenTelemetry environment variables like OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES are respected by the underlying SDK.

Graceful Shutdown

The SDK registers a SIGTERM handler to flush pending spans before the process exits. This ensures traces from the final requests are not lost during container restarts or deployments.
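The flush-on-SIGTERM behavior amounts to something like the following. TracerSdk is a stand-in for the real NodeSDK instance, whose shutdown() flushes buffered spans to the exporter; the exact handler Atlas registers is not shown in this doc:

```typescript
// Stand-in for the SDK's shutdown surface; the real object would be the
// NodeSDK instance from @opentelemetry/sdk-node.
interface TracerSdk {
  shutdown(): Promise<void>;
}

// Flush pending spans, then exit. Registered with `once` so a repeated
// signal doesn't race a second shutdown.
function registerGracefulShutdown(
  sdk: TracerSdk,
  exit: (code: number) => void,
): () => Promise<void> {
  const handler = async () => {
    try {
      await sdk.shutdown(); // flushes buffered spans to the exporter
      exit(0);
    } catch {
      exit(1);
    }
  };
  process.once("SIGTERM", handler);
  return handler;
}
```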


Structured Logging

Atlas uses Pino for structured JSON logging. Every log line includes a timestamp, level, component name, and request context (when available).

Log Levels

Control verbosity with ATLAS_LOG_LEVEL:

ATLAS_LOG_LEVEL=debug  # trace | debug | info | warn | error | fatal

The default level is info. In development (NODE_ENV !== "production"), logs are formatted with pino-pretty for human readability. In production, logs are emitted as single-line JSON for machine parsing.
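The environment switch can be expressed as a small options builder: pino-pretty is wired in as a transport only outside production. The exact option shape Atlas passes to pino() is an assumption; this mirrors standard Pino configuration:

```typescript
// Build Pino options based on NODE_ENV: pretty transport in development,
// plain single-line JSON in production. (Illustrative sketch; the exact
// options Atlas uses are not shown in this doc.)
function loggerOptions(env: string | undefined) {
  const isProduction = env === "production";
  return {
    level: process.env.ATLAS_LOG_LEVEL ?? "info",
    // A transport spawns pino-pretty in a worker thread in development;
    // `undefined` means raw JSON straight to stdout.
    transport: isProduction
      ? undefined
      : { target: "pino-pretty", options: { translateTime: "HH:MM:ss.l" } },
  };
}
```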

Log Structure

Each log entry includes:

  • level -- Numeric Pino level (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal)
  • time -- Unix timestamp in milliseconds
  • msg -- Human-readable message
  • component -- Module that emitted the log (e.g., agent, sql, explore, auth)
  • requestId -- UUID for the current request (when inside a request context)
  • userId -- Authenticated user ID (when inside a request context)
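Request-scoped fields like requestId and userId are typically carried with Node's AsyncLocalStorage, so every log emitted inside a request handler picks them up automatically. A sketch of that mechanism (the actual Atlas wiring is not shown in this doc):

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  requestId: string;
  userId?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Merge the current request context (if any) into a log entry; outside
// a request, getStore() returns undefined and no fields are added.
function logEntry(component: string, msg: string): Record<string, unknown> {
  const ctx = requestContext.getStore();
  return { component, msg, ...(ctx ?? {}) };
}

// HTTP middleware would wrap each request handler roughly like this:
function handleRequest(requestId: string, userId: string, handler: () => void): void {
  requestContext.run({ requestId, userId }, handler);
}
```

This is also why startup and scheduler logs carry component but no requestId: they run outside any requestContext.run scope.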

Example Output

Production (JSON):

{"level":30,"time":1706000000000,"component":"sql","requestId":"abc-123","msg":"Query executed","durationMs":45,"rowCount":100}

Development (pretty-printed):

[10:30:00.000] INFO (sql): Query executed
    requestId: "abc-123"
    durationMs: 45
    rowCount: 100

Component Loggers

Atlas creates child loggers per component using createLogger("component-name"). Key components:

  • agent -- Agent loop lifecycle and step transitions
  • sql -- SQL validation, execution, and audit
  • explore -- Semantic layer file access
  • auth -- Authentication and authorization decisions
  • admin-routes -- Admin API operations
  • scheduler -- Scheduled task execution
  • conversations -- Conversation persistence
  • actions -- Action approval and execution
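The per-component pattern maps onto Pino's child loggers, which stamp fixed bindings onto every entry. Sketched here with a minimal stand-in (the real createLogger would delegate to pino().child({ component })):

```typescript
// Minimal stand-in showing the child-logger binding pattern that
// createLogger("component-name") would delegate to Pino for.
type Bindings = Record<string, unknown>;

class MiniLogger {
  constructor(private readonly bindings: Bindings = {}) {}

  // Returns a logger whose every entry includes the extra bindings.
  child(extra: Bindings): MiniLogger {
    return new MiniLogger({ ...this.bindings, ...extra });
  }

  // Returns the entry it would emit, using Pino's numeric info level.
  info(msg: string): Bindings {
    return { level: 30, time: Date.now(), ...this.bindings, msg };
  }
}

// Hypothetical createLogger, mirroring the usage described above.
const root = new MiniLogger();
function createLogger(component: string): MiniLogger {
  return root.child({ component });
}
```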

Troubleshooting

No traces appearing in the collector

Cause: OTEL_EXPORTER_OTLP_ENDPOINT is not set, or the collector is unreachable from the Atlas server.

Fix: Verify the environment variable is set and that the endpoint is reachable from the Atlas server, e.g. curl -i http://localhost:4318/v1/traces (a 405 Method Not Allowed response still confirms the collector is listening, since traces are sent via POST). Check that the collector is running and accepting OTLP HTTP connections on the configured port.

Logs are JSON in development

Cause: NODE_ENV is set to production (or a non-development value). Pino uses JSON output in production and pretty-printed output in development.

Fix: For development, ensure NODE_ENV is unset or set to development. If you want human-readable output from a production-mode process, pipe the JSON through pino-pretty: bun run dev:api | bun x pino-pretty.

Missing requestId or userId in log entries

Cause: The log was emitted outside of a request context (e.g., during startup or in a background task like the scheduler).

Fix: This is expected. Context fields (requestId, userId) are only present for logs emitted inside an HTTP request handler. Startup and scheduler logs include component but not request-scoped fields.

For more, see Troubleshooting.

