
Observability

Monitor Atlas with OpenTelemetry tracing and structured logging.

Atlas includes built-in support for OpenTelemetry distributed tracing and structured JSON logging via Pino. Both are zero-overhead when disabled.

Prerequisites

  • Atlas server running (bun run dev)
  • For tracing: an OpenTelemetry-compatible collector (Jaeger, Grafana Tempo, Datadog, etc.)

OpenTelemetry Tracing

Atlas uses the @opentelemetry/api package to create spans around key operations. When the OpenTelemetry SDK is not initialized, the API returns no-op tracers with zero runtime overhead.
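The zero-overhead property comes from the API's no-op fallback: until an SDK registers a real tracer provider, the tracers handed back have span methods that do nothing. A minimal sketch of that pattern (an illustration of the idea, not the actual @opentelemetry/api source):

```typescript
// Sketch of the no-op pattern: span methods exist but do nothing, so
// instrumented code pays only the cost of a few empty method calls.
class NoopSpan {
  setAttribute(_key: string, _value: unknown): this { return this; }
  end(): void {}
}

class NoopTracer {
  startSpan(_name: string): NoopSpan { return new NoopSpan(); }
}

// Instrumented code is written once and behaves identically with a real
// or a no-op tracer; only the side effects differ.
function instrumentedWork(tracer: NoopTracer): number {
  const span = tracer.startSpan("atlas.example");
  span.setAttribute("atlas.example_attr", 1);
  const result = 2 + 2; // the actual work being traced
  span.end();
  return result;
}
```

Because the no-op span is a plain object with empty methods, tracing calls cost almost nothing when no SDK is installed.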

Enabling Tracing

Set OTEL_EXPORTER_OTLP_ENDPOINT to your collector's OTLP HTTP endpoint:

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

The API server initializes the OpenTelemetry Node.js SDK on startup, registering a trace exporter that sends spans to {OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces. The API is identified as atlas-api; the web frontend as atlas. Both use the package version from package.json.

When the environment variable is absent, no SDK is initialized and all trace calls are no-ops — zero overhead.
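Put together, startup amounts to a conditional SDK bootstrap. A sketch under the assumption that the standard NodeSDK and OTLP HTTP exporter are used (Atlas's actual startup code, resource attributes, and version wiring are not shown in this doc):

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

const endpoint = process.env.OTEL_EXPORTER_OTLP_ENDPOINT;

// Only initialize the SDK when an endpoint is configured; otherwise the
// @opentelemetry/api calls throughout the codebase stay no-ops.
if (endpoint) {
  const sdk = new NodeSDK({
    serviceName: "atlas-api",
    traceExporter: new OTLPTraceExporter({ url: `${endpoint}/v1/traces` }),
  });
  sdk.start();
}
```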

What Gets Traced

Atlas creates spans at each layer of the request lifecycle, forming a proper parent-child hierarchy:

HTTP Request (http.request)
  └── Agent Loop (atlas.agent)
      ├── Step 1
      │   ├── atlas.explore
      │   └── atlas.explore
      └── Step 2
          └── atlas.sql.execute
  • http.request -- Root span per API request (Hono middleware). Attributes: http.method, http.target, http.status_code
  • atlas.agent -- Full agent loop (one per streamText call). Attributes: atlas.provider, atlas.model, atlas.message_count, atlas.finish_reason, atlas.total_steps, atlas.total_input_tokens, atlas.total_output_tokens
  • atlas.sql.execute -- SQL query execution (the SQL text itself is not recorded, for security). Attributes: db.system, atlas.connection_id, atlas.row_count, atlas.column_count
  • atlas.explore -- Semantic layer file exploration (ls, cat, grep, find). Attributes: atlas.command (truncated), atlas.backend
  • atlas.python.execute -- Python code execution in sandbox. Attributes: code.length

Each span records success/failure status and captures exceptions on error, making it straightforward to trace agent step failures back to specific tool calls.
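The success/failure convention can be sketched as a small wrapper around each traced operation. The Span interface and withSpan helper below are illustrative stand-ins; real code would use @opentelemetry/api's startActiveSpan, recordException, and setStatus, but the shape is the same:

```typescript
// Stand-in for the parts of the OpenTelemetry Span API used here.
interface Span {
  recordException(err: Error): void;
  setStatus(status: { code: "OK" | "ERROR"; message?: string }): void;
  end(): void;
}

// Illustrative helper: run `fn` inside a span, recording success or
// failure, then always end the span.
async function withSpan<T>(span: Span, fn: () => Promise<T>): Promise<T> {
  try {
    const result = await fn();
    span.setStatus({ code: "OK" });
    return result;
  } catch (err) {
    span.recordException(err as Error);
    span.setStatus({ code: "ERROR", message: (err as Error).message });
    throw err; // rethrow so callers still see the failure
  } finally {
    span.end();
  }
}
```

Because the exception is recorded on the innermost span (e.g. atlas.sql.execute) before rethrowing, the error is visible at the exact tool call that failed, not just on the parent atlas.agent span.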

Collector Setup

Any OpenTelemetry-compatible collector works. Here are common setups:

Jaeger

# Run Jaeger with OTLP ingestion (port 16686 = UI, 4318 = OTLP HTTP)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:latest

# Point Atlas at the Jaeger OTLP endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318

Open http://localhost:16686 to view traces. Search for service atlas-api (API server) or atlas (web frontend).

Grafana Tempo

OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318

Query traces in Grafana's Explore view using the Tempo datasource.

Datadog

Use the Datadog Agent's OTLP ingestion:

OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318

See the Datadog OTLP documentation for agent configuration.

Generic OTLP Collector

Any service that accepts OTLP over HTTP (Honeycomb, Axiom, Signoz, etc.) works by setting the endpoint:

# Set the OTLP endpoint and any required auth headers for your collector
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key"

Atlas uses @opentelemetry/exporter-trace-otlp-http for trace export. Standard OpenTelemetry environment variables like OTEL_EXPORTER_OTLP_HEADERS and OTEL_RESOURCE_ATTRIBUTES are respected by the underlying SDK.

Graceful Shutdown

The SDK registers a SIGTERM handler to flush pending spans before the process exits. This ensures traces from the final requests are not lost during container restarts or deployments.
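The flush-on-SIGTERM behavior amounts to something like the following. TracerSdk is a stand-in for the real NodeSDK instance, whose shutdown() flushes buffered spans to the exporter; the exact handler Atlas registers is not shown in this doc:

```typescript
// Stand-in for the SDK's shutdown surface; the real object would be the
// NodeSDK instance from @opentelemetry/sdk-node.
interface TracerSdk {
  shutdown(): Promise<void>;
}

// Flush pending spans, then exit. Registered with `once` so a repeated
// signal doesn't race a second shutdown.
function registerGracefulShutdown(
  sdk: TracerSdk,
  exit: (code: number) => void,
): () => Promise<void> {
  const handler = async () => {
    try {
      await sdk.shutdown(); // flushes buffered spans to the exporter
      exit(0);
    } catch {
      exit(1);
    }
  };
  process.once("SIGTERM", handler);
  return handler;
}
```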


Structured Logging

Atlas uses Pino for structured JSON logging. Every log line includes a timestamp, level, component name, and request context (when available).

Log Levels

Control verbosity with ATLAS_LOG_LEVEL:

ATLAS_LOG_LEVEL=debug  # trace | debug | info | warn | error | fatal

The default level is info. In development (NODE_ENV !== "production"), logs are formatted with pino-pretty for human readability. In production, logs are emitted as single-line JSON for machine parsing.
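The environment switch can be expressed as a small options builder: pino-pretty is wired in as a transport only outside production. The exact option shape Atlas passes to pino() is an assumption; this mirrors standard Pino configuration:

```typescript
// Build Pino options based on NODE_ENV: pretty transport in development,
// plain single-line JSON in production. (Illustrative sketch; the exact
// options Atlas uses are not shown in this doc.)
function loggerOptions(env: string | undefined) {
  const isProduction = env === "production";
  return {
    level: process.env.ATLAS_LOG_LEVEL ?? "info",
    // A transport spawns pino-pretty in a worker thread in development;
    // `undefined` means raw JSON straight to stdout.
    transport: isProduction
      ? undefined
      : { target: "pino-pretty", options: { translateTime: "HH:MM:ss.l" } },
  };
}
```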

Log Structure

Each log entry includes:

  • level -- Numeric Pino level (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal)
  • time -- Unix timestamp in milliseconds
  • msg -- Human-readable message
  • component -- Module that emitted the log (e.g., agent, sql, explore, auth)
  • requestId -- UUID for the current request (when inside a request context)
  • userId -- Authenticated user ID (when inside a request context)
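Request-scoped fields like requestId and userId are typically carried with Node's AsyncLocalStorage, so every log emitted inside a request handler picks them up automatically. A sketch of that mechanism (the actual Atlas wiring is not shown in this doc):

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface RequestContext {
  requestId: string;
  userId?: string;
}

const requestContext = new AsyncLocalStorage<RequestContext>();

// Merge the current request context (if any) into a log entry; outside
// a request, getStore() returns undefined and no fields are added.
function logEntry(component: string, msg: string): Record<string, unknown> {
  const ctx = requestContext.getStore();
  return { component, msg, ...(ctx ?? {}) };
}

// HTTP middleware would wrap each request handler roughly like this:
function handleRequest(requestId: string, userId: string, handler: () => void): void {
  requestContext.run({ requestId, userId }, handler);
}
```

This is also why startup and scheduler logs carry component but no requestId: they run outside any requestContext.run scope.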

Example Output

Production (JSON):

{"level":30,"time":1706000000000,"component":"sql","requestId":"abc-123","msg":"Query executed","durationMs":45,"rowCount":100}

Development (pretty-printed):

[10:30:00.000] INFO (sql): Query executed
    requestId: "abc-123"
    durationMs: 45
    rowCount: 100

Component Loggers

Atlas creates child loggers per component using createLogger("component-name"). Key components:

  • agent -- Agent loop lifecycle and step transitions
  • sql -- SQL validation, execution, and audit
  • explore -- Semantic layer file access
  • auth -- Authentication and authorization decisions
  • admin-routes -- Admin API operations
  • scheduler -- Scheduled task execution
  • conversations -- Conversation persistence
  • actions -- Action approval and execution
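The per-component pattern maps onto Pino's child loggers, which stamp fixed bindings onto every entry. Sketched here with a minimal stand-in (the real createLogger would delegate to pino().child({ component })):

```typescript
// Minimal stand-in showing the child-logger binding pattern that
// createLogger("component-name") would delegate to Pino for.
type Bindings = Record<string, unknown>;

class MiniLogger {
  constructor(private readonly bindings: Bindings = {}) {}

  // Returns a logger whose every entry includes the extra bindings.
  child(extra: Bindings): MiniLogger {
    return new MiniLogger({ ...this.bindings, ...extra });
  }

  // Returns the entry it would emit, using Pino's numeric info level.
  info(msg: string): Bindings {
    return { level: 30, time: Date.now(), ...this.bindings, msg };
  }
}

// Hypothetical createLogger, mirroring the usage described above.
const root = new MiniLogger();
function createLogger(component: string): MiniLogger {
  return root.child({ component });
}
```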

Troubleshooting

No traces appearing in the collector

Cause: OTEL_EXPORTER_OTLP_ENDPOINT is not set, or the collector is unreachable from the Atlas server.

Fix: Verify the environment variable is set and that the endpoint is reachable from the Atlas server, e.g. curl -i http://localhost:4318/v1/traces (a 405 Method Not Allowed response still confirms the collector is listening, since traces are sent via POST). Check that the collector is running and accepting OTLP HTTP connections on the configured port.

Logs are JSON in development

Cause: NODE_ENV is set to production (or a non-development value). Pino uses JSON output in production and pretty-printed output in development.

Fix: For development, ensure NODE_ENV is unset or set to development. If you want human-readable output from a production-mode process, pipe the JSON through pino-pretty: bun run dev:api | bun x pino-pretty.

Missing requestId or userId in log entries

Cause: The log was emitted outside of a request context (e.g., during startup or in a background task like the scheduler).

Fix: This is expected. Context fields (requestId, userId) are only present for logs emitted inside an HTTP request handler. Startup and scheduler logs include component but not request-scoped fields.

For more, see Troubleshooting.

