# Observability
Monitor Atlas with OpenTelemetry tracing and structured logging.
Atlas includes built-in support for OpenTelemetry distributed tracing and structured JSON logging via Pino. Both are zero-overhead when disabled.
## Prerequisites
- Atlas server running (`bun run dev`)
- For tracing: an OpenTelemetry-compatible collector (Jaeger, Grafana Tempo, Datadog, etc.)
## OpenTelemetry Tracing
Atlas uses the `@opentelemetry/api` package to create spans around key operations. When the OpenTelemetry SDK is not initialized, the API returns no-op tracers with zero runtime overhead.
### Enabling Tracing
Set `OTEL_EXPORTER_OTLP_ENDPOINT` to your collector's OTLP HTTP endpoint:
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

The API server initializes the OpenTelemetry Node.js SDK on startup, registering a trace exporter that sends spans to `{OTEL_EXPORTER_OTLP_ENDPOINT}/v1/traces`. The API server is identified as `atlas-api`; the web frontend as `atlas`. Both report the package version from `package.json`.
When the environment variable is absent, no SDK is initialized and all trace calls are no-ops, so tracing adds zero overhead when disabled.
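The zero-overhead claim comes from OpenTelemetry's no-op design. A minimal sketch of the pattern (hypothetical types, not Atlas's actual code; `@opentelemetry/api` applies the same idea):

```typescript
// Minimal illustration of the no-op pattern behind "zero overhead when
// disabled". Hypothetical types for the sketch.
interface Span {
  setAttribute(key: string, value: string | number): void;
  end(): void;
}

interface Tracer {
  startSpan(name: string): Span;
}

// With no SDK registered, every instrumentation call lands on a shared
// singleton whose methods do nothing: no buffering, no I/O, no allocation.
const NOOP_SPAN: Span = { setAttribute() {}, end() {} };
const NOOP_TRACER: Tracer = { startSpan: () => NOOP_SPAN };

function getTracer(): Tracer {
  // A real SDK-backed tracer would be returned here when
  // OTEL_EXPORTER_OTLP_ENDPOINT is set; otherwise the no-op stands in.
  return NOOP_TRACER;
}

// Instrumented code runs unchanged either way.
const span = getTracer().startSpan("atlas.sql.execute");
span.setAttribute("atlas.row_count", 100);
span.end();
```

Because the no-op span is a shared singleton, instrumented code paths pay only the cost of a method call that does nothing.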
### What Gets Traced
Atlas creates spans at each layer of the request lifecycle, forming a proper parent-child hierarchy:
```
HTTP Request (http.request)
└── Agent Loop (atlas.agent)
    ├── Step 1
    │   ├── atlas.explore
    │   └── atlas.explore
    └── Step 2
        └── atlas.sql.execute
```

| Span Name | Attributes | Description |
|---|---|---|
| `http.request` | `http.method`, `http.target`, `http.status_code` | Root span per API request (Hono middleware) |
| `atlas.agent` | `atlas.provider`, `atlas.model`, `atlas.message_count`, `atlas.finish_reason`, `atlas.total_steps`, `atlas.total_input_tokens`, `atlas.total_output_tokens` | Full agent loop (one per `streamText` call) |
| `atlas.sql.execute` | `db.system`, `atlas.connection_id`, `atlas.row_count`, `atlas.column_count` | SQL query execution; the SQL text is not included for security |
| `atlas.explore` | `atlas.command` (truncated), `atlas.backend` | Semantic layer file exploration (`ls`, `cat`, `grep`, `find`) |
| `atlas.python.execute` | `code.length` | Python code execution in sandbox |
Each span records success/failure status and captures exceptions on error, making it straightforward to trace agent step failures back to specific tool calls.
### Collector Setup
Any OpenTelemetry-compatible collector works. Here are common setups:
#### Jaeger
```bash
# Run Jaeger with OTLP ingestion (port 16686 = UI, 4318 = OTLP HTTP)
docker run -d --name jaeger \
  -p 16686:16686 \
  -p 4318:4318 \
  jaegertracing/jaeger:latest

# Point Atlas at the Jaeger OTLP endpoint
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
```

Open http://localhost:16686 to view traces. Search for service `atlas-api` (API server) or `atlas` (web frontend).
#### Grafana Tempo
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318
```

Query traces in Grafana's Explore view using the Tempo datasource.
#### Datadog
Use the Datadog Agent's OTLP ingestion:
```bash
OTEL_EXPORTER_OTLP_ENDPOINT=http://datadog-agent:4318
```

See the Datadog OTLP documentation for agent configuration.
#### Generic OTLP Collector
Any service that accepts OTLP over HTTP (Honeycomb, Axiom, Signoz, etc.) works by setting the endpoint:
```bash
# Set the OTLP endpoint and any required auth headers for your collector
OTEL_EXPORTER_OTLP_ENDPOINT=https://api.honeycomb.io
OTEL_EXPORTER_OTLP_HEADERS="x-honeycomb-team=your-api-key"
```

Atlas uses `@opentelemetry/exporter-trace-otlp-http` for trace export. Standard OpenTelemetry environment variables such as `OTEL_EXPORTER_OTLP_HEADERS` and `OTEL_RESOURCE_ATTRIBUTES` are respected by the underlying SDK.
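For reference, `OTEL_EXPORTER_OTLP_HEADERS` uses the standard comma-separated `key=value` format. A sketch of how such a string maps to HTTP headers (the real parsing happens inside the OpenTelemetry SDK; this helper is purely illustrative):

```typescript
// Illustrative parser for the OTEL_EXPORTER_OTLP_HEADERS format:
// comma-separated key=value pairs, values optionally percent-encoded.
function parseOtlpHeaders(raw: string): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const pair of raw.split(",")) {
    const idx = pair.indexOf("=");
    if (idx === -1) continue; // ignore malformed entries
    const key = pair.slice(0, idx).trim();
    const value = decodeURIComponent(pair.slice(idx + 1).trim());
    if (key) headers[key] = value;
  }
  return headers;
}

// parseOtlpHeaders("x-honeycomb-team=your-api-key")
//   → { "x-honeycomb-team": "your-api-key" }
```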
### Graceful Shutdown
The SDK registers a SIGTERM handler to flush pending spans before the process exits. This ensures traces from the final requests are not lost during container restarts or deployments.
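The pattern looks roughly like this (generic exporter interface and synchronous signatures for brevity; the real OpenTelemetry SDK exposes an async `shutdown()` that flushes pending spans):

```typescript
// Sketch of flush-on-SIGTERM so buffered spans survive container restarts.
interface SpanExporter {
  forceFlush(): void; // export any spans still buffered in memory
  shutdown(): void;   // release exporter resources
}

function registerShutdownHandler(exporter: SpanExporter): () => void {
  const onSigterm = () => {
    // Flush before exit: traces from in-flight requests would otherwise
    // be dropped during a rolling deploy.
    exporter.forceFlush();
    exporter.shutdown();
  };
  process.on("SIGTERM", onSigterm);
  return onSigterm; // also returned so it can be invoked directly
}
```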
## Structured Logging
Atlas uses Pino for structured JSON logging. Every log line includes a timestamp, level, component name, and request context (when available).
### Log Levels
Control verbosity with `ATLAS_LOG_LEVEL`:
```bash
ATLAS_LOG_LEVEL=debug  # trace | debug | info | warn | error | fatal
```

The default level is `info`. In development (`NODE_ENV !== "production"`), logs are formatted with `pino-pretty` for human readability. In production, logs are emitted as single-line JSON for machine parsing.
### Log Structure
Each log entry includes:
| Field | Description |
|---|---|
| `level` | Numeric Pino level (10=trace, 20=debug, 30=info, 40=warn, 50=error, 60=fatal) |
| `time` | Unix timestamp in milliseconds |
| `msg` | Human-readable message |
| `component` | Module that emitted the log (e.g., `agent`, `sql`, `explore`, `auth`) |
| `requestId` | UUID for the current request (when inside a request context) |
| `userId` | Authenticated user ID (when inside a request context) |
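Put together, a production entry is one JSON object per line. A sketch of that shape (plain object construction, not Pino's actual implementation):

```typescript
// Sketch of one structured log entry matching the fields above.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50, fatal: 60 } as const;

function logEntry(
  level: keyof typeof LEVELS,
  component: string,
  msg: string,
  context: Record<string, unknown> = {},
): string {
  // Request-scoped fields (requestId, userId) are merged in via `context`
  // only when the log is emitted inside a request.
  return JSON.stringify({
    level: LEVELS[level],
    time: Date.now(),
    component,
    msg,
    ...context,
  });
}

// logEntry("info", "sql", "Query executed", { requestId: "abc-123", rowCount: 100 })
```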
### Example Output
Production (JSON):

```json
{"level":30,"time":1706000000000,"component":"sql","requestId":"abc-123","msg":"Query executed","durationMs":45,"rowCount":100}
```

Development (pretty-printed):

```
[10:30:00.000] INFO (sql): Query executed
    requestId: "abc-123"
    durationMs: 45
    rowCount: 100
```

### Component Loggers
Atlas creates child loggers per component using `createLogger("component-name")`. Key components:
- `agent` -- Agent loop lifecycle and step transitions
- `sql` -- SQL validation, execution, and audit
- `explore` -- Semantic layer file access
- `auth` -- Authentication and authorization decisions
- `admin-routes` -- Admin API operations
- `scheduler` -- Scheduled task execution
- `conversations` -- Conversation persistence
- `actions` -- Action approval and execution
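A child logger's job is simply to bind the component name to every entry. A sketch of the idea (hypothetical `createLogger` shape; Atlas's real helper wraps Pino's `logger.child()`):

```typescript
// Sketch of per-component child loggers that stamp `component` onto
// every entry automatically.
interface Logger {
  info(msg: string, fields?: Record<string, unknown>): Record<string, unknown>;
}

function createLogger(component: string): Logger {
  return {
    // Returning the entry object here (instead of writing it out) keeps
    // the sketch inspectable; Pino would serialize and emit it.
    info: (msg, fields = {}) => ({ level: 30, component, msg, ...fields }),
  };
}

const sqlLog = createLogger("sql");
const entry = sqlLog.info("Query executed", { rowCount: 100 });
```

Binding the component once at creation time means call sites never repeat it, and every subsystem's logs can be filtered with a single field match.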
## Troubleshooting
### No traces appearing in the collector
Cause: `OTEL_EXPORTER_OTLP_ENDPOINT` is not set, or the collector is unreachable from the Atlas server.
Fix: Verify the environment variable is set and the endpoint is reachable (e.g., `curl http://localhost:4318/v1/traces`). Check that the collector is running and accepting OTLP HTTP connections on the configured port.
### Logs are JSON in development
Cause: `NODE_ENV` is set to `production` (or a non-development value). Pino uses JSON output in production and pretty-printed output in development.
Fix: For development, ensure `NODE_ENV` is unset or set to `development`. For production where you want readable logs, pipe through `pino-pretty`: `bun run dev:api | bun x pino-pretty`.
### Missing `requestId` or `userId` in log entries
Cause: The log was emitted outside of a request context (e.g., during startup or in a background task like the scheduler).
Fix: This is expected. Context fields (`requestId`, `userId`) are only present for logs emitted inside an HTTP request handler. Startup and scheduler logs include `component` but not request-scoped fields.
For more, see Troubleshooting.
## Related
- Troubleshooting -- enable debug logging and interpret diagnostic output
- Environment Variables -- `ATLAS_LOG_LEVEL`, `OTEL_EXPORTER_OTLP_ENDPOINT`, and related config