OpenTrace
Give your AI coding agent eyes into production
How It Works · Get Started · Features · MCP Tools · Security
Your AI coding assistant can read your code — but it's blind to production. When something breaks, you copy-paste stack traces, dig through log dashboards, and manually feed context into chat.
OpenTrace fixes that. It's a self-hosted server that connects your AI agent directly to your production observability data via MCP. No dashboards, no UI — the agent is the interface.
You: "Why is the checkout endpoint throwing 500s?"
Agent: Error rate on POST /api/checkout spiked from 0.1% to 4.2%
starting 8 minutes ago. 23 errors, 12 affected users.
Root cause: NilPointerError at payments_controller.rb:87
— customer.address is nil when charging.
This started after the latest deploy (commit a1b2c3d by
[email protected]). The commit modified payments_controller.rb
lines 82-95.
Want me to fix it?
Your App Your Server
┌──────────────────────┐ ┌──────────────────────────────────┐
│ │ │ OpenTrace Server │
│ SDK (Ruby / Node) │── POST ──────>│ │
│ Thin, async, non- │ /api/v2/logs │ Ingest Pipeline │
│ blocking. Just │ flat JSON │ PII scrub → fingerprint → │
│ serialize & send. │ │ expand in-request logs │
│ │ │ │ │
└──────────────────────┘ │ ▼ │
│ Segmented Log Store │
Your Laptop │ Binary WAL → hourly seal → │
┌──────────────────────┐ │ columnar chunks + FTS index │
│ │ │ 45 columns, 6 encoding types │
│ Claude Code / Cursor│◄── MCP ──────│ ~260KB runtime memory │
│ │ over HTTPS │ │
│ Reads .mcp.json │ │ SQLite (platform data) │
│ Auto-connects │ │ Users, watches, error groups │
│ │ │ │
└──────────────────────┘ │ Connects to your Postgres │
│ (read-only) │
└──────────────────────────────────┘
The SDK captures logs, request performance, SQL queries, external API calls, emails, file operations, and audit trails — then sends everything as flat JSON. Your app never blocks or crashes due to OpenTrace.
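The exact wire format isn't documented here, so the following is a hypothetical illustration of one flat entry POSTed to /api/v2/logs — the field names are invented for the example:

```json
{
  "timestamp": "2026-04-04T12:03:17Z",
  "level": "error",
  "service": "web",
  "handler": "POST /api/checkout",
  "status": 500,
  "duration_ms": 214,
  "trace_id": "9f8a7b6c",
  "commit": "a1b2c3d",
  "message": "NilPointerError at payments_controller.rb:87"
}
```

Flat keys like these map directly onto columns in the store, which is what makes the columnar filters below cheap.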
The server ingests logs into a custom columnar storage engine (no Elasticsearch, no ClickHouse — just files on disk), monitors health checks, tracks errors, and runs alert rules. Runs on a $4/month VM.
The agent queries all of this through MCP tools — searching logs, investigating errors, explaining slow queries, assessing deploy risk — without you copy-pasting anything.
OpenTrace uses a custom segmented columnar log store instead of SQLite or Elasticsearch for log data:
- Write path: SDK sends flat JSON → server appends to binary WAL → fsync. No indexes on write. 200-500K entries/sec.
- Seal: Every hour, the WAL is sealed into compressed columnar chunks (45 columns, 6 encoding types: dictionary, sparse, delta, bitpack, varint, zstd). 3-5MB peak memory.
- Query: Parallel column scans across segments + custom inverted index for full-text search. Most queries complete in 5-50ms.
- Pruning: `rm -rf` old segment directories. Instant — no DELETE queries, no VACUUM.
- Storage: ~76MB/hour at 1M logs/hr (vs ~500GB with SQLite). Fits on a $4 VM.
data/logs/
2026-04-04T10/
chunk_000.col 3MB ← 45 compressed columns
chunk_000.idx 1MB ← inverted index for FTS
meta.json 2KB ← pre-computed histograms
2026-04-04T11/
...
2026-04-04T12/
active.wal 12MB ← current hour, accumulating
Every entry captured by the SDK flows through the full pipeline — PII scrubbing, error fingerprinting, in-request log expansion — and lands in the store as a single row with 45 searchable columns plus an opaque body blob for deep details (SQL queries, stack traces, timeline, etc.).
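Of the six column encodings, delta + varint is the easiest to illustrate. This is a sketch of the general technique, not OpenTrace's actual on-disk format: a timestamp column full of large-but-close values shrinks to a handful of bytes when each value is stored as a varint-encoded delta from its predecessor.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// deltaVarintEncode stores the first value, then each subsequent value
// as a varint-encoded delta from the previous one. Near-monotonic
// columns (timestamps, sequence IDs) compress to ~1-2 bytes per entry.
func deltaVarintEncode(vals []int64) []byte {
	buf := make([]byte, 0, len(vals)*2)
	var prev int64
	for _, v := range vals {
		buf = binary.AppendVarint(buf, v-prev)
		prev = v
	}
	return buf
}

// deltaVarintDecode reverses the encoding by accumulating deltas.
func deltaVarintDecode(buf []byte) []int64 {
	var out []int64
	var prev int64
	for len(buf) > 0 {
		d, n := binary.Varint(buf)
		buf = buf[n:]
		prev += d
		out = append(out, prev)
	}
	return out
}

func main() {
	ts := []int64{1712220000000, 1712220000012, 1712220000015, 1712220000120}
	enc := deltaVarintEncode(ts)
	fmt.Println(len(enc), "bytes for", len(ts), "timestamps") // vs 32 bytes raw
	fmt.Println(deltaVarintDecode(enc))                       // round-trips exactly
}
```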
Pick one:
VPS (Hetzner, DigitalOcean, any Linux server)
ssh root@your-server
curl -fsSL https://raw.githubusercontent.com/adham90/opentrace/main/scripts/install.sh | bash
The installer:
- Downloads the latest binary
- Initializes the database
- Sets up a systemd service
- Optionally installs Caddy for automatic HTTPS
- Prints the connect command when done
Docker
docker run -d --name opentrace \
-p 8080:8080 \
-v opentrace-data:/data \
-e OPENTRACE_LISTEN_ADDR=0.0.0.0:8080 \
ghcr.io/adham90/opentrace:latest
Docker Compose
docker compose -f docker-compose.prod.yml up -d
In your project directory, run the connect command the installer printed:
curl -s https://your-server.com/connect | bash
No client install needed. Just curl and bash. The script creates .mcp.json in your project — Claude Code reads this file and connects to OpenTrace automatically.
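For reference, the generated .mcp.json looks roughly like this — the endpoint path and token are illustrative, and the exact shape is whatever the connect script writes:

```json
{
  "mcpServers": {
    "opentrace": {
      "type": "http",
      "url": "https://your-server.com/mcp",
      "headers": {
        "Authorization": "Bearer <your-personal-mcp-token>"
      }
    }
  }
}
```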
Open Claude Code and ask:
"Set up opentrace for my project"
The agent detects your framework, installs the SDK, configures it with the correct API key, and verifies logs are flowing.
| SDK | Platform | Install |
|---|---|---|
| opentrace | Ruby / Rails | gem 'opentrace' |
| @opentrace-sdk/node | Node.js | npm install @opentrace-sdk/node |
The SDK captures structured logs, request lifecycle data (SQL queries, external API calls, cache metrics, view rendering, email delivery), error traces with stack traces, and runtime metrics — all sent as flat JSON with async I/O. Your app never blocks.
You're done. Start asking:
| Question | What happens |
|---|---|
| "What errors are happening in production?" | Agent searches error groups, shows impact and stack traces |
| "Why is the payments endpoint slow?" | Agent checks request performance — duration, SQL count, external API time, N+1 detection |
| "Show me logs from the last hour with level ERROR" | Agent searches logs with columnar filters |
| "Is it safe to deploy this change?" | Agent checks blast radius, code risk scores, recent errors |
| "Generate tests for the most common production errors" | Agent creates regression tests from real error data |
| "Set up a watcher for checkout error rate > 1%" | Agent creates a threshold alert |
| "What happened after the last deploy?" | Agent checks deploy impact, error rate changes |
Full-text search across all services via custom inverted index. Filter by level, service, trace ID, time range, handler, status code, error class. Assemble distributed traces. Compare error rates between time periods.
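The core idea behind the per-chunk inverted index (the chunk_NNN.idx files above) is small enough to sketch: map each token to the set of entry IDs containing it, then intersect posting sets at query time. This is illustrative only — the real on-disk index format is not shown here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// invertedIndex maps token → set of log entry IDs.
type invertedIndex map[string]map[int]struct{}

// add tokenizes text (whitespace split, lowercased) and records the entry ID.
func (idx invertedIndex) add(id int, text string) {
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		if idx[tok] == nil {
			idx[tok] = map[int]struct{}{}
		}
		idx[tok][id] = struct{}{}
	}
}

// search returns entry IDs containing every query token (AND semantics).
func (idx invertedIndex) search(query string) []int {
	toks := strings.Fields(strings.ToLower(query))
	if len(toks) == 0 {
		return nil
	}
	var out []int
	for id := range idx[toks[0]] { // walk the first token's posting set
		hit := true
		for _, t := range toks[1:] {
			if _, ok := idx[t][id]; !ok {
				hit = false
				break
			}
		}
		if hit {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	idx := invertedIndex{}
	idx.add(1, "checkout failed with NilPointerError")
	idx.add(2, "checkout succeeded")
	fmt.Println(idx.search("checkout nilpointererror")) // [1]
}
```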
Every HTTP request captured by the middleware includes: SQL queries with durations and EXPLAIN plans, external API calls, cache hits/misses, view rendering times, email deliveries, file operations, audit trail, and a waterfall timeline — all in one log entry.
Errors are automatically grouped by fingerprint (hash of error class + source file + line). The agent sees occurrence counts, affected users, impact scores, and full stack traces. It can resolve or ignore error groups.
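The grouping rule above — hash of error class + source file + line — can be sketched in a few lines. The hash function and ID length here are assumptions; only the three inputs come from the description:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint derives a stable group ID so repeated occurrences of the
// same error at the same line of code collapse into one error group.
func fingerprint(errClass, file string, line int) string {
	h := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%d", errClass, file, line)))
	return hex.EncodeToString(h[:8]) // truncated for a short, stable ID
}

func main() {
	fmt.Println(fingerprint("NilPointerError", "payments_controller.rb", 87))
}
```

Because the message text is excluded, "customer.address is nil for user 12" and "…for user 99" land in the same group.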
Connect your Postgres databases (read-only). The agent runs EXPLAIN ANALYZE on slow queries, checks index health, detects lock contention, and identifies N+1 query patterns. All queries are validated SELECT-only via SQL AST parsing.
Create HTTP health checks that run on a schedule. The agent sees uptime percentages, response times, and gets notified when endpoints go down.
Create threshold watches on error rate, response time (mean or p95), log volume, error count, SQL count, cache hit rate, or service heartbeat. The agent can create watches for code it just deployed — self-monitoring its own changes.
Every file and endpoint gets a risk score based on error frequency, investigation history, and change velocity. Before modifying a file, the agent checks its production behavior.
The SDK sends the git commit hash with every log. OpenTrace detects deploys automatically when the commit hash changes. The agent correlates errors to specific commits.
OpenTrace exposes 13 tools with 90+ actions via MCP. Each tool returns `suggested_tools` with pre-filled arguments so the agent knows what to call next.
| Tool | Actions | What it does |
|---|---|---|
| logs | search, context, attributes, stats, summary, performance, trace, compare | Full-text log search, distributed trace assembly, N+1 detection |
| errors | list, detail, investigate, impact, user_errors, ranking, resolve, ignore, reopen, new | Error grouping by fingerprint, user impact scoring, stack traces |
| database | queries, explain, tables, activity, locks, connections, indexes, schema, storage, kill_query, long_transactions | Postgres introspection, EXPLAIN plans, lock and connection analysis |
| watches | status, create, delete, alerts, dismiss, acknowledge, investigate | Threshold alerts on error rate, latency, request volume |
| overview | status, triage, diagnose, timeline, investigate, changes, settings, notes, delete_note | System health, alerts, incident timeline, settings, agent memory |
| analytics | traffic, endpoints, heatmap, trends, movers | Traffic patterns, endpoint performance, time-series analysis |
| code | risk, fragile, annotate_file, annotate_function, hotspots, gen_context, gen_suggest, deps_service, deps_blast, deps_risk | Code risk scores, test generation, blast radius, production annotations |
| deep_capture | request_capture, sql_captures, http_captures, email_captures, audit_trail, search_audit, search_sql, file_captures, get_pii_config, update_pii_config, get_retention, update_retention | Per-request deep capture: SQL, HTTP, emails, audit trail, file ops, PII config |
| healthchecks | list, uptime, create, delete | HTTP endpoint monitoring with uptime tracking |
| servers | list, query, health | Server and process metrics (CPU, memory, GC) |
| connectors | list, get, create, test, update, delete | Manage database connectors (Postgres, MySQL, etc.) |
| setup | status, detect, guide, verify | SDK setup assistant — detects framework, provides config with API key |
| admin | update_retention, users, update_role, toggle_active, delete_user, audit | User management, retention, audit log (admin only) |
| Protection | How |
|---|---|
| No self-registration | First curl .../connect creates admin. Everyone else needs an invite. |
| Per-user tokens | Each developer gets a personal MCP token, stored in their local .mcp.json. Revocable independently. |
| HTTPS via Caddy | The install script sets up Caddy with automatic Let's Encrypt certificates. |
| PII scrubbing | Credit cards, emails, phone numbers, SSNs, and configurable sensitive fields are scrubbed from request bodies before storage. |
| Rate limiting | Auth endpoints are rate-limited — 10 attempts per minute per IP. |
| Read-only DB access | All queries against your Postgres are validated SELECT-only via SQL AST parsing. |
| API key auth | SDK log ingestion requires a Bearer token. |
| No telemetry | Fully self-hosted. No external calls. No tracking. Your data stays on your server. |
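The PII scrubbing stage can be approximated with replacement patterns like the sketch below. The patterns and placeholder tokens are illustrative — the real scrubber also handles configurable sensitive fields, which this sketch omits:

```go
package main

import (
	"fmt"
	"regexp"
)

// piiPatterns maps a placeholder token to the pattern it replaces.
// Illustrative patterns only; production scrubbers need broader coverage.
var piiPatterns = map[string]*regexp.Regexp{
	"[EMAIL]": regexp.MustCompile(`[\w.+-]+@[\w-]+\.[\w.]+`),
	"[CARD]":  regexp.MustCompile(`\b(?:\d[ -]?){13,16}\b`),
	"[SSN]":   regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`),
}

// scrub replaces PII in a request body before it is written to storage.
func scrub(body string) string {
	for token, re := range piiPatterns {
		body = re.ReplaceAllString(body, token)
	}
	return body
}

func main() {
	fmt.Println(scrub(`{"email":"jane@acme.io","card":"4242 4242 4242 4242"}`))
	// → {"email":"[EMAIL]","card":"[CARD]"}
}
```

The key property is that scrubbing happens at ingest, so the raw values never touch disk.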
Server-side environment variables (.env file):
| Variable | Default | Description |
|---|---|---|
| OPENTRACE_LISTEN_ADDR | 127.0.0.1:8080 | HTTP listen address |
| OPENTRACE_DATA_DIR | ~/.opentrace | Data directory (SQLite + log segments) |
| OPENTRACE_API_KEY | (auto-generated) | Bearer token for SDK log ingestion |
| OPENTRACE_MAX_QUERY_ROWS | 500 | Max rows returned from SQL queries |
| OPENTRACE_STATEMENT_TIMEOUT_MS | 5000 | SQL query timeout in milliseconds |
| OPENTRACE_TRUSTED_PROXIES | (empty) | Comma-separated proxy IPs for rate limiting |
See .env.example for all options.
- Go — single binary, no runtime dependencies, cross-compiled for Linux and macOS
- Custom columnar storage — 45-column format with 6 encoding types (dictionary, sparse, delta, bitpack, varint, zstd). Binary WAL for writes, hourly seal into compressed chunks, custom inverted index for FTS.
- SQLite — for platform data (users, watches, error groups, health checks). Not used for log storage.
- MCP — native Model Context Protocol with Streamable HTTP and SSE transports
- Pure Go — no CGO, no system dependencies, `go build` and ship
git clone https://github.com/adham90/opentrace.git && cd opentrace
cp .env.example .env
go build -o opentrace ./cmd/opentrace
./opentrace serve
go test -short -race ./... # unit tests (44 packages)
go vet ./...               # linting
MIT — use it however you want.