Skip to content

feat(agents-core): structured logging with OTel log export#3073

Open
amikofalvy wants to merge 5 commits intomainfrom
feat/structured-logging
Open

feat(agents-core): structured logging with OTel log export#3073
amikofalvy wants to merge 5 commits intomainfrom
feat/structured-logging

Conversation

@amikofalvy
Copy link
Copy Markdown
Collaborator

Summary

  • Environment-aware pino configuration: Production/Vercel outputs JSON to stdout + OTel LogRecords via pino.multistream(). Development uses pino-pretty (unchanged).
  • OTel log bridge on main thread: Custom OTelLogStream (Writable stream) parses pino JSON and emits OTel LogRecords with automatic trace-log correlation via context.active() — no worker threads, no pino mixin needed.
  • Provider-agnostic: Reads OTEL_EXPORTER_OTLP_LOGS_ENDPOINT for any OTLP-compatible backend (SigNoz, Grafana, Datadog, etc.). Gracefully degrades to no-op when LoggerProvider is not registered or OTel SDK packages are not installed.

Why

  • pino-pretty on Vercel produces multi-line indented plaintext that's hard to scan in the log viewer
  • All pino logs went to stdout, so Vercel classified everything as "info" — error filtering was useless
  • No structured log querying was possible without an external service
  • Needed same logging code path for local dev and production

What changed

File Change
packages/agents-core/src/utils/logger.ts Environment-aware PinoLogger: multistream (stdout + OTel) in prod, pino-pretty in dev
packages/agents-core/src/utils/otel-log-stream.ts New — Main-thread Writable that bridges pino JSON → OTel LogRecords with trace context
packages/agents-core/src/utils/otel-log-provider.ts New — Sets up OTel LoggerProvider with BatchLogRecordProcessor + OTLPLogExporter
agents-api/src/instrumentation.ts Calls setupOTelLogProvider() after SDK start; flushBatchProcessor() now also flushes OTel logs
packages/agents-core/package.json Added @opentelemetry/api-logs (regular dep), sdk-logs + exporter-logs-otlp-http (optional deps)
packages/agents-core/tsdown.config.ts Added OTel SDK packages to external to prevent bundler tracing
.env.example Added OTEL_EXPORTER_OTLP_LOGS_ENDPOINT

Test plan

  • pnpm --filter @inkeep/agents-core typecheck passes
  • pnpm --filter @inkeep/agents-core test — 140 files, 2112 tests pass
  • pnpm --filter @inkeep/agents-core build succeeds
  • Verified SigNoz OTLP collector accepts log records at localhost:4318/v1/logs
  • CI full test suite
  • Manual: start app locally with OTEL_EXPORTER_OTLP_LOGS_ENDPOINT set, verify logs appear in SigNoz Logs tab

🤖 Generated with Claude Code

Replace always-on pino-pretty with environment-aware configuration:
- Production/Vercel: JSON to stdout + OTel LogRecords via multistream
- Development: pino-pretty (unchanged)

Architecture:
- OTelLogStream (main thread Writable) bridges pino JSON to OTel LogRecords
- Automatic trace-log correlation via context.active() (no mixin needed)
- Provider-agnostic: reads OTEL_EXPORTER_OTLP_LOGS_ENDPOINT for any OTLP backend
- Graceful degradation: no-op if LoggerProvider not registered or SDK not installed
- No worker threads: avoids flush-on-freeze and bundler issues in serverless

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 8, 2026

🦋 Changeset detected

Latest commit: 5bff3ec

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-core Patch
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Apr 8, 2026 6:04pm
agents-docs Ready Ready Preview, Comment Apr 8, 2026 6:04pm
agents-manage-ui Ready Ready Preview, Comment Apr 8, 2026 6:04pm

Request Review

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Apr 8, 2026

TL;DR — Adds structured JSON logging in production/Vercel environments and bridges pino log output to the OpenTelemetry Logs SDK via OTLP HTTP, so application logs are exported alongside traces to an OTel-compatible collector.

Key changes

  • Add OTelLogStream pino-to-OTel bridge — A Writable stream that parses pino JSON lines, maps severity levels, and emits them as OTel LogRecords with automatic trace context correlation.
  • Add otel-log-provider setup and lifecycle helperssetupOTelLogProvider lazily imports the SDK, creates a BatchLogRecordProcessor with OTLPLogExporter, and registers the provider globally; companion flushOTelLogs / shutdownOTelLogProvider manage graceful drain.
  • Environment-aware logger output mode — In production or on Vercel, pino now writes to a multistream (stdout JSON + OTelLogStream); in development it continues to use pino-pretty.
  • Wire OTel log provider into API instrumentation with SIGTERM shutdownstartOpenTelemetrySDK becomes async and calls setupOTelLogProvider; flushBatchProcessor drains both trace and log processors in parallel; a new SIGTERM handler calls shutdownOpenTelemetry for Vercel Fluid Compute graceful shutdown.
  • Add OTEL_EXPORTER_OTLP_LOGS_ENDPOINT to env schema and .env.example — Documents the new logs endpoint and validates it via Zod in env.ts.
  • Mark OTel log SDK packages as external in tsdown config — Keeps @opentelemetry/sdk-logs and @opentelemetry/exporter-logs-otlp-http out of the bundle for runtime resolution.

Summary | 12 files | 5 commits | base: mainfeat/structured-logging


Pino-to-OTel log bridge via OTelLogStream

Before: Pino logs were only written to stdout (pretty-printed or JSON) with no OTel export path.
After: In structured mode, each pino JSON line is parsed, severity-mapped, and emitted as an OTel LogRecord — correlated to the active trace context.

OTelLogStream extends Writable in non-object mode, splits incoming chunks by newline, and JSON-parses each line. It maps pino numeric levels (10–60) to OTel SeverityNumber values, extracts msg as the log body, and forwards remaining fields as attributes. The context.active() call on each record automatically links logs to in-flight spans.

The OTel logger is resolved lazily via a getter (private get otelLogger()) rather than cached at construction time — this ensures it picks up the LoggerProvider even when OTelLogStream is instantiated before setupOTelLogProvider runs. Emission errors are caught and rate-limited with a emitErrorLogged flag to avoid log storms, and per-line try/catch prevents one malformed line from dropping subsequent valid records.

Why a main-thread Writable instead of a pino transport?

Pino transports run in worker threads, which would lose the active OTel span context. By using a synchronous Writable stream in pino's multistream, context.active() captures the correct trace context on every log line — the key enabler for trace-log correlation.

How does the environment detection work?

A new isStructuredMode() helper returns true when VERCEL is set or NODE_ENV === 'production'. When active, the PinoLogger constructor (and reconfigure) creates a pino.multistream with two targets: process.stdout for the JSON sink and a new OTelLogStream instance for the OTel bridge. In development, the existing pino-pretty path is preserved unchanged.

otel-log-stream.ts · logger.ts


OTel log provider lifecycle and API instrumentation wiring

Before: Only the trace SDK was initialized and flushed in instrumentation.ts; no SIGTERM handler existed.
After: setupOTelLogProvider initializes a LoggerProvider with a batched OTLP HTTP exporter, flushBatchProcessor drains both trace and log processors concurrently, and a SIGTERM handler ensures graceful shutdown on Vercel Fluid Compute.

The provider setup uses dynamic import() for @opentelemetry/sdk-logs and @opentelemetry/exporter-logs-otlp-http, failing gracefully if the packages are not installed. Both packages are listed as devDependencies and marked external in tsdown — this makes them truly optional at runtime and avoids bundling OTel internals. The OTLPLogExporter reads its endpoint from the standard OTEL_EXPORTER_OTLP_LOGS_ENDPOINT env var, now validated in env.ts.

Function Behavior
setupOTelLogProvider() Lazy-imports SDK, creates BatchLogRecordProcessor + OTLPLogExporter, registers global provider
flushOTelLogs() Duck-types the global provider for forceFlush() — safe no-op if SDK absent
shutdownOTelLogProvider() Same pattern for shutdown() — wired into shutdownOpenTelemetry()
shutdownOpenTelemetry() Shuts down both the trace SDK and log provider in parallel
SIGTERM handler Calls shutdownOpenTelemetry() to flush buffered telemetry before Vercel's ~5s SIGKILL

otel-log-provider.ts · instrumentation.ts · env.ts · tsdown.config.ts

Pullfrog  | View workflow run | Triggered by Pullfrog | Using Claude Opus𝕏

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(5) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

  • 🔴 Critical: otel-log-stream.ts:32 OTel logger obtained before LoggerProvider is registered — all log exports silently fail

🟠⚠️ Major (4) 🟠⚠️

🟠 1) scope Async SDK startup not awaited

Issue: startOpenTelemetrySDK() was changed to async function (returns Promise<void>) but is called without await at line 5 of index.ts.

Why: If the OTel SDK start or log provider setup fails, the error will be an unhandled promise rejection rather than a startup failure. More importantly, subsequent module imports and logger instantiations will race against the async setupOTelLogProvider() call, exacerbating the initialization order bug.

Fix: This is partially addressed by fixing the lazy logger initialization in otel-log-stream.ts. However, consider either:

  • Using top-level await: await startOpenTelemetrySDK();
  • Or handling the promise explicitly with a catch handler

Refs:

Inline Comments:

  • 🟠 Major: otel-log-stream.ts:50-52 Catch-all swallows OTel emission errors, not just JSON parse errors
  • 🟠 Major: instrumentation.ts:112 shutdownOTelLogProvider() is never called on process shutdown
  • 🟠 Major: .env.example:69 Missing from env.ts schema — CI will fail

🟡 Minor (0) 🟡

No minor issues.

💭 Consider (3) 💭

💭 1) otel-log-provider.ts:11 BatchLogRecordProcessor uses default configuration
Issue: The trace SDK uses env-configured OTEL_BSP_SCHEDULE_DELAY and OTEL_BSP_MAX_EXPORT_BATCH_SIZE, but the log processor uses defaults (30s delay, 512 batch size).
Why: Inconsistent batching behavior between traces and logs under high volume.
Fix: Consider adding similar env var support or document that defaults are intentional.

💭 2) otel-log-provider.ts:19 Debug-level logging for setup failures
Issue: Using debug level for OTel setup failures means operators may not notice when log export is unexpectedly disabled despite configuring OTEL_EXPORTER_OTLP_LOGS_ENDPOINT.
Fix: Consider info level when the env var is set but setup fails (indicating user intent).

💭 3) otel-log-provider.ts:24 flushOTelLogs has no timeout protection
Issue: If the OTLP endpoint is slow/unresponsive, forceFlush() could hang indefinitely.
Fix: Consider wrapping in Promise.race with a timeout for serverless environments.


🚫 REQUEST CHANGES

Summary: The core logger initialization order issue (Critical #1) means OTel log export will silently fail — the OTelLogStream captures a no-op logger at module load time before the provider is registered. This needs to be fixed by lazily initializing the logger. Additionally, the missing env schema entry will cause CI failures. The shutdown handling gap and error swallowing in the catch block should also be addressed before merge.

Discarded (6)
Location Issue Reason Discarded
package.json:185 Version mismatch between api-logs (0.203.0) and other OTel packages OTel uses semver, caret ranges handle this appropriately. Lockfile deduplication is working.
tsdown.config.ts:14 api-logs not in externals list unlike sdk-logs Intentional — api-logs is a regular dep meant to be bundled, SDK packages are optional/dynamic.
changeset Message could be more specific Valid changeset, message is adequate for a patch release.
logger.ts:115 pino-pretty fallback message could be more actionable Pre-existing code, fallback behavior is sound.
logger.ts OTelLogStream instances not cleaned up in recreateInstance Edge case — recreateInstance is rarely called, GC handles cleanup.
otel-log-stream.ts:61 Attributes forwarded without cardinality controls Log backends handle high cardinality better than metrics. Pino redact config handles secrets.
Reviewers (4)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 2 1 0 0 1 0 0
pr-review-sre 7 1 3 0 2 0 1
pr-review-errors 3 0 1 0 1 0 1
pr-review-devops 5 0 0 0 1 0 4
Total 17 2 4 0 5 0 6

}

export class OTelLogStream extends Writable {
private otelLogger = logs.getLogger('pino-bridge');
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL: OTel logger obtained before LoggerProvider is registered

Issue: logs.getLogger('pino-bridge') is called at class construction time, but OTelLogStream instances are created in logger.ts constructor (line 104) which runs at module load time — before setupOTelLogProvider() is called in instrumentation.ts.

Why: The OTel API returns a no-op logger when no provider is registered. This no-op logger gets cached in otelLogger, causing all subsequent emit() calls to silently do nothing. This defeats the entire purpose of the feature.

Fix: Lazily initialize the logger on first use:

Suggested change
private otelLogger = logs.getLogger('pino-bridge');
private get otelLogger() {
return logs.getLogger('pino-bridge');
}

Or alternatively, call logs.getLogger() directly in emitLogRecord().

Refs:

Comment on lines +50 to +52
} catch {
// Silently skip unparseable lines
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Catch-all swallows OTel emission errors, not just JSON parse errors

Issue: The catch block intends to skip unparseable JSON lines, but it also silently suppresses errors from emitLogRecord() and otelLogger.emit() — which may fail for reasons unrelated to JSON parsing (OTel SDK issues, malformed attributes, provider misconfiguration).

Why: If OTel emission fails systematically, there's zero feedback that logs aren't reaching the backend. Debugging why logs aren't appearing becomes extremely difficult.

Fix: Separate JSON parsing errors (expected, safe to skip) from emission errors (should surface at least once):

Suggested change
} catch {
// Silently skip unparseable lines
}
} catch (parseError) {
// Expected: partial lines, non-JSON preamble - silently skip
}

And wrap emitLogRecord() call separately with rate-limited error logging (e.g., log once per N failures to avoid storms).

throw error;
}

await setupOTelLogProvider();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: shutdownOTelLogProvider() is never called on process shutdown

Issue: The otel-log-provider.ts module exports shutdownOTelLogProvider() but it's not wired into any shutdown hook.

Why: On SIGTERM/graceful shutdown, logs buffered in BatchLogRecordProcessor will be lost if shutdown() is not called. The trace SDK handles this via NodeSDK.shutdown(), but the log provider setup is separate.

Fix: Add shutdownOTelLogProvider() to the application shutdown sequence. Consider creating a unified shutdown function:

export async function shutdownOpenTelemetry(): Promise<void> {
  await Promise.all([
    defaultSDK.shutdown(),
    shutdownOTelLogProvider(),
  ]);
}

Refs:

Comment thread .env.example
SIGNOZ_API_KEY=
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://localhost:14318/v1/traces
# OTel logs endpoint — points to SigNoz collector (port 4318) or standalone collector (port 14318)
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318/v1/logs
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Missing from env.ts schema — CI will fail

Issue: OTEL_EXPORTER_OTLP_LOGS_ENDPOINT is added to .env.example but not declared in agents-api/src/env.ts.

Why: The check-env-descriptions.mjs CI script enforces parity between .env.example and the Zod schema. This will cause CI failures.

Fix: Add to agents-api/src/env.ts alongside the other OTel config (around line 175):

OTEL_EXPORTER_OTLP_LOGS_ENDPOINT: z
  .string()
  .optional()
  .describe('OTLP endpoint for exporting logs to any OTLP-compatible backend (e.g., SigNoz, Grafana)'),

Refs:

@github-actions github-actions Bot deleted a comment from claude Bot Apr 8, 2026
Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid approach — the pino multistream + OTel bridge pattern is clean and the proxy-based lazy initialization is correct. Two actionable issues: a bug in the stream's error handling that silently drops valid log lines, and an unhandled async concern in the callers of startOpenTelemetrySDK. The lockfile also has unrelated react-dom resolution drift that should be cleaned up.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow run | Using Claude Opus𝕏

Comment on lines +43 to +52
try {
const str = typeof chunk === 'string' ? chunk : chunk.toString();
for (const line of str.split('\n')) {
if (!line.trim()) continue;
const parsed = JSON.parse(line);
this.emitLogRecord(parsed);
}
} catch {
// Silently skip unparseable lines
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: one malformed line silently drops all remaining lines in the chunk.

The try/catch wraps the entire for loop. If any single line fails JSON.parse, the exception exits the loop and all subsequent valid lines in that chunk are never emitted to OTel. Move the try/catch inside the loop so each line is handled independently:

Suggested change
try {
const str = typeof chunk === 'string' ? chunk : chunk.toString();
for (const line of str.split('\n')) {
if (!line.trim()) continue;
const parsed = JSON.parse(line);
this.emitLogRecord(parsed);
}
} catch {
// Silently skip unparseable lines
}
try {
const str = typeof chunk === 'string' ? chunk : chunk.toString();
for (const line of str.split('\n')) {
if (!line.trim()) continue;
try {
const parsed = JSON.parse(line);
this.emitLogRecord(parsed);
} catch {
// Skip unparseable lines
}
}
} catch {
// Silently skip chunk-level errors (e.g. toString failure)
}

throw error;
}

await setupOTelLogProvider();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

startOpenTelemetrySDK() is now async but neither caller awaits it:

  • agents-api/src/index.ts:5: startOpenTelemetrySDK(); (fire-and-forget)
  • create-agents-template/.../instrumentation.ts:20: startOpenTelemetrySDK(); (fire-and-forget)

Since setupOTelLogProvider() has its own try/catch this won't cause unhandled rejections, but the log provider setup is now a race — early log records emitted before the provider is registered are silently dropped by the NOOP_LOGGER proxy.

This is likely acceptable for the OTel bridge (logs during startup are low-value), but document the intentional fire-and-forget behavior or convert the callers to await.

Comment thread pnpm-lock.yaml
'@inkeep/agents-manage-ui':
specifier: '*'
version: 0.41.1(@better-auth/[email protected])(@hono/[email protected]([email protected])([email protected]))(@opentelemetry/[email protected])(@opentelemetry/[email protected](@opentelemetry/[email protected]))(@opentelemetry/[email protected](@opentelemetry/[email protected]))(@types/[email protected])(@types/[email protected](@types/[email protected]))(@types/[email protected])([email protected](patch_hash=fb57ddbf6d253ca059e800cec47d55e177a15be89f100146c8ed5e4ba56697ed))([email protected]([email protected]))([email protected])([email protected])([email protected](@electric-sql/[email protected])(@opentelemetry/[email protected])(@types/[email protected])([email protected])([email protected])([email protected])([email protected]))([email protected])([email protected])([email protected])([email protected])([email protected]([email protected].4([email protected]))([email protected]))([email protected]([email protected]))([email protected](@edge-runtime/[email protected])(@types/[email protected])(@types/[email protected])([email protected])([email protected])([email protected])([email protected])([email protected])([email protected]))
version: 0.41.1(@better-auth/[email protected])(@hono/[email protected]([email protected])([email protected]))(@opentelemetry/[email protected])(@opentelemetry/[email protected](@opentelemetry/[email protected]))(@opentelemetry/[email protected](@opentelemetry/[email protected]))(@types/[email protected])(@types/[email protected](@types/[email protected]))(@types/[email protected])([email protected](patch_hash=fb57ddbf6d253ca059e800cec47d55e177a15be89f100146c8ed5e4ba56697ed))([email protected]([email protected]))([email protected])([email protected])([email protected](@electric-sql/[email protected])(@opentelemetry/[email protected])(@types/[email protected])([email protected])([email protected])([email protected])([email protected]))([email protected])([email protected])([email protected])([email protected])([email protected]([email protected].0([email protected]))([email protected]))([email protected]([email protected]))([email protected](@edge-runtime/[email protected])(@types/[email protected])(@types/[email protected])([email protected])([email protected])([email protected])([email protected])([email protected])([email protected]))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff includes react-dom resolution changes (19.2.419.2.0 in several better-auth, next, nuqs, react-router snapshots) that have no relation to the OTel dependencies being added. This typically happens when the lockfile is regenerated from scratch rather than layered on top of main.

Per repo conventions, rebase the lockfile cleanly:

git checkout main -- pnpm-lock.yaml
pnpm install

This preserves existing resolutions and only adds the new OTel entries.

@socket-security
Copy link
Copy Markdown

socket-security Bot commented Apr 8, 2026

Warning

Review the following alerts detected in dependencies.

According to your organization's Security Policy, it is recommended to resolve "Warn" alerts. Learn more about Socket for GitHub.

Action Severity Alert  (click "▶" to expand/collapse)
Warn High
Obfuscated code: npm @inkeep/agents-manage-ui is 98.0% likely obfuscated

Confidence: 0.98

Location: Package overview

From: create-agents-template/package.jsonnpm/@inkeep/[email protected]

ℹ Read more on: This package | This alert | What is obfuscated code?

Next steps: Take a moment to review the security alert above. Review the linked package source code to understand the potential risk. Ensure the package is not malicious before proceeding. If you're unsure how to proceed, reach out to your security team or ask the Socket team for help at [email protected].

Suggestion: Packages should not obfuscate their code. Consider not using packages with obfuscated code.

Mark the package as acceptable risk. To ignore this alert only in this pull request, reply with the comment @SocketSecurity ignore npm/@inkeep/[email protected]. You can also ignore all packages with @SocketSecurity ignore-all. To ignore an alert for all future pull requests, use Socket's Dashboard to change the triage state of this alert.

View full report

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Apr 8, 2026

Ito Test Report ✅

13 test cases ran. 13 passed.

The unified local QA run passed all 13 of 13 test cases with zero failures, showing no code-supported defect signals across API liveness/readiness, auth boundaries, and UI navigation flows. Most importantly, /health and /ready remained contract-stable even under repeated and concurrent load (including 60 burst requests), protected endpoints correctly enforced controlled 401/4xx behavior while valid bypass auth returned expected data, deep-link/refresh/history and mobile (390x844) navigation stayed authenticated without 5xx or error boundaries, and the prior ADV-4 concern was confirmed as a test-expectation mismatch rather than an application defect.

✅ Passed (13)
Category Summary Screenshot
Adversarial Unauthorized manage request was rejected with 401 and no sensitive data leakage. ADV-1
Adversarial Run endpoint rejected partial bypass context with HTTP 401, satisfying controlled 4xx behavior. ADV-2
Adversarial Authorization abuse matrix produced 401 for all variants and did not set session cookies. ADV-3
Adversarial Previous failure was a test-method expectation mismatch, not an application defect. Source code and unit tests confirm /api/auth/dev-session intentionally succeeds with valid bypass auth regardless of request body; re-execution showed controlled responses and service health remained stable. ADV-4
Edge Ran two immediate concurrent bursts (30 + 30) against /health; all 60 responses were 204 with no timeout/errors. EDGE-1
Edge Rapid reload and history navigation stayed authenticated with no error boundary or login redirect. EDGE-2
Edge At 390x844 viewport, root-to-/default/projects flow remained usable; primary nav controls were visible/clickable and captured network calls showed no 5xx during the mobile path. EDGE-3
Edge Deep-link traces-to-agents flow with repeated back/forward remained authenticated and rendered normally. EDGE-4
Logic Twenty mixed request cycles completed successfully with no 500s or hangs. LOGIC-1
Happy-path Five sequential /health calls all returned 204 with empty bodies. ROUTE-1
Happy-path Readiness returned 200 application/json with expected keys status, manageDb, and runDb. ROUTE-2
Happy-path Root entry authenticated and loaded tenant/project UI without redirect loops or sampled 5xx. ROUTE-3
Happy-path Valid bypass auth returned HTTP 200 with JSON project data from manage endpoint. ROUTE-4

Commit: 7f81aa2

View Full Run


Tell us how we did: Give Ito Feedback

amikofalvy and others added 2 commits April 8, 2026 09:08
Reset pnpm-lock.yaml from main and reinstalled to properly layer
the new @opentelemetry dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(5) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

No new critical issues in delta.

🟠⚠️ Major (3) 🟠⚠️

No new major issues in delta.

🟡 Minor (0) 🟡

No minor issues.

💭 Consider (0) 💭

No consider items.

🕐 Pending Recommendations (5)

Issues raised in prior review that remain unaddressed:

✅ Delta Assessment

Commit 4941ccf37: Regenerated lockfile from main to fix frozen-lockfile CI failures.

This addresses the lockfile drift concern raised by Pullfrog in the prior review. The lockfile now properly layers the new OTel dependencies on top of main's existing resolutions. ✓


🚫 REQUEST CHANGES

Summary: The lockfile regeneration is correct and addresses the prior concern about resolution drift. However, all 5 critical/major issues from the previous review remain unaddressed. The most important is the lazy logger initialization bug (otel-log-stream.ts:32) — without this fix, OTel log export will silently fail because the logger is obtained before the provider is registered.

Discarded (0)

No discarded items.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator 0 0 0 0 0 5 0
Total 0 0 0 0 0 5 0

Note: Delta-only review — no sub-reviewers dispatched. Prior issues carried forward as pending recommendations.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 8, 2026

Preview URLs

Use these stable preview aliases for testing this PR:

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(5) Total Issues | Risk: High

🔴❗ Critical (0) ❗🔴

No new critical issues in delta.

🟠⚠️ Major (0) 🟠⚠️

No new major issues in delta.

🟡 Minor (0) 🟡

No minor issues.

💭 Consider (0) 💭

No consider items.

🕐 Pending Recommendations (5)

Issues raised in prior reviews that remain unaddressed:

✅ Delta Assessment

Commit 1d9c5158f: Retrigger CI after auto-format.

This commit contains no code changes — it's a CI retrigger. The 5 pending issues from the initial review remain unaddressed.


🚫 REQUEST CHANGES

Summary: No code changes in this delta. The 5 critical/major issues from the initial review remain unaddressed. The most important is the lazy logger initialization bug (otel-log-stream.ts:32) — without this fix, OTel log export will silently fail because the logger is obtained before the provider is registered.

Discarded (0)

No discarded items.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator 0 0 0 0 0 5 0
Total 0 0 0 0 0 5 0

Note: Delta-only review — no sub-reviewers dispatched (empty diff). Prior issues carried forward as pending recommendations.

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Apr 8, 2026

Ito Test Report ✅

18 test cases ran. 18 passed.

All 16 executed test cases passed with zero failures, showing strong stability across core API routes, chat streaming, traces workflows, and security hardening checks in the local non-production setup. The most important findings were reliable health/readiness behavior, correct unauthorized/forged-access blocking and safe 400 validation for malformed or out-of-order traces inputs, resilient streaming and navigation under rapid actions/refresh/concurrency/session churn, and graceful traces UI error handling when SigNoz was unavailable, with the caveat that multiple scenarios used local auth/token bypasses and the A2A repeat test used a synthetic fallback stream path.

✅ Passed (18)
Category Summary Screenshot
Adversarial Injection-shaped multiline content did not break stream framing on the tested chat streaming endpoints. ADV-1
Adversarial Unauthenticated access to Signoz manage routes was denied on re-check; both health and query endpoints returned 401. ADV-2
Adversarial Forged tenantId traces request was safely denied (auth-required error payload) and did not expose foreign tenant data. ADV-3
Adversarial Malformed traces batch payloads were consistently rejected with 400 across repeated attempts, and a subsequent valid traces request still returned a handled response, indicating no crash. ADV-4
Adversarial Parallel in-flight stream+traces calls remained graceful through cookie invalidation; after dev-session re-login, retries succeeded cleanly with no stale data/session bleed observed. ADV-5
Edge Rapid-submit behavior is resilient in source: polling start is idempotent and stream cleanup/finalization paths are present. EDGE-1
Edge Refresh-recovery behavior is implemented in source: conversation/polling reset and stream buffer finalization paths are present. EDGE-2
Edge Mobile traces workflow remained usable on 390x844 while applying time range and span filters, and detail navigation worked with explicit backend feedback. EDGE-3
Edge Invalid out-of-order and future date payloads were rejected with handled 400 responses, and traces UI remained usable after back/forward navigation. EDGE-4
Edge Concurrent traces and chat-stream traffic across two authenticated tabs on default/qa-project completed without unexpected logout, cross-tab corruption, or 5xx responses. EDGE-5
Edge Non-linear traces/detail/back/agent/back journey remained stable on valid routes, with traces query state preserved after navigation. EDGE-6
Logic With SigNoz unreachable, traces UI showed an explicit configuration/connection error while filter and search interactions remained usable. LOGIC-4
Logic Repeated A2A stream requests returned consistent 200 completed responses in local verification. LOGIC-5
Happy-path Health endpoint handled burst traffic successfully: 10/10 responses were 204 with no hang. ROUTE-1
Happy-path Readiness endpoint remained stable across 5 immediate calls; all returned valid 200 JSON with expected fields. ROUTE-2
Happy-path OpenAI-compatible streaming completed across two turns after local data setup; HTTP 200 SSE chunks observed. ROUTE-3
Happy-path Vercel data stream endpoint returned HTTP 200 with x-vercel-ai-data-stream and completed cleanly. ROUTE-4
Happy-path Traces page showed a handled SigNoz configuration error state, and traces proxy health returned 200 JSON without an unhandled 500. ROUTE-5

Commit: 1d9c515

View Full Run


Tell us how we did: Give Ito Feedback

…, env schema

- Fix critical: lazily resolve OTel logger via getter to avoid caching no-op
  before LoggerProvider is registered
- Fix error handling: separate JSON parse errors from OTel emission errors,
  rate-limit emission error logging, per-line try/catch to avoid dropping
  valid lines after one parse failure
- Wire SIGTERM shutdown: call shutdownOTelLogProvider() on process SIGTERM
  (Vercel Fluid Compute sends SIGTERM ~5s before SIGKILL)
- Add OTEL_EXPORTER_OTLP_LOGS_ENDPOINT to env.ts schema
- Regenerate lockfile cleanly from main

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

✅ Delta Assessment

Commit 5bff3ec35: fix: address PR feedback — lazy OTel logger, error handling, shutdown, env schema

This commit addresses all 5 issues raised in prior reviews:

Prior Issue Fix Applied
🔴 OTel logger obtained before provider registration ✅ Changed to private get otelLogger() getter — now lazily retrieves on each emit, correctly picking up the registered provider
🟠 Catch-all swallowed emission errors ✅ Moved try/catch inside the loop for JSON parsing, added separate try/catch around emit() with rate-limited error logging (emitErrorLogged flag)
🟠 shutdownOTelLogProvider() never called ✅ Added shutdownOpenTelemetry() function and wired it to process.on('SIGTERM', ...) — gracefully drains buffered logs on Vercel Fluid Compute
🟠 Missing env.ts schema entry ✅ Added OTEL_EXPORTER_OTLP_LOGS_ENDPOINT to env.ts schema with description
🟠 Async SDK startup race condition ✅ Mitigated by lazy getter — logs emitted before provider is ready now correctly use no-op, then pick up real provider once registered

Implementation Quality

The fixes are well-implemented:

  1. Lazy getter pattern (otel-log-stream.ts:34-36) — Clean solution that avoids the initialization order bug without requiring explicit lifecycle management. Each emit() call retrieves the current global logger provider, which seamlessly transitions from no-op to real provider.

  2. Error handling (otel-log-stream.ts:78-85) — Rate-limited logging prevents error storms while ensuring operators see at least one diagnostic message if OTel emission fails systematically.

  3. Shutdown handler (instrumentation.ts:127-130) — Correct pattern for Vercel serverless: registers SIGTERM handler that calls shutdownOpenTelemetry() with fire-and-forget semantics (Vercel sends SIGKILL ~5s later regardless).

  4. Parallel flush/shutdown (instrumentation.ts:117,124) — Uses Promise.all() for both flush and shutdown operations to minimize latency during graceful shutdown.

Consider (Non-blocking)

The following items from the original review remain as potential future improvements but are not blocking:

  • BatchLogRecordProcessor default config — Uses OTel SDK defaults (30s delay, 512 batch size) rather than the env-configured values used for traces. Document as intentional or add env var support in a follow-up.
  • Debug-level logging for setup failureslogger.debug() for OTel SDK package unavailability may be too quiet when OTEL_EXPORTER_OTLP_LOGS_ENDPOINT is configured but packages aren't installed.

✅ APPROVE

Summary: All prior review feedback has been addressed with well-implemented fixes. The lazy logger getter pattern correctly solves the initialization order bug, error handling is appropriately rate-limited, and the SIGTERM shutdown hook ensures graceful log drain in serverless environments. Ready to merge! 🎉

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Delta-only review — prior issues resolved, no new findings.

@github-actions github-actions Bot deleted a comment from claude Bot Apr 8, 2026
@itoqa
Copy link
Copy Markdown

itoqa Bot commented Apr 8, 2026

Ito Test Report ✅

13 test cases ran. 13 passed.

The unified local verification run passed all 13/13 test cases with zero failures, showing stable API/UI behavior across startup, navigation, interruption, mobile viewport, and tenant-route access while /health consistently returned 204, /ready stayed contract-valid (200 with expected fields), and no backend 5xx bursts were observed. Most importantly, auth and resilience hardening checks behaved correctly: dev-session happy path set cookies and enabled /default/projects, invalid or malformed Authorization/bypass-secret attempts were safely rejected (401/non-2xx), concurrent multi-context flooding preserved auth isolation with no leakage, tenant SigNoz health remained controlled and non-5xx, deep-link stale-session handling redirected to login when the logged-out marker was present (while dev auto-login without that marker remained intentional), and async OTel/logging startup did not block initial flows.

✅ Passed (13)
Category Summary Screenshot
Adversarial 20 invalid bypass secrets were rejected with 401 while the valid secret succeeded with 200 and service health remained stable. ADV-1
Adversarial Authorization header injection attempts failed safely without auth bypass and /health remained stable at 204. ADV-2
Adversarial Three isolated browser contexts stayed auth-isolated during mixed request flooding; only the valid-secret context reached authenticated SigNoz health. ADV-4
Edge Parallel dev-session double-submits and interruption cycles completed with controlled responses, and the app remained stable. EDGE-1
Edge Mobile iPhone 12 flow reached /default/projects with key controls visible and health check returned 204. EDGE-2
Edge After interruption cycles on /default/projects, readiness and health endpoints remained contract-compliant with no backend destabilization. EDGE-3
Edge Deep-link stale-session handling redirected to login after proper logged-out simulation; health canary stayed 204. EDGE-4
Logic After restart, initial UI navigation and API contract checks succeeded immediately with no startup blocking and no backend 5xx burst. LOGIC-1
Logic Ran 30 alternating /health and /ready requests with a final /health canary; all responses stayed contract-compliant with no 5xx. LOGIC-2
Happy-path Health remained stable (5/5 returned 204 with empty body), readiness contract returned valid 200 schema, and early navigation to tenant route showed no backend 5xx instability. ROUTE-1
Happy-path Readiness consistently returned contract-valid response (200 with expected fields) and did not degrade subsequent health checks, which stayed at 204. ROUTE-2
Happy-path Dev-session accepted a valid bypass secret, returned a session cookie, and /default/projects loaded without an unauthorized loop. ROUTE-3
Happy-path Tenant SigNoz health returned controlled non-5xx JSON after session bootstrap, and the projects page remained stable. ROUTE-4

Commit: 5bff3ec

View Full Run


Tell us how we did: Give Ito Feedback

@github-actions
Copy link
Copy Markdown
Contributor

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed in 7 days if no further activity occurs.

If this PR is still relevant:

  • Rebase it on the latest main branch
  • Add a comment explaining its current status
  • Request a review if it's ready

Thank you for your contributions!

@github-actions github-actions Bot added the stale label Apr 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant