Prd 6370 copy by robert-inkeep · Pull Request #3171 · inkeep/agents

robert-inkeep · 2026-04-22T14:15:20Z

No description provided.

Removed cookie header from request headers.

changeset-bot · 2026-04-22T14:15:29Z

⚠️ No Changeset found

Latest commit: b33cc46

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

github-actions · 2026-04-22T14:15:55Z

The matching internal PR #203 has been merged in agents-private, and the latest non-dry-run mirror sync completed successfully.

Mirror sync run: https://github.com/inkeep/agents-private/actions/runs/24910015823
Public default branch should now include the mirrored change

Closing this public PR so the public repo stays aligned with the monorepo merge surface.

pullfrog · 2026-04-22T14:17:17Z

PR #3171 — Prd 6370 copy

TL;DR

Introduces a centralized authentication and authorization layer for Next.js API routes in the manage UI. Previously, these routes blindly forwarded raw cookie/authorization headers to agents-api without verifying the caller's identity. Now, every route validates the session (or bearer token) before proceeding, and project-scoped routes additionally check fine-grained permissions.

Key Changes

New file: api-route-auth.ts — centralized auth utilities for Next.js API route handlers (session validation, bearer token support, project permission checks, cookie filtering)
New file: api-route-auth.test.ts — 199 lines of unit tests covering cookie filtering, session auth, bearer auth, fallback behavior, and project permission rejection
5 route handlers updated to use the new auth guards instead of manually forwarding raw cookie/authorization headers
Cookie header sanitization — only better-auth.* cookies are forwarded to agents-api, stripping all other cookies

New centralized auth module (`api-route-auth.ts`)

A 256-line module providing three main exported functions:

Function	Purpose
`requireApiRouteSession`	Validates the caller has a valid Better Auth session via cookie. Filters cookies, calls `agents-api /api/auth/get-session`, returns a discriminated union (`ok: true` with session data or `ok: false` with a `NextResponse`).
`requireApiRouteSessionOrBearer`	Accepts either a `Bearer` token (preferred, skips session fetch) or falls back to cookie-based session auth. Returns a `headers` record ready to spread into downstream fetch calls.
`requireApiRouteProjectPermission`	Validates session first, then checks project-level permissions (`canView`/`canUse`/`canEdit`) by calling the `agents-api` permissions endpoint. Returns 403 when the user lacks the requested permission level.

Helper: filterAuthCookieHeader strips non-auth cookies from the raw Cookie header using the shared isAuthCookie predicate from @inkeep/agents-core/auth/cookie-names.

Before: Routes had no server-side auth validation — they forwarded whatever headers the browser sent.
After: Every route validates authentication before doing any work. Unauthenticated requests get a 401; unauthorized requests get a 403.

Artifact & data component generate-render routes now require project `edit` permission

Files: artifact-components/[artifactComponentId]/generate-render/route.ts, data-components/[dataComponentId]/generate-render/route.ts

Before: No auth check — any request with a tenantId/projectId in the body could trigger generation.
After: Calls requireApiRouteProjectPermission(request, { tenantId, projectId, level: 'edit' }) and short-circuits with the error response if the check fails.

Trace routes now require session or bearer auth with cleaner header forwarding

Files: traces/route.ts, traces/conversations/[conversationId]/route.ts, traces/spans/[spanId]/route.ts

Before: Each route manually extracted req.headers.get('cookie') and req.headers.get('authorization'), then conditionally set them on downstream fetch headers — no actual auth validation occurred.

After:

Each route calls requireApiRouteSessionOrBearer(req) at the top and returns early on failure.
The validated authResult.headers record (containing either Cookie or Authorization) is spread into downstream fetch headers, replacing the ad-hoc cookie/auth header forwarding.
The signozQuery helper in the conversation route is simplified from (payload, tenantId, cookieHeader, authHeader) to (payload, tenantId, authHeaders).

Comprehensive test coverage

File: api-route-auth.test.ts (199 lines)

Tests cover:

Cookie filtering: non-auth cookies stripped; auth cookies (better-auth.session_token, __Secure-better-auth.session_data) preserved
Missing auth: returns 401 without calling agents-api
Valid session: calls /api/auth/get-session with filtered cookies, returns session data
Bearer-only: returns immediately with authType: 'bearer' without fetching session
Bearer + cookie: bearer takes priority; no session fetch
Session fallback: when no bearer token, falls back to cookie-based session auth
Project permission rejection: user with canEdit: false gets 403 when level: 'edit' is required

^{｜ View workflow run ｜ Triggered by Pullfrog ｜ Using Claude Opus ｜ 𝕏}

pullfrog

Severity: has a runtime-breaking bug. The auth hardening itself is well-structured — clean discriminated union types, cookie filtering via the shared isAuthCookie utility, proper session validation, and good test coverage.

Critical: conversations/[conversationId]/route.ts line 716 (outside the diff) still calls signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader) with 4 arguments using the now-removed cookieHeader and authHeader variables. signozQuery was updated to accept 3 parameters (payload, tenantId, authHeaders: Record<string, string>), and the old variable declarations were removed on lines 677–678. This will throw a ReferenceError at runtime whenever the usage-events query path executes. Fix: signozQuery(usageEventsPayload, tenantId, authResult.headers) (same pattern as the corrected call on line 701).

^{｜ Fix all ➔ ｜ Fix 👍s ➔ ｜ View workflow run ｜ Using Claude Opus ｜ 𝕏}

pullfrog · 2026-04-22T14:20:02Z

        payloads.map(async (p, i) => {
          const batchStart = Date.now();
-          const result = await signozQuery(p, tenantId, cookieHeader, authHeader);
+          const result = await signozQuery(p, tenantId, authResult.headers);


This call was correctly updated, but there is a second signozQuery call ~15 lines below (line 716, outside this diff hunk) that was not updated:

const result = await signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader);

cookieHeader and authHeader no longer exist in scope (removed in this PR on old lines 677–678). This will throw a ReferenceError at runtime. It should be:

const result = await signozQuery(usageEventsPayload, tenantId, authResult.headers);

pullfrog · 2026-04-22T14:20:02Z

+  if (authorizationHeader) {
+    return {
+      ok: true,
+      authType: 'bearer',
+      authorizationHeader,
+      headers: {
+        Authorization: authorizationHeader,
+      },
+    };


Informational (not blocking): The bearer path trusts the token without server-side validation here — it passes it through to agents-api. This is a valid pass-through pattern as long as every downstream agents-api endpoint (e.g. /manage/tenants/{id}/signoz/query, /manage/tenants/{id}/signoz/health) enforces bearer token validation and doesn't silently fall through to an unauthenticated code path. Worth a quick sanity check if not already confirmed.

claude

PR Review Summary

(1) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

🔴 Critical: conversations/[conversationId]/route.ts:716 Undefined variables cookieHeader and authHeader cause runtime error

🟡 Minor (1) 🟡

Inline Comments:

🟡 Minor: api-route-auth.ts:147 Silent error swallowing hinders debugging

💭 Consider (1) 💭

Inline Comments:

💭 Consider: api-route-auth.ts:164 Bearer tokens pass through without validation — confirm this is intentional

📋 Overall Assessment

What this PR does well:

✅ Clean, well-structured auth helper module with good separation of concerns
✅ Proper discriminated union types (ok: true/false) for result handling
✅ Good test coverage for the core authentication flows
✅ Cookie filtering using shared isAuthCookie from @inkeep/agents-core ensures consistency
✅ Consistent integration pattern across all 5 modified route handlers

Areas for attention:

Critical bug must be fixed — Line 716 in conversations route references undefined variables
Missing changeset — This PR modifies runtime behavior for agents-manage-ui and should have a changeset (per AGENTS.md guidelines)
Error logging — Consider adding structured logging for auth failures to aid debugging

📦 Missing Changeset

Per the AGENTS.md guidelines, this PR should include a changeset since it modifies runtime behavior:

pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"

🚫 REQUEST CHANGES

Summary: The auth module design is solid and the integration is consistent, but there's a critical runtime bug that must be fixed before merging. Line 716 in the conversations route still references the old cookieHeader and authHeader variables that were removed during the refactoring — this will cause a ReferenceError when the usage events query runs. The fix is straightforward (use authResult.headers like the other call sites).

Discarded (0)

No findings were discarded.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator (manual)`	3	0	0	0	3	0	0
Total	3	0	0	0	3	0	0

Note: Subagent dispatch failed; conducted manual review.

claude · 2026-04-22T14:19:56Z

+    return {
+      ok: false,
+      response: internalErrorResponse(),
+    };


🟡 Minor: Silent error swallowing hinders debugging

Issue: Errors are caught and discarded without logging. When auth validation fails for non-401 reasons (network errors, JSON parse errors, unexpected responses), the error details are lost.

Why: Without logging, debugging production auth issues requires reproducing the exact scenario. Adding structured logging here would significantly help diagnose intermittent auth failures.

Fix: Consider adding error logging before returning the error response:

} catch (error) { const logger = getLogger('api-route-auth'); logger.error({ error: error instanceof Error ? error.message : String(error) }, 'Session validation failed'); return { ok: false, response: internalErrorResponse(), }; }

Refs:

getLogger pattern used elsewhere — imports from @/lib/logger

claude · 2026-04-22T14:20:27Z

+      headers: {
+        Authorization: authorizationHeader,
+      },
+    };


💭 Consider: Bearer tokens pass through without validation

Issue: Bearer tokens are accepted and forwarded without any validation. This differs from session auth which validates against agents-api.

Why: This is likely intentional (agents-api validates the token downstream), but worth confirming:

Is there any risk from accepting malformed bearer tokens?

Should there be basic format validation (e.g., minimum length)?

Are there specific token patterns expected?

If this is intentional and agents-api validates all bearer tokens, this is fine as-is. Just flagging for explicit confirmation of the design decision.

claude

PR Review Summary

(1) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

🔴 Critical: route.ts:716 Undefined variables cookieHeader and authHeader still cause runtime ReferenceError

🕐 Pending Recommendations (2)

🟡 Silent error swallowing in catch block — Consider adding structured logging for auth failures
💭 Bearer token passthrough — Confirm this is intentional design (agents-api validates downstream)

📦 Missing Changeset

This PR modifies runtime behavior for agents-manage-ui and should have a changeset per AGENTS.md guidelines:

pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"

🚫 REQUEST CHANGES

Summary: The critical bug at line 716 flagged in the prior review is still not fixed. The code still references cookieHeader and authHeader which were removed — this will throw a ReferenceError at runtime whenever the usage-events query executes. The fix is a simple one-liner (use authResult.headers like line 701).

Discarded (0)

No findings were discarded.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator (delta)`	1	0	0	0	1	2	0
Total	1	0	0	0	1	2	0

Note: Delta review scoped to 3 commits since last automated review (commit 2599e18c4582).

…e + preview paths (#198) * fix(ci): close remaining silent strandings in release cascade + bridge + preview paths Bundle of 10 code + 1 doc changes, each closing a distinct silent-failure mode where CI would pass green (or sit red invisibly) while production work stranded. Complement to #194's release-cascade hardening — each fix here addresses a gap that pass missed. Release cascade (5) - release-handler.yml: pin GitHub Release to client_payload.commit_sha, not origin/main. Auto-format landing between sync and handler used to create a SHA drift where Vercel deployed a newer commit than npm got. - release-handler.yml: 3-attempt bash retry around gh release view/edit/ create. Mirrors the dispatch-side retry from #174 — one transient 5xx used to leave npm published with no Release (so no Vercel deploy) and no alert. - release-handler.yml: notify-on-failure job. If the success job itself errors (app-token, checkout, or handler logic), the dispatch was already delivered and public-side retries don't re-fire. Without this notifier, npm publishes but the cascade stops silently. - public-mirror-sync.yml: 3-attempt retry on sync PR approval. One dropped GH API call used to leave the sync PR open with no approval; now exhaustion exits non-zero so the workflow turns red and the janitor's sync-PR sweep (below) catches long-open PRs as a safety net. - scripts/check-monorepo-traps.mjs: flip check:override-masks-bump from warn to hard-fail. Current tree is clean of masks, so flipping is safe and prevents future PR #170-class silent divergence. Preview envs (2) - deploy-vercel-preview.sh: validate captured deployment URL matches *.vercel.app pattern before alias-set. The log-grep fallback could silently capture a docs/inspection URL and alias api.preview.inkeep.com to the wrong place. - public-agents-preview-environments.yml: add teardown-failure-notify job. Railway and Vercel teardowns run in parallel by design; this closes the observability gap when either silently fails — Railway continues billing, Vercel env vars pollute, and the next 6h janitor sweep re-attempts cleanup. Dependency sync + bridge (2) - dependabot-sync-root-lockfile.yml: on push failure, full-reset onto origin/$HEAD_REF and re-regenerate both lockfiles (root + public/agents) from the new baseline. Previous simple rebase would replay our stale lockfile on top of a different package.json after a Dependabot force-push, yielding inconsistent committed state that either broke --frozen-lockfile or silently shipped drift into main. - bridge-public-pr-to-monorepo.mjs (both copies): truncate bridged PR body to fit GitHub's 65,536-char limit, with a link to the original PR. Was failing 422 "body is too long" on Dependabot mega-bumps, stranding those PRs outside the agents-private review surface. Janitor (1) - public-agents-preview-janitor.yml: new sweep-stale-sync-prs job. Every 6h, identify open copybara/sync PRs on inkeep/agents >4h old, open (idempotent) tracking issue on agents-private, re-dispatch public-mirror-sync.yml. Does NOT auto-close the PR — destructive actions stay human-driven. Docs (1) - CI_RUNBOOK.md: partial-npm-publish recovery path. Documents what to do when pnpm changeset publish fails on package N of 10 (some live, some not): confirm via npm view, re-run the failed job (publish is idempotent), verify all 10 present before unblocking cascade. Each change verified in isolation with pnpm check:structural; YAML validated with safe_load; bridge truncation logic tested at boundary. * fix(ci): add checkout step and remove heredoc indentation in release-handler notify Addresses review feedback on PR #198: - notify-success-failure now does a Checkout before gh calls, matching the failure job pattern (without it, gh CLI lacks git context and issue-create would silently fail with 'fatal: not a git repository'). - Body is now built with echo into a temp file instead of a heredoc. The original 10-space indent inside run: | would have been preserved literally by bash, making the tracking issue body unreadable. * fix(ci): address review feedback on release-handler retry + docs drift Review feedback from #198 (Claude + Pullfrog): release-handler.yml: - Capture stderr in gh_with_retry so on-call sees the actual API error (rate-limit, 5xx, auth) instead of a generic 'command failed' line. Was making incident triage 15-30m slower. (Claude Major) - Stop wrapping 'gh release view' in gh_with_retry. It returns non-zero for the legitimate 'release doesn't exist' case (the normal new-release happy path); retrying burned 15s of backoff before falling through to create. Only mutating calls (edit/create) need retry. (Pullfrog + Claude) - notify-success-failure now fails loudly (exit 1) if gh issue create fails, matching the failure job's pattern. Silent swallow would mean the success handler failed AND the notifier failed with no visible signal beyond a red check — exactly the class this PR closes elsewhere. (Claude Major) Docs drift (Pullfrog): - AGENTS.md: remove '(soft today)' / 'non-blocking today' annotations from override-masks-bump in both the cheatsheet and the trap-list. - AGENTS.md trap #4: flip 'Currently a warning, not a hard fail' to 'Hard-fail (flipped...)' to match the code. - CI_RUNBOOK.md: rename the override-masks-bump section header from 'warns (non-blocking today)' to 'fails — root override masks a workspace bump'. Preview janitor (Claude Consider): - Document that sweep-stale-sync-prs runs unconditionally regardless of PREVIEW_ENVIRONMENTS_ENABLED. Pr body said so; workflow didn't. Bridge script (Claude Consider): - console.log when PR body is truncated, with original + new sizes. Helps CI log debugging when a bridged PR looks shorter than expected. Applied to both copies (public/agents + public/agents-optional-local-dev). Intentionally not addressed: - dependabot-sync $MSG staleness after reset: Pullfrog flagged as intentional (message describes the operation, not the specific baseline), Claude flagged as cosmetic minor. Siding with Pullfrog. - Preview teardown notify swallowed error: lower severity given the janitor re-attempts cleanup every 6h; accepting current pattern. * fix(ci): pre-fetch PR base into agents-private before git apply --3way Closes the bridge-failure class seen on inkeep/agents#3171 (and eventually on any bridged PR with a conflicting hunk). Root cause: `git apply --index --3way` resolves the patch's `index <old>..<new>` blob SHAs against the local repo's object store. The SHAs come from inkeep/agents' object graph; agents-private has never fetched from inkeep/agents, so those SHAs aren't resolvable. Clean patches work (no --3way fallback needed); any conflict triggers --3way, which fails with 'repository lacks the necessary blob to perform 3-way merge' — bridge stops with no useful diagnostic, PR strands outside the canonical review surface. Fix: Before calling git apply, shallow-fetch the PR's base commit from the public repo into agents-private's object store. GitHub permits fetching by SHA when the SHA is reachable from a ref (PR base is on main, always reachable), and shallow=1 keeps it cheap. After fetch, --3way can find the `<old>` blob and resolve conflicts into merge markers instead of a hard error. Why not drop --3way: Dropping --3way makes every conflicting PR fail hard with no recovery path. Pre-fetching preserves the --3way fallback so conflicting hunks produce visible conflict markers in the bridged PR, which is reviewable rather than opaque. Failure handling: If the fetch itself fails (rate-limit, network flake), we log and proceed. The subsequent git apply will hit the same blob error as before the fix — no regression, just no improvement for that run. Applied to both script copies: public/agents and public/agents-optional-local-dev. Bundled into PR #198 since it's the same release/sync hardening scope. * Revert "fix(ci): pre-fetch PR base into agents-private before git apply --3way" This reverts commit e7e0640e2345bbd7cdea1185eb3abc666f2c60fa. * fix(ci): switch bridge patch fetch from .patch to .diff media type Root cause of bridge failures on multi-commit PRs (eg inkeep/agents#3171, 9 commits): `gh api .../pulls/N Accept: application/vnd.github.patch` returns a mailbox-format patch series — one patch per PR commit, each with `index <old>..<new>` lines referencing INTERMEDIATE blob SHAs created during the PR's history on inkeep/agents. Those intermediates exist only in inkeep/agents' object store; agents-private has never seen them. The first patch applies against agents-private's file (blob content matches the PR base even if SHAs differ), git's index advances to the intermediate state (only referenced by SHA, never fetched as an object). Any subsequent conflicting hunk triggers `git apply --3way` fallback, which needs the intermediate blob → 'repository lacks the necessary blob' → stranded PR. Single-commit PRs don't hit this (no intermediates). That's why the bridge used to work for most Dependabot bumps but started failing on longer branches like #3171. Fix: switch `Accept: application/vnd.github.patch` → `Accept: application/vnd.github.diff` in both bridge-public-pr-to- monorepo.mjs copies. .diff returns a single consolidated base-vs-head unified diff with only the PR's base blob SHAs referenced. Those are content-identical to agents-private's copies (Copybara mirrors blobs 1:1 without content rewrites for .ts files), so --3way can resolve them locally from agents-private's own object store. Also updated githubRequest's text-response check to include .diff alongside .patch (both are text, not JSON). Previous pre-fetch commit (`e7e0640e2`) was reverted in a prior commit — it was addressing the wrong layer (only PR-base blob, not intermediates). This commit supersedes that approach entirely. * chore(ci): biome format fix on bridge script console.log GitOrigin-RevId: a26d5dac74ed9938fc512703af21a25d82616504

claude

PR Review Summary

(1) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

Inline Comments:

🔴 Critical: route.ts:716 Undefined variables cookieHeader and authHeader cause runtime ReferenceError

🕐 Pending Recommendations (2)

🟡 api-route-auth.ts:147 — Silent error swallowing in catch block hinders debugging; consider adding structured logging
💭 api-route-auth.ts:164 — Bearer tokens pass through without validation; confirm this is intentional (agents-api validates downstream)

📦 Missing Changeset

This PR modifies runtime behavior for agents-manage-ui and should have a changeset per AGENTS.md guidelines:

pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"

📋 Delta Review Note

The 3 commits since the last review (fe9c4d9d4, 37f8a3e4a, b33cc4604) only touched .github/scripts/bridge-public-pr-to-monorepo.mjs — a CI bridge script unrelated to the auth code under review. The auth module itself has not changed since the prior reviews.

🚫 REQUEST CHANGES

Summary: The critical bug at line 716 flagged in prior reviews remains unfixed. The code still references cookieHeader and authHeader which were removed — this will throw a ReferenceError at runtime whenever the usage-events query executes. The fix is a simple one-liner: use authResult.headers like the corrected call on line 701. A 1-click suggestion is provided in the inline comment.

Discarded (0)

No findings were discarded.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator (delta)`	1	0	0	0	1	2	0
Total	1	0	0	0	1	2	0

Note: Delta review scoped to 3 commits since last automated review. Sub-reviewers not dispatched as delta only touched CI scripts unrelated to auth code.

itoqa · 2026-04-22T16:26:06Z

Ito Test Report ❌

16 test cases ran. 1 failed, 1 additional finding, 14 passed.

Across 16 QA scenarios, 14 passed and 2 failed, indicating strong stability in render regeneration and mockData preservation, permission and session enforcement, bearer/session auth routing behavior, concurrent trace request isolation, retry envelope coherence, and usage-events fault-tolerance.
The most important issues are two medium-severity defects: the unauthenticated conversation details page incorrectly renders a project-not-found (404-style) state instead of an auth-required experience even though the API correctly returns 401, and trace endpoints handle wrong-tenant requests inconsistently while the conversation trace handler contains a confirmed stale signozQuery call-site bug that can drop usage-events data.

❌ Failed (1)

Category	Summary	Screenshot
Traces	🟠 Confirmed real bug: tenant-mismatch responses are mapped inconsistently across trace endpoints (health route returns 200 connection_failed while conversation/span routes return error HTTP statuses), violating consistent tenant-safe rejection behavior; conversation route also contains an undeclared-variables call-site defect in usage-events query path.

🟠 Inconsistent tenant rejection in trace endpoints

What failed: Wrong-tenant requests are handled inconsistently across traces endpoints, and the conversation route has a stale call site that invokes signozQuery with undeclared variables, causing the usage-events branch to fail and return empty fallback data.
Impact: Users investigating traces can receive inconsistent error semantics across endpoints and miss usage-events data in conversation details. This weakens debuggability for tenant-scoped trace analysis even when core timeline data still loads.
Steps to reproduce:
1. Call GET /api/traces?tenantId=wrong-tenant with a valid bearer bypass token.
2. Call GET /api/traces/conversations/conversation-1?tenantId=wrong-tenant with the same token.
3. Call GET /api/traces/spans/span-1?tenantId=wrong-tenant&conversationId=conversation-1 and compare status semantics across the three endpoints.
Stub / mock context: A deterministic local auth and traces-fixture bypass configuration was enabled to keep tenant-routing checks repeatable; this setup does not explain the stale variable call-site defect found in production route code.
Code analysis: I reviewed the traces health, conversation, and span handlers plus shared auth helper logic. The conversation handler now defines signozQuery with authHeaders but still calls it once with removed variables (cookieHeader, authHeader), which is a concrete production-code defect; error handling across trace endpoints also maps upstream tenant/auth failures to different HTTP semantics.
Why this is likely a bug: The conversation route contains a mismatched function call that references undefined identifiers after a signature refactor, which is a direct code defect independent of test setup.

Relevant code:

agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts (lines 102-106)

async function signozQuery(
  payload: any,
  tenantId: string,
  authHeaders: Record<string, string>
): Promise<SigNozResp> {

agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts (lines 714-722)

(async () => {
      const batchStart = Date.now();
      try {
        const result = await signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader);
        logger.info(
          { batch: 'usage-events', ms: Date.now() - batchStart },
          'signoz batch complete'
        );

agents-manage-ui/src/app/api/traces/route.ts (lines 233-253)

if (!response.ok) {
    logger.error(
      {
        error: data,
        status: response.status,
      },
      'SigNoz health check failed'
    );

    let errorMessage = 'Failed to check SigNoz configuration';
    if (response.status === 401 || response.status === 403) {
      errorMessage = 'Authentication required';
    } else if (data?.error) {
      errorMessage = data.error;
    }

    return NextResponse.json({
      status: 'connection_failed',

✅ Passed (14)

Category	Summary	Screenshot
Conversation	Usage-events fault-mode handling preserved conversation timeline integrity and returned stable responses.	N/A
Conversation	Repeated stale/fault usage-events calls stayed non-crashing and kept core conversation data intact.	N/A
Conversation	Alternating usage-events failure conditions did not break Promise.all assembly ordering or completeness.	N/A
Render	Regeneration completed and preserved existing mockData during instruction-based updates after deterministic local interception removed external dependency noise.	N/A
Render	Permission-denial path returned 403 Forbidden and blocked generation when edit access was not granted.	N/A
Render	Cross-tenant and mismatched project scope request was denied with no generation stream output.	N/A
Render	Permission-preflight failure returned a controlled 500 validation error and did not emit partial stream chunks.	N/A
Session	Valid Better Auth session passed route authentication and reached downstream project validation, which then returned project-not-found for missing fixture data.	N/A
Session	Requests using only non-auth cookies and unauthenticated conversation-trace access were correctly rejected with 401 responses.	N/A
Traces	Request with both bearer token and Better Auth cookie returned 200 without session-auth rejection; response reached upstream path (connection_failed from SigNoz config).
Traces	Bearer-only request to /api/traces returned 200 (non-401), demonstrating bypass bearer auth works without Better Auth session cookies.
Traces	Concurrent batch and non-batch trace POST requests stayed isolated during auth-header rotation.
Traces	Malformed bearer headers correctly fell back to session auth when a valid session path was available.
Traces	Repeated batch upstream failures returned coherent retry/error envelopes without mixed payload segments.

ℹ️ Additional Findings (1)

These findings are unrelated to the current changes but were observed during testing.

Category	Summary	Screenshot
Conversation	🟠 Unauthenticated conversation details page shows a project-not-found state while the underlying traces API correctly returns authentication required.

🟠 Unauthenticated conversation page shows 404

What failed: The UI resolves to a project 404 state instead of an authentication-required experience, while the API route correctly enforces auth and returns 401.
Impact: Unauthenticated users are shown a misleading "project not found" failure instead of a clear login-required state. This obscures root cause and can send users down incorrect debugging or support paths.
Steps to reproduce:
1. Open /default/projects/project-1/traces/conversations/conversation-1 in a fresh browser session without logging in.
2. Observe the page-level error state rendered for the project route.
3. Call /api/traces/conversations/conversation-1?tenantId=default without auth and compare the HTTP 401 response.
Stub / mock context: The run used local auth/session helpers and seeded conversation fixtures; this case explicitly cleared login state and compared page behavior against direct unauthenticated API responses without adding endpoint response stubs.
Code analysis: The traces conversation API route enforces authentication up front, but the project layout always prefetches project permissions and converts fetch failures into a project-scoped FullPageError. In the same flow, project permission middleware maps access failures to not_found, so unauthenticated page entry can surface as a 404-style project error rather than an auth-first UX.
Why this is likely a bug: Production code paths map the unauthenticated page load into a project-not-found UI state while the paired API path explicitly reports authentication required, creating a clear and reproducible auth-state mismatch.

Relevant code:

agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts (lines 652-659)

export async function GET(
  req: NextRequest,
  context: RouteContext<'/api/traces/conversations/[conversationId]'>
) {
  const authResult = await requireApiRouteSessionOrBearer(req);
  if (!authResult.ok) {
    return authResult.response;
  }

agents-manage-ui/src/app/[tenantId]/projects/[projectId]/layout.tsx (lines 15-31)

try {
  const queryClient = new QueryClient();
  const permissions = await fetchProjectPermissions(tenantId, projectId);
  queryClient.setQueryData(projectQueryKeys.permissions(tenantId, projectId), permissions);

  return <HydrationBoundary state={dehydrate(queryClient)}>{children}</HydrationBoundary>;
} catch (error) {
  return (
    <FullPageError
      errorCode={getErrorCode(error)}
      context="project"
    />
  );
}

agents-api/src/middleware/projectAccess.ts (lines 92-98)

if (!hasAccess) {
  throw createApiError({
    code: 'not_found',
    message: 'Project not found',
    instance: c.req.path,
  });
}

Commit: b33cc46

View Full Run

Tell us how we did: Give Ito Feedback

…e + preview paths (inkeep#198) (inkeep#3172) * fix(ci): close remaining silent strandings in release cascade + bridge + preview paths Bundle of 10 code + 1 doc changes, each closing a distinct silent-failure mode where CI would pass green (or sit red invisibly) while production work stranded. Complement to inkeep#194's release-cascade hardening — each fix here addresses a gap that pass missed. Release cascade (5) - release-handler.yml: pin GitHub Release to client_payload.commit_sha, not origin/main. Auto-format landing between sync and handler used to create a SHA drift where Vercel deployed a newer commit than npm got. - release-handler.yml: 3-attempt bash retry around gh release view/edit/ create. Mirrors the dispatch-side retry from inkeep#174 — one transient 5xx used to leave npm published with no Release (so no Vercel deploy) and no alert. - release-handler.yml: notify-on-failure job. If the success job itself errors (app-token, checkout, or handler logic), the dispatch was already delivered and public-side retries don't re-fire. Without this notifier, npm publishes but the cascade stops silently. - public-mirror-sync.yml: 3-attempt retry on sync PR approval. One dropped GH API call used to leave the sync PR open with no approval; now exhaustion exits non-zero so the workflow turns red and the janitor's sync-PR sweep (below) catches long-open PRs as a safety net. - scripts/check-monorepo-traps.mjs: flip check:override-masks-bump from warn to hard-fail. Current tree is clean of masks, so flipping is safe and prevents future PR inkeep#170-class silent divergence. Preview envs (2) - deploy-vercel-preview.sh: validate captured deployment URL matches *.vercel.app pattern before alias-set. The log-grep fallback could silently capture a docs/inspection URL and alias api.preview.inkeep.com to the wrong place. - public-agents-preview-environments.yml: add teardown-failure-notify job. Railway and Vercel teardowns run in parallel by design; this closes the observability gap when either silently fails — Railway continues billing, Vercel env vars pollute, and the next 6h janitor sweep re-attempts cleanup. Dependency sync + bridge (2) - dependabot-sync-root-lockfile.yml: on push failure, full-reset onto origin/$HEAD_REF and re-regenerate both lockfiles (root + public/agents) from the new baseline. Previous simple rebase would replay our stale lockfile on top of a different package.json after a Dependabot force-push, yielding inconsistent committed state that either broke --frozen-lockfile or silently shipped drift into main. - bridge-public-pr-to-monorepo.mjs (both copies): truncate bridged PR body to fit GitHub's 65,536-char limit, with a link to the original PR. Was failing 422 "body is too long" on Dependabot mega-bumps, stranding those PRs outside the agents-private review surface. Janitor (1) - public-agents-preview-janitor.yml: new sweep-stale-sync-prs job. Every 6h, identify open copybara/sync PRs on inkeep/agents >4h old, open (idempotent) tracking issue on agents-private, re-dispatch public-mirror-sync.yml. Does NOT auto-close the PR — destructive actions stay human-driven. Docs (1) - CI_RUNBOOK.md: partial-npm-publish recovery path. Documents what to do when pnpm changeset publish fails on package N of 10 (some live, some not): confirm via npm view, re-run the failed job (publish is idempotent), verify all 10 present before unblocking cascade. Each change verified in isolation with pnpm check:structural; YAML validated with safe_load; bridge truncation logic tested at boundary. * fix(ci): add checkout step and remove heredoc indentation in release-handler notify Addresses review feedback on PR inkeep#198: - notify-success-failure now does a Checkout before gh calls, matching the failure job pattern (without it, gh CLI lacks git context and issue-create would silently fail with 'fatal: not a git repository'). - Body is now built with echo into a temp file instead of a heredoc. The original 10-space indent inside run: | would have been preserved literally by bash, making the tracking issue body unreadable. * fix(ci): address review feedback on release-handler retry + docs drift Review feedback from inkeep#198 (Claude + Pullfrog): release-handler.yml: - Capture stderr in gh_with_retry so on-call sees the actual API error (rate-limit, 5xx, auth) instead of a generic 'command failed' line. Was making incident triage 15-30m slower. (Claude Major) - Stop wrapping 'gh release view' in gh_with_retry. It returns non-zero for the legitimate 'release doesn't exist' case (the normal new-release happy path); retrying burned 15s of backoff before falling through to create. Only mutating calls (edit/create) need retry. (Pullfrog + Claude) - notify-success-failure now fails loudly (exit 1) if gh issue create fails, matching the failure job's pattern. Silent swallow would mean the success handler failed AND the notifier failed with no visible signal beyond a red check — exactly the class this PR closes elsewhere. (Claude Major) Docs drift (Pullfrog): - AGENTS.md: remove '(soft today)' / 'non-blocking today' annotations from override-masks-bump in both the cheatsheet and the trap-list. - AGENTS.md trap inkeep#4: flip 'Currently a warning, not a hard fail' to 'Hard-fail (flipped...)' to match the code. - CI_RUNBOOK.md: rename the override-masks-bump section header from 'warns (non-blocking today)' to 'fails — root override masks a workspace bump'. Preview janitor (Claude Consider): - Document that sweep-stale-sync-prs runs unconditionally regardless of PREVIEW_ENVIRONMENTS_ENABLED. Pr body said so; workflow didn't. Bridge script (Claude Consider): - console.log when PR body is truncated, with original + new sizes. Helps CI log debugging when a bridged PR looks shorter than expected. Applied to both copies (public/agents + public/agents-optional-local-dev). Intentionally not addressed: - dependabot-sync $MSG staleness after reset: Pullfrog flagged as intentional (message describes the operation, not the specific baseline), Claude flagged as cosmetic minor. Siding with Pullfrog. - Preview teardown notify swallowed error: lower severity given the janitor re-attempts cleanup every 6h; accepting current pattern. * fix(ci): pre-fetch PR base into agents-private before git apply --3way Closes the bridge-failure class seen on inkeep#3171 (and eventually on any bridged PR with a conflicting hunk). Root cause: `git apply --index --3way` resolves the patch's `index <old>..<new>` blob SHAs against the local repo's object store. The SHAs come from inkeep/agents' object graph; agents-private has never fetched from inkeep/agents, so those SHAs aren't resolvable. Clean patches work (no --3way fallback needed); any conflict triggers --3way, which fails with 'repository lacks the necessary blob to perform 3-way merge' — bridge stops with no useful diagnostic, PR strands outside the canonical review surface. Fix: Before calling git apply, shallow-fetch the PR's base commit from the public repo into agents-private's object store. GitHub permits fetching by SHA when the SHA is reachable from a ref (PR base is on main, always reachable), and shallow=1 keeps it cheap. After fetch, --3way can find the `<old>` blob and resolve conflicts into merge markers instead of a hard error. Why not drop --3way: Dropping --3way makes every conflicting PR fail hard with no recovery path. Pre-fetching preserves the --3way fallback so conflicting hunks produce visible conflict markers in the bridged PR, which is reviewable rather than opaque. Failure handling: If the fetch itself fails (rate-limit, network flake), we log and proceed. The subsequent git apply will hit the same blob error as before the fix — no regression, just no improvement for that run. Applied to both script copies: public/agents and public/agents-optional-local-dev. Bundled into PR inkeep#198 since it's the same release/sync hardening scope. * Revert "fix(ci): pre-fetch PR base into agents-private before git apply --3way" This reverts commit e7e0640e2345bbd7cdea1185eb3abc666f2c60fa. * fix(ci): switch bridge patch fetch from .patch to .diff media type Root cause of bridge failures on multi-commit PRs (eg inkeep#3171, 9 commits): `gh api .../pulls/N Accept: application/vnd.github.patch` returns a mailbox-format patch series — one patch per PR commit, each with `index <old>..<new>` lines referencing INTERMEDIATE blob SHAs created during the PR's history on inkeep/agents. Those intermediates exist only in inkeep/agents' object store; agents-private has never seen them. The first patch applies against agents-private's file (blob content matches the PR base even if SHAs differ), git's index advances to the intermediate state (only referenced by SHA, never fetched as an object). Any subsequent conflicting hunk triggers `git apply --3way` fallback, which needs the intermediate blob → 'repository lacks the necessary blob' → stranded PR. Single-commit PRs don't hit this (no intermediates). That's why the bridge used to work for most Dependabot bumps but started failing on longer branches like inkeep#3171. Fix: switch `Accept: application/vnd.github.patch` → `Accept: application/vnd.github.diff` in both bridge-public-pr-to- monorepo.mjs copies. .diff returns a single consolidated base-vs-head unified diff with only the PR's base blob SHAs referenced. Those are content-identical to agents-private's copies (Copybara mirrors blobs 1:1 without content rewrites for .ts files), so --3way can resolve them locally from agents-private's own object store. Also updated githubRequest's text-response check to include .diff alongside .patch (both are text, not JSON). Previous pre-fetch commit (`e7e0640e2`) was reverted in a prior commit — it was addressing the wrong layer (only PR-base blob, not intermediates). This commit supersedes that approach entirely. * chore(ci): biome format fix on bridge script console.log GitOrigin-RevId: a26d5dac74ed9938fc512703af21a25d82616504 Co-authored-by: Varun Varahabhotla <[email protected]>

* chore(sync): mirror #3171 * apply review * style: auto-format with biome * Update public/agents/agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts * chore: add changeset for manage UI auth guards --------- GitOrigin-RevId: 5a2dd89d0f85597bd83b9938fb778a08b50e71fd Co-authored-by: inkeep-internal-ci[bot] <259778081+inkeep-internal-ci[bot]@users.noreply.github.com> Co-authored-by: robert-inkeep <[email protected]> Co-authored-by: Varun Varahabhotla <[email protected]> Co-authored-by: Dima <[email protected]> Co-authored-by: Dimitri POSTOLOV <[email protected]> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Dimitri POSTOLOV <[email protected]>

dimaMachina and others added 11 commits March 25, 2026 00:36

add authentication for Next.js endpoints

875b666

add authentication for Next.js endpoints

841565e

polish

fdabea6

polish

11e24a7

fix pullfrog review

0d88df5

Merge branch 'main' into prd-6370

3473046

apply review

ce78e75

Merge branch 'main' into prd-6370

0e400de

Apply suggestion from @dimaMachina

2391b74

style: auto-format with biome

30b9cee

Remove cookie header from fetch request

2599e18

Removed cookie header from request headers.

pullfrog Bot reviewed Apr 22, 2026

View reviewed changes

claude Bot reviewed Apr 22, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Apr 22, 2026

Merge branch 'main' into prd-6370-copy

0e5448c

claude Bot reviewed Apr 22, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Apr 22, 2026

vnv-varun and others added 2 commits April 22, 2026 11:21

chore: retrigger monorepo-pr-bridge after #198 .diff fix landed

fe9c4d9

Merge branch 'main' into prd-6370-copy

b33cc46

claude Bot reviewed Apr 22, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Apr 22, 2026

inkeep-oss-sync Bot closed this Apr 24, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prd 6370 copy#3171

Prd 6370 copy#3171
robert-inkeep wants to merge 14 commits intomainfrom
prd-6370-copy

robert-inkeep commented Apr 22, 2026

Uh oh!

changeset-bot Bot commented Apr 22, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Apr 22, 2026 •

edited by inkeep-oss-sync Bot

Loading

Uh oh!

pullfrog Bot commented Apr 22, 2026 •

edited

Loading

Uh oh!

pullfrog Bot left a comment

Uh oh!

pullfrog Bot Apr 22, 2026

Uh oh!

pullfrog Bot Apr 22, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot Apr 22, 2026

Uh oh!

claude Bot Apr 22, 2026

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot left a comment

Uh oh!

itoqa Bot commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

robert-inkeep commented Apr 22, 2026

Uh oh!

changeset-bot Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ No Changeset found

Uh oh!

github-actions Bot commented Apr 22, 2026 • edited by inkeep-oss-sync Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pullfrog Bot commented Apr 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR #3171 — Prd 6370 copy

TL;DR

Key Changes

New centralized auth module (api-route-auth.ts)

Artifact & data component generate-render routes now require project edit permission

Trace routes now require session or bearer auth with cleaner header forwarding

Comprehensive test coverage

Uh oh!

pullfrog Bot left a comment

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

🟡 Minor (1) 🟡

💭 Consider (1) 💭

📋 Overall Assessment

📦 Missing Changeset

🚫 REQUEST CHANGES

Uh oh!

claude Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Apr 22, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

🕐 Pending Recommendations (2)

📦 Missing Changeset

🚫 REQUEST CHANGES

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

🕐 Pending Recommendations (2)

📦 Missing Changeset

📋 Delta Review Note

🚫 REQUEST CHANGES

Uh oh!

itoqa Bot commented Apr 22, 2026

Ito Test Report ❌

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

changeset-bot Bot commented Apr 22, 2026 •

edited

Loading

github-actions Bot commented Apr 22, 2026 •

edited by inkeep-oss-sync Bot

Loading

pullfrog Bot commented Apr 22, 2026 •

edited

Loading

New centralized auth module (`api-route-auth.ts`)

Artifact & data component generate-render routes now require project `edit` permission