Conversation
Removed cookie header from request headers.
|
|
The matching internal PR #203 has been merged in
Closing this public PR so the public repo stays aligned with the monorepo merge surface. |
PR #3171 — Prd 6370 copyTL;DRIntroduces a centralized authentication and authorization layer for Next.js API routes in the manage UI. Previously, these routes blindly forwarded raw cookie/authorization headers to Key Changes
New centralized auth module (
|
| Function | Purpose |
|---|---|
requireApiRouteSession |
Validates the caller has a valid Better Auth session via cookie. Filters cookies, calls agents-api /api/auth/get-session, returns a discriminated union (ok: true with session data or ok: false with a NextResponse). |
requireApiRouteSessionOrBearer |
Accepts either a Bearer token (preferred, skips session fetch) or falls back to cookie-based session auth. Returns a headers record ready to spread into downstream fetch calls. |
requireApiRouteProjectPermission |
Validates session first, then checks project-level permissions (canView/canUse/canEdit) by calling the agents-api permissions endpoint. Returns 403 when the user lacks the requested permission level. |
Helper: filterAuthCookieHeader strips non-auth cookies from the raw Cookie header using the shared isAuthCookie predicate from @inkeep/agents-core/auth/cookie-names.
Before: Routes had no server-side auth validation — they forwarded whatever headers the browser sent.
After: Every route validates authentication before doing any work. Unauthenticated requests get a 401; unauthorized requests get a 403.
Artifact & data component generate-render routes now require project edit permission
Files: artifact-components/[artifactComponentId]/generate-render/route.ts, data-components/[dataComponentId]/generate-render/route.ts
Before: No auth check — any request with a tenantId/projectId in the body could trigger generation.
After: Calls requireApiRouteProjectPermission(request, { tenantId, projectId, level: 'edit' }) and short-circuits with the error response if the check fails.
Trace routes now require session or bearer auth with cleaner header forwarding
Files: traces/route.ts, traces/conversations/[conversationId]/route.ts, traces/spans/[spanId]/route.ts
Before: Each route manually extracted req.headers.get('cookie') and req.headers.get('authorization'), then conditionally set them on downstream fetch headers — no actual auth validation occurred.
After:
- Each route calls
requireApiRouteSessionOrBearer(req)at the top and returns early on failure. - The validated
authResult.headersrecord (containing eitherCookieorAuthorization) is spread into downstream fetch headers, replacing the ad-hoc cookie/auth header forwarding. - The
signozQueryhelper in the conversation route is simplified from(payload, tenantId, cookieHeader, authHeader)to(payload, tenantId, authHeaders).
Comprehensive test coverage
File: api-route-auth.test.ts (199 lines)
Tests cover:
- Cookie filtering: non-auth cookies stripped; auth cookies (
better-auth.session_token,__Secure-better-auth.session_data) preserved - Missing auth: returns 401 without calling
agents-api - Valid session: calls
/api/auth/get-sessionwith filtered cookies, returns session data - Bearer-only: returns immediately with
authType: 'bearer'without fetching session - Bearer + cookie: bearer takes priority; no session fetch
- Session fallback: when no bearer token, falls back to cookie-based session auth
- Project permission rejection: user with
canEdit: falsegets 403 whenlevel: 'edit'is required
Claude Opus | 𝕏
There was a problem hiding this comment.
Severity: has a runtime-breaking bug. The auth hardening itself is well-structured — clean discriminated union types, cookie filtering via the shared isAuthCookie utility, proper session validation, and good test coverage.
Critical: conversations/[conversationId]/route.ts line 716 (outside the diff) still calls signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader) with 4 arguments using the now-removed cookieHeader and authHeader variables. signozQuery was updated to accept 3 parameters (payload, tenantId, authHeaders: Record<string, string>), and the old variable declarations were removed on lines 677–678. This will throw a ReferenceError at runtime whenever the usage-events query path executes. Fix: signozQuery(usageEventsPayload, tenantId, authResult.headers) (same pattern as the corrected call on line 701).
Claude Opus | 𝕏
| payloads.map(async (p, i) => { | ||
| const batchStart = Date.now(); | ||
| const result = await signozQuery(p, tenantId, cookieHeader, authHeader); | ||
| const result = await signozQuery(p, tenantId, authResult.headers); |
There was a problem hiding this comment.
This call was correctly updated, but there is a second signozQuery call ~15 lines below (line 716, outside this diff hunk) that was not updated:
const result = await signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader);cookieHeader and authHeader no longer exist in scope (removed in this PR on old lines 677–678). This will throw a ReferenceError at runtime. It should be:
const result = await signozQuery(usageEventsPayload, tenantId, authResult.headers);| if (authorizationHeader) { | ||
| return { | ||
| ok: true, | ||
| authType: 'bearer', | ||
| authorizationHeader, | ||
| headers: { | ||
| Authorization: authorizationHeader, | ||
| }, | ||
| }; |
There was a problem hiding this comment.
Informational (not blocking): The bearer path trusts the token without server-side validation here — it passes it through to agents-api. This is a valid pass-through pattern as long as every downstream agents-api endpoint (e.g. /manage/tenants/{id}/signoz/query, /manage/tenants/{id}/signoz/health) enforces bearer token validation and doesn't silently fall through to an unauthenticated code path. Worth a quick sanity check if not already confirmed.
There was a problem hiding this comment.
PR Review Summary
(1) Total Issues | Risk: High
🔴❗ Critical (1) ❗🔴
Inline Comments:
- 🔴 Critical:
conversations/[conversationId]/route.ts:716Undefined variablescookieHeaderandauthHeadercause runtime error
🟡 Minor (1) 🟡
Inline Comments:
- 🟡 Minor:
api-route-auth.ts:147Silent error swallowing hinders debugging
💭 Consider (1) 💭
Inline Comments:
- 💭 Consider:
api-route-auth.ts:164Bearer tokens pass through without validation — confirm this is intentional
📋 Overall Assessment
What this PR does well:
- ✅ Clean, well-structured auth helper module with good separation of concerns
- ✅ Proper discriminated union types (
ok: true/false) for result handling - ✅ Good test coverage for the core authentication flows
- ✅ Cookie filtering using shared
isAuthCookiefrom@inkeep/agents-coreensures consistency - ✅ Consistent integration pattern across all 5 modified route handlers
Areas for attention:
- Critical bug must be fixed — Line 716 in conversations route references undefined variables
- Missing changeset — This PR modifies runtime behavior for
agents-manage-uiand should have a changeset (per AGENTS.md guidelines) - Error logging — Consider adding structured logging for auth failures to aid debugging
📦 Missing Changeset
Per the AGENTS.md guidelines, this PR should include a changeset since it modifies runtime behavior:
pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"🚫 REQUEST CHANGES
Summary: The auth module design is solid and the integration is consistent, but there's a critical runtime bug that must be fixed before merging. Line 716 in the conversations route still references the old cookieHeader and authHeader variables that were removed during the refactoring — this will cause a ReferenceError when the usage events query runs. The fix is straightforward (use authResult.headers like the other call sites).
Discarded (0)
No findings were discarded.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator (manual) |
3 | 0 | 0 | 0 | 3 | 0 | 0 |
| Total | 3 | 0 | 0 | 0 | 3 | 0 | 0 |
Note: Subagent dispatch failed; conducted manual review.
| return { | ||
| ok: false, | ||
| response: internalErrorResponse(), | ||
| }; |
There was a problem hiding this comment.
🟡 Minor: Silent error swallowing hinders debugging
Issue: Errors are caught and discarded without logging. When auth validation fails for non-401 reasons (network errors, JSON parse errors, unexpected responses), the error details are lost.
Why: Without logging, debugging production auth issues requires reproducing the exact scenario. Adding structured logging here would significantly help diagnose intermittent auth failures.
Fix: Consider adding error logging before returning the error response:
} catch (error) {
const logger = getLogger('api-route-auth');
logger.error({ error: error instanceof Error ? error.message : String(error) }, 'Session validation failed');
return {
ok: false,
response: internalErrorResponse(),
};
}Refs:
- getLogger pattern used elsewhere — imports from
@/lib/logger
| headers: { | ||
| Authorization: authorizationHeader, | ||
| }, | ||
| }; |
There was a problem hiding this comment.
💭 Consider: Bearer tokens pass through without validation
Issue: Bearer tokens are accepted and forwarded without any validation. This differs from session auth which validates against agents-api.
Why: This is likely intentional (agents-api validates the token downstream), but worth confirming:
- Is there any risk from accepting malformed bearer tokens?
- Should there be basic format validation (e.g., minimum length)?
- Are there specific token patterns expected?
If this is intentional and agents-api validates all bearer tokens, this is fine as-is. Just flagging for explicit confirmation of the design decision.
There was a problem hiding this comment.
PR Review Summary
(1) Total Issues | Risk: High
🔴❗ Critical (1) ❗🔴
Inline Comments:
- 🔴 Critical:
route.ts:716Undefined variablescookieHeaderandauthHeaderstill cause runtimeReferenceError
🕐 Pending Recommendations (2)
- 🟡 Silent error swallowing in catch block — Consider adding structured logging for auth failures
- 💭 Bearer token passthrough — Confirm this is intentional design (agents-api validates downstream)
📦 Missing Changeset
This PR modifies runtime behavior for agents-manage-ui and should have a changeset per AGENTS.md guidelines:
pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"🚫 REQUEST CHANGES
Summary: The critical bug at line 716 flagged in the prior review is still not fixed. The code still references cookieHeader and authHeader which were removed — this will throw a ReferenceError at runtime whenever the usage-events query executes. The fix is a simple one-liner (use authResult.headers like line 701).
Discarded (0)
No findings were discarded.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator (delta) |
1 | 0 | 0 | 0 | 1 | 2 | 0 |
| Total | 1 | 0 | 0 | 0 | 1 | 2 | 0 |
Note: Delta review scoped to 3 commits since last automated review (commit 2599e18c4582).
…e + preview paths (#198) * fix(ci): close remaining silent strandings in release cascade + bridge + preview paths Bundle of 10 code + 1 doc changes, each closing a distinct silent-failure mode where CI would pass green (or sit red invisibly) while production work stranded. Complement to #194's release-cascade hardening — each fix here addresses a gap that pass missed. Release cascade (5) - release-handler.yml: pin GitHub Release to client_payload.commit_sha, not origin/main. Auto-format landing between sync and handler used to create a SHA drift where Vercel deployed a newer commit than npm got. - release-handler.yml: 3-attempt bash retry around gh release view/edit/ create. Mirrors the dispatch-side retry from #174 — one transient 5xx used to leave npm published with no Release (so no Vercel deploy) and no alert. - release-handler.yml: notify-on-failure job. If the success job itself errors (app-token, checkout, or handler logic), the dispatch was already delivered and public-side retries don't re-fire. Without this notifier, npm publishes but the cascade stops silently. - public-mirror-sync.yml: 3-attempt retry on sync PR approval. One dropped GH API call used to leave the sync PR open with no approval; now exhaustion exits non-zero so the workflow turns red and the janitor's sync-PR sweep (below) catches long-open PRs as a safety net. - scripts/check-monorepo-traps.mjs: flip check:override-masks-bump from warn to hard-fail. Current tree is clean of masks, so flipping is safe and prevents future PR #170-class silent divergence. Preview envs (2) - deploy-vercel-preview.sh: validate captured deployment URL matches *.vercel.app pattern before alias-set. The log-grep fallback could silently capture a docs/inspection URL and alias api.preview.inkeep.com to the wrong place. - public-agents-preview-environments.yml: add teardown-failure-notify job. Railway and Vercel teardowns run in parallel by design; this closes the observability gap when either silently fails — Railway continues billing, Vercel env vars pollute, and the next 6h janitor sweep re-attempts cleanup. Dependency sync + bridge (2) - dependabot-sync-root-lockfile.yml: on push failure, full-reset onto origin/$HEAD_REF and re-regenerate both lockfiles (root + public/agents) from the new baseline. Previous simple rebase would replay our stale lockfile on top of a different package.json after a Dependabot force-push, yielding inconsistent committed state that either broke --frozen-lockfile or silently shipped drift into main. - bridge-public-pr-to-monorepo.mjs (both copies): truncate bridged PR body to fit GitHub's 65,536-char limit, with a link to the original PR. Was failing 422 "body is too long" on Dependabot mega-bumps, stranding those PRs outside the agents-private review surface. Janitor (1) - public-agents-preview-janitor.yml: new sweep-stale-sync-prs job. Every 6h, identify open copybara/sync PRs on inkeep/agents >4h old, open (idempotent) tracking issue on agents-private, re-dispatch public-mirror-sync.yml. Does NOT auto-close the PR — destructive actions stay human-driven. Docs (1) - CI_RUNBOOK.md: partial-npm-publish recovery path. Documents what to do when pnpm changeset publish fails on package N of 10 (some live, some not): confirm via npm view, re-run the failed job (publish is idempotent), verify all 10 present before unblocking cascade. Each change verified in isolation with pnpm check:structural; YAML validated with safe_load; bridge truncation logic tested at boundary. * fix(ci): add checkout step and remove heredoc indentation in release-handler notify Addresses review feedback on PR #198: - notify-success-failure now does a Checkout before gh calls, matching the failure job pattern (without it, gh CLI lacks git context and issue-create would silently fail with 'fatal: not a git repository'). - Body is now built with echo into a temp file instead of a heredoc. The original 10-space indent inside run: | would have been preserved literally by bash, making the tracking issue body unreadable. * fix(ci): address review feedback on release-handler retry + docs drift Review feedback from #198 (Claude + Pullfrog): release-handler.yml: - Capture stderr in gh_with_retry so on-call sees the actual API error (rate-limit, 5xx, auth) instead of a generic 'command failed' line. Was making incident triage 15-30m slower. (Claude Major) - Stop wrapping 'gh release view' in gh_with_retry. It returns non-zero for the legitimate 'release doesn't exist' case (the normal new-release happy path); retrying burned 15s of backoff before falling through to create. Only mutating calls (edit/create) need retry. (Pullfrog + Claude) - notify-success-failure now fails loudly (exit 1) if gh issue create fails, matching the failure job's pattern. Silent swallow would mean the success handler failed AND the notifier failed with no visible signal beyond a red check — exactly the class this PR closes elsewhere. (Claude Major) Docs drift (Pullfrog): - AGENTS.md: remove '(soft today)' / 'non-blocking today' annotations from override-masks-bump in both the cheatsheet and the trap-list. - AGENTS.md trap #4: flip 'Currently a warning, not a hard fail' to 'Hard-fail (flipped...)' to match the code. - CI_RUNBOOK.md: rename the override-masks-bump section header from 'warns (non-blocking today)' to 'fails — root override masks a workspace bump'. Preview janitor (Claude Consider): - Document that sweep-stale-sync-prs runs unconditionally regardless of PREVIEW_ENVIRONMENTS_ENABLED. Pr body said so; workflow didn't. Bridge script (Claude Consider): - console.log when PR body is truncated, with original + new sizes. Helps CI log debugging when a bridged PR looks shorter than expected. Applied to both copies (public/agents + public/agents-optional-local-dev). Intentionally not addressed: - dependabot-sync $MSG staleness after reset: Pullfrog flagged as intentional (message describes the operation, not the specific baseline), Claude flagged as cosmetic minor. Siding with Pullfrog. - Preview teardown notify swallowed error: lower severity given the janitor re-attempts cleanup every 6h; accepting current pattern. * fix(ci): pre-fetch PR base into agents-private before git apply --3way Closes the bridge-failure class seen on inkeep/agents#3171 (and eventually on any bridged PR with a conflicting hunk). Root cause: `git apply --index --3way` resolves the patch's `index <old>..<new>` blob SHAs against the local repo's object store. The SHAs come from inkeep/agents' object graph; agents-private has never fetched from inkeep/agents, so those SHAs aren't resolvable. Clean patches work (no --3way fallback needed); any conflict triggers --3way, which fails with 'repository lacks the necessary blob to perform 3-way merge' — bridge stops with no useful diagnostic, PR strands outside the canonical review surface. Fix: Before calling git apply, shallow-fetch the PR's base commit from the public repo into agents-private's object store. GitHub permits fetching by SHA when the SHA is reachable from a ref (PR base is on main, always reachable), and shallow=1 keeps it cheap. After fetch, --3way can find the `<old>` blob and resolve conflicts into merge markers instead of a hard error. Why not drop --3way: Dropping --3way makes every conflicting PR fail hard with no recovery path. Pre-fetching preserves the --3way fallback so conflicting hunks produce visible conflict markers in the bridged PR, which is reviewable rather than opaque. Failure handling: If the fetch itself fails (rate-limit, network flake), we log and proceed. The subsequent git apply will hit the same blob error as before the fix — no regression, just no improvement for that run. Applied to both script copies: public/agents and public/agents-optional-local-dev. Bundled into PR #198 since it's the same release/sync hardening scope. * Revert "fix(ci): pre-fetch PR base into agents-private before git apply --3way" This reverts commit e7e0640e2345bbd7cdea1185eb3abc666f2c60fa. * fix(ci): switch bridge patch fetch from .patch to .diff media type Root cause of bridge failures on multi-commit PRs (eg inkeep/agents#3171, 9 commits): `gh api .../pulls/N Accept: application/vnd.github.patch` returns a mailbox-format patch series — one patch per PR commit, each with `index <old>..<new>` lines referencing INTERMEDIATE blob SHAs created during the PR's history on inkeep/agents. Those intermediates exist only in inkeep/agents' object store; agents-private has never seen them. The first patch applies against agents-private's file (blob content matches the PR base even if SHAs differ), git's index advances to the intermediate state (only referenced by SHA, never fetched as an object). Any subsequent conflicting hunk triggers `git apply --3way` fallback, which needs the intermediate blob → 'repository lacks the necessary blob' → stranded PR. Single-commit PRs don't hit this (no intermediates). That's why the bridge used to work for most Dependabot bumps but started failing on longer branches like #3171. Fix: switch `Accept: application/vnd.github.patch` → `Accept: application/vnd.github.diff` in both bridge-public-pr-to- monorepo.mjs copies. .diff returns a single consolidated base-vs-head unified diff with only the PR's base blob SHAs referenced. Those are content-identical to agents-private's copies (Copybara mirrors blobs 1:1 without content rewrites for .ts files), so --3way can resolve them locally from agents-private's own object store. Also updated githubRequest's text-response check to include .diff alongside .patch (both are text, not JSON). Previous pre-fetch commit (`e7e0640e2`) was reverted in a prior commit — it was addressing the wrong layer (only PR-base blob, not intermediates). This commit supersedes that approach entirely. * chore(ci): biome format fix on bridge script console.log GitOrigin-RevId: a26d5dac74ed9938fc512703af21a25d82616504
There was a problem hiding this comment.
PR Review Summary
(1) Total Issues | Risk: High
🔴❗ Critical (1) ❗🔴
Inline Comments:
- 🔴 Critical:
route.ts:716Undefined variablescookieHeaderandauthHeadercause runtimeReferenceError
🕐 Pending Recommendations (2)
- 🟡
api-route-auth.ts:147— Silent error swallowing in catch block hinders debugging; consider adding structured logging - 💭
api-route-auth.ts:164— Bearer tokens pass through without validation; confirm this is intentional (agents-api validates downstream)
📦 Missing Changeset
This PR modifies runtime behavior for agents-manage-ui and should have a changeset per AGENTS.md guidelines:
pnpm bump patch --pkg agents-manage-ui "Add authentication helpers for API routes"📋 Delta Review Note
The 3 commits since the last review (fe9c4d9d4, 37f8a3e4a, b33cc4604) only touched .github/scripts/bridge-public-pr-to-monorepo.mjs — a CI bridge script unrelated to the auth code under review. The auth module itself has not changed since the prior reviews.
🚫 REQUEST CHANGES
Summary: The critical bug at line 716 flagged in prior reviews remains unfixed. The code still references cookieHeader and authHeader which were removed — this will throw a ReferenceError at runtime whenever the usage-events query executes. The fix is a simple one-liner: use authResult.headers like the corrected call on line 701. A 1-click suggestion is provided in the inline comment.
Discarded (0)
No findings were discarded.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator (delta) |
1 | 0 | 0 | 0 | 1 | 2 | 0 |
| Total | 1 | 0 | 0 | 0 | 1 | 2 | 0 |
Note: Delta review scoped to 3 commits since last automated review. Sub-reviewers not dispatched as delta only touched CI scripts unrelated to auth code.
Ito Test Report ❌16 test cases ran. 1 failed, 1 additional finding, 14 passed. Across 16 QA scenarios, 14 passed and 2 failed, indicating strong stability in render regeneration and mockData preservation, permission and session enforcement, bearer/session auth routing behavior, concurrent trace request isolation, retry envelope coherence, and usage-events fault-tolerance. ❌ Failed (1)🟠 Inconsistent tenant rejection in trace endpoints
Relevant code:
async function signozQuery(
payload: any,
tenantId: string,
authHeaders: Record<string, string>
): Promise<SigNozResp> {
(async () => {
const batchStart = Date.now();
try {
const result = await signozQuery(usageEventsPayload, tenantId, cookieHeader, authHeader);
logger.info(
{ batch: 'usage-events', ms: Date.now() - batchStart },
'signoz batch complete'
);
if (!response.ok) {
logger.error(
{
error: data,
status: response.status,
},
'SigNoz health check failed'
);
let errorMessage = 'Failed to check SigNoz configuration';
if (response.status === 401 || response.status === 403) {
errorMessage = 'Authentication required';
} else if (data?.error) {
errorMessage = data.error;
}
return NextResponse.json({
status: 'connection_failed',✅ Passed (14)ℹ️ Additional Findings (1)
🟠 Unauthenticated conversation page shows 404
Relevant code:
export async function GET(
req: NextRequest,
context: RouteContext<'/api/traces/conversations/[conversationId]'>
) {
const authResult = await requireApiRouteSessionOrBearer(req);
if (!authResult.ok) {
return authResult.response;
}
try {
const queryClient = new QueryClient();
const permissions = await fetchProjectPermissions(tenantId, projectId);
queryClient.setQueryData(projectQueryKeys.permissions(tenantId, projectId), permissions);
return <HydrationBoundary state={dehydrate(queryClient)}>{children}</HydrationBoundary>;
} catch (error) {
return (
<FullPageError
errorCode={getErrorCode(error)}
context="project"
/>
);
}
if (!hasAccess) {
throw createApiError({
code: 'not_found',
message: 'Project not found',
instance: c.req.path,
});
}Commit: Tell us how we did: Give Ito Feedback |
…e + preview paths (inkeep#198) (inkeep#3172) * fix(ci): close remaining silent strandings in release cascade + bridge + preview paths Bundle of 10 code + 1 doc changes, each closing a distinct silent-failure mode where CI would pass green (or sit red invisibly) while production work stranded. Complement to inkeep#194's release-cascade hardening — each fix here addresses a gap that pass missed. Release cascade (5) - release-handler.yml: pin GitHub Release to client_payload.commit_sha, not origin/main. Auto-format landing between sync and handler used to create a SHA drift where Vercel deployed a newer commit than npm got. - release-handler.yml: 3-attempt bash retry around gh release view/edit/ create. Mirrors the dispatch-side retry from inkeep#174 — one transient 5xx used to leave npm published with no Release (so no Vercel deploy) and no alert. - release-handler.yml: notify-on-failure job. If the success job itself errors (app-token, checkout, or handler logic), the dispatch was already delivered and public-side retries don't re-fire. Without this notifier, npm publishes but the cascade stops silently. - public-mirror-sync.yml: 3-attempt retry on sync PR approval. One dropped GH API call used to leave the sync PR open with no approval; now exhaustion exits non-zero so the workflow turns red and the janitor's sync-PR sweep (below) catches long-open PRs as a safety net. - scripts/check-monorepo-traps.mjs: flip check:override-masks-bump from warn to hard-fail. Current tree is clean of masks, so flipping is safe and prevents future PR inkeep#170-class silent divergence. Preview envs (2) - deploy-vercel-preview.sh: validate captured deployment URL matches *.vercel.app pattern before alias-set. The log-grep fallback could silently capture a docs/inspection URL and alias api.preview.inkeep.com to the wrong place. - public-agents-preview-environments.yml: add teardown-failure-notify job. Railway and Vercel teardowns run in parallel by design; this closes the observability gap when either silently fails — Railway continues billing, Vercel env vars pollute, and the next 6h janitor sweep re-attempts cleanup. Dependency sync + bridge (2) - dependabot-sync-root-lockfile.yml: on push failure, full-reset onto origin/$HEAD_REF and re-regenerate both lockfiles (root + public/agents) from the new baseline. Previous simple rebase would replay our stale lockfile on top of a different package.json after a Dependabot force-push, yielding inconsistent committed state that either broke --frozen-lockfile or silently shipped drift into main. - bridge-public-pr-to-monorepo.mjs (both copies): truncate bridged PR body to fit GitHub's 65,536-char limit, with a link to the original PR. Was failing 422 "body is too long" on Dependabot mega-bumps, stranding those PRs outside the agents-private review surface. Janitor (1) - public-agents-preview-janitor.yml: new sweep-stale-sync-prs job. Every 6h, identify open copybara/sync PRs on inkeep/agents >4h old, open (idempotent) tracking issue on agents-private, re-dispatch public-mirror-sync.yml. Does NOT auto-close the PR — destructive actions stay human-driven. Docs (1) - CI_RUNBOOK.md: partial-npm-publish recovery path. Documents what to do when pnpm changeset publish fails on package N of 10 (some live, some not): confirm via npm view, re-run the failed job (publish is idempotent), verify all 10 present before unblocking cascade. Each change verified in isolation with pnpm check:structural; YAML validated with safe_load; bridge truncation logic tested at boundary. * fix(ci): add checkout step and remove heredoc indentation in release-handler notify Addresses review feedback on PR inkeep#198: - notify-success-failure now does a Checkout before gh calls, matching the failure job pattern (without it, gh CLI lacks git context and issue-create would silently fail with 'fatal: not a git repository'). - Body is now built with echo into a temp file instead of a heredoc. The original 10-space indent inside run: | would have been preserved literally by bash, making the tracking issue body unreadable. * fix(ci): address review feedback on release-handler retry + docs drift Review feedback from inkeep#198 (Claude + Pullfrog): release-handler.yml: - Capture stderr in gh_with_retry so on-call sees the actual API error (rate-limit, 5xx, auth) instead of a generic 'command failed' line. Was making incident triage 15-30m slower. (Claude Major) - Stop wrapping 'gh release view' in gh_with_retry. It returns non-zero for the legitimate 'release doesn't exist' case (the normal new-release happy path); retrying burned 15s of backoff before falling through to create. Only mutating calls (edit/create) need retry. (Pullfrog + Claude) - notify-success-failure now fails loudly (exit 1) if gh issue create fails, matching the failure job's pattern. Silent swallow would mean the success handler failed AND the notifier failed with no visible signal beyond a red check — exactly the class this PR closes elsewhere. (Claude Major) Docs drift (Pullfrog): - AGENTS.md: remove '(soft today)' / 'non-blocking today' annotations from override-masks-bump in both the cheatsheet and the trap-list. - AGENTS.md trap inkeep#4: flip 'Currently a warning, not a hard fail' to 'Hard-fail (flipped...)' to match the code. - CI_RUNBOOK.md: rename the override-masks-bump section header from 'warns (non-blocking today)' to 'fails — root override masks a workspace bump'. Preview janitor (Claude Consider): - Document that sweep-stale-sync-prs runs unconditionally regardless of PREVIEW_ENVIRONMENTS_ENABLED. Pr body said so; workflow didn't. Bridge script (Claude Consider): - console.log when PR body is truncated, with original + new sizes. Helps CI log debugging when a bridged PR looks shorter than expected. Applied to both copies (public/agents + public/agents-optional-local-dev). Intentionally not addressed: - dependabot-sync $MSG staleness after reset: Pullfrog flagged as intentional (message describes the operation, not the specific baseline), Claude flagged as cosmetic minor. Siding with Pullfrog. - Preview teardown notify swallowed error: lower severity given the janitor re-attempts cleanup every 6h; accepting current pattern. * fix(ci): pre-fetch PR base into agents-private before git apply --3way Closes the bridge-failure class seen on inkeep#3171 (and eventually on any bridged PR with a conflicting hunk). Root cause: `git apply --index --3way` resolves the patch's `index <old>..<new>` blob SHAs against the local repo's object store. The SHAs come from inkeep/agents' object graph; agents-private has never fetched from inkeep/agents, so those SHAs aren't resolvable. Clean patches work (no --3way fallback needed); any conflict triggers --3way, which fails with 'repository lacks the necessary blob to perform 3-way merge' — bridge stops with no useful diagnostic, PR strands outside the canonical review surface. Fix: Before calling git apply, shallow-fetch the PR's base commit from the public repo into agents-private's object store. GitHub permits fetching by SHA when the SHA is reachable from a ref (PR base is on main, always reachable), and shallow=1 keeps it cheap. After fetch, --3way can find the `<old>` blob and resolve conflicts into merge markers instead of a hard error. Why not drop --3way: Dropping --3way makes every conflicting PR fail hard with no recovery path. Pre-fetching preserves the --3way fallback so conflicting hunks produce visible conflict markers in the bridged PR, which is reviewable rather than opaque. Failure handling: If the fetch itself fails (rate-limit, network flake), we log and proceed. The subsequent git apply will hit the same blob error as before the fix — no regression, just no improvement for that run. Applied to both script copies: public/agents and public/agents-optional-local-dev. Bundled into PR inkeep#198 since it's the same release/sync hardening scope. * Revert "fix(ci): pre-fetch PR base into agents-private before git apply --3way" This reverts commit e7e0640e2345bbd7cdea1185eb3abc666f2c60fa. * fix(ci): switch bridge patch fetch from .patch to .diff media type Root cause of bridge failures on multi-commit PRs (eg inkeep#3171, 9 commits): `gh api .../pulls/N Accept: application/vnd.github.patch` returns a mailbox-format patch series — one patch per PR commit, each with `index <old>..<new>` lines referencing INTERMEDIATE blob SHAs created during the PR's history on inkeep/agents. Those intermediates exist only in inkeep/agents' object store; agents-private has never seen them. The first patch applies against agents-private's file (blob content matches the PR base even if SHAs differ), git's index advances to the intermediate state (only referenced by SHA, never fetched as an object). Any subsequent conflicting hunk triggers `git apply --3way` fallback, which needs the intermediate blob → 'repository lacks the necessary blob' → stranded PR. Single-commit PRs don't hit this (no intermediates). That's why the bridge used to work for most Dependabot bumps but started failing on longer branches like inkeep#3171. Fix: switch `Accept: application/vnd.github.patch` → `Accept: application/vnd.github.diff` in both bridge-public-pr-to- monorepo.mjs copies. .diff returns a single consolidated base-vs-head unified diff with only the PR's base blob SHAs referenced. Those are content-identical to agents-private's copies (Copybara mirrors blobs 1:1 without content rewrites for .ts files), so --3way can resolve them locally from agents-private's own object store. Also updated githubRequest's text-response check to include .diff alongside .patch (both are text, not JSON). Previous pre-fetch commit (`e7e0640e2`) was reverted in a prior commit — it was addressing the wrong layer (only PR-base blob, not intermediates). This commit supersedes that approach entirely. * chore(ci): biome format fix on bridge script console.log GitOrigin-RevId: a26d5dac74ed9938fc512703af21a25d82616504 Co-authored-by: Varun Varahabhotla <[email protected]>
* chore(sync): mirror #3171 * apply review * style: auto-format with biome * Update public/agents/agents-manage-ui/src/app/api/traces/conversations/[conversationId]/route.ts * chore: add changeset for manage UI auth guards --------- GitOrigin-RevId: 5a2dd89d0f85597bd83b9938fb778a08b50e71fd Co-authored-by: inkeep-internal-ci[bot] <259778081+inkeep-internal-ci[bot]@users.noreply.github.com> Co-authored-by: robert-inkeep <[email protected]> Co-authored-by: Varun Varahabhotla <[email protected]> Co-authored-by: Dima <[email protected]> Co-authored-by: Dimitri POSTOLOV <[email protected]> Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com> Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com> Co-authored-by: Dimitri POSTOLOV <[email protected]>
No description provided.