chore: sync from agents-private#3172

Merged
inkeep-oss-sync[bot] merged 1 commit into main from copybara/sync on Apr 22, 2026
@inkeep-oss-sync

Automated sync from agents-private via Copybara mirror.

…e + preview paths (#198)

* fix(ci): close remaining silent strandings in release cascade + bridge + preview paths

Bundle of 10 code + 1 doc changes, each closing a distinct silent-failure
mode where CI would pass green (or sit red invisibly) while production
work stranded. Complement to #194's release-cascade hardening — each fix
here addresses a gap that pass missed.

Release cascade (5)
- release-handler.yml: pin GitHub Release to client_payload.commit_sha,
  not origin/main. Auto-format landing between sync and handler used to
  create a SHA drift where Vercel deployed a newer commit than npm got.
- release-handler.yml: 3-attempt bash retry around gh release view/edit/
  create. Mirrors the dispatch-side retry from #174 — one transient 5xx
  used to leave npm published with no Release (so no Vercel deploy) and
  no alert.
- release-handler.yml: notify-on-failure job. If the success job itself
  errors (app-token, checkout, or handler logic), the dispatch was
  already delivered and public-side retries don't re-fire. Without this
  notifier, npm publishes but the cascade stops silently.
- public-mirror-sync.yml: 3-attempt retry on sync PR approval. One
  dropped GH API call used to leave the sync PR open with no approval;
  now exhaustion exits non-zero so the workflow turns red and the
  janitor's sync-PR sweep (below) catches long-open PRs as a safety net.
- scripts/check-monorepo-traps.mjs: flip check:override-masks-bump from
  warn to hard-fail. Current tree is clean of masks, so flipping is safe
  and prevents future PR #170-class silent divergence.

Preview envs (2)
- deploy-vercel-preview.sh: validate captured deployment URL matches
  *.vercel.app pattern before alias-set. The log-grep fallback could
  silently capture a docs/inspection URL and alias api.preview.inkeep.com
  to the wrong place.
- public-agents-preview-environments.yml: add teardown-failure-notify
  job. Railway and Vercel teardowns run in parallel by design; this
  closes the observability gap when either silently fails — Railway
  continues billing, Vercel env vars pollute, and the next 6h janitor
  sweep re-attempts cleanup.
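
The URL validation in the first item above can be sketched as follows. The real check lives in a shell script; this sketch uses JavaScript to match the bridge scripts. The helper name `isVercelDeploymentUrl` and the exact matching rule (https only, host ending in `.vercel.app`) are assumptions for illustration, not the repository's code.

```javascript
// Hypothetical helper: accept only https URLs whose host ends in ".vercel.app".
// Anything else (docs pages, inspection URLs, garbage from a log grep) is rejected
// so the caller can fail before running the alias step.
function isVercelDeploymentUrl(candidate) {
  let url;
  try {
    url = new URL(candidate);
  } catch {
    return false; // not a parseable URL at all
  }
  return url.protocol === "https:" && url.hostname.endsWith(".vercel.app");
}
```

Checking the parsed hostname rather than the raw string avoids accepting a URL that merely contains `.vercel.app` somewhere in its path or query.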

Dependency sync + bridge (2)
- dependabot-sync-root-lockfile.yml: on push failure, full-reset onto
  origin/$HEAD_REF and regenerate both lockfiles (root + public/agents)
  from the new baseline. Previous simple rebase would replay our stale
  lockfile on top of a different package.json after a Dependabot
  force-push, yielding inconsistent committed state that either broke
  --frozen-lockfile or silently shipped drift into main.
- bridge-public-pr-to-monorepo.mjs (both copies): truncate bridged PR
  body to fit GitHub's 65,536-char limit, with a link to the original PR.
  Was failing 422 "body is too long" on Dependabot mega-bumps, stranding
  those PRs outside the agents-private review surface.
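
A minimal sketch of the body truncation described above. The function name `truncateBody`, the wording of the notice, and the use of JavaScript string length as the unit are assumptions; the actual bridge script may count and format differently.

```javascript
const GITHUB_BODY_LIMIT = 65536; // limit cited in the commit message

// Hypothetical helper: return the body unchanged if it fits, otherwise cut it
// so that body + notice together stay within the limit.
function truncateBody(body, originalUrl, limit = GITHUB_BODY_LIMIT) {
  if (body.length <= limit) return body;
  const notice = `\n\n[Truncated. Full description: ${originalUrl}]`;
  const kept = body.slice(0, Math.max(0, limit - notice.length));
  return kept + notice;
}
```

Reserving room for the notice before slicing is what keeps the final string at or under the limit; slicing to the limit first and then appending the link would overshoot it again.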

Janitor (1)
- public-agents-preview-janitor.yml: new sweep-stale-sync-prs job. Every
  6h, identify open copybara/sync PRs on inkeep/agents >4h old, open an
  (idempotent) tracking issue on agents-private, and re-dispatch
  public-mirror-sync.yml. Does NOT auto-close the PR; destructive
  actions stay human-driven.
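
The selection step of the sweep can be sketched as a pure filter. The input shape follows GitHub REST pull objects (`state`, `created_at`, `head.ref`); the function name `findStaleSyncPrs` and the constant are hypothetical, and the real job may select PRs differently.

```javascript
const STALE_AFTER_MS = 4 * 60 * 60 * 1000; // 4h threshold from the commit message

// Hypothetical filter: keep open PRs from the copybara/sync branch that were
// created more than four hours before `now`.
function findStaleSyncPrs(prs, now = Date.now()) {
  return prs.filter(
    (pr) =>
      pr.state === "open" &&
      pr.head.ref === "copybara/sync" &&
      now - Date.parse(pr.created_at) > STALE_AFTER_MS
  );
}
```

Passing `now` as a parameter keeps the filter deterministic, which makes the threshold easy to test without waiting on a clock.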

Docs (1)
- CI_RUNBOOK.md: partial-npm-publish recovery path. Documents what to
  do when pnpm changeset publish fails on package N of 10 (some live,
  some not): confirm via npm view, re-run the failed job (publish is
  idempotent), verify all 10 present before unblocking cascade.

Each change verified in isolation with pnpm check:structural; YAML
validated with safe_load; bridge truncation logic tested at boundary.

* fix(ci): add checkout step and remove heredoc indentation in release-handler notify

Addresses review feedback on PR #198:
- notify-success-failure now does a Checkout before gh calls, matching
  the failure job pattern (without it, gh CLI lacks git context and
  issue-create would silently fail with 'fatal: not a git repository').
- Body is now built with echo into a temp file instead of a heredoc.
  The original 10-space indent inside run: | would have been preserved
  literally by bash, making the tracking issue body unreadable.

* fix(ci): address review feedback on release-handler retry + docs drift

Review feedback from #198 (Claude + Pullfrog):

release-handler.yml:
- Capture stderr in gh_with_retry so on-call sees the actual API error
  (rate-limit, 5xx, auth) instead of a generic 'command failed' line.
  Was making incident triage 15-30m slower. (Claude Major)
- Stop wrapping 'gh release view' in gh_with_retry. It returns non-zero
  for the legitimate 'release doesn't exist' case (the normal new-release
  happy path); retrying burned 15s of backoff before falling through to
  create. Only mutating calls (edit/create) need retry. (Pullfrog + Claude)
- notify-success-failure now fails loudly (exit 1) if gh issue create
  fails, matching the failure job's pattern. Silent swallow would mean
  the success handler failed AND the notifier failed with no visible
  signal beyond a red check — exactly the class this PR closes elsewhere.
  (Claude Major)
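
The retry behaviour described in the release-handler.yml items above can be sketched as follows. The workflow implements it in bash as `gh_with_retry`; this is the same control flow in JavaScript, with the name `runWithRetry`, the result shape, and the linear 5s/10s backoff all being assumptions rather than the workflow's actual values.

```javascript
// Blocking sleep for a synchronous sketch (works on Node's main thread).
function sleepSync(ms) {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Hypothetical wrapper for mutating calls only (edit/create).
// `run` is expected to return { status: number, stderr: string }, similar to
// the result of child_process.spawnSync with an encoding set.
function runWithRetry(run, { attempts = 3, baseDelayMs = 5000, wait = sleepSync } = {}) {
  const errors = [];
  for (let attempt = 1; attempt <= attempts; attempt++) {
    const result = run();
    if (result.status === 0) return { ok: true, attempts: attempt, errors };
    // Keep the real stderr so the failure report shows the API error itself.
    errors.push(`attempt ${attempt}: ${result.stderr.trim()}`);
    if (attempt < attempts) wait(baseDelayMs * attempt);
  }
  return { ok: false, attempts, errors };
}
```

A read-only existence check such as `gh release view` would be called once, outside this wrapper, because a non-zero exit there is the normal new-release case rather than a transient failure.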

Docs drift (Pullfrog):
- AGENTS.md: remove '(soft today)' / 'non-blocking today' annotations
  from override-masks-bump in both the cheatsheet and the trap-list.
- AGENTS.md trap #4: flip 'Currently a warning, not a hard fail' to
  'Hard-fail (flipped...)' to match the code.
- CI_RUNBOOK.md: rename the override-masks-bump section header from
  'warns (non-blocking today)' to 'fails — root override masks a
  workspace bump'.

Preview janitor (Claude Consider):
- Document that sweep-stale-sync-prs runs unconditionally regardless of
  PREVIEW_ENVIRONMENTS_ENABLED. PR body said so; workflow didn't.

Bridge script (Claude Consider):
- console.log when PR body is truncated, with original + new sizes.
  Helps CI log debugging when a bridged PR looks shorter than expected.
  Applied to both copies (public/agents + public/agents-optional-local-dev).

Intentionally not addressed:
- dependabot-sync $MSG staleness after reset: Pullfrog flagged as
  intentional (message describes the operation, not the specific
  baseline), Claude flagged as cosmetic minor. Siding with Pullfrog.
- Preview teardown notify swallowed error: lower severity given the
  janitor re-attempts cleanup every 6h; accepting current pattern.

* fix(ci): pre-fetch PR base into agents-private before git apply --3way

Closes the bridge-failure class seen on #3171 (and
eventually on any bridged PR with a conflicting hunk).

Root cause:
`git apply --index --3way` resolves the patch's `index <old>..<new>`
blob SHAs against the local repo's object store. The SHAs come from
inkeep/agents' object graph; agents-private has never fetched from
inkeep/agents, so those SHAs aren't resolvable. Clean patches work
(no --3way fallback needed); any conflict triggers --3way, which
fails with 'repository lacks the necessary blob to perform 3-way
merge' — bridge stops with no useful diagnostic, PR strands outside
the canonical review surface.

Fix:
Before calling git apply, shallow-fetch the PR's base commit from
the public repo into agents-private's object store. GitHub permits
fetching by SHA when the SHA is reachable from a ref (PR base is on
main, always reachable), and shallow=1 keeps it cheap. After fetch,
--3way can find the `<old>` blob and resolve conflicts into merge
markers instead of a hard error.

Why not drop --3way:
Dropping --3way makes every conflicting PR fail hard with no
recovery path. Pre-fetching preserves the --3way fallback so
conflicting hunks produce visible conflict markers in the bridged
PR, which is reviewable rather than opaque.

Failure handling:
If the fetch itself fails (rate-limit, network flake), we log and
proceed. The subsequent git apply will hit the same blob error as
before the fix — no regression, just no improvement for that run.

Applied to both script copies: public/agents and
public/agents-optional-local-dev. Bundled into PR #198 since it's
the same release/sync hardening scope.

* Revert "fix(ci): pre-fetch PR base into agents-private before git apply --3way"

This reverts commit e7e0640e2345bbd7cdea1185eb3abc666f2c60fa.

* fix(ci): switch bridge patch fetch from .patch to .diff media type

Root cause of bridge failures on multi-commit PRs (e.g. #3171,
9 commits):

`gh api .../pulls/N Accept: application/vnd.github.patch` returns a
mailbox-format patch series — one patch per PR commit, each with
`index <old>..<new>` lines referencing INTERMEDIATE blob SHAs created
during the PR's history on inkeep/agents. Those intermediates exist
only in inkeep/agents' object store; agents-private has never seen
them. The first patch applies against agents-private's file (blob
content matches the PR base even if SHAs differ), and git's index then
advances to the intermediate state, which is only referenced by SHA and
was never fetched as an object. Any subsequent conflicting hunk triggers
the `git apply --3way` fallback, which needs the intermediate blob →
'repository lacks the necessary blob' → stranded PR.

Single-commit PRs don't hit this (no intermediates). That's why the
bridge used to work for most Dependabot bumps but started failing on
longer branches like #3171.
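
To see which blobs a given patch or diff depends on, the `index <old>..<new>` lines can be collected with a small diagnostic like the one below. This is an illustrative sketch only; `oldBlobShas` is a hypothetical helper and is not part of the bridge script.

```javascript
// Hypothetical diagnostic: list the distinct <old> blob SHAs referenced by
// `index <old>..<new>` lines, in order of first appearance.
function oldBlobShas(patchText) {
  const shas = new Set();
  for (const line of patchText.split("\n")) {
    const match = /^index ([0-9a-f]+)\.\.([0-9a-f]+)/.exec(line);
    if (match) shas.add(match[1]);
  }
  return [...shas];
}
```

Each SHA returned here is a blob that a `--3way` fallback may need to find in the local object store, which is why a multi-commit patch series with intermediate SHAs is harder to apply than a single base-vs-head diff.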

Fix: switch `Accept: application/vnd.github.patch` →
`Accept: application/vnd.github.diff` in both bridge-public-pr-to-
monorepo.mjs copies. .diff returns a single consolidated base-vs-head
unified diff with only the PR's base blob SHAs referenced. Those are
content-identical to agents-private's copies (Copybara mirrors blobs
1:1 without content rewrites for .ts files), so --3way can resolve
them locally from agents-private's own object store.

Also updated githubRequest's text-response check to include .diff
alongside .patch (both are text, not JSON).
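
The text-response check mentioned above might look roughly like this. The commit message names `githubRequest` but does not show its body, so the helper name `expectsTextResponse` and the way the check is structured are assumptions.

```javascript
// Media types whose responses are plain text rather than JSON.
const TEXT_MEDIA_TYPES = [
  "application/vnd.github.diff",
  "application/vnd.github.patch",
];

// Hypothetical helper: decide from the Accept header that was sent whether the
// response body should be read as text instead of being parsed as JSON.
function expectsTextResponse(acceptHeader) {
  return TEXT_MEDIA_TYPES.some((type) => acceptHeader.includes(type));
}
```

A caller would then branch on this result, reading the body as text for `.diff` and `.patch` requests and parsing JSON for everything else.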

Previous pre-fetch commit (`e7e0640e2`) was reverted in a prior commit
— it was addressing the wrong layer (only PR-base blob, not
intermediates). This commit supersedes that approach entirely.

* chore(ci): biome format fix on bridge script console.log

GitOrigin-RevId: a26d5dac74ed9938fc512703af21a25d82616504
@inkeep-oss-sync (bot) enabled auto-merge April 22, 2026 14:58
@inkeep-internal-ci (bot) left a comment

Automated approval from agents-private public-mirror-sync (run: https://github.com/inkeep/agents-private/actions/runs/24785536984). Source of truth is the monorepo; direct edits on inkeep/agents are overwritten on next sync.

@changeset-bot
changeset-bot Bot commented Apr 22, 2026

⚠️ No Changeset found

Latest commit: 74addc7

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@inkeep-oss-sync (bot) added this pull request to the merge queue Apr 22, 2026
Merged via the queue into main with commit 37f8a3e Apr 22, 2026
14 checks passed
@inkeep-oss-sync (bot) deleted the copybara/sync branch April 22, 2026 15:06