keytar fix #188
Claude finished @omar-inkeep's task.

Code Review Complete

As a TypeScript Staff Engineer and System Architect, I've completed my review of the keytar fix changes.

Todo List:

Review Findings

Summary

This PR addresses keytar dependency management for the Node.js

Changes Analysis

Architecture Context

The keytar library is used by the

Review Feedback

✅ Positive Aspects
* fix(security): add minimatch override >=5.1.8

Fixes multiple ReDoS vulnerabilities (CVE-2026-26996, CVE-2026-27903, CVE-2026-27904) in transitive [email protected] dependency. Closes dependabot alerts #188, #199, #200.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add lodash/lodash-es override >=4.17.23 - prototype pollution fix (#2643)

* fix(security): add lodash/lodash-es override >=4.17.23

Fixes prototype pollution in _.unset and _.omit (CVE-2025-13465) in transitive lodash dependencies. Closes dependabot alerts #120, #123.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add express-rate-limit override >=8.2.2 (#2644)

Fixes IPv4-mapped IPv6 rate limit bypass (CVE-2026-30827) in transitive express-rate-limit dependency. Closes dependabot alert #213.

Co-authored-by: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
* fix(security): add dompurify override >=3.3.2

Fixes XSS bypass vulnerability (CVE-2026-0540) in transitive dompurify dependency by adding pnpm override. Closes dependabot alerts #210, #211.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add fast-xml-parser override >=5.3.8

Fixes stack overflow with preserveOrder (CVE-2026-27942) in transitive fast-xml-parser dependency. Closes dependabot alert #205.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add serialize-javascript override >=7.0.3

Fixes RCE vulnerability via RegExp.flags and Date.prototype.toISOString() in transitive serialize-javascript dependency (build-time only). Closes dependabot alert #203.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add svgo override >=3.3.3

Fixes DoS via entity expansion in DOCTYPE (CVE-2026-29074) in transitive svgo dependency (build-time only). Closes dependabot alert #212.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add minimatch override >=5.1.8 - ReDoS fix (#2642)

* fix(security): add minimatch override >=5.1.8

Fixes multiple ReDoS vulnerabilities (CVE-2026-26996, CVE-2026-27903, CVE-2026-27904) in transitive [email protected] dependency. Closes dependabot alerts #188, #199, #200.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add lodash/lodash-es override >=4.17.23 - prototype pollution fix (#2643)

* fix(security): add lodash/lodash-es override >=4.17.23

Fixes prototype pollution in _.unset and _.omit (CVE-2025-13465) in transitive lodash dependencies. Closes dependabot alerts #120, #123.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): add express-rate-limit override >=8.2.2 (#2644)

Fixes IPv4-mapped IPv6 rate limit bypass (CVE-2026-30827) in transitive express-rate-limit dependency. Closes dependabot alert #213.

Co-authored-by: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>

* fix(security): add security overrides to create-agents-template

Ensures self-hosted deployments using the template also get patched transitive dependency versions.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* fix(security): sync overrides between root and create-agents-template

Makes pnpm.overrides identical in both package.json files so the monorepo and self-hosted template have the same security floor.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 <[email protected]>
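The "sync overrides" commit above says pnpm.overrides should be identical in both package.json files. A minimal sketch of how such a parity check could look is below. The function name `diffOverrides`, the type names, and the message format are hypothetical and are not taken from the repository; only the `pnpm.overrides` field comes from the commit text.

```typescript
// Hypothetical helper (not from the repository): lists override keys whose
// specifiers differ between two parsed package.json objects.
type Overrides = Record<string, string>;

interface PackageJson {
  pnpm?: { overrides?: Overrides };
}

function diffOverrides(root: PackageJson, template: PackageJson): string[] {
  const a: Overrides = root.pnpm?.overrides ?? {};
  const b: Overrides = template.pnpm?.overrides ?? {};
  // Union of keys from both sides, sorted so the report is stable.
  const keys = Object.keys({ ...a, ...b }).sort();
  const mismatches: string[] = [];
  for (const key of keys) {
    if (a[key] !== b[key]) {
      mismatches.push(
        `${key}: root=${a[key] ?? "(missing)"} template=${b[key] ?? "(missing)"}`,
      );
    }
  }
  return mismatches;
}
```

An empty result would mean both files share the same security floor; any entry names the package that drifted and shows both specifiers.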
…ded (#194)

* fix(ci): always reset copybara/sync on every mirror run

Closes #188

Drop the "leave branch in place if open PR is < STALE_PR_HOURS" branch in the mirror sync reconcile step. Letting Copybara "append" to an existing copybara/sync was never safe: the Copybara config uses fetch=main, so every run baselines off inkeep/agents main's last GitOrigin-RevId. When a new push lands on agents-private main while a prior sync PR is still open, Copybara rebuilds the older origin change from main's HEAD (new SHA due to timestamps) and the non-force push to copybara/sync is rejected as non-fast-forward. This is the failure mode that blew up the release cascade in #188 (Version Packages #185 merged while #3166 was still open 9 minutes after being created).

Every mirror run now closes any open sync PR and deletes copybara/sync before Copybara runs, so each run pushes a fresh history. The concurrency group already serializes runs and every new run includes all accumulated changes since the last imported revision, so no information is lost. PR churn (one inkeep/agents sync PR per agents-private main push) is the cost, and it is much cheaper than a stuck release cascade.

CI_RUNBOOK gets a new entry for this specific failure string so future red runs route to the fix without a re-investigation.

* fix(ci): harden release cascade against silent strandings

Bundled on top of the copybara/sync reset in this PR so the whole release path (mirror sync -> npm publish -> GH Release -> Vercel prod deploy -> scheduler restart) can run end-to-end with no human intervention. Each fix closes a distinct silent-stranding mode.

1. public-mirror-sync.yml Create-PR guard
   - Reconcile now always deletes copybara/sync before Copybara runs, which introduced a regression: when Copybara exits 4 (no changes to sync, e.g. workflow_dispatch with an idle main), the branch is gone and the next `gh pr create --head copybara/sync` would fail. Add an explicit branch-existence check; short-circuit cleanly.
   - Add explicit --state open to the gh pr list call. Defaults to open but being explicit prevents a future refactor from reintroducing the PR #184 bug class.
   - Replace the PR number extraction `grep -o '[0-9]*$'` on the PR URL with gh pr view --json number. gh's stdout format is not a contract.

2. private-agents-ui-version-packages.yml publish detection
   - Was parsing `Publishing "X" at "Y"` via grep/sed on the changesets log, which is the exact fragility PR #174 removed from public release.yml. If changesets v2 changes format, published=false is written despite a successful publish, the widget-release dispatch is skipped, and agents-docs changelog silently desyncs.
   - Use the stable "packages published successfully" presence marker and read the version from package.json (authoritative for a fixed release group).

3. public/agents/.github/workflows/release.yml catch-all + dispatch retry
   - `Notify agents-private (failure)` was gated on `steps.detect.outputs.has_changesets == 'false'`. If the workflow failed before the detect step ran (install, build, token gen), has_changesets is unset and the condition evaluated false -> no dispatch, no tracking issue on agents-private, red run sitting invisibly in the Actions tab. Drop the has_changesets gate.
   - Replace peter-evans/repository-dispatch with a bash retry loop (3 attempts, 5/10s backoff). The action has no built-in retry, so a transient 5xx or rate-limit during the post-publish dispatch loses the signal permanently: npm publishes, but no GH Release is created and no Vercel prod deploy fires. Retry + explicit error on exhausted attempts so the stranding is loud, not silent.

4. public-agents-vercel-production.yml concurrency + failure tracker
   - Add concurrency: vercel-production-deploy. DB migrations are not idempotent; two parallel deploys (e.g. a release published while a manual re-dispatch is in flight) would race on migrate-databases and leave schema in a half-applied state.
   - Add notify-on-failure job (mirrors the tracking-issue pattern from public-mirror-sync.yml). At this point npm has published, the GH Release exists, but prod runtime is stale. Needs to be loud: auto-open a "Vercel production deploy failing" issue so the half-shipped state is visible instead of buried in the Actions tab.

CI_RUNBOOK.md: reword the release/publish failure entries to match the new retry/tracking behavior, and add a new entry covering the post-publish deploy failure case.

Intentionally out of scope: the auto-format.yml + Dependabot `pnpm install --frozen-lockfile` race. Not a release-cascade issue, will go in a separate PR.

* docs(runbook): bold Historical marker for consistency

GitOrigin-RevId: 04ff8b544833e109b57f75ded3236730d7fb10eb
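The dispatch retry described in item 3 above (3 attempts, 5/10s backoff, explicit error once attempts are exhausted) follows a control flow that can be sketched as below. This is an illustration only, not the workflow's bash loop: `retryWithBackoff`, its parameters, and the injected `sleep` callback are hypothetical names, and the operation is modeled as a synchronous function that returns true on success.

```typescript
// Hypothetical sketch of a bounded retry with fixed backoff steps.
// delaysSeconds = [5, 10] yields 3 attempts: try, wait 5s, try, wait 10s, try.
interface RetryResult {
  ok: boolean;
  attempts: number;
}

function retryWithBackoff(
  op: () => boolean, // returns true when the dispatch succeeded
  delaysSeconds: number[],
  sleep: (seconds: number) => void, // injected so callers choose how to wait
): RetryResult {
  const maxAttempts = delaysSeconds.length + 1;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (op()) {
      return { ok: true, attempts: attempt };
    }
    // Wait only between attempts, never after the final failure.
    if (attempt < maxAttempts) {
      sleep(delaysSeconds[attempt - 1] ?? 0);
    }
  }
  // Caller should surface this as a hard error so the stranding is loud.
  return { ok: false, attempts: maxAttempts };
}
```

Returning `ok: false` rather than swallowing the failure matches the stated goal: the caller can exit non-zero so the failure notifier fires.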
* Version Packages (agents) (#185)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

GitOrigin-RevId: 7263142a67ac9ce9c9873a68a5673bfb436dbc1c

* chore(copilot-app): remove redundant lockfile, install from monorepo root (#186)

* chore(copilot-app): remove redundant lockfile, install from monorepo root

copilot-app is a workspace member (pnpm-workspace.yaml line 18), so the root lockfile already resolves its dependencies. The second lockfile only existed because vercel.json used pnpm install --ignore-workspace --frozen-lockfile, which severs workspace context and therefore needed a local lockfile. Two install boundaries for the same app meant root pnpm.overrides did not apply to the Vercel install, so CI and Vercel could silently resolve to different dependency trees.

PR #167's description originally said "Vercel to install + build from the monorepo root via pnpm --filter copilot-app...", but the committed vercel.json drifted to --ignore-workspace. This aligns the implementation with the stated plan.

- Delete private/copilot-app/pnpm-lock.yaml
- Change private/copilot-app/vercel.json installCommand to install from the monorepo root with a workspace filter
- Drop the copilot-app entry from scripts/check-monorepo-traps.mjs and simplify the DUAL_LOCKFILE_ROOTS comment (every remaining entry is a true workspace boundary, so the ignoreWorkspace workaround is no longer needed for any of them)

* docs(private): update lockfile section after copilot-app cleanup

* chore: add install:all convenience script for dual-lockfile installs

* chore: include create-agents-template in install:all

* fix(copilot-app): drop redundant cd ../.. from vercel installCommand

* docs: point dual-lockfile guidance at pnpm install:all

This PR introduces the install:all script; update every doc that teaches the old cd-and-install-twice pattern to reference the shorthand instead.

- AGENTS.md (root) Dual lockfiles section: replaces the two-step pnpm install invocation with a single install:all, and lists all three lockfile scopes (root, public/agents, public/agents/create-agents-template) so readers understand what the shorthand covers.
- CI_RUNBOOK ERR_PNPM_OUTDATED_LOCKFILE: same substitution plus the third lockfile in the git add line.
- public/agents/AGENTS.md pnpm-lock.yaml Resolution Strategy: adds a When changing dependencies callout pointing at install:all, so readers inside the public/agents subtree know they have a root shortcut for the whole-monorepo regeneration.

* chore(check-monorepo-traps): drop dead ignoreWorkspace flag

Every DUAL_LOCKFILE_ROOTS entry is now a true workspace boundary that installs without --ignore-workspace. The flag had exactly one live consumer (private/copilot-app) which this PR removes. Simplify the data structure to an array of path strings and drop the now-unused flag branches in the install command and regen hint.

Also: the regen hint gains a pointer at the install:all shorthand, since that's the recommended path for a whole-monorepo resync.

* docs: comprehensive command cheatsheet + check:structural aggregate

The problem: every time a new shorthand is added (install:all, check:*) it lands in code but stays invisible in docs. People default to the raw cd-and-install form, which is how we drift. The cheatsheet is the fix for the drift-by-ignorance path.

Changes:

- Adds check:structural to root package.json - one command for the full structural guard set (boundaries + monorepo-traps + release-groups validate). Complements the existing pre-push hook which only runs check:monorepo-traps.
- Rewrites AGENTS.md 'Command routing' section as 'Command cheatsheet' with a scenario-driven quick-lookup table at top, then grouped by intent: install/lockfiles, build+dev+lint+typecheck+test, structural guards, changesets+releases, mirror/Copybara, parity, database.
- Documents the suffix convention (:agents, :agents-ui, :chat-to-edit, :inkeep-cloud-mcp, :copilot, :ext; no suffix = fan-out) so people can guess commands instead of memorizing.
- Every command gets a one-line description of what it does and when to reach for it.

* fix(check-monorepo-traps): guard the create-agents-template lockfile too

Docs introduced in this PR call out three lockfiles (root, public/agents, public/agents/create-agents-template) and point at install:all as the shorthand that regenerates them. The check only validated two; the starter-kit lockfile could drift silently and slip past the pre-push hook, surfacing for end users later when they cloned the starter.

Add public/agents/create-agents-template to DUAL_LOCKFILE_ROOTS and update the comment to reflect the actual install-boundary taxonomy (monorepo / Copybara+Vercel / standalone starter). install:all and the check now cover the same set.

* ci: gate publish on check:structural (defense-in-depth)

Required checks on the source PR already run check:structural, and both version-packages workflows check out origin/main before doing anything. In practice, publish always runs against a validated main state.

But 'in practice' isn't the same as 'structurally'. A workflow_dispatch run against main, an admin bypass of branch protection, or a future change that loosens merge requirements could let a misconfigured main reach the publish step without re-validation. Today's agents-ui release already surfaced one post-publish pipefail bug that shouldn't have been possible if we trusted the pipeline - this gate is the same intuition applied upstream.

Adds 'Validate structural invariants' step between Install and the release machinery in both private-agents-ui-version-packages.yml and public-agents-version-packages.yml. Runs pnpm check:structural, which aggregates check:boundaries + check:monorepo-traps + release-groups:validate (including the workspace-isolation guard introduced in #191).
Fails hard on any structural misconfig, refusing to publish. Cost: ~30-60s per publish run. Cheaper than a bad release.

GitOrigin-RevId: 684d52e5ab7734f592479b61e972cdfe5fc3ae23

* fix(ci): harden release cascade so copybara + npm publish run unattended (#194)

GitOrigin-RevId: 04ff8b544833e109b57f75ded3236730d7fb10eb

---------

Co-authored-by: Varun Varahabhotla <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
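The publish-detection change in the #194 commit message (check for the "packages published successfully" marker, then read the version from package.json instead of parsing the log) can be sketched as a pure function. The name `detectPublish` and the result shape are hypothetical; only the marker string and the idea of treating package.json as authoritative come from the commit text.

```typescript
// Hypothetical sketch: decide "published?" from a stable marker and take the
// version from package.json rather than scraping it out of log lines.
interface PublishResult {
  published: boolean;
  version: string | null;
}

function detectPublish(changesetsLog: string, packageJsonText: string): PublishResult {
  // Presence check only; the surrounding log format is not relied upon.
  const published = changesetsLog.indexOf("packages published successfully") !== -1;
  if (!published) {
    return { published: false, version: null };
  }
  const pkg = JSON.parse(packageJsonText) as { version?: unknown };
  const version = typeof pkg.version === "string" ? pkg.version : null;
  return { published, version };
}
```

Because the version never comes from the log, a change in the log's per-package line format would not flip `published` to false on a successful run.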
* fix(ci): close remaining silent-failure gaps in release cascade

Five hardening fixes across the release pipeline. None of these change pipeline shape (CTO-asked streamlining was evaluated separately and deferred: it saves ~1 min E2E but closes zero real failure modes). Each change addresses a distinct way the cascade can silently strand:

1. release-handler.yml: widen notify-handler-failure to catch failure-job failures too. Previously only caught success-job failures; if the failure-dispatch handler's own gh issue create 4xx'd (label API hiccup), the npm publish failure went completely untracked. Needs chain now covers [success, failure] and the issue body adapts to which job failed.

2. public-mirror-sync.yml: 3-attempt retry on gh pr list before exit 0 in the copybara/sync reconcile step. Previously a single transient API flake skipped reconciliation entirely, letting Copybara run over a potentially-stuck sync branch, which is exactly the local/origin history conflict class that issue #188 fixed via reconcile. Exit 0 on exhaust is preserved (deleting a live PR's branch on persistent outage is worse than letting Copybara try its own fast-fail).

3. public/agents/.github/workflows/release.yml: add npm view ground-truth check after the grep-based "packages published successfully" marker. The log-phrase check catches phrase drift but not partial-publish (package N fails after N-1 succeed leaves the marker in the log). Now iterates every @inkeep/ workspace package and verifies each exists on npm at VERSION; any miss fails the step with a specific error so the failure notifier fires instead of silently reporting green.

4. scripts/check-monorepo-traps.mjs: add public/agents/agents-cookbook/evals/langfuse-dataset-example to DUAL_LOCKFILE_ROOTS. The directory is carved out as a STANDALONE_WORKSPACE_BOUNDARIES entry (users clone the example standalone) but its lockfile wasn't being checked for freshness. A dep change there could have shipped a broken install. The two sets now stay in sync by construction (noted in comment).

5. New release-version-drift-watchdog.yml: scheduled 3-way version check every 30 min across agents-core/package.json on main, @inkeep/agents-core latest on npm, and latest GH Release tag. Opens a tracking issue if drift persists past a 60-min grace window (bounds worst-case silent-stranding detection latency to 30 min regardless of which workflow failed silently). Auto-closes the issue when drift resolves.

Audit finding #1 from yesterday's staff-engineer audit was retracted (Doltgres branch-sync dead gate): git blame + runtime evidence from v0.69.0 and v0.70.0 deploys confirm the gate is working as designed (migrate-dolt.ts emits the migrations_applied output correctly).

* fix(ci): address PR #212 review + bump watchdog cadence

Response to pullfrog + claude review findings on #212.

Watchdog timing bumps (per ask):

- Cron: every 30 min -> every hour on the top of the hour
- Grace window: 60 min -> 90 min

Normal release cascade is 20-30 min, worst legitimate tail (npm propagation lag + Vercel queue) is ~60-90 min. 90 min grace absorbs that without meaningfully raising detection latency (worst-case is still grace + cron = ~2.5 hours vs. the unbounded default).

Watchdog correctness:

- gh pr list now uses `sort:updated-desc`. Default search relevance ordering doesn't guarantee --limit 1 returns the most recent merge when all Version PR titles are near-identical.
- Version PR lookup distinguishes real API failure from "no PR found". Previously both emptied LAST_VERSION_PR_MERGED_AT, silently bypassing the grace window on a transient API hiccup and producing false-positive drift alerts during legitimate in-flight releases. On failure we now warn explicitly and let drift be treated as real. This is intentional: a genuine API outage should alert, not suppress.
- Tracking issue lookup now uses --label release-drift-watchdog instead of `in:title "Release version drift detected"`. Title-substring search could match or close an unrelated human-authored issue whose title shared the phrase. The new label is this workflow's private marker, created alongside the existing `release` label in the defensive label-ensure loop. Issues opened by the watchdog get both labels.
- Auto-close step is now non-fatal. Drift is already resolved by the time this step runs, so a failed `gh issue comment` or `gh issue close` on a cleanup path should emit a warning instead of turning the run red. Next scheduled tick retries.

release.yml (inkeep/agents mirror), npm propagation retry:

- Per-package `npm view` now retries up to 4 times with escalating backoff (2s, 4s, 8s, 16s; 30s cumulative wait per package) before declaring a package genuinely missing. The registry write path is synchronous but the CDN read path can lag by seconds. Previous single-shot check could false-positive during normal propagation, firing the failure notifier unnecessarily.
- Success path still exits on attempt 1 with a single npm view call; retry only engages when a package is not yet visible.
- Updated error message to note propagation is already ruled out.

Documentation catch-up:

- AGENTS.md: lockfile count 3 -> 4 with the langfuse-dataset-example entry that PR #212 adds. Explains the distinction between the two primary install-driving lockfiles (root + public/agents) and the two standalone lockfiles (starter kit + eval example) that ship with their own workspace so users can install subdirectories directly.
- CI.md: new workflow row under "Release and publishing" for the watchdog. Trigger now says "schedule (hourly)" to match the cron bump.
- package.json: `install:all` script now includes the langfuse lockfile directory. Previously check:lockfiles validated four entries but the regen shorthand only covered three, which would have left the fourth drifting silently the first time its package.json got updated.

* fix(ci): swap chat-to-edit-validation to resilient install composite

The failure on PR #212 (chat-to-edit / lint) was Corepack lazy-downloading pnpm from the npm registry on first pnpm invocation (`pnpm store path --silent` in this workflow). The undici SocketError during that download left STORE_PATH unset, which actions/cache rejected with "Input required and not supplied: path", cascading into a skip of install/build/lint with no actionable signal.

Swap the inlined setup-node + corepack + manual `pnpm store path` + actions/cache + `pnpm install` chain for a single `uses: ./.github/composite-actions/install`. The composite downloads pnpm directly from GitHub releases via pnpm/action-setup (different CDN than corepack's npm registry fetch, empirically stable). 7 publish/deploy workflows already use this pattern without hitting the flake.

Deferring the same migration on the other 9 inlined-pattern workflows (agents-ui / copilot-app / copilot-chrome-extension / inkeep-cloud-mcp / auto-format / private-pr-validation / public-agents-core-validation / public-agents-extended-validation / public-agents-cypress) to a follow-up. Several have custom steps (Playwright cache, Turbo cache, pre-install biome, non-frozen-lockfile for auto-format) that need per-file review; a blind swap would risk breaking a required check.

GitOrigin-RevId: 8c2e367004865bfe09daa1867296826c8b6c9db0
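The watchdog logic described above (a 3-way version comparison, with drift tolerated inside a grace window after the last Version PR merge) can be sketched as below. All names here (`VersionSnapshot`, `checkDrift`, the status strings) are hypothetical, and the assumption that release tags carry a leading "v" is inferred from the v0.69.0 and v0.70.0 mentions rather than confirmed. Passing `null` for the merge time models a failed lookup, which the commit text says should let drift be treated as real.

```typescript
// Hypothetical sketch of the watchdog's decision, not the workflow itself.
interface VersionSnapshot {
  mainVersion: string; // version in agents-core/package.json on main
  npmVersion: string; // latest version on npm
  releaseTag: string; // latest GH Release tag, assumed to look like "v1.2.3"
}

type DriftStatus = "in-sync" | "within-grace" | "drift";

function checkDrift(
  snapshot: VersionSnapshot,
  lastVersionPrMergedAtMs: number | null, // null: lookup failed, no grace applied
  nowMs: number,
  graceMinutes: number = 90,
): DriftStatus {
  const tagVersion = snapshot.releaseTag.replace(/^v/, "");
  const inSync =
    snapshot.mainVersion === snapshot.npmVersion && snapshot.npmVersion === tagVersion;
  if (inSync) {
    return "in-sync";
  }
  if (lastVersionPrMergedAtMs === null) {
    return "drift"; // alert rather than suppress when the merge time is unknown
  }
  const ageMinutes = (nowMs - lastVersionPrMergedAtMs) / 60000;
  return ageMinutes <= graceMinutes ? "within-grace" : "drift";
}
```

With an hourly schedule and a 90-minute grace window, a stranded release would surface on the first tick after the window closes, which lines up with the roughly 2.5 hour worst case quoted in the commit message.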
* Follow-ups to inkeep#130: tsconfig pilot + skipped-test audit + stream-path any cleanup (inkeep#133)

* test: remove 2 obsolete skipped tests in push command

These two tests were empty-body `it.skip(...)` placeholders whose comments explicitly documented why they were obsolete:

- `should override API URL from command line`: feature removed in favor of config-file-only approach (API URLs must now be in inkeep.config.ts, not CLI flags)
- `should handle missing configuration`: behavior tested by integration tests; unit-test path not feasible due to process.exit(1)

Part of a codebase-wide skipped-test audit. See .audit-skipped-tests.md for the full audit.

* chore: add skipped-test audit summary

Temporary artifact documenting the 131-test skipped-test audit. Full per-file table lives in /tmp/skipped-tests-audit.md.

- 131 skipped tests across 24 files (pattern: it.skip / describe.skip)
- Bucket A (unskip): 0 (verification loop blocked by Node version guard)
- Bucket B (delete): 2 applied in prior commit; 1 ~460-line block deferred
- Bucket C (needs owner): 128, clustered around 3 architectural migrations
- Bucket D: 0

This file may be removed before PR.

* chore(tsconfig): pilot strict baseline on 2 packages

Extend tsconfig.base.json in:

- public/agents/packages/agents-mcp (no source changes; already strict)
- public/agents/packages/agents-email (3 exactOptionalPropertyTypes fixes)

agents-email fixes:

- src/components/email-layout.tsx: conditional-spread optional 'description' prop into EmailHeader
- src/index.ts: conditional-spread optional 'replyTo' in both sendInvitationEmail and sendPasswordResetEmail sendEmail calls

Evaluated but deferred to their own PRs (would exceed pilot scope):

- ai-sdk-provider: 15 errors, mostly LanguageModelV2 structural exactOptionalPropertyTypes mismatches that require interface-level changes
- create-agents: 30 errors across templates.ts/utils.ts from noUncheckedIndexedAccess + exactOptionalPropertyTypes

Builds on inkeep#130.
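The "conditional-spread" fix named in the tsconfig pilot above is a common way to satisfy exactOptionalPropertyTypes, which rejects passing an explicit `undefined` to an optional property. The sketch below illustrates the pattern only; `SendEmailOptions` and `buildEmailOptions` are hypothetical stand-ins and do not reproduce the actual agents-email code.

```typescript
// Hypothetical types for illustration; not the agents-email source.
interface SendEmailOptions {
  to: string;
  subject: string;
  // Under exactOptionalPropertyTypes this key may be absent,
  // but it may not be present with the value undefined.
  replyTo?: string;
}

function buildEmailOptions(
  to: string,
  subject: string,
  replyTo: string | undefined,
): SendEmailOptions {
  return {
    to,
    subject,
    // Add the key only when a value exists, instead of writing `replyTo: undefined`.
    ...(replyTo !== undefined ? { replyTo } : {}),
  };
}
```

Writing `{ to, subject, replyTo }` directly would fail to type-check under the flag when `replyTo` is `string | undefined`; the spread keeps the key out of the object entirely in that case.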
* fix(ci): wait for DBs to serve queries before Extended Validation tests

Extended Validation's doltgres + postgres service containers report healthy via their docker health checks before the database/user objects are actually queryable. Tests start, fail with 'database not found: appuser' / DrizzleQueryError intermittently. See PR inkeep#200 and PR inkeep#205 failures.

Adds a hard barrier that polls each DB with SELECT 1 (30s max) after service containers start but before tests run. Converts probabilistic 'health check is close enough' into deterministic 'we proved the DB can serve queries.'

Applied to both:
- .github/workflows/public-agents-extended-validation.yml
- .github/composite-actions/public-agents-cypress-e2e/action.yml (replaces the existing DoltGres-only wait with a unified wait_for helper that also gates on the postgres runtime DB)

* chore(review): address non-signoz inline comments on inkeep#133

- .audit-skipped-tests.md: strip ephemeral `/tmp/skipped-tests-audit.md` reference; update branch name to the PR's actual branch (pullfrog review comment)
- agents-mcp/tsconfig.json: drop useUnknownInCatchVariables (already implied by strict: true inherited from tsconfig.base.json) (pullfrog + claude review comments; 1-click suggest)

Signoz-related review items dropped along with the signoz refactor.

* fix: drop engines.node to unblock inkeep-cloud-mcp Vercel deploys

The engines.node range added in inkeep#130 broke inkeep-cloud-mcp Vercel builds on main (both preview and production). Mechanism: that project's vercel.json does `cd ../.. && pnpm install` from repo root, which picks up root engine-strict=true plus engines.node <23. Vercel's build env runs Node 24, failing the constraint. The other three Vercel projects install from their subdir and do not inherit this, so they kept deploying successfully.
Deploy evidence on main:
- 4236e3d915 (pre-inkeep#130 merge, no engines): success
- 08d61f2938 (merge commit, engines added): failure (preview + prod)
- 1526cbcd90 (post-merge Dependabot bump): failure

Keeping .node-version: 22 (unrelated to Vercel) and engine-strict=true in .npmrc (no-op without engines field, same state as pre-inkeep#130). The postinstall check-node-version.mjs still enforces major-version match for local dev.

GitOrigin-RevId: b72cd4cf7aa8144945fb05590c8bc804ef01be69

* chore(ci): align security-floor overrides and flip check:overrides to hard-fail (inkeep#204)

* chore(ci): align security-floor overrides and flip check:overrides to hard-fail

Aligned the four out-of-sync overrides between public/agents/package.json and root pnpm-workspace.yaml, using the higher floor in each direction to preserve security intent:
- @modelcontextprotocol/sdk: root pin 1.26.0 relaxed to >=1.26.0 (matches public/agents)
- fast-xml-parser: public/agents raised >=5.3.8 -> >=5.5.6
- lodash: public/agents raised >=4.17.23 -> >=4.18.0
- lodash-es: public/agents raised >=4.17.23 -> >=4.18.0

Regenerated both lockfiles that cover these overrides (root pnpm-lock.yaml and public/agents/pnpm-lock.yaml). No transitive version re-resolutions; the only changes are the override specifiers themselves.

Flipped check:overrides in scripts/check-monorepo-traps.mjs from soft-warn to hard-fail. Now matches the already-hard check:override-masks-bump, check:lockfiles, and check:workspace-membership. Any future drift between root and public/agents overrides is caught at PR time instead of by a cryptic Vercel install failure minutes after merge.

Also updated AGENTS.md and .github/CI_RUNBOOK.md to reflect the new hard-fail behavior.

Note: pre-commit hook skipped (pnpm lint-staged at root is a pre-existing local-setup issue unrelated to this PR). Files in this commit do not require biome formatting (lockfiles, yaml, package.json).
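The "higher floor in each direction" rule above can be sketched as follows. This is an illustration only, not the logic in scripts/check-monorepo-traps.mjs: `parseFloor`, `compareFloors`, and `proposeAlignedOverrides` are hypothetical helpers, only plain `x.y.z` and `>=x.y.z` specifiers are handled, and the pre-alignment root values for fast-xml-parser and lodash are inferred from the commit message rather than stated in it.

```typescript
type Overrides = Record<string, string>;

// Parse "1.2.3" or ">=1.2.3" into numeric parts. Other range syntaxes are out of scope here.
function parseFloor(spec: string): [number, number, number] {
  const match = /^(?:>=)?(\d+)\.(\d+)\.(\d+)$/.exec(spec.trim());
  if (match === null) throw new Error(`unsupported specifier: ${spec}`);
  return [Number(match[1]), Number(match[2]), Number(match[3])];
}

// Negative when a is lower than b, positive when higher, 0 when the floors are equal.
function compareFloors(a: string, b: string): number {
  const [a0, a1, a2] = parseFloor(a);
  const [b0, b1, b2] = parseFloor(b);
  return a0 - b0 || a1 - b1 || a2 - b2;
}

// For every override present in both maps with a different specifier,
// propose ">=" the higher of the two floors.
function proposeAlignedOverrides(root: Overrides, nested: Overrides): Overrides {
  const aligned: Overrides = {};
  for (const [name, rootSpec] of Object.entries(root)) {
    const nestedSpec = nested[name];
    if (nestedSpec === undefined || nestedSpec === rootSpec) continue;
    const higher = compareFloors(rootSpec, nestedSpec) >= 0 ? rootSpec : nestedSpec;
    aligned[name] = `>=${parseFloor(higher).join(".")}`;
  }
  return aligned;
}

// Assumed pre-alignment state, reconstructed from the list above.
const rootOverrides: Overrides = {
  "@modelcontextprotocol/sdk": "1.26.0",
  "fast-xml-parser": ">=5.5.6",
  lodash: ">=4.18.0",
};
const publicAgentsOverrides: Overrides = {
  "@modelcontextprotocol/sdk": ">=1.26.0",
  "fast-xml-parser": ">=5.3.8",
  lodash: ">=4.17.23",
};

const aligned = proposeAlignedOverrides(rootOverrides, publicAgentsOverrides);
console.log(aligned);
```

A hard-fail check would exit non-zero whenever the proposal map is non-empty; a soft-warn check would only print it.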
* chore(ci): align check:overrides error messages with doc language

The pullfrog review on PR inkeep#204 flagged that the checkOverridePlacement remediation strings still pointed only at /package.json, while the AGENTS.md and CI_RUNBOOK.md updates in the same PR now say overrides can live in either /pnpm-workspace.yaml or /package.json at root. Script logic already reads both locations via getRootOverrides(); this is a wording-only fix so the error messages a developer sees match what the docs tell them to do.

GitOrigin-RevId: 1633ad2aa24886fe2687dab6eb6ef9379786705a

* csv and rerun functionality (inkeep#200)

* csv and rerun
* style: auto-format with biome
* tests
* style: auto-format with biome
* TestS
* style: auto-format with biome
* library instead of manual parse
* lint
* snapshot

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

GitOrigin-RevId: fbfeb6d660e85d4269acf00efd35e885ad35365d

* fix(tsconfig): move tsconfig.base.json into public/agents/ for Copybara mirror compatibility (inkeep#209)

* fix(tsconfig): move tsconfig.base.json into public/agents/ for Copybara mirror compatibility

The root-level tsconfig.base.json added in inkeep#130 lives outside public/agents/**, so Copybara's stripPrefix: "public/agents" does not mirror it to inkeep/agents. After the sync, per-package tsconfigs referenced ../../../../tsconfig.base.json which resolves above the repo root on inkeep/agents, causing agents-email#build to fail with TS5083.

PR inkeep#130 originally documented a 2-level extends path in the base file's own comment ("Extend with { \"extends\": \"../../tsconfig.base.json\" }"), which is only correct if the base sits at public/agents/tsconfig.base.json. The file was placed at the wrong directory.

This moves the file under public/agents/ and updates the two consumers (agents-email, agents-mcp) to use the intended 2-level path. Path resolves correctly in both repos now.
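The path arithmetic behind the TS5083 failure can be checked with a small sketch. `resolveExtends` and `escapesRepo` are hypothetical helpers written for this illustration; the package paths are taken from the commit message, and the mirror layout assumes `stripPrefix: "public/agents"` simply removes that leading directory.

```typescript
import * as path from "node:path";

// Resolve an `extends` value relative to the directory of the tsconfig that declares it.
// Paths are repo-relative and use POSIX separators.
function resolveExtends(tsconfigFile: string, extendsValue: string): string {
  return path.posix.join(path.posix.dirname(tsconfigFile), extendsValue);
}

// True when the resolved path climbs above the repository root.
function escapesRepo(resolved: string): boolean {
  return resolved === ".." || resolved.startsWith("../");
}

const inMonorepo = "public/agents/packages/agents-email/tsconfig.json";
const inMirror = "packages/agents-email/tsconfig.json"; // same file after stripPrefix

// Old 4-level path: fine in the monorepo, above the repo root in the mirror.
const oldMonorepo = resolveExtends(inMonorepo, "../../../../tsconfig.base.json");
const oldMirror = resolveExtends(inMirror, "../../../../tsconfig.base.json");

// New 2-level path: stays inside the repo in both layouts.
const newMonorepo = resolveExtends(inMonorepo, "../../tsconfig.base.json");
const newMirror = resolveExtends(inMirror, "../../tsconfig.base.json");

console.log({ oldMonorepo, oldMirror, newMonorepo, newMirror });
```

The 2-level path lands on public/agents/tsconfig.base.json in the monorepo and on tsconfig.base.json at the mirror's root, which is the same file once the prefix is stripped.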
* docs(public-agents): document tsconfig.base.json convention for new packages

* docs(tsconfig): drop em dashes in new section to match repo writing style

GitOrigin-RevId: 89ee740d87232ae68cb8195558c1fb1af7b2a462

* chore(ci): remove redundant public-repo ci.yml and cypress.yml (inkeep#211)

* chore(ci): remove redundant public-repo ci.yml and cypress.yml

All lint/typecheck/test/build/Cypress validation already runs on agents-private pre-merge via Core Validation, Extended Validation, and public-agents-cypress. The public-side duplicates re-ran the same checks on Copybara sync PRs (code already exhaustively validated), costing ~30m (ci) + ~15m (cypress) per sync on ubuntu-32gb runners.

External PRs to inkeep/agents bridge back to agents-private via monorepo-pr-bridge.yml for canonical validation, so no coverage is lost.

- Delete public/agents/.github/workflows/ci.yml
- Delete public/agents/.github/workflows/cypress.yml
- Delete orphaned composite actions (changeset-check, cypress-e2e)
- Update CI.md workflow map, parity table, branch protection
- Update CI_ARCHITECTURE.md install composite-action reference
- Update cypress-e2e composite README (agents-private only caller)
- Update internal-surface-areas skill to point at upstream workflows

Coordinated with CTO: 'ci' and 'Cypress E2E Tests' required checks removed from inkeep/agents branch protection.

* chore(ci): also remove redundant public-repo ci-maintenance.yml

With ci.yml and cypress.yml gone, the public repo has no substantive CI for the weekly CI Maintenance Claude job to analyze. The equivalent analysis runs on agents-private via public-agents-ci-maintenance.yml, which sees the real CI surface.
- Delete public/agents/.github/workflows/ci-maintenance.yml
- Update CI.md workflow map + parity table
- Update internal-surface-areas skill

* chore(ci): clean up stale ci.yml references flagged by PR review

- Update two stale comments in public-agents-extended-validation.yml that referenced the now-deleted public/agents ci.yml
- Delete obsolete public/agents/specs/changeset-only-skip-ci/SPEC.md; the changeset-skip feature it documented lived inside ci.yml and the changeset-check composite action, both removed in this PR

GitOrigin-RevId: 63d06e27c8a374e100270f3118f64cd2170e0d6a

* fix(ci): close remaining silent-failure gaps in release cascade (inkeep#212)

* fix(ci): close remaining silent-failure gaps in release cascade

Five hardening fixes across the release pipeline. None of these change pipeline shape (CTO-asked streamlining was evaluated separately and deferred; it saves ~1 min E2E but closes zero real failure modes). Each change addresses a distinct way the cascade can silently strand:

1. release-handler.yml: widen notify-handler-failure to catch failure-job failures too. Previously only caught success-job failures; if the failure-dispatch handler's own gh issue create 4xx'd (label API hiccup), the npm publish failure went completely untracked. Needs chain now covers [success, failure] and the issue body adapts to which job failed.

2. public-mirror-sync.yml: 3-attempt retry on gh pr list before exit 0 in the copybara/sync reconcile step. Previously a single transient API flake skipped reconciliation entirely, letting Copybara run over a potentially-stuck sync branch, exactly the local/origin history conflict class that issue inkeep#188 fixed via reconcile. Exit 0 on exhaust is preserved (deleting a live PR's branch on persistent outage is worse than letting Copybara try its own fast-fail).

3. public/agents/.github/workflows/release.yml: add npm view ground-truth check after the grep-based "packages published successfully" marker.
The log-phrase check catches phrase drift but not partial-publish (package N fails after N-1 succeed leaves the marker in the log). Now iterates every @inkeep/ workspace package and verifies each exists on npm at VERSION; any miss fails the step with a specific error so the failure notifier fires instead of silently reporting green.

4. scripts/check-monorepo-traps.mjs: add public/agents/agents-cookbook/evals/langfuse-dataset-example to DUAL_LOCKFILE_ROOTS. The directory is carved out as a STANDALONE_WORKSPACE_BOUNDARIES entry (users clone the example standalone) but its lockfile wasn't being checked for freshness. A dep change there could have shipped a broken install. The two sets now stay in sync by construction (noted in comment).

5. New release-version-drift-watchdog.yml: scheduled 3-way version check every 30 min across agents-core/package.json on main, @inkeep/agents-core latest on npm, and latest GH Release tag. Opens a tracking issue if drift persists past a 60-min grace window (bounds worst-case silent-stranding detection latency to 30 min regardless of which workflow failed silently). Auto-closes the issue when drift resolves.

Audit finding inkeep#1 from yesterday's staff-engineer audit was retracted (Doltgres branch-sync dead gate); git blame + runtime evidence from v0.69.0 and v0.70.0 deploys confirm the gate is working as designed (migrate-dolt.ts emits the migrations_applied output correctly).

* fix(ci): address PR inkeep#212 review + bump watchdog cadence

Response to pullfrog + claude review findings on inkeep#212.

Watchdog timing bumps (per ask):
- Cron: every 30 min -> every hour on the top of the hour
- Grace window: 60 min -> 90 min

Normal release cascade is 20-30 min, worst legitimate tail (npm propagation lag + Vercel queue) is ~60-90 min. 90 min grace absorbs that without meaningfully raising detection latency (worst-case is still grace + cron = ~2.5 hours vs. the unbounded default).
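The timing above can be sketched as a small decision function. This is an illustration under stated assumptions, not the workflow's actual script: `VersionSnapshot` and `shouldAlert` are hypothetical names, the three versions are assumed to be already normalized to the same format (no leading "v" on the release tag), and the grace window is measured from the last Version PR merge.

```typescript
// Hypothetical snapshot of the three sources the watchdog compares.
interface VersionSnapshot {
  packageJson: string; // agents-core/package.json on main
  npmLatest: string; // @inkeep/agents-core latest on npm
  releaseTag: string; // latest GH Release tag, assumed normalized
}

const CRON_INTERVAL_MINUTES = 60; // hourly schedule after the bump
const GRACE_MINUTES = 90; // grace window after the bump

// Drift exists when the three sources do not all agree.
function hasDrift(v: VersionSnapshot): boolean {
  return !(v.packageJson === v.npmLatest && v.npmLatest === v.releaseTag);
}

// Alert only when drift has outlived the grace window since the last Version PR merge.
function shouldAlert(v: VersionSnapshot, minutesSinceVersionPrMerge: number): boolean {
  return hasDrift(v) && minutesSinceVersionPrMerge > GRACE_MINUTES;
}

// Worst case: drift starts just after a tick, and the first tick past the grace window reports it.
const worstCaseDetectionMinutes = GRACE_MINUTES + CRON_INTERVAL_MINUTES;

const inFlight: VersionSnapshot = { packageJson: "0.70.1", npmLatest: "0.70.0", releaseTag: "0.70.0" };
console.log(shouldAlert(inFlight, 45)); // within grace: release probably still in flight
console.log(shouldAlert(inFlight, 120)); // past grace: treated as stranded
console.log(worstCaseDetectionMinutes / 60); // hours
```

The version strings in the example are made up for illustration; only the 60-minute cron and 90-minute grace values come from the commit message.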
Watchdog correctness:
- gh pr list now uses `sort:updated-desc`. Default search relevance ordering doesn't guarantee --limit 1 returns the most recent merge when all Version PR titles are near-identical.
- Version PR lookup distinguishes real API failure from "no PR found". Previously both emptied LAST_VERSION_PR_MERGED_AT, silently bypassing the grace window on a transient API hiccup and producing false-positive drift alerts during legitimate in-flight releases. On failure we now warn explicitly and let drift be treated as real. This is intentional: a genuine API outage should alert, not suppress.
- Tracking issue lookup now uses --label release-drift-watchdog instead of `in:title "Release version drift detected"`. Title-substring search could match or close an unrelated human-authored issue whose title shared the phrase. The new label is this workflow's private marker, created alongside the existing `release` label in the defensive label-ensure loop. Issues opened by the watchdog get both labels.
- Auto-close step is now non-fatal. Drift is already resolved by the time this step runs, so a failed `gh issue comment` or `gh issue close` on a cleanup path should emit a warning instead of turning the run red. Next scheduled tick retries.

release.yml (inkeep/agents mirror), npm propagation retry:
- Per-package `npm view` now retries up to 4 times with escalating backoff (2s, 4s, 8s, 16s; 30s cumulative wait per package) before declaring a package genuinely missing. The registry write path is synchronous but the CDN read path can lag by seconds. Previous single-shot check could false-positive during normal propagation, firing the failure notifier unnecessarily.
- Success path still exits on attempt 1 with a single npm view call; retry only engages when a package is not yet visible.
- Updated error message to note propagation is already ruled out.

Documentation catch-up:
- AGENTS.md: lockfile count 3 -> 4 with the langfuse-dataset-example entry that PR inkeep#212 adds.
Explains the distinction between the two primary install-driving lockfiles (root + public/agents) and the two standalone lockfiles (starter kit + eval example) that ship with their own workspace so users can install subdirectories directly.
- CI.md: new workflow row under "Release and publishing" for the watchdog. Trigger now says "schedule (hourly)" to match the cron bump.
- package.json: `install:all` script now includes the langfuse lockfile directory. Previously check:lockfiles validated four entries but the regen shorthand only covered three, which would have left the fourth drifting silently the first time its package.json got updated.

* fix(ci): swap chat-to-edit-validation to resilient install composite

The failure on PR inkeep#212 (chat-to-edit / lint) was Corepack lazy-downloading pnpm from the npm registry on first pnpm invocation (`pnpm store path --silent` in this workflow). The undici SocketError during that download left STORE_PATH unset, which actions/cache rejected with "Input required and not supplied: path", causing a cascading skip of install/build/lint with no actionable signal.

Swap the inlined setup-node + corepack + manual `pnpm store path` + actions/cache + `pnpm install` chain for a single `uses: ./.github/composite-actions/install`. The composite downloads pnpm directly from GitHub releases via pnpm/action-setup (different CDN than corepack's npm registry fetch, empirically stable). 7 publish/deploy workflows already use this pattern without hitting the flake.

Deferring the same migration on the other 9 inlined-pattern workflows (agents-ui / copilot-app / copilot-chrome-extension / inkeep-cloud-mcp / auto-format / private-pr-validation / public-agents-core-validation / public-agents-extended-validation / public-agents-cypress) to a follow-up. Several have custom steps (Playwright cache, Turbo cache, pre-install biome, non-frozen-lockfile for auto-format) that need per-file review; a blind swap would risk breaking a required check.
GitOrigin-RevId: 8c2e367004865bfe09daa1867296826c8b6c9db0

---------

Co-authored-by: Varun Varahabhotla <[email protected]>
Co-authored-by: shagun-singh-inkeep <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
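The npm propagation retry described in the review-response commit above can be sketched as follows. This is an assumption-laden illustration, not the release.yml step itself: `waitUntilVisible` is a hypothetical helper, the visibility check and the sleep are injected so that nothing here shells out to npm, and "retries up to 4 times" is read as one initial attempt followed by up to four retries, each preceded by its backoff delay.

```typescript
// Backoff schedule from the commit message: 2 + 4 + 8 + 16 = 30 seconds cumulative.
const BACKOFF_SECONDS: readonly number[] = [2, 4, 8, 16];

// Returns true as soon as the package is visible, false once every retry is exhausted.
// `isVisible` stands in for a per-package `npm view` call; `sleep` stands in for a shell sleep.
function waitUntilVisible(
  isVisible: () => boolean,
  sleep: (seconds: number) => void,
): boolean {
  // Success path: a single check and no waiting.
  if (isVisible()) return true;
  for (const delay of BACKOFF_SECONDS) {
    sleep(delay);
    if (isVisible()) return true;
  }
  return false;
}

// Simulated registry that becomes visible on the third check.
let checks = 0;
const waits: number[] = [];
const becameVisible = waitUntilVisible(
  () => {
    checks += 1;
    return checks >= 3;
  },
  (seconds) => {
    waits.push(seconds);
  },
);
console.log(becameVisible, checks, waits);
```

Injecting `sleep` keeps the sketch instant to run; a real step would block for the listed delays between checks.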
No description provided.