
cloning json #157

Merged
robert-inkeep merged 8 commits into main from debug on Sep 13, 2025

Conversation

@shagun-singh-inkeep
Collaborator

No description provided.

@vercel

vercel Bot commented Sep 13, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project            Deployment   Preview   Comments   Updated (UTC)
agents-manage-api  Ready        Preview   Comment    Sep 13, 2025 0:56am
agents-manage-ui   Ready        Preview   Comment    Sep 13, 2025 0:56am
agents-run-api     Ready        Preview   Comment    Sep 13, 2025 0:56am

1 Skipped Deployment

Project      Deployment   Preview   Comments   Updated (UTC)
agents-docs  Skipped                           Sep 13, 2025 0:56am

@vercel vercel Bot temporarily deployed to Preview – agents-docs September 13, 2025 00:50 Inactive
@claude
Contributor

claude Bot commented Sep 13, 2025

Claude finished @shagun-singh-inkeep's task


Code Review: PR #157 - "cloning json"

Review Progress

  • Read modified file and understand changes
  • Research existing codebase patterns and context
  • Review CLAUDE.md for repo guidelines
  • Analyze changes for engineering best practices
  • Provide comprehensive review feedback

Summary

This PR makes several changes to the HTTP request body parsing middleware in the agents-run-api service. The key changes include cloning the request body to avoid consumption issues, adding extensive debug logging for OpenTelemetry baggage, and reorganizing imports.

Analysis

Changes Made:

  1. Request body cloning: Lines 204-206 now clone the request before parsing JSON to extract conversation ID
  2. Import reorganization: Moved Hono import to the top (cosmetic change)
  3. Enhanced debug logging: Added baggage entry logging and verification (lines 233-246)
  4. Unused parameter cleanup: Prefixed unused middleware parameters with underscore
  5. TODO comment: Added //look into comment on line 202

Issues and Recommendations

🔴 Critical Issues

1. Request Body Consumption Race Condition (High Priority)

  • Location: agents-run-api/src/app.ts:204-206
  • Issue: The current approach of cloning c.req.raw and parsing JSON may still consume the original request body, making it unavailable for downstream route handlers
  • Evidence: In Hono, once a request body is read via c.req.json(), it cannot be read again. Even with cloning, there are edge cases where this can fail
  • Impact: Could cause 400 errors or empty request bodies in chat endpoints
  • Recommendation:
    // Better approach - read body once and store it
    const body = await c.req.json().catch(() => null);
    c.set('parsedBody', body); // Make available to downstream handlers
    conversationId = body?.conversationId;

2. Incomplete TODO Investigation

  • Location: agents-run-api/src/app.ts:202
  • Issue: The //look into comment suggests this code needs investigation but provides no context
  • Impact: Technical debt and unclear code intent
  • Recommendation: Either complete the investigation or document what needs to be looked into

🟡 Architecture & Design Concerns

3. Middleware Ordering and Responsibility

  • Issue: The baggage middleware (lines 187-248) has grown complex and handles multiple concerns: context extraction, JSON parsing, baggage setup, and extensive logging
  • Impact: Violates single responsibility principle, harder to test and maintain
  • Recommendation: Consider splitting into focused middleware functions:
    app.use('*', extractConversationContext());
    app.use('*', setupBaggage());
    app.use('*', logBaggageForDebug()); // Only in development

4. Inconsistent Error Handling

  • Location: agents-run-api/src/app.ts:207-209
  • Issue: Silent catch with debug logging vs other middleware that throws or returns errors
  • Pattern: Other parts of the codebase (like contextValidation.ts:414) properly handle JSON parsing failures
  • Recommendation: Align with existing error handling patterns

5. Excessive Debug Logging in Production

  • Location: agents-run-api/src/app.ts:233-246
  • Issue: Logging every baggage entry twice per request will create noise in production logs
  • Impact: Performance overhead and log volume issues
  • Recommendation:
    if (process.env.NODE_ENV === 'development' || process.env.LOG_LEVEL === 'debug') {
      logger.debug({ baggageEntries }, 'Baggage set');
    }

🟢 Code Quality & Best Practices

6. Import Organization

  • Observation: The import reordering (moving Hono to top) improves consistency
  • Recommendation: Consider using an automated import sorter like eslint-plugin-import to maintain consistency

7. Parameter Naming

  • Positive: Properly prefixed unused parameters with underscore (_c) following TypeScript conventions

Comparison with Existing Patterns

Request Body Handling: The existing contextValidation.ts:414 shows the standard pattern:

const body = await c.req.json();
const conversationId = body.conversationId || '';

Your approach with cloning is more defensive but may not solve the underlying issue.

Testing Recommendations

Since this affects critical request processing middleware:

  1. Unit Tests Needed: Test body cloning behavior with various request types
  2. Integration Tests: Ensure downstream handlers can still access request body
  3. Load Testing: Verify the additional logging doesn't impact performance
  4. Error Case Testing: Test behavior when JSON parsing fails

Missing Components (Per Repository Guidelines)

This PR appears to be debugging/maintenance work rather than a new feature, so the standard requirements (tests, UI components, docs) may not apply. However:

  • Tests: No tests added for the new body cloning logic
  • Documentation: The //look into comment should be expanded

Severity Assessment

Overall Assessment

This PR addresses a real issue (request body consumption) but the solution may be incomplete. The extensive debug logging suggests active debugging of baggage/tracing issues, which is valuable for troubleshooting but should be production-ready.

Recommendation: Address the request body consumption issue with a more robust solution before merging, and consider the middleware complexity for future refactoring.


@claude
Contributor

claude Bot commented Sep 13, 2025

Claude finished @shagun-singh-inkeep's task


Code Review Complete

I've analyzed PR #157 "cloning json" across two commits. Here's my comprehensive review as a TypeScript Staff Engineer and System Architect:

Summary of Changes

  1. Import reordering - Moving Hono import to the top
  2. Request cloning for body parsing (agents-run-api/src/app.ts:204-206)
  3. Enhanced debug logging - Added baggage entry logging and verification
  4. Parameter cleanup - Prefixed unused context parameters with underscore

🚨 CRITICAL ISSUE: Request Body Consumption Race Condition

Location: agents-run-api/src/app.ts:204-206

The change from await c.req.json() to c.req.raw.clone().json() fixes a critical race condition where the request body was being consumed twice:

  1. First in this baggage middleware for extracting conversationId
  2. Later in downstream handlers that also need the request body
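The consumption behavior described above can be reproduced with the standard Request API alone (a minimal sketch, independent of Hono; the request and field names are illustrative):

```typescript
// Demonstrates Web API Request body consumption semantics.
// Reading a clone leaves the original body readable exactly once;
// a second read of the original fails because the stream is consumed.
async function demo(): Promise<string[]> {
  const results: string[] = [];
  const req = new Request("http://localhost/chat", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ conversationId: "abc-123" }),
  });

  // Parse the clone first: the original stream stays untouched.
  const cloned = req.clone();
  const fromClone = await cloned.json();
  results.push(`clone read: ${fromClone.conversationId}`);

  // The original body is still readable once.
  const fromOriginal = await req.json();
  results.push(`original read: ${fromOriginal.conversationId}`);

  // A second read of the original rejects: the stream is already consumed.
  const secondRead = await req.json().catch(() => null);
  results.push(
    `second read: ${secondRead === null ? "body already consumed" : "unexpected"}`
  );

  return results;
}
```

This is why cloning before the middleware's parse is necessary, and also why any downstream handler that re-reads the original body must itself never have been preceded by an unguarded read.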

However, there are several concerns with the current implementation:

1. Inconsistent Pattern Across Codebase

  • contextValidation.ts:414 uses await c.req.json() directly
  • ❌ This middleware now uses c.req.raw.clone().json()
  • ❌ Other handlers still use await c.req.json()

Recommendation: Establish a consistent pattern. Consider either:

  • Option A: Use c.req.raw.clone() consistently everywhere request body needs to be read
  • Option B: Implement a middleware that parses body once and makes it available via c.set('parsedBody', body)

2. Error Handling Inconsistency

// Current implementation - silent failure
const body = await cloned.json().catch(() => null);

// vs contextValidation.ts - throws error
const body = await c.req.json(); // No error handling

Recommendation: Align error handling strategies. The silent failure here might mask legitimate JSON parsing errors.

3. Performance Impact

Cloning requests adds overhead. For high-traffic APIs, consider parsing body once in earlier middleware.


🔍 ARCHITECTURAL CONSIDERATIONS

Request Body Parsing Strategy

The codebase lacks a unified approach to request body parsing. Consider implementing:

// Suggested middleware approach
app.use('*', async (c, next) => {
  if (c.req.header('content-type')?.includes('application/json')) {
    try {
      const body = await c.req.json();
      c.set('parsedBody', body);
    } catch (error) {
      // Handle parsing errors consistently
    }
  }
  return next();
});

Baggage Context Management

The extensive baggage logging (lines 233-246) suggests debugging needs. Consider:

  • Production Impact: These logs will be very verbose in production
  • Conditional Logging: Use debug level or environment-based toggling
  • Performance: Multiple baggage operations per request

🛡️ SECURITY & RELIABILITY

Request Cloning Safety

  • ✅ Using .clone() prevents body consumption conflicts
  • ⚠️ Ensure all places that read request body use consistent approach
  • ⚠️ Consider memory implications of cloning large payloads

Error Boundaries

The middleware has good error handling, but consider:

  • What happens if clone() fails?
  • Should malformed JSON fail silently or return 400?
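If the answer is that malformed JSON should surface as a client error, one possible shape is to catch the parse failure explicitly and return a 400 (a sketch using only the standard Request/Response APIs; `handleBody` is a hypothetical helper, not code from this PR):

```typescript
// Sketch: surface malformed JSON as a 400 instead of swallowing it.
async function handleBody(req: Request): Promise<Response> {
  let body: unknown;
  try {
    // Clone before reading so the original body stays available downstream.
    body = await req.clone().json();
  } catch {
    // Malformed JSON: fail loudly rather than masking the error with null.
    return new Response(JSON.stringify({ error: "invalid JSON body" }), {
      status: 400,
      headers: { "content-type": "application/json" },
    });
  }
  return new Response(JSON.stringify({ ok: true, body }), { status: 200 });
}
```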

🔧 TECHNICAL DEBT & MAINTAINABILITY

Comment Quality

Line 202: //look into - This TODO comment should either be:

  • Expanded with specific investigation points
  • Converted to a proper GitHub issue
  • Removed if investigation is complete

Variable Naming

The cloned variable name is clear, but consider clonedRequest for better clarity in complex middleware.


POSITIVE ASPECTS

  1. Problem Recognition: Correctly identified and fixed the request body consumption issue
  2. Debugging Enhancement: Added comprehensive baggage logging for troubleshooting
  3. Code Cleanup: Proper parameter prefixing for unused variables
  4. Non-Breaking: Changes maintain API compatibility

📋 RECOMMENDATIONS

High Priority

  1. Standardize Request Body Parsing: Implement consistent pattern across all handlers
  2. Error Handling Strategy: Align JSON parsing error handling with existing patterns
  3. Remove TODO Comment: Address the //look into comment

Medium Priority

  1. Performance Review: Evaluate baggage logging impact in production
  2. Documentation: Add inline docs explaining the cloning approach
  3. Testing: Add tests for request body consumption scenarios

Low Priority

  1. Import Organization: Consider automated import sorting tools
  2. Monitoring: Add metrics for request parsing failures

🧪 TESTING GAPS

Consider adding tests for:

  • Multiple middleware reading same request body
  • Malformed JSON handling in baggage extraction
  • Large request body cloning performance
  • Baggage context propagation

Overall Assessment: This PR fixes a critical race condition but introduces consistency issues. The core fix is sound, but the implementation should align with broader codebase patterns.

@claude
Contributor

claude Bot commented Sep 13, 2025

Claude finished @shagun-singh-inkeep's task


Code Review: PR #157 - "cloning json"

Analyzing PR #157 as a TypeScript Staff Engineer and System Architect.

Review Progress

  • Read modified file and understand changes
  • Research existing codebase patterns and context
  • Review CLAUDE.md for repo guidelines
  • Analyze changes for engineering best practices
  • Check for common pitfalls and architectural considerations
  • Provide comprehensive review feedback

Summary

This PR removes debug logging statements that were inspecting OpenTelemetry baggage entries. The change removes 13 lines of logging code from the baggage middleware in agents-run-api/src/app.ts.

Changes Analysis

What was removed (lines 233-245):

// Log baggage entries properly
const baggageEntries = Object.fromEntries(
  bag.getAllEntries().map(([key, entry]) => [key, entry.value])
);
logger.info({ baggageEntries }, 'Baggage set');

const ctxWithBag = propagation.setBaggage(otelContext.active(), bag);

// Verify baggage is set in context  
const verifyBag = propagation.getBaggage(ctxWithBag);
const verifyEntries = verifyBag
  ? Object.fromEntries(verifyBag.getAllEntries().map(([key, entry]) => [key, entry.value]))
  : {};
logger.info({ verifyEntries }, 'Context with baggage set');

What remains:

const ctxWithBag = propagation.setBaggage(otelContext.active(), bag);
return otelContext.with(ctxWithBag, () => next());

Review Feedback

✅ Positive Aspects

  1. Cleaner Production Code: Removing debug logging reduces noise in production logs and improves performance by eliminating unnecessary baggage enumeration operations.

  2. Proper Cleanup: This appears to be intentional cleanup of debugging code that was likely added during development of the baggage middleware functionality.

🔍 Areas for Consideration

Medium Priority: Logging Strategy & Observability

The removal of baggage logging eliminates visibility into what context is being propagated through the system. Consider:

  1. Structured Debug Logging: Instead of complete removal, consider conditional debug-level logging:

    if (logger.isLevelEnabled('debug')) {
      const baggageEntries = Object.fromEntries(
        bag.getAllEntries().map(([key, entry]) => [key, entry.value])
      );
      logger.debug({ baggageEntries }, 'OpenTelemetry baggage configured');
    }
  2. OpenTelemetry Integration: Since this system heavily uses OpenTelemetry (as seen in instrumentation.ts with BaggageSpanProcessor), the baggage data should automatically appear in traces via the configured processors, reducing the need for explicit logging.

Low Priority: Error Handling Consideration

The original code had verification logic that could help catch baggage propagation issues. Consider whether any error boundaries are needed around the baggage operations, especially given the existing safety check for test environments (lines 56-57).

Documentation Opportunity

The comment //look into on line 202 suggests this middleware was under active investigation. Consider documenting the final approach in the codebase or removing outdated investigative comments.

🏗️ System Architecture Considerations

  1. Observability Strategy: This change aligns with good observability practices - baggage data will still be captured by the OpenTelemetry instrumentation but won't clutter application logs.

  2. Performance Impact: Positive - eliminating the getAllEntries() calls and object transformations reduces overhead in the hot path of every request.

  3. Consistency: The change maintains consistency with the broader codebase's logging patterns, which appear to use structured logging primarily for business logic rather than infrastructure debugging.

Recommendation

✅ Approve - This appears to be appropriate cleanup of debug logging code. The OpenTelemetry instrumentation setup ensures that baggage data is still observable through distributed tracing without cluttering application logs.

Suggestions for future:

  • Consider adding a debug-level log statement if runtime baggage inspection becomes needed for troubleshooting
  • Ensure the Jaeger tracing setup (referenced in CLAUDE.md debugging commands) provides adequate visibility into baggage propagation for operational needs

@vercel vercel Bot temporarily deployed to Preview – agents-docs September 13, 2025 00:53 Inactive
@shagun-singh-inkeep shagun-singh-inkeep marked this pull request as ready for review September 13, 2025 00:53
@claude
Contributor

claude Bot commented Sep 13, 2025

Claude finished @shagun-singh-inkeep's task


Code Review: PR #157 - "cloning json"

As a TypeScript Staff Engineer and System Architect, I've analyzed this PR and identified several critical issues that need to be addressed before merging.

Todo List

  • Analyze the changes in agents-run-api/src/app.ts
  • Research existing codebase patterns and architecture
  • Review for engineering best practices and potential issues
  • Check for common pitfalls and architectural considerations
  • Provide comprehensive feedback on system design impact

Summary

This PR makes minimal changes to remove underscore prefixes from unused middleware parameters in agents-run-api/src/app.ts. However, based on the PR title "cloning json" and recent commit history, there appear to have been more substantial changes that were later reverted or modified.

Critical Analysis

🔴 Critical Issues

1. Request Body Consumption Race Condition

Location: agents-run-api/src/app.ts:204-206

const cloned = c.req.raw.clone();
const body = await cloned.json().catch(() => null);
conversationId = body?.conversationId;

Issue: The current cloning approach may still cause request body consumption issues. In Hono/Web API, once a request body stream is read, it cannot be read again. Even with cloning, there are edge cases where this fails if:

  • The original request body was already consumed upstream
  • The clone operation fails due to body size or type constraints
  • Downstream handlers expect the original body stream

Evidence from Codebase: The established pattern in contextValidation.ts:414 directly calls await c.req.json() without cloning, suggesting this middleware comes after body parsing is complete.

Recommendation:

// Better approach - coordinate with route handlers
const body = await c.req.json().catch(() => ({}));
c.set('parsedBody', body); // Make available to downstream
conversationId = body?.conversationId;

2. Incomplete Investigation Indicator

Location: Line 202 contains //look into comment

//look into
try {
  const cloned = c.req.raw.clone();

Impact: This suggests the current implementation is experimental or problematic. The lack of context makes it impossible to assess if the implementation is complete or still needs investigation.

🟡 Architecture & Design Concerns

3. Middleware Ordering Dependency Issues

The baggage middleware (lines 187-248) assumes it runs before route handlers that consume request bodies. This creates a fragile dependency chain where:

  • Risk: If middleware order changes, this could break
  • Coupling: Tight coupling between middleware execution order and request parsing
  • Testing: Difficult to test in isolation due to request body state dependencies

Current Pattern Inconsistency:

  • contextValidation.ts:414: Uses await c.req.json() directly
  • app.ts:204-206: Uses cloning approach
  • Other routes: Expect body to be available for parsing

4. Parameter Usage Inconsistency

The changes remove underscore prefixes from unused parameters (_c becomes c), but the parameters are still unused. This creates inconsistency:

Current state:

app.use('/tenants/*', async (c, next) => { // c is unused
  await next();
  await batchProcessor.forceFlush();
});

Recommendations:

  1. Keep _c if parameter is genuinely unused
  2. Or utilize the parameter for enhanced logging/tracing if valuable

🟢 Code Quality Observations

5. Missing Test Coverage

Per repository guidelines in CLAUDE.md, ALL new features must include:

  • Unit tests using Vitest
  • Tests in __tests__ directories

Current Status: No tests found for the baggage middleware changes or request body cloning logic.

6. Error Handling Pattern

The silent catch pattern is inconsistent with other middleware:

const body = await cloned.json().catch(() => null);

vs the more explicit error handling elsewhere in the codebase.


System-Level Architectural Concerns

1. Request Body Parsing Strategy

The system lacks a cohesive strategy for request body parsing across middleware layers:

  • Baggage Middleware: Attempts to parse for conversation ID
  • Route Handlers: Parse for business logic
  • Validation Middleware: Parses for context validation

Recommendation: Implement a single body parsing middleware that stores the parsed result for all subsequent middleware and handlers.
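One framework-agnostic way to realize this is to cache the parsed body keyed by Request identity, so whichever layer asks first pays the parse cost and every later caller reuses it (a sketch; `parseBodyOnce` is a hypothetical helper, not from the codebase):

```typescript
// Hypothetical helper: parse a request's JSON body at most once and
// share the result across middleware layers via a WeakMap cache.
const bodyCache = new WeakMap<Request, Promise<unknown>>();

function parseBodyOnce(req: Request): Promise<unknown> {
  let cached = bodyCache.get(req);
  if (!cached) {
    // First caller parses a clone (leaving the original body untouched);
    // malformed JSON resolves to null so callers can decide how to react.
    cached = req.clone().json().catch(() => null);
    bodyCache.set(req, cached);
  }
  return cached;
}
```

In a Hono app the same idea is usually expressed with c.set('parsedBody', ...) in an early middleware, but the WeakMap form shows the core invariant: one parse per request, regardless of how many consumers ask.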

2. Observability and Debugging

The title suggests this PR addresses JSON cloning for debugging purposes, but:

  • No additional logging or tracing added
  • No debug flags or conditional behavior
  • Unclear how this aids debugging

3. Performance Implications

Cloning request bodies adds overhead:

  • Memory: Duplicate body content in memory
  • CPU: Additional parsing cycles
  • Latency: Extra async operations in hot path

For high-throughput APIs, this could be significant.


Recommendations

Immediate Actions Required

  1. Clarify Intent: Document why cloning is needed vs. alternative approaches
  2. Complete Investigation: Resolve the //look into comment with proper documentation
  3. Add Tests: Implement unit tests for the baggage middleware parsing logic
  4. Coordinate with Routes: Ensure request body is available for downstream consumers

Architectural Improvements

  1. Unified Body Parsing: Implement single middleware for body parsing
  2. Error Boundaries: Add proper error handling for JSON parsing failures
  3. Performance Monitoring: Add metrics for middleware execution time
  4. Documentation: Document middleware execution order and dependencies

Alternative Approaches

Consider these patterns from similar systems:

// Option 1: Single parsing middleware
app.use('*', bodyParsingMiddleware()); // Parses once, stores result

// Option 2: Lazy parsing with caching  
const getBody = () => c.get('cachedBody') ?? parseAndCache();

// Option 3: Stream tee for multiple consumers
const [stream1, stream2] = req.body.tee();
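Option 3 above can be sketched with the standard Web Streams API (illustrative only; both branches of the tee can be consumed independently, which is what makes it viable for multiple body readers):

```typescript
// Sketch of the stream-tee option: split the request body into two
// independent streams so middleware and handler each get a full copy.
async function teeDemo(): Promise<[string, string]> {
  const req = new Request("http://localhost/chat", {
    method: "POST",
    body: JSON.stringify({ conversationId: "abc-123" }),
  });

  // tee() yields two streams backed by the same underlying bytes.
  const [forMiddleware, forHandler] = req.body!.tee();

  const read = async (stream: ReadableStream<Uint8Array>) => {
    // Wrapping in a Response is a convenient way to drain a stream as text.
    const text = await new Response(stream).text();
    return JSON.parse(text).conversationId as string;
  };

  return [await read(forMiddleware), await read(forHandler)];
}
```

A caveat of tee() worth noting: both branches buffer until the slower consumer catches up, so very large bodies can hold memory longer than a single-read approach.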

Conclusion

While the current changes appear minimal (parameter renaming), the context suggests more significant changes were attempted and potentially reverted. The request body cloning approach addresses a real problem but may introduce new issues.

Recommendation: Don't merge until the investigation is complete and the approach is properly tested and documented.

Severity: High - potential for runtime failures in production due to request body consumption issues.

@robert-inkeep robert-inkeep merged commit a0af7d4 into main Sep 13, 2025
7 of 8 checks passed
@robert-inkeep robert-inkeep deleted the debug branch September 13, 2025 00:55
inkeep-oss-sync Bot pushed a commit that referenced this pull request Apr 17, 2026
* chore(ci): clean up stale monorepo-migration artifacts

- Delete public/agents/.github/workflows/coverage.yml.disabled. The
  workflow references package paths (execution-api, management-api,
  agent-builder) that no longer exist post-monorepo-migration, so it
  couldn't be re-enabled as-is. Also delete the companion doc
  public/agents/agents-docs/CI_SETUP.md, which was framed entirely
  around this workflow. Drop the stale row in
  public/agents/.agents/skills/internal-surface-areas/SKILL.md.

- Add *.disabled to the copybara manifest exclude list so any future
  dead workflow files don't silently leak into the public mirror.
  Regenerated public-agents.bara.sky from the manifest.

- Remove the dead chat-to-edit branch trigger from both
  public/agents/.github/workflows/release.yml and
  .github/workflows/public-agents-version-packages.yml. The branch
  returns 404 on both inkeep/agents-private and inkeep/agents.

Out of scope (investigated, confirmed load-bearing): the
agents-cli/** and agents-api/** path filters on public release.yml
stay - both exist as top-level dirs on the public inkeep/agents repo
(hoisted layout) and removing them would skip npm publishes for
CLI/API-only changes.

* Revert chat-to-edit branch removal (load-bearing)

chat-to-edit branch on public inkeep/agents drives a dedicated
snapshot publish step at public/agents/.github/workflows/release.yml
lines 109-117 (pnpm changeset version --snapshot chat-to-edit). The
branch returning 404 on both repos just means nobody has pushed to
it right now; it's created on-demand for dev snapshot publishes.

Keep the trigger in both release.yml and
public-agents-version-packages.yml.

GitOrigin-RevId: 5de85ee7359c4033fa2d3b7cd69910c57e1070ac
inkeep-oss-sync Bot pushed a commit that referenced this pull request Apr 17, 2026
inkeep-oss-sync Bot pushed a commit that referenced this pull request Apr 20, 2026
inkeep-oss-sync Bot pushed a commit that referenced this pull request Apr 20, 2026
Zeeeepa pushed a commit to Zeeeepa/inkeep_agents that referenced this pull request Apr 23, 2026
* Delete duplicate Vercel workflows from public/agents (inkeep#152)

These three files duplicated agents-private's root-level workflows:
  public/agents/.github/workflows/vercel-production.yml
  public/agents/.github/workflows/preview-environments.yml
  public/agents/.github/workflows/preview-janitor.yml

Vercel projects (agents-api, agents-manage-ui, agents-docs) are owned
and deployed from agents-private. Running duplicate workflows on the
copybara-mirrored inkeep/agents repo means every sync fires BOTH
repos' workflows against the same Vercel projects — wasted CI minutes
at best, race-condition deploys at worst (both repos carry
VERCEL_PROJECT_ID secrets targeting the same projects).

The agents-private root-level replacements are:
  .github/workflows/public-agents-vercel-production.yml
  .github/workflows/public-agents-preview-environments.yml
  .github/workflows/public-agents-preview-janitor.yml

Under the new release-split architecture (PR inkeep#144), the public mirror's
only CI responsibility is the npm publish step in release.yml. Deleting
these three keeps that scope clean.

Copybara will propagate the deletions to inkeep/agents on next sync —
the origin_files glob covers public/agents/** with no exclude for
these filenames, so their disappearance from the source will delete
them from the destination.

GitOrigin-RevId: 2fa1f228c728dadf17bedbae88d776a11a8e127b

* Fix Public Mirror Sync: auto-approve sync PRs + handle stuck PRs (inkeep#150)

* Fix Public Mirror Sync: auto-approve sync PRs + handle stuck PRs

Problem observed today: every push to main for the last ~18 hours failed
the Public Mirror Sync workflow with

  Failed to push to inkeep/agents [HEAD:refs/heads/copybara/sync],
  because local/origin history is behind destination

Root cause is two-part:

1. inkeep/agents `main` branch protection was tightened to require one
   approval on every PR (require_last_push_approval=true). Sync PRs
   authored by the inkeep-oss-sync app are never approved, so they sit
   BLOCKED with auto-merge enabled but no approver. Bypass actor status
   on the ruleset does NOT propagate through GitHub's auto-merge path.

2. Once one sync PR is stuck open, copybara on every subsequent agents-
   private push tries to push new commits to `copybara/sync` but the
   branch has diverging history from our local generation, so the push
   is rejected. The existing cleanup step explicitly leaves the branch
   alone while a sync PR is open, so the workflow wedges indefinitely.

Fix:

- Add `public/agents/.github/workflows/auto-approve-sync.yml`. Copybara
  mirrors files under `public/agents/**` into inkeep/agents, so this
  workflow ends up at `.github/workflows/auto-approve-sync.yml` on the
  public repo. It listens on pull_request_target for PRs authored by
  inkeep-oss-sync[bot] against the copybara/sync head ref and posts an
  approving review from github-actions[bot]. Handles the synchronize
  event too so the approval is refreshed after every copybara push
  (require_last_push_approval dismisses prior approvals on new commits).

- Rework the "Clean up stale copybara/sync branch" step into a proper
  reconciliation step. Three cases now handled:
    (a) previous sync PR merged (squashed) — delete the stale branch
    (b) open sync PR older than STALE_PR_HOURS (default 2h) — close
        the PR and delete the branch so this run opens a fresh one
        with all accumulated changes
    (c) open sync PR recent — leave as-is; copybara appends commits
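The three-way decision above can be sketched as a pure function — a minimal illustration, not the actual workflow shell script; the names (`SyncPrState`, `reconcile`, `Action`) are hypothetical:

```typescript
// Hypothetical sketch of the reconciliation decision; the real logic
// lives in a GitHub Actions shell step driven by `gh pr list`.
type SyncPrState =
  | { kind: "merged" }                      // previous sync PR squashed
  | { kind: "open"; ageHours: number }      // sync PR still open
  | { kind: "none" };                       // no sync PR found

type Action = "delete-branch" | "close-pr-and-delete-branch" | "leave-alone";

const STALE_PR_HOURS = 2; // workflow default

function reconcile(state: SyncPrState): Action {
  switch (state.kind) {
    case "merged":
      // (a) the branch is stale; delete it
      return "delete-branch";
    case "open":
      // (b) stuck PR: close it so this run opens a fresh one
      // (c) recent PR: leave it; copybara appends commits
      return state.ageHours > STALE_PR_HOURS
        ? "close-pr-and-delete-branch"
        : "leave-alone";
    case "none":
      return "leave-alone";
  }
}
```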

Staff-engineer framing: (1) fixes the approval gap at the source
(inkeep/agents) so no human is in the loop for routine syncs, while
(2) ensures the agents-private sync workflow self-heals if the approval
path ever breaks again (bot outage, policy tightening, etc.) instead
of silently wedging.

Bootstrap: PR inkeep#3150 (the currently-stuck sync PR) was manually approved
and enqueued to unblock today. Once this PR merges to agents-private
and the next sync lands the auto-approve workflow on inkeep/agents,
subsequent syncs will no longer need manual approval.

* Address review: harden reconcile error handling, elevate warnings, add timeout

Pullfrog + Claude bot review on inkeep#150:

- Separate "gh pr list failed" from "no open PR found" in the reconcile
  step. A silent API failure previously fell through to the branch-
  existence block and could delete the branch of a live PR. Now we exit
  early with ::warning:: and skip reconciliation for this run (Copybara
  may still succeed if the branch state is clean).
- Remove the `exit 0` after closing a stale PR so execution falls through
  to the branch-existence fallback. If the DELETE of copybara/sync fails
  on the primary path, the fallback block gets a second attempt.
- Elevate all cleanup/failure `|| echo` fallbacks to ::warning::
  annotations so operators see them in the workflow summary instead of
  buried in stdout.
- Replace stale comment on the `gh pr merge` call. The oss-sync app's
  bypass-actor status does NOT propagate through auto-merge (that's the
  whole reason auto-approve-sync.yml now exists). Document the real
  mechanism.
- Add `timeout-minutes: 5` to the approve job in auto-approve-sync.yml.

GitOrigin-RevId: 9dcc4f2f123f531a6c303cb8f43a4d096384c736

* fix(create-agents-template): narrow postinstall VERCEL guard to opt-in env var (inkeep#140)

Same class of bug as inkeep#137 (chat-to-edit postinstall), but shipped in the
customer-facing template. Every downstream user who scaffolds this template
and deploys to Vercel hits the same ambient postinstall landmine.

Background: create-agents-template/scripts/postinstall.ts ran

  if (process.env.VERCEL === '1' && !skip) {
    execSync('inkeep dev --export --output-dir ./apps/manage-ui', ...)
    execSync('pnpm -C apps/manage-ui install ...', ...)
  }

The gate `VERCEL === '1'` is true on every Vercel build anywhere. Today
the template only has one Vercel target (apps/agents-api), so the gate
happened to correlate with the intended context. But:

1. Any customer who adds a second Vercel project targeting a different
   app (e.g. apps/mcp) silently fires the export, which assumes
   ./apps/manage-ui exists and `inkeep` is on PATH - neither is guaranteed
   for non-agents-api deploys.
2. Any Vercel deploy that installs with --production skips devDeps.
   `@inkeep/agents-cli` (which provides the `inkeep` CLI) is a devDep,
   so the postinstall would crash.
3. Replicates the exact failure mode that broke inkeep-cloud-mcp production
   in the main monorepo (inkeep#137).

Fix: narrow to an explicit opt-in env var INKEEP_QUICKSTART_EXPORT=1 set
only by the apps/agents-api vercel.json installCommand. The postinstall
requires BOTH VERCEL=1 AND INKEEP_QUICKSTART_EXPORT=1 before running.
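A minimal sketch of the narrowed gate, assuming the env vars named above; the helper name is illustrative, not the actual postinstall code:

```typescript
// Sketch: the postinstall export only runs when BOTH conditions hold —
// we are on Vercel AND the explicit opt-in set by the agents-api
// vercel.json installCommand is present.
function shouldRunQuickstartExport(
  env: Record<string, string | undefined>,
): boolean {
  return env.VERCEL === "1" && env.INKEEP_QUICKSTART_EXPORT === "1";
}
```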

Files:
- scripts/postinstall.ts: require INKEEP_QUICKSTART_EXPORT=1 gate
- apps/agents-api/vercel.json: prepend INKEEP_QUICKSTART_EXPORT=1 to
  existing installCommand

Backward-compat impact: existing Vercel projects customers created from
prior template versions will skip the export on their next deploy until
they update their vercel.json (or re-scaffold from the template). They
have two paths:
- Pull in this template change (agents-api/vercel.json now carries the var)
- Manually add INKEEP_QUICKSTART_EXPORT=1 to their installCommand

Both recover cleanly. Existing apps/manage-ui exports from prior deploys
remain on disk; the only loss is the refresh on the next deploy, which
is repaired by either of the above actions.

GitOrigin-RevId: a5fc5646e7fb2ae2c6f81cdc34453ed89d8191f9

* chore(ci): clean up stale monorepo-migration artifacts (inkeep#157)

* chore(ci): clean up stale monorepo-migration artifacts

- Delete public/agents/.github/workflows/coverage.yml.disabled. The
  workflow references package paths (execution-api, management-api,
  agent-builder) that no longer exist post-monorepo-migration, so it
  couldn't be re-enabled as-is. Also delete the companion doc
  public/agents/agents-docs/CI_SETUP.md, which was framed entirely
  around this workflow. Drop the stale row in
  public/agents/.agents/skills/internal-surface-areas/SKILL.md.

- Add *.disabled to the copybara manifest exclude list so any future
  dead workflow files don't silently leak into the public mirror.
  Regenerated public-agents.bara.sky from the manifest.

- Remove the dead chat-to-edit branch trigger from both
  public/agents/.github/workflows/release.yml and
  .github/workflows/public-agents-version-packages.yml. The branch
  returns 404 on both inkeep/agents-private and inkeep/agents.

Out of scope (investigated, confirmed load-bearing): the
agents-cli/** and agents-api/** path filters on public release.yml
stay - both exist as top-level dirs on the public inkeep/agents repo
(hoisted layout) and removing them would skip npm publishes for
CLI/API-only changes.

* Revert chat-to-edit branch removal (load-bearing)

chat-to-edit branch on public inkeep/agents drives a dedicated
snapshot publish step at public/agents/.github/workflows/release.yml
lines 109-117 (pnpm changeset version --snapshot chat-to-edit). The
branch returning 404 on both repos just means nobody has pushed to
it right now; it's created on-demand for dev snapshot publishes.

Keep the trigger in both release.yml and
public-agents-version-packages.yml.

GitOrigin-RevId: 5de85ee7359c4033fa2d3b7cd69910c57e1070ac

* fix(preview): wire SigNoz to PR preview environments (inkeep#158)

* fix(preview): wire SigNoz to PR preview environments

Preview PR deployments had broken tracing: the manage-ui /api/traces
proxy dropped the bypass secret so SigNoz health checks 401'd, and
neither the Vercel projects nor agents-api had SIGNOZ_URL / otel
exporter config. Adds a shared-observability Railway stack (signoz +
otel-collector + clickhouse) and injects the endpoints + JWT-format
API key at preview env provisioning time.

- upsert-vercel-preview-env.sh: accept PREVIEW_SIGNOZ_URL /
  PREVIEW_SIGNOZ_API_KEY / PREVIEW_SIGNOZ_INGESTION_KEY /
  PREVIEW_OTEL_EXPORTER_OTLP_TRACES_ENDPOINT and upsert into both
  Vercel projects when set.
- public-agents-preview-environments.yml: forward the new secrets
  into the upsert step.
- agents-api signoz.ts: support JWT-format SIGNOZ_API_KEY via
  Authorization: Bearer (the shared preview instance exposes a
  rotating refresh token, not a PAT). Production PATs still go
  through the SIGNOZ-API-KEY header.
- agents-manage-ui /api/traces: forward the manage-api bypass secret
  so SigNoz proxy calls authenticate through to agents-api.

* fix(preview): export OTLP traces without requiring ingestion key

Self-hosted SigNoz collector accepts unauthenticated OTLP, so gating
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT on PREVIEW_SIGNOZ_INGESTION_KEY was
skipping trace export entirely. Split the guards, also wire OTLP env
vars on the manage-ui project (which emits traces too) and add
OTEL_RESOURCE_ATTRIBUTES so preview spans are filterable by PR branch.

* refactor: address PR review polish items

- signoz.ts: document why the 'eyJ' prefix safely discriminates JWT
  refresh tokens from SigNoz PATs (preview-stack workaround for broken
  v0.119 enterprise PAT endpoint).
- traces/route.ts GET handler: reuse extractRequestContext instead of
  duplicating cookie + bypass-secret header construction.

GitOrigin-RevId: 942b6298a47afca1c49c3ae06fb10bce04f4f27f

* Move entitlement lock query to DAL layer (inkeep#113)

* Move entitlement lock query to DAL, remove /auth/ exclusion from boundary check

* style: auto-format with biome

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
GitOrigin-RevId: 6afc2828e2b2bd136280daece631469d44fafe39

* feat(apps): add quick actions editor to support_copilot config (inkeep#164)

* feat(apps): add quick actions editor to support_copilot config

Adds quickActions schema to agents-core, persists the field through the
apps PATCH route, and introduces a drag-and-drop quick actions section
in the support_copilot app form.

* style: auto-format with biome

* fix(apps): address review feedback on support_copilot quick actions

- Add max length constraints (label: 100, prompt: 4000, group: 100) in
  both the agents-core schema and the manage-ui form validation.
- Disable the "Add group" button when the last group has no name, so
  users can't stack empty groups.
- Add screen-reader announcements to both the group-level and
  action-level DndContexts.
- Fix pre-existing a11y lint errors (noLabelWithoutControl,
  noStaticElementInteractions) by switching the action popover to
  FormLabel and moving keyboard shortcuts onto the inputs themselves.

* chore: sync public/agents lockfile for @dnd-kit deps

The initial commit only regenerated the root pnpm-lock.yaml. Vercel
builds from public/agents/ which uses its own lockfile, so the
@dnd-kit/* deps added to agents-manage-ui/package.json must be present
there too or Vercel fails with ERR_PNPM_OUTDATED_LOCKFILE.

* chore(apps): drop unused SupportCopilotQuickActionFormInput type

Only the group-level type is consumed (app-update-form.tsx). The
single-action form type was exported but never imported, flagged by
knip in CI.

* test(api): regenerate openapi snapshot for SupportCopilotQuickAction schemas

Adds the new SupportCopilotQuickAction and SupportCopilotQuickActionGroup
schema entries (with min/max length constraints) and the quickActions
field on SupportCopilotConfig that were introduced for the quick-actions
editor in agents-manage-ui.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
GitOrigin-RevId: 32bce4fd36e8dd4ae12a8ab2ac3607ea29fdd2f1

* Agent conversation self-reference ({{$conversation.id}}) (inkeep#141)

* spec: agent conversation self-reference

Add SPEC.md with β-pure design (template-variable-only) for exposing
{{$conversation.id}} as a prompt template variable.

Includes evidence files (render-site inventory, A2A propagation trace,
prior-art analysis, downstream-surfaces analysis, template-engine details),
audit + design-challenge findings, and full changelog capturing the
β→α→β-pure pivot history with rationale.

Implementation follows in subsequent commits per tmp/ship/spec.json
(US-001 through US-009).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

* [US-001] Add TemplateEngine.renderPrompt() with PromptRenderOptions

New static method resolves $-prefixed paths from runtime-provided builtins
via dotted-path walk, falling through to the existing built-in dispatch on
miss. render() signature and behavior are preserved byte-exact — only
renderPrompt() accepts runtimeBuiltins, structurally enforcing the scope
invariant that non-prompt render sites cannot opt into runtime builtins.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
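The dotted-path walk described above can be sketched roughly as follows — an illustration only, assuming a `$`-prefixed path and a plain-object builtins bag; the names are not the real TemplateEngine API:

```typescript
// Hypothetical sketch of resolving a $-prefixed path from runtime
// builtins. A miss returns undefined, which models the "fall through to
// the existing built-in dispatch" behavior.
type Builtins = Record<string, unknown>;

function resolveRuntimeBuiltin(path: string, builtins: Builtins): unknown {
  if (!path.startsWith("$")) return undefined;
  // "$conversation.id" -> walk builtins.conversation.id segment by segment
  const segments = path.slice(1).split(".");
  let current: unknown = builtins;
  for (const seg of segments) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[seg];
  }
  return current;
}
```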

* [US-002] Wire {{$conversation.id}} resolution in buildSystemPrompt

Switch both agent-prompt render sites in buildSystemPrompt from
TemplateEngine.render() to TemplateEngine.renderPrompt() and narrow
the contextConfigId guard so agents without contextConfig and without
{{$conversation.*}} in their prompt see zero behavior change (G4).

- Extend runtimeContext type literal to include metadata.taskId
- Compute runtimeBuiltins from runtimeContext.metadata.conversationId,
  treating the 'default' sentinel and undefined as empty
- Narrow !resolvedContext path: only render when prompt contains
  {{$conversation., with preserveUnresolved=true so non-$conversation
  literals are preserved
- Add 6 integration tests covering contextConfig present/absent,
  G4 zero-drift regression, and 'default'/undefined sentinel cases
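The sentinel handling in the second bullet can be sketched as — a hedged illustration; the helper name is hypothetical, while the 'default' sentinel and the empty-on-miss behavior come from the commit:

```typescript
// Sketch: compute runtime builtins from a conversation id, treating the
// 'default' sentinel and undefined as empty so agents without a real
// conversation see zero behavior change (the G4 invariant above).
function conversationBuiltins(
  conversationId: string | undefined,
): Record<string, unknown> {
  if (conversationId === undefined || conversationId === "default") {
    return {};
  }
  return { conversation: { id: conversationId } };
}
```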

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-003] A2A delegation propagation test — child resolves {{$conversation.id}} to parent's conversationId

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-004] Negative-path tests — {{$conversation.id}} does not resolve at non-prompt render sites (D6)

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-005] Recognize {{$conversation.id}} in Monaco prompt editor lint + autocomplete

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-006] Preserve $-prefixed template variables through pull-v4 round-trip

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-007] Customer-support cookbook demonstrates {{$conversation.id}} in Zendesk write-back

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

* [US-008] Docs: central template-variable reference page

Adds public/agents/agents-docs/content/typescript-sdk/prompt-template-variables.mdx
documenting all four prompt template variables: contextVariable.*, headers.*,
$env.*, and $conversation.id. Registers the new page in typescript-sdk/meta.json
navigation.

Per SPEC §6 (Must) and §13 Deployment considerations — first centralization of
this vocabulary. Promotes $env.* (currently undocumented) alongside the new
$conversation.id variable so D3's "mirrors $env.* convention" rationale is
grounded in published docs rather than invisible convention.

Spec: public/agents/specs/2026-04-16-agent-conversation-self-reference/SPEC.md

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* [US-009] Changesets: minor bumps for {{$conversation.id}}

Adds four changesets per SPEC §13 / §16 SCOPE:
- agents-core: new TemplateEngine.renderPrompt() + PromptRenderOptions
- agents-api: resolves {{$conversation.id}} in agent prompts; A2A propagation
- agents-cli: preserves $-prefixed template variables through pull-v4 round-trip
- agents-manage-ui: Monaco prompt editor lint + autocomplete recognition

All four bump minor. Changeset config version-locks @inkeep/* packages together
via "fixed" rule. agents-docs and cookbook-templates are in the ignore list;
no changeset needed for them.

Spec: public/agents/specs/2026-04-16-agent-conversation-self-reference/SPEC.md

Co-Authored-By: Claude Opus 4.6 <[email protected]>

* docs: trim prompt-template-variables.mdx to customer-facing variables only

Drops the {{$env.*}} section from the new reference page and shortens the
overall page from 150 → ~65 lines. Page now covers three variables:
contextVariable.*, headers.*, and $conversation.*, linking out to the
existing context-fetchers.mdx and headers.mdx pages for the first two and
fully documenting {{$conversation.id}} inline.

Rationale — $env.* has no traceable spec, PRD, or introducing PR: it
shipped in the root commit of inkeep/agents (c39fdd0, 2025-09-05, initial
squash) as pre-public scaffolding and was preserved without explanation by
inkeep#818 (2025-10-24) when the other builtins were removed. Zero cookbook
templates use it; zero customer-facing docs ever mentioned it. The draft
docs page would have been the first act of promoting it to a customer
feature, without the security/privacy review that promotion warrants.
Rendered env values are visible to the LLM and can flow through to tool
arguments, assistant output, and traces. Keep the
undocumented-but-functional status quo; address the $env.* product
surface in a separate spec if warranted.

Spec updates:
- §6 Functional — "Must: docs page" rewritten: three variables, justification
  restated as single discovery surface + primary home for {{$conversation.id}}
- §9 Proposed solution — Docs bullet rewritten to match
- §10 Decision log — adds D8 (LOCKED): intentional omission of $env.* with
  full git-archaeology rationale
- §16 SCOPE — agents-docs entry names exact file path and meta.json nav step
- meta/_changelog.md — appends post-approval pivot entry explaining the
  trigger (PR inkeep#141 review), evidence, decision, and surgical edits

D3 remains LOCKED on its own merits — dollar-prefix still prevents syntactic
collision with user-defined contextVariable names regardless of whether
$env.* is a documented convention.

Co-Authored-By: Claude Opus 4.6 <[email protected]>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
GitOrigin-RevId: 52d0831b4f6e0c7790597c979f1711a5b1a4cd9c

* Version Packages (agents) (inkeep#138)

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
GitOrigin-RevId: 967594b0c065f561e566bc905512e45385cf7080

* fix: approve sync PRs from source side, drop target-repo auto-approve (inkeep#171)

* fix: approve sync PRs from source side, drop target-repo auto-approve

Moves copybara sync PR approval into public-mirror-sync.yml on agents-private
using the INTERNAL_CI_APP token. Deletes the target-repo auto-approve-sync.yml
workflow it replaces.

Background

PR inkeep#150 introduced auto-approve-sync.yml on the public tree to satisfy the
required-approval rule that blocks copybara sync PRs on inkeep/agents. That
design has a bootstrap problem: pull_request_target evaluates workflows from
the target branch, so the very PR that first mirrors auto-approve-sync.yml
to inkeep/agents/main cannot auto-approve itself. inkeep/agents PR inkeep#3157 is
stuck in exactly that state, and because copybara/sync is blocked by that
stuck PR, subsequent mirror runs on agents-private are failing at "Run
agents mirror".

Fix

Approve from the source side instead. After public-mirror-sync.yml creates
or updates the sync PR and enables auto-merge, a new step mints a second
app token from INTERNAL_CI_APP (a different identity from inkeep-oss-sync,
since GitHub blocks self-approval) and posts an approving review. The
approval is scoped to the current head SHA and re-runs on every sync, which
satisfies require_last_push_approval=true.
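The "skip re-approval when one already stands on the current head SHA" check reduces to a simple guard — names here are illustrative, not the workflow's actual shell:

```typescript
// Sketch: only post a fresh approving review when no existing approval
// is pinned to the PR's current head SHA. require_last_push_approval
// dismisses approvals on new pushes, so stale-SHA approvals don't count.
function needsApproval(
  headSha: string,
  approvals: Array<{ sha: string }>,
): boolean {
  return !approvals.some((a) => a.sha === headSha);
}
```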

- .github/workflows/public-mirror-sync.yml
  Adds Generate approver token step and Approve sync PR step. Skips
  re-approval when an approval already stands on the current head SHA.
- public/agents/.github/workflows/auto-approve-sync.yml
  Deleted; replaced by the source-side step above.

Prereq to verify once: INTERNAL_CI_APP is installed on inkeep/agents with
Pull requests: Read and write. The app already powers Version Packages and
release-handler paths that write to inkeep/agents, so the install is almost
certainly already in place.

* docs: update CI.md and runbook to reflect source-side sync approval

Review feedback on inkeep#171: CI.md and CI_RUNBOOK.md still pointed at the
deleted public/agents/.github/workflows/auto-approve-sync.yml.

- CI.md: drop the row for auto-approve-sync.yml in the public workflows
  table; in the public/private parity table, replace the old row with one
  explaining approval now comes from public-mirror-sync.yml on the source
  side; update the OSS_SYNC_APP secrets row to drop "auto-approve sync"
  and add Copybara sync PR approval to INTERNAL_CI_APP usage.
- CI_RUNBOOK.md: rewrite the "Sync PR on public repo is stuck open" entry
  to point operators at the Approve sync PR step and INTERNAL_CI_APP app
  installation instead of the deleted workflow.

GitOrigin-RevId: 51edfa6a3a1e0863ba99d8498efad88c5dec1f06

---------

Co-authored-by: Varun Varahabhotla <[email protected]>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: omar-inkeep <[email protected]>
Co-authored-by: tim-inkeep <[email protected]>
Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Co-authored-by: inkeep-internal-ci[bot] <259778081+inkeep-internal-ci[bot]@users.noreply.github.com>
Zeeeepa pushed a commit to Zeeeepa/inkeep_agents that referenced this pull request Apr 23, 2026
…nkeep#3177)

* chore: post-migration productivity hardening (tooling, CI, DX)

Rebased onto main after significant churn. Dropped items that became
redundant with inkeep#143 (monorepo trap guards) and inkeep#153 (Dependabot lockfile
auto-sync), kept the rest. Addresses review comments inline.

KEPT (11 items)

Tooling:
- .npmrc: add engine-strict, auto-install-peers, strict-peer-dependencies,
  resolution-mode=highest
- .node-version: 22 -> 22.18.0 (patch pin for reproducibility)
- package.json: preinstall `only-allow pnpm` + postinstall
  `check-node-version.mjs` + `check:node-version` script
- scripts/check-node-version.mjs: hardened against IO errors + malformed
  .node-version values (addresses pullfrog/claude review comments about
  try/catch on readFileSync + lts/* handling)
- turbo.json: globalDependencies now invalidates on root pnpm-lock.yaml,
  .node-version, pnpm-workspace.yaml (was only watching public/agents/)
- tsconfig.base.json: strict baseline for opt-in package migration
  (used by PR inkeep#133)

DX:
- setup-dev.js: validateEnvironmentEarly() fails fast on missing
  ANTHROPIC_API_KEY before any Docker/install work. parseEnvFile
  readFileSync wrapped in try/catch for EACCES resilience.

CI:
- public-agents-extended-validation.yml: turbo affected filter
  `...[origin/base_ref]` on PR events; `merge_group`/`push` keep full run.
  Ported to the new single `turbo check` command structure introduced
  by inkeep#125 (the original diff targeted the pre-inkeep#125 matrix).
- public-agents-cypress.yml + composite action: 4-way deterministic
  shard matrix (no Cypress record key required); gate job fans in on
  default needs behavior.
- private-master-ci.yml: clarifying comment about turbo affected filter
  not applying (workflow_dispatch only).

DROPPED (vs original inkeep#130)

- scripts/check-lockfile-sync.mjs + `check:lockfile-sync` script:
  superseded by inkeep#143's `check-monorepo-traps.mjs lockfiles` which actually
  runs `pnpm install --frozen-lockfile` in both directories (strictly
  stronger than my mtime heuristic). inkeep#153 auto-syncs Dependabot lockfile
  PRs, killing the main scenario this script was protecting.
- biome.jsonc noExplicitAny "off" -> "warn": would break CI because main's
  Core Validation uses `biome lint --error-on-warnings` and there are
  16+ pre-existing `any` usages in agents-docs + agents-cookbook. Defer
  the flip to a separate PR that also grinds down those violations.
- coverage.yml workflow: no team demand surfaced; non-blocking but still
  shows red. Revisit when someone owns coverage tracking.

COMMENTS ADDRESSED

- claude[bot]: IO error handling on readFileSync/statSync/readdirSync in
  check-node-version.mjs + setup-dev.js parseEnvFile -> wrapped with
  graceful fallbacks
- claude[bot]: malformed .node-version (lts/*, latest) -> regex validation
  skips with a warning instead of producing confusing "Required: v" output
- pullfrog[bot]: github.base_ref is only populated on pull_request events
  -> added in-source comment explaining the trap for future maintainers
- pullfrog[bot]: check-lockfile-sync missed public/agents/agents-* layer
  -> moot, file is dropped
- pullfrog[bot]: coverage.yml missing paths filter + prepare:public-agents-build
  -> moot, file is dropped
- claude[bot]: grep -c exit code -> handled by `|| echo 0` fallback (minor,
  no change)

Not addressed (intentional)
- Biome format/explicit-any violations in agents-docs + agents-cookbook
  flagged by PR inkeep#133's run -> pre-existing on main; out of scope for this
  PR. Will surface again when biome.jsonc flip lands.

* fix: address two CI failures on the rebased inkeep#130

1. check-node-version.mjs: skip in CI/Vercel/GitHub Actions. Vercel's
   build env runs Node 24.14.1 regardless of what .node-version says,
   which caused the postinstall hook to reject and fail the install
   with:

     [check-node-version] Node version mismatch
       Required: v22.18.0 (major v22.x)
       Current:  v24.14.1

   The script's purpose is to catch DEVS on the wrong Node locally, not
   to gate deploys — the platform manages Node. Skip when CI=true,
   VERCEL=1, or GITHUB_ACTIONS=true.
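The skip guard can be sketched as — assuming the env flags named above; the function name is illustrative:

```typescript
// Sketch: the version check exists to catch devs on the wrong Node
// locally; CI/Vercel/GitHub Actions manage their own Node, so skip there.
function shouldSkipNodeVersionCheck(
  env: Record<string, string | undefined>,
): boolean {
  return (
    env.CI === "true" ||
    env.VERCEL === "1" ||
    env.GITHUB_ACTIONS === "true"
  );
}
```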

2. public-agents-cypress.yml: strip the public/agents/agents-manage-ui/
   prefix from shard spec paths. cypress runs from manage-ui as cwd
   (via pnpm --dir --filter exec), so repo-root paths double up:

     public/agents/agents-manage-ui/public/agents/agents-manage-ui/cypress/e2e/...

   Now outputs cypress-relative paths like cypress/e2e/agent-prompt.cy.ts.
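The prefix strip amounts to — a sketch assuming the cwd prefix quoted above; the helper name is hypothetical:

```typescript
// Sketch: cypress runs with agents-manage-ui as cwd, so shard spec paths
// must be relative to it, not to the repo root.
const MANAGE_UI_PREFIX = "public/agents/agents-manage-ui/";

function toCypressRelative(specPath: string): string {
  return specPath.startsWith(MANAGE_UI_PREFIX)
    ? specPath.slice(MANAGE_UI_PREFIX.length)
    : specPath;
}
```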

* fix(ci): drop stale `private/_migration-docs/IMPORT_STATUS.md` check

Reintroduced by accident when I ported the workflow over during the
rebase. Main removed this path in inkeep#157 (`chore(ci): clean up stale
monorepo-migration artifacts`). With the line reintroduced, any
workflow_dispatch run would fail at `test -f`.

Addresses claude[bot] CRITICAL review comment on inkeep#130.

* fix(ci): use App bot identity for auto-format commits

The workflow generates an INTERNAL_CI_APP token and pushes with it
specifically so downstream CI fires on the bot's commit. But the
commit is authored as github-actions[bot], which GitHub treats as
a GITHUB_TOKEN commit and suppresses synchronize for regardless of
the push credentials. Seen on inkeep#172: required checks never reported
on the auto-format HEAD and the PR was stuck BLOCKED.

Resolve the App's own bot slug and numeric user id from the newly
minted token and use <slug>[bot] as the committer. Push still uses
the same App token; synchronize fires as intended.

* ci(extended-validation): auto-update OpenAPI snapshot on PRs

Mirrors the pattern already in public/agents/.github/workflows/ci.yml:
when a PR changes agents-api routes, openapi.*, or createApp.*, regenerate
the OpenAPI snapshot and commit it back to the PR branch using the
INTERNAL_CI_APP token so downstream CI re-runs.

Avoids recurring "OpenAPI snapshot mismatch" test failures (e.g. PR inkeep#200)
where contributors add routes without running
`pnpm --filter @inkeep/agents-api openapi:update-snapshot` locally.

- Gated on non-fork PRs (GITHUB_TOKEN on forks is read-only)
- Uses GitHub App token so commits trigger downstream workflows
- Runs before service-container setup so failure modes are cheap

* chore: align .node-version with repo convention, declare engines.node range

Two fixes to the Node pinning block:

.node-version: 22.18.0 -> 22
  The patch pin was an outlier. Every CI workflow in this repo pins
  `node-version: 22` or `22.x` (24 workflows, zero patch pins), every
  vercel.json has no nodeVersion field (Vercel uses 22.x auto-patch),
  and main's .node-version was just `22`. A patch pin creates monthly
  maintenance (fnm re-install + bump PR) without catching any bug the
  major-level pin doesn't.

package.json: add engines.node = ">=22.0.0 <23"
  .npmrc sets engine-strict=true but there was no engines field for it
  to enforce, making it a no-op. This range aligns with the major-level
  convention used everywhere else and makes engine-strict bite when a
  dev is on Node 18/24.

Belt-and-suspenders: postinstall script catches major drift at install
time (already major-only via .split('.')[0]); engines+engine-strict
catches it at dependency-resolution time. Both skip in CI/Vercel.
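The major-only comparison the postinstall script performs (via `.split('.')[0]`) can be sketched as — names illustrative:

```typescript
// Sketch: compare only the major component, so "22" matches "v22.18.0"
// but rejects "v24.14.1". A patch-level pin is deliberately not enforced.
function majorMatches(required: string, current: string): boolean {
  const major = (v: string) => v.replace(/^v/, "").split(".")[0];
  return major(required) === major(current);
}
```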
GitOrigin-RevId: 08d61f29389bfbbb487ed3093999449ca18b9e98

Co-authored-by: Varun Varahabhotla <[email protected]>