Skip to content

feat: durable delegated tool approval flow#2966

Merged
anubra266 merged 8 commits intomainfrom
feat/durable-delegated-tool-approval
Apr 7, 2026
Merged

feat: durable delegated tool approval flow#2966
anubra266 merged 8 commits intomainfrom
feat/durable-delegated-tool-approval

Conversation

@anubra266
Copy link
Copy Markdown
Contributor

@anubra266 anubra266 commented Apr 1, 2026

Summary

  • Adds support for tool approvals in delegated sub-agents running in durable execution mode
    Fixes PRD-6419

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 1, 2026

🦋 Changeset detected

Latest commit: f7312d1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 1, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Apr 6, 2026 9:04pm
agents-docs Ready Ready Preview, Comment Apr 6, 2026 9:04pm
agents-manage-ui Ready Ready Preview, Comment Apr 6, 2026 9:04pm

Request Review

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Apr 1, 2026

TL;DR — Routes tool approval requests from delegated sub-agents through the durable workflow hook system (WDK) instead of the in-memory pub/sub bus, fixes a re-delegation loop where approving a delegated tool caused the parent to re-send the original user message instead of continuing with the tool result, and preserves tool denial context across durable workflow steps so the LLM can acknowledge user redirects.

Key changes

  • Parent-side detection of durable-approval-required artifacts in tool-wrapperwrapToolWithStreaming inspects delegation results for durable-approval-required data artifacts and populates pendingDurableApproval with delegatedApproval context including the sub-agent's actual toolCallId.
  • Post-approval continuation prompt with denial awareness — New isPostApproval flag in the workflow loop sends a continuation message instead of re-sending the original user input, breaking the re-delegation loop. When denials are present, the prompt summarizes what was denied and why so the LLM can acknowledge the user's redirect.
  • Delegated approval forwarding through A2A metadatarelationTools passes durableWorkflowRunId and delegatedToolApproval decision through delegation metadata so the sub-agent receives pre-approved tool calls on re-execution.
  • CredentialStoreRegistry and baseUrl fix in durable execution pathbuildAgentForStep now creates a CredentialStoreRegistry for parity with classic mode and appends /run/agents to the base URL so delegation A2A calls route correctly.
  • Remove in-memory pub/sub for durable delegated pathtool-approval.ts drops the toolApprovalUiBus.publish branch for delegated agents; replaced by the artifact flow.
  • Delegated approval SSE streamingcallLlmStep streams toolInputStart, toolInputDelta, toolInputAvailable, and toolApprovalRequest SSE events using the delegated tool's actual toolCallId so the UI displays the correct tool.
  • Handle AI SDK stopWhen throw for durable approval detectioncallLlmStep wraps agent.generate() in a try/catch so that when the AI SDK's stopWhen callback throws (instead of returning cleanly), pending durable approvals are still detected and surfaced. The same pattern is added to generateTaskHandler for the non-durable A2A path via a shared buildDurableApprovalResult helper.
  • Tool denial context preserved across durable workflow stepsDenialRedirect type tracks denied tool calls with their reason. Denials flow from executeToolStepagentExecutioncallLlmStep so subsequent LLM turns see which tools were denied and can respond appropriately.
  • Improved denied-tool conversation history format — Denied tool results now include [DENIED BY USER] in the header, the original input, and a structured Denial Reason section instead of a bare output line.
  • Extract shared types and constantsPendingDurableApproval interface, DurableApprovalData interface, and DELEGATE_TOOL_PREFIX/TRANSFER_TOOL_PREFIX constants replace inline types and string literals across multiple files.

Summary | 11 files | 6 commits | base: mainfeat/durable-delegated-tool-approval


Parent-side detection of delegated approval artifacts

Before: A delegated sub-agent needing tool approval used the in-memory toolApprovalUiBus which doesn't work across durable workflow boundaries, or threw an error caught by a try/catch wrapper.
After: The sub-agent returns a durable-approval-required data artifact that the parent's tool-wrapper detects after normal execution — no error-path handling needed.

wrapToolWithStreaming scans both message-level parts (result.parts[]) and artifact-level parts (result.artifacts[].parts[]) via a findApprovalRequired helper. When found, it populates ctx.pendingDurableApproval with the delegated approval context including subAgentId, toolCallId, toolName, and args. Delegation results are now stored in conversation history for durable workflows (skipHistoryStorage is false for delegate_to_* tools when durableWorkflowRunId is set), ensuring the next callLlmStep sees the delegation outcome.

The raw artifact data is now cast through a DurableApprovalData interface and tool name matching uses extracted DELEGATE_TOOL_PREFIX / TRANSFER_TOOL_PREFIX constants.

tool-wrapper.ts · agent-types.ts · Agent.ts


Post-approval continuation prompt with denial awareness

Before: After the user approved or denied a delegated tool call, the workflow re-sent the original user message to the LLM — causing it to re-delegate to the sub-agent in an infinite loop. Denials were lost between durable steps.
After: An isPostApproval flag switches the user message to a context-aware continuation prompt. When denials exist, the prompt includes a structured summary of denied tools and reasons so the LLM acknowledges the user's redirect instead of retrying.

Scenario Continuation prompt
Approval (no denials) "Continue the conversation. The tool results above contain the information needed to respond to the user."
One or more denials Lists each denied tool and reason, instructs the LLM to acknowledge the redirect

The flag is set after each approval round in agentExecution.ts and reset on transfer. DenialRedirect objects accumulate across rounds and are passed to callLlmStep via denialRedirects.

agentExecution.ts · agentExecutionSteps.ts


Delegated approval forwarding through A2A delegation metadata

Before: relationTools built delegation metadata without workflow or approval context — the sub-agent had no way to receive pre-approved tool calls on re-execution.
After: durable_workflow_run_id and approved_tool_calls (serialized JSON) are injected into the delegation metadata when the agentRunContext carries them.

Metadata field Source Purpose
durable_workflow_run_id agentRunContext.durableWorkflowRunId Links delegation to the parent's durable workflow
approved_tool_calls agentRunContext.delegatedToolApproval Pre-approves the specific tool call in the sub-agent

The agentRunContext is now threaded through relation-tools.tsrelationTools.tscreateDelegateToAgentTool.

relationTools.ts · relation-tools.ts


CredentialStoreRegistry, base URL fix, and SSE streaming for delegated approvals

Before: Durable execution path passed credentialStoreRegistry: undefined for MCP tools, used the bare API root as baseUrl (missing /run/agents), and had no SSE streaming for delegated tool approval UI events.
After: buildAgentForStep instantiates a CredentialStoreRegistry with default credential stores, appends /run/agents to the base URL so A2A delegation calls route correctly, and callLlmStep emits the full SSE event sequence (toolInputStarttoolInputDeltatoolInputAvailabletoolApprovalRequest) using the delegated tool's toolCallId.

The redundant 'http://localhost:3002' fallback for apiBaseUrl has been removed — env.INKEEP_AGENTS_API_URL is now used directly. The hook token in agentExecution.ts uses delegatedApproval.toolCallId (the sub-agent's actual tool call ID) so the WDK hook matches the approval the UI sends back.

agentExecutionSteps.ts · tool-approval.ts


Handle AI SDK stopWhen throw for durable approval detection

Before: callLlmStep called agent.generate() without a try/catch, relying on stopWhen returning cleanly when a pending approval was detected. If the AI SDK threw instead (e.g. during tool result processing), the pending approval was lost and the workflow errored out.
After: callLlmStep wraps generate() in a try/catch that checks for pendingDurableApproval before re-throwing. The same pattern is added to generateTaskHandler's error handler via a shared buildDurableApprovalResult helper that now accepts the PendingDurableApproval interface.

Why does stopWhen sometimes throw? The AI SDK's stopWhen callback runs during the generation loop. When it signals a stop, the SDK may throw an internal error rather than returning a clean response — depending on where in the tool-call lifecycle the stop occurs. The try/catch ensures pending approvals are captured regardless of how the SDK surfaces the stop.

agentExecutionSteps.ts · generateTaskHandler.ts


Tool denial context preserved across durable workflow steps

Before: When a user denied a tool call, the denial reason was lost between durable workflow steps. The next LLM turn had no visibility into what was denied or why, leading to generic or confused responses.
After: executeToolStep returns denial information via a new DenialRedirect type. agentExecution accumulates denials across approval rounds and passes them to callLlmStep, which injects them into the taskDenialRedirects context and crafts a denial-aware continuation prompt.

The denied-tool conversation history format is also improved — headers now include [DENIED BY USER], the original input is preserved, and a structured Denial Reason section replaces the bare output line.

agentExecutionSteps.ts · agentExecution.ts · tool-result-for-conversation-history.ts

Pullfrog  | View workflow run | Triggered by Pullfrog | Using Claude Opus𝕏

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid approach — the artifact-based signaling pattern for durable delegated approvals is a clean replacement for the in-memory pub/sub, and the isPostApproval continuation prompt correctly breaks the re-delegation loop. Three issues to flag, one potentially breaking.

Pullfrog  | Fix all ➔Fix 👍s ➔View workflow run | Using Claude Opus𝕏

Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated
Comment thread agents-api/src/domains/run/agents/relationTools.ts Outdated
Comment thread agents-api/src/domains/run/agents/tools/tool-wrapper.ts Outdated
Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated
Comment thread agents-api/src/domains/run/agents/generateTaskHandler.ts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 1, 2026

Preview URLs

Use these stable preview aliases for testing this PR:

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(7) Total Issues | Risk: Medium

🟠⚠️ Major (3) 🟠⚠️

Inline Comments:

  • 🟠 Major: tool-wrapper.ts:231 Unsafe type coercion of approval artifact fields without validation
  • 🟠 Major: agentExecutionSteps.ts:555-576 SSE streaming for delegated approvals lacks error handling
  • 🟠 Major: agentExecution.ts:82-85 Durable workflow hook awaits indefinitely without timeout

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: agentExecution.ts:82-86 Improve logging for delegated approval debugging

🕐 Pending Recommendations (3)

From pullfrog's prior review (still applicable):


🚫 REQUEST CHANGES

Summary: The artifact-based signaling pattern is a solid architectural choice for durable delegated approvals, and the isPostApproval continuation prompt correctly addresses the re-delegation loop. However, there are three major issues that should be addressed before merging:

  1. Artifact validation — The durable-approval-required artifact fields are used without validation, which could cause silent corruption or unhelpful runtime errors if a sub-agent returns malformed data.

  2. SSE error handling — The delegated approval SSE streaming block has no error handling. If writes fail, the workflow suspends awaiting approval but the client never receives the approval request UI — a particularly confusing failure mode.

  3. Indefinite workflow suspension — Unlike the non-durable path (10-minute timeout), the durable workflow hook can suspend indefinitely. Consider either adding a timeout or documenting the expected cleanup strategy.

The pullfrog findings about baseUrl convention and type casting are worth addressing as follow-ups but are less critical.

Discarded (3)
Location Issue Reason Discarded
agentExecutionSteps.ts:155-167 Empty catch blocks in enhanceRelation Pre-existing code not introduced by this PR
tool-wrapper.ts:196 Nested artifact parsing without bounds Low likelihood of malicious sub-agent responses in controlled environment
agentExecutionSteps.ts:503 isPostApproval may not survive workflow replays Workflow engine handles step replay correctly; flag is set consistently in workflow scope
Reviewers (3)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-errors 4 0 0 0 2 0 2
pr-review-sre 5 0 0 0 2 0 3
pr-review-standards 0 0 0 0 0 0 0
Total 9 0 0 0 4 0 5

Note: pr-review-standards found no issues — validated that baseUrl change is consistent with existing patterns in executionHandler.ts and data/agents.ts.

Comment thread agents-api/src/domains/run/agents/tools/tool-wrapper.ts Outdated
Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated
Comment thread agents-api/src/domains/run/workflow/functions/agentExecution.ts
Comment thread agents-api/src/domains/run/workflow/functions/agentExecution.ts
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(8) Total Issues | Risk: Medium | Review Scope: Delta (1 commit since last review)

This is a re-review scoped to changes in commit 5aea41ec7 ("fix: handle AI SDK stopWhen throw for durable approval detection").

🟠⚠️ Major (1) 🟠⚠️

Inline Comments:

  • 🟠 Major: agentExecutionSteps.ts:533-537 Swallowing errors silently when pendingDurableApproval is set

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: generateTaskHandler.ts:585-588 Catch block suppresses original error without logging

🕐 Pending Recommendations (6)

From prior automated review (claude):

From pullfrog's prior review:


🚫 REQUEST CHANGES

Summary: The delta commit (5aea41ec7) correctly addresses the AI SDK stopWhen throw behavior by catching the error and extracting the pending approval state. However, silently discarding caught errors without logging creates a debugging blind spot. When errors are suppressed because pendingDurableApproval exists, engineers have no visibility into what actually failed. Add logging before suppressing these errors.

The 6 pending recommendations from the prior review remain unaddressed and should be considered alongside this delta feedback.

Discarded (1)
Location Issue Reason Discarded
agentExecutionSteps.ts:603-605 Error message lacks context Low severity — the check is effectively dead code given the control flow (if catch re-throws, this line is unreachable)
Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-errors 3 0 0 0 2 0 1
Total 3 0 0 0 2 0 1

Note: Delta review — only dispatched error handling reviewer since the changes were focused on error/exception handling patterns.

Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated
Comment thread agents-api/src/domains/run/agents/generateTaskHandler.ts
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a re-review following prior feedback from pullfrog and two automated claude reviews. All previously identified issues have been addressed.

✅ Issues Resolved Since Prior Reviews

Prior Issue Resolution
🟠 Unsafe type coercion of approval artifact fields (tool-wrapper.ts:232-245) Fixed — Added validation with typeof checks and error logging before using delegatedToolCallId and delegatedToolName
🟠 SSE streaming for delegated approvals lacks error handling (agentExecutionSteps.ts:583-611) Fixed — Wrapped SSE writes in try-catch with error logging and re-throw
🟠 Durable workflow hook awaits indefinitely (agentExecution.ts:98-101) Documented — Comment explains the design decision and cleanup strategy via external job
🟡 Improve logging for delegated approval debugging (agentExecution.ts:88-96) Fixed — Added logger.info with hookToolCallId, parentToolCallId, isDelegated, and workflowRunId
🟡 Swallowing errors silently when pendingDurableApproval is set (agentExecutionSteps.ts:545-553) Fixed — Added logger.info capturing error message, error name, requestId, and currentSubAgentId
🟡 Catch block suppresses original error without logging (generateTaskHandler.ts:591-598) Fixed — Added logger.info with toolCallId, toolName, and error message
🟡 baseUrl convention deviation (agentExecutionSteps.ts:117-121) Documented — Comment explains the intentional deviation for durable mode
🟡 Type casting delegationMeta (relationTools.ts:434-452) Accepted — The spread pattern is safe here since we're adding fields, not removing them
🟡 Continuation prompt dependency on conversation history (agentExecutionSteps.ts:507-510) Documented — Comment explains the dependency on skipHistoryStorage fix

Architecture Assessment

The artifact-based signaling pattern for durable delegated approvals is well-designed:

  1. Artifact flow — Sub-agents return durable-approval-required artifacts that the parent's tool-wrapper detects and propagates through pendingDurableApproval
  2. Re-delegation loop preventionisPostApproval flag switches to continuation prompt after approval rounds
  3. Delegation metadata forwardingdurableWorkflowRunId and approvedToolCalls flow through A2A metadata
  4. SSE event streaming — Delegated approvals stream the correct toolCallId from the sub-agent

⚠️ Operational Note

The durable workflow hook (toolApprovalHook.create()) suspends indefinitely by design. The comment at lines 98-101 documents that stale suspended workflows should be cleaned up by an external job querying workflow_executions with status='suspended'. Ensure this cleanup mechanism exists or is planned.

Missing Changeset

The changeset-bot flagged that this PR has no changeset. Since this adds support for tool approvals in delegated sub-agents running in durable execution mode, consider adding:

pnpm bump patch --pkg agents-api "Add support for tool approvals in delegated sub-agents during durable execution"

✅ APPROVE

Summary: All prior review feedback has been addressed. The implementation is solid — artifact-based signaling cleanly replaces the in-memory pub/sub for durable delegated approvals, error handling is comprehensive with logging, and the isPostApproval continuation prompt correctly breaks the re-delegation loop. Ship it! 🚀

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 0 0 0

Note: This re-review validated that all prior findings have been addressed. No new reviewers were dispatched since the diff showed only fixes to previously identified issues.

@github-actions github-actions Bot deleted a comment from claude Bot Apr 2, 2026
@anubra266 anubra266 force-pushed the feat/durable-delegated-tool-approval branch from f6e94ae to e8954e7 Compare April 2, 2026 00:45
@vercel vercel Bot temporarily deployed to Preview – agents-docs April 2, 2026 00:45 Inactive
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low

This is a re-review triggered by CI re-run. The code is unchanged from the prior APPROVE review — all previously identified issues remain resolved.

✅ Verified Fixes (from prior reviews)

Prior Issue Status
🟠 Unsafe type coercion of approval artifact fields ✅ Fixed — tool-wrapper.ts:232-245 has typeof validation
🟠 SSE streaming lacks error handling ✅ Fixed — agentExecutionSteps.ts:583-611 has try-catch with re-throw
🟠 Durable workflow hook awaits indefinitely ✅ Documented — agentExecution.ts:92-95 explains cleanup strategy
🟡 Missing logging for delegated approval debugging ✅ Fixed — agentExecution.ts:85-90 logs hook details
🟡 Silent error swallowing when pendingDurableApproval set ✅ Fixed — agentExecutionSteps.ts:545-553 logs caught errors
🟡 Catch block suppresses original error ✅ Fixed — generateTaskHandler.ts:591-598 logs error context
🟡 baseUrl convention deviation ✅ Documented — agentExecutionSteps.ts:117-121 explains intentional deviation
🟡 Continuation prompt history dependency ✅ Documented — agentExecutionSteps.ts:507-510 explains reliance on skipHistoryStorage

Architecture Assessment

The artifact-based signaling pattern for durable delegated approvals is well-designed:

  1. Artifact flow — Sub-agents return durable-approval-required artifacts that the parent's tool-wrapper detects and propagates via pendingDurableApproval
  2. Re-delegation loop preventionisPostApproval flag switches to continuation prompt after approval rounds
  3. Delegation metadata forwardingdurableWorkflowRunId and approvedToolCalls flow through A2A metadata
  4. SSE event streaming — Delegated approvals stream the correct toolCallId from the sub-agent

⚠️ CI Note

The "Auto Format" workflow failure is a CI infrastructure issue (pnpm setup step), not a code problem.

Missing Changeset

Consider adding a changeset for this feature:

pnpm bump patch --pkg agents-api "Add support for tool approvals in delegated sub-agents during durable execution"

✅ APPROVE

Summary: All prior review feedback has been addressed. The implementation is solid — artifact-based signaling cleanly replaces the in-memory pub/sub for durable delegated approvals, error handling is comprehensive with logging, and the isPostApproval continuation prompt correctly breaks the re-delegation loop. Ship it! 🚀

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 0 0 0

Note: Re-review — validated prior fixes are still present. No new reviewers dispatched.

@amikofalvy
Copy link
Copy Markdown
Collaborator

Follow-up: Duplication beyond tool approval flows

Looking beyond the approval-specific duplication from my previous comment, there are additional sharing opportunities between executionHandler.ts (classic) and agentExecutionSteps.ts (durable).


1. Task creation — ~60 lines identical in both engines (HIGH)

Classic: executionHandler.ts:180-255
Durable: initializeTaskStep in agentExecutionSteps.ts:329-393

Both do the exact same thing:

  • Generate task ID with task_${conversationId}-${requestId}
  • Call createTask() with identical metadata structure (conversationId, threadId, streamRequestId, streamBaseUrl, apiKey, etc.)
  • Handle isUniqueConstraintError by fetching the existing task
  • Read maxTransfers from agentEntry.stopWhen?.transferCountIs ?? AGENT_EXECUTION_TRANSFER_COUNT_DEFAULT

Proposal: Extract a shared createOrReuseTask(payload, defaultSubAgentId) that both engines call.


2. Stream/session registration — 3-line sequence repeated 3 times (LOW effort, HIGH clarity)

The sequence:

registerStreamHelper(requestId, sseHelper);
agentSessionManager.createSession(requestId, executionContext, conversationId);
if (emitOperations) agentSessionManager.enableEmitOperations(requestId);

Appears in:

  • executionHandler.ts:105-111
  • callLlmStep in agentExecutionSteps.ts:474-477
  • executeToolStep in agentExecutionSteps.ts:826-829

Proposal: setupStreamSession(requestId, sseHelper, executionContext, conversationId, emitOperations)


3. OTel baggage propagation — identical 5-line block within durable path (MEDIUM)

const bag = (propagation.getBaggage(otelContext.active()) ?? propagation.createBaggage())
  .setEntry('conversation.id', { value: conversationId })
  .setEntry('tenant.id', { value: tenantId })
  .setEntry('project.id', { value: projectId })
  .setEntry('agent.id', { value: agentId });
const ctxWithBaggage = propagation.setBaggage(otelContext.active(), bag);

Identical in callLlmStep:487-493 and executeToolStep:832-838. Classic doesn't do this at all.

Proposal: createAgentOTelContext(conversationId, tenantId, projectId, agentId)


4. Runtime context object — identical 8-line object within durable path (MEDIUM)

const runtimeContext = {
  contextId: conversationId,
  metadata: {
    conversationId, threadId: conversationId, taskId,
    streamRequestId: requestId, apiKey: executionContext.apiKey,
  },
};

Identical in callLlmStep:504-515 and executeToolStep:844-853.

Proposal: buildStepRuntimeContext(conversationId, taskId, requestId, apiKey)


Not duplication (structurally forced by durable architecture)

  • buildAgentForStep — must rebuild agents per step (process may restart). Classic doesn't build agents in-process.
  • In-process tool execution — durable calls toolDef.execute() directly; classic delegates via A2A HTTP.
  • Transfer detection — both detect transfer_to_ but through different APIs (hasToolCallWithPrefix vs isTransferTask). Similar logic, but the response shapes differ enough that sharing is non-trivial.

Combined extraction summary (both comments)

Extraction Effort Deduplicates
streamToolApprovalRequest() Small 4 copies of SSE approval sequence
createOrReuseTask() Small ~60 lines × 2 engines
extractDelegatedApproval() Small 70-line durable block in shared tool-wrapper
setupStreamSession() Trivial 3-line sequence × 3 locations
createAgentOTelContext() Trivial 5-line block × 2 steps
buildStepRuntimeContext() Trivial 8-line block × 2 steps
Canonical approval types Trivial 5+ inline type re-declarations
serialize/deserializeApprovalDecisions() Small Fragile JSON round-trip across files

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(6) Total Issues | Risk: Medium | Review Scope: Delta + Human Feedback

This is a re-review incorporating the delta since the last automated review (46a02c31aa74daba31) and addressing the human reviewer's (amikofalvy) outstanding feedback.

Delta Analysis

Commit a74daba31: "Fix tool denial context lost across durable workflow steps"

The delta adds preservation of tool denial context across durable workflow step boundaries:

  1. tool-result-for-conversation-history.ts — Improved denied tool result formatting to include [DENIED BY USER] marker and show both input and denial reason
  2. agentExecution.ts — Added denialRedirects array accumulator to preserve denials across workflow iterations
  3. agentExecutionSteps.ts — Added DenialRedirect type, denialRedirects param, and post-approval prompt that acknowledges denials

Assessment: The delta changes are well-structured. The denial context flows correctly: executeToolStep returns denials → agentExecution accumulates them → callLlmStep uses them to construct an appropriate post-approval prompt. This correctly addresses the issue where tool denial context was lost.

🟡 Minor (4) 🟡

🟡 1) agentExecution.ts:90 Use logger utility instead of console.info

Issue: Uses console.info directly while the rest of the codebase uses structured logger.

Why: Bypasses structured logging, making production debugging harder.

Fix: Import and use logger utility.

Refs: Human review comment

🟡 2) agentExecutionSteps.ts:128 Default localhost URL should come from env.ts

Issue: Hardcoded 'http://localhost:3002' fallback appears in multiple places.

Why: Scattered defaults create inconsistency and maintenance burden.

Fix: Source the default from env.ts as the single source of truth.

Refs: Human review comment

🟡 3) tool-wrapper.ts:113 Extract delegate_to_ prefix to constant

Issue: String literal 'delegate_to_' is duplicated across multiple files.

Why: Creates maintenance burden and typo risk.

Fix: Define DELEGATE_TO_PREFIX constant in a shared location.

Refs: Human review comment

🟡 4) tool-wrapper.ts:232 Type validation is defensive but types are weak

Issue: Runtime typeof checks are needed because artifact parsing produces unknown types.

Why: The validation is necessary given the untyped artifact data, but the human reviewer is right that stronger types would be better.

Fix: Define a Zod schema for the durable-approval-required artifact shape and use safeParse for validation, which provides both type safety and runtime validation.

Refs: Human review comments

💭 Consider (1) 💭

💭 1) agent-types.ts:273-289 Extract shared approval types

Issue: The pendingDurableApproval.delegatedApproval shape and DelegatedApprovalContext in agentExecutionSteps.ts are duplicates.

Why: amikofalvy's analysis identified 5+ inline type re-declarations for approval-related shapes. Canonical types would reduce duplication.

Fix: Define and export DelegatedApprovalContext and ToolApprovalDecision types in agent-types.ts, import everywhere.

Refs: amikofalvy's duplication analysis

🕐 Pending Recommendations (6)

From amikofalvy's review (active, unaddressed):

✅ Resolved from Prior Automated Reviews

All technical issues from prior automated reviews have been addressed:

  • ✅ Artifact validation with typeof checks — present at tool-wrapper.ts:232-245
  • ✅ SSE streaming error handling — present at agentExecutionSteps.ts:647-655
  • ✅ Durable hook timeout documented — present at agentExecution.ts:97-100
  • ✅ Error logging when suppressing — present at agentExecutionSteps.ts:586-594
  • ✅ baseUrl deviation documented — present at agentExecutionSteps.ts:129-132

✅ Changeset Present

The PR now includes a changeset (tiny-eels-burn.md) for the fix.


💡 APPROVE WITH SUGGESTIONS

Summary: The delta correctly preserves tool denial context across durable workflow steps. The feature implementation is solid — artifact-based signaling cleanly replaces the in-memory pub/sub for durable delegated approvals, and the isPostApproval continuation prompt correctly handles both approvals and denials.

Outstanding items: amikofalvy's 6 active code comments should be addressed. These are reasonable code quality improvements:

  1. console.info → logger — Simple fix, ensures structured logging
  2. Hardcoded localhost:3002 — Should come from env.ts
  3. String literal duplication — Extract delegate_to_ to constant
  4. Weak typing — Consider Zod schemas for approval artifact validation

None are blocking bugs, but resolving them before merge would improve maintainability.

Inline Comments:

  • 🟡 Minor: agentExecution.ts:90 Use logger utility
  • 🟡 Minor: agentExecutionSteps.ts:128 Env.ts for default URL
  • 🟡 Minor: tool-wrapper.ts:113 Extract delegate prefix constant
  • 💭 Consider: agent-types.ts:273-289 Shared approval types
Discarded (2)
Location Issue Reason Discarded
agentExecutionSteps.ts:630 Inline for loop for chunking instead of chunkString helper Valid duplication concern raised by amikofalvy, but addressed as part of broader streamToolApprovalRequest extraction proposal — a follow-up refactor, not a blocking issue
generateTaskHandler.ts:631 console.error instead of logger Pre-existing code not introduced by this PR
Reviewers (2)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator 6 0 1 0 4 6 2
pr-review-consistency 0 0 0 0 0 0 0
Total 6 0 1 0 4 6 2

Note: This review focused on synthesizing the human reviewer's feedback and validating the delta changes. No sub-reviewers were dispatched since the primary work was addressing amikofalvy's outstanding comments.

Comment thread agents-api/src/domains/run/workflow/functions/agentExecution.ts
Comment thread agents-api/src/domains/run/workflow/steps/agentExecutionSteps.ts Outdated
Comment thread agents-api/src/domains/run/agents/tools/tool-wrapper.ts Outdated
Comment thread agents-api/src/domains/run/agents/agent-types.ts Outdated
@github-actions github-actions Bot deleted a comment from claude Bot Apr 6, 2026
@vercel vercel Bot temporarily deployed to Preview – agents-docs April 6, 2026 19:35 Inactive
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low | Review Scope: Delta

This is a re-review scoped to the delta since the last automated review (a74daba3135decfaf4).

Delta Analysis

Single commit: 35decfaf4 Address review: extract tool prefix constants, add shared types, remove redundant localhost fallback

This commit directly addresses amikofalvy's human review feedback:

Review Comment Resolution
Extract tool prefix to constant ✅ Added DELEGATE_TOOL_PREFIX and TRANSFER_TOOL_PREFIX constants in tool-wrapper.ts:14-15
Enforce stronger types for approval data ✅ Added DurableApprovalData interface in tool-wrapper.ts:17-28
Better types in generateTaskHandler.ts ✅ Now uses imported PendingDurableApproval type
Better types in Agent.ts getPendingDurableApproval() returns PendingDurableApproval type
Remove hardcoded localhost:3002 default ✅ Removed fallback at agentExecutionSteps.ts:128

✅ All Human Review Comments Addressed (5/6)

The delta resolves 5 of 6 human review comments. The remaining item (console.info → logger at agentExecution.ts:90) is in a file that was not modified in this commit — it was part of the earlier feature implementation that prior automated reviews already covered and approved.

Architecture Assessment

The type improvements are well-structured:

  • PendingDurableApproval — Canonical shared type in agent-types.ts:225-235 eliminates inline type re-declarations
  • DurableApprovalData — Provides type safety for artifact parsing in tool-wrapper.ts
  • ConstantsDELEGATE_TOOL_PREFIX / TRANSFER_TOOL_PREFIX reduce string literal duplication

🕐 Pending Recommendations (1)

From amikofalvy's review (file not in delta):

This was flagged in prior reviews and is still applicable, but the file wasn't touched in this commit. Consider addressing as a follow-up or in the same PR before merge.


✅ APPROVE

Summary: The delta commit successfully addresses the human reviewer's feedback — type safety improvements, constant extraction, and removal of the hardcoded localhost fallback. All prior technical issues from automated reviews remain resolved. The only outstanding item is the console.info → logger change at agentExecution.ts:90, which is a minor improvement that can be addressed before merge or as a follow-up. Ship it! 🚀

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 0 1 0

Note: Delta review — no sub-reviewers dispatched since the changes were direct fixes to prior human review feedback. Verified all type improvements are correct and follow existing patterns.

@anubra266
Copy link
Copy Markdown
Contributor Author

anubra266 commented Apr 6, 2026

@amikofalvy For this PR scope: the type extraction (PendingDurableApproval) is done in the latest commit. The remaining extractions (createOrReuseTask, setupStreamSession, createAgentOTelContext, buildStepRuntimeContext, streamToolApprovalRequest) are cleanup work that would benefit from their own PR to keep the review surface focused. Happy to tackle those as a follow-up.

@anubra266 anubra266 requested a review from amikofalvy April 6, 2026 19:51
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(1) Total Issues | Risk: Low | Review Scope: Delta

This is a re-review scoped to the delta since the last automated review (a9be51148f7312d16a).

Delta Analysis

The delta consists of:

  • Merge commit (f7312d16a) — merging main into the feature branch (23 commits from main)
  • Feature code unchanged — all 11 feature files have no modifications in this delta

The merged changes from main include unrelated work: feedback system, scheduled triggers, model resolver fixes, React compiler updates, project deletion improvements, etc. None of these interact with the durable delegated tool approval feature.

🟡 Minor (1) 🟡

Inline Comments:

  • 🟡 Minor: agentExecution.ts:90-95 Use structured logger instead of console.info

✅ Prior Issues Resolved

All issues from prior reviews have been addressed:

Prior Issue Status
🟠 Type extraction (PendingDurableApproval) ✅ Done in 35decfaf4
🟠 Artifact validation with typeof checks ✅ Present at tool-wrapper.ts:250-262
🟠 SSE streaming error handling ✅ Present at agentExecutionSteps.ts:647-655
🟠 Durable hook timeout documented ✅ Present at agentExecution.ts:97-100
🟡 Hardcoded localhost:3002 removed ✅ Done in 35decfaf4
🟡 Tool prefix constants extracted DELEGATE_TOOL_PREFIX/TRANSFER_TOOL_PREFIX in tool-wrapper.ts:14-15
🟡 DurableApprovalData interface added ✅ Present at tool-wrapper.ts:17-28
🟡 Error logging when suppressing ✅ Present at agentExecutionSteps.ts:586-594

⚠️ Ito Test Failure Note

The Ito test report flagged that the reconnect endpoint (/run/api/executions/{executionId}/stream) misses post-approval continuation events. This is a valid finding but it's a pre-existing issue in executions.ts:344 that exists independently of this PR:

// executions.ts:344 — does NOT use namespace
const readable = run.getReadable({ startIndex });

// chatDataStream.ts:183-194 — DOES use namespace correctly
const namespace = (durableExecution.metadata as any)?.continuationStreamNamespace;
const readable = run.getReadable({ namespace });

This PR correctly implements namespace handling in the /run/api/chat path but doesn't modify executions.ts. Consider filing a follow-up issue to align the reconnect endpoint behavior.

Architecture Assessment

The artifact-based signaling pattern for durable delegated approvals is well-designed:

  1. Artifact flow — Sub-agents return durable-approval-required artifacts that the parent's tool-wrapper detects and propagates through pendingDurableApproval
  2. Re-delegation loop preventionisPostApproval flag switches to continuation prompt after approval rounds
  3. Delegation metadata forwardingdurableWorkflowRunId and approved_tool_calls flow through A2A metadata
  4. SSE event streaming — Delegated approvals stream the correct toolCallId from the sub-agent
  5. Denial context preservationDenialRedirect type tracks denied tools across workflow steps

Human Reviewer Follow-up Items

amikofalvy's analysis identified duplication opportunities for follow-up (acknowledged by author as out of PR scope):

  • streamToolApprovalRequest() helper (4 copies of SSE approval sequence)
  • createOrReuseTask() (60 lines × 2 engines)
  • setupStreamSession() (3-line sequence × 3 locations)
  • Canonical types for approval shapes

✅ APPROVE

Summary: The delta contains only a merge from main — no modifications to the feature implementation. All prior review feedback has been addressed and verified. The one remaining minor issue (console.info → logger) is low-severity and doesn't block merge.

The implementation is solid: artifact-based signaling cleanly replaces the in-memory pub/sub for durable delegated approvals, comprehensive error handling with logging, and the isPostApproval continuation prompt correctly breaks the re-delegation loop. The Ito test failure is a pre-existing issue in a different file — recommend filing a follow-up issue. Ship it! 🚀

Discarded (1)
Location Issue Reason Discarded
executions.ts:344 Missing namespace in reconnect stream Pre-existing issue not introduced by this PR — recommend follow-up issue
Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Total 0 0 0 0 1 0 1

Note: Delta review — no sub-reviewers dispatched since feature code is unchanged. Only verified prior fixes remain in place and analyzed the merge commit scope.

parentToolCallId: toolCall.toolCallId,
isDelegated: !!llmResult.delegatedApproval,
workflowRunId,
});
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: Use structured logger instead of console.info

Issue: Uses console.info directly while the rest of the codebase uses the structured logger utility.

Why: Using console.info bypasses structured logging, making production debugging harder. Log aggregation systems can't filter/search by logger name, and the output lacks the consistent formatting used elsewhere.

Fix:

import { getLogger } from '../../../../logger';
const logger = getLogger('agentExecution');

// Then at line 90:
logger.info({
  hookToolCallId,
  parentToolCallId: toolCall.toolCallId,
  isDelegated: !!llmResult.delegatedApproval,
  workflowRunId,
}, 'Creating tool approval hook');

Refs:

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Apr 6, 2026

Ito Test Report ✅

15 test cases ran. 2 additional findings, 13 passed.

Across 15 end-to-end and code-verified cases, 13 passed and core durable delegated-approval behavior was broadly validated: approval and denial continuations worked, denied-result formatting and history persistence guards were correct, endpoint-based resume by executionId/toolCallId succeeded, race/navigation/mobile UX scenarios were stable and idempotent, and stale approval replay plus guessed stream probing showed no data-leak side effects. The two key defects were a High-severity bug in POST /run/api/chat where batched approvals are incorrectly scoped to one latest conversation execution (dropping additional toolCallId decisions and leaving runs inconsistent) and a Medium-severity auth issue where unauthenticated /run/api requests in dev/test fall through to a fabricated default context and return 404 Project not found instead of 401/403.

✅ Passed (13)
Category Summary Screenshot
Adversarial Stale/unknown approval replay requests were safely rejected with no execution resume side effects. ADV-1
Adversarial Guessed execution/conversation stream probes returned safe responses with no foreign data leakage. ADV-5
Edge Duplicate approve actions are handled idempotently with single-resolution semantics. EDGE-1
Edge Approve/deny race resolves to one terminal decision and treats follow-up as already processed. EDGE-2
Edge Suspended durable approvals can be recovered and resumed after refresh/reconnect. EDGE-3
Edge Navigation churn does not create duplicate approval continuations due to UI and backend guards. EDGE-4
Edge Mobile viewport approval UI remained operable; prior blocked run was a tooling timeout artifact, not a product defect. EDGE-5
Logic Denied tool-result formatting includes explicit denial markers and fields. LOGIC-1
Logic Durable delegation history persistence guard prevents re-delegation loops. LOGIC-2
Logic Post-approval denial guidance injection is implemented in workflow and task handling. LOGIC-3
Happy-path Durable approval response handling is implemented and targeted verification passed. ROUTE-1
Happy-path Durable denial continuation handling is implemented and validated. ROUTE-2
Happy-path Durable execution resumed and completed after delegated approval via /run/api/executions/:executionId/approvals/:toolCallId. ROUTE-3
ℹ️ Additional Findings (2)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Adversarial 🟠 Unauthenticated run API calls fell through to a default dev context and returned project-not-found 404 instead of auth rejection. ADV-2
Happy-path ⚠️ Batch approval responses in one durable /run/api/chat request only resume one execution; additional toolCallIds are ignored. ROUTE-4
🟠 Unauthenticated run API requests return 404 via dev fallback instead of 401/403
  • What failed: Unauthenticated requests were expected to return 401/403, but middleware injected fallback test tenant/project context and requests continued until project lookup failed with 404 Project not found.
  • Impact: Unauthorized requests are not rejected at the auth boundary in development/test mode, which can mask real authorization regressions and produce misleading response semantics. This weakens confidence in authz-negative testing and can hide security issues until later environments.
  • Steps to reproduce:
    1. Start the API in a development or test environment.
    2. Send POST /run/api/executions//approvals/ without Authorization header or session cookies.
    3. Send POST /run/api/chat with an approval-responded payload from the same unauthenticated context.
    4. Observe that requests proceed past auth and fail with 404 Project not found instead of 401/403.
  • Stub / mock context: Authentication bypasses were removed before this test, and requests were issued from a fresh no-cookie context to verify how protected run endpoints behave without credentials.
  • Code analysis: I traced the /run/api/* middleware chain and confirmed requests always pass through runApiKeyAuth() first, then project loading. In runAuth.ts, the development/test branch accepts missing or invalid auth by creating a default context via createDevContext, so unauthenticated calls proceed; projectConfig.ts then returns a 404 for the fallback test-project, matching the observed behavior.
  • Why this is likely a bug: The auth middleware explicitly permits unauthenticated requests in dev/test by fabricating execution context, so protected endpoints can bypass expected auth rejection and fail later with unrelated 404s.

Relevant code:

agents-api/src/createApp.ts (lines 242-245)

app.use('/run/tenants/*', runApiKeyAuthExcept(isWebhookRoute));
app.use('/run/agents/*', runApiKeyAuth());
app.use('/run/v1/*', runApiKeyAuth());
app.use('/run/api/*', runApiKeyAuth());

agents-api/src/middleware/runAuth.ts (lines 874-898)

const isDev = process.env.ENVIRONMENT === 'development' || process.env.ENVIRONMENT === 'test';

// Development/test environment handling
if (isDev) {
  logger.info({}, 'development environment');

  const attempt = await authenticateRequest(reqData);

  if (attempt.authResult) {
    c.set('executionContext', buildExecutionContext(attempt.authResult, reqData));
  } else {
    logger.info(
      {},
      reqData.apiKey
        ? 'Development/test environment - fallback to default context due to invalid API key'
        : 'Development/test environment - no API key provided, using default context'
    );
    c.set('executionContext', buildExecutionContext(createDevContext(reqData), reqData));
  }

agents-api/src/middleware/projectConfig.ts (lines 88-95)

if (error.isNotFound) {
  return c.json(
    {
      error: 'Project not found',
      message: `Project ${projectId} not found for tenant ${tenantId}`,
    },
    404
  );
}
⚠️ Batch approval responses via /run/api/chat during suspended durable run
  • What failed: The endpoint should apply each approval part to its corresponding suspended execution, but it binds all parts to one conversation-scoped execution and silently ignores non-matching approvals.
  • Impact: Multi-approval durable workflows can lose user decisions and leave executions suspended or inconsistent. This can block task completion and produce unreliable approval outcomes for end users.
  • Steps to reproduce:
    1. Create two suspended durable approvals with different toolCallIds.
    2. Send a single POST /run/api/chat request containing two approval-responded parts, one for each toolCallId.
    3. Observe that only one suspended execution resumes and the other approval is not applied.
  • Stub / mock context: Two pending approvals were generated through a local deterministic delegated-approval bypass so batch approval behavior could be exercised without external model dependencies. The defect confirmation came from server-side durable approval handling code that processes batched approval parts.
  • Code analysis: I reviewed the durable approval branch in the chat route and the execution lookup helper used by that branch. The route fetches only one latest execution for the conversation, builds all resume tokens with that single execution ID, and suppresses resume errors for non-matching tokens, which explains why additional toolCallIds are dropped.
  • Why this is likely a bug: The production route logic deterministically scopes every batch approval part to one latest execution ID and swallows non-match resume errors, so dropping additional valid approvals is an inherent code-path behavior rather than a test artifact.

Relevant code:

agents-api/src/domains/run/routes/chatDataStream.ts (lines 154-179)

const durableExecution = await getWorkflowExecutionByConversation(runDbClient)({
  tenantId,
  projectId,
  conversationId,
});

const isDurable = durableExecution?.status === 'suspended';

if (isDurable && durableExecution) {
  await Promise.allSettled(
    approvalParts.map(async (approvalPart: any) => {
      const toolCallId = approvalPart.toolCallId as string;
      const approved = !!approvalPart.approval?.approved;
      const reason = approvalPart.approval?.reason as string | undefined;
      const token = `tool-approval:${conversationId}:${durableExecution.id}:${toolCallId}`;
      try {
        await toolApprovalHook.resume(token, {
          approved,
          reason: approved ? undefined : reason,
        });
      } catch (error) {
        const message = error instanceof Error ? error.message : String(error);
        if (!message.includes('not found') && !message.includes('already')) {
          throw error;
        }
      }
    })
  );
}

packages/agents-core/src/data-access/runtime/workflowExecutions.ts (lines 45-63)

export const getWorkflowExecutionByConversation =
  (db: AgentsRunDatabaseClient) =>
  async (params: {
    tenantId: string;
    projectId: string;
    conversationId: string;
  }): Promise<WorkflowExecutionSelect | null> => {
    const result = await db
      .select()
      .from(workflowExecutions)
      .where(
        and(
          eq(workflowExecutions.tenantId, params.tenantId),
          eq(workflowExecutions.projectId, params.projectId),
          eq(workflowExecutions.conversationId, params.conversationId)
        )
      )
      .orderBy(desc(workflowExecutions.createdAt))
      .limit(1);

Commit: f7312d1

View Full Run


Tell us how we did: Give Ito Feedback

@anubra266 anubra266 added this pull request to the merge queue Apr 7, 2026
Merged via the queue into main with commit a951178 Apr 7, 2026
30 checks passed
@anubra266 anubra266 deleted the feat/durable-delegated-tool-approval branch April 7, 2026 02:25
inkeep Bot added a commit that referenced this pull request Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants