Skip to content

Tool Result Passing as Arguments (Tool Chaining)#2304

Merged
tim-inkeep merged 39 commits intomainfrom
feature/tool-result-chaining
Feb 27, 2026
Merged

Tool Result Passing as Arguments (Tool Chaining)#2304
tim-inkeep merged 39 commits intomainfrom
feature/tool-result-chaining

Conversation

@tim-inkeep
Copy link
Copy Markdown
Contributor

  • Agents can pass saved artifacts directly to tools as arguments using a sentinel reference; the system resolves full artifact data (including non-preview fields) before the tool executes — no manual reconstruction needed
  • Agents can chain tool calls by piping the raw output of one tool directly into the next without creating an artifact for intermediate results; primitive return types (strings, numbers, booleans) are fully supported
  • Fixed in-session artifact cache key mismatch that caused getArtifactFull to return the wrong data structure for same-turn artifacts
  • Improved artifact creation prompting to clearly distinguish what artifact:create captures (all fields), what artifact:ref displays (preview only), and what tool argument passing resolves (all fields)

@vercel
Copy link
Copy Markdown

vercel Bot commented Feb 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Feb 27, 2026 10:43pm
agents-docs Ready Ready Preview, Comment Feb 27, 2026 10:43pm
agents-manage-ui Ready Ready Preview, Comment Feb 27, 2026 10:43pm

Request Review

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Feb 24, 2026

🦋 Changeset detected

Latest commit: 60934c4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(8) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

🔴 1) ArtifactService.ts:654 Error object created but never thrown

Issue: An error object is constructed with new Error(...) but is never thrown. The code immediately logs and returns, allowing artifacts to be saved with missing required fields.

Why: This completely defeats the validation logic. Artifacts missing required fields will be saved to the database and potentially cause downstream failures when consumers expect those fields to exist. This is a silent data integrity bug that will be very hard to diagnose in production.

Fix: Add throw before the new Error(...):

throw new Error(
  `Cannot save artifact: Missing required fields [${summaryValidation.missingRequired.join(', ')}] ` +
  ...
);

Refs:

Inline Comments:

  • 🔴 Critical: ArtifactService.ts:660 Error created but never thrown — validation is ineffective

🟠⚠️ Major (4) 🟠⚠️

🟠 1) function-tools.mdx Documentation missing sentinel reference syntax

Issue: The new tool chaining feature allows agents to pass tool results directly as arguments using sentinel references ({ "$tool": "call_id" } or { "$artifact": "id", "$tool": "call_id" }), but this syntax is not documented in the function tools documentation.

Why: Developers won't know this feature exists or how to use it. The PR description mentions this as a key feature ("chain tool calls by piping the raw output of one tool directly into the next"), but there's no documentation for SDK users to discover or learn it.

Fix: Add a new section to function-tools.mdx documenting:

  1. The sentinel reference syntax for tool chaining
  2. When to use $tool vs $artifact + $tool patterns
  3. How primitive return types are handled
  4. Example showing a pipeline where one tool's output feeds into another

Refs:

🟠 2) ArtifactService.test.ts Missing test coverage for cache key fix

Issue: The PR description mentions fixing an "in-session artifact cache key mismatch that caused getArtifactFull to return the wrong data structure for same-turn artifacts." This is a critical bugfix, but there's no dedicated test case covering the cache key change from data to full.

Why: Without a regression test, this bug could easily be reintroduced. The cache key naming (summary/full vs summary/data) is subtle and the consequences are severe (wrong data returned to tools).

Fix: Add a test in ArtifactService.test.ts that:

  1. Saves an artifact in the same turn
  2. Retrieves it via getArtifactFull
  3. Verifies the correct data structure is returned (not the summary)

Refs:

🟠 3) ArtifactService.test.ts Missing tests for getArtifactFull fallback chain

Issue: The getArtifactFull method has a complex fallback chain (in-memory cache → database with type schema validation → raw JSON fallback), but there are no tests covering these paths or the new getToolResultRaw method.

Why: These are the core methods enabling tool chaining. Without test coverage, regressions in the fallback logic could silently break tool-to-tool data passing.

Fix: Add tests covering:

  1. getArtifactFull cache hit path
  2. getArtifactFull database fallback with schema validation
  3. getArtifactFull raw JSON fallback
  4. getToolResultRaw returning stored tool results
  5. getToolResultRaw returning undefined for missing results

Refs:

🟠 4) agents-cli Missing changeset for major CLI refactoring

Issue: The PR includes a significant refactoring from pull-v3 to pull-v4 in the CLI (moving from LLM-based merging to ts-morph AST-based generation), but there's no changeset for agents-cli. The only changeset is deep-mugs-start.md which covers agents-api with a vague message.

Why: Users of agents-cli won't see any changelog entry for this major change. The pull command behavior may change significantly with the AST-based approach.

Fix: Create a changeset for agents-cli:

pnpm bump minor --pkg agents-cli "Refactor pull command to use ts-morph for deterministic AST-based code generation instead of LLM-based merging"

Refs:

Inline Comments:

  • 🟠 Major: deep-mugs-start.md:5 Vague changeset — should describe sentinel references and cache fix

🟡 Minor (3) 🟡

🟡 1) IncrementalStreamParser.ts:80 Empty catch block hides initialization failures

Issue: When getting the shared ArtifactService, errors are silently swallowed with an empty catch block.

Why: If getArtifactService throws (e.g., due to misconfiguration), the parser will silently fall back to creating a new instance, potentially causing inconsistent behavior that's hard to debug.

Fix: Log the error at debug/warn level before falling back.

Refs:

Inline Comments:

  • 🟡 Minor: IncrementalStreamParser.ts:80 Empty catch block — add debug logging
  • 🟡 Minor: ArtifactParser.ts:285 Empty catch block — add debug logging for JSON parse failures

🟡 2) visual-builder/tools/function-tools.mdx Missing cross-reference to tool chaining

Issue: The Visual Builder function tools documentation doesn't mention tool chaining capabilities.

Why: Users configuring tools via the UI won't discover that their tools can be chained together.

Fix: Add a brief section or note referencing the TypeScript SDK documentation for tool chaining patterns.

Refs:

🟡 3) ArtifactService.ts Primitive wrapper asymmetry

Issue: The wrapPrimitiveResult method wraps strings/booleans with { text: ..., _toolCallId } but numbers with { value: ..., _toolCallId }. This inconsistency could confuse consumers.

Why: Downstream code must handle both text and value properties depending on the primitive type.

Fix: Consider using a consistent property name (e.g., value for all primitives) or document the asymmetry clearly.

Refs:

💭 Consider (3) 💭

💭 1) ArtifactParser.ts Explicit type for sentinel references

Issue: The sentinel reference patterns ({ $artifact, $tool } and { $tool }) are checked via runtime property access but have no TypeScript type definition.

Fix: Consider adding a discriminated union type for sentinel references to improve type safety and documentation.

💭 2) scope-helpers.ts Database query timeout consideration

Issue: The scope helpers construct complex SQL queries that could be slow on large datasets.

Fix: Consider adding query timeouts or pagination for production safety.

💭 3) PromptConfig.ts Schema display consolidation

Issue: The system prompt constructs artifact schemas inline with template literals spanning many lines.

Fix: Consider extracting schema formatting to a helper function for maintainability.


💡 APPROVE WITH SUGGESTIONS

Summary: This PR introduces valuable tool chaining capabilities via sentinel references ({ "$tool": "call_id" } and { "$artifact": "id", "$tool": "call_id" }), along with a CLI refactoring to use ts-morph. However, there's a critical bug where validation errors are created but never thrown (ArtifactService.ts:654), which must be fixed before merge. Additionally, the new tool chaining feature needs documentation, and the cache key bugfix needs regression tests to prevent reintroduction. The CLI changeset is also missing.

Discarded (9)
Location Issue Reason Discarded
PromptConfig.ts Missing artifact lifecycle docs Already documented in artifact-components.mdx
scope-helpers.ts Schema changes without migration No schema changes detected, just helper refactoring
pull-v4/ Missing integration tests Has comprehensive unit test coverage via snapshots
ArtifactParser.ts Silent fallback on unresolved refs Intentional design — fallback allows graceful degradation
streaming.ts Missing timeout on stream operations Out of scope for this PR
slack/blocks Citation block accessibility Out of scope, pre-existing pattern
github/mcp Branch protection bypass risk Tool properly checks permissions, no bypass
nango.ts Credential exposure risk Uses existing secure patterns
utility.ts Type changes could break consumers Changes are additive/internal
Reviewers (12)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-errors 8 1 0 0 2 0 5
pr-review-devops 8 1 0 0 1 0 6
pr-review-sre 7 0 1 0 0 0 6
pr-review-consistency 7 1 0 0 0 0 6
pr-review-llm 6 0 1 0 0 0 5
pr-review-tests 5 2 0 0 0 0 3
pr-review-docs 5 2 0 0 0 0 3
pr-review-product 5 0 0 0 0 0 5
pr-review-breaking-changes 4 0 0 0 0 0 4
pr-review-types 4 0 1 0 0 0 3
pr-review-architecture 4 0 0 0 0 0 4
pr-review-standards 0 0 0 0 0 0 0
Total 63 7 3 0 3 0 50

Comment thread .changeset/deep-mugs-start.md Outdated
"@inkeep/agents-api": patch
---

Updated artifact parsing
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 MAJOR: Changeset message too vague for significant feature

Issue: "Updated artifact parsing" doesn't communicate the user-facing impact. This PR introduces tool result chaining ({ "$tool": "call_id" } syntax), fixes artifact cache key issues, and improves prompting clarity — none of which is communicated.

Why: Per CLAUDE.md, changesets should "be specific about what changed and why it matters to consumers." Developers reading changelogs need to understand the new capabilities and behavior changes.

Fix:

Suggested change
Updated artifact parsing
Add tool result chaining to pipe raw tool outputs directly into subsequent tool calls. Improve artifact prompting to clarify which fields are captured, displayed, and passed to tools.

Refs:

@github-actions github-actions Bot deleted a comment from claude Bot Feb 24, 2026
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Issues | Risk: Low (delta-only review)

This is a delta review — scoped to the 4 files changed since the prior automated review.

✅ Resolved in This Delta

The Critical and Major issues from the prior review have been addressed:

Prior Issue Resolution
🔴 Critical: Error object created but never thrown (ArtifactService.ts:654) ✅ Dead code removed — the orphaned new Error(...) statement that was ineffective has been deleted
🟠 Major: Vague changeset message (petite-ends-trade.md) ✅ Changeset now properly describes the feature: artifact/tool result passing, sentinel reference resolution, and primitive chaining

📝 Delta Changes Reviewed

File Change Assessment
petite-ends-trade.md Expanded changeset description ✅ Good — clear user-facing description
ArtifactService.ts Removed 8 lines of dead error code ✅ Good — addresses Critical bug
PromptConfig.artifactSchema.test.ts Updated test assertions (PREVIEWDISPLAYED to user, FULLPASSED to tools) ✅ Good — tests match new schema labels
functionToolApprovals.test.ts Relaxed assertion to expect.objectContaining({ ok: true }) ✅ Good — accommodates additional return properties

🕐 Pending Recommendations (4)

These issues were raised in the prior review and remain unaddressed:


💡 APPROVE WITH SUGGESTIONS

Summary: The delta changes directly address the Critical issue (dead error code) and improve the changeset messaging. The remaining Major items from the prior review (documentation and test coverage gaps) are still outstanding but do not block the core functionality. Nice work fixing the validation bug! 🎉

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Delta review — no subagents dispatched

Note: Delta was too small (4 files, 28 lines) to warrant subagent dispatch. Review performed directly by orchestrator.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 24, 2026
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(1) Total Issues | Risk: High

This is a delta review scoped to 3 files changed since the last automated review (commit 598ab96652af).

🔴❗ Critical (1) ❗🔴

Inline Comments:

  • 🔴 Critical: PromptConfig.ts:753 Template literal corrupted by formatter — LLM system prompt will be gibberish

📝 Delta Changes Assessed

File Change Assessment
PromptConfig.ts getToolChainingGuidance() method corrupted 🔴 CRITICAL — See inline comment
conversations.artifact-replacement.test.ts New test file (140 lines) ✅ Good — well-structured tests for artifact replacement in conversation history
ArtifactParser.ts Import reordering ✅ Trivial — formatter change, no functional impact

Root Cause Analysis

The corruption in PromptConfig.ts appears to be the result of an aggressive auto-formatter (Biome) that parsed the template literal content as JavaScript code and "fixed" it. The original prose:

TOOL RESULT CHAINING:
Any tool argument can reference the raw output...

Was transformed into:

TOOL;
RESULT;
CHAINING: Any;
tool;
argument;
can;
reference;

This happens when formatters don't recognize prose content inside template literals and attempt to apply JS/TS rules to it.

🗣️ Open Discussion

The validation plan discussion for tool result chaining remains active. This is a follow-up feature consideration, not a blocking issue for this PR.


🚫 REQUEST CHANGES

Summary: The delta introduces a critical regression — the getToolChainingGuidance() method in PromptConfig.ts has been corrupted by a formatter, rendering the tool chaining system prompt unintelligible. This must be fixed before merge. The new test file looks good, and the import reordering is trivial.

To fix: Revert PromptConfig.ts to the state at 598ab96652af or manually restore the template literal. Consider adding a biome ignore comment to protect prose-heavy template literals from formatter interference.

Discarded (0)

No findings were discarded.

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Delta review — no subagents dispatched

Note: Delta was small (3 files, ~160 net lines in the critical file) and the issue was immediately apparent. Review performed directly by orchestrator.

Comment thread agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts Outdated
@github-actions github-actions Bot deleted a comment from claude Bot Feb 27, 2026
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) New Issues in Delta | Risk: High (prior Critical unresolved)

This is a delta review scoped to the 1 file changed since the last automated review (commit c1b6ccecb398).

📝 Delta Changes Assessed

File Change Assessment
ArtifactService.test.ts 12 lines: result!.result?. ✅ Good — safer optional chaining in test assertions

The delta consists entirely of test style improvements — changing non-null assertions (!) to optional chaining (?.) in test expectations. This is a minor code quality improvement that aligns with TypeScript best practices for handling potentially null values.

🕐 Pending Recommendations (1)

🔴 PromptConfig.ts:753 CRITICAL: Template literal corrupted by formatter — The getToolChainingGuidance() method remains corrupted from the prior review. The LLM system prompt will contain gibberish instead of tool chaining instructions. This is blocking.

🗣️ Open Discussion

The validation plan discussion for tool result chaining remains active — this is a follow-up feature consideration, not blocking for this PR.


🚫 REQUEST CHANGES

Summary: The delta changes (test assertion style) are fine. However, the CRITICAL issue from the prior review remains unresolved — the getToolChainingGuidance() method in PromptConfig.ts is still corrupted by formatter, which will break the tool chaining feature entirely. This must be fixed before merge.

To fix: Revert PromptConfig.ts:752-944 to the working state at 598ab96652af or manually restore the template literal prose.

Discarded (0)

No findings were discarded.

Reviewers (0)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
Delta review — no subagents dispatched

Note: Delta was trivial (1 file, 12 lines of test assertion style changes). Review performed directly by orchestrator.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 27, 2026
Copy link
Copy Markdown
Collaborator

@amikofalvy amikofalvy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pullfrog, address PromptConfig.ts formatting issue

@pullfrog
Copy link
Copy Markdown
Contributor

pullfrog Bot commented Feb 27, 2026

Restored the corrupted getToolChainingGuidance() template literal in PromptConfig.ts. The formatter had mangled the prose content into nonsensical JavaScript statements (TOOL; RESULT; CHAINING: Any; tool; argument; etc.). Replaced with the correct version from commit 598ab966 — 1 file changed, 21 insertions, 160 deletions. Pushed as 8dbb79fd.

Pullfrog  | View workflow run | Using Claude Code | Triggered by Pullfrogpullfrog.com𝕏

@vercel vercel Bot temporarily deployed to Preview – agents-docs February 27, 2026 22:38 Inactive
tim-inkeep and others added 5 commits February 27, 2026 17:39
…fore compression (#2427)

* identified compressor bugs and fixed them

* updated changeset

* updated agents api

* updated

* fix: use summary-only fallback for rehydrated artifact data

* revert: restore full data fallback for rehydrated artifact summaryData

* fix: add span attribute on artifact lookup failure

* style: auto-format with biome

* fix(agents-api): improve artifact lookup failure observability

Add toolCallCount to BaseCompressor warn log for artifact batch lookup failures.
Add span attribute artifact_lookup.failed in conversations.ts for tracing.

* removed bad code

---------

Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review Summary

(0) Total Issues | Risk: Low


Delta Review Scope

This is a delta review of 9 commits since the previous automated review (7e149317b627). The prior Critical issue (corrupted template literal in PromptConfig.ts) was addressed by pullfrog in an earlier commit.


Delta Changes Analyzed

1. Compression Infrastructure Refactoring (BaseCompressor.ts, ConversationCompressor.ts)

  • Type rename ArtifactInfoCompressedArtifactInfo with new summaryData field
  • New findExistingArtifacts() returns richer metadata (isOversized, toolArgs, toolName, summaryData)
  • toolCallInputMap for tracking tool call inputs across message processing
  • Constructor options pattern for ConversationCompressor with priorSummary support
  • Comment removal improves signal-to-noise ratio without losing semantic clarity

2. Message Formatting Abstraction (distill-conversation-tool.ts, distill-conversation-history-tool.ts)

  • Extracted inline formatMessages logic into a messageFormatter: (maxChars?: number) => string callback
  • Enables cleaner separation between message formatting and LLM distillation
  • Added _fallback: boolean field to summary schemas to indicate when fallback summaries are used

3. Pre-Compression Artifact Replacement (conversations.ts)

  • New getConversationHistoryWithCompression() replaces oversized tool results with compact artifact refs before compression
  • Batched getLedgerArtifacts() lookup (solves N+1 query pattern)
  • Graceful degradation on artifact lookup failure with warning log

4. Test Coverage

  • New conversations.artifact-replacement.test.ts covers artifact replacement scenarios
  • Tests verify: batched lookups, compact ref format, unchanged behavior when no artifact exists
  • Agent.test.ts updated to include baseModel in test fixtures

Assessment

The delta changes are well-structured refactoring that improves:

Type safety — Richer artifact metadata enables better downstream handling
Separation of concernsmessageFormatter callback cleanly decouples formatting from distillation
Error visibility_fallback flag makes it explicit when fallback summaries are used
Performance — Batched artifact lookups prevent N+1 queries
Resilience — Graceful degradation on artifact lookup failures

No new issues introduced in this delta. The prior Critical issue is confirmed resolved.


✅ APPROVE

Summary: Clean refactoring delta that improves type safety, separation of concerns, and error visibility. The prior Critical issue (corrupted template literal) was resolved in an earlier commit. The new artifact replacement feature has appropriate test coverage and follows existing patterns.


Discarded (0)

No findings discarded — all delta changes are clean.

Reviewers (1)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
orchestrator-direct 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Sub-agent dispatch failed due to permission errors; review performed directly by orchestrator. Delta scope was clean refactoring with no issues identified.

@github-actions github-actions Bot deleted a comment from claude Bot Feb 27, 2026
@tim-inkeep
Copy link
Copy Markdown
Contributor Author

@pullfrog, please re-review the WHOLE PR and tell me if all concerns have been alleviated.

Copy link
Copy Markdown
Contributor

@pullfrog pullfrog Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full PR Re-Review Summary

(0) Critical/Major Issues | (2) Minor | (3) Consider | Risk: Low

This is a full re-review of the entire PR (17 files, ~1338 lines) as requested. All prior concerns have been verified.

✅ All Prior Issues Resolved

Prior Issue Status Details
🔴 Error object created but never thrown (ArtifactService.ts:654) ✅ Resolved Dead new Error(...) removed; code now logs and returns early
🔴 getToolChainingGuidance() template literal corrupted by formatter ✅ Resolved Contains well-structured prose with clear LLM instructions, examples, and disambiguation
🟠 Vague changeset message ✅ Resolved petite-ends-trade.md clearly describes artifact/tool result passing and primitive chaining
🟠 Docs missing sentinel reference syntax ✅ Resolved "Tool Output Pipelines" section added to function-tools.mdx (both SDK and visual builder)
🟠 Missing cache key regression test (data → full) ✅ Resolved 2 tests verify parts[0].data.full structure and round-trip correctness
🟠 Missing tests for getArtifactFull fallback chain ✅ Resolved 5 tests cover cache hit, DB fallback, taskId fallback, artifactMap, and null
🟡 CLI changeset N/A CLI changes were from base branch merge, not this PR

Implementation Assessment

The core feature is solid:

  • resolveArgs — Properly handles both {$artifact, $tool} (artifact passing) and {$tool} (ephemeral chaining) sentinels. Throws ToolChainResolutionError on failure instead of silently returning unresolved refs. Recursively resolves nested sentinel refs in larger arg objects.
  • getToolResultRaw — Correctly unwraps MCP text/image content arrays, AI SDK text format, and passes through raw results. Blocks failed tool results from propagating.
  • enhanceToolResultWithStructureHints — Wraps primitives with _toolCallId for LLM visibility, injects _toolCallId into object results even without artifact components.
  • Constants extractionARTIFACT_TAG, SENTINEL_KEY, ARTIFACT_TOOL constants in artifact-syntax.ts eliminate ~40 hardcoded string occurrences.
  • Test coverage — 32+ tests covering all new code paths across ArtifactParser, ArtifactService, and conversation compression.
  • Documentation — "Tool Output Pipelines" sections added across TypeScript SDK and Visual Builder docs with clear examples and cross-references.

🟡 Minor (2)

See inline comments.

💭 Consider (3)

  1. getToolResultRaw only inspects content[0] — Multi-item MCP content arrays will drop items after the first. Acceptable for most tools but worth noting.
  2. fullShape serialized 3x in schema description — Token cost optimization opportunity by printing once and back-referencing.
  3. XML template files still use hardcoded sentinel strings — Can't use TS constants in raw XML; consider a comment in artifact-syntax.ts noting template file dependencies.

✅ APPROVE

Summary: All Critical and Major issues from prior reviews are resolved. The implementation is well-structured with proper error handling (throw on unresolvable refs), comprehensive test coverage across all new code paths, and clear documentation. The remaining suggestions are non-blocking improvements. Great feature — tool chaining is clean and well-tested.

Pullfrog  | View workflow runpullfrog.com𝕏

Comment on lines +2628 to +2636
// Wrap primitive results (string, number, boolean, null) with _toolCallId so the LLM can
// reference them for tool chaining. The AI SDK serializes plain objects as JSON for the
// Anthropic API, so the LLM sees the full object including _toolCallId — unlike bare
// primitives which have no place to embed it. The raw value is stored separately in
// ToolSessionManager, so downstream tools still receive the bare primitive via {$tool}.
// Covers all primitives including falsy ones (false, 0, "", null).
if (typeof result !== 'object' || result === null) {
if (toolCallId) {
return { [typeof result === 'string' ? 'text' : 'value']: result, _toolCallId: toolCallId };
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: No tests for primitive result wrapping. The branching logic (typeof result === 'string' ? 'text' : 'value') determines what the LLM sees for tool chaining but has no unit tests. Since enhanceToolResultWithStructureHints is private, consider extracting the wrapping logic to a testable utility or adding integration tests that verify the shape of tool results. Easy to regress — a follow-up item, not blocking.

Comment on lines +669 to +680
if (artifactParser && toolDefinition.parameters?.safeParse) {
const resolvedChanged =
JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs);
if (resolvedChanged) {
const validation = toolDefinition.parameters.safeParse(resolvedArgs);
if (!validation.success) {
throw new Error(
`Resolved tool args failed schema validation for '${toolName}': ${validation.error.message}`
);
}
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor: No tests for post-resolution schema validation. This validation prevents runtime type mismatches when resolved $artifact/$tool data doesn't match the tool's Zod schema. Good defensive addition — consider adding tests for the happy path, validation failure (verify error message), and the skip-when-unchanged optimization as a follow-up.

@tim-inkeep tim-inkeep merged commit 00c21ec into main Feb 27, 2026
19 checks passed
@tim-inkeep tim-inkeep deleted the feature/tool-result-chaining branch February 27, 2026 23:21
@github-actions
Copy link
Copy Markdown
Contributor

🔎💬 Inkeep AI search and chat service is syncing content for source 'Inkeep Agent Framework Docs'

@itoqa
Copy link
Copy Markdown

itoqa Bot commented Feb 28, 2026

Ito Test Report ❌

19 test cases ran. 16 passed, 3 failed.

This test run verified the Tool Result Passing as Arguments (Tool Chaining) feature from PR #2304. The core ephemeral $tool sentinel chaining works correctly, allowing agents to pipe raw tool outputs directly into subsequent tools. However, three issues were identified: (1) the combined $artifact + $tool sentinel pattern fails to resolve correctly when the sentinel is passed as a stringified JSON value rather than a proper object, (2) the manage API does not expose compact artifact references in conversation history responses (the replacement logic is only applied internally during LLM context preparation), and (3) schema validation after argument resolution is dead code for function tools because the validation checks toolDefinition.parameters?.safeParse but AI SDK stores schemas under inputSchema, not parameters.

✅ Passed (16)
Test Case Summary Timestamp Screenshot
ROUTE-1 Verified tool chaining via $tool sentinel. Agent correctly used {$tool: 'tool_call_id'} to pipe fetch-page output to extract-text. 21:20 ROUTE-1_21-20.png
ROUTE-4 All 6 documentation pages verified: Tool Output Pipelines sections render correctly with code examples and cross-reference links. 57:20 ROUTE-4_57-20.png
ROUTE-5 System prompt includes complete TOOL RESULT CHAINING section with $tool syntax, correct/incorrect examples, and primitive resolution guidance. 1:03:04 ROUTE-5_1-03-04.png
ROUTE-6 Artifact type schema labels correctly updated to "DISPLAYED to user" and "PASSED to tools" (replacing old PREVIEW/FULL labels). 1:03:49 ROUTE-6_1-03-49.png
EDGE-1 Failed tool call chaining correctly throws ToolChainResolutionError with message 'Tool result not found or failed'. 1:19:55 EDGE-1_1-19-55.png
EDGE-2 Null artifact resolution correctly throws ToolChainResolutionError with both artifact ID and tool call ID in error message. 1:21:28 EDGE-2_1-21-28.png
EDGE-4 String primitive chaining verified: fetch-page HTML resolved via $tool sentinel and passed as raw string to extract-text. 1:40:20 EDGE-4_1-40-20.png
EDGE-5 MCP multi-content result correctly extracts only first content item text (43 chars) when chained via $tool sentinel. 2:35:47 EDGE-5_2-35-47.png
EDGE-6 Nested sentinel resolution confirmed: $tool sentinels within object properties resolve correctly with recursive depth support. 1:43:11 EDGE-6_1-43-11.png
EDGE-7 Mixed sentinel resolution verified: $artifact+$tool uses getArtifactFull, $tool-only uses getToolResultRaw, recursive resolution handles both. 1:45:51 EDGE-7_1-45-51.png
EDGE-8 Conversation without artifacts correctly retains original tool-result content with no compact artifact references. 54:22 EDGE-8_54-22.png
ADV-1 Cross-session isolation confirmed: different sessionIds prevent access to other sessions' tool results. 2:00:29 ADV-1_2-00-29.png
ADV-2 Expired/destroyed session tool results correctly throw ToolChainResolutionError; TTL configured at 300000ms with 60000ms cleanup. 1:58:14 ADV-2_1-58-14.png
ADV-3 Sentinel injection via parseEmbeddedJson fails safely: string is hydrated but resolution throws ToolChainResolutionError for non-existent IDs. 2:00:21 ADV-3_2-00-21.png
ADV-4 Large tool results handled gracefully: 500KB succeeds, 5MB returns error 'Output size exceeded limit of 1048576 bytes' with no crashes. 2:11:26 ADV-4_2-11-26.png
ADV-5 Concurrent tool chaining: 3 simultaneous requests all succeeded with unique session IDs, correct results, zero errors. 2:12:48 ADV-5_2-12-48.png
❌ Failed (3)
Test Case Summary Timestamp Screenshot
ROUTE-2 $artifact + $tool sentinel resolution fails when sentinel is passed as stringified JSON rather than proper object. 31:56 ROUTE-2_31-56.png
ROUTE-3 Manage API does not return compact artifact references in conversation history; replacement only applied internally for LLM context. 52:19 ROUTE-3_52-19.png
EDGE-3 Schema validation after resolution is dead code for function tools; checks parameters?.safeParse but schema is in inputSchema. 1:31:48 EDGE-3_1-31-48.png
Agent chat with artifact passing via $artifact + $tool sentinel – Failed
  • Where: Chat interface / tool argument resolution during agent execution

  • Steps to reproduce:

    1. Configure an agent with function tools that support artifact creation (e.g., fetch-page, extract-text)
    2. Send a message that causes the agent to: call fetch-page, create an artifact from the result, then pass that artifact to extract-text using {$artifact: 'artifact-id', $tool: 'tool-call-id'} sentinel
    3. Observe the tool execution and error handling
  • What failed: The agent correctly generated the $artifact + $tool sentinel pattern in tool arguments, but resolution failed with Error status. The sentinel was passed as a stringified JSON value (e.g., "{\"$artifact\": \"example-domain-report\", \"$tool\": \"toolu_...\"}") rather than a proper parsed object, causing the resolver to fail. The agent recovered using a $tool-only sentinel fallback.

  • Code analysis: Examined ArtifactParser.ts resolveArgs method (lines 230-284) and Agent.ts tool execution wrapper (lines 660-682). The resolution logic at ArtifactParser.ts:232-251 correctly checks for sentinel keys (SENTINEL_KEY.ARTIFACT and SENTINEL_KEY.TOOL) but requires them to be object properties, not a stringified JSON string. The parseEmbeddedJson call at Agent.ts:664 should parse embedded JSON strings, but may not be handling all cases where the LLM outputs the sentinel as a string.

  • Relevant code:

    agents-api/src/domains/run/services/ArtifactParser.ts (lines 230-251)

    async resolveArgs(args: any): Promise<any> {
      if (args !== null && typeof args === 'object' && !Array.isArray(args)) {
        if (
          typeof args[SENTINEL_KEY.ARTIFACT] === 'string' &&
          typeof args[SENTINEL_KEY.TOOL] === 'string'
        ) {
          const fullData = await this.artifactService.getArtifactFull(
            args[SENTINEL_KEY.ARTIFACT],
            args[SENTINEL_KEY.TOOL]
          );
          if (fullData?.data) {
            logger.debug(
              { artifactId: args[SENTINEL_KEY.ARTIFACT], toolCallId: args[SENTINEL_KEY.TOOL] },
              'Resolved artifact ref in tool arg'
            );
            return fullData.data;
          }
          throw new ToolChainResolutionError(
            args[SENTINEL_KEY.TOOL],
            `Artifact '${args[SENTINEL_KEY.ARTIFACT]}' from tool call '${args[SENTINEL_KEY.TOOL]}' could not be resolved`
          );
        }

    agents-api/src/domains/run/agents/Agent.ts (lines 664-667)

    const parsedArgsForResolution = artifactParser ? parseEmbeddedJson(args) : args;
    const resolvedArgs = artifactParser
      ? await artifactParser.resolveArgs(parsedArgsForResolution)
      : args;
  • Why this is likely a bug: The $artifact + $tool sentinel pattern is a documented feature of this PR, but when the LLM outputs the sentinel as a stringified JSON value in a string property (e.g., {html: '{"$artifact":"...","$tool":"..."}'}), the parseEmbeddedJson function does not recursively parse nested string properties, leaving the sentinel as a string instead of an object. The resolveArgs function then cannot match the sentinel pattern.

  • Introduced by this PR: Yes – this PR introduced the $artifact + $tool sentinel resolution feature in ArtifactParser.ts and the parseEmbeddedJson integration in Agent.ts.

  • Timestamp: 31:56

Conversation history shows compact artifact references – Failed
  • Where: Manage API endpoint for conversation history retrieval

  • Steps to reproduce:

    1. Create a conversation that includes tool calls resulting in artifact creation
    2. Send a follow-up message to ensure the conversation history is populated
    3. Call the manage API to retrieve conversation history: GET /manage/api/conversations/{id}
    4. Inspect the returned tool-result messages
  • What failed: The manage API endpoint returns raw tool output in tool-result messages instead of compact [Artifact: "name" (id: artifact-id) | args: ... | summary: ...] references. The compact artifact replacement logic exists in getConversationHistoryWithCompression (lines 464-520) and works correctly per unit tests (3/3 pass), but this function is only invoked internally during agent LLM context preparation, not in the manage API response.

  • Code analysis: The getConversationHistoryWithCompression function in conversations.ts (lines 407-520) implements artifact replacement for tool-result messages. It queries getLedgerArtifacts to find matching artifacts by toolCallId, then replaces the message content with a compact reference. However, the manage API routes appear to use a different code path that returns raw conversation data without this transformation.

  • Relevant code:

    agents-api/src/domains/run/data/conversations.ts (lines 464-511)

    // Replace tool-result content with compact artifact references BEFORE compression.
    const toolCallIds = messagesToFormat
      .filter((msg) => msg.messageType === 'tool-result')
      .map((msg) => msg.metadata?.a2a_metadata?.toolCallId)
      .filter((id): id is string => !!id);
    
    if (toolCallIds.length > 0) {
      try {
        const artifacts = await getLedgerArtifacts(runDbClient)({
          scopes: { tenantId, projectId },
          toolCallIds,
        });
        const artifactsByToolCallId = new Map(
          artifacts.filter((a) => a.toolCallId).map((a) => [a.toolCallId as string, a])
        );
        if (artifactsByToolCallId.size > 0) {
          messagesToFormat = messagesToFormat.map((msg) => {
            // ... replacement logic
            return {
              ...msg,
              content: { text: `[${refParts.join(' | ')}]` },
            };
          });
        }
      }
    }
  • Why this is likely a bug: The PR description mentions "conversation history artifact replacement" and the test plan explicitly tests for compact [Artifact: ...] references in the API response. The implementation exists and unit tests pass, but the manage API endpoint does not invoke this code path. This is likely a missing integration between the internal compression function and the external API.

  • Introduced by this PR: Yes – this PR added the artifact replacement logic in getConversationHistoryWithCompression and the test file conversations.artifact-replacement.test.ts, but did not integrate it with the manage API routes.

  • Timestamp: 52:19

Schema validation failure after argument resolution – Failed
  • Where: Tool execution in Agent.ts during argument resolution

  • Steps to reproduce:

    1. Configure an agent with two tools where the first returns an object and the second expects a string (e.g., returns-object returns {key: 'value'}, extract-text expects {html: string})
    2. Send a message that causes the agent to chain the object result to the string-expecting tool via {$tool: 'tool-call-id'} sentinel
    3. Observe whether schema validation catches the type mismatch or the tool executes with invalid args
  • What failed: Schema validation after resolution did NOT catch the type mismatch. When returns-object (returns object) was chained to extract-text (expects html: string), the tool executed with invalid args and failed at runtime with 'html.replace is not a function' instead of being caught by the expected schema validation error 'Resolved tool args failed schema validation for extract-text'. The validation code at Agent.ts:669 checks toolDefinition.parameters?.safeParse but AI SDK stores the schema under inputSchema, making this validation code dead code for function tools.

  • Code analysis: The validation logic at Agent.ts:669-679 checks if toolDefinition.parameters?.safeParse exists, but function tools created via AI SDK's tool() helper store their Zod schema under inputSchema, not parameters. The parameters property is undefined for function tools, so the schema validation block is never executed.

  • Relevant code:

    agents-api/src/domains/run/agents/Agent.ts (lines 669-679)

    if (artifactParser && toolDefinition.parameters?.safeParse) {
      const resolvedChanged =
        JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs);
      if (resolvedChanged) {
        const validation = toolDefinition.parameters.safeParse(resolvedArgs);
        if (!validation.success) {
          throw new Error(
            `Resolved tool args failed schema validation for '${toolName}': ${validation.error.message}`
          );
        }
      }
    }

    agents-api/src/domains/run/agents/Agent.ts (line 2023)

    inputSchema: (tool as any).inputSchema || (tool as any).parameters || {},
  • Why this is likely a bug: The PR introduced this validation code to catch schema mismatches after sentinel resolution, but it checks the wrong property. AI SDK function tools use inputSchema (which may be a Zod schema or JSON Schema), not parameters. The check should be toolDefinition.inputSchema?.safeParse or should convert the inputSchema to a Zod schema for validation.

  • Introduced by this PR: Yes – the validation code at lines 669-679 was added in this PR as part of the tool chaining feature (49 lines added to Agent.ts).

  • Timestamp: 1:31:48

📋 View Recording

Screen Recording

dimaMachina pushed a commit that referenced this pull request Mar 2, 2026
* allowed artifacts as tool arguments

* upadted prompt

* Added Artifact Components as Tool Arguments

* fixed linter

* added test

* updated

* updated

* style: auto-format with biome

* testing some tool result chaining

* testing some tool result chaining

* updated artifacts and prompting

* updated agents-docs

* updated changeset

* fixed tests

* docs: clarify reconstructMessageText JSDoc

* address review: remove any types and excess comment

* refactor: rename typeSchemaMap to artifactSchemasByType

* refactor: improve collectProjectArtifactComponents clarity

* style: auto-format with biome

* fix: correct type cast in collectProjectArtifactComponents

* refactor: extract artifact syntax constants and improve sentinel check clarity

* test: add regression tests for cache key fix and getArtifactFull fallback chain

* updated tests

* fix: restore getToolChainingGuidance template literal corrupted by formatter

* fix(agents-api): Replace oversized tool results with artifact refs before compression (#2427)

* identified compressor bugs and fixed them

* updated changeset

* updated agents api

* updated

* fix: use summary-only fallback for rehydrated artifact data

* revert: restore full data fallback for rehydrated artifact summaryData

* fix: add span attribute on artifact lookup failure

* style: auto-format with biome

* fix(agents-api): improve artifact lookup failure observability

Add toolCallCount to BaseCompressor warn log for artifact batch lookup failures.
Add span attribute artifact_lookup.failed in conversations.ts for tracing.

* removed bad code

---------

Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* keep changeset

* Add delete button to external agents edit page (#2441)

* updated agents api

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com>
Co-authored-by: sarah <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants