Tool Result Passing as Arguments (Tool Chaining) by tim-inkeep · Pull Request #2304 · inkeep/agents

tim-inkeep · 2026-02-24T17:05:03Z

Agents can pass saved artifacts directly to tools as arguments using a sentinel reference; the system resolves full artifact data (including non-preview fields) before the tool executes — no manual reconstruction needed
Agents can chain tool calls by piping the raw output of one tool directly into the next without creating an artifact for intermediate results; primitive return types (strings, numbers, booleans) are fully supported
Fixed in-session artifact cache key mismatch that caused getArtifactFull to return the wrong data structure for same-turn artifacts
Improved artifact creation prompting to clearly distinguish what artifact:create captures (all fields), what artifact:ref displays (preview only), and what tool argument passing resolves (all fields)

vercel · 2026-02-24T17:05:10Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
agents-api	Ready	Preview, Comment	Feb 27, 2026 10:43pm
agents-docs	Ready	Preview, Comment	Feb 27, 2026 10:43pm
agents-manage-ui	Ready	Preview, Comment	Feb 27, 2026 10:43pm

changeset-bot · 2026-02-24T17:05:11Z

🦋 Changeset detected

Latest commit: 60934c4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 9 packages

Name	Type
@inkeep/agents-api	Patch
@inkeep/agents-manage-ui	Patch
@inkeep/agents-cli	Patch
@inkeep/agents-core	Patch
@inkeep/agents-mcp	Patch
@inkeep/agents-sdk	Patch
@inkeep/agents-work-apps	Patch
@inkeep/ai-sdk-provider	Patch
@inkeep/create-agents	Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

claude

PR Review Summary

(8) Total Issues | Risk: High

🔴❗ Critical (1) ❗🔴

🔴 1) ArtifactService.ts:654 Error object created but never thrown

Issue: An error object is constructed with new Error(...) but is never thrown. The code immediately logs and returns, allowing artifacts to be saved with missing required fields.

Why: This completely defeats the validation logic. Artifacts missing required fields will be saved to the database and potentially cause downstream failures when consumers expect those fields to exist. This is a silent data integrity bug that will be very hard to diagnose in production.

Fix: Add throw before the new Error(...):

throw new Error(
  `Cannot save artifact: Missing required fields [${summaryValidation.missingRequired.join(', ')}] ` +
  ...
);

Refs:

ArtifactService.ts:654

Inline Comments:

🔴 Critical: ArtifactService.ts:660 Error created but never thrown — validation is ineffective

🟠⚠️ Major (4) 🟠⚠️

🟠 1) function-tools.mdx Documentation missing sentinel reference syntax

Issue: The new tool chaining feature allows agents to pass tool results directly as arguments using sentinel references ({ "$tool": "call_id" } or { "$artifact": "id", "$tool": "call_id" }), but this syntax is not documented in the function tools documentation.

Why: Developers won't know this feature exists or how to use it. The PR description mentions this as a key feature ("chain tool calls by piping the raw output of one tool directly into the next"), but there's no documentation for SDK users to discover or learn it.

Fix: Add a new section to function-tools.mdx documenting:

The sentinel reference syntax for tool chaining
When to use $tool vs $artifact + $tool patterns
How primitive return types are handled
Example showing a pipeline where one tool's output feeds into another

Refs:

🟠 2) ArtifactService.test.ts Missing test coverage for cache key fix

Issue: The PR description mentions fixing an "in-session artifact cache key mismatch that caused getArtifactFull to return the wrong data structure for same-turn artifacts." This is a critical bugfix, but there's no dedicated test case covering the cache key change from data to full.

Why: Without a regression test, this bug could easily be reintroduced. The cache key naming (summary/full vs summary/data) is subtle and the consequences are severe (wrong data returned to tools).

Fix: Add a test in ArtifactService.test.ts that:

Saves an artifact in the same turn
Retrieves it via getArtifactFull
Verifies the correct data structure is returned (not the summary)

Refs:

🟠 3) ArtifactService.test.ts Missing tests for getArtifactFull fallback chain

Issue: The getArtifactFull method has a complex fallback chain (in-memory cache → database with type schema validation → raw JSON fallback), but there are no tests covering these paths or the new getToolResultRaw method.

Why: These are the core methods enabling tool chaining. Without test coverage, regressions in the fallback logic could silently break tool-to-tool data passing.

Fix: Add tests covering:

getArtifactFull cache hit path
getArtifactFull database fallback with schema validation
getArtifactFull raw JSON fallback
getToolResultRaw returning stored tool results
getToolResultRaw returning undefined for missing results

Refs:

🟠 4) agents-cli Missing changeset for major CLI refactoring

Issue: The PR includes a significant refactoring from pull-v3 to pull-v4 in the CLI (moving from LLM-based merging to ts-morph AST-based generation), but there's no changeset for agents-cli. The only changeset is deep-mugs-start.md which covers agents-api with a vague message.

Why: Users of agents-cli won't see any changelog entry for this major change. The pull command behavior may change significantly with the AST-based approach.

Fix: Create a changeset for agents-cli:

pnpm bump minor --pkg agents-cli "Refactor pull command to use ts-morph for deterministic AST-based code generation instead of LLM-based merging"

Refs:

AGENTS.md — changeset guidance

Inline Comments:

🟠 Major: deep-mugs-start.md:5 Vague changeset — should describe sentinel references and cache fix

🟡 Minor (3) 🟡

🟡 1) IncrementalStreamParser.ts:80 Empty catch block hides initialization failures

Issue: When getting the shared ArtifactService, errors are silently swallowed with an empty catch block.

Why: If getArtifactService throws (e.g., due to misconfiguration), the parser will silently fall back to creating a new instance, potentially causing inconsistent behavior that's hard to debug.

Fix: Log the error at debug/warn level before falling back.

Refs:

IncrementalStreamParser.ts:80

Inline Comments:

🟡 Minor: IncrementalStreamParser.ts:80 Empty catch block — add debug logging
🟡 Minor: ArtifactParser.ts:285 Empty catch block — add debug logging for JSON parse failures

🟡 2) visual-builder/tools/function-tools.mdx Missing cross-reference to tool chaining

Issue: The Visual Builder function tools documentation doesn't mention tool chaining capabilities.

Why: Users configuring tools via the UI won't discover that their tools can be chained together.

Fix: Add a brief section or note referencing the TypeScript SDK documentation for tool chaining patterns.

Refs:

visual-builder/tools/function-tools.mdx

🟡 3) ArtifactService.ts Primitive wrapper asymmetry

Issue: The wrapPrimitiveResult method wraps strings/booleans with { text: ..., _toolCallId } but numbers with { value: ..., _toolCallId }. This inconsistency could confuse consumers.

Why: Downstream code must handle both text and value properties depending on the primitive type.

Fix: Consider using a consistent property name (e.g., value for all primitives) or document the asymmetry clearly.

Refs:

ArtifactService.ts:800-815

💭 Consider (3) 💭

💭 1) ArtifactParser.ts Explicit type for sentinel references

Issue: The sentinel reference patterns ({ $artifact, $tool } and { $tool }) are checked via runtime property access but have no TypeScript type definition.

Fix: Consider adding a discriminated union type for sentinel references to improve type safety and documentation.

💭 2) scope-helpers.ts Database query timeout consideration

Issue: The scope helpers construct complex SQL queries that could be slow on large datasets.

Fix: Consider adding query timeouts or pagination for production safety.

💭 3) PromptConfig.ts Schema display consolidation

Issue: The system prompt constructs artifact schemas inline with template literals spanning many lines.

Fix: Consider extracting schema formatting to a helper function for maintainability.

💡 APPROVE WITH SUGGESTIONS

Summary: This PR introduces valuable tool chaining capabilities via sentinel references ({ "$tool": "call_id" } and { "$artifact": "id", "$tool": "call_id" }), along with a CLI refactoring to use ts-morph. However, there's a critical bug where validation errors are created but never thrown (ArtifactService.ts:654), which must be fixed before merge. Additionally, the new tool chaining feature needs documentation, and the cache key bugfix needs regression tests to prevent reintroduction. The CLI changeset is also missing.

Discarded (9)

Location	Issue	Reason Discarded
`PromptConfig.ts`	Missing artifact lifecycle docs	Already documented in artifact-components.mdx
`scope-helpers.ts`	Schema changes without migration	No schema changes detected, just helper refactoring
`pull-v4/`	Missing integration tests	Has comprehensive unit test coverage via snapshots
`ArtifactParser.ts`	Silent fallback on unresolved refs	Intentional design — fallback allows graceful degradation
`streaming.ts`	Missing timeout on stream operations	Out of scope for this PR
`slack/blocks`	Citation block accessibility	Out of scope, pre-existing pattern
`github/mcp`	Branch protection bypass risk	Tool properly checks permissions, no bypass
`nango.ts`	Credential exposure risk	Uses existing secure patterns
`utility.ts`	Type changes could break consumers	Changes are additive/internal

Reviewers (12)

Reviewer	Returned	Main Findings	Consider	Inline Comments	Discarded
`pr-review-errors`	8	1	0	2	5
`pr-review-devops`	8	1	0	1	6
`pr-review-sre`	7	0	1	0	6
`pr-review-consistency`	7	1	0	0	6
`pr-review-llm`	6	0	1	0	5
`pr-review-tests`	5	2	0	0	3
`pr-review-docs`	5	2	0	0	3
`pr-review-product`	5	0	0	0	5
`pr-review-breaking-changes`	4	0	0	0	4
`pr-review-types`	4	0	1	0	3
`pr-review-architecture`	4	0	0	0	4
`pr-review-standards`	0	0	0	0	0
Total	63	7	3	3	50

claude · 2026-02-24T17:14:01Z

+"@inkeep/agents-api": patch
+---
+
+Updated artifact parsing


🟠 MAJOR: Changeset message too vague for significant feature

Issue: "Updated artifact parsing" doesn't communicate the user-facing impact. This PR introduces tool result chaining ({ "$tool": "call_id" } syntax), fixes artifact cache key issues, and improves prompting clarity — none of which is communicated.

Why: Per CLAUDE.md, changesets should "be specific about what changed and why it matters to consumers." Developers reading changelogs need to understand the new capabilities and behavior changes.

Fix:

Suggested change

Updated artifact parsing

Add tool result chaining to pipe raw tool outputs directly into subsequent tool calls. Improve artifact prompting to clarify which fields are captured, displayed, and passed to tools.

Refs:

CLAUDE.md: Changeset guidance

claude

PR Review Summary

(0) New Issues | Risk: Low (delta-only review)

This is a delta review — scoped to the 4 files changed since the prior automated review.

✅ Resolved in This Delta

The Critical and Major issues from the prior review have been addressed:

Prior Issue	Resolution
🔴 Critical: Error object created but never thrown (`ArtifactService.ts:654`)	✅ Dead code removed — the orphaned `new Error(...)` statement that was ineffective has been deleted
🟠 Major: Vague changeset message (`petite-ends-trade.md`)	✅ Changeset now properly describes the feature: artifact/tool result passing, sentinel reference resolution, and primitive chaining

📝 Delta Changes Reviewed

File	Change	Assessment
`petite-ends-trade.md`	Expanded changeset description	✅ Good — clear user-facing description
`ArtifactService.ts`	Removed 8 lines of dead error code	✅ Good — addresses Critical bug
`PromptConfig.artifactSchema.test.ts`	Updated test assertions (`PREVIEW`→`DISPLAYED to user`, `FULL`→`PASSED to tools`)	✅ Good — tests match new schema labels
`functionToolApprovals.test.ts`	Relaxed assertion to `expect.objectContaining({ ok: true })`	✅ Good — accommodates additional return properties

🕐 Pending Recommendations (4)

These issues were raised in the prior review and remain unaddressed:

🟠 function-tools.mdx Documentation missing sentinel reference syntax for tool chaining
🟠 ArtifactService.test.ts Missing test coverage for cache key fix (data → full)
🟠 ArtifactService.test.ts Missing tests for getArtifactFull fallback chain and getToolResultRaw
🟠 agents-cli Missing changeset for CLI pull-v4 refactoring

💡 APPROVE WITH SUGGESTIONS

Summary: The delta changes directly address the Critical issue (dead error code) and improve the changeset messaging. The remaining Major items from the prior review (documentation and test coverage gaps) are still outstanding but do not block the core functionality. Nice work fixing the validation bug! 🎉

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
Delta review — no subagents dispatched	—	—	—	—	—	—	—

Note: Delta was too small (4 files, 28 lines) to warrant subagent dispatch. Review performed directly by orchestrator.

…t-chaining

claude

PR Review Summary

(1) Total Issues | Risk: High

This is a delta review scoped to 3 files changed since the last automated review (commit 598ab96652af).

🔴❗ Critical (1) ❗🔴

Inline Comments:

🔴 Critical: PromptConfig.ts:753 Template literal corrupted by formatter — LLM system prompt will be gibberish

📝 Delta Changes Assessed

File	Change	Assessment
`PromptConfig.ts`	`getToolChainingGuidance()` method corrupted	🔴 CRITICAL — See inline comment
`conversations.artifact-replacement.test.ts`	New test file (140 lines)	✅ Good — well-structured tests for artifact replacement in conversation history
`ArtifactParser.ts`	Import reordering	✅ Trivial — formatter change, no functional impact

Root Cause Analysis

The corruption in PromptConfig.ts appears to be the result of an aggressive auto-formatter (Biome) that parsed the template literal content as JavaScript code and "fixed" it. The original prose:

TOOL RESULT CHAINING:
Any tool argument can reference the raw output...

Was transformed into:

TOOL;
RESULT;
CHAINING: Any;
tool;
argument;
can;
reference;

This happens when formatters don't recognize prose content inside template literals and attempt to apply JS/TS rules to it.

🗣️ Open Discussion

The validation plan discussion for tool result chaining remains active. This is a follow-up feature consideration, not a blocking issue for this PR.

🚫 REQUEST CHANGES

Summary: The delta introduces a critical regression — the getToolChainingGuidance() method in PromptConfig.ts has been corrupted by a formatter, rendering the tool chaining system prompt unintelligible. This must be fixed before merge. The new test file looks good, and the import reordering is trivial.

To fix: Revert PromptConfig.ts to the state at 598ab96652af or manually restore the template literal. Consider adding a biome ignore comment to protect prose-heavy template literals from formatter interference.

Discarded (0)

No findings were discarded.

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
Delta review — no subagents dispatched	—	—	—	—	—	—	—

Note: Delta was small (3 files, ~160 net lines in the critical file) and the issue was immediately apparent. Review performed directly by orchestrator.

claude

PR Review Summary

(0) New Issues in Delta | Risk: High (prior Critical unresolved)

This is a delta review scoped to the 1 file changed since the last automated review (commit c1b6ccecb398).

📝 Delta Changes Assessed

File	Change	Assessment
`ArtifactService.test.ts`	12 lines: `result!.` → `result?.`	✅ Good — safer optional chaining in test assertions

The delta consists entirely of test style improvements — changing non-null assertions (!) to optional chaining (?.) in test expectations. This is a minor code quality improvement that aligns with TypeScript best practices for handling potentially null values.

🕐 Pending Recommendations (1)

🔴 PromptConfig.ts:753 CRITICAL: Template literal corrupted by formatter — The getToolChainingGuidance() method remains corrupted from the prior review. The LLM system prompt will contain gibberish instead of tool chaining instructions. This is blocking.

🗣️ Open Discussion

The validation plan discussion for tool result chaining remains active — this is a follow-up feature consideration, not blocking for this PR.

🚫 REQUEST CHANGES

Summary: The delta changes (test assertion style) are fine. However, the CRITICAL issue from the prior review remains unresolved — the getToolChainingGuidance() method in PromptConfig.ts is still corrupted by formatter, which will break the tool chaining feature entirely. This must be fixed before merge.

To fix: Revert PromptConfig.ts:752-944 to the working state at 598ab96652af or manually restore the template literal prose.

Discarded (0)

No findings were discarded.

Reviewers (0)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
Delta review — no subagents dispatched	—	—	—	—	—	—	—

Note: Delta was trivial (1 file, 12 lines of test assertion style changes). Review performed directly by orchestrator.

amikofalvy

@pullfrog, address PromptConfig.ts formatting issue

pullfrog · 2026-02-27T22:30:45Z

Restored the corrupted getToolChainingGuidance() template literal in PromptConfig.ts. The formatter had mangled the prose content into nonsensical JavaScript statements (TOOL; RESULT; CHAINING: Any; tool; argument; etc.). Replaced with the correct version from commit 598ab966 — 1 file changed, 21 insertions, 160 deletions. Pushed as 8dbb79fd.

^{｜ View workflow run ｜ Using Claude Code ｜ Triggered by Pullfrog ｜ pullfrog.com ｜ 𝕏}

…rmatter

…fore compression (#2427) * identified compressor bugs and fixed them * updated changeset * updated agents api * updated * fix: use summary-only fallback for rehydrated artifact data * revert: restore full data fallback for rehydrated artifact summaryData * fix: add span attribute on artifact lookup failure * style: auto-format with biome * fix(agents-api): improve artifact lookup failure observability Add toolCallCount to BaseCompressor warn log for artifact batch lookup failures. Add span attribute artifact_lookup.failed in conversations.ts for tracing. * removed bad code --------- Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

claude

PR Review Summary

(0) Total Issues | Risk: Low

Delta Review Scope

This is a delta review of 9 commits since the previous automated review (7e149317b627). The prior Critical issue (corrupted template literal in PromptConfig.ts) was addressed by pullfrog in an earlier commit.

Delta Changes Analyzed

1. Compression Infrastructure Refactoring (BaseCompressor.ts, ConversationCompressor.ts)

Type rename ArtifactInfo → CompressedArtifactInfo with new summaryData field
New findExistingArtifacts() returns richer metadata (isOversized, toolArgs, toolName, summaryData)
toolCallInputMap for tracking tool call inputs across message processing
Constructor options pattern for ConversationCompressor with priorSummary support
Comment removal improves signal-to-noise ratio without losing semantic clarity

2. Message Formatting Abstraction (distill-conversation-tool.ts, distill-conversation-history-tool.ts)

Extracted inline formatMessages logic into a messageFormatter: (maxChars?: number) => string callback
Enables cleaner separation between message formatting and LLM distillation
Added _fallback: boolean field to summary schemas to indicate when fallback summaries are used

3. Pre-Compression Artifact Replacement (conversations.ts)

New getConversationHistoryWithCompression() replaces oversized tool results with compact artifact refs before compression
Batched getLedgerArtifacts() lookup (solves N+1 query pattern)
Graceful degradation on artifact lookup failure with warning log

4. Test Coverage

New conversations.artifact-replacement.test.ts covers artifact replacement scenarios
Tests verify: batched lookups, compact ref format, unchanged behavior when no artifact exists
Agent.test.ts updated to include baseModel in test fixtures

Assessment

The delta changes are well-structured refactoring that improves:

✅ Type safety — Richer artifact metadata enables better downstream handling
✅ Separation of concerns — messageFormatter callback cleanly decouples formatting from distillation
✅ Error visibility — _fallback flag makes it explicit when fallback summaries are used
✅ Performance — Batched artifact lookups prevent N+1 queries
✅ Resilience — Graceful degradation on artifact lookup failures

No new issues introduced in this delta. The prior Critical issue is confirmed resolved.

✅ APPROVE

Summary: Clean refactoring delta that improves type safety, separation of concerns, and error visibility. The prior Critical issue (corrupted template literal) was resolved in an earlier commit. The new artifact replacement feature has appropriate test coverage and follows existing patterns.

Discarded (0)

No findings discarded — all delta changes are clean.

Reviewers (1)

Reviewer	Returned	Main Findings	Consider	While You're Here	Inline Comments	Pending Recs	Discarded
`orchestrator-direct`	0	0	0	0	0	0	0
Total	0	0	0	0	0	0	0

Note: Sub-agent dispatch failed due to permission errors; review performed directly by orchestrator. Delta scope was clean refactoring with no issues identified.

tim-inkeep · 2026-02-27T22:50:22Z

@pullfrog, please re-review the WHOLE PR and tell me if all concerns have been alleviated.

pullfrog

Full PR Re-Review Summary

(0) Critical/Major Issues | (2) Minor | (3) Consider | Risk: Low

This is a full re-review of the entire PR (17 files, ~1338 lines) as requested. All prior concerns have been verified.

✅ All Prior Issues Resolved

Prior Issue	Status	Details
🔴 Error object created but never thrown (`ArtifactService.ts:654`)	✅ Resolved	Dead `new Error(...)` removed; code now logs and returns early
🔴 `getToolChainingGuidance()` template literal corrupted by formatter	✅ Resolved	Contains well-structured prose with clear LLM instructions, examples, and disambiguation
🟠 Vague changeset message	✅ Resolved	`petite-ends-trade.md` clearly describes artifact/tool result passing and primitive chaining
🟠 Docs missing sentinel reference syntax	✅ Resolved	"Tool Output Pipelines" section added to function-tools.mdx (both SDK and visual builder)
🟠 Missing cache key regression test (data → full)	✅ Resolved	2 tests verify `parts[0].data.full` structure and round-trip correctness
🟠 Missing tests for `getArtifactFull` fallback chain	✅ Resolved	5 tests cover cache hit, DB fallback, taskId fallback, artifactMap, and null
🟡 CLI changeset	N/A	CLI changes were from base branch merge, not this PR

Implementation Assessment

The core feature is solid:

resolveArgs — Properly handles both {$artifact, $tool} (artifact passing) and {$tool} (ephemeral chaining) sentinels. Throws ToolChainResolutionError on failure instead of silently returning unresolved refs. Recursively resolves nested sentinel refs in larger arg objects.
getToolResultRaw — Correctly unwraps MCP text/image content arrays, AI SDK text format, and passes through raw results. Blocks failed tool results from propagating.
enhanceToolResultWithStructureHints — Wraps primitives with _toolCallId for LLM visibility, injects _toolCallId into object results even without artifact components.
Constants extraction — ARTIFACT_TAG, SENTINEL_KEY, ARTIFACT_TOOL constants in artifact-syntax.ts eliminate ~40 hardcoded string occurrences.
Test coverage — 32+ tests covering all new code paths across ArtifactParser, ArtifactService, and conversation compression.
Documentation — "Tool Output Pipelines" sections added across TypeScript SDK and Visual Builder docs with clear examples and cross-references.

🟡 Minor (2)

See inline comments.

💭 Consider (3)

getToolResultRaw only inspects content[0] — Multi-item MCP content arrays will drop items after the first. Acceptable for most tools but worth noting.
fullShape serialized 3x in schema description — Token cost optimization opportunity by printing once and back-referencing.
XML template files still use hardcoded sentinel strings — Can't use TS constants in raw XML; consider a comment in artifact-syntax.ts noting template file dependencies.

✅ APPROVE

Summary: All Critical and Major issues from prior reviews are resolved. The implementation is well-structured with proper error handling (throw on unresolvable refs), comprehensive test coverage across all new code paths, and clear documentation. The remaining suggestions are non-blocking improvements. Great feature — tool chaining is clean and well-tested.

^{｜ View workflow run ｜ pullfrog.com ｜ 𝕏}

pullfrog · 2026-02-27T22:54:58Z

+    // Wrap primitive results (string, number, boolean, null) with _toolCallId so the LLM can
+    // reference them for tool chaining. The AI SDK serializes plain objects as JSON for the
+    // Anthropic API, so the LLM sees the full object including _toolCallId — unlike bare
+    // primitives which have no place to embed it. The raw value is stored separately in
+    // ToolSessionManager, so downstream tools still receive the bare primitive via {$tool}.
+    // Covers all primitives including falsy ones (false, 0, "", null).
+    if (typeof result !== 'object' || result === null) {
+      if (toolCallId) {
+        return { [typeof result === 'string' ? 'text' : 'value']: result, _toolCallId: toolCallId };


🟡 Minor: No tests for primitive result wrapping. The branching logic (typeof result === 'string' ? 'text' : 'value') determines what the LLM sees for tool chaining but has no unit tests. Since enhanceToolResultWithStructureHints is private, consider extracting the wrapping logic to a testable utility or adding integration tests that verify the shape of tool results. Easy to regress — a follow-up item, not blocking.

pullfrog · 2026-02-27T22:54:58Z

+          if (artifactParser && toolDefinition.parameters?.safeParse) {
+            const resolvedChanged =
+              JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs);
+            if (resolvedChanged) {
+              const validation = toolDefinition.parameters.safeParse(resolvedArgs);
+              if (!validation.success) {
+                throw new Error(
+                  `Resolved tool args failed schema validation for '${toolName}': ${validation.error.message}`
+                );
+              }
+            }
+          }


🟡 Minor: No tests for post-resolution schema validation. This validation prevents runtime type mismatches when resolved $artifact/$tool data doesn't match the tool's Zod schema. Good defensive addition — consider adding tests for the happy path, validation failure (verify error message), and the skip-when-unchanged optimization as a follow-up.

github-actions · 2026-02-27T23:22:25Z

🔎💬 Inkeep AI search and chat service is syncing content for source 'Inkeep Agent Framework Docs'

itoqa · 2026-02-28T01:58:07Z

Ito Test Report ❌

19 test cases ran. 16 passed, 3 failed.

This test run verified the Tool Result Passing as Arguments (Tool Chaining) feature from PR #2304. The core ephemeral $tool sentinel chaining works correctly, allowing agents to pipe raw tool outputs directly into subsequent tools. However, three issues were identified: (1) the combined $artifact + $tool sentinel pattern fails to resolve correctly when the sentinel is passed as a stringified JSON value rather than a proper object, (2) the manage API does not expose compact artifact references in conversation history responses (the replacement logic is only applied internally during LLM context preparation), and (3) schema validation after argument resolution is dead code for function tools because the validation checks toolDefinition.parameters?.safeParse but AI SDK stores schemas under inputSchema, not parameters.

✅ Passed (16)

Test Case	Summary	Timestamp
ROUTE-1	Verified tool chaining via $tool sentinel. Agent correctly used {$tool: 'tool_call_id'} to pipe fetch-page output to extract-text.	21:20
ROUTE-4	All 6 documentation pages verified: Tool Output Pipelines sections render correctly with code examples and cross-reference links.	57:20
ROUTE-5	System prompt includes complete TOOL RESULT CHAINING section with $tool syntax, correct/incorrect examples, and primitive resolution guidance.	1:03:04
ROUTE-6	Artifact type schema labels correctly updated to "DISPLAYED to user" and "PASSED to tools" (replacing old PREVIEW/FULL labels).	1:03:49
EDGE-1	Failed tool call chaining correctly throws ToolChainResolutionError with message 'Tool result not found or failed'.	1:19:55
EDGE-2	Null artifact resolution correctly throws ToolChainResolutionError with both artifact ID and tool call ID in error message.	1:21:28
EDGE-4	String primitive chaining verified: fetch-page HTML resolved via $tool sentinel and passed as raw string to extract-text.	1:40:20
EDGE-5	MCP multi-content result correctly extracts only first content item text (43 chars) when chained via $tool sentinel.	2:35:47
EDGE-6	Nested sentinel resolution confirmed: $tool sentinels within object properties resolve correctly with recursive depth support.	1:43:11
EDGE-7	Mixed sentinel resolution verified: $artifact+$tool uses getArtifactFull, $tool-only uses getToolResultRaw, recursive resolution handles both.	1:45:51
EDGE-8	Conversation without artifacts correctly retains original tool-result content with no compact artifact references.	54:22
ADV-1	Cross-session isolation confirmed: different sessionIds prevent access to other sessions' tool results.	2:00:29
ADV-2	Expired/destroyed session tool results correctly throw ToolChainResolutionError; TTL configured at 300000ms with 60000ms cleanup.	1:58:14
ADV-3	Sentinel injection via parseEmbeddedJson fails safely: string is hydrated but resolution throws ToolChainResolutionError for non-existent IDs.	2:00:21
ADV-4	Large tool results handled gracefully: 500KB succeeds, 5MB returns error 'Output size exceeded limit of 1048576 bytes' with no crashes.	2:11:26
ADV-5	Concurrent tool chaining: 3 simultaneous requests all succeeded with unique session IDs, correct results, zero errors.	2:12:48

❌ Failed (3)

Test Case	Summary	Timestamp
ROUTE-2	$artifact + $tool sentinel resolution fails when sentinel is passed as stringified JSON rather than proper object.	31:56
ROUTE-3	Manage API does not return compact artifact references in conversation history; replacement only applied internally for LLM context.	52:19
EDGE-3	Schema validation after resolution is dead code for function tools; checks parameters?.safeParse but schema is in inputSchema.	1:31:48

Agent chat with artifact passing via $artifact + $tool sentinel – Failed

Where: Chat interface / tool argument resolution during agent execution
Steps to reproduce:
1. Configure an agent with function tools that support artifact creation (e.g., fetch-page, extract-text)
2. Send a message that causes the agent to: call fetch-page, create an artifact from the result, then pass that artifact to extract-text using {$artifact: 'artifact-id', $tool: 'tool-call-id'} sentinel
3. Observe the tool execution and error handling
What failed: The agent correctly generated the $artifact + $tool sentinel pattern in tool arguments, but resolution failed with Error status. The sentinel was passed as a stringified JSON value (e.g., "{\"$artifact\": \"example-domain-report\", \"$tool\": \"toolu_...\"}") rather than a proper parsed object, causing the resolver to fail. The agent recovered using a $tool-only sentinel fallback.
Code analysis: Examined ArtifactParser.ts resolveArgs method (lines 230-284) and Agent.ts tool execution wrapper (lines 660-682). The resolution logic at ArtifactParser.ts:232-251 correctly checks for sentinel keys (SENTINEL_KEY.ARTIFACT and SENTINEL_KEY.TOOL) but requires them to be object properties, not a stringified JSON string. The parseEmbeddedJson call at Agent.ts:664 should parse embedded JSON strings, but may not be handling all cases where the LLM outputs the sentinel as a string.

Relevant code:

agents-api/src/domains/run/services/ArtifactParser.ts (lines 230-251)

async resolveArgs(args: any): Promise<any> {
  if (args !== null && typeof args === 'object' && !Array.isArray(args)) {
    if (
      typeof args[SENTINEL_KEY.ARTIFACT] === 'string' &&
      typeof args[SENTINEL_KEY.TOOL] === 'string'
    ) {
      const fullData = await this.artifactService.getArtifactFull(
        args[SENTINEL_KEY.ARTIFACT],
        args[SENTINEL_KEY.TOOL]
      );
      if (fullData?.data) {
        logger.debug(
          { artifactId: args[SENTINEL_KEY.ARTIFACT], toolCallId: args[SENTINEL_KEY.TOOL] },
          'Resolved artifact ref in tool arg'
        );
        return fullData.data;
      }
      throw new ToolChainResolutionError(
        args[SENTINEL_KEY.TOOL],
        `Artifact '${args[SENTINEL_KEY.ARTIFACT]}' from tool call '${args[SENTINEL_KEY.TOOL]}' could not be resolved`
      );
    }

agents-api/src/domains/run/agents/Agent.ts (lines 664-667)

const parsedArgsForResolution = artifactParser ? parseEmbeddedJson(args) : args;
const resolvedArgs = artifactParser
  ? await artifactParser.resolveArgs(parsedArgsForResolution)
  : args;

Why this is likely a bug: The $artifact + $tool sentinel pattern is a documented feature of this PR, but when the LLM outputs the sentinel as a stringified JSON value in a string property (e.g., {html: '{"$artifact":"...","$tool":"..."}'}), the parseEmbeddedJson function does not recursively parse nested string properties, leaving the sentinel as a string instead of an object. The resolveArgs function then cannot match the sentinel pattern.
Introduced by this PR: Yes – this PR introduced the $artifact + $tool sentinel resolution feature in ArtifactParser.ts and the parseEmbeddedJson integration in Agent.ts.
Timestamp: 31:56

Conversation history shows compact artifact references – Failed

Where: Manage API endpoint for conversation history retrieval
Steps to reproduce:
1. Create a conversation that includes tool calls resulting in artifact creation
2. Send a follow-up message to ensure the conversation history is populated
3. Call the manage API to retrieve conversation history: GET /manage/api/conversations/{id}
4. Inspect the returned tool-result messages
What failed: The manage API endpoint returns raw tool output in tool-result messages instead of compact [Artifact: "name" (id: artifact-id) | args: ... | summary: ...] references. The compact artifact replacement logic exists in getConversationHistoryWithCompression (lines 464-520) and works correctly per unit tests (3/3 pass), but this function is only invoked internally during agent LLM context preparation, not in the manage API response.
Code analysis: The getConversationHistoryWithCompression function in conversations.ts (lines 407-520) implements artifact replacement for tool-result messages. It queries getLedgerArtifacts to find matching artifacts by toolCallId, then replaces the message content with a compact reference. However, the manage API routes appear to use a different code path that returns raw conversation data without this transformation.

Relevant code:

agents-api/src/domains/run/data/conversations.ts (lines 464-511)

// Replace tool-result content with compact artifact references BEFORE compression.
const toolCallIds = messagesToFormat
  .filter((msg) => msg.messageType === 'tool-result')
  .map((msg) => msg.metadata?.a2a_metadata?.toolCallId)
  .filter((id): id is string => !!id);

if (toolCallIds.length > 0) {
  try {
    const artifacts = await getLedgerArtifacts(runDbClient)({
      scopes: { tenantId, projectId },
      toolCallIds,
    });
    const artifactsByToolCallId = new Map(
      artifacts.filter((a) => a.toolCallId).map((a) => [a.toolCallId as string, a])
    );
    if (artifactsByToolCallId.size > 0) {
      messagesToFormat = messagesToFormat.map((msg) => {
        // ... replacement logic
        return {
          ...msg,
          content: { text: `[${refParts.join(' | ')}]` },
        };
      });
    }
  }
}

Why this is likely a bug: The PR description mentions "conversation history artifact replacement" and the test plan explicitly tests for compact [Artifact: ...] references in the API response. The implementation exists and unit tests pass, but the manage API endpoint does not invoke this code path. This is likely a missing integration between the internal compression function and the external API.
Introduced by this PR: Yes – this PR added the artifact replacement logic in getConversationHistoryWithCompression and the test file conversations.artifact-replacement.test.ts, but did not integrate it with the manage API routes.
Timestamp: 52:19

Schema validation failure after argument resolution – Failed

Where: Tool execution in Agent.ts during argument resolution
Steps to reproduce:
1. Configure an agent with two tools where the first returns an object and the second expects a string (e.g., returns-object returns {key: 'value'}, extract-text expects {html: string})
2. Send a message that causes the agent to chain the object result to the string-expecting tool via {$tool: 'tool-call-id'} sentinel
3. Observe whether schema validation catches the type mismatch or the tool executes with invalid args
What failed: Schema validation after resolution did NOT catch the type mismatch. When returns-object (returns object) was chained to extract-text (expects html: string), the tool executed with invalid args and failed at runtime with 'html.replace is not a function' instead of being caught by the expected schema validation error 'Resolved tool args failed schema validation for extract-text'. The validation code at Agent.ts:669 checks toolDefinition.parameters?.safeParse but AI SDK stores the schema under inputSchema, making this validation code dead code for function tools.
Code analysis: The validation logic at Agent.ts:669-679 checks if toolDefinition.parameters?.safeParse exists, but function tools created via AI SDK's tool() helper store their Zod schema under inputSchema, not parameters. The parameters property is undefined for function tools, so the schema validation block is never executed.

Relevant code:

agents-api/src/domains/run/agents/Agent.ts (lines 669-679)

if (artifactParser && toolDefinition.parameters?.safeParse) {
  const resolvedChanged =
    JSON.stringify(parsedArgsForResolution) !== JSON.stringify(resolvedArgs);
  if (resolvedChanged) {
    const validation = toolDefinition.parameters.safeParse(resolvedArgs);
    if (!validation.success) {
      throw new Error(
        `Resolved tool args failed schema validation for '${toolName}': ${validation.error.message}`
      );
    }
  }
}

agents-api/src/domains/run/agents/Agent.ts (line 2023)

inputSchema: (tool as any).inputSchema || (tool as any).parameters || {},

Why this is likely a bug: The PR introduced this validation code to catch schema mismatches after sentinel resolution, but it checks the wrong property. AI SDK function tools use inputSchema (which may be a Zod schema or JSON Schema), not parameters. The check should be toolDefinition.inputSchema?.safeParse or should convert the inputSchema to a Zod schema for validation.
Introduced by this PR: Yes – the validation code at lines 669-679 was added in this PR as part of the tool chaining feature (49 lines added to Agent.ts).
Timestamp: 1:31:48

📋 View Recording

Screen Recording

* allowed artifacts as tool arguments * upadted prompt * Added Artifact Components as Tool Arguments * fixed linter * added test * updated * updated * style: auto-format with biome * testing some tool result chaining * testing some tool result chaining * updated artifacts and prompting * updated agents-docs * updated changeset * fixed tests * docs: clarify reconstructMessageText JSDoc * address review: remove any types and excess comment * refactor: rename typeSchemaMap to artifactSchemasByType * refactor: improve collectProjectArtifactComponents clarity * style: auto-format with biome * fix: correct type cast in collectProjectArtifactComponents * refactor: extract artifact syntax constants and improve sentinel check clarity * test: add regression tests for cache key fix and getArtifactFull fallback chain * updated tests * fix: restore getToolChainingGuidance template literal corrupted by formatter * fix(agents-api): Replace oversized tool results with artifact refs before compression (#2427) * identified compressor bugs and fixed them * updated changeset * updated agents api * updated * fix: use summary-only fallback for rehydrated artifact data * revert: restore full data fallback for rehydrated artifact summaryData * fix: add span attribute on artifact lookup failure * style: auto-format with biome * fix(agents-api): improve artifact lookup failure observability Add toolCallCount to BaseCompressor warn log for artifact batch lookup failures. Add span attribute artifact_lookup.failed in conversations.ts for tracing. * removed bad code --------- Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> * keep changeset * Add delete button to external agents edit page (#2441) * updated agents api --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: pullfrog[bot] <226033991+pullfrog[bot]@users.noreply.github.com> Co-authored-by: sarah <[email protected]>

tim-inkeep and others added 15 commits February 19, 2026 17:35

allowed artifacts as tool arguments

dba3b78

upadted prompt

63407ab

Added Artifact Components as Tool Arguments

14f133c

fixed linter

9379049

added test

f8c082f

Merge branch 'main' into feature/artifacts-as-arguments

ae38f8d

updated

2d75809

Merge branch 'main' into feature/artifacts-as-arguments

34bf2c8

updated

150207d

style: auto-format with biome

0ef6af3

testing some tool result chaining

1dcbf6f

testing some tool result chaining

d267468

Merge branch 'main' into feature/tool-result-chaining

2f25fcc

updated artifacts and prompting

466dfb6

updated agents-docs

20d420a

vercel Bot deployed to Preview – agents-docs February 24, 2026 17:05 View deployment

updated changeset

a783c50

claude Bot reviewed Feb 24, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Feb 24, 2026

fixed tests

2b2a385

vercel Bot deployed to Preview – agents-manage-ui February 24, 2026 18:27 View deployment

vercel Bot deployed to Preview – agents-api February 24, 2026 18:28 View deployment

vercel Bot deployed to Preview – agents-docs February 24, 2026 18:28 View deployment

claude Bot reviewed Feb 24, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Feb 24, 2026

tim-inkeep added 3 commits February 24, 2026 13:36

Merge branch 'main' into feature/artifacts-as-arguments

2293898

Merge branch 'feature/artifacts-as-arguments' into feature/tool-resul…

d1e9fec

…t-chaining

Merge branch 'main' into feature/artifacts-as-arguments

8295b87

claude Bot reviewed Feb 27, 2026

View reviewed changes

Comment thread agents-api/src/domains/run/agents/versions/v1/PromptConfig.ts Outdated

github-actions Bot deleted a comment from claude Bot Feb 27, 2026

updated tests

7e14931

vercel Bot temporarily deployed to Preview – agents-docs February 27, 2026 21:47 Inactive

vercel Bot deployed to Preview – agents-api February 27, 2026 21:49 View deployment

claude Bot reviewed Feb 27, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Feb 27, 2026

amikofalvy requested changes Feb 27, 2026

View reviewed changes

fix: restore getToolChainingGuidance template literal corrupted by fo…

8dbb79f

…rmatter

vercel Bot temporarily deployed to Preview – agents-docs February 27, 2026 22:38 Inactive

tim-inkeep and others added 5 commits February 27, 2026 17:39

keep changeset

df65574

Add delete button to external agents edit page (#2441)

0fee1b9

updated agents api

c4052fc

keep changeset

60934c4

vercel Bot temporarily deployed to Preview – agents-docs February 27, 2026 22:41 Inactive

vercel Bot deployed to Preview – agents-api February 27, 2026 22:43 View deployment

vercel Bot deployed to Preview – agents-manage-ui February 27, 2026 22:43 View deployment

claude Bot reviewed Feb 27, 2026

View reviewed changes

github-actions Bot deleted a comment from claude Bot Feb 27, 2026

pullfrog Bot approved these changes Feb 27, 2026

View reviewed changes

amikofalvy approved these changes Feb 27, 2026

View reviewed changes

tim-inkeep merged commit 00c21ec into main Feb 27, 2026
19 checks passed

tim-inkeep deleted the feature/tool-result-chaining branch February 27, 2026 23:21

claude Bot mentioned this pull request Feb 27, 2026

Add MCP server instructions to system prompt and surface server defaults in UI #2370

Merged

	Updated artifact parsing
	Add tool result chaining to pipe raw tool outputs directly into subsequent tool calls. Improve artifact prompting to clarify which fields are captured, displayed, and passed to tools.

Conversation

tim-inkeep commented Feb 24, 2026

Uh oh!

vercel Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

changeset-bot Bot commented Feb 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🦋 Changeset detected

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

🟠⚠️ Major (4) 🟠⚠️

🟡 Minor (3) 🟡

💭 Consider (3) 💭

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude Bot Feb 24, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

✅ Resolved in This Delta

📝 Delta Changes Reviewed

🕐 Pending Recommendations (4)

💡 APPROVE WITH SUGGESTIONS

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

🔴❗ Critical (1) ❗🔴

📝 Delta Changes Assessed

Root Cause Analysis

🗣️ Open Discussion

🚫 REQUEST CHANGES

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

📝 Delta Changes Assessed

🕐 Pending Recommendations (1)

🗣️ Open Discussion

🚫 REQUEST CHANGES

Uh oh!

amikofalvy left a comment

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot commented Feb 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

PR Review Summary

Delta Review Scope

Delta Changes Analyzed

Assessment

✅ APPROVE

Uh oh!

tim-inkeep commented Feb 27, 2026

Uh oh!

pullfrog Bot left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Full PR Re-Review Summary

✅ All Prior Issues Resolved

Implementation Assessment

🟡 Minor (2)

💭 Consider (3)

✅ APPROVE

Uh oh!

pullfrog Bot Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

pullfrog Bot Feb 27, 2026

Choose a reason for hiding this comment

Uh oh!

vercel Bot commented Feb 24, 2026 •

edited

Loading

changeset-bot Bot commented Feb 24, 2026 •

edited

Loading

pullfrog Bot commented Feb 27, 2026 •

edited

Loading

pullfrog Bot left a comment •

edited

Loading