
feat: reactive mid-generation compression + artifact exclusion #3133

Open
tim-inkeep wants to merge 10 commits into main from
feat/reactive-compression-artifact-exclusion

Conversation

@tim-inkeep
Contributor

Summary

Implementation of spec #3129 (specs/2026-04-14-reactive-compression/).

  • Reactive mid-gen compression via wrapLanguageModel middleware. Compresses and retries only when the provider actually returns a context-overflow error. Preserves the AI SDK multi-step loop.
  • Oversized tool-output exclusion at tool-wrapper.ts's toModelOutput seam. Raw outputs exceeding 30% of the context window are replaced with a structured error-shaped stub matching default-tools.ts's retrieval_blocked shape — the LLM sees a tool failure it can react to instead of bloated content.
  • Removed proactive compression from prepareStep callback.
  • Removed _oversizedWarning injection from BaseCompressor and ArtifactService; metadata.isOversized remains the durable signal.

Stories delivered

Story Commit
US-001 Provider-aware context-overflow detector 9f25d9c57
US-002 compressionRetryMiddleware (wrapGenerate + wrapStream) 555620fc2
US-003 Wire middleware at generate call sites; remove proactive compression c12970714
US-004 Oversized exclusion at tool-wrapper.ts:111 toModelOutput 4ec8c2131
US-005 Remove _oversizedWarning injection 83ae43a4e
US-006 Changeset 6bcd3bf2d

Architecture highlights

  • compressionRetryMiddleware is per-request (factory closes over run context — DEC-13), so compression scope matches run scope.
  • doGenerate / doStream are nullary in @ai-sdk/[email protected] — retry uses options.model.doGenerate(modifiedOptions) directly. Verified middleware contract is identical across 3.0.2 and 3.0.4.
  • Stream peek uses async-iterator .next() + a prepend-then-pipe generator (see the sketch after this list). No tee(), no array buffering (DEC-07). TTFT on the happy path is within microseconds of an unwrapped model.
  • Post-commit mid-stream errors propagate without retry (R7) — safe because tool state may already be captured and text may already have been forwarded.
  • Pre-generation conversation-history compression is unchanged (acts as safety net).
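
A minimal sketch of the peek-and-prepend idea, with illustrative names (the real middleware wires this into wrapStream, classifies the peeked chunk with the overflow detector, and converts the generator back into a ReadableStream):

async function peekThenPipe<T>(source: AsyncIterator<T>): Promise<{ first: IteratorResult<T>; rest: () => AsyncGenerator<T> }> {
  // Single .next() call: no tee(), no buffering of the whole stream.
  const first = await source.next();
  // Prepend the already-read chunk, then pipe the untouched remainder through.
  async function* rest(): AsyncGenerator<T> {
    if (!first.done) yield first.value;
    while (true) {
      const { value, done } = await source.next();
      if (done) return;
      yield value;
    }
  }
  return { first, rest };
}

On the happy path only this one-chunk peek happens before piping, which is why TTFT stays essentially unchanged.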

Telemetry added

Span attributes: compression.trigger, compression.provider, compression.detector, compression.retry_number, compression.outcome, anthropic_overflow_regex_hit, tool.result.oversized_excluded.
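
A rough sketch of how these might be recorded with the OpenTelemetry JS API. The attribute keys are the ones listed above; the helper body and example values below are illustrative, not the actual implementation:

import { trace } from '@opentelemetry/api';

// Illustrative helper: attach compression-retry attributes to the active span.
function setRetryTelemetry(provider: string, detector: string, retryNumber: number, outcome: string) {
  trace.getActiveSpan()?.setAttributes({
    'compression.trigger': 'provider_overflow', // example value, not necessarily the real one
    'compression.provider': provider,
    'compression.detector': detector,
    'compression.retry_number': retryNumber,
    'compression.outcome': outcome,
  });
}

// e.g. setRetryTelemetry('anthropic', 'regex', 1, 'retry_succeeded');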

Changeset

`@inkeep/agents-api` patch: "Make mid-generation compression reactive to provider overflow errors and exclude oversized tool outputs from LLM context"

Test plan

  • pnpm typecheck passes
  • pnpm lint passes
  • pnpm --filter @inkeep/agents-api test --run passes (includes new tests)
  • Review: compressionRetryMiddleware.ts + detectContextOverflow.ts (new code paths)
  • Review: tool-wrapper.ts:111 oversized-exclusion wiring
  • Review: removed token-budget branch in ai-sdk-callbacks.ts and compression.ts helper
  • Manual verification (recommended): run an agent with a tool that returns an oversized payload; confirm via Jaeger that tool.result.oversized_excluded = true appears and no content bytes reach the LLM.

Notes

🤖 Generated with Claude Code

tim-inkeep and others added 8 commits April 14, 2026 13:43
Ready-for-implementation spec for refactoring agents-api compression triggers:
- Mid-gen compression becomes reactive (via AI SDK wrapLanguageModel middleware
  that catches provider-signaled overflow and retries once with compressed input)
- Oversized tool outputs excluded at tool-wrapper.ts toModelOutput seam with
  structured error-shaped tool result (matching default-tools.ts retrieval_blocked)
- Pre-generation conversation-history compression unchanged (safety net)

Includes four evidence files (compression triggers, oversized artifact handling,
provider overflow signals, middleware approach), audit + challenger findings,
and audit-resolution changelog. 14 locked/directed decisions at HIGH confidence.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Correct @ai-sdk/provider version citation: 3.0.4 → 3.0.2 (verified
  middleware contract identical across both via dist/index.d.ts diff)
- Fix formatOversizedReason → formatOversizedRetrievalReason (actual
  function exported from artifact-utils.ts:50)
- Clarify R8 telemetry: all attributes are span attributes, not log fields
- Remove local filesystem path from §17 References
- Add End-User visibility gap to Risks (MEDIUM) with operator mitigation
- Add Traces UI surfacing, enriched stub metadata, and retrieval-tool
  migration note to Future Work
- Add docs-check step to Verification Plan

No design changes. All review findings addressed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…roactive compression

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rtifactService

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@changeset-bot

changeset-bot Bot commented Apr 14, 2026

🦋 Changeset detected

Latest commit: 10e0d12

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 10 packages
Name Type
@inkeep/agents-api Patch
@inkeep/agents-manage-ui Patch
@inkeep/agents-cli Patch
@inkeep/agents-core Patch
@inkeep/agents-email Patch
@inkeep/agents-mcp Patch
@inkeep/agents-sdk Patch
@inkeep/agents-work-apps Patch
@inkeep/ai-sdk-provider Patch
@inkeep/create-agents Patch



@vercel

vercel Bot commented Apr 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
agents-api Ready Ready Preview, Comment Apr 15, 2026 2:09pm
agents-manage-ui Ready Ready Preview, Comment Apr 15, 2026 2:09pm
1 Skipped Deployment
Project Deployment Actions Updated (UTC)
agents-docs Skipped Skipped Apr 15, 2026 2:09pm


@github-actions
Contributor

I could not sync this public PR into agents-private automatically.

Patch application failed. The diff could not be applied cleanly. Please rebase your PR on the latest main.

@pullfrog
Contributor

pullfrog Bot commented Apr 14, 2026

Summary

This PR fundamentally changes how mid-generation context compression is triggered: instead of proactively compressing based on token-budget predictions (which often fire unnecessarily), compression now only happens reactively when the LLM provider actually returns a context-overflow error. It also adds a new seam to exclude oversized tool outputs from reaching the LLM context at all, replacing them with structured error stubs.

Key changes

  • New compressionRetryMiddleware wraps the language model via AI SDK's wrapLanguageModel, intercepting overflow errors and retrying with compressed prompts
  • New detectContextOverflow module identifies provider-specific overflow errors (OpenAI context_length_exceeded code, Anthropic regex patterns)
  • Gutted the token-budget prediction branch from handlePrepareStepCompression — it now always returns {}
  • Moved compression logic from the prepareStep callback into a standalone buildCompressPrompt function reused by the middleware
  • tool-wrapper.ts's toModelOutput hook now checks for oversized tool outputs before they reach the LLM, substituting a structured error stub
  • Removed _oversizedWarning field injection from BaseCompressor and ArtifactService (metadata isOversized flag remains)

Reactive compression middleware

Before: The prepareStep AI SDK callback checked token counts after each step — using actual usage from the previous step or falling back to estimates — and preemptively compressed if the context appeared to exceed a per-model threshold (75-91%). This fired on many calls that would have succeeded, causing unnecessary LLM summarization calls, context detail loss, and injected synthetic messages.

After: A new compressionRetryMiddleware.ts implements LanguageModelMiddleware with wrapGenerate and wrapStream hooks. For non-streaming: it catches overflow errors from doGenerate(), compresses the prompt via buildCompressPrompt, and retries once. For streaming: it peeks the first chunk using async-iterator .next() — if it's an overflow error ("pre-commit"), it compresses and retries; if it's real data, it prepends the buffered chunk and pipes the rest through. A second overflow propagates without further retry. The middleware is constructed per-request, closing over the run-scoped compressor. The middleware includes a version comment noting verification against @ai-sdk/[email protected] and 3.0.4 — a reminder to re-verify if the SDK bumps to v4.
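
A rough sketch of the non-streaming half described above. The function and module names match the ones discussed in this PR, but the option shapes and body are assumptions rather than the actual implementation (which also implements wrapStream, sets specificationVersion, and emits telemetry):

import { isContextOverflowError } from './detectContextOverflow'; // import path assumed

export function createCompressionRetryMiddleware(ctx: {
  compressPrompt: (prompt: unknown[]) => Promise<unknown[]>;
}) {
  return {
    async wrapGenerate({
      doGenerate,
      params,
      model,
    }: {
      doGenerate: () => Promise<unknown>;
      params: { prompt: unknown[] } & Record<string, unknown>;
      model: { doGenerate: (options: Record<string, unknown>) => Promise<unknown> };
    }) {
      try {
        return await doGenerate();
      } catch (err) {
        // Non-overflow errors pass through untouched.
        if (!isContextOverflowError(err)) throw err;
        const compressedPrompt = await ctx.compressPrompt(params.prompt);
        // doGenerate() is nullary in this SDK version, so the single retry goes through
        // model.doGenerate with modified options; a second overflow propagates to the caller.
        return await model.doGenerate({ ...params, prompt: compressedPrompt });
      }
    },
  };
}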

Context overflow detection

New detectContextOverflow.ts provides isContextOverflowError(err) which discriminates between real context overflow and other 400 errors. It checks for OpenAI's stable context_length_exceeded error code via data.error.code, and falls back to regex patterns for Anthropic messages (prompt is too long, input length and max_tokens exceed context limit). Explicitly excludes 413 (byte-limit) and non-400 status codes. Includes OTEL span attributes for monitoring regex-based detections.
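
A sketch of that discrimination logic. The error-object field names (statusCode, data.error.code, data.error.message) are assumptions based on the description above, and the regex wording follows the messages quoted in this PR; the real module also records the OTEL attributes mentioned here when a regex fires:

// Anthropic has no stable overflow code, so message patterns are the fallback.
const ANTHROPIC_OVERFLOW_PATTERNS = [
  /prompt is too long/i,
  /input length and max_tokens exceed context limit/i,
];

export function isContextOverflowError(err: unknown): boolean {
  if (typeof err !== 'object' || err === null) return false;
  const e = err as {
    statusCode?: number;
    message?: string;
    data?: { error?: { code?: string; message?: string } };
  };

  // Only 400s count; 413 is a byte limit, not a token overflow, and other statuses are excluded.
  if (e.statusCode !== 400) return false;

  // OpenAI exposes a stable machine-readable code.
  if (e.data?.error?.code === 'context_length_exceeded') return true;

  // Otherwise fall back to the Anthropic message patterns.
  const message = e.data?.error?.message ?? e.message ?? '';
  return ANTHROPIC_OVERFLOW_PATTERNS.some((re) => re.test(message));
}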

Oversized tool output exclusion at the source

Before: tool-wrapper.ts's toModelOutput unconditionally called buildToolResultForModelInput(output), sending raw tool results — even gigantic ones — into the LLM context. Oversized detection only affected downstream compression/artifact storage.

After: toModelOutput now calls detectOversizedArtifact against the model's context window size. If the output exceeds the threshold, it returns a structured JSON stub with status: 'oversized', the tool name, args, structure info, and a recommendation to narrow queries. This matches the existing retrieval_blocked shape from default-tools.ts so the LLM interprets it as a tool-level failure. OTEL attributes are set for observability. The context window is now resolved lazily using ctx.currentModelSettings (stashed after configureModelSettings runs), fixing a bug where getModelContextWindow() was called without args and always returned the 120K default — making the 30% oversized threshold hardcoded at ~36K regardless of the actual model.
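
The decision flow, roughly. The helper names are the ones referenced in this PR (detectOversizedArtifact, formatOversizedRetrievalReason, buildToolResultForModelInput), but the wrapper function, its signature, and the stub fields shown here are a simplified illustration of tool-wrapper.ts rather than the actual code:

interface OversizedDetection {
  isOversized: boolean;
  originalTokenSize: number;
  structureInfo?: string;
}

function buildModelOutputForTool(opts: {
  output: unknown;
  toolName: string;
  contextWindowSize: number; // resolved lazily from ctx.currentModelSettings, not the 120K fallback
  detectOversizedArtifact: (output: unknown, contextWindowSize: number) => OversizedDetection;
  formatOversizedRetrievalReason: (tokens: number, contextWindow: number) => string;
  buildToolResultForModelInput: (output: unknown) => unknown;
}) {
  const detection = opts.detectOversizedArtifact(opts.output, opts.contextWindowSize);
  if (!detection.isOversized) {
    // Normal-sized outputs keep the existing pass-through path.
    return opts.buildToolResultForModelInput(opts.output);
  }
  // Oversized outputs never reach the LLM: return an error-shaped stub instead,
  // mirroring the retrieval_blocked shape so the model treats it as a tool failure.
  return {
    type: 'json' as const,
    value: {
      status: 'oversized',
      toolName: opts.toolName,
      reason: opts.formatOversizedRetrievalReason(detection.originalTokenSize, opts.contextWindowSize),
      structureInfo: detection.structureInfo ?? '',
      recommendation: 'Narrow filters/queries or break the request into smaller pieces on the next call.',
    },
  };
}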

Compression logic extraction

The summary-building logic (artifact references, stop instructions, compression cycle tracking) that previously lived inline in handlePrepareStepCompression was extracted into buildCompressPrompt() in compression.ts. This function now accepts an originalMessageCount parameter and preserves the original message prefix (system + user message + pre-generation conversation history) when compressing — only the generated tail (tool calls + assistant responses accumulated during the current run) is summarized. This matches the pre-middleware behavior where originalMessages = stepMessages.slice(0, originalMessageCount) and the summary was appended after the preserved prefix.
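
A minimal sketch of the prefix-preserving split, with a simplified message type and the summary construction elided (the real buildCompressPrompt also handles artifact references and stop instructions):

type ChatMessage = { role: 'system' | 'user' | 'assistant' | 'tool'; content: unknown };

function compressMessages(
  messages: ChatMessage[],
  originalMessageCount: number,
  summarizeTail: (tail: ChatMessage[]) => ChatMessage // summary injected as a single message
): ChatMessage[] {
  // Preserve the system prompt, original user message, and pre-generation history as-is.
  const preservedPrefix = messages.slice(0, originalMessageCount);
  // Only the generated tail (tool calls + assistant responses from the current run) is summarized.
  const generatedTail = messages.slice(originalMessageCount);
  return [...preservedPrefix, summarizeTail(generatedTail)];
}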

Removal of _oversizedWarning field

The _oversizedWarning string field was removed from summaryData injection in both BaseCompressor.ts and ArtifactService.ts. Since oversized tool outputs are now excluded at the tool-wrapper seam before reaching the LLM, the warning in compression/artifact metadata is redundant. The metadata.isOversized boolean and _structureInfo field remain as the durable signals.

Async-iterator fallback fix

The toAsyncIterator helper in compressionRetryMiddleware.ts previously used an unsound as unknown as AsyncIterator cast on ReadableStream. This has been replaced with a proper Reader-to-iterator adapter that wraps reader.read() into .next() calls — safe if the fallback branch is ever triggered on a runtime without Symbol.asyncIterator on ReadableStream.

Testing

  • New compressionRetryMiddleware.test.ts (299 lines): Tests wrapGenerate passthrough, overflow retry success, double-overflow propagation, non-overflow error passthrough. Tests wrapStream with data passthrough, pre-commit overflow retry, pre-commit non-overflow error throw, and post-commit mid-stream error propagation. Includes a code-review constraint test verifying no .tee() or Array.from usage on streams.
  • New detectContextOverflow.test.ts (114 lines): Comprehensive coverage of OpenAI code-based detection, both Anthropic regex patterns, 413 exclusion, generic 400 without overflow indicators, non-API errors, null/undefined/string inputs, 500 status with overflow message, and wrong data shapes.
  • New tool-wrapper-oversized.test.ts (152 lines): Tests oversized exclusion stub structure for large outputs, passthrough for normal-sized outputs, and stub payload contents.
  • Updated ai-sdk-callbacks.test.ts: Gutted ~220 lines of token-budget prediction tests, replaced with 2 tests confirming the function is now a no-op.
  • Updated BaseCompressor.test.ts and OversizedArtifacts.test.ts: assertions now expect _oversizedWarning to be undefined.

Spec & documentation

The PR includes a full spec at specs/2026-04-14-reactive-compression/SPEC.md covering the problem statement (SCR format), goals/non-goals, personas, current vs target state, architecture diagram, and detailed design. Supporting evidence documents cover current compression triggers, middleware approach rationale, oversized artifact handling investigation, and provider overflow signal formats. Meta documents include a changelog, audit findings, and design challenge writeup.


Contributor

@pullfrog pullfrog Bot left a comment


Well-structured PR. The reactive compression approach (detect overflow from provider error, compress, retry once) is cleaner than the proactive token-budget check it replaces. The stream peek strategy is sound — single .next() call, no tee(), no buffering. The toModelOutput oversized exclusion wiring is the right seam for filtering before the LLM sees bloated content. Two issues worth addressing: the toAsyncIterator fallback has a broken cast, and the context window used for oversized detection is always the 120K default rather than the actual model's window.


if (Symbol.asyncIterator in stream) {
  return (stream as AsyncIterable<StreamPart>)[Symbol.asyncIterator]();
}
return (stream as ReadableStream<StreamPart>).getReader() as unknown as AsyncIterator<StreamPart>;
Contributor


ReadableStreamDefaultReader does not have .next() — it has .read(). This cast to AsyncIterator is unsound and would throw at runtime if the branch were reached.

In practice, modern Node.js ReadableStream has Symbol.asyncIterator, so the if branch on line 33 always triggers and this fallback is dead code. But the cast is still incorrect — either remove the fallback entirely, or wrap the reader in a proper async-iterator adapter:

const reader = (stream as ReadableStream<StreamPart>).getReader();
return {
  next: () => reader.read() as Promise<IteratorResult<StreamPart>>,
};

const relationshipId = getRelationshipIdForTool(ctx, toolName, toolType);

const originalExecute = toolDefinition.execute;
const contextWindowSize = getModelContextWindow().contextWindow ?? 120000;
Contributor


getModelContextWindow() without arguments always returns the 120K default (source: fallback). This means the 30% oversized threshold is hardcoded at ~36K tokens regardless of the actual model. An agent using Claude with a 200K context window would still exclude results above ~36K tokens instead of ~60K.

Consider passing model settings from ctx.config or accepting contextWindowSize as a parameter to wrapToolWithStreaming so the threshold matches the generation model.

}
return {
  type: 'json' as const,
  value: JSON.parse(
Contributor


The JSON.parse(JSON.stringify({...})) round-trip is a no-op on this plain object literal — every value is already JSON-safe (strings, numbers, plain objects). You can return the object directly as value.

Suggested change
-  value: JSON.parse(
+  value: {
+    status: 'oversized',
+    toolCallId: lastToolCallId,
+    toolName,
+    warning:
+      '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
+    reason: formatOversizedRetrievalReason(
+      detection.originalTokenSize,
+      detection.contextWindowSize ?? contextWindowSize
+    ),
+    toolInfo: {
+      toolName,
+      toolArgs: lastArgs,
+      structureInfo: detection.structureInfo ?? '',
+    },
+    recommendation:
+      'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
+  },

}

const summaryMessage = JSON.stringify(summaryData);
const systemMessages = messages.filter((m: any) => m.role === 'system');
Contributor


The old handlePrepareStepCompression kept both system messages and original conversation messages (stepMessages.slice(0, originalMessageCount)). This new buildCompressPrompt only keeps role === 'system' messages. That means the original user query and any initial conversation history are dropped — the LLM only sees the compression summary.

Is this intentional? If the original user message carries important framing that the summary might not perfectly capture, the LLM could lose track of the original ask after compression.

  ctx: CompressionRetryContext
): LanguageModelMiddleware {
  return {
    specificationVersion: 'v3',
Contributor


The specificationVersion: 'v3' field may tie this to a specific AI SDK middleware contract version. If the SDK upgrades the spec version (e.g. to v4), this middleware might silently stop being called. Consider adding a comment noting which @ai-sdk/provider versions this was verified against (you mention 3.0.2–3.0.4 in the PR description) so future maintainers know to re-check.

…nal messages

- Context window (pullfrog #2, load-bearing): getModelContextWindow() was
  called without args and always returned the 120K default, so the 30%
  oversized threshold was hardcoded at ~36K regardless of the actual
  model. Added currentModelSettings to AgentRunContext, stashed after
  configureModelSettings, and read lazily inside toModelOutput.
- Compression prompt (pullfrog #4, load-bearing): buildCompressPrompt
  only kept role==='system' messages, dropping the original user query
  and conversation-history prefix. Now takes originalMessageCount and
  preserves messages.slice(0, originalMessageCount) as the prefix —
  matching the pre-middleware handlePrepareStepCompression behavior.
- Async-iterator fallback (pullfrog #1): replaced the unsound
  `as unknown as AsyncIterator` cast with a proper Reader → iterator
  adapter so the dead branch is safe if ever triggered.
- Middleware spec-version comment (pullfrog #5): documented which
  @ai-sdk/provider versions the wrapGenerate/wrapStream contract was
  verified against.
- JSON round-trip (pullfrog #3): kept as-is. The round-trip is not a
  no-op — it launders `unknown` tool args through JSONValue and strips
  non-JSON types. Added a comment explaining this.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Contributor

@claude claude Bot left a comment


PR Review Summary

(4) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

🟠 1) Missing test coverage: compression failure during retry is untested

Issue: The wrapGenerate and wrapStream implementations call ctx.compressPrompt() which can throw (e.g., summarizer model failure, network error), but there is no test verifying the behavior when compression itself fails. The compressPrompt call at lines 156 and 186 is outside any explicit error handling in the middleware.

Why: If compressPrompt throws during a retry attempt, the error propagation behavior is untested. A bug here could cause silent failures or incorrect error messages being surfaced to users. This is a realistic failure mode since compression involves an LLM call. The tests currently only verify success paths and overflow retry scenarios, not compression infrastructure failures.

Fix: Add test:

it('propagates compression errors during retry', async () => {
  const ctx = { 
    compressPrompt: vi.fn().mockRejectedValue(new Error('Summarizer failed')) 
  };
  const middleware = createCompressionRetryMiddleware(ctx);
  const doGenerate = vi.fn().mockRejectedValue(makeOverflowError());
  
  await expect(middleware.wrapGenerate?.({
    doGenerate,
    doStream: vi.fn(),
    params: { prompt: [] } as any,
    model: { doGenerate: vi.fn(), doStream: vi.fn() } as any,
  })).rejects.toThrow('Summarizer failed');
  
  expect(ctx.compressPrompt).toHaveBeenCalledOnce();
});



🟠 2) compression.ts buildCompressPrompt has no dedicated unit tests

Issue: The new buildCompressPrompt function (lines 41-104) transforms the prompt after compression, including critical logic for: (1) handling array vs object summary results, (2) injecting stop instructions based on compression cycle count, (3) mapping related_artifacts to artifact_reference tags, and (4) filtering system messages. These code paths are only tested implicitly through integration.

Why: Bugs in buildCompressPrompt could cause: (1) incorrect prompt structure after compression leading to LLM confusion, (2) stop instructions not being applied correctly causing infinite loops, (3) artifact references being malformed. The function has conditional branches (compressionCycle >= 1, hasNewWork checks) that should be explicitly tested.

Fix: Create unit tests for buildCompressPrompt:

describe('buildCompressPrompt', () => {
  it('injects stop instruction after multiple compression cycles', async () => {
    const compressor = makeCompressor({ 
      getCompressionCycleCount: vi.fn().mockReturnValue(2) 
    });
    const prompt = buildCompressPrompt(compressor);
    const result = await prompt([
      { role: 'system', content: 'sys' }, 
      { role: 'user', content: 'hi' }
    ]);
    expect(result[1].content).toContain('STOP ALL TOOL CALLS');
  });

  it('preserves system messages and removes others', async () => { /* ... */ });
  it('formats artifact references correctly', async () => { /* ... */ });
});



Inline Comments:

  • 🟠 Major: compressionRetryMiddleware.ts:156 Compression failure lacks telemetry
  • 🟡 Minor: tool-wrapper.ts:136-155 Unnecessary JSON.parse(JSON.stringify()) wrapper

🕐 Pending Recommendations (2)

These issues were raised in the prior pullfrog review and are still unresolved:

  • 🟠 compressionRetryMiddleware.ts:36 toAsyncIterator fallback casts ReadableStreamDefaultReader to AsyncIterator, but these interfaces are incompatible. reader.next() will throw TypeError when the stream lacks Symbol.asyncIterator.

  • 🟠 tool-wrapper.ts:114 getModelContextWindow() called without arguments always returns 120K fallback. The oversized detection threshold will be wrong for models with different context windows.


🚫 REQUEST CHANGES

Summary: The reactive compression approach is architecturally sound — detecting overflow from provider errors and retrying once is cleaner than proactive token-budget prediction. The stream peek strategy (single .next(), no tee(), no buffering) is efficient. However, the two issues flagged by pullfrog remain unaddressed (broken toAsyncIterator cast, context window always using 120K fallback), and the new middleware lacks telemetry for compression failures and test coverage for failure paths. The JSON.parse/stringify wrapper on a plain object is also unnecessary overhead.

Discarded (8)
Location | Issue | Reason Discarded
compressionRetryMiddleware.ts:140 | specificationVersion: 'v3' hardcoded | Valid observation but low risk — existing middleware in agents-core uses same pattern; adding a shared constant is a nice-to-have, not blocking
compression.ts:41 | safeCompress result shape handling is fragile | The array vs object handling matches existing compressor contract; spread operator would be a refactor preference
ai-sdk-callbacks.ts:11 | Vestigial handlePrepareStepCompression parameters | Intentional scaffolding per spec — prep for future step-based logic
detectContextOverflow.ts:22 | 413 status exclusion may silently misclassify | 413 is bytes not tokens per HTTP spec; adding trace-level logging is a nitpick
compressionRetryMiddleware.ts:173 | Original stream not explicitly released | The reader is consumed (one chunk read); explicit cancel is a nice-to-have for edge cases
compressionRetryMiddleware.ts:12 | Loose StreamPart type with index signature | Intentional flexibility for AI SDK internal chunk variations
compression.ts:45 | Prompt cast to any[] loses type info | Type guards would be nice but current implementation works; not blocking
compressionRetryMiddleware.test.ts:n/a | Missing test for empty stream (immediate EOF) | Edge case with synthetic finish chunk handling; nice-to-have
Reviewers (6)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 3 0 0 0 1 2 0
pr-review-tests 7 2 0 0 0 0 5
pr-review-errors 5 0 0 0 1 2 2
pr-review-sre 5 0 0 0 0 2 3
pr-review-types 5 0 0 0 0 2 3
pr-review-architecture 5 0 0 0 0 2 3
Total 30 2 0 0 2 2 8

Note: Multiple reviewers flagged the same two issues (toAsyncIterator cast, context window fallback) that were already raised by pullfrog — these were consolidated into Pending Recs rather than duplicated.

'Context overflow detected in doGenerate, compressing and retrying'
);

const compressedPrompt = await ctx.compressPrompt(params.prompt as unknown[]);
Contributor


🟠 MAJOR: Compression failure lacks telemetry

Issue: When ctx.compressPrompt() throws, the error propagates without calling setRetryTelemetry(). The compression.outcome span attribute is only set on success (line 163) or retry error (line 167), not on compression infrastructure failures.

Why: During incidents, operators cannot distinguish compression failures (summarizer timeout, model unavailable) from LLM errors in observability dashboards. This creates a blind spot that extends mean-time-to-resolution. The same pattern applies to the streaming path at line 186.

Fix:

let compressedPrompt: unknown[];
try {
  compressedPrompt = await ctx.compressPrompt(params.prompt as unknown[]);
} catch (compressionErr) {
  setRetryTelemetry(provider, detector, 1, 'compression_failed');
  logger.error(
    { provider, detector, error: compressionErr instanceof Error ? compressionErr.message : String(compressionErr) },
    'Compression failed during overflow retry'
  );
  throw compressionErr;
}


Comment on lines +136 to +155
value: JSON.parse(
  JSON.stringify({
    status: 'oversized',
    toolCallId: lastToolCallId,
    toolName,
    warning:
      '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
    reason: formatOversizedRetrievalReason(
      detection.originalTokenSize,
      detection.contextWindowSize ?? contextWindowSize
    ),
    toolInfo: {
      toolName,
      toolArgs: lastArgs,
      structureInfo: detection.structureInfo ?? '',
    },
    recommendation:
      'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
  })
),
Contributor


🟡 Minor: Unnecessary JSON.parse(JSON.stringify()) wrapper

Issue: The oversized stub is a plain object literal constructed inline with only primitive values and nested plain objects. The JSON.parse(JSON.stringify(...)) round-trip has no effect on such an object.

Why: This pattern is typically used for deep-cloning or stripping non-serializable properties. Here, the object is constructed from scratch with no circular references, functions, or special objects — the serialization round-trip is pure overhead (CPU, memory, potential error surface if any dynamic value is non-serializable).

Fix: (1-click apply)

Suggested change
-  value: JSON.parse(
-    JSON.stringify({
-      status: 'oversized',
-      toolCallId: lastToolCallId,
-      toolName,
-      warning:
-        '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
-      reason: formatOversizedRetrievalReason(
-        detection.originalTokenSize,
-        detection.contextWindowSize ?? contextWindowSize
-      ),
-      toolInfo: {
-        toolName,
-        toolArgs: lastArgs,
-        structureInfo: detection.structureInfo ?? '',
-      },
-      recommendation:
-        'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
-    })
-  ),
+return {
+  type: 'json' as const,
+  value: {
+    status: 'oversized',
+    toolCallId: lastToolCallId,
+    toolName,
+    warning:
+      '⚠️ Tool produced an oversized result that cannot be included in the LLM context. The full result is stored as an artifact and can be retrieved later if needed.',
+    reason: formatOversizedRetrievalReason(
+      detection.originalTokenSize,
+      detection.contextWindowSize ?? contextWindowSize
+    ),
+    toolInfo: {
+      toolName,
+      toolArgs: lastArgs,
+      structureInfo: detection.structureInfo ?? '',
+    },
+    recommendation:
+      'Consider: 1) narrowing filters/queries, 2) requesting specific fields, or 3) breaking into smaller requests.',
+  },
+};


@github-actions github-actions Bot deleted a comment from claude Bot Apr 14, 2026
@itoqa

itoqa Bot commented Apr 14, 2026

Ito Test Report ✅

16 test cases ran. 1 additional finding, 15 passed.

Across 16 test cases, 15 passed and 1 failed, showing generally stable local non-production behavior for baseline and edge chat flows (streaming/non-streaming completions, rapid same-conversation requests, mid-stream reload resume, and mobile 390x844), overflow handling (bounded recovery with a one-retry cap and no retries for non-overflow errors), oversized tool-output safeguards, and authorization/injection probes. The key finding was one High-severity validation gap in /run/api/chat: malformed approval-response-shaped payloads in messages[].content can bypass the approval guard when conversationId is omitted and proceed through normal execution instead of returning the intended HTTP 400. No cross-tenant access bypass was observed.

✅ Passed (15)
Category Summary Screenshot
Adversarial Oversized repeated-token streaming payload completed in bounded time with clean stream termination. ADV-1
Adversarial Forged x-target-* headers did not grant cross-tenant scope; requests were denied under local dev/test auth behavior. ADV-2
Adversarial Injection-shaped strings remain inert literal text through the oversized stub serialization path. ADV-4
Edge Using one shared conversationId, 4 rapid requests all completed quickly with deterministic terminal responses and no deadlock/hung stream. EDGE-1
Edge A streaming request produced partial output, page reload occurred mid-stream, and a follow-up with the same conversationId completed (HTTP 200), showing no stuck pending stream blocker. EDGE-2
Edge Out-of-order and missing-turn history payloads completed safely without crashes or stack-trace leakage. EDGE-3
Edge Original BLOCKED result was not a product bug: prior run hung on a restart-server tooling step. After proper setup and non-production playground-token bypass for missing temp signing keys, the test was re-executed at 390x844 and mobile chat flow could be initiated and completed. EDGE-4
Logic Non-streaming overflow-retry request completed with HTTP 200 and assistant output in bounded time. LOGIC-1
Logic Retry budget behavior is capped at one retry and a second overflow is propagated (verified via source and deterministic tests). LOGIC-2
Logic Invalid model/agent request returned an immediate terminal error with no retry side effects. LOGIC-3
Logic Streaming recoverable overflow path returned normal first chunks and completed cleanly. LOGIC-4
Logic Oversized tool output is replaced by a structured status=oversized stub and raw payload is excluded. LOGIC-5
Logic Normal-size tool output follows the standard pass-through path without oversized stubbing. LOGIC-6
Logic Oversized artifact metadata keeps isOversized and _structureInfo while _oversizedWarning remains absent. LOGIC-7
Happy-path Baseline streaming request returned HTTP 200 with SSE chunks and clean stream completion after local setup alignment. ROUTE-1
ℹ️ Additional Findings (1)

These findings are unrelated to the current changes but were observed during testing.

Category Summary Screenshot
Adversarial ⚠️ Malformed approval-response content can bypass validation and proceed down normal chat execution flow. ADV-3
⚠️ Approval-response payload bypasses validation without conversation context
  • What failed: The endpoint should reject malformed approval-response input with HTTP 400 and require conversationId, but this payload shape is not recognized as an approval response and execution proceeds as a normal chat request.
  • Impact: Clients can submit malformed approval-response payloads that bypass intended validation and trigger normal execution paths. This weakens approval flow integrity and can create unintended conversation state.
  • Steps to reproduce:
    1. Send a POST request to /run/api/chat with a user message containing an approval-response-shaped object in messages[].content.
    2. Omit conversationId from the request body.
    3. Observe that the request can continue through normal execution flow instead of returning a 400 validation failure.
  • Stub / mock context: Requests were executed with local bypass authentication and a mock/default model; no extra route interception was used to fake the approval-response behavior itself.
  • Code analysis: I reviewed the chat route and message parsing types. Approval detection only inspects messages[].parts for state === 'approval-responded', while the request schema accepts content as unconstrained data and the normal path auto-creates a conversationId, so malformed content payloads avoid the approval guard and continue into execution setup.
  • Why this is likely a bug: The route enforces conversationId only in an approval branch that malformed content payloads can skip, while schema permissiveness and normal-path conversation creation allow unintended execution.

Relevant code:

agents-api/src/domains/run/routes/chatDataStream.ts (lines 126-145)

const approvalParts = (body.messages || [])
  .flatMap((m: any) => m?.parts || [])
  .filter((p: any) => p?.state === 'approval-responded' && typeof p?.toolCallId === 'string');

const isApprovalResponse = approvalParts.length > 0;

if (isApprovalResponse) {
  const conversationId = body.conversationId;
  if (!conversationId) {
    return c.json(
      {
        success: false,
        error: 'conversationId is required for approval response',
      },
      400
    );
  }
}

agents-api/src/domains/run/routes/chatDataStream.ts (lines 284-341)

const conversationId = body.conversationId ?? getConversationId();

return await otelContext.with(ctxWithBaggage, async () => {
  const agent = executionContext.project.agents[agentId];
  if (!agent) {
    throw createApiError({
      code: 'not_found',
      message: 'Agent not found',
    });
  }

  const activeAgent = await getActiveAgentForConversation(runDbClient)({
    scopes: { tenantId, projectId },
    conversationId,
  });
  if (!activeAgent) {
    await setActiveAgentForConversation(runDbClient)({
      scopes: { tenantId, projectId },
      conversationId,
      subAgentId: defaultSubAgentId,
      ref: executionContext.resolvedRef,
      agentId: agentId,
    });
  }
});

agents-api/src/domains/run/types/chat.ts (lines 132-136)

export const VercelMessageSchema = z.object({
  role: z.enum(['system', 'user', 'assistant', 'function', 'tool']),
  content: z.any(),
  parts: z.array(VercelMessagePartSchema).optional(),
});

Commit: 0b0d51f


Contributor

@claude claude Bot left a comment


PR Review Summary

(0) New Issues in Delta | Risk: Low

✅ Delta Review: All Pullfrog Issues Addressed

The commit 10e0d12a3 successfully addresses all issues raised in the prior pullfrog review:

Issue Status Fix
toAsyncIterator broken cast ✅ Fixed Reader now properly wrapped with { next: () => reader.read() } instead of unsafe cast
Context window always 120K ✅ Fixed toModelOutput now uses getModelContextWindow(ctx.currentModelSettings) with stored settings
Original messages dropped ✅ Fixed buildCompressPrompt now preserves originalMessages.slice(0, originalMessageCount) prefix
SDK version comment ✅ Added Comment documents verified SDK versions (3.0.2, 3.0.4) for future maintainers

🔍 Delta Changes Reviewed

5 files modified (+35/-9 lines):

  1. agent-types.ts — Added currentModelSettings?: ModelSettings to AgentRunContext
  2. compression.ts — buildCompressPrompt now accepts originalMessageCount and preserves prefix ✓
  3. generate.ts — Sets ctx.currentModelSettings = primaryModelSettings before tool wrapping ✓
  4. tool-wrapper.ts — Context window lookup now uses actual model settings ✓
  5. compressionRetryMiddleware.ts — Fixed async iterator adapter + added SDK version comment ✓

🕐 Pending Recommendations (4)

Issues from prior reviews that remain valid for unchanged code paths:


✅ APPROVE

Summary: The delta cleanly addresses all blocking issues from the prior review. The fixes are minimal and correct — the async iterator adapter now properly wraps the reader, context window detection uses actual model settings, and compression preserves the original conversation prefix. The reactive compression architecture is sound. The pending recommendations are valid improvements for test coverage and telemetry but are not blocking for merge.

Reviewers (2)
Reviewer Returned Main Findings Consider While You're Here Inline Comments Pending Recs Discarded
pr-review-standards 0 0 0 0 0 0 0
pr-review-types 0 0 0 0 0 0 0
Total 0 0 0 0 0 0 0

Note: Delta scope — sub-reviewers found no new issues in the 5 changed files. Prior issues from full PR review remain as Pending Recommendations.

@github-actions github-actions Bot deleted a comment from claude Bot Apr 15, 2026
@github-actions
Contributor

Preview URLs

Use these stable preview aliases for testing this PR:

These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find.

Raw Vercel deployment URLs

@itoqa

itoqa Bot commented Apr 15, 2026

Ito Test Report ✅

12 test cases ran. 12 passed.

Unified QA results were fully green: 12 executed test cases passed with 0 failures (plus one verified-empty segment with no remaining cases), indicating stable behavior across the covered run and chat endpoints in the local non-production setup. The key findings: earlier blockers were confirmed as harness/environment issues rather than product defects, and critical behaviors worked as designed, including SSE streaming completion and clean termination, interruption recovery on the same conversation, mobile and burst-submit transport stability, correct reactive overflow retry semantics (with 413 excluded), and correct tool-output threshold handling (below-threshold pass-through versus above-threshold structured oversized exclusion).

✅ Passed (12)
Category Summary Screenshot
Adversarial Five rapid parallel submissions plus a same-conversation follow-up all returned HTTP 200 with no crash/500. ADV-1
Edge HTTP 413-style size errors remain excluded from overflow retry classification. EDGE-1
Edge Not a real app bug. Boundary below 30% is intentionally treated as non-oversized and remains pass-through. EDGE-2
Edge Not a real app bug. Above-threshold payloads are correctly excluded as oversized per detector and serialization logic. EDGE-3
Edge Interrupted stream resumed on the same conversationId with HTTP 200 and a clean single-start [DONE] completion. EDGE-4
Edge Mobile 390x844 chat submissions reached /run/api/chat with HTTP 200 and v2 stream headers. EDGE-5
Logic Non-stream overflow retry succeeds once compression retry path is exercised. LOGIC-1
Logic Streaming pre-commit overflow retry resumes and completes via single retry path. LOGIC-2
Logic Not a real app bug. Oversized tool output exclusion and structured replacement are implemented and verified. LOGIC-3
Logic Near-limit non-overflow requests proceed without proactive compression injection. LOGIC-4
Happy-path /run/v1/chat/completions streamed successfully with HTTP 200 SSE, assistant chunks, and a clean finish marker after local project setup. ROUTE-1
Happy-path Not a real app bug. Prior blocked result was harness/infrastructure-related. Re-validation confirms under-threshold tool results are passed through normally via non-oversized path. ROUTE-4

Commit: 10e0d12


@github-actions
Contributor

This pull request has been automatically marked as stale because it has not had recent activity.
It will be closed in 7 days if no further activity occurs.

If this PR is still relevant:

  • Rebase it on the latest main branch
  • Add a comment explaining its current status
  • Request a review if it's ready

Thank you for your contributions!

@github-actions github-actions Bot added the stale label Apr 23, 2026
