feat: reactive mid-generation compression + artifact exclusion #3133

tim-inkeep wants to merge 10 commits into main
Conversation
Ready-for-implementation spec for refactoring agents-api compression triggers:

- Mid-gen compression becomes reactive (via AI SDK wrapLanguageModel middleware that catches provider-signaled overflow and retries once with compressed input)
- Oversized tool outputs excluded at tool-wrapper.ts toModelOutput seam with structured error-shaped tool result (matching default-tools.ts retrieval_blocked)
- Pre-generation conversation-history compression unchanged (safety net)

Includes four evidence files (compression triggers, oversized artifact handling, provider overflow signals, middleware approach), audit + challenger findings, and audit-resolution changelog. 14 locked/directed decisions at HIGH confidence.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- Correct @ai-sdk/provider version citation: 3.0.4 → 3.0.2 (verified middleware contract identical across both via dist/index.d.ts diff)
- Fix formatOversizedReason → formatOversizedRetrievalReason (actual function exported from artifact-utils.ts:50)
- Clarify R8 telemetry: all attributes are span attributes, not log fields
- Remove local filesystem path from §17 References
- Add End-User visibility gap to Risks (MEDIUM) with operator mitigation
- Add Traces UI surfacing, enriched stub metadata, and retrieval-tool migration note to Future Work
- Add docs-check step to Verification Plan

No design changes. All review findings addressed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…roactive compression Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
… seam Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…rtifactService Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
🦋 Changeset detected. Latest commit: 10e0d12. The changes in this PR will be included in the next version bump. This PR includes changesets to release 10 packages.
1 Skipped Deployment
I could not sync this public PR: patch application failed. The diff could not be applied cleanly. Please rebase your PR on the latest main.
Summary

This PR fundamentally changes how mid-generation context compression is triggered: instead of proactively compressing based on token-budget predictions (which often fire unnecessarily), compression now only happens reactively when the LLM provider actually returns a context-overflow error. It also adds a new seam to exclude oversized tool outputs from reaching the LLM context at all, replacing them with structured error stubs.

Key changes
Reactive compression middleware
Before: the prepareStep-based compression hook fired proactively on token-budget predictions. After: a new compressionRetryMiddleware (sketched after this section) catches provider-signaled context-overflow errors, compresses the prompt, and retries once.

Context overflow detection
A new detectContextOverflow helper recognizes provider-specific overflow errors and drives the retry.

Oversized tool output exclusion at the source
Before: oversized tool outputs could reach the LLM context. After: tool-wrapper.ts's toModelOutput seam replaces them with a structured error-shaped stub the LLM can react to.

Compression logic extraction
The summary-building logic (artifact references, stop instructions, compression cycle tracking) that previously lived inline in the prepareStep compression path now lives in a dedicated compression.ts helper.

Removal of _oversizedWarning injection
The _oversizedWarning injection is removed from BaseCompressor and ArtifactService; metadata.isOversized remains the durable signal.
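As a rough orientation for reviewers unfamiliar with AI SDK middleware, the reactive path boils down to the sketch below. This is an illustration, not the PR's code: `isContextOverflowError` and `compressPrompt` are stand-in names, and the `wrapGenerate({ doGenerate, params, model })` shape mirrors the contract used in the review's test snippets further down.

```ts
// Minimal sketch of the reactive retry path. Helper names are assumptions,
// not the PR's implementation.
export function createReactiveCompressionSketch(deps: {
  isContextOverflowError: (err: unknown) => boolean;
  compressPrompt: (prompt: unknown[]) => Promise<unknown[]>;
}) {
  return {
    specificationVersion: 'v3' as const,
    wrapGenerate: async ({
      doGenerate,
      params,
      model,
    }: {
      doGenerate: () => PromiseLike<unknown>;
      params: { prompt: unknown[] };
      model: { doGenerate: (opts: { prompt: unknown[] }) => PromiseLike<unknown> };
    }) => {
      try {
        // Happy path: no token counting, no proactive compression; just call through.
        return await doGenerate();
      } catch (err) {
        // Only react to provider-signaled context overflow; everything else propagates.
        if (!deps.isContextOverflowError(err)) throw err;
        const compressedPrompt = await deps.compressPrompt(params.prompt);
        // Retry exactly once with the compressed prompt; a second overflow propagates.
        return await model.doGenerate({ ...params, prompt: compressedPrompt });
      }
    },
  };
}
```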
Well-structured PR. The reactive compression approach (detect overflow from provider error, compress, retry once) is cleaner than the proactive token-budget check it replaces. The stream peek strategy is sound — single .next() call, no tee(), no buffering. The toModelOutput oversized exclusion wiring is the right seam for filtering before the LLM sees bloated content. Two issues worth addressing: the toAsyncIterator fallback has a broken cast, and the context window used for oversized detection is always the 120K default rather than the actual model's window.
Claude Opus
```ts
if (Symbol.asyncIterator in stream) {
  return (stream as AsyncIterable<StreamPart>)[Symbol.asyncIterator]();
}
return (stream as ReadableStream<StreamPart>).getReader() as unknown as AsyncIterator<StreamPart>;
```
ReadableStreamDefaultReader does not have .next() — it has .read(). This cast to AsyncIterator is unsound and would throw at runtime if the branch were reached.
In practice, modern Node.js ReadableStream has Symbol.asyncIterator, so the if branch on line 33 always triggers and this fallback is dead code. But the cast is still incorrect — either remove the fallback entirely, or wrap the reader in a proper async-iterator adapter:
```ts
const reader = (stream as ReadableStream<StreamPart>).getReader();
return {
  next: () => reader.read() as Promise<IteratorResult<StreamPart>>,
};
```

```ts
const relationshipId = getRelationshipIdForTool(ctx, toolName, toolType);

const originalExecute = toolDefinition.execute;
const contextWindowSize = getModelContextWindow().contextWindow ?? 120000;
```
getModelContextWindow() without arguments always returns the 120K default (source: fallback). This means the 30% oversized threshold is hardcoded at ~36K tokens regardless of the actual model. An agent using Claude with a 200K context window would still exclude results above ~36K tokens instead of ~60K.
Consider passing model settings from ctx.config or accepting contextWindowSize as a parameter to wrapToolWithStreaming so the threshold matches the generation model.
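A minimal sketch of that suggestion, assuming a hypothetical `ModelSettingsLike` object carrying the generation model's window; neither the names nor the wrapper signature here are the PR's:

```ts
// Sketch: derive the oversized threshold from the generation model's actual
// context window instead of the 120K fallback. Names are illustrative.
interface ModelSettingsLike {
  contextWindow?: number;
}

const OVERSIZED_FRACTION = 0.3; // exclude tool outputs above 30% of the window
const DEFAULT_CONTEXT_WINDOW = 120_000;

function oversizedThresholdTokens(modelSettings?: ModelSettingsLike): number {
  const contextWindow = modelSettings?.contextWindow ?? DEFAULT_CONTEXT_WINDOW;
  // ~36K tokens for a 120K window, ~60K tokens for a 200K window.
  return Math.floor(contextWindow * OVERSIZED_FRACTION);
}
```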
```ts
}
return {
  type: 'json' as const,
  value: JSON.parse(
```
The JSON.parse(JSON.stringify({...})) round-trip is a no-op on this plain object literal — every value is already JSON-safe (strings, numbers, plain objects). You can return the object directly as value.
Suggested change:

```diff
-        value: JSON.parse(
+        value: {
+          status: 'oversized',
+          toolCallId: lastToolCallId,
+          toolName,
+          warning:
+            '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
+          reason: formatOversizedRetrievalReason(
+            detection.originalTokenSize,
+            detection.contextWindowSize ?? contextWindowSize
+          ),
+          toolInfo: {
+            toolName,
+            toolArgs: lastArgs,
+            structureInfo: detection.structureInfo ?? '',
+          },
+          recommendation:
+            'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
+        },
```
```ts
}

const summaryMessage = JSON.stringify(summaryData);
const systemMessages = messages.filter((m: any) => m.role === 'system');
```
The old handlePrepareStepCompression kept both system messages and original conversation messages (stepMessages.slice(0, originalMessageCount)). This new buildCompressPrompt only keeps role === 'system' messages. That means the original user query and any initial conversation history are dropped — the LLM only sees the compression summary.
Is this intentional? If the original user message carries important framing that the summary might not perfectly capture, the LLM could lose track of the original ask after compression.
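For comparison, a sketch of what preserving the prefix looks like; `originalMessageCount`, the message shape, and the way the summary is appended are all assumptions here, not the PR's code:

```ts
type ChatMessage = { role: 'system' | 'user' | 'assistant' | 'tool'; content: unknown };

// Sketch: keep the pre-compression conversation prefix (system messages plus the
// original user ask) and append the compression summary after it, rather than
// keeping only role === 'system' messages.
function buildCompressedMessages(
  messages: ChatMessage[],
  originalMessageCount: number,
  summaryMessage: string
): ChatMessage[] {
  const prefix = messages.slice(0, originalMessageCount); // original framing survives
  return [...prefix, { role: 'user', content: summaryMessage }];
}
```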
```ts
  ctx: CompressionRetryContext
): LanguageModelMiddleware {
  return {
    specificationVersion: 'v3',
```
The specificationVersion: 'v3' field may tie this to a specific AI SDK middleware contract version. If the SDK upgrades the spec version (e.g. to v4), this middleware might silently stop being called. Consider adding a comment noting which @ai-sdk/provider versions this was verified against (you mention 3.0.2–3.0.4 in the PR description) so future maintainers know to re-check.
…nal messages

- Context window (pullfrog #2, load-bearing): getModelContextWindow() was called without args and always returned the 120K default, so the 30% oversized threshold was hardcoded at ~36K regardless of the actual model. Added currentModelSettings to AgentRunContext, stashed after configureModelSettings, and read lazily inside toModelOutput.
- Compression prompt (pullfrog #4, load-bearing): buildCompressPrompt only kept role==='system' messages, dropping the original user query and conversation-history prefix. Now takes originalMessageCount and preserves messages.slice(0, originalMessageCount) as the prefix — matching the pre-middleware handlePrepareStepCompression behavior.
- Async-iterator fallback (pullfrog #1): replaced the unsound `as unknown as AsyncIterator` cast with a proper Reader → iterator adapter so the dead branch is safe if ever triggered.
- Middleware spec-version comment (pullfrog #5): documented which @ai-sdk/provider versions the wrapGenerate/wrapStream contract was verified against.
- JSON round-trip (pullfrog #3): kept as-is. The round-trip is not a no-op — it launders `unknown` tool args through JSONValue and strips non-JSON types. Added a comment explaining this.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
PR Review Summary
(4) Total Issues | Risk: Medium
🟠 Major (2)

🟠 1) Missing test coverage: compression failure during retry is untested
Issue: The wrapGenerate and wrapStream implementations call ctx.compressPrompt() which can throw (e.g., summarizer model failure, network error), but there is no test verifying the behavior when compression itself fails. The compressPrompt call at lines 156 and 186 is outside any explicit error handling in the middleware.
Why: If compressPrompt throws during a retry attempt, the error propagation behavior is untested. A bug here could cause silent failures or incorrect error messages being surfaced to users. This is a realistic failure mode since compression involves an LLM call. The tests currently only verify success paths and overflow retry scenarios, not compression infrastructure failures.
Fix: Add test:
```ts
it('propagates compression errors during retry', async () => {
  const ctx = {
    compressPrompt: vi.fn().mockRejectedValue(new Error('Summarizer failed'))
  };
  const middleware = createCompressionRetryMiddleware(ctx);
  const doGenerate = vi.fn().mockRejectedValue(makeOverflowError());
  await expect(middleware.wrapGenerate?.({
    doGenerate,
    doStream: vi.fn(),
    params: { prompt: [] } as any,
    model: { doGenerate: vi.fn(), doStream: vi.fn() } as any,
  })).rejects.toThrow('Summarizer failed');
  expect(ctx.compressPrompt).toHaveBeenCalledOnce();
});
```
🟠 2) compression.ts buildCompressPrompt has no dedicated unit tests
Issue: The new buildCompressPrompt function (lines 41-104) transforms the prompt after compression, including critical logic for: (1) handling array vs object summary results, (2) injecting stop instructions based on compression cycle count, (3) mapping related_artifacts to artifact_reference tags, and (4) filtering system messages. These code paths are only tested implicitly through integration.
Why: Bugs in buildCompressPrompt could cause: (1) incorrect prompt structure after compression leading to LLM confusion, (2) stop instructions not being applied correctly causing infinite loops, (3) artifact references being malformed. The function has conditional branches (compressionCycle >= 1, hasNewWork checks) that should be explicitly tested.
Fix: Create unit tests for buildCompressPrompt:
```ts
describe('buildCompressPrompt', () => {
  it('injects stop instruction after multiple compression cycles', async () => {
    const compressor = makeCompressor({
      getCompressionCycleCount: vi.fn().mockReturnValue(2)
    });
    const prompt = buildCompressPrompt(compressor);
    const result = await prompt([
      { role: 'system', content: 'sys' },
      { role: 'user', content: 'hi' }
    ]);
    expect(result[1].content).toContain('STOP ALL TOOL CALLS');
  });
  it('preserves system messages and removes others', async () => { /* ... */ });
  it('formats artifact references correctly', async () => { /* ... */ });
});
```
Inline Comments:
- 🟠 Major: `compressionRetryMiddleware.ts:156` Compression failure lacks telemetry
- 🟡 Minor: `tool-wrapper.ts:136-155` Unnecessary JSON.parse(JSON.stringify()) wrapper
🕐 Pending Recommendations (2)
These issues were raised in the prior pullfrog review and are still unresolved:
- 🟠 `compressionRetryMiddleware.ts:36` — the `toAsyncIterator` fallback casts `ReadableStreamDefaultReader` to `AsyncIterator`, but these interfaces are incompatible. `reader.next()` will throw a `TypeError` when the stream lacks `Symbol.asyncIterator`.
- 🟠 `tool-wrapper.ts:114` — `getModelContextWindow()` called without arguments always returns the 120K fallback. The oversized detection threshold will be wrong for models with different context windows.
🚫 REQUEST CHANGES
Summary: The reactive compression approach is architecturally sound — detecting overflow from provider errors and retrying once is cleaner than proactive token-budget prediction. The stream peek strategy (single .next(), no tee(), no buffering) is efficient. However, the two issues flagged by pullfrog remain unaddressed (broken toAsyncIterator cast, context window always using 120K fallback), and the new middleware lacks telemetry for compression failures and test coverage for failure paths. The JSON.parse/stringify wrapper on a plain object is also unnecessary overhead.
Discarded (8)
| Location | Issue | Reason Discarded |
|---|---|---|
| compressionRetryMiddleware.ts:140 | specificationVersion: 'v3' hardcoded | Valid observation but low risk — existing middleware in agents-core uses same pattern; adding a shared constant is a nice-to-have, not blocking |
| compression.ts:41 | safeCompress result shape handling is fragile | The array vs object handling matches existing compressor contract; spread operator would be a refactor preference |
| ai-sdk-callbacks.ts:11 | Vestigial handlePrepareStepCompression parameters | Intentional scaffolding per spec — prep for future step-based logic |
| detectContextOverflow.ts:22 | 413 status exclusion may silently misclassify | 413 is bytes not tokens per HTTP spec; adding trace-level logging is a nitpick |
| compressionRetryMiddleware.ts:173 | Original stream not explicitly released | The reader is consumed (one chunk read); explicit cancel is a nice-to-have for edge cases |
| compressionRetryMiddleware.ts:12 | Loose StreamPart type with index signature | Intentional flexibility for AI SDK internal chunk variations |
| compression.ts:45 | Prompt cast to any[] loses type info | Type guards would be nice but current implementation works; not blocking |
| compressionRetryMiddleware.test.ts:n/a | Missing test for empty stream (immediate EOF) | Edge case with synthetic finish chunk handling; nice-to-have |
Reviewers (6)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 3 | 0 | 0 | 0 | 1 | 2 | 0 |
| pr-review-tests | 7 | 2 | 0 | 0 | 0 | 0 | 5 |
| pr-review-errors | 5 | 0 | 0 | 0 | 1 | 2 | 2 |
| pr-review-sre | 5 | 0 | 0 | 0 | 0 | 2 | 3 |
| pr-review-types | 5 | 0 | 0 | 0 | 0 | 2 | 3 |
| pr-review-architecture | 5 | 0 | 0 | 0 | 0 | 2 | 3 |
| Total | 30 | 2 | 0 | 0 | 2 | 2 | 8 |
Note: Multiple reviewers flagged the same two issues (toAsyncIterator cast, context window fallback) that were already raised by pullfrog — these were consolidated into Pending Recs rather than duplicated.
```ts
  'Context overflow detected in doGenerate, compressing and retrying'
);

const compressedPrompt = await ctx.compressPrompt(params.prompt as unknown[]);
```
🟠 MAJOR: Compression failure lacks telemetry
Issue: When ctx.compressPrompt() throws, the error propagates without calling setRetryTelemetry(). The compression.outcome span attribute is only set on success (line 163) or retry error (line 167), not on compression infrastructure failures.
Why: During incidents, operators cannot distinguish compression failures (summarizer timeout, model unavailable) from LLM errors in observability dashboards. This creates a blind spot that extends mean-time-to-resolution. The same pattern applies to the streaming path at line 186.
Fix:
```ts
let compressedPrompt: unknown[];
try {
  compressedPrompt = await ctx.compressPrompt(params.prompt as unknown[]);
} catch (compressionErr) {
  setRetryTelemetry(provider, detector, 1, 'compression_failed');
  logger.error(
    { provider, detector, error: compressionErr instanceof Error ? compressionErr.message : String(compressionErr) },
    'Compression failed during overflow retry'
  );
  throw compressionErr;
}
```
```ts
value: JSON.parse(
  JSON.stringify({
    status: 'oversized',
    toolCallId: lastToolCallId,
    toolName,
    warning:
      '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
    reason: formatOversizedRetrievalReason(
      detection.originalTokenSize,
      detection.contextWindowSize ?? contextWindowSize
    ),
    toolInfo: {
      toolName,
      toolArgs: lastArgs,
      structureInfo: detection.structureInfo ?? '',
    },
    recommendation:
      'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
  })
),
```
🟡 Minor: Unnecessary JSON.parse(JSON.stringify()) wrapper
Issue: The oversized stub is a plain object literal constructed inline with only primitive values and nested plain objects. The JSON.parse(JSON.stringify(...)) round-trip has no effect on such an object.
Why: This pattern is typically used for deep-cloning or stripping non-serializable properties. Here, the object is constructed from scratch with no circular references, functions, or special objects — the serialization round-trip is pure overhead (CPU, memory, potential error surface if any dynamic value is non-serializable).
Fix: (1-click apply)
```diff
-          value: JSON.parse(
-            JSON.stringify({
-              status: 'oversized',
-              toolCallId: lastToolCallId,
-              toolName,
-              warning:
-                '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
-              reason: formatOversizedRetrievalReason(
-                detection.originalTokenSize,
-                detection.contextWindowSize ?? contextWindowSize
-              ),
-              toolInfo: {
-                toolName,
-                toolArgs: lastArgs,
-                structureInfo: detection.structureInfo ?? '',
-              },
-              recommendation:
-                'Consider: 1) narrowing filters/queries on the next tool call, 2) asking the user to break down the request, 3) processing data differently.',
-            })
-          ),
+          return {
+            type: 'json' as const,
+            value: {
+              status: 'oversized',
+              toolCallId: lastToolCallId,
+              toolName,
+              warning:
+                '⚠️ Tool produced an oversized result that cannot be included in the LLM context. The full result is stored as an artifact and can be retrieved later if needed.',
+              reason: formatOversizedRetrievalReason(
+                detection.originalTokenSize,
+                detection.contextWindowSize ?? contextWindowSize
+              ),
+              toolInfo: {
+                toolName,
+                toolArgs: lastArgs,
+                structureInfo: detection.structureInfo ?? '',
+              },
+              recommendation:
+                'Consider: 1) narrowing filters/queries, 2) requesting specific fields, or 3) breaking into smaller requests.',
+            },
+          };
```
Ito Test Report

✅ 16 test cases ran. 1 additional finding, 15 passed.

Across 16 test cases, 15 passed and 1 failed, showing generally stable local non-production behavior for baseline and edge chat flows (streaming/non-streaming completions, rapid same-conversation requests, mid-stream reload resume, and mobile 390x844), overflow handling (bounded recovery with a one-retry cap and no retries for non-overflow errors), oversized tool-output safeguards, and authorization/injection probes. The key finding was one High-severity validation gap in /run/api/chat where malformed approval-response-shaped payloads in messages[].content can bypass the approval guard when conversationId is omitted and proceed through normal execution instead of returning the intended HTTP 400, while no cross-tenant access bypass was observed.

✅ Passed (15)
ℹ️ Additional Findings (1)
PR Review Summary
(0) New Issues in Delta | Risk: Low
✅ Delta Review: All Pullfrog Issues Addressed
The commit 10e0d12a3 successfully addresses all issues raised in the prior pullfrog review:
| Issue | Status | Fix |
|---|---|---|
| toAsyncIterator broken cast | ✅ Fixed | Reader now properly wrapped with { next: () => reader.read() } instead of unsafe cast |
| Context window always 120K | ✅ Fixed | toModelOutput now uses getModelContextWindow(ctx.currentModelSettings) with stored settings |
| Original messages dropped | ✅ Fixed | buildCompressPrompt now preserves originalMessages.slice(0, originalMessageCount) prefix |
| SDK version comment | ✅ Added | Comment documents verified SDK versions (3.0.2, 3.0.4) for future maintainers |
🔍 Delta Changes Reviewed
5 files modified (+35/-9 lines):
- `agent-types.ts` — Added `currentModelSettings?: ModelSettings` to `AgentRunContext` ✓
- `compression.ts` — `buildCompressPrompt` now accepts `originalMessageCount` and preserves prefix ✓
- `generate.ts` — Sets `ctx.currentModelSettings = primaryModelSettings` before tool wrapping ✓
- `tool-wrapper.ts` — Context window lookup now uses actual model settings ✓
- `compressionRetryMiddleware.ts` — Fixed async iterator adapter + added SDK version comment ✓
🕐 Pending Recommendations (4)
Issues from prior reviews that remain valid for unchanged code paths:
- 🟠 `compressionRetryMiddleware.ts:167` — Compression failure during retry has no telemetry (`compression.outcome: compression_failed`)
- 🟠 `compression.ts:41-104` — `buildCompressPrompt` lacks dedicated unit tests for stop instruction injection and artifact reference formatting
- 🟠 `compressionRetryMiddleware.test.ts` — Missing test for compression infrastructure failures (summarizer timeout/error)
- 🟡 `tool-wrapper.ts:138-157` — `JSON.parse(JSON.stringify({...}))` is redundant on plain object literal
✅ APPROVE
Summary: The delta cleanly addresses all blocking issues from the prior review. The fixes are minimal and correct — the async iterator adapter now properly wraps the reader, context window detection uses actual model settings, and compression preserves the original conversation prefix. The reactive compression architecture is sound. The pending recommendations are valid improvements for test coverage and telemetry but are not blocking for merge.
Reviewers (2)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-standards | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| pr-review-types | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: Delta scope — sub-reviewers found no new issues in the 5 changed files. Prior issues from full PR review remain as Pending Recommendations.
Preview URLs

Use these stable preview aliases for testing this PR. They point to the same Vercel preview deployment as the bot comment, but they stay stable and are easier to find.
Ito Test Report

✅ 12 test cases ran. 12 passed.

Unified QA results were fully green: 12 executed test cases passed with 0 failures (plus one verified-empty segment with no remaining cases), indicating stable behavior across the covered run and chat endpoints in the local non-production setup. The key findings were that earlier blockers were confirmed as harness/environment issues rather than product defects, while critical behaviors worked as designed — including SSE streaming completion and clean termination, interruption recovery on the same conversation, mobile and burst-submit transport stability, correct reactive overflow retry semantics (with 413 excluded), and correct tool-output threshold handling for below-threshold pass-through versus above-threshold structured oversized exclusion.

✅ Passed (12)
This pull request has been automatically marked as stale because it has not had recent activity. If this PR is still relevant:
Thank you for your contributions!
Summary
Implementation of spec #3129 (specs/2026-04-14-reactive-compression/).

- Reactive mid-generation compression via wrapLanguageModel middleware. Compresses and retries only when the provider actually returns a context-overflow error (see the detection sketch below). Preserves the AI SDK multi-step loop.
- Oversized tool outputs excluded at tool-wrapper.ts's toModelOutput seam. Raw outputs exceeding 30% of the context window are replaced with a structured error-shaped stub matching default-tools.ts's retrieval_blocked shape — the LLM sees a tool failure it can react to instead of bloated content.
- Proactive mid-generation compression removed from the prepareStep callback.
- Removed _oversizedWarning injection from BaseCompressor and ArtifactService; metadata.isOversized remains the durable signal.

Stories delivered
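To make the first bullet concrete, overflow detection amounts to pattern-matching provider errors. The sketch below is illustrative only: the real logic lives in detectContextOverflow.ts, and the regexes and the second detector name are assumptions (only the anthropic_overflow_regex label appears in the telemetry list below).

```ts
// Sketch of provider-overflow detection. Message patterns are assumptions,
// not the contents of detectContextOverflow.ts.
interface OverflowDetection {
  isOverflow: boolean;
  detector: 'anthropic_overflow_regex' | 'generic_context_length' | 'none';
}

function detectContextOverflowSketch(err: unknown): OverflowDetection {
  const message = err instanceof Error ? err.message : String(err);
  // Note: HTTP 413 is deliberately excluded; per the review discussion it
  // signals request bytes, not tokens.
  if (/prompt is too long/i.test(message)) {
    return { isOverflow: true, detector: 'anthropic_overflow_regex' };
  }
  if (/maximum context length|context window/i.test(message)) {
    return { isOverflow: true, detector: 'generic_context_length' };
  }
  return { isOverflow: false, detector: 'none' };
}
```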
- 9f25d9c57: compressionRetryMiddleware (wrapGenerate + wrapStream)
- 555620fc2, c12970714: tool-wrapper.ts:111 toModelOutput
- 4ec8c2131: _oversizedWarning injection
- 83ae43a4e, 6bcd3bf2d

Architecture highlights
- compressionRetryMiddleware is per-request (factory closes over run context — DEC-13), so compression scope matches run scope.
- doGenerate/doStream are nullary in @ai-sdk/[email protected] — retry uses options.model.doGenerate(modifiedOptions) directly. Verified middleware contract is identical across 3.0.2 and 3.0.4.
- Stream peek: a single .next() plus a prepend-then-pipe generator (sketched below). No tee(), no array buffering (DEC-07). TTFT on the happy path is within microseconds of an unwrapped model.
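A sketch of the peek-then-prepend idea on a generic async stream of parts (not the middleware's actual code; chunk inspection is elided):

```ts
// Sketch: read exactly one chunk to inspect it, then return a stream that
// re-emits that chunk followed by the rest. No tee(), no array buffering.
async function* prependThenPipe<T>(first: T, rest: AsyncIterator<T>): AsyncGenerator<T> {
  yield first; // re-emit the peeked chunk so the consumer sees an untouched stream
  while (true) {
    const { value, done } = await rest.next();
    if (done) return;
    yield value;
  }
}

async function peekStream<T>(
  iterable: AsyncIterable<T>
): Promise<{ first: T; stream: AsyncGenerator<T> } | null> {
  const iterator = iterable[Symbol.asyncIterator]();
  const first = await iterator.next(); // the single .next() peek
  if (first.done) return null; // empty stream: nothing to inspect
  return { first: first.value, stream: prependThenPipe(first.value, iterator) };
}
```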
Span attributes:
`compression.trigger`, `compression.provider`, `compression.detector`, `compression.retry_number`, `compression.outcome`, `anthropic_overflow_regex_hit`, `tool.result.oversized_excluded`.
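For orientation, recording these on the active OpenTelemetry span would look roughly like the snippet below (a sketch using @opentelemetry/api; the attribute values shown are illustrative, only the attribute names come from the list above):

```ts
import { trace } from '@opentelemetry/api';

// Sketch: set the compression-retry attributes on the active span so operators
// can filter traces by trigger, detector, and outcome. Values are illustrative.
const span = trace.getActiveSpan();
span?.setAttributes({
  'compression.trigger': 'provider_overflow',
  'compression.provider': 'anthropic',
  'compression.detector': 'anthropic_overflow_regex',
  'compression.retry_number': 1,
  'compression.outcome': 'retry_success',
  'anthropic_overflow_regex_hit': true,
  'tool.result.oversized_excluded': false,
});
```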
`@inkeep/agents-api` patch: "Make mid-generation compression reactive to provider overflow errors and exclude oversized tool outputs from LLM context"
Test plan
- `pnpm typecheck` passes
- `pnpm lint` passes
- `pnpm --filter @inkeep/agents-api test --run` passes (includes new tests)
- `compressionRetryMiddleware.ts` + `detectContextOverflow.ts` (new code paths)
- `tool-wrapper.ts:111` oversized-exclusion wiring
- `ai-sdk-callbacks.ts` and `compression.ts` helper
- `tool.result.oversized_excluded = true` appears and no content bytes reach the LLM.
🤖 Generated with Claude Code