spec: reactive mid-generation compression + artifact exclusion#3129
spec: reactive mid-generation compression + artifact exclusion#3129tim-inkeep wants to merge 2 commits intomainfrom
Conversation
Ready-for-implementation spec for refactoring agents-api compression triggers: - Mid-gen compression becomes reactive (via AI SDK wrapLanguageModel middleware that catches provider-signaled overflow and retries once with compressed input) - Oversized tool outputs excluded at tool-wrapper.ts toModelOutput seam with structured error-shaped tool result (matching default-tools.ts retrieval_blocked) - Pre-generation conversation-history compression unchanged (safety net) Includes four evidence files (compression triggers, oversized artifact handling, provider overflow signals, middleware approach), audit + challenger findings, and audit-resolution changelog. 14 locked/directed decisions at HIGH confidence. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
|
|
A matching internal PR is ready in inkeep/agents-private#90 for canonical review and merge.
This comment will be updated as the bridge state changes. |
TL;DRThis PR adds a comprehensive spec (status: Key Changes
SPEC.md -- Main SpecificationThe core spec document (465 lines) structured in 17 sections:
evidence/current-compression-triggers.md -- Compression Trigger AuditMaps all existing compression trigger points in evidence/middleware-approach.md -- Why Middleware WorksEvidence that evidence/oversized-artifact-handling.md -- Artifact Path AnalysisTraces how oversized artifacts currently flow through the system. Key findings: detection ( evidence/provider-overflow-signals.md -- Provider Error TaxonomyDocuments how Anthropic, OpenAI, and the Vercel AI SDK signal context-overflow errors. OpenAI has a stable meta/_changelog.md -- Spec Process LogAppend-only record of the spec authoring process: initial creation, audit + challenger resolution, and finalization. Documents all corrections applied (class name fixes, injection-site corrections, middleware pseudocode rewrite, acceptance criteria rewrites) and decisions made during the audit/challenge cycle. meta/audit-findings.md -- Spec Audit Report10 findings (3 High, 4 Medium, 3 Low) from a systematic audit against the codebase at baseline commit. High findings: incorrect class naming ( meta/design-challenge.md -- Adversarial Design Review8 challenge items from a cold-reader stress test. Confirms the core architecture is sound. Key challenges addressed: strengthened DEC-06 rationale with multi-step step-N>1 argument, split artifact exclusion to use error-shaped tool results (not text stubs), added DEC-11 rejecting
|
TL;DRAdds a comprehensive spec ( No code changes — this PR is 8 new spec/evidence/meta documents (1,375 lines). What Changed (latest push)Commit
Key Changes
Main Spec (
|
There was a problem hiding this comment.
Verdict: Approve. This is an exceptionally thorough spec with rigorous evidence tracing, a self-contained audit/challenge cycle, and verified file:line citations (9/9 checked against main HEAD are accurate; the 10th — @ai-sdk/provider types — is unverifiable without installed deps but the spec's own audit resolved that concern via C6/C8).
The spec covers a well-scoped refactor (reactive mid-gen compression + artifact exclusion) with 14 locked decisions, 8 verifiable requirements, clear SCOPE/EXCLUDE/STOP_IF/ASK_FIRST boundaries, and a changelog showing how audit findings were resolved. The design is sound — middleware-based reactive retry preserves the AI SDK multi-step loop, and the tool-wrapper.ts:111 toModelOutput seam is the correct surgery site for oversized exclusion.
Two minor issues noted inline — neither blocks implementation.
Claude Opus | 𝕏
|
|
||
| ## 17. References | ||
|
|
||
| - Plan file: `/Users/timothycardona/.claude/plans/rustling-prancing-beaver.md` |
There was a problem hiding this comment.
This references a local filesystem path (/Users/timothycardona/.claude/plans/rustling-prancing-beaver.md) which is not accessible to other contributors or the implementation agent. Consider removing or replacing with a relative path to an in-repo artifact if the plan content is valuable, or simply delete the line — the spec is self-contained.
| - `compression.detector = 'openai_code' | 'anthropic_regex' | 'heuristic_400'` | ||
| - `compression.retry_number = 0 | 1` | ||
| - `compression.outcome = 'success' | 'second_overflow' | 'other_error'` | ||
| - `artifact.excluded = true` with `artifact.id`, `artifact.original_tokens`, `artifact.context_window` |
There was a problem hiding this comment.
anthropic_overflow_regex_hit = true is listed as a separate OTel attribute in §6.5, but R8's acceptance criteria in §9 list it as anthropic_overflow_regex_hit without specifying whether it's a span attribute or log field. The telemetry section says "span attributes / log fields" (ambiguously). For implementation clarity, pick one — span attribute is more queryable and consistent with the other compression.* attributes.
There was a problem hiding this comment.
Approving — spec is thorough, well-structured, and ready for implementation. Two minor nits left as inline comments on the prior review.
Claude Opus | 𝕏
There was a problem hiding this comment.
PR Review Summary
(4) Total Issues | Risk: Medium
🟠⚠️ Major (2) 🟠⚠️
Inline Comments:
- 🟠 Major:
SPEC.md:156Version mismatch — spec cites@ai-sdk/[email protected]but codebase has3.0.2 - 🟠 Major:
SPEC.md:227Oversized stub messaging is LLM-centric, not Agent Builder-debuggable
🟡 Minor (2) 🟡
Inline Comments:
- 🟡 Minor:
SPEC.md:253Telemetry attributes lack corresponding Traces UI treatment - 🟡 Minor:
SPEC.md:412Future artifact retrieval tool would require stub wording migration
💭 Consider (2) 💭
💭 1) SPEC.md:261 End User visibility when tool results are excluded
Issue: User Journey 7.1 describes LLM "correcting on the next step" but doesn't specify observable feedback to End Users when oversized tool results are excluded.
Why: End Users have no visibility that a tool result was excluded. The agent may produce confusing output like "I was unable to retrieve that data" without context. This is a potential coherence gap.
Fix: Consider specifying an observable signal for End Users (status update, streaming indicator) or document this as a known UX gap in the Risks section.
Refs: SPEC.md §7.1
💭 2) SPEC.md:47 Docs accompaniment for behavioral change
Issue: Spec changes observable agent behavior (compression triggers reactively, oversized tool results excluded) but doesn't mention docs updates.
Why: Agent Builders reading current docs won't learn about the new oversized exclusion behavior or changed compression timing. The Blast Radius section notes "Public SDK / API: No change" but docs surface isn't addressed.
Fix: Add to Verification Plan: "Review and update agents-docs content if it references compression behavior. Add changelog entry for Agent Builders."
Refs: SPEC.md §8
💡 APPROVE WITH SUGGESTIONS
Summary: This is a well-structured, comprehensive spec with thorough evidence files and a complete audit trail. The core architecture (reactive middleware + oversized exclusion at tool-wrapper seam) is sound and well-reasoned.
The main actionable item is the version mismatch — the spec cites @ai-sdk/[email protected] but the codebase has 3.0.2. This should be reconciled before implementation begins to ensure the middleware contract is valid.
The other findings are product polish suggestions (debugging experience, UI treatment, docs) that can be addressed in the implementation PR or follow-ups.
Discarded (4)
| Location | Issue | Reason Discarded |
|---|---|---|
evidence/provider-overflow-signals.md:30 |
OpenAI error code documentation | Confirmed accurate — matches existing codebase usage at distill-utils.ts:102 |
evidence/provider-overflow-signals.md:32-36 |
Anthropic overflow message patterns from community evidence | Already addressed — spec appropriately flags as Assumption A1 with MEDIUM confidence and includes telemetry for drift detection |
SPEC.md:345 |
DEC-11 rejection of transformParams | Confirmed correct — transformParams is proactive, conflicts with reactive design |
| Internal spec docs scope | Review against write-docs standards | Not applicable — specs/ directory contains internal engineering specs, not customer-facing docs |
Reviewers (3)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
pr-review-product |
5 | 0 | 2 | 0 | 2 | 0 | 1 |
pr-review-3p-specs |
6 | 0 | 0 | 0 | 2 | 0 | 4 |
pr-review-docs |
1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Total | 12 | 0 | 2 | 0 | 4 | 0 | 6 |
|
|
||
| ### 6.2 Middleware (`compressionRetryMiddleware.ts`) | ||
|
|
||
| **Critical shape correction.** Per `@ai-sdk/[email protected]` (`index.d.ts:2066-2088`), `wrapGenerate` / `wrapStream` receive `doGenerate` / `doStream` as **nullary** functions — they bind the already-transformed params. To retry with modified params, call `options.model.doGenerate(modifiedOptions)` (or `options.model.doStream(modifiedOptions)`) directly. |
There was a problem hiding this comment.
🟠 MAJOR: Version mismatch with installed @ai-sdk/provider
Issue: Spec cites @ai-sdk/[email protected] but the codebase has @ai-sdk/[email protected] installed.
Why: The middleware contract (wrapGenerate/wrapStream signature with options.model for retry) must be verified against the actual installed version. If 3.0.4 introduced API changes not present in 3.0.2, the implementation will fail.
Fix: Update the spec to cite @ai-sdk/[email protected] after verifying the middleware interface matches, OR update the package to 3.0.4 if that version is required for the proposed pattern.
Refs:
- packages/agents-core/package.json:177 —
"@ai-sdk/provider": "3.0.2" - packages/ai-sdk-provider/package.json:39 — same
| toolCallId: currentToolCallId, | ||
| toolName, | ||
| warning: '⚠️ Tool produced an oversized result that cannot be included in the conversation.', | ||
| reason: formatOversizedReason(detection.originalTokenSize, detection.contextWindowSize), |
There was a problem hiding this comment.
🟠 MAJOR: Oversized stub messaging is LLM-centric, not Agent Builder-debuggable
Issue: The structured error stub includes reason: formatOversizedReason(...) but the format isn't specified. Agent Builders debugging in Traces UI will see raw token counts without context on the 30% threshold or how to resolve.
Why: First-contact legibility for Agent Builders matters. The existing retrieval_blocked response at default-tools.ts:225-245 has the same gap. Since this spec aims to improve coherence, the stub should help humans debug, not just LLMs retry.
Fix: Consider specifying:
- Human-readable token context:
~42K tokens (30% of 140K context limit) - A
docsUrlfield pointing to troubleshooting docs - Truncation for
toolArgsto avoid multi-KB dumps in Traces UI
Refs:
- default-tools.ts:225-245 — existing
retrieval_blockedpattern
| - `compression.detector = 'openai_code' | 'anthropic_regex' | 'heuristic_400'` | ||
| - `compression.retry_number = 0 | 1` | ||
| - `compression.outcome = 'success' | 'second_overflow' | 'other_error'` | ||
| - `artifact.excluded = true` with `artifact.id`, `artifact.original_tokens`, `artifact.context_window` |
There was a problem hiding this comment.
🟡 Minor: Telemetry attributes lack corresponding Traces UI treatment
Issue: New telemetry attributes (artifact.excluded, artifact.id, artifact.original_tokens, etc.) are specified, but there's no mention of whether the Traces Dashboard will surface them.
Why: Agent Builders need to see which tool calls had results excluded. Without UI treatment, these attributes are only discoverable via raw Jaeger queries.
Fix: Add to §8 (Blast Radius) or Future Work: "Traces UI should display artifact.excluded attribute on tool-call spans; consider a visual indicator (warning icon) in Tool Calls View."
Refs:
- SPEC.md §6.5 — telemetry spec
|
|
||
| | Item | Tier | Why not now | Triggers to revisit | | ||
| |---|---|---|---| | ||
| | Agent-callable artifact retrieval tool (e.g., `get_artifact_full(id)`) | **Explored** | Out of this spec's scope; stub informs the LLM content is unavailable without advertising a retrieval path. Designing a safe retrieval API (permissions, token budget of retrieved content, recursion prevention) is its own scope. | Agent builders report repeated frustration with "content unavailable"; product decision to support in-turn full-artifact access. | |
There was a problem hiding this comment.
🟡 Minor: Future artifact retrieval tool would require stub wording migration
Issue: DEC-03 locks "stub advertises no retrieval path," but if an artifact retrieval tool is added later, the recommendation field wording becomes outdated. This creates mild product debt.
Why: The stub wording is a de-facto API contract. The spec notes this as "Explored" but doesn't specify the migration story.
Fix: Add a note: "If an artifact retrieval tool is added, the oversized stub recommendation field should be updated to include it (e.g., 'retrieve the full artifact using get_artifact_full(id)'). This is a backward-compatible addition."
Preview URLsUse these stable preview aliases for testing this PR:
These point to the same Vercel preview deployment as the bot comment, but they stay stable and easier to find. Raw Vercel deployment URLs
|
- Correct @ai-sdk/provider version citation: 3.0.4 → 3.0.2 (verified middleware contract identical across both via dist/index.d.ts diff) - Fix formatOversizedReason → formatOversizedRetrievalReason (actual function exported from artifact-utils.ts:50) - Clarify R8 telemetry: all attributes are span attributes, not log fields - Remove local filesystem path from §17 References - Add End-User visibility gap to Risks (MEDIUM) with operator mitigation - Add Traces UI surfacing, enriched stub metadata, and retrieval-tool migration note to Future Work - Add docs-check step to Verification Plan No design changes. All review findings addressed. Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
PR Review Summary
(0) Total Issues | Risk: Low
Re-review Scope
This is a delta-scoped re-review of commit f0175d6a2 ("spec: address PR review feedback (version pin, typos, telemetry clarity)").
Prior Feedback Status
All 9 findings from the previous review cycle have been properly addressed:
| Prior Finding | Resolution |
|---|---|
🟠 Version mismatch (@ai-sdk/[email protected] → 3.0.2) |
✅ §6.2 updated to cite 3.0.2 with verification note that 3.0.4 has identical middleware contract |
🟠 Function name typo (formatOversizedReason) |
✅ Corrected to formatOversizedRetrievalReason globally |
| 🟡 Telemetry ambiguity (span vs log fields) | ✅ §6.5 clarified: all are span attributes (not log fields) |
| 🟡 Future artifact retrieval migration story | ✅ Migration note added to §15 Future Work entry |
| 💭 End-User visibility gap | ✅ New row added to §13 Risks |
| 💭 Docs accompaniment | ✅ Docs check added to §14 Verification Plan |
| pullfrog: Local filesystem path in References | ✅ Removed; replaced with in-repo pointers |
| pullfrog: R8 telemetry ambiguity | ✅ Same as above — clarified as span attributes |
| (secondary) Enriched stub metadata | ✅ Added to §15 Future Work (Noted tier) |
Changelog Quality
The meta/_changelog.md update provides excellent traceability:
- Each finding is categorized (Accepted / Partially accepted / Declined)
- Verification evidence is documented (e.g.,
diffofdist/index.d.tsbetween versions) - Rationale for partial acceptance is clear (e.g., "added to Risks / Future Work rather than design changes")
✅ APPROVE
Summary: The spec is ready for implementation. All prior review feedback has been thoughtfully addressed — corrections were applied where warranted, and items outside the spec's scope were properly logged in Risks or Future Work sections. The changelog audit trail is exemplary.
Note: GitHub App cannot submit formal approvals. This is a recommendation to approve.
Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
orchestrator |
0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Note: Delta-scoped review — no sub-reviewers dispatched as prior feedback was fully addressed.
Summary
Ready-for-implementation spec for refactoring
agents-apicompression triggers. Spec only — no runtime code changes. Ship decisions are locked; implementation follows.What changes (per the spec)
prepareSteptoken-budget compression branch atai-sdk-callbacks.ts:11-191. Introduce an AI SDKwrapLanguageModelmiddleware (compressionRetryMiddleware) that catches provider-signaled context-overflow errors and retries once with compressed input — preserving the multi-step loop.tool-wrapper.ts:111'stoModelOutputgains an oversized check. Oversized raw outputs return a structured error-shaped tool result matchingdefault-tools.ts:214-246'sretrieval_blockedshape — no bytes reach the LLM.Spec artifacts
specs/2026-04-14-reactive-compression/SPEC.md— full spec, 14 locked decisions, 8 verifiable requirements, agent constraints.evidence/— four evidence files (compression triggers, oversized-artifact handling, provider overflow signals, middleware approach) with confidence-labeled findings and file:line citations.meta/audit-findings.md+meta/design-challenge.md— cold-reader audit + design-challenge output.meta/_changelog.md— full audit-resolution trail.Key design decisions
wrapLanguageModelmiddleware; NOT outer try/catch (1-way door)transformParams(proactive hook — contradicts reactive design)tool-wrapper.ts:111wrappedtoModelOutputTest plan
mainHEAD🤖 Generated with Claude Code