
spec: reactive mid-generation compression + artifact exclusion #3129

Open
tim-inkeep wants to merge 2 commits into main from spec/reactive-compression-artifact-exclusion

Conversation

@tim-inkeep
Contributor

Summary

Ready-for-implementation spec for refactoring agents-api compression triggers. Spec only — no runtime code changes. Ship decisions are locked; implementation follows.

What changes (per the spec)

  • Mid-generation compression becomes reactive. Remove the proactive prepareStep token-budget compression branch at ai-sdk-callbacks.ts:11-191. Introduce an AI SDK wrapLanguageModel middleware (compressionRetryMiddleware) that catches provider-signaled context-overflow errors and retries once with compressed input — preserving the multi-step loop.
  • Oversized tool outputs excluded at their true seam. tool-wrapper.ts:111's toModelOutput gains an oversized check. Oversized raw outputs return a structured error-shaped tool result matching default-tools.ts:214-246's retrieval_blocked shape — no bytes reach the LLM.
  • Pre-generation conversation-history compression unchanged. Acts as the safety net for non-tool-result context growth.
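The reactive retry described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the spec's implementation: the types are simplified stand-ins for the AI SDK middleware contract, and `isContextOverflowError` / `compressPrompt` are placeholder helpers standing in for the spec's new modules.

```typescript
// Sketch of compressionRetryMiddleware (assumed shapes, not the real AI SDK
// types). Note the nullary doGenerate: params are already bound, so a retry
// with a compressed prompt must go through the model directly.
type CallOptions = { prompt: unknown[] };
type GenerateResult = { text: string };
type Model = { doGenerate(opts: CallOptions): Promise<GenerateResult> };

// Placeholder helpers (illustrative only).
function isContextOverflowError(err: unknown): boolean {
  return err instanceof Error && /context length/i.test(err.message);
}
async function compressPrompt(prompt: unknown[]): Promise<unknown[]> {
  return prompt.slice(-2); // stand-in: keep only the most recent messages
}

const compressionRetryMiddleware = {
  async wrapGenerate(options: {
    doGenerate: () => Promise<GenerateResult>;
    params: CallOptions;
    model: Model;
  }): Promise<GenerateResult> {
    try {
      return await options.doGenerate(); // happy path: no compression at all
    } catch (err) {
      if (!isContextOverflowError(err)) throw err; // unrelated errors propagate
      const prompt = await compressPrompt(options.params.prompt);
      // Retry budget is exactly 1; a second overflow propagates as a hard fail.
      return options.model.doGenerate({ ...options.params, prompt });
    }
  },
};
```

Because the retry happens below the AI SDK's multi-step loop, completed tool calls from earlier steps are preserved — which is the core of the DEC-06 argument.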

Spec artifacts

  • specs/2026-04-14-reactive-compression/SPEC.md — full spec, 14 locked decisions, 8 verifiable requirements, agent constraints.
  • evidence/ — four evidence files (compression triggers, oversized-artifact handling, provider overflow signals, middleware approach) with confidence-labeled findings and file:line citations.
  • meta/audit-findings.md + meta/design-challenge.md — cold-reader audit + design-challenge output.
  • meta/_changelog.md — full audit-resolution trail.

Key design decisions

| ID | Decision | Status |
|---|---|---|
| DEC-06 | wrapLanguageModel middleware; NOT outer try/catch (1-way door) | LOCKED |
| DEC-09 | Two-context split: error-shaped stub for initial execution, compact text reference for conversation-history rebuild | LOCKED |
| DEC-11 | Reject AI SDK transformParams (proactive hook — contradicts reactive design) | LOCKED |
| DEC-12 | Surgery seam: tool-wrapper.ts:111 wrapped toModelOutput | LOCKED |
| DEC-13 | Middleware constructed per-request via factory closing over run context | LOCKED |

Test plan

  • Review SPEC.md §§1-6 (problem framing, architecture, design details) for alignment with intent
  • Review §10 Decision Log — confirm all 14 decisions match expectations
  • Review §16 Agent Constraints — SCOPE / EXCLUDE / STOP_IF / ASK_FIRST before implementation kicks off
  • Sanity-check evidence file:line citations against main HEAD
  • Verify Future Work tiers (Explored / Identified / Noted) and retry-budget revisit trigger

🤖 Generated with Claude Code

Ready-for-implementation spec for refactoring agents-api compression triggers:
- Mid-gen compression becomes reactive (via AI SDK wrapLanguageModel middleware
  that catches provider-signaled overflow and retries once with compressed input)
- Oversized tool outputs excluded at tool-wrapper.ts toModelOutput seam with
  structured error-shaped tool result (matching default-tools.ts retrieval_blocked)
- Pre-generation conversation-history compression unchanged (safety net)

Includes four evidence files (compression triggers, oversized artifact handling,
provider overflow signals, middleware approach), audit + challenger findings,
and audit-resolution changelog. 14 locked/directed decisions at HIGH confidence.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@vercel

vercel Bot commented Apr 14, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| agents-api | Ready | Preview, Comment | Apr 14, 2026 6:43pm |
| agents-docs | Ready | Preview, Comment | Apr 14, 2026 6:43pm |
| agents-manage-ui | Ready | Preview, Comment | Apr 14, 2026 6:43pm |


@changeset-bot

changeset-bot Bot commented Apr 14, 2026

⚠️ No Changeset found

Latest commit: f0175d6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


@github-actions
Contributor

A matching internal PR is ready in inkeep/agents-private#90 for canonical review and merge.

  • Original author attribution is preserved as @tim-inkeep
  • The internal PR is the authoritative merge surface
  • The public repo will pick up the merged change through the normal mirror sync

This comment will be updated as the bridge state changes.

@pullfrog
Contributor

pullfrog Bot commented Apr 14, 2026

TL;DR

This PR adds a comprehensive spec (status: ready-for-implementation) for replacing the current proactive, token-budget-based mid-generation compression in agents-api with a reactive approach that compresses only when the LLM provider actually returns a context-overflow error. It also specifies excluding oversized tool outputs at the tool-wrapper seam before they ever reach the LLM, using a structured error-shaped tool result.

Key Changes

  • Reactive compression via AI SDK middleware: Introduces wrapLanguageModel middleware (compressionRetryMiddleware) that intercepts doGenerate/doStream calls. On a provider context-overflow error (HTTP 400), it compresses the prompt and retries once. Second failures propagate. This replaces the proactive prepareStep token-budget trigger.
  • Oversized tool output exclusion: Wraps toModelOutput at tool-wrapper.ts:111 to detect oversized raw tool outputs (>30% of context window) and substitute a structured { status: 'oversized', toolInfo, recommendation } error-shaped result matching the existing retrieval_blocked pattern in default-tools.ts.
  • Provider-aware overflow detection: New isContextOverflowError() helper that distinguishes OpenAI (context_length_exceeded code) and Anthropic (regex on error messages), explicitly excludes 413 byte-limit errors, and emits telemetry on regex hits for drift monitoring.
  • Removal of _oversizedWarning injection from both BaseCompressor.ts and ArtifactService.ts; metadata.isOversized remains as the durable signal.
  • 14 locked/directed decisions covering retry budget (1), terminal failure behavior (hard fail), middleware vs. outer try/catch (1-way door), stream peek mechanics (async-iterator .next(), no tee()), per-request middleware construction, and explicit rejection of transformParams.
  • 8 requirements with verifiable acceptance criteria, a detailed verification plan (unit, integration, telemetry, regression tests), and a blast-radius analysis.
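The provider-aware discriminator described above can be sketched like this. The error shapes and the Anthropic message pattern are illustrative assumptions (the spec treats the regex as a drift risk, hence the telemetry on every regex hit); only the OpenAI `context_length_exceeded` code and the 413 exclusion are taken directly from the evidence files.

```typescript
// Sketch of detectContextOverflow.ts. The OpenAI code is stable; the
// Anthropic regex below is illustrative, not the spec's exact pattern.
type ProviderError = {
  statusCode?: number;
  code?: string; // OpenAI: 'context_length_exceeded'
  message?: string; // Anthropic: free text, matched by regex (drift risk)
};

function isContextOverflowError(err: ProviderError): boolean {
  // 413 is a Cloudflare byte limit — explicitly NOT context overflow.
  if (err.statusCode === 413) return false;
  if (err.code === "context_length_exceeded") return true; // OpenAI
  // Anthropic has no stable code; fall back to an HTTP 400 message match.
  return (
    err.statusCode === 400 &&
    /prompt is too long|context window/i.test(err.message ?? "")
  );
}
```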

SPEC.md -- Main Specification

The core spec document (465 lines) structured in 17 sections:

  • Problem Statement (SCR): Mid-generation compression fires proactively at 75-91% of context based on prediction, causing false-positive compressions that destructively summarize context the LLM would have successfully consumed. Additionally, oversized tool outputs reach the LLM without any check at the initial tool execution path.
  • Architecture: Adds a per-request compressionRetryMiddleware via wrapLanguageModel that intercepts at the per-step doGenerate/doStream boundary. For streaming, uses a peek-first-chunk pattern via async-iterator .next() to detect pre-commit overflow errors with negligible latency impact.
  • Oversized exclusion: Surgery at tool-wrapper.ts:111's toModelOutput hook -- detects oversized outputs and returns a structured error-shaped tool result so the LLM can self-correct (narrow queries, different approach).
  • Decision Log: 14 decisions (DEC-01 through DEC-14) covering all architectural choices, with reversibility ratings and rationale. Notable 1-way door: DEC-06 (middleware, not outer try/catch) with evidence that outer retry at step N>1 would lose completed tool calls.
  • Scope constraints: Explicit SCOPE, EXCLUDE, STOP_IF, and ASK_FIRST lists for implementers. 2 new files, 7 changed files, associated test updates.
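The peek-first-chunk pattern for streaming can be sketched generically. This is a simplified stand-in for what the spec describes, not AI SDK code: the key point is that a provider overflow error surfaces on the first `.next()` call, before any delta has been emitted downstream, so a retry is still safe at that moment.

```typescript
// Peek-first-chunk sketch: await the first stream part before committing.
// Pre-commit errors (e.g. context overflow) throw on the first .next();
// once any part has been forwarded, errors can no longer be retried.
async function* peekThenForward<T>(
  stream: AsyncIterator<T>,
  onPreCommitError: (err: unknown) => AsyncIterable<T>,
): AsyncGenerator<T> {
  let first: IteratorResult<T>;
  try {
    first = await stream.next(); // overflow errors surface here, pre-commit
  } catch (err) {
    yield* onPreCommitError(err); // e.g. retry with a compressed prompt
    return;
  }
  if (!first.done) yield first.value;
  // Past this point deltas are committed; later errors must propagate.
  while (true) {
    const part = await stream.next();
    if (part.done) return;
    yield part.value;
  }
}
```

The latency cost is one awaited chunk that would have been awaited anyway, which is why the evidence file calls the TTFT impact negligible.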

evidence/current-compression-triggers.md -- Compression Trigger Audit

Maps all existing compression trigger points in agents-api. Confirms mid-generation compression is fully proactive (fires on prediction before each step), uses tiered thresholds (75%/83%/91% by model size), and that no provider-error recovery exists around generateText/streamText. Also confirms conversation-history compression is separate and unchanged.

evidence/middleware-approach.md -- Why Middleware Works

Evidence that wrapLanguageModel middleware makes retry invisible to the AI SDK multi-step loop. Explains stream peek mechanics, confirms post-commit retry is unsafe (once deltas are emitted, state is committed), and argues TTFT impact is negligible (microseconds of object property access on the already-awaited first chunk).

evidence/oversized-artifact-handling.md -- Artifact Path Analysis

Traces how oversized artifacts currently flow through the system. Key findings: detection (detectOversizedArtifact) only returns a result object; _oversizedWarning is injected at two downstream sites (BaseCompressor.ts:407, ArtifactService.ts:751); the actual LLM-facing seam is tool-wrapper.ts:111's toModelOutput (which has no oversized check today); and a retrieval-path exclusion already exists at default-tools.ts:214-246 providing the template shape.

evidence/provider-overflow-signals.md -- Provider Error Taxonomy

Documents how Anthropic, OpenAI, and the Vercel AI SDK signal context-overflow errors. OpenAI has a stable context_length_exceeded code; Anthropic uses message-text regex (drift risk). Confirms 413 is NOT overflow (Cloudflare byte limit). Notes max_tokens counts toward Anthropic overflow math, and that AI SDK does not normalize overflow codes.

meta/_changelog.md -- Spec Process Log

Append-only record of the spec authoring process: initial creation, audit + challenger resolution, and finalization. Documents all corrections applied (class name fixes, injection-site corrections, middleware pseudocode rewrite, acceptance criteria rewrites) and decisions made during the audit/challenge cycle.

meta/audit-findings.md -- Spec Audit Report

10 findings (3 High, 4 Medium, 3 Low) from a systematic audit against the codebase at baseline commit. High findings: incorrect class naming (ConversationCompressor vs MidGenerationCompressor), wrong _oversizedWarning injection-site citations, and misidentified artifact-exclusion surgery target (buildInitialMessages doesn't handle artifacts). All were resolved in the spec before finalization. Includes a confirmed-claims section verifying all file citations and external references.

meta/design-challenge.md -- Adversarial Design Review

8 challenge items from a cold-reader stress test. Confirms the core architecture is sound. Key challenges addressed: strengthened DEC-06 rationale with multi-step step-N>1 argument, split artifact exclusion to use error-shaped tool results (not text stubs), added DEC-11 rejecting transformParams, added DEC-13 for per-request middleware construction, and documented retry-budget-ladder revisit trigger. Post-commit hard-fail (C4) was independently dismissed as correct.

Pullfrog | View workflow run | Triggered by Pullfrog | Using Claude Opus

@pullfrog
Contributor

pullfrog Bot commented Apr 14, 2026

TL;DR

Adds a comprehensive spec (ready-for-implementation) for making mid-generation context compression reactive — firing only when the LLM provider actually returns a context-overflow error — instead of the current proactive token-budget prediction that over-eagerly compresses. Also specifies oversized tool-output exclusion at the tool-wrapper.ts toModelOutput seam so large tool results never reach the LLM prompt.

No code changes — this PR is 8 new spec/evidence/meta documents (1,375 lines).


What Changed (latest push)

Commit f0175d6 addresses PR review feedback from two bot reviewers (pullfrog, claude[bot]) — no design changes, only corrections and additions:

  • Version pin fix: @ai-sdk/provider citation corrected from 3.0.4 to 3.0.2 (the version actually declared in package.json; middleware contract verified identical across both)
  • Typo fix: formatOversizedReason → formatOversizedRetrievalReason (actual function name at artifact-utils.ts:50)
  • Telemetry clarification: all compression.* / anthropic_overflow_regex_hit / artifact.* attributes are explicitly span attributes (not log fields), emitted on step and tool-call spans
  • References cleanup: removed local filesystem path from §17; added in-repo pointers to meta/audit-findings.md and meta/design-challenge.md
  • New risk (MEDIUM): End-User visibility gap when tool results are excluded mid-turn — operators can see it in Jaeger via tool.result.oversized_excluded, but chat UI surfacing is deferred
  • New future work items: Traces UI surfacing for exclusion/overflow attributes, enriched stub metadata for debugging, retrieval-tool migration note
  • Verification plan addition: docs-check step to grep agents-docs/ for stale compression references during implementation
  • Changelog entry: full accounting of all review findings (accepted, partially accepted, none declined)

Key Changes

  • Introduces a new spec at specs/2026-04-14-reactive-compression/SPEC.md with 14 locked/directed decisions, 8 requirements with acceptance criteria, and a full verification plan
  • Defines a compressionRetryMiddleware using AI SDK wrapLanguageModel to intercept doGenerate/doStream per-step and retry once on provider overflow errors (Anthropic + OpenAI)
  • Specifies removal of the proactive token-budget compression branch from the prepareStep callback in ai-sdk-callbacks.ts
  • Specifies oversized tool-output exclusion at tool-wrapper.ts:111 using a structured error-shaped tool result matching the existing retrieval_blocked contract in default-tools.ts
  • Specifies removal of _oversizedWarning injection from BaseCompressor.ts and ArtifactService.ts (retaining metadata.isOversized as the durable signal)
  • 4 evidence documents grounding the design in codebase audit findings
  • 2 meta documents recording the audit/challenger process and changelog

Main Spec (SPEC.md)

The core document covers the full lifecycle: problem statement (SCR format), goals/non-goals, current vs target state, vertical-slice architecture diagram, 6 detailed design sections, user journeys for 3 personas, blast radius analysis, 8 requirements (R1–R8) with verifiable acceptance criteria, a 14-entry decision log, open questions, assumptions, risks, verification plan, future work, and agent constraints (scope/exclude/stop-if/ask-first).

Two new files are specified: compressionRetryMiddleware.ts (per-request factory returning LanguageModelV2Middleware) and detectContextOverflow.ts (provider-aware overflow discriminator). Seven existing files are modified. Pre-generation conversation-history compression is explicitly unchanged and acts as a safety net.
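DEC-13's per-request construction can be sketched as a factory closing over run context. The types and method below are simplified stand-ins (the real factory returns an AI SDK middleware object); the point illustrated is that retry state lives in the closure, so concurrent requests never share a budget.

```typescript
// DEC-13 sketch: build the middleware per request, never at module scope.
type RunContext = { requestId: string; contextWindowTokens: number };

function createCompressionRetryMiddleware(ctx: RunContext) {
  let overflowRetried = false; // per-request state, isolated by the closure
  return {
    requestId: ctx.requestId,
    // Returns true if the single-retry budget is still available.
    noteOverflow(): boolean {
      if (overflowRetried) return false; // budget spent: hard fail
      overflowRetried = true;
      return true;
    },
  };
}
```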

Evidence: Current Compression Triggers

Maps the existing compression trigger points — confirms mid-gen compression is fully proactive via MidGenerationCompressor.isCompressionNeededFromActualUsage(), fires at 75–91% of context depending on model tier, and that no provider-error recovery exists around generateText/streamText.

Evidence: Middleware Approach

Documents why wrapLanguageModel middleware is the only correct interception layer: retry is invisible to the multi-step loop, stream peek via async-iterator .next() adds negligible latency, and post-commit mid-stream errors cannot be retried.

Evidence: Oversized Artifact Handling

Traces the artifact-to-prompt path: detectOversizedArtifact only detects (returns warning string), _oversizedWarning is injected downstream at BaseCompressor.ts:407 and ArtifactService.ts:751, and the actual LLM-facing seam is tool-wrapper.ts:111's toModelOutput hook — not buildInitialMessages or buildToolResultForModelInput.

Evidence: Provider Overflow Signals

Documents how Anthropic (message-regex-only, no stable code) and OpenAI (context_length_exceeded code) signal overflow, that 413 is a byte-limit not context-overflow, and that the AI SDK normalizes errors but not overflow codes.

Meta: Audit Findings

10 findings (3 High, 4 Medium, 3 Low) from a systematic audit against the baseline commit. High findings corrected class name confusion (MidGenerationCompressor vs ConversationCompressor), _oversizedWarning injection-site misattribution, and the artifact-exclusion surgery target (tool-wrapper.ts:111 not buildInitialMessages). All findings resolved in the final spec.

Meta: Design Challenge

8-item challenger review stress-testing locked decisions. Confirms the middleware architecture is sound. Key outcomes: DEC-06 rationale strengthened (multi-step step-N>1 argument), DEC-11 added (explicitly rejecting transformParams), DEC-13 added (per-request middleware construction), and a future-work trigger for retry-budget ladder based on second_overflow rate.

Meta: Changelog

Append-only record of the spec process: initial creation, audit + challenger resolution (all corrections applied), finalization to ready-for-implementation, and PR review-cycle corrections with full finding-by-finding accounting.



@pullfrog pullfrog Bot left a comment


Verdict: Approve. This is an exceptionally thorough spec with rigorous evidence tracing, a self-contained audit/challenge cycle, and verified file:line citations (9/9 checked against main HEAD are accurate; the 10th — @ai-sdk/provider types — is unverifiable without installed deps but the spec's own audit resolved that concern via C6/C8).

The spec covers a well-scoped refactor (reactive mid-gen compression + artifact exclusion) with 14 locked decisions, 8 verifiable requirements, clear SCOPE/EXCLUDE/STOP_IF/ASK_FIRST boundaries, and a changelog showing how audit findings were resolved. The design is sound — middleware-based reactive retry preserves the AI SDK multi-step loop, and the tool-wrapper.ts:111 toModelOutput seam is the correct surgery site for oversized exclusion.

Two minor issues noted inline — neither blocks implementation.



## 17. References

- Plan file: `/Users/timothycardona/.claude/plans/rustling-prancing-beaver.md`

This references a local filesystem path (/Users/timothycardona/.claude/plans/rustling-prancing-beaver.md) which is not accessible to other contributors or the implementation agent. Consider removing or replacing with a relative path to an in-repo artifact if the plan content is valuable, or simply delete the line — the spec is self-contained.

- `compression.detector = 'openai_code' | 'anthropic_regex' | 'heuristic_400'`
- `compression.retry_number = 0 | 1`
- `compression.outcome = 'success' | 'second_overflow' | 'other_error'`
- `artifact.excluded = true` with `artifact.id`, `artifact.original_tokens`, `artifact.context_window`

anthropic_overflow_regex_hit = true is listed as a separate OTel attribute in §6.5, but R8's acceptance criteria in §9 list it as anthropic_overflow_regex_hit without specifying whether it's a span attribute or log field. The telemetry section says "span attributes / log fields" (ambiguously). For implementation clarity, pick one — span attribute is more queryable and consistent with the other compression.* attributes.
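The "span attribute, not log field" recommendation might be read concretely as follows. `Span` here is a minimal stand-in for the OpenTelemetry span interface; the attribute names are the ones quoted above, and the helper name is hypothetical.

```typescript
// Sketch: emit overflow-detection telemetry as span attributes so they
// stay queryable alongside the other compression.* attributes.
interface Span {
  setAttribute(key: string, value: string | number | boolean): void;
}

function recordOverflowTelemetry(
  span: Span,
  detector: "openai_code" | "anthropic_regex" | "heuristic_400",
  retryNumber: 0 | 1,
  outcome: "success" | "second_overflow" | "other_error",
): void {
  span.setAttribute("compression.detector", detector);
  span.setAttribute("compression.retry_number", retryNumber);
  span.setAttribute("compression.outcome", outcome);
  if (detector === "anthropic_regex") {
    // Span attribute (not a log field) so regex drift stays monitorable.
    span.setAttribute("anthropic_overflow_regex_hit", true);
  }
}
```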


@pullfrog pullfrog Bot left a comment


Approving — spec is thorough, well-structured, and ready for implementation. Two minor nits left as inline comments on the prior review.



@claude claude Bot left a comment


PR Review Summary

(4) Total Issues | Risk: Medium

🟠⚠️ Major (2) 🟠⚠️

Inline Comments:

  • 🟠 Major: SPEC.md:156 Version mismatch — spec cites @ai-sdk/[email protected] but codebase has 3.0.2
  • 🟠 Major: SPEC.md:227 Oversized stub messaging is LLM-centric, not Agent Builder-debuggable

🟡 Minor (2) 🟡

Inline Comments:

  • 🟡 Minor: SPEC.md:253 Telemetry attributes lack corresponding Traces UI treatment
  • 🟡 Minor: SPEC.md:412 Future artifact retrieval tool would require stub wording migration

💭 Consider (2) 💭

💭 1) SPEC.md:261 End User visibility when tool results are excluded

Issue: User Journey 7.1 describes LLM "correcting on the next step" but doesn't specify observable feedback to End Users when oversized tool results are excluded.

Why: End Users have no visibility that a tool result was excluded. The agent may produce confusing output like "I was unable to retrieve that data" without context. This is a potential coherence gap.

Fix: Consider specifying an observable signal for End Users (status update, streaming indicator) or document this as a known UX gap in the Risks section.

Refs: SPEC.md §7.1

💭 2) SPEC.md:47 Docs accompaniment for behavioral change

Issue: Spec changes observable agent behavior (compression triggers reactively, oversized tool results excluded) but doesn't mention docs updates.

Why: Agent Builders reading current docs won't learn about the new oversized exclusion behavior or changed compression timing. The Blast Radius section notes "Public SDK / API: No change" but docs surface isn't addressed.

Fix: Add to Verification Plan: "Review and update agents-docs content if it references compression behavior. Add changelog entry for Agent Builders."

Refs: SPEC.md §8


💡 APPROVE WITH SUGGESTIONS

Summary: This is a well-structured, comprehensive spec with thorough evidence files and a complete audit trail. The core architecture (reactive middleware + oversized exclusion at tool-wrapper seam) is sound and well-reasoned.

The main actionable item is the version mismatch — the spec cites @ai-sdk/[email protected] but the codebase has 3.0.2. This should be reconciled before implementation begins to ensure the middleware contract is valid.

The other findings are product polish suggestions (debugging experience, UI treatment, docs) that can be addressed in the implementation PR or follow-ups.

Discarded (4)
| Location | Issue | Reason Discarded |
|---|---|---|
| evidence/provider-overflow-signals.md:30 | OpenAI error code documentation | Confirmed accurate — matches existing codebase usage at distill-utils.ts:102 |
| evidence/provider-overflow-signals.md:32-36 | Anthropic overflow message patterns from community evidence | Already addressed — spec appropriately flags as Assumption A1 with MEDIUM confidence and includes telemetry for drift detection |
| SPEC.md:345 | DEC-11 rejection of transformParams | Confirmed correct — transformParams is proactive, conflicts with reactive design |
| Internal spec docs scope | Review against write-docs standards | Not applicable — specs/ directory contains internal engineering specs, not customer-facing docs |
Reviewers (3)

| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| pr-review-product | 5 | 0 | 2 | 0 | 2 | 0 | 1 |
| pr-review-3p-specs | 6 | 0 | 0 | 0 | 2 | 0 | 4 |
| pr-review-docs | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| Total | 12 | 0 | 2 | 0 | 4 | 0 | 6 |


### 6.2 Middleware (`compressionRetryMiddleware.ts`)

**Critical shape correction.** Per `@ai-sdk/[email protected]` (`index.d.ts:2066-2088`), `wrapGenerate` / `wrapStream` receive `doGenerate` / `doStream` as **nullary** functions — they bind the already-transformed params. To retry with modified params, call `options.model.doGenerate(modifiedOptions)` (or `options.model.doStream(modifiedOptions)`) directly.

🟠 MAJOR: Version mismatch with installed @ai-sdk/provider

Issue: Spec cites @ai-sdk/[email protected] but the codebase has @ai-sdk/[email protected] installed.

Why: The middleware contract (wrapGenerate/wrapStream signature with options.model for retry) must be verified against the actual installed version. If 3.0.4 introduced API changes not present in 3.0.2, the implementation will fail.

Fix: Update the spec to cite @ai-sdk/[email protected] after verifying the middleware interface matches, OR update the package to 3.0.4 if that version is required for the proposed pattern.

Refs:

toolCallId: currentToolCallId,
toolName,
warning: '⚠️ Tool produced an oversized result that cannot be included in the conversation.',
reason: formatOversizedReason(detection.originalTokenSize, detection.contextWindowSize),

🟠 MAJOR: Oversized stub messaging is LLM-centric, not Agent Builder-debuggable

Issue: The structured error stub includes reason: formatOversizedReason(...) but the format isn't specified. Agent Builders debugging in Traces UI will see raw token counts without context on the 30% threshold or how to resolve.

Why: First-contact legibility for Agent Builders matters. The existing retrieval_blocked response at default-tools.ts:225-245 has the same gap. Since this spec aims to improve coherence, the stub should help humans debug, not just LLMs retry.

Fix: Consider specifying:

  1. Human-readable token context: ~42K tokens (30% of 140K context limit)
  2. A docsUrl field pointing to troubleshooting docs
  3. Truncation for toolArgs to avoid multi-KB dumps in Traces UI

Refs:

- `compression.detector = 'openai_code' | 'anthropic_regex' | 'heuristic_400'`
- `compression.retry_number = 0 | 1`
- `compression.outcome = 'success' | 'second_overflow' | 'other_error'`
- `artifact.excluded = true` with `artifact.id`, `artifact.original_tokens`, `artifact.context_window`

🟡 Minor: Telemetry attributes lack corresponding Traces UI treatment

Issue: New telemetry attributes (artifact.excluded, artifact.id, artifact.original_tokens, etc.) are specified, but there's no mention of whether the Traces Dashboard will surface them.

Why: Agent Builders need to see which tool calls had results excluded. Without UI treatment, these attributes are only discoverable via raw Jaeger queries.

Fix: Add to §8 (Blast Radius) or Future Work: "Traces UI should display artifact.excluded attribute on tool-call spans; consider a visual indicator (warning icon) in Tool Calls View."

Refs:


| Item | Tier | Why not now | Triggers to revisit |
|---|---|---|---|
| Agent-callable artifact retrieval tool (e.g., `get_artifact_full(id)`) | **Explored** | Out of this spec's scope; stub informs the LLM content is unavailable without advertising a retrieval path. Designing a safe retrieval API (permissions, token budget of retrieved content, recursion prevention) is its own scope. | Agent builders report repeated frustration with "content unavailable"; product decision to support in-turn full-artifact access. |

🟡 Minor: Future artifact retrieval tool would require stub wording migration

Issue: DEC-03 locks "stub advertises no retrieval path," but if an artifact retrieval tool is added later, the recommendation field wording becomes outdated. This creates mild product debt.

Why: The stub wording is a de-facto API contract. The spec notes this as "Explored" but doesn't specify the migration story.

Fix: Add a note: "If an artifact retrieval tool is added, the oversized stub recommendation field should be updated to include it (e.g., 'retrieve the full artifact using get_artifact_full(id)'). This is a backward-compatible addition."

@github-actions github-actions Bot deleted a comment from claude Bot Apr 14, 2026
@github-actions
Contributor

github-actions Bot commented Apr 14, 2026

Preview URLs


- Correct @ai-sdk/provider version citation: 3.0.4 → 3.0.2 (verified
  middleware contract identical across both via dist/index.d.ts diff)
- Fix formatOversizedReason → formatOversizedRetrievalReason (actual
  function exported from artifact-utils.ts:50)
- Clarify R8 telemetry: all attributes are span attributes, not log fields
- Remove local filesystem path from §17 References
- Add End-User visibility gap to Risks (MEDIUM) with operator mitigation
- Add Traces UI surfacing, enriched stub metadata, and retrieval-tool
  migration note to Future Work
- Add docs-check step to Verification Plan

No design changes. All review findings addressed.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@claude claude Bot left a comment


PR Review Summary

(0) Total Issues | Risk: Low


Re-review Scope

This is a delta-scoped re-review of commit f0175d6a2 ("spec: address PR review feedback (version pin, typos, telemetry clarity)").

Prior Feedback Status

All 9 findings from the previous review cycle have been properly addressed:

| Prior Finding | Resolution |
|---|---|
| 🟠 Version mismatch (@ai-sdk/provider 3.0.4 → 3.0.2) | ✅ §6.2 updated to cite 3.0.2 with verification note that 3.0.4 has identical middleware contract |
| 🟠 Function name typo (formatOversizedReason) | ✅ Corrected to formatOversizedRetrievalReason globally |
| 🟡 Telemetry ambiguity (span vs log fields) | ✅ §6.5 clarified: all are span attributes (not log fields) |
| 🟡 Future artifact retrieval migration story | ✅ Migration note added to §15 Future Work entry |
| 💭 End-User visibility gap | ✅ New row added to §13 Risks |
| 💭 Docs accompaniment | ✅ Docs check added to §14 Verification Plan |
| pullfrog: Local filesystem path in References | ✅ Removed; replaced with in-repo pointers |
| pullfrog: R8 telemetry ambiguity | ✅ Same as above — clarified as span attributes |
| (secondary) Enriched stub metadata | ✅ Added to §15 Future Work (Noted tier) |

Changelog Quality

The meta/_changelog.md update provides excellent traceability:

  • Each finding is categorized (Accepted / Partially accepted / Declined)
  • Verification evidence is documented (e.g., diff of dist/index.d.ts between versions)
  • Rationale for partial acceptance is clear (e.g., "added to Risks / Future Work rather than design changes")

✅ APPROVE

Summary: The spec is ready for implementation. All prior review feedback has been thoughtfully addressed — corrections were applied where warranted, and items outside the spec's scope were properly logged in Risks or Future Work sections. The changelog audit trail is exemplary.

Note: GitHub App cannot submit formal approvals. This is a recommendation to approve.

Reviewers (1)
| Reviewer | Returned | Main Findings | Consider | While You're Here | Inline Comments | Pending Recs | Discarded |
|---|---|---|---|---|---|---|---|
| orchestrator | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Total | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

Note: Delta-scoped review — no sub-reviewers dispatched as prior feedback was fully addressed.
