Skip to content

fix(extensions): detect truncated pulled-extension trees in auto-resolve (swamp-club#133)#1197

Merged
stack72 merged 1 commit intomainfrom
worktree-twinkling-growing-pudding
Apr 20, 2026
Merged

fix(extensions): detect truncated pulled-extension trees in auto-resolve (swamp-club#133)#1197
stack72 merged 1 commit intomainfrom
worktree-twinkling-growing-pudding

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Apr 20, 2026

Fixes swamp-club#133.

Summary

Auto-resolve treated any pulled-extensions/<name>/ directory that existed on disk as "installed" — regardless of whether its contents were complete. A tree with manifest.yaml missing or a declared kind subdirectory missing still passed the isInstalled check, so the resolver emitted reason: already_installed and the downstream loader produced a confusing Unknown datastore type "@swamp/s3-datastore" / Unknown model type: @swamp/issue-lifecycle error with no pointer at what was actually wrong.

This is a regression from PR #1187 (swamp-club#121). Before that change, the adapter called installExtension with force: true on any load failure — a truncated tree self-healed on the next command. After #1187 the strict "never overwrite" guard caught the truncated case alongside the user-edit case it was meant for, and the truncation recovery path was lost.

Approach

The fix extends the "never overwrite" guarantee to truncation while giving users an actionable error. It is read-only — no filesystem mutation in the auto-resolve path. --force remains the only way auto-install state can overwrite a pulled extension.

  • Introduces a tri-state InstallationInspection result (missing | intact | truncated) on ExtensionInstallerPort, replacing isInstalled + installedPath. Both the path and the missing-files list ride back on the result.
  • The adapter drives the check from the lockfile's file list (authoritative file-level granularity, so partial truncation inside a kind subdir is caught too — not just whole-subdir removal).
  • The domain service branches on the three states: missing → install, intactalreadyInstalledButFailed (unchanged Issue 6: Migrate echo model to new architecture #121 path for user WIP), truncated → new alreadyInstalledTruncated event naming every missing file plus the swamp extension pull <name> --force recovery command.

Auto-repair was considered and rejected. The user-journey analysis surfaced that mutating the filesystem in response to a read-shaped command has too many quiet side effects for shared-repo / multi-worktree setups: silent version bumps (install is always version: null → latest), concurrent-worktree-rm reversal, masking of deliberate user state changes, and latency surprise. A loud, actionable error with an explicit recovery command is safer and keeps the #121 invariant intact.

What changed

  • src/domain/extensions/extension_auto_resolver.ts — new InstallationInspection type, port method inspectInstallation (replaces isInstalled + installedPath), new alreadyInstalledTruncated event on AutoResolveOutputPort, three-way branch in installAndLoad.
  • src/cli/auto_resolver_adapters.ts — implements inspectInstallation against the lockfile + per-extension directory + per-file stats; wires the new renderer through createAutoResolveOutputAdapter.
  • src/presentation/renderers/extension_auto_resolve.ts — new renderAutoResolveTruncated for both log (three-line actionable error) and json ({"event":"auto_resolve","status":"failed","reason":"truncated","missing":[...]}).
  • src/cli/auto_resolver_adapters_test.ts — 7 new inspectInstallation tests replacing the 5 prior isInstalled/installedPath cases; covers missing / intact / truncated / partial-truncation / all-files-missing / empty-files edge cases.
  • src/domain/extensions/extension_auto_resolver_test.ts — updated mock port; new regression guards for the three-way branch (intact → alreadyInstalledButFailed preserved from Issue 6: Migrate echo model to new architecture #121, truncated → alreadyInstalledTruncated is new).
  • integration/auto_resolver_truncated_test.ts — new hermetic integration test: hand-written scratch fixture, deletes one declared file, asserts the truncated event fires with the missing file AND the on-disk tree is byte-identical after the resolve attempt.
  • integration/auto_resolver_no_clobber_test.ts — ported to the new port method (no behavior change; same Issue 6: Migrate echo model to new architecture #121 invariant).
  • design/extension.md — "Safety: never overwrite on-disk extensions" section extended with the tri-state taxonomy, truncation predicate, and JSON event shape for scripting consumers.

Coverage

Works across all kinds that go through auto-resolve (models, vaults, datastores) — any file missing anywhere in the extension tree is detected when auto-resolve fires for a type from that extension, because inspectInstallation walks the whole lockfile file list. Driver-only or report-only extensions (no auto-resolve trigger) are not covered by this PR; extending to those would require new resolveDriverType / resolveReportType plumbing and is scope creep for #133. Worth a follow-up if belt-and-braces coverage is desired.

Test plan

  • deno fmt clean
  • deno check clean
  • deno lint clean
  • deno run test — 4445 pass / 0 fail (6 new adapter tests, 2 new domain-service tests, 1 new integration test; rewrote 5 Issue 6: Migrate echo model to new architecture #121 regression cases against the new port shape)
  • deno run compile — binary built
  • Manual reproduction against /tmp/swamp-repro-issue-133 with the compiled binary:
    • Before truncation: swamp extension pull @swamp/digitalocean then swamp model create @swamp/digitalocean/droplet test-droplet succeeds.
    • Truncate: rm -rf .swamp/pulled-extensions/@swamp/digitalocean/models.
    • Re-run model create: json mode emits {"event":"auto_resolve","status":"failed","reason":"truncated","missing":[68 files]}; log mode emits three ERR lines naming the missing files and the swamp extension pull @swamp/digitalocean --force recovery.
    • Follow the recovery command: swamp extension pull @swamp/digitalocean --force restores the tree.
    • Re-run model create: succeeds.
  • Tree on disk is byte-identical before and after the resolve attempt — no auto-repair, no silent overwrite.

Follow-up

  • Driver-only / report-only extensions remain uncovered by auto-resolve (see "Coverage" above). Worth a separate issue if exhaustive cross-kind coverage is desired.
  • Root cause of truncation itself is still unknown (partial extract, interrupted install, concurrent worktree op). This PR makes the symptom actionable so the cause is easier to investigate when it next occurs.

🤖 Generated with Claude Code

…lve (swamp-club#133)

Auto-resolve treated any `pulled-extensions/<name>/` directory that
existed on disk as "installed" — regardless of whether its contents were
complete. A tree with manifest.yaml missing or kind subdirectories
missing still passed the `isInstalled` check, so the resolver emitted
`reason: already_installed` and the downstream loader produced a
confusing `Unknown datastore type` / `Unknown model type` error with no
pointer at what was actually wrong.

Introduces a tri-state `InstallationInspection` result
(missing | intact | truncated) on `ExtensionInstallerPort`, replacing
`isInstalled` + `installedPath`. The adapter drives the check from the
lockfile's file list (authoritative file-level granularity, so partial
truncation inside a kind subdir is caught too) and the domain service
branches on the three states: missing → install, intact →
`alreadyInstalledButFailed` (unchanged #121 path for user WIP),
truncated → new `alreadyInstalledTruncated` event naming every missing
file plus the `swamp extension pull <name> --force` recovery.

No filesystem mutation on the detection path. The #121 "never overwrite
on-disk extensions" guarantee is preserved and extended to the truncated
case — `--force` remains the only escape. The truncation itself is out
of scope; this PR makes the symptom actionable so the underlying cause
(partial extract, interrupted install, concurrent worktree op) is
easier to investigate when it next occurs.

Verified end-to-end against the compiled binary with a scratch repo:
pulled `@swamp/digitalocean`, deleted `models/`, re-ran a command
needing a model type — got the new truncated event naming 68 missing
files in both log and json modes, and the `--force` recovery command
restored the tree cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

  1. Log mode: quoted file list looks odd for ≤5 files (extension_auto_resolve.ts:181)
    The list variable is interpolated as a single LogTape placeholder, so it renders with quotes: missing 3 file(s): ".swamp/pulled-extensions/@swamp/aws/manifest.yaml, ...". The comment explains the tradeoff (avoid double-quoting), but a bulleted or newline-separated list of paths would be more scannable. Not blocking — the JSON path has all files unquoted.

  2. Log mode doesn't hint at --json for large truncations (extension_auto_resolve.ts:177–178)
    When >5 files are missing, the log line says ... and 63 more. A trailing hint like (run with --json for the full list) on line 3 would be helpful. Optional improvement.

  3. JSON path field in the truncated event is the extension root, not a file path — the field name is clear enough in context (path is used the same way in already_installed), but scripting consumers may need to know these are sibling concepts. Design doc already documents the shape, so this is fine.

Verdict

PASS. The new truncated error variant is actionable (names every missing file + the exact recovery command), structurally consistent with existing auto_resolve events, and the JSON output includes the missing array that scripts need. This is a clean UX improvement over the prior Unknown <kind> type fallback.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found.

Medium

  1. Sequential stat calls for large file lists — latency concern (src/cli/auto_resolver_adapters.ts:128-133)

    The inspectInstallation method issues one Deno.stat per file in the lockfile's files array, sequentially:

    for (const relPath of files) {
      try {
        await Deno.stat(join(repoDir, relPath));
      } catch {
        missing.push(relPath);
      }
    }

    An extension with hundreds of files (the PR body mentions 68 files for @swamp/digitalocean) will issue 68 sequential stat syscalls. On NFS, high-latency Docker volumes, or Codespaces mounts, each stat can take 5-20ms, adding 0.3-1.4s of wall time per extension per auto-resolve call. This runs on the hot path of every command that triggers auto-resolve.

    Not blocking because local fs is the typical target and the existing code already had per-directory stat costs, but worth noting as a potential papercut. A Promise.all batch (with concurrency limit) would parallelize the stats.

  2. TOCTOU between inspectInstallation and install (src/domain/extensions/extension_auto_resolver.ts:361-383)

    The inspection reads filesystem state, then the code branches to install. Between the two, another process could create/delete files, changing the state. However, this is explicitly documented as defence-in-depth (the ConflictError catch on install covers the race), so this is a known design trade-off, not a bug. Calling it out for completeness.

  3. Concurrency: two types from the same extension resolve simultaneously (src/domain/extensions/extension_auto_resolver.ts:160,345-375)

    The re-entrancy guard is per type (this.resolving is a Set<string> keyed on normalizedType). If two types from the same extension (e.g., @swamp/aws/ec2/instance and @swamp/aws/s3/bucket) resolve concurrently, both will call inspectInstallation("@swamp/aws") and both may proceed to install("@swamp/aws"). The defence-in-depth ConflictError catch on the second install prevents data corruption, so this is safe, but the second caller will get a silent null return from install and report failure even though the first caller succeeded. Pre-existing behavior, not introduced by this PR.

Low

  1. btoa(String.fromCharCode(...bytes)) in integration test may blow the stack for large files (integration/auto_resolver_truncated_test.ts:136)

    String.fromCharCode(...bytes) spreads all bytes as arguments. V8's argument limit is ~65,536; a file larger than 64KB would throw RangeError: Maximum call stack size exceeded. Not a concern here since the fixture files are tiny hand-written strings, but the helper isn't safe for general use.

  2. Empty missing array edge case in renderer (src/presentation/renderers/extension_auto_resolve.ts:155-186)

    If renderAutoResolveTruncated is ever called with missing: [], it would produce missing 0 file(s): with an empty list — grammatically odd but technically harmless. The domain layer guarantees missing.length > 0 before reaching the truncated branch, so this can't happen in practice.

  3. entry.files type safety (src/cli/auto_resolver_adapters.ts:126)

    entry.files ?? [] assumes files is either string[] or absent. The UpstreamExtensionEntry interface declares files?: string[], so this is correct. However, a malformed lockfile with files: "not-an-array" would silently iterate characters. This would produce nonsensical missing-file entries but wouldn't crash or corrupt state. The lockfile is trusted internal state, so this is theoretical.

Verdict

PASS. This is a clean, well-structured fix. The tri-state inspection is a sound design choice — it correctly distinguishes the three cases that the old binary check conflated, the read-only invariant is well-enforced, and the test coverage is thorough (unit, domain, and integration layers all exercised). The new code paths are all properly guarded and the existing regression tests have been correctly ported to the new API shape. No blocking issues.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Clean, well-structured fix. The tri-state InstallationInspection discriminated union is the right DDD building block — it's a value object (no identity, immutable, equality by structure) that lives in the domain layer and keeps the domain service free of infrastructure concerns. The port boundary is properly respected: inspectInstallation is defined on ExtensionInstallerPort in the domain, the adapter in the CLI layer implements it against the filesystem and lockfile, and the renderer handles both log and json output modes. The read-only approach (detect and report, don't auto-repair) is the right call for shared-repo / multi-worktree safety.

Blocking Issues

None.

Suggestions

  1. Sequential stat calls in inspectInstallation (src/cli/auto_resolver_adapters.ts:128–133): The for loop does sequential Deno.stat per file. For extensions with many files (68 in the DigitalOcean example), Promise.allSettled would be faster. Not a concern for correctness — this isn't a hot path — but worth noting if perf profiling ever flags it.

  2. Renderer unit test: renderAutoResolveTruncated in src/presentation/renderers/extension_auto_resolve.ts has no dedicated unit test. This is consistent with the existing pattern (other renderAutoResolve* functions in the same file also lack unit tests), and the integration test covers the JSON path indirectly, so not blocking. A follow-up adding renderer-level tests for the whole file would be a nice improvement.

Both items are non-blocking — the test coverage across unit (adapter + domain service) and integration layers is solid, the libswamp import boundary is respected (auto_resolver_adapters.ts imports from ../libswamp/mod.ts), and the design doc update accurately describes the new tri-state taxonomy.

@stack72 stack72 merged commit 63ef46f into main Apr 20, 2026
10 checks passed
@stack72 stack72 deleted the worktree-twinkling-growing-pudding branch April 20, 2026 21:43
ianarsenault pushed a commit to ianarsenault/swamp that referenced this pull request Apr 21, 2026
…tion check (systeminit#1199)

## Summary

The tri-state `inspectInstallation` introduced in systeminit#1197 stats every path
in the lockfile's file list to detect a truncated install. That list
includes compiled bundles under `.swamp/bundles/**`,
`.swamp/vault-bundles/**`, `.swamp/driver-bundles/**`,
`.swamp/datastore-bundles/**`, and `.swamp/report-bundles/**` —
regenerable build artifacts, not source.

When a user clears any of those caches (normal hygiene, stale rebuild,
fresh worktree), auto-resolve flipped from `intact` → `truncated` and
emitted:

```
incomplete — missing 1 file(s): ".swamp/bundles/91139983/system_usage.js"
```

…instead of the `alreadyInstalledButFailed` ("already installed at … run
`--force` to recover") path that protects user WIP per systeminit#121. The
motivating swamp-club#133 symptoms are all source-tree artifacts
(manifest, kind subdirs, declared source), so the fix is to scope the
truncation predicate to source files only.

This filters bundle prefixes out of the stat loop in
`inspectInstallation`. Prefixes are derived from
`SWAMP_SUBDIRS.{bundles, vaultBundles, driverBundles, datastoreBundles,
reportBundles}` so a future bundle-dir addition stays in sync. No
filesystem mutation, no API/event/wire-format changes.

## Test Plan

- [x] `deno fmt --check`, `deno lint`, `deno check` clean
- [x] `src/cli/auto_resolver_adapters_test.ts` — 18/18 pass, including 2
new regression cases:
- `inspectInstallation ignores absent bundle artifacts and stays intact`
(covers all 5 bundle subdirs)
- `inspectInstallation reports truncated only for missing source, not
missing bundles`
- [x] `integration/auto_resolver_truncated_test.ts` — passes (systeminit#1197/systeminit#133
source-truncation behaviour preserved)
- [x] `integration/auto_resolver_no_clobber_test.ts` — passes (systeminit#121
no-clobber invariant preserved)
- [x] `deno run compile` succeeded
- [x] **End-to-end repro against the rebuilt binary:** `swamp extension
pull @stack72/[email protected]` → edit pulled `.ts` with
WIP marker + syntax error → `rm -rf .swamp/bundles` → `swamp model
create @stack72/system-usage instance1` now produces:
  ```
Extension "@stack72/system-extensions" is already installed at … but
failed to load.
Local edits may be preventing it from registering — inspect the source
and fix errors.
To reset to the registry version and discard local changes, run: swamp
extension pull "@stack72/system-extensions" --force
  ```
…instead of the buggy `incomplete — missing 1 file(s):
".swamp/bundles/91139983/system_usage.js"`.
- [ ] UAT `tests/cli/extension/pull_wip_preservation_test.ts:156` in
swamp-uat (CI gate — local run blocked by membership auto-trust
collision unrelated to this fix)

## Cross-references

- Introducing PR: systeminit#1197
- Fixes regression to swamp-club#121 / systeminit#1187 user-WIP preservation
- swamp-uat coverage: swamp-uat#149

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant