feat: add .swamp-sources.yaml for loading extensions from external paths #1073

Merged
stack72 merged 8 commits into main from feat/extension-sources
Apr 3, 2026

@stack72 stack72 commented Apr 3, 2026

Summary

Two related changes in one PR:

  1. .swamp-sources.yaml: load extensions from external filesystem paths without copying files. This enables testing extensions from separate repos (public swamp-extensions, private company repos, local dev workspaces).

  2. Bundle cache namespacing (fixes #1065, "Pulled extension bundles poison mtime-based bundle cache, local source silently ignored"): bundle paths are now namespaced by a hash of the source directory, so pulled bundles can no longer poison the cache for local/source extensions with the same filename.

Design Decisions (in order of resolution)

1. Why a separate file instead of fields in .swamp.yaml?

.swamp.yaml is committed to version control. Extension source paths are inherently local and machine-specific. A separate gitignored file keeps the main config clean and prevents merge conflicts.

2. Why .swamp-sources.yaml not .swamp-dev.yaml?

"Dev" is vague. The file does exactly one thing: declare additional extension source paths for the loader. "Sources" describes what it is.

3. Why glob support?

The swamp-extensions repo has 251+ AWS service extensions. Without globs, sourcing them all would need one entry per extension. With globs, a single entry covers them: path: ~/code/swamp-extensions/model/aws/*

4. Why sources load before pulled (not after)?

Load order: local > sources > pulled. If you've pulled @swamp/aws/ec2 AND point a source at your local dev copy, the source should win — that's the whole point of developing a replacement.
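
The load order above can be sketched as a first-match-wins merge. This is an illustration only: the `Layer` shape, `resolveByPriority`, and the file paths are hypothetical and are not taken from the swamp loader.

```typescript
// Hypothetical shape: each layer maps an extension type name to the file defining it.
type Layer = {
  origin: "local" | "source" | "pulled";
  types: Record<string, string>;
};

// Layers are visited in priority order; the first layer to define a type wins,
// so a later (lower-priority) layer can never replace an earlier one.
function resolveByPriority(
  layers: Layer[],
): Map<string, { origin: string; file: string }> {
  const resolved = new Map<string, { origin: string; file: string }>();
  for (const layer of layers) {
    for (const [typeName, file] of Object.entries(layer.types)) {
      if (!resolved.has(typeName)) {
        resolved.set(typeName, { origin: layer.origin, file });
      }
    }
  }
  return resolved;
}

// Example: a sourced dev copy of @swamp/aws/ec2 shadows the pulled one.
const resolved = resolveByPriority([
  { origin: "local", types: {} },
  { origin: "source", types: { "@swamp/aws/ec2": "dev/ec2.ts" } },
  {
    origin: "pulled",
    types: { "@swamp/aws/ec2": "pulled/ec2.ts", "@swamp/aws/s3": "pulled/s3.ts" },
  },
]);
```

Here `@swamp/aws/ec2` resolves to the source copy, while `@swamp/aws/s3`, which only the pulled layer defines, still loads from there.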

5. Why swamp extension source not swamp source?

swamp source already exists (troubleshooting). swamp extension source add/rm/list groups with other extension commands and follows the extension trust add/rm/list pattern.

6. Deno config discovery for source extensions

Source extensions live in a separate directory tree with their own deno.json. findNearestDenoConfig() walks up from the source file to find the nearest config, stopping at the consumer repo boundary to avoid picking up unrelated configs. A warning is logged when a discovered config is used.
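
A minimal sketch of such an upward walk, under stated assumptions: the real findNearestDenoConfig() is not shown here, so the name `findNearestConfigSketch`, the injected `exists` callback, the POSIX-only path handling, and the choice to ignore a config located exactly at the stop directory are all illustrative.

```typescript
// Returns the parent directory of a POSIX path ("/" is its own parent).
function parentDir(path: string): string {
  const idx = path.lastIndexOf("/");
  return idx <= 0 ? "/" : path.slice(0, idx);
}

// Walks up from the directory containing `sourceFile`, returning the first
// deno.json found. `exists` is injected so the walk can run without a filesystem.
function findNearestConfigSketch(
  sourceFile: string,
  stopDir: string,
  exists: (path: string) => boolean,
): string | undefined {
  let dir = parentDir(sourceFile);
  while (true) {
    // Assumed boundary rule: never use a config at or above the consumer repo root.
    if (dir === stopDir) return undefined;
    const candidate = dir === "/" ? "/deno.json" : `${dir}/deno.json`;
    if (exists(candidate)) return candidate;
    const parent = parentDir(dir);
    if (parent === dir) return undefined; // reached the filesystem root
    dir = parent;
  }
}
```

Injecting `exists` keeps the walk testable; a real implementation would presumably stat the filesystem and might also consider deno.jsonc.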

7. Bundle cache namespacing (#1065)

Problem: Bundles were flat at .swamp/bundles/<file>.js. Pulled and local extensions with the same relative path collided — pulled bundles silently replaced local ones.

Fix: bundleNamespace(baseDir, repoDir) hashes relative(repoDir, baseDir) to create namespaced paths: .swamp/bundles/<hash>/<file>.js. Using relative paths as hash input avoids macOS /var vs /private/var symlink issues.
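
To illustrate the namespacing idea, here is a sketch of a 32-bit FNV-1a hash (the algorithm the review below attributes to paths.ts) applied to a repo-relative directory. It is not the actual bundleNamespace() code: `fnv1a32Hex` and `namespacedBundlePath` are hypothetical names, and hashing UTF-8 bytes is an assumption.

```typescript
// 32-bit FNV-1a over the UTF-8 bytes of the input, rendered as 8 hex characters.
function fnv1a32Hex(input: string): string {
  const bytes = new TextEncoder().encode(input);
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Multiply by the FNV prime and keep the result as an unsigned 32-bit value.
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Hypothetical helper: `relativeBaseDir` plays the role of relative(repoDir, baseDir),
// so the hash input never contains a machine-specific absolute prefix.
function namespacedBundlePath(relativeBaseDir: string, file: string): string {
  return `.swamp/bundles/${fnv1a32Hex(relativeBaseDir)}/${file}`;
}
```

Because only the relative path is hashed, two processes that see the repo under /var and /private/var respectively would still compute the same namespace.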

8. Catalog layout version for safe migration

Existing repos have a catalog with flat bundle paths. On first run with the new binary, the layout version check ("namespaced-v1") forces a full rescan. Old flat bundles are moved (not deleted) into the correct hashed namespace, preserving pre-built bundles from pulled extensions that can't be rebundled locally.

Architecture

Load pipeline

.swamp-sources.yaml
    ↓ readSwampSources()
    ↓ expandSourcePaths() — glob expansion, ~/$ expansion
    ↓ resolveSourceExtensionDirs() — reads source's .swamp.yaml for dir layout
    ↓ collectDirsForKind() — extracts dirs per extension type
    ↓
loadUser*(sourceDirs=[...sourceDirs, pulledDir])
    ↓
bundleWithCache() → .swamp/bundles/<hash(baseDir)>/<file>.js

Layer structure

| Layer | Files | Role |
| --- | --- | --- |
| Domain | swamp_sources.ts | Zod schema, types, parser |
| Infrastructure | swamp_sources_repository.ts, paths.ts, extension_catalog_store.ts | File I/O, glob expansion, bundleNamespace(), layout version |
| CLI | mod.ts, repo_context.ts | Wiring sources into loader pipeline |
| CLI Commands | extension_source_*.ts | add/rm/list subcommands |
| LibSwamp | sources/*.ts | Business logic for add/rm/list |
| Presentation | extension_source_*.ts | Log/JSON renderers |
| Loaders | user_model_loader.ts, vault/driver/datastore/report loaders | Namespaced bundle paths |
| Pull | pull.ts | Namespaced bundle extraction |

Real-World Testing Scenarios

All scenarios tested with old binary (swamp on PATH) → new compiled binary.

| Scenario | Result | Notes |
| --- | --- | --- |
| Init old binary, upgrade new | PASS | .swamp-sources.yaml added to gitignore |
| Basic source loading (add/list/search/describe) | PASS | @test/hello from source appears |
| Source overrides pulled | PASS | source version 2026.04.03.1 loaded, pulled 2026.03.05.1 skipped |
| Remove source, pulled returns | PASS | pulled type discoverable after source removed |
| Bundle cache migration (old→new binary) | PASS | flat bundles moved to hashed namespace, types load correctly |
| Second run after migration | PASS | warm path, no re-migration |
| Invalid/missing source path | PASS | red ✗ in list, no crash |
| Source remove and list | PASS | remove works, last remove deletes file |
| Only filter | PASS | model filtered out when --only vaults |
| Duplicate prevention | PASS | "already exists" error |
| JSON output mode | PASS | structured JSON |
| Old binary backward compat | PASS | ignores sources file, no crash |
| Bundle namespace hash consistency | PASS | 2e4ea9ae matches across processes |

Automated Test Results

  • 4108 tests passed, 0 failed
  • deno check — clean
  • deno lint — clean
  • deno fmt — clean
  • deno run compile — binary builds

New tests added

  • swamp_sources_test.ts — 9 tests: YAML parsing, Zod validation, only filtering, glob detection
  • add_test.ts — 5 tests: add to empty/existing, only filter, duplicate rejection, empty path
  • remove_test.ts — 4 tests: remove, delete on last, not found, no sources
  • repo_service_test.ts: .swamp-sources.yaml in gitignore
  • paths_test.ts — 3 tests: symlink-safe hash consistency, different dirs produce different hashes, format
  • extension_catalog_store_test.ts — 3 tests: layout version get/set/overwrite

File Format

sources:
  - path: ~/code/systeminit/swamp-extensions/model/aws/*
  - path: ~/code/acme-corp/internal-extensions/model/*
  - path: ~/code/my-experimental-model
    only: [models]
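
As a rough picture of what validating this format involves, the sketch below checks already-parsed YAML by hand. The PR uses a Zod schema for this; the plain-TypeScript version here is a stand-in, `validateSources` is a hypothetical name, and the `KINDS` list is an assumption inferred from the loaders named in this PR (the real EXTENSION_KINDS may differ).

```typescript
// Assumed kind list; the real EXTENSION_KINDS constant may contain other values.
const KINDS = ["models", "vaults", "drivers", "datastores", "reports"] as const;
type Kind = (typeof KINDS)[number];

interface SourceEntry {
  path: string;
  only?: Kind[];
}

function isKind(value: unknown): value is Kind {
  return typeof value === "string" &&
    (KINDS as readonly string[]).includes(value);
}

// Validates the object produced by a YAML parser for .swamp-sources.yaml.
function validateSources(data: unknown): SourceEntry[] {
  const sources = (data as { sources?: unknown } | null)?.sources;
  if (!Array.isArray(sources)) {
    throw new Error("expected a top-level 'sources' list");
  }
  return sources.map((raw: unknown, i: number): SourceEntry => {
    const entry = (raw ?? {}) as { path?: unknown; only?: unknown };
    const path = entry.path;
    if (typeof path !== "string" || path.trim() === "") {
      throw new Error(`sources[${i}].path must be a non-empty string`);
    }
    const only = entry.only;
    if (only === undefined) return { path };
    if (!Array.isArray(only) || !only.every(isKind)) {
      throw new Error(
        `sources[${i}].only must list kinds from: ${KINDS.join(", ")}`,
      );
    }
    return { path, only: only as Kind[] };
  });
}
```

With this sketch, an entry such as only: [model] (missing the trailing s) is rejected with a message listing the accepted kinds rather than being silently ignored.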

Migration

Fully automatic on first run with the new binary:

  1. Catalog layout version missing → forces full rescan
  2. Flat .js files in .swamp/bundles/ moved to hashed subdirectories
  3. Warning logged: "Migrated N bundle file(s) to namespaced layout"
  4. Layout version stored → subsequent runs take fast warm path

No swamp repo upgrade required. Old binary still works if user downgrades.
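
The four steps can be sketched as a pure planning function that decides what to move without touching the filesystem. Only the "namespaced-v1" value comes from this PR; `planMigration`, `MigrationPlan`, and the `namespaceFor` lookup are hypothetical names for illustration.

```typescript
const LAYOUT_VERSION = "namespaced-v1"; // value quoted in this PR description

interface MigrationPlan {
  rescan: boolean;
  moves: Array<{ from: string; to: string }>;
}

// `entries` are names found directly inside .swamp/bundles/.
// `namespaceFor` is a hypothetical lookup returning the hashed directory for a flat file.
function planMigration(
  storedVersion: string | undefined,
  entries: string[],
  namespaceFor: (file: string) => string,
): MigrationPlan {
  // Warm path: the layout version already matches, so nothing to do.
  if (storedVersion === LAYOUT_VERSION) return { rescan: false, moves: [] };
  // Only flat .js files are moved; existing hashed subdirectories are left alone.
  const moves = entries
    .filter((name) => name.endsWith(".js"))
    .map((name) => ({
      from: `.swamp/bundles/${name}`,
      to: `.swamp/bundles/${namespaceFor(name)}/${name}`,
    }));
  return { rescan: true, moves };
}
```

A caller would apply the moves, log the count, and then store the layout version so the next run takes the warm path.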

Known Limitations

  • No origin tracking in registries: swamp model type search doesn't show [sourced] badges. Deferred for v2.
  • Skills not yet updated: swamp-repo and swamp-extension-model skills need documentation for sources workflow.

Closes #1065
Related: #1028

🤖 Generated with Claude Code

stack72 and others added 7 commits April 3, 2026 01:35
Add a mechanism for loading extensions from external filesystem paths via
a gitignored `.swamp-sources.yaml` file. This enables developers to test
extensions from separate repositories — whether the public swamp-extensions
repo, private company repos with proprietary integrations, or local dev
workspaces — without copying files.

Key changes:
- New `.swamp-sources.yaml` file format with glob and `only` filter support
- `swamp extension source add/rm/list` CLI commands
- Source extensions load before pulled (sources override registry versions)
- Bundler discovers nearest deno.json for source extension import resolution
- `.swamp-sources.yaml` always added to .gitignore by repo init/upgrade

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…ying

The findNearestDenoConfig walk now stops at the consumer repo root to
avoid picking up unrelated project configs. Bundle cache keying (#1065)
is deferred to a separate PR due to macOS symlink path normalization
complexity (/var vs /private/var).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Bundle paths were flat (.swamp/bundles/<file>.js), causing collisions
when local, pulled, and source extensions had files with the same
relative path. Pulled bundles poisoned the cache for local extensions.

Fix: insert a hash of relative(repoDir, baseDir) into bundle paths:
.swamp/bundles/<hash>/<file>.js. Using relative paths as hash input
avoids macOS /var vs /private/var symlink issues — both sides share
the same prefix within a process, so relative() cancels it out.

Shared bundleNamespace() function in paths.ts used by all loaders
(models, vaults, drivers, datastores, reports) and extension pull.
Old flat-layout .js files are auto-cleaned on first run.

Closes #1065

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
…undles

When upgrading from the old flat bundle layout, the catalog may have
entries pointing to flat paths that were poisoned by #1065. The layout
version check forces a full rescan on first run with the new binary.

Old flat bundles are moved (not deleted) into the correct hashed
namespace, preserving pre-built bundles from pulled extensions that
can't be rebundled locally due to missing dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
- swamp-repo: New "Extension Sources" section with file format, CLI
  commands, load order, and common patterns
- swamp-extension-model: Updated Model Discovery section with source
  priority order and testing workflow
- swamp-troubleshooting: Added "Source Extension Not Loading" diagnostic
  checklist
- swamp-extension-vault, swamp-extension-datastore: Added one-line
  mentions of sources as alternative to publishing

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
The frontmatter description is the primary triggering mechanism. Without
source-related triggers, Claude wouldn't load the swamp-repo skill when
users ask about extension sources, .swamp-sources.yaml, or loading
extensions from external paths.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@stack72 stack72 marked this pull request as ready for review April 3, 2026 16:37

- Re-export ExtensionKind and EXTENSION_KINDS from libswamp/mod.ts;
  fix CLI import to use mod.ts instead of internal domain path
- Add 13 unit tests for swamp_sources_repository.ts (file I/O, glob
  expansion, only filter, marker-based dir resolution, collectDirsForKind)
- Validate --only flag at add-time against EXTENSION_KINDS with clear
  error message for typos
- Add .example() to extension source list command for help consistency
- Fix list abort-on-error: expansion failure for one source no longer
  aborts listing of remaining sources
- Improve renderer wording: "1 source configured" / "N sources configured"
- Replace dynamic import("@std/yaml") with top-level import

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

@github-actions github-actions Bot left a comment

CLI UX Review

Blocking

None.

Suggestions

  1. extension_source_list.ts:62-65 — When a glob source expands to multiple paths (e.g. ~/code/aws/* → 5 roots), the log-mode output shows 5 extension roots but not the actual resolved paths. A --verbose expansion or a one-per-line indented list would help users confirm which directories were matched. Not blocking — the JSON output already includes expandedPaths for scripting use.
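
One way the suggested one-per-line output could look, as a sketch only: `renderSourceLines` and its `verbose` parameter are hypothetical, the summary wording is invented for illustration, and only the `expandedPaths` name comes from the JSON output mentioned above.

```typescript
// Renders one configured source. In verbose mode each expanded path is listed
// on its own indented line beneath the summary.
function renderSourceLines(
  path: string,
  expandedPaths: string[],
  verbose: boolean,
): string[] {
  const count = expandedPaths.length;
  const summary = `${path} (${count} extension root${count === 1 ? "" : "s"})`;
  if (!verbose) return [summary];
  return [summary, ...expandedPaths.map((p) => `    ${p}`)];
}

const lines = renderSourceLines(
  "~/code/aws/*",
  ["/home/u/code/aws/ec2", "/home/u/code/aws/s3"],
  true,
);
```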

Verdict

PASS — New swamp extension source add/rm/list commands are well-structured, consistent with the extension trust pattern, both log and JSON output modes are implemented, error messages are clear and actionable, and flag naming (--only, --repo-dir) is consistent with the rest of the CLI.


@github-actions github-actions Bot left a comment

Adversarial Review

Critical / High

No critical or high severity issues found.

Medium

  1. src/cli/mod.ts:605 — Silent swallowing of .swamp-sources.yaml parse errors hides user mistakes.
    The outer catch {} at line 605 catches everything, including the UserError thrown by parseSwampSources() when the file fails validation (e.g., only: [invalid_kind]). The user gets no feedback; their sources file is silently ignored. Contrast with readSwampSources(), which correctly re-throws non-NotFound errors. The outer try/catch defeats that.

    Breaking scenario: User adds only: [model] (missing the s) in .swamp-sources.yaml. The Zod validation throws a UserError, but the catch swallows it. Sources silently don't load, and the user has no idea why.

    Suggested fix: Catch Deno.errors.NotFound specifically, or at least log the error:

    } catch (err) {
      if (!(err instanceof Deno.errors.NotFound)) {
        logger.warn`Failed to load extension sources: ${err}`;
      }
    }
  2. src/cli/repo_context.ts:82-91 (getSourceWorkflowDirs) re-reads and re-resolves .swamp-sources.yaml on every call, including glob expansion and filesystem stat calls.
    This function is called from requireInitializedRepoReadOnly, requireInitializedRepo, and requireInitializedRepoUnlocked. Meanwhile, runCli already resolves sources once at lines 600-604. For commands that go through runCli AND one of these repo context functions, the sources file is read, globs expanded, and directories stat'd twice. Same silent-catch issue as item 1.

    Impact: Performance waste on repos with many glob sources (the PR description mentions 251+ AWS extensions). Not a correctness bug, but worth noting since it was explicitly designed to be "resolved once and shared across all loaders" per the comment at line 596.

  3. src/infrastructure/persistence/swamp_sources_repository.ts:141-144 — Unsafe as cast of YAML content to RepoMarkerData.
    parseYaml(content) as RepoMarkerData trusts arbitrary YAML from the source's .swamp.yaml without validation. If the source repo has a malformed marker (e.g., modelsDir: 42 or modelsDir: ["an", "array"]), this becomes a non-string value that flows into resolve() at line 167. The resolve() call would likely coerce to string, but the behavior is undefined.

    Suggested fix: Validate the marker through the same schema the main repo marker uses, or at least guard resolveKindDir to return the default when the field isn't a string.
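
The lighter of the two suggested fixes, guarding the lookup so non-string values fall back to the default, might look like the sketch below. `resolveDirOrDefault` and the fallback directory used in the example are hypothetical; only the `modelsDir` field name appears in this review.

```typescript
// Returns the configured directory only when it is a non-empty string;
// numbers, arrays, missing fields, and non-object markers yield the fallback.
function resolveDirOrDefault(
  marker: unknown,
  field: string,
  fallback: string,
): string {
  if (typeof marker !== "object" || marker === null) return fallback;
  const value = (marker as Record<string, unknown>)[field];
  return typeof value === "string" && value.length > 0 ? value : fallback;
}

// A malformed marker (modelsDir: 42) falls back instead of reaching resolve().
const dir = resolveDirOrDefault({ modelsDir: 42 }, "modelsDir", "extensions/models");
```

Validating the whole marker against a schema would give better error messages, but this guard alone is enough to keep non-string values out of path resolution.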

Low

  1. src/domain/models/user_model_loader.ts:551 (migrateOldFlatBundles) uses String.includes("pulled-extensions") to find the pulled dir, which is fragile. If someone names their source directory something containing "pulled-extensions", old bundles could migrate to the wrong namespace. Using the known constant SWAMP_SUBDIRS.pulledModels path suffix would be more precise.

  2. src/infrastructure/persistence/paths.ts:146-154 — FNV-1a hash collision risk. 32-bit hash has a ~50% collision probability at ~77k inputs (birthday paradox). With only 8 hex chars, two different source directories could theoretically hash to the same namespace. In practice, a repo with hundreds of source dirs is unlikely to hit this, but worth acknowledging.

  3. src/domain/models/user_model_loader.ts:1009-1029 (findNearestDenoConfig) uses sync I/O (Deno.statSync) in an otherwise async code path. Not a bug, but sync filesystem calls block the event loop. With many source extensions, this could add up.
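
The birthday-paradox figure in item 2 can be checked with the standard approximation p ≈ 1 − exp(−n(n−1) / 2N), where N is the size of the hash space. The function below is a generic sketch of that formula, not code from this PR.

```typescript
// Approximate probability that at least two of `n` uniformly random
// `bits`-bit hashes collide (birthday approximation).
function collisionProbability(n: number, bits: number): number {
  const space = Math.pow(2, bits);
  return 1 - Math.exp(-(n * (n - 1)) / (2 * space));
}

const atSeventySevenK = collisionProbability(77163, 32); // close to 0.5
const atThreeHundred = collisionProbability(300, 32); // roughly 1 in 100,000
```

At a few hundred source directories the risk is on the order of one in a hundred thousand, which supports the reviewer's assessment that this is a theoretical concern.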

Verdict

PASS — The code is well-structured, follows existing patterns consistently, has good test coverage (24+ new tests), and the bundle namespace migration is carefully handled. The medium findings are quality-of-life issues (silent error swallowing, duplicate work) rather than correctness bugs. The core design — namespace by hash, migrate on version change, source > pulled priority — is sound.


@github-actions github-actions Bot left a comment

Code Review

Well-structured PR that adds .swamp-sources.yaml for loading extensions from external filesystem paths and fixes bundle cache collisions (#1065) via namespace hashing. The architecture follows established patterns cleanly.

Blocking Issues

None.

Suggestions

  1. Redundant source resolution in repo_context.ts: getSourceWorkflowDirs() is called independently in three places (requireInitializedRepoReadOnly, requireInitializedRepo, requireInitializedRepoUnlocked), each re-reading and re-parsing .swamp-sources.yaml. Meanwhile, mod.ts already resolves sources once for all other extension kinds. Consider passing the already-resolved workflow dirs through or caching them to avoid redundant I/O on every command invocation.

  2. migrateOldFlatBundles uses sync I/O: The method uses Deno.readDirSync, Deno.mkdirSync, and Deno.renameSync on the hot path. This is fine for migration (runs once), but worth noting it blocks the event loop during migration of repos with many flat bundles.

  3. DDD observation: The resolveKindDir, setKindDir, and getKindDir functions in swamp_sources_repository.ts use switch statements over ExtensionKind. If new kinds are added, all three switches need updating. A map-based approach would be more maintainable, but the current approach matches existing patterns in the codebase.
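
For suggestion 3, a map-based alternative to the three switch statements could be sketched as follows. The `ExtensionKind` members and every field name other than `modelsDir` are assumptions made for illustration, not the actual RepoMarkerData definition.

```typescript
// Assumed kinds and marker fields; only modelsDir is named in this review.
type ExtensionKind = "models" | "vaults" | "drivers" | "datastores" | "reports";

interface RepoMarkerData {
  modelsDir?: string;
  vaultsDir?: string;
  driversDir?: string;
  datastoresDir?: string;
  reportsDir?: string;
}

// Single source of truth: Record<ExtensionKind, ...> makes the compiler
// reject this map if a new kind is added without a matching entry.
const KIND_FIELD: Record<ExtensionKind, keyof RepoMarkerData> = {
  models: "modelsDir",
  vaults: "vaultsDir",
  drivers: "driversDir",
  datastores: "datastoresDir",
  reports: "reportsDir",
};

function getKindDir(
  marker: RepoMarkerData,
  kind: ExtensionKind,
): string | undefined {
  return marker[KIND_FIELD[kind]];
}

// Returns a copy with the directory for `kind` replaced; the input is not mutated.
function setKindDir(
  marker: RepoMarkerData,
  kind: ExtensionKind,
  dir: string,
): RepoMarkerData {
  const next: RepoMarkerData = { ...marker };
  next[KIND_FIELD[kind]] = dir;
  return next;
}
```

The trade-off is the one the reviewer notes: a map centralizes the kind-to-field mapping, while the switch-based version matches existing patterns in the codebase.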

What looks good

  • Import boundaries respected: CLI commands and presentation renderers import from libswamp/mod.ts; libswamp imports from infrastructure directly — both consistent with established patterns.
  • Clean DDD layering: Domain types (swamp_sources.ts) contain only Zod schemas and types; infrastructure (swamp_sources_repository.ts) handles file I/O and glob expansion; application services (add.ts, remove.ts, list.ts) orchestrate operations; CLI commands are thin wiring.
  • Dependency injection: SourceAddDeps, SourceRemoveDeps, SourceListDeps enable clean testing without filesystem access.
  • Both output modes: All new commands support log and json output modes as required.
  • Comprehensive test coverage: 24+ new tests covering domain parsing, infrastructure CRUD, libswamp operations, catalog store versioning, bundle namespace hashing, and integration.
  • Safe migration: Old flat bundles are moved (not deleted) to preserve pre-built bundles from pulled extensions. Layout version check forces rescan only when needed.
  • Bundle namespace design: Using relative(repoDir, baseDir) as hash input avoids macOS /var vs /private/var symlink issues. FNV-1a hash is appropriate for this use case.
  • License headers: All new files include the AGPLv3 copyright header.
  • No fire-and-forget promises: All async operations are properly awaited.

@stack72 stack72 merged commit 7348ba7 into main Apr 3, 2026
10 checks passed
@stack72 stack72 deleted the feat/extension-sources branch April 3, 2026 16:59

Development

Successfully merging this pull request may close these issues.

Pulled extension bundles poison mtime-based bundle cache, local source silently ignored

1 participant