Skip to content

feat: remove built-in S3 datastore in favor of @swamp/s3-datastore extension#930

Merged
stack72 merged 1 commit intomainfrom
remove-builtin-s3-datastore
Mar 30, 2026
Merged

feat: remove built-in S3 datastore in favor of @swamp/s3-datastore extension#930
stack72 merged 1 commit intomainfrom
remove-builtin-s3-datastore

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 30, 2026

Summary

  • Remove all built-in S3 datastore code from the core binary
  • Delegate S3 storage entirely to the @swamp/s3-datastore extension (published as v2026.03.28.1)
  • Add generic swamp datastore setup extension <type> --config '{...}' command
  • Remove @aws-sdk/client-s3 from the binary (smaller binary, fewer transitive deps)

Follows the same pattern as #736 (vault providers → extensions). Builds on #910 which added auto-resolution and name remapping.

What changed

Deleted (6 infrastructure files + tests)

  • s3_client.ts, s3_client_test.ts, s3_lock.ts, s3_lock_test.ts, s3_cache_sync.ts, s3_datastore_verifier.ts

Type system

  • Removed S3DatastoreConfig, S3ConnectionConfig from datastore_config.ts
  • DatastoreConfig union is now FilesystemDatastoreConfig | CustomDatastoreConfig
  • isCustomDatastoreConfig checks config.type !== "filesystem" (S3 is now custom)
  • Removed S3 from BUILT_IN_DATASTORE_TYPES (only filesystem remains)

CLI

  • Removed swamp datastore setup s3 command
  • Added swamp datastore setup extension <type> --config '{...}' — works with any extension datastore
  • Legacy type names auto-remap: swamp datastore setup extension s3 --config '...'@swamp/s3-datastore

Runtime

  • Removed all config.type === "s3" branches from repo_context.ts, sync.ts, status.ts, setup.ts
  • S3 configs now flow through isCustomDatastoreConfig → extension provider handles lock, sync, health

Dependencies

  • Removed @aws-sdk/client-s3 from deno.json

Design choices

Why the provider pattern is structurally better

The old code created S3Client at 6+ call sites, manually destructuring config fields. #917 found endpoint and forcePathStyle were silently dropped at several sites, breaking S3-compatible stores. The extension stores config once at construction — this class of bug can't recur.

Sync behavioral differences (intentional)

Built-in used pullAll()/pushAll(). Extension uses pullChanged()/pushChanged() via the DatastoreSyncService interface. The "changed" variants are better for normal use (only transfer what's needed) and equivalent for empty caches. The built-in also had pullChangedForModel() for per-model scoped pulls — an optimization not available through the generic interface. Acceptable trade-off; can be added later if large datastores need it.

datastore status no longer shows S3-specific fields

Bucket/prefix/region/endpoint no longer displayed. The generic interface doesn't expose backend-specific details. Health checks still work via createVerifier().

Known limitation: offline + legacy config

If type: s3 in .swamp.yaml, user is offline, and extension isn't cached → resolveDatastoreConfig throws. Error message is clear: "Install it with: swamp extension pull @swamp/s3-datastore". Edge case — extension is cached after first use.

User impact

  • Existing type: s3 configs keep working — auto-remapped with deprecation warning
  • First use on new machine auto-installs extension (~5s one-time)
  • swamp datastore setup s3 replaced by swamp datastore setup extension @swamp/s3-datastore --config '{...}'
  • Binary is smaller (no AWS SDK bundled)

Test plan

  • deno check — clean
  • deno lint — 910 files, clean
  • deno fmt — 998 files, clean
  • deno run test — 3672 passed, 0 failed
  • deno run compile — binary compiles without @aws-sdk/client-s3
  • Manual test: compiled binary against real S3-backed repo works
  • 6 new tests for datastoreSetupExtension generator
  • CI passes

🤖 Generated with Claude Code

…tension

Remove all built-in S3 datastore code from the core binary and delegate
S3 storage entirely to the `@swamp/s3-datastore` extension. This
follows the same pattern as PR #736 which moved vault providers to
extensions.

## What changed

### Built-in S3 code removed
- Deleted 6 infrastructure files: `s3_client.ts`, `s3_lock.ts`,
  `s3_cache_sync.ts`, `s3_datastore_verifier.ts`, and their tests
- Removed `S3DatastoreConfig`, `S3ConnectionConfig` from the type system
- Removed `@aws-sdk/client-s3` from `deno.json` (no longer bundled)
- Removed all `config.type === "s3"` branches from `repo_context.ts`,
  `resolve_datastore.ts`, `setup.ts`, `sync.ts`, `status.ts`
- Removed `swamp datastore setup s3` CLI command

### Generic extension setup command added
- New `swamp datastore setup extension <type> --config '{...}'` command
  works with any extension datastore, not just S3
- Validates config against extension schema, verifies health via
  provider, migrates data via sync service, updates `.swamp.yaml`
- Supports legacy type names: `swamp datastore setup extension s3 ...`
  auto-remaps to `@swamp/s3-datastore`

### Backward compatibility preserved
- `type: s3` in `.swamp.yaml` auto-remaps to `@swamp/s3-datastore`
  with deprecation warning (from Phase 1+2 in PR #910)
- `SWAMP_DATASTORE=s3:bucket/prefix` env var format still works
- Extension auto-resolves on first use for logged-in users

### Docs and skills updated
- `design/datastores.md`: S3 now documented as extension, not built-in
- `README.md`: Updated setup examples to use extension command
- `.claude/skills/swamp-repo/SKILL.md`: Updated command reference

## Design choices

### Why the provider pattern is better than built-in
The old code created `S3Client` at 6+ call sites in `repo_context.ts`,
manually destructuring config fields each time. PR #917 found that
`endpoint` and `forcePathStyle` were silently dropped at several sites,
breaking S3-compatible stores (DigitalOcean Spaces, MinIO). The
extension's provider pattern stores config once at construction and
passes it through everywhere — structurally immune to this class of bug.

### Sync behavioral differences
The built-in S3 path used `pullAll()`/`pushAll()` for sync operations.
The extension path uses `pullChanged()`/`pushChanged()` via the
`DatastoreSyncService` interface. The "changed" variants are better for
normal use (only transfer what's needed). For empty caches, they're
equivalent since everything is "changed". The built-in also had
`pullChangedForModel()` for per-model scoped pulls — an optimization
not available through the generic interface, acceptable trade-off for
now.

### `datastore status` no longer shows S3-specific fields
Bucket, prefix, region, endpoint are no longer displayed in
`swamp datastore status`. The generic extension interface doesn't
expose backend-specific details. The health check still works via the
provider's `createVerifier()`.

### Known limitation: offline + legacy S3 config
If a user has `type: s3` in `.swamp.yaml`, is offline, and the extension
isn't cached, `resolveDatastoreConfig` throws before any command runs.
The error message is clear ("Install it with: swamp extension pull
@swamp/s3-datastore"). This is an edge case — the extension is cached
after first use.

## Test plan

- [x] `deno check` — clean
- [x] `deno lint` — 910 files, clean
- [x] `deno fmt` — 998 files, clean
- [x] `deno run test` — 3672 passed, 0 failed
- [x] `deno run compile` — binary compiles without `@aws-sdk/client-s3`
- [x] Manual test: compiled binary against real S3-backed repo, extension
      auto-resolved and command completed cleanly
- [ ] CI passes

Co-authored-by: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLI UX Review

Blocking

None.

Suggestions

  1. Removed setup s3 subcommand leaves no breadcrumb — Users with swamp datastore setup s3 muscle memory will get a generic Cliffy "Unknown subcommand" error with no guidance toward the new command. A suggestion in the error or in the datastore setup --help epilog pointing to setup extension @swamp/s3-datastore would ease the migration. (Minor — existing .swamp.yaml configs still work transparently.)

  2. --config flag description doesn't hint at structure"JSON configuration for the datastore extension"\ is correct but gives no guidance. Something like "JSON config object for the extension (e.g., '{"bucket":"name","region":"us-east-1"}')"would make tab-complete /--help` output more self-explanatory for first-time users.

  3. datastore status no longer surfaces backend details — After setup, users running swamp datastore status see Type: @swamp/s3-datastore and a health check but no bucket/prefix/region. This regression is noted in the PR as intentional. It's fine for now, but worth tracking — operators debugging a misconfigured remote often want to confirm which bucket they're actually hitting.

Verdict

PASS — No blocking UX issues. Command, error messages, and log/JSON output modes are all consistent. The s3→extension migration is backward-compatible for existing configs, and error messages for missing extensions and bad JSON are clear and actionable.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. Comment references to S3CacheSyncService — Two files still reference S3CacheSyncService in comments (src/domain/datastore/datastore_sync_service.ts:23 and src/infrastructure/persistence/datastore_sync_coordinator.ts:41). These are minor but could be updated to reference the generic DatastoreSyncService interface instead, since the concrete class no longer exists.

Notes

  • Import boundary: CLI commands correctly import libswamp types from src/libswamp/mod.ts. Direct domain imports (datastoreTypeRegistry, resolveDatastoreType, etc.) from CLI code follow the pre-existing pattern used in other CLI files.
  • Type system: The DatastoreConfig union simplification (FilesystemDatastoreConfig | CustomDatastoreConfig) is clean. isCustomDatastoreConfig correctly checks only config.type !== "filesystem" now.
  • No dangling imports: Verified no remaining imports of deleted files (s3_client.ts, s3_lock.ts, s3_cache_sync.ts, s3_datastore_verifier.ts).
  • Test coverage: 6 new tests for datastoreSetupExtension cover valid config, skip migration, unregistered type, invalid config schema, unhealthy backend, and non-upgraded repo. Existing tests updated appropriately (S3 tests now expect UserError when extension is missing).
  • DDD alignment: The extension provider pattern (DatastoreProvider.createLock(), createVerifier(), createSyncService()) properly delegates infrastructure concerns to the extension, keeping the domain clean. The datastoreSetupExtension generator acts as an application service orchestrating the setup workflow.
  • Security: JSON config parsing uses JSON.parse in a try/catch with a UserError — no injection risk. Path traversal protection was in the deleted S3CacheSyncService and is now the extension's responsibility (appropriate boundary shift).
  • Backward compatibility: Legacy type: s3 configs auto-remap with clear error messages when the extension is missing. The RENAMED_DATASTORE_TYPES map handles the transition.

Clean removal of built-in S3 code in favor of the extension pattern. The changes are consistent and well-tested.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

No critical or high severity issues found.

Medium

  1. src/cli/commands/datastore_setup.ts:113-117 — Extension auto-resolution can fail silently before registry check

    The resolveDatastoreType call at line 116 is awaited but if it throws (e.g., network error during auto-resolution), the error propagates as an unhandled exception with a raw error message rather than a user-friendly UserError. The subsequent datastoreTypeRegistry.has() check at line 119 would handle the "extension not found" case gracefully, but a transient network failure during resolution would surface as an opaque stack trace.

    Example: User runs swamp datastore setup extension @swamp/s3-datastore --config '{}' while offline for the first time (extension not cached). resolveDatastoreType throws a network error, user sees a raw stack trace instead of the friendly "Install it with: swamp extension pull ..." message.

    Suggested fix: Wrap the resolveDatastoreType call in try/catch, letting it fall through to the !datastoreTypeRegistry.has() check which already has a clean error message.

  2. src/libswamp/datastores/setup.ts:302-306resolveCachePath fallback uses "unknown" when repoId is absent

    When input.repoId is undefined (which can happen — see the ?? fallback), the cache path becomes .swamp/repos/unknown. If two repos both lack IDs and set up extension datastores, they'll share the same cache directory, causing data cross-contamination during migration.

    Example: Two repos without repoId (e.g., during init race condition or corrupted marker) → both get cache at ~/.swamp/repos/unknown/ → migration from repo A overwrites repo B's cache.

    Suggested fix: This is a pre-existing pattern (the old S3 code had the same fallback), so it's not a regression. But since you're rewriting the setup flow, consider generating a random fallback ID or erroring when repoId is missing.

  3. src/libswamp/datastores/setup.ts:292-296updateRepoConfig is called before migration, but migration errors don't roll back the config

    If updateRepoConfig succeeds (writes extension type to .swamp.yaml) but the subsequent migration or push fails, the repo is left with a .swamp.yaml pointing to the extension datastore but with data still only in the old location. The errors array captures push failures but the "completed" event is still emitted.

    Example: pushChanged() throws. User sees a "completed" event with errors. .swamp.yaml now says type: @swamp/s3-datastore but data never made it to S3. Next command tries to pull from an empty remote store.

    Mitigation: This was also the behavior in the old S3 setup code (config was written before push), so it's not a regression. The errors array does communicate the failure. But worth noting the user could be confused by "completed with errors."

Low

  1. src/cli/commands/datastore_setup.ts:128-129JSON.parse with user-supplied --config could throw on non-string values
    The as Record<string, unknown> cast after JSON.parse doesn't validate the parsed value is actually an object. JSON.parse('"hello"') returns a string, JSON.parse('42') returns a number. Both would pass the parse but fail later when the provider tries to use the config as a Record.

    Not a security issue — Cliffy guarantees options.config is a string — but JSON.parse("true") would produce a boolean that silently passes through until the provider blows up with an opaque error.

    Suggested fix: Add typeof config !== "object" || config === null || Array.isArray(config) check after parse.

  2. Removed already_exists check for S3 datastores — The old datastoreSetupS3 had a checkS3DatastoreExists guard that prevented accidentally pointing a new repo at an existing S3 datastore. The new datastoreSetupExtension has no equivalent. If an extension provider doesn't implement its own "already exists" check, users can accidentally clobber remote data during setup migration. This is a behavioral change, not a bug — the extension is now responsible for this check.

Verdict

PASS — Clean removal of built-in S3 code in favor of the extension pattern. No dangling imports, no stale type references, no broken code paths. The isCustomDatastoreConfig type guard correctly updated to treat type !== "filesystem" as custom. The DatastoreConfig union properly simplified. All callers consistently route through extension provider pattern. The medium findings are pre-existing patterns or edge cases, not regressions.

@stack72 stack72 merged commit db8acf7 into main Mar 30, 2026
9 of 10 checks passed
@stack72 stack72 deleted the remove-builtin-s3-datastore branch March 30, 2026 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant