Skip to content

fix: skip extension loading for help/version commands, parallelize loaders (#738)#741

Merged
stack72 merged 2 commits intomainfrom
fix/slow-help-startup
Mar 17, 2026
Merged

fix: skip extension loading for help/version commands, parallelize loaders (#738)#741
stack72 merged 2 commits intomainfrom
fix/slow-help-startup

Conversation

@stack72
Copy link
Copy Markdown
Contributor

@stack72 stack72 commented Mar 17, 2026

Problem

Every CLI invocation — including swamp --help — was unconditionally loading all user extensions before running any command. This caused a ~42 second startup time, making even basic usage feel broken.

The root cause was four sequential loader calls, each independently:

  • Creating a RepoMarkerRepository and reading the marker file from disk
  • Creating an EmbeddedDenoRuntime and calling ensureDeno()
  • Spawning deno bundle subprocesses to compile extension files

With 4 sequential loaders and potentially many extension files, startup time was dominated by this overhead even when no extensions were needed.

Changes

1. Skip extension loading for commands that don't need it

Added commandNeedsExtensions() which checks the pre-parsed command against a set of commands that never use user extensions:

"", "help", "version", "completions", "init", "update", "auth", "telemetry", "issue"

swamp --help, swamp version, swamp completions bash, etc. are now instant.

2. Read marker and runtime once, share across all loaders

Previously each of the 4 loadUser* functions independently read the repo marker and created an EmbeddedDenoRuntime. Now the marker is read once (reusing the value already needed for resolveLogLevel) and a single EmbeddedDenoRuntime instance is shared — eliminating 3 redundant disk reads and 3 redundant runtime instantiations.

3. Run all 4 loaders in parallel

Replaced sequential await calls with Promise.all():

await Promise.all([
  loadUserModels(repoDir, marker, denoRuntime),
  loadUserVaults(repoDir, marker, denoRuntime),
  loadUserDrivers(repoDir, marker, denoRuntime),
  loadUserDatastores(repoDir, marker, denoRuntime),
]);

This is safe because each loader writes to its own registry, reads from a separate source directory, and writes to a separate bundle cache directory. ensureDeno() is idempotent (checks a version marker file before extracting). JavaScript's single-threaded event loop means Map.set() calls within registries are atomic within a tick.

User Impact

Command Before After
swamp --help ~42 seconds < 1 second
swamp version ~42 seconds < 1 second
swamp completions bash ~42 seconds < 1 second
swamp model type search aws ~42 seconds significantly faster (parallel loading)

Commands that need extensions (e.g. model, workflow, data, vault) still load all extensions, but now in parallel rather than sequentially.

Files Changed

  • src/cli/mod.ts — all changes in this single file

Test Plan

  • deno check — type checking passes
  • deno lint — no lint errors
  • deno fmt --check — formatting correct
  • deno run test — 3156 tests passed, 0 failed
  • deno run compile — binary compiled successfully

Fixes #738

…aders

Every CLI invocation was unconditionally loading all user extensions
(models, vaults, drivers, datastores) — each spawning deno bundle
subprocesses and creating its own EmbeddedDenoRuntime. With 4
sequential loaders this caused ~42 second startup times, even for
swamp --help.

- Add commandNeedsExtensions() to skip loading for commands that never
  need user extensions: help, version, completions, init, update,
  auth, telemetry, issue
- Read the repo marker once and share a single EmbeddedDenoRuntime
  across all loaders, eliminating 3 redundant marker reads and 3
  redundant runtime instantiations
- Run all 4 loaders in parallel with Promise.all() instead of
  sequential awaits

Fixes #738

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
github-actions[bot]

This comment was marked as outdated.

github-actions[bot]

This comment was marked as outdated.

Address review feedback on PR #741:

- Add 14 unit tests for commandNeedsExtensions() covering skip commands
  (help, version, completions, init, update, auth, telemetry, issue,
  bare invocation) and commands that need extensions (model, workflow,
  data), plus global flag variants

- Fix race condition in EmbeddedDenoRuntime.ensureDeno(): concurrent
  callers (e.g. the 4 parallel extension loaders) all saw cachedPath=null
  on the same tick and each spawned their own extraction subprocess.
  Add extractionPromise field so all concurrent callers share a single
  extraction and await the same promise.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This is a well-implemented performance optimization that fixes the ~42 second startup time for swamp --help and other commands that don't need user extensions. The changes are clean, safe, and well-tested.

No Blocking Issues Found


Code Quality ✓

  • TypeScript strict mode: No any types introduced, proper typing throughout
  • Named exports: commandNeedsExtensions correctly exported for testing
  • License headers: All files have AGPLv3 headers

DDD Compliance ✓

  • EmbeddedDenoRuntime is appropriately in the infrastructure layer
  • Loader functions remain application-level orchestration
  • No domain logic violations - purely CLI/infrastructure optimization

Test Coverage ✓

  • Comprehensive tests added for commandNeedsExtensions():
    • All 9 skip commands tested
    • Commands that need extensions tested (model, workflow, data)
    • Global flags before commands tested
  • Existing EmbeddedDenoRuntime tests verify caching behavior

Correctness ✓

  1. SKIP_EXTENSION_COMMANDS: Conservative list of commands that definitely don't need extensions
  2. Parallel Promise.all(): Safe because each loader writes to its own registry and has internal try-catch (never throws)
  3. Shared EmbeddedDenoRuntime: Safe with new extractionPromise caching to prevent concurrent extractions
  4. Shared marker: Good optimization - reads once instead of 4 times

Security ✓

No vulnerabilities introduced - purely a performance optimization with no new input handling.


Suggestions (Non-blocking)

  1. Minor: Consider adding a test case for commandNeedsExtensions(["--help"]) to explicitly verify this edge case (though it's implicitly covered since extractCommandInfo(["--help"]) returns {command: ""} which matches the empty string in SKIP_EXTENSION_COMMANDS).

CI passing (lint, test, format, dependency audit). Good to merge!

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

I systematically attempted to break this code across logic, error handling, concurrency, and data integrity dimensions.

Critical / High

None found.

Medium

1. Two separate EmbeddedDenoRuntime instances are created (src/cli/mod.ts:510 and src/cli/mod.ts:538)

When both commandNeedsExtensions(args) is true AND trustedCollectives.length > 0 && marker is true (the common case), two separate EmbeddedDenoRuntime instances are created:

// Line 510 - for extension loading
if (commandNeedsExtensions(args)) {
  const denoRuntime = new EmbeddedDenoRuntime();  // instance 1
  await Promise.all([...]);
}

// Line 538 - for auto-resolver
if (trustedCollectives.length > 0 && marker) {
  const denoRuntime = new EmbeddedDenoRuntime();  // instance 2
  ...
}

While functionally correct (the file-level version marker prevents double extraction), this means:

  • Redundant object instantiation
  • Each instance maintains its own cachedPath and extractionPromise state
  • The second instance will re-check the disk even though extraction is already complete

Suggested fix: Move EmbeddedDenoRuntime creation before the if (commandNeedsExtensions(args)) block and share the single instance across both usages:

const denoRuntime = new EmbeddedDenoRuntime();

if (commandNeedsExtensions(args)) {
  await Promise.all([
    loadUserModels(repoDir, marker, denoRuntime),
    ...
  ]);
}
// ... later ...
if (trustedCollectives.length > 0 && marker) {
  setAutoResolver(new ExtensionAutoResolver({
    extensionInstaller: createAutoResolveInstallerAdapter({
      denoRuntime,  // reuse same instance
      ...
    }),
  }));
}

Low

1. Test coverage gap for --option=value syntax (src/cli/mod_test.ts)

The tests verify --json (a boolean flag in GLOBAL_OPTIONS) works correctly, but don't test value-taking options with = syntax like --log-level=debug. The underlying extractCommandInfo() function has a pre-existing parsing quirk where --option=value style args incorrectly skip the next positional argument.

Example: ["--log-level=debug", "model"] would parse command as "" because model is skipped as a presumed option value.

This doesn't affect extension loading correctness (empty command still skips extensions, which is correct behavior for that edge case), but a test documenting this behavior would be valuable.

2. TOCTOU in file-level extraction check (src/infrastructure/runtime/embedded_deno_runtime.ts:74-85)

The version marker check followed by extraction has a theoretical race:

const existingVersion = (await Deno.readTextFile(versionMarker)).trim();
if (existingVersion === embeddedVersion.value) {
  await Deno.stat(targetBinary);
  return targetBinary;  // early return
}
// ... proceeds to extract

If two separate EmbeddedDenoRuntime instances (from the Medium issue above) both check simultaneously before either completes extraction, both could proceed to extract. This is harmless (identical content written twice) but wasteful. The promise-sharing fix within a single instance is correct; the issue only manifests with multiple instances.

Verdict

PASS — This is a well-implemented performance optimization. The parallel loading via Promise.all() is safe (separate directories and registries), the extraction promise sharing correctly prevents redundant subprocess spawns within an instance, and error handling is robust (all loaders catch errors internally). The Medium issue is a minor efficiency concern, not a correctness bug.

@stack72 stack72 merged commit ede9f2a into main Mar 17, 2026
6 checks passed
@stack72 stack72 deleted the fix/slow-help-startup branch March 17, 2026 21:20
stack72 added a commit that referenced this pull request Mar 19, 2026
…ecution (#743)

## Problem

With 104+ user extension models, `swamp model type search` (and any
other command that loads extensions) took **~46 seconds** to start.
After the fix in #741 that parallelised the 4 loader types and skipped
loading for help/version commands, the remaining bottleneck was in
`UserModelLoader` itself.

At startup, `loadModels()` eagerly built a **self-contained bundle** for
every model file — a separate `deno bundle` subprocess per model that
inlines all dependencies (including zod) so the bundle can run inside
Docker containers without network access. With 104 models, this was 104
sequential subprocesses on every single CLI invocation, regardless of
whether Docker execution would ever be used.

## Architecture Decision & Tradeoffs

### What changed

`bundleSource?: string` on `ModelDefinition` (a pre-built JS string
stored at load time) has been replaced with `bundleSourceFactory?: () =>
Promise<string>` — a memoizing closure that defers the expensive work to
the point of actual need.

```typescript
// Before: runs at startup, for every model, every invocation
modelDef.bundleSource = await bundleExtension(absolutePath, denoPath, { selfContained: true });

// After: closure set at load time, executed only on first Docker execution
let cachedBundle: string | undefined;
modelDef.bundleSourceFactory = async () => {
  if (!cachedBundle) {
    cachedBundle = await bundleExtension(absolutePath, denoPath, { selfContained: true });
  }
  return cachedBundle;
};
```

### Tradeoff: startup cost vs. first-execution cost

| | Before | After |
|---|---|---|
| Every CLI invocation | Pays bundling cost for all N models | Pays
nothing |
| First Docker execution of model A | Already paid at startup | Pays
bundling cost for model A only |
| Second Docker execution of model A | Pre-built | Memoized in-process —
instant |

The first out-of-process execution of a given model will be slightly
slower than before — it now bundles on demand rather than having it
pre-built. This is the right tradeoff because:

1. **Docker execution is rare** relative to everyday CLI usage (`type
search`, `model get`, `data list`, etc.)
2. **Cost is proportional to actual need** — only the models you
actually run out-of-process are ever bundled
3. **Memoization** ensures the cost is paid at most once per model per
process invocation
4. **The startup tax was paid unconditionally** regardless of what
command you ran — even read-only commands that never touch Docker

### Why only models?

Vaults, drivers, and datastores do not create self-contained bundles at
all — they only use the externalized (cached) bundle for in-process
execution. This change is model-specific because only models support
out-of-process/Docker execution via `bundleSource`.

## User Impact

**Measured on a real repo with 104 extension models:**

| Command | Before | After |
|---|---|---|
| `swamp model type search aws` | ~46 seconds | **~2.8 seconds** |
| `swamp model get my-model` | ~46 seconds | **~2.8 seconds** |
| `swamp data list` | ~46 seconds | **~2.8 seconds** |
| First Docker `model run` | Instant (pre-built) | ~same as before
(built on demand) |

**16x speedup** for the everyday command path. The remaining ~2.8s is
the warm-cache cost of the externalized bundle loading (disk reads +
dynamic imports for 104 files), which is a separate optimization
opportunity.

## Files Changed

| File | Change |
|---|---|
| `src/domain/models/model.ts` | `bundleSource?: string` →
`bundleSourceFactory?: () => Promise<string>` |
| `src/domain/models/user_model_loader.ts` | Remove eager bundling, set
memoizing factory closure |
| `src/domain/models/method_execution_service.ts` | Await
`bundleSourceFactory?.()` at execution time |

## Test Plan

- [x] `deno check` — type checking passes
- [x] `deno lint` — no lint errors
- [x] `deno fmt --check` — formatting correct
- [x] `deno run test` — 3175 tests passed, 0 failed
- [x] `deno run compile` — binary compiled successfully
- [x] Manual: `swamp model type search` in 104-model repo — 46s → 2.8s

---------

Co-authored-by: Claude Sonnet 4.6 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

It takes 42 seconds to run swamp --help

1 participant