feat: add context.readModelData() for first-class cross-model data access by keeb · Pull Request #919 · systeminit/swamp

keeb · 2026-03-30T03:40:36Z

Summary

Adds context.readModelData(modelName, specName?) to MethodContext so extension models can read another model's data without manual definition resolution and raw repository calls
Introduces DataAccessService domain service that handles model name resolution, data reading, JSON parsing, vault ref resolution, and orphan data recovery
Extracts DataRecord value object to src/domain/data/data_record.ts as a shared type between the expression system and data access layer

Test Plan

10 unit tests for DataAccessService covering: model resolution, spec filtering, empty/missing models, renamed data, vault refs, non-JSON content, orphan recovery
4 integration tests with real file-backed repositories: cross-model read, spec filter, non-existent model, no-data model
All 3699 existing tests pass
deno check, deno lint, deno fmt clean

Closes #914

🤖 Generated with Claude Code

…cess (#914) Extension models that need to read another model's data at runtime can now use `context.readModelData(modelName, specName?)` instead of manually resolving definitions and reading raw repository data. - Extract DataRecord value object to src/domain/data/data_record.ts - Add DataAccessService domain service with orphan data recovery - Add readModelData to MethodContext, wired via RawExecutionDriver - 10 unit tests + 4 integration tests Closes #914 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

keeb · 2026-03-30T03:56:50Z

First consumer report

Used this in the @keeb/mms-dedup extension model tonight. The dedup model reads episode data from a source model (by name), checks local directories for existing files, and outputs only new episodes.

Before (workaround)

~80 lines of filesystem hacking:

resolveModelByName() — scan YAML files in models/ to find type + ID
readLatestData() — walk .swamp/data/{type}/{id}/, read latest pointer files, parse raw JSON
walkYaml() — recursive directory walker

const resolved = await resolveModelByName(context.repoDir, args.sourceModel);
const dataDir = `${context.repoDir}/.swamp/data/${resolved.type}/${resolved.id}`;
const items = await readLatestData(dataDir);

After

const items = await context.readModelData(args.sourceModel, "episode");

One line. Model name resolution, data reading, JSON parsing — all handled.

On coupling concerns

The API is read-only and name-based, which keeps it loose. The dedup model doesn't know or care about the source model's type, schema, or internals — it just reads data by name and spec. The coupling is the same as what CEL expressions already do in workflows (data.findBySpec("anime-source", "episode")), just available inside execute functions.

The alternative was each model being an island at runtime, which pushed all composition into workflow YAML. That works for simple chains but falls apart when a model needs to iterate over another model's data with logic (LLM-parse if raw, skip if pre-parsed, deduplicate within results). Workflows can't express that — it requires code.

The key property: models can read each other's data but can't write to each other's namespace. That's the right boundary.

github-actions

Adversarial Review

Critical / High

HIGH — Orphan data content lookup uses wrong modelId (src/domain/data/data_access_service.ts:126-131)

readModelData calls dataToRecord(data, resolved.modelType, resolved.modelId) for all data items, including orphans. dataToRecord then calls this.dataRepo.getContent(modelType, modelId, data.name) — but for orphan data, the content is stored on disk under the old model UUID, not resolved.modelId.

FileSystemUnifiedDataRepository.getContent resolves the content path using modelId (via getContentPath → getDataNameDir), so it will look in the wrong directory and return null. The result: orphan-recovered data silently has attributes: {} instead of its actual content.

Breaking example: Model "foo" is deleted and recreated (new UUID). Old data exists under old UUID. findOrphanData correctly finds the orphan Data metadata. But dataToRecord tries to read content at .swamp/data/<type>/<new-uuid>/<dataName>/... instead of .swamp/data/<type>/<old-uuid>/<dataName>/.... Content is null, JSON parse is skipped, attributes is {}.

Why the unit test doesn't catch this: The mock getContent in data_access_service_test.ts:596-601 looks up by dataName only, ignoring _type and _modelId parameters entirely. The mock masks the real behavior where modelId is part of the filesystem path.

Suggested fix: Track the original modelId for orphan data items (e.g., via the _orphanModelId field that's already in the return type signature but never actually set). Pass the correct modelId to dataToRecord for each item. Also fix the mock to key on (modelId, dataName) pairs so the test would fail without the fix.
HIGH — findOrphanData calls findAll inside a loop, causing N+1 queries (src/domain/data/data_access_service.ts:198)

The Tier 2 heuristic calls this.definitionRepo.findAll(modelType) inside the for (const [, items] of byModelId) loop. If there are M distinct old model IDs, this makes M identical repository queries. For filesystem-backed repos, that's M directory scans.

More importantly: the result of findAll is the same for every iteration (it depends on modelType, which is constant across the loop). This should be hoisted above the loop and called once.

Suggested fix: Move const allDefsOfType = await this.definitionRepo.findAll(modelType) before the loop.

Medium

MEDIUM — findAllGlobal() is unbounded and loads all data from all models (src/domain/data/data_access_service.ts:159)

Every call to readModelData calls findAllGlobal(), which scans the entire .swamp/data/ directory tree across all model types. For repos with many models and large data sets, this is an expensive operation performed on every cross-model read. There is no caching, filtering, or short-circuit.

This may be acceptable for now but becomes a performance problem at scale.
MEDIUM — readModelData is optional on MethodContext (src/domain/models/model.ts:241)

The property is declared as readModelData?: .... Extension model authors calling context.readModelData(...) will get a runtime error if the context wasn't created by RawExecutionDriver (or any other driver that doesn't populate it). Since this is a new API, it would be safer to have a clear error message rather than TypeError: context.readModelData is not a function.
MEDIUM — Tier 2 orphan heuristic can misattribute data (src/domain/data/data_access_service.ts:198-200)

If a model type has exactly one definition, the heuristic claims ALL orphan data under that type belongs to that definition. But the orphan data could belong to a different model definition that was deleted and not recreated. The single remaining definition gets data that was never its own.

Low

LOW — _orphanModelId in return type but never set (src/domain/data/data_access_service.ts:155-156)

The return type of findOrphanData declares { _orphanModelId?: string } but the returned Data objects never have this field set. This appears to be a vestige of an incomplete implementation. It's harmless since the field is optional, but it's dead code in the type signature.

Verdict

FAIL — The orphan data content lookup bug (finding #1) means orphan recovery silently returns empty data, which is a data correctness issue in a primary feature of this PR. The test mocks mask the real behavior.

github-actions

Code Review

Blocking Issues

Bug: Orphan data content will never be found on disk — In readModelData() (line 126-131 of data_access_service.ts), orphan data recovered by findOrphanData is passed to dataToRecord with resolved.modelId (the current definition UUID). But the content files are stored under the old model UUID on disk. Since FileSystemUnifiedDataRepository.getContent() uses modelId to build the file path, getContent will return null for all orphan data, and the record will have empty attributes: {}.

The unit test passes because the mock getContent ignores _modelId and matches only on dataName. To fix: findOrphanData needs to return the original modelId alongside each Data item, and dataToRecord must use that modelId for orphan items instead of resolved.modelId.
Unused _orphanModelId in return type — findOrphanData (line 154-157) declares Data & { _orphanModelId?: string } in its return type, suggesting this was intended to carry the old model ID — but it's never actually set. This is the missing piece that would fix issue #1. The field needs to be populated and then consumed in readModelData.

Suggestions

Remove unnecessary re-export — model_resolver.ts line 52 re-exports DataRecord "for backwards compatibility", but no file imports DataRecord from model_resolver.ts. The re-export can be removed entirely, and the comment about backwards compatibility deleted.
Inline import("./data.ts").Data type references — findOrphanData and dataToRecord use import("./data.ts").Data inline three times instead of a top-level import. Since Data is already used transitively (and the file imports from ./data_record.ts in the same module), a normal import type { Data } from "./data.ts" at the top would be cleaner.
Missing integration test for orphan recovery — The integration tests cover the happy path well but don't test orphan data recovery with real file-backed repos. Adding one would have caught blocking issue #1.

Address review feedback: - Track original modelId per data item via LocatedData so orphan-recovered data reads content from the old UUID path, not the current definition's - Hoist findAll(modelType) above the loop to avoid N+1 queries - Replace inline import("./data.ts").Data with top-level import - Add integration test for orphan recovery with real file-backed repos - Fix unit test mock to key on (modelId, dataName) pairs Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

github-actions

Code Review

Blocking Issues

None.

Suggestions

findAllGlobal() in orphan recovery (src/domain/data/data_access_service.ts:457): findOrphanData calls this.dataRepo.findAllGlobal() on every readModelData() invocation, scanning all data across all models. For large repos this could become a hot path. Consider caching this within a single readModelData call or adding a more targeted query (e.g., findAllForType(modelType)) in a follow-up if performance becomes a concern.
Domain→infrastructure type import (src/domain/data/data_access_service.ts:22): The UnifiedDataRepository import from infrastructure is acknowledged via the ratchet bump (21→22). If a domain-level port/interface for data repositories is ever introduced, this service would be a good candidate for migration.

Overall: well-structured domain service with clean DDD modeling, thorough test coverage (10 unit + 5 integration), and proper backward-compatible extraction of DataRecord. LGTM.

github-actions

Adversarial Review

Critical / High

None found.

Medium

Tier 2 orphan heuristic can misattribute data from a different deleted model — src/domain/data/data_access_service.ts:173-181

The single-definition heuristic says: if only one definition exists for this model type, claim all orphan data of that type. This is correct for the intended scenario (delete/recreate the same model), but breaks when a different model of the same type was deleted.

Breaking example:
- Model A (test/source, uuid-1) has data {value: "A"}
- Model B (test/source, uuid-2) has data {value: "B"}
- Model B is deleted (definition removed, data stays on disk)
- readModelData("model-a") is called
- findOrphanData finds uuid-2's data. Since allDefsOfType.length === 1 (only Model A remains), the heuristic claims Model B's data as belonging to Model A
- Result: readModelData("model-a") returns {value: "B"} — data that never belonged to Model A
Suggested fix: Restrict tier 2 to only fire when the orphan model ID doesn't match any existing definition (i.e., it's truly orphaned from a deleted definition), AND the type only has one definition. Additionally, consider checking if the definition count for the type has only ever been 1 (e.g., via a count of distinct model IDs in the data), though that may be impractical. At minimum, document this limitation clearly.
findAllGlobal() called unconditionally on every readModelData — src/domain/data/data_access_service.ts:119-123

Orphan recovery runs on every call, even when the current model has data under its current UUID. findAllGlobal() scans all data across all models. For repositories with many models and data items, this adds O(total-data) overhead to every cross-model read.

Suggested improvement: Consider skipping orphan recovery when currentData is non-empty, or making it opt-in via an options parameter, since the common case is that data lives under the correct UUID.

Low

Silent JSON parse failure returns empty attributes — src/domain/data/data_access_service.ts:206-208

If stored JSON is corrupted (truncated write, encoding issue), JSON.parse throws, the catch swallows the error, and a DataRecord with attributes: {} is returned. The caller cannot distinguish "this data has no fields" from "this data is corrupted." This is probably fine for resilience but could make debugging data issues difficult. A debug-level log line in the catch block would help.
readModelData is optional on MethodContext — src/domain/models/model.ts:243

Marked with ?, so extension drivers that don't add it won't break, but extension model authors calling context.readModelData?.(...) get undefined with no error message explaining why. This is a natural consequence of backwards compatibility and not really a bug, just worth noting.

Verdict

PASS — The core logic is correct for the primary use case. The orphan recovery tier-2 heuristic has a data misattribution edge case (medium #1) that should be acknowledged or addressed, but it's in an uncommon path and doesn't affect the happy path of cross-model data reads. The DataRecord extraction, vault ref resolution, spec filtering, and renamed/deleted filtering are all implemented correctly.

keeb mentioned this pull request Mar 30, 2026

context.readModelData(modelName) — first-class cross-model data access in execute functions #914

Closed

Merge branch 'main' into keeb/914-cross-model-data-access

8d26153

keeb marked this pull request as ready for review March 30, 2026 16:29

github-actions Bot requested changes Mar 30, 2026

View reviewed changes

keeb and others added 2 commits March 30, 2026 09:42

fix: revert unrelated skill/CLAUDE.md changes accidentally staged

54e4637

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

github-actions Bot approved these changes Mar 30, 2026

View reviewed changes

github-actions Bot reviewed Mar 30, 2026

View reviewed changes

stack72 merged commit 56427f9 into main Mar 30, 2026
10 checks passed

stack72 deleted the keeb/914-cross-model-data-access branch March 30, 2026 16:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add context.readModelData() for first-class cross-model data access#919

feat: add context.readModelData() for first-class cross-model data access#919
stack72 merged 4 commits intomainfrom
keeb/914-cross-model-data-access

keeb commented Mar 30, 2026

Uh oh!

keeb commented Mar 30, 2026

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

github-actions Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

keeb commented Mar 30, 2026

Summary

Test Plan

Uh oh!

keeb commented Mar 30, 2026

First consumer report

Before (workaround)

After

On coupling concerns

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review

Blocking Issues

Suggestions

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Code Review

Blocking Issues

Suggestions

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Adversarial Review

Critical / High

Medium

Low

Verdict

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants