Skip to content

feat: add context.readModelData() for first-class cross-model data access#919

Merged
stack72 merged 4 commits intomainfrom
keeb/914-cross-model-data-access
Mar 30, 2026
Merged

feat: add context.readModelData() for first-class cross-model data access#919
stack72 merged 4 commits intomainfrom
keeb/914-cross-model-data-access

Conversation

@keeb
Copy link
Copy Markdown
Contributor

@keeb keeb commented Mar 30, 2026

Summary

  • Adds context.readModelData(modelName, specName?) to MethodContext so extension models can read another model's data without manual definition resolution and raw repository calls
  • Introduces DataAccessService domain service that handles model name resolution, data reading, JSON parsing, vault ref resolution, and orphan data recovery
  • Extracts DataRecord value object to src/domain/data/data_record.ts as a shared type between the expression system and data access layer

Test Plan

  • 10 unit tests for DataAccessService covering: model resolution, spec filtering, empty/missing models, renamed data, vault refs, non-JSON content, orphan recovery
  • 4 integration tests with real file-backed repositories: cross-model read, spec filter, non-existent model, no-data model
  • All 3699 existing tests pass
  • deno check, deno lint, deno fmt clean

Closes #914

🤖 Generated with Claude Code

…cess (#914)

Extension models that need to read another model's data at runtime can now
use `context.readModelData(modelName, specName?)` instead of manually
resolving definitions and reading raw repository data.

- Extract DataRecord value object to src/domain/data/data_record.ts
- Add DataAccessService domain service with orphan data recovery
- Add readModelData to MethodContext, wired via RawExecutionDriver
- 10 unit tests + 4 integration tests

Closes #914

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
@keeb
Copy link
Copy Markdown
Contributor Author

keeb commented Mar 30, 2026

First consumer report

Used this in the @keeb/mms-dedup extension model tonight. The dedup model reads episode data from a source model (by name), checks local directories for existing files, and outputs only new episodes.

Before (workaround)

~80 lines of filesystem hacking:

  • resolveModelByName() — scan YAML files in models/ to find type + ID
  • readLatestData() — walk .swamp/data/{type}/{id}/, read latest pointer files, parse raw JSON
  • walkYaml() — recursive directory walker
const resolved = await resolveModelByName(context.repoDir, args.sourceModel);
const dataDir = `${context.repoDir}/.swamp/data/${resolved.type}/${resolved.id}`;
const items = await readLatestData(dataDir);

After

const items = await context.readModelData(args.sourceModel, "episode");

One line. Model name resolution, data reading, JSON parsing — all handled.

On coupling concerns

The API is read-only and name-based, which keeps it loose. The dedup model doesn't know or care about the source model's type, schema, or internals — it just reads data by name and spec. The coupling is the same as what CEL expressions already do in workflows (data.findBySpec("anime-source", "episode")), just available inside execute functions.

The alternative was each model being an island at runtime, which pushed all composition into workflow YAML. That works for simple chains but falls apart when a model needs to iterate over another model's data with logic (LLM-parse if raw, skip if pre-parsed, deduplicate within results). Workflows can't express that — it requires code.

The key property: models can read each other's data but can't write to each other's namespace. That's the right boundary.

@keeb keeb marked this pull request as ready for review March 30, 2026 16:29
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

  1. HIGH — Orphan data content lookup uses wrong modelId (src/domain/data/data_access_service.ts:126-131)

    readModelData calls dataToRecord(data, resolved.modelType, resolved.modelId) for all data items, including orphans. dataToRecord then calls this.dataRepo.getContent(modelType, modelId, data.name) — but for orphan data, the content is stored on disk under the old model UUID, not resolved.modelId.

    FileSystemUnifiedDataRepository.getContent resolves the content path using modelId (via getContentPathgetDataNameDir), so it will look in the wrong directory and return null. The result: orphan-recovered data silently has attributes: {} instead of its actual content.

    Breaking example: Model "foo" is deleted and recreated (new UUID). Old data exists under old UUID. findOrphanData correctly finds the orphan Data metadata. But dataToRecord tries to read content at .swamp/data/<type>/<new-uuid>/<dataName>/... instead of .swamp/data/<type>/<old-uuid>/<dataName>/.... Content is null, JSON parse is skipped, attributes is {}.

    Why the unit test doesn't catch this: The mock getContent in data_access_service_test.ts:596-601 looks up by dataName only, ignoring _type and _modelId parameters entirely. The mock masks the real behavior where modelId is part of the filesystem path.

    Suggested fix: Track the original modelId for orphan data items (e.g., via the _orphanModelId field that's already in the return type signature but never actually set). Pass the correct modelId to dataToRecord for each item. Also fix the mock to key on (modelId, dataName) pairs so the test would fail without the fix.

  2. HIGH — findOrphanData calls findAll inside a loop, causing N+1 queries (src/domain/data/data_access_service.ts:198)

    The Tier 2 heuristic calls this.definitionRepo.findAll(modelType) inside the for (const [, items] of byModelId) loop. If there are M distinct old model IDs, this makes M identical repository queries. For filesystem-backed repos, that's M directory scans.

    More importantly: the result of findAll is the same for every iteration (it depends on modelType, which is constant across the loop). This should be hoisted above the loop and called once.

    Suggested fix: Move const allDefsOfType = await this.definitionRepo.findAll(modelType) before the loop.

Medium

  1. MEDIUM — findAllGlobal() is unbounded and loads all data from all models (src/domain/data/data_access_service.ts:159)

    Every call to readModelData calls findAllGlobal(), which scans the entire .swamp/data/ directory tree across all model types. For repos with many models and large data sets, this is an expensive operation performed on every cross-model read. There is no caching, filtering, or short-circuit.

    This may be acceptable for now but becomes a performance problem at scale.

  2. MEDIUM — readModelData is optional on MethodContext (src/domain/models/model.ts:241)

    The property is declared as readModelData?: .... Extension model authors calling context.readModelData(...) will get a runtime error if the context wasn't created by RawExecutionDriver (or any other driver that doesn't populate it). Since this is a new API, it would be safer to have a clear error message rather than TypeError: context.readModelData is not a function.

  3. MEDIUM — Tier 2 orphan heuristic can misattribute data (src/domain/data/data_access_service.ts:198-200)

    If a model type has exactly one definition, the heuristic claims ALL orphan data under that type belongs to that definition. But the orphan data could belong to a different model definition that was deleted and not recreated. The single remaining definition gets data that was never its own.

Low

  1. LOW — _orphanModelId in return type but never set (src/domain/data/data_access_service.ts:155-156)

    The return type of findOrphanData declares { _orphanModelId?: string } but the returned Data objects never have this field set. This appears to be a vestige of an incomplete implementation. It's harmless since the field is optional, but it's dead code in the type signature.

Verdict

FAIL — The orphan data content lookup bug (finding #1) means orphan recovery silently returns empty data, which is a data correctness issue in a primary feature of this PR. The test mocks mask the real behavior.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

  1. Bug: Orphan data content will never be found on disk — In readModelData() (line 126-131 of data_access_service.ts), orphan data recovered by findOrphanData is passed to dataToRecord with resolved.modelId (the current definition UUID). But the content files are stored under the old model UUID on disk. Since FileSystemUnifiedDataRepository.getContent() uses modelId to build the file path, getContent will return null for all orphan data, and the record will have empty attributes: {}.

    The unit test passes because the mock getContent ignores _modelId and matches only on dataName. To fix: findOrphanData needs to return the original modelId alongside each Data item, and dataToRecord must use that modelId for orphan items instead of resolved.modelId.

  2. Unused _orphanModelId in return typefindOrphanData (line 154-157) declares Data & { _orphanModelId?: string } in its return type, suggesting this was intended to carry the old model ID — but it's never actually set. This is the missing piece that would fix issue #1. The field needs to be populated and then consumed in readModelData.

Suggestions

  1. Remove unnecessary re-exportmodel_resolver.ts line 52 re-exports DataRecord "for backwards compatibility", but no file imports DataRecord from model_resolver.ts. The re-export can be removed entirely, and the comment about backwards compatibility deleted.

  2. Inline import("./data.ts").Data type referencesfindOrphanData and dataToRecord use import("./data.ts").Data inline three times instead of a top-level import. Since Data is already used transitively (and the file imports from ./data_record.ts in the same module), a normal import type { Data } from "./data.ts" at the top would be cleaner.

  3. Missing integration test for orphan recovery — The integration tests cover the happy path well but don't test orphan data recovery with real file-backed repos. Adding one would have caught blocking issue #1.

keeb and others added 2 commits March 30, 2026 09:42
Address review feedback:
- Track original modelId per data item via LocatedData so orphan-recovered
  data reads content from the old UUID path, not the current definition's
- Hoist findAll(modelType) above the loop to avoid N+1 queries
- Replace inline import("./data.ts").Data with top-level import
- Add integration test for orphan recovery with real file-backed repos
- Fix unit test mock to key on (modelId, dataName) pairs

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

Blocking Issues

None.

Suggestions

  1. findAllGlobal() in orphan recovery (src/domain/data/data_access_service.ts:457): findOrphanData calls this.dataRepo.findAllGlobal() on every readModelData() invocation, scanning all data across all models. For large repos this could become a hot path. Consider caching this within a single readModelData call or adding a more targeted query (e.g., findAllForType(modelType)) in a follow-up if performance becomes a concern.

  2. Domain→infrastructure type import (src/domain/data/data_access_service.ts:22): The UnifiedDataRepository import from infrastructure is acknowledged via the ratchet bump (21→22). If a domain-level port/interface for data repositories is ever introduced, this service would be a good candidate for migration.

Overall: well-structured domain service with clean DDD modeling, thorough test coverage (10 unit + 5 integration), and proper backward-compatible extraction of DataRecord. LGTM.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adversarial Review

Critical / High

None found.

Medium

  1. Tier 2 orphan heuristic can misattribute data from a different deleted modelsrc/domain/data/data_access_service.ts:173-181

    The single-definition heuristic says: if only one definition exists for this model type, claim all orphan data of that type. This is correct for the intended scenario (delete/recreate the same model), but breaks when a different model of the same type was deleted.

    Breaking example:

    • Model A (test/source, uuid-1) has data {value: "A"}
    • Model B (test/source, uuid-2) has data {value: "B"}
    • Model B is deleted (definition removed, data stays on disk)
    • readModelData("model-a") is called
    • findOrphanData finds uuid-2's data. Since allDefsOfType.length === 1 (only Model A remains), the heuristic claims Model B's data as belonging to Model A
    • Result: readModelData("model-a") returns {value: "B"} — data that never belonged to Model A

    Suggested fix: Restrict tier 2 to only fire when the orphan model ID doesn't match any existing definition (i.e., it's truly orphaned from a deleted definition), AND the type only has one definition. Additionally, consider checking if the definition count for the type has only ever been 1 (e.g., via a count of distinct model IDs in the data), though that may be impractical. At minimum, document this limitation clearly.

  2. findAllGlobal() called unconditionally on every readModelDatasrc/domain/data/data_access_service.ts:119-123

    Orphan recovery runs on every call, even when the current model has data under its current UUID. findAllGlobal() scans all data across all models. For repositories with many models and data items, this adds O(total-data) overhead to every cross-model read.

    Suggested improvement: Consider skipping orphan recovery when currentData is non-empty, or making it opt-in via an options parameter, since the common case is that data lives under the correct UUID.

Low

  1. Silent JSON parse failure returns empty attributessrc/domain/data/data_access_service.ts:206-208

    If stored JSON is corrupted (truncated write, encoding issue), JSON.parse throws, the catch swallows the error, and a DataRecord with attributes: {} is returned. The caller cannot distinguish "this data has no fields" from "this data is corrupted." This is probably fine for resilience but could make debugging data issues difficult. A debug-level log line in the catch block would help.

  2. readModelData is optional on MethodContextsrc/domain/models/model.ts:243

    Marked with ?, so extension drivers that don't add it won't break, but extension model authors calling context.readModelData?.(...) get undefined with no error message explaining why. This is a natural consequence of backwards compatibility and not really a bug, just worth noting.

Verdict

PASS — The core logic is correct for the primary use case. The orphan recovery tier-2 heuristic has a data misattribution edge case (medium #1) that should be acknowledged or addressed, but it's in an uncommon path and doesn't affect the happy path of cross-model data reads. The DataRecord extraction, vault ref resolution, spec filtering, and renamed/deleted filtering are all implemented correctly.

@stack72 stack72 merged commit 56427f9 into main Mar 30, 2026
10 checks passed
@stack72 stack72 deleted the keeb/914-cross-model-data-access branch March 30, 2026 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

context.readModelData(modelName) — first-class cross-model data access in execute functions

2 participants