feat: add context.readModelData() for first-class cross-model data access#919
feat: add context.readModelData() for first-class cross-model data access#919
Conversation
…cess (#914) Extension models that need to read another model's data at runtime can now use `context.readModelData(modelName, specName?)` instead of manually resolving definitions and reading raw repository data. - Extract DataRecord value object to src/domain/data/data_record.ts - Add DataAccessService domain service with orphan data recovery - Add readModelData to MethodContext, wired via RawExecutionDriver - 10 unit tests + 4 integration tests Closes #914 Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
First consumer reportUsed this in the Before (workaround)~80 lines of filesystem hacking:
const resolved = await resolveModelByName(context.repoDir, args.sourceModel);
const dataDir = `${context.repoDir}/.swamp/data/${resolved.type}/${resolved.id}`;
const items = await readLatestData(dataDir);Afterconst items = await context.readModelData(args.sourceModel, "episode");One line. Model name resolution, data reading, JSON parsing — all handled. On coupling concernsThe API is read-only and name-based, which keeps it loose. The dedup model doesn't know or care about the source model's type, schema, or internals — it just reads data by name and spec. The coupling is the same as what CEL expressions already do in workflows ( The alternative was each model being an island at runtime, which pushed all composition into workflow YAML. That works for simple chains but falls apart when a model needs to iterate over another model's data with logic (LLM-parse if raw, skip if pre-parsed, deduplicate within results). Workflows can't express that — it requires code. The key property: models can read each other's data but can't write to each other's namespace. That's the right boundary. |
There was a problem hiding this comment.
Adversarial Review
Critical / High
-
HIGH — Orphan data content lookup uses wrong modelId (
src/domain/data/data_access_service.ts:126-131)readModelDatacallsdataToRecord(data, resolved.modelType, resolved.modelId)for all data items, including orphans.dataToRecordthen callsthis.dataRepo.getContent(modelType, modelId, data.name)— but for orphan data, the content is stored on disk under the old model UUID, notresolved.modelId.FileSystemUnifiedDataRepository.getContentresolves the content path usingmodelId(viagetContentPath→getDataNameDir), so it will look in the wrong directory and returnnull. The result: orphan-recovered data silently hasattributes: {}instead of its actual content.Breaking example: Model "foo" is deleted and recreated (new UUID). Old data exists under old UUID.
findOrphanDatacorrectly finds the orphanDatametadata. ButdataToRecordtries to read content at.swamp/data/<type>/<new-uuid>/<dataName>/...instead of.swamp/data/<type>/<old-uuid>/<dataName>/.... Content is null, JSON parse is skipped, attributes is{}.Why the unit test doesn't catch this: The mock
getContentindata_access_service_test.ts:596-601looks up bydataNameonly, ignoring_typeand_modelIdparameters entirely. The mock masks the real behavior wheremodelIdis part of the filesystem path.Suggested fix: Track the original
modelIdfor orphan data items (e.g., via the_orphanModelIdfield that's already in the return type signature but never actually set). Pass the correct modelId todataToRecordfor each item. Also fix the mock to key on(modelId, dataName)pairs so the test would fail without the fix. -
HIGH —
findOrphanDatacallsfindAllinside a loop, causing N+1 queries (src/domain/data/data_access_service.ts:198)The Tier 2 heuristic calls
this.definitionRepo.findAll(modelType)inside thefor (const [, items] of byModelId)loop. If there are M distinct old model IDs, this makes M identical repository queries. For filesystem-backed repos, that's M directory scans.More importantly: the result of
findAllis the same for every iteration (it depends onmodelType, which is constant across the loop). This should be hoisted above the loop and called once.Suggested fix: Move
const allDefsOfType = await this.definitionRepo.findAll(modelType)before the loop.
Medium
-
MEDIUM —
findAllGlobal()is unbounded and loads all data from all models (src/domain/data/data_access_service.ts:159)Every call to
readModelDatacallsfindAllGlobal(), which scans the entire.swamp/data/directory tree across all model types. For repos with many models and large data sets, this is an expensive operation performed on every cross-model read. There is no caching, filtering, or short-circuit.This may be acceptable for now but becomes a performance problem at scale.
-
MEDIUM —
readModelDatais optional onMethodContext(src/domain/models/model.ts:241)The property is declared as
readModelData?: .... Extension model authors callingcontext.readModelData(...)will get a runtime error if the context wasn't created byRawExecutionDriver(or any other driver that doesn't populate it). Since this is a new API, it would be safer to have a clear error message rather thanTypeError: context.readModelData is not a function. -
MEDIUM — Tier 2 orphan heuristic can misattribute data (
src/domain/data/data_access_service.ts:198-200)If a model type has exactly one definition, the heuristic claims ALL orphan data under that type belongs to that definition. But the orphan data could belong to a different model definition that was deleted and not recreated. The single remaining definition gets data that was never its own.
Low
-
LOW —
_orphanModelIdin return type but never set (src/domain/data/data_access_service.ts:155-156)The return type of
findOrphanDatadeclares{ _orphanModelId?: string }but the returnedDataobjects never have this field set. This appears to be a vestige of an incomplete implementation. It's harmless since the field is optional, but it's dead code in the type signature.
Verdict
FAIL — The orphan data content lookup bug (finding #1) means orphan recovery silently returns empty data, which is a data correctness issue in a primary feature of this PR. The test mocks mask the real behavior.
There was a problem hiding this comment.
Code Review
Blocking Issues
-
Bug: Orphan data content will never be found on disk — In
readModelData()(line 126-131 ofdata_access_service.ts), orphan data recovered byfindOrphanDatais passed todataToRecordwithresolved.modelId(the current definition UUID). But the content files are stored under the old model UUID on disk. SinceFileSystemUnifiedDataRepository.getContent()usesmodelIdto build the file path,getContentwill returnnullfor all orphan data, and the record will have emptyattributes: {}.The unit test passes because the mock
getContentignores_modelIdand matches only ondataName. To fix:findOrphanDataneeds to return the originalmodelIdalongside eachDataitem, anddataToRecordmust use thatmodelIdfor orphan items instead ofresolved.modelId. -
Unused
_orphanModelIdin return type —findOrphanData(line 154-157) declaresData & { _orphanModelId?: string }in its return type, suggesting this was intended to carry the old model ID — but it's never actually set. This is the missing piece that would fix issue #1. The field needs to be populated and then consumed inreadModelData.
Suggestions
-
Remove unnecessary re-export —
model_resolver.tsline 52 re-exportsDataRecord"for backwards compatibility", but no file importsDataRecordfrommodel_resolver.ts. The re-export can be removed entirely, and the comment about backwards compatibility deleted. -
Inline
import("./data.ts").Datatype references —findOrphanDataanddataToRecorduseimport("./data.ts").Datainline three times instead of a top-level import. SinceDatais already used transitively (and the file imports from./data_record.tsin the same module), a normalimport type { Data } from "./data.ts"at the top would be cleaner. -
Missing integration test for orphan recovery — The integration tests cover the happy path well but don't test orphan data recovery with real file-backed repos. Adding one would have caught blocking issue #1.
Address review feedback:
- Track original modelId per data item via LocatedData so orphan-recovered
data reads content from the old UUID path, not the current definition's
- Hoist findAll(modelType) above the loop to avoid N+1 queries
- Replace inline import("./data.ts").Data with top-level import
- Add integration test for orphan recovery with real file-backed repos
- Fix unit test mock to key on (modelId, dataName) pairs
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
There was a problem hiding this comment.
Code Review
Blocking Issues
None.
Suggestions
-
findAllGlobal()in orphan recovery (src/domain/data/data_access_service.ts:457):findOrphanDatacallsthis.dataRepo.findAllGlobal()on everyreadModelData()invocation, scanning all data across all models. For large repos this could become a hot path. Consider caching this within a singlereadModelDatacall or adding a more targeted query (e.g.,findAllForType(modelType)) in a follow-up if performance becomes a concern. -
Domain→infrastructure type import (
src/domain/data/data_access_service.ts:22): TheUnifiedDataRepositoryimport from infrastructure is acknowledged via the ratchet bump (21→22). If a domain-level port/interface for data repositories is ever introduced, this service would be a good candidate for migration.
Overall: well-structured domain service with clean DDD modeling, thorough test coverage (10 unit + 5 integration), and proper backward-compatible extraction of DataRecord. LGTM.
There was a problem hiding this comment.
Adversarial Review
Critical / High
None found.
Medium
-
Tier 2 orphan heuristic can misattribute data from a different deleted model —
src/domain/data/data_access_service.ts:173-181The single-definition heuristic says: if only one definition exists for this model type, claim all orphan data of that type. This is correct for the intended scenario (delete/recreate the same model), but breaks when a different model of the same type was deleted.
Breaking example:
- Model A (
test/source, uuid-1) has data{value: "A"} - Model B (
test/source, uuid-2) has data{value: "B"} - Model B is deleted (definition removed, data stays on disk)
readModelData("model-a")is calledfindOrphanDatafinds uuid-2's data. SinceallDefsOfType.length === 1(only Model A remains), the heuristic claims Model B's data as belonging to Model A- Result:
readModelData("model-a")returns{value: "B"}— data that never belonged to Model A
Suggested fix: Restrict tier 2 to only fire when the orphan model ID doesn't match any existing definition (i.e., it's truly orphaned from a deleted definition), AND the type only has one definition. Additionally, consider checking if the definition count for the type has only ever been 1 (e.g., via a count of distinct model IDs in the data), though that may be impractical. At minimum, document this limitation clearly.
- Model A (
-
findAllGlobal()called unconditionally on everyreadModelData—src/domain/data/data_access_service.ts:119-123Orphan recovery runs on every call, even when the current model has data under its current UUID.
findAllGlobal()scans all data across all models. For repositories with many models and data items, this adds O(total-data) overhead to every cross-model read.Suggested improvement: Consider skipping orphan recovery when
currentDatais non-empty, or making it opt-in via an options parameter, since the common case is that data lives under the correct UUID.
Low
-
Silent JSON parse failure returns empty attributes —
src/domain/data/data_access_service.ts:206-208If stored JSON is corrupted (truncated write, encoding issue),
JSON.parsethrows, the catch swallows the error, and aDataRecordwithattributes: {}is returned. The caller cannot distinguish "this data has no fields" from "this data is corrupted." This is probably fine for resilience but could make debugging data issues difficult. A debug-level log line in the catch block would help. -
readModelDatais optional onMethodContext—src/domain/models/model.ts:243Marked with
?, so extension drivers that don't add it won't break, but extension model authors callingcontext.readModelData?.(...)getundefinedwith no error message explaining why. This is a natural consequence of backwards compatibility and not really a bug, just worth noting.
Verdict
PASS — The core logic is correct for the primary use case. The orphan recovery tier-2 heuristic has a data misattribution edge case (medium #1) that should be acknowledged or addressed, but it's in an uncommon path and doesn't affect the happy path of cross-model data reads. The DataRecord extraction, vault ref resolution, spec filtering, and renamed/deleted filtering are all implemented correctly.
Summary
context.readModelData(modelName, specName?)toMethodContextso extension models can read another model's data without manual definition resolution and raw repository callsDataAccessServicedomain service that handles model name resolution, data reading, JSON parsing, vault ref resolution, and orphan data recoveryDataRecordvalue object tosrc/domain/data/data_record.tsas a shared type between the expression system and data access layerTest Plan
DataAccessServicecovering: model resolution, spec filtering, empty/missing models, renamed data, vault refs, non-JSON content, orphan recoverydeno check,deno lint,deno fmtcleanCloses #914
🤖 Generated with Claude Code