Conversation
There was a problem hiding this comment.
Code Review
Clean, well-scoped fix. The root cause analysis is correct — PRAGMA journal_mode=WAL acquires a write lock, so busy_timeout must be configured first to allow SQLite to retry rather than immediately throwing SQLITE_BUSY.
Blocking Issues
None.
Suggestions
None — the fix is minimal, the test is appropriate, and the commit message clearly explains the reasoning.
2556386 to
f3dd360
Compare
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
- catalog_store_test.ts:312-329 — Test does not actually reproduce the race. Both
CatalogStoreinstances are opened sequentially in a synchronous Deno test. The constructor callsnew DatabaseSync(dbPath)+ three synchronousexec()calls, so by the timestore2opens,store1has already completed all its PRAGMAs andcreateSchema(). The real bug (per the PR body) required 9/10 concurrent runs across processes to reproduce. This test is a useful smoke test that two handles coexist, but it would pass even with the old ordering on most runs — it doesn't deterministically exercise the contention window. Not blocking since the fix itself is obviously correct and the author confirmed separate adversarial testing.
Low
- catalog_store_test.ts:327-328 — No cleanup of temp DB on failure. If
assertEqualsthrows beforeclose(), the two SQLite handles leak (WAL/SHM files left behind). This is consistent with every other test in the file so it's a pre-existing pattern, not a regression.
Verdict
PASS — The fix is a correct, minimal reordering of two PRAGMA statements. Setting busy_timeout before journal_mode=WAL ensures SQLite retries on SQLITE_BUSY rather than failing immediately when the WAL mode switch requires a write lock held by another connection. The test is a useful guard rail even if it can't deterministically reproduce the race.
…base is locked" under concurrency The CatalogStore constructor set PRAGMA busy_timeout after PRAGMA journal_mode=WAL. Under concurrent access, the WAL pragma needs a database lock but had no retry configured yet, causing SQLite to immediately throw SQLITE_BUSY instead of retrying. Swapping the order ensures the 5-second retry timeout is active before any lock-acquiring operation. Fixes #1055
f3dd360 to
c8ab97c
Compare
There was a problem hiding this comment.
Code Review
Clean, well-targeted fix. Swapping the PRAGMA order is the correct approach — busy_timeout must be configured before any operation that may need a write lock (like journal_mode=WAL), otherwise SQLite returns SQLITE_BUSY immediately instead of retrying.
Blocking Issues
None.
Suggestions
- Test timing sensitivity: The test relies on a 200ms
setTimeoutto release the lock, which could be flaky on slow CI runners if the subprocess hasn't started its busy-wait yet. In practice the 5000ms busy_timeout gives plenty of headroom, so this is unlikely to be an issue — just worth noting.
Overall: minimal, correct fix with a solid regression test. LGTM.
There was a problem hiding this comment.
Adversarial Review
Critical / High
None.
Medium
-
catalog_store_test.ts:365— subprocess has no timeout, can hang forever.
Theproc.output()call has no timeout orAbortSignal. If the subprocess hangs (e.g., Deno deadlocks on module resolution under lock contention, or the busy_timeout itself blocks for 5 full seconds on a slow CI runner before failing), this test blocks indefinitely. The project's CLAUDE.md convention says "every promise must be awaited or explicitly handled" and recommendsAbortSignalwith timeouts for outbound calls.Breaking scenario: CI runner is under heavy I/O load. The subprocess takes >5s to acquire the lock (busy_timeout expires), then exits with a non-zero code, but only after a long delay. Or worse, the Deno subprocess itself hangs during startup with a held exclusive lock on the file. The test suite stalls with no indication of what's wrong.
Suggested fix: Add a timeout to
proc.output()or wrap it in aPromise.racewith asetTimeoutrejection (e.g., 10s), and ensure the subprocess is killed on timeout:const timeout = AbortSignal.timeout(10_000); // or use Promise.race with a cleanup that calls proc.kill()
-
catalog_store_test.ts:360— 200ms delay is a race condition with subprocess startup.
The test assumes the subprocess will have started and reached the PRAGMA within 200ms, so it's "waiting under busy_timeout" when the lock is released. On a cold CI runner (no Deno cache, high load), Deno startup + module resolution can exceed 200ms, meaning the lock is already released before the subprocess tries to acquire it. In that case, the test passes vacuously — it doesn't actually validate the busy_timeout retry behavior.Breaking scenario: This doesn't break correctness (the test still passes), but it silently stops testing what it claims to test. A future regression could slip through.
Suggested fix: Instead of a fixed delay, have the subprocess signal readiness (e.g., write to stdout before constructing CatalogStore), and only release the lock after observing that signal.
Low
catalog_store_test.ts:332— temp script file is never cleaned up. ThescriptPathand its parent dir are left behind after the test. Minor sincemakeTempDbPathalready creates temp dirs that aren't cleaned up either — consistent with existing tests, but worth noting.
Verdict
PASS — The production fix (swapping two PRAGMA lines in catalog_store.ts:64-65) is correct and well-motivated. Setting busy_timeout before journal_mode=WAL ensures SQLite retries on lock contention rather than failing immediately. The test has minor robustness concerns (no subprocess timeout, timing-dependent lock scenario) but these don't affect the production code quality.
Summary
PRAGMA busy_timeout=5000andPRAGMA journal_mode=WALin theCatalogStoreconstructor so SQLite has a retry timeout configured before attempting operations that may need a write lock under concurrencyCatalogStoreinstances against the same DB file simultaneously to guard against regressionFixes #1055
Test Plan
deno run test src/infrastructure/persistence/catalog_store_test.ts— all 17 tests pass, including new concurrent-open testdatabase is lockedbefore the fix, 0/10 failed afterconcurrency_test.tscovers the end-to-end cross-process scenario🤖 Generated with Claude Code