Feat/shared tensorstore context #407

Draft
edyoshikun wants to merge 4 commits into main from feat/shared-tensorstore-context

Conversation

@edyoshikun
Collaborator

@edyoshikun edyoshikun commented Apr 24, 2026

Add `recheck_cached_data` to `TensorStoreConfig` and forward it into
`ts.open` in `TensorStoreImplementation.open_array`. The option controls
whether cached chunk data is revalidated on every read (the TensorStore
driver default) or only at open time (`"open"`), which is the recommended
setting for long-running, read-heavy workloads on networked filesystems
(NFS/VAST), where revalidation costs one stat/GETATTR per chunk per read.

Covered by a parametrized test that monkey-patches `_ts_open` to assert
that the kwarg reaches TensorStore for each configured value and is absent
when unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

edyoshikun and others added 4 commits April 22, 2026 16:44
Add ``recheck_cached_data`` to ``TensorStoreConfig`` and forward it into
``ts.open`` in ``TensorStoreImplementation.open_array``. The option controls
whether cached chunk data is revalidated on every read (the TensorStore
driver default) or only at open time (``"open"``), which is the recommended
setting for long-running read-heavy workloads on networked filesystems
(NFS/VAST) where revalidation costs one stat/GETATTR per chunk per read.

``None`` (default) preserves existing behaviour by omitting the kwarg so
the TensorStore driver keeps its own default. ``True``, ``False``, and
``"open"`` are forwarded verbatim.

Covered by a parametrized test that monkey-patches ``_ts_open`` to assert
the kwarg reaches TensorStore for each configured value and is absent when
unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
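The monkey-patching test idea can be sketched in plain Python as follows. The `open_array` function and the capture lambda below are hypothetical stand-ins (not the real `TensorStoreImplementation` or pytest fixtures); they just show the assertion pattern, capturing whatever kwargs reach the patched open call:

```python
from typing import Any


def open_array(ts_open, recheck_cached_data: Any = None):
    """Hypothetical open_array: forwards the option only when set."""
    kwargs = {}
    if recheck_cached_data is not None:
        kwargs["recheck_cached_data"] = recheck_cached_data
    return ts_open({"driver": "zarr3"}, **kwargs)


# Patch stand-in: record whatever kwargs reach "TensorStore".
for value in (True, False, "open", None):
    captured: dict = {}
    open_array(lambda spec, **kw: captured.update(kw), recheck_cached_data=value)
    if value is None:
        assert "recheck_cached_data" not in captured  # unset: kwarg absent
    else:
        assert captured["recheck_cached_data"] == value  # set: forwarded verbatim
print("all cases pass")
```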
Let callers reuse a single ts.Context across many open_ome_zarr calls.

Problem: every open_ome_zarr(implementation="tensorstore", ...) creates
a fresh TensorStoreImplementation, and each instance lazily builds its
own ts.Context. Workloads that open dozens of plates (multi-experiment
training) end up with N disjoint cache pools and thread pools, none of
which share chunk data. This is a silent regression from iohub 0.2.x's
Position.tensorstore(context=...) API, which allowed Context sharing.

Fix: add shared_context: Any = None on TensorStoreConfig. When set,
TensorStoreImplementation._context() returns it verbatim instead of
building a fresh Context from the other knobs. Fully backwards-
compatible — default (None) preserves existing per-instance behavior.

Usage:
    import tensorstore as ts
    from iohub import open_ome_zarr
    from iohub.core.config import TensorStoreConfig

    shared_ctx = ts.Context({"cache_pool": {"total_bytes_limit": 4_000_000_000}})
    cfg = TensorStoreConfig(shared_context=shared_ctx, recheck_cached_data="open")

    plate_a = open_ome_zarr(path_a, implementation="tensorstore", implementation_config=cfg)
    plate_b = open_ome_zarr(path_b, implementation="tensorstore", implementation_config=cfg)
    # plate_a and plate_b now share one Context, one cache pool, one thread pool.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>