Feat/shared tensorstore context #407
Draft
edyoshikun wants to merge 4 commits into main from
Conversation
Add ``recheck_cached_data`` to ``TensorStoreConfig`` and forward it into ``ts.open`` in ``TensorStoreImplementation.open_array``. The option controls whether cached chunk data is revalidated on every read (the TensorStore driver default) or only at open time (``"open"``), which is the recommended setting for long-running, read-heavy workloads on networked filesystems (NFS/VAST), where revalidation costs one stat/GETATTR per chunk per read.

``None`` (the default) preserves existing behaviour by omitting the kwarg, so the TensorStore driver keeps its own default. ``True``, ``False``, and ``"open"`` are forwarded verbatim.

Covered by a parametrized test that monkey-patches ``_ts_open`` to assert the kwarg reaches TensorStore for each configured value and is absent when unset.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
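A minimal sketch of the forwarding behaviour described above. The names (``TensorStoreConfig``, ``open_array``, and a ``_ts_open`` stand-in for ``ts.open``) mirror the description but are hypothetical here, since the actual iohub code is not shown in this PR excerpt:

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TensorStoreConfig:
    # None (default) means the kwarg is omitted entirely, so the
    # TensorStore driver keeps its own default behaviour.
    # True, False, and "open" are forwarded verbatim.
    recheck_cached_data: Any = None


def _ts_open(spec: dict, **kwargs: Any) -> dict:
    # Stand-in for ts.open; records the kwargs it would receive,
    # which is also how the parametrized monkeypatch test can assert
    # what reaches TensorStore.
    return {"spec": spec, "kwargs": kwargs}


def open_array(spec: dict, config: TensorStoreConfig) -> dict:
    kwargs: dict[str, Any] = {}
    if config.recheck_cached_data is not None:
        # Forward the configured value exactly; no coercion.
        kwargs["recheck_cached_data"] = config.recheck_cached_data
    return _ts_open(spec, **kwargs)


# Unset: the kwarg is absent and the driver default applies.
assert "recheck_cached_data" not in open_array({}, TensorStoreConfig())["kwargs"]

# "open": revalidate only at open time (the recommended setting for
# read-heavy workloads on networked filesystems).
result = open_array({}, TensorStoreConfig(recheck_cached_data="open"))
assert result["kwargs"] == {"recheck_cached_data": "open"}
```

Note that the ``None`` check must be an explicit ``is not None`` rather than truthiness, because ``False`` is a meaningful value that must still be forwarded.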
Let callers reuse a single ts.Context across many open_ome_zarr calls.
Problem: every open_ome_zarr(implementation="tensorstore", ...) creates
a fresh TensorStoreImplementation, and each instance lazily builds its
own ts.Context. Workloads that open dozens of plates (multi-experiment
training) end up with N disjoint cache pools and thread pools, none of
which share chunk data. This is a silent regression from iohub 0.2.x's
Position.tensorstore(context=...) API, which allowed Context sharing.
Fix: add shared_context: Any = None on TensorStoreConfig. When set,
TensorStoreImplementation._context() returns it verbatim instead of
building a fresh Context from the other knobs. Fully backwards-compatible:
the default (None) preserves the existing per-instance behavior.
Usage:
import tensorstore as ts
from iohub import open_ome_zarr
from iohub.core.config import TensorStoreConfig
shared_ctx = ts.Context({"cache_pool": {"total_bytes_limit": 4_000_000_000}})
cfg = TensorStoreConfig(shared_context=shared_ctx, recheck_cached_data="open")
plate_a = open_ome_zarr(path_a, implementation="tensorstore", implementation_config=cfg)
plate_b = open_ome_zarr(path_b, implementation="tensorstore", implementation_config=cfg)
# plate_a and plate_b now share one Context, one cache pool, one thread pool.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
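The short-circuit described above can be sketched as follows. The class and attribute names mirror the commit message, but the bodies are illustrative only (a plain dict stands in for ts.Context, and ``cache_pool_bytes`` is a hypothetical example of the "other knobs"):

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TensorStoreConfig:
    cache_pool_bytes: int = 0   # example of one of the "other knobs"
    shared_context: Any = None  # caller-provided ts.Context, if any


class TensorStoreImplementation:
    def __init__(self, config: TensorStoreConfig) -> None:
        self._config = config
        self._ctx: Any = None

    def _context(self) -> Any:
        if self._config.shared_context is not None:
            # Return the caller's Context verbatim: every implementation
            # built from the same config shares one cache pool and one
            # thread pool.
            return self._config.shared_context
        if self._ctx is None:
            # Default path: lazily build a fresh per-instance context
            # from the other knobs (dict standing in for ts.Context).
            self._ctx = {
                "cache_pool": {
                    "total_bytes_limit": self._config.cache_pool_bytes
                }
            }
        return self._ctx


shared = object()  # stands in for a real ts.Context(...)
cfg = TensorStoreConfig(shared_context=shared)
a = TensorStoreImplementation(cfg)
b = TensorStoreImplementation(cfg)
# Both implementations hand back the very same object.
assert a._context() is b._context() is shared

# Without shared_context, each instance builds its own disjoint context.
c = TensorStoreImplementation(TensorStoreConfig())
d = TensorStoreImplementation(TensorStoreConfig())
assert c._context() is not d._context()
```

The identity check (``is``) is the point: sharing a Context is about reusing one object, not about configs that merely compare equal.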