Implement version-specific default chunk and shard sizes in iohub utils#403

Merged
srivarra merged 5 commits into main from feat/sharding_in_utils on Apr 25, 2026

Conversation

Contributor
@ieivanov ieivanov commented Apr 22, 2026

Closes #401.

What this changes

Version-specific chunk and shard defaults for iohub.ngff.utils.create_empty_plate, plus the test coverage to pin them down.

src/iohub/ngff/utils.py

  • v0.5 (Zarr v3): default chunks = (1, 1, 16, 256, 256) clamped to shape; default shards_ratio yields a shard of (1, 1, Z, Y, X) — one shard per (T, C) slot, per the DCA spec referenced in Set version-specific default chunk and shard sizes in create_empty_plate #401.
  • v0.4 (Zarr v2): default chunks = (1, 1, Z, Y, X) with Z halved until the single-chunk byte size fits under _V04_MAX_CHUNK_SIZE_BYTES (500 MB). No sharding, as Zarr v2 has no sharding codec.
  • Extracted the branching into _default_chunks(shape, dtype, version) and _default_shards_ratio(shape, chunks) helpers, and promoted the two tunables to module-level constants _V05_DEFAULT_ZYX_CHUNKS and _V04_MAX_CHUNK_SIZE_BYTES.
  • Removed the max_chunk_size_bytes parameter from create_empty_plate; the 500 MB cap is now the module-level constant _V04_MAX_CHUNK_SIZE_BYTES and applies only to v0.4 chunks (v0.5 chunks are driven by the DCA spec, not a byte cap).
  • Updated the docstring to describe the version-specific defaults.

Breaking change?

Technically yes, but low blast radius. create_empty_plate(..., max_chunk_size_bytes=...) is no longer accepted — callers who passed it by keyword will get a TypeError. I confirmed that waveorder and biahub — the two primary downstream consumers of this function — never set the parameter explicitly, so this change is a no-op for them. No other in-tree callers pass it either (grep -R max_chunk_size_bytes in iohub turns up no hits outside utils.py).

If we want to be strict about API stability we can keep max_chunk_size_bytes as a deprecated passthrough that overrides the constant for v0.4, but given the known downstream callsites don't touch it, removing it is cleaner.

tests/ngff/test_ngff_utils.py

Added explicit tests that pin the defaults so regressions fail deterministically instead of depending on which examples hypothesis happens to draw:

  • test_v05_default_chunks (parametrized) — asserts the DCA-aligned chunk shape, clamped to each axis.
  • test_v05_default_shards_cover_zyx — one shard per (T, C) for divisible dims.
  • test_v05_default_shards_with_non_divisible_zyx — shard still covers Z/Y/X when they're not multiples of the chunk size.
  • test_v05_explicit_shards_ratio_is_honored — caller override wins.
  • test_v04_default_chunks_cover_full_zyx / test_v04_default_chunks_capped_by_byte_limit — full-ZYX chunks; Z halves when the chunk would exceed the byte cap.
  • test_v04_default_has_no_sharding — arr.shards is None on v0.4.
  • test_v04_rejects_explicit_shards_ratio — raises on v0.4.
  • test_process_single_position_on_sharded_v05_store — end-to-end round-trip through the default-sharded v0.5 layout.
  • test_apply_transform_to_tczyx_on_multi_time_shard — direct oindex write with shard_t > 1. No xfail: if the write ever stops succeeding, this fires.

Tightened the hypothesis strategies (apply_transform_czyx_setup, process_single_position_setup) so integer time_indices / channel_indices are drawn unique=True and sorted via .map(sorted). See the "Why hypothesis needed tightening" section below.
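
The tightened strategy boils down to this pattern (sorted_unique_indices is an illustrative name, not the strategy helper in the test file):

```python
from hypothesis import strategies as st


def sorted_unique_indices(max_index: int, max_size: int = 4):
    """Integer index lists drawn unique, then sorted, matching real callers."""
    return st.lists(
        st.integers(min_value=0, max_value=max_index),
        min_size=1,
        max_size=max_size,
        unique=True,
    ).map(sorted)
```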

Why hypothesis needed tightening

Before this change, the existing hypothesis-based tests (test_apply_transform_to_tczyx_and_save et al.) passed in CI under the new defaults but only by luck: @settings(max_examples=5) on each test meant CI drew 5 random examples per run, none of which happened to trip the failure. At max_examples=200 the same tests failed reliably.

Root cause: the new v0.5 defaults make stores sharded, and the write path goes through arr.oindex[...]. iohub's active codec pipeline is ZarrsCodecPipeline (zarrs is a required dep), which handles most oindex patterns correctly — but it raises UnsupportedVIndexingError and falls back to the default BatchedCodecPipeline when an indexer is duplicate or unsorted. The fallback pipeline has an upstream bug in its sharding codec merge step: zarr-developers/zarr-python#2834.

Direct repro on a sharded v0.5 store under iohub's default config:

Indexer pattern → result:

  • oindex[[0, 1], [0]] — unique, sorted: ✅ OK
  • oindex[[0, 2], [0]] — unique, non-contiguous but sorted: ✅ OK
  • oindex[[0, 0], [0]] — duplicates: ❌ fallback → zarr#2834
  • oindex[[1, 0, 2], [0]] — unsorted: ❌ fallback → zarr#2834

The hypothesis strategy was generating duplicate / unsorted integer lists like [0, 0] and [1, 0, 2]. Real callers (biahub, the hybrid_scheme processing pipelines, etc.) always build sorted unique index lists, so the duplicate/unsorted draws weren't exercising a real use case — they were exercising a zarrs fallback path that wouldn't otherwise fire. Enforcing unique=True + .map(sorted) at the strategy level matches what callers pass and keeps the tests on the zarrs fast path.
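
The fast-path condition from the table above reduces to "strictly increasing integer selector"; a sketch of the predicate, for reference (this is my summary of the observed behavior, not zarrs code):

```python
def stays_on_zarrs_fast_path(indexer: list[int]) -> bool:
    # An integer array selector avoids the UnsupportedVIndexingError fallback
    # when it is unique and sorted, i.e. strictly increasing.
    return all(a < b for a, b in zip(indexer, indexer[1:]))
```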

Verification

Full file:

pytest tests/ngff/test_ngff_utils.py
19 passed in 3.49s

Stress run (all three hypothesis-based tests, max_examples=200 each, generate-only so the 600 draws are all fresh):

3 passed in 140.54s

No flakes, no xfail, and the new explicit tests fail loudly if any of the pinned defaults regress.

Test plan

  • tests/ngff/test_ngff_utils.py passes at default settings
  • Hypothesis-based tests pass at max_examples=200 (no duplicate/unsorted fallbacks)
  • v0.5 sharded write path round-trips correctly via process_single_position
  • v0.5 multi-time oindex write into a shared shard succeeds
  • v0.4 rejects shards_ratio and has shards is None by default
  • Confirmed no downstream callers (waveorder, biahub) pass max_chunk_size_bytes
  • Test in biahub processing pipeline on mantis v2

Follow-up

Should we also update Position.create_zeros (and therefore create_image, which calls create_zeros) with better chunk and shard defaults? Currently chunking defaults to (1, 1, Z, Y, X) even for v0.5, and sharding is not used by default for v0.5.

Transferred to #409

    def create_zeros(
        self,
        name: str,
        shape: tuple[int, ...],
        dtype: DTypeLike,
        chunks: tuple[int, ...] | None = None,
        shards_ratio: tuple[int, ...] | None = None,
        transform: list[TransformationMeta] | None = None,
        check_shape: bool = True,
    ):
        """Create a new zero-filled image array in the position.
        ...
        """
        if not chunks:
            chunks = self._default_chunks(shape, 3)

        if check_shape:
            self._check_shape(shape)
        arr_handle = self._create_zarr_array(name, shape, dtype, chunks, shards_ratio)
        img_arr = ImageArray.from_handle(arr_handle, self._impl)
        self._create_image_meta(name, transform=transform)
        return img_arr

    @staticmethod
    def _default_chunks(shape, last_data_dims: int):
        chunks = shape[-min(last_data_dims, len(shape)) :]
        return pad_shape(chunks, target=len(shape))

ieivanov and others added 2 commits April 22, 2026 15:47
apply_transform_czyx_setup and process_single_position_setup now draw
integer time_indices / channel_indices with unique=True and .map(sorted).
That matches the pattern real callers pass and keeps the oindex selectors
on the zarrs fast path -- unsorted or duplicate integer arrays cause
ZarrsCodecPipeline to raise UnsupportedVIndexingError and fall back to
BatchedCodecPipeline, which hits zarr-developers/zarr-python#2834 on
sharded v0.5 stores (tracked in iohub#404).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@ieivanov ieivanov marked this pull request as ready for review April 22, 2026 23:25
Collaborator

@srivarra srivarra left a comment


Changes look good overall, just a question about what we are testing.

Comment thread on src/iohub/ngff/utils.py (outdated)
Comment thread on tests/ngff/test_ngff_utils.py (outdated)
Contributor

@talonchandler talonchandler left a comment


All looks good to me.

Friendly reminder not to delete this branch when you merge (waveorder's main now depends on it); then release, then update waveorder's dependency. Thank you!

@ieivanov ieivanov requested a review from srivarra April 24, 2026 22:59
Collaborator

@srivarra srivarra left a comment


Looks good!

@srivarra srivarra merged commit f281ac3 into main Apr 25, 2026
7 checks passed