Implement version-specific default chunk and shard sizes in iohub utils #403
Merged
Conversation
apply_transform_czyx_setup and process_single_position_setup now draw integer time_indices / channel_indices with unique=True and .map(sorted). That matches the pattern real callers pass and keeps the oindex selectors on the zarrs fast path -- unsorted or duplicate integer arrays cause ZarrsCodecPipeline to raise UnsupportedVIndexingError and fall back to BatchedCodecPipeline, which hits zarr-developers/zarr-python#2834 on sharded v0.5 stores (tracked in iohub#404). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
srivarra (Collaborator) requested changes on Apr 23, 2026:
Changes look good overall, just a question about what we are testing.
talonchandler (Contributor) approved these changes on Apr 24, 2026:
All looks good to me.
Friendly reminder not to delete this branch when you merge (waveorder's main now depends on it); then release, then update waveorder's dependency. Thank you!
edyoshikun approved these changes on Apr 24, 2026
Closes #401.
What this changes
Version-specific chunk and shard defaults for `iohub.ngff.utils.create_empty_plate`, plus the test coverage to pin them down.

`src/iohub/ngff/utils.py`

- v0.5: chunks default to `(1, 1, 16, 256, 256)`, clamped to `shape`; the default `shards_ratio` yields a shard of `(1, 1, Z, Y, X)`, one shard per `(T, C)` slot, per the DCA spec referenced in #401 ("Set version-specific default chunk and shard sizes in `create_empty_plate`").
- v0.4: chunks default to `(1, 1, Z, Y, X)` with Z halved until the single-chunk byte size fits under `_V04_MAX_CHUNK_SIZE_BYTES` (500 MB). No sharding, as Zarr v2 has no sharding codec.
- Extracted `_default_chunks(shape, dtype, version)` and `_default_shards_ratio(shape, chunks)` helpers (sketched below), and promoted the two tunables to module-level constants `_V05_DEFAULT_ZYX_CHUNKS` and `_V04_MAX_CHUNK_SIZE_BYTES`.
- Removed the `max_chunk_size_bytes` parameter from `create_empty_plate`; the 500 MB cap is now the module-level constant `_V04_MAX_CHUNK_SIZE_BYTES` and applies only to v0.4 chunks (v0.5 chunks are driven by the DCA spec, not a byte cap).
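A minimal sketch of the two helpers, assuming a five-axis `(T, C, Z, Y, X)` shape; the signatures and constant names follow the PR text, but the bodies are reconstructions, not iohub's actual implementation:

```python
import math

import numpy as np

_V05_DEFAULT_ZYX_CHUNKS = (16, 256, 256)
_V04_MAX_CHUNK_SIZE_BYTES = 500 * 2**20  # 500 MB cap; exact definition assumed

def _default_chunks(shape, dtype, version):
    T, C, Z, Y, X = shape
    if version == "0.5":
        # DCA-aligned chunk, clamped so no chunk axis exceeds the array
        z, y, x = _V05_DEFAULT_ZYX_CHUNKS
        return (1, 1, min(z, Z), min(y, Y), min(x, X))
    # v0.4: full-ZYX chunk, halving Z until it fits under the byte cap
    itemsize = np.dtype(dtype).itemsize
    z = Z
    while z > 1 and z * Y * X * itemsize > _V04_MAX_CHUNK_SIZE_BYTES:
        z = math.ceil(z / 2)
    return (1, 1, z, Y, X)

def _default_shards_ratio(shape, chunks):
    # One shard per (T, C) slot: the shard covers all of Z/Y/X, expressed
    # as chunks-per-shard per axis, rounded up for non-divisible dims.
    return tuple(
        math.ceil(s / c) if axis >= 2 else 1
        for axis, (s, c) in enumerate(zip(shape, chunks))
    )
```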
Breaking change?

Technically yes, but low blast radius.
`create_empty_plate(..., max_chunk_size_bytes=...)` is no longer accepted; callers who passed it by keyword will get a `TypeError`. I confirmed that waveorder and biahub, the two primary downstream consumers of this function, never set the parameter explicitly, so this change is a no-op for them. No other in-tree callers pass it either (`grep -R max_chunk_size_bytes` in iohub turns up no hits outside `utils.py`).

If we want to be strict about API stability we can keep `max_chunk_size_bytes` as a deprecated passthrough that overrides the constant for v0.4 (sketched below), but given that the known downstream call sites don't touch it, removing it is cleaner.
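For reference, a sketch of that rejected alternative (not what this PR merged); the keyword handling shown is an assumption, and `_V04_MAX_CHUNK_SIZE_BYTES` refers to the constant sketched above:

```python
import warnings

def create_empty_plate(*args, max_chunk_size_bytes=None, **kwargs):
    # Deprecated passthrough: warn, and let the old keyword override the
    # v0.4 byte cap only. v0.5 defaults would ignore it entirely.
    if max_chunk_size_bytes is not None:
        warnings.warn(
            "max_chunk_size_bytes is deprecated; it only affects v0.4 chunk "
            "defaults and will be removed in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    v04_cap = max_chunk_size_bytes or _V04_MAX_CHUNK_SIZE_BYTES
    ...  # rest of plate creation, using v04_cap for the v0.4 chunk default
```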
`tests/ngff/test_ngff_utils.py`

Added explicit tests that pin the defaults so regressions fail deterministically instead of depending on which examples hypothesis happens to draw (a usage sketch follows the list):

- `test_v05_default_chunks` (parametrized): asserts the DCA-aligned chunk shape, clamped to each axis.
- `test_v05_default_shards_cover_zyx`: one shard per `(T, C)` for divisible dims.
- `test_v05_default_shards_with_non_divisible_zyx`: shard still covers Z/Y/X when they're not multiples of the chunk size.
- `test_v05_explicit_shards_ratio_is_honored`: caller override wins.
- `test_v04_default_chunks_cover_full_zyx` / `test_v04_default_chunks_capped_by_byte_limit`: full-ZYX chunks; Z halves when the chunk would exceed the byte cap.
- `test_v04_default_has_no_sharding`: `arr.shards is None` on v0.4.
- `test_v04_rejects_explicit_shards_ratio`: raises on v0.4.
- `test_process_single_position_on_sharded_v05_store`: end-to-end round trip through the default-sharded v0.5 layout.
- `test_apply_transform_to_tczyx_on_multi_time_shard`: direct `oindex` write with `shard_t > 1`. No `xfail`: if the write ever stops succeeding, this fires.
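As a usage example against the `_default_chunks` sketch above, two pinned-default checks in the same spirit; the exact assertions in the test file may differ:

```python
def test_v05_chunks_clamped_to_shape():
    shape = (2, 1, 8, 100, 100)  # Z/Y/X all smaller than (16, 256, 256)
    assert _default_chunks(shape, "uint16", "0.5") == (1, 1, 8, 100, 100)

def test_v04_z_halved_under_byte_cap():
    shape = (1, 1, 2048, 4096, 4096)  # a full-ZYX uint16 chunk would be ~69 GB
    t, c, z, y, x = _default_chunks(shape, "uint16", "0.4")
    assert (t, c, y, x) == (1, 1, 4096, 4096)
    assert z < 2048  # Z was halved until the chunk fit under the 500 MB cap
```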
Tightened the hypothesis strategies (`apply_transform_czyx_setup`, `process_single_position_setup`) so integer `time_indices` / `channel_indices` are drawn `unique=True` and sorted via `.map(sorted)`. See the "Why hypothesis needed tightening" section below.

Why hypothesis needed tightening
Before this change, the existing hypothesis-based tests (`test_apply_transform_to_tczyx_and_save` et al.) passed in CI under the new defaults, but only by luck: `@settings(max_examples=5)` on each test meant CI drew 5 random examples per run, none of which happened to trip the failure. At `max_examples=200` the same tests failed reliably.

Root cause: the new v0.5 defaults make stores sharded, and the write path goes through `arr.oindex[...]`. iohub's active codec pipeline is `ZarrsCodecPipeline` (zarrs is a required dep), which handles most oindex patterns correctly, but it raises `UnsupportedVIndexingError` and falls back to the default `BatchedCodecPipeline` when an indexer is duplicated or unsorted. The fallback pipeline has an upstream bug in its sharding codec merge step: zarr-developers/zarr-python#2834.

Direct repro on a sharded v0.5 store under iohub's default config (a script sketch follows the list):
- `oindex[[0, 1], [0]]`: unique, sorted
- `oindex[[0, 2], [0]]`: unique, non-contiguous but sorted
- `oindex[[0, 0], [0]]`: duplicates
- `oindex[[1, 0, 2], [0]]`: unsorted
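A hedged repro sketch of that setup; the store layout and path are illustrative, and the config line follows zarrs-python's documented opt-in, which iohub's default configuration is assumed to match:

```python
import numpy as np
import zarr

# Route codecs through zarrs (the zarrs-python documented opt-in).
zarr.config.set({"codec_pipeline.path": "zarrs.ZarrsCodecPipeline"})

arr = zarr.create_array(
    store="repro.zarr",
    shape=(4, 2, 8, 64, 64),    # (T, C, Z, Y, X)
    chunks=(1, 1, 8, 64, 64),
    shards=(2, 2, 8, 64, 64),   # multi-chunk shards along T and C
    dtype="uint16",
)
block = np.ones((2, 1, 8, 64, 64), dtype="uint16")

arr.oindex[[0, 1], [0]] = block  # unique, sorted: stays on the zarrs fast path
arr.oindex[[1, 0], [0]] = block  # unsorted: UnsupportedVIndexingError inside
                                 # zarrs, so zarr falls back to
                                 # BatchedCodecPipeline and can hit #2834
```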
The hypothesis strategy was generating duplicate / unsorted integer lists like `[0, 0]` and `[1, 0, 2]`. Real callers (biahub, the hybrid_scheme processing pipelines, etc.) always build sorted, unique index lists, so the duplicate/unsorted draws weren't exercising a real use case; they were exercising a zarrs fallback path that wouldn't otherwise fire. Enforcing `unique=True` + `.map(sorted)` at the strategy level matches what callers pass and keeps the tests on the zarrs fast path.
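A minimal sketch of that strategy pattern; the helper name and bounds are illustrative, not the exact iohub test code:

```python
import hypothesis.strategies as st

def index_lists(max_index: int):
    """Draw integer index lists the way real callers build them:
    unique values in ascending order."""
    return st.lists(
        st.integers(min_value=0, max_value=max_index),
        min_size=1,
        unique=True,  # no duplicate indices
    ).map(sorted)     # ascending order keeps zarrs on its oindex fast path
```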
Verification

Full file:
Stress run (all three hypothesis-based tests, `max_examples=200` each, generate-only so the 600 draws are all fresh):
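The exact command and output were not captured on this page; a hypothesis profile matching that description (fresh generated examples only, no database replay) might look like:

```python
from hypothesis import Phase, settings

# Assumed stress profile: 200 generated examples per test, generate phase
# only, no example-database replay, so every draw is fresh.
settings.register_profile(
    "stress",
    max_examples=200,
    phases=[Phase.generate],
    database=None,
)
# Activate with: pytest tests/ngff/test_ngff_utils.py --hypothesis-profile=stress
```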
No flakes, no `xfail`, and the new explicit tests fail loudly if any of the pinned defaults regress.

Test plan
- `tests/ngff/test_ngff_utils.py` passes at default settings
- Stress run at `max_examples=200` passes (no duplicate/unsorted fallbacks)
- `process_single_position` round-trips on the default-sharded v0.5 layout
- v0.4 stores reject `shards_ratio` and have `shards is None` by default
- Downstream consumers (`waveorder`, `biahub`) pass
- No remaining callers of `max_chunk_size_bytes`

Follow-up
Should we update `Position.create_zeros` (and therefore `create_image`, which calls `create_zeros`) with better chunk and shard defaults? Currently chunking defaults to `(1, 1, Z, Y, X)` even for v0.5, and sharding is not used by default for v0.5.

Transferred to #409