Skip to content

Switch to DistributedSampler#105

Merged
sfc-gh-sbekman merged 15 commits intomainfrom
mwyatt/move-to-dist-sampler
Mar 25, 2025
Merged

Switch to DistributedSampler#105
sfc-gh-sbekman merged 15 commits intomainfrom
mwyatt/move-to-dist-sampler

Conversation

@sfc-gh-mwyatt
Copy link
Copy Markdown
Collaborator

Previously we were pre-sharding loading datasets and saving to a cache file for each process. However, this is more cumbersome and error-prone than loading/processing data with a single process and using the DistributedSampler. Switching to use the DistributedSampler in this PR, which simplifies data loading and makes it more readable.

Other changes:

  • Caching of datasets is now the default and only behavior supported
  • Helper functions to detect local/shared filesystem to determine which ranks should load/process data

Comment thread arctic_training/config/data.py Outdated
Comment thread arctic_training/data/factory.py Outdated
Comment thread tests/data/utils.py Outdated
Comment thread arctic_training/data/utils.py
Comment thread arctic_training/data/factory.py Outdated
Comment thread arctic_training/data/factory.py
Comment thread arctic_training/data/factory.py Outdated
Comment thread arctic_training/data/factory.py Outdated
Copy link
Copy Markdown
Collaborator

@sfc-gh-sbekman sfc-gh-sbekman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to merge and then I will test as it looks it'll take me quite some time to deal with all the other changes as I haven't rebased in a while.

@sfc-gh-sbekman sfc-gh-sbekman merged commit 9436ee4 into main Mar 25, 2025
4 checks passed
@sfc-gh-sbekman sfc-gh-sbekman deleted the mwyatt/move-to-dist-sampler branch March 25, 2025 18:19
@sfc-gh-mwyatt sfc-gh-mwyatt mentioned this pull request Mar 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants