Releases: snowflakedb/ArcticTraining
ArcticTraining v0.8.0
What's Changed
- Bump version to 0.7.3.dev0 by @sfc-gh-mwyatt in #343
- pynvml DGX Spark fix by @sfc-gh-sbekman in #345
- Fix: Unit tests conditionally create virtual environment by @sfc-gh-mwyatt in #352
- fix default path by @sfc-gh-sbekman in #354
- add torch.autocast support by @sfc-gh-sbekman in #355
- dgx station capability by @sfc-gh-sbekman in #356
- Bump version from 0.7.3.dev0 to 0.8.0 by @sfc-gh-mwyatt in #359
https://pypi.org/project/arctic-training/0.8.0/
Full Changelog: v0.7.2...v0.8.0
ArcticTraining v0.7.2
What's Changed
- Bump version to 0.7.2.dev0 by @sfc-gh-mwyatt in #337
- Relax dependency on `numba` for `[snowflake]` by @sfc-gh-dhung in #338
- Fix multi-node training with Ray launcher by @sfc-gh-dhung in #340
- fail data load if cache is missing by @sfc-gh-jrasley in #296
- Fix dependency resolution for dev installs by @akshatvishu in #341
- Bump version from 0.7.2.dev0 to 0.7.2 by @sfc-gh-mwyatt in #342
New Contributors
- @akshatvishu made their first contribution in #341
https://pypi.org/project/arctic-training/0.7.2/
Full Changelog: v0.7.1...v0.7.2
ArcticTraining v0.7.1
What's Changed
- Bump version to 0.7.1.dev0 by @sfc-gh-mwyatt in #308
- Limit transformers version <5.0.0 until release is more stable by @sfc-gh-mwyatt in #309
- Fix: Allow SFT datapacking only with micro batch size 1 by @sfc-gh-mwyatt in #312
- limit the batch size for `pack_dataset` by @sfc-gh-sbekman in #314
- type fix by @sfc-gh-zhyao in #315
- Enhancement: Export WANDB environment variables with multinode execution by @sfc-gh-mwyatt in #316
- enable manual CI start by @sfc-gh-sbekman in #319
- Add Ray launcher backend by @sfc-gh-dhung in #311
- Plot DS wall clock timers in W&B by @sfc-gh-truwase in #320
- trying to fix jinja2 failure in cpu unit tests by @sfc-gh-sbekman in #324
- fix one off bug in checkpoint saving by @sfc-gh-sbekman in #323
- continuous wandb runs by @sfc-gh-sbekman in #328
- add exit_iteration_this_run + revamp early exit logic by @sfc-gh-sbekman in #325
- Add file expiry as requested in new AOAI API by @sfc-gh-caxu in #322
- set is_resume flag much sooner by @sfc-gh-sbekman in #326
- Use ArcticForge to generate data for spec training by @sfc-gh-yewang in #332
- Add USE_RAY environment variable by @sfc-gh-dhung in #329
- Add data source integrations with Snowflake by @sfc-gh-dhung in #330
- Bump to v0.7.1 by @sfc-gh-mwyatt in #336
New Contributors
- @sfc-gh-dhung made their first contribution in #311
- @sfc-gh-truwase made their first contribution in #320
- @sfc-gh-yewang made their first contribution in #332
Full Changelog: v0.7.0...v0.7.1
ArcticTraining v0.7.0
What's Changed
- Speculator for gpt-oss by @sfc-gh-jaelee in #260
- Bump v0.6.1.dev0 by @sfc-gh-mwyatt in #268
- check `arctic_training_run` is in the `PATH` env var's paths by @sfc-gh-sbekman in #275
- fix 404 link by @sfc-gh-sbekman in #276
- liger-kernel: cleanly report when a model isn't supported by @sfc-gh-sbekman in #269
- change deepspeed defaults by @sfc-gh-sbekman in #271
- deal with torch_dtype deprecation by @sfc-gh-sbekman in #279
- another `torch_dtype => dtype` update by @sfc-gh-sbekman in #280
- fix SP size by @sfc-gh-sbekman in #282
- pynvml is deprecated by @sfc-gh-sbekman in #285
- Allow masking empty think tokens preventing the loss of thinking ability by @sfc-gh-lborchmann in #286
- Expose scheduler-specific kwargs, such as min_lr_rate by @sfc-gh-lborchmann in #284
- Fix: prevent conversion of bool to float in deepspeed config by @sfc-gh-mwyatt in #287
- add python profiler by @sfc-gh-sbekman in #288
- Optimization: Better data filter and packing performance by @sfc-gh-mwyatt in #292
- tiled mlp: auto-monkeypatch by @sfc-gh-sbekman in #290
- model-specific flop counters by @sfc-gh-sbekman in #289
- Finish porting testing_utils.py by @sfc-gh-sbekman in #291
- move `wandb` log dir out of repo's root by @sfc-gh-sbekman in #294
- `make autoformat` run only on branch's modified files by @sfc-gh-sbekman in #295
- do not log the first train iter to wandb by @sfc-gh-sbekman in #293
- [CI] modal gpus workflow by @sfc-gh-sbekman in #299
- new feature: CausalTrainer by @sfc-gh-sbekman in #210
- ALST/UlyssesSP: API wrt variable seqlen by @sfc-gh-sbekman in #298
- allow hf model config overrides by @sfc-gh-sbekman in #302
- modal ci: fix by @sfc-gh-sbekman in #303
- rename fusedadam => fused_adam by @sfc-gh-sbekman in #306
- allow hf model config overrides: take 2 by @sfc-gh-sbekman in #304
- Bump version from 0.6.1.dev0 to 0.7.0 by @sfc-gh-mwyatt in #307
Full Changelog: v0.6.0...v0.7.0
ArcticTraining v0.6.0
What's Changed
- Bump v0.0.6.dev0 by @sfc-gh-mwyatt in #213
- add a link to a new blog post by @sfc-gh-sbekman in #215
- Update CODEOWNERS by @sfc-gh-sbekman in #216
- Updated paper and project list by @sfc-gh-jrasley in #218
- Update SwiftKV Llama and Qwen to support transformers 4.53 by @sfc-gh-jrasley in #221
- [ALST] sync with transformers>=4.53 masking utils changes by @sfc-gh-sbekman in #223
- Artifact download script by @sfc-gh-jrasley in #226
- Fix package files not being included on install by @sfc-gh-mwyatt in #227
- switch to raw yaml loading for artifact download script by @sfc-gh-jrasley in #228
- Fix for custom user script relative import by @sfc-gh-mwyatt in #222
- Refactor for supported SFT datasets by @sfc-gh-mwyatt in #220
- add FA3 support by @sfc-gh-sbekman in #232
- Add evaluation method by @sfc-gh-mwyatt in #186
- Fix for long error stack traces by @sfc-gh-mwyatt in #233
- Better datasets map and filter performance by @sfc-gh-mwyatt in #234
- Refactor dataloader creation for improved maintainability (aka. don't forget about persistent_workers) by @sfc-gh-prenc in #235
- CONTRIBUTING.md: add first-good-issue link by @sfc-gh-sbekman in #236
- FA3 support: fix tflops by @sfc-gh-sbekman in #238
- report the correct device in pynvml by @sfc-gh-sbekman in #239
- Extract eval log iter condition by @sfc-gh-prenc in #237
- set max_length to max_position_embeddings by @therealnaveenkamal in #240
- Fix: Make isort see `wandb` as third party by @sfc-gh-mwyatt in #243
- ALST: add 1x H200 recipes by @sfc-gh-sbekman in #245
- [ALST] override attn mask for sdpa by @sfc-gh-sbekman in #242
- Debug/Dev Feature: Repeat small datasets to `max_length` by @sfc-gh-mwyatt in #241
- typo by @sfc-gh-sbekman in #247
- ALST: FA3 and new Liger-kernel for int64 support by @sfc-gh-sbekman in #249
- Switch swiftkv llama-70b base model from 3.1 to 3.3 by @sfc-gh-jrasley in #229
- Update arctic-txt2sql README.md by @sfc-gh-bzhai in #225
- integrate TiledFusedLogitsLoss by @sfc-gh-sbekman in #244
- Update SwiftKV sequence parallel with updated TiledFusedLogitsLoss by @sfc-gh-aqiao in #248
- [swiftkv] transformers 4.54 has deepseek_v2 now by @sfc-gh-jrasley in #251
- checkpoint resume support by @sfc-gh-jrasley in #252
- new deepspeed release by @sfc-gh-sbekman in #253
- Bug fix: reordering of SFT dataset chats by @sfc-gh-mwyatt in #264
- [eval] replace `torch.inference_mode` with `torch.no_grad` by @sfc-gh-sbekman in #265
- [SP] make eval work by @sfc-gh-sbekman in #259
- bump v0.6.0 by @sfc-gh-mwyatt in #267
New Contributors
- @sfc-gh-prenc made their first contribution in #235
- @therealnaveenkamal made their first contribution in #240
Full Changelog: v0.0.5...v0.6.0
ArcticTraining v0.0.5
What's Changed
- bump to v0.0.5.dev0 by @sfc-gh-mwyatt in #194
- add openorca and acemath datasets by @sfc-gh-aqiao in #198
- fix lr scheduler for sequence parallel by @sfc-gh-aqiao in #199
- Add random sampling from data sources by @sfc-gh-aqiao in #197
- s/UlyssesPlus/Arctic Long Sequence Training (ALST)/ by @sfc-gh-sbekman in #202
- Fix custom code import error by @sfc-gh-mwyatt in #204
- Fix malformed pyproject.toml by @sfc-gh-mwyatt in #205
- Remove unnecessary assertion by @sfc-gh-jaelee in #206
- Long-context SwiftKV training by @sfc-gh-aqiao in #196
- add ALST paper reference by @sfc-gh-sbekman in #208
- Update SwiftKV configs and data by @sfc-gh-aqiao in #209
- Generalize support for instruct datasets by @sfc-gh-mwyatt in #211
- bump v0.0.5 by @sfc-gh-mwyatt in #212
Full Changelog: v0.0.4...v0.0.5
ArcticTraining v0.0.4
What's Changed
- bump v0.0.4.dev0 by @sfc-gh-mwyatt in #180
- add `data.config.dl_num_workers` by @sfc-gh-sbekman in #177
- Avoid corrupted data cache by @sfc-gh-mwyatt in #174
- extended metrics + fake long sequences by @sfc-gh-sbekman in #181
- pre-Ulysses PR features by @sfc-gh-sbekman in #182
- Fix save_every_n_epochs supressing save_every_n_steps by @sfc-gh-lborchmann in #184
- Faster startup by @sfc-gh-mwyatt in #185
- add arctci-text2sql-r1 by @sfc-gh-bzhai in #183
- update header logo by @sfc-gh-jrasley in #187
- Fix early exit by @sfc-gh-mwyatt in #188
- self contained tests by @sfc-gh-sbekman in #189
- enabled python -m arctic_training_cli by @sfc-gh-sbekman in #191
- more GAS support + metrics-to-file logging by @sfc-gh-sbekman in #190
- Add DeepseekV2SwiftKV and reorganize swiftkv project by @sfc-gh-aqiao in #192
- Add Sequence Parallelism via Ulysses by @sfc-gh-sbekman in #45
- bump to v0.0.4 by @sfc-gh-mwyatt in #193
New Contributors
- @sfc-gh-lborchmann made their first contribution in #184
Full Changelog: v0.0.3...v0.0.4
ArcticTraining v0.0.3
New Features
- Fastest Speculative Decoding in vLLM with Arctic Inference and Arctic Training
- Snowflake Arctic Embed Joins ArcticTraining: Simple And Scalable Embedding Model Training
- DPO Trainer
What's Changed
- Refactor Data Loading by @sfc-gh-mwyatt in #41
- Improve data loading error message by @sfc-gh-mwyatt in #55
- Add unit tests for DataFactory by @sfc-gh-mwyatt in #54
- Refactor class registry by @sfc-gh-mwyatt in #56
- Switch to all fully qualified imports by @sfc-gh-mwyatt in #59
- Add Wandb callback by @sfc-gh-mwyatt in #60
- add additional W&B args by @sfc-gh-mwyatt in #61
- full deprecate scheduler.lr setting by @sfc-gh-mwyatt in #64
- Move to smaller CI model by @sfc-gh-mwyatt in #67
- info log the caching path by @sfc-gh-sbekman in #68
- Add PEFT support by @sfc-gh-mwyatt in #66
- Add tokenizer name as input to cache path creation by @sfc-gh-mwyatt in #69
- Use cached environment in actions workflows by @sfc-gh-mwyatt in #70
- Fix error with unit test env caching by @sfc-gh-mwyatt in #71
- Fix cache path collision by @sfc-gh-mwyatt in #76
- Make arctic training logging manage all logging by @sfc-gh-lmerrick in #53
- Add ZeRO3 checkpoint support for PEFT models by @sfc-gh-mwyatt in #73
- remove redundant code by @sfc-gh-sbekman in #83
- Create STYLE_GUIDE.md by @sfc-gh-sbekman in #77
- add Makefile by @sfc-gh-sbekman in #78
- cpu adam support by @sfc-gh-jrasley in #89
- add basic step timer by @sfc-gh-jrasley in #91
- Multi-Replica Generation (PR 1/4) by @sfc-gh-srajbhandari in #95
- allow newer transformers versions by @sfc-gh-jrasley in #96
- Add help message about DeepSpeed args to ArcticTraining launcher by @sfc-gh-mwyatt in #100
- Error on yaml config duplicate keys by @sfc-gh-mwyatt in #99
- fix caller import by @sfc-gh-bzhai in #103
- Resolve data cache path collision bug by @sfc-gh-mwyatt in #102
- Refactoring data generation for Spec Decoding to use the new Multi-Re… by @sfc-gh-srajbhandari in #104
- Arctic Embed in Arctic Training! by @sfc-gh-lmerrick in #107
- Fix include and exclude git LFS example code by @sfc-gh-lmerrick in #108
- Allow local data files for huggingface data sources by @sfc-gh-mwyatt in #106
- require `liger-kernel>=0.5.5` by @sfc-gh-sbekman in #109
- Switch to DistributedSampler by @sfc-gh-mwyatt in #105
- add news section by @sfc-gh-jrasley in #110
- Fix for cache path arg included values by @sfc-gh-mwyatt in #111
- Add support for user-passed data splits by @sfc-gh-mwyatt in #112
- Add dev flag to repeat data samples by @sfc-gh-mwyatt in #113
- ExCoT-DPO project by @sfc-gh-bzhai in #65
- update links by @sfc-gh-jrasley in #119
- Update DPO liger loss check by @sfc-gh-mwyatt in #118
- Add `cache_fs_type` field by @sfc-gh-mwyatt in #121
- add arxiv link in tutorial by @sfc-gh-jrasley in #120
- update readme eval by @sfc-gh-bzhai in #124
- elevate projects and add older news by @sfc-gh-jrasley in #123
- Add training metrics logging by @sfc-gh-mwyatt in #122
- Training metrics output to W&B by @sfc-gh-mwyatt in #125
- Increase max line width to 119 by @sfc-gh-mwyatt in #128
- ExCoT models are public now by @sfc-gh-jrasley in #130
- bump to v0.0.3 by @sfc-gh-jrasley in #129
- Small improvements / fixes for metrics logging by @sfc-gh-mwyatt in #127
- Fix for Liger model callback warning by @sfc-gh-mwyatt in #131
- fix rank by @sfc-gh-sbekman in #132
- another multi-node rank issue by @sfc-gh-sbekman in #133
- metrics: seqlen report by @sfc-gh-sbekman in #134
- human_format_base2_number change to 2 decimals by @sfc-gh-sbekman in #136
- Sparse Attention Recipe by @sfc-gh-srajbhandari in #135
- Update cli.py by @sfc-gh-zhyao in #139
- Fix `clean_files_older_than_n_days` not working caused by Azure API change by @sfc-gh-caxu in #140
- Add `div_length` config to SFTDataFactory by @sfc-gh-mwyatt in #142
- metrics: better seconds formatting by @sfc-gh-sbekman in #137
- Add Qwen2-SwiftKV and reorganize swiftkv project by @sfc-gh-aqiao in #145
- fix multi-epoch training by @sfc-gh-aqiao in #147
- project: authors + stableness + py versions by @sfc-gh-sbekman in #151
- Change python version for workflows by @sfc-gh-mwyatt in #154
- HfDeepSpeedConfig import by @sfc-gh-sbekman in #150
- [Makefile] new optional helper to remove unused imports by @sfc-gh-sbekman in #148
- Fix for license check hook adding multiple license by @sfc-gh-mwyatt in #156
- train iter log: add mem metrics by @sfc-gh-sbekman in #153
- an optional mem profiler by @sfc-gh-sbekman in #152
- Fix SwiftKV README by @sfc-gh-aqiao in #164
- Parallel SFT Data Packing by @sfc-gh-mwyatt in #162
- Settable pad length for SFTDataFactory by @sfc-gh-mwyatt in #163
- Human-friendly number parsing by @sfc-gh-mwyatt in #166
- Move `max_length` to base `DataConfig` by @sfc-gh-mwyatt in #168
- Fix ReadTheDocs build and add unit test by @sfc-gh-mwyatt in #169
- support mlp-variant-speculator by @sfc-gh-jaelee in #170
- Update README.md by @sfc-gh-aqiao in #173
- Add early exit kill switch by @sfc-gh-mwyatt in #175
- Allow stderr on non-print ranks by @sfc-gh-jrasley in #161
- Add data-process mode to CLI by @sfc-gh-mwyatt in #178
- Human friendly values for DeepSpeed config by @sfc-gh-mwyatt in #167
- Fix for python wheel builds by @sfc-gh-mwyatt in #179
New Contributors
- @sfc-gh-sbekman made their first contribution in #68
- @sfc-gh-lmerrick made their first contribution in #53
- @sfc-gh-srajbhandari made their first contr...