Refactors train loop, adds padded batch packer, other fixes #654

Merged
Maxusmusti merged 2 commits into fixed-speed-gpt-oss-support-freezing-fix-loss from i-show-speed-gpt-oss-changes-2
Sep 15, 2025

Conversation

@RobotSail
Member

This PR adds a number of improvements:

  1. Adds a padded batch packer
  2. Refactors the code so the training loop is simpler
  3. Simplifies batch management and aggregation
  4. Simplifies gpt-oss saving
  5. Adds tests for the sampler and batch packing algorithms
  6. Fixes gpt-oss detection during data processing

There are still a few other things to fix, but this resolves the bulk of the outstanding items.
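
For readers unfamiliar with the term, the general idea behind a padded batch packer can be sketched as follows. This is only an illustration under stated assumptions, not the implementation in this PR: the names `pack_padded_batches`, `padded_size`, and `max_tokens` are hypothetical, and the greedy longest-first strategy is an assumed choice. The sketch groups samples so that each batch, once every sample is padded to the longest one in it, stays within a token budget.

```python
from typing import List, Sequence


def padded_size(batch: Sequence[int], lengths: Sequence[int]) -> int:
    """Tokens occupied by a batch once every sample is padded to the longest one."""
    return len(batch) * max(lengths[i] for i in batch)


def pack_padded_batches(lengths: Sequence[int], max_tokens: int) -> List[List[int]]:
    """Greedily group sample indices into batches that fit a padded token budget.

    Hypothetical helper for illustration only; not the PR's actual code.
    """
    # Visit samples longest-first so samples of similar length share a batch,
    # which keeps the amount of padding low.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)

    batches: List[List[int]] = []
    current: List[int] = []
    current_max = 0

    for idx in order:
        length = lengths[idx]
        if length > max_tokens:
            raise ValueError(f"sample {idx} has {length} tokens, budget is {max_tokens}")

        new_max = max(current_max, length)
        # Adding this sample would push the padded size over the budget:
        # close the current batch and start a new one.
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current = []
            new_max = length

        current.append(idx)
        current_max = new_max

    if current:
        batches.append(current)
    return batches


lengths = [5, 3, 8, 2, 7]
batches = pack_padded_batches(lengths, max_tokens=16)
print(batches)  # [[2, 4], [0, 1, 3]]
```

In this example the two longest samples (8 and 7 tokens) share a batch that pads to 16 tokens, and the three shorter samples share a batch that pads to 15, so neither batch exceeds the budget.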

@RobotSail RobotSail changed the base branch from main to fixed-speed-gpt-oss-support-freezing-fix-loss September 12, 2025 05:59
@mergify mergify Bot added the testing, ci-failure, and dependencies labels Sep 12, 2025
@mergify
Contributor

mergify Bot commented Sep 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. @RobotSail please rebase it. https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label Sep 12, 2025
@RobotSail RobotSail force-pushed the i-show-speed-gpt-oss-changes-2 branch from 9813f8a to e41ea48 Compare September 12, 2025 06:01
@mergify mergify Bot removed the ci-failure label Sep 12, 2025
@RobotSail RobotSail force-pushed the i-show-speed-gpt-oss-changes-2 branch from e41ea48 to 0253e48 Compare September 12, 2025 19:04
@mergify mergify Bot removed the needs-rebase label Sep 12, 2025
@RobotSail RobotSail force-pushed the i-show-speed-gpt-oss-changes-2 branch from 0253e48 to ca6293b Compare September 12, 2025 19:12
Collaborator

@Maxusmusti Maxusmusti left a comment

LGTM

@mergify mergify Bot added the one-approval label Sep 15, 2025
@Maxusmusti Maxusmusti merged commit c1bb315 into fixed-speed-gpt-oss-support-freezing-fix-loss Sep 15, 2025
1 check passed
@Maxusmusti Maxusmusti deleted the i-show-speed-gpt-oss-changes-2 branch September 15, 2025 19:17
RobotSail added a commit that referenced this pull request Sep 16, 2025
* addition of padded batch packer + simplified train loop

* update tests + linting
Maxusmusti added a commit that referenced this pull request Sep 17, 2025
* Adding dequantized load support for gpt_oss models

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Update gpt oss saving with requantization

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Adjust data processing for gpt format

Signed-off-by: Mustafa Eyceoz <[email protected]>

* fix for exact quantization algorithm to replicate OpenAI quantized weights

* Speedup replicate implementation

Signed-off-by: Mustafa Eyceoz <[email protected]>

* router freezing for gpt oss

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Add corrected loss, aux loss support, and batching updates

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Cleanup unnecessary test files

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Fix linting and review feedback

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Add linting skip for mxfp4 import

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Fix unit tests with mock configs

Signed-off-by: Mustafa Eyceoz <[email protected]>

* Switch to mini trainer sampler

Signed-off-by: Mustafa Eyceoz <[email protected]>

* remove dead code + add defaults (#653)

* Refactors train loop, adds padded batch packer, other fixes (#654)

* addition of padded batch packer + simplified train loop

* update tests + linting

* fix tests x2

---------

Signed-off-by: Mustafa Eyceoz <[email protected]>
Co-authored-by: Nikhil Nayak <[email protected]>
Co-authored-by: Oleg Silkin <[email protected]>
Labels

dependencies (Pull requests that update a dependency file), one-approval, testing (Relates to testing)

2 participants