Fix flux job cancellation #428

Merged
jwhite242 merged 4 commits into llnl:develop from jwhite242:bugfix/flux_cancel
Oct 30, 2023

Conversation

@jwhite242
Collaborator

@jwhite242 jwhite242 commented Oct 4, 2023

Update cancellation in the 0.49 and 0.26 flux adapters to handle the recent change to submit, which converts flux JobID objects to native types.

Update general cancellation behavior:

  • Check for in-progress steps before declaring a cancellation successful, so that the declaration is delayed until the actual final states can be serialized
  • Update the cancel logic to mirror the failure logic: mark every step downstream of a cancelled step as cancelled too
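
The two behaviors above can be illustrated with a minimal sketch. This is not Maestro's actual implementation: the `State` enum, the `edges` adjacency mapping, and the functions `cancel_downstream` and `cancellation_complete` are hypothetical names chosen for illustration only.

```python
from collections import deque
from enum import Enum


class State(Enum):
    # Hypothetical step states; Maestro's real state names may differ.
    INITIALIZED = "INITIALIZED"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    CANCELLED = "CANCELLED"


# States treated as "in progress" for this sketch (an assumption).
IN_PROGRESS = {State.RUNNING}


def cancel_downstream(edges, states, cancelled_step):
    """Mark cancelled_step and every step reachable from it as CANCELLED.

    edges maps a step name to the names of its direct children.
    Returns the set of step names that were marked.
    """
    seen = {cancelled_step}
    queue = deque([cancelled_step])
    while queue:
        step = queue.popleft()
        states[step] = State.CANCELLED
        for child in edges.get(step, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def cancellation_complete(states):
    """Return True only when no step is still in progress."""
    return not any(state in IN_PROGRESS for state in states.values())


# Small example graph: a -> b, a -> c, b -> d, c -> d, and an unrelated step e.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
states = {
    "a": State.FINISHED,
    "b": State.RUNNING,
    "c": State.INITIALIZED,
    "d": State.INITIALIZED,
    "e": State.RUNNING,
}

cancel_downstream(edges, states, "c")
# Steps b and e are still running here, so the study should not yet be
# reported as cancelled.
print(cancellation_complete(states))
```

In this sketch the study would only be reported as cancelled once the running steps reach a final state, which is the point at which their states can be serialized accurately.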

@jwhite242
Collaborator Author

@FrankD412 You may want to revoke/revisit that review, as testing led me to chase down some other oddities and make more significant behavioral changes to step cancellation in general.

@jwhite242 jwhite242 merged commit 8487230 into llnl:develop Oct 30, 2023
jwhite242 added a commit that referenced this pull request Dec 12, 2023
1.1.10 Release (#432)

* Sync up read the docs config with dev environments using poetry (#399)
* Print usage on command line when no args are provided (#404)
* Add sacct fallback to slurm adapter to improve robustness of job tracking (#405)
* Update Flux Job State mappings for flux versions >= 0.26 (#407)
* Bump certifi from 2021.10.8 to 2022.12.7 to address security issue (#409)
* Bump cryptography from 37.0.1 to 38.0.3 to address security issue (#410)
* Add missing shebang in unscheduled scripts from lsf adapter (#411)
* Update poetry lockfile to address dependabot flagged security issues (#412)
* Fix for Dockerfile smell DL3006 (#418)
* Port Maestro documentation to mkdocs and expand coverage of features and tutorials (#403)
* Update version info to be driven from pyproject.toml exclusively, and hook up to command line (#419)
* Pin mermaid to < 10.x due to api change (#422)
* Bump lock file certifi from 2022.12.7 to 2023.7.22 to address security issue (#426)
* Refactor flux adapter to avoid using pickle to talk to flux brokers installed in external environments (#415)
   Also adds flux integration tests to exercise against real flux brokers
* Add pager functionality to status command (#420)
* Patch broken flux job cancellation (#428)
* Insulate slurm adapters from user customization of squeue and sacct output formats (#431)
   Also adds live unit and integration tests for slurm adapter

---------

Co-authored-by: Francesco Di Natale <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Bruno P. Kinoshita <[email protected]>
Co-authored-by: Charles Doutriaux <[email protected]>
Co-authored-by: Giovanni Rosa <[email protected]>
Co-authored-by: Brian Gunnarson <[email protected]>
jwhite242 added a commit that referenced this pull request Feb 6, 2024
1.1.10 Release (#432)