Skip to content

Update flux exclusive behavior modify task info passed to allocation/launcher#467

Merged
jwhite242 merged 13 commits intodevelopfrom
feature/exclusive_mapping
Mar 19, 2026
Merged

Update flux exclusive behavior modify task info passed to allocation/launcher#467
jwhite242 merged 13 commits intodevelopfrom
feature/exclusive_mapping

Conversation

@jwhite242
Copy link
Copy Markdown
Collaborator

@jwhite242 jwhite242 commented Mar 12, 2026

Updates handling of exclusive keys to modify the batch directives and launcher rendering

  • Prunes everything but nodes from allocations
  • If set on launcher, overrides any cores per task, gpu flags, leaving nodes and tasks only
  • Jobspec creation via Python api now sets num_slots = nodes when exclusive is set, matching flux behavior

Updates default renderings:

  • cores per task omitted from launcher if not set in step keys
  • removes default 'nodes=1' in batch header/directives to sync with step keys
  • retains required >=1 for tasks/cores per task in jobspec creation but omits from rendered batch scripts to sync with flux default behaviors

Unblocks scheduling to nodes with configurable resources: subdividing gpus happens after allocation startup, and prior behavior of attaching tasks/gpus to allocation would cause flux to reject such jobs (unsatisfiable).

@jwhite242 jwhite242 changed the title Update flux exclusive behavior to prune tasks/cores/gpus from allocat… Update flux exclusive behavior modify task info passed to allocation/launcher Mar 12, 2026
…equirements to set nodes if exclusive also set
ngpus_per_slot = int(ceil(ngpus / nodes))
else:
ngpus_per_slot = None
LOGGER.warn(f"NODES recieved: {nodes}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo and also why is it a wraning? Shouldn't this be an info? (unless you want it in the else)


args = ["flux", "run", "-n", str(procs)]

args = ["flux", "run"] #, "-n", str(procs)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep the commented out bit?

if nodes is not None and nodes != '':
args.append("-N")
args.append(str(nodes))
# ntasks = nodes if nodes else 1
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment do we keep this?

# args.append(str(ntasks))

# NOTE: should we raise an exception here instead of just logging the error?
# better pre-run validation might be more useful...
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I agree I'd raise an error since we log it, or downgrade the log to a warning.


if procs:
args.append("-n")
args.append(str(procs))
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.append(f"- n {procs}") ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

convention on all the adapters so far is each arg as a separate list item, following some of the subprocessing style arguments for shell commands, so going to leave it this way to be consistent

nodes = None
if nodes is not None and nodes != '': # Have to account for empty string!
nodes = int(nodes)
# if not isinstance(nodes, int):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cleanup?


# Calculate nprocs
ncores = cores_per_task * nodes
ncores = cores_per_task * processors # What was this check doing?
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems like a big change, was it broken before?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

allowing nodes to be 0 or None (never could be before) broke it.

will likely remove it entirely later.. not clear this check is useful given there's no place to eval against the machine... might only catch negative values now, which will be better checked upstream after future refactoring.

conf:
resource:
rediscover: True # Flux wants 'true', not 'True' that python bool serializes to
rediscover: True
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comment?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray debris from prior pr: already fixed it so it's not a user facing problem anymore

launcher_args:
setopt:
fastload:
fastload:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

want the new space?

conf:
resource:
rediscover: "true" # Flux wants 'true', not 'True' that python bool serializes to
rediscover: "true"
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so string here but True above? Is it a flux issue?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

either one works: it's json syntax. see 'render_additional_args' in the flux0_49_0.py for explanation

@jwhite242 jwhite242 merged commit 73fd565 into develop Mar 19, 2026
14 checks passed
jwhite242 added a commit that referenced this pull request Mar 26, 2026
* Extra Flux Options  (#459)
* Add autoyes to cancel command  (#461)
* Add compact, sortable 'hashing' strategy for step ids/workspaces (#465)
* Update flux exclusive behavior modify task info passed to allocation/launcher (#467)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants