#107 Enhance/return codes by kcathey · Pull Request #128 · llnl/maestrowf

kcathey · 2018-08-03T14:41:03Z

These are the changes we discussed.

Changed DAG execute_ready_steps to return StudyStatus enum instead of a boolean conductor and maestro also accept StudyStatus and convert to return value

FrankD412 · 2018-08-03T16:41:03Z


-    study_complete = False
-    while not study_complete:
+    study_complete = StudyStatus.RUNNING


I recommend renaming this variable just to be clearer. While it is meant to check for study completion, it is not just a reflection of completion. I'd recommend something like study_status or completion_status so that it's clearer.

Yup you're right

FrankD412 · 2018-08-03T16:47:59Z

        cancel_lock_path = os.path.join(args.directory, ".cancel.lock")
        logger.info("Starting to monitor '%s'", dag.name)
-        monitor_study(dag, study_pkl[0], cancel_lock_path, args.sleeptime)
+        study_status = monitor_study(dag, study_pkl[0], cancel_lock_path, args.sleeptime)


This line fails flake8 checks -- it's longer than 79 characters.

I'll take a look

FrankD412 · 2018-08-03T16:49:51Z

@kcathey -- I'd also recommend rebasing your branch and resolving conflicts that may come up.

FrankD412 · 2018-08-03T16:51:03Z

            cancel_path = os.path.join(path, ".cancel.lock")
-            monitor_study(exec_dag, pkl_path, cancel_path, args.sleeptime)
+            # capture the StudyStatus enum to return
+            study_status = monitor_study(exec_dag, pkl_path, cancel_path, args.sleeptime)


I believe this one is also longer than 79 characters.

I'll take a look

FrankD412 · 2018-08-03T19:53:07Z

            # are no more things to run.
            logging.info("'%s' is complete. Returning.", self.name)
-            return True
+            return StudyStatus.FINISHED


@kcathey and I were talking about this condition. I found that the conductor returned 0 (FINISHED) status even when cancelled. @kcathey pointed out that this conditional gets triggered without checking for cancelled state. Kevin is going to look into a refactor to clean this up because I had initially done this to assess the state entering and leaving the status check to avoid needlessly waiting for another round of updates.

FrankD412 · 2018-08-03T19:53:54Z

+
+class StudyStatus(Enum):
+    """Workflow status enumeration"""
+    FINISHED = 0


For documentation purposes, it might be good to add inline comments here for the definitions of the enums. I plan to go back and update the other ones so their definitions are clear.

So something like this: FINISHED = 0 # The Study has finished successfully, all steps ran

to avoid initial implementaion copy and paste error

…nto enhance/return-codes * 'enhance/return-codes' of github.com:kcathey/maestrowf: Fix status.csv 'State' column writeout. (llnl#127) Updated Exception.message references to Exception.args (llnl#125) Additional debug logging for the Flux Spectrum Adapter (llnl#122) Add the generation of metadata to Study construction. (llnl#120) Updated the LULESH examples to use the LULESH git repository for cloning the repo (llnl#121) Tweaks and addition of starting to SpectrumFluxAdapter. (llnl#119) Refactor of the Study class to breakdown complex APIs (llnl#118)

kcathey

I think everything is fixed at this point.

FrankD412

Alright, this looks like it's functioning and meets all checks.

* Update setup.py to 1.1.3dev * Updated the getting started documentation * Added examples for running both lulesh specs * Flux for Spectrum bugfixes (#116) * Additional arguments can be passed through batch. * Correction of the use of extend to append. * Removal of OMPI vars from env. * Reversal of env altering. * Addition of mpi exe to batch. * Removal of -gpu flag * Addition of "allocated" to PENDING set. * Correction of the check_status method call. * Correction of the Flux import * Addition of the EnvironmentError to try/catch for check_status. * Addition of jobid to submission INFO logging. * Workflow setup fix (#117) * Fixed a workflow setup issue caused by spaces in the spec name * Additional dirname formatting * Refactor of the Study class to breakdown complex APIs (#118) * Minor docstring correction. * Change to study setup API to break out workspace creation. * Update Maestro frontend to use new Study method. * Renamed Study.setup to be better communicate its functionality. * Correction to style for configure_study method. * Moved environment application to add_step. * Split out acquiring environment elements to its own method. * Updates to staging checks. * Renamed _setup_linear and _setup_parameterized from setup to stage for clarity. * Tweaks and addition of starting to SpectrumFluxAdapter. (#119) * Updated the LULESH examples to use the LULESH git repository for cloning the repo (#121) * Add the generation of metadata to Study construction. (#120) * Addition of method and tweaks to support metadata. * Addition of call to the method for generating metadata. * Correction to encode YAML string for py3 compatibility. * Another attempt at py2-py3 compatible encoding. * Addition of load_metadata method. * Additional debug logging for the Flux Spectrum Adapter (#122) * Additional logging for debugging of the adapter. * Additional logging. * Updated Exception.message references to Exception.args (#125) * Fix status.csv 'State' column writeout. (#127) * #107 Enhance/return codes (#128) * #107 Added StudyStatus enumeration Changed DAG execute_ready_steps to return StudyStatus enum instead of a boolean conductor and maestro also accept StudyStatus and convert to return value * #128 fixed issues found in pull request * #107 added inline comments per PC * #107 pulling out _check_study_completion to avoid initial implementaion copy and paste error * #107 fixed line length issue * Bugfix that would cause the ExecutionGraph to not update when cancelled. (#131) * Addition of custom parameter generation when running studies. (#129) * Command line parameter for custom ParameterGenerators. * Addition of sample custom parameter generation. * Correction to missing parameters in SIZE. * Addition of utility method for importing custom gen py files. * Correction of syntax. * Conversion of DOS newlines to Unix. * Addition of call to check for custom code. * Correction of a bad docstring. * Addition of parameter metadata to Maestro. * Addition of parameters to study metadata. * Addition of copying of the parameter file to study workspace. * Correction to a bug for attempting a restart when a restart isn't specified. (#133) * Initial fix for steps that cannot restart. * Comments to clarify logic. * More flexible ExecutionGraph description API and logging of description. (#137) * Tweaks to how the ExecutionGraph adds descriptions. Added logging. * Updated study to use the new API and some minor refactor. * Fixed #134 * Fix to Variable verification. (#138) Fixes #123 * Updates to Maestro documentation (#114) * Updates to the README. * Added initial getting started example documentation * Addition of Quick Start documentation. (#130) * Addition of Quick Start. * Addition of cancellation status example. * Corrections to quick start and fix for build errors. * Addition of a LULESH study specification with heavier comments. (#132) * Start of a LULESH spec that's commented. * Addition of explanation of step structure. * Additional comments about steps and dependencies. * Minor tweaks to comments. * Addition of parameter block comments. * Addition of reference to launcher_token YAML * Tweaks and edits based on feedback. * Throttled workflows push steps into the ready queue multiple times. (#136) * Moved 'mark_submitted' into ExecutionGraph 'execute' * Conversion of ready_steps deque to use step names. * Capping the available slots to the length of ready_steps. * Fixes Popen output to use universal newlines. (#143) * Addition of a shell util to centralize process creation settings. * Switch to start_process method. * Addition of docstring for start_process. * Update to fix universal newline from false to true. * Addition of a check for a list and a hard set of shell to False. * Made conductor launch command a string. * Fix behavior of start_process for localscriptadapter. The shell=False flag was not being passed, nor were the env or cwd arguments being used in the actual function. * Clean up of start_process * Cleans up GitDependency errors to not rely on return codes. (#144) * Addition of a ping method. * Tweaks to error logging. * Moved specific existence exception out of return code check. * Cleanup of some merge characters. * Tweak for connectivity checking. * Addition of info to the error message. * Removal of git return codes since they don't mean anything. * Update version number to 1.1.3.

llnl#107 Added StudyStatus enumeration

fa9d5b8

Changed DAG execute_ready_steps to return StudyStatus enum instead of a boolean conductor and maestro also accept StudyStatus and convert to return value

FrankD412 reviewed Aug 3, 2018

View reviewed changes

FrankD412 requested changes Aug 3, 2018

View reviewed changes

FrankD412 reviewed Aug 3, 2018

View reviewed changes

FrankD412 assigned kcathey Aug 3, 2018

FrankD412 added the enhancement label Aug 3, 2018

kcathey and others added 2 commits August 3, 2018 10:54

llnl#128 fixed issues found in pull request

703853f

Merge branch 'develop' into enhance/return-codes

4db8930

FrankD412 reviewed Aug 3, 2018

View reviewed changes

kcathey added 4 commits August 3, 2018 13:54

llnl#107 added inline comments per PC

5328f28

llnl#107 pulling out _check_study_completion

e0726a3

to avoid initial implementaion copy and paste error

llnl#107 fixed line length issue

7f286a6

kcathey commented Aug 3, 2018

View reviewed changes

FrankD412 approved these changes Aug 3, 2018

View reviewed changes

FrankD412 merged commit 38a91b0 into llnl:develop Aug 3, 2018

kcathey mentioned this pull request Aug 3, 2018

Add return codes to the conductor #107

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

#107 Enhance/return codes#128

#107 Enhance/return codes#128
FrankD412 merged 7 commits intollnl:developfrom
kcathey:enhance/return-codes

kcathey commented Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018 •

edited

Loading

Uh oh!

kcathey Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018

Uh oh!

kcathey Aug 3, 2018

Uh oh!

FrankD412 commented Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018

Uh oh!

kcathey Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018

Uh oh!

kcathey Aug 3, 2018

Uh oh!

kcathey left a comment

Uh oh!

FrankD412 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

kcathey commented Aug 3, 2018

Uh oh!

FrankD412 Aug 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

FrankD412 commented Aug 3, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kcathey left a comment

Choose a reason for hiding this comment

Uh oh!

FrankD412 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

FrankD412 Aug 3, 2018 •

edited

Loading