#107 Enhance/return codes#128
Conversation
Changed DAG execute_ready_steps to return StudyStatus enum instead of a boolean conductor and maestro also accept StudyStatus and convert to return value
|
|
||
| study_complete = False | ||
| while not study_complete: | ||
| study_complete = StudyStatus.RUNNING |
There was a problem hiding this comment.
I recommend renaming this variable just to be clearer. While it is meant to check for study completion, it is not just a reflection of completion. I'd recommend something like study_status or completion_status so that it's clearer.
| cancel_lock_path = os.path.join(args.directory, ".cancel.lock") | ||
| logger.info("Starting to monitor '%s'", dag.name) | ||
| monitor_study(dag, study_pkl[0], cancel_lock_path, args.sleeptime) | ||
| study_status = monitor_study(dag, study_pkl[0], cancel_lock_path, args.sleeptime) |
There was a problem hiding this comment.
This line fails flake8 checks -- it's longer than 79 characters.
|
@kcathey -- I'd also recommend rebasing your branch and resolving conflicts that may come up. |
| cancel_path = os.path.join(path, ".cancel.lock") | ||
| monitor_study(exec_dag, pkl_path, cancel_path, args.sleeptime) | ||
| # capture the StudyStatus enum to return | ||
| study_status = monitor_study(exec_dag, pkl_path, cancel_path, args.sleeptime) |
There was a problem hiding this comment.
I believe this one is also longer than 79 characters.
| # are no more things to run. | ||
| logging.info("'%s' is complete. Returning.", self.name) | ||
| return True | ||
| return StudyStatus.FINISHED |
There was a problem hiding this comment.
@kcathey and I were talking about this condition. I found that the conductor returned 0 (FINISHED) status even when cancelled. @kcathey pointed out that this conditional gets triggered without checking for cancelled state. Kevin is going to look into a refactor to clean this up because I had initially done this to assess the state entering and leaving the status check to avoid needlessly waiting for another round of updates.
|
|
||
| class StudyStatus(Enum): | ||
| """Workflow status enumeration""" | ||
| FINISHED = 0 |
There was a problem hiding this comment.
For documentation purposes, it might be good to add inline comments here for the definitions of the enums. I plan to go back and update the other ones so their definitions are clear.
There was a problem hiding this comment.
So something like this: FINISHED = 0 # The Study has finished successfully, all steps ran
to avoid initial implementaion copy and paste error
…nto enhance/return-codes * 'enhance/return-codes' of github.com:kcathey/maestrowf: Fix status.csv 'State' column writeout. (llnl#127) Updated Exception.message references to Exception.args (llnl#125) Additional debug logging for the Flux Spectrum Adapter (llnl#122) Add the generation of metadata to Study construction. (llnl#120) Updated the LULESH examples to use the LULESH git repository for cloning the repo (llnl#121) Tweaks and addition of starting to SpectrumFluxAdapter. (llnl#119) Refactor of the Study class to breakdown complex APIs (llnl#118)
kcathey
left a comment
There was a problem hiding this comment.
I think everything is fixed at this point.
FrankD412
left a comment
There was a problem hiding this comment.
Alright, this looks like it's functioning and meets all checks.
* Update setup.py to 1.1.3dev * Updated the getting started documentation * Added examples for running both lulesh specs * Flux for Spectrum bugfixes (#116) * Additional arguments can be passed through batch. * Correction of the use of extend to append. * Removal of OMPI vars from env. * Reversal of env altering. * Addition of mpi exe to batch. * Removal of -gpu flag * Addition of "allocated" to PENDING set. * Correction of the check_status method call. * Correction of the Flux import * Addition of the EnvironmentError to try/catch for check_status. * Addition of jobid to submission INFO logging. * Workflow setup fix (#117) * Fixed a workflow setup issue caused by spaces in the spec name * Additional dirname formatting * Refactor of the Study class to breakdown complex APIs (#118) * Minor docstring correction. * Change to study setup API to break out workspace creation. * Update Maestro frontend to use new Study method. * Renamed Study.setup to be better communicate its functionality. * Correction to style for configure_study method. * Moved environment application to add_step. * Split out acquiring environment elements to its own method. * Updates to staging checks. * Renamed _setup_linear and _setup_parameterized from setup to stage for clarity. * Tweaks and addition of starting to SpectrumFluxAdapter. (#119) * Updated the LULESH examples to use the LULESH git repository for cloning the repo (#121) * Add the generation of metadata to Study construction. (#120) * Addition of method and tweaks to support metadata. * Addition of call to the method for generating metadata. * Correction to encode YAML string for py3 compatibility. * Another attempt at py2-py3 compatible encoding. * Addition of load_metadata method. * Additional debug logging for the Flux Spectrum Adapter (#122) * Additional logging for debugging of the adapter. * Additional logging. * Updated Exception.message references to Exception.args (#125) * Fix status.csv 'State' column writeout. (#127) * #107 Enhance/return codes (#128) * #107 Added StudyStatus enumeration Changed DAG execute_ready_steps to return StudyStatus enum instead of a boolean conductor and maestro also accept StudyStatus and convert to return value * #128 fixed issues found in pull request * #107 added inline comments per PC * #107 pulling out _check_study_completion to avoid initial implementaion copy and paste error * #107 fixed line length issue * Bugfix that would cause the ExecutionGraph to not update when cancelled. (#131) * Addition of custom parameter generation when running studies. (#129) * Command line parameter for custom ParameterGenerators. * Addition of sample custom parameter generation. * Correction to missing parameters in SIZE. * Addition of utility method for importing custom gen py files. * Correction of syntax. * Conversion of DOS newlines to Unix. * Addition of call to check for custom code. * Correction of a bad docstring. * Addition of parameter metadata to Maestro. * Addition of parameters to study metadata. * Addition of copying of the parameter file to study workspace. * Correction to a bug for attempting a restart when a restart isn't specified. (#133) * Initial fix for steps that cannot restart. * Comments to clarify logic. * More flexible ExecutionGraph description API and logging of description. (#137) * Tweaks to how the ExecutionGraph adds descriptions. Added logging. * Updated study to use the new API and some minor refactor. * Fixed #134 * Fix to Variable verification. (#138) Fixes #123 * Updates to Maestro documentation (#114) * Updates to the README. * Added initial getting started example documentation * Addition of Quick Start documentation. (#130) * Addition of Quick Start. * Addition of cancellation status example. * Corrections to quick start and fix for build errors. * Addition of a LULESH study specification with heavier comments. (#132) * Start of a LULESH spec that's commented. * Addition of explanation of step structure. * Additional comments about steps and dependencies. * Minor tweaks to comments. * Addition of parameter block comments. * Addition of reference to launcher_token YAML * Tweaks and edits based on feedback. * Throttled workflows push steps into the ready queue multiple times. (#136) * Moved 'mark_submitted' into ExecutionGraph 'execute' * Conversion of ready_steps deque to use step names. * Capping the available slots to the length of ready_steps. * Fixes Popen output to use universal newlines. (#143) * Addition of a shell util to centralize process creation settings. * Switch to start_process method. * Addition of docstring for start_process. * Update to fix universal newline from false to true. * Addition of a check for a list and a hard set of shell to False. * Made conductor launch command a string. * Fix behavior of start_process for localscriptadapter. The shell=False flag was not being passed, nor were the env or cwd arguments being used in the actual function. * Clean up of start_process * Cleans up GitDependency errors to not rely on return codes. (#144) * Addition of a ping method. * Tweaks to error logging. * Moved specific existence exception out of return code check. * Cleanup of some merge characters. * Tweak for connectivity checking. * Addition of info to the error message. * Removal of git return codes since they don't mean anything. * Update version number to 1.1.3.
These are the changes we discussed.