Open
57 commits
3ba67fd
Update setup.py to 1.1.3dev
FrankD412 Jun 4, 2018
fe3b1ec
Updated the getting started documentation
jsemler Jun 26, 2018
bc47d46
Added examples for running both lulesh specs
jsemler Jun 26, 2018
1122fb2
Flux for Spectrum bugfixes (#116)
FrankD412 Jul 4, 2018
b05d4c8
Workflow setup fix (#117)
jsemler Jul 7, 2018
a08cf6e
Refactor of the Study class to breakdown complex APIs (#118)
FrankD412 Jul 20, 2018
6fbe0b1
Tweaks and addition of starting to SpectrumFluxAdapter. (#119)
FrankD412 Jul 23, 2018
aa28222
Updated the LULESH examples to use the LULESH git repository for clon…
jsemler Jul 26, 2018
e5c8ba1
Add the generation of metadata to Study construction. (#120)
FrankD412 Jul 26, 2018
c144d63
Additional debug logging for the Flux Spectrum Adapter (#122)
FrankD412 Jul 31, 2018
ceb9f9b
Updated Exception.message references to Exception.args (#125)
jsemler Aug 1, 2018
8b25484
Fix status.csv 'State' column writeout. (#127)
tadesautels Aug 3, 2018
38a91b0
#107 Enhance/return codes (#128)
kcathey Aug 3, 2018
bb3b49d
Bugfix that would cause the ExecutionGraph to not update when cancell…
FrankD412 Aug 8, 2018
b72bacb
Addition of custom parameter generation when running studies. (#129)
FrankD412 Aug 8, 2018
786d523
Correction to a bug for attempting a restart when a restart isn't spe…
FrankD412 Aug 17, 2018
fceeffa
More flexible ExecutionGraph description API and logging of descripti…
FrankD412 Aug 17, 2018
8b7977b
Fix to Variable verification. (#138)
FrankD412 Aug 20, 2018
44d873d
Updates to Maestro documentation (#114)
FrankD412 Aug 20, 2018
fa988bf
Throttled workflows push steps into the ready queue multiple times. (…
FrankD412 Aug 20, 2018
2ac39fd
Fixes Popen output to use universal newlines. (#143)
FrankD412 Sep 19, 2018
92d2b95
Cleans up GitDependency errors to not rely on return codes. (#144)
FrankD412 Sep 22, 2018
0b0d333
Update version number to 1.1.3.
Sep 29, 2018
fd990ce
Addition of user enabled workspace hashing (#145)
FrankD412 Sep 30, 2018
9c401aa
Update setup.py to 1.1.4dev
FrankD412 Oct 5, 2018
31ebde8
More generalized FluxScriptAdapter (#149)
FrankD412 Oct 12, 2018
3c26982
README tweak to update quickstart link. (#139)
FrankD412 Oct 21, 2018
3857ce2
typos. fixes #141 (#154)
gonsie Oct 23, 2018
bd9624e
Correction of flake8 style errors [new version of flake8].
Oct 31, 2018
91dbe65
Update to setup.py to reflect dev version 1.0
Oct 31, 2018
a940c70
Correction to safe pathing for missed cases and make_safe_path enhanc…
FrankD412 Nov 3, 2018
0b0c58f
Correction to fix the format of output status time to avoid a comma t…
FrankD412 Nov 5, 2018
824d113
Removal of _stage_linear since it is now not needed. (#156)
FrankD412 Nov 6, 2018
934ce97
Addition of pargs for passing parameters to custom parameter generati…
FrankD412 Nov 8, 2018
ff39d1e
do not overwrite log file. (#162)
robinson96 Nov 13, 2018
c52540e
Added confirmation message after launching a study (#163)
jsemler Nov 15, 2018
1b5c55c
Enhancements to store relative pathing in the metadata file. (#165)
FrankD412 Dec 12, 2018
4bacb28
Addition of tag to LULESH git dependency. (#169)
FrankD412 Jan 8, 2019
5ce0674
Script Adapter Plugin (#167) (#170)
kcathey Jan 18, 2019
b411a72
PyYAML vulnerability fix (#171)
FrankD412 Feb 2, 2019
f477dd4
Minor tweak to indentation for flake8 failure.
Feb 6, 2019
37311fd
fixed pyyaml to requirements (#172)
kcathey Mar 1, 2019
b318d06
Addition of a loader to the yaml load call. (#174)
FrankD412 Mar 8, 2019
ad6e1cd
Correction to install enum34 for Python versions < 3.4 (#176)
FrankD412 Mar 19, 2019
36291d6
Addition of a Dockerfile for tutorials and ease of trying out. (#178)
FrankD412 Mar 24, 2019
cc5278d
Take out shebang from shell definition and add it when script is writ…
koning Apr 18, 2019
8e10b8f
Tweaks to fix malformed log statements. (#182)
FrankD412 Apr 18, 2019
be8536a
Correction to message when stating no to launch.
Apr 19, 2019
494b873
Addition of enumeration docstrings.
Dec 12, 2018
f9c5a54
Addition of NOOP state for workflow steps.
Dec 12, 2018
db97000
Addition of post step return enums.
Apr 19, 2019
1bdcf98
Enhance shell batch setting to apply to scheduler scripts. (#183)
Apr 25, 2019
e55ac03
Fixes the addition of the shebang header for SLURM (#184)
Apr 26, 2019
c2e46c8
Correction to an accidental reassignment of cmd.
FrankD412 May 1, 2019
2f5a47e
Removal of an assignment of self._exec in SLURM adapter.
May 1, 2019
b65ed46
Change to transition adapter returns to Record objects. (#177)
May 1, 2019
35cfc39
Merge branch 'develop' into feature/post_steps
May 2, 2019
23 changes: 22 additions & 1 deletion .gitignore
@@ -1,4 +1,3 @@
*.pyc
build/
dist/
*egg-info/
@@ -12,3 +11,25 @@ docs/html/

# Testing output
.tox/
testing
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/


*.py[cod]
__pycache__/
htmlcov/
wheelhouse/

pylint_*.txt

#pycharm
.idea/
6 changes: 6 additions & 0 deletions Dockerfile
@@ -0,0 +1,6 @@
FROM ubuntu
LABEL maintainer="Francesco Di Natale [email protected]"

RUN apt-get update && apt-get install -y python python-pip git
ADD . /maestrowf
RUN pip install -U /maestrowf
26 changes: 26 additions & 0 deletions Pipfile
@@ -0,0 +1,26 @@
[[source]]
url = "https://pypi.python.org/simple"
verify_ssl = true
name = "pypi"

[packages]
"enum34" = "*"
filelock = "*"
six = "*"
tabulate = "*"
Fabric = "*"
PyYAML = ">= 4.2b1"
maestrowf = {path = "."}

[dev-packages]
"flake8" = "*"
pydocstyle = "*"
pylint = "*"
tox = "*"
coverage = "*"
sphinx_rtd_theme = "*"
Sphinx = "*"
pytest = "*"

[requires]
python_version = "3.6"
606 changes: 606 additions & 0 deletions Pipfile.lock

Large diffs are not rendered by default.

170 changes: 25 additions & 145 deletions README.md
@@ -5,139 +5,31 @@
[![Stars](https://img.shields.io/github/stars/LLNL/maestrowf.svg)](https://github.com/LLNL/maestrowf/stargazers)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/LLNL/maestrowf/master/LICENSE)

A Python package that implements the workflow and run specification. The
package provides users with a generalized way to define a workflow, configure
parameter sweeps, and manage dependencies.
## Introduction

MaestroWF is designed with the following core principles in mind:
Maestro Workflow Conductor is a Python tool and library for specifying and automating multi-step computational workflows both locally and on supercomputers. Maestro parses a human-readable YAML specification that is self-documenting and portable from one user and environment to another.

##### Reproducibility
All simulation studies should be easily reproducible with just a single (or
small set of) file(s). Person A should be able to hand off to Person B without
large amounts of effort.
On the backend, Maestro implements a set of standard interfaces and data structures for handling "study" construction. These objects offer you the ability to use Maestro as a library, and construct your own workflows that suit your own custom needs. We also offer other structures that make portable execution on various schedulers much easier than porting scripts by hand.

##### Repeatability
All simulation studies should be easily repeatable. That is to say, it is not
enough to reproduce old studies -- executing the same exact flow on new studies
is just as important and should be easy to achieve in a simple manner.

##### Self-Documentation

It is not enough that a workflow runs. How you get to results is just as
important as the results themselves. Even more important is documentation of
how to execute studies and of what a workflow is doing at each step.

##### Consistency

Standard documentation and management of studies allows for an ecosystem to
be built around a common infrastructure. This concept allows for new tools and
services to be provided (in most cases) in a manner transparent to the end
user. Even more so, consistency allows different users to communicate about
a workflow using the same language and core concepts.

##### Dependency Management

An expandable framework for pulling dependencies from a wide array of different
sources. So long as a programming interface can be defined for acquiring a
dependency it can be added and managed in a study.

----------------

## External Information and Documentation

We are actively collecting and documenting requirements and user stories. If
you'd like to contribute information about your own use cases and workflow
process, please see the links below. Generally, we separate requirements into
two categories: study and workflow definition, and simulation management.

**External Location for requirements pending.**

##### Study and Workflow Definition
### Core Concepts

Anything related to describing the definition of the methodology and process.
These requirements currently directly refer to the YAML study specification,
which is a general way to describe workflow processes, their computing
environment, and the steps in the methodology for producing results.
There are many definitions of workflow, so we try to keep it simple and define the term as follows:
```
A set of high level tasks to be executed in some order, with or without dependencies on each other.
```
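
The definition above can be made concrete with a small sketch. It assumes a workflow is given as a plain dictionary mapping each task to the tasks it depends on, and it computes one valid execution order with Kahn's algorithm. The function `execution_order` and the task names are hypothetical and are not part of Maestro's API.

```python
from collections import deque


def execution_order(dependencies):
    """Return task names in an order that respects their dependencies.

    `dependencies` maps each task to the list of tasks it depends on.
    Every task, including those with no dependencies, must appear as a key.
    """
    # Count unmet dependencies per task and record which tasks wait on which.
    unmet = {task: len(deps) for task, deps in dependencies.items()}
    dependents = {task: [] for task in dependencies}
    for task, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(task)

    # Tasks with no dependencies can run immediately.
    ready = deque(sorted(task for task, count in unmet.items() if count == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        # Finishing a task may unblock the tasks that depend on it.
        for child in dependents[task]:
            unmet[child] -= 1
            if unmet[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("workflow contains a dependency cycle")
    return order


# Hypothetical workflow; the task names are illustrative only.
workflow = {
    "fetch": [],
    "build": ["fetch"],
    "run": ["build"],
    "report": ["run"],
    "notify": [],
}
print(execution_order(workflow))
```

Tasks without dependencies on each other (here `fetch` and `notify`) may run in any relative order; the sketch simply picks one deterministic ordering.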

##### Simulation Management
We have designed Maestro around the core concept of what we call a "study". A study is defined as a set of steps that are executed (a workflow) over a set of parameters. A study in Maestro's context is analogous to an actual tangible scientific experiment, which has a set of clearly defined and repeatable steps which are repeated over multiple specimens.

Functional requirements about what management capabilities the tool must be
able to perform. Capabilities such as automatic job tracking, job restarts, and
other functionality that a user would expect a backend system to handle without
user intervention.
Maestro's core tenets are defined as follows:

----------------
##### Repeatability
A study should be easily repeatable. Like any well-planned and implemented science experiment, the steps themselves should be executed the exact same way each time a study is run over each set of parameters or over different runs of the study itself.

## MaestroWF Core concepts

The foundations of the MaestroWF package are built on classes designed to
represent a few high level concepts which aim to have extremely clear APIs:
* A ```StudyEnvironment``` class that contains all data representing variables,
sourcing scripts, and dependencies that the Study requires to run.
* A ```ParameterGenerator``` class that contains all parameters, which
yields ```Combination``` objects that represent a valid combination of parameters
to be used in a single instance of a Study.
* A ```Study``` class (derived from a ```DAG```) which represents the high level
parameterized workflow and constructs the full study from parameters and
environment objects that it stores.

### Environment

The environment of a Study is represented by two classes: the ```StudyEnvironment```
and ```ParameterGenerator``` classes.

#### StudyEnvironment
The ```StudyEnvironment``` class stores all of the fundamental items a user
expects in the environment when executing a particular study. These items include:
* Variables
* Scripts
* Dependencies

Each of the items stored within the ```StudyEnvironment``` is derived from the
appropriate abstract class with the appropriate interface. Each abstract type
requires that a derived class know how to apply itself to the item being passed to it;
if it must acquire some external item, it must provide the appropriate method to
do so. This design aims to make a study much easier to repeat (and,
with metadata, easy to reproduce).

#### ParameterGenerator
The goal of the ```ParameterGenerator``` class is to provide one centralized location
for managing and storing parameters. The implementation of the ParameterGenerator,
currently, is very basic. It takes lists of parameters and uses those to construct
combinations. Essentially, if you were to view this as an Excel table, you would
have a row for each valid combination you wanted to study.

The other goal is to make it so that by having the ParameterGenerator manage
parameters, functionality can be added without affecting how the end user interacts
with this class. The ParameterGenerator has an Iterator built in and will generate
each combination one by one. The end user should NEVER SEE AN INVALID COMBINATION.
Because this class generates the combinations as specified by the parameters added
(eventually with types or enforced inheritance), it opens up being able to quietly
change how this class generates its combinations. The iterable interface that the
end user sees will remain constant, allowing the internal workings of
the ```ParameterGenerator``` to remain abstracted.
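
The row-per-combination idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Maestro's implementation: the class `RowParameterGenerator` and its `add_parameter` method are hypothetical names, and the sketch assumes every parameter is supplied as a list of equal length so that index `i` across all lists forms one valid combination.

```python
class RowParameterGenerator:
    """Hypothetical stand-in that mimics the table-like behavior described above."""

    def __init__(self):
        # Maps a parameter name to its column of values.
        self._columns = {}

    def add_parameter(self, name, values):
        """Add one column; all columns must have the same number of rows."""
        values = list(values)
        lengths = {len(column) for column in self._columns.values()}
        if lengths and len(values) not in lengths:
            raise ValueError("all parameters must have the same number of values")
        self._columns[name] = values

    def __iter__(self):
        """Yield one dictionary per row, i.e. per valid combination."""
        names = list(self._columns)
        for row in zip(*(self._columns[name] for name in names)):
            yield dict(zip(names, row))


# Illustrative parameter names and values only.
generator = RowParameterGenerator()
generator.add_parameter("SIZE", [10, 20])
generator.add_parameter("ITERATIONS", [100, 200])
for combination in generator:
    print(combination)
```

Because callers only see the iterator, the way rows are produced could change internally without affecting code that consumes the combinations.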

### Study

The ```Study``` class is part of the meat and potatoes of this whole package. A
Study object is where the major moving parts come together. These moving
parts include:
- ParameterGenerator for getting combinations of user parameters
- StudyEnvironment for managing and applying the environment to studies
- Study flow, which is a DAG of the abstract workflow

The class is responsible for a number of the major key steps in study setup
as well. Those responsibilities include (but are not limited to):
- Setting up the workspace where a simulation campaign will be run.
- Applying the StudyEnvironment to the abstract flow DAG:
- Creating the global workspace for a study.
- Setting up the parameterized workspaces for each combination.
- Acquiring dependencies as specified in the StudyEnvironment.
- Intelligently constructing the expanded ExecutionDAG to be able to:
- Recognize when a step executes in a parameterized workspace
- Recognize when a step executes in the global workspace
- Expanding the abstract flow to the full set of specified parameters.
##### Consistent
Studies should be consistently documented and able to be run in a consistent fashion. The removal of variation in the process means fewer mistakes when executing studies, ease of picking up studies created by others, and uniformity in defining new studies.

##### Self-documenting
Documentation is as important in computational studies as it is in physical science. The YAML specification defined by Maestro provides a few required keys that encourage human-readable documentation. Even further, the specification itself documents a complete workflow.

----------------

@@ -161,41 +53,29 @@ Once set up, test the environment. The paths should point to a virtual environme
$ which python
$ which pip

### Installation

For general installation, you can install MaestroWF using the following:

$ pip install maestrowf

If you plan to develop on MaestroWF, install the repository directly using:

$ pip install -r requirements.txt
$ pip install -e .

----------------

## Quickstart Example

MaestroWF comes packed with a basic example using LULESH, a proxy application provided
by LLNL. Information and source code for LULESH can be found [here](https://codesign.llnl.gov/lulesh.php).

The example performs the following workflow locally:
- Download LULESH from the webpage linked above and decompress it.
- Substitute all necessary variables with their serial compilers and make LULESH.
- Execute a small parameter sweep of varying size and iterations (a simple sensitivity study)

In order to execute the sample study simply execute from the root directory of the repository:

$ maestro ./samples/lulesh/lulesh_sample1.yaml

When prompted, reply in the affirmative:

$ Would you like to launch the study?[yn] y

Currently, there is no way to monitor the status of a running study. However, you can monitor the output path which is placed in the ```sample_output/lulesh/``` directory.
### Quickstart Example

NOTE: This example can only be executed on Unix systems currently because it makes use of ```sed``` and ```curl```.
MaestroWF comes packed with a basic example using LULESH, a proxy application provided by LLNL. You can find the Quick Start guide [here](https://maestrowf.readthedocs.io/en/latest/quick_start.html#).

----------------

## Contributors
Many thanks go to MaestroWF's [contributors](https://github.com/LLNL/maestrowf/graphs/contributors).

If you have any questions, please [open a ticket](https://github.com/llnl/maestrowf/issues).
If you have any questions or would like to submit a feature request, please [open a ticket](https://github.com/llnl/maestrowf/issues).

----------------

53 changes: 53 additions & 0 deletions docs/source/getting_started.rst
@@ -0,0 +1,53 @@
Getting Started
================

Maestro Docker Container
************************

In order to set up the Docker container execute the following from the root of the Maestro repository::

$ docker build -t maestrowf .

To launch the interactive shell of the Ubuntu image simply run::

$ docker run -it maestrowf

Once inside the Docker container, the following should bring up help::

$ maestro -h

Installing MaestroWF
*********************

MaestroWF can be installed via pip outside of Docker with the following::

$ pip install maestrowf

.. note:: Using a `virtualenv <https://virtualenv.pypa.io/en/stable/>`_ is recommended.

Once installed run::

$ maestro -h

usage: maestro [-h] [-l LOGPATH] [-d DEBUG_LVL] [-c] {cancel,run,status} ...

The Maestro Workflow Conductor for specifiying, launching, and managing general workflows.

positional arguments:
{cancel,run,status}
cancel Cancel all running jobs.
run Launch a study based on a specification
status Check the status of a running study.

optional arguments:
-h, --help show this help message and exit
-l LOGPATH, --logpath LOGPATH
Alternate path to store program logging.
-d DEBUG_LVL, --debug_lvl DEBUG_LVL
Level of logging messages to be output:
5 - Critical
4 - Error
3 - Warning
2 - Info (Default)
1 - Debug
-c, --logstdout Log to stdout in addition to a file. [Default: True]
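
The help text above implies a top-level parser with three global options and three subcommands. The following is a rough `argparse` sketch of that shape, written only from the help output shown here; it is not Maestro's actual source. The program name `maestro-sketch`, the `DEBUG_LEVELS` mapping to `logging` constants, and the omission of per-subcommand arguments are all assumptions.

```python
import argparse
import logging

# Assumed mapping from the numeric levels in the help text to logging constants.
DEBUG_LEVELS = {
    1: logging.DEBUG,
    2: logging.INFO,
    3: logging.WARNING,
    4: logging.ERROR,
    5: logging.CRITICAL,
}


def build_parser():
    """Build a parser that mirrors the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="maestro-sketch",
        description="Sketch of a CLI with global logging options and subcommands.",
    )
    parser.add_argument("-l", "--logpath",
                        help="Alternate path to store program logging.")
    parser.add_argument("-d", "--debug_lvl", type=int, default=2,
                        choices=sorted(DEBUG_LEVELS), metavar="DEBUG_LVL",
                        help="Level of logging messages to be output (1-5).")
    parser.add_argument("-c", "--logstdout", action="store_true",
                        help="Log to stdout in addition to a file.")

    # Subcommand-specific arguments are intentionally left out of this sketch.
    subparsers = parser.add_subparsers(dest="command")
    for name, text in (
        ("cancel", "Cancel all running jobs."),
        ("run", "Launch a study based on a specification"),
        ("status", "Check the status of a running study."),
    ):
        subparsers.add_parser(name, help=text)
    return parser


args = build_parser().parse_args(["-d", "1", "status"])
print(args.command, logging.getLevelName(DEBUG_LEVELS[args.debug_lvl]))
```

Global options such as `-d` come before the subcommand name, matching the usage line `maestro [-h] [-l LOGPATH] [-d DEBUG_LVL] [-c] {cancel,run,status} ...`.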
5 changes: 5 additions & 0 deletions docs/source/index.rst
@@ -10,6 +10,11 @@ Welcome to Maestro Workflow Conductor Documentation
:maxdepth: 4
:caption: Contents:

getting_started
quick_start
lulesh_breakdown
maestro_core

modules

Indices and tables
4 changes: 4 additions & 0 deletions docs/source/lulesh_breakdown.rst
@@ -0,0 +1,4 @@
LULESH Specification Breakdown
===============================

Stub
4 changes: 4 additions & 0 deletions docs/source/maestro_core.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
Maestro Core Concepts
======================

Stub