Skip to content

Added checks for invalid keys.#243

Merged
FrankD412 merged 1 commit intollnl:developfrom
ben-bay:develop
Apr 21, 2020
Merged

Added checks for invalid keys.#243
FrankD412 merged 1 commit intollnl:developfrom
ben-bay:develop

Conversation

@ben-bay
Copy link
Copy Markdown
Contributor

@ben-bay ben-bay commented Apr 9, 2020

This is my first attempt at adding valid key checking to Maestro. Why? Because indentation slip ups and typos can screw up a yaml spec. Worse, this type of error can be difficult to trace.

P.S. I'm sure I missed some keys that Maestro supports in step.cmd, so make sure to remind me of those.

P.P.S. Your logger appears to be broken. You'll only see this working for now if you replace my logger.warning with a print.

P.P.P.S. Speaking of logging, have you considered using color? Merlin is using the open source library coloredlogs.

@FrankD412 FrankD412 self-requested a review April 9, 2020 17:42
@FrankD412 FrankD412 added enhancement In Progress Issue or PR that is currently in active development. Specification labels Apr 9, 2020
@FrankD412 FrankD412 added this to the Release 1.1.7 milestone Apr 9, 2020
@FrankD412
Copy link
Copy Markdown
Member

This is a precursor look, but TravisCI is failing on the style check with the following:

2.13s$ flake8 .
./maestrowf/datastructures/yamlspecification.py:71:1: E302 expected 2 blank lines, found 1
./maestrowf/datastructures/yamlspecification.py:75:80: E501 line too long (91 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:81:1: E302 expected 2 blank lines, found 1
./maestrowf/datastructures/yamlspecification.py:216:80: E501 line too long (88 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:379:80: E501 line too long (82 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:450:80: E501 line too long (103 > 79 characters)
The command "flake8 ." failed and exited with 1 during .

@kcathey
Copy link
Copy Markdown
Member

kcathey commented Apr 9, 2020

I'm wondering if we should take this opportunity to look into using a JSON Schema to validate the spec. This would get us away from having to manage the validation and just manage the schema itself.

@FrankD412
Copy link
Copy Markdown
Member

@kcathey -- I was just trying to find that yesterday because I recalled it. But I lost it in my sea of tabs... do you have a link to it? My google-fu failed me and I found everythig else JSON but that...

@kcathey
Copy link
Copy Markdown
Member

kcathey commented Apr 9, 2020

Python JSONSchema https://python-jsonschema.readthedocs.io/

@kcathey
Copy link
Copy Markdown
Member

kcathey commented Apr 9, 2020

@ben-bay Why don't you submit these as separate issues so that they can be tracked and not lost. We can then also start discussions around them and get some community input specifically about those things.

Submit this one as a bug.

P.P.S. Your logger appears to be broken. You'll only see this working for now if you replace my logger.warning with a print.

Ans this as an enhancement request.

P.P.P.S. Speaking of logging, have you considered using color? Merlin is using the open source library coloredlogs.

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 9, 2020

Sure.

@ben-bay ben-bay mentioned this pull request Apr 10, 2020
Copy link
Copy Markdown
Member

@FrankD412 FrankD412 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, looking good. I left some feedback. The general things that are causing the TravisCI check to fail are as follows:

flake8 ./maestrowf
./maestrowf/datastructures/yamlspecification.py:71:1: E302 expected 2 blank lines, found 1
./maestrowf/datastructures/yamlspecification.py:75:80: E501 line too long (91 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:81:1: E302 expected 2 blank lines, found 1
./maestrowf/datastructures/yamlspecification.py:216:80: E501 line too long (88 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:379:80: E501 line too long (82 > 79 characters)
./maestrowf/datastructures/yamlspecification.py:450:80: E501 line too long (103 > 79 characters)

Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Comment thread maestrowf/datastructures/yamlspecification.py Outdated
@FrankD412
Copy link
Copy Markdown
Member

@ben-bay -- I just checked out your fork and rebased it on #247 just to test. I'm not getting warnings show up. However, it looks like the existing verification methods that are in place catch the simple things I put into my specification. I think that your new checking would be the preferable way forward and would improve the utility and functionality of those checks. I noticed you put your check_keys calls after my verification rather than changing what's there.

What do you think? I'd have no issues replacing the previous verification with your improved setup.

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 14, 2020

@ben-bay -- I just checked out your fork and rebased it on #247 just to test. I'm not getting warnings show up. However, it looks like the existing verification methods that are in place catch the simple things I put into my specification. I think that your new checking would be the preferable way forward and would improve the utility and functionality of those checks. I noticed you put your check_keys calls after my verification rather than changing what's there.

What do you think? I'd have no issues replacing the previous verification with your improved setup.

Sure I can replace the verification logic. Also, you didn't see it, but I've got a lot of new functionality that uses the jsonschema library. Still working on it but I think it'll work.

@FrankD412
Copy link
Copy Markdown
Member

@ben-bay Awesome, I look forward to it. The verification logic has been due for an update for a while. 😄

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 14, 2020

Check out the diff to see the direction of my latest changes. I've removed the need for some of your original verification, but not all of it yet. Still working on that.

@ben-bay ben-bay marked this pull request as draft April 14, 2020 22:48
@ben-bay ben-bay requested a review from FrankD412 April 14, 2020 22:48
@FrankD412
Copy link
Copy Markdown
Member

@ben-bay You'll need to add jsonschema to the setup.py list of packaging so that when Maestro is installed it pulls the dependency. I was doing a cursory look over and that's the first thing that stood out.

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 15, 2020

Oh yeah duh

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 15, 2020

So currently our spec can find and report 4 kinds of errors:

  1. additionalProperties-- when there's some key maestro doesn't use
  2. type-- when the value's type is wrong
  3. required-- when a required key is missing
  4. uniqueItems-- when there's a duplicate element in an array (currently only used for depends)

...then there's the matter of making the schemas exactly fit what maestro expects. I'm not all the way there, but schemas.json seems like a good start.

Copy link
Copy Markdown
Member

@FrankD412 FrankD412 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far, these changes look reasonable. I do need to test them locally. Your style checks are still failing on TravisCI -- please correct those.

Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Comment thread maestrowf/datastructures/yamlspecification.py
@FrankD412
Copy link
Copy Markdown
Member

Just wanted to note here, this PR has pointed out that Maestro prints out exceptions twice. Once I think because of logging and the other because I think raise naturally prints them. I think that I'll leave a comment in #248 to include that in its scope, since it has to do with what the logger is outputting.

@ben-bay ben-bay marked this pull request as ready for review April 16, 2020 19:01
@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 16, 2020

Without a lot of scrutiny, it's possible that this feature will introduce bugs and cause headaches down the line. So scrutinize away!

Observation: logger.error, which this depends upon, doesn't appear to be behaving as expected. No error messages are apparent and the process continues even after a call of logger.error.

@FrankD412
Copy link
Copy Markdown
Member

@ben-bay When I tested your code earlier, Maestro did terminate. It would definitely stop and point out the key. I'll look through your recent commits.

@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 16, 2020

@FrankD412 it's the exception handling logic. At some places it was double-nested, so I stopped raising ValueErrors and reverted it to only use logger.error for now.

@FrankD412
Copy link
Copy Markdown
Member

@ben-bay -- Got it. I just tested it and logging errors do show on my end, and it does continue executing as you mentioned. I did see earlier that the JSON schema package provides a ValidationError exception? It might be useful to throw that and specifically catch that type in Maestro's main block and then exit from there.

maestro run ./samples/hello_world/hello_world_validation.yaml 
2020-04-16 12:46:08,174 - maestrowf.maestro:main:409 - INFO - INFO Logging Level -- Enabled
2020-04-16 12:46:08,174 - maestrowf.maestro:main:410 - WARNING - WARNING Logging Level -- Enabled
2020-04-16 12:46:08,174 - maestrowf.maestro:main:411 - CRITICAL - CRITICAL Logging Level -- Enabled
2020-04-16 12:46:08,174 - maestrowf.datastructures.yamlspecification:load_specification:108 - INFO - Loading specification -- path = ./samples/hello_world/hello_world_validation.yaml
2020-04-16 12:46:08,177 - maestrowf.datastructures.yamlspecification:validate_schema:438 - ERROR - Unrecognized key 'frank_key' found in spec section 'study.hello_world'.
2020-04-16 12:46:08,177 - maestrowf.datastructures.yamlspecification:validate_schema:466 - ERROR - Key 'cmd' is missing from spec section 'study.hello_world'.
2020-04-16 12:46:08,177 - maestrowf.datastructures.yamlspecification:validate_schema:438 - ERROR - Unrecognized key 'frank_key' found in spec section 'study.hello_world'.
2020-04-16 12:46:08,177 - maestrowf.utils:create_parentdir:102 - INFO - Directory does not exist. Creating directories to /mnt/f/Code/Python/maestrowf/sample_output/hello_world/hello_world_20200416-124608/logs
.....

@FrankD412
Copy link
Copy Markdown
Member

Awesome, @ben-bay -- Checks passed. I ran my simple test and it seemed to run just fine, even printed the exception only once. I'll run a few more extensive things, but I think this is safe to mark as robust enough that it's not a WIP. I'll get back with a formal review later! :D

@FrankD412 FrankD412 removed the In Progress Issue or PR that is currently in active development. label Apr 16, 2020
Copy link
Copy Markdown
Member

@FrankD412 FrankD412 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think this is a great addition. I left some comments, overall not drastic changes, the biggest two are the Specification.verify_schema call as it doesn't look like it's implemented in the base class, but rather here in YAMLSpecification and the moving of the verification call in verify_parameters.

I'm curious your thoughts about the sys.exit in the _verify method -- I've always been taught that minimizing exit points is best. And moving the try/catch to maestro.py would make it so that we can use an enumeration to return a code for the error. What do you think?

Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Comment thread maestrowf/datastructures/yamlspecification.py Outdated
Copy link
Copy Markdown
Member

@FrankD412 FrankD412 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some bug fixes I found that will fix the monitoring of the env section of the spec. The last thing that needs sorting is that when the env section is verified, it prints out a message that says the error is in the env section as a whole rather than git or paths sections. Haven't found where to set that yet.

Comment thread maestrowf/datastructures/schemas.json
Comment thread maestrowf/datastructures/yamlspecification.py
@FrankD412
Copy link
Copy Markdown
Member

Awesome -- I've been testing it the last 15 or so minutes and things appear to be working. This PR actually uncovered some bugs that I wasn't aware of, so yay! Thanks for fixing the things this exposed.

I noticed something odd about the schema -- not how you defined it but with how jsonschema handles it. I realized in my suggestion, I forgot to add the type key under the following (nothing wrong as it seems to work, just found it interesting):

"git": {
            "type": "array",
            "items": {
              "properties": {
                "name": {"type": "string", "minLength": 1},
                "path": {"type": "string", "minLength": 1},
                "url": {"type": "string", "minLength": 1}
              },
              "required": [
                "name",
                "path",
                "url"
              ],
              "additionalProperties": false
            }
          },

This PR also seems to close #238 -- I wanted to assign you the issue since this seems to close it, but was surprised to find that I couldn't. Not sure why, I'll track that down.

Copy link
Copy Markdown
Member

@FrankD412 FrankD412 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! Things appear to be working -- once the TravisCI tests go through I'll merge. From what I can tell, the flake8 checks pass on my local machine, so I don't expect them to fail on Travis. Do you mind squashing the commits down so that you get credit for all the work you've put into this. Thanks Ben!

Comment thread maestrowf/datastructures/schemas.json Outdated
"name": {"type": "string", "minLength": 1},
"description": {"type": "string", "minLength": 1}
},
"additionalProperties": false,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick comment -- I was testing this and found that this should probably be set to true. The description block is meant to have other types of information that isn't necessarily name and description. Just a comment, I didn't think this was worth un-approving the PR.

@FrankD412
Copy link
Copy Markdown
Member

@ben-bay -- It looks like you need to run a small rebase just to squash the merge commit in your branch. I ran it locally on my end, and it was just fixing the requirements.txt and maestro.py conflicts.

added checks for valid keys

branch updates

working on validation

added schema file

updates

fixed spec

fixed spec

added jsonschema to deps

updates

ran black on yamlspecification.py

specified newest jsonschema version

added manifest

added include_package_data to setup.py

reformatted json

experimental package_data

fixed path

fixed path

fixed path again

reverted newline

added check for empty strings

reworked exception logic

implemented reviewer suggestions, shifted exception logic, renamed redundant variables

renamed variable

removed unused import

added missing `self.verify_environment()` call

Co-Authored-By: Francesco Di Natale <[email protected]>

paths and git dependencies are now array types

Co-Authored-By: Francesco Di Natale <[email protected]>

removed redundant logic

swapped number type to integer

moved env schema validation to top, which avoids some types of ambiguous errors

removed test yaml

removed some additionalProperties restrictions

unknown commit

removed debug print
@ben-bay
Copy link
Copy Markdown
Contributor Author

ben-bay commented Apr 21, 2020

I ~think~ I successfully rebased and fixed the conflicts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants