Add CAT_SUMMARY process and official_taxonomy param #366
Merged
d4straub merged 8 commits into nf-core:dev from prototaxites:cat_summarise on Dec 16, 2022
Contributor, Author
@nf-core-bot fix linting
Contributor, Author
Finally got a chance to test this on our cluster - all working fine now.
d4straub (Collaborator) approved these changes on Dec 16, 2022
Looks good to me.
I also restarted test runs because they seem to have failed due to connectivity problems. Linting is expected to fail because of the template update.
If all tests pass (except linting) but you cannot merge the PR, ping me and I'll do it.
Contributor, Author
@d4straub All passing except linting as well - could you merge? Many thanks!
Added a CAT_SUMMARY process to summarise the output of CAT as a single final file, as currently there is a single output per assembly group (a lot if you have many single-sample assemblies!). I wanted to re-use the COMBINE_TSV process (as I essentially re-wrote it before realising it exists), but the output files from CAT are gzipped so I've cloned it and added an ungzipping line, rather than modify either the CAT output or push all the files through the GUNZIP process. I don't like duplicating code but this seemed like the simplest solution.
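For illustration only, here is a minimal Python sketch of the idea described above: decompress several gzipped TSV files and concatenate them into one table, keeping the header from the first file only. This is not the code in the CAT_SUMMARY process itself; the function name `combine_gzipped_tsvs`, the file names, and the assumption that every input shares the same header line are all hypothetical.

```python
import gzip
from pathlib import Path
from typing import Iterable


def combine_gzipped_tsvs(inputs: Iterable[Path], output: Path) -> int:
    """Concatenate gzipped TSV files into a single uncompressed TSV.

    The header (first line) is written once, taken from the first
    non-empty input; the first line of every later file is skipped.
    Assumes all inputs share the same header (not checked here).
    Returns the number of data rows written.
    """
    rows_written = 0
    header_written = False
    with open(output, "w", newline="") as out_handle:
        # Sort for a deterministic row order regardless of input order.
        for path in sorted(inputs):
            with gzip.open(path, "rt", newline="") as in_handle:
                for line_number, line in enumerate(in_handle):
                    # Guard against a final line with no trailing newline.
                    if not line.endswith("\n"):
                        line += "\n"
                    if line_number == 0:
                        if not header_written:
                            out_handle.write(line)
                            header_written = True
                        continue  # skip repeated headers
                    out_handle.write(line)
                    rows_written += 1
    return rows_written
```

Called as `combine_gzipped_tsvs(Path(".").glob("*.txt.gz"), Path("summary.tsv"))`, this would produce one summary file from however many per-assembly outputs are present. Empty inputs contribute nothing, so a sample with no classified bins would not break the merge.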
Also added an option to make CAT only output 'official' taxonomy, i.e. Kingdom, Phylum, etc. - I noticed this was available in the CAT documentation and thought it would be useful, as currently munging the taxonomy with many mismatched and empty fields is quite annoying!
Haven't run any tests yet as I've got a pipeline running on our cluster where I have the database downloaded (I don't think CAT will run on Gitpod?), but will un-draft this once I've had a chance to run a test/if anyone else wants to run a quick test.
PR checklist
- Code lints (nf-core lint).
- Test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
- docs/usage.md is updated.
- docs/output.md is updated.
- CHANGELOG.md is updated.
- README.md is updated (including new tool citations and authors/contributors).