Move code from main workflow to subworkflows #446
prototaxites wants to merge 8 commits into nf-core:dev from prototaxites:subworkflows
FYI this is still on my radar @prototaxites. I just need to find a slot of more than 1h to sit down and focus on this, as it's a fairly large refactoring 😬 (but very needed and welcome!)
@prototaxites I just realised this will need a merge conflict resolved now that run-merging has gone in, sorry 🤦
Was on my radar - should be fixed now!
Had a thought while running umpteen gels in the lab today - taxprofiler and mag both take the same type of input data, and have very similar pre-processing steps: fastqc -> fastp -> (taxprofiler only: complexity filtering) -> host removal -> (mag only: phix removal) -> fastqc. Would it make sense as part of a larger refactoring to consider spinning (some of) these parts into installable subworkflows, so that the exact same code could be shared between the pipelines?
Yes very much so!
Going to suggest we close this PR and revisit down the line. The pipeline has changed a lot with 2.4.0, and while I think it's definitely a good plan to break the pipeline into subworkflows, I suspect it would be better to visit each independently, perhaps with a bit of a roadmap.
Yes sounds good 👍
Creates a number of new subworkflows for separate 'stages' of the pipeline: short read pre-processing, long read pre-processing, short-read taxonomy, assembly, assembly QC (Quast), bin QC, bin taxonomy, and annotation.
Tidies the workflow, particularly around the assembly input (as discussed here: #439), to avoid nested if-elses where possible and use empty channel skipping instead.
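For anyone unfamiliar with the pattern, here is a minimal sketch of what empty channel skipping can look like. It is not the code from this PR: `ASSEMBLER_A` and `params.skip_assembler_a` are hypothetical names made up for illustration. It relies on the fact that a Nextflow process whose input channel is empty never runs, so the call and the downstream `mix` can sit outside any `if` block.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter, not one of the pipeline's real options
params.skip_assembler_a = false

// Hypothetical stand-in for an assembler module
process ASSEMBLER_A {
    input:
    tuple val(meta), val(reads)

    output:
    tuple val(meta), val(reads), emit: assembly

    script:
    """
    echo "assembling ${meta.id}"
    """
}

workflow {
    // Toy input: one [meta, reads] item
    ch_reads = Channel.of([[id: 'sample1'], 'sample1.fastq.gz'])

    // Route nothing to the process when it is skipped;
    // an empty input channel means no tasks are submitted
    ch_for_a = params.skip_assembler_a ? Channel.empty() : ch_reads

    // Called unconditionally, so no nested if-else is needed
    ASSEMBLER_A(ch_for_a)

    // Mixing in an empty output channel is harmless
    ch_assemblies = Channel.empty().mix(ASSEMBLER_A.out.assembly)
    ch_assemblies.view()
}
```

The trade-off is that the skip logic moves from control flow into channel construction, which keeps every output channel defined regardless of which tools are enabled.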
I moved the various DB channel-resolving components to their respective subworkflows. This simplifies the architecture a bit by not having to pass DBs as arguments to subworkflows, but perhaps reduces their re-usability elsewhere. Not sure what the preferred style is here, but this can be easily changed.
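To make the two styles concrete, a rough sketch follows. The workflow names and `params.example_db` are hypothetical and the `combine` call merely stands in for a real classification step, so this shows the shape of the choice rather than the actual subworkflows in this PR.

```nextflow
nextflow.enable.dsl = 2

// Hypothetical parameter name, for illustration only
params.example_db = null

// Style A: the subworkflow resolves its own database channel from params.
// Simpler call site, but the subworkflow is tied to this pipeline's params.
workflow BIN_TAXONOMY_INTERNAL_DB {
    take:
    ch_bins

    main:
    ch_db = params.example_db ? Channel.of(file(params.example_db)) : Channel.empty()
    // Placeholder for a real classification process
    ch_out = ch_bins.combine(ch_db)

    emit:
    classified = ch_out
}

// Style B: the caller resolves the database and passes it in.
// One more argument, but the subworkflow has no dependency on params.
workflow BIN_TAXONOMY_EXTERNAL_DB {
    take:
    ch_bins
    ch_db

    main:
    ch_out = ch_bins.combine(ch_db)

    emit:
    classified = ch_out
}

workflow {
    ch_bins = Channel.of([[id: 'bin1'], 'bin1.fa'])

    BIN_TAXONOMY_INTERNAL_DB(ch_bins)

    ch_db = params.example_db ? Channel.of(file(params.example_db)) : Channel.empty()
    BIN_TAXONOMY_EXTERNAL_DB(ch_bins, ch_db)
}
```

Style B is likely the easier one to share between pipelines, since nothing inside the subworkflow assumes a particular parameter name.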
Subworkflow names are suggestions only 😅
PR checklist
nf-core lint).
nextflow run . -profile test,docker --outdir <OUTDIR>).
docs/usage.md is updated.
docs/output.md is updated.
CHANGELOG.md is updated.
README.md is updated (including new tool citations and authors/contributors).