Digital normalization with BBnorm#422
Conversation
This PR is against the
|
|
jfy133
left a comment
There was a problem hiding this comment.
- Missing output docs
- Missing SeqTK citation/documentation(?)
To be clear, seqtk merge doesn't do paired-end merging/collapsing right? It just referse to merging the two files so all the reads are interleaved? It would be good to clarify in the comment around that module, it confused me at first and I wondered why a different read-collapser was being used.
Note I can't really comment on the validity of method etc (but logically the code makes sense to me)
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
I actually forgot to organize what to save in the output by default. And, now that I'm reminded, I wouldn't mind some recommendations from other contributors to the pipeline: Do you think it makes sense to save any of the output from either of the two new modules? Can be quite large files.
Thanks. I had forgotten I included this tool as it took so many days from implementing the code to actually do the PR.
It's interleaving. We're using it in metatdenovo, but didn't write the module. You want it better documented in the module repo, I suppose. I'll try to remember that.
Good. I'm not sure either, but there's one clear use case: When you can't, but really want to, get a coassembly through. |
Good question. Interleaving probably not. I didn't actually look what digital normalization does either so I can't say if the output would be useful 😬 . I'll leave that to others to decide :)
It could do, but I actually meant more here in the pipeline code. I have never used interleaving data myself and 'merge' always means 'collapse' to me personally., so I don't want to come back to this section of the pipeline and get confused in the future 😬 . |
OK, I kept it and updated
Good point, updated. |
d4straub
left a comment
There was a problem hiding this comment.
Just a few comments, looks good so far I think. But I also do not have experience with bbnorm / digital normalization.
Co-authored-by: Daniel Straub <[email protected]>
Co-authored-by: Daniel Straub <[email protected]>
|
Sorry, I had an extended eastern vacation. According to https://github.com/ablab/spades#specifying-multiple-libraries I made a mistake in the above code suggestion: instead of It might be good to add in I am fine with the PR, but @jfy133 requested changes and has to give his ok to the changes. |
Well deserved! :-)
The
It turns out
|
|
|
||
| - `bbmap/bbnorm/[sample]\*.fastq.gz | ||
| - `bbmap/bbnorm/log/[sample].bbnorm.log | ||
|
|
There was a problem hiding this comment.
You missed the closing rtags @erikrikarddaniel 😱 dev docs are broken! I'll make a PR now (skipping ci) form you to approve
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
|
Thanks @jfy133: fantastic attention to detail! :-) |
|
I did add them, something must have happened... |
|
😱 |
PR checklist
nf-core lint).nextflow run . -profile test,docker --outdir <OUTDIR>).docs/usage.mdis updated.docs/output.mdis updated.CHANGELOG.mdis updated.README.mdis updated (including new tool citations and authors/contributors).I added BBnorm, run with
--bbnorm(with two more params), which will normalize sequencing depth by reducing the number of reads for overrepresented kmers as well as removing very rare kmers.I've only run the tool on the small test cases, and can't say if and when it's useful to use on full-sized data. I hence didn't update
usage.md.I didn't add the new test case (tests the combination of
--bbnormand--coassemble_group.The tool is not published in a journal, and they just want people to refer to the Sourceforge page. I didn't know how to do this in
CITATIONS.mdbut provide an attempt.