Skip to content

Digital normalization with BBnorm#422

Merged
erikrikarddaniel merged 29 commits intonf-core:devfrom
erikrikarddaniel:diginorm
Apr 26, 2023
Merged

Digital normalization with BBnorm#422
erikrikarddaniel merged 29 commits intonf-core:devfrom
erikrikarddaniel:diginorm

Conversation

@erikrikarddaniel
Copy link
Copy Markdown
Member

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

I added BBnorm, run with --bbnorm (with two more params), which will normalize sequencing depth by reducing the number of reads for overrepresented kmers as well as removing very rare kmers.

I've only run the tool on the small test cases, and can't say if and when it's useful to use on full-sized data. I hence didn't update usage.md.

I didn't add the new test case (tests the combination of --bbnorm and --coassemble_group.

The tool is not published in a journal, and they just want people to refer to the Sourceforge page. I didn't know how to do this in CITATIONS.md but provide an attempt.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 4, 2023

This PR is against the master branch ❌

  • Do not close this PR
  • Click Edit and change the base to dev
  • This CI test will remain failed until you push a new commit

Hi @erikrikarddaniel,

It looks like this pull-request is has been made against the erikrikarddaniel/nf-core-mag master branch.
The master branch on nf-core repositories should always contain code from the latest release.
Because of this, PRs to master are only allowed if they come from the erikrikarddaniel/nf-core-mag dev branch.

You do not need to close this PR, you can change the target branch to dev by clicking the "Edit" button at the top of this page.
Note that even after this, the test will continue to show as failing until you push a new commit.

Thanks again for your contribution!

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 4, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 7792026

+| ✅ 156 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   1 tests had warnings |!
Details

❗ Test warnings:

  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your prefered methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.7.2
  • Run at 2023-04-25 18:31:30

@jfy133 jfy133 changed the base branch from master to dev April 4, 2023 08:51
Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Missing output docs
  • Missing SeqTK citation/documentation(?)

To be clear, seqtk merge doesn't do paired-end merging/collapsing right? It just referse to merging the two files so all the reads are interleaved? It would be good to clarify in the comment around that module, it confused me at first and I wondered why a different read-collapser was being used.

Note I can't really comment on the validity of method etc (but logically the code makes sense to me)

Comment thread CITATIONS.md Outdated
Comment thread conf/test_bbnorm.config
Comment thread workflows/mag.nf Outdated
Comment thread workflows/mag.nf Outdated
erikrikarddaniel and others added 4 commits April 4, 2023 11:13
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
@erikrikarddaniel
Copy link
Copy Markdown
Member Author

  • Missing output docs

I actually forgot to organize what to save in the output by default. And, now that I'm reminded, I wouldn't mind some recommendations from other contributors to the pipeline: Do you think it makes sense to save any of the output from either of the two new modules? Can be quite large files.

 * Missing SeqTK citation/documentation(?)

Thanks. I had forgotten I included this tool as it took so many days from implementing the code to actually do the PR.

To be clear, seqtk merge doesn't do paired-end merging/collapsing right? It just referse to merging the two files so all the reads are interleaved? It would be good to clarify in the comment around that module, it confused me at first and I wondered why a different read-collapser was being used.

It's interleaving. We're using it in metatdenovo, but didn't write the module. You want it better documented in the module repo, I suppose. I'll try to remember that.

Note I can't really comment on the validity of method etc (but logically the code makes sense to me)

Good. I'm not sure either, but there's one clear use case: When you can't, but really want to, get a coassembly through.

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Apr 4, 2023

I actually forgot to organize what to save in the output by default. And, now that I'm reminded, I wouldn't mind some recommendations from other contributors to the pipeline: Do you think it makes sense to save any of the output from either of the two new modules? Can be quite large files.

Good question. Interleaving probably not. I didn't actually look what digital normalization does either so I can't say if the output would be useful 😬 . I'll leave that to others to decide :)

Thanks. I had forgotten I included this tool as it took so many days from implementing the code to actually do the PR.

It's interleaving. We're using it in metatdenovo, but didn't write the module. You want it better documented in the module repo, I suppose. I'll try to remember that.

It could do, but I actually meant more here in the pipeline code. I have never used interleaving data myself and 'merge' always means 'collapse' to me personally., so I don't want to come back to this section of the pipeline and get confused in the future 😬 .

@erikrikarddaniel
Copy link
Copy Markdown
Member Author

I actually forgot to organize what to save in the output by default. And, now that I'm reminded, I wouldn't mind some recommendations from other contributors to the pipeline: Do you think it makes sense to save any of the output from either of the two new modules? Can be quite large files.

Good question. Interleaving probably not. I didn't actually look what digital normalization does either so I can't say if the output would be useful grimacing . I'll leave that to others to decide :)

OK, I kept it and updated output.md.

Thanks. I had forgotten I included this tool as it took so many days from implementing the code to actually do the PR.

It's interleaving. We're using it in metatdenovo, but didn't write the module. You want it better documented in the module repo, I suppose. I'll try to remember that.

It could do, but I actually meant more here in the pipeline code. I have never used interleaving data myself and 'merge' always means 'collapse' to me personally., so I don't want to come back to this section of the pipeline and get confused in the future grimacing .

Good point, updated.

Copy link
Copy Markdown
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments, looks good so far I think. But I also do not have experience with bbnorm / digital normalization.

Comment thread conf/modules.config Outdated
Comment thread conf/test_bbnorm.config
Comment thread docs/output.md Outdated
Comment thread modules/local/spades.nf
@erikrikarddaniel
Copy link
Copy Markdown
Member Author

I think I've done the things you asked for @jfy133 and @d4straub.

@d4straub
Copy link
Copy Markdown
Collaborator

d4straub commented Apr 11, 2023

Sorry, I had an extended eastern vacation.

According to https://github.com/ablab/spades#specifying-multiple-libraries I made a mistake in the above code suggestion: instead of -pe1-1/-pe1-2, --pe1-1/--pe1-2 seems correct. As far as I see only one library should arrive at that point, so -1 / -2 should be fine as well I assume.

It might be good to add in modules/local/spades.nf an explanation/comment, or is there a typo:
def readstr = meta.single_end ? "--12 ${reads}" : "-1 ${reads[0]} -2 ${reads[1]}"
--12 seems to expect a file with interlaced forward and reverse paired-end reads, not single-end data (which would be -s?!).

I am fine with the PR, but @jfy133 requested changes and has to give his ok to the changes.

@erikrikarddaniel
Copy link
Copy Markdown
Member Author

Sorry, I had an extended eastern vacation.

Well deserved! :-)

According to https://github.com/ablab/spades#specifying-multiple-libraries I made a mistake in the above code suggestion: instead of -pe1-1/-pe1-2, --pe1-1/--pe1-2 seems correct. As far as I see only one library should arrive at that point, so -1 / -2 should be fine as well I assume.

The --pe1-1 and -1 syntax is a bit confusing at the moment, but I chose to keep it the old style, i.e. -1.

It might be good to add in modules/local/spades.nf an explanation/comment, or is there a typo: def readstr = meta.single_end ? "--12 ${reads}" : "-1 ${reads[0]} -2 ${reads[1]}" --12 seems to expect a file with interlaced forward and reverse paired-end reads, not single-end data (which would be -s?!).

It turns out -s is not supported by metaspades yet, although output as part of the help message. I've added a comment. An alternative would be to add a different meta variable, e.g. meta.interleaved. The drawback with that is that the nf-core module doesn't support that. OTOH, the nf-core module supports quite intricate input of combinations of read files to spades via a yaml file.

I am fine with the PR, but @jfy133 requested changes and has to give his ok to the changes.

Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I made the 'mistake' of taking a holiday and had a huge namount of backlog to catch up with...

Last few changes and I think you're OK, as @d4straub is happy with the 'scientific' content'.

Comment thread docs/output.md Outdated
Comment thread docs/output.md

- `bbmap/bbnorm/[sample]\*.fastq.gz
- `bbmap/bbnorm/log/[sample].bbnorm.log

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing closing tags?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You missed the closing rtags @erikrikarddaniel 😱 dev docs are broken! I'll make a PR now (skipping ci) form you to approve

Comment thread workflows/mag.nf Outdated
Comment thread workflows/mag.nf Outdated
erikrikarddaniel and others added 4 commits April 25, 2023 20:27
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
Co-authored-by: James A. Fellows Yates <[email protected]>
@erikrikarddaniel
Copy link
Copy Markdown
Member Author

Thanks @jfy133: fantastic attention to detail! :-)

@erikrikarddaniel erikrikarddaniel merged commit 5e4ea0a into nf-core:dev Apr 26, 2023
@erikrikarddaniel
Copy link
Copy Markdown
Member Author

I did add them, something must have happened...

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Apr 26, 2023

😱

@jfy133 jfy133 mentioned this pull request Apr 28, 2023
@d4straub d4straub mentioned this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants