Skip to content

Adding ancient subworkflow#247

Merged
jfy133 merged 34 commits intonf-core:devfrom
maxibor:dev
Jan 31, 2022
Merged

Adding ancient subworkflow#247
jfy133 merged 34 commits intonf-core:devfrom
maxibor:dev

Conversation

@maxibor
Copy link
Copy Markdown
Member

@maxibor maxibor commented Nov 2, 2021

This pull request adds a subworkflow specific to ancient DNA de novo assembly validation.

This consists of mainly two steps:

  • Ancient DNA damage estimation with PyDamage
  • Calling the consensus of the reads aligned back on the contigs (Freebayes + bcftools) to avoid introducing errors due to DNA damage

The proposed way of activating this aDNA subworkflow is through a specific profile -profile ancient_dna, as a few general parameters of MAG need to be tuned to make them sensible for aDNA data (the bowtie2 alignment mode).

Would be cool when you have time if you could add the ancient DNA subworfklow to the general MAG workflow schema (on the right of the assembly step) 🙂 .

To Do

  • Make it work with all assemblers
  • Make the bowtie2 alignment process accept options.args

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint) (won't lint because of outdated multiqc config @skrakau @d4straub )
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@maxibor maxibor added WIP Work in progress do not merge labels Nov 2, 2021
@maxibor maxibor changed the base branch from master to dev November 2, 2021 15:35
@nf-core nf-core deleted a comment from github-actions Bot Nov 2, 2021
Copy link
Copy Markdown
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!
The remaining test failures seem only linting and master/dev PR confusion.

Just to make sure I understand, this only evaluates whether contigs are likely of ancient origin? Wouldn't it be good when there is some kind of consequence to it, e.g. that those that are of no ancient origin are filtered out, or that the ones or ancient origin are separated (sorry for my ignorance, I have no idea how this usually works).

I am not sure whether the -profile is the best place to add those configurations, why do you prefer that over a parameter which is than documented in nextflow_schema.json? The documentation of the profile seems not clear this way. In any case, I'd expect aDNA also to be described in the usage.md at a more prominent place, such as before Running the pipeline.

Comment thread docs/output.md Outdated
Comment thread docs/usage.md Outdated
Comment thread workflows/mag.nf Outdated
@maxibor
Copy link
Copy Markdown
Member Author

maxibor commented Nov 9, 2021

Nice work! The remaining test failures seem only linting and master/dev PR confusion.

Just to make sure I understand, this only evaluates whether contigs are likely of ancient origin? Wouldn't it be good when there is some kind of consequence to it, e.g. that those that are of no ancient origin are filtered out, or that the ones or ancient origin are separated (sorry for my ignorance, I have no idea how this usually works).

I am not sure whether the -profile is the best place to add those configurations, why do you prefer that over a parameter which is than documented in nextflow_schema.json? The documentation of the profile seems not clear this way. In any case, I'd expect aDNA also to be described in the usage.md at a more prominent place, such as before Running the pipeline.

I suggest to use the -profile ancient_dna, because on top of the ancient_dna subworkflow, ancient DNA alignment needs to have specific parameters (in bowtie2).

@d4straub
Copy link
Copy Markdown
Collaborator

d4straub commented Nov 9, 2021

Nice work! The remaining test failures seem only linting and master/dev PR confusion.
Just to make sure I understand, this only evaluates whether contigs are likely of ancient origin? Wouldn't it be good when there is some kind of consequence to it, e.g. that those that are of no ancient origin are filtered out, or that the ones or ancient origin are separated (sorry for my ignorance, I have no idea how this usually works).
I am not sure whether the -profile is the best place to add those configurations, why do you prefer that over a parameter which is than documented in nextflow_schema.json? The documentation of the profile seems not clear this way. In any case, I'd expect aDNA also to be described in the usage.md at a more prominent place, such as before Running the pipeline.

I suggest to use the -profile ancient_dna, because on top of the ancient_dna subworkflow, ancient DNA alignment needs to have specific parameters (in bowtie2).

Do you mean bowtie2_mode = "--very-sensitive -N 1"? Why does that need to be in the profile? Adjusting params.bowtie2_mode in the workflow conditional on params.ancient_dna wouldn't work (e.g. if (params.ancient_dna) { params.bowtie2_mode = "--very-sensitive -N 1"})?
I just do not get (yet?) why that needs to be done in a profile. I am not strictly against a profile for that case, but I do not really see its benefit over some adjustments to the defaults somewhere in the mag.nf (or any other) either.

@maxibor
Copy link
Copy Markdown
Member Author

maxibor commented Nov 9, 2021

Nice work! The remaining test failures seem only linting and master/dev PR confusion.
Just to make sure I understand, this only evaluates whether contigs are likely of ancient origin? Wouldn't it be good when there is some kind of consequence to it, e.g. that those that are of no ancient origin are filtered out, or that the ones or ancient origin are separated (sorry for my ignorance, I have no idea how this usually works).
I am not sure whether the -profile is the best place to add those configurations, why do you prefer that over a parameter which is than documented in nextflow_schema.json? The documentation of the profile seems not clear this way. In any case, I'd expect aDNA also to be described in the usage.md at a more prominent place, such as before Running the pipeline.

I suggest to use the -profile ancient_dna, because on top of the ancient_dna subworkflow, ancient DNA alignment needs to have specific parameters (in bowtie2).

Do you mean bowtie2_mode = "--very-sensitive -N 1"? Why does that need to be in the profile? Adjusting params.bowtie2_mode in the workflow conditional on params.ancient_dna wouldn't work (e.g. if (params.ancient_dna) { params.bowtie2_mode = "--very-sensitive -N 1"})? I just do not get (yet?) why that needs to be done in a profile. I am not strictly against a profile for that case, but I do not really see its benefit over some adjustments to the defaults somewhere in the mag.nf (or any other) either.

Could also do like this :)

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Nov 9, 2021

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 8fa3290

+| ✅ 135 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   2 tests had warnings |!
Details

❗ Test warnings:

  • nextflow_config - Config manifest.version should end in dev: '2.1.1'
  • readme - README did not have a Nextflow minimum version mentioned in Quick Start section.

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy

✅ Tests passed:

Run details

  • nf-core/tools version 2.1
  • Run at 2021-12-03 15:05:32

@maxibor maxibor added ready to review and removed WIP Work in progress do not merge labels Jan 28, 2022
@maxibor
Copy link
Copy Markdown
Member Author

maxibor commented Jan 28, 2022

Hey @d4straub and @jfy133 ,
This PR is now ready for (re)review.
The only issue left is that bowtie2 options are not properly passed to the process, even though it looks to me it's properly set up:

In conf/modules.config#L99

'bowtie2_assembly_align' {
            args            = params.bowtie2_mode ? params.bowtie2_mode : params.ancient_dna ? '--very-sensitive-local -N 1' : ''
            publish_files   = ['log':'']
            publish_by_meta = ['assembler', 'QC', 'id']
            publish_dir     = "Assembly"
        }

With params.ancient_dna = true, the bowtie2 command still looks like this:

$ cat /work/c1/c49e900dd4f90f1d88e47f02516a40/.command.sh
#!/bin/bash -euo pipefail
INDEX=`find -L ./ -name "*.rev.1.bt2l" -o -name "*.rev.1.bt2" | sed 's/.rev.1.bt2l//' | sed 's/.rev.1.bt2//'`
bowtie2 \
    -p "2" \
    -x $INDEX \
     \ # here should be --very-sensitive-local -N 1
    -1 "test_minigut.phix_removed.unmapped_1.fastq.gz" -2 "test_minigut.phix_removed.unmapped_2.fastq.gz" \
    2> "SPAdes-test_minigut-test_minigut.bowtie2.log" |
    samtools view -@ "2" -bS |         samtools sort -@ "2" -o "SPAdes-test_minigut-test_minigut.bam"
samtools index "SPAdes-test_minigut-test_minigut.bam"

if [ SPAdes-test_minigut-test_minigut = "SPAdes-test_minigut-test_minigut" ] ; then
    mv "SPAdes-test_minigut-test_minigut.bowtie2.log" "SPAdes-test_minigut.bowtie2.log"
fi

echo $(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*$//' > bowtie2_assembly.version.txt

My guess is that a switch to the nf-core modules DSL2 v2 syntax should fix it (like for the aDNA modules of this PR)

Comment thread docs/usage.md Outdated
Comment thread nextflow_schema.json
Comment thread subworkflows/local/ancient_dna.nf
Comment thread workflows/mag.nf
Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be good to reference that aDNA mode still works in the USAGE docs.

Might as well also mention in in the main README that this is now possible with MAG

But otherwise the code looks good!

Comment thread nextflow_schema.json Outdated
@maxibor
Copy link
Copy Markdown
Member Author

maxibor commented Jan 28, 2022

It would still be good to reference that aDNA mode still works in the USAGE docs.

Might as well also mention in in the main README that this is now possible with MAG

But otherwise the code looks good!

I agree, would be nice to update the

README, however, I suck at Inkscape, @d4straub Do you want to add it ?

That's what I did

mag_adna

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Jan 28, 2022

@maxibor I can also do it if you send me the SVG

Copy link
Copy Markdown
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks quite good to me.

However, I think it would be really good to mention it in README (at least a small note about aDNA) and docs/usage.
I also agree that it should be in the overview figure (but I disagree to stretch the figure [i.e. rather minimize white space] and it should be obvious that its an on-demand or at least optional step, imho).

Edit: Just saw in the failed linting (expected, I know) the unexpected line
freebayes | Local copy of module does not match remote
thats a bit worrisome if its a relevant change. Should be local then.

Comment thread conf/test_ancient_dna.config Outdated
@maxibor
Copy link
Copy Markdown
Member Author

maxibor commented Jan 28, 2022

Edit: Just saw in the failed linting (expected, I know) the unexpected line
freebayes | Local copy of module does not match remote
thats a bit worrisome if its a relevant change. Should be local then.

That was just one nf-core modules update freebayes away, now it's updated 😉

Comment thread README.md Outdated
Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only think I find missing would be a short description in USAGE of exactly what the purposes of the aDNA workflow is (the README describes what, not why), but otherwise wise looks good 👍

Comment thread README.md Outdated
Comment thread nextflow_schema.json
Co-authored-by: James A. Fellows Yates <[email protected]>
Copy link
Copy Markdown
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me as well. I agree and still think (previous comments) that there should be a mention of the profile and usage in the docs, i.e. usage.md. I approve it anyway.

@jfy133 jfy133 merged commit 7ff5996 into nf-core:dev Jan 31, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants