Summarise ancient DNA pyDamage results per bin and add to bin summary table #963

Merged
jfy133 merged 21 commits into dev from merge-binqc-pydamage-2
Jan 16, 2026
@jfy133
Member

@jfy133 jfy133 commented Jan 7, 2026

To close #833

Supersedes the Nextflow-only version in #835 (which had problems with data-flow consistency) with a method implemented entirely in a Python script.

Essentially:

  1. Makes a contig-to-bin map
  2. Separates the per-contig pyDamage results
  3. Groups the per-contig pyDamage results by bin assignment (save)
  4. Summarises the pyDamage contig results per bin with a median value
  5. Attaches the median summaries to bin_summary.tsv
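
The five steps above can be sketched roughly as follows. This is a minimal standard-library illustration, not the code in bin/summarise_pydamagebins.py: the function names, the `pydamage_median_` column prefix, the `bin` key, the `NA` placeholder, and the use of `reference`, `predicted_accuracy` and `qvalue` as column names are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import median


def contig_to_bin_map(bin_contigs):
    """Step 1: invert {bin: [contigs]} into {contig: bin}."""
    return {c: b for b, contigs in bin_contigs.items() for c in contigs}


def group_by_bin(pydamage_rows, contig2bin, contig_col="reference"):
    """Steps 2-3: assign each per-contig row to its bin; unbinned contigs are skipped."""
    grouped = defaultdict(list)
    for row in pydamage_rows:
        bin_id = contig2bin.get(row[contig_col])
        if bin_id is not None:
            grouped[bin_id].append(row)
    return grouped


def summarise(grouped, columns):
    """Step 4: median of each requested numeric column, per bin."""
    return {
        bin_id: {
            f"pydamage_median_{col}": median(float(r[col]) for r in rows)
            for col in columns
        }
        for bin_id, rows in grouped.items()
    }


def attach(bin_summary_rows, summaries, key="bin"):
    """Step 5: left-join the medians onto the bin summary rows."""
    extra_cols = sorted({c for s in summaries.values() for c in s})
    merged_rows = []
    for row in bin_summary_rows:
        merged = dict(row)
        stats = summaries.get(row[key], {})
        for col in extra_cols:
            # Bins without any pyDamage results get a placeholder (assumed "NA").
            merged[col] = stats.get(col, "NA")
        merged_rows.append(merged)
    return merged_rows


# Hypothetical toy data: three bins, five contigs (c5 is unbinned).
bin_contigs = {"bin.1": ["c1", "c2", "c3"], "bin.2": ["c4"]}
pydamage_rows = [
    {"reference": "c1", "predicted_accuracy": "0.9", "qvalue": "0.01"},
    {"reference": "c2", "predicted_accuracy": "0.5", "qvalue": "0.2"},
    {"reference": "c3", "predicted_accuracy": "0.7", "qvalue": "0.05"},
    {"reference": "c4", "predicted_accuracy": "0.3", "qvalue": "0.5"},
    {"reference": "c5", "predicted_accuracy": "0.8", "qvalue": "0.1"},
]
bin_summary = [{"bin": "bin.1"}, {"bin": "bin.2"}, {"bin": "bin.3"}]

grouped = group_by_bin(pydamage_rows, contig_to_bin_map(bin_contigs))
summaries = summarise(grouped, ["predicted_accuracy", "qvalue"])
result = attach(bin_summary, summaries)
for row in result:
    print(row)
```

A left join is used in the last step so that bins without pyDamage results still appear in the summary table; whether the actual script behaves this way is not stated in this description.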

Additional minor changes:

  • Stops versions.yml from being unnecessarily published alongside the official pyDamage results
  • Minor reformatting in mag.nf based on language server suggestions

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@jfy133 jfy133 marked this pull request as ready for review January 7, 2026 12:21
@github-actions

github-actions Bot commented Jan 7, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 9ab8d3f

  • ✅ 381 tests passed
  • ❔ 1 test was ignored
  • ❗ 6 tests had warnings
❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Remove this line if you don't need a FASTA file [TODO: try and test using for --host_fasta and --host_genome]
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-01-16 13:32:29

Comment thread subworkflows/local/binning_pydamage/main.nf Outdated
@jfy133 jfy133 requested a review from prototaxites January 15, 2026 14:26
Contributor

@prototaxites prototaxites left a comment

With the caveat that I don't have the energy to look at the Python code right now... 🥲

This looks good in general, but my main concern is the issue you raise with --binning_map_mode. I don't understand why you would get different sets of BAM files on resume - that seems concerning to me as I would assume this should be deterministic?

Comment thread docs/output.md
Comment thread workflows/mag.nf Outdated
Co-authored-by: Jim Downie <[email protected]>
@jfy133
Member Author

jfy133 commented Jan 15, 2026

With the caveat that I don't have the energy to look at the Python code right now... 🥲

This looks good in general, but my main concern is the issue you raise with --binning_map_mode. I don't understand why you would get different sets of BAM files on resume - that seems concerning to me as I would assume this should be deterministic?

I think this may be due to a brittle implementation of co-binning support that was added a while ago:

bam[0],
bai[0],

I suspect the order of the BAM files in that channel is not stable across resumes when in group or all mode, so different BAM files are used to generate the pyDamage results (and thus result in different contig names).
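
A toy illustration of the suspected failure mode, written in Python as a stand-in for the Nextflow channel logic. The file names are hypothetical, and sorting before indexing is only one possible way to make the choice stable; it is not a confirmed diagnosis or the fix applied in this PR.

```python
def first_bam(bams):
    # Mirrors the `bam[0]` pattern: takes whichever file arrived first.
    return bams[0]


def first_bam_stable(bams):
    # Sorting before indexing makes the choice independent of arrival order.
    return sorted(bams)[0]


# Same set of files, but emitted in a different order on a resumed run.
run_1 = ["sampleB.bam", "sampleA.bam", "sampleC.bam"]
run_2 = ["sampleC.bam", "sampleA.bam", "sampleB.bam"]

print(first_bam(run_1), first_bam(run_2))                # differs between runs
print(first_bam_stable(run_1), first_bam_stable(run_2))  # identical between runs
```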

I don't have the time to look deeper or experiment with this at all right now, and I only noticed it because the default in the test profile was set to 'group' (aDNA people will normally just set it to own).

So it's something I can come back to in the future if it becomes a big problem, but otherwise I really needed this functionality for the common aDNA use case merged 3 months ago 😅

Member

@dialvarezs dialvarezs left a comment

Looks perfect to me!
None of my comments are blocking.

Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread subworkflows/local/binning_pydamage/main.nf Outdated
Comment thread subworkflows/local/binning_pydamage/meta.yaml Outdated
Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread bin/summarise_pydamagebins.py
Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread bin/summarise_pydamagebins.py Outdated
Comment thread subworkflows/local/binning_pydamage/main.nf Outdated
Comment thread subworkflows/local/binning_pydamage/main.nf Outdated
@jfy133 jfy133 merged commit e2d3a89 into dev Jan 16, 2026
19 of 21 checks passed