Ale module #931

Merged
dialvarezs merged 29 commits into nf-core:dev from PetcuBogdan:ale_module
Jan 22, 2026

@PetcuBogdan
Contributor

@PetcuBogdan PetcuBogdan commented Nov 16, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Description

This PR adds ALE (Assembly Likelihood Estimator) for assembly quality control in nf-core/mag.
ALE is a probabilistic framework that evaluates assembly quality by computing the likelihood of sequencing reads given an assembly. It provides per-contig quality scores useful for identifying misassemblies, comparing assemblies, and validating quality before binning.

Changes made

Workflow:

  • Added ALE analysis for short-read assemblies (SPAdes, MEGAHIT)
  • Runs when binning or ancient DNA analysis is enabled
  • Reuses existing BAM files from binning preparation
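
As a rough illustration of the workflow behaviour listed above, the sketch below models the gating condition and how a BAM from binning preparation is paired with its assembly to form one ALE call. It is written in Python only for illustration and is not the pipeline's Nextflow code. The names `should_run_ale`, `binning_enabled`, `ancient_dna_enabled` and `ale_command` are hypothetical; the argument order (BAM, assembly FASTA, output file) is taken from the ALE command shown in the CI log on this page.

```python
from typing import List, Sequence


def should_run_ale(binning_enabled: bool, ancient_dna_enabled: bool) -> bool:
    """Hypothetical gate: ALE runs when binning or ancient DNA analysis is enabled."""
    return binning_enabled or ancient_dna_enabled


def ale_command(bam: str, assembly_fasta: str, prefix: str,
                extra_args: Sequence[str] = ()) -> List[str]:
    """Build an ALE argument list in the order seen in the CI log:

    ALE [extra args] <reads.bam> <assembly.fa> <prefix>_ALEoutput.txt
    """
    return ["ALE", *extra_args, bam, assembly_fasta, f"{prefix}_ALEoutput.txt"]


# Example: the BAM produced for binning is reused as ALE's read input.
if should_run_ale(binning_enabled=True, ancient_dna_enabled=False):
    cmd = ale_command("SPAdesHybrid-minigut-minigut.bam",
                      "SPAdesHybrid-minigut.scaffolds.fa",
                      "minigut")
    print(" ".join(cmd))
    # ALE SPAdesHybrid-minigut-minigut.bam SPAdesHybrid-minigut.scaffolds.fa minigut_ALEoutput.txt
```

Reusing the existing BAM means no extra read mapping step is needed; ALE only adds one process per assembly/sample pair.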

References

@jfy133
Member

jfy133 commented Nov 16, 2025

@nf-core-bot fix linting

@PetcuBogdan
Contributor Author

Please let me know what I can improve. Thank you!

Collaborator

@d4straub d4straub left a comment

CI tests fail with various errors. https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896302?pr=931 fails with:

    > ERROR ~ Error executing process > 'NFCORE_MAG:MAG:ALE (minigut)'
    > 
    > Caused by:
    >   Process `NFCORE_MAG:MAG:ALE (minigut)` terminated with an error exit status (134)
    > 
    > 
    > Command executed:
    > 
    >   ALE \
    >        \
    >       SPAdesHybrid-minigut-minigut.bam \
    >       SPAdesHybrid-minigut.scaffolds.fa \
    >       minigut_ALEoutput.txt
    >   
    >   cat <<-END_VERSIONS > versions.yml
    >   "NFCORE_MAG:MAG:ALE":
    >       ale: 20180904
    >   END_VERSIONS
    > 
    > Command exit status:
    >   134
    > 
    > Command output:
    >   BAM file: SPAdesHybrid-minigut-minigut.bam
    >   Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
    >   ALE Output file: minigut_ALEoutput.txt
    >   Reading in assembly...
    >   Reading in the map and computing statistics...
    >   Insert length and std not given, will be calculated from input map.
    >   Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
    >   Found FR sample insert length std to be 69.336488
    >   Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
    >   Found NOT_PROPER_FR sample insert length std to be 221.969163
    >   There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
    >   Saved library parameters to minigut_ALEoutput.txt.param
    >   Computing read placements and depths
    > 
    > Command error:
    >   WARNING: The following read and its mate do not agree on the contigs and/or positions of their mappings:read1: NC_006347.1_4981 81: 0 0 106315 105875	read2: NC_006347.1_4981 161: 0 0 105578 106537	l: 1.000000 li: 1.000000, s1: 106315, s2: 105875, e1: 106441, e2: -1, c1: 0, c2: 0, NC_006347.1_4981, NOT_PROPER_FR, 0, b1: 34e7c540, b2: 0
    >   ALE: ALElike.c:1892: validateAlignmentMates: Assertion `thisAlignment->start2 == thisReadMate->core.pos' failed.
    >   BAM file: SPAdesHybrid-minigut-minigut.bam
    >   Assembly fasta file: SPAdesHybrid-minigut.scaffolds.fa
    >   ALE Output file: minigut_ALEoutput.txt
    >   Reading in assembly...
    >   Reading in the map and computing statistics...
    >   Insert length and std not given, will be calculated from input map.
    >   Found FR sample avg insert length to be 383.864169 from 28344 mapped reads
    >   Found FR sample insert length std to be 69.336488
    >   Found NOT_PROPER_FR sample avg insert length to be 892.122675 from 66297 mapped reads
    >   Found NOT_PROPER_FR sample insert length std to be 221.969163
    >   There were 99620 total reads, 99620 paired (97898 properly mated), 763 proper singles, 959 improper reads (818 chimeric). (83 reads were unmapped)
    >   Saved library parameters to minigut_ALEoutput.txt.param
    >   Computing read placements and depths
    >   .command.sh: line 6:    34 Aborted                 (core dumped) ALE SPAdesHybrid-minigut-minigut.bam SPAdesHybrid-minigut.scaffolds.fa minigut_ALEoutput.txt
    > 
    > Work dir:
    >   /home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/work/f4/88d2098735b2dd029b42b1a840ced7
    > 
    > Container:
    >   quay.io/biocontainers/ale:20180904--py27ha92aebf_0
    > 
    > Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
    > 
    >  -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
    > ERROR ~ Could not find which method load() to invoke from this list:
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.InputStream)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.Reader)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.lang.String)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.io.File)
    >   public java.lang.Object org.yaml.snakeyaml.Yaml#load(java.nio.file.Path)
    > 
    >  -- Check script '/home/runner/_work/mag/mag/subworkflows/nf-core/utils_nfcore_pipeline/main.nf' at line: 82 or see '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for more details
    > ERROR ~ Pipeline failed. Please refer to troubleshooting docs: https://nf-co.re/docs/usage/troubleshooting
    > 
    >  -- Check '/home/runner/_work/mag/mag/~/tests/b1878932db1a90503becf8394b4ddfd4/meta/nextflow.log' file for details
    > -[nf-core/mag] Pipeline completed with errors-
    > WARN: Killing running tasks (1)
    FAILED (481.488s)

Additionally, test https://github.com/nf-core/mag/actions/runs/19404507607/job/55538896283?pr=931 indicates that ALE is run but output files are not published to the results folder:

    2     {                                 2     {                            
    3         "ADJUST_MAXBIN2_EXT": {       3         "ADJUST_MAXBIN2_EXT": {  
    4             "coreutils": 9.5          4             "coreutils": 9.5     
                                        +   5         },                       
                                        +   6         "ALE": {                 
                                        +   7             "ale": 20180904      
    5         },                            8         },                       
    6         "BIN_SUMMARY": {              9         "BIN_SUMMARY": {         
    7             "pandas": "1.4.3",       10             "pandas": "1.4.3",  
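
One way to catch this kind of gap is to check that the files ALE writes actually appear among the published results. The sketch below is a hypothetical helper, not part of nf-core/mag or nf-test. It assumes the expected outputs are `<prefix>_ALEoutput.txt` and `<prefix>_ALEoutput.txt.param`, the two file names ALE reports writing in the CI log above, and it takes a list of published paths as input.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def expected_ale_outputs(prefix: str) -> List[str]:
    """File names ALE reported writing in the CI log (main output + .param)."""
    main = f"{prefix}_ALEoutput.txt"
    return [main, f"{main}.param"]


def missing_ale_outputs(published_paths: Iterable[str],
                        prefixes: Iterable[str]) -> List[str]:
    """Return expected ALE file names that are absent from the published paths.

    Only base names are compared, so the (unknown) results sub-directory
    layout does not matter.
    """
    published_names = {PurePosixPath(p).name for p in published_paths}
    return [
        name
        for prefix in prefixes
        for name in expected_ale_outputs(prefix)
        if name not in published_names
    ]


# Hypothetical listing in which nothing from ALE was published.
published = ["results/some_other_output.txt"]
print(missing_ale_outputs(published, ["minigut"]))
# ['minigut_ALEoutput.txt', 'minigut_ALEoutput.txt.param']
```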

Comment thread docs/usage.md Outdated
Comment thread workflows/mag.nf Outdated
Comment thread workflows/mag.nf Outdated
Comment thread nextflow_schema.json Outdated
PetcuBogdan and others added 2 commits November 18, 2025 20:39
Contributor

@prototaxites prototaxites left a comment

Hi @PetcuBogdan, a few thoughts from me!

Comment thread conf/modules.config Outdated
Comment thread conf/modules.config Outdated
Comment thread docs/output.md Outdated
Comment thread workflows/mag.nf Outdated
Comment thread workflows/mag.nf Outdated
@PetcuBogdan
Contributor Author

Hi! Thanks a lot for pointing this out, and apologies for the slow reply — I needed some time to investigate it properly.

This issue is actually a known ALE bug that appears when running on the small synthetic BAM/FASTA files used in the nf-core test datasets. The error does not occur with real metagenomic data (e.g., the CAPES samples I used to validate the module locally), which is why the process completes normally outside the test environment.

One possible path forward would be to update the ALE module to catch this specific failure mode and surface it as a warning or a logged message rather than causing the process to hang.

Please let me know if there is anything else I can do. Thank you!
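
A minimal sketch of that idea, assuming a wrapper can see the exit status and standard error of the ALE call. It downgrades only the exact failure seen in CI (exit status 134, which the shell reports for SIGABRT as 128 + 6, together with the validateAlignmentMates assertion) and still reports every other non-zero exit as an error, which limits the risk of hiding an unrelated bug. The function name and the returned labels are hypothetical.

```python
ALE_ABORT_EXIT = 134  # 128 + SIGABRT (6), as reported in the CI log
KNOWN_ASSERTION = "validateAlignmentMates: Assertion"


def classify_ale_result(exit_status: int, stderr: str) -> str:
    """Label an ALE run as 'ok', 'known_mate_assertion' or 'error'.

    Only the specific mate-validation abort is downgraded; any other
    non-zero exit status is still treated as an error.
    """
    if exit_status == 0:
        return "ok"
    if exit_status == ALE_ABORT_EXIT and KNOWN_ASSERTION in stderr:
        return "known_mate_assertion"
    return "error"


ci_stderr = ("ALE: ALElike.c:1892: validateAlignmentMates: Assertion "
             "`thisAlignment->start2 == thisReadMate->core.pos' failed.")
print(classify_ale_result(134, ci_stderr))  # known_mate_assertion
print(classify_ale_result(1, ci_stderr))    # error
```

Matching on both the exit status and the assertion text keeps the check narrow: a segfault or a different assertion would still fail the task.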

@jfy133
Member

jfy133 commented Dec 1, 2025

One possible path forward would be to update the ALE module to catch this specific failure mode and surface it as a warning or a logged message rather than causing the process to hang.

@PetcuBogdan do you know exactly what the bug is, though? We can theoretically try to work around it, but it would be good to know the cause before we do that, as workarounds can sometimes be dangerous (they can hide a bigger, different bug).

@PetcuBogdan
Contributor Author

Hi! The ALE crash appears to come from its strict validateAlignmentMates check: some read pairs in the test dataset have inconsistent mate coordinates or orientations, and ALE aborts when it encounters them. The issue shows up only on the test data; on the real CAPES dataset ALE runs normally, so the pipeline itself is fine.

To investigate this further I could also use samtools to filter the input BAM and remove problematic pairs before running ALE, so that we can confirm whether the error is strictly caused by these inconsistent alignments.

Let me know if you want me to try that.
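
The check ALE performs can be approximated to locate the offending pairs before choosing a workaround. The sketch below is not ALE's code. It assumes simplified alignment records with the core fields printed in the warning, reading the four numbers after each flag as reference id, mate reference id, 0-based position and mate position (consistent with s1/s2 in the same line). A pair is kept only when each read's recorded mate coordinates match the other read's actual coordinates; every other read name is dropped. The `Alignment` type and the function names are hypothetical; on real BAM files this filtering would be done with samtools or a BAM library.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

READ1 = 0x40  # SAM flag: first segment in the template
READ2 = 0x80  # SAM flag: last segment in the template


@dataclass(frozen=True)
class Alignment:
    """Hypothetical, simplified alignment record (0-based coordinates)."""
    name: str
    flag: int
    tid: int   # reference id of this read
    mtid: int  # reference id recorded for the mate
    pos: int   # position of this read
    mpos: int  # position recorded for the mate


def mates_agree(a: Alignment, b: Alignment) -> bool:
    """True if each record's mate fields point at the other record."""
    return (a.mtid == b.tid and a.mpos == b.pos
            and b.mtid == a.tid and b.mpos == a.pos)


def split_by_mate_consistency(
    records: Iterable[Alignment],
) -> Tuple[List[Alignment], List[str]]:
    """Keep consistent pairs; return (kept records, dropped read names).

    A name is kept only if it has exactly two records, one read-1 and one
    read-2, whose mate coordinates agree; anything else is dropped.
    """
    by_name: Dict[str, List[Alignment]] = {}
    for rec in records:
        by_name.setdefault(rec.name, []).append(rec)

    kept: List[Alignment] = []
    dropped: List[str] = []
    for name, recs in by_name.items():
        first = [r for r in recs if r.flag & READ1]
        second = [r for r in recs if r.flag & READ2]
        if (len(recs) == 2 and len(first) == 1 and len(second) == 1
                and mates_agree(first[0], second[0])):
            kept.extend(recs)
        else:
            dropped.append(name)
    return kept, dropped


# The pair from the CI warning: read 1 records its mate at 105875,
# but read 2 actually starts at 105578, so the pair is dropped.
ci_pair = [
    Alignment("NC_006347.1_4981", 81, 0, 0, 106315, 105875),
    Alignment("NC_006347.1_4981", 161, 0, 0, 105578, 106537),
]
kept, dropped = split_by_mate_consistency(ci_pair)
print(dropped)  # ['NC_006347.1_4981']
```

An alternative worth testing is `samtools fixmate`, which fills in mate coordinates and mate-related flags from name-collated input; it is untested whether that alone avoids the abort on the test data.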

@PetcuBogdan
Contributor Author

I also noticed that there is a newer ale-core package available, which seems to be a more up-to-date and actively maintained version of ALE. It might handle mate-pair inconsistencies better, or at least fail more gracefully. If updating the ALE module in nf-core/modules is possible, switching to ale-core could potentially avoid this crash.

Link to ale-core package https://anaconda.org/channels/bioconda/packages/ale-core/overview

@PetcuBogdan
Contributor Author

@jfy133 I got a similar error using the newer ale-core package (from 2020) instead of the older one (from 2018). This is the error:

    Command error:
      WARNING: The following read and its mate do not agree on the contigs and/or positions of their mappings:read1: NC_000913.3_9229 147: 65 65 1104 756 read2: NC_000913.3_9229 97: 65 65 243 1328
      l: 1.000000 li: 1.000000, s1: 1104, s2: 756, e1: 1230, e2: -1, c1: 65, c2: 65, NC_000913.3_9229, FR, (nil), b1: 0x55908300d6c0, b2: (nil)
      ALE: ALElike.c:1892: validateAlignmentMates: Assertion `thisAlignment->start2 == thisReadMate->core.pos' failed.
      .command.sh: line 5:     7 Aborted                 (core dumped) ALE MEGAHIT-sample1-sample1.bam MEGAHIT-sample1.contigs.fa sample1_ALEoutput.txt

@nf-core-bot
Member

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.4.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

Comment thread conf/modules.config
Member

@dialvarezs dialvarezs left a comment

Some comments from me.

BTW, I updated the changelog to move the entry to the current dev section.

Comment thread conf/modules.config Outdated
Comment thread docs/usage.md Outdated
Comment thread workflows/mag.nf Outdated
Member

@jfy133 jfy133 left a comment

Approving, as the comments are not necessarily blocking (the second one is worrying, though).

The README.md update is missing; otherwise, one question and one suggestion.

Comment thread workflows/mag.nf Outdated
Comment thread conf/test.config
@dialvarezs dialvarezs merged commit ee13a46 into nf-core:dev Jan 22, 2026
37 of 39 checks passed
@jfy133
Member

jfy133 commented Jan 23, 2026

Thanks @dialvarezs, and sorry this took so long! And thanks for contributing, @PetcuBogdan (and @amizeranschi for coordinating!)

6 participants