Skip to content

Add nf-test barebone configs#855

Merged
jfy133 merged 28 commits intonf-core:devfrom
dialvarezs:nf-test-configs
Sep 17, 2025
Merged

Add nf-test barebone configs#855
jfy133 merged 28 commits intonf-core:devfrom
dialvarezs:nf-test-configs

Conversation

@dialvarezs
Copy link
Copy Markdown
Member

@dialvarezs dialvarezs commented Aug 25, 2025

According to #501 (comment), I added configs for the following profiles:

  • longreadonly
  • longreadonly_alternatives
  • hybrid
  • assembly_input

I also removed old configs.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@dialvarezs dialvarezs changed the title nf-test barebones configs nf-test barebone configs Aug 25, 2025
@dialvarezs dialvarezs changed the title nf-test barebone configs Add nf-test barebone configs Aug 25, 2025
Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM so far, except for that single comment below.

I will now try to run each test separately and check all the expected tools execute

Comment thread conf/test_assembly_input.config Outdated
@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 5, 2025

Example of how I'm testing:

nextflow run ../main.nf -profile test_longreadonly,apptainer --outdir ./results
grep 'Submitted process' .nextflow.log | cut -d ' ' -f 13 | sort | uniq | rev | cut -d ':' -f 1 | rev | sort
## and
cat results/pipeline_info/nf_core_mag_software_mqc_versions.yml 

Compare list to #501

(having to switch to run on a cluster now unfortuantely)

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 5, 2025

  • ✅ longreadonly: has all expected processes executed, and versions represented
  • ✅ longreadonly_alternatives: has all expected processes executed, and versions represented
  • ✅ hybrid: has all expected processes executed, and versions represented, including the METASPADESHYBRID variant of SPAdes
  • ❓ assembly_input:
    • At testing, missing GUNC (but see latest commit)
    • Missing MMSEQS: this runs if you specify a mmseqs db name via --metaeuk_mmseqs_db Kalamari, the smallest to my knowledge is kalamari, maybe we can try that?

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 8, 2025

Yeah, so adding the Kalamari then executiosn MMSEQS_DATABASES 👍

@dialvarezs
Copy link
Copy Markdown
Member Author

Even Kalamari seems to be too big for the CI. It stopped after 1h40min.

@prototaxites
Copy link
Copy Markdown
Contributor

prototaxites commented Sep 12, 2025

Can you just pass a FASTA to metaeuk_db for testing? The MetaEuk test uses params.modules_testdata_base_path + 'proteomics/database/yeast_UPS.fasta

https://github.com/nf-core/modules/blob/e4760cac1fcad979e19c00da5f10573be91358a7/modules/nf-core/metaeuk/easypredict/tests/main.nf.test#L4

@prototaxites
Copy link
Copy Markdown
Contributor

Possibly also tuning the metaeuk args:

-s: sensitivity, 1 is fastest
--max-seqs: Maximum results per query passing prefilter - default is 300, maybe try 50?

There's a bunch of other options that might be worth setting, too - see metaeuk easy-predict for the list.

@dialvarezs
Copy link
Copy Markdown
Member Author

dialvarezs commented Sep 12, 2025

@prototaxites Yes, we were using that until now, but we decided to try a full db. I will try tunning the parameters, thanks for the suggestion!

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 13, 2025

Does giving the FASTA still execute the MMSEQS module though (my main concern)?

@dialvarezs
Copy link
Copy Markdown
Member Author

@jfy133 No, it doesn't. The fasta is used directly by METAEUK_EASYPREDICT.
I tried several options to metauk to try to reduce the runtime, but didn't work.
I reverted the parameter to use the FASTA for now.

@prototaxites
Copy link
Copy Markdown
Contributor

prototaxites commented Sep 13, 2025

If you're open to a slight hack to test the process, we could modify the code so if metaeuk_mmseqs_db is set it always downloads a DB, even if skip_metaeuk is true. That way we can test downloading without running the test with the db...

edit: Or alternatively, as that might not be desirable - set task.ext.when = false in a test config for MetaEuk... Future-proof for as long as when: still supported.

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 15, 2025

OK so I think we can say:

  • GUNC is broken, so we have to skip for now.
  • MetaEuk runs too long if we use Kalamari (so we just have to skip testing MMSEqs). At least we have the main tool.

Otherwise we have to rely on user reports.

@prototaxites
Copy link
Copy Markdown
Contributor

Kalamari is not actually the smallest MMSeqs2 database available - UniProtKB/Swiss-Prot is significantly smaller. The actual sequence files for SwissProt come to just 64Mb... the rest of the data is taxonomy, which I suspect will not significantly impact run time. Kalamari requires 633Mb of sequence files.

So it might be worth trying one last time with Swissprot...!

❱ du -h --max-depth 1 .                                                                                                                                                                                                                                               
611M	./swissprot
964M	./kalamari

❱ ll swissprot/                                                                                                                                                                                                                                                       
total 611M
-rw-r--r-- 1 jd42 team311 200M Sep 15 11:41 swissprot
-rw-r--r-- 1 jd42 team311    4 Sep 15 11:41 swissprot.dbtype
-rw-r--r-- 1 jd42 team311  72M Sep 15 11:41 swissprot_h
-rw-r--r-- 1 jd42 team311    4 Sep 15 11:41 swissprot_h.dbtype
-rw-r--r-- 1 jd42 team311  11M Sep 15 11:41 swissprot_h.index
-rw-r--r-- 1 jd42 team311  12M Sep 15 11:41 swissprot.index
-rw-r--r-- 1 jd42 team311 8.7M Sep 15 11:41 swissprot.lookup
-rw-r--r-- 1 jd42 team311 7.3M Sep 15 11:41 swissprot_mapping
-rw-r--r-- 1 jd42 team311   25 Sep 15 11:41 swissprot.source
-rw-r--r-- 1 jd42 team311 710M Sep 15 11:41 swissprot_taxonomy
-rw-r--r-- 1 jd42 team311  151 Sep 15 11:41 swissprot.version

Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I pushed one minor fix (easy to edit in GitHub)

@dialvarezs if you could re-run the snapshots to fix that one null version in the snapshots for ADJUST_MAXBIN_EXT (codespaces wouldn't allow me to push 😢 ), we are good to go for this PR...

and we can open the release PR for 5.0.0!

(Which is PERFECT timing, as this is something I can report in a grant progress update in 2 weeks 😅)

Comment thread CHANGELOG.md Outdated
Comment thread conf/test_assembly_input.config
@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 15, 2025

Kalamari is not actually the smallest MMSeqs2 database available - UniProtKB/Swiss-Prot is significantly smaller. The actual sequence files for SwissProt come to just 64Mb... the rest of the data is taxonomy, which I suspect will not significantly impact run time. Kalamari requires 633Mb of sequence files.

So it might be worth trying one last time with Swissprot...!

❱ du -h --max-depth 1 .                                                                                                                                                                                                                                               
611M	./swissprot
964M	./kalamari

❱ ll swissprot/                                                                                                                                                                                                                                                       
total 611M
-rw-r--r-- 1 jd42 team311 200M Sep 15 11:41 swissprot
-rw-r--r-- 1 jd42 team311    4 Sep 15 11:41 swissprot.dbtype
-rw-r--r-- 1 jd42 team311  72M Sep 15 11:41 swissprot_h
-rw-r--r-- 1 jd42 team311    4 Sep 15 11:41 swissprot_h.dbtype
-rw-r--r-- 1 jd42 team311  11M Sep 15 11:41 swissprot_h.index
-rw-r--r-- 1 jd42 team311  12M Sep 15 11:41 swissprot.index
-rw-r--r-- 1 jd42 team311 8.7M Sep 15 11:41 swissprot.lookup
-rw-r--r-- 1 jd42 team311 7.3M Sep 15 11:41 swissprot_mapping
-rw-r--r-- 1 jd42 team311   25 Sep 15 11:41 swissprot.source
-rw-r--r-- 1 jd42 team311 710M Sep 15 11:41 swissprot_taxonomy
-rw-r--r-- 1 jd42 team311  151 Sep 15 11:41 swissprot.version

Oh wow! OK ! Yeah might be worht trying!

@dialvarezs
Copy link
Copy Markdown
Member Author

Ok, I managed to get it running. Using swissprot definitely helped a bit but not enough. What really did the trick was to reduce the number of bins ran with metaeuk setting postbinning_input to refined_bins_only.

Copy link
Copy Markdown
Member

@jfy133 jfy133 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoop whoops! finally!

@jfy133
Copy link
Copy Markdown
Member

jfy133 commented Sep 17, 2025

Thank you @dialvarezs ! Merging!

@jfy133 jfy133 merged commit 85c1a4b into nf-core:dev Sep 17, 2025
20 checks passed
@jfy133 jfy133 mentioned this pull request Sep 17, 2025
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants