A comprehensive example demonstrating how to use the BioLM SDK within Nextflow pipelines for protein structure prediction and antibody engineering. This project provides two main workflows:
- intro.nf: Protein structure prediction using ESMFold
- antibody_engineering.nf: Antibody variant generation using AntiFold
# Install BioLM SDK
pip install biolmai
# Install Nextflow (if needed)
curl -s https://get.nextflow.io | bash
Get a BioLM API token:
- Visit BioLM (https://biolm.ai/)
- Sign up and get your API token
- Set it as an environment variable:
export BIOLMAI_TOKEN="your_token_here"
Note: The workflows include built-in token validation and will provide clear error messages if the token is missing or invalid.
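The token lookup can be sketched in Python as follows. resolve_token is a hypothetical helper, not part of the workflows or the biolmai SDK. It assumes the precedence implied by the parameter tables: an explicit --token value first, then the BIOLMAI_TOKEN environment variable. It only checks that a token is present; confirming that a token is valid requires a request to the BioLM API, which is not shown here.

```python
import os


def resolve_token(cli_token=None, env=None):
    """Return the BioLM API token, preferring an explicit --token value.

    Hypothetical helper: not part of biolmai or the Nextflow workflows.
    Falls back to the BIOLMAI_TOKEN environment variable.
    """
    env = os.environ if env is None else env
    token = (cli_token or env.get("BIOLMAI_TOKEN") or "").strip()
    if not token:
        # Fail early with an actionable message instead of a later API error.
        raise ValueError(
            "BioLM API token not found: pass --token or set BIOLMAI_TOKEN "
            '(e.g. export BIOLMAI_TOKEN="your_token_here").'
        )
    return token
```

Checking for the token before any process starts means a missing token is reported once, up front, rather than as a failure inside each parallel task.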
Protein structure prediction (intro.nf):
Option A: Quick Demo (Recommended for first-time users)
nextflow run intro.nf --demo
# Output: results/DEMO.pdb
Option B: Default Example
nextflow run intro.nf
# Output: results/GFP.pdb
Option C: Your Own Sequences
nextflow run intro.nf --input your_proteins.fasta
# Output: results/sequence1.pdb, results/sequence2.pdb, etc.
Antibody engineering (antibody_engineering.nf) quick start:
nextflow run antibody_engineering.nf --num_variants 5
# Output: results/antibody_engineering_summary.html + analysis files
With Custom Parameters:
nextflow run antibody_engineering.nf \
--num_variants 10 \
--sampling_temp 0.8
Viewing results:
# List output files
ls -la results/
# View PDB structure (first few lines)
head -20 results/*.pdb
You can run this workflow directly on Seqera Platform without any local setup:
- Click the Seqera Platform badge
- Sign in to your Seqera Platform account
- Configure parameters:
  - Set your BIOLMAI_TOKEN as an environment variable
  - Choose your input mode (demo, default, or custom FASTA)
  - Configure compute resources
- Launch the workflow
Benefits of Seqera Platform:
- No local installation required
- Scalable cloud compute resources
- Built-in monitoring and visualization
- Easy parameter configuration
- Automatic result management
This project demonstrates two main use cases.
Protein structure prediction (intro.nf):
- Unified Workflow: Single workflow that handles both demo and production use cases
- ESMFold Integration: Using BioLM's ESMFold for protein structure prediction
- Nextflow Orchestration: Parallel processing and workflow management
- PDB Output: Direct extraction of protein structure files
Antibody engineering (antibody_engineering.nf):
- AntiFold Integration: Using BioLM's AntiFold for antibody variant generation
- Multi-Target Processing: Handles EGFR, PDL1, MBP, and IL-7RALPHA targets
- CDR Analysis: Comprehensive analysis of Complementarity-Determining Regions
- Automated PDB Download: Downloads PDB files directly from RCSB
- Rich Reporting: HTML summaries with diversity analysis and statistics
Prerequisites:
- BioLM API Token: Get your token from BioLM
- Python 3.7+: With the BioLM SDK installed
- Nextflow: Version 20.0 or later
intro.nf input modes:
Demo mode (--demo):
- Purpose: Quick test with a hardcoded protein sequence
- Input: Built-in demo sequence (MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG)
- Output: DEMO.pdb
Default mode (no flags):
- Purpose: Standard example using the GFP protein
- Input: Built-in GFP sequence
- Output: GFP.pdb
Custom mode (--input):
- Purpose: Process your own protein sequences
- Input: Your FASTA file
- Output: One .pdb file per sequence in the FASTA file
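A minimal Python sketch of how the three input modes could map to (ID, sequence) records. parse_fasta and select_inputs are hypothetical names, not the workflow's own code. The sketch assumes --demo takes precedence if both --demo and --input are given, and default_record stands in for the built-in GFP entry, whose sequence is not reproduced here. Taking the first whitespace-delimited word of each FASTA header as the ID is a common convention, but it is an assumption about how intro.nf names its outputs.

```python
from pathlib import Path

# Built-in demo sequence, as quoted in this README.
DEMO_SEQUENCE = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"


def parse_fasta(text):
    """Yield (sequence_id, sequence) pairs from FASTA text.

    The ID is the first whitespace-delimited word of each header line
    (an assumption about how the workflow derives output names).
    """
    seq_id, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            words = line[1:].split()
            seq_id = words[0] if words else "unnamed"
            chunks = []
        elif seq_id is not None:
            chunks.append(line.upper())  # text before the first header is ignored
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def select_inputs(demo=False, input_path=None, default_record=None):
    """Return (id, sequence) records for the chosen input mode.

    Hypothetical: --demo is assumed to take precedence over --input, and
    default_record stands in for the built-in GFP entry.
    """
    if demo:
        return [("DEMO", DEMO_SEQUENCE)]
    if input_path is not None:
        return list(parse_fasta(Path(input_path).read_text()))
    if default_record is None:
        raise ValueError("default mode needs a default_record")
    return [default_record]
```

For example, select_inputs(input_path="your_proteins.fasta") would return one record per FASTA entry, and each record would then become one prediction task and one .pdb file.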
What antibody_engineering.nf does:
- Downloads PDB files from RCSB (3c09, 5x8m, 5bjz, 6p67)
- Extracts sequences from heavy, light, and antigen chains
- Generates variants using AntiFold for CDR regions
- Analyzes diversity of generated antibody variants
- Creates reports with statistics and visualizations
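RCSB serves entries in legacy PDB format at https://files.rcsb.org/download/<ID>.pdb. The sketch below uses only the Python standard library to fetch the four entries listed above. rcsb_pdb_url and download_pdb are hypothetical helpers, and the results/pdb folder is an assumed location, not necessarily where the workflow stores its downloads. Very large entries are distributed only in mmCIF format, so a robust version should handle a failed .pdb download.

```python
import urllib.request
from pathlib import Path

# PDB IDs used by antibody_engineering.nf, as listed in this README.
PDB_IDS = ["3c09", "5x8m", "5bjz", "6p67"]


def rcsb_pdb_url(pdb_id):
    """Return the RCSB download URL for an entry in legacy PDB format."""
    return f"https://files.rcsb.org/download/{pdb_id.upper()}.pdb"


def download_pdb(pdb_id, outdir="results/pdb", timeout=30):
    """Fetch one entry into <outdir>/<ID>.pdb and return the local path.

    outdir is a hypothetical location. urlopen raises HTTPError if the
    entry is not offered in .pdb format (e.g. very large structures).
    """
    target_dir = Path(outdir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{pdb_id.upper()}.pdb"
    with urllib.request.urlopen(rcsb_pdb_url(pdb_id), timeout=timeout) as resp:
        target.write_bytes(resp.read())
    return target
```

Looping over PDB_IDS and calling download_pdb on each reproduces the download step; in Nextflow each ID would more naturally be its own task so the downloads run in parallel.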
Outputs:
- PDB files: Downloaded antibody structures
- Sequence files: Extracted chain sequences
- Variant files: Generated antibody variants
- Analysis files: CDR diversity analysis
- HTML report: Comprehensive summary with statistics
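Chain sequences can be recovered from a downloaded PDB file by reading the CA atom of each residue. This is a hypothetical sketch using the fixed-column ATOM record layout, not the workflow's own extraction code. It reads only the first model. Which chain is heavy, light, or antigen differs from entry to entry, so that mapping is left to the caller.

```python
# Standard three-letter to one-letter amino-acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequences(pdb_text):
    """Return {chain_id: sequence} from the CA atoms of ATOM records.

    Fixed PDB columns (0-based slices): atom name [12:16], residue name
    [17:20], chain ID [21], residue number plus insertion code [22:27].
    Only the first model is read; unknown residue names become "X".
    """
    chains = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model of multi-model files
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        residue_key = (chain_id, line[22:27])
        if residue_key in seen:
            continue  # alternate location of a residue already recorded
        seen.add(residue_key)
        residue = THREE_TO_ONE.get(line[17:20].strip(), "X")
        chains.setdefault(chain_id, []).append(residue)
    return {cid: "".join(res) for cid, res in chains.items()}
```

Reading ATOM records returns only residues resolved in the crystal structure. The SEQRES records instead list the full deposited sequence, including disordered residues, so the choice depends on whether downstream steps need to match the 3D coordinates.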
Project structure:
nf-biolm/
├── intro.nf # Protein structure prediction workflow
├── antibody_engineering.nf # Antibody engineering workflow
├── antibody_engineering_test.nf # Test version (mock data)
├── nextflow.config # Configuration
├── requirements.txt # Python dependencies
├── tower.yml # Seqera Platform configuration
├── LICENSE # MIT License
├── README.md # This file
├── results/ # Output directory
│ ├── *.pdb # Protein structure files
│ ├── analysis/ # Antibody analysis results
│ ├── sequences/ # Extracted sequences
│ ├── variants/ # Generated variants
│ └── *.html # Summary reports
└── work/ # Nextflow work directory
intro.nf parameters:

| Parameter | Description | Default |
|---|---|---|
| --token | BioLM API token | $BIOLMAI_TOKEN env var |
| --input | Input FASTA file | None (uses the built-in GFP sequence) |
| --demo | Run in demo mode | false |
| --outdir | Output directory | results |

antibody_engineering.nf parameters:

| Parameter | Description | Default |
|---|---|---|
| --token | BioLM API token | $BIOLMAI_TOKEN env var |
| --num_variants | Number of variants per target | 100 |
| --sampling_temp | Sampling temperature for generation | 0.8 |
| --outdir | Output directory | results |

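To build intuition for --sampling_temp: the conventional meaning of a sampling temperature is that the model's logits are divided by it before the softmax. This README does not describe AntiFold's sampling internals, so treat the sketch below as the generic technique rather than AntiFold's implementation; temperature_softmax is a hypothetical name.

```python
import math


def temperature_softmax(logits, temperature=0.8):
    """Convert logits to probabilities after dividing by a temperature.

    Temperatures below 1 sharpen the distribution; above 1 flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]
```

With logits [2.0, 1.0, 0.0], the top entry receives a larger share of probability at temperature 0.5 than at 2.0. Under this interpretation, lower --sampling_temp values would favor variants closer to the model's preferred residues, and higher values would produce more diverse variants.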
intro.nf output: the workflow writes PDB structure files directly.
- Format: Standard PDB format
- Location: results/ directory (set with --outdir)
- Naming: Based on the sequence ID from the FASTA header
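The PDB extraction step can be sketched as follows. The sketch assumes each prediction result is a dictionary holding the structure as a string under a "pdb" key; that key name is an assumption, so check the actual ESMFold response before relying on it. safe_stem is a hypothetical rule for turning FASTA IDs such as sp|P42212|GFP_AEQVI into file names; the workflow may name files differently.

```python
import re
from pathlib import Path


def extract_pdb(result, key="pdb"):
    """Return the PDB text from one prediction result.

    Assumes a dict holding the structure under `key`; "pdb" is a guess,
    so check the real ESMFold response and adjust.
    """
    pdb_text = result.get(key) if isinstance(result, dict) else None
    if not pdb_text:
        raise KeyError(f"no {key!r} field in prediction result")
    return pdb_text


def safe_stem(seq_id):
    """Turn a FASTA ID into a file-name stem (hypothetical naming rule)."""
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", seq_id).strip("._")
    return stem or "sequence"


def write_pdb(seq_id, pdb_text, outdir="results"):
    """Write one structure to <outdir>/<stem>.pdb and return the path."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{safe_stem(seq_id)}.pdb"
    path.write_text(pdb_text if pdb_text.endswith("\n") else pdb_text + "\n")
    return path
```

Writing only the extracted structure, rather than the full API response, is what keeps the output directory limited to .pdb files.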
antibody_engineering.nf output: the workflow produces the following analysis results:
- PDB files: Downloaded antibody structures
- Sequence files: Extracted chain sequences (JSON format)
- Variant files: Generated antibody variants (JSON format)
- Analysis files: CDR diversity analysis (JSON + CSV)
- HTML report: Interactive summary with statistics and visualizations
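This README does not specify which diversity metrics the workflow computes. Per-position Shannon entropy and mean pairwise Hamming distance are two common choices, shown here as an illustration with hypothetical function names. The sketch assumes the variants are equal-length sequences, as they are when CDR residues are redesigned in place, and writes the per-position values as CSV text.

```python
import csv
import io
import math
from collections import Counter
from itertools import combinations


def position_entropy(seqs):
    """Shannon entropy (bits) at each position of equal-length sequences."""
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need a non-empty set of equal-length sequences")
    n = len(seqs)
    entropies = []
    for column in zip(*seqs):
        counts = Counter(column)
        # sum of p * log2(1/p); a fully conserved column gives 0.0
        entropies.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return entropies


def mean_pairwise_hamming(seqs):
    """Average number of differing positions over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)


def entropy_csv(seqs):
    """Render per-position entropy as CSV text (positions are 1-based)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["position", "entropy_bits"])
    for i, h in enumerate(position_entropy(seqs), start=1):
        writer.writerow([i, f"{h:.3f}"])
    return buf.getvalue()
```

Entropy highlights which positions vary across variants, while the mean Hamming distance summarizes how far apart the variants are overall; reporting both gives a per-position and a global view of diversity.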
intro.nf highlights:
- PDB Files: Standard protein structure files ready for visualization
- Clean Output: No intermediate JSON files, just the structures you need
- Scalable: Can process single sequences or entire FASTA files
antibody_engineering.nf highlights:
- Multi-Target Analysis: EGFR, PDL1, MBP, and IL-7RALPHA antibody targets
- CDR Diversity: Comprehensive analysis of antibody diversity
- Professional Reports: HTML summaries with statistics and visualizations
- Complete Pipeline: From PDB download to final analysis
- Real BioLM Integration: Uses actual AntiFold API for variant generation
Next steps:
- Visualize: Open PDB files in tools like PyMOL, Chimera, or online viewers
- Customize: Modify the workflow for your specific needs
- Scale Up: Process larger datasets with more compute resources
- Analyze Results: Review the HTML summary and diversity analysis
- Explore Variants: Examine generated antibody sequences and their properties
- Customize Targets: Modify the workflow to work with your own antibody targets
- Production Use: Scale up for large-scale antibody engineering projects
Troubleshooting:
- API Token Issues: Ensure BIOLMAI_TOKEN is set correctly. The workflows validate the token and print a clear error message if it is missing.
- Import Error: Run pip install biolmai
- Workflow errors: Check the .nextflow.log file for details
- API Rate Limits: BioLM enforces rate limits; wait between requests if needed
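If you script calls to BioLM yourself, a retry loop with exponential backoff is a common way to cope with rate limits. This is a generic, hypothetical sketch, not something the biolmai SDK or the workflows are documented here to provide. retry_on defaults to any exception; in practice you would narrow it to the error your client raises when rate-limited (for example on HTTP 429), which this README does not specify.

```python
import random
import time


def call_with_backoff(func, *args, max_attempts=5, base_delay=1.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func, retrying with exponential backoff plus jitter on failure.

    Before jitter, delays grow as base_delay * 2**attempt (1 s, 2 s, 4 s, ...).
    Each delay is scaled by a random factor in [0.5, 1.0) so that parallel
    tasks do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) * (0.5 + random.random() / 2)
            sleep(delay)
```

The sleep parameter is injectable so the retry logic can be tested without real waiting. Jitter matters in Nextflow because many tasks may hit the rate limit at the same moment.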
The workflow can be easily customized by:
- Modifying sequences: Edit the hardcoded sequences in the workflow
- Adding parameters: Extend the parameter list for additional options
- Changing output format: Modify the PDB extraction logic
Resources:
- Blog Post: Scaling BioLM Workflows with Nextflow: From Notebooks to Production Pipelines - Learn more about integrating BioLM with Nextflow workflows
- BioLM Documentation: https://biolm.ai/ - Official BioLM platform and API documentation
- Nextflow Documentation: https://www.nextflow.io/ - Nextflow workflow framework documentation
- Seqera Platform: https://cloud.seqera.io/ - Cloud-native platform for running Nextflow workflows