Run on a Cluster#
This guide explains how to execute oxo-flow workflows on HPC clusters using SLURM, PBS, SGE, and LSF backends.
Overview#
oxo-flow's cluster module translates each rule into a cluster job submission. Resource requirements declared in the .oxoflow file (threads, memory, gpu, disk, time_limit) are mapped to the appropriate scheduler directives.
Environment wrapping is applied automatically: commands that use conda, Docker, Singularity, pixi, venv, or environment modules are wrapped accordingly in the generated scripts.
Supported Schedulers#
| Scheduler | Status | Directive prefix |
|---|---|---|
| SLURM | Supported | #SBATCH |
| PBS/Torque | Supported | #PBS |
| SGE | Supported | #$ |
| LSF | Supported | #BSUB |
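As a rough illustration of how the prefixes in the table combine with scheduler flags, the sketch below renders the job-name line for each backend. The helper name job_name_directive is hypothetical and not part of oxo-flow; the SLURM, PBS, and SGE forms match the generated scripts shown later in this guide, while the LSF form (bsub's -J flag) is an assumption, since this guide shows no LSF script.

```python
# Directive prefixes taken from the Supported Schedulers table.
DIRECTIVE_PREFIX = {
    "slurm": "#SBATCH",
    "pbs": "#PBS",
    "sge": "#$",
    "lsf": "#BSUB",
}

# Job-name flags of each scheduler's submit command.
# The LSF entry is an assumption: the guide has no LSF example script.
JOB_NAME_FLAG = {
    "slurm": "--job-name={name}",
    "pbs": "-N {name}",
    "sge": "-N {name}",
    "lsf": "-J {name}",
}


def job_name_directive(scheduler: str, name: str) -> str:
    """Return the script line that names a job (hypothetical helper)."""
    key = scheduler.lower()
    if key not in DIRECTIVE_PREFIX:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return f"{DIRECTIVE_PREFIX[key]} {JOB_NAME_FLAG[key].format(name=name)}"


if __name__ == "__main__":
    for sched in ("slurm", "pbs", "sge", "lsf"):
        print(job_name_directive(sched, "align"))
```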
Declaring Resources#
Set resource requirements per rule:
[[rules]]
name = "align"
input = ["{sample}_R1.fastq.gz"]
output = ["aligned/{sample}.bam"]
threads = 16
memory = "32G"
environment = { singularity = "docker://biocontainers/bwa:0.7.17" }
shell = "bwa mem -t {threads} ref.fa {input} | samtools sort -o {output}"
[rules.resources]
gpu = 0
disk = "100G"
time_limit = "24h"
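The placeholders in the rule above ({sample}, {threads}, {input}, {output}) are filled in before the command runs. The sketch below shows one way such a substitution could work, using Python's str.format; the expand helper and the sample name sample1 are illustrative assumptions, not oxo-flow's actual implementation.

```python
def expand(template: str, **values) -> str:
    """Fill {name} placeholders; list values are joined with spaces."""
    rendered = {
        key: " ".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in values.items()
    }
    return template.format(**rendered)


# Resolve the {sample} wildcard first, then the command placeholders.
sample = "sample1"  # hypothetical sample name
inputs = [expand("{sample}_R1.fastq.gz", sample=sample)]
outputs = [expand("aligned/{sample}.bam", sample=sample)]

command = expand(
    "bwa mem -t {threads} ref.fa {input} | samtools sort -o {output}",
    threads=16,
    input=inputs,
    output=outputs,
)
print(command)
```

Under these assumptions the result is the same command line that appears in the generated scheduler scripts later in this guide.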
Resource fields#
| Field | Type | Example | Description |
|---|---|---|---|
| threads | Integer | 16 | Number of CPU cores |
| memory | String | "32G" | RAM allocation |
| gpu | Integer | 1 | Number of GPUs (simple count) |
| gpu_spec | Table | See below | Detailed GPU specification |
| disk | String | "100G" | Local disk space |
| time_limit | String | "24h" | Wall-time limit |
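String-valued fields such as time_limit and memory have to be converted into the formats schedulers expect; for instance, "24h" shows up as 24:00:00 in the generated scripts. The following is a minimal sketch of that kind of conversion. The function names are hypothetical, only the "h" and "G" suffixes appear in this guide, and the other suffixes and the use of binary multiples (1G = 1024M) are assumptions.

```python
import re

# Seconds per unit; only "h" is confirmed by this guide, the rest are assumed.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

# Megabytes per unit, assuming binary multiples.
_UNIT_MB = {"M": 1, "G": 1024, "T": 1024 * 1024}


def to_walltime(time_limit: str) -> str:
    """Convert a limit such as "24h" into HH:MM:SS."""
    match = re.fullmatch(r"(\d+)([smhd])", time_limit.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised time limit: {time_limit!r}")
    total = int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def to_megabytes(memory: str) -> int:
    """Convert a size such as "32G" into megabytes."""
    match = re.fullmatch(r"(\d+)([MGT])B?", memory.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised memory size: {memory!r}")
    return int(match.group(1)) * _UNIT_MB[match.group(2)]


if __name__ == "__main__":
    print(to_walltime("24h"))
    print(to_megabytes("32G"))
```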
GPU Specification#
For basic GPU requests, set the gpu field under [rules.resources] to the number of GPUs, for example gpu = 1.
For advanced GPU configuration (SLURM only), use gpu_spec:
[rules.resources.gpu_spec]
count = 2
model = "a100" # GPU model (optional, SLURM only)
memory_gb = 40 # Per-GPU memory in GB (optional, SLURM only)
Different schedulers handle GPU requests differently:
| Scheduler | GPU Directive | Notes |
|---|---|---|
| SLURM | --gres=gpu:2 or --gres=gpu:a100:2:40g | Full support for model and memory spec |
| PBS | gpu=2 | Basic count only; model selection varies by site |
| SGE | -l gpu=2 | Basic count only; requires queue configuration |
| LSF | -gpu 2 | Basic count only |
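The per-scheduler differences in the table can be summarised in a small function. This is only a sketch that reproduces the strings listed above; gpu_directive is a hypothetical name, and the behaviour for combinations the table does not show (for example memory without a model) is an assumption.

```python
from typing import Optional


def gpu_directive(scheduler: str, count: int,
                  model: Optional[str] = None,
                  memory_gb: Optional[int] = None) -> str:
    """Render a GPU request as listed in the table (hypothetical helper)."""
    key = scheduler.lower()
    if key == "slurm":
        # Only SLURM uses the model and per-GPU memory fields.
        # The table shows memory only together with a model; other
        # combinations are untested assumptions.
        parts = ["gpu"]
        if model:
            parts.append(model)
        parts.append(str(count))
        if memory_gb:
            parts.append(f"{memory_gb}g")
        return "--gres=" + ":".join(parts)

    # The remaining schedulers receive a basic count; model and memory
    # are ignored, matching "Basic count only" in the table.
    basic = {"pbs": "gpu={n}", "sge": "-l gpu={n}", "lsf": "-gpu {n}"}
    if key not in basic:
        raise ValueError(f"unsupported scheduler: {scheduler}")
    return basic[key].format(n=count)


if __name__ == "__main__":
    print(gpu_directive("slurm", 2, model="a100", memory_gb=40))
    print(gpu_directive("sge", 2, model="a100"))
```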
SLURM Example#
oxo-flow generates SLURM job scripts automatically. For the align rule above, the generated script looks like:
#!/bin/bash
#SBATCH --job-name=align
#SBATCH --cpus-per-task=16
#SBATCH --mem=32G
#SBATCH --time=24:00:00
#SBATCH --output=logs/align_%j.out
#SBATCH --error=logs/align_%j.err
# Environment wrapping (automatically applied)
singularity exec docker://biocontainers/bwa:0.7.17 \
bwa mem -t 16 ref.fa sample1_R1.fastq.gz | samtools sort -o aligned/sample1.bam
PBS Example#
#!/bin/bash
#PBS -N align
#PBS -l ncpus=16
#PBS -l mem=32gb
#PBS -l walltime=24:00:00
#PBS -o logs/align.out
#PBS -e logs/align.err
cd $PBS_O_WORKDIR
# Environment wrapping (automatically applied)
singularity exec docker://biocontainers/bwa:0.7.17 \
bwa mem -t 16 ref.fa sample1_R1.fastq.gz | samtools sort -o aligned/sample1.bam
SGE Example#
#!/bin/bash
#$ -N align
#$ -pe smp 16
#$ -l h_vmem=2G
#$ -l h_rt=24:00:00
#$ -o logs/align.out
#$ -e logs/align.err
#$ -cwd
# Environment wrapping (automatically applied)
singularity exec docker://biocontainers/bwa:0.7.17 \
bwa mem -t 16 ref.fa sample1_R1.fastq.gz | samtools sort -o aligned/sample1.bam
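Note that the SGE script requests h_vmem=2G although the rule declares memory = "32G". This is consistent with SGE applying h_vmem per slot: 32G spread over 16 slots is 2G each. The sketch below shows that arithmetic; per_slot_memory is a hypothetical helper, and whether oxo-flow derives the value exactly this way (including the rounding) is an assumption.

```python
def per_slot_memory(total: str, slots: int) -> str:
    """Split a total such as "32G" evenly across slots, rounding up."""
    unit = total[-1].upper()
    if unit not in {"M", "G"}:
        raise ValueError(f"unsupported unit in {total!r}")
    if slots < 1:
        raise ValueError("slots must be at least 1")

    total_mb = int(total[:-1]) * (1024 if unit == "G" else 1)
    per_slot_mb = -(-total_mb // slots)  # ceiling division

    # Prefer whole gigabytes when the result divides evenly.
    if per_slot_mb % 1024 == 0:
        return f"{per_slot_mb // 1024}G"
    return f"{per_slot_mb}M"


if __name__ == "__main__":
    print(per_slot_memory("32G", 16))
```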
Environment Wrapping#
When generating cluster scripts, oxo-flow automatically wraps commands through the environment resolver:
| Backend | Wrapping |
|---|---|
| Conda | conda activate <env>; <command> |
| Docker | docker run --rm -v ... <image> <command> |
| Singularity | singularity exec <image> <command> |
| Pixi | pixi run <command> |
| Venv | source <venv>/bin/activate; <command> |
| Modules | module load <mod1> <mod2>; <command> |
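The wrapping patterns in the table amount to prefixing the rule's command with a backend-specific string. A minimal sketch of that idea follows; wrap_command is a hypothetical function, not the environment resolver itself. Docker is left out on purpose because the table elides its volume arguments.

```python
from typing import Sequence, Union


def wrap_command(backend: str, spec: Union[str, Sequence[str]], command: str) -> str:
    """Prefix a command following the wrapping table (hypothetical helper)."""
    key = backend.lower()
    if key == "conda":
        return f"conda activate {spec}; {command}"
    if key == "singularity":
        return f"singularity exec {spec} {command}"
    if key == "pixi":
        # The table shows no environment argument for pixi, so spec is unused.
        return f"pixi run {command}"
    if key == "venv":
        return f"source {spec}/bin/activate; {command}"
    if key == "modules":
        # Here spec is a list of module names.
        return f"module load {' '.join(spec)}; {command}"
    raise ValueError(f"backend not covered by this sketch: {backend}")


if __name__ == "__main__":
    print(wrap_command("singularity", "docker://biocontainers/bwa:0.7.17",
                       "bwa mem -t 16 ref.fa sample1_R1.fastq.gz"))
    print(wrap_command("modules", ["bwa/0.7.17", "samtools/1.17"], "bwa mem"))
```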
Environment Examples#
Conda with GPU for deep learning:
[[rules]]
name = "train_model"
input = ["data/train.h5"]
output = ["models/trained.pt"]
threads = 8
memory = "64G"
environment = { conda = "envs/pytorch.yaml" }
shell = "python train.py --input {input} --output {output} --gpus {resources.gpu}"
[rules.resources]
gpu = 2
time_limit = "24h"
Singularity with Modules (common on HPC):
[[rules]]
name = "variant_call"
input = ["aligned/{sample}.bam"]
output = ["variants/{sample}.vcf"]
threads = 16
memory = "32G"
# Load CUDA module first
environment = { singularity = "docker://broadinstitute/gatk:4.4.0.0", modules = ["cuda/11.8"] }
shell = "gatk HaplotypeCaller -I {input} -O {output}"
Pixi for reproducible environments:
[[rules]]
name = "qc_check"
input = ["{sample}.fastq.gz"]
output = ["qc/{sample}_fastqc.html"]
threads = 4
environment = { pixi = "pixi.toml" }
shell = "fastqc -t {threads} -o qc/ {input}"
Pure Module-based (traditional HPC):
[[rules]]
name = "align"
input = ["reads/{sample}.fq"]
output = ["aligned/{sample}.bam"]
threads = 32
memory = "64G"
environment = { modules = ["bwa/0.7.17", "samtools/1.17", "gcc/11"] }
shell = "bwa mem -t {threads} ref.fa {input} | samtools sort -o {output}"
Pre-build environments on cluster nodes
Ensure your conda environments, Docker images, or Singularity containers are available on all cluster nodes before submitting jobs. Pass --skip-env-setup when the environments are already built.
Resource Enforcement#
Local Execution#
When running locally (oxo-flow run), resource constraints are enforced:
- Check: Before execution, verify resources are available
- Reserve: Reserve resources before starting the job
- Release: Release resources after completion (or on failure/timeout)
# Limit local execution to 16 threads and 32 GB of memory (given here as 32768, i.e. in MB)
oxo-flow run pipeline.oxoflow --max-threads 16 --max-memory 32768
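The check, reserve, and release steps listed above follow a common bookkeeping pattern. The sketch below illustrates it with a context manager so that resources are returned even when a job fails; ResourcePool is a hypothetical class for illustration only, it is single-threaded (no locking), and it does not reflect oxo-flow's internal design.

```python
from contextlib import contextmanager


class ResourcePool:
    """Tracks free threads and memory for local jobs (illustrative only)."""

    def __init__(self, max_threads: int, max_memory_mb: int) -> None:
        self.free_threads = max_threads
        self.free_memory_mb = max_memory_mb

    def can_run(self, threads: int, memory_mb: int) -> bool:
        # Check: are enough resources currently free?
        return threads <= self.free_threads and memory_mb <= self.free_memory_mb

    @contextmanager
    def reserve(self, threads: int, memory_mb: int):
        if not self.can_run(threads, memory_mb):
            raise RuntimeError("insufficient resources")
        # Reserve: subtract before the job starts.
        self.free_threads -= threads
        self.free_memory_mb -= memory_mb
        try:
            yield
        finally:
            # Release: always give resources back, also on failure.
            self.free_threads += threads
            self.free_memory_mb += memory_mb


if __name__ == "__main__":
    pool = ResourcePool(max_threads=16, max_memory_mb=32768)
    with pool.reserve(threads=8, memory_mb=16384):
        print(pool.free_threads, pool.free_memory_mb)
    print(pool.free_threads, pool.free_memory_mb)
```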
Cluster Execution#
On clusters, the scheduler enforces resource limits based on the generated directives; oxo-flow does not check, reserve, or release resources itself during cluster execution.
Best Practices#
Use Singularity on clusters
Most HPC clusters do not allow Docker. Use Singularity instead — oxo-flow handles the conversion automatically when you specify singularity = "docker://...".
Set realistic time limits
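Generous wall-time limits prevent premature job termination, but overly long requests can delay scheduling because shorter jobs are easier for the scheduler to fit into gaps. Profile your jobs first, then set a limit with a modest safety margin.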
Use --keep-going for large batches
When running hundreds of samples, use oxo-flow run -k so that a single failure does not abort the entire run.
Check resource availability
Use sinfo (SLURM), pbsnodes (PBS), qhost (SGE), or bhosts (LSF) to verify available resources before submitting.
Cache environment setup
Use --cache-dir to persist environment setup state across runs for faster startup.
Monitoring Jobs#
After submission, use your cluster's native tools, such as squeue (SLURM), qstat (PBS and SGE), or bjobs (LSF).
Alternatively, use oxo-flow's status command with a checkpoint file.
See Also#
- Architecture: Cluster backends — internal cluster module design
- Environment System — Singularity and Docker on HPC
- run command — --max-threads, --max-memory, --skip-env-setup, --cache-dir
- cluster command — cluster submission reference