05 — Environment Management#
Use different software environments for different pipeline steps. This matters in bioinformatics, where tools often have conflicting dependencies.
Concepts Covered
- Per-rule environment declarations
- Conda environment specifications
- Docker container execution
- Dependency isolation patterns
- Mixed-environment workflows
Workflow Definition#
# examples/gallery/05_conda_environments.oxoflow
[workflow]
name = "environment-showcase"
version = "1.0.0"
description = "Demonstrates per-rule environment isolation with conda and docker"
author = "oxo-flow examples"
[defaults]
threads = 2
memory = "4G"
[[rules]]
name = "download_data"
output = ["data/sequences.fasta"]
shell = """
mkdir -p data
echo '>seq1' > {output[0]}
echo 'ATCGATCGATCGATCGATCG' >> {output[0]}
echo '>seq2' >> {output[0]}
echo 'GCTAGCTAGCTAGCTAGCTA' >> {output[0]}
"""
[[rules]]
name = "quality_check"
input = ["data/sequences.fasta"]
output = ["qc/report.txt"]
shell = """
mkdir -p qc
count=$(grep -c '^>' {input[0]})
echo "Sequence count: $count" > {output[0]}
echo "QC: PASS" >> {output[0]}
"""
[rules.environment]
conda = "envs/qc.yaml"
[[rules]]
name = "align_sequences"
input = ["data/sequences.fasta"]
output = ["aligned/alignment.bam"]
threads = 8
memory = "16G"
shell = """
mkdir -p aligned
echo 'Alignment placeholder' > {output[0]}
"""
[rules.environment]
docker = "biocontainers/bwa-mem2:2.2.1"
[[rules]]
name = "analyze_results"
input = ["aligned/alignment.bam", "qc/report.txt"]
output = ["results/analysis.json"]
shell = """
mkdir -p results
echo '{"status": "complete", "qc": "pass", "aligned": true}' > {output[0]}
"""
[rules.environment]
conda = "envs/analysis.yaml"
Key Concepts#
Per-Rule Environment Isolation#
Each rule can declare its own isolated software environment. oxo-flow supports five environment backends:
| Backend | Declaration | Use Case |
|---|---|---|
| Conda | conda = "envs/tool.yaml" | Tool-specific environments with precise version pinning |
| Docker | docker = "image:tag" | Container-based isolation with full reproducibility |
| Singularity | singularity = "docker://image:tag" | HPC-compatible containers (no root required) |
| Pixi | pixi = "pixi.toml" | Fast conda alternative with lockfile support |
| Venv | venv = "path/to/venv" | Python virtual environments |
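The three backends that the workflow above does not use follow the same `[rules.environment]` pattern. As a rough sketch, the snippet below writes an illustrative fragment to a scratch file and prints it. The rule names, output paths, and image reference are placeholders, only the environment keys come from the table, and the fragment is not a complete workflow.

```shell
# Illustrative only: rule names, paths, and the image reference are
# placeholders. The environment keys are the ones listed in the table.
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
[[rules]]
name = "hpc_step"
output = ["out/hpc.txt"]
shell = "mkdir -p out && echo 'singularity' > {output[0]}"
[rules.environment]
singularity = "docker://image:tag"

[[rules]]
name = "pixi_step"
output = ["out/pixi.txt"]
shell = "mkdir -p out && echo 'pixi' > {output[0]}"
[rules.environment]
pixi = "pixi.toml"

[[rules]]
name = "venv_step"
output = ["out/venv.txt"]
shell = "mkdir -p out && echo 'venv' > {output[0]}"
[rules.environment]
venv = "path/to/venv"
EOF

# Show the fragment so it can be copied into a real workflow file.
cat "$fragment"
```

Each rule carries exactly one environment key, which keeps the backend choice visible next to the command it applies to.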
Why Per-Rule Environments?#
Bioinformatics tools often have conflicting dependencies:
- FastQC requires Java 11
- BWA-MEM2 requires a specific libdeflate version
- GATK requires Java 17 with specific Spark libraries
- VEP requires Perl with custom modules
Per-rule environment isolation sidesteps these conflicts: each step runs in its own clean environment, so no two tools have to share a dependency set.
Environment Resolution Order#
When a rule specifies an environment, oxo-flow performs these steps in order:
1. Detects whether the backend is available on the system
2. Creates the environment (if it doesn't exist)
3. Activates the environment
4. Runs the shell command inside the environment
5. Deactivates the environment after completion
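For a conda-backed rule, the sequence above can be approximated by hand. The sketch below is an assumption about what such a sequence could look like, not oxo-flow's actual implementation: the helper names `run` and `run_in_conda`, the `DRY_RUN` variable, and the `.oxo-envs/qc` prefix are all hypothetical. It relies on `conda env create` and `conda run`, the latter of which handles activation and deactivation around a single command.

```shell
#!/bin/sh
# Hypothetical sketch, not oxo-flow's implementation.

# Print the command in dry-run mode (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Usage: run_in_conda ENV_FILE PREFIX COMMAND [ARGS...]
run_in_conda() {
  env_file="$1"
  prefix="$2"
  shift 2

  # Step 1: detect the backend (skipped in dry-run mode).
  if [ "${DRY_RUN:-1}" != "1" ] && ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH" >&2
    return 1
  fi

  # Step 2: create the environment only if the prefix does not exist yet.
  if [ ! -d "$prefix" ]; then
    run conda env create --file "$env_file" --prefix "$prefix" || return 1
  fi

  # Steps 3 to 5: conda run activates the environment, runs the command,
  # and leaves the calling shell unchanged when it returns.
  run conda run --prefix "$prefix" "$@"
}

# Dry run: prints the commands instead of executing them.
run_in_conda envs/qc.yaml .oxo-envs/qc sh -c "grep -c '^>' data/sequences.fasta"
```

Setting `DRY_RUN=0` executes the commands for real, which requires conda and the `envs/qc.yaml` file to exist.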
DAG with Mixed Environments#
graph TD
A["download_data<br/>(system)"] --> B["quality_check<br/>(conda)"]
A --> C["align_sequences<br/>(docker)"]
B --> D["analyze_results<br/>(conda)"]
C --> D
Running the Workflow#
Validate#
$ oxo-flow validate examples/gallery/05_conda_environments.oxoflow
✓ examples/gallery/05_conda_environments.oxoflow — 4 rules, 4 dependencies
Check Available Environments#
$ oxo-flow env list
Available environment backends:
✓ system — Default system shell
✓ conda — Conda package manager
✓ docker — Docker containers
✓ singularity — Singularity/Apptainer containers
✗ pixi — Not found
✗ venv — Not configured
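When a backend is reported as missing, it can help to confirm what is actually on the `PATH`. The following is a small sketch using the POSIX `command -v` builtin. The function name `check_backend` and the executable names tried for each backend (for example `apptainer` as an alternative to `singularity`) are assumptions, and this is not necessarily how `oxo-flow env list` performs its own detection.

```shell
# Usage: check_backend LABEL EXECUTABLE [EXECUTABLE...]
# Reports the first candidate executable found on PATH for a backend.
check_backend() {
  label="$1"
  shift
  for exe in "$@"; do
    if command -v "$exe" >/dev/null 2>&1; then
      echo "found $label ($exe)"
      return 0
    fi
  done
  echo "missing $label"
  return 1
}

# '|| true' keeps the script going when a backend is absent.
check_backend conda conda mamba micromamba || true
check_backend docker docker || true
check_backend singularity singularity apptainer || true
check_backend pixi pixi || true
```

A backend reported as missing here usually means the tool is not installed or its install location is not on the `PATH` of the shell that launches oxo-flow.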
What's Next?#
Move on to RNA-seq Quantification for a complete transcriptomics analysis pipeline.