Workflow Format

The .oxoflow file format is oxo-flow's TOML-based workflow definition language. This page is the complete specification.


File Extension

Workflow files use the .oxoflow extension: my-pipeline.oxoflow.


Top-level Structure

[workflow]          # Required: metadata
[config]            # Optional: user variables
[defaults]          # Optional: rule defaults
[report]            # Optional: report configuration
[[rules]]           # Required: one or more rules

[workflow]: Metadata

[workflow]
name = "my-pipeline"
version = "1.0.0"
description = "A short description"
author = "Your Name"

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| name | String | Yes | | Pipeline name |
| version | String | No | "0.1.0" | Semantic version |
| description | String | No | | Human-readable description |
| author | String | No | | Author name or email |

[config]: Configuration Variables

User-defined key-value pairs accessible in rules as {config.<key>}:

[config]
reference = "/data/ref/hg38.fa"
samples_dir = "raw_data"
results_dir = "results"
min_quality = "30"

Values are TOML strings, integers, booleans, or arrays. String interpolation in rules uses {config.key} syntax.


[defaults]: Default Settings

Applied to all rules unless explicitly overridden:

[defaults]
threads = 4
memory = "8G"
environment = { conda = "envs/base.yaml" }

| Field | Type | Description |
| --- | --- | --- |
| threads | Integer | Default CPU thread count |
| memory | String | Default memory allocation |
| environment | Table | Default environment specification |
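
The override behaviour can be pictured as a dictionary merge in which fields set on a rule win over `[defaults]`. The sketch below is only an illustration, not oxo-flow's implementation. The function name `apply_defaults` is hypothetical, and it assumes an overriding field replaces the default value wholesale (for example, a rule-level `environment` table is not merged key by key with the default one), which this page does not specify.

```python
def apply_defaults(rule: dict, defaults: dict) -> dict:
    """Return a copy of `rule` with missing fields filled in from `defaults`."""
    # In a dict merge the later mapping wins, so explicit rule fields
    # override the corresponding defaults (assumed whole-field replacement).
    return {**defaults, **rule}


defaults = {
    "threads": 4,
    "memory": "8G",
    "environment": {"conda": "envs/base.yaml"},
}
rule = {"name": "align", "threads": 16, "memory": "32G"}

merged = apply_defaults(rule, defaults)
print(merged["threads"])      # 16, set explicitly on the rule
print(merged["environment"])  # {'conda': 'envs/base.yaml'}, inherited
```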

[report]: Report Configuration

[report]
template = "clinical"
format = ["html", "json"]
sections = ["summary", "variants", "quality"]

| Field | Type | Description |
| --- | --- | --- |
| template | String | Report template name |
| format | Array | Output formats to generate |
| sections | Array | Report sections to include |

[[rules]]: Rule Definitions

Each [[rules]] entry defines a pipeline step. The double brackets indicate a TOML array of tables.

Basic example

[[rules]]
name = "align"
input = ["{sample}_R1.fastq.gz", "{sample}_R2.fastq.gz"]
output = ["aligned/{sample}.bam"]
threads = 16
memory = "32G"
environment = { conda = "envs/alignment.yaml" }
shell = "bwa mem -t {threads} {config.reference} {input} | samtools sort -o {output}"

All fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | String | Yes | Unique rule identifier |
| input | Array of strings | Yes | Input file paths |
| output | Array of strings | Yes | Output file paths |
| shell | String | Yes | Shell command to execute |
| threads | Integer | No | CPU threads (overrides defaults) |
| memory | String | No | Memory allocation (overrides defaults) |
| environment | Table | No | Environment specification |
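
The required fields and the uniqueness of `name` lend themselves to a simple check over the parsed rules. The following is a minimal sketch under that reading of the table; `rule_problems` and its message wording are hypothetical and do not reflect oxo-flow's actual validation or error output.

```python
from collections import Counter

REQUIRED_FIELDS = ("name", "input", "output", "shell")


def rule_problems(rules: list[dict]) -> list[str]:
    """Collect human-readable problems for a list of parsed [[rules]] tables."""
    problems = []
    for index, rule in enumerate(rules):
        # Fall back to the position in the array when a rule has no name.
        label = rule.get("name", f"rules[{index}]")
        for field in REQUIRED_FIELDS:
            if field not in rule:
                problems.append(f"{label}: missing required field '{field}'")
    # Rule names are meant to be unique identifiers.
    counts = Counter(rule["name"] for rule in rules if "name" in rule)
    for name, count in counts.items():
        if count > 1:
            problems.append(f"{name}: name used by {count} rules")
    return problems


rules = [
    {"name": "align", "input": ["a.fq"], "output": ["a.bam"], "shell": "echo a"},
    {"name": "align", "input": ["b.fq"], "output": ["b.bam"]},
]
for problem in rule_problems(rules):
    print(problem)
# align: missing required field 'shell'
# align: name used by 2 rules
```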

Environment specification

# Conda
environment = { conda = "envs/tools.yaml" }

# Pixi
environment = { pixi = "envs/pixi.toml" }

# Docker
environment = { docker = "biocontainers/bwa:0.7.17" }

# Singularity
environment = { singularity = "docker://biocontainers/bwa:0.7.17" }

# Python venv
environment = { venv = "envs/requirements.txt" }

Resources (extended)

For rules needing GPU, disk, or time limits, use the resources sub-table:

[[rules]]
name = "gpu_task"
input = ["data.h5"]
output = ["model.pt"]
threads = 8
memory = "64G"
shell = "python train.py"

[rules.resources]
gpu = 1
disk = "200G"
time_limit = "48h"

| Field | Type | Example | Description |
| --- | --- | --- | --- |
| gpu | Integer | 1 | Number of GPUs |
| disk | String | "200G" | Local disk space |
| time_limit | String | "48h" | Wall-time limit |

Wildcards

Pattern syntax

Use {name} in file paths for dynamic expansion:

input = ["{sample}_R1.fastq.gz"]
output = ["aligned/{sample}.bam"]

Built-in placeholders

| Placeholder | Expands to |
| --- | --- |
| {input} | Space-separated input files |
| {output} | Space-separated output files |
| {threads} | Thread count for this rule |
| {config.*} | Value from [config] section |
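
To make the expansion rules in the table concrete, here is a minimal sketch of placeholder substitution. It is an illustration under stated assumptions rather than oxo-flow's actual logic: `render` is a hypothetical helper, it handles only the four placeholders listed above, and it leaves any other `{name}` untouched so that wildcards can be resolved separately.

```python
import re


def render(template: str, rule: dict, config: dict) -> str:
    """Substitute {input}, {output}, {threads} and {config.<key>} in `template`."""
    values = {
        "input": " ".join(rule["input"]),
        "output": " ".join(rule["output"]),
        "threads": str(rule["threads"]),
    }

    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key.startswith("config."):
            # A missing config key raises KeyError in this sketch.
            return str(config[key[len("config."):]])
        # Unknown names (for example the wildcard {sample}) are left as-is.
        return values.get(key, match.group(0))

    return re.sub(r"\{([A-Za-z_][\w.]*)\}", replace, template)


rule = {
    "input": ["s1_R1.fastq.gz", "s1_R2.fastq.gz"],
    "output": ["aligned/s1.bam"],
    "threads": 16,
}
config = {"reference": "/data/ref/hg38.fa"}
command = render(
    "bwa mem -t {threads} {config.reference} {input} | samtools sort -o {output}",
    rule,
    config,
)
print(command)
# bwa mem -t 16 /data/ref/hg38.fa s1_R1.fastq.gz s1_R2.fastq.gz | samtools sort -o aligned/s1.bam
```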

Custom wildcards

Any {name} pattern not matching a built-in placeholder is treated as a wildcard. oxo-flow expands wildcards by matching against available files or explicit sample lists.
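
Matching a pattern against available files can be sketched by turning each `{name}` into a named regular-expression group. This is a simplified illustration, not oxo-flow's matcher: `pattern_to_regex` and `match_wildcards` are hypothetical names, a wildcard is assumed never to span a `/`, and `{config.*}` placeholders are assumed to have been substituted beforehand.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn a path pattern such as '{sample}_R1.fastq.gz' into a compiled regex."""
    # With one capture group, re.split alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    seen = set()
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)           # literal path text
        elif part in seen:
            regex += f"(?P={part})"            # repeated wildcard: same value
        else:
            seen.add(part)
            regex += f"(?P<{part}>[^/]+)"      # first occurrence: capture it
    return re.compile(regex)


def match_wildcards(pattern: str, paths: list[str]) -> list[dict]:
    """Return the wildcard values for every path that fits the pattern."""
    rx = pattern_to_regex(pattern)
    return [m.groupdict() for p in paths if (m := rx.fullmatch(p))]


files = ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "s2_R1.fastq.gz", "notes.txt"]
print(match_wildcards("{sample}_R1.fastq.gz", files))
# [{'sample': 's1'}, {'sample': 's2'}]
```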


Dependency Resolution

Dependencies are inferred automatically: if rule B lists a file in its input that appears in rule A's output, then B depends on A.

[[rules]]
name = "step1"
output = ["intermediate.txt"]
# ...

[[rules]]
name = "step2"
input = ["intermediate.txt"]   # depends on step1
# ...

No explicit dependency declaration is needed.
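
The inference rule can be sketched as a lookup from each output path to the rule that produces it, followed by a topological sort. The code below is an illustration only: `infer_dependencies` is a hypothetical helper, it compares paths as literal strings (wildcard patterns would need pattern matching instead), and `raw.txt` and `final.txt` are invented file names.

```python
from graphlib import TopologicalSorter


def infer_dependencies(rules: list[dict]) -> dict[str, set[str]]:
    """Map each rule name to the names of the rules whose outputs it consumes."""
    # Index every output path by the rule that produces it.
    producer = {}
    for rule in rules:
        for path in rule.get("output", []):
            producer[path] = rule["name"]

    deps = {rule["name"]: set() for rule in rules}
    for rule in rules:
        for path in rule.get("input", []):
            if path in producer and producer[path] != rule["name"]:
                deps[rule["name"]].add(producer[path])
    return deps


# Declaration order does not matter; only the input/output paths do.
rules = [
    {"name": "step2", "input": ["intermediate.txt"], "output": ["final.txt"]},
    {"name": "step1", "input": ["raw.txt"], "output": ["intermediate.txt"]},
]
deps = infer_dependencies(rules)
print(deps)   # {'step2': {'step1'}, 'step1': set()}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['step1', 'step2']
```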


Multi-line Strings

Use triple quotes for multi-line shell commands:

shell = """
mkdir -p results
bwa mem -t {threads} ref.fa {input} | \
  samtools sort -@ {threads} -o {output}
"""

Complete Example

[workflow]
name = "ngs-pipeline"
version = "2.0.0"
description = "Complete NGS analysis pipeline"
author = "Genomics Core <core@example.org>"

[config]
reference = "/data/ref/hg38.fa"
known_sites = "/data/ref/known_sites.vcf.gz"
results = "results"

[defaults]
threads = 4
memory = "8G"
environment = { conda = "envs/base.yaml" }

[report]
format = ["html"]

[[rules]]
name = "fastqc"
input = ["raw/{sample}_R1.fastq.gz", "raw/{sample}_R2.fastq.gz"]
output = ["{config.results}/qc/{sample}_R1_fastqc.html"]
shell = "fastqc {input} -o {config.results}/qc/ -t {threads}"

[[rules]]
name = "trim"
input = ["raw/{sample}_R1.fastq.gz", "raw/{sample}_R2.fastq.gz"]
output = ["{config.results}/trimmed/{sample}_R1.fastq.gz"]
environment = { docker = "biocontainers/fastp:0.23.4" }
shell = "fastp --in1 {input[0]} --in2 {input[1]} --out1 {output[0]} --thread {threads}"

[[rules]]
name = "align"
input = ["{config.results}/trimmed/{sample}_R1.fastq.gz"]
output = ["{config.results}/aligned/{sample}.bam"]
threads = 16
memory = "32G"
environment = { conda = "envs/alignment.yaml" }
shell = "bwa mem -t {threads} {config.reference} {input} | samtools sort -o {output}"
