Skip to content

Create a Workflow#

This guide covers the complete process of authoring an .oxoflow workflow file, from project scaffolding to production-ready pipelines.


Scaffold a new project#

oxo-flow init my-pipeline
cd my-pipeline

This generates a project directory with a starter .oxoflow file, envs/ and scripts/ directories, and a .gitignore.


Workflow file structure#

Every .oxoflow file is TOML with four top-level sections:

[workflow]      # Required: name, version, metadata
[config]        # Optional: user-defined variables
[defaults]      # Optional: default settings for all rules
[[rules]]       # Required: one or more pipeline steps

The [workflow] section#

[workflow]
name = "my-pipeline"
version = "1.0.0"
description = "Short description of what this pipeline does"
author = "Your Name <you@example.com>"
Field Required Description
name Yes Pipeline name (used in reports and logs)
version No Semantic version (defaults to "0.1.0")
description No Human-readable description
author No Author name or organization

The [config] section#

Define variables that are referenced throughout the workflow:

[config]
reference = "/data/ref/hg38.fa"
samples_dir = "raw_data"
results_dir = "results"
min_quality = "30"

Reference them in rule fields with {config.variable_name}:

shell = "bwa mem {config.reference} {input} > {output}"

The [defaults] section#

Set default values applied to all rules unless overridden:

[defaults]
threads = 4
memory = "8G"
environment = { conda = "envs/base.yaml" }

Defining rules#

Each [[rules]] entry defines one step in the pipeline:

[[rules]]
name = "step_name"
input = ["path/to/input1.txt", "path/to/input2.txt"]
output = ["path/to/output.txt"]
threads = 8
memory = "16G"
environment = { conda = "envs/tools.yaml" }
shell = "my-tool --threads {threads} {input} > {output}"

Rule fields#

Field Required Type Description
name Yes String Unique rule identifier
input Yes Array Input file paths (may contain wildcards)
output Yes Array Output file paths (may contain wildcards)
shell Yes String Shell command to execute
threads No Integer CPU threads (overrides [defaults])
memory No String Memory requirement (e.g., "16G")
environment No Table Environment specification
resources No Table Additional resources (GPU, disk, time_limit)

Wildcards#

Use {name} syntax for dynamic file patterns:

[[rules]]
name = "align"
input = ["{sample}_R1.fastq.gz", "{sample}_R2.fastq.gz"]
output = ["aligned/{sample}.bam"]
shell = "bwa mem ref.fa {input} | samtools sort -o {output}"

oxo-flow expands {sample} from the available input files or from explicit configuration.

Built-in placeholders#

Placeholder Expands to
{input} Space-separated list of all input files
{output} Space-separated list of all output files
{threads} Thread count for this rule
{config.*} Value from the [config] section

Dependencies#

oxo-flow infers dependencies automatically: if rule B's input matches rule A's output, B depends on A. You do not need to declare dependencies explicitly.

[[rules]]
name = "step1"
output = ["intermediate.txt"]
# ...

[[rules]]
name = "step2"
input = ["intermediate.txt"]   # ← automatically depends on step1
# ...

Multi-line shell commands#

Use triple-quoted strings for complex commands:

shell = """
mkdir -p results
bwa mem -t {threads} {config.reference} {input} | \
  samtools sort -@ {threads} -o {output}
samtools index {output}
"""

Best practices#

Keep rules focused

Each rule should do one logical step. This makes the DAG clearer and allows better parallelism.

Use config variables

Put paths and parameters in [config] so they can be changed without editing rule definitions.

Lock environment versions

Pin tool versions in your conda YAML or Docker tags to ensure reproducibility.

Validate early

Run oxo-flow validate before executing to catch syntax errors and circular dependencies.


Complete example#

See the Workflow Format reference for the full specification.