workflow#
Native workflow engine with Snakemake and Nextflow compatibility export.
Synopsis#
oxo-call workflow run <FILE|TEMPLATE> [--verify]
oxo-call workflow dry-run <FILE|TEMPLATE>
oxo-call workflow verify <FILE|TEMPLATE> # alias: check
oxo-call workflow fmt <FILE|TEMPLATE> [--stdout] # alias: format
oxo-call workflow vis <FILE|TEMPLATE> # alias: dag
oxo-call workflow export <FILE|TEMPLATE> --to <snakemake|nextflow> [-o <FILE>]
oxo-call workflow generate <TASK> [--engine native|snakemake|nextflow] [-o <FILE>]
oxo-call workflow infer <TASK> --data <DIR> [--engine native|snakemake|nextflow] [-o <FILE>] [--run]
oxo-call workflow list
oxo-call workflow show <TEMPLATE> [--engine native|snakemake|nextflow]
Description#
The workflow command (alias: wf) provides a lightweight native Rust workflow engine that executes .oxo.toml pipeline files directly — no Snakemake, Nextflow, or Conda required. Snakemake and Nextflow are supported as compatibility export targets.
Native Engine Features#
- DAG-based execution — steps run in dependency order with maximum parallelism via
tokio::task::JoinSet - Wildcard expansion —
{sample}automatically expands per sample;{params.KEY}for shared parameters - Output caching — steps whose outputs are newer than their inputs are automatically skipped
- Parallel execution — independent steps within the same DAG phase run concurrently
- Gather steps — steps with
gather = truerun once after all wildcard instances of their dependency steps complete (e.g., MultiQC aggregation) - Progress display — step counter
[N/M], elapsed time, and DAG phase visualization - Cycle detection — dependency cycles are detected and reported as errors
- Verification — validate your workflow file before running with
workflow verify - Auto-formatting — canonical style with
workflow fmt - DAG visualization — text phase diagram with
workflow vis
Subcommands#
workflow run#
Execute a workflow file or built-in template:
oxo-call workflow run pipeline.oxo.toml
oxo-call workflow run rnaseq # Run a built-in template directly
# After all steps complete, use LLM to verify output files and report issues
oxo-call workflow run --verify pipeline.oxo.toml
--verify collects the expected output files declared by every step, probes them for existence and size, then asks the LLM for a structured verdict (summary, issues, suggestions). It is advisory — it never changes the exit code.
workflow dry-run#
Preview the expanded task graph without executing any commands:
The dry-run shows the DAG phase diagram, step-by-step expansion with wildcard bindings, commands, dependencies, inputs, and outputs.
workflow verify#
Validate a workflow file or built-in template for correctness before running. Checks for:
- Parse errors (malformed TOML)
- Empty step names or commands
- Duplicate step names
- References to unknown
depends_onsteps {params.key}references to undefined parameters{wildcard}references to undefined wildcards- Forward-ordering violations (depending on a step defined later in the file)
- DAG expansion failures (cycles, unresolvable dependencies)
oxo-call workflow verify pipeline.oxo.toml # Exit 0 if valid, 1 if errors
oxo-call workflow check pipeline.oxo.toml # alias
Example output:
workflow fmt#
Auto-format a .oxo.toml workflow file to canonical aligned style:
oxo-call workflow fmt pipeline.oxo.toml # Edit in-place
oxo-call workflow fmt pipeline.oxo.toml --stdout # Print to stdout
oxo-call workflow format pipeline.oxo.toml # alias
The formatter normalizes key alignment, sorts wildcards and params alphabetically, and ensures consistent quoting. Parse the file first with verify if you are unsure whether it is valid TOML.
workflow vis#
Visualize the workflow as a DAG phase diagram. Shows parallel execution groups, step dependency table, and wildcard expansion summary:
oxo-call workflow vis pipeline.oxo.toml # From file
oxo-call workflow vis rnaseq # Built-in template
oxo-call workflow dag rnaseq # alias
Example output:
◆ Workflow: rnaseq (6 steps, 13 tasks, 4 phases)
RNA-seq bulk transcript quantification pipeline
Wildcards:
sample = [s1, s2, s3]
────────────────────────────────────────────────────────────────
Pipeline DAG (4 phases, 13 tasks)
Phase 1 fastp[sample=s1] │ fastp[sample=s2] │ fastp[sample=s3]
↓
Phase 2 multiqc [gather] │ star[sample=s1] │ … +2 more
↓
Phase 3 samtools_index[sample=s1] │ … +2 more
↓
Phase 4 featurecounts[sample=s1] │ … +2 more
────────────────────────────────────────────────────────────────
Step details:
Step Gather Tasks Depends on
────────────────────────────────────────────────────────
fastp 3 (none)
multiqc yes 1 fastp
star 3 fastp
samtools_index 3 star
featurecounts 3 samtools_index
workflow export#
Export a native .oxo.toml workflow to Snakemake or Nextflow format:
oxo-call workflow export rnaseq --to snakemake -o Snakefile
oxo-call workflow export wgs --to nextflow -o main.nf
workflow generate#
Generate a workflow from a natural-language description using the configured LLM:
oxo-call workflow generate "RNA-seq analysis of mouse samples"
oxo-call workflow generate "Variant calling from WGS data" --engine snakemake -o Snakefile
workflow infer#
Detect data files in a directory and generate an appropriate workflow:
oxo-call workflow infer "RNA-seq QC and alignment" --data ./fastq_data/
oxo-call workflow infer "16S analysis" --data ./amplicon_reads/ --run # Generate and run
workflow list#
List all available built-in templates:
workflow show#
Display a built-in template in different formats:
oxo-call workflow show rnaseq
oxo-call workflow show wgs --engine snakemake
oxo-call workflow show metagenomics --engine nextflow
Built-in Templates#
| Template | Domain | Pipeline Steps |
|---|---|---|
rnaseq |
Transcriptomics | fastp → MultiQC + STAR → samtools index → featureCounts |
wgs |
Genomics | fastp → MultiQC + BWA-MEM2 → MarkDuplicates → BQSR → HaplotypeCaller |
atacseq |
Epigenomics | fastp → MultiQC + Bowtie2 → Picard dedup → blacklist filter → MACS3 |
chipseq |
Epigenomics | fastp → MultiQC + Bowtie2 → MarkDup → filter → MACS3 + bigWig |
metagenomics |
Metagenomics | fastp → MultiQC + host removal → Kraken2 → Bracken |
amplicon16s |
Metagenomics | cutadapt → fastp → MultiQC + DADA2 (gather) |
scrnaseq |
Single-cell | fastp → MultiQC + STARsolo (10x v3) → samtools index + cell QC |
longreads |
Genomics | NanoQ → NanoStat → MultiQC + Flye (parallel) → Medaka → QUAST |
methylseq |
Epigenomics | Trim Galore → MultiQC + Bismark → dedup → sort → methylation extract → bedGraph |
MultiQC Aggregation#
All templates follow a consistent pattern: MultiQC runs as an upstream QC aggregation step right after the QC/preprocessing step (e.g., fastp, trim_galore, or nanostat). It is configured as a gather step that depends only on the QC step, enabling it to run in parallel with downstream analysis:
- MultiQC only needs QC results (fastp JSON/HTML reports) to generate its report
- MultiQC scans the QC output directory (e.g., qc/ or trimmed/)
- A single comprehensive QC report is generated across all samples without blocking downstream steps
.oxo.toml Format#
[workflow]
name = "my-pipeline"
description = "Pipeline description"
version = "1.0"
# Wildcards: {sample} expands for each value
[wildcards]
sample = ["sample1", "sample2", "sample3"]
# Parameters: accessible as {params.KEY} in step commands
[params]
threads = "8"
reference = "/path/to/genome.fa"
# Steps: each [[step]] runs for every wildcard combination
[[step]]
name = "qc"
cmd = "fastp --in1 data/{sample}_R1.fq.gz --in2 data/{sample}_R2.fq.gz --out1 trimmed/{sample}_R1.fq.gz --out2 trimmed/{sample}_R2.fq.gz --json qc/{sample}_fastp.json"
inputs = ["data/{sample}_R1.fq.gz", "data/{sample}_R2.fq.gz"]
outputs = ["trimmed/{sample}_R1.fq.gz", "trimmed/{sample}_R2.fq.gz", "qc/{sample}_fastp.json"]
# MultiQC runs right after QC, in parallel with alignment
[[step]]
name = "multiqc"
gather = true
depends_on = ["qc"]
cmd = "multiqc qc/ -o results/multiqc/ --force"
outputs = ["results/multiqc/multiqc_report.html"]
[[step]]
name = "align"
depends_on = ["qc"]
cmd = "bwa mem -t {params.threads} {params.reference} trimmed/{sample}_R1.fq.gz trimmed/{sample}_R2.fq.gz | samtools sort -o aligned/{sample}.bam"
inputs = ["trimmed/{sample}_R1.fq.gz", "trimmed/{sample}_R2.fq.gz"]
outputs = ["aligned/{sample}.bam"]
[[step]]
name = "index"
depends_on = ["align"]
cmd = "samtools index aligned/{sample}.bam"
inputs = ["aligned/{sample}.bam"]
outputs = ["aligned/{sample}.bam.bai"]
Step Fields#
| Field | Type | Description |
|---|---|---|
name |
string | Unique step identifier (used in depends_on) |
cmd |
string | Shell command with {wildcard} and {params.KEY} substitution |
depends_on |
list | Names of steps that must complete first |
inputs |
list | Input file patterns for freshness checking |
outputs |
list | Output file patterns for freshness checking and skip-if-fresh logic |
gather |
bool | When true, runs once after ALL wildcard instances of dependency steps complete |
Progress Display#
During execution, the engine shows:
◆ oxo workflow — 13 task(s)
────────────────────────────────────────────────────────────────
Pipeline DAG (4 phases, 13 tasks)
Phase 1 fastp[sample=s1] │ fastp[sample=s2] │ fastp[sample=s3]
↓
Phase 2 multiqc [gather] │ star[sample=s1] │ … +2 more
↓
Phase 3 samtools_index[sample=s1] │ … +2 more
↓
Phase 4 featurecounts[sample=s1] │ … +2 more
────────────────────────────────────────────────────────────────
▶ fastp[sample=s1]
✓ [1/13] fastp[sample=s1]
▶ fastp[sample=s2]
✓ [2/13] fastp[sample=s2]
...
✓ [13/13] featurecounts[sample=s3]
────────────────────────────────────────────────────────────────
✓ Workflow complete — 13 task(s) run, 0 up to date (2m 34s)