02 — File Pipeline#
A linear three-step pipeline that generates data, transforms it, and produces a summary. This demonstrates how oxo-flow resolves dependencies automatically from file paths.
Concepts Covered
- Multi-rule workflows with automatic dependency resolution
- Input/output chaining (one rule's output is the next rule's input)
- Config variables accessible in shell commands via `{config.*}`
- Multi-line shell commands
Workflow Definition#
```toml
# examples/gallery/02_file_pipeline.oxoflow
[workflow]
name = "file-pipeline"
version = "1.0.0"
description = "Linear file processing pipeline with three sequential steps"
author = "oxo-flow examples"

[config]
greeting = "Welcome to oxo-flow"

[[rules]]
name = "generate_data"
output = ["data/raw.csv"]
shell = """
mkdir -p data
echo 'id,name,value' > {output[0]}
for i in $(seq 1 100); do echo "$i,item_$i,$((RANDOM % 1000))"; done >> {output[0]}
"""

[[rules]]
name = "transform"
input = ["data/raw.csv"]
output = ["data/filtered.csv"]
shell = """
head -1 {input[0]} > {output[0]}
awk -F',' 'NR>1 && $3 > 500' {input[0]} >> {output[0]}
"""

[[rules]]
name = "summarize"
input = ["data/filtered.csv"]
output = ["results/summary.txt"]
shell = """
mkdir -p results
total=$(tail -n +2 {input[0]} | wc -l)
echo '{config.greeting}' > {output[0]}
echo "Filtered records: $total" >> {output[0]}
echo "Generated by oxo-flow file-pipeline" >> {output[0]}
"""
```
Key Concepts#
Automatic Dependency Resolution#
oxo-flow builds a DAG by matching rule outputs to rule inputs:
- `generate_data` produces `data/raw.csv`
- `transform` requires `data/raw.csv` → depends on `generate_data`
- `summarize` requires `data/filtered.csv` → depends on `transform`
You never need to declare dependencies manually — they are inferred from file paths.
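The matching above can be sketched in a few lines. This is a hypothetical illustration of output-to-input matching, not oxo-flow's actual implementation; the `rules` dictionary simply mirrors the workflow definition above.

```python
# Hypothetical sketch of dependency inference (not oxo-flow's actual code).
rules = {
    "generate_data": {"input": [], "output": ["data/raw.csv"]},
    "transform": {"input": ["data/raw.csv"], "output": ["data/filtered.csv"]},
    "summarize": {"input": ["data/filtered.csv"], "output": ["results/summary.txt"]},
}

# Index each output path by the rule that produces it.
producer = {path: name for name, rule in rules.items() for path in rule["output"]}

# A rule depends on whichever rule produces each of its declared inputs.
deps = {name: {producer[p] for p in rule["input"] if p in producer}
        for name, rule in rules.items()}

print(deps)
```

The resulting edge set is exactly the two dependencies reported by `oxo-flow validate` below.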
Config Variables#
The `[config]` section defines key-value pairs that are accessible in shell commands via `{config.key}`.
In the shell command, `echo '{config.greeting}'` expands to `echo 'Welcome to oxo-flow'`.
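The expansion amounts to simple placeholder substitution. A minimal sketch (hypothetical, not oxo-flow's actual template engine):

```python
# Hypothetical sketch of {config.key} expansion (not oxo-flow's actual code).
import re

config = {"greeting": "Welcome to oxo-flow"}

def expand(command: str, config: dict) -> str:
    # Replace each {config.key} placeholder with the configured value.
    return re.sub(r"\{config\.(\w+)\}", lambda m: config[m.group(1)], command)

print(expand("echo '{config.greeting}' > summary.txt", config))
```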
Multi-Line Shell Commands#
Use triple-quoted strings ("""...""") for multi-line shell commands. Each line is executed as part of a single shell invocation.
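Because the whole string runs as one invocation, shell variables set on one line remain visible on later lines (as `total` does in the `summarize` rule above). The equivalent behavior can be demonstrated with a single `sh -c` call:

```python
# A multi-line script passed to one `sh -c` call behaves like a
# triple-quoted oxo-flow shell command: one process, shared variables.
import subprocess

script = """
greeting="hello from one invocation"
echo "$greeting"
"""

result = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
print(result.stdout.strip())  # → hello from one invocation
```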
Running the Workflow#
Validate#
```text
$ oxo-flow validate examples/gallery/02_file_pipeline.oxoflow
✓ examples/gallery/02_file_pipeline.oxoflow — 3 rules, 2 dependencies
```
Dry-Run#
```text
$ oxo-flow dry-run examples/gallery/02_file_pipeline.oxoflow
oxo-flow 0.1.0 — Bioinformatics Pipeline Engine
Dry-run: 3 rules would execute:
  1. generate_data [threads=1, env=system]
  2. transform [threads=1, env=system]
  3. summarize [threads=1, env=system]
```
DAG Visualization#
```mermaid
graph LR
    A[generate_data] --> B[transform]
    B --> C[summarize]
```
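The execution order shown in the dry-run falls out of this DAG. A minimal topological sort over the inferred dependencies (a sketch, not oxo-flow's actual scheduler):

```python
# Sketch: derive a linear execution order from the dependency edges
# (minimal topological sort; assumes the DAG has no cycles).
deps = {
    "generate_data": set(),
    "transform": {"generate_data"},
    "summarize": {"transform"},
}

order = []
remaining = dict(deps)
while remaining:
    # Schedule every rule whose dependencies are already scheduled.
    ready = [name for name, d in remaining.items() if d <= set(order)]
    order.extend(sorted(ready))
    for name in ready:
        del remaining[name]

print(order)  # → ['generate_data', 'transform', 'summarize']
```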
What's Next?#
Move on to Parallel Samples to learn how wildcards enable multi-sample parallel processing.