run#
Generate parameters with LLM and execute the tool.
Synopsis#
Options#
| Option | Description |
|---|---|
| -a, --ask | Prompt for confirmation before executing |
| -m, --model <MODEL> | Override the LLM model for this invocation |
| --no-cache | Skip cached documentation and fetch fresh --help output |
| --json | Output result as JSON (useful for scripting and CI) |
| --verify | After execution, use LLM to validate results (output files, stderr, exit code) |
| -V, --var KEY=VALUE | Substitute {KEY} in the task description before the LLM call (repeatable) |
| -i, --input-list <FILE> | Read input items from a file; runs the generated command for each item |
| --input-items <ITEMS> | Comma-separated input items; runs the generated command for each item |
| -j, --jobs <N> | Maximum parallel jobs when using --input-list / --input-items (default: 1) |
| -x, --stop-on-error | Abort remaining items after the first failure |
| --auto-retry | On failure, ask the LLM to analyze stderr and re-run with a corrected command (up to 2 retries) |
| --scenario <SCENARIO> | Force a workflow scenario: basic, prompt, doc, skill, or full (auto-detected by default). Invalid values are rejected with a non-zero exit code |
| --no-stream | Disable streaming (SSE) output from the LLM; the full response is shown after generation completes |
| -v, --verbose | Show docs source, skill info, and LLM details (global) |
| --license <PATH> | Path to license file (global option) |
Prompt Tier#
The prompt compression tier is auto-detected from the model size and context
window. You can override it with the llm.prompt_tier config key or the
OXO_CALL_LLM_PROMPT_TIER environment variable:
# Force Compact tier for a small model
OXO_CALL_LLM_PROMPT_TIER=compact oxo-call run samtools "sort bam by coordinate"
# Force Medium tier
OXO_CALL_LLM_PROMPT_TIER=medium oxo-call run samtools "sort bam by coordinate"
Use --verbose to see which tier was selected for a given invocation.
Description#
The run command is the primary way to use oxo-call. It:
1. Fetches the tool's documentation (from cache or --help output)
2. Extracts structured knowledge: a flag catalog and command examples from the help text
3. Loads any matching skill (built-in, community, or user-defined)
4. Sends the doc-enriched prompt to the configured LLM (single call by default)
5. Parses the response to extract command arguments
6. (with --input-list/--input-items) Executes the command template for each item
7. Records the execution in command history
8. (with --verify) Asks the LLM to review the outputs and report issues
Step 2 is the key innovation: the flag catalog prevents hallucinated flags and doc-extracted examples serve as few-shot demonstrations — enabling reliable command generation even with small models (≤3B) and no skill files.
Examples#
# Basic usage
oxo-call run samtools "sort input.bam by coordinate and output to sorted.bam"
# With confirmation prompt
oxo-call run --ask bcftools "call variants from aligned.bam using ref.fa"
# Use LLM to verify outputs after execution
oxo-call run --verify samtools "sort input.bam by coordinate, output sorted.bam"
# Auto-retry on failure (LLM analyzes stderr and corrects the command)
oxo-call run --auto-retry samtools "sort input.bam with 8 threads"
# Maximum reliability: auto-retry and verify
oxo-call run --auto-retry --verify samtools "sort and index"
# Override LLM model for a single invocation
oxo-call run --model gpt-4 samtools "index sorted.bam"
# Force fresh documentation (skip cache)
oxo-call run --no-cache samtools "sort by name"
# Get JSON output for scripting
oxo-call run --json samtools "flagstat input.bam"
# Variable substitution: replace {SAMPLE} in the task before the LLM call
oxo-call run --var SAMPLE=NA12878 samtools \
"sort {SAMPLE}.bam by coordinate and output to {SAMPLE}.sorted.bam"
# Batch mode: generate the command template once, run for each BAM in a list
oxo-call run samtools "sort {item} by coordinate, output {item}.sorted.bam" \
--input-list bam_files.txt --jobs 4
# Same with inline items
oxo-call run samtools "index {item}" --input-items s1.bam,s2.bam,s3.bam --jobs 2
# Preview all batch commands without executing (dry-run)
oxo-call dry-run samtools "flagstat {item}" --input-items s1.bam,s2.bam
# Combine vars and batch input
oxo-call run bwa "align {item} to {REF} with {THREADS} threads" \
--var REF=hg38.fa --var THREADS=8 \
--input-list samples.txt --jobs 4
# Disable streaming (useful for CI/batch scripts and benchmarks)
oxo-call run --no-stream samtools "sort input.bam by coordinate"
Variable substitution (--var)#
Use --var KEY=VALUE to inject values into the task description before
the LLM receives it. Multiple --var flags are allowed. For example,
--var INPUT=sample.bam substitutes {INPUT} → sample.bam in the task string
before the LLM call, so the LLM generates a concrete command rather than a template.
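The substitution step can be pictured with a minimal Python sketch. The helper names `parse_var` and `substitute_vars` are hypothetical, and the handling of unknown placeholders (left untouched here) is an assumption, not documented oxo-call behavior.

```python
def parse_var(spec):
    """Split a KEY=VALUE string at the first '='."""
    key, sep, value = spec.partition("=")
    if not sep or not key:
        raise ValueError(f"expected KEY=VALUE, got {spec!r}")
    return key, value


def substitute_vars(task, specs):
    """Replace each {KEY} in the task with its value.

    Placeholders with no matching --var (for example {item}) are left
    untouched; this is an assumption of the sketch.
    """
    for spec in specs:
        key, value = parse_var(spec)
        task = task.replace("{" + key + "}", value)
    return task


task = "sort {SAMPLE}.bam by coordinate and output to {SAMPLE}.sorted.bam"
print(substitute_vars(task, ["SAMPLE=NA12878"]))
```

Splitting at the first `=` keeps values that themselves contain `=` intact.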
Batch / parallel mode (--input-list / --input-items / --jobs)#
When you provide a list of input items, the LLM is called once to generate
a command template (which may contain {item}). The template is then executed
for each item in the list.
| Placeholder | Expands to |
|---|---|
| {item} / {line} / {} | The current input item ({} is the rush-compatible form) |
| {nr} | 1-based item number |
| {basename} | Filename without directory |
| {dir} | Directory portion of the item path (or .) |
| {stem} | Filename without last extension |
| {ext} | File extension without dot |
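To make the placeholder table concrete, here is a rough Python sketch of how each placeholder could be derived from an item. The function `expand_placeholders` is hypothetical, and edge cases (hidden files, items without an extension) may be handled differently by oxo-call itself.

```python
import posixpath
import re

# Matches {item}, {line}, {nr}, {basename}, {dir}, {stem}, {ext} and the bare {} form.
_PLACEHOLDER = re.compile(r"\{(item|line|nr|basename|dir|stem|ext|)\}")


def expand_placeholders(template, item, nr):
    """Expand batch placeholders for one item (nr is the 1-based item number)."""
    base = posixpath.basename(item)
    stem, ext = posixpath.splitext(base)
    values = {
        "item": item,
        "line": item,
        "": item,                                # the bare {} form
        "nr": str(nr),
        "basename": base,
        "dir": posixpath.dirname(item) or ".",   # "." when the item has no directory
        "stem": stem,
        "ext": ext[1:],                          # drop the leading dot
    }
    # A single regex pass avoids re-expanding text that came from the item itself.
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


print(expand_placeholders("samtools sort {item} -o {dir}/{stem}.sorted.{ext}", "data/s1.bam", 1))
```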
Input list file format: one item per line; blank lines and lines starting
with # are ignored. IO errors during reading are propagated immediately
(no silent truncation).
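The list-file rules can be summarized in a few lines of Python. This is only a sketch: `read_items` is a hypothetical helper, and trimming surrounding whitespace before the checks is an assumption that the text above does not confirm.

```python
def read_items(lines):
    """Return input items, skipping blank lines and lines starting with '#'."""
    items = []
    for line in lines:
        stripped = line.strip()  # assumption: surrounding whitespace is ignored
        if not stripped or stripped.startswith("#"):
            continue
        items.append(stripped)
    return items


sample = ["# tumour samples\n", "s1.bam\n", "\n", "s2.bam\n"]
print(read_items(sample))
```

Passing an open file object (for example `with open("bam_files.txt") as f: read_items(f)`) lets any I/O error surface as an exception rather than yielding a shortened list.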
Parallelism: set -j N (or --jobs N) to run up to N items concurrently.
The default is 1 (sequential). Exit codes are collected after all items finish;
any failure causes the overall command to exit non-zero.
Stop-on-error (-x / --stop-on-error): abort after the first item
failure — useful in pipelines where continuing on error would produce incorrect
downstream results.
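The interaction between --jobs and --stop-on-error can be sketched as follows. This is an illustrative model under stated assumptions, not oxo-call's scheduler: `run_batch` and the `run_item` callback are hypothetical, and "abort" is modeled as skipping items that have not started yet.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_batch(items, run_item, jobs=1, stop_on_error=False):
    """Run run_item(item) -> exit code for each item, up to `jobs` at a time.

    Returns (results, overall_exit_code); a skipped item has exit code None.
    """
    stop = threading.Event()

    def worker(item):
        if stop.is_set():
            return item, None          # skipped after an earlier failure
        code = run_item(item)
        if code != 0 and stop_on_error:
            stop.set()                 # tell not-yet-started items to skip
        return item, code

    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(worker, items))  # map preserves input order

    # Exit codes are inspected only after every item has finished or been skipped.
    failed = sum(1 for _, code in results if code not in (0, None))
    return results, (1 if failed else 0)


results, exit_code = run_batch(["s1.bam", "s2.bam"], lambda item: 0, jobs=2)
print(results, exit_code)
```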
JSON output (--json) in batch mode returns a summary object containing an array of per-item results:
{
"tool": "samtools",
"task_template": "flagstat {item}",
"command_template": "samtools flagstat {item}",
"total": 3,
"failed": 0,
"success": true,
"results": [
{ "item": "s1.bam", "command": "samtools flagstat s1.bam", "exit_code": 0, "success": true },
{ "item": "s2.bam", "command": "samtools flagstat s2.bam", "exit_code": 0, "success": true },
{ "item": "s3.bam", "command": "samtools flagstat s3.bam", "exit_code": 0, "success": true }
]
}
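A script consuming this report might look like the sketch below. It relies only on the field names visible in the sample above; `failed_items` is a hypothetical helper, and the sample report is abbreviated and altered to include one failure for illustration.

```python
import json


def failed_items(report_text):
    """Return the items whose per-item result reports success == false."""
    report = json.loads(report_text)
    return [r["item"] for r in report["results"] if not r["success"]]


# Abbreviated report with one failing item (illustrative data only).
sample = """
{
  "tool": "samtools",
  "total": 2,
  "failed": 1,
  "success": false,
  "results": [
    {"item": "s1.bam", "command": "samtools flagstat s1.bam", "exit_code": 0, "success": true},
    {"item": "s2.bam", "command": "samtools flagstat s2.bam", "exit_code": 1, "success": false}
  ]
}
"""
print(failed_items(sample))
```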
LLM Result Verification (--verify)#
When --verify is set, oxo-call captures the tool's stderr and probes the declared output files. It then asks the LLM to analyse the run and returns a structured verdict:
────────────────────────────────────────────────────────────
LLM Verification: Issues detected
Alignment completed but output BAM is suspiciously small.
Issues:
• sorted.bam — 0 bytes (likely empty output)
• Stderr contains 'truncated file'
Suggestions:
→ Check that input.bam is not corrupted
→ Re-run with --no-cache to refresh the tool documentation
────────────────────────────────────────────────────────────
Verification is advisory — it never changes the process exit code. Use --json to get the verification block in machine-readable form.
Automatic Task Normalization#
oxo-call uses a two-step process for task normalization:
1. Quality mode selection: When there is no static skill file for the tool and documentation is available, oxo-call selects Quality mode (multi-stage pipeline). If --scenario is set, the scenario's default mode takes precedence.
2. Optional normalization within Quality mode: Within the Quality pipeline, an extra LLM call normalizes the task only if it is considered vague or ambiguous:
   - Input: "sort bam"
   - Normalized: "sort BAM file input.bam by coordinate using samtools sort with 8 threads, output to sorted.bam"
The normalized task is shown when it differs from the original and is used for the command generation prompt. This secondary normalization triggers when:
- The task is shorter than 10 characters
- The task contains vague keywords (e.g., "just", "simply", "basically")
- The task contains non-ASCII characters (e.g., Chinese, Japanese)
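The trigger conditions above can be expressed as a small predicate. This is a sketch under assumptions: the function name `needs_normalization` is hypothetical, the keyword list contains only the three examples given above (the real list may be longer), any single condition is assumed to be sufficient, and keywords are matched as whole words.

```python
import re

# Only the example keywords from the text; the real list may be longer.
VAGUE_KEYWORDS = ("just", "simply", "basically")


def needs_normalization(task):
    """Return True if the task looks vague enough to warrant normalization."""
    if len(task) < 10:
        return True
    if not task.isascii():
        return True
    words = re.findall(r"[a-z]+", task.lower())
    return any(keyword in words for keyword in VAGUE_KEYWORDS)


print(needs_normalization("sort bam"))
print(needs_normalization("sort input.bam by coordinate and output to sorted.bam"))
```

Whole-word matching keeps a task such as "adjust the header" from tripping on the substring "just".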
Risk Assessment#
oxo-call automatically assesses the risk level of generated commands before execution:
| Risk Level | Trigger | Behavior |
|---|---|---|
| Safe | Normal bioinformatics commands | Proceeds normally |
| Warning | Force flags (-f, --force), output redirection (>), same input/output file | Shows warning, proceeds with normal --ask behavior |
| Dangerous | rm, sudo, dd, mkfs, chmod, chown commands | Forces confirmation prompt regardless of --ask |
Example output for a dangerous command:
────────────────────────────────────────────────────────────
⚠️ RISK: Dangerous command detected
The generated command contains 'rm' which can delete files.
Proceed with caution.
────────────────────────────────────────────────────────────
? Execute this command? [y/N]
This safety feature ensures that potentially destructive operations always require explicit user confirmation.
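As a rough illustration of the risk table, the classification could be approximated like this. The sketch is not oxo-call's implementation: `assess_risk` is a hypothetical name, tokens are matched exactly, and the "same input/output file" check is omitted because the text does not say how it is detected.

```python
import shlex

DANGEROUS_COMMANDS = {"rm", "sudo", "dd", "mkfs", "chmod", "chown"}
FORCE_FLAGS = {"-f", "--force"}


def assess_risk(command):
    """Classify a command line as 'safe', 'warning', or 'dangerous'."""
    tokens = shlex.split(command)
    if any(token in DANGEROUS_COMMANDS for token in tokens):
        return "dangerous"
    # A token beginning with '>' covers both '> out' and '>out' redirections.
    if any(token in FORCE_FLAGS or token.startswith(">") for token in tokens):
        return "warning"
    return "safe"


for cmd in ("samtools flagstat input.bam",
            "samtools sort input.bam > sorted.bam",
            "rm -f tmp.bam"):
    print(cmd, "->", assess_risk(cmd))
```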
Input File Validation#
Before execution, oxo-call validates that input files exist on disk. Files following these flags are checked:
- -i, --input, -I, --in
- -1, -2, --in1, --in2 (paired-end inputs)
- -x, -U (reference/index inputs)
- --ref, --reference, --genome, --genome-dir, --genomeDir
- --sjdbGTFfile, --gtf, --bed
If a specified input file doesn't exist, oxo-call fails early with a clear error message, preventing confusing downstream failures.
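The pre-flight check can be sketched as a scan over the argument list. This is an assumption-laden illustration: `missing_inputs` is a hypothetical helper, the flag set is transcribed from the list above, and only the `flag value` form (not `--flag=value`) is handled.

```python
import os

INPUT_FLAGS = {
    "-i", "--input", "-I", "--in",
    "-1", "-2", "--in1", "--in2",
    "-x", "-U",
    "--ref", "--reference", "--genome", "--genome-dir", "--genomeDir",
    "--sjdbGTFfile", "--gtf", "--bed",
}


def missing_inputs(args, exists=os.path.exists):
    """Return the values following known input flags that do not exist on disk."""
    missing = []
    for flag, value in zip(args, args[1:]):
        if flag in INPUT_FLAGS and not exists(value):
            missing.append(value)
    return missing


args = ["--ref", "hg38.fa", "-1", "r1.fq", "-o", "out.bam"]
print(missing_inputs(args))
```

The `exists` parameter is injectable so the check can be exercised without touching the filesystem.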
Auto-Retry (--auto-retry)#
When --auto-retry is enabled and the generated command fails, oxo-call automatically:
- Captures the stderr and exit code
- Sends the failure context to the LLM
- Generates a corrected command
- Executes the corrected command
Up to 2 retry attempts are made. This is especially useful for complex tools where the first LLM attempt may miss a required flag or get a parameter format wrong.
────────────────────────────────────────────────────────────
⟳ Analyzing failure and generating corrected command...
────────────────────────────────────────────────────────────
Auto-retry: (attempt 1/2)
Corrected command: samtools sort -@ 8 -o sorted.bam input.bam
Fix: Added missing -o flag for output file
────────────────────────────────────────────────────────────
────────────────────────────────────────────────────────────
✓ Auto-retry succeeded on attempt 1
────────────────────────────────────────────────────────────
All retry attempts are recorded in the command history with an [auto-retry #N] prefix on the task description.
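The retry loop described in this section can be modeled in a few lines. The sketch assumes two hypothetical callbacks: `run(command)` returning `(exit_code, stderr)` and `suggest_fix(command, stderr, exit_code)` standing in for the LLM call that produces a corrected command.

```python
def run_with_retry(command, run, suggest_fix, max_retries=2):
    """Run a command; on failure, ask for a corrected command up to max_retries times.

    Returns (final_exit_code, attempts), where attempts lists every command tried.
    """
    attempts = [command]
    exit_code, stderr = run(command)
    retries = 0
    while exit_code != 0 and retries < max_retries:
        retries += 1
        command = suggest_fix(command, stderr, exit_code)
        attempts.append(command)
        exit_code, stderr = run(command)
    return exit_code, attempts


# Stand-in callbacks for illustration: the command succeeds once it has an -o flag.
def fake_run(command):
    return (0, "") if "-o" in command.split() else (1, "no output file specified")


def fake_fix(command, stderr, exit_code):
    return command + " -o sorted.bam"


print(run_with_retry("samtools sort input.bam", fake_run, fake_fix))
```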
JSON Output#
When --json is used for a single-item run, the output is a JSON object:
{
"tool": "samtools",
"task": "flagstat input.bam",
"effective_task": "flagstat input.bam",
"command": "samtools flagstat input.bam",
"args": ["flagstat", "input.bam"],
"explanation": "Generates alignment statistics for the BAM file",
"dry_run": false,
"exit_code": 0,
"success": true,
"skill": "samtools",
"model": "gpt-4o"
}
When --verify is also used, an additional verification block is appended.
Behavior#
- Documentation is fetched automatically on first use and cached
- If a matching skill exists, expert knowledge is injected into the prompt
- The LLM response must contain ARGS: and EXPLANATION: lines
- On execution failure, the exit code is recorded in history
- Use dry-run to preview commands without executing
- Use --no-cache to force a fresh --help fetch when docs may be stale
- Use --model to quickly test different models without changing config
- With --input-list/--input-items, the LLM is called once; each item execution uses sh -c with {item} (and other placeholders) substituted
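The ARGS: / EXPLANATION: requirement suggests a simple line-oriented parse. The sketch below is a guess at the shape of such a parser, not oxo-call's code: `parse_response` is hypothetical, and it assumes each marker appears at the start of a line and that arguments follow shell quoting rules.

```python
import shlex


def parse_response(text):
    """Extract (args, explanation) from an LLM response, or raise ValueError."""
    args = None
    explanation = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("ARGS:"):
            args = shlex.split(line[len("ARGS:"):])
        elif line.startswith("EXPLANATION:"):
            explanation = line[len("EXPLANATION:"):].strip()
    if args is None or explanation is None:
        raise ValueError("response must contain ARGS: and EXPLANATION: lines")
    return args, explanation


reply = "ARGS: flagstat input.bam\nEXPLANATION: Generates alignment statistics for the BAM file"
print(parse_response(reply))
```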