System Architecture

Overview

oxo-call is a Rust workspace with three crates:

| Crate | Purpose | Published |
| --- | --- | --- |
| `oxo-call` (root) | End-user CLI | Yes (crates.io) |
| `crates/license-issuer` | Maintainer-only license signing tool | No |
| `crates/oxo-bench` | Benchmarking and evaluation suite | No |

The architecture is a layered system that makes command generation usable in production science and engineering workflows. The core idea: describe your task in plain language, and oxo-call fetches the tool's documentation and asks your LLM backend to generate the exact flags you need.

Layered Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                          User Interface Layer                           │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  CLI Client  │  │  Chat Mode   │  │  Web API     │  │  SDK/API   │ │
│  │  (cli.rs)    │  │  (chat.rs)   │  │  (server.rs) │  │  (lib.rs)  │ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Language Processing Layer                           │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Universal Task Translator (Any Language → Optimized English)   │  │
│  │  • task_normalizer.rs  • task_complexity.rs  • sanitize.rs      │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     AI Orchestration Layer                              │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Runner Pipeline (runner/)                                      │  │
│  │  • core.rs (orchestration)  • batch.rs (parallel execution)     │  │
│  │  • retry.rs (error recovery)  • utils.rs (tool detection)       │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  LLM Integration (llm/)                                         │  │
│  │  • provider.rs (multi-provider support)  • types.rs (traits)    │  │
│  │  • Copilot / OpenAI / Anthropic / Ollama / DeepSeek / etc.      │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Command Generation (generator.rs)                              │  │
│  │  • LLM-based  • Rule-based  • Composite strategies              │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Knowledge Enhancement Layer                        │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Documentation System                                           │  │
│  │  • docs.rs (resolver + caching)  • doc_processor.rs (extraction)│  │
│  │  • doc_summarizer.rs (compression)  • index.rs (search index)   │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Knowledge Module (knowledge/)                                  │  │
│  │  • tool_knowledge.rs (6000+ bioconda tools, TF-IDF search)     │  │
│  │  • error_db.rs (error recovery)  • best_practices.rs            │  │
│  └──────────────────────────────────────────────────────────────────┘  │
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  Skill       │  │  MCP Skill   │  │  Mini Skill  │  │  Context   │ │
│  │  Manager     │  │  Provider    │  │  Cache       │  │  Builder   │ │
│  │  (skill.rs)  │  │  (mcp.rs)    │  │              │  │(context.rs)│ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘ │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Execution & Monitoring Layer                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  Workflow     │  │  DAG         │  │  History     │  │  Job       │ │
│  │  Templates    │  │  Engine      │  │  Tracker     │  │  Manager   │ │
│  │ (workflow.rs) │  │ (engine.rs)  │  │ (history.rs) │  │  (job.rs)  │ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘ │
│                                                                         │
│  ┌──────────────────────────────────────────────────────────────────┐  │
│  │  Workflow Graph Visualization (workflow_graph.rs)                │  │
│  └──────────────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                     Infrastructure Layer                                │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌────────────┐ │
│  │  LLM Backend │  │  Cache Layer │  │  Config      │  │  Remote    │ │
│  │  (Multiple   │  │  (cache.rs)  │  │  Management  │  │  Execution │ │
│  │   Providers) │  │              │  │ (config.rs)  │  │ (server.rs)│ │
│  └──────────────┘  └──────────────┘  └──────────────┘  └────────────┘ │
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐                  │
│  │  License     │  │  Error       │  │  Copilot     │                  │
│  │  Verifier    │  │  Handling    │  │  Auth        │                  │
│  │ (license.rs) │  │ (error.rs)   │  │(copilot_auth)│                  │
│  └──────────────┘  └──────────────┘  └──────────────┘                  │
└─────────────────────────────────────────────────────────────────────────┘

Layer Descriptions

User Interface Layer: multiple entry points for interacting with oxo-call.
- CLI Client (cli.rs, main.rs): Primary command-line interface with Clap-based argument parsing
- Chat Mode (chat.rs): Interactive conversational AI for bioinformatics tool guidance
- Web API (server.rs): Remote server management for SSH/HPC execution
- SDK/API (lib.rs): Programmatic Rust API for downstream crates and integrations

Language Processing Layer: normalizes and analyzes user input before LLM processing.
- Task Normalizer (task_normalizer.rs): Translates natural-language tasks into optimized prompts. Supports multilingual input (Chinese, Japanese, Korean, Spanish, French, German, Portuguese, Russian) via a rule-based fast path, with an LLM fallback for complex cases
- Task Complexity (task_complexity.rs): Estimates task complexity for adaptive prompt tier selection
- Sanitizer (sanitize.rs): Anonymizes sensitive data before it is sent to the LLM

AI Orchestration Layer: core intelligence pipeline.
- Runner Pipeline (runner/): Orchestrates the full docs→skill→LLM→execute flow
- LLM Integration (llm/): Multi-provider abstraction (GitHub Copilot, OpenAI, Anthropic, Ollama, Kimi/Moonshot, GLM/ZhipuAI, and more via OpenAI-compatible API)
- Command Generator (generator.rs): Extensible generation strategies via the CommandGenerator trait
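
As a rough illustration of how a CommandGenerator-style trait can support LLM-based, rule-based, and composite strategies, here is a minimal sketch. Only the trait name comes from this page; the method signature, the `Generated` struct, and the `RuleBased`, `StubLlm`, and `Composite` types are assumptions made for the example, and the file names in the rule are placeholders.

```rust
// Hypothetical sketch: only the trait name `CommandGenerator` appears in the
// document. The signature and every other type here are assumptions.

/// A generated command: tool arguments plus a short explanation.
#[derive(Debug, Clone, PartialEq)]
pub struct Generated {
    pub args: String,
    pub explanation: String,
}

/// A strategy that may or may not be able to handle a task.
pub trait CommandGenerator {
    /// Returns `None` when this strategy cannot handle the task,
    /// so that the next strategy in a chain gets a chance.
    fn generate(&self, tool: &str, task: &str) -> Option<Generated>;
}

/// Rule-based strategy: answers only one fixed, known pattern.
pub struct RuleBased;

impl CommandGenerator for RuleBased {
    fn generate(&self, tool: &str, task: &str) -> Option<Generated> {
        if tool == "samtools" && task.to_lowercase().contains("sort") {
            Some(Generated {
                args: "sort -o sorted.bam input.bam".to_string(),
                explanation: "matched the built-in sort rule".to_string(),
            })
        } else {
            None
        }
    }
}

/// Stand-in for an LLM-backed strategy; a real one would call a provider.
pub struct StubLlm;

impl CommandGenerator for StubLlm {
    fn generate(&self, _tool: &str, task: &str) -> Option<Generated> {
        Some(Generated {
            args: format!("<args for: {task}>"),
            explanation: "stubbed LLM response".to_string(),
        })
    }
}

/// Composite strategy: tries each generator in order and returns the
/// first answer (chain of responsibility).
pub struct Composite {
    pub chain: Vec<Box<dyn CommandGenerator>>,
}

impl CommandGenerator for Composite {
    fn generate(&self, tool: &str, task: &str) -> Option<Generated> {
        self.chain.iter().find_map(|g| g.generate(tool, task))
    }
}
```

With a chain of `RuleBased` followed by `StubLlm`, a sort task for samtools is answered by the rule, and everything else falls through to the stub. Because `Composite` implements the same trait, chains can be nested.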

Knowledge Enhancement Layer: grounds LLM calls in real documentation and domain expertise.
- Documentation System (docs.rs, doc_processor.rs, doc_summarizer.rs): Fetches, parses, and caches tool documentation
- Tool Knowledge Base (knowledge/tool_knowledge.rs): Embedded catalog of 6000+ bioconda tools with TF-IDF keyword search, loaded from JSONL at compile time via include_str!. Provides offline tool discovery, category inference, and related-tool recommendations
- Error Knowledge Base (knowledge/error_db.rs): Learning from failures for error recovery
- Best Practices (knowledge/best_practices.rs): Domain-specific bioinformatics best practices
- Skill System (skill.rs): Domain-specific knowledge injection (user → community → MCP → built-in)
- MCP Provider (mcp.rs): Model Context Protocol for external skill servers
- Context Builder (context.rs): Assembles enriched context for LLM prompts
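
To make the TF-IDF keyword search concrete, the following is a small standard-library sketch of ranking a tool catalog against a query. It is not the code in knowledge/tool_knowledge.rs: the three-entry `CATALOG` constant stands in for the embedded JSONL, its descriptions are made up for the example, and the particular TF and IDF formulas are one common variant chosen as an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical (name, description) entries. The document describes a JSONL
/// catalog embedded with include_str!; a const slice is used here instead.
const CATALOG: &[(&str, &str)] = &[
    ("samtools", "tools for manipulating sam bam cram alignments"),
    ("bwa", "burrows wheeler aligner for short reads"),
    ("fastqc", "quality control for sequencing reads"),
];

/// Lowercases and splits on any non-alphanumeric character.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Ranks catalog entries by the summed TF-IDF weight of the query terms.
/// Entries that match no query term are omitted.
pub fn search(query: &str) -> Vec<(&'static str, f64)> {
    let docs: Vec<Vec<String>> = CATALOG.iter().map(|(_, d)| tokenize(d)).collect();
    let n = docs.len() as f64;

    // Document frequency: in how many descriptions each term occurs.
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &docs {
        let mut unique: Vec<&str> = doc.iter().map(String::as_str).collect();
        unique.sort_unstable();
        unique.dedup();
        for term in unique {
            *df.entry(term).or_insert(0) += 1;
        }
    }

    let query_terms = tokenize(query);
    let mut scored: Vec<(&'static str, f64)> = Vec::new();
    for ((name, _), doc) in CATALOG.iter().zip(&docs) {
        let mut score = 0.0;
        for q in &query_terms {
            let count = doc.iter().filter(|t| *t == q).count() as f64;
            if count == 0.0 {
                continue;
            }
            let tf = count / doc.len() as f64;
            // Smoothed IDF so terms present in every document still count.
            let idf = (n / df[q.as_str()] as f64).ln() + 1.0;
            score += tf * idf;
        }
        if score > 0.0 {
            scored.push((*name, score));
        }
    }
    // Highest score first.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored
}
```

Calling `search("short reads")` on this toy catalog ranks `bwa` ahead of `fastqc`, because "short" is rarer across the descriptions than "reads" and so carries more weight.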

Execution & Monitoring Layer: runs commands and tracks results.
- Workflow Engine (engine.rs): DAG-based parallel workflow execution with tokio
- Workflow Templates (workflow.rs): Pre-built bioinformatics pipelines (RNA-seq, WGS, etc.)
- History Tracker (history.rs): JSONL command history with full provenance (UUID, model, exit code)
- Job Manager (job.rs): Background job tracking and management

Infrastructure Layer: platform services and configuration.
- LLM Backend: Multi-provider support with adaptive prompt tiers
- Cache Layer (cache.rs): Semantic hash-based response caching to reduce API costs
- Config Management (config.rs): TOML-based configuration with environment variable overrides
- License Verifier (license.rs): Ed25519 offline license verification
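
The response cache is described as keyed by a semantic hash over tool, task, docs, skill, and model. A minimal sketch of such a key is shown below. The `CacheKeyInput` struct, the `cache_key` function, and the whitespace and case normalization of the task are assumptions for illustration. `DefaultHasher` is used only to keep the example dependency-free; its algorithm is not guaranteed to stay the same across Rust releases, so a cache persisted to disk would likely want a fixed hash function instead.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical bundle of everything that influences an LLM response.
pub struct CacheKeyInput<'a> {
    pub tool: &'a str,
    pub task: &'a str,
    pub docs: &'a str,
    pub skill: &'a str,
    pub model: &'a str,
}

/// Assumed normalization: collapse whitespace and ignore case, so trivially
/// different phrasings of the same task share one cache entry.
fn normalize_task(task: &str) -> String {
    task.split_whitespace()
        .collect::<Vec<_>>()
        .join(" ")
        .to_lowercase()
}

/// Derives a cache key from all five inputs. Hashing each string separately
/// keeps field boundaries distinct, so ("ab", "c") and ("a", "bc") differ.
pub fn cache_key(input: &CacheKeyInput<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    input.tool.hash(&mut hasher);
    normalize_task(input.task).hash(&mut hasher);
    input.docs.hash(&mut hasher);
    input.skill.hash(&mut hasher);
    input.model.hash(&mut hasher);
    hasher.finish()
}
```

Including the docs and skill text in the key means a cached answer is invalidated when the tool's documentation or the skill changes, and including the model keeps responses from different backends apart.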

Module Structure

main.rs             — Command dispatcher & license gate
  ├─→ cli.rs        — Command definitions (Clap)
  ├─→ handlers.rs   — Extracted command-handler helpers (formatting, suggestions)
  ├─→ license.rs    — Ed25519 offline verification
  ├─→ runner/       — Core orchestration pipeline + provenance tracking
  │     ├─→ core.rs            — Main runner logic
  │     ├─→ batch.rs           — Batch/parallel execution
  │     ├─→ retry.rs           — Auto-retry with error recovery
  │     └─→ utils.rs           — Tool detection & spinner utilities
  ├─→ docs.rs                  — Documentation resolver
  ├─→ doc_processor.rs         — Structured doc extraction (flag catalog, examples)
  ├─→ doc_summarizer.rs        — Documentation compression
  ├─→ skill.rs                 — Skill loading system + depth validation
  │     └─→ mcp.rs             — MCP skill provider (JSON-RPC / HTTP)
  ├─→ llm/                     — LLM integration
  │     ├─→ provider.rs        — Multi-provider client
  │     └─→ types.rs           — LlmProvider trait & types
  ├─→ llm_workflow.rs          — Fast/Quality workflow executor
  ├─→ generator.rs             — CommandGenerator trait (extensible strategies)
  ├─→ cache.rs                 — LLM response cache with semantic hash
  ├─→ history.rs               — Command history tracker with provenance
  ├─→ chat.rs                  — Interactive AI chat mode
  ├─→ sanitize.rs              — Data anonymization for LLM contexts
  ├─→ server.rs                — Remote server management (SSH / HPC)
  ├─→ workflow.rs              — Templates & registry
  │     └─→ engine.rs          — DAG execution engine
  ├─→ workflow_graph.rs        — DAG visualization
  ├─→ task_normalizer.rs       — Task normalization
  ├─→ task_complexity.rs       — Complexity estimation
  ├─→ context.rs               — Context assembly
  ├─→ config.rs                — Configuration management
  ├─→ index.rs                 — Documentation index
  ├─→ job.rs                   — Job management
  ├─→ format.rs                — Output formatting
  ├─→ mini_skill_cache.rs      — Lightweight skill caching
  ├─→ copilot_auth.rs          — GitHub Copilot authentication
  └─→ error.rs                 — Error type definitions
lib.rs              — Programmatic API surface (re-exports all modules)

Execution Flow

Command Generation (run/dry-run)

1. License verification (Ed25519 signature check)
2. Parallel fetch:
   a. Skill loading (user → community → MCP → built-in)
   b. Documentation fetch (cache → --help → local files → remote URLs)
      [skipped if high-quality skill is already available]
3. Structured doc extraction (flag catalog + command examples, deterministic)
4. Supervisor decision (orchestrator/supervisor.rs):
   - has_skill=true  → Fast mode (single-call), regardless of task complexity
   - has_skill=false AND complexity ≥ 0.5 → Quality mode (multi-stage)
   - has_skill=false AND complexity < 0.5  → Fast mode
5. Prompt enrichment (context, user preferences, best practices, executor hints)
6a. Fast mode (single LLM call):
   - Doc-enriched prompt: flag catalog + doc-extracted examples + skill knowledge
   - One LLM call → ARGS: / EXPLANATION: response
6b. Quality mode (parallel LLM calls via tokio::join!):
   - Stage 1 (concurrent): task standardization (if vague/short/non-ASCII)
   - Stage 2 (concurrent): mini-skill generation from doc (tool-keyed cache)
   - Stage 3: command generation with mini-skill + structured doc
7. Response parsing (extract ARGS: and EXPLANATION: lines, retry on invalid)
8. Flag validation against doc catalog (post-processing)
9. Command execution (run) or display (dry-run)
10. History recording (JSONL with UUID, exit code, timestamp)
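
The supervisor decision in step 4 reduces to a small pure function. The sketch below encodes the three rules as listed; the `Mode` enum, the `choose_mode` function, and the constant name are hypothetical, and only the 0.5 threshold and the rules themselves come from the steps above.

```rust
/// Generation mode chosen by the supervisor (names are assumptions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    /// Single LLM call.
    Fast,
    /// Multi-stage pipeline.
    Quality,
}

/// Complexity threshold from step 4 of the flow.
pub const COMPLEXITY_THRESHOLD: f64 = 0.5;

/// Picks the mode from skill availability and the estimated complexity.
/// A skill always wins: Quality mode is used only when no skill is
/// available and the task is at or above the threshold.
pub fn choose_mode(has_skill: bool, complexity: f64) -> Mode {
    if !has_skill && complexity >= COMPLEXITY_THRESHOLD {
        Mode::Quality
    } else {
        Mode::Fast
    }
}
```

For example, `choose_mode(true, 0.9)` yields `Mode::Fast` even though the task is complex, while `choose_mode(false, 0.5)` yields `Mode::Quality`.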

Workflow Execution

1. Parse .oxo.toml workflow definition
2. Expand wildcards ({sample}, {params.*})
3. Build dependency DAG
4. Topological sort for execution order
5. Execute with tokio parallelism (JoinSet)
6. Skip steps with fresh outputs (output-freshness caching)

Design Principles

License-first: Core commands require valid Ed25519 signature
Docs-first grounding: Documentation fetched before the LLM call to reduce hallucinated flags
Offline-first: Cached docs, no license server, optional remote fetching
Skill-augmented prompting: Domain knowledge injected without code changes
Native performance: Direct native compilation for all major platforms (Linux, macOS, Windows)
Strict LLM contract: ARGS:/EXPLANATION: format with retry on invalid response
Adaptive prompt compression: Three prompt tiers (Full/Medium/Compact) auto-selected by model size and context window, ensuring reliable output from 0.5B to 200B+ parameter models
Extensible generation: CommandGenerator trait enables multiple generation strategies (LLM, rule-based, composite) with chain-of-responsibility pattern
Response caching: Optional LLM cache reduces API costs for repeated tasks via semantic hash (tool + task + docs + skill + model)
Smart model classification: Cloud API models (GPT, Claude, Gemini) are always classified as "large" regardless of marketing name (e.g., "gpt-5-mini"); local models use parameter-size tags for classification
Skill-aware orchestration: When a skill is available, the orchestrator always uses Fast (single-call) mode — the skill already provides the grounding that Quality mode would generate
Parallel LLM pipeline: In Quality mode, independent stages (task standardization + mini-skill generation) run concurrently via tokio::join!
Tool-level mini-skill cache: Mini-skills are cached by (tool, doc_hash), not by task — a single cache entry serves all tasks for the same tool
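
The strict ARGS:/EXPLANATION: contract with retry can be sketched as a small parser plus a retry loop. The `ParsedResponse` struct and the functions `parse_response` and `generate_with_retry` are hypothetical, as is the choice to accept the two lines in any order and to pass the attempt number to the caller-supplied closure. The closure stands in for a real LLM call.

```rust
/// The two fields required by the response contract.
#[derive(Debug, PartialEq)]
pub struct ParsedResponse {
    pub args: String,
    pub explanation: String,
}

/// Extracts the `ARGS:` and `EXPLANATION:` lines from a raw response.
/// Returns `None` if either line is missing; all other lines are ignored.
pub fn parse_response(raw: &str) -> Option<ParsedResponse> {
    let mut args = None;
    let mut explanation = None;
    for line in raw.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("ARGS:") {
            args = Some(rest.trim().to_string());
        } else if let Some(rest) = line.strip_prefix("EXPLANATION:") {
            explanation = Some(rest.trim().to_string());
        }
    }
    Some(ParsedResponse {
        args: args?,
        explanation: explanation?,
    })
}

/// Calls `ask` (a stand-in for the LLM request) up to `max_attempts` times
/// and returns the first response that satisfies the contract.
pub fn generate_with_retry<F>(mut ask: F, max_attempts: usize) -> Option<ParsedResponse>
where
    F: FnMut(usize) -> String,
{
    (0..max_attempts).find_map(|attempt| parse_response(&ask(attempt)))
}
```

A free-form reply such as "Sure, here is the command" fails to parse and triggers another attempt, while a reply containing both prefixed lines is accepted on the spot.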

Why This Matters In Practice

Usability: users can stay in natural language longer and only inspect flags when it matters
Reliability: docs-first grounding and a strict response contract reduce free-form model drift
Scientific reproducibility: provenance-rich history preserves the command, model, and context that produced each result
Engineering extensibility: skills, MCP providers, and workflow export let teams expand capability without rewriting the core pipeline