How-to: Switch LLM Provider#
This guide shows you how to change the LLM backend that oxo-call uses for command generation. All providers use the same prompt and produce compatible output.
Supported Providers#
| Provider | Models | Requires token | Best for |
|---|---|---|---|
github-copilot |
auto-selected | GitHub PAT with Copilot | GitHub users, no separate account |
openai |
gpt-4.1 (1M), gpt-4o, gpt-4o-mini, ... | OpenAI API key | Best accuracy, production use |
anthropic |
claude-sonnet-4-6 (1M), claude-3-5-sonnet, ... | Anthropic API key | Alternative frontier model |
deepseek |
deepseek-v3, deepseek-r1 (128K) | DeepSeek API key | Cost-effective, strong reasoning |
minimax |
minimax-m2.7 (1M) | MiniMax API key | Large context, Chinese-optimized |
ollama |
llama3.2, mistral, ... | None (local) | Air-gapped, private data, free |
openai (custom base) |
moonshot-v1-, kimi- | Moonshot API key | Kimi / Moonshot AI |
openai (custom base) |
glm-4, glm-5.1, ... | ZhipuAI API key | GLM / ZhipuAI |
GitHub Copilot (Default)#
GitHub Copilot is the default provider. The recommended way to authenticate is using the interactive OAuth login:
This will:
1. Open a browser window for GitHub authentication
2. Complete OAuth device flow automatically
3. Store the token securely in your config
4. Prompts you to choose a model (default: gpt-5-mini, lightweight free tier ⭐)
5. Saves all GA models to llm.models for quick switching
Supported GitHub Copilot Models#
The following GA models are available as of the latest GitHub Copilot docs.
Use oxo-call config model use <id> to switch between them at any time.
| Model ID | Display Name | Provider |
|---|---|---|
gpt-5-mini ⭐ |
GPT-5 Mini | OpenAI |
gpt-4.1 |
GPT-4.1 | OpenAI |
gpt-5.2 |
GPT-5.2 | OpenAI |
gpt-5.2-codex |
GPT-5.2-Codex | OpenAI |
gpt-5.3-codex |
GPT-5.3-Codex | OpenAI |
gpt-5.4 |
GPT-5.4 | OpenAI |
gpt-5.4-mini |
GPT-5.4 Mini | OpenAI |
claude-haiku-4.5 |
Claude Haiku 4.5 | Anthropic |
claude-sonnet-4 |
Claude Sonnet 4 | Anthropic |
claude-sonnet-4.5 |
Claude Sonnet 4.5 | Anthropic |
claude-sonnet-4.6 |
Claude Sonnet 4.6 | Anthropic |
claude-opus-4.5 |
Claude Opus 4.5 | Anthropic |
claude-opus-4.6 |
Claude Opus 4.6 | Anthropic |
gemini-2.5-pro |
Gemini 2.5 Pro |
⭐ = available on all Copilot plans including free tier. For the full authoritative list, see the GitHub Copilot supported models documentation.
Manual Token Setup#
Alternatively, you can set a GitHub token manually:
# Set via config
oxo-call config set llm.provider github-copilot
oxo-call config set llm.api_token ghu_xxxxxxxxxxxxxxxxxxxx
# Verify
oxo-call config verify
Important: For GitHub Copilot, you must use a GitHub App token (starts with ghu_), not a Personal Access Token (starts with ghp_). The oxo-call config login command handles this automatically.
Environment Variables (Not Recommended for Copilot)#
For other providers, environment variables work well, but for GitHub Copilot, environment variables like GITHUB_TOKEN often contain Personal Access Tokens that don't work with Copilot's token exchange endpoint. Therefore, GitHub Copilot ignores these environment variables and only uses the stored config token from oxo-call config login:
# These are IGNORED for github-copilot provider:
# export GITHUB_TOKEN=ghp_xxxx # Won't work
# export OXO_CALL_LLM_API_TOKEN=ghp_xxxx # Won't work
# Instead, use:
oxo-call config login
Get a Token Manually#
If you need to obtain a token manually:
- Use
oxo-call config login(recommended) - Or use the GitHub CLI:
gh auth token(returns aghu_token if you have Copilot access)
OpenAI#
oxo-call config set llm.provider openai
oxo-call config set llm.api_token sk-xxxxxxxxxxxxxxxxxxxxxxxx
# Optional: specify a model
oxo-call config set llm.model gpt-4.1 # 1M context (default, April 2025)
oxo-call config set llm.model gpt-4o-mini # faster, cheaper
oxo-call config set llm.model gpt-4o # 128K context
# Verify
oxo-call config verify
Environment variable fallbacks:
export OXO_CALL_LLM_PROVIDER=openai
export OXO_CALL_LLM_API_TOKEN=sk-xxxx
# or the standard OpenAI variable:
export OPENAI_API_KEY=sk-xxxx
Azure OpenAI#
oxo-call config set llm.provider openai
oxo-call config set llm.api_base https://your-resource.openai.azure.com/openai/deployments/your-deployment
oxo-call config set llm.api_token your-azure-key
oxo-call config set llm.model gpt-4.1
Anthropic#
oxo-call config set llm.provider anthropic
oxo-call config set llm.api_token sk-ant-xxxxxxxxxxxxxxxxxxxxxxxx
# Optional: specify a model
oxo-call config set llm.model claude-sonnet-4-6-20250514 # 1M context (default)
oxo-call config set llm.model claude-3-5-sonnet-20241022 # 200K context
# Verify
oxo-call config verify
Environment variable fallback:
DeepSeek#
DeepSeek provides cost-effective models with strong reasoning capabilities. DeepSeek V3 and R1 support 128K context.
oxo-call config set llm.provider deepseek
oxo-call config set llm.api_token sk-xxxxxxxxxxxxxxxxxxxxxxxx
# Optional: specify a model
oxo-call config set llm.model deepseek-chat # General purpose (default)
oxo-call config set llm.model deepseek-reasoner # Enhanced reasoning
# Verify
oxo-call config verify
| Model | Context | Best for |
|---|---|---|
deepseek-chat |
128K | General purpose, cost-effective |
deepseek-reasoner |
128K | Complex reasoning tasks |
MiniMax#
MiniMax provides models optimized for Chinese language with large context windows (up to 1M tokens).
oxo-call config set llm.provider minimax
oxo-call config set llm.api_token xxxxxxxxxxxxxxxxxxxxxxxx
# Optional: specify a model
oxo-call config set llm.model MiniMax-Text-01 # General purpose (default)
oxo-call config set llm.model abab6.5s-chat # Fast variant
# Verify
oxo-call config verify
| Model | Context | Best for |
|---|---|---|
MiniMax-Text-01 |
1M | General purpose, large context |
abab6.5s-chat |
245K | Fast, cost-effective |
Kimi / Moonshot AI#
Moonshot AI provides the Kimi model family via an OpenAI-compatible API. Kimi models feature large context windows (up to 256K with K2.5) and strong multilingual capabilities.
# Option 1: Use dedicated provider (recommended)
oxo-call config set llm.provider moonshot
oxo-call config set llm.api_token sk-xxxxxxxxxxxxxxxxxxxxxxxx
# Option 2: Use openai provider with custom base
oxo-call config set llm.provider openai
oxo-call config set llm.api_base https://api.moonshot.cn/v1
oxo-call config set llm.api_token sk-xxxxxxxxxxxxxxxxxxxxxxxx
# Choose a context-window variant
oxo-call config set llm.model moonshot-v1-8k # 8K context (faster)
oxo-call config set llm.model moonshot-v1-32k # 32K context
oxo-call config set llm.model moonshot-v1-128k # 128K context (recommended)
oxo-call config set llm.model kimi-k2.5 # 256K context (latest)
# Verify
oxo-call config verify
| Model | Context | Best for |
|---|---|---|
moonshot-v1-8k |
8K | Fast, simple tasks |
moonshot-v1-32k |
32K | Most tasks |
moonshot-v1-128k |
128K | Long documentation, complex tasks |
kimi-k2.5 |
256K | Latest, largest context |
GLM / ZhipuAI#
ZhipuAI provides the GLM series via an OpenAI-compatible API. GLM-5 supports up to 200K context, and GLM-5.1 supports 202K context.
# Option 1: Use dedicated provider (recommended)
oxo-call config set llm.provider zhipu
oxo-call config set llm.api_token xxxxxxxxxxxxxxxxxxxxxxxx
# Option 2: Use openai provider with custom base
oxo-call config set llm.provider openai
oxo-call config set llm.api_base https://open.bigmodel.cn/api/paas/v4
oxo-call config set llm.api_token xxxxxxxxxxxxxxxxxxxxxxxx
# Choose a model
oxo-call config set llm.model glm-4 # Standard, 128K context
oxo-call config set llm.model glm-4-flash # Faster/cheaper variant
oxo-call config set llm.model glm-4-long # 1M token context (long documents)
oxo-call config set llm.model glm-5 # 200K context
oxo-call config set llm.model glm-5.1 # 202K context
# Verify
oxo-call config verify
| Model | Context | Best for |
|---|---|---|
glm-4 |
128K | General use |
glm-4-flash |
128K | Cost-sensitive workflows |
glm-4-long |
1M | Very long documentation or multi-tool sessions |
glm-5 |
200K | Latest generation |
glm-5.1 |
202K | Autonomous execution, large context |
Ollama (Local, No Token)#
Ollama runs models locally — no API key or internet required. Ideal for sensitive data or air-gapped environments.
Install and start Ollama#
# Install Ollama (Linux/macOS)
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama3.2
# Start the server (usually auto-started)
ollama serve
Configure oxo-call#
oxo-call config set llm.provider ollama
oxo-call config set llm.model llama3.2
# Custom Ollama server URL (if not localhost)
oxo-call config set llm.api_base http://my-ollama-server:11434/v1
Note: If you were previously using OpenAI or another provider, the old API token is safely ignored when using Ollama. oxo-call will remind you of this when switching. To remove it:
oxo-call config set llm.api_token ""
Recommended models for bioinformatics#
| Model | Size | Tier | Notes |
|---|---|---|---|
qwen2.5-coder:0.5b |
0.5B | Compact | Fastest, good for simple tasks (83–100% accuracy) |
deepseek-coder:1.3b |
1.3B | Compact | Good balance of speed and accuracy |
llama3.2 |
3B | Compact | Best accuracy among ≤3B models (100%) |
starcoder2:3b |
3B | Compact | Good for code tasks (91%) |
llama3.1:8b |
8B | Medium | Better accuracy for complex tasks |
mistral |
7B | Medium | Good instruction following |
codellama:13b |
13B | Full | Best for complex technical commands |
qwen2.5-coder:7b |
7B | Medium | Excellent for bioinformatics commands |
Models ≤3B automatically use the Compact prompt tier (few-shot examples,
minimal context). Models 4–7B use the Medium tier. Models ≥8B use the
Full tier. You can override this with llm.prompt_tier.
Small model configuration#
For models ≤3B (e.g., qwen2.5-coder:0.5b, llama3.2), the Compact tier
is automatically selected. If you want to force a specific tier:
# Force Compact tier (recommended for ≤3B models)
oxo-call config set llm.prompt_tier compact
# Force Medium tier (for 4–7B models with limited context)
oxo-call config set llm.prompt_tier medium
# Auto-detect based on model size and context window (default)
oxo-call config set llm.prompt_tier auto
Or per-invocation:
Verify Your Configuration#
After setting up any provider:
This tests connectivity and returns the model being used:
Compare Providers Side-by-Side#
Run the same dry-run with different providers to compare output:
# Test with current provider
oxo-call dry-run samtools "sort input.bam by coordinate using 4 threads"
# Switch temporarily via environment variable
OXO_CALL_LLM_PROVIDER=ollama OXO_CALL_LLM_MODEL=llama3.2 \
oxo-call dry-run samtools "sort input.bam by coordinate using 4 threads"
Air-Gapped / Offline Mode#
oxo-call can run completely offline with no external network calls. This requires:
- Ollama for local LLM inference — no API key or internet needed
- Pre-cached documentation — tool
--helpoutput is cached after first use - Offline license verification — Ed25519 verification is entirely local
Complete offline setup#
# 1. Install Ollama and pull a model (requires internet once)
curl -fsSL https://ollama.ai/install.sh | sh
ollama pull llama3.2
# 2. Configure oxo-call for offline use
oxo-call config set llm.provider ollama
oxo-call config set llm.model llama3.2
# 3. Pre-cache documentation for your tools (requires internet once)
oxo-call docs add samtools
oxo-call docs add bcftools
oxo-call docs add fastp
# ... add all tools you plan to use
# 4. Verify offline readiness
oxo-call config verify # Should show Ollama connection OK
oxo-call docs list # Should show cached tools
oxo-call license verify # Should pass without network
After this setup, disconnect from the network. All subsequent oxo-call run and dry-run commands will work offline using cached documentation and local Ollama inference.
What requires network access#
| Feature | Requires Network | Offline Alternative |
|---|---|---|
| LLM inference (GitHub/OpenAI/Anthropic) | ✅ | Use Ollama |
| LLM inference (Ollama) | ❌ | Already local |
docs add --url (remote fetch) |
✅ | Use docs add --file or docs add --dir |
docs add (first-time --help capture) |
❌ | Tool must be installed locally |
| License verification | ❌ | Already offline |
skill install --url |
✅ | Copy skill files manually |
Team Setup / Organizational Deployment#
Sharing configuration across a team#
You can standardize oxo-call settings across your team using environment variables or shared configuration:
# Option 1: Shared environment variables (recommended for clusters)
# Add to your team's shared .bashrc or module file:
export OXO_CALL_LLM_PROVIDER=ollama
export OXO_CALL_LLM_API_BASE=http://shared-ollama-server:11434/v1
export OXO_CALL_LICENSE=/shared/licenses/license.oxo.json
# Option 2: Shared skill directory
# Place team-specific skills in a shared location and symlink:
ln -s /shared/oxo-call/skills/ ~/.config/oxo-call/skills
Sharing custom skills#
Distribute team skills via a shared directory or Git repository:
# Team lead creates skills
oxo-call skill create internal-tool -o /shared/oxo-call/skills/internal-tool.md
# Edit the skill file with team-specific conventions
# Team members install
cp /shared/oxo-call/skills/*.md ~/.config/oxo-call/skills/
# Or symlink the entire directory
Multi-user license#
A single commercial license covers all employees and contractors within the organization. Distribute the license.oxo.json file to team members via:
- Shared filesystem path (
export OXO_CALL_LICENSE=/shared/license.oxo.json) - Configuration management (Ansible, Puppet, etc.)
- Container image with pre-installed license
Troubleshooting#
"Connection refused" for Ollama
Make sure the Ollama server is running: ollama serve
"Invalid API key" errors
Check that the token is set correctly:
LLM output is incorrect or hallucinated
Try a more capable model, or enrich the documentation:
Rate limiting
For high-volume use, consider:
- Using a local Ollama model (no rate limits)
- Increasing retry settings
- Caching
--helpoutput to reduce API calls (done automatically after first run)