Executive Summary
Professional-grade AI coding for 15–20 hours per week now costs roughly €25/month, stays online even when one provider hiccups, and does not need a €150/month inference budget. The June 2026 update to our OpenCode + Oh My OpenAgent stack is built on three realisations:
- The new Level 3 models are good enough to carry almost everything. Kimi K2.7-code (Ollama Cloud) and GLM-5.2 (Ollama Cloud) are now the primary workhorses. DeepSeek V4 Pro is still available but demoted to a fallback, because Level 4 burns roughly 3× the GPU quota of Level 3 — in practice often more.
- Provider redundancy matters more than squeezing the last cent. The first fallback for every agent is the same model family on the other provider: if the primary is Ollama Cloud, the first fallback is OpenCode Zen, and vice versa. A single provider outage no longer stops the workflow.
- DeepSeek V4 Flash is free on OpenCode Zen and runs in parallel. Bulk agents (librarian, explore, quick tasks, writing) now use opencode/deepseek-v4-flash as primary, with ollama-cloud/deepseek-v4-flash as the cross-provider redundancy fallback.
The result: a stack that is cheaper, more resilient, and still routes rare high-stakes decisions to frontier pay-per-use models (GPT-5.5, GPT-5.4, Gemini) only when they are genuinely justified.
What Is Oh My OpenAgent?
OpenCode is an open-source AI coding framework that supports 75+ providers, MCP servers, and plugins. By default it is a single-model chat: you pick a model and talk to it.
Oh My OpenAgent is the plugin that turns OpenCode into a multi-agent development system. It adds specialised agents, each with independent model routing and ordered fallback chains. You do not have to pick one model for everything; the plugin routes each task to the agent (and model) that fits it best.
The Agent Catalog
| Agent | What it does | When to use it | Model used here |
|---|---|---|---|
| Sisyphus | Orchestrator. Receives every message, plans, and delegates to the right specialist. | Default for almost everything. Use for complex multi-step tasks that need coordination. | ollama-cloud/kimi-k2.7-code |
| Prometheus | Strategic planner. Interviews you like an engineer, then writes a detailed verified plan before any code is touched. | When the idea is vague or the change is critical. Trigger with /plan or Tab. | ollama-cloud/deepseek-v4-pro |
| Atlas | Todo orchestrator. Executes approved plans by distributing tasks and verifying completion. | After Prometheus produces a plan; drives the checklist across sessions. | ollama-cloud/glm-5.2 |
| Metis | Pre-planning analyst. Catches ambiguities, hidden intentions, and missing constraints before a plan is finalised. | Automatically invoked by Prometheus for gap analysis. | ollama-cloud/kimi-k2.7-code |
| Momus | Ruthless plan reviewer. Validates plans for clarity, verifiability, and completeness. | After plan creation, before execution begins. Acts as an OK/reject gate. | opencode/gpt-5.4 |
| Oracle | Read-only architecture consultant. High-IQ reasoning for unfamiliar patterns and tradeoffs. | Call @oracle for security concerns, unfamiliar code, or big architectural decisions. | opencode/gpt-5.4 |
| Librarian | Documentation and OSS search. Finds official docs and real-world implementation examples. | When you need to know "how does library X do Y?". | opencode/deepseek-v4-flash |
| Explore | Fast codebase grep. Pattern discovery and "where is X?" searches. | When you need to find files, functions, or conventions across the repo. | opencode/deepseek-v4-flash |
| Multimodal-Looker | Vision analyst. Reads screenshots, PDFs, diagrams. | For UI screenshots, architecture diagrams, or any visual input. | opencode/gemini-3.1-pro |
| Sisyphus-Junior | Focused implementation worker. Writes one assigned unit and cannot re-delegate. | Spawned automatically when Sisyphus delegates an implementation task. | ollama-cloud/kimi-k2.7-code |
| Hephaestus | GPT-native deep worker. Give it a goal, not a recipe. | Rare deep cross-domain reasoning that justifies GPT access. | opencode/gpt-5.5 |
The read-only agents (Oracle, Librarian, Explore, Multimodal-Looker) and the reviewer Momus cannot write or edit. Sisyphus-Junior cannot re-delegate. These restrictions keep each agent in its lane.
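The restrictions above can be pictured as a small permission table. The Python sketch below is illustrative only: `AgentPolicy`, `POLICIES`, and `can` are hypothetical names, not part of Oh My OpenAgent, and only the restrictions stated in the text are modelled. Agents the post does not restrict are assumed to be unrestricted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical permission record; not the plugin's real data model."""
    can_write: bool = True
    can_delegate: bool = True


# Only the restrictions named in the post are encoded here.
POLICIES = {
    "oracle": AgentPolicy(can_write=False),
    "librarian": AgentPolicy(can_write=False),
    "explore": AgentPolicy(can_write=False),
    "multimodal-looker": AgentPolicy(can_write=False),
    "momus": AgentPolicy(can_write=False),
    "sisyphus-junior": AgentPolicy(can_delegate=False),
}


def can(agent: str, action: str) -> bool:
    """Return True if `agent` may perform `action` ('write' or 'delegate')."""
    policy = POLICIES.get(agent, AgentPolicy())  # unlisted agents: assumed unrestricted
    if action == "write":
        return policy.can_write
    if action == "delegate":
        return policy.can_delegate
    raise ValueError(f"unknown action: {action}")
```

For example, `can("oracle", "write")` is false, while `can("sisyphus-junior", "write")` is true but `can("sisyphus-junior", "delegate")` is false.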
How To Invoke Agents Directly
Oh My OpenAgent registers every agent into OpenCode’s @-mention picker. You can override the default routing explicitly.
@-mentions
Type @ followed by the agent key. Useful ones:
| Shortcut | Routes to | Typical use |
|---|---|---|
| @oracle | Oracle | Architecture review, security, hard debugging |
| @librarian | Librarian | Docs, examples, library internals |
| @explore | Explore | Find code patterns, references, conventions |
| @plan "…" | Prometheus | Create a structured plan before coding |
All registered agents also appear in the picker (@sisyphus, @hephaestus, @prometheus, @atlas, @momus, @metis, @sisyphus-junior, @multimodal-looker). The most reliable direct shortcuts are @oracle, @librarian, @explore, and @plan.
Keyword Modes
Certain words in your prompt inject a specialised mode without changing agent:
| Keyword | Effect |
|---|---|
| ultrawork / ulw | Full parallel-agents execution mode |
| search / find | Web and documentation search focus |
| analyze / investigate / audit / deep-dive | Deep context-gathering analysis |
| hyperplan / hpp | Adversarial plan review |
| hpp ulw / ulw hpp | Hyperplan + ultrawork combined |
Slash Commands
| Command | Effect |
|---|---|
| /start-work | Activates Atlas on the latest Prometheus plan |
| /plan | Switches to Prometheus for structured planning |
| /refactor | LSP + AST-grep assisted refactoring |
| /review-work | Spawns parallel QA agents |
| /handoff | Generates a context summary for a new session |
| /ulw-loop | Self-referential loop in ultrawork mode |
The Cost Reality: Why Level 3 Became the Default
Ollama Cloud Pro is a flat-rate subscription, but its quota is not infinite. Models are priced by GPU-time consumption level:
| Level | Active Parameters | Typical GPU Quota Cost |
|---|---|---|
| Level 2 (Medium) | ~13B | ~1 unit |
| Level 3 (High) | ~32B | ~3 units |
| Level 4 (Extra High) | ~49B+ | ~9–10 units |
DeepSeek V4 Pro is officially Level 4. Real-world usage showed it burning quota at 3× or more compared to Level 3 models. For agents that fire on every message, that multiplier is unsustainable. The new Level 3 models change the economics:
- **Kimi K2.7-code** (Level 3): the coding-focused successor to K2.6. Stronger on long-horizon coding tasks and ~30% lower thinking-token usage.
- **GLM-5.2** (Level 3): Z.ai’s new flagship, with a 1M-token context window at Level 3 pricing, a massive win for large-context tasks.
- **MiniMax-M3** (Level 3): available but not used in this configuration; we prefer Kimi and GLM for our task mix.
The new rule is simple: default to Level 3, escalate to Level 4 only when the task profile clearly rewards it.
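To make the multiplier concrete, here is a rough Python sketch of the quota arithmetic. The unit costs come from the level table above, using the lower bound of ~9 for Level 4. The weekly call count is a hypothetical workload chosen for illustration, not a measured figure.

```python
# Relative GPU quota units per call, taken from the level table above.
# Level 4 is listed as ~9-10 units; the lower bound is used here.
QUOTA_UNITS = {"level2": 1, "level3": 3, "level4": 9}


def quota_burn(level: str, calls: int) -> int:
    """Total quota units consumed by `calls` requests at a given level."""
    return QUOTA_UNITS[level] * calls


# Hypothetical workload: 200 orchestrator messages in a week.
calls = 200
level3 = quota_burn("level3", calls)
level4 = quota_burn("level4", calls)
print(f"Level 3: {level3} units, Level 4: {level4} units, "
      f"ratio {level4 / level3:.1f}x")
# prints: Level 3: 600 units, Level 4: 1800 units, ratio 3.0x
```

For an agent that fires on every message, the same workload costs three times the quota at Level 4, which is why the orchestrator is the first candidate for a Level 3 model.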
—
New Design Principle: Cross-Provider Redundancy First
Most model-routing discussions focus on capability. This configuration adds a second lens: availability. Every agent’s fallback chain is designed so the *first* fallback is the same model family on the *other* provider where that family is offered there, and otherwise the closest available substitute.
Examples:
| Primary | Provider | First Fallback | Provider |
|---|---|---|---|
| ollama-cloud/kimi-k2.7-code | Ollama Cloud | opencode/kimi-k2.6 | OpenCode Zen |
| opencode/deepseek-v4-flash | OpenCode Zen | ollama-cloud/deepseek-v4-flash | Ollama Cloud |
| ollama-cloud/deepseek-v4-pro | Ollama Cloud | opencode/gpt-5.4 | OpenCode Zen |
| opencode/gpt-5.4 | OpenCode Zen | ollama-cloud/deepseek-v4-pro | Ollama Cloud |
| opencode/gemini-3.1-pro | OpenCode Zen | ollama-cloud/kimi-k2.7-code | Ollama Cloud |
| ollama-cloud/glm-5.2 (Atlas) | Ollama Cloud | ollama-cloud/deepseek-v4-pro | Ollama Cloud |
The Atlas primary is an exception: opencode/glm-5.2 does not exist on OpenCode Zen yet, so its first fallback is the best available reasoning model on the same provider (ollama-cloud/deepseek-v4-pro). Its second fallback, ollama-cloud/kimi-k2.7-code, is also on Ollama Cloud, so the chain only crosses to OpenCode Zen at its last entry (GPT-5.4 medium).
If Ollama Cloud has a transient failure, OpenCode Zen takes over immediately. If Zen has a rate-limit hiccup, Ollama Cloud covers it. The cost of the occasional cross-provider call is tiny compared to the cost of a workflow that simply stops.
This also removes the old "free tier at the end of the chain" trap. Free-tier models on Zen (qwen3.6-plus-free, big-pickle, nemotron) were disabled at the inference-provider level in our setup and caused "Provider not in allowed providers" errors when the plugin’s hardcoded defaults tried to use them. The new configuration explicitly lists only allowed models and never falls through to hidden plugin defaults.
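The failover behaviour described in this section can be sketched in a few lines of Python. This is a minimal illustration of an ordered, explicit fallback chain, not the plugin's actual implementation: `ProviderError`, `run_with_fallback`, and `flaky_client` are hypothetical names, and the chain shows only the first two entries of the Sisyphus chain from the post.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Raised by a (hypothetical) client when a provider call fails."""


def run_with_fallback(
    chain: Sequence[str],
    call_model: Callable[[str], str],
) -> tuple[str, str]:
    """Try each fully-qualified model ID in order; return (model_id, result).

    The chain is explicit: if every entry fails, an error is raised
    instead of falling through to some hidden default.
    """
    errors = []
    for model_id in chain:
        try:
            return model_id, call_model(model_id)
        except ProviderError as exc:
            errors.append(f"{model_id}: {exc}")
    raise ProviderError("all models in chain failed: " + "; ".join(errors))


# First two entries of the Sisyphus chain from the post.
SISYPHUS_CHAIN = ["ollama-cloud/kimi-k2.7-code", "opencode/kimi-k2.6"]


def flaky_client(model_id: str) -> str:
    """Stand-in client that simulates an Ollama Cloud outage."""
    if model_id.startswith("ollama-cloud/"):
        raise ProviderError("simulated outage")
    return f"response from {model_id}"


print(run_with_fallback(SISYPHUS_CHAIN, flaky_client))
# prints: ('opencode/kimi-k2.6', 'response from opencode/kimi-k2.6')
```

Raising when the chain is exhausted, rather than trying an implicit default, mirrors the design goal stated above: only explicitly listed models are ever called.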
—
Updated Agent Assignments (June 2026)
| Agent / Category | Primary Model | Fallback Strategy | Rationale |
|---|---|---|---|
| Sisyphus | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Every message. Level 3 keeps quota sane; redundancy keeps it online. |
| Oracle | opencode/gpt-5.4 high | V4 Pro, Kimi, Gemini 3.1 Pro high | Rare, high-stakes architecture calls. |
| Momus | opencode/gpt-5.4 high | V4 Pro, Kimi, Gemini 3.1 Pro high | QA review. Cost of a missed bug > cost of tokens. |
| Prometheus | ollama-cloud/deepseek-v4-pro | GPT-5.4 high, Kimi | Planning benefits from V4 Pro reasoning; Zen GPT is cross-provider fallback. |
| Plan | ollama-cloud/deepseek-v4-pro | GPT-5.4 high, Kimi | Same family as Prometheus, with plan-override patch. |
| Metis | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Pre-planning analysis. |
| Sisyphus-Junior | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Bulk implementation now on Level 3. |
| Librarian | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen primary; Ollama Cloud redundancy. |
| Explore | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen primary; Ollama Cloud redundancy. |
| Atlas | ollama-cloud/glm-5.2 | V4 Pro, Kimi, GPT-5.4 medium | GLM-5.2’s huge context at Level 3 pricing. |
| Multimodal-Looker | opencode/gemini-3.1-pro | Kimi, GPT-5.4 medium | Vision tasks; Kimi is the cross-provider vision fallback. |
| Hephaestus | opencode/gpt-5.5 medium | GPT-5.4 medium, V4 Pro, Kimi | GPT-native deep worker; non-GPT fallbacks allowed with warning. |
| quick | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen + redundancy. |
| unspecified-low | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen + redundancy. |
| unspecified-high | ollama-cloud/kimi-k2.7-code high | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Complex multi-file tasks. |
| deep | opencode/gpt-5.4 medium | V4 Pro, Kimi, Gemini 3.1 Pro high | Rare premium technical work. |
| visual-engineering | opencode/gemini-3.5-flash | Kimi, V4 Pro, Gemini 3.1 Pro high | UI/UX styling; Kimi covers vision redundancy. |
| ultrabrain | ollama-cloud/deepseek-v4-pro max | GPT-5.4 medium, Kimi, Gemini 3.1 Pro high | Maximum reasoning — V4 Pro leads GPQA Diamond, HMMT, HLE, and agentic Elo. |
| writing | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Documentation, free Zen + redundancy. |
| infrastructure | ollama-cloud/deepseek-v4-pro | GPT-5.4 medium, Kimi | IaC still benefits from V4 Pro context/reasoning. |
What disappeared from the active mix:
- ollama-cloud/glm-5.1 and opencode/glm-5.1: removed entirely because they underperformed.
- opencode/nemotron-3-super-free, opencode/qwen3.6-plus-free, opencode/big-pickle: disabled free-tier fallbacks that caused provider errors.
- Claude/Anthropic models: exist on Zen but intentionally disabled for cost control.
—
Why This Is Now a High-Availability Stack, Not Just a Cheap Stack
The previous version of this post treated fallback chains mainly as a cost-optimization tool: flat-rate first, pay-per-use if exhausted. The June 2026 revision treats them as an availability tool first.
Three concrete improvements:
- **No single-provider failure stops work.** If Ollama Cloud’s US-east cluster has a bad hour, the orchestrator immediately falls back to OpenCode Zen’s kimi-k2.6. If Zen’s free DeepSeek Flash hits a rate limit, the Ollama Cloud DeepSeek Flash takes over.
- **No hidden plugin defaults leak in.** Oh My OpenAgent has hardcoded fallback chains that reference providers disabled in our inference settings (tensorix, infercom, cortecs, minimax). The new config overrides every agent and category with explicit, fully-qualified model IDs, so the plugin never falls through to its defaults.
- **Hephaestus is explicitly configured.** Previously it relied on plugin defaults and could silently fall back to disallowed providers. Now it uses opencode/gpt-5.5 as primary, opencode/gpt-5.4 as same-provider fallback, and ollama-cloud/deepseek-v4-pro and kimi-k2.7-code as cross-provider fallbacks with allow_non_gpt_model: true.
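One way to keep disallowed providers out is to lint every chain before use. The Python sketch below assumes that model IDs take the form `provider/model` and that the allowed providers are exactly the two used in this post. `ALLOWED_PROVIDERS`, `provider_of`, and `check_chain` are hypothetical helpers, and `tensorix/example-model` is a made-up ID used only to trigger a failure. The Hephaestus chain omits its Kimi fallback because the post does not spell out that entry's provider prefix.

```python
ALLOWED_PROVIDERS = {"ollama-cloud", "opencode"}


def provider_of(model_id: str) -> str:
    """Return the provider prefix of a fully-qualified 'provider/model' ID."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"not fully qualified: {model_id!r}")
    return provider


def check_chain(agent: str, chain: list[str]) -> list[str]:
    """Return human-readable problems found in one agent's chain."""
    problems = []
    for model_id in chain:
        try:
            provider = provider_of(model_id)
        except ValueError as exc:
            problems.append(f"{agent}: {exc}")
            continue
        if provider not in ALLOWED_PROVIDERS:
            problems.append(f"{agent}: provider {provider!r} not allowed")
    return problems


# Hephaestus chain as far as the post spells it out with explicit prefixes.
HEPHAESTUS_CHAIN = [
    "opencode/gpt-5.5",
    "opencode/gpt-5.4",
    "ollama-cloud/deepseek-v4-pro",
]

print(check_chain("hephaestus", HEPHAESTUS_CHAIN))  # prints: []
print(check_chain("demo", ["tensorix/example-model", "kimi-k2.6"]))
```

The second call reports two problems: one for the disallowed provider and one for the ID without a provider prefix.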
—
Updated Monthly Cost Breakdown
| Component | Cost | Notes |
|---|---|---|
| Ollama Cloud Pro (annual plan) | €15/month | Flat-rate, generous GPU quotas |
| OpenCode Zen pay-per-use | ~€7–12/month | GPT-5.5, GPT-5.4 high/medium, Gemini 3.1 Pro, Gemini 3.5 Flash; rare calls only |
| TOTAL | ~€22–27/month | 15–20h/week heavy coding |
The cost sits comfortably inside the €20–30/month window and is far below the €150/month that many commercial AI coding tools charge. It dropped compared to the May 2026 configuration because:
- The orchestrator and implementation worker moved from Level 4 to Level 3.
- Bulk agents moved to free Zen DeepSeek Flash.
- Level 4 is now reserved for infrastructure, planning, ultrabrain, and occasional escalations.
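The totals in the table follow from simple addition, sketched here in Python. The monthly figures come from the cost table. The per-hour calculation is an added illustration that assumes 52 weeks spread evenly over 12 months, and `monthly_total` and `cost_per_hour` are hypothetical helper names.

```python
# Monthly figures from the cost table above, in EUR.
OLLAMA_CLOUD_PRO = 15        # flat-rate subscription
ZEN_PAY_PER_USE = (7, 12)    # (low, high) pay-per-use estimate


def monthly_total(flat: float, variable: tuple[float, float]) -> tuple[float, float]:
    """Add the flat fee to both ends of the variable range."""
    low, high = variable
    return flat + low, flat + high


def cost_per_hour(monthly: float, hours_per_week: float) -> float:
    """Monthly cost divided by monthly hours (assumes 52 weeks / 12 months)."""
    return monthly / (hours_per_week * 52 / 12)


low, high = monthly_total(OLLAMA_CLOUD_PRO, ZEN_PAY_PER_USE)
print(f"Total: EUR {low}-{high} per month")
print(f"Best case:  EUR {cost_per_hour(low, 20):.2f} per coding hour")
print(f"Worst case: EUR {cost_per_hour(high, 15):.2f} per coding hour")
```

The first line reproduces the €22–27 range from the table. The per-hour figures depend on the weeks-per-month assumption and should be read as rough estimates.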
—
Privacy and Data Residency: Still the Same Strong Story
The new models do not change the privacy picture. Kimi K2.7-code, GLM-5.2, and DeepSeek V4 are open-weight models pulled from Hugging Face and hosted on Ollama Cloud’s US/EU NVIDIA infrastructure. Data does not flow to Chinese API endpoints. Ollama’s terms remain zero-logging, zero-retention.
When the primary model is opencode/..., inference runs through OpenCode Zen’s infrastructure, which is a separate provider. The redundancy design therefore also means sensitive data is not tied to a single provider’s pipeline.
—
The Full Configuration
Every model ID in the complete oh-my-openagent.jsonc is verified against:
- https://opencode.ai/zen/v1/models for OpenCode Zen models
- https://ollama.com/search?c=cloud for Ollama Cloud models
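A check like this can be automated once a provider's model list has been downloaded. The Python sketch below works on an already-parsed payload and makes two assumptions the post does not confirm: that the list uses an OpenAI-style shape (`{"data": [{"id": ...}]}`), and that listed IDs omit the provider prefix. `listed_ids` and `missing_models` are hypothetical helpers, and the sample payload is hand-written, not real endpoint output.

```python
def listed_ids(payload: dict) -> set[str]:
    """Extract model IDs from an ASSUMED OpenAI-style list payload.

    Expected shape: {"data": [{"id": "model-name"}, ...]}.
    The real response shape may differ.
    """
    return {entry["id"] for entry in payload.get("data", [])}


def missing_models(config_ids: list[str], provider: str, payload: dict) -> list[str]:
    """Return configured IDs for `provider` that the payload does not list."""
    available = listed_ids(payload)
    prefix = provider + "/"
    return [
        model_id
        for model_id in config_ids
        if model_id.startswith(prefix) and model_id[len(prefix):] not in available
    ]


# Hand-written sample payload for illustration; not real endpoint output.
sample = {"data": [{"id": "gpt-5.4"}, {"id": "deepseek-v4-flash"}]}
config = ["opencode/gpt-5.4", "opencode/deepseek-v4-flash", "opencode/glm-5.2"]
print(missing_models(config, "opencode", sample))
# prints: ['opencode/glm-5.2']
```

In the sample, opencode/glm-5.2 is flagged because it is absent from the list, which matches the post's note that GLM-5.2 is not yet available on OpenCode Zen.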
Conclusion
AI-assisted development does not need a €150/month inference budget, and it does not need to be fragile. The June 2026 update shows that a Level-3-first, cross-provider-redundant architecture delivers professional coding assistance for roughly €25/month while staying online through provider hiccups.
The winning combination:
- **Kimi K2.7-code** for orchestration and complex coding.
- **GLM-5.2** for large-context coordination (Atlas).
- **OpenCode Zen DeepSeek V4 Flash** for free bulk search and documentation.
- **DeepSeek V4 Pro** demoted to fallback and narrow use cases.
- **GPT-5.5 / GPT-5.4 / Gemini** reserved for rare, high-stakes decisions.
This is not a workaround. It is a cheaper, more resilient, and benchmark-grounded architecture.
—
Resources
- **Ollama Cloud Pro pricing**: https://ollama.com/pricing
- **OpenCode Zen pricing**: https://opencode.ai/docs/zen
- **OpenCode Zen model list**: https://opencode.ai/zen/v1/models
- **Ollama Cloud models**: https://ollama.com/search?c=cloud
—
Author Note: This configuration is refined through daily use. The June 2026 revision was driven by three observations from production: (1) Level 4 quota burn is higher than advertised, (2) the new Level 3 models closed the quality gap for the orchestrator and implementation worker, and (3) provider-level failures were the most annoying outages — cross-provider redundancy solved them.

