Pivert's Blog

OpenCode + Oh My OpenAgent: High-Availability AI Coding for €25/Month (June 2026 Update)


Executive Summary

Professional-grade AI coding for 15–20 hours per week now costs roughly €25/month, stays online even when one provider hiccups, and does not need a €150/month inference budget. The June 2026 update to our OpenCode + Oh My OpenAgent stack is built on three realisations:

  • The new Level 3 models are good enough to carry almost everything. Kimi K2.7-code (Ollama Cloud) and GLM-5.2 (Ollama Cloud) are now the primary workhorses. DeepSeek V4 Pro is still available but demoted to a fallback, because Level 4 burns roughly 3× the GPU quota of Level 3 — in practice often more.
  • Provider redundancy matters more than squeezing the last cent. The first fallback for every agent is the same model family on the other provider: if the primary is Ollama Cloud, the first fallback is OpenCode Zen, and vice versa. A single provider outage no longer stops the workflow.
  • DeepSeek V4 Flash is free on OpenCode Zen and runs in parallel. Bulk agents (librarian, explore, quick tasks, writing) now use opencode/deepseek-v4-flash as primary, with ollama-cloud/deepseek-v4-flash as the cross-provider redundancy fallback.

The result: a stack that is cheaper, more resilient, and still routes rare high-stakes decisions to frontier pay-per-use models (GPT-5.5, GPT-5.4, Gemini) only when they are genuinely justified.

What Is Oh My OpenAgent?

OpenCode is an open-source AI coding framework that supports 75+ providers, MCP servers, and plugins. By default it is a single-model chat: you pick a model and talk to it.

Oh My OpenAgent is the plugin that turns OpenCode into a multi-agent development system. It adds specialised agents, each with independent model routing and ordered fallback chains. You do not have to pick one model for everything; the plugin routes each task to the agent (and model) that fits it best.

The Agent Catalog

| Agent | What it does | When to use it | Model used here |
|---|---|---|---|
| Sisyphus | Orchestrator. Receives every message, plans, and delegates to the right specialist. | Default for almost everything. Use for complex multi-step tasks that need coordination. | ollama-cloud/kimi-k2.7-code |
| Prometheus | Strategic planner. Interviews you like an engineer, then writes a detailed verified plan before any code is touched. | When the idea is vague or the change is critical. Trigger with /plan or Tab. | ollama-cloud/deepseek-v4-pro |
| Atlas | Todo orchestrator. Executes approved plans by distributing tasks and verifying completion. | After Prometheus produces a plan; drives the checklist across sessions. | ollama-cloud/glm-5.2 |
| Metis | Pre-planning analyst. Catches ambiguities, hidden intentions, and missing constraints before a plan is finalised. | Automatically invoked by Prometheus for gap analysis. | ollama-cloud/kimi-k2.7-code |
| Momus | Ruthless plan reviewer. Validates plans for clarity, verifiability, and completeness. | After plan creation, before execution begins. Acts as an OK/reject gate. | opencode/gpt-5.4 |
| Oracle | Read-only architecture consultant. High-IQ reasoning for unfamiliar patterns and tradeoffs. | Call @oracle for security concerns, unfamiliar code, or big architectural decisions. | opencode/gpt-5.4 |
| Librarian | Documentation and OSS search. Finds official docs and real-world implementation examples. | When you need to know "how does library X do Y?". | opencode/deepseek-v4-flash |
| Explore | Fast codebase grep. Pattern discovery and "where is X?" searches. | When you need to find files, functions, or conventions across the repo. | opencode/deepseek-v4-flash |
| Multimodal-Looker | Vision analyst. Reads screenshots, PDFs, diagrams. | For UI screenshots, architecture diagrams, or any visual input. | opencode/gemini-3.1-pro |
| Sisyphus-Junior | Focused implementation worker. Writes one assigned unit and cannot re-delegate. | Spawned automatically when Sisyphus delegates an implementation task. | ollama-cloud/kimi-k2.7-code |
| Hephaestus | GPT-native deep worker. Give it a goal, not a recipe. | Rare deep cross-domain reasoning that justifies GPT access. | opencode/gpt-5.5 |

The read-only agents (Oracle, Librarian, Explore, Multimodal-Looker) and Momus cannot write or edit files. Sisyphus-Junior cannot re-delegate. These restrictions keep each agent in its lane.

How To Invoke Agents Directly

Oh My OpenAgent registers every agent into OpenCode’s @-mention picker. You can override the default routing explicitly.

@-mentions

Type @ followed by the agent key. Useful ones:

| Shortcut | Routes to | Typical use |
|---|---|---|
| @oracle | Oracle | Architecture review, security, hard debugging |
| @librarian | Librarian | Docs, examples, library internals |
| @explore | Explore | Find code patterns, references, conventions |
| @plan "…" | Prometheus | Create a structured plan before coding |

All registered agents also appear in the picker (@sisyphus, @hephaestus, @prometheus, @atlas, @momus, @metis, @sisyphus-junior, @multimodal-looker). The most reliable direct shortcuts are @oracle, @librarian, @explore, and @plan.

Keyword Modes

Certain words in your prompt inject a specialised mode without changing agent:

| Keyword | Effect |
|---|---|
| ultrawork / ulw | Full parallel-agents execution mode |
| search / find | Web and documentation search focus |
| analyze / investigate / audit / deep-dive | Deep context-gathering analysis |
| hyperplan / hpp | Adversarial plan review |
| hpp ulw / ulw hpp | Hyperplan + ultrawork combined |

Slash Commands

| Command | Effect |
|---|---|
| /start-work | Activates Atlas on the latest Prometheus plan |
| /plan | Switches to Prometheus for structured planning |
| /refactor | LSP + AST-grep assisted refactoring |
| /review-work | Spawns parallel QA agents |
| /handoff | Generates a context summary for a new session |
| /ulw-loop | Self-referential loop in ultrawork mode |

The Cost Reality: Why Level 3 Became the Default

Ollama Cloud Pro is a flat-rate subscription, but its quota is not infinite. Models are priced by GPU-time consumption level:

| Level | Active Parameters | Typical GPU Quota Cost |
|---|---|---|
| Level 2 (Medium) | ~13B | ~1 unit |
| Level 3 (High) | ~32B | ~3 units |
| Level 4 (Extra High) | ~49B+ | ~9–10 units |

DeepSeek V4 Pro is officially Level 4. Real-world usage showed it burning quota at 3× or more compared to Level 3 models. For agents that fire on every message, that multiplier is unsustainable. The new Level 3 models change the economics:

  • Kimi K2.7-code (Level 3): the coding-focused successor to K2.6. Stronger on long-horizon coding tasks and ~30% lower thinking-token usage.
  • GLM-5.2 (Level 3): Z.ai’s new flagship, with a 1M-token context window at Level 3 pricing, a massive win for large-context tasks.
  • MiniMax-M3 (Level 3): available but not used in this configuration; we prefer Kimi and GLM for our task mix.

The new rule is simple: default to Level 3, escalate to Level 4 only when the task profile clearly rewards it.
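To make the multiplier concrete, here is a rough back-of-the-envelope sketch using the approximate units from the level table. It assumes the units can be read as a relative cost per comparable request, which the table does not state explicitly, and the workload of 200 calls is purely hypothetical.

```python
# Approximate quota units per request, taken from the level table above.
# Level 4 is quoted as a range (~9-10 units), so each entry is (low, high).
# ASSUMPTION: units scale linearly with the number of comparable requests.
QUOTA_UNITS = {
    "level2": (1, 1),
    "level3": (3, 3),
    "level4": (9, 10),
}


def quota_burn(level: str, calls: int) -> tuple[int, int]:
    """Return the (low, high) quota units consumed by `calls` requests."""
    low, high = QUOTA_UNITS[level]
    return (low * calls, high * calls)


# Hypothetical workload: an orchestrator that fires 200 times in a session.
calls = 200
print("Level 3:", quota_burn("level3", calls))  # Level 3: (600, 600)
print("Level 4:", quota_burn("level4", calls))  # Level 4: (1800, 2000)
```

For an agent that runs on every message, the same session costs roughly three times as much quota at Level 4, which is why the orchestrator moved down a level.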

New Design Principle: Cross-Provider Redundancy First

Most model-routing discussions focus on capability. This configuration adds a second lens: availability. Every agent’s fallback chain is designed so that the first fallback sits on the other provider: the same model family where both providers host it, and the closest comparable model otherwise.

Examples:

| Primary | Primary Provider | First Fallback | Fallback Provider |
|---|---|---|---|
| ollama-cloud/kimi-k2.7-code | Ollama Cloud | opencode/kimi-k2.6 | OpenCode Zen |
| opencode/deepseek-v4-flash | OpenCode Zen | ollama-cloud/deepseek-v4-flash | Ollama Cloud |
| ollama-cloud/deepseek-v4-pro | Ollama Cloud | opencode/gpt-5.4 | OpenCode Zen |
| opencode/gpt-5.4 | OpenCode Zen | ollama-cloud/deepseek-v4-pro | Ollama Cloud |
| opencode/gemini-3.1-pro | OpenCode Zen | ollama-cloud/kimi-k2.7-code | Ollama Cloud |
| ollama-cloud/glm-5.2 (Atlas) | Ollama Cloud | ollama-cloud/deepseek-v4-pro | Ollama Cloud |

The Atlas primary is an exception: opencode/glm-5.2 does not exist on OpenCode Zen yet, so its first fallback is the best available reasoning model on the same provider (ollama-cloud/deepseek-v4-pro). Its second fallback, ollama-cloud/kimi-k2.7-code, is also hosted on Ollama Cloud, so the Atlas chain only crosses to OpenCode Zen at its third entry (GPT-5.4 medium in the assignments table).
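The redundancy rule is easy to check mechanically. The sketch below is not part of the plugin; the helper names are hypothetical and the pairs are copied from the table above. It flags any primary whose first fallback stays on the same provider.

```python
def provider_of(model_id: str) -> str:
    """Return the provider prefix of a 'provider/model' ID."""
    return model_id.split("/", 1)[0]


def first_fallback_crosses(primary: str, fallbacks: list[str]) -> bool:
    """True when the first fallback is hosted by a different provider."""
    if not fallbacks:
        return False
    return provider_of(primary) != provider_of(fallbacks[0])


# Primary -> first fallback, copied from the table above.
pairs = {
    "ollama-cloud/kimi-k2.7-code": ["opencode/kimi-k2.6"],
    "opencode/deepseek-v4-flash": ["ollama-cloud/deepseek-v4-flash"],
    "ollama-cloud/deepseek-v4-pro": ["opencode/gpt-5.4"],
    "ollama-cloud/glm-5.2": ["ollama-cloud/deepseek-v4-pro"],
}

same_provider = [p for p, fb in pairs.items() if not first_fallback_crosses(p, fb)]
print(same_provider)  # ['ollama-cloud/glm-5.2']
```

Only the Atlas primary is reported, which matches the exception described above.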

If Ollama Cloud has a transient failure, OpenCode Zen takes over immediately. If Zen has a rate-limit hiccup, Ollama Cloud covers it. The cost of the occasional cross-provider call is tiny compared to the cost of a workflow that simply stops.

This also removes the old "free tier at the end of the chain" trap. Free-tier models on Zen (qwen3.6-plus-free, big-pickle, nemotron) were disabled at the inference-provider level in our setup and caused Provider not in allowed providers errors when the plugin’s hardcoded defaults tried to use them. The new configuration explicitly lists only allowed models and never falls through to hidden plugin defaults.
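One way to guard against this trap is a small lint pass over the fallback chains before they go into the config. The following is a minimal sketch under stated assumptions: the `chains` dictionary is a simplified stand-in for the real configuration (not the plugin's actual schema), the function name is hypothetical, and the `legacy` agent exists only to show a failing case.

```python
# Providers enabled in this setup, and the free-tier model IDs named in this post.
ALLOWED_PROVIDERS = {"ollama-cloud", "opencode"}
DISABLED_MODELS = {
    "opencode/qwen3.6-plus-free",
    "opencode/big-pickle",
    "opencode/nemotron-3-super-free",
}


def disallowed_entries(chains: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (agent, model_id) pairs that would trigger a provider error."""
    problems = []
    for agent, chain in chains.items():
        for model_id in chain:
            provider = model_id.split("/", 1)[0]
            if provider not in ALLOWED_PROVIDERS or model_id in DISABLED_MODELS:
                problems.append((agent, model_id))
    return problems


# Hypothetical chains: "legacy" still ends in a disabled free-tier model.
chains = {
    "explore": ["opencode/deepseek-v4-flash", "ollama-cloud/deepseek-v4-flash"],
    "legacy": ["ollama-cloud/kimi-k2.7-code", "opencode/big-pickle"],
}
print(disallowed_entries(chains))  # [('legacy', 'opencode/big-pickle')]
```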

Updated Agent Assignments (June 2026)

| Agent / Category | Primary Model | Fallback Strategy | Rationale |
|---|---|---|---|
| Sisyphus | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Every message. Level 3 keeps quota sane; redundancy keeps it online. |
| Oracle | opencode/gpt-5.4 high | V4 Pro, Kimi, Gemini 3.1 Pro high | Rare, high-stakes architecture calls. |
| Momus | opencode/gpt-5.4 high | V4 Pro, Kimi, Gemini 3.1 Pro high | QA review. Cost of a missed bug > cost of tokens. |
| Prometheus | ollama-cloud/deepseek-v4-pro | GPT-5.4 high, Kimi | Planning benefits from V4 Pro reasoning; Zen GPT is cross-provider fallback. |
| Plan | ollama-cloud/deepseek-v4-pro | GPT-5.4 high, Kimi | Same family as Prometheus, with plan-override patch. |
| Metis | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Pre-planning analysis. |
| Sisyphus-Junior | ollama-cloud/kimi-k2.7-code | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Bulk implementation now on Level 3. |
| Librarian | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen primary; Ollama Cloud redundancy. |
| Explore | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen primary; Ollama Cloud redundancy. |
| Atlas | ollama-cloud/glm-5.2 | V4 Pro, Kimi, GPT-5.4 medium | GLM-5.2’s huge context at Level 3 pricing. |
| Multimodal-Looker | opencode/gemini-3.1-pro | Kimi, GPT-5.4 medium | Vision tasks; Kimi is the cross-provider vision fallback. |
| Hephaestus | opencode/gpt-5.5 medium | GPT-5.4 medium, V4 Pro, Kimi | GPT-native deep worker; non-GPT fallbacks allowed with warning. |
| quick | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen + redundancy. |
| unspecified-low | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Free Zen + redundancy. |
| unspecified-high | ollama-cloud/kimi-k2.7-code high | opencode/kimi-k2.6, V4 Pro, GPT-5.4 medium | Complex multi-file tasks. |
| deep | opencode/gpt-5.4 medium | V4 Pro, Kimi, Gemini 3.1 Pro high | Rare premium technical work. |
| visual-engineering | opencode/gemini-3.5-flash | Kimi, V4 Pro, Gemini 3.1 Pro high | UI/UX styling; Kimi covers vision redundancy. |
| ultrabrain | ollama-cloud/deepseek-v4-pro max | GPT-5.4 medium, Kimi, Gemini 3.1 Pro high | Maximum reasoning: V4 Pro leads GPQA Diamond, HMMT, HLE, and agentic Elo. |
| writing | opencode/deepseek-v4-flash | ollama-cloud/deepseek-v4-flash, Kimi, GPT-5.4 medium | Documentation, free Zen + redundancy. |
| infrastructure | ollama-cloud/deepseek-v4-pro | GPT-5.4 medium, Kimi | IaC still benefits from V4 Pro context/reasoning. |

What disappeared from the active mix:

  • ollama-cloud/glm-5.1 and opencode/glm-5.1 — removed entirely because they underperformed.
  • opencode/nemotron-3-super-free, opencode/qwen3.6-plus-free, opencode/big-pickle — disabled free-tier fallbacks that caused provider errors.
  • Claude/Anthropic models — exist on Zen but intentionally disabled for cost control.

Why This Is Now a High-Availability Stack, Not Just a Cheap Stack

The previous version of this post treated fallback chains mainly as a cost-optimization tool: flat-rate first, pay-per-use if exhausted. The June 2026 revision treats them as an availability tool first.

Three concrete improvements:

  • No single-provider failure stops work. If Ollama Cloud’s US-east cluster has a bad hour, the orchestrator immediately falls back to OpenCode Zen’s kimi-k2.6. If Zen’s free DeepSeek Flash hits a rate limit, the Ollama Cloud DeepSeek Flash takes over.
  • No hidden plugin defaults leak in. Oh My OpenAgent has hardcoded fallback chains that reference providers disabled in our inference settings (tensorix, infercom, cortecs, minimax). The new config overrides every agent and category with explicit, fully-qualified model IDs, so the plugin never falls through to its defaults.
  • Hephaestus is explicitly configured. Previously it relied on plugin defaults and could silently fall back to disallowed providers. Now it uses opencode/gpt-5.5 as primary, opencode/gpt-5.4 as same-provider fallback, and ollama-cloud/deepseek-v4-pro and ollama-cloud/kimi-k2.7-code as cross-provider fallbacks with allow_non_gpt_model: true.
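The failover behaviour in the first point can be pictured as a walk down an ordered chain. This is an illustrative sketch only, not the plugin's actual routing code: `pick_model` and the `is_up` callback are hypothetical names, and the chain is shortened to the Sisyphus primary and its first fallback.

```python
from typing import Callable, Optional


def pick_model(chain: list[str], is_up: Callable[[str], bool]) -> Optional[str]:
    """Return the first model in the chain whose provider is reachable."""
    for model_id in chain:
        provider = model_id.split("/", 1)[0]
        if is_up(provider):
            return model_id
    return None  # every provider in the chain is unavailable


# Sisyphus: primary on Ollama Cloud, first fallback on OpenCode Zen.
sisyphus_chain = ["ollama-cloud/kimi-k2.7-code", "opencode/kimi-k2.6"]

# Simulate an Ollama Cloud outage.
down = {"ollama-cloud"}
print(pick_model(sisyphus_chain, lambda p: p not in down))  # opencode/kimi-k2.6
```

With both providers healthy the primary is returned; the function returns None only when every provider in the chain is down at once.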

Updated Monthly Cost Breakdown

| Component | Cost | Notes |
|---|---|---|
| Ollama Cloud Pro (annual plan) | €15/month | Flat-rate, generous GPU quotas |
| OpenCode Zen pay-per-use | ~€7–12/month | GPT-5.5, GPT-5.4 high/medium, Gemini 3.1 Pro, Gemini 3.5 Flash; rare calls only |
| TOTAL | ~€22–27/month | 15–20h/week heavy coding |

The cost sits comfortably inside the €20–30/month window and is far below the €150/month that many commercial AI coding tools charge. It dropped compared to the May 2026 configuration because:

  • The orchestrator and implementation worker moved from Level 4 to Level 3.
  • Bulk agents moved to free Zen DeepSeek Flash.
  • Level 4 is now reserved for infrastructure, planning, the ultrabrain category, and occasional escalations.
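The arithmetic behind the total is simple enough to spell out. The sketch below uses the figures from the cost table; the per-hour estimate additionally assumes an average month of 52/12 weeks, which is my assumption and not a figure from the measurements.

```python
FLAT_RATE_EUR = 15          # Ollama Cloud Pro (annual plan), per month
PAY_PER_USE_EUR = (7, 12)   # OpenCode Zen pay-per-use estimate, per month
HOURS_PER_WEEK = (15, 20)   # usage range quoted in the cost table
WEEKS_PER_MONTH = 52 / 12   # ASSUMPTION: average month length


def monthly_total() -> tuple[int, int]:
    """Flat rate plus the low and high pay-per-use estimates."""
    low, high = PAY_PER_USE_EUR
    return (FLAT_RATE_EUR + low, FLAT_RATE_EUR + high)


def cost_per_hour() -> tuple[float, float]:
    """Best case: lowest bill over the most hours. Worst case: the reverse."""
    total_low, total_high = monthly_total()
    hours_low, hours_high = HOURS_PER_WEEK
    best = total_low / (hours_high * WEEKS_PER_MONTH)
    worst = total_high / (hours_low * WEEKS_PER_MONTH)
    return (round(best, 2), round(worst, 2))


print(monthly_total())  # (22, 27)
print(cost_per_hour())  # (0.25, 0.42)
```

Under these assumptions the stack works out to somewhere between €0.25 and €0.42 per hour of coding.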

Privacy and Data Residency: Still the Same Strong Story

The new models do not change the privacy picture. Kimi K2.7-code, GLM-5.2, and DeepSeek V4 are open-weight models pulled from HuggingFace and hosted on Ollama Cloud’s US/EU NVIDIA infrastructure. Data does not flow to Chinese API endpoints. Ollama’s terms remain zero-logging, zero-retention.

When the primary model is opencode/..., inference runs through OpenCode Zen’s infrastructure, which is a separate provider. The redundancy design therefore also means sensitive data is not tied to a single provider’s pipeline.

The Full Configuration

Every model ID in the complete oh-my-openagent.jsonc is verified against:

  • https://opencode.ai/zen/v1/models for OpenCode Zen models
  • https://ollama.com/search?c=cloud for Ollama Cloud models
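That verification can be scripted. The sketch below works on a JSON payload passed in as a string and makes two loud assumptions that are not confirmed by this post: that the Zen model list has the common `{"data": [{"id": ...}]}` shape, and that it lists bare model names without the `opencode/` prefix. The sample payload is invented for illustration; check the real response before relying on this.

```python
import json


def missing_zen_models(config_ids: list[str], payload: str) -> list[str]:
    """Return configured opencode/* IDs that are absent from the payload.

    ASSUMPTION: the payload looks like {"data": [{"id": "<model>"}, ...]}
    and lists bare model names without the "opencode/" prefix.
    """
    published = {entry["id"] for entry in json.loads(payload)["data"]}
    missing = []
    for model_id in config_ids:
        provider, _, name = model_id.partition("/")
        if provider == "opencode" and name not in published:
            missing.append(model_id)
    return missing


# Invented sample payload, not a real response from the endpoint.
sample_payload = '{"data": [{"id": "gpt-5.4"}, {"id": "deepseek-v4-flash"}]}'
configured = ["opencode/gpt-5.4", "opencode/glm-5.2", "ollama-cloud/glm-5.2"]
print(missing_zen_models(configured, sample_payload))  # ['opencode/glm-5.2']
```

IDs on other providers are ignored here, since they would need to be checked against the Ollama Cloud listing instead.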

Conclusion

AI-assisted development does not need a €150/month inference budget, and it does not need to be fragile. The June 2026 update shows that a Level-3-first, cross-provider-redundant architecture delivers professional coding assistance for roughly €25/month while staying online through provider hiccups.

The winning combination:

  • Kimi K2.7-code for orchestration and complex coding.
  • GLM-5.2 for large-context coordination (Atlas).
  • OpenCode Zen DeepSeek V4 Flash for free bulk search and documentation.
  • DeepSeek V4 Pro demoted to fallback and narrow use cases.
  • GPT-5.5 / GPT-5.4 / Gemini reserved for rare, high-stakes decisions.

This is not a workaround. It is a cheaper, more resilient, and benchmark-grounded architecture.

Resources

  • Ollama Cloud Pro pricing: https://ollama.com/pricing
  • OpenCode Zen pricing: https://opencode.ai/docs/zen
  • OpenCode Zen model list: https://opencode.ai/zen/v1/models
  • Ollama Cloud models: https://ollama.com/search?c=cloud

Author Note: This configuration is refined through daily use. The June 2026 revision was driven by three observations from production: (1) Level 4 quota burn is higher than advertised, (2) the new Level 3 models closed the quality gap for the orchestrator and implementation worker, and (3) provider-level failures were the most annoying outages — cross-provider redundancy solved them.
