OpenCode + Oh My OpenAgent: High-Availability AI Coding for €20/Month (June 2026 Update)

Professional-grade AI coding for 15–20 hours per week now lands closer to €20/month than €30/month, stays online when one provider hiccups, and does not need a €150/month inference budget. The June 2026 update to our OpenCode + Oh My OpenAgent stack is built on three simple decisions:

Use the free Zen flash model first. Bulk agents now start on opencode/deepseek-v4-flash-free, fall back to the paid Zen flash model only if needed, then hop to ollama-cloud/deepseek-v4-flash.
Stop burning GPT on Oracle. The biggest OpenCode surprise was not the free models; it was GPT-5.4 usage. Oracle now runs on ollama-cloud/deepseek-v4-pro, so GPT spend is limited to Momus and Hephaestus fallback.
Keep Level 3 as the default backbone. ollama-cloud/kimi-k2.7-code and ollama-cloud/glm-5.2 still carry most of the stack, with V4 Pro reserved for reasoning-heavy work.

The result: a stack that is cheaper, still resilient, and much less likely to quietly leak money through OpenCode pay-per-use calls.

What Is Oh My OpenAgent?

OpenCode is an open-source AI coding framework that supports dozens of providers, MCP servers, and plugins. By default it is a single-model chat: you pick a model and talk to it.

Oh My OpenAgent is the plugin that turns OpenCode into a multi-agent development system. It adds specialised agents, each with independent model routing and ordered fallback chains. You do not have to pick one model for everything; the plugin routes each task to the agent and model that fit it best.

The Agent Catalog

Agent	What it does	When to use it	Model used here
Sisyphus	Orchestrator. Receives every message, plans, and delegates to the right specialist.	Default for almost everything.	`ollama-cloud/kimi-k2.7-code`
Prometheus	Strategic planner. Interviews you, then writes a verified plan before any code is touched.	When the idea is vague or the change is critical.	`ollama-cloud/deepseek-v4-pro`
Atlas	Todo orchestrator. Executes approved plans across sessions.	After Prometheus produces a plan.	`ollama-cloud/glm-5.2`
Metis	Pre-planning analyst. Catches hidden constraints and ambiguities.	Automatically invoked by Prometheus.	`ollama-cloud/kimi-k2.7-code`
Momus	Ruthless plan reviewer. Validates clarity, verifiability, and completeness.	The one place we still accept GPT-5.4 cost.	`opencode/gpt-5.4`
Oracle	Read-only architecture consultant. High-IQ reasoning for unfamiliar patterns and tradeoffs.	`@oracle` for security, architecture, and hard debugging.	`ollama-cloud/deepseek-v4-pro`
Librarian	Documentation and OSS search.	When you need to know how library X does Y.	`opencode/deepseek-v4-flash-free`
Explore	Fast codebase grep. Pattern discovery and “where is X?” searches.	When you need files, symbols, or conventions fast.	`opencode/deepseek-v4-flash-free`
Multimodal-Looker	Vision analyst. Reads screenshots, PDFs, diagrams.	Any visual input.	`opencode/gemini-3.1-pro`
Sisyphus-Junior	Focused implementation worker. Writes one assigned unit and cannot re-delegate.	Spawned automatically during execution.	`ollama-cloud/kimi-k2.7-code`
Hephaestus	GPT-native deep worker. Give it a goal, not a recipe.	Rare deep cross-domain reasoning.	`opencode/gpt-5.5`

Read-only agents (Oracle, Librarian, Explore, Multimodal-Looker) cannot write or edit. Momus cannot write or edit. Sisyphus-Junior cannot re-delegate. These restrictions keep each agent in its lane.
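One way to picture these restrictions is as a small deny-list keyed by agent name. The sketch below is illustrative only: the `DENIED` table and the `is_allowed` helper are hypothetical names, not part of the plugin, and treating every unlisted action as allowed is an assumption of this example.

```python
# Hypothetical deny-list mirroring the restrictions described above.
# Only the restrictions stated in the text are encoded here.
DENIED = {
    "oracle": {"write", "edit"},
    "librarian": {"write", "edit"},
    "explore": {"write", "edit"},
    "multimodal-looker": {"write", "edit"},
    "momus": {"write", "edit"},
    "sisyphus-junior": {"delegate"},
}


def is_allowed(agent: str, action: str) -> bool:
    """Return True unless the action is explicitly denied for the agent.

    Assumption: anything not listed in DENIED is allowed.
    """
    return action not in DENIED.get(agent, set())


if __name__ == "__main__":
    for agent, action in [("oracle", "write"), ("sisyphus-junior", "delegate"),
                          ("sisyphus-junior", "write")]:
        print(agent, action, is_allowed(agent, action))
```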

How To Invoke Agents Directly

Oh My OpenAgent registers every agent into OpenCode’s @-mention picker. You can override the default routing explicitly.

`@`-mentions

Shortcut	Routes to	Typical use
`@oracle`	Oracle	Architecture review, security, hard debugging
`@librarian`	Librarian	Docs, examples, library internals
`@explore`	Explore	Find code patterns, references, conventions
`@plan "…"`	Prometheus	Create a structured plan before coding

All registered agents also appear in the picker (@sisyphus, @hephaestus, @prometheus, @atlas, @momus, @metis, @sisyphus-junior, @multimodal-looker).

Keyword Modes

Keyword	Effect
`ultrawork` / `ulw`	Full parallel-agents execution mode
`search` / `find`	Web and documentation search focus
`analyze` / `investigate` / `audit` / `deep-dive`	Deep context-gathering analysis
`hyperplan` / `hpp`	Adversarial plan review
`hpp ulw` / `ulw hpp`	Hyperplan + ultrawork combined

Slash Commands

Command	Effect
`/start-work`	Activates Atlas on the latest Prometheus plan
`/plan`	Switches to Prometheus for structured planning
`/refactor`	LSP + AST-grep assisted refactoring
`/review-work`	Spawns parallel QA agents
`/handoff`	Generates a context summary for a new session
`/ulw-loop`	Self-referential loop in ultrawork mode

The Cost Reality: GPT Was the Problem, Not Zen Itself

The expensive surprise in real usage was not deepseek-v4-flash on Zen. It was GPT-5.4.

In practice, Oracle was one of the main sources of OpenCode pay-per-use spend. A single heavy day could burn around $12, and a single month could quietly land around $20 just on GPT-5.4. That is too much for a tool that is supposed to feel economical.

So the new rule is blunt:

Oracle moves to ollama-cloud/deepseek-v4-pro
Deep work moves to ollama-cloud/deepseek-v4-pro
GPT-5.4 stays only on Momus
Hephaestus keeps GPT because the plugin is GPT-native there

This is the real cost win. The free flash model saves pennies and parallelises nicely. Removing GPT-5.4 from everyday reasoning agents saves dollars.

—

New Design Principle: Cheap First for Bulk, Ollama First for Reasoning

The earlier version of this stack overused the phrase “cross-provider redundancy first”. That is still true in spirit, but the actual winning policy is simpler:

Bulk agents use the free Zen flash model first
If Zen free rate-limits, use paid Zen flash second
If Zen is unavailable, hop to Ollama Cloud flash
Reasoning-heavy agents live on Ollama Cloud by default

Examples:

Primary	First Fallback	Second Fallback	Why
`opencode/deepseek-v4-flash-free`	`opencode/deepseek-v4-flash`	`ollama-cloud/deepseek-v4-flash`	Cheapest possible bulk path
`ollama-cloud/kimi-k2.7-code`	`opencode/kimi-k2.6`	`ollama-cloud/deepseek-v4-pro`	Cheap orchestration with a Zen safety net
`ollama-cloud/deepseek-v4-pro`	`ollama-cloud/kimi-k2.7-code`	`opencode/gemini-3.1-pro`	GPT-free reasoning path
`ollama-cloud/glm-5.2`	`ollama-cloud/deepseek-v4-pro`	`ollama-cloud/kimi-k2.7-code`	Atlas stays fully on Ollama

The important point is not philosophical purity. It is that the stack should stay online without silently shifting half the bill onto OpenCode GPT calls.
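The ordered chains in the table can be read as "try each model in turn and stop at the first one that answers". The following is a minimal sketch of that idea, not the plugin's actual implementation: `run_with_fallback`, `AllModelsFailed`, and `fake_call` are hypothetical names, and the simulated outage stands in for a real rate limit or provider failure.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when every model in the chain has failed."""


def run_with_fallback(chain: Sequence[str],
                      call: Callable[[str], str]) -> tuple[str, str]:
    """Try each model in order; return (model_used, answer) for the first success."""
    errors = []
    for model in chain:
        try:
            return model, call(model)
        except Exception as exc:  # rate limit, outage, timeout, ...
            errors.append(f"{model}: {exc}")
    raise AllModelsFailed("; ".join(errors))


# Bulk chain from the first row of the table above.
BULK_CHAIN = [
    "opencode/deepseek-v4-flash-free",
    "opencode/deepseek-v4-flash",
    "ollama-cloud/deepseek-v4-flash",
]


def fake_call(model: str) -> str:
    """Stand-in for a real inference call: simulates Zen being unavailable."""
    if model.startswith("opencode/"):
        raise ConnectionError("Zen unavailable")
    return f"answer from {model}"


used, answer = run_with_fallback(BULK_CHAIN, fake_call)
print(used)  # ollama-cloud/deepseek-v4-flash
```

Because the chain is just an ordered list, moving a paid model further down or removing it entirely is a one-line change, which is what makes the cost behaviour easy to reason about.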

—

Agents vs Categories: Two Different Routing Layers

Oh My OpenAgent exposes two independent routing layers. Confusing them makes the config hard to read, so here they are separated.

Agents

Agents are addressable personas that you can @-mention or that Sisyphus calls explicitly. Each has its own model configuration. They are the “who”.

Agent	What it does	When to use it	Model used here
Sisyphus	Orchestrator. Receives every message, plans, delegates.	Default for almost everything.	`ollama-cloud/kimi-k2.7-code`
Prometheus	Strategic planner. Writes a verified plan before code.	When the idea is vague or critical.	`ollama-cloud/deepseek-v4-pro`
Atlas	Todo orchestrator. Executes approved plans across sessions.	After a Prometheus plan.	`ollama-cloud/glm-5.2`
Metis	Pre-planning analyst. Catches ambiguities and constraints.	Called automatically by Prometheus.	`ollama-cloud/kimi-k2.7-code`
Momus	Ruthless plan reviewer. Validates plans.	The one place we still accept GPT-5.4 cost.	`opencode/gpt-5.4`
Oracle	Read-only architecture consultant.	`@oracle` for security, architecture, hard debugging.	`ollama-cloud/deepseek-v4-pro`
Librarian	Documentation and OSS search.	Searching docs and examples.	`opencode/deepseek-v4-flash-free`
Explore	Fast codebase grep and pattern search.	Finding code in the repo.	`opencode/deepseek-v4-flash-free`
Multimodal-Looker	Vision analyst. Reads screenshots, PDFs, diagrams.	Visual tasks.	`opencode/gemini-3.1-pro`
Sisyphus-Junior	Focused implementation worker. Cannot re-delegate.	Spawned automatically during execution.	`ollama-cloud/kimi-k2.7-code`
Hephaestus	GPT-native deep worker. Give it a goal, not a recipe.	Rare deep cross-domain reasoning.	`opencode/gpt-5.5`

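Read-only agents (Oracle, Librarian, Explore, Multimodal-Looker) cannot write or edit. Momus cannot write or edit. Sisyphus-Junior cannot re-delegate.

Categories

Categories are routes selected by type of work rather than by named agent. Each category has its own primary model and fallback chain.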

Category	Primary Model	Fallback Strategy	When to use
quick	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Trivial tasks
unspecified-low	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Moderate tasks
unspecified-high	`ollama-cloud/kimi-k2.7-code` high	`opencode/kimi-k2.6`, then V4 Pro	Complex multi-file tasks
deep	`ollama-cloud/deepseek-v4-pro`	Kimi, then Gemini 3.1 Pro high	Premium technical work
visual-engineering	`opencode/gemini-3.5-flash`	Kimi, V4 Pro, Gemini 3.1 Pro high	UI/UX, styling
ultrabrain	`ollama-cloud/deepseek-v4-pro` max	Kimi, then Gemini 3.1 Pro high	Maximum reasoning
writing	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Documentation
infrastructure	`ollama-cloud/deepseek-v4-pro`	Kimi	IaC, Terraform, K8s, Ansible

Internal Automatic Routing

Sisyphus does not pick the final model by hand. It picks a channel:

@oracle or @librarian → named agent with its own model.
task(category="visual-engineering", ...) → category; the plugin routes to that category’s model.
task(subagent_type="explore", ...) → named agent.

There is no forced mapping between an agent and a category. For example, visual-engineering is rarely invoked by the multimodal-looker agent; Sisyphus uses it when it identifies a UI/UX need. Likewise, librarian is a research agent, but a simple doc-writing task will go to category: writing instead.

The point of separating the two layers:

Agents = explicit, addressable control. You know exactly who reads your message.
Categories = implicit control by type of work. Sisyphus does not need to create an agent for every micro-task.
Cost per type of work, not per persona. Bulk writing costs the same flash price as an exploration, without forcing a route through Librarian.
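The two layers can be pictured as two separate lookup tables, with each delegation naming exactly one of them. This is a hedged sketch rather than the plugin's real code: `resolve_model`, `AGENT_MODELS`, and `CATEGORY_MODELS` are hypothetical names, and only the keyword arguments `subagent_type` and `category` are borrowed from the `task(...)` call shapes shown above.

```python
from typing import Optional

# Layer 1: named agents (the "who"). Models taken from the agent table.
AGENT_MODELS = {
    "oracle": "ollama-cloud/deepseek-v4-pro",
    "librarian": "opencode/deepseek-v4-flash-free",
    "explore": "opencode/deepseek-v4-flash-free",
}

# Layer 2: categories (the type of work). Models taken from the category table.
CATEGORY_MODELS = {
    "quick": "opencode/deepseek-v4-flash-free",
    "writing": "opencode/deepseek-v4-flash-free",
    "visual-engineering": "opencode/gemini-3.5-flash",
    "deep": "ollama-cloud/deepseek-v4-pro",
}


def resolve_model(*, subagent_type: Optional[str] = None,
                  category: Optional[str] = None) -> str:
    """Resolve the primary model for a delegated task.

    Exactly one routing layer must be named per call.
    """
    if (subagent_type is None) == (category is None):
        raise ValueError("pass exactly one of subagent_type or category")
    if subagent_type is not None:
        return AGENT_MODELS[subagent_type]
    return CATEGORY_MODELS[category]


print(resolve_model(subagent_type="explore"))
print(resolve_model(category="visual-engineering"))
```

Keeping the tables separate means a doc-writing task can be priced through `writing` without touching the Librarian entry at all.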

Updated Agent Assignments (June 2026)

Agent	Primary Model	Fallback Strategy	Rationale
Sisyphus	`ollama-cloud/kimi-k2.7-code`	`opencode/kimi-k2.6`, then V4 Pro	Every message. Keep it Level 3 and cheap.
Oracle	`ollama-cloud/deepseek-v4-pro` max	Kimi, then Gemini 3.1 Pro high	The main GPT-cost saver.
Momus	`opencode/gpt-5.4` high	V4 Pro, Kimi, Gemini 3.1 Pro high	Worth paying for a ruthless critic.
Prometheus	`ollama-cloud/deepseek-v4-pro`	Kimi, then Gemini 3.1 Pro high	Planning likes V4 Pro; no GPT required.
Plan	`ollama-cloud/deepseek-v4-pro`	Kimi, then Gemini 3.1 Pro high	Same logic as Prometheus.
Metis	`ollama-cloud/kimi-k2.7-code`	`opencode/kimi-k2.6`, then V4 Pro	Cheap pre-planning analysis.
Sisyphus-Junior	`ollama-cloud/kimi-k2.7-code`	`opencode/kimi-k2.6`, then V4 Pro	Bulk implementation stays Level 3.
Librarian	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Free-first bulk search.
Explore	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Free-first codebase grep.
Atlas	`ollama-cloud/glm-5.2`	V4 Pro, then Kimi	Large-context coordination at Level 3 price.
Multimodal-Looker	`opencode/gemini-3.1-pro`	Kimi	Vision stays on Gemini.
Hephaestus	`opencode/gpt-5.5` medium	GPT-5.4 medium, V4 Pro, Kimi	GPT-native plugin path; leave it rare.

Updated Category Assignments (June 2026)

Category	Primary Model	Fallback Strategy	Rationale
quick	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Free-first trivial tasks.
unspecified-low	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Free-first moderate tasks.
unspecified-high	`ollama-cloud/kimi-k2.7-code` high	`opencode/kimi-k2.6`, then V4 Pro	Complex multi-file tasks.
deep	`ollama-cloud/deepseek-v4-pro`	Kimi, then Gemini 3.1 Pro high	Premium technical work without GPT.
visual-engineering	`opencode/gemini-3.5-flash`	Kimi, V4 Pro, Gemini 3.1 Pro high	UI/UX stays on Gemini.
ultrabrain	`ollama-cloud/deepseek-v4-pro` max	Kimi, then Gemini 3.1 Pro high	Maximum reasoning.
writing	`opencode/deepseek-v4-flash-free`	paid Zen flash, then Ollama flash	Free-first documentation.
infrastructure	`ollama-cloud/deepseek-v4-pro`	Kimi	IaC still benefits from V4 Pro.

What disappeared from the active mix:

GPT-5.4 as a routine fallback all over the stack
ollama-cloud/glm-5.1 and opencode/glm-5.1
unnecessary paid Zen flash calls for bulk agents

—

Why This Is Better Than the Previous “Optimised” Version

The previous revision was already smarter than the default plugin setup, but it still had one leak: too many routes eventually ended at opencode/gpt-5.4.

That was the expensive mistake.

Three concrete improvements now:

Oracle no longer burns GPT spend. The biggest hidden OpenCode cost is gone.
Bulk agents finally use the actual free model first. If Zen gives you deepseek-v4-flash-free, use it.
GPT is now explicit, not accidental. You pay for Momus because it is worth it. You keep Hephaestus GPT-native because the plugin expects it. Everything else is pushed back to Ollama Cloud or free Zen.
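"Explicit, not accidental" can be checked mechanically by scanning every chain for GPT models and flagging any agent that is not on an allow-list. The sketch below assumes the shorthand fallbacks in the assignment table ("Kimi", "V4 Pro", "Gemini 3.1 Pro") expand to the full IDs used elsewhere in this article; `ROUTES`, `ALLOWED_GPT_AGENTS`, and `find_gpt_leaks` are hypothetical names, not plugin features.

```python
# Agents that are allowed to reach GPT, per the rule above.
ALLOWED_GPT_AGENTS = {"momus", "hephaestus"}

# Primary model first, then fallbacks (a subset of the assignment table).
ROUTES = {
    "sisyphus": ["ollama-cloud/kimi-k2.7-code", "opencode/kimi-k2.6",
                 "ollama-cloud/deepseek-v4-pro"],
    "oracle": ["ollama-cloud/deepseek-v4-pro", "ollama-cloud/kimi-k2.7-code",
               "opencode/gemini-3.1-pro"],
    "momus": ["opencode/gpt-5.4", "ollama-cloud/deepseek-v4-pro",
              "ollama-cloud/kimi-k2.7-code", "opencode/gemini-3.1-pro"],
    "hephaestus": ["opencode/gpt-5.5", "opencode/gpt-5.4",
                   "ollama-cloud/deepseek-v4-pro", "ollama-cloud/kimi-k2.7-code"],
}


def find_gpt_leaks(routes: dict, allowed: set) -> dict:
    """Return {agent: [gpt models]} for agents that reach GPT without permission."""
    leaks = {}
    for agent, chain in routes.items():
        # Strip the provider prefix, then match on the "gpt-" model family.
        gpt = [m for m in chain if m.split("/", 1)[-1].startswith("gpt-")]
        if gpt and agent not in allowed:
            leaks[agent] = gpt
    return leaks


print(find_gpt_leaks(ROUTES, ALLOWED_GPT_AGENTS))  # {}
```

Running a check like this against the previous revision's routes, where Oracle still pointed at `opencode/gpt-5.4`, would have listed Oracle as a leak.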

—

Updated Monthly Cost Breakdown

Component	Cost	Notes
Ollama Cloud Pro (annual plan)	€15/month	Flat-rate, generous GPU quotas
OpenCode Zen pay-per-use	~€3–7/month	Mostly Momus, occasional Hephaestus, Gemini vision/UI calls
TOTAL	~€18–22/month	15–20h/week heavy coding
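The total row is simple addition of the flat subscription and the pay-per-use range. As a quick check, the arithmetic can be written out as below; `monthly_total` is a hypothetical helper and the figures are the ones from the table.

```python
def monthly_total(flat_eur: int, usage_range_eur: tuple) -> tuple:
    """Add a flat subscription to a (low, high) pay-per-use range."""
    low, high = usage_range_eur
    return flat_eur + low, flat_eur + high


OLLAMA_CLOUD_PRO_EUR = 15   # flat rate, annual plan
ZEN_PAY_PER_USE_EUR = (3, 7)  # approximate monthly range

low, high = monthly_total(OLLAMA_CLOUD_PRO_EUR, ZEN_PAY_PER_USE_EUR)
print(f"~EUR {low}-{high}/month")  # ~EUR 18-22/month
```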

This is the first version of the stack that feels honestly cheap. The saving does not come from turning everything into a weak free model. It comes from being disciplined about where GPT is allowed to exist.

—

Privacy and Data Residency: Still the Same Strong Story

The new models do not change the privacy picture. Kimi K2.7-code, GLM-5.2, and DeepSeek V4 are open-weight models pulled from HuggingFace and hosted on Ollama Cloud’s US/EU NVIDIA infrastructure. Data does not flow to Chinese API endpoints. Ollama’s terms remain zero-logging, zero-retention.

When the primary model is opencode/..., inference runs through OpenCode Zen’s infrastructure, which is a separate provider. The redundancy design therefore also means sensitive data is not tied to a single provider’s pipeline.

—

The Full Configuration

Every model ID in the complete oh-my-openagent.jsonc is verified against:

https://opencode.ai/zen/v1/models for OpenCode Zen models
https://ollama.com/search?c=cloud for Ollama Cloud models

Conclusion

AI-assisted development does not need a €150/month inference budget, and it definitely does not need accidental GPT bills.

The June 2026 update is the first version of this stack that feels both resilient and honestly cheap:

Kimi K2.7-code for orchestration and implementation
GLM-5.2 for large-context coordination
DeepSeek V4 Flash Free for bulk search and documentation
DeepSeek V4 Pro for heavy reasoning without OpenCode GPT spend
GPT-5.4 / GPT-5.5 reserved for the rare places where they are actually justified

That is the real trick: not “never use premium models”, but “stop using them by accident”.

—

Resources

Ollama Cloud Pro pricing: https://ollama.com/pricing
OpenCode Zen pricing: https://opencode.ai/docs/zen
OpenCode Zen model list: https://opencode.ai/zen/v1/models
Ollama Cloud models: https://ollama.com/search?c=cloud

—

Author Note: This configuration is refined through daily use. The June 2026 revision was driven by three observations from production: (1) Level 4 quota burn is real, (2) the free Zen flash model is good enough for bulk work, and (3) GPT-5.4 was the main OpenCode cost leak, so it had to be pushed out of the default reasoning path.
