Less is More: Why a Minimal Coding Agent Outperforms the Spaceships

In November 2025, Mario Zechner — the developer behind Pi, a coding agent that’s quietly amassed over 42,000 stars on GitHub — published a post titled “What I learned building an opinionated and minimal coding agent.” Five months later, the post reads like a manifesto for everyone who’s frustrated with the direction coding agents have taken. And the project? It hasn’t stopped evolving.

The Problem with Spaceships

Zechner’s journey mirrors what many developers have experienced. He started with ChatGPT, moved to Copilot completions (“which never worked for me”), spent a year and a half on Cursor, and eventually landed on Claude Code. Early Claude Code was “much more basic,” which suited him perfectly. Then it grew into what he calls a “spaceship with 80% of functionality I have no use for.”

The core complaints are familiar to anyone working with modern AI coding tools:

  • Context injection you can’t control: Existing harnesses inject system prompts and tool descriptions behind your back. You don’t see what’s being sent to the model, and you can’t customize it.
  • Changing behavior between releases: System prompts and tool definitions change on every update, breaking established workflows.
  • Poor observability: Sub-agents operate as black boxes. You can’t see what they’re doing or how they arrived at their conclusions.
  • Self-hosting doesn’t work: Libraries like the Vercel AI SDK don’t play nice with self-hosted models, especially around tool calling.

His response? Build his own. And name it something entirely un-Google-able, so it would never have any users. The irony is that Pi now has one of the most active open-source communities in the coding agent space.

The Four Pillars of Pi

What started as a personal project grew into four packages, each solving a specific problem:

  • pi-ai: A unified LLM API supporting Anthropic, OpenAI, Google, xAI, Groq, Cerebras, OpenRouter, DeepSeek, Fireworks, Cloudflare Workers AI, and any OpenAI-compatible endpoint. It handles streaming, tool calling, thinking/reasoning traces, cross-provider context handoffs, and token tracking.
  • pi-agent-core: An agent loop that orchestrates tool execution, validation, and event streaming.
  • pi-tui: A custom terminal UI framework with differential rendering and synchronized output for flicker-free updates.
  • pi-coding-agent: The actual CLI that ties everything together.

The philosophy across all four: “if I don’t need it, it won’t be built.”

The Contrarian Design Decisions

What makes Pi genuinely interesting isn’t the architecture — it’s the deliberate rejection of features that other agents consider essential.

A System Prompt Under 1,000 Tokens

Claude Code’s system prompt runs into the thousands of tokens. Pi’s entire system prompt plus tool definitions comes in under 1,000 tokens. The argument is that frontier models have been RL-trained extensively and inherently understand what a coding agent is. The benchmarks support this — Pi competes at the top of the Terminal-Bench leaderboard despite the minimal prompt.

Four Tools. That’s It.

Read, write, edit, bash. That’s the entire toolset. There are no dedicated search or navigation tools: the model runs grep, find, and ls through bash, and frontier models have been trained on exactly these tool schemas. Compare this to Claude Code’s extensive tool definitions or the MCP servers that dump 13,000+ tokens of tool descriptions into your context before you’ve even started working.
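To see how little context four tools actually cost, here is an illustrative sketch. The tool names come from the article, but the parameter shapes are assumptions, not Pi’s real schemas:

```typescript
// Illustrative sketch only: tool names are from the article, parameter
// shapes are invented for illustration, not Pi's actual definitions.
type ToolDef = {
  name: string;
  description: string;
  parameters: Record<string, { type: string; description: string }>;
};

const tools: ToolDef[] = [
  { name: "read",  description: "Read a file",
    parameters: { path: { type: "string", description: "File path" } } },
  { name: "write", description: "Create or overwrite a file",
    parameters: { path: { type: "string", description: "File path" },
                  content: { type: "string", description: "File content" } } },
  { name: "edit",  description: "Replace a string in a file",
    parameters: { path: { type: "string", description: "File path" },
                  oldText: { type: "string", description: "Text to replace" },
                  newText: { type: "string", description: "Replacement text" } } },
  { name: "bash",  description: "Run a shell command",
    parameters: { command: { type: "string", description: "Command to run" } } },
];

// Crude token estimate (~4 characters per token): the whole schema set
// fits in a few hundred tokens, leaving plenty of the 1,000-token budget
// for the system prompt itself.
const approxTokens = Math.ceil(JSON.stringify(tools).length / 4);
console.log(tools.map(t => t.name).join(","), approxTokens);
```

Even with verbose descriptions, four schemas like these stay an order of magnitude below what heavier harnesses inject.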

YOLO Mode — No Safety Theater

Pi runs with unrestricted filesystem and command access. No permission prompts, no pre-checking of bash commands. Zechner argues that security measures in coding agents are “mostly security theater” — as soon as an agent can write code and run code, it’s game over. The only real mitigation would be cutting network access entirely, which makes the agent useless.

No Built-in To-dos, No Plan Mode, No MCP

These are file-based problems, not tool problems. Need a to-do list? Write it to TODO.md. Need a plan? Write it to PLAN.md. This approach gives you persistence across sessions, version control, and full visibility. As for MCP — Zechner documented his reasoning extensively: popular MCP servers consume 7-9% of your context window just in tool descriptions. CLI tools with README files achieve the same result through progressive disclosure.

No Background Bash. Use Tmux.

Instead of building background process management into the agent, Pi expects you to use tmux. This gives you full observability — you can watch the agent debug a crashing program in LLDB, hop into the session yourself, and co-debug. Claude Code could do this too, but it doesn’t.

No Sub-agents

Zechner argues that sub-agents within a session are a sign of poor planning. If you need to gather context, do it in its own session first, create an artifact, and then use that artifact in a fresh session. This gives you full observability and steerability. The one exception: code review, where spawning Pi via bash with a review prompt makes sense — and gives you the full review output.

Five Months Later: Pi in 2026

Since Zechner’s article, Pi has evolved rapidly. The project now sits at v0.70.6 with over 260 releases, 3,800+ commits, and a growing contributor base. Here’s what’s changed since November 2025:

Massive Provider Expansion

Pi added support for DeepSeek (with V4 Flash/Pro models), Fireworks AI, and Cloudflare Workers AI as built-in providers. Amazon Bedrock sessions can now authenticate via bearer tokens, enabling Converse API access without local SigV4 credentials. Azure Cognitive Services endpoints are supported for OpenAI deployments. The provider ecosystem now covers virtually every major LLM provider and inference platform.

Extensions System

Perhaps the biggest architectural addition is a full extensions system. Extensions can now customize the working indicator, add autocomplete providers (stacked on top of built-in slash and path completion), control working row visibility, inject system prompt modifications, and hook into session lifecycle events. A terminate: true option on a tool result lets a custom tool end the turn immediately, without paying for an automatic follow-up LLM call.
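The extension API itself isn’t documented in the article, so the shape below is a hypothetical sketch of how the listed hook points could fit together. Every property name here is an assumption, not a confirmed Pi interface:

```typescript
// Hypothetical sketch of an extension covering the capabilities the
// article lists. None of these names are confirmed Pi APIs.
type ToolResult = { output: string; terminate?: boolean };

interface ExtensionSketch {
  name: string;
  systemPromptSuffix?: string;                  // inject system prompt modifications
  onSessionStart?: (sessionId: string) => void; // session lifecycle hook
  complete?: (input: string) => string[];       // autocomplete provider
  tools?: Record<string, (args: Record<string, string>) => ToolResult>;
}

const ext: ExtensionSketch = {
  name: "ticket-lookup",
  systemPromptSuffix: "Tickets live in tickets/*.md.",
  complete: (input) =>
    input.startsWith("/ticket") ? ["/ticket open", "/ticket close"] : [],
  tools: {
    // terminate: true ends the turn on this result, skipping the
    // automatic follow-up LLM call described above.
    doneReport: (args) => ({ output: `Done: ${args.summary}`, terminate: true }),
  },
};

console.log(ext.complete!("/ticket"));
```

The point is less the exact names than the layering: extensions compose with the built-in behavior rather than replacing it.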

SDK and Embedding

Pi is no longer just a CLI tool. The @mariozechner/pi-coding-agent package exports a full SDK with createAgentSession(), RPC client, and tool factory functions. You can build custom coding agent experiences on top of Pi’s infrastructure. The pi-mom Slack bot and pi-web-ui package demonstrate alternative frontends built on the same core.
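Only the createAgentSession export is confirmed by the article; its options and the session interface below are invented stand-ins to illustrate the embedding pattern, with a local stub in place of the real package:

```typescript
// Stand-in types: only createAgentSession's existence is confirmed by the
// article. The options and session shape are assumptions for illustration.
interface AgentSession {
  prompt(text: string): Promise<string>;
}

// Local stub standing in for:
// import { createAgentSession } from "@mariozechner/pi-coding-agent";
function createAgentSession(opts: { model: string; cwd: string }): AgentSession {
  return { prompt: async (text) => `[${opts.model}] would handle: ${text}` };
}

async function main() {
  const session = createAgentSession({ model: "example-model", cwd: "." });
  const reply = await session.prompt("List the TODO items in TODO.md");
  console.log(reply);
}

main();
```

Whatever the real signature looks like, this is the architectural shift that matters: the agent loop becomes a library you call, not a binary you wrap.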

New Models and Features

Pi added support for GPT-5.5 Codex with xhigh reasoning, configurable thinking levels, and priority-tier pricing. Session management gained /clone (duplicate active branch into new session), improved /fork with position control, and session sharing via exported HTML with full ANSI rendering. The pi-tui framework gained configurable keybindings, image width controls, and terminal progress indicators (OSC 9;4).

Community and Session Sharing

Zechner launched an initiative to share open-source coding agent sessions on Hugging Face via badlogic/pi-share-hf. The argument is compelling: real-world session data — including failures, tool use patterns, and fixes — is far more valuable for improving coding agents than synthetic benchmarks. He publishes his own pi-mono development sessions publicly.

The Benchmarks Hold Up

In the original article, Zechner submitted Terminal-Bench 2.0 results showing Pi with Claude Opus 4.5 competing against Codex, Cursor, Windsurf, and other harnesses with their native models. The results were strong enough to challenge the assumption that you need massive system prompts and extensive tooling for good performance.

Notably, Terminus 2 — the Terminal-Bench team’s own minimal agent that gives the model a raw tmux session — also held its own against agents with far more sophisticated tooling. This reinforces a point Zechner makes: models know how to code. They don’t need us to hold their hand with 10,000 tokens of instructions.

What This Means for the Ecosystem

Pi’s success challenges several assumptions that have become conventional wisdom in the coding agent space:

  • Bigger system prompts ≠ better performance. Under 1,000 tokens can match or beat 10,000+ token prompts on frontier models.
  • More tools ≠ more capable agent. Four tools (read, write, edit, bash) are sufficient for effective coding. Additional tools often add more context overhead than capability.
  • MCP is often the wrong abstraction. CLI tools with README files provide the same functionality through progressive disclosure, with no upfront context overhead.
  • Observability beats automation. Being able to see what the agent is doing — every tool call, every file read, every decision — is more valuable than automated sub-agent orchestration that hides context.
  • Stability beats features. A tool that works the same way every session, with a prompt that doesn’t change between releases, enables developers to build reliable workflows.

The Bigger Picture

Pi isn’t just a coding agent — it’s a philosophical statement about how AI developer tools should be built. The lesson isn’t that everyone should build their own agent. It’s that the industry’s trajectory toward ever-more-complex tools with ever-larger context footprints may be heading in the wrong direction.

As models get smarter, the scaffolding around them should get simpler, not more complex. Pi proves that with good context engineering, a minimal toolset, and full observability, you can match or exceed the performance of tools that are orders of magnitude more complex.

The project is MIT-licensed, actively maintained, and welcomes contributions — though Zechner is upfront about being “dictatorial” with PR reviews. If Pi doesn’t fit your needs, he genuinely encourages forking. Check it out at github.com/badlogic/pi-mono.
