
Why Your AI Agents Need a Program Manager

February 8, 2026 by Asif Waliuddin


Multi-agent isn't the problem. Multi-tool is.

Inside a single environment like Claude Code, agents coordinate fine. They share orchestration and context. Run 20 subagents and they stay aligned because the tool manages that coordination for you.

The trouble starts when you run multiple AI tools on the same repo. Claude Code for refactors. Codex CLI for tests. Gemini CLI for research. Each tool operating with its own memory, its own assumptions, and no shared state between them.

I ran Claude Code and Codex CLI on the same codebase. Claude refactored a module. Codex updated tests for that same module against the pre-refactor interface. Both saved their changes. The tests failed. Neither tool knew the other existed.

I'd spent 23 years watching this exact scenario play out with human teams. The tools were individually capable. The coordination was nonexistent.

The Failure Modes Are Predictable

The first thing that breaks is files. Two tools edit the same file at the same time. One saves. The other saves over it. The first edit is gone. No locks, no warnings, no conflict detection. Neither tool knows the other exists.

Then knowledge starts evaporating. You make an architectural decision during a morning session in Claude Code. Afternoon session in Codex CLI starts fresh. "What approach are we using for authentication?" You explain it again. Tomorrow, a third time in Gemini CLI. Every tool starts from zero because there is no shared memory between them. Same cognitive overload that enterprise teams fight with onboarding documents and wikis. Just renamed to "context rot."

Visibility goes next. Three terminals, three tools, no dashboard. Which one finished? Which one is stuck? Did Codex start tests before Claude finished refactoring? You don't know unless you check each terminal manually. You become the message bus between your own tools.

Governance is the quiet failure. Quality checks happen when you remember to run them. Code style is enforced at PR time, after the work is done. Security scanning is a separate step someone has to trigger. The gap between "work being done" and "work being checked" is where problems grow.

And context is wasted from the start. Each tool begins fresh. There is no shared memory of past decisions across tools. No shared understanding of conventions. No shared task list. Every tool is an individual contributor with amnesia.

These failures compound. A tool drifts from the spec because it doesn't know the spec exists in another tool's session. It edits a file another tool is working on. Nobody catches the conflict until the PR review. The decision to revert is made but never recorded. Next week, someone asks why that approach was abandoned, and nobody remembers.

The Structural Isomorphism

These five failure modes aren't new. They're the exact problems that enterprise program management solves for human teams.

| Human Team Failure | Multi-Tool AI Failure |
| --- | --- |
| Developers overwrite each other's work | Tools edit the same file simultaneously |
| Knowledge gets lost in handoffs | Decisions disappear between tools |
| No visibility into cross-team progress | No dashboard across tools |
| Governance enforced at audit time | Quality checked at PR time |
| Context stays siloed | Each tool starts from scratch |

File locking is resource allocation. Knowledge capture is institutional memory. Visibility is program status reporting. Governance is quality gates. Shared context is the program plan.

The patterns that keep hundred-person programs from imploding (dependency tracking, knowledge management, cross-team coordination, governance enforcement) are exactly what multi-tool AI development needs. Remove the word "human" from a PM job description and it reads as a spec for multi-tool AI coordination.

This isn't an analogy stretched to fit. It's an isomorphism. The failure modes are structurally identical. The solutions should be too.

What We Built

Forge is a program manager for your AI tools. A coordination layer that provides the same functions a human PM provides to a human team.

File locking prevents tools from editing the same file simultaneously. A Rust-level lock manager tracks which tool holds which files. Others queue. File locking exists because I've watched teams lose days to conflicting edits.
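The hold-and-queue behavior described above can be sketched in a few lines of Rust. This is a hypothetical illustration, not Forge's actual implementation; the `LockManager` type and its methods are invented for the example:

```rust
use std::collections::{HashMap, VecDeque};

/// Sketch of a per-file lock manager: one holder per path,
/// later requesters queue until the holder releases.
struct LockManager {
    holders: HashMap<String, String>,           // path -> tool currently holding it
    waiters: HashMap<String, VecDeque<String>>, // path -> tools queued behind the holder
}

impl LockManager {
    fn new() -> Self {
        LockManager { holders: HashMap::new(), waiters: HashMap::new() }
    }

    /// Returns true if `tool` acquired the lock on `path`; otherwise queues it.
    fn acquire(&mut self, path: &str, tool: &str) -> bool {
        match self.holders.get(path) {
            None => {
                self.holders.insert(path.to_string(), tool.to_string());
                true
            }
            Some(holder) if holder.as_str() == tool => true, // re-entrant for the same tool
            Some(_) => {
                self.waiters
                    .entry(path.to_string())
                    .or_default()
                    .push_back(tool.to_string());
                false
            }
        }
    }

    /// Releases `path` and hands the lock to the next queued tool, if any.
    fn release(&mut self, path: &str) -> Option<String> {
        self.holders.remove(path);
        let next = self.waiters.get_mut(path)?.pop_front()?;
        self.holders.insert(path.to_string(), next.clone());
        Some(next)
    }
}
```

The key design point is that the second tool's edit is deferred, not silently merged: it waits its turn instead of overwriting work in flight.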

A knowledge flywheel captures every decision, pattern, and learning from every tool session. Auto-classified. Searchable across sessions and across tools. Decisions made during a Claude Code session are available when Codex CLI starts the next day. After a week, starting a new session feels different. The context follows you. The knowledge flywheel exists because I've watched decisions evaporate between sprints.
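The cross-tool recall this describes boils down to one shared store that every session writes to and reads from. A minimal sketch, with invented names (`Knowledge`, `Entry`) and simple substring search standing in for the real classification and retrieval:

```rust
/// Sketch of cross-tool decision capture: every session appends
/// entries to one shared store, and any tool can search them later.
struct Entry {
    tool: String, // which tool's session recorded this
    kind: String, // e.g. "decision", "pattern", "learning"
    text: String,
}

struct Knowledge {
    entries: Vec<Entry>,
}

impl Knowledge {
    fn new() -> Self {
        Knowledge { entries: Vec::new() }
    }

    fn record(&mut self, tool: &str, kind: &str, text: &str) {
        self.entries.push(Entry {
            tool: tool.to_string(),
            kind: kind.to_string(),
            text: text.to_string(),
        });
    }

    /// Case-insensitive substring search across all sessions and tools.
    fn search(&self, query: &str) -> Vec<&Entry> {
        let q = query.to_lowercase();
        self.entries
            .iter()
            .filter(|e| e.text.to_lowercase().contains(&q))
            .collect()
    }
}
```

The point of the shape: a decision recorded under one tool's name is returned to any tool that asks, which is exactly what kills the "explain authentication a third time" loop.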

Governance runs continuously. Health scoring, drift detection, quality gates, and security scanning happen in real-time. Not at PR time. Not when you remember to check. Governance hooks exist because I've watched audit findings pile up from shortcuts nobody caught in real time.

A task board with dependency tracking shows which tool is doing what, which tasks are blocked, and which are available. Tools check the board instead of waiting for instructions.
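"Available" here has a precise meaning: every dependency is done and no tool has claimed the task. A rough sketch of that check, with a hypothetical `Board` type invented for illustration:

```rust
use std::collections::{HashMap, HashSet};

/// Sketch of a dependency-aware task board: a task is "available"
/// once all its dependencies are done and nobody has claimed it.
struct Board {
    deps: HashMap<&'static str, Vec<&'static str>>, // task -> tasks it depends on
    done: HashSet<&'static str>,
    claimed: HashMap<&'static str, &'static str>,   // task -> tool working on it
}

impl Board {
    fn available(&self) -> Vec<&'static str> {
        self.deps
            .iter()
            .filter(|(task, reqs)| {
                !self.done.contains(*task)
                    && !self.claimed.contains_key(*task)
                    && reqs.iter().all(|r| self.done.contains(r))
            })
            .map(|(task, _)| *task)
            .collect()
    }
}
```

With a board like this, Codex can't start `update-tests` until `refactor-auth` is marked done, which is exactly the conflict from the opening anecdote.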

Three AI tools coordinated through a single state directory and MCP protocol: Claude Code via MCP stdio, Codex CLI and Gemini CLI via filesystem conventions. Each tool reads its native config format. No wrappers.

The plugin installs in 30 seconds. The Rust orchestrator adds file locking and knowledge capture: 4MB binary, 292 tests, zero runtime dependencies. The visual dashboard adds real-time governance and the Infinity Terminal, where sessions survive browser close, network drops, and server restarts.

4,434 tests across the platform. 31/31 launch gates passing.

The Bottom Line

The solution isn't better agents inside a single tool. Agents within Claude Code already coordinate fine. The solution is coordination infrastructure across tools.

The patterns exist. Program management has been solving this for decades. Someone just needed to apply them to multi-tool AI development.

I spent 23 years doing this with humans first.

Launching March 2.

GitHub · forge.nxtg.ai · Follow the build on X