Why long‑running agents matter
- Modern AI agents can write code, test software, and manage projects over many sessions, not just answer one‑off prompts.
- Without structure, they lose track of what’s been done, redo work, or declare a project “finished” when it isn’t.
- A good harness, meaning the structure of prompts, tracking files, and tools wrapped around the agent, lets you trust it to make steady, measurable progress toward business goals instead of behaving like an unreliable freelancer.
The core idea: two agents, one process
Anthropic’s setup uses two roles inside the same system: an initializer and a coding agent.
- The initializer runs once at the start to set up the project environment, feature list, and tracking files.
- The coding agent then runs in repeated sessions, making small, safe improvements each time and leaving clear notes and code history for “the next shift.”
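As a rough illustration of the two-role split, a harness could decide which role to run by checking whether the setup files already exist. This is only a sketch under stated assumptions: the function name choose_role and the marker file name feature_list.json are hypothetical and are not taken from Anthropic's post.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def choose_role(project_dir: Path, marker_name: str = "feature_list.json") -> str:
    """Return which role should run next.

    Assumption: the initializer leaves a marker file behind, so its
    absence means the project has not been set up yet.
    """
    if (project_dir / marker_name).exists():
        return "coding"
    return "initializer"


with TemporaryDirectory() as tmp:
    project = Path(tmp)
    print(choose_role(project))  # no marker yet, so the initializer runs
    (project / "feature_list.json").write_text("[]")  # stand-in for real setup
    print(choose_role(project))  # marker exists, so coding sessions take over
```

Keying the decision to a file on disk, rather than to anything the agent remembers, is what would let every later session start cold and still land in the right role.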
How the initializer sets things up
The initializer’s job is to turn a vague idea like “clone this app” into a concrete, trackable plan.
- It creates a detailed feature list (often hundreds of specific behaviors) and marks every item as “failing” (not done) from the start, so the agent cannot claim victory too early.
- It also creates a progress log, a starter code repository, and an init.sh script so future sessions know how to run and test the project without guessing.
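To make the setup step concrete, here is a minimal sketch of what writing the tracking files might look like. The file names (feature_list.json, progress.txt), the field names (id, description, passes), and the helper functions are all assumptions made for illustration; the post's actual formats may differ.

```python
import json
from pathlib import Path
from tempfile import TemporaryDirectory


def init_tracking(project_dir: Path, behaviors: list[str]) -> Path:
    """Write a feature list in which every entry starts as not passing."""
    features = [
        {"id": i, "description": text, "passes": False}
        for i, text in enumerate(behaviors, start=1)
    ]
    feature_file = project_dir / "feature_list.json"  # hypothetical file name
    feature_file.write_text(json.dumps(features, indent=2))
    # Start an empty progress log for later sessions to append to.
    (project_dir / "progress.txt").write_text("")  # hypothetical file name
    return feature_file


def remaining(feature_file: Path) -> int:
    """Count the features that have not yet been marked as passing."""
    features = json.loads(feature_file.read_text())
    return sum(1 for f in features if not f["passes"])


with TemporaryDirectory() as tmp:
    path = init_tracking(Path(tmp), ["User can log in", "User can send a message"])
    print(remaining(path))  # prints 2: nothing counts as done at the start
```

Because every feature begins as not passing, "how much is left" becomes a number anyone can read off the file rather than a claim the agent makes.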
How the coding agent works safely
Each time the coding agent runs, it behaves more like a disciplined engineer than a “magic box.”
- It first reads the progress log and git history to see where things stand, then picks one feature to work on, updates the code, and leaves the system in a clean, working state before stopping.
- It documents what changed, commits code with clear messages, and only flips features from “failing” to “passing” after testing them end‑to‑end, often using browser automation tools.
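The session discipline described above can be sketched as a small loop body. This is an illustrative sketch, not Anthropic's implementation: the Feature class, run_session, and the implement and verify callbacks are hypothetical names, and the real agent would also commit to git and run end-to-end tests where the stubs appear here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Feature:
    description: str
    passes: bool = False


def run_session(
    features: list[Feature],
    implement: Callable[[Feature], None],
    verify: Callable[[Feature], bool],
    log: list[str],
) -> Optional[Feature]:
    """Run one session: pick a single failing feature, work on it, record the result."""
    target = next((f for f in features if not f.passes), None)
    if target is None:
        log.append("All features pass; nothing to do.")
        return None
    implement(target)  # stand-in for the agent editing code and committing
    if verify(target):  # stand-in for an end-to-end test
        target.passes = True  # flipped only after verification succeeds
        log.append(f"PASS: {target.description}")
    else:
        log.append(f"STILL FAILING: {target.description}")
    return target


features = [Feature("User can log in"), Feature("User can send a message")]
log: list[str] = []
run_session(features, implement=lambda f: None, verify=lambda f: True, log=log)
run_session(features, implement=lambda f: None, verify=lambda f: False, log=log)
print(log)  # ['PASS: User can log in', 'STILL FAILING: User can send a message']
```

The key design point is that the status flag changes in exactly one place, after verification, so a feature cannot be marked as passing just because code was written for it.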
Key business benefits
For a business owner or exec, this harness design translates into fewer surprises and more predictable results.
- Work becomes traceable: you can see which features are done, what changed, and what failed, just like with a human dev team.
- Risk drops because the agent is pushed to test like a real user, keep the app running, and avoid leaving half‑finished, undocumented work behind.
To dive into the technical details, examples, and failure modes Anthropic studied, read the full engineering post, “Effective harnesses for long-running agents.”