AI Agents
December 23, 2025
Harnessing long-running AI agents for real-world business projects
Andrew Byrd
Anthropic

Why long‑running agents matter

  • Modern AI agents can write code, test software, and manage projects over many sessions, not just answer one‑off prompts.
  • Without structure, they lose track of what’s been done, redo work, or declare a project “finished” when it isn’t.
  • A good harness lets you trust an agent to make steady, measurable progress toward business goals instead of behaving like an unreliable freelancer.

The core idea: two agents, one process

Anthropic’s setup uses two roles inside the same system: an initializer and a coding agent.

  • The initializer runs once at the start to set up the project environment, feature list, and tracking files.
  • The coding agent then runs in repeated sessions, making small, safe improvements each time and leaving clear notes and code history for “the next shift” (see the sketch after this list).
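To make the shape of this loop concrete, here is a minimal Python sketch of a harness that runs both roles. Everything in it is an illustrative assumption rather than Anthropic’s actual code: the file names, the feature_list.json existence check, and the run_agent placeholder that would hand a prompt to a fresh agent session.

```python
from pathlib import Path

PROJECT_DIR = Path("project")  # assumed shared workspace for all sessions


def run_agent(prompt: str) -> None:
    """Placeholder: hand one prompt to a fresh agent session (e.g. via an
    agent SDK or CLI) and block until that session ends."""


def harness(max_sessions: int = 50) -> None:
    # One-time setup: if the tracking files don't exist yet, run the
    # initializer to create the feature list, progress log, and repo.
    if not (PROJECT_DIR / "feature_list.json").exists():
        run_agent(Path("initializer_prompt.md").read_text())

    # Repeated "shifts": each coding session starts fresh and relies on
    # the files and git history on disk, not on chat memory.
    for _ in range(max_sessions):
        run_agent(Path("coding_prompt.md").read_text())


if __name__ == "__main__":
    harness()
```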

How the initializer sets things up

The initializer’s job is to turn a vague idea like “clone this app” into a concrete, trackable plan.

  • It creates a detailed feature list (often hundreds of specific behaviors) and marks everything as “not done” so the agent cannot claim victory too early.
  • It also creates a progress log, a starter code repository, and an init.sh script so future sessions know how to run and test the project without guessing (see the sketch after this list).
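As a loose illustration of what the initializer leaves behind, the sketch below creates those artifacts by hand. The file names, JSON schema, and “failing” status value are assumptions for this example, not the exact format from Anthropic’s post.

```python
import json
import subprocess
from pathlib import Path


def initialize(project: Path, features: list[str]) -> None:
    """Create the tracking files once, before any coding session runs."""
    project.mkdir(parents=True, exist_ok=True)

    # Every feature starts as "failing", so no later session can declare
    # the project finished without flipping each entry through a test.
    feature_list = [
        {"id": i, "description": desc, "status": "failing"}
        for i, desc in enumerate(features)
    ]
    (project / "feature_list.json").write_text(json.dumps(feature_list, indent=2))

    # The progress log is the handoff notebook between sessions.
    (project / "progress.md").write_text("# Progress log\n")

    # init.sh records how to run and test the app so future sessions
    # don't have to guess.
    (project / "init.sh").write_text(
        "#!/bin/sh\n# install deps, start the dev server, run tests\n"
    )

    # A git repository gives every session a readable history of changes.
    subprocess.run(["git", "init"], cwd=project, check=True)
```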

How the coding agent works safely

Each time the coding agent runs, it behaves more like a disciplined engineer than a “magic box.”

  • It picks one feature at a time, reads the log and git history, updates the code, and leaves the system in a clean, working state before stopping.
  • It documents what changed, commits code with clear messages, and only flips features from “failing” to “passing” after testing them end‑to‑end, often using browser automation tools (see the sketch after this list).
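The per-session discipline can be sketched the same way. In this hypothetical snippet, implement_feature and e2e_test_passes are stand-ins for the agent’s actual code editing and browser-driven testing; the point it illustrates is that a feature’s status only flips after an end‑to‑end check succeeds.

```python
import json
from pathlib import Path


def implement_feature(feature: dict) -> None:
    """Placeholder: the agent edits code here and commits with a clear message."""


def e2e_test_passes(feature: dict) -> bool:
    """Placeholder: drive the running app end-to-end (e.g. with a browser
    automation tool) and report whether the feature really works."""
    return False


def coding_session(project: Path) -> None:
    path = project / "feature_list.json"
    features = json.loads(path.read_text())

    # Work on exactly one failing feature per session to keep changes small.
    todo = next((f for f in features if f["status"] == "failing"), None)
    if todo is None:
        return  # the feature list, not the agent's optimism, says "done"

    implement_feature(todo)

    # Only a real end-to-end pass flips "failing" to "passing".
    if e2e_test_passes(todo):
        todo["status"] = "passing"
        path.write_text(json.dumps(features, indent=2))
```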

Key business benefits

For a business owner or exec, this harness design translates into fewer surprises and more predictable results.

  • Work becomes traceable: you can see which features are done, what changed, and what failed, just like with a human dev team.
  • Risk drops because the agent is pushed to test like a real user, keep the app running, and avoid leaving half‑finished, undocumented work behind.

To dive into the technical details, examples, and failure modes Anthropic studied, read the full engineering post: “Effective harnesses for long-running agents.”
