Why We Built an Agent Team

In early 2026, we had a problem. We were running four products — CoinClaw (algorithmic crypto trading bots), SecurityClaw (penetration testing platform), AltClaw (security tools content), and BotVsBotClaw (trading bot content) — with one human. Content was falling behind. Infrastructure tasks piled up. Trading bots needed daily monitoring. Security research moved at a crawl.

The solution wasn't hiring. It was building an autonomous agent team that could operate continuously, coordinate across domains, and ship real work without waiting for human approval on every decision.

This is how ClawWorks works — the real architecture, the real numbers, and the real lessons from running 6 AI agents 24/7.

The Team: 6 Agents, 4 Products

ClawWorks has 6 agents organized in a flat hierarchy with one manager:

| Agent | Role | Level | Specialization |
|---|---|---|---|
| Morgan | SDM | SDM-6 | Team management, platform oversight, task triage |
| Riley | SDE | SDE-3 | PR review (all repos), backtesting framework |
| Pax | SDE | SDE-3 | SecurityClaw, vulnerability research |
| Sage | SDE | SDE-2 | AltClaw/BotVsBotClaw content, SEO |
| Quinn | SDE | SDE-2 | Infrastructure, backups, finance |
| Kai | SDE | SDE-3 | CoinClaw development, strategy research, live bot ops |

The role/level system isn't cosmetic. It determines what each agent can do autonomously versus what requires review. An SDE-2 self-merges documentation PRs. An SDE-3 reviews other agents' code. The SDM dispatches work sessions and resolves cross-agent blockers.

Architecture: How It Actually Works

The Heartbeat/Work-Session Split

Every agent has two invocation modes:

  1. Heartbeat (every 30 minutes, ~10 minutes each): Quick status check. The agent reads its task queue, checks for blockers, posts status updates, and decides if a dedicated work session is needed. Uses Claude Sonnet 4.6 — fast and cheap.
  2. Dedicated Work Session (on-demand, ~60 minutes each): Deep work. The agent picks the highest-priority task and executes it end-to-end. Uses Claude Opus 4.6 with 1M token context — expensive but capable of complex multi-step work.

This split is critical for cost control. Heartbeats are lightweight triage — they don't burn expensive Opus tokens on "nothing to do." Work sessions only fire when there's actual work queued. The SDM's heartbeat is the primary dispatcher: every 30 minutes, Morgan scans all agent queues and dispatches work sessions where needed.

The cron schedules are staggered so agents don't all heartbeat simultaneously:

Morgan (SDM):  0,30 * * * *    # On the hour and half-hour
Riley:         5,35 * * * *    # 5 minutes offset
Pax:           10,40 * * * *   # 10 minutes offset
Sage:          15,45 * * * *   # 15 minutes offset
Quinn:         20,50 * * * *   # 20 minutes offset
Kai:           25,55 * * * *   # 25 minutes offset

This means the team cycles through all 6 agents every 30 minutes. If Kai's trading bot hits an error at 10:02, Kai's heartbeat at 10:25 detects it, and Morgan's heartbeat at 10:30 can dispatch a work session to fix it.
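To make the stagger concrete, here is a minimal sketch (a hypothetical helper, not part of the production system) that computes when each agent next sees an event, using the minute offsets from the crontab above:

```python
# Minute-of-hour offsets taken from the staggered crontab above.
OFFSETS = {"morgan": 0, "riley": 5, "pax": 10, "sage": 15, "quinn": 20, "kai": 25}

def next_heartbeat(agent: str, minute: int) -> int:
    """Minute-of-hour of the agent's next heartbeat at or after `minute`."""
    base = OFFSETS[agent]
    # Each agent fires twice per hour: at `base` and `base + 30`.
    for slot in (base, base + 30, base + 60):
        if slot >= minute:
            return slot % 60
    return base  # not reached for 0 <= minute <= 59
```

For the example in the text: an error at 10:02 is first seen by `next_heartbeat("kai", 2)`, which returns 25, and Morgan's next pass after that lands at minute 30.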

Task Queues: Files, Not Databases

Each agent has a TASK_QUEUE.md file — a markdown file with a strict schema:

## TASK-35: AltClaw — New Article: "How We Built a 6-Agent Autonomous Dev Team"

- **Priority**: 1
- **Status**: in-progress
- **Started At**: 2026-04-11T21:15:38Z
- **Description**: Write and publish an article about the ClawWorks agent team...
- **Acceptance Criteria**:
  - Article published to bughuntertools.com
  - 3000+ words, practitioner-focused
  - Full Schema.org markup

Why markdown files instead of a database, API, or shared state store? Because agents read and write markdown natively, git provides free versioning and an audit trail, and a human can debug the entire system by reading text files.

The tradeoff is concurrency. Two agents can't safely write to the same file simultaneously. We solve this by giving each agent its own queue — the SDM writes tasks to agent queues, agents read their own queue and update status. Cross-agent communication goes through Slack.
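Because the schema is strict, reading a queue back is a few lines of parsing. A sketch, assuming the exact `## TASK-<id>:` heading and `- **Key**: value` field format shown above (the `parse_queue` helper name is hypothetical):

```python
import re

# "## TASK-<id>: <title>" starts a task entry.
TASK_HEADER = re.compile(r"^## TASK-(\d+): (.+)$")
# "- **Key**: value" fills a field on the current task.
FIELD = re.compile(r"^- \*\*(.+?)\*\*: (.+)$")

def parse_queue(text: str) -> list[dict]:
    """Parse a TASK_QUEUE.md file into a list of task dicts."""
    tasks = []
    for line in text.splitlines():
        if m := TASK_HEADER.match(line):
            tasks.append({"id": int(m.group(1)), "title": m.group(2)})
        elif (m := FIELD.match(line)) and tasks:
            tasks[-1][m.group(1).lower()] = m.group(2)
    return tasks
```

Indented sub-bullets (like acceptance criteria items) deliberately don't match the field pattern, so they stay out of the top-level task dict.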

The SDM: Orchestrator, Not Bottleneck

Morgan (the SDM) is the only agent that writes to other agents' task queues. Every 30 minutes, Morgan:

  1. Reads all 6 task queues for status
  2. Checks for blocked tasks and attempts to unblock them
  3. Triages new work from human directives or proactive identification
  4. Dispatches work sessions to agents with queued high-priority tasks
  5. Updates project trackers and team-level dashboards
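The dispatch step can be sketched as a single pass over all queues. This is a minimal illustration under assumed task shapes (`status` strings as used in the queue schema), not the actual SDM implementation:

```python
def dispatch_pass(queues: dict[str, list[dict]]) -> list[str]:
    """Return the agents that should get a work session this cycle."""
    to_dispatch = []
    for agent, queue in queues.items():
        queued = [t for t in queue if t["status"] == "queued"]
        in_progress = [t for t in queue if t["status"] == "in-progress"]
        # Dispatch only when work is waiting and the agent isn't already
        # mid-task; once dispatched, the session runs unattended.
        if queued and not in_progress:
            to_dispatch.append(agent)
    return to_dispatch
```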

The key design decision: Morgan dispatches but doesn't micromanage. Once a work session starts, the agent owns it completely. Morgan doesn't check in mid-session or approve intermediate steps. This is what makes the system autonomous rather than just automated.

Some agents have additional autonomy grants. For example, the content agent (Sage) has a standing directive to identify content gaps and publish articles without waiting for the SDM to queue individual tasks. The SEO analysis serves as the roadmap — the agent decides what to write and when.

PR Review Tiers: Graduated Trust

Not all changes carry the same risk. The PR review system has three tiers:

The WAF checklist isn't optional. A PR missing any checklist item gets REQUEST CHANGES, not approval. Even "N/A with justification" is acceptable — but silence on a pillar is not.

Slack Integration: Cross-Agent Communication

Each agent has a dedicated Slack channel (#morgan, #riley, #pax, #sage, #quinn, #kai) plus a shared #clawworks-team channel. Agents use Slack for:

Slack is the async communication layer. Task queues are the source of truth for work state. Session logs are the audit trail. Each system has one job.

Session Logging: Full Audit Trail

Every agent session produces a structured log file:

INFO 2026-04-11T22:45:33Z === Session Start | Type: dedicated_work_session | Agent: sde-sage ===
INFO -- Resuming TASK-35 (in-progress, P1): AltClaw article
INFO -- Checked article template, gathered team operational data
INFO 2026-04-11T23:30:00Z === Session End | Actions: Published article, updated tracker ===

Two types of timestamps: real timestamps (captured via date -u at session boundaries and task transitions) and sequential entries (marked with -- to indicate ordering without precise timing). This prevents agents from fabricating timestamps while still providing useful ordering information.
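A sketch of the two entry types (hypothetical helper names; the real agents capture boundary times via `date -u` as noted above):

```python
from datetime import datetime, timezone

def boundary_entry(message: str) -> str:
    """Session boundaries and task transitions get a real UTC timestamp."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"INFO {ts} {message}"

def sequential_entry(message: str) -> str:
    """Ordinary entries use '--' so ordering is recorded without
    letting the agent fabricate a precise time."""
    return f"INFO -- {message}"
```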

The team has generated 44 session logs across all agents in the first week of operation. Every command executed, every file modified, every decision made is recorded.

Progress Checkpointing: Surviving Session Death

Agent sessions can die without warning — token limits, timeouts, infrastructure issues. Without checkpointing, a session that dies at minute 58 loses nearly an hour of work and all of its context.

The solution: mandatory progress files. After each meaningful step, agents write a workspace/TASK-{ID}-progress.md file with what's done, what remains, and key findings. When a new session picks up an in-progress task, it reads the progress file first and continues from where the previous session left off.

This sounds simple. It's the single most important reliability mechanism in the system. Without it, agents would restart investigations from scratch every session, burning their tool budget on rediscovery instead of delivery.

Recurring Tasks: Independent Tracking

Some work repeats — scoreboard updates, content gap scans, backup verification. The naive approach is a permanently in-progress task. The problem: you can't tell if a recurring task is "working as designed" or stuck.

Our approach: each run of a recurring task gets its own task ID. When an agent completes a recurring task run, it marks it completed with real timestamps, then creates a new queued task with the next ID. This makes each run independently trackable. If a recurring task shows "in-progress" for hours, something is actually wrong.
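The rollover convention looks roughly like this (a sketch with assumed task-dict fields, not the production code):

```python
def complete_recurring(queue: list[dict], task_id: int,
                       completed_at: str) -> list[dict]:
    """Close one run of a recurring task and queue the next run
    under a fresh ID, so each run is independently trackable."""
    next_id = max(t["id"] for t in queue) + 1
    new_queue = []
    for task in queue:
        if task["id"] == task_id:
            # Mark this run completed with a real timestamp...
            done = {**task, "status": "completed", "completed_at": completed_at}
            # ...and immediately queue the next run.
            follow_up = {"id": next_id, "title": task["title"],
                         "status": "queued", "priority": task["priority"]}
            new_queue.extend([done, follow_up])
        else:
            new_queue.append(task)
    return new_queue
```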

The Numbers: First Week of Operation

Real operational data from ClawWorks' first week (April 5–11, 2026):

| Metric | Value |
|---|---|
| Total tasks completed | 45+ |
| Total work sessions | 44 |
| Agents | 6 |
| Articles published (AltClaw) | 32 |
| Articles published (BotVsBotClaw) | 27 |
| Heartbeat frequency | Every 30 minutes per agent |
| Average work session duration | ~60 minutes |
| Products maintained | 4 (CoinClaw, SecurityClaw, AltClaw, BotVsBotClaw) |

Task distribution by agent:

| Agent | Tasks Completed | Domain |
|---|---|---|
| Morgan (SDM) | 12 | Orchestration, triage, project tracking |
| Quinn (SDE-2) | 11 | Infrastructure, backups, finance |
| Sage (SDE-2) | 11 | Content production, SEO |
| Pax (SDE-3) | 6 | SecurityClaw, vulnerability research |
| Kai (SDE-3) | 3 | CoinClaw trading bots |
| Riley (SDE-3) | 2 | PR review, backtesting |

Riley's low task count is misleading — Riley's primary job is reviewing other agents' PRs, which doesn't show up as completed tasks in Riley's queue. Kai's count is low because trading bot tasks are complex multi-session efforts (one task can span 8+ hours of work).

What Works

1. The Heartbeat/Work-Session Split

This is the most important architectural decision. Heartbeats are cheap triage. Work sessions are expensive deep work. Without this split, you either burn expensive tokens on "nothing to do" checks or miss urgent issues because you only check hourly.

2. Per-Agent Task Queues

No shared state, no locking, no race conditions. Each agent owns its queue. The SDM is the only writer to other agents' queues, and it only writes during its own heartbeat — never concurrently with the agent's session.

3. Mandatory TDD

All new code must be written test-first. This isn't just good practice — it's essential for autonomous agents. Without TDD, an agent can write plausible-looking code that passes no tests because no tests exist. With TDD, the failing test is written first, and the agent can verify its own work.

4. Tool Budget Awareness

Agents have approximately 10 tool calls per session. This constraint forces prioritization. The explicit rule: "Every tool call spent on a rabbit hole is one fewer call for your actual deliverable." Agents are trained to check if a failure is pre-existing (also fails on main) before investigating — and if it is, to create a task for the SDM to triage rather than burning their budget on someone else's bug.

What Doesn't Work (Yet)

1. Cross-Agent Dependencies

When Agent A's task depends on Agent B's output, the latency is painful. Agent A discovers the dependency, posts to Slack, and waits. Agent B picks it up at the next heartbeat (up to 30 minutes later), then maybe dispatches a work session (another 30 minutes). A simple dependency can cost an hour of wall-clock time.

We mitigate this with UNBLOCK notifications — when an agent completes a task that unblocks another, it posts immediately so the blocked agent can pick up work at its next heartbeat instead of waiting for the SDM to notice.
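The notification itself is simple. A sketch of a payload builder for the per-agent channels named earlier (the message format is an assumption; the actual posting mechanism isn't shown):

```python
def unblock_message(completed_task: int, blocked_agent: str,
                    blocked_task: int) -> dict:
    """Build an UNBLOCK payload addressed to the blocked agent's channel."""
    return {
        # Each agent has a dedicated channel named after it, e.g. "#kai".
        "channel": f"#{blocked_agent}",
        "text": (f"UNBLOCK: TASK-{completed_task} is complete. "
                 f"TASK-{blocked_task} can proceed at your next heartbeat."),
    }
```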

2. Context Loss Between Sessions

Even with progress checkpointing, agents lose nuance between sessions. A progress file captures what was done and what remains, but not the reasoning behind decisions or the dead ends that were explored. Future sessions sometimes re-explore paths that a previous session already rejected.

3. Escalation Loops

When an agent is blocked and escalates to the SDM, the SDM creates a task for another agent. But if that agent is also blocked on something related, you get a circular dependency. We've seen cases where three agents are all waiting on each other. The SDM has to detect these loops and break them — sometimes by making a judgment call about which agent should proceed with an imperfect solution.

Lessons Learned

Files Beat Databases for Agent State

We considered SQLite, Redis, and even a simple REST API for task state. Markdown files won because: (1) agents read and write them natively, (2) git provides free versioning and audit trails, (3) humans can debug the entire system by reading text files, (4) no infrastructure to maintain.

Autonomy Requires Guardrails, Not Approval Gates

The instinct is to require human approval for everything. This kills throughput. Instead, we use graduated trust: self-merge for low-risk changes, peer review for standard changes, mandatory checklists for critical changes. The agents operate autonomously within their guardrails.

Session Duration Matters More Than You Think

60-minute work sessions hit a sweet spot. Shorter sessions (30 minutes) don't leave enough time for complex tasks after the overhead of reading context, checking progress, and planning. Longer sessions (2+ hours) risk token exhaustion and context degradation. 60 minutes gives enough time for one meaningful deliverable per session.

The Biggest Failure Mode Is Rabbit Holes

Agents don't usually write catastrophically bad code. What they do is spend their entire tool budget investigating an interesting but irrelevant problem. A test fails, the agent investigates, discovers it's a pre-existing issue on main, but has already burned 7 of 10 tool calls. The actual task gets a rushed, incomplete implementation.

The fix is explicit in the agent configuration: check if failures are pre-existing before investigating, create tasks for the SDM to triage, and move on. Prioritize delivery over curiosity.

The Stack

Should You Build an Agent Team?

If you have a single product with a small surface area, probably not. The orchestration overhead isn't worth it.

If you have multiple products, diverse task types (content, infrastructure, code, research), and a need for continuous operation — it's worth exploring. The key insight is that agent teams aren't about replacing developers. They're about maintaining velocity across a surface area that's too large for one person to cover.

ClawWorks maintains 4 products, publishes 59 articles across 2 sites, manages AWS infrastructure, runs live trading bots, and conducts security research — all with one human providing strategic direction and 6 agents executing continuously.

The architecture is simple. The hard part is getting the guardrails right.
