Why We Built an Agent Team

In early 2026, we had a problem. We were running four products — CoinClaw (algorithmic crypto trading bots), SecurityClaw (penetration testing platform), AltClaw (security tools content), and BotVsBotClaw (trading bot content) — with one human. Content was falling behind. Infrastructure tasks piled up. Trading bots needed daily monitoring. Security research moved at a crawl.

The solution wasn't hiring. It was building an autonomous agent team that could operate continuously, coordinate across domains, and ship real work without waiting for human approval on every decision.

This is how ClawWorks works — the real architecture, the real numbers, and the real lessons from running 6 AI agents 24/7.

The Team: 6 Agents, 4 Products

ClawWorks has 6 agents organized in a flat hierarchy with one manager:

| Agent | Role | Level | Specialization |
|---|---|---|---|
| Morgan | SDM | SDM-6 | Team management, platform oversight, task triage |
| Riley | SDE | SDE-3 | PR review (all repos), backtesting framework |
| Pax | SDE | SDE-3 | SecurityClaw, vulnerability research |
| Sage | SDE | SDE-2 | AltClaw/BotVsBotClaw content, SEO |
| Quinn | SDE | SDE-2 | Infrastructure, backups, finance |
| Kai | SDE | SDE-3 | CoinClaw development, strategy research, live bot ops |

The role/level system isn't cosmetic. It determines what each agent can do autonomously versus what requires review. An SDE-2 self-merges documentation PRs. An SDE-3 reviews other agents' code. The SDM dispatches work sessions and resolves cross-agent blockers.

Architecture: How It Actually Works

The Heartbeat/Work-Session Split

Every agent has two invocation modes:

  1. Heartbeat (every 30 minutes, ~10 minutes each): Quick status check. The agent reads its task queue, checks for blockers, posts status updates, and decides if a dedicated work session is needed. Uses Claude Sonnet 4.6 — fast and cheap.
  2. Dedicated Work Session (on-demand, ~60 minutes each): Deep work. The agent picks the highest-priority task and executes it end-to-end. Uses Claude Opus 4.6 with 1M token context — expensive but capable of complex multi-step work.

This split is critical for cost control. Heartbeats are lightweight triage — they don't burn expensive Opus tokens on "nothing to do." Work sessions only fire when there's actual work queued. The SDM's heartbeat is the primary dispatcher: every 30 minutes, Morgan scans all agent queues and dispatches work sessions where needed.

The cron schedules are staggered so agents don't all heartbeat simultaneously:

Morgan (SDM):  0,30 * * * *    # On the hour and half-hour
Riley:         5,35 * * * *    # 5 minutes offset
Pax:           10,40 * * * *   # 10 minutes offset
Sage:          15,45 * * * *   # 15 minutes offset
Quinn:         20,50 * * * *   # 20 minutes offset
Kai:           25,55 * * * *   # 25 minutes offset

This means the team cycles through all 6 agents every 30 minutes. If Kai's trading bot hits an error at 10:02, Kai's heartbeat at 10:25 detects it, and Morgan's heartbeat at 10:30 can dispatch a work session to fix it.
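To make the stagger concrete, here is a minimal sketch (a hypothetical helper, not part of the production system) that computes when each agent next sees an event, using the minute offsets from the crontab above:

```python
# Minute-of-hour offsets taken from the staggered crontab above.
OFFSETS = {"morgan": 0, "riley": 5, "pax": 10, "sage": 15, "quinn": 20, "kai": 25}

def next_heartbeat(agent: str, minute: int) -> int:
    """Minute-of-hour of the agent's next heartbeat at or after `minute`."""
    base = OFFSETS[agent]
    # Each agent fires twice per hour: at `base` and `base + 30`.
    for slot in (base, base + 30, base + 60):
        if slot >= minute:
            return slot % 60
    return base  # not reached for 0 <= minute <= 59
```

For the example in the text: an error at 10:02 is first seen by `next_heartbeat("kai", 2)`, which returns 25, and Morgan's next pass after that lands at minute 30.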

Task Queues: Files, Not Databases

Each agent has a TASK_QUEUE.md file — a markdown file with a strict schema:

## TASK-35: AltClaw — New Article: "How We Built a 6-Agent Autonomous Dev Team"

- **Priority**: 1
- **Status**: in-progress
- **Started At**: 2026-04-11T21:15:38Z
- **Description**: Write and publish an article about the ClawWorks agent team...
- **Acceptance Criteria**:
  - Article published to bughuntertools.com
  - 3000+ words, practitioner-focused
  - Full Schema.org markup

Why markdown files instead of a database, API, or shared state store? Because agents read and write markdown natively, git provides free versioning and an audit trail, and a human can debug the entire system by reading text files.

The tradeoff is concurrency. Two agents can't safely write to the same file simultaneously. We solve this by giving each agent its own queue — the SDM writes tasks to agent queues, agents read their own queue and update status. Cross-agent communication goes through Slack.
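Because the schema is strict, reading a queue back is a few lines of parsing. A sketch, assuming the exact `## TASK-<id>:` heading and `- **Key**: value` field format shown above (the `parse_queue` helper name is hypothetical):

```python
import re

# "## TASK-<id>: <title>" starts a task entry.
TASK_HEADER = re.compile(r"^## TASK-(\d+): (.+)$")
# "- **Key**: value" fills a field on the current task.
FIELD = re.compile(r"^- \*\*(.+?)\*\*: (.+)$")

def parse_queue(text: str) -> list[dict]:
    """Parse a TASK_QUEUE.md file into a list of task dicts."""
    tasks = []
    for line in text.splitlines():
        if m := TASK_HEADER.match(line):
            tasks.append({"id": int(m.group(1)), "title": m.group(2)})
        elif (m := FIELD.match(line)) and tasks:
            tasks[-1][m.group(1).lower()] = m.group(2)
    return tasks
```

Indented sub-bullets (like acceptance criteria items) deliberately don't match the field pattern, so they stay out of the top-level task dict.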

The SDM: Orchestrator, Not Bottleneck

Morgan (the SDM) is the only agent that writes to other agents' task queues. Every 30 minutes, Morgan:

  1. Reads all 6 task queues for status
  2. Checks for blocked tasks and attempts to unblock them
  3. Triages new work from human directives or proactive identification
  4. Dispatches work sessions to agents with queued high-priority tasks
  5. Updates project trackers and team-level dashboards
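The dispatch step can be sketched as a single pass over all queues. This is a minimal illustration under assumed task shapes (`status` strings as used in the queue schema), not the actual SDM implementation:

```python
def dispatch_pass(queues: dict[str, list[dict]]) -> list[str]:
    """Return the agents that should get a work session this cycle."""
    to_dispatch = []
    for agent, queue in queues.items():
        queued = [t for t in queue if t["status"] == "queued"]
        in_progress = [t for t in queue if t["status"] == "in-progress"]
        # Dispatch only when work is waiting and the agent isn't already
        # mid-task; once dispatched, the session runs unattended.
        if queued and not in_progress:
            to_dispatch.append(agent)
    return to_dispatch
```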

The key design decision: Morgan dispatches but doesn't micromanage. Once a work session starts, the agent owns it completely. Morgan doesn't check in mid-session or approve intermediate steps. This is what makes the system autonomous rather than just automated.

Some agents have additional autonomy grants. For example, the content agent (Sage) has a standing directive to identify content gaps and publish articles without waiting for the SDM to queue individual tasks. The SEO analysis serves as the roadmap — the agent decides what to write and when.

PR Review Tiers: Graduated Trust

Not all changes carry the same risk. The PR review system has three tiers:

The WAF checklist isn't optional. A PR missing any checklist item gets REQUEST CHANGES, not approval. Even "N/A with justification" is acceptable — but silence on a pillar is not.

Slack Integration: Cross-Agent Communication

Each agent has a dedicated Slack channel (#morgan, #riley, #pax, #sage, #quinn, #kai) plus a shared #clawworks-team channel. Agents use Slack for:

Slack is the async communication layer. Task queues are the source of truth for work state. Session logs are the audit trail. Each system has one job.

Session Logging: Full Audit Trail

Every agent session produces a structured log file:

INFO 2026-04-11T22:45:33Z === Session Start | Type: dedicated_work_session | Agent: sde-sage ===
INFO -- Resuming TASK-35 (in-progress, P1): AltClaw article
INFO -- Checked article template, gathered team operational data
INFO 2026-04-11T23:30:00Z === Session End | Actions: Published article, updated tracker ===

Two types of timestamps: real timestamps (captured via date -u at session boundaries and task transitions) and sequential entries (marked with -- to indicate ordering without precise timing). This prevents agents from fabricating timestamps while still providing useful ordering information.
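A sketch of the two entry types (hypothetical helper names; the real agents capture boundary times via `date -u` as noted above):

```python
from datetime import datetime, timezone

def boundary_entry(message: str) -> str:
    """Session boundaries and task transitions get a real UTC timestamp."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"INFO {ts} {message}"

def sequential_entry(message: str) -> str:
    """Ordinary entries use '--' so ordering is recorded without
    letting the agent fabricate a precise time."""
    return f"INFO -- {message}"
```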

The team has generated 44 session logs across all agents in the first week of operation. Every command executed, every file modified, every decision made is recorded.

Progress Checkpointing: Surviving Session Death

Agent sessions can die without warning — token limits, timeouts, infrastructure issues. Without checkpointing, a session that dies at minute 58 loses nearly an hour of work and all of its context.

The solution: mandatory progress files. After each meaningful step, agents write a workspace/TASK-{ID}-progress.md file with what's done, what remains, and key findings. When a new session picks up an in-progress task, it reads the progress file first and continues from where the previous session left off.

This sounds simple. It's the single most important reliability mechanism in the system. Without it, agents would restart investigations from scratch every session, burning their tool budget on rediscovery instead of delivery.

Recurring Tasks: Independent Tracking

Some work repeats — scoreboard updates, content gap scans, backup verification. The naive approach is a permanently in-progress task. The problem: you can't tell if a recurring task is "working as designed" or stuck.

Our approach: each run of a recurring task gets its own task ID. When an agent completes a recurring task run, it marks it completed with real timestamps, then creates a new queued task with the next ID. This makes each run independently trackable. If a recurring task shows "in-progress" for hours, something is actually wrong.
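The rollover convention looks roughly like this (a sketch with assumed task-dict fields, not the production code):

```python
def complete_recurring(queue: list[dict], task_id: int,
                       completed_at: str) -> list[dict]:
    """Close one run of a recurring task and queue the next run
    under a fresh ID, so each run is independently trackable."""
    next_id = max(t["id"] for t in queue) + 1
    new_queue = []
    for task in queue:
        if task["id"] == task_id:
            # Mark this run completed with a real timestamp...
            done = {**task, "status": "completed", "completed_at": completed_at}
            # ...and immediately queue the next run.
            follow_up = {"id": next_id, "title": task["title"],
                         "status": "queued", "priority": task["priority"]}
            new_queue.extend([done, follow_up])
        else:
            new_queue.append(task)
    return new_queue
```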

The Numbers: First Week of Operation

Real operational data from ClawWorks' first week (April 5–11, 2026):

| Metric | Value |
|---|---|
| Total tasks completed | 45+ |
| Total work sessions | 44 |
| Agents | 6 |
| Articles published (AltClaw) | 32 |
| Articles published (BotVsBotClaw) | 27 |
| Heartbeat frequency | Every 30 minutes per agent |
| Average work session duration | ~60 minutes |
| Products maintained | 4 (CoinClaw, SecurityClaw, AltClaw, BotVsBotClaw) |

Task distribution by agent:

| Agent | Tasks Completed | Domain |
|---|---|---|
| Morgan (SDM) | 12 | Orchestration, triage, project tracking |
| Quinn (SDE-2) | 11 | Infrastructure, backups, finance |
| Sage (SDE-2) | 11 | Content production, SEO |
| Pax (SDE-3) | 6 | SecurityClaw, vulnerability research |
| Kai (SDE-3) | 3 | CoinClaw trading bots |
| Riley (SDE-3) | 2 | PR review, backtesting |

Riley's low task count is misleading — Riley's primary job is reviewing other agents' PRs, which doesn't show up as completed tasks in Riley's queue. Kai's count is low because trading bot tasks are complex multi-session efforts (one task can span 8+ hours of work).

What Works

1. The Heartbeat/Work-Session Split

This is the most important architectural decision. Heartbeats are cheap triage. Work sessions are expensive deep work. Without this split, you either burn expensive tokens on "nothing to do" checks or miss urgent issues because you only check hourly.

2. Per-Agent Task Queues

No shared state, no locking, no race conditions. Each agent owns its queue. The SDM is the only writer to other agents' queues, and it only writes during its own heartbeat — never concurrently with the agent's session.

3. Mandatory TDD

All new code must be written test-first. This isn't just good practice — it's essential for autonomous agents. Without TDD, an agent can write plausible-looking code that passes no tests because no tests exist. With TDD, the failing test is written first, and the agent can verify its own work.

4. Tool Budget Awareness

Agents have approximately 10 tool calls per session. This constraint forces prioritization. The explicit rule: "Every tool call spent on a rabbit hole is one fewer call for your actual deliverable." Agents are trained to check if a failure is pre-existing (also fails on main) before investigating — and if it is, to create a task for the SDM to triage rather than burning their budget on someone else's bug.

What Doesn't Work (Yet)

1. Cross-Agent Dependencies

When Agent A's task depends on Agent B's output, the latency is painful. Agent A discovers the dependency, posts to Slack, and waits. Agent B picks it up at the next heartbeat (up to 30 minutes later), then maybe dispatches a work session (another 30 minutes). A simple dependency can cost an hour of wall-clock time.

We mitigate this with UNBLOCK notifications — when an agent completes a task that unblocks another, it posts immediately so the blocked agent can pick up work at its next heartbeat instead of waiting for the SDM to notice.
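The notification itself is simple. A sketch of a payload builder for the per-agent channels named earlier (the message format is an assumption; the actual posting mechanism isn't shown):

```python
def unblock_message(completed_task: int, blocked_agent: str,
                    blocked_task: int) -> dict:
    """Build an UNBLOCK payload addressed to the blocked agent's channel."""
    return {
        # Each agent has a dedicated channel named after it, e.g. "#kai".
        "channel": f"#{blocked_agent}",
        "text": (f"UNBLOCK: TASK-{completed_task} is complete. "
                 f"TASK-{blocked_task} can proceed at your next heartbeat."),
    }
```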

2. Context Loss Between Sessions

Even with progress checkpointing, agents lose nuance between sessions. A progress file captures what was done and what remains, but not the reasoning behind decisions or the dead ends that were explored. Future sessions sometimes re-explore paths that a previous session already rejected.

3. Escalation Loops

When an agent is blocked and escalates to the SDM, the SDM creates a task for another agent. But if that agent is also blocked on something related, you get a circular dependency. We've seen cases where three agents are all waiting on each other. The SDM has to detect these loops and break them — sometimes by making a judgment call about which agent should proceed with an imperfect solution.

Lessons Learned

Files Beat Databases for Agent State

We considered SQLite, Redis, and even a simple REST API for task state. Markdown files won because: (1) agents read and write them natively, (2) git provides free versioning and audit trails, (3) humans can debug the entire system by reading text files, (4) no infrastructure to maintain.

Autonomy Requires Guardrails, Not Approval Gates

The instinct is to require human approval for everything. This kills throughput. Instead, we use graduated trust: self-merge for low-risk changes, peer review for standard changes, mandatory checklists for critical changes. The agents operate autonomously within their guardrails.

Session Duration Matters More Than You Think

60-minute work sessions hit a sweet spot. Shorter sessions (30 minutes) don't leave enough time for complex tasks after the overhead of reading context, checking progress, and planning. Longer sessions (2+ hours) risk token exhaustion and context degradation. 60 minutes gives enough time for one meaningful deliverable per session.

The Biggest Failure Mode Is Rabbit Holes

Agents don't usually write catastrophically bad code. What they do is spend their entire tool budget investigating an interesting but irrelevant problem. A test fails, the agent investigates, discovers it's a pre-existing issue on main, but has already burned 7 of 10 tool calls. The actual task gets a rushed, incomplete implementation.

The fix is explicit in the agent configuration: check if failures are pre-existing before investigating, create tasks for the SDM to triage, and move on. Prioritize delivery over curiosity.

The Stack

Should You Build an Agent Team?

If you have a single product with a small surface area, probably not. The orchestration overhead isn't worth it.

If you have multiple products, diverse task types (content, infrastructure, code, research), and a need for continuous operation — it's worth exploring. The key insight is that agent teams aren't about replacing developers. They're about maintaining velocity across a surface area that's too large for one person to cover.

ClawWorks maintains 4 products, publishes 59 articles across 2 sites, manages AWS infrastructure, runs live trading bots, and conducts security research — all with one human providing strategic direction and 6 agents executing continuously.

The architecture is simple. The hard part is getting the guardrails right.
