557 Milliseconds to CRITICAL
SecurityClaw's AI Campaign Engine Finds AWS Keys, Confirms IDOR, and Plans Its Next Move
Here's what most security scanners do: run a tool, print output, wait for a human to decide what to do next. Here's what SecurityClaw Phase B does: run a tool, read the output, decide what to run next, run that too, chain the findings, flag what needs human approval, and enforce its own scope rules — all automatically. We built four AI modules and a set of deliberately vulnerable targets to test them against, then ran the whole thing live. 557 milliseconds. 14 combined findings. 4/4 PASS. Here's exactly what happened — including the two secrets the AI missed.
1. What Phase B Actually Shipped
Phase A gave SecurityClaw the ability to run individual tools. Phase B gives it the ability to think about which tools to run next.
Four modules shipped together, and they are designed to work as a system rather than independently:
| Module | Time | What It Does | Live Network? |
|---|---|---|---|
| B1 — Adaptive Campaign Graph | 4ms | Chains tools automatically based on what prior tools found. One finding triggers the next test without human decision. | No (mock executor) |
| B2 — JS Bundle Analyser | 401ms | Crawls a web app's JavaScript bundles and uses AI analysis to find secrets, exposed endpoints, and auth patterns. | ✅ Yes (real HTTP) |
| B3 — IDOR Scanner | 149ms | Probes API endpoints with two different auth tokens and diffs the responses to confirm cross-account data access. | ✅ Yes (real HTTP) |
| B4 — Campaign Director | 3ms | Uses AI to generate a multi-phase campaign plan: right tools, right order, approval gates on destructive tools, scope enforcement. | No (mock Bedrock) |
Total: 557ms. 4/4 PASS. 14 combined findings.
Two of the four modules (B2 and B3) made real HTTP requests against live demo servers. The other two (B1 and B4) validated logic and plumbing against controlled inputs. That distinction matters — and we'll be specific about it throughout.
2. The Targets: What We Planted and Why
We built two controlled Flask demo applications specifically for Phase B validation. Each was seeded with specific vulnerabilities that a real security campaign might encounter.
JS Bundle Target (port 5043)
A React-style single-page application with five planted findings spread across its JavaScript bundles:
- An AWS Access Key ID (`AKIA` pattern) embedded in `main.abc123.js`
- A client ID passed in an Authorization header — a risky API authentication pattern
- An internal admin endpoint (`/internal/admin/`) hardcoded in the bundle
- A debug configuration endpoint (`/debug/config`) exposed in the bundle
- A JWT secret embedded in a client-side config object
All five are real-world patterns we've seen in actual bug bounty targets. JavaScript bundles routinely ship secrets because bundlers faithfully include everything developers put in their source files — minification obscures variable names, but it preserves string values. The AKIA prefix of an AWS key survives minification intact.
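For contrast with the AI approach, here is what a pattern-only detector looks like: a minimal, illustrative regex check for the `AKIA`-prefixed key format (a sketch, not SecurityClaw's actual detector):

```python
import re

# AWS long-term access key IDs are 20 characters: the AKIA prefix plus
# 16 uppercase alphanumerics. Minification preserves string literals,
# so the pattern survives in shipped bundles.
# Illustrative sketch; not SecurityClaw internals.
AWS_KEY_RE = re.compile(r"\b(?:AKIA)[0-9A-Z]{16}\b")

def find_aws_keys(bundle_text: str) -> list[str]:
    """Return every AKIA-style key literal found in a JS bundle."""
    return AWS_KEY_RE.findall(bundle_text)
```

A regex like this catches the canonical case but misses base64-encoded or concatenated keys, which is the gap B2's context-aware analysis is meant to close.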
IDOR Target (port 5044)
A Flask API with three endpoints and deliberately uneven access control:
- `/api/users/{id}` — intentionally broken. Any authenticated user can read any profile, regardless of ownership.
- `/api/orders/{id}` — correctly access-controlled. Only the order owner gets a 200 response.
- `/api/cards/{id}` — BOLA pattern (a client_id header alone allows enumeration). Included in the app, not in this scan's probe list.
The users endpoint vulnerability is about as textbook as IDOR gets. Before the automated scan ran, manual verification confirmed it: User B (token-b, user ID 43) made a request to /api/users/42 and received Alice's complete profile — including her email address alice@demo.test. The orders endpoint correctly rejected the same cross-user access attempt. We wanted to see whether the scanner would catch both the vulnerability and the correct control.
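The contrast between the two endpoints comes down to one missing ownership check. Here is a minimal pure-Python reconstruction; the actual demo app is a Flask server, and the token and user tables below are illustrative assumptions, not the demo's real code:

```python
# Illustrative reconstruction of the demo target's access-control logic.
# TOKENS, USERS, ORDERS are assumed shapes, not the actual demo data.
TOKENS = {"token-a": 42, "token-b": 43}            # auth token -> user id
USERS = {42: {"name": "Alice", "email": "alice@demo.test"},
         43: {"name": "Bob", "email": "bob@demo.test"}}
ORDERS = {100: {"owner": 42}, 101: {"owner": 43}}  # order id -> owner

def get_user(token, user_id):
    # BROKEN: authenticates the caller but never checks ownership,
    # so any valid token can read any profile. This is the IDOR.
    if token not in TOKENS:
        return 401, None
    return 200, USERS.get(user_id)

def get_order(token, order_id):
    # CORRECT: the ownership check rejects cross-user access with 403.
    caller = TOKENS.get(token)
    if caller is None:
        return 401, None
    order = ORDERS.get(order_id)
    if order is None:
        return 404, None
    if order["owner"] != caller:
        return 403, None
    return 200, order
```

User B's token against Alice's profile returns 200 with her data; the same token against Alice's order returns 403.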
3. B1 — Adaptive Campaign Graph: The 3-Hop Chain
The Campaign Graph is the engine that makes Phase B qualitatively different from a tool runner. It doesn't execute a list — it evaluates rules and makes decisions.
The validation tested four things. The first two are the ones worth understanding in depth.
The Redis Chain (1 hop)
Rule: when nmap finds an open port with service=redis, trigger redis-unauthenticated-check automatically. The test simulated nmap producing an OPEN_PORT finding on port 6379. The graph engine evaluated its registered rules, matched the finding to the Redis chain rule, and scheduled the credential check — without any human decision in between. The check confirmed unauthenticated access.
That's one finding spawning one follow-up test. Routine chain depth. The interesting test was next.
The 3-Hop Chain: CRITICAL
This is the money shot.
gobuster → FILE_FOUND (backup.zip)
└─ file-content-analyser → CREDENTIAL_FOUND (MD5 hash in backup)
└─ hashcat → WEAK_CREDENTIAL (admin:password, CRITICAL)
Order: ["gobuster", "file-content-analyser", "hashcat"]
Final severity: CRITICAL
Three tools. Three hops. Zero human decisions between hop 1 and hop 3. The finding that triggers hop 1 (a backup zip) is unremarkable by itself. The hash extracted at hop 2 is interesting but incomplete. By hop 3, you have a cracked password and a CRITICAL finding.
This is how experienced human testers work: they follow the chain. gobuster finds a file → they download and inspect it → they crack what they find. The Campaign Graph automates that reasoning.
The other two checks validated infrastructure behaviour: the depth cap (max_depth=5 enforced, with logged confirmation that depth-6 was skipped) and deduplication (identical skill+target pairs rejected on subsequent attempts, preventing infinite loops). Both passed cleanly.
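The chaining behaviour (rule match, dedup, depth cap) can be sketched in a few lines. This is illustrative logic, not SecurityClaw's engine; the rule table and function shape are assumptions:

```python
# Illustrative sketch of the chaining logic; not SecurityClaw internals.
MAX_DEPTH = 5  # matches the validated depth cap

# Finding type -> follow-up skill (assumed rule table)
RULES = {
    "FILE_FOUND": "file-content-analyser",
    "CREDENTIAL_FOUND": "hashcat",
    "OPEN_PORT:redis": "redis-unauthenticated-check",
}

def schedule(finding_type, target, depth, seen, order):
    """Evaluate rules for one finding; enforce dedup and the depth cap.
    Returns the scheduled skill, or None if nothing fires."""
    skill = RULES.get(finding_type)
    if skill is None or depth > MAX_DEPTH:
        return None                    # no matching rule, or depth cap hit
    if (skill, target) in seen:
        return None                    # dedup: identical skill+target pair
    seen.add((skill, target))
    order.append(skill)
    return skill
```

Replaying the 3-hop chain: a FILE_FOUND on backup.zip schedules file-content-analyser, whose CREDENTIAL_FOUND finding schedules hashcat, matching the order the validation logged (with gobuster as hop 0).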
Honest note: B1 runs on mock executors in this validation. There's no live network traffic for B1 — the chains are tested with pre-defined finding sets. The graph logic is real. The network execution against a real target is the next validation tier.
4. B2 — JS Bundle Analyser: AWS Key in the React Bundle
B2 is the heavyweight module — 401ms of the total 557ms runtime, and the module that made real network requests and real AI calls.
What the pipeline actually did
The Flask demo server was running on port 5043. B2:
- Made a real HTTP GET to the app's HTML page and extracted two `<script>`-tag bundle URLs
- Fetched both bundle files (1,572 bytes total) over real HTTP
- Split the bundle content into analysis chunks using tiktoken (1,500-token chunks with 150-token overlap)
- Sent each chunk to Claude (claude-3-haiku) for AI analysis
- Deduplicated the findings and returned the final results
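The chunking step matters later (it explains the two misses), so it is worth seeing concretely. A simplified sliding-window chunker; the real pipeline operates on tiktoken token IDs, and a plain list stands in here:

```python
# Simplified stand-in for the tokenized chunking step; illustrative only.
def chunk_tokens(tokens, size=1500, overlap=150):
    """Split a token list into overlapping chunks (stride = size - overlap)."""
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last chunk reached the end of the bundle
    return chunks
```

Each chunk shares its final 150 tokens with the next chunk's first 150: enough overlap for short secrets, but a multi-line config object can still straddle the boundary.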
What it found: 3 of 5 planted secrets
| Severity | Finding | Detail |
|---|---|---|
| CRITICAL | AWS Access Key in JS Bundle | AKIAIOSFODNN7EXAMPLE pattern found in main.abc123.js |
| HIGH | Admin endpoint exposed in JS bundle | /internal/admin/ path hardcoded in bundle |
| MEDIUM | Client ID in Authorization header | CONFIG.clientId passed as auth token — risky authentication pattern |
Score: 3/5. The AI found the most critical finding (the AWS key) and two meaningful supporting findings. The two misses — the JWT secret embedded in a config object and the debug endpoint — were not surfaced as separate findings. Most likely cause: both were split across chunk boundaries during tokenization, losing the surrounding context that would push them above the confidence threshold.
This is expected, not a failure. B2 feeds into the broader campaign — findings it surfaces become inputs to the Campaign Graph. An AWS key finding in a production target would immediately trigger an AWS credential validation check as a follow-up. The 3/5 rate is honest; the architecture compensates for it.
For practitioners running similar assessments manually, The Web Application Hacker's Handbook covers JavaScript analysis and client-side secret exposure in depth — it remains the foundational reference for anyone building web security tooling.
5. B3 — IDOR Scanner: User B Reading Alice's Email
IDOR scanning is one of the most tedious parts of manual security testing. The vulnerability requires probing endpoints with multiple auth tokens and comparing responses — repetitive, precise work that scales poorly by hand.
B3 automates the comparison. It runs 35+ probes per endpoint set, diffs the responses, and presents only the confirmed anomalies.
The vulnerability in plain language
User B is authenticated as user ID 43. User B should only be able to see their own profile at /api/users/43. But the endpoint has no ownership check — any valid authentication token can access any user ID. User B sends GET /api/users/42 with their own token, and receives back Alice's full profile: name, email (alice@demo.test), and account details.
That's the vulnerability. The scanner found it without any human directing it to look there.
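The diffing step reduces to comparing the owner's response with a cross-user probe against the same resource. A hedged sketch: the classification labels match the scan output above, but the function shape is an assumption, and the real scanner weighs more signals than status codes:

```python
# Illustrative classification of a single probe pair; not the real scanner.
def classify_probe(owner_status: int, other_status: int,
                   other_body_has_data: bool) -> str:
    """Compare responses to one resource from the owner's token
    and a different user's token."""
    if other_status == 200 and other_body_has_data:
        return "IDOR_CONFIRMED"     # non-owner read another user's data
    if other_status in (401, 403):
        return "ACCESS_CONTROLLED"  # correct rejection; filtered from output
    if owner_status != other_status:
        return "IDOR_PARTIAL"       # differential behaviour, worth a look
    return "NO_FINDING"
```

Against the demo target, User B's 200-with-body on /api/users/42 lands in the first branch; User B's 403 on /api/orders/100 lands in the second and is filtered out.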
What the scanner found: 6 findings across 2 severities
| Classification | Severity | Endpoint | Finding |
|---|---|---|---|
| IDOR_CONFIRMED | HIGH | /api/users/42 | User B receives alice@demo.test |
| IDOR_CONFIRMED | HIGH | /api/users/43 | User B receives alice@demo.test (cross-user) |
| IDOR_CONFIRMED | HIGH | /api/users/42 (adjacent) | Adjacent ID probe confirms same pattern |
| IDOR_CONFIRMED | HIGH | /api/users/43 (adjacent) | Adjacent ID probe confirms same pattern |
| IDOR_PARTIAL | MEDIUM | /api/orders/101 | User B gets 200; User A gets 403 on same resource — worth investigating |
| IDOR_PARTIAL | MEDIUM | /api/orders/101 (adjacent) | Adjacent probe confirms differential pattern |
What the scanner correctly filtered: /api/orders/100 returned 403 for User B (correct — the order belongs to User A and access is properly controlled). The scanner classified this as ACCESS_CONTROLLED and excluded it from findings output. That's the right call — correctly functioning access control is not a bug.
The IDOR automation here solves a real bug bounty problem. Hacking APIs by Corey Ball covers IDOR and BOLA testing extensively — these are consistently among the highest-rewarding finding classes in modern bug bounty programs, and the manual testing process is exactly the kind of systematic, token-swapping work that benefits most from automation.
6. B4 — Campaign Director: AI-Planned Attack Sequence
B4 answers the question that stumps every new bug hunter: given a target, where do I start, and in what order?
The Campaign Director uses an AI model to generate a multi-phase assessment plan. For a web application target (demo-corp.test), the AI returned a 5-skill plan across two phases:
Phase 1 (Passive Recon): shodan-intel, nmap-recon
Phase 2 (Active Enumeration): gobuster-enum, js-bundle-analyzer, idor-scanner

Approval status:

| Skill | requires_approval | Result |
|---|---|---|
| shodan-intel | False | ✅ executes |
| nmap-recon | False | ✅ executes |
| gobuster-enum | False | ✅ executes |
| js-bundle-analyzer | False | ✅ executes |
| idor-scanner | False | ✅ executes |
| sqlmap-injection | True | 🔒 GATED (destructive) |
| hydra-bruteforce | True | 🔒 GATED (destructive) |
Three things worth highlighting:
The AI included Phase B's own tools unprompted. The director wasn't told to use js-bundle-analyzer or idor-scanner. It recognised the target type (web application) and selected both tools as appropriate for Phase 2. That's the planning intelligence working correctly.
Approval gating is hardcoded, not AI-controlled. When the AI proposed including sqlmap-injection and hydra-bruteforce, the approval gate triggered automatically — not because the AI decided these were dangerous, but because they're hardcoded in the engine as requiring approval. The AI cannot override this. Destructive tools are gated. Period.
Scope enforcement works. The director tried to include evil.com as a target (deliberately out of scope). The engine logged: "Skill nmap-recon targets out-of-scope 'evil.com' — skipping." Only demo-corp.test and its subdomain admin.demo-corp.test made it into the final plan. The AI proposes; the engine enforces.
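The "AI proposes, the engine enforces" pattern is simple to sketch. The skill and domain names below match the run; the function shapes and data structures are illustrative assumptions:

```python
# Illustrative sketch of the enforcement layer; not SecurityClaw internals.
DESTRUCTIVE = {"sqlmap-injection", "hydra-bruteforce"}  # hardcoded, not AI-set
SCOPE = {"demo-corp.test"}

def in_scope(host: str) -> bool:
    """A host is in scope if it equals an allowed domain or is a subdomain."""
    return any(host == d or host.endswith("." + d) for d in SCOPE)

def enforce(plan):
    """Filter an AI-proposed plan: drop out-of-scope targets, gate
    destructive tools. The AI cannot override either rule."""
    approved = []
    for skill, target in plan:
        if not in_scope(target):
            continue                  # e.g. evil.com is skipped and logged
        gated = skill in DESTRUCTIVE  # True means a human must approve
        approved.append((skill, target, gated))
    return approved
```

The key design point: the gate and the scope filter live outside the AI call entirely, so a bad plan degrades into a smaller plan, never into an out-of-scope action.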
Honest note: The AI planning call in B4 uses a pre-staged JSON response — the Bedrock API wasn't called live during this validation run. The scope enforcement, approval gating, and filter logic are all live code running against that response. The downstream machinery is fully exercised; the AI generation step is mocked.
For anyone building out a bug bounty methodology, Bug Bounty Bootcamp by Vickie Li is the best structured walkthrough of how to think about scope, phased testing, and what to prioritise — exactly the logic the Campaign Director is encoding into automation.
7. The Numbers
| Module | Status | Duration | Findings | Live Network? |
|---|---|---|---|---|
| B1 — Campaign Graph | ✅ PASS | 4ms | 5 | No (mock executor) |
| B2 — JS Bundle Analyser | ✅ PASS | 401ms | 3 | ✅ Yes (real HTTP + AI) |
| B3 — IDOR Scanner | ✅ PASS | 149ms | 6 | ✅ Yes (real HTTP) |
| B4 — Campaign Director | ✅ PASS | 3ms | 5 skills planned | No (mock Bedrock) |
| Total | 4/4 PASS | 557ms | 14 combined | 2/4 modules fully live |
For context on what 557ms means: a single nmap SYN scan against a /24 subnet typically takes 30 seconds or more. SecurityClaw Phase B ran four AI-driven modules in under a second. The 401ms majority is B2 doing real HTTP requests and AI analysis — that's the honest cost of the most sophisticated module.
8. Honest Gaps: What It Missed and Why
This section is where we earn the trust of security professionals. A 3-out-of-5 detection rate with a clear explanation is more valuable than a 5-out-of-5 rate with no caveats.
B2: 2 of 5 planted secrets not surfaced
The JWT secret embedded in a config object and the debug endpoint at /debug/config were not individually flagged by B2. The most likely cause: chunk boundary splitting. When tiktoken divides the bundle into 1,500-token chunks with 150-token overlap, a multi-line config object can be split across two adjacent chunks — losing the surrounding context that would push the pattern above the AI's confidence threshold.
The fix: tighter chunking strategy with a larger overlap window, or a second-pass analysis that sends suspect near-boundary regions for review. Neither affects the CRITICAL finding (AWS key) or the HIGH finding (admin endpoint), which were found cleanly.
B3: Duplicate findings in output
The 6 IDOR findings include duplicates from adjacent-ID probing — 4 IDOR_CONFIRMED findings across 2 endpoints (2 per endpoint). This is expected scanner behaviour: adjacent ID testing is the correct way to confirm that the access control gap is systematic rather than accidental. But a deduplication pass on (endpoint, classification) before output would reduce the noise. Cosmetic issue, not a false positive problem.
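The suggested dedup pass is a small change. A sketch, assuming findings are dicts and adjacent probes are marked with a suffix on the endpoint field (both are assumptions about the output shape):

```python
# Illustrative dedup pass over scanner output; the dict shape is assumed.
def dedupe_findings(findings: list[dict]) -> list[dict]:
    """Keep the first finding per (base endpoint, classification) pair."""
    seen, out = set(), []
    for f in findings:
        base = f["endpoint"].split(" ")[0]  # strip "(adjacent)"-style suffixes
        key = (base, f["classification"])
        if key not in seen:
            seen.add(key)
            out.append(f)
    return out
```

On the run above, this folds the four IDOR_CONFIRMED rows into two, one per endpoint, without touching the underlying probe evidence.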
B3: /api/cards/ BOLA not tested
The BOLA pattern on /api/cards/ exists in the demo target but wasn't included in this validation run's probe list. The tool supports it; the test didn't configure it. Known gap, documented.
B1 and B4: Mock executors / mock Bedrock
The graph logic (B1) and campaign planning logic (B4) are fully validated. The actual tool execution against a real network target (B1) and the live AI generation call (B4) are the next validation tier. Phase B validated that the plumbing is sound. The live hunt proves the pipes work under pressure.
Both of these gaps are scheduled for the next campaign run — a full Phase A+B toolchain hunt against a real bug bounty target.
9. Why the Architecture Matters for Bug Hunters
Before Phase B, SecurityClaw ran tools. After Phase B, SecurityClaw reasons.
The practical difference for a bug bounty hunter:
JS Bundle Analyser closes a gap that most scanners ignore entirely. React and Next.js applications routinely ship secrets in public bundles. Context-aware AI analysis catches patterns that regex misses — object property assignments, base64-encoded values, keys embedded in larger config structures. This is a genuine detection capability improvement over pattern-only tools.
IDOR Scanner automates the most tedious part of manual testing: cross-account probing. Instead of manually swapping auth headers and comparing responses endpoint by endpoint, SecurityClaw runs 35+ probes, diffs the responses, and surfaces only confirmed anomalies. IDOR and BOLA are consistently high-reward finding classes. The time cost of thorough manual testing has historically limited how many endpoints a hunter can realistically check on a large target. Automation removes that constraint.
Adaptive Campaign Graph means one finding doesn't die in a report — it spawns the next test. A Redis port becomes a credentials check. A backup file becomes a cracked hash. Bug bounty is about chaining findings into impact; the graph automates the chain traversal that turns individual findings into high-severity reports.
Campaign Director answers the hardest question a new bounty hunter faces: where do I start? The AI reads the target type, selects the right tools, orders the phases, and gates the dangerous ones until the operator approves them. For experienced hunters, it's a planning accelerator. For people learning the workflow, it's an evidence-based starting point rather than a blank page.
The Art of Software Security Assessment covers the source analysis techniques that B2's AI analysis draws from — it's the reference for anyone who wants to understand what the AI is looking for when it scans JavaScript bundles, and why certain patterns carry higher risk than others. If you're building similar tooling or trying to understand false positive rates, Chapter 6 and 7 are the most relevant sections.
The whole Phase B system ran in 557ms, effectively instant at campaign timescales. The intelligence that took time to build was the four modules. Once they're in place, the runtime cost of running all four is essentially free.
10. Recommended Resources
If Phase B's capabilities interest you and you want to understand the underlying techniques at a deeper level:
- The Web Application Hacker's Handbook — Client-side analysis, JavaScript security, and the full spectrum of web application attack surface coverage. The foundational reference for the techniques B2 automates.
- Hacking APIs by Corey Ball — IDOR, BOLA, and API-specific vulnerability classes in depth. Directly relevant to B3's scanning methodology and the finding classes it surfaces.
- Bug Bounty Bootcamp by Vickie Li — Phased testing methodology, scope management, and the structured approach to target assessment that B4's Campaign Director encodes into automation.
- The Art of Software Security Assessment — Source analysis and vulnerability identification techniques. Useful context for understanding what B2's AI analysis is looking for and why the missed-secrets rate matters.
SecurityClaw Phase B is in active development. D22 captures the validation state as of March 2026 — Phase B complete, first live hunt in preparation.