SecurityClaw Phase C: The Scanner That Learned to Remember
Every automated scanner runs the same playbook. Nmap, then Nuclei, then Gobuster. In that order. Every target. Every time. Whether you've tested a hundred Spring Boot fintech services or zero, the tool queue looks identical on day one.
That's a significant gap between how automated tools work and how experienced human testers work. A seasoned bug hunter targeting a Spring Boot fintech service doesn't start from scratch. They walk in with a mental model: "I've seen actuator endpoints expose credentials on three of the last four fintech engagements. That's my first hypothesis. I'll start there." They adjust mid-engagement when something unexpected surfaces. They keep notes that make the next engagement faster and more targeted.
Phase C is the attempt to give SecurityClaw the same capability. Three components ship together: the Adversarial Hypothesis Engine (named attack hypotheses with confidence scores before any tool runs), the RePlanner (mid-campaign queue rewriting when triggers fire), and the Intelligence Store (cross-campaign persistence that feeds back into the next run). The demo below walks through a complete campaign against demobank.local — a controlled Spring Boot fintech target — showing exactly what each component does and where its current limits are.
1. Phase C at a Glance: Three Components, One Compounding Loop
The three Phase C components are designed to work as a cycle, not independently:
| Component | Code Label | What It Does | When It Runs |
|---|---|---|---|
| Intelligence Store | C3 | Persists campaign records, rebuilds attack patterns, surfaces hit rates | Before campaign (load) and after campaign (write) |
| Adversarial Hypothesis Engine | C1 | Generates named attack hypotheses with confidence scores informed by C3 hit rates | After C3 preload, before first tool runs |
| RePlanner | C2 | Evaluates trigger conditions mid-campaign and rewrites the remaining tool queue | After each tool batch completes, before the next |
The loop: Intelligence Store loads prior campaign data → Hypothesis Engine uses that data to prioritise attack bets → tools run → RePlanner evaluates findings against hypotheses → tool queue adapts → campaign finishes → Intelligence Store writes the result back. Next campaign starts with richer data than the last.
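That loop can be shown as a minimal, runnable sketch with stub components standing in for C1/C2/C3. Every class, function, and data shape here is illustrative, not SecurityClaw's actual API:

```python
# A minimal, runnable sketch of the Phase C loop with stub components.
# Every class, function, and data shape here is illustrative, not
# SecurityClaw's actual API.

class IntelStore:
    """C3 stub: pattern records keyed by attack class."""
    def __init__(self):
        self.patterns = {"secret_exposure": {"hits": 3, "runs": 3}}

    def load(self, stack):
        return self.patterns

    def write(self, result):
        # Confirmed classes gain a hit; every pattern gains a run.
        for cls in result["confirmed"]:
            self.patterns.setdefault(cls, {"hits": 0, "runs": 0})["hits"] += 1
        for p in self.patterns.values():
            p["runs"] += 1

def generate_hypotheses(patterns):
    # C1 stub: confidence = historical hit rate, capped below certainty.
    return sorted(
        ({"class": c, "conf": min(0.9, p["hits"] / p["runs"])}
         for c, p in patterns.items()),
        key=lambda h: -h["conf"],
    )

def replan(batch, queue):
    # C2 stub: a CRITICAL finding drops generic enumeration and pivots.
    if any(f["severity"] == "CRITICAL" for f in batch):
        return [t for t in queue if t != "gobuster"] + ["amass", "nikto"]
    return queue

def run_campaign(store, run_tool):
    hypotheses = generate_hypotheses(store.load("spring-boot"))  # ordering input
    queue, findings = ["nmap", "nuclei", "gobuster"], []  # demo's initial queue
    while queue:
        batch = run_tool(queue.pop(0))
        findings.extend(batch)
        queue = replan(batch, queue)
    store.write({"confirmed": ["secret_exposure"]})  # feed the next campaign
    return findings
```

Driving this with a stub `run_tool` that reports a CRITICAL finding on the nuclei batch reproduces the demo's shape: gobuster is dropped, amass and nikto run instead, and the store's secret_exposure record grows to 4-for-4.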
The demo campaign numbers, for reference before diving in:
| Metric | Result |
|---|---|
| Prior campaigns in intel store (before run) | 3 |
| Patterns rebuilt | 16 → 17 |
| Hypotheses generated | 3 |
| H1 confidence (intel-boosted) | 90% |
| Replanning events triggered | 3 |
| CRITICAL findings | 1 (/actuator/env → DB_PASSWORD exposed) |
| HIGH findings | 2 (JWT weak secret + subdomain expansion) |
| Attack classes with 100% historical hit rate | 3 (auth_bypass, secret_exposure, recon_chain) |
2. C3 First: The Intelligence Store Pre-Seeds the Campaign
Before demobank.local is targeted, SecurityClaw loads three prior campaigns from the intelligence store: tradeflux.io, clearwire.finance, and vaultpay.co. All Spring Boot fintech targets. All with findings. The load happens before the planner writes a single hypothesis.
The store produces a hit rate table before any tool has run against demobank.local:

| Attack Class | Prior Campaigns | Hit Rate |
|---|---|---|
| secret_exposure | 3 | 100% (3/3) |
| auth_bypass | 3 | 100% (3/3) |
| recon_chain | 3 | 100% (3/3) |
This is the fundamental shift Phase C makes. A traditional scanner walking into demobank.local has no context — every target is a blank slate. SecurityClaw is already carrying three data points: on every comparable target it has run, secret_exposure yielded a CRITICAL finding, auth_bypass found something exploitable, and recon_chain expanded the scope. That context directly shapes what gets prioritised.
One important caveat before moving on: three campaigns is a very thin dataset. 100% hit rate on secret_exposure means 3-for-3, not 300-for-300. The patterns are directional at this stage. We'll address the limits explicitly in section 7.
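One standard way to quantify that thinness is a Wilson score lower bound on the hit rate, which discounts small samples hard. This is illustrative math, not the store's actual formula:

```python
import math

def wilson_lower(hits, runs, z=1.96):
    """95% Wilson score lower bound on a hit rate -- a standard way to
    discount small samples. Illustrative, not the store's actual formula."""
    if runs == 0:
        return 0.0
    p = hits / runs
    denom = 1 + z * z / runs
    centre = p + z * z / (2 * runs)
    margin = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs))
    return (centre - margin) / denom

print(round(wilson_lower(3, 3), 2))    # 3-for-3: lower bound ~0.44
print(round(wilson_lower(30, 30), 2))  # same 100% rate, 30-for-30: ~0.89
```

The same nominal 100% hit rate supports a very different floor depending on sample size, which is exactly why 3-for-3 is directional rather than authoritative.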
3. C1: The Adversarial Hypothesis Engine
With intel context loaded, the Adversarial Hypothesis Engine generates three named attack hypotheses. Not generic scan categories — actual named scenarios with explicit confidence scores and reasoning tied to the intelligence data. In summary:

| Hypothesis | Attack Scenario | Opening Confidence |
|---|---|---|
| H1 | Spring Boot actuator secret exposure | 90% |
| H2 | JWT scope escalation | 55% |
| H3 | Subdomain expansion | 65% |
A few things worth unpacking here.
H1 opens at 90% confidence. The intel store showed 100% hit rate on secret_exposure for Spring Boot fintech targets. The 10% gap between 100% historical and 90% opening confidence is intentional conservatism — the dataset is small enough that one miss would break the streak. The engine doesn't treat 3-for-3 as certainty.
H2 opens at 55%. The auth_bypass historical rate is 100%, but the JWT scope escalation pattern specifically had 50% prior evidence. The engine distinguishes between the hit rate at the attack-class level and the hit rate for the specific attack vector. 55% is near-coin-flip. This hypothesis enters the campaign as a viable bet, not a confident one.
H3 opens at 65%. Subdomain expansion showed 67% historical rate — two successes out of three. That's enough to prioritise it over a clean-slate target but not enough to treat it as near-certain.
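One way to produce conservative opening priors like these is Laplace-style shrinkage toward a neutral 0.5, plus a hard cap below certainty. The AHE's exact formula isn't shown; this sketch merely lands close to the demo's numbers:

```python
def opening_confidence(hits, runs, prior=0.5, strength=1.0):
    """Shrink a raw hit rate toward a neutral prior; small samples move
    less far from 0.5. Illustrative only -- not the AHE's real formula."""
    rate = (hits + prior * strength) / (runs + strength)  # Laplace-style shrinkage
    return min(rate, 0.90)                                # never open at certainty

# 3-for-3 on secret_exposure: (3 + 0.5) / 4, near H1's 90% opening
print(opening_confidence(3, 3))   # 0.875
# 1-for-2 on the JWT vector: (1 + 0.5) / 3, near H2's 55%
print(opening_confidence(1, 2))   # 0.5
# 2-for-3 on subdomain expansion: (2 + 0.5) / 4, near H3's 65%
print(opening_confidence(2, 3))   # 0.625
```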
The practical output of this step: the initial tool queue is ordered with H1 tools first (nuclei with actuator-focused templates), H3 tools second (amass for subdomain discovery), H2 tools third (JWT assessment). Generic directory enumeration — gobuster — sits at the end as a fallback if the hypothesis-driven approach doesn't surface anything. This is already different from the default nmap-first queue.
For bug hunters, this maps directly to how manual recon prioritisation works. You don't run gobuster before checking whether the actuator endpoints are open when you know this target class has a 100% hit rate on credential exposure. You run gobuster when the high-confidence bets haven't paid off. The AHE is automating that decision.
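The ordering decision itself reduces to a sort over hypothesis confidence, with generic enumeration as the fallback. The hypothesis-to-tool mapping below (including the `jwt-assess` tool name) is hypothetical:

```python
# Hypothetical mapping from hypothesis names to tools; gobuster is the fallback.
HYPOTHESIS_TOOLS = {
    "secret_exposure": ["nuclei"],         # actuator-focused templates
    "subdomain_expansion": ["amass"],
    "jwt_scope_escalation": ["jwt-assess"],  # illustrative tool name
}

def build_queue(hypotheses, fallback=("gobuster",)):
    """Order tools by descending hypothesis confidence, fallback last."""
    queue = []
    for h in sorted(hypotheses, key=lambda h: -h["confidence"]):
        for tool in HYPOTHESIS_TOOLS.get(h["name"], []):
            if tool not in queue:
                queue.append(tool)
    queue.extend(t for t in fallback if t not in queue)
    return queue

hypotheses = [
    {"name": "secret_exposure", "confidence": 0.90},       # H1
    {"name": "jwt_scope_escalation", "confidence": 0.55},  # H2
    {"name": "subdomain_expansion", "confidence": 0.65},   # H3
]
print(build_queue(hypotheses))  # ['nuclei', 'amass', 'jwt-assess', 'gobuster']
```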
4. C2: Mid-Campaign Replanning in Action
The initial tool queue goes into execution: nmap → nuclei → gobuster. By Batch 2, the queue no longer looks like this.
Batch 1 — nmap:
Standard port scan output. Nothing that triggers a replan. The RePlanner evaluates: no critical findings, no credential discovery, no high-confidence confirmation. Queue continues to nuclei as planned.
Batch 2 — nuclei:
Gobuster — the planned third tool in the original queue — is dropped entirely. The RePlanner's evaluation: a database password is now in the clear. Running gobuster to enumerate directories at this point is a suboptimal use of remaining campaign time. The CRITICAL credential finding confirms H1. The HIGH JWT vulnerability simultaneously starts confirming H2. The logical next moves are subdomain expansion (to understand the blast radius of the credential) and further authentication testing (to chain the JWT vuln with the exposed credentials).
The replan is automatic. No human intervention. The RePlanner evaluated the trigger set, compared it against the hypothesis states, and rewrote the remaining queue in real time.
Two more replan events fired after Batches 3 and 4 as amass and nikto completed. By Batch 4, the queue was empty — campaign complete. The full sequence of replanning decisions can be summarised:
| After Batch | Trigger(s) | Decision | Queue Change |
|---|---|---|---|
| Batch 1 (nmap) | None | Continue | No change — nuclei next |
| Batch 2 (nuclei) | CREDENTIAL_FOUND, CRITICAL_FINDING, HIGH_CONFIDENCE_CONFIRMED | PIVOT | Drop gobuster → add amass + nikto |
| Batch 3 (amass) | HIGH_CONFIDENCE_CONFIRMED | Continue | Queue unchanged — nikto next |
| Batch 4 (nikto) | HIGH_CONFIDENCE_CONFIRMED | Complete | Queue empty — campaign ends |
The trigger system is the key mechanism here. CREDENTIAL_FOUND fires when a secret or credential is confirmed in output. CRITICAL_FINDING fires on any CRITICAL-severity result. HIGH_CONFIDENCE_CONFIRMED fires when a hypothesis with ≥70% prior confidence is confirmed by evidence. Each trigger type maps to a set of replan responses in the RePlanner's decision matrix. PIVOT responses drop non-hypothesis-aligned tools from the queue and substitute more targeted ones. CONTINUE responses leave the queue unchanged and let the next batch proceed.
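A sketch of that trigger evaluation and decision matrix, assuming findings are dicts with `severity` and `credential` fields (the real trigger plumbing isn't shown in the demo):

```python
from enum import Enum, auto

class Trigger(Enum):
    CREDENTIAL_FOUND = auto()
    CRITICAL_FINDING = auto()
    HIGH_CONFIDENCE_CONFIRMED = auto()

def evaluate_triggers(batch, hypotheses):
    """Derive the fired trigger set from one tool batch. Illustrative shapes."""
    fired = set()
    for f in batch:
        if f.get("credential"):
            fired.add(Trigger.CREDENTIAL_FOUND)
        if f.get("severity") == "CRITICAL":
            fired.add(Trigger.CRITICAL_FINDING)
    for h in hypotheses:
        if h["confirmed"] and h["opening_confidence"] >= 0.70:
            fired.add(Trigger.HIGH_CONFIDENCE_CONFIRMED)
    return fired

def replan(fired, queue):
    """PIVOT drops non-aligned tools; CONTINUE leaves the queue alone."""
    if Trigger.CREDENTIAL_FOUND in fired or Trigger.CRITICAL_FINDING in fired:
        pivoted = [t for t in queue if t != "gobuster"]
        for tool in ("amass", "nikto"):
            if tool not in pivoted:
                pivoted.append(tool)
        return "PIVOT", pivoted
    return "CONTINUE", queue

# Replaying Batch 2: a CRITICAL credential finding plus a HIGH JWT finding.
batch = [{"severity": "CRITICAL", "credential": "DB_PASSWORD"},
         {"severity": "HIGH"}]
fired = evaluate_triggers(batch, [{"confirmed": True, "opening_confidence": 0.90}])
decision, queue = replan(fired, ["gobuster"])
print(decision, queue)  # PIVOT ['amass', 'nikto']
```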
5. C3 Close: The Record Writes Back
After the campaign completes, SecurityClaw writes the full result back to the intelligence store.
The secret_exposure attack class now has four confirmed campaigns. The hit rate remains 100%, but the weight — how much the store trusts that number — has increased.
The weight value (0.40 per attack class here) reflects the store's confidence in the hit rate given the sample size. At 4 campaigns, the weight is meaningful but not authoritative. The compounding effect: the next Spring Boot fintech campaign will open H1 not at 90% but at a higher confidence, backed by a fourth data point. Each campaign narrows the uncertainty.
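One saturating curve that happens to give 0.40 at four campaigns is `weight = n / (n + k)` with `k = 6`. The store's real formula isn't shown, so treat the constant as purely illustrative:

```python
def pattern_weight(runs, k=6):
    """Saturating weight in [0, 1): grows with campaign count, never reaches
    certainty. k=6 is chosen only so four campaigns give the demo's 0.40 --
    the store's real formula isn't shown."""
    return runs / (runs + k)

for n in (3, 4, 10, 50):
    print(n, round(pattern_weight(n), 2))  # 0.33, 0.40, 0.62, 0.89
```

The shape matches the qualitative claim later in the post: meaningful at 4 campaigns, genuinely reliable around 10, a serious advantage around 50.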
The store also records H2's confirmation. The JWT scope escalation hypothesis opened at 55% — near coin-flip — and was confirmed. That outcome shifts the auth_bypass pattern confidence upward for the next campaign. If H2 had not been confirmed, the confidence would have dropped instead. The store is not a ratchet — it adjusts in both directions.
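That bidirectional adjustment can be sketched as a Beta-posterior update with a uniform prior: a confirmation pulls the smoothed rate up, a refutation pulls it down. Illustrative math, not the store's actual code:

```python
def update_pattern(hits, runs, confirmed):
    """Record one campaign outcome and return the new smoothed rate.
    Beta(1, 1)-style smoothing; illustrative, not the store's real math."""
    runs += 1
    if confirmed:
        hits += 1
    rate = (hits + 1) / (runs + 2)  # posterior mean under a uniform prior
    return hits, runs, rate

# auth_bypass after the demo: 3-for-3 prior, H2 confirmed -> rate rises
print(update_pattern(3, 3, confirmed=True))
# The counterfactual: a refutation would have pulled the rate down instead
print(update_pattern(3, 3, confirmed=False))
```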
6. Full Campaign Summary
The headline numbers are the ones in the table in section 1: three hypotheses generated, three replanning events, one CRITICAL finding (/actuator/env exposing DB_PASSWORD), two HIGH findings (JWT weak secret plus subdomain expansion), and the intel store's pattern count growing from 16 to 17.
7. Honest Limits: What Phase C Is and Isn't
Phase C ships a genuine capability improvement. It also has real limitations worth being explicit about, because the marketing version of "AI that learns" is almost always more impressive than the engineering reality.
The demo is wired. demobank.local is a controlled environment. We built the target, planted the vulnerabilities, and ran the campaign. The findings are real outputs from real code against a real (synthetic) target. The replanning logic fired against real trigger conditions. The intelligence store updated with real pattern data. But the vulnerability was planted by us. That's not the same as running this against an adversarial target where the vulnerability landscape is unknown.
Three campaigns is a thin dataset. 100% hit rate on secret_exposure sounds impressive. 3-for-3 is not the same as 30-for-30. At this data volume, a single refutation drops the hit rate to 75%. The patterns are directional — they tell the planner "lean here" — but they're not statistically authoritative until the campaign count grows into double digits per target class.
Hypotheses can be wrong. H2 (JWT scope escalation) opened at 55%. If demobank.local hadn't exposed a JWT vulnerability, the hypothesis would have been marked REFUTED and the confidence would have adjusted downward. Opening confidence is an educated prior, not a guarantee. The engine is calibrated by data, not optimism.
The replan heuristics are not magic. The RePlanner's decision to drop gobuster and add amass + nikto after the CRITICAL credential finding was the correct decision for this campaign. It's not guaranteed to be the correct decision in every scenario. The trigger-to-replan mapping is a rule set, not an AI planner in its own right. As the campaign volume increases, the mapping will need tuning against edge cases where the heuristic fires incorrectly.
The compounding effect takes time. At 4 campaigns, the intel store is a useful signal. At 10, it starts to become genuinely reliable. At 50, it's a serious competitive advantage in automated bug hunting on known target classes. The infrastructure is in place. The data foundation has to be built through real campaign runs.
The honest assessment: Phase C is a meaningful engineering advance over a static tool queue. It is not a magic system. It makes campaigns measurably smarter on repeat target classes. It needs campaign data to deliver on that promise. The first campaign against a new target class still runs partly blind — the store has nothing to offer until patterns accumulate.
8. Why This Matters for Bug Hunters
The gap Phase C closes is one that experienced human researchers bridge with institutional knowledge and notes. When you come back to a bug bounty program after three months, you don't restart from scratch — you look at your previous notes, recall what the stack looked like, and adjust your approach accordingly.
Automated scanners have never done this. Every run has been stateless. SecurityClaw now has persistent cross-campaign memory, and it's structured enough to feed quantified confidence into real-time planning decisions.
The practical implications for bug hunters using SecurityClaw:
- On familiar target stacks: The more campaigns SecurityClaw has run against a given stack (Spring Boot, Django, Rails, Node/Express), the more targeted its first hypothesis set will be. Attack classes with high historical hit rates will open with higher confidence and be prioritised first.
- On replanning: The campaign adapts to what it finds. If a credential surfaces in the first batch, the remaining queue pivots to exploit that finding rather than continuing the generic enumeration path. This maps to how skilled human testers chain findings.
- On calibration over time: Confidence scores that prove consistently overconfident on a given target class will drift downward. The system self-corrects. This is the calibration property you want from a tool that makes probabilistic bets — it should get better at being right, not just accumulate wins in one direction.
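That calibration property is measurable. A Brier score over (confidence, outcome) pairs is one standard check — lower means better calibrated. A bookkeeping sketch, not SecurityClaw code:

```python
def brier_score(predictions):
    """Mean squared error between stated confidence and binary outcome
    (1 = confirmed, 0 = refuted). Lower is better-calibrated.
    Illustrative bookkeeping, not SecurityClaw's actual code."""
    return sum((conf - outcome) ** 2 for conf, outcome in predictions) / len(predictions)

# Demo campaign: H1 (0.90), H2 (0.55), H3 (0.65), all confirmed.
demo = [(0.90, 1), (0.55, 1), (0.65, 1)]
print(round(brier_score(demo), 3))  # 0.112
```

Tracking this per target class over time would show whether opening confidences are drifting toward reality or away from it.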
For bug bounty hunters specifically: the compounding value comes from running SecurityClaw across a consistent scope over time. The first campaign on a new program is a baseline. By the third or fourth run, the intelligence store has enough data to meaningfully differentiate the approach from what a stateless scanner would do. If you're working a single high-value program long-term, this is the use case Phase C was built for.
For those building out their security toolkit, the references below include the books that cover the underlying concepts Phase C is automating — hypothesis-driven testing, adaptive methodology, and structured note-keeping across engagements.
9. Recommended Resources
The techniques Phase C automates are covered in depth in these practitioner references:
- The Web Application Hacker's Handbook — Stuttard & Pinto. The canonical reference for methodical web application security testing. The hypothesis-driven enumeration methodology that Phase C's AHE partially automates is covered extensively here.
- Hacking APIs — Corey Ball. API security testing methodology including JWT attacks, scope escalation, and Spring Boot actuator exposure — the exact attack classes demonstrated in the Phase C demo.
- Bug Bounty Bootcamp — Vickie Li. Covers note-taking, target-class pattern recognition, and repeatable workflow across programs — the manual version of what SecurityClaw's intel store is automating.
- The Art of Software Security Assessment — Dowd, McDonald, Schuh. For understanding the attack classes that form the hypothesis taxonomy — secret exposure, authentication bypass, recon chains. Dense but authoritative.
- The Hacker Playbook 3 — Peter Kim. Real-world engagement methodology and adaptive campaign thinking — the closest practitioner analog to what Phase C is attempting to automate at tool-execution level.