Claude Code Security vs. Active Penetration Testing: The AI Arms Race Has Reached Your Codebase
In late 2025, a sophisticated threat actor compromised approximately 600 FortiGate network devices across 55 countries — and held access for five weeks before detection. The campaign didn't rely on zero-days. It used known credentials, weak authentication chains, and the predictable fact that most enterprise networks move too slowly to patch infrastructure-level vulnerabilities. Amazon CISO CJ Moses wrote about the campaign in detail on the AWS Security Blog: the attacker was methodical and patient, exploiting gaps that defenders already knew existed.
A week after that post, on February 21, 2026, Anthropic launched Claude Code Security — an AI-powered code analysis product that scans codebases for security vulnerabilities in real time. The timing was not coincidental: we are in the middle of an arms race, and both sides are deploying AI.
The question for security teams isn't whether AI code scanning is useful. It clearly is. The question is what it covers, where it stops, and whether the FortiGate-style attack campaigns that dominate the current threat landscape are the kind of thing AI code scanning catches at all.
What Claude Code Security Actually Is
Claude Code Security launched on February 21, 2026, as a limited research preview available to Anthropic Enterprise and Team customers. It's an AI-native code analysis tool — Claude's model applied to security-focused code review at scale.
The product sits in the shift-left security category: finding vulnerabilities during development, before code reaches production. This is the right place to catch certain classes of vulnerability. Code that never ships with an injection flaw can never be exploited through one.
What Claude Code Security is designed to catch:
- Known vulnerability patterns: SQL injection, XSS, command injection, insecure deserialization — the OWASP Top 10 categories that appear consistently across codebases and have well-established patterns an AI model can be trained on
- Data flow analysis: Tracing untrusted input through a codebase to identify where it reaches dangerous sinks (database queries, shell calls, file system operations) without proper sanitization
- Hardcoded secrets and credentials: API keys, passwords, and tokens committed to repositories — a consistent and underrated source of material breaches
- Dependency risk: Known vulnerable third-party libraries, outdated packages, and supply-chain exposure
- False positive reduction: AI models can apply contextual judgment that reduces the noise-to-signal ratio that plagues traditional static analysis tools
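The injection and data-flow categories above reduce to one pattern: untrusted input reaching a dangerous sink without sanitisation. A minimal sketch of that pattern — and the parameterised fix a scanner would suggest — using an in-memory SQLite table (the table and field names here are illustrative, not from any real product):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

def find_user_unsafe(name: str):
    # Untrusted input flows straight into a SQL sink -- the classic
    # source-to-sink pattern a code scanner flags.
    query = f"SELECT id, name FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterised query: the driver treats the input as data, not SQL.
    return conn.execute(
        "SELECT id, name FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # injection: returns every row
print(find_user_safe(payload))    # no match: returns an empty list
```

This is exactly the kind of flaw where shift-left scanning earns its keep: the vulnerable and safe versions differ by one line, and the pattern is mechanical enough for a model to catch in review.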
For development teams, this is meaningful capability. Catching SQL injection in a code review before deployment is better than finding it during a penetration test — or after an attacker finds it first. The shift-left argument is sound.
The Attack Side: What the FortiGate Campaign Actually Looked Like
The FortiGate campaign documented by Amazon's CISO is instructive precisely because it represents the current state of sophisticated attack tradecraft — and it has almost nothing to do with code vulnerabilities that a static analysis tool would catch.
The attacker's methodology, as described by CJ Moses:
- Initial access via known credentials: Not a zero-day. Credential reuse, weak MFA configurations, and service account mismanagement.
- Infrastructure persistence: The attacker moved laterally across network infrastructure over five weeks. Detection depended on behavioural anomaly analysis, not signature matching.
- Target scope: 600 devices, 55 countries — network infrastructure, not application code. FortiGate is a firewall appliance; there is no application codebase for a code scanner to review.
- Exploit used: The campaign leveraged known vulnerabilities in FortiOS — CVEs that had published patches. The gap was patch velocity, not code quality.
The lesson here is important: the majority of significant enterprise compromises in 2025–2026 are exploiting gaps that code scanning doesn't touch — infrastructure misconfiguration, credential exposure, patch latency, network segmentation failures, and authentication chain weaknesses.
AI code scanning would not have detected or prevented this campaign. The attack surface was infrastructure, not application code.
Where AI Code Scanning Adds Real Value
To be clear: Claude Code Security addresses a real problem, and for certain organisations it addresses the right problem.
The strongest use cases:
- SaaS companies with large, active codebases — where injection vulnerabilities and insecure dependencies are persistent risks and shift-left scanning directly reduces remediation cost
- Teams running continuous deployment pipelines — where security gates in CI/CD provide meaningful friction against insecure code shipping to production
- Organisations with limited security headcount — where AI-augmented code review compensates for the gap between developer output and security review capacity
- Compliance-driven environments — where demonstrating code-level security controls (SOC 2, PCI DSS, ISO 27001) requires documented scanning processes
For these use cases, an AI code scanning product that catches injection flaws and flags hardcoded secrets before they reach production is a genuine security improvement.
The affiliate reading list that makes sense here: The Web Application Hacker's Handbook and Hacking: The Art of Exploitation provide the adversarial perspective that helps security teams understand what attackers are actually looking for — and what AI models are being trained to detect. Black Hat Python covers the tool-building side.
Where AI Code Scanning Stops
The honest accounting of what AI code scanning doesn't cover is where the conversation gets important for security teams making tool decisions.
Business logic vulnerabilities: An AI model reviewing code can identify injection patterns. It cannot reliably identify that the business logic for a financial transaction allows an attacker to manipulate the order of operations to transfer funds they don't own. Business logic flaws require an attacker's mental model, not pattern matching.
Authentication and authorisation chains: Whether a user who authenticates through endpoint A can access resource B without proper authorisation checks is an architectural question. Static analysis can flag missing access controls at a code level, but the complex web of how microservices verify identity in a real system requires active testing against the running application.
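The authorisation gap can be made concrete with a toy sketch (the document store and function names are hypothetical). The broken version contains no injection, no unsafe sink, and nothing a pattern matcher would flag — the flaw exists only relative to a business rule the code never states:

```python
# Toy document store. Which user owns which document is application
# knowledge that a pattern-matching scanner has no way to infer.
DOCUMENTS = {
    101: {"owner": "alice", "body": "alice's notes"},
    102: {"owner": "bob", "body": "bob's salary data"},
}

def get_document_broken(session_user: str, doc_id: int) -> str:
    # Syntactically clean code, yet any authenticated user can read any
    # document -- a broken object-level authorisation (IDOR/BOLA) flaw.
    return DOCUMENTS[doc_id]["body"]

def get_document_fixed(session_user: str, doc_id: int) -> str:
    doc = DOCUMENTS[doc_id]
    # The fix is an ownership check derived from business rules,
    # not from any code-level vulnerability signature.
    if doc["owner"] != session_user:
        raise PermissionError("not your document")
    return doc["body"]
```

Finding the broken version requires knowing that documents belong to users — attacker intent applied to the running system, which is precisely what active testing supplies.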
Infrastructure and network exposure: The FortiGate campaign exploited network infrastructure. Claude Code Security reviews code. These are different layers. A codebase with zero injection vulnerabilities can still be breached via misconfigured network segmentation, exposed admin interfaces, or credential reuse across infrastructure accounts.
Runtime and out-of-band behaviour: What a system does at runtime — how it interacts with external services, how it responds to malformed inputs in production, what happens to error states in a live environment — is only partially visible in source code. Active testing reveals runtime behaviour that static analysis cannot predict.
Novel and chained techniques: Advanced attackers chain multiple low-severity findings into a high-impact attack path. Individually, each step might look benign to a code scanner. The chain is only visible to a tester who is actively exploring the system with an attacker's intent.
The Right Model: Layered Security, Not Either/Or
The productive framing isn't "AI code scanning vs. penetration testing." It's understanding which layer each tool covers and whether your organisation has coverage across all the layers that matter.
Layer 1 — Code-level scanning (shift-left): Tools like Claude Code Security, Semgrep, and Snyk catch known vulnerability patterns in development. They reduce the volume of issues that reach production and provide developers with actionable feedback in their workflow. Essential for any team shipping code at scale.
Layer 2 — Dynamic application security testing (DAST): Tools like Burp Suite Pro test the running application — finding vulnerabilities that are only visible when the application is executing. Authentication bypasses, session management flaws, and server-side request forgery are often invisible in static code but discoverable through active probing.
Layer 3 — Active penetration testing: Full-scope assessment by a tester with attacker intent. Network reconnaissance, credential testing, infrastructure enumeration, chained attack paths, business logic abuse, and post-exploitation analysis. The only layer that answers: "If a motivated attacker targeted us specifically, what would they find?"
The FortiGate campaign lived entirely in Layer 3 territory. Claude Code Security operates in Layer 1. Both are valid. Neither substitutes for the other.
The Credential Problem: Hardware MFA Is Still the Fastest Win
The FortiGate campaign ultimately succeeded because of credential weakness — not code vulnerabilities. If there's a single actionable takeaway from that campaign, it's that hardware MFA on critical infrastructure dramatically raises the cost of credential-based initial access.
A YubiKey 5C NFC or YubiKey 5 NFC on every privileged account — network admins, cloud infrastructure, VPN gateways — is not a complete security programme. But it is the highest-ROI single control against the credential-based initial access that dominates the current threat landscape. The attacker who spent five weeks in 600 FortiGate devices would have hit a significantly harder wall on day one.
What Active Testing Looks Like in 2026
The gap that matters for most security teams isn't between AI code scanning and no code scanning. It's between having Layer 1 coverage (code scanning) and having Layer 3 coverage (active penetration testing against the full environment).
Most organisations with mature development practices have some form of code scanning. Far fewer have regular, full-scope active penetration testing that covers their network, infrastructure, authentication systems, and application layer together — because historically that's required expensive consultants or a team of specialists with a collection of disconnected tools.
SecurityClaw is built to change that equation: a unified active penetration testing platform with 56+ integrated security skills spanning reconnaissance, web application testing, exploitation, and reporting. The goal is making full-scope Layer 3 assessment accessible to security teams that currently only have Layer 1 and Layer 2 coverage.
Claude Code Security is a meaningful product. It addresses a real problem. It doesn't address the FortiGate problem — and that's the fight most enterprise security teams are currently losing.
Explore SecurityClaw →
FAQ
Does Claude Code Security replace penetration testing?
No. Claude Code Security is a code-level static analysis tool that catches vulnerability patterns in source code during development. Penetration testing actively probes running systems, tests authentication and business logic, and finds infrastructure vulnerabilities that are invisible to code scanning. They operate at different layers and address different threat models.
What vulnerabilities does Claude Code Security catch?
Claude Code Security is designed to identify known vulnerability patterns (injection flaws, XSS, insecure dependencies), trace data flow through codebases to identify unsafe handling of untrusted input, and flag hardcoded credentials and secrets. It operates on source code before deployment.
What did the FortiGate campaign exploit?
The campaign documented by Amazon CISO CJ Moses exploited credential weaknesses, weak MFA configurations, and patch latency across network infrastructure in 55 countries over five weeks. The attack surface was infrastructure, not application code — it would not have been prevented by code scanning of any kind.
Is AI code scanning worth it?
For teams shipping code at scale, yes — shift-left scanning that catches injection vulnerabilities and secrets before they reach production is valuable and cost-effective. The ROI question is whether your current threat model is dominated by code-level vulnerabilities or infrastructure/credential exposure. Most organisations need coverage at both layers.
What is the difference between static and dynamic security testing?
Static analysis (SAST) reviews source code without executing it. It catches code-level patterns but cannot observe runtime behaviour. Dynamic testing (DAST) and penetration testing operate against running applications and infrastructure, revealing vulnerabilities that are only visible when the system is actually executing.
When did Anthropic launch Claude Code Security?
Anthropic launched Claude Code Security on February 21, 2026, as a limited research preview for Enterprise and Team customers. The announcement was covered by The Hacker News and other security publications.
How does hardware MFA protect against credential-based attacks?
Hardware security keys (YubiKey and similar) implement phishing-resistant FIDO2/WebAuthn authentication. Unlike TOTP codes or SMS, hardware MFA cannot be phished, replayed, or bypassed via attacker-in-the-middle techniques. For privileged accounts on network infrastructure, hardware MFA dramatically raises the cost of credential-based initial access.