Why AI security audits now belong in every release

Laptop on a stone wall in a vineyard, a code diff with warm highlighting, morning light.

What Claude reveals about the state of your codebase, and why your existing pen-test cadence isn't built for it.

Eighteen months ago “AI in security audits” was a bullet on a roadmap. Today, almost every engineering organisation in the world has a model on hand that is better at finding vulnerabilities in an unfamiliar codebase than most mid-tier pen-testers, in minutes rather than days. Claude is the most visible example, but far from the only one. The question is no longer whether your organisation's threat model has shifted. The question is how quickly you adapt your processes accordingly.

The asymmetry that's currently dissolving

Security has always lived off an asymmetry in favour of attackers: an attacker has to find one gap, a defender has to find them all. What's changing now is the cost structure on the attacker side. What used to be a specialist researcher with weeks of review effort is now an API call for a few euros. Known classes such as unsafe deserialisation, SSRF through missing allowlists, auth bypasses via inconsistent middleware or race conditions in payment flows are being found at a speed manual reviews can't compete with.

The problem isn't that Claude can do this. The problem is that anyone can. Anyone running a reverse-engineering session or a supply-chain audit on your open-source dependencies today needs neither specialist knowledge nor infrastructure. The barrier to entry is “credit card”.

Why the classic release cycle no longer fits

Most organisations we speak with run an external pen-test once a year, perhaps an internal one every six months, and SAST and DAST run in CI in between. That was a sensible compromise as long as attacker effort and defender effort stayed in roughly the same order of magnitude. That compromise is breaking down now.

Three developments converge. First, release frequency is rising: teams deploy daily, many hourly, and every release is a potential new attack surface. Second, the time between a public code push and automated analysis by third parties is effectively zero; attackers don't wait for your next annual pen-test. Third, classic SAST finds known patterns, while AI-assisted analysis also finds logic flaws, exactly the class for which pen-tests were previously expensive.

If your adversary finds in minutes what your own pipeline might detect in six months, you don't have a tooling problem. You have a cadence problem.

What “AI audit per release” concretely means

The recommendation isn't to cut the external pen-test budget; classic pen-tests remain valuable, especially for threat modelling and creative attack paths a model wouldn't construct in the same form. The recommendation is to close the gap between “SAST runs in CI” and “a pen-test team comes in once a year”.

An AI-assisted audit per release typically has four components:

  • Diff-based code analysis. The model receives the release delta plus the relevant context and reviews for gaps that a human reviewer might miss in a PR review, in particular auth, authorisation and input validation issues at new endpoints.
  • Dependency and configuration review. New dependencies, changed IAM policies, changed CORS or CSP headers, changed feature-flag defaults. That's the class of changes where supply-chain and misconfiguration attacks arise and which often slip through a classic code review.
  • Attack-path simulation. The model is explicitly instructed to think from the attacker's perspective: what would an informed external actor try first against this release? Which assumptions could be broken?
  • Regression on known findings. Findings from earlier audits are checked against every release to prevent reintroduction. That's the step many organisations underestimate.
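
The four components above can be run as a single audit pass per release. Here is a minimal sketch in Python; the function name, prompt wording and findings schema are illustrative assumptions, not a prescribed API:

```python
import json

def build_release_audit_prompt(diff: str, dep_changes: str,
                               prior_findings: list[dict]) -> str:
    """Assemble one prompt covering all four audit components.

    Illustrative sketch: adapt the wording and the findings schema to
    your model and your tracker; nothing here is a fixed interface.
    """
    sections = [
        "You are auditing a single release from an attacker's perspective.",
        # 1. Diff-based code analysis
        "1) Review this diff for auth, authorisation and input-validation "
        "gaps at new or changed endpoints:\n" + diff,
        # 2. Dependency and configuration review
        "2) Review these dependency and configuration changes (new "
        "dependencies, IAM policies, CORS/CSP headers, feature-flag "
        "defaults):\n" + dep_changes,
        # 3. Attack-path simulation
        "3) What would an informed external actor try first against this "
        "release? Which assumptions could be broken?",
        # 4. Regression on known findings
        "4) Confirm none of these earlier findings has been reintroduced:\n"
        + json.dumps(prior_findings, indent=2),
        "Return findings as JSON objects with severity, file, rationale "
        "and remediation.",
    ]
    return "\n\n".join(sections)
```

The point is that all four components run in one pass against every release, not that this exact prompt wording is optimal.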

Importantly, the result has to land triaged in your issue tracker, not in a PDF. An AI audit without an owner and an SLA is pure reassurance.

The three objections we hear most often

“That produces too many false positives.” True for naive setups. Quality stands and falls with the context you give the model: access to the full codebase, to test coverage, to architecture documentation and to earlier findings. A well-contextualised audit has a false-positive rate considerably below classic SAST. The effort shifts from “drawing signal from noise” to “maintaining context”, and that's the much more rewarding work.

“We don't send code to an external provider.” Legitimate objection. The answer is zero-retention contracts, enterprise tenants without training use, or, where data classification demands it, self-hosted models behind your own network boundary. The hyperscalers now have enterprise tiers that cover BaFin, ISO 27001 and HIPAA requirements. Anyone still claiming in 2026 that AI can't be used for compliance reasons usually means: we haven't adapted procurement yet.

“We have no budget for that.” Do the comparison: an external pen-test day costs in the low to mid four-figure range. An AI-assisted audit per release, at realistic token usage and a typical enterprise stack, costs less per release than lunch for the team. The problem isn't budget. The problem is that the pipeline the audit needs to be embedded in doesn't yet exist, and building that pipeline takes engineering time, not licence budget.
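
To make the lunch comparison concrete, a back-of-the-envelope calculation. Both the per-token prices and the token counts below are illustrative assumptions, not quoted rates:

```python
# Back-of-the-envelope cost of one AI-assisted release audit.
# Prices and token counts are assumptions for illustration only;
# actual rates vary by provider and change over time.
PRICE_IN_PER_MTOK = 3.00    # assumed USD per million input tokens
PRICE_OUT_PER_MTOK = 15.00  # assumed USD per million output tokens

def audit_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single audit call at the assumed prices."""
    return (input_tokens / 1_000_000) * PRICE_IN_PER_MTOK \
         + (output_tokens / 1_000_000) * PRICE_OUT_PER_MTOK

# A generous release: 150k tokens of diff plus context in,
# 10k tokens of findings out -- roughly 60 cents.
cost = audit_cost(150_000, 10_000)
```

Even at ten times these numbers the per-release cost stays in single-digit euros, which is the comparison the paragraph above makes.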

What you can do this week

If, as a CTO or CISO, you want to take exactly one step: run one of your last three releases manually through an AI audit. Use Claude, a comparable frontier model or a self-hosted equivalent. Give the model the diff plus enough context and ask for a security review from an attacker's perspective.

The result will take one of two forms. Either you find nothing your existing processes wouldn't already have found, in which case you have a comparatively cheap confirmation of your current posture. Or you find something that has been in production for weeks.

Our experience over the past few months points to the second variant. And it points to the fact that attackers using the same tools are already exploiting that head start.

The question to put to your team this week is therefore not “should we think about AI audits?” It is: “why aren't they running on every release already?”

A look at your audit cadence before the next release goes out.

Let's talk about your audit cadence

If you want to know how to embed an AI audit into your existing pipeline, without a procurement marathon, without cancelling pen-test contracts, without compliance risk, a sober conversation is worth having. 30 minutes, no pitch. We look at your current release and audit cadence and show you where the fastest lever is.

Book a slot directly

Frequently asked questions

What we get asked most often about AI security audits — answered openly.

Which model do you specifically recommend for an audit like this?

For most setups we start with Claude Sonnet 4.6 or a comparable frontier model, because the ratio of reasoning depth, context length and cost is hard to beat at the moment. For sensitive codebases we move to enterprise tiers with zero retention or to self-hosted open-weight models — the choice depends more on your data classification than on technical preferences.

How do we integrate this into our existing CI/CD?

As an additional pipeline stage after the build and before the deploy gate. The job pulls the diff plus context, calls the model, parses the structured result and creates findings as issues. For critical findings it blocks the deployment. We typically wire this up within a few days into any common CI environment — GitHub Actions, GitLab CI, Azure DevOps, Jenkins. No platform change needed.
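
A minimal sketch of that gate stage's core logic in Python. The findings format, the issue payload and the blocking policy are assumptions; adapt them to your model's output and your tracker's API (network calls are deliberately left out):

```python
import json

BLOCKING_SEVERITIES = {"critical"}  # assumed policy: only criticals block

def gate(findings_json: str) -> tuple[list[dict], bool]:
    """Turn model output into issue payloads and a deploy decision.

    Expects a JSON array of findings with severity, file, rationale
    and remediation fields (an assumed schema, not a standard).
    """
    findings = json.loads(findings_json)
    issues = [
        {
            "title": f"[{f['severity']}] {f['file']}: {f['rationale'][:80]}",
            "body": f["remediation"],
            "labels": ["ai-audit", f["severity"]],
        }
        for f in findings
    ]
    block = any(f["severity"] in BLOCKING_SEVERITIES for f in findings)
    # In CI: create the issues via your tracker's API, then exit
    # non-zero when block is True so the deploy gate stops the release.
    return issues, block
```

The only design decision that matters here is that the gate is deterministic: the model proposes, but a fixed severity policy decides what blocks.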

Who interprets the findings: us or you?

Both. The model delivers findings in structured form with severity, rationale and a remediation suggestion. Your team makes the final triage and decides what blocks, what gets a deadline, and what is closed as accepted risk. We can accompany the triage in the early phase so your team can calibrate the model's evaluation logic — after that it runs on its own.

How quickly do we see results if we start tomorrow?

First results often the same day, if you simply run a manual audit on one recent release. A productive CI integration with issue routing and triage workflow typically takes us two to three weeks. So you almost always see the first productive cycle within the current sprint.

Does this replace the annual external pen-test?

No, it complements it. AI audits cover the high-frequency, code-near layer: every release, every diff. External pen-tests stay valuable for threat modelling, creative attack paths and independent validation, in other words everything that needs human persistence and contextual knowledge. The combination is significantly stronger than either format alone.