Pwn2Own Berlin 2026 — 47 zero-days, and AI tooling takes centre stage for the first time

18 May 2026. Pwn2Own Berlin 2026 wrapped up on Saturday with 47 verified zero-days and 1.298 million US dollars in awards. For the first time, AI tooling sat at the centre of the competition: LiteLLM, OpenAI Codex, Anthropic Claude Code, Cursor, LM Studio, Chroma and Ollama all fell, several of them more than once. The 90-day disclosure window against vendors is now running.

Image: straight-on top view of a matte dark slate slab. A walnut-handled torque wrench lies diagonally across the frame, its brushed steel head pointing to the upper left, the handle trailing off to the lower right. At the joint between handle and head sits a small oxblood-red calibration disc; next to it, a single oxblood-red drop of oil catches the light. Bottom left, an open cream-coloured inspection log with three pencil-shorthand entries and blued calibration marks, the third mark slightly out of line; top right, a brass-mounted magnifier whose lens tilts toward the wrench head. Cool studio key light from the upper left, warm rim light from the lower right, slate-grey backdrop with negative space on the right. AI-generated · gpt-image 2.0

What happened

The Zero Day Initiative closed out the three-day Pwn2Own Berlin 2026 (14–16 May, at OffensiveCon) with 47 verified zero-days and 1,298,250 US dollars in awards; DEVCORE was crowned Master of Pwn with 505,000 US dollars. For the first time, the themed categories landed explicitly on AI: AI Databases, Coding Agents, Local Inference and NVIDIA components. OpenAI Codex fell four times, Anthropic Claude Code three times, Cursor twice. In the Local Inference category, successful chains hit LiteLLM (among them SSRF plus code injection), LM Studio and Ollama; in the AI Databases category, Oracle Autonomous AI Database and Chroma. The affected vendors have 90 days to remediate before the details are published.

Why it matters

Until now, AI tools were a side category at Pwn2Own. With Berlin, the LLM inference layer steps onto the same stage as browsers and hypervisors — and it does not hold up to the comparison. Three observations matter more than the prize tally. First, most of the AI-specific findings were classic web bugs (SSRF, path traversal, code injection, unsafe defaults) in the services around the model, not weaknesses in the model itself. Second, several submissions on LiteLLM and Claude Code collided: different teams independently found the same flaw. That signals structural weakness, not isolated bugs. Third, the successful coding-agent exploits hit precisely the class of tooling that has moved onto every developer’s workstation over the past twelve months.

What it means for the German Mittelstand

If your company has rolled out Claude Code, Cursor or Codex on developer laptops or in CI/CD pipelines in recent months, you are now running tools with demonstrated zero-days, even though the specific bugs remain under the 90-day embargo. The tools are no longer “new and unproven”; they are “new and provably exploitable.” The threat model of the development environment must include this layer.

For LiteLLM and similar LLM gateways, the finding is sharper still. If you operate a proxy as the central bridge between internal applications and external model APIs (OpenAI, Anthropic, Mistral), that component sits in the data path through which customer data, trade secrets and API keys flow. As the SSRF plus code injection chain against LiteLLM showed, an SSRF flaw at that point can be chained into code execution. That brings GDPR Article 32 (technical and organisational measures) into play and, for NIS-2-regulated companies, the reporting obligation for significant security incidents under § 32 of the new BSIG. The question “is our LLM gateway accurately mapped in the data-protection impact assessment?” belongs on the desk of your data protection officer now, not after the details are published.
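To make the SSRF risk at a gateway concrete, here is a minimal Python sketch of an outbound-URL check. It is not LiteLLM's code and says nothing about the embargoed bugs; the allowlist `ALLOWED_HOSTS` and the function names are assumptions chosen for illustration. The idea is that a gateway should only ever call a fixed set of upstream hosts, over HTTPS, and only if those hosts resolve to public addresses.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

# Hypothetical allowlist: the only upstream hosts this gateway should reach.
ALLOWED_HOSTS = {"api.openai.com", "api.anthropic.com", "api.mistral.ai"}


def is_public_ip(address: str) -> bool:
    """Return True only for globally routable addresses."""
    return ipaddress.ip_address(address).is_global


def is_allowed_upstream(url: str, resolve=socket.getaddrinfo) -> bool:
    """Accept a URL only if it is HTTPS, allowlisted, and resolves to public IPs."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname not in ALLOWED_HOSTS:
        return False
    try:
        infos = resolve(parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError):
        # Unresolvable host or malformed port: refuse rather than guess.
        return False
    # Every resolved address must be public; this rejects DNS answers that
    # point at loopback, link-local (cloud metadata) or private ranges.
    return bool(infos) and all(is_public_ip(info[4][0]) for info in infos)
```

A check like this only narrows the attack surface. Because the name is resolved once for the check and again for the actual request, it should be paired with network-level egress rules rather than treated as a complete defence.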

Local inference stacks such as LM Studio or Ollama, often introduced in the Mittelstand as the GDPR-clean alternative to the cloud, deserve the same reflex: local is not automatically safe. The advantages remain (no third-country transfer, no token leakage to US providers), but the inference software itself belongs in internal patch management and vulnerability monitoring run by the IT department.
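One simple check that fits into vulnerability monitoring is whether a local inference service listens only on the loopback interface. The sketch below is a minimal illustration in Python; the `services` mapping is a hypothetical inventory, not a statement about the actual default settings of LM Studio or Ollama.

```python
import ipaddress


def is_loopback_only(bind_host: str) -> bool:
    """Return True if the given bind host is reachable from this machine only."""
    if bind_host == "localhost":
        return True
    try:
        return ipaddress.ip_address(bind_host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a hostname): treat as exposed until verified.
        return False


# Hypothetical inventory of local inference services and their bind hosts.
services = {"ollama": "127.0.0.1", "lm-studio": "0.0.0.0"}
exposed = sorted(name for name, host in services.items() if not is_loopback_only(host))
```

Anything that ends up in `exposed` is reachable from other machines on the network and deserves the same scrutiny as any other internal server.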

What it means for technical development

The Berlin results mark a maturity transition: AI tooling becomes auditable like any other production software. The conversation shifts from model security (jailbreaks, prompt injection, alignment) to stack security: HTTP endpoints, tool-calling routes, MCP servers and vector databases. This is precisely the layer at the centre of the MCP and A2A standardisation work that has gathered pace under the Linux Foundation's Agentic AI Foundation. The findings are the empirical argument for signed agent cards, sandboxed tool execution and capability-based permissions. All three are already on the table in the standards work; the disclosure clock now adds pressure.

Architecturally, the security perimeter moves inward. The coding agent runs with read and write access to the repository, has network access to internal APIs, and executes model-suggested code. Compass Security demonstrated in Berlin that this combination carries real, not theoretical, risk. Anyone planning agent-based pipelines must now extend the classic DevSecOps tooling (SAST, DAST, SBOM, container hardening) to the agent portion.
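What capability-based permissions for a coding agent could look like can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the design of any named product or of the MCP or A2A specifications: the `Capabilities` class and both check functions are hypothetical, and a policy check like this complements a real sandbox rather than replacing it.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Capabilities:
    """What one agent session may do; everything else is denied."""
    writable_roots: tuple[str, ...] = ()
    allowed_commands: frozenset[str] = frozenset()
    network: bool = False


def may_write(caps: Capabilities, path: str) -> bool:
    """Allow writes only inside a granted root; reject '..' segments outright."""
    target = PurePosixPath(path)
    if ".." in target.parts:
        return False
    return any(target.is_relative_to(root) for root in caps.writable_roots)


def may_run(caps: Capabilities, argv: list[str]) -> bool:
    """Allow a command only if its executable name is explicitly granted."""
    return bool(argv) and argv[0] in caps.allowed_commands


# Hypothetical session: the agent may edit sources and run the test suite,
# but has no network access and cannot touch CI configuration.
caps = Capabilities(writable_roots=("/repo/src",), allowed_commands=frozenset({"pytest"}))
```

The design choice is deny-by-default: a new session starts with no writable paths, no commands and no network, and every grant is explicit and reviewable.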

Concrete recommendation

Inventory, in this order. First, list which coding agents (Claude Code, Codex, Cursor, Copilot) run on developer devices and in CI/CD pipelines, and with which permissions. Second, identify every LLM gateway in the data path and check whether personal data or trade secrets flow through it. Third, make sure these components are entered in internal patch management and in the NIS-2 incident reporting path. Fourth, subscribe to the ZDI advisories for the 90-day windows on the affected products, so you can patch on the publication day rather than learn about it from the press. Architectural steps such as sandboxed tool execution or capability restrictions only make sense after this inventory.
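The inventory itself can start as a very small data structure. The Python sketch below shows one possible shape; the field names, the example entries and the ordering rule (sensitive data first, then the number of open gaps) are assumptions for illustration, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    kind: str                     # "coding-agent", "llm-gateway" or "local-inference"
    location: str                 # e.g. "developer-laptop" or "ci"
    handles_sensitive_data: bool  # personal data or trade secrets flow through it
    in_patch_management: bool
    in_incident_path: bool        # covered by the NIS-2 incident reporting path


def open_gaps(component: Component) -> list[str]:
    """List the processes this component is not yet covered by."""
    gaps = []
    if not component.in_patch_management:
        gaps.append("patch management")
    if not component.in_incident_path:
        gaps.append("incident reporting path")
    return gaps


def review_order(inventory: list[Component]) -> list[Component]:
    """Sort so that sensitive components with the most gaps come first."""
    return sorted(
        inventory,
        key=lambda c: (not c.handles_sensitive_data, -len(open_gaps(c)), c.name),
    )


# Hypothetical entries; replace with what actually runs in your environment.
inventory = [
    Component("cursor", "coding-agent", "developer-laptop", False, True, True),
    Component("litellm-proxy", "llm-gateway", "internal", True, False, False),
    Component("claude-code", "coding-agent", "ci", True, True, False),
]
```

Calling `review_order(inventory)` on these entries puts the gateway first, because it handles sensitive data and is missing from both processes.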

This article reflects our technical and strategic assessment. It does not replace legal advice or a data-protection impact assessment.

About the author

Kim Hartwig

CEO · Moselwal Digitalagentur

Kim is responsible for day-to-day operations and advises our clients on strategy. Her background in computational linguistics combines an understanding of communication with technical know-how.

LinkedIn · kontakt@moselwal.de