vm2 reopened: a dozen sandbox escapes at once — what it means for your AI-agent stacks


On 7 May 2026, around a dozen new critical sandbox escapes were published in a single batch advisory for the Node library vm2. Several entries carry a CVSS score of 9.8 or 10.0 — “critical, no qualifications.” The library was already considered dead in the summer of 2023, was resurrected by its original author in October 2025 and rewritten in TypeScript. This week's findings show: the structural problems aren't gone. They're just packaged differently.

Short version for decision-makers

vm2 is a JavaScript sandbox meant to execute untrusted code in an isolated context, with whitelisting of Node built-ins. That task became suddenly very popular in the AI-agent era: anyone letting an agent generate and immediately run code — in a Flowise workflow, in LangChain.js tools, in custom MCP servers — needs a protection layer between model and host. vm2 was the easy answer for years: an npm package, a wrapper, done. The layer has one property that doesn't appear on any marketing diagram: it lives in the same V8 process as the calling application. If a piece of crafted code escapes the sandbox, it runs in the host application's context — with its tokens, its filesystem rights, its network access.

The current situation

This week's block is broad. A sample of the most severe entries:

- CVE-2026-43997 — code injection with access to the host Object, CVSS 10.0: full sandbox escape, arbitrary code execution on the host.
- CVE-2026-44009 — sandbox escape via null-proto exception, CVSS 9.8.
- CVE-2026-24118 — escape via __lookupGetter__, CVSS 9.8.
- CVE-2026-24781 — escape via the inspect function, CVSS 9.8.
- CVE-2026-26332 — escape via SuppressedError, CVSS 9.8.
- CVE-2026-26956 — Symbol-to-string coercion with TypeError bypass, CVSS 9.8.

The official recommendation: update to vm2 ≥ 3.11.0. Anyone who can't roll that out promptly should reconsider whether vm2 is the right tool for their use case at all.

Why the usual assumption doesn't hold here

vm2 sits in an unsolvable dilemma: the V8 engine itself isn't built for multi-tenant code execution within the same process. Every new ECMAScript feature, every new special method, every new coercion path can become a bypass. The game is endless. The maintainer himself has acknowledged this publicly.

First — anyone running vm2 as a protection layer between LLM output and host system doesn't have a sandbox, they have a delaying strategy. The next gap is coming; the only question is whether the patch path in your own pipeline is faster than the exploit.

Second — in agent frameworks, vm2 is often configured as a “silent fallback.” Exactly what's unobtrusive in normal operation becomes the entry door in the security case. Anyone running Flowise or comparable stacks should check today exactly where vm2 sits in the code path — and whether that was even known.

Third — in build pipelines where vm2 is a transitive dependency (via third-party packages for template engines, plugin systems, dynamic configurations), the open CVE list alone is enough to trigger alarms from audit tools of every kind. Even when your own code has nothing to do with vm2, it blocks the next compliance report.

What we concretely recommend

The honest answer: get away from in-process sandboxes for agent code. Four paths follow, from structural fixes down to a time-limited stopgap.

Option A — containerisation with hard capability drop. Agent code runs in a Docker or Podman container with --cap-drop=ALL, no network, no bind mounts to production paths. That's immune to V8 bypasses because it no longer uses V8 as a trust boundary. Cost: a few hundred milliseconds per call, irrelevant in most agent scenarios.

Option B — Firecracker or Kata microVMs. For stacks where multiple tenants share the same frontend, the step toward hardware virtualisation is worth it. The Wolfi-OS-container plus Firecracker line has proven itself in our own setups — minimal attack surface, clear lifecycle management.

Option C — isolated-vm as a bridge. If you can't get out of the Node process short-term, at least migrate to isolated-vm. The library uses V8's native Isolate API directly and reduces the attack surface considerably — it isn't fully safe either, but it largely closes the class of coercion attacks.
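A minimal isolated-vm sketch: each Isolate gets its own V8 heap and microtask queue, and nothing from the host realm is reachable unless explicitly copied in. The snippet is guarded so it still loads where the isolated-vm package isn't installed:

```javascript
// Runs a snippet in a dedicated V8 isolate with a heap cap and a
// wall-clock timeout. Guarded: falls through to null where the
// isolated-vm package isn't available.
let result = null;
try {
  const ivm = require('isolated-vm');
  const isolate = new ivm.Isolate({ memoryLimit: 32 }); // MB heap cap
  const context = isolate.createContextSync();
  const script = isolate.compileScriptSync('6 * 7');
  result = script.runSync(context, { timeout: 100 });   // ms wall clock
  isolate.dispose();
} catch (err) {
  // isolated-vm not installed; treat this as a dry run of the sketch.
}
console.log(result);
```

The design point: there is no shared object graph to walk. The constructor-chain tricks that break in-process sandboxes have nothing host-side to reach, because primitives are copied across the boundary rather than referenced.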

Option D — Socket.dev's free certified patches as a time-limited bridge. If compatibility constraints with transitive dependencies block an immediate upgrade to vm2 ≥ 3.11.0, Socket.dev publishes free, curated patches for the current vm2 sandbox escapes: explicitly a transition measure, not a replacement for the structural migration. That matters mostly for stacks where vm2 arrives via third-party packages and a direct upgrade means breaking changes. The permanent answer remains Option A or B; the Socket patches are a controlled emergency brake, not an answer to the architecture question.

In no scenario is “keep using vm2, just patch faster” a serious answer. We've encountered that logic in several audits and have not seen it hold up in any of them.

What we deliberately don't recommend

We don't recommend a blanket obligation to put every agent function in its own VM. There are productive use cases — data classification, embedding generation, simple JSON transformations — where untrusted code never gets executed. There the sandbox question is moot. It doesn't pay to introduce complexity where the threat model doesn't carry it.

What we do consistently follow through: at every point where LLM output becomes generated, executed code — be it Python, JavaScript, shell, SQL — a process boundary belongs in between. One that doesn't come from the same engine vendor as the sandbox attempt. Conceptually that's comparable to the point we made about the container layer in CVE-2026-31431 “Copy Fail”: locate responsibility at the right layer.

Who is most affected

Three profiles from our consulting practice are acutely exposed today. Mid-market companies that have set up a Flowise stack for internal automation in the past twelve months — often started as a “small AI pilot,” now with access to internal documents, mailboxes, ERP data. The chain between model and host is often exactly one vm2 layer here.

SaaS providers offering customers “custom-logic” fields where end customers enter JavaScript snippets — typical example: calculation of commission rules, dynamic evaluations. vm2 is the classic default here. The sandbox sits directly between tenant and host database.

Developer teams writing tools in their own MCP servers where model output gets executed without that being explicit in the architecture diagram. We've seen multiple reviews in recent weeks where vm2 was hidden as a pure implementation-detail entry in package.json. Same structural theme we described in the Bitwarden CLI supply-chain risk — just one level deeper in the stack.

Conclusion

vm2 isn't a sandbox. It's a repeatedly patched claim to be one. Today's CVE batch isn't the last of its kind and isn't the worst hit. The question isn't when the next bypass will come. It's whether you'll experience the next bypass wave on an architecture that does without vm2 — or on one that schedules another patch sprint. Personal background and technical detail on the in-process sandbox debate: ole-hartwig.eu.

Frequently asked questions on the vm2 situation

We don't use vm2 directly — does this affect us?

Probably yes, transitively. vm2 is a popular dependency for template engines, plugin systems, headless CMS renderers, and custom-logic layers. Run npm ls vm2 in every production repository and in the dependency trees of your build containers. Anyone who sees a hit there has an audit topic, even if their own code never mentions vm2.

Isn't the update to vm2 3.11.0 enough?

It closes the cases published today. It doesn't solve the structural problem that V8 isn't built for multi-tenant code execution in the same process. The maintainer himself has publicly acknowledged that further bypasses will be found. Roll out the update — yes. Consider the architecture solved — no.

If you can't upgrade to 3.11.0 right now because of compatibility constraints with transitive dependencies, there's a bridge available since this week: Socket.dev publishes free certified patches for the current CVE wave. That's a controlled emergency brake for three to four weeks, not a substitute for the architecture decision.

Is isolated-vm safer than vm2?

Structurally yes, because isolated-vm uses V8's native Isolate API and therefore has its own heap and microtask queue per sandbox. The class of coercion and prototype-pollution bypasses that plague vm2 is largely excluded there. “Fully safe” isolated-vm isn't either — but for stacks that can't get out of the Node process short-term, it's the markedly better bridge.

How much effort is the migration to container sandboxes?

For typical agent workflows with short code snippets, one to three person-days. Each tool call gets a dedicated container with --cap-drop=ALL, no network, no bind mount to production paths. Latency cost: a few hundred milliseconds, irrelevant in most agent scenarios. The harder question is usually what state has to persist between calls — that's an architecture discussion, not a vm2 question.

We run Flowise in production. What do we do today?

Three steps. First: run npm ls vm2 in the Flowise stack and verify the effective version. Second: review every custom tool definition where code is dynamically executed — that's the spot where vm2 jumps in as a “silent fallback.” Third: move the Flowise container into its own trust boundary, without bind mounts to production data paths. That limits the immediate damage; the structural migration you can plan calmly.

What if we can't do this ourselves?

We do this as part of our DevSecOps as a Service package. You give us access to your Node dependency tree and your agent configurations, we identify in-process sandbox points, propose a migration path, and hand over an audit-grade report including before/after validation.

Before the next bypass arrives: let's talk about trust boundaries.

We audit your AI-agent stack for vm2 dependencies and replace sandboxed eval.

You give us read access to your AI-agent and workflow stacks — we audit transitive vm2 dependencies (including via LangChain, AutoGPT, BabyAGI, n8n), map sandboxed-eval use cases to isolated-vm, Deno subprocesses, or Wasm runtimes, and hand back an audit-ready migration plan for the next sprint wave.

This is the operational routine behind DevSecOps as a Service and the External IT Department — AI-agent hardening instead of a vm2-pin reflex on the next point release.

Talk to us