High

Semantic Kernel: when the prompt becomes the shell — what the Microsoft disclosure means for your agent architecture

[Header image: a brass typewriter key on cream paper, a tiny brushed-steel shell prompt protruding from its head instead of a letter; beside it a closed leather notebook and an open book with an oxblood ribbon bookmark, in cool northern light.]

On 7 May 2026 Microsoft disclosed two vulnerabilities in Semantic Kernel, the agent framework the company uses to plug its .NET and Python world into the idea of "tool-using LLMs". CVE-2026-25592 and CVE-2026-26030 turn a simple prompt injection into remote code execution — one sentence, one malicious input, and the separation between model output and operating system is gone.

What has changed? Two Semantic Kernel CVEs turn prompt injection into a direct code-execution path on the host. Who is affected? German Mittelstand companies with an internal Copilot equivalent on Semantic Kernel, consulting firms with custom Azure agents, SaaS vendors with Semantic Kernel as a requirements.txt entry. What should you check today? Version status, plugin inventory, sandbox boundary — in that order.

TL;DR — the 90-second summary

Affected?

Microsoft Semantic Kernel .NET before 1.71.0 (CVE-2026-25592, SessionsPythonPlugin) and Python before 1.39.4 (CVE-2026-26030, InMemoryVectorStore default filter). Indirectly: every German Mittelstand stack with Semantic Kernel in its agent architecture, often invisible as a requirements.txt entry.

Risk?

Prompt injection → remote code execution on the host. CVE-25592: file-write tool unintentionally exposed, no path validation. CVE-26030: default vector-store filter uses eval() on a user-controlled string.

Immediate action?

Check the version, run a plugin inventory (which kernel functions are model tools?), move the default InMemoryVectorStore to Azure AI Search or pgvector.

Recommendation?

German Mittelstand with a Semantic Kernel agent: patch + plugin audit. Enterprise: additionally introduce a sandbox boundary as a separate process (Wolfi OS, Firecracker, Azure Dynamic Sessions with separate identity).

Criticality?

High (see badge in the page header).

 

What is the problem?

On 7 May 2026 Microsoft published two vulnerabilities on its security blog in Semantic Kernel, the agent framework the company uses to plug its .NET and Python world into the idea of "tool-using LLMs". Both flaws — CVE-2026-25592 and CVE-2026-26030 — turn a simple prompt injection into remote code execution on the host. One sentence, one malicious input, and the separation between model output and operating system is gone.

This isn't a footnote. In the Microsoft world, Semantic Kernel is the natural on-ramp into agent architectures for many Mittelstand companies — shaped by Azure ties, .NET familiarity, and existing licenses. Anyone who opened a first agent against internal data in the last twelve months is, with high probability, running an affected version.

What Semantic Kernel is — and why the two flaws sit so uncomfortably

Semantic Kernel is Microsoft's answer to LangChain and LlamaIndex: an orchestration layer that bundles LLM calls, tool plugins, and memory backends into one framework. Developers define "kernel functions" the model is allowed to call — file helpers, database queries, searches over internal documents. That plugin layer is the break point.

CVE-2026-25592 hits the .NET line before version 1.71.0. In SessionsPythonPlugin — the component agents use to execute Python code in Azure Container Apps sandboxes — a DownloadFileAsync helper function was inadvertently exposed as a model-callable tool, without robust path validation. A malicious instruction in the prompt stream could get the model to write a file to an arbitrary path on the host, including paths that are then automatically executed.

CVE-2026-26030 hits the Python line before version 1.39.4. The condition is narrower but more common than it seems: anyone running the InMemoryVectorStore as the backend for a search plugin with the default filter behavior, and getting untrusted input into the agent through any path, has a problem. The default filter was built as a Python lambda and evaluated against the search string via eval(). A prompt controlling the search string controls the lambda body.
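
The vulnerable class is small enough to sketch in a dozen lines. The following is not Semantic Kernel's actual code — the function and variable names are illustrative — but it reproduces the pattern: a filter built by interpolating a user-controlled search string into an expression that is then handed to eval().

```python
# Minimal sketch of the vulnerable class -- NOT Semantic Kernel's actual
# code, just the pattern: a "filter" built by interpolating a
# user-controlled search string into an expression passed to eval().

def search(records, query):
    # Intended: keep records whose text contains the query.
    # Defect: the query string becomes part of executable code.
    predicate = eval(f"lambda r: '{query}' in r['text']")
    return [r for r in records if predicate(r)]

records = [{"text": "quarterly report"}, {"text": "meeting notes"}]

# Benign input behaves as intended:
print(search(records, "report"))  # [{'text': 'quarterly report'}]

# A crafted search string breaks out of the string literal and runs
# arbitrary Python in the agent process -- here only a visible side
# effect, but __import__('os') would work just as well:
hits = []
search(records, "' or hits.append('pwned') or '")
print(hits)  # ['pwned', 'pwned'] -- one execution per record
```

Any input that closes the string literal owns the lambda body — exactly the mechanism that turns a search query into a code path.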

Why "prompt injection isn't a big deal" doesn't hold here

In many architecture diagrams the separation is clean: the model thinks, the framework executes, the host stays untouched. That separation is a narrative — it only holds as long as every tool the model can call is restrictive and parameterized.

Both Semantic Kernel flaws show the same defect with different faces. In CVE-2026-25592 an internal helper inadvertently becomes a model tool. In CVE-2026-26030 a user-controlled string is run through eval(). Neither is a „dumb bug“ — both are direct consequences of plugin architectures exposing too much out of convenience.

In our advisory practice we see the pattern recur. A plugin gets built because a concrete use case needs it. Three weeks later the model calls that function in production in combinations the author never anticipated. That isn't „misuse“ — it's the definition of an agent.

What we concretely recommend

First: anyone running Semantic Kernel in production has three tasks today — check the version (Python ≥ 1.39.4, .NET SDK ≥ 1.71.0), run a plugin inventory (which kernel functions are registered as model tools?), and harden the vector-store filter (move from InMemoryVectorStore with default filter to durable stores like Azure AI Search or Postgres pgvector).

Second — and this is the structural question behind both CVEs: a sandbox that runs in the same process as the agent is never a real sandbox. We've been recommending the same line for months as in the vm2 post: code execution that originates from model output belongs in a separate process with a hard capability drop. Wolfi OS containers, Firecracker microVMs, Azure Container Apps Dynamic Sessions with separate identity.

Third — the uncomfortable truth for many boards: these flaws won't be the last of this build type. Every plugin system with dynamic tool selection and eval-like paths will produce them. The question isn't whether a comparable CVE will land next quarter — the question is whether your architecture survives it or whether it lands in production.

What we deliberately don't recommend

We don't recommend replacing Semantic Kernel out of reflex. The framework isn't "broken"; the two CVEs are addressed cleanly in the patches. Anyone who has chosen the Microsoft world — for Azure integration, the .NET stack, compliance ties — should keep using Semantic Kernel, but with the plugin discipline outlined here.

We equally don't recommend solving prompt injection with "better system prompts". Microsoft itself is explicit in the disclosure: prompt injection is a class, not a single bug. As long as a model processes untrusted input and can call tools, the defense isn't in the model — it's in the tool layer.

Who is most affected

German Mittelstand IT departments that have built an internal Copilot equivalent on Semantic Kernel in the last nine months — often with a "we want to stay independent of Microsoft Copilot" argument. These stacks typically have access to SharePoint, Exchange mailboxes, ERP data. The vulnerability chain between input and host is exactly the plugin layer here.

Consultancies and audit firms building bespoke agent solutions for clients on Azure, running them as long-running agents with a vector-store backend. If you copied the InMemoryVectorStore from the quickstart — the default configuration in most tutorials — you sit in the tighter risk circle of CVE-2026-26030.

SaaS vendors using Semantic Kernel as a library in their own agent products, often without it showing up explicitly in architecture diagrams. We've seen multiple reviews recently where Semantic Kernel was hiding as a pure implementation detail in requirements.txt.

Conclusion

Semantic Kernel isn't broken. But the two flaws of 7 May 2026 aren't coincidence. They're a direct consequence of the architectural assumption that model output and host code may exist in the same process as long as the framework filters well enough. That assumption holds in no agent framework, from any vendor.

The question isn't when the next comparable CVE will land in Semantic Kernel, LangChain, or LlamaIndex. It's whether you experience the next one on an architecture in which model output is separated from host code by a hard process boundary — or on one that schedules another patch sprint.

Personal context and technical detail on plugin discipline in agent frameworks: ole-hartwig.eu.

Who is affected?

Reach is driven by the installed base of Semantic Kernel — in production, in PoCs, as a hidden library dependency. Three profiles from our advisory practice are acute today:

Setup: Mittelstand IT with an internal Copilot equivalent on Semantic Kernel plus SharePoint/Exchange/ERP access
Main risk: SessionsPythonPlugin writes a file into an autostart path, or an eval lambda runs foreign code in the agent process
Typical downstream cost: Cross-tenant exfiltration from M365 mailboxes, ERP database access in the agent process context

Setup: Consultancies with Azure agents per client, long-running agents with a vector store
Main risk: InMemoryVectorStore taken from the quickstart default, eval lambda fires on every search
Typical downstream cost: One incident per client, reputational damage across the portfolio

Setup: SaaS vendors with Semantic Kernel as a hidden library dependency
Main risk: Listed in requirements.txt as semantic-kernel 1.x without a strict pin, so an old version keeps being pulled
Typical downstream cost: RCE in the SaaS product, impact across all SaaS customers at once

Setup: .NET stack with SessionsPythonPlugin and Azure Container Apps Dynamic Sessions
Main risk: File-write tool inadvertently in the model toolkit
Typical downstream cost: Container escape depending on sandbox configuration

Cutting across these: every stack that built its own plugin system against Semantic Kernel — with kernel functions performing path or eval operations. The two CVEs are specific to Semantic Kernel, but the architectural assumption behind them (model output and host code in the same process) hits LangChain, LlamaIndex, and homegrown frameworks too.

Mitigation and immediate actions

The short answer: harden the version, run a plugin inventory, move the vector-store filter away from the eval() path. Three steps with code:

Check and harden version

 

# Python
pip show semantic-kernel | grep Version
# Target: 1.39.4 or newer
pip install -U "semantic-kernel>=1.39.4"

# .NET
dotnet list package | grep Microsoft.SemanticKernel
# Target: 1.71.0 or newer
dotnet add package Microsoft.SemanticKernel --version 1.71.0

 

Plugin inventory — which kernel functions are model tools?

 

# Python: list all registered kernel functions
from semantic_kernel import Kernel
kernel = Kernel()
# ... import plugins ...
for plugin_name, plugin in kernel.plugins.items():
    for func_name, func in plugin.functions.items():
        print(f"{plugin_name}.{func_name}: {func.description}")
        # Check: does this function write files? Execute code?
        # If yes: check whether it's really needed as a model tool

 

Migrate the vector store away from the InMemory default

 

import os

# Instead of InMemoryVectorStore with the default filter:
# from semantic_kernel.connectors.memory.in_memory import InMemoryVectorStore  # no longer

# Move to Azure AI Search (connector module paths and parameter names
# can differ between semantic-kernel versions -- check your release):
from semantic_kernel.connectors.memory.azure_ai_search import AzureAISearchStore
store = AzureAISearchStore(
    search_endpoint="https://<your-search>.search.windows.net",
    api_key=os.environ["AZURE_SEARCH_KEY"],
)

# Or to Postgres pgvector:
from semantic_kernel.connectors.memory.postgres import PostgresStore
store = PostgresStore(
    connection_string=os.environ["PG_CONNSTR"],
)

 

Defense in depth: sandbox in a separate process

Code execution that originates from model output doesn't belong in the agent process. The robust separation runs via a separate process with a hard capability drop: Wolfi OS container with a non-root user and dropped capabilities, Firecracker microVM with a minimal kernel footprint, or Azure Container Apps Dynamic Sessions with a separate service identity (not the agent identity).

 

# Example pod spec for a Wolfi agent sandbox
apiVersion: v1
kind: Pod
metadata:
  name: semantic-kernel-sandbox
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: sandbox
      image: cgr.dev/chainguard/python:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]

Detection and verification

The first check when Semantic Kernel runs in your stack: find every installation, including the ones hiding in virtualenvs:

# Find Semantic Kernel installations in the stack
for venv in $(find /opt -name 'pyvenv.cfg' 2>/dev/null); do
    venv_dir=$(dirname "$venv")
    "$venv_dir/bin/pip" show semantic-kernel 2>/dev/null | grep '^Version'
done
grep -rEn 'semantic[-_]kernel' /opt /srv 2>/dev/null

Operator recommendation

The recommendation depends on the setup. Four scenarios, four answers:

German Mittelstand with Semantic Kernel agent

Version to 1.39.4 (Python) or 1.71.0 (.NET). Plugin inventory in two hours. Vector-store migration away from the InMemory default in the next sprint. If you copied from the quickstart: check whether InMemoryVectorStore is still in the code.

Enterprise with its own agent platform

Plugin allowlist discipline: each kernel function with a write or eval path needs its own architecture review before model exposure. Vector store on Azure AI Search or pgvector with real filter syntax instead of lambda eval. Sandbox boundary in a separate process for all code execution from model output.
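
That allowlist gate can be sketched in a few lines. The ToolSpec structure and all names below are illustrative, not Semantic Kernel API: every registered function is classified by action radius, and only read tools or explicitly reviewed entries reach the model.

```python
# Sketch of plugin allowlist discipline -- ToolSpec and all names are
# illustrative, not Semantic Kernel API. Every function is classified
# by action radius; only read tools or reviewed entries are exposed.

from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    radius: str     # "read" | "write" | "code-exec" | "network"
    reviewed: bool  # passed an architecture review for model exposure

REGISTERED = [
    ToolSpec("search_documents", "read", reviewed=True),
    ToolSpec("download_file", "write", reviewed=False),  # the CVE-2026-25592 pattern
    ToolSpec("run_python", "code-exec", reviewed=False),
]

def model_toolkit(tools):
    """Expose only read tools and explicitly reviewed entries;
    convenience helpers with write or exec radius stay internal."""
    return [t.name for t in tools if t.radius == "read" or t.reviewed]

print(model_toolkit(REGISTERED))  # ['search_documents']
```

The point of the gate is that exposure is an explicit decision per function, not a side effect of registration.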

Consultancies with client agents

One plugin audit report per client, with concrete version and configuration state. Standard procedure: remove quickstart defaults from the code, vector-store migration into the client sprint, client notice with a documented sandbox boundary.

SaaS vendors with Semantic Kernel as a library

Patch in the next release wave. Plus an explicit customer notice, because downstream customers may hold their own pins. Add to the SBOM report with a clear version statement.

What we actually did

After the Microsoft disclosure on 7 May we ran our own stack and all client stacks with agent architecture through a check, following the same pattern as for vm2 and Comment-and-Control.

This routine is the operational practice behind DevSecOps as a Service and the External IT Department. Methodically, Semantic Kernel sits in the same fabric as Comment-and-Control and vm2: AI-agent architecture is a workflow question, not a model question.

Technical deep dive

Both CVEs sit in Semantic Kernel's plugin architecture — the layer that bundles LLM calls, tool plugins, and memory backends into one framework. The structural break points are there:

CVE-2026-25592: SessionsPythonPlugin (.NET, before 1.71.0)

SessionsPythonPlugin is the component agents use to execute Python code in Azure Container Apps Dynamic Sessions. In the plugin, a DownloadFileAsync helper function was registered as a model tool — without robust path validation. A model with prompt injection in the input stream could call the tool and write a file to an arbitrary path on the host, including autostart or cron paths that subsequently execute automatically.

Microsoft patch in 1.71.0: DownloadFileAsync no longer exposed as a model tool, path validation enforced against an allowed working-directory prefix.
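
The patched plugin is .NET; the following is a hedged Python sketch of the same technique, not Microsoft's implementation: resolve the requested path, then refuse anything whose resolved form escapes the allowed prefix.

```python
# Sketch of prefix-based path validation -- the technique the patch
# applies, not Microsoft's actual .NET code. ALLOWED_ROOT is an
# illustrative sandbox prefix.

from pathlib import Path

ALLOWED_ROOT = Path("/workspace/session")

def safe_target(requested: str) -> Path:
    """Resolve the requested path and refuse anything that escapes
    the allowed prefix (../ traversal, absolute paths, symlinks
    normalized away by resolve())."""
    target = (ALLOWED_ROOT / requested).resolve()
    if not target.is_relative_to(ALLOWED_ROOT.resolve()):
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target

print(safe_target("results/output.csv"))  # /workspace/session/results/output.csv

try:
    safe_target("../../etc/cron.d/job")
except PermissionError as e:
    print(e)  # path escapes sandbox: ../../etc/cron.d/job
```

Validation after resolve() is the load-bearing detail: checking the raw string for ".." is exactly the kind of filter that traversal payloads route around.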

CVE-2026-26030: InMemoryVectorStore default filter (Python, before 1.39.4)

InMemoryVectorStore is the quickstart default — an in-memory backend for vector search without external dependency. The default filter for search queries was built as a Python lambda and evaluated against the search string via eval(). A prompt controlling the search string controls the lambda body and executes arbitrary Python code in the agent process.

Microsoft patch in 1.39.4: default filter switched to a parameterized predicate model, eval() removed.
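
The shape of that fix generalizes: the filter becomes data — field, operator, value — matched against a closed set of operations, so no string is ever interpreted as code. A sketch under those assumptions (not the actual post-patch API):

```python
# Sketch of a parameterized predicate model -- the shape of the fix,
# not the actual post-1.39.4 API. The filter is data (field, op, value)
# matched against a closed set of operations; nothing is eval()'d.

import operator

OPS = {
    "eq": operator.eq,
    "contains": lambda haystack, needle: needle in haystack,
}

def apply_filter(records, field, op, value):
    if op not in OPS:
        raise ValueError(f"unsupported operator: {op}")
    pred = OPS[op]
    return [r for r in records if pred(r[field], value)]

records = [{"text": "quarterly report"}, {"text": "meeting notes"}]

# The user-controlled value is only ever compared, never executed:
print(apply_filter(records, "text", "contains", "report"))
# [{'text': 'quarterly report'}]

# An injection payload is now just an odd search term with no matches:
print(apply_filter(records, "text", "contains", "' or __import__('os') or '"))
# []
```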


Frequently asked questions on Semantic Kernel CVE-2026-25592 / 26030

We don't use Semantic Kernel — does CVE-2026-25592 / 26030 affect us anyway?

Directly, no; structurally, often yes. The two CVEs are an object lesson for a class: plugin architectures in which the model may call tools with too large an action radius. Anyone running LangChain, LlamaIndex, Haystack, or a homegrown agent framework should run the same check today — which functions are registered as model tools, what is each call allowed to do, and does code execution run in the same process as the agent?

Is the patch to Semantic Kernel 1.39.4 / 1.71.0 enough — or do we need more?

For the two concretely disclosed flaws, yes. For the class behind them, no. The structural question — may the model call functions that accept paths or interpret strings? — remains unanswered by the patch. We recommend applying the patch immediately and reducing the plugin list to the essentials.

We run the InMemoryVectorStore from the quickstart — how do we migrate to Azure AI Search or pgvector?

Patch immediately to semantic-kernel ≥ 1.39.4, then plan the migration to a durable vector store: Azure AI Search for the Microsoft world, Postgres with pgvector for self-hosted setups, Qdrant for multi-tenant setups. The default filters of these products are not built with eval(). The InMemoryVectorStore remains useful for local tests, not for production with untrusted input.

Is an Azure Container Apps Dynamic Session a safe enough sandbox for model-output code?

For code execution from model output, yes — provided the session has its own identity, no bind mounts to production paths, and no wide-reaching network rules. That was exactly the design idea behind the SessionsPythonPlugin. CVE-2026-25592 shows that the additional helper functions around the session must be carefully curated: not every convenience method belongs in the model's toolkit.

How much effort is a Semantic Kernel plugin inventory — what does the audit cost?

For a manageable stack (1–2 agents, 5–10 plugins), typically half a day to a day. We pull the registered kernel functions from the code, classify them by action radius (read, write, code-exec, network), flag model tools close to path or eval operations, and propose a migration path. The value is usually not the audit report itself, but that afterwards it is clear what the agent actually does.

Before the next plugin bypass lands: let's talk about your tool layer.

We audit your Semantic Kernel stack against CVE-2026-25592/26030 and plugin discipline.

You give us read access to your Semantic Kernel code — we audit version state (Python 1.39.4 / .NET 1.71.0), plugin inventory (which kernel functions are model tools with write or eval paths), vector-store path (InMemoryVectorStore default yes or no), sandbox boundary, and hand back an audit-ready report with a concrete migration path away from in-process eval.

This is the operational routine behind DevSecOps as a Service and the External IT Department — agent-architecture hardening as a workflow discipline, not a reaction to the next patch.

Book an appointment directly

Conclusion

Semantic Kernel isn't broken. But the two flaws of 7 May 2026 aren't coincidence — they're a direct consequence of the architectural assumption that model output and host code may exist in the same process as long as the framework filters well enough. That assumption doesn't hold in any agent framework.

What matters more operationally than the individual CVEs is the pattern behind them: every plugin system with dynamic tool selection and eval-like paths produces this class of flaw. Anyone with plugin allowlist discipline, vector-store migration away from in-process eval, and a sandbox boundary as a separate process answers the next comparable disclosure in hours, not in a patch sprint.

Realistic risk framing: high for Mittelstand companies with an internal Copilot equivalent on Semantic Kernel and productive model-tool access. Medium for stacks with a clear plugin inventory and a vector store on a durable backend. Low for setups that consistently push code execution from model output into a separate process. The question isn't when the next comparable CVE in Semantic Kernel, LangChain, or LlamaIndex will appear. It's whether you experience the next one on an architecture in which model output is separated from host code by a hard process boundary.