Semantic Kernel: when the prompt becomes the shell — what the Microsoft disclosure means for your agent architecture

On 7 May 2026 Microsoft disclosed two vulnerabilities in Semantic Kernel, the agent framework the company uses to plug its .NET and Python world into the idea of "tool-using LLMs". CVE-2026-25592 and CVE-2026-26030 turn a simple prompt injection into remote code execution — one sentence, one malicious input, and the separation between model output and operating system is gone.
What has changed? Two Semantic Kernel CVEs turn prompt injection into a direct code-execution path on the host. Who is affected? German Mittelstand companies with an internal Copilot equivalent on Semantic Kernel, consulting firms with custom Azure agents, SaaS vendors with Semantic Kernel as a requirements.txt entry. What should you read today? Version status, plugin inventory, sandbox boundary — in that order.
TL;DR — the 90-second summary
- Affected? Microsoft Semantic Kernel .NET before 1.71.0 (CVE-2026-25592, SessionsPythonPlugin) and Python before 1.39.4 (CVE-2026-26030, InMemoryVectorStore default filter). Indirectly: every German Mittelstand stack with Semantic Kernel in its agent architecture, often invisible as a requirements.txt entry.
- Risk? Prompt injection → remote code execution on the host. CVE-2026-25592: file-write tool unintentionally exposed, no path validation. CVE-2026-26030: default vector-store filter uses eval() on a user-controlled string.
- Immediate action? Check the version, run a plugin inventory (which kernel functions are model tools?), move the default InMemoryVectorStore to Azure AI Search or pgvector.
- Recommendation? German Mittelstand with a Semantic Kernel agent: patch + plugin audit. Enterprise: additionally introduce a sandbox boundary as a separate process (Wolfi OS, Firecracker, Azure Dynamic Sessions with separate identity).
- Criticality? High.
What is the problem?
On 7 May 2026 Microsoft published two vulnerabilities in Semantic Kernel on its security blog — the agent framework the company uses to plug its .NET and Python world into the idea of "tool-using LLMs". Both flaws — CVE-2026-25592 and CVE-2026-26030 — turn a simple prompt injection into remote code execution on the host. One sentence, one malicious input, and the separation between model output and operating system is gone.
This isn't a footnote. In the Microsoft world, Semantic Kernel is the natural on-ramp into agent architectures for many Mittelstand companies — shaped by Azure ties, .NET familiarity, and existing licenses. Anyone who opened a first agent against internal data in the last twelve months is, with high probability, running an affected version.
What Semantic Kernel is — and why the two flaws sit so uncomfortably
Semantic Kernel is Microsoft's answer to LangChain and LlamaIndex: an orchestration layer that bundles LLM calls, tool plugins, and memory backends into one framework. Developers define "kernel functions" the model is allowed to call — file helpers, database queries, searches over internal documents. That plugin layer is the break point.
CVE-2026-25592 hits the .NET line before version 1.71.0. In SessionsPythonPlugin — the component agents use to execute Python code in Azure Container Apps sandboxes — a DownloadFileAsync helper function was inadvertently exposed as a model-callable tool, without robust path validation. A malicious instruction in the prompt stream could get the model to write a file to an arbitrary path on the host, including paths that are then automatically executed.
CVE-2026-26030 hits the Python line before version 1.39.4. The condition is narrower but more common than it seems: anyone running the InMemoryVectorStore as the backend for a search plugin with the default filter behavior, and getting untrusted input into the agent through any path, has a problem. The default filter was built as a Python lambda and evaluated against the search string via eval(). A prompt controlling the search string controls the lambda body.
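The eval() pattern is easy to underestimate, so here is a self-contained sketch of the flaw class (illustrative code, not Semantic Kernel's actual internals): when a filter is built by interpolating a user-controlled search string into eval(), the search string is executable Python in the agent process.

```python
hits = []  # sentinel: records whether injected code actually ran

def unsafe_filter(search_string: str):
    # The flaw class: string interpolation straight into eval()
    return eval(f"lambda doc: '{search_string}' in doc['text']")

# A benign query behaves as expected ...
f = unsafe_filter("invoice")
print(f({"text": "invoice Q3"}))   # True

# ... but a crafted query breaks out of the string literal and runs code:
payload = "x' in doc['text'] or hits.append('injected') or 'x"
unsafe_filter(payload)({"text": "quarterly report"})
print(hits)                        # ['injected'] — the injection executed
```

The injected expression here only appends to a list; in the real vulnerability it runs with the full permissions of the agent process.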
Why „prompt injection isn't a big deal“ doesn't hold here
In many architecture diagrams the separation is clean: the model thinks, the framework executes, the host stays untouched. That separation is a narrative — it only holds as long as every tool the model can call is restrictive and parameterized.
Both Semantic Kernel flaws show the same defect with different faces. In CVE-2026-25592 an internal helper inadvertently becomes a model tool. In CVE-2026-26030 a user-controlled string is run through eval(). Neither is a "dumb bug" — both are direct consequences of plugin architectures exposing too much out of convenience.
In our advisory practice we see the pattern recur. A plugin gets built because a concrete use case needs it. Three weeks later the model calls that function in production in combinations the author never anticipated. That isn't "misuse" — it's the definition of an agent.
What we concretely recommend
First: anyone running Semantic Kernel in production has three tasks today — check the version (Python ≥ 1.39.4, .NET SDK ≥ 1.71.0), run a plugin inventory (which kernel functions are registered as model tools?), and harden the vector-store filter (move from InMemoryVectorStore with default filter to durable stores like Azure AI Search or Postgres pgvector).
Second — and this is the structural question behind both CVEs: a sandbox that runs in the same process as the agent is never a real sandbox. We've been recommending the same line for months as in the vm2 post: code execution that originates from model output belongs in a separate process with a hard capability drop. Wolfi OS containers, Firecracker microVMs, Azure Container Apps Dynamic Sessions with separate identity.
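The principle fits in a few lines of Python. This sketch shows only the process boundary itself (a production setup layers a container or microVM on top, as named above); the function name and parameters are illustrative, not an API:

```python
import subprocess
import sys
import tempfile

def run_model_code(code: str, timeout: int = 5) -> str:
    """Run model-generated code in a separate OS process, never in the agent
    process. Minimum viable separation: own process, empty environment,
    throwaway working directory, hard timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, no env/site leakage
            capture_output=True, text=True, timeout=timeout,
            cwd=workdir, env={},
        )
        return result.stdout

print(run_model_code("print(21 * 2)").strip())  # 42
```

Even if the model-supplied code misbehaves, it crashes or times out in its own process; the capability drop (non-root, seccomp, dropped capabilities) then belongs to the container layer around it.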
Third — the uncomfortable truth for many boards: these flaws won't be the last of this build type. Every plugin system with dynamic tool selection and eval-like paths will produce them. The question isn't whether a comparable CVE will land next quarter — the question is whether your architecture survives it or whether it lands in production.
What we deliberately don't recommend
We don't recommend replacing Semantic Kernel out of reflex. The framework isn't "broken"; the two CVEs are addressed cleanly in the patches. Anyone who has chosen the Microsoft world — for Azure integration, the .NET stack, compliance ties — should keep using Semantic Kernel, but with the plugin discipline outlined here.
We equally don't recommend solving prompt injection with "better system prompts". Microsoft itself states it plainly in the disclosure: prompt injection is a class, not a single bug. As long as a model processes untrusted input and can call tools, the defense isn't in the model — it's in the tool layer.
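What "defense in the tool layer" looks like in code: the same capability, exposed two ways. A sketch with illustrative names, using sqlite3 only to make it self-contained and runnable:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE invoices (id INTEGER, total REAL)")
db.execute("INSERT INTO invoices VALUES (1, 99.5)")

# Too much action radius: a model-controlled string reaches the SQL engine.
def query_tool(sql: str):
    return db.execute(sql).fetchall()      # injection-equivalent by design

# Restrictive and parameterized: the model supplies a value, never syntax.
def lookup_invoice(invoice_id: int):
    return db.execute(
        "SELECT total FROM invoices WHERE id = ?", (int(invoice_id),)
    ).fetchall()

print(lookup_invoice(1))  # [(99.5,)]
```

No system prompt is involved in the second variant: whatever the model writes, the tool layer only ever passes a typed parameter to a fixed statement.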
Who is most affected
German Mittelstand IT departments that have built an internal Copilot equivalent on Semantic Kernel in the last nine months — often as a "we want to stay independent of Microsoft Copilot" argument. These stacks typically have access to SharePoint, Exchange mailboxes, ERP data. The vulnerability chain between input and host is exactly the plugin layer here.
Consultancies and audit firms building bespoke agent solutions for clients on Azure, running them as long-running agents with a vector-store backend. If you copied the InMemoryVectorStore from the quickstart — the default configuration in most tutorials — you sit in the tighter risk circle of CVE-2026-26030.
SaaS vendors using Semantic Kernel as a library in their own agent products, often without it showing up explicitly in architecture diagrams. We've seen multiple reviews recently where Semantic Kernel was hiding as a pure implementation detail in requirements.txt.
Conclusion
Semantic Kernel isn't broken. But the two flaws of 7 May 2026 aren't coincidence. They're a direct consequence of the architectural assumption that model output and host code may exist in the same process as long as the framework filters well enough. That assumption doesn't hold — in no agent framework, from any vendor.
The question isn't when the next comparable CVE will land in Semantic Kernel, LangChain, or LlamaIndex. It's whether you experience the next one on an architecture in which model output is separated from host code by a hard process boundary — or on one that schedules another patch sprint.
Personal context and technical detail on plugin discipline in agent frameworks: ole-hartwig.eu.
Who is affected?
Reach is driven by the installed base of Semantic Kernel — in production, in PoCs, as a hidden library dependency. Three profiles from our advisory practice are acute today:
| Setup | Main risk | Typical downstream cost |
|---|---|---|
| Mittelstand IT with an internal Copilot equivalent on Semantic Kernel + SharePoint/Exchange/ERP access | SessionsPythonPlugin writes a file into an autostart path, or an eval lambda runs foreign code in the agent process | Cross-tenant exfiltration from M365 mailboxes, ERP database access in the agent process context |
| Consultancies with Azure agents per client, long-running agents with a vector store | InMemoryVectorStore taken from quickstart default, eval lambda fires on every search | One incident per client, reputational damage across the portfolio |
| SaaS vendors with Semantic Kernel as a hidden library dependency | In requirements.txt as semantic-kernel==1.x.x without pin, automatic pull of an old version | RCE in the SaaS product, impact across all SaaS customers at once |
| .NET stack with SessionsPythonPlugin and Azure Container Apps Dynamic Sessions | File-write tool inadvertently in the model toolkit | Container escape depending on sandbox configuration |
Cutting across these: every stack that built its own plugin system against Semantic Kernel — with kernel functions performing path or eval operations. The two concrete CVEs hit one Microsoft framework, but the architectural assumption behind them (model output and host code in the same process) hits LangChain, LlamaIndex, and homegrown frameworks too.
Mitigation and immediate actions
The short answer: harden the version, run a plugin inventory, move the vector-store filter away from the eval() path. Three steps with code:
Check and harden version
```shell
# Python
pip show semantic-kernel | grep Version
# Target: 1.39.4 or newer
pip install -U "semantic-kernel>=1.39.4"

# .NET
dotnet list package | grep Microsoft.SemanticKernel
# Target: 1.71.0 or newer
dotnet add package Microsoft.SemanticKernel --version 1.71.0
```
Plugin inventory — which kernel functions are model tools?
```python
# Python: list all registered kernel functions
from semantic_kernel import Kernel

kernel = Kernel()
# ... import plugins ...
for plugin_name, plugin in kernel.plugins.items():
    for func_name, func in plugin.functions.items():
        print(f"{plugin_name}.{func_name}: {func.description}")
        # Check: does this function write files? Execute code?
        # If yes: check whether it's really needed as a model tool
```
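A heuristic pass can pre-sort the inventory for review: flag functions whose name or description suggests write or eval behavior. The marker list and example names below are illustrative assumptions, not an official check:

```python
# Heuristic pre-sort for the plugin review (illustrative, not exhaustive):
RISKY_MARKERS = ("write", "download", "upload", "exec", "eval", "delete", "path")

def flag_risky(functions):
    """functions: {name: description} of model-exposed kernel functions.
    Returns the names that deserve a closer look before model exposure."""
    return sorted(
        name for name, desc in functions.items()
        if any(marker in f"{name} {desc}".lower() for marker in RISKY_MARKERS)
    )

inventory = {
    "SearchDocs": "semantic search over the wiki",
    "DownloadFileAsync": "write a file to the session host",
}
print(flag_risky(inventory))  # ['DownloadFileAsync']
```

A hit is not a verdict — it marks the functions where the architecture review starts.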
Migrate the vector store away from the InMemory default
```python
import os

# Instead of InMemoryVectorStore with default filter:
from semantic_kernel.connectors.memory.in_memory import InMemoryVectorStore  # NO LONGER

# Move to Azure AI Search:
from semantic_kernel.connectors.memory.azure_ai_search import AzureAISearchStore

store = AzureAISearchStore(
    search_endpoint="https://<your-search>.search.windows.net",
    api_key=os.environ["AZURE_SEARCH_KEY"],
)

# Or to Postgres pgvector:
from semantic_kernel.connectors.memory.postgres import PostgresStore

store = PostgresStore(
    connection_string=os.environ["PG_CONNSTR"],
)
```
Defense in depth: sandbox in a separate process
Code execution that originates from model output doesn't belong in the agent process. The robust separation runs via a separate process with a hard capability drop: Wolfi OS container with a non-root user and dropped capabilities, Firecracker microVM with a minimal kernel footprint, or Azure Container Apps Dynamic Sessions with a separate service identity (not the agent identity).
```yaml
# Example pod spec for a Wolfi agent sandbox
apiVersion: v1
kind: Pod
metadata:
  name: semantic-kernel-sandbox
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 65534
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: sandbox
      image: cgr.dev/chainguard/python:latest
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
```
Detection and verification
Five core questions when Semantic Kernel runs in your stack:
- Which version is active? pip show semantic-kernel or dotnet list package. Before 1.39.4 / 1.71.0: patch immediately.
- Which kernel functions are model tools? Walk the plugin list, flag functions with write or eval paths.
- Which vector store? InMemoryVectorStore with default filter → eval path open → migrate.
- Which input sources? Customer chat, wiki, email — each can carry prompt injection.
- Is code execution in the agent process? If yes: introduce a sandbox boundary.
```shell
# Find Semantic Kernel installations in the stack
for venv in $(find /opt -name 'pyvenv.cfg' 2>/dev/null); do
  venv_dir=$(dirname "$venv")
  "$venv_dir/bin/pip" show semantic-kernel 2>/dev/null | grep '^Version'
done
grep -rEn 'semantic[-_]kernel' /opt /srv 2>/dev/null
```
Operator recommendation
The recommendation depends on the setup. Four scenarios, four answers — with an operational decision grid upfront:
Decision grid: when to mitigate now, when to wait for a maintenance window?
- Mitigate immediately if the agent processes untrusted input (customer chat, wiki, email) and can call tools with write or eval paths.
- Maintenance window acceptable if the agent only runs in a purely internal test environment and the plugin set is limited to read-only functions.
- InMemoryVectorStore in production? Always immediately — the default filter is eval() on a user-controlled string.
- SaaS vendor with library dependency? Patch in the next release plus a customer notice, because downstream customers may be pinned.
German Mittelstand with Semantic Kernel agent
Version to 1.39.4 (Python) or 1.71.0 (.NET). Plugin inventory in two hours. Vector-store migration away from the InMemory default in the next sprint. If you copied from the quickstart: check whether InMemoryVectorStore is still in the code.
Enterprise with its own agent platform
Plugin allowlist discipline: each kernel function with a write or eval path needs its own architecture review before model exposure. Vector store on Azure AI Search or pgvector with real filter syntax instead of lambda eval. Sandbox boundary in a separate process for all code execution from model output.
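Allowlist discipline can be enforced mechanically at registration time. A sketch with hypothetical function names; the registry shape is illustrative, not the Semantic Kernel API:

```python
# Only functions that passed an architecture review become model tools;
# everything else stays internal to the application.
REVIEWED_MODEL_TOOLS = {"search_docs", "lookup_invoice"}  # review outcome

def register_model_tools(all_functions: dict, allowlist: set) -> dict:
    exposed = {name: fn for name, fn in all_functions.items() if name in allowlist}
    hidden = sorted(set(all_functions) - allowlist)
    if hidden:
        print(f"kept internal (not model-callable): {hidden}")
    return exposed

tools = register_model_tools(
    {"search_docs": object(), "write_file": object(), "lookup_invoice": object()},
    REVIEWED_MODEL_TOOLS,
)
print(sorted(tools))  # ['lookup_invoice', 'search_docs']
```

The point is the default direction: a function is internal until someone explicitly promotes it, rather than model-callable until someone notices.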
Consultancies with client agents
One plugin audit report per client, with concrete version and configuration state. Standard procedure: remove quickstart defaults from the code, vector-store migration into the client sprint, client notice with a documented sandbox boundary.
SaaS vendors with Semantic Kernel as a library
Patch in the next release wave. Plus an explicit customer notice, because downstream customers may hold their own pins. Add to the SBOM report with a clear version statement.
What we actually did
After the Microsoft disclosure on 7 May we ran our own stack and all client stacks with agent architecture through a check — the same pattern as vm2 and Comment-and-Control:
- Inventory on 7 May, evening. pip show semantic-kernel and dotnet list package across all client repositories. Result: 9 repos with an active Semantic Kernel dependency, 6 in Python and 3 in .NET. 4 below patch version.
- Plugin inventory on 8 May. Per repo we read the @kernel_function decorations. 2 repos with a file-write tool in the model toolkit, 1 repo with a SessionsPythonPlugin call. Marked as "patch this morning".
- Vector-store audit on 8 May. 3 repos with InMemoryVectorStore from the quickstart default. Migration to Azure AI Search or pgvector scheduled into the sprint.
- Patch wave on 8–9 May. All 4 pre-patch repos pulled to 1.39.4 / 1.71.0, tests run, deployed.
- Sandbox-boundary audit on 9 May. Two clients still run code execution from model output in the agent process. Marked as an architectural task for the next two sprints.
- What we deliberately didn't do. No Semantic Kernel replacement with LangChain or LlamaIndex — the architectural assumption (model output and host code in the same process) hits all three. No "system-prompt hardening", because OWASP LLM01 marks that as structurally unsolvable.
This routine is the operational practice behind DevSecOps as a Service and the External IT Department. Methodically, Semantic Kernel sits in the same fabric as Comment-and-Control and vm2: AI-agent architecture is a workflow question, not a model question.
Technical deep dive
Both CVEs sit in Semantic Kernel's plugin architecture — the layer that bundles LLM calls, tool plugins, and memory backends into one framework. The structural break points are there:
CVE-2026-25592: SessionsPythonPlugin (.NET, before 1.71.0)
SessionsPythonPlugin is the component agents use to execute Python code in Azure Container Apps Dynamic Sessions. In the plugin, a DownloadFileAsync helper function was registered as a model tool — without robust path validation. A model with prompt injection in the input stream could call the tool and write a file to an arbitrary path on the host, including autostart or cron paths that subsequently execute automatically.
Microsoft patch in 1.71.0: DownloadFileAsync no longer exposed as a model tool, path validation enforced against an allowed working-directory prefix.
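The enforced prefix check can be sketched in Python (an illustration of the principle, not Microsoft's patch code; the function name is hypothetical):

```python
import tempfile
from pathlib import Path

def safe_write(root: Path, relative_path: str, content: str) -> Path:
    """Resolve the model-supplied path and require the result to stay under
    the allowed root; resolve() collapses ../ tricks before the check."""
    target = (root / relative_path).resolve()
    if not target.is_relative_to(root.resolve()):
        raise PermissionError(f"path escapes allowed root: {relative_path}")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target

root = Path(tempfile.mkdtemp())
safe_write(root, "out/report.txt", "ok")                  # stays under root
try:
    safe_write(root, "../../etc/cron.d/evil", "payload")  # traversal attempt
except PermissionError as err:
    print(err)  # path escapes allowed root: ../../etc/cron.d/evil
```

The order matters: resolve first, compare against the resolved root second, and only then touch the filesystem. (Path.is_relative_to requires Python 3.9+.)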
CVE-2026-26030: InMemoryVectorStore default filter (Python, before 1.39.4)
InMemoryVectorStore is the quickstart default — an in-memory backend for vector search without external dependency. The default filter for search queries was built as a Python lambda and evaluated against the search string via eval(). A prompt controlling the search string controls the lambda body and executes arbitrary Python code in the agent process.
Microsoft patch in 1.39.4: default filter switched to a parameterized predicate model, eval() removed.
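A parameterized predicate model in miniature (a sketch of the principle, not the actual 1.39.4 implementation): the filter is data — field, operator, value — interpreted by a fixed dispatcher, so no user-controlled string ever reaches eval().

```python
import operator

# Fixed dispatcher: only these operators exist, nothing else can execute.
OPS = {"eq": operator.eq, "contains": lambda haystack, needle: needle in haystack}

def make_filter(field: str, op: str, value):
    compare = OPS[op]                 # unknown operators fail closed (KeyError)
    return lambda doc: compare(doc[field], value)

docs = [{"text": "invoice Q3"}, {"text": "meeting notes"}]
f = make_filter("text", "contains", "invoice")
print([d["text"] for d in docs if f(d)])  # ['invoice Q3']
```

A malicious "value" is just an inert string compared against document fields; there is no code path from it to the interpreter.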
Aspects for assessment
- OWASP LLM01 as the textbook case. Untrusted input + tool call = the class — documented state of the art since 2024.
- Plugin inconsistency in the default. Both flaws only hit if you adopt quickstart defaults. Anyone with their own plugin discipline isn't affected or is less so. That's the asymmetry between "copied from the tutorial" and "designed own architecture".
- Cross-vendor pattern. Microsoft documented the patch cleanly — LangChain and LlamaIndex have shown comparable architectural weaknesses in recent months. The structural class isn't Microsoft-specific.
- Sandbox assumption. A sandbox running in the same process as the agent isn't a sandbox. Real separation needs a separate process with a hard capability drop — Wolfi OS containers, Firecracker, Azure Container Apps Dynamic Sessions with separate identity.
Frequently asked questions on Semantic Kernel CVE-2026-25592 / 26030
We don't use Semantic Kernel — does CVE-2026-25592 / 26030 affect us anyway?
Directly, no; structurally, often yes. The two CVEs are a textbook case for a class: plugin architectures in which the model may call tools with too large an action radius. Anyone running LangChain, LlamaIndex, Haystack, or a homegrown agent framework should run the same check today — which functions are registered as model tools, what is each call allowed to do, and does code execution run in the same process as the agent?
Is patching to Semantic Kernel 1.39.4 / 1.71.0 enough — or do we need more?
For the two concretely disclosed flaws, yes. For the class behind them, no. The structural question — may the model call functions that accept paths or interpret strings? — remains unanswered by the patch. We recommend applying the patch immediately and reducing the plugin list to the necessary minimum.
We run the InMemoryVectorStore from the quickstart — how do we migrate to Azure AI Search or pgvector?
Patch immediately to semantic-kernel ≥ 1.39.4, then plan the migration to a durable vector store. Azure AI Search for the Microsoft world, Postgres with pgvector for self-hosted setups, Qdrant for multi-tenant setups. The default filters of these products are not built with eval(). The InMemoryVectorStore remains sensible for local tests, not for production with untrusted input.
Is an Azure Container Apps Dynamic Session a safe enough sandbox for model-output code?
For code execution from model output, yes — if the session has its own identity, no bind mounts onto production paths, and no broad network rules. That was exactly the design idea behind the SessionsPythonPlugin. CVE-2026-25592 shows that the additional helper functions around the session must be curated carefully — not every convenience method belongs in the model's toolkit.
How much effort is a Semantic Kernel plugin inventory — what does the audit cost?
For a manageable stack (1–2 agents, 5–10 plugins), typically half a day to a day. We pull the registered kernel functions from the code, classify them by action radius (read, write, code-exec, network), flag model tools with path or eval proximity, and propose a migration path. The value is usually not the audit report itself, but that afterwards it is clear what the agent actually does.
We audit your Semantic Kernel stack against CVE-2026-25592/26030 and plugin discipline.
You give us read access to your Semantic Kernel code — we audit version state (Python 1.39.4 / .NET 1.71.0), plugin inventory (which kernel functions are model tools with write or eval paths), vector-store path (InMemoryVectorStore default yes or no), sandbox boundary, and hand back an audit-ready report with a concrete migration path away from in-process eval.
This is the operational routine behind DevSecOps as a Service and the External IT Department — agent-architecture hardening as a workflow discipline, not a reaction to the next patch.
Conclusion
Semantic Kernel isn't broken. But the two flaws of 7 May 2026 aren't coincidence — they're a direct consequence of the architectural assumption that model output and host code may exist in the same process as long as the framework filters well enough. That assumption doesn't hold in any agent framework.
What matters more operationally than the individual CVEs is the pattern behind them: every plugin system with dynamic tool selection and eval-like paths produces this class of flaw. Anyone with plugin allowlist discipline, vector-store migration away from in-process eval, and a sandbox boundary as a separate process answers the next comparable disclosure in hours, not in a patch sprint.
Realistic risk framing: high for Mittelstand companies with an internal Copilot equivalent on Semantic Kernel and productive model-tool access. Medium for stacks with a clear plugin inventory and a vector store on a durable backend. Low for setups that consistently push code execution from model output into a separate process. The question isn't when the next comparable CVE in Semantic Kernel, LangChain, or LlamaIndex will appear. It's whether you experience the next one on an architecture in which model output is separated from host code by a hard process boundary.
