OLG Hamm 4 UKl 3/25 — what the AI chatbot liability ruling means for the architecture of your dialog system

18 May 2026. The Higher Regional Court of Hamm ruled on 12 May 2026 that a GmbH is fully liable for false statements made by its AI chatbot — careful programming with clean training data does not relieve liability (case no. 4 UKl 3/25, § 5 German Unfair Competition Act). The senate has admitted appeal to the Federal Court of Justice. From the platform side, the central consequence is not new but now clearly articulated: a freely generating LLM on your own website is a business act, not a third party, and requires an architecture that takes this into account.

[Image: a walnut-coloured letterpress type case with precisely sorted brass-coloured lead type sits centred on smooth slate; to its right, an empty composing stick with three scattered, unlabelled letters and an oxblood wax-sealed note hinting at a case number; in the upper right, a brass-mounted magnifying glass. The controlled language in the case sits next to unbound generation, in cool northern light. AI-generated · gpt-image 2.0]

TL;DR — 90 seconds

The ruling cements on the platform side what we have been saying in every architecture session since the AI Act trilogue in 2024: an unconstrained LLM on your own website is the most expensive form of customer dialog integration, because the legal tolerance for hallucinations is effectively zero. The response is not “no AI dialog”, but a controlled architecture.

| Question | Answer |
| --- | --- |
| Affected? | Any company that runs a generative LLM with external customer-facing reach on its own website: chatbots, voice assistants, AI search fields, automated customer service. Regardless of company size. |
| What does the ruling say? | An AI chatbot is legally part of the business organisation, not an independent third party (§ 5 UWG). False statements are attributable to the company, even if the training data was correct. The duty-of-care defence (Verkehrssicherungspflicht) does not apply. |
| What remains open? | The ruling is not yet legally binding; appeal to the Federal Court of Justice has been admitted. The UWG reasoning, however, is consistent with the trajectory of the AI Act (transparency, watermarking) and with GDPR jurisprudence on automated decisions. A complete reversal on appeal is unlikely. |
| Immediate platform-side step? | Architecture audit of your dialog system: does the system generate freely from the foundation model, or is it grounded in a verified corpus? Who checks statements about persons, titles, prices, availability? Is there an audit trail to reconstruct, after the fact, what the system said when and on what basis? |

What the ruling says — and what it explicitly does not

A private clinic in aesthetic medicine ran an AI chatbot on its website for appointment booking and patient enquiries. When asked about the qualifications of the two managing directors, the chatbot replied that they were specialists for plastic and aesthetic surgery, or specialists for aesthetic medicine — both designations that do not exist under the German medical training regulations. The North Rhine-Westphalia Consumer Centre sued for injunctive relief. The Higher Regional Court of Hamm upheld the claim.

The three load-bearing considerations of the senate, translated into platform language:

First, attribution. The chatbot is part of the business organisation of the defendant. The defence reflex often raised in the market (“the system is autonomous, we have no influence on it”) does not hold. Anyone who places an LLM system on their own website and lets it answer in their own external voice is acting commercially, even if the individual answer is generative and therefore not foreseeable.

Second, duty of care. Careful selection and preparation of training data does not relieve the operator of liability. This is the strategically most important statement of the ruling. The architecture debate often argues that a “model trained on correct data” is safe. The ruling explicitly rejects this logic: generative models produce false outputs even from correct input data, because hallucination is a property of the generation process, not a data error. A Verkehrssicherungspflicht defence, i.e. invoking “appropriate care during setup”, therefore does not apply.

Third, UWG anchor. The false statements constitute a prohibited commercial practice under § 5 para. 1, para. 2 no. 3 of the German Unfair Competition Act (misleading commercial practice regarding the person, qualifications or rights of the trader). This is a classical unfair competition anchor, not novel AI-specific territory: the senate applied existing competition law to AI outputs. This legally conservative construction is why a complete reversal is unlikely even though appeal to the BGH was admitted.

What the ruling explicitly does not say, and what is occasionally missed in market coverage: it does not prohibit operating AI chatbots. It prohibits a specific misleading statement. Anyone who architects a chatbot so that such statements cannot technically arise is not an addressee of the ruling. The platform side therefore has a clear task: constrain generative freedom where it can cause commercial harm, and tie the answer paths to verified sources.

The legal assessment in the individual case (who is liable for what, what duties apply concretely, how contracts with model providers are to be read) belongs in the hands of your lawyer or data protection officer. Here we discuss the implications for the platform and architecture side.

Who is affected on the platform side

The ruling addresses AI chatbots in customer or patient dialog. The platform implication reaches further, because the same mechanism applies to any generative answer layer with external reach. Below is a rough sorting of the dialog layers in the German Mittelstand, as we encounter them in our existing work.

| Architecture pattern | Hallucination exposure | Platform response |
| --- | --- | --- |
| Embedded off-the-shelf chatbot widget (e.g. foundation model directly in an iframe) | High: the chatbot answers freely from the model, no corpus binding | Architecture migration to a grounded RAG pipeline or tool-based architecture required |
| RAG pipeline with verified corpus, no access-class filtering | Medium: grounded answers, but risk of cross-hallucinations when the corpus is inconsistent | Corpus audit, add access-class layer, define escalation paths |
| RAG pipeline with access classes and handoff (e.g. Moselwal Business Agent) | Low: generative answers only from the verified corpus, hard locks on legally or safety-sensitive topics | Document the audit trail, maintain brand voice discipline |
| Tool-based web architecture (WebMCP): the agent calls structured tools, does not generate statements about the provider | Very low: the platform delivers structured tool returns, the generative part sits with the calling agent | Keep tool definitions clean, document the liability boundary against agent providers |
| Generative search / autocomplete in your own frontend, no disclaimer | Medium-high: appears to the user as a curated statement from the provider | Source display, disclosure line, escalation link |
| AI content generation in the CMS backoffice, editorial sign-off before publication | Low (hallucination risk sits with the editor, not the system), but watermarking duty from 02 December 2026 (AI Act Art. 50) | C2PA manifest embedding, disclosure markup, provenance audit |

Anyone still working in the top row (off-the-shelf widget in an iframe) and generating statements about their own company, persons, titles, prices, availability or appointments has the most obvious platform problem after the Hamm ruling. The remaining patterns are manageable, but each needs an audit stance.

Architectural implications

We have been addressing three architectural consequences in client work since 12 May.

First, generative answers need grounding. An LLM that answers freely from its foundation model is no longer tenable in customer-facing scenarios after the Hamm ruling. The architectural response is Retrieval-Augmented Generation (RAG): the model may only answer what is documented in the verified corpus. In the clinic case, this would have meant: if the person dataset contains no specialist titles, the system cannot invent any, because generative freedom is constrained to the corpus. This is not sales folklore; it is the technical answer to the senate’s duty-of-care reasoning.

Second, tools beat texts wherever possible. Anyone who maps a structured workflow (appointment booking, search, availability check, form submission) as a generative text dialog pays in hallucination risk for a layer they do not need. A tool-based architecture, in which the agent calls a defined API, receives a structured response and displays it in structured form, shifts the hallucination risk from the provider to the agent. This is precisely the path standardised by the Web Model Context Protocol (WebMCP): websites expose tools, browser agents call them, the platform operator is liable for the tool returns (deterministic), and the agent provider is liable for the generation (probabilistic). Clean cut, clear responsibilities.

Third, the audit trail is not optional. When a chatbot makes a statement that is later challenged, it must be possible to reconstruct what the system said, when, and on what data basis. After the Hamm ruling, this is not just a compliance reflex but an actual defence basis: anyone who can document that the corpus contained the correct information at the time of the answer and that the output was cleanly derived from it stands differently in a regulator conversation than someone who can only say “the system simply said that”. Content provenance with Ed25519 signing on the platform side, i.e. the auditable trail of what was in the system when, moves from an AI Act watermarking topic to a general duty-of-care topic.

Architectural responses at a glance

Four building blocks, stacked in precisely this order. We build them together in client platform work, but each makes sense on its own: anyone starting with layer 1 today has the basis for layers 2 to 4 tomorrow.

Layer 1: Corpus grounding (RAG)

The generative answer path draws from a verified corpus, not from the foundation model. In the clinic case, this corpus would have been the structured person database (name, position, actual qualifications), and nothing outside the corpus may be cited. On TYPO3 platforms, this corpus can be built from person records, service descriptions, FAQ datasets and structured content blocks. The Schema.org Person markup in the frontend is then also GEO-ready (Generative Engine Optimisation): the same structured data layer that grounds your own RAG helps external generative engines answer correctly about you.
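The grounding rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline described in this article: the `PersonRecord` schema, the `CORPUS` contents, the example qualification and the escalation text are all hypothetical, and a real system would retrieve from a CMS-backed store instead of an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    """One verified entry of the person corpus (hypothetical schema)."""
    name: str
    position: str
    qualification: Optional[str] = None  # None means: nothing verified on file


# Hypothetical verified corpus, keyed by a stable person id.
CORPUS = {
    "md-1": PersonRecord("A. Example", "Managing Director", "Specialist in Dermatology"),
    "md-2": PersonRecord("B. Example", "Managing Director"),  # qualification unknown
}

ESCALATION_TEXT = "I cannot confirm that. I am handing this over to our team."


def answer_qualification(person_id: str) -> dict:
    """Answer a qualification question strictly from the corpus.

    Returns the answer text plus the source it was derived from, so the
    statement can be traced back later. Nothing is generated freely.
    """
    record = CORPUS.get(person_id)
    if record is None or not record.qualification:
        # No verified fact on file: refuse instead of guessing.
        return {"grounded": False, "text": ESCALATION_TEXT, "source": None}
    text = f"{record.name} ({record.position}): {record.qualification}"
    return {"grounded": True, "text": text, "source": f"person:{person_id}"}
```

The design choice worth noting is that an empty field and an unknown person are treated the same way: both end in a refusal with no source, never in a generated guess.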

Layer 2: Access classes and lock lists

Some topics the dialog system must not answer, even from the corpus: prices under negotiation, legal advice, medical advice, individual conditions, sensitive personal data. An access-class layer defines per topic area whether the system may answer, must escalate, or pauses with a standard referral. This is the operational answer to the duty of care — the system makes no statement about specialist titles if the person dataset has the field empty, and instead escalates.
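One way to picture such an access-class layer is a small policy table with a deny-by-default lookup. The topic names, the `Access` enum and the `decide` function below are illustrative assumptions, not an existing API; the sketch only shows the three outcomes named above (answer, escalate, standard referral) and the empty-field rule.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    ANSWER = "answer"      # may answer from the verified corpus
    ESCALATE = "escalate"  # hand over to a human
    REFER = "refer"        # reply with a fixed standard referral


# Hypothetical policy table; topic names are illustrative.
POLICY = {
    "opening_hours": Access.ANSWER,
    "qualifications": Access.ANSWER,
    "negotiated_prices": Access.ESCALATE,
    "medical_advice": Access.REFER,
    "legal_advice": Access.REFER,
}


def decide(topic: str, corpus_value: Optional[str]) -> Access:
    """Return what the dialog system may do for a topic.

    Unknown topics default to ESCALATE (deny by default). A topic that is
    answerable in principle still escalates when the corpus field is empty.
    """
    access = POLICY.get(topic, Access.ESCALATE)
    if access is Access.ANSWER and not corpus_value:
        return Access.ESCALATE
    return access
```

For example, `decide("qualifications", None)` yields `Access.ESCALATE`, which corresponds to the clinic scenario: the topic is allowed, but the field is empty, so a human takes over.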

Layer 3: Tool-based architecture (WebMCP)

Structured workflows run through tools, not through generative texts. An appointment booking is a form tool, an availability check is a read tool, an address change is a mutation tool. The agent calls the tool, gets a structured answer and renders it; the generative part is reduced to answer style, the content is deterministic. This architecture is also viable when the AI landscape looks different in two years — tools are a platform standard, not an AI trend.
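The principle can be illustrated with a tiny tool registry in Python. This is explicitly not the WebMCP API, which lives in the browser; the tool names, the `FREE_SLOTS` data and the `call_tool` dispatcher are hypothetical and only show that tool returns are structured and deterministic while wording is left to the caller.

```python
from typing import Callable, Dict

# Hypothetical availability data; in practice this would be a booking backend.
FREE_SLOTS = {"2026-06-01": ["09:00", "11:30"], "2026-06-02": []}

TOOLS: Dict[str, Callable[..., dict]] = {}


def tool(name: str):
    """Register a function as a callable tool under a fixed name."""
    def register(func: Callable[..., dict]) -> Callable[..., dict]:
        TOOLS[name] = func
        return func
    return register


@tool("check_availability")
def check_availability(date: str) -> dict:
    """Read tool: returns structured data, never free text."""
    return {"date": date, "slots": list(FREE_SLOTS.get(date, []))}


@tool("book_appointment")
def book_appointment(date: str, time: str) -> dict:
    """Mutation tool: books a slot only if it is actually free."""
    slots = FREE_SLOTS.get(date, [])
    if time not in slots:
        return {"ok": False, "reason": "slot_not_available"}
    slots.remove(time)
    return {"ok": True, "date": date, "time": time}


def call_tool(name: str, **arguments) -> dict:
    """Dispatch a tool call; unknown tools are rejected, not improvised."""
    if name not in TOOLS:
        return {"ok": False, "reason": "unknown_tool"}
    return TOOLS[name](**arguments)
```

An agent calling `call_tool("book_appointment", date="2026-06-02", time="09:00")` receives a structured refusal instead of a plausible-sounding sentence, which is the point of moving workflows out of the generative layer.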

Layer 4: Audit trail and provenance

Every answer the system gives is logged with source reference, corpus state and timestamp. For sensitive topics, an Ed25519 signature is added as tamper evidence that can be presented in court or to a regulator. For AI-generated content with external reach, C2PA manifest embedding joins the picture, which is in any case required from 02 December 2026 under AI Act Article 50. Both lines (hallucination defence and watermarking duty) technically run on the same provenance backbone.
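A minimal sketch of such a log, using only the Python standard library, could look as follows. It is an assumption-laden illustration: the field names and the `AuditLog` class are hypothetical, and the SHA-256 hash chain shown here is a stand-in for tamper evidence, not the Ed25519 signing mentioned above. Signing each entry digest with Ed25519 would require a cryptography library and is deliberately left out.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(payload: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the payload."""
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


class AuditLog:
    """Append-only log in which every entry commits to its predecessor."""

    def __init__(self) -> None:
        self.entries: list = []

    def append(self, answer: str, source: str, corpus_version: str) -> dict:
        """Record one answer with its source, corpus state and UTC timestamp."""
        body = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "answer": answer,
            "source": source,
            "corpus_version": corpus_version,
            "prev": self.entries[-1]["digest"] if self.entries else None,
        }
        entry = {**body, "digest": _digest(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every digest and check the chain links."""
        prev = None
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "digest"}
            if body["prev"] != prev or _digest(body) != entry["digest"]:
                return False
            prev = entry["digest"]
        return True
```

Because each entry includes the digest of the previous one, changing an old answer after the fact makes `verify()` fail, which is what allows a later reconstruction of what the system said, when, and on which corpus state.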

Architecture check for your dialog system

A pragmatic first-line check you can walk through with your IT lead in one sitting.

  1. We operate an AI dialog system with external reach on our website. If yes: continue. If no: the ruling does not affect you acutely on the platform side, only preventively.
  2. Our system can generate statements about persons, titles, qualifications, prices, availability or appointments. If yes: the hallucination exposure is platform question number one. If no: continue to 3.
  3. Our system answers exclusively from a verified corpus (RAG) and not freely from the foundation model. If yes: layer 1 is in place. If no: this is the central architectural gap.
  4. We have defined lock lists for topics the system must not answer even from the corpus (prices under negotiation, medical advice, legal advice, individual conditions). If no: layer 2 is missing.
  5. Structured workflows run through tools, not through generative texts. If no: examine where a tool-based architecture could replace the generative layer.
  6. We can reconstruct after the fact what our system said at a specific point in time, on what corpus basis, with which model state. If no: the audit trail is missing — and with it the platform-side defence basis.

Anyone who can clearly answer questions 3 to 6 with “yes” has sorted the platform side substantially and brought the legal question into a manageable form. Anyone unsure on two or more of them has reached the point where an architecture audit makes concrete sense.
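The check above can also be written down as a small function that maps answers to missing layers. The sketch rests on one interpretation that is an assumption of this example: questions 1 and 2 are treated as screening questions, and only questions 3 to 6 are mapped to layers. The names `LAYER_BY_QUESTION` and `missing_layers` are hypothetical.

```python
# Questions 3 to 6 of the check, mapped to the layer each one probes.
LAYER_BY_QUESTION = {
    3: "Layer 1: corpus grounding (RAG)",
    4: "Layer 2: access classes and lock lists",
    5: "Layer 3: tool-based architecture",
    6: "Layer 4: audit trail and provenance",
}


def missing_layers(answers: dict) -> list:
    """Return the layers for which the answer is not a clear 'yes'.

    `answers` maps question numbers to True (yes), False (no) or
    None (unsure). Anything other than True counts as a gap.
    """
    if answers.get(1) is False:
        return []  # no external AI dialog system: nothing acute to check
    return [
        layer
        for question, layer in LAYER_BY_QUESTION.items()
        if answers.get(question) is not True
    ]
```

Treating “unsure” like “no” is a deliberate choice in this sketch: a layer nobody can confirm is handled as a gap until it has been checked.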

Platform recommendation

First, an if/then overview of the most common constellations.

Act now if you operate a generative chatbot widget with external reach on your website that can produce statements about persons, titles, prices or availability. This is the constellation the ruling directly addresses; the response is an architecture audit and migration to a grounded RAG pipeline.

Plan in the quarterly window if you operate a RAG pipeline without an access-class layer, or map structured workflows generatively rather than tool-based. The architecture is viable, but the gaps are visible.

Review on a weekly cadence if you operate a tool-based architecture with RAG fallback and access classes. Audit trail discipline, corpus hygiene and brand voice maintenance are the ongoing tasks.

Mittelstand with off-the-shelf chatbot

If you currently have a foundation-model chatbot embedded as a widget (the usual “installed in five minutes” pattern), the architectural answer is replacement by a platform-native RAG pipeline. This is less effort than the first impression suggests: in most cases the corpus already exists as CMS content, person records and FAQ datasets. The generative layer is swapped for a controlled answer pipeline, the external reach remains comparable, and the hallucination risk drops drastically.

Mittelstand with their own RAG pipeline

If your pipeline already draws from a verified corpus, the next question is the access-class layer: are there topics categorically off-limits to the pipeline? In the clinic case, the answer would have been “medical qualifications are answered exclusively from the structured person dataset — and if the field is empty, the pipeline escalates to a human”. Implementing this logic in the pipeline is a contained project, not an architectural rebuild.

Mittelstand with AI-agents platform operation

If you operate or plan your own agent platform, WebMCP is the path that makes the architecture scale. Tools instead of texts, audit trail per tool call, defined liability boundary between platform operator and agent provider. This is the state we are currently working on with established clients in AI-agents platform operation.

Providers of generative AI applications

If you build your own products on top of foundation models (customer support bot, AI search, voice assistant) and offer them to your customers, the platform liability moves with you. The ruling does not settle this constellation, because the case concerns a B2C platform operation; an analogous application to B2B AI products is, however, likely. Architecturally we recommend the same stack as for your own applications: RAG, access classes, audit trail, tools where possible.

Frequently asked questions about the OLG Hamm ruling and AI architecture

Do we have to shut down our existing chatbot now?

No, the ruling does not require a shutdown. It requires that your system’s outputs are not misleading. Anyone running an off-the-shelf chatbot with external reach has an architectural task: to bind the pipeline to a verified corpus and lock sensitive topics through access classes. This is platform work, not a shutdown reflex.

We use ChatGPT or Claude directly via the API. Does the foundation model take us out of liability?

No. The foundation model provider is your subcontractor on the API layer, not an independent addressee of the UWG claim. You are the commercial actor in the external-facing role; the model provider is not. Contractually you can negotiate risk-allocation clauses with the provider, but the UWG line from the Hamm ruling lands with you as platform operator. The review of your concrete contract and liability situation belongs to your lawyer.

We have built in a “this chatbot may make mistakes” disclaimer. Is that enough?

Probably not in the sense of the ruling. A disclaimer does not change attribution — the system remains part of your business organisation. Disclaimers are useful as part of the transparency line (AI Act Art. 50), but they are not a defence against UWG claims regarding concretely misleading statements. The architectural answer is not “disclaimer plus free generation”, but constrained generation.

How large is the architectural overhaul to a RAG pipeline in practice?

If your content is already in a structured CMS (TYPO3, Sylius, comparable systems), the pipeline build-out is typically a four-to-six-week project: corpus preparation, embedding layer, answer pipeline, access-class layer, audit trail. If the content is in unstructured form (PDFs, static web pages, Word documents), a corpus preparation phase of three to four additional weeks comes first, for a total of roughly seven to ten weeks. This is in the order of magnitude of what a Mittelstand company plans for a mid-sized platform project, not a large project.

We deploy browser agents (Claude in Chrome, Operator) on the end-customer side. What does the ruling mean for us as website operator?

The answer on the website side does not change: what your website actually displays and which tools your frontend exposes fall under your responsibility. What a browser agent derives generatively from page content and shows to the end customer in the agent frontend is a generation by the agent provider, not by the website. A WebMCP tool layer makes this separation explicit: tools deliver deterministic answers, and the agent renders them generatively in its own context. Clean interface, clean liability allocation.

Does the ruling affect existing applications from 2024/2025 that still run without RAG?

Architecturally: yes. Legally: that is for your lawyer to assess in the individual case. What we can say technically: anyone running an existing application on a foundation-model layer without corpus binding and with external reach has a platform risk that becomes visible after the Hamm ruling. A risk inventory is worthwhile regardless of whether the ruling is upheld on appeal in a year’s time: the UWG line stands, and other addressees will follow.

Verdict

The Higher Regional Court of Hamm in 4 UKl 3/25 has not set a new legal threshold but clearly articulated an existing one. That is what makes the ruling so important: it is not a technology-hostile special rule but a conservative application of unfair competition law to AI outputs. The UWG line is robust, the appeal may refine the construction but is unlikely to overturn it.

The question is not whether the ruling applies to you: the platform implications are size-independent and apply wherever a generative model produces external reach. The question is whether your dialog architecture addresses the hallucination question in its construction or hopes for a disclaimer. From our perspective in platform work, the answer is unambiguous: controlled architecture, grounded generation, tool-based workflows where possible, audit trail by default. This is the architecture we have been building for two years, and this is the architecture the ruling has now made explicit for everyone else.

This article discusses the platform and architectural implications of the OLG Hamm ruling. It is not legal advice and does not replace a case-specific assessment by your lawyer or data protection officer.

Before the next consumer-centre warning arrives — let’s talk about the architecture of your dialog system.

We audit, migrate and harden your AI dialog architecture — RAG, access classes, WebMCP, audit trail.

We build the platform side of controlled AI dialog architectures — the legal assessment stays with your lawyer or data protection officer. Concretely: architecture audit of your existing dialog system, migration to a RAG pipeline with verified corpus, access-class layer for sensitive topics, tool-based architecture for structured workflows (WebMCP), audit trail with Ed25519 provenance, C2PA manifest embedding for the watermarking duty from 02 December 2026.

If you, as managing director of a German Mittelstand company, want to sort the platform side of your AI dialog system before the next regulator or consumer-centre enquiry, let’s talk before the upcoming management session. Have a prior look at our AI-Ready CMS as a Service, our AI agents platform operation and our services overview.

About the author

Kai Ole Hartwig

Founder · Moselwal Digitalagentur · OnlyOle

Programming since 2002 – self-taught, set up my own business with KO-Web in 2012, now Moselwal. Over 100 projects, with a focus on security, performance, automation and quality.