Medium

What makes a platform AI-ready — four layers beneath the Cloudflare score

Q: Is a sitemap.xml enough for LLM crawlers?

No. A sitemap.xml serves classical search crawlers. LLM crawlers operate with two profiles: training and retrieval. A curated llms.txt plus llms-full.txt is therefore additionally required. Without that layer, your platform is practically invisible to retrieval providers like Perplexity or Brave Search.

Q: What's the difference between ai-train and ai-input?

ai-train controls whether your content may be used to train new models. ai-input controls whether it may be retrieved to answer specific user questions (retrieval). For most SMEs the default is: deny ai-train without a licence, allow ai-input, because visibility in answer generation depends on it.

Drei mattschwarze Server-Edge-Boxen mit sage-grünen Patch-Kabeln auf einem Mosel-Schiefer-Tisch, im Hintergrund modernes Moselhaus mit Fensterblick auf sonnige Weinberg-Terrassen und ein aufgeklappter Laptop mit Code-Diff aus einer llms.txt.

In April 2026, Cloudflare published isitagentready.com, a diagnostic for AI agents. Four dimensions, one score, a sober verdict across 200,000 top sites: “not ready”. We are listed in the corpus at 100/100 / Level 5 “Agent-Native” — our reading from 19 April. The score itself, however, is only the diagnosis. What carries the score are four layers beneath it that every platform has to build if it wants to stay visible in an LLM-mediated world.

What has changed? Schema.org and sitemaps are not enough anymore. LLMs read differently from search engines — they chunk, they look for citation-worthy passages, they follow content signals instead of backlinks. Who is affected? Every SME whose plans to be found in Perplexity, ChatGPT, Claude or Gemini are running on classical SEO routines. What should you read today? The four layers — discoverability, content model, bot access control, API/MCP/skill discovery — with a concrete build plan and decision block.

12. May 2026

TL;DR

The Cloudflare score is the diagnosis. The four layers below are the therapy. If you want to stay visible in the LLM world, you have to build all four — not one.

Layer 1 — Discoverability: llms.txt, sitemap hygiene, RSS discovery, .well-known. Makes the platform discoverable to agents in the first place.
Layer 2 — Content model: Chunking, definition lists, retrievable H3-FAQ structures, clear semantic anchors. Makes the content quotable, not just readable.
Layer 3 — Bot access control: Content Signals Protocol (ai-train, search, ai-input), differentiated crawler allowlist, clear separation of training and retrieval. Turns „allowed“ or „forbidden“ into a per-purpose decision.
Layer 4 — API/Auth/MCP/skill discovery: WebMCP, openapi.yaml, .well-known discovery, skill manifests. Makes the platform agent-operable, not just readable.

Three sentences for decision-makers: Classical SEO routines only half-cover layer 2 and miss layer 1 entirely. Ignoring layer 3 forfeits negotiating power against training crawlers. Ignoring layer 4 leaves you a passive document in the agent web instead of an operable system.

Layer 1 — Discoverability: so agents know you exist

Discoverability is the precondition. If an LLM agent cannot find your platform, no amount of content architecture will help. Cloudflare checks three signals here that we keep active on our own platform: llms.txt at the root, a separate llms-full.txt for the full content, and an RSS discovery that lists not only the blog but service pages and knowledge base entries as well.

What classical SEO forfeits here

A sitemap.xml is enough for search engines, but not for agents. LLM crawlers operate with two distinct profiles: training crawlers want full text and clean markup; retrieval crawlers want chunks and citation passages. The llms.txt gives the retrieval profile a curated map of the platform — what may be quoted, what is canonical, which pages belong together thematically.

What we do at Moselwal

We generate llms.txt and llms-full.txt via the extension moselwal/semantic-delivery from TYPO3 content. One curated index per site (moselwal.de, ole-hartwig.eu, blog.ole-hartwig.eu, nozzleops.de), referencing RSS discovery. The site sets load it automatically — no manual upkeep.

Quick check for your platform

curl -sI your-domain.tld/llms.txt | grep -i content-type
curl -s your-domain.tld/llms.txt | head -20
curl -sI your-domain.tld/.well-known/ai-plugin.json

If all three calls return 200s with correct content types, layer 1 is at least structurally in place. Content quality is a separate question.

Layer 2 — Content model: so you get quoted, not just indexed

Layer 2 is where most SMEs fall short. Not from laziness — but because classical CMS thinking structures texts as essays. LLMs read in chunks. They look for clearly separable answer blocks, where question and answer, term and definition, symptom and remedy sit together.

What “retrievable” means, concretely

Definition lists (<dl>) instead of running prose when the topic is term differentiation. LLM crawlers recognise the question-answer pattern structurally, not just after tokenisation.
H3-FAQ blocks where the H3 heading carries the question and the paragraph below carries the answer. One question, one answer, no nesting.
Quick-check snippets inline — one command, one check, one result. These get cited more often than any long explanatory paragraph.
Operational decision blocks with if-then logic. “If your platform does X, then Y is the right answer.” LLMs build decision paths from these.

Why markdown matters more

LLMs can read HTML, but they understand Markdown better. An <h2> is an H2; a ## Heading is an H2 plus a semantic signal that a deliberate structural break sits here. That is why we generate not only HTML from TYPO3 but Markdown representations as well — llms-full.txt ships the curated content as Markdown.

What classical SEO gets half-right

FAQ schema and HowTo schema are a start, not a replacement. Schema.org markup is a claim about content; chunkability sits in the content itself. If the FAQ section hides six long essays beneath its questions, the schema helps nobody — not search engines, not LLMs.

Cross-link to the operational practice

How we implement this in TYPO3 is covered by the deep dive Optimising TYPO3 for AI retrieval, with a concrete build plan for content blocks, fields, templates and semantic-delivery.

Layer 3 — Bot access control: allow, deny, differentiate

This is where it gets political. Layer 3 is your bargaining position: who may see your content, for what purpose, and at what volume. Cloudflare checks whether the platform differentiates between training, search indexing and retrieval (“ai-input”). A blanket robots.txt with Disallow: / for all AI bots is a political gesture but an operational own-goal — it also blocks the retrieval that drives your visibility in answer generation.

Content Signals Protocol as operational reality

Cloudflare rolled out the Content Signals Protocol in April 2026, an extension of robots.txt logic by three signals: ai-train (training allowed/forbidden), search (classical search indexing), ai-input (retrieval/RAG for answer generation). The three need separate decisions:

ai-train: decide deliberately. If your content is copyright-protected and you have not granted a licence, disallow is the correct default.
search: generally allow. Visibility in Google, Bing, Brave, Kagi depends on this.
ai-input: differentiate. For a service page that customers should find in Perplexity: allow. For internal knowledge base entries: disallow.

Why allowlist instead of denylist

Crawler lists grow faster than robots.txt rules get maintained. An allowlist (“allow these crawlers, block all others”) at the edge — Caddy, Cloudflare Worker, NGINX — is more robust than robots.txt entries that no crawler is required to honour. We run this in Caddy with CrowdSec as WAF; our allowlist covers around 12 crawlers across search engines, retrieval providers and archives.

Quick check for your platform

curl -s your-domain.tld/robots.txt | grep -iE 'ai-train|search|ai-input'
curl -A 'PerplexityBot' -s -o /dev/null -w '%{http_code}\n' your-domain.tld
curl -A 'GPTBot' -s -o /dev/null -w '%{http_code}\n' your-domain.tld

If robots.txt knows none of the three directives and every crawler gets a 200, layer 3 is unconfigured — you have forfeited every bargaining position.

Layer 4 — API/Auth/MCP/skill discovery: operable, not just readable

Layers 1–3 make your platform readable to agents. Layer 4 makes it operable. This decides whether, in two years, an AI agent can only mention your platform (“Moselwal offers platform operations, here is the link”) or actually act on it (“I have just booked you a first call / pulled a status report / created an enquiry”).

Four building blocks for layer 4

openapi.yaml with clear operation IDs, clean schemas and example requests. Curated, not generated.
.well-known discovery with /.well-known/ai-plugin.json or the corresponding protocol manifest pointing to the OpenAPI spec.
WebMCP or an equivalent skill manifest that describes not the HTTP layer but the operational logic: which action triggers what, which auth applies, which quotas hold.
Auth differentiated into useful and protected endpoints. Service endpoints (“retrieve my enquiry status”) need a user token; discovery endpoints (“list available actions”) are public.

Why this is no hype for SMEs

SMEs often equate layer 4 with “agent API for end customers” and push it to 2027. The real lever is internal use today: your staff use Claude or ChatGPT, ask about internal data, want enquiries created. If your own platforms have no MCP/skill manifest, every employee builds workarounds — copy-paste, screenshots to AI, manual briefings. Layer 4 is therefore an internal investment first.

Quick check for your platform

curl -s your-domain.tld/.well-known/ai-plugin.json | jq .
curl -s api.your-domain.tld/openapi.yaml | head -40
curl -s your-domain.tld/.well-known/mcp.json | jq .

If all three calls return 404, layer 4 is not built. Today that is still normal; in twelve months it will be a competitive disadvantage.

Who is affected?

The honest answer: anyone whose business depends on being visible in the search behaviour of the next three years. Cloudflare's finding (200,000 top sites, majority “not ready”) is not an outlier but the normal state. Three SME profiles crystallise:

Profile A — Visibility-on-the-web SMEs: Knowledge-driven providers (consulting, software, platform, B2B services) whose customers are already researching in Perplexity and ChatGPT. Layers 1 and 2 are urgent here; layer 3 a medium concern; layer 4 the investment for the next twelve months.
Profile B — In-house AI SMEs: Companies running Claude, Copilot or ChatGPT Enterprise internally who realise their own systems are not agent-compatible. Layer 4 is the primary lever here — as an internal platform investment, not a marketing move.
Profile C — Platform and SaaS SMEs: If you offer software or data yourself, you have to take all four layers seriously, because your customers will increasingly approach you agent-mediated. Layer 3 is decisive here: differentiated control over who sees what, for which purpose, with which auth.

If you fit none of the three profiles, you do have time. These tend to be providers whose distribution channel is neither search nor recommendation — long-standing existing customers or pure offline business. For everyone else, the verdict is operational: 2027 plans need to start in 2026.

Operator recommendation: in what order to build

Lifting four layers at once fails in nearly every SME project. Recommendation: this order, with clear breakpoints.

Operational decision block

If you have neither llms.txt nor RSS discovery today — then: start with layer 1 and an honest inventory of your content. Which pages belong on the curated map, which are technical leftovers? Only then generate llms.txt. A half-hearted llms.txt with 800 entries is worse than none.
If layer 1 is in place but your texts are essays, not chunks — then: invest in layer 2 before touching layer 3 or 4. Well-discoverable but unquotable content is the most expensive form of visibility. Reformat your top 10–20 pages into definition lists, H3-FAQ and operational decision blocks first.
If layers 1 and 2 are in place but you do not know which crawlers hit your platform — then: start layer 3 with observation, not rules. One week of access-log analysis per user agent, then a deliberately curated allowlist, then content signals. Reverse order = self-block.
If you have internal staff operating ChatGPT/Claude — then: layer 4 is the next investment regardless of the Cloudflare score. Start with the internal system that gets manually briefed most often. A WebMCP skill manifest there often saves 2–4 hours per employee per week.
If you offer SaaS or a platform yourself — then: you have to shape layer 4 as content, not only technically. Which actions are agent-friendly, which not? Which quotas? Which auth? This is a product decision, not a platform decision.

What we deliberately do not do

Blanket-block all AI crawlers because it sounds convenient — you also block the retrieval crawlers your visibility depends on.
Tackle layer 4 before layer 2, because API-building feels more attractive than content refactoring — an agent-friendly endpoint over unquotable content helps nobody.
Dump llms.txt automatically from the CMS without curated selection — it dilutes the LLM view of your platform and hurts more than it helps.

What we run at Moselwal

So this is not abstract: what we run on the four layers, as of today.

Layer 1 — Discoverability

llms.txt and llms-full.txt are generated from TYPO3 via the extension moselwal/semantic-delivery, one curated selection per site rather than full dumps. RSS discovery sits at /feed.rss with Article schema enrichment; sitemap.xml is built from the TYPO3 sitemap generator with explicit doctype configuration.

Layer 2 — Content model

Every blog post follows a fixed spine: Hero → TL;DR with definition list → TOC → H2 sections with H3 substructures → FAQ container with retrievable cards → menu_pages cross-links → CTA. That is not marketing layout but retrievability engineering. Quick-check snippets and operational decision blocks are anchored as fixed building blocks in our editorial guide.

Layer 3 — Bot access control

Caddy with CrowdSec WAF at the edge, a curated allowlist of around 12 crawlers (search engines, retrieval, archives), Content Signals Protocol in robots.txt with differentiated values for ai-train (forbidden without licence), search (allowed) and ai-input (allowed with quotas). Logs are reviewed weekly per user agent.

Layer 4 — API/Auth/MCP/skill discovery

We run our own MCP server for TYPO3 (see MCP server discipline). openapi.yaml is curated, not generated. /.well-known/ai-plugin.json points to the spec. Skill manifests follow once the next WebMCP iteration is stable. Important: don't pull layer 4 forward out of enthusiasm — only when 1–3 are solid.

Diagnostics and self-observation

We measure our own Cloudflare score regularly (100/100 Level 5 Agent-Native, as of 19 April 2026) and build our SearchOps monitoring on top. Building layers 1–4 without measuring state is blind work. A quarterly recheck is the minimum.

Conclusion

The Cloudflare score is not a test you have to pass. It is a diagnostic map that shows where you stand today. If you want to build an AI-ready platform, you don't build the score — you build the four layers beneath it: discoverability so agents find you, content model so you get quoted, bot access control so you keep negotiating power, API/MCP so your platform becomes agent-operable.

For SMEs the operational message is clear: not all four at once, but in order with real breakpoints. Visibility-driven providers begin with layers 1 and 2; in-house AI users with layer 4 on the internal system; SaaS providers have to understand layer 3 as a product decision, not an infrastructure detail.

GEO instead of classical SEO is not a marketing slogan. It is the acknowledgement that search mediation and answer mediation are two different machines, each demanding its own platform architecture. Whoever does not start in 2026 is building the competitive disadvantage of 2028.

Frequently asked questions about AI-ready platforms

Is a sitemap.xml enough for LLM crawlers?+

No. A sitemap.xml serves classical search crawlers. LLM crawlers operate with two profiles: training and retrieval. A curated llms.txt plus llms-full.txt is therefore additionally required. Without that layer, your platform is practically invisible to retrieval providers like Perplexity or Brave Search.

What's the difference between ai-train and ai-input?+

ai-train controls whether your content may be used to train new models. ai-input controls whether it may be retrieved to answer specific user questions (retrieval). For most SMEs the default is: deny ai-train without a licence, allow ai-input, because visibility in answer generation depends on it.

Do we really need layer 4 (MCP/skills) already in 2026?+

Externally not yet — internally often immediately. If your staff operate Claude, ChatGPT Enterprise or Copilot, they are building workarounds every day without MCP manifests. An internal skill endpoint on the system that gets most manually briefed often saves two to four hours per employee per week. Externally, layer 4 becomes a competitive factor between 2027 and 2028.

Is it enough to add FAQ schema and HowTo schema?+

Schema.org markup is a claim about content; chunkability sits in the content itself. If the FAQ section hides long essay paragraphs under its questions, the schema helps nobody. Layer 2 requires real structural changes: definition lists, H3-FAQ with question as heading, short clear answers, quick-check snippets. Schema remains useful as an amplifier, not a substitute.

What does it cost to build all four layers?+

The honest answer: Layer 1 is days. Layer 2 is weeks to months, depending on the existing content base. Layer 3 is conceptually one to two weeks plus continuous observation. Layer 4 is project-shaped, because it is a product, not a platform decision. Attempting all four in a single quarter burns money. Distributing across 12 months with measurements in between works cleanly.

How do I measure whether my platform is AI-ready?+

The Cloudflare diagnosis check at isitagentready.com is a good first indication. Add: weekly access-log analysis by user agent, quarterly review of the llms.txt selection against actual content, monthly spot checks in Perplexity/ChatGPT/Claude whether your platform appears as a source for your core topics. Keeping a platform AI-ready is an ongoing effort, not a project.

[Translate to English:] 100 von 100 Punkten im AI Agent Test

Level 5 Agent-Native: 100/100 on Cloudflare's Check

Weiterlesen →

Broadly executable MCP servers: what the Anthropic SDK chain means for AI agents in the mid-market

Weiterlesen →

Make your platform AI-ready — with plans, not buzzwords

We build TYPO3 and Kubernetes platforms that stay visible in the LLM world. Four layers, clear order, measurable diagnostics. If you want to know where your platform stands and which layer to tackle first, talk to us.

Talk to us →

What makes a platform AI-ready — four layers beneath the Cloudflare score

TL;DR

Layer 1 — Discoverability: so agents know you exist

What classical SEO forfeits here

What we do at Moselwal

Quick check for your platform

Layer 2 — Content model: so you get quoted, not just indexed

What “retrievable” means, concretely

Why markdown matters more

What classical SEO gets half-right

Cross-link to the operational practice

Layer 3 — Bot access control: allow, deny, differentiate

Content Signals Protocol as operational reality

Why allowlist instead of denylist

Quick check for your platform

Layer 4 — API/Auth/MCP/skill discovery: operable, not just readable

Four building blocks for layer 4

Why this is no hype for SMEs

Quick check for your platform

Who is affected?

Operator recommendation: in what order to build

Operational decision block

What we deliberately do not do

What we run at Moselwal

Layer 1 — Discoverability

Layer 2 — Content model

Layer 3 — Bot access control

Layer 4 — API/Auth/MCP/skill discovery

Diagnostics and self-observation

Conclusion

Frequently asked questions about AI-ready platforms

Related articles

Level 5 Agent-Native: 100/100 on Cloudflare's Check

Broadly executable MCP servers: what the Anthropic SDK chain means for AI agents in the mid-market

Six packages, one cluster, one message: the EVM/DeFi npm wave of 6 May 2026

AI-Ready CMS

AI-Ready Commerce as a Service