14 June 2026

Kai Ole Hartwig

Open Knowledge Format: Google turns agent knowledge into Markdown — and we've adopted it in the Moselwal Handbook

14 June 2026. On 12 June Google Cloud unveiled the Open Knowledge Format (OKF) — an open specification that represents the knowledge AI agents need as a plain directory of Markdown files with YAML frontmatter. No SDK, no runtime, no proprietary catalog schema. We see it as the most sober, and therefore most interesting, answer yet to the fragmentation of agent knowledge — and we have already adopted OKF in our own handbook.

What happened

On 12 June the Google Cloud Data Cloud team published the Open Knowledge Format in version 0.1. OKF formalises a pattern that has surfaced under many names over the past year — Andrej Karpathy's “LLM wiki”, the AGENTS.md/CLAUDE.md convention files, Obsidian vaults wired to coding agents, “metadata as code” repositories inside data teams. The specification is deliberately small: Google says it fits on a single page, and at its core it requires exactly one thing — that every knowledge document carries a type field in its frontmatter. Everything else is guidance. It ships with reference implementations (an enrichment agent that walks a BigQuery dataset and writes one OKF document per table, plus a standalone HTML visualizer) and three ready-made sample bundles. Spec, code and examples are open on GitHub; Google's own Knowledge Catalog can already ingest OKF.

The problem: knowledge lives everywhere and nowhere

The knowledge a model needs for usable answers is overwhelmingly internal knowledge: the schema of a table, the house-specific meaning of a metric, the runbook for an incident, the join paths between two systems, the deprecation notice for an old API. In most organisations this knowledge sits in mutually incompatible silos: metadata catalogs with their own API, wikis and shared drives, code comments and notebook cells — and in the heads of a few senior people.

When an agent asks “How do I compute weekly active users from our event stream?”, it has to assemble the answer from these scattered surfaces. Every vendor brings its own catalog, its own SDK, its own knowledge-graph schema, and none of it is portable. The result is threefold waste: every agent builder solves the same context problem from scratch, every catalog vendor reinvents the same data models, and the knowledge stays locked behind the surface that produced it.

How OKF works

An OKF bundle is a directory tree of Markdown files. Each file is a “concept” — a unit of knowledge that can be anything: a table, a dataset, a metric, a playbook, an API. The file path is the concept's identity; tables/orders.md becomes the concept ID tables/orders. Each document has two parts.
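The path-to-identity rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the reference implementation; the function name `concept_id` is our own, and it assumes paths are given relative to the bundle root with forward slashes.

```python
from pathlib import PurePosixPath


def concept_id(relative_path: str) -> str:
    """Map a bundle-relative file path to its concept ID by dropping the .md suffix."""
    path = PurePosixPath(relative_path)
    if path.suffix != ".md":
        raise ValueError(f"not a Markdown file: {relative_path}")
    return str(path.with_suffix(""))


print(concept_id("tables/orders.md"))  # tables/orders
```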

Frontmatter and body

At the top sits a YAML block for the few fields you want to query, filter or index — type (required), plus the recommended title, description, resource, tags, timestamp. Below it a Markdown body holds everything humans and agents actually read: schema tables, example queries, join descriptions. Conventional headings such as # Schema, # Examples and # Citations carry recommended meaning but are not enforced.
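To make the two parts tangible, here is a small sketch with a made-up document and a function that separates frontmatter from body. The sample values (title, description, tags, columns) are invented for illustration. The parser is a deliberate simplification: it reads only flat `key: value` lines and keeps values as raw strings, whereas a real consumer would pass the block to a proper YAML parser.

```python
# Hypothetical sample document; all field values are invented for illustration.
SAMPLE = """\
---
type: BigQuery Table
title: Orders
description: One row per customer order.
tags: [sales, core]
---
# Schema

| Column | Type | Meaning |
|---|---|---|
| order_id | STRING | Primary key |

# Examples

SELECT COUNT(*) FROM orders;
"""


def split_document(text: str) -> tuple[dict[str, str], str]:
    """Split a document into (frontmatter, body).

    Simplification: only flat `key: value` lines are read and values stay
    raw strings. Use a real YAML parser for anything beyond a sketch.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at the top
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return {}, text  # opening fence without a closing fence
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


meta, body = split_document(SAMPLE)
print(meta["type"])          # BigQuery Table
print(body.splitlines()[0])  # # Schema
```

The split mirrors the division of labour described above: the frontmatter is what you index and filter on, the body is what gets loaded into an agent's context.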

Graph, not just tree

Concepts link to one another with ordinary Markdown links, preferably bundle-relative with a leading slash (/tables/customers.md). The directory thus becomes a graph of relationships richer than the parent/child hierarchy of the file system. Optional index.md files enable “progressive disclosure” — an agent navigates the hierarchy level by level instead of loading the whole bundle into context. Optional log.md files keep a chronological change history.
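How a consumer might turn those links into graph edges can be sketched as follows. The regular expression covers only plain inline Markdown links, and resolving non-slash links relative to the linking file's directory is our assumption, not something this article takes from the specification. The function name and the example paths are hypothetical.

```python
import posixpath
import re

# Matches inline Markdown links of the form [text](target).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def linked_concepts(body: str, source_path: str) -> list[str]:
    """Return the concept IDs that a document body links to.

    source_path is the linking file's bundle-relative path, e.g. "metrics/wau.md".
    """
    targets = []
    for href in LINK.findall(body):
        href = href.split("#", 1)[0]  # drop any heading anchor
        if "://" in href or not href.endswith(".md"):
            continue  # external URL or not a concept document
        if href.startswith("/"):
            resolved = href.lstrip("/")  # bundle-relative link
        else:
            # Assumption: other links resolve relative to the linking file.
            resolved = posixpath.normpath(
                posixpath.join(posixpath.dirname(source_path), href)
            )
        targets.append(resolved[:-3])  # strip ".md" to get the concept ID
    return targets


body = (
    "Joins [customers](/tables/customers.md) with [events](../tables/events.md); "
    "see also [the docs](https://example.com/guide.md)."
)
print(linked_concepts(body, "metrics/wau.md"))
# ['tables/customers', 'tables/events']
```

Collecting these edges for every file in a bundle yields the relationship graph; a target that does not exist is simply a dangling edge, not an error.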

Conformance

The conformance bar is deliberately low: a bundle is OKF 0.1 conformant if every non-reserved file has parseable YAML frontmatter with a non-empty type field and the reserved files (index.md, log.md) follow their structure. Consumers must be tolerant: unknown type values, missing optional fields, extra frontmatter keys and even broken cross-links must not be grounds to reject a bundle. This permissive consumption model is the real trick — it keeps the format usable while bundles grow, get refactored and are partly generated by agents.
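The producer-side half of this rule is simple enough to sketch. The check below walks a directory and reports every non-reserved Markdown file without a non-empty type. It is a rough illustration under stated assumptions: the frontmatter reading is simplified to flat `key: value` lines instead of a full YAML parse, the structure of index.md and log.md is not checked because this article does not describe it, and the function names are our own.

```python
from pathlib import Path
from typing import Optional

RESERVED = {"index.md", "log.md"}  # reserved filenames named in the article


def frontmatter_type(text: str) -> Optional[str]:
    """Return the raw `type` value, or None if there is no closed frontmatter block.

    Simplification: flat `key: value` lines only, no full YAML parsing.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    try:
        end = [line.strip() for line in lines[1:]].index("---") + 1
    except ValueError:
        return None  # frontmatter block is never closed
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip() == "type":
            return value.strip().strip("\"'")
    return None


def check_bundle(root: Path) -> list[str]:
    """List every non-reserved Markdown file that lacks a non-empty type field."""
    problems = []
    for path in sorted(root.rglob("*.md")):
        if path.name in RESERVED:
            continue  # reserved files follow their own structure (not checked here)
        if not frontmatter_type(path.read_text(encoding="utf-8")):
            problems.append(f"{path.relative_to(root).as_posix()}: missing or empty type")
    return problems
```

Calling `check_bundle(Path("handbook"))` on a hypothetical bundle directory returns an empty list when every concept carries a type. A consumer, by contrast, would not run such a check as a gate: unknown types, extra keys and broken links are to be tolerated.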

Reading it: a format, not a platform

The core of OKF is a deliberate shift: the answer to fragmented knowledge is not yet another knowledge service but a format. That is the same architectural separation we know from agent protocols. The Model Context Protocol (MCP) standardises how an agent addresses tools and data sources; A2A standardises how agents talk to each other. OKF adds a layer beside them: it standardises the form of the curated knowledge that these agents consume. Protocols move context; OKF describes what the context looks like at rest.

The decisive property is producer/consumer independence. A hand-written bundle can be read by an agent; a bundle produced by an export pipeline can be browsed in a visualizer; a bundle synthesised by one LLM can be queried by another. The format is the contract; the tooling at each end is independently swappable. That is exactly the property missing from the bespoke wiki patterns OKF replaces: Karpathy's wiki, the team wiki and a vendor's catalog export all look alike, but none were ever designed to cooperate.

What it means for the Mittelstand

For mid-sized firms the practical lever is bigger than the sober specification suggests. The most expensive form of knowledge in the Mittelstand is the knowledge in the heads of a few key people — the one developer who knows why this table is named the way it is, and the one colleague who knows what a metric really means. OKF turns this implicit knowledge into a versionable artefact that lives next to the code and that an agent can read without anyone buying a catalog product. Knowledge curation becomes a normal software-engineering activity: pull request, diff, review.

The sovereignty axis is the second point, and for the DACH region it is no footnote. An OKF bundle is a folder. It can be shipped as a tarball, hosted in any git repo or mounted on any filesystem, so by default it sits where you already hold data control. No proprietary API stands between you and your metadata, and switching the consuming model or agent changes nothing about the knowledge files. Anyone building AI knowledge bases today should ask whether their knowledge is locked into a single vendor's format or kept in one they control themselves.

The data-protection reflex belongs right here, not in a footnote. OKF describes a form, not a hosting location. Where the bundle sits and who consumes it remains your decision — and therefore your responsibility. If an OKF bundle contains schemas, metric definitions or runbooks that expose personal or business-critical information, then which agent loads which bundle in which region into context is a question for your data protection officer. The format lowers the technical hurdle; it does not remove the duty to know the data flow. The advantage over a cloud catalog is precisely that with a pure file bundle you control that data flow entirely yourself.

What it means for technical development

Architecturally OKF is evidence of a movement running across the whole agent stack: away from heavyweight, vendor-bound services, toward narrow, open contracts anyone can build against. MCP, A2A and now OKF share the same stance — fix the interoperability point, leave the content model and the tooling open. For your own architecture the consequence is to keep the knowledge layer as swappable as the model layer: knowledge as files, versioned next to the code, rather than as an entry in a service you can only reach through its SDK.

The limits belong in the picture too. OKF v0.1 is explicitly a starting point, not a finished standard, and it is a Google initiative: adoption beyond Google products is welcomed but not yet proven. The format solves representation, not the hard follow-on questions. How do you keep a bundle current so that it does not drift from reality? How do you stop an agent from taking stale or poisoned knowledge as truth? The specification deliberately says nothing on these. We should not expect from the format what it does not claim to deliver, but these questions do belong on the watch list.

What we have actually done

We are not discussing OKF from a distance. We have already adopted the format in our Moselwal Handbook, the growing, versioned knowledge base with which we are rebuilding Moselwal as an AI company based on agentic engineering. The idea of a “queryable organization” from our earlier article “Rebuilding Moselwal as an AI company” gets a concrete file format in OKF: concepts with type frontmatter, bundle-relative cross-links, index.md for progressive navigation. Our handbook is publicly viewable at gitlab-profile-85e749.pages.moselwal.io.

What convinced us about the step is precisely its lack of attachment to a product. We had nothing to buy and nothing to integrate. The handbook already sat as Markdown in git; OKF gave the few conventions a name, so that the same files can in future be consumed by different agents without a translation layer. That is the difference between “we have a wiki” and “our knowledge is machine-retrievable” — and it cost us an afternoon, not a migration project.

Frequently asked questions about OKF

How stable is v0.1, and can I build on it?

It is a starting point. Versioning follows major.minor: minor bumps are backward-compatible additions, major bumps may break. For a first production use in your own repo it is workable; for an organisation-wide strategy we would watch adoption beyond Google.

Is OKF suitable for mid-market knowledge management, not just data catalogs?

Yes. A concept can be any unit of knowledge, including a playbook or a business process without a technical resource. That is exactly how we use it in the Moselwal Handbook. The appeal is that the same bundle is read by humans and queried by agents, with no double maintenance.

What does the required type field mean in practice?

Every concept document must carry a type in its frontmatter, e.g. BigQuery Table, Metric or Playbook. These values are not centrally registered; consumers must accept unknown types and treat them as generic concepts. The type field is the only hard requirement of the specification.

Is OKF the same as AGENTS.md or CLAUDE.md?

It is the generalisation of them. AGENTS.md/CLAUDE.md and Obsidian vaults are exactly the patterns OKF formalises. The difference is that OKF pins down the small set of conventions (required type field, cross-link rules, reserved filenames) that make those patterns interoperable with each other.

How does OKF relate to MCP and A2A?

Complementary. MCP governs an agent's access to tools and data, A2A the communication between agents. OKF describes the form of the curated knowledge that agents consume. The protocols move context; OKF describes the context at rest.

Do I need Google Cloud or BigQuery to use OKF?

No. OKF is vendor-neutral and tied to no cloud, database, model or agent framework. The bundled enrichment agent uses BigQuery and Gemini as a reference implementation, but the format itself is “just Markdown plus YAML”. A bundle can be created in any text editor and can live in any git repo.

One concrete recommendation

Proceed in this order. First, read the specification: it is short and can be grasped in one sitting. Second, take an existing knowledge source that is already in Markdown (a team wiki, a README collection, a runbook folder) and add the type frontmatter to the documents. That is the entire entry point, not a migration project. Third, check with your data protection officer which content may go into an agent-readable bundle at all, and which agent consumes it in which region, before sensitive knowledge moves into the bundle. Fourth, keep the knowledge layer deliberately vendor-neutral: bundle in your own git, consumption via swappable agents, no binding to a single catalog product.

This article reflects our technical and strategic assessment. It is not legal advice and not a data protection impact assessment.
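The second step of the recommendation, adding type frontmatter to existing Markdown, can be sketched as a small script. Treat it as a starting point under assumptions of our own: the function name is hypothetical, every untouched file receives the same placeholder type (here Playbook, one of the example values mentioned earlier), and files named index.md or log.md are skipped because OKF reserves those names. Reviewing and refining the assigned types by hand is still part of the work.

```python
from pathlib import Path

RESERVED = {"index.md", "log.md"}  # reserved in OKF; review these by hand


def add_type_frontmatter(root: Path, default_type: str = "Playbook") -> list[Path]:
    """Prepend a minimal frontmatter block to Markdown files that lack one.

    Returns the files that were changed. default_type is a placeholder and
    must not contain characters that would need YAML quoting (e.g. a colon).
    """
    changed = []
    for path in sorted(root.rglob("*.md")):
        if path.name in RESERVED:
            continue
        text = path.read_text(encoding="utf-8")
        if text.lstrip().startswith("---"):
            continue  # already starts with a frontmatter fence; leave untouched
        path.write_text(f"---\ntype: {default_type}\n---\n{text}", encoding="utf-8")
        changed.append(path)
    return changed
```

Run it on a copy or a fresh git branch, for example `add_type_frontmatter(Path("runbooks"))` for a hypothetical runbook folder, so that the resulting diff can go through the usual pull-request review.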

Sources

Google Cloud Blog: “Introducing the Open Knowledge Format” / “How the Open Knowledge Format can improve data sharing” (12 June 2026)
Open Knowledge Format v0.1 — specification (SPEC.md) and reference implementation (GitHub, GoogleCloudPlatform/knowledge-catalog) (accessed 14 June 2026)
Andrej Karpathy: “LLM Wiki” (gist) — the pattern OKF formalises (accessed 14 June 2026)
Kai Ole Hartwig: “Rebuilding Moselwal as an AI company” — track-2 background on the queryable organization (6 May 2026)


About the author

Kai Ole Hartwig

Founder · Moselwal Digitalagentur · OnlyOle

Programming since 2002 – self-taught, set up my own business with KO-Web in 2012, now Moselwal. Over 100 projects, with a focus on security, performance, automation and quality.
