12. June 2026 5 min read

Kai Ole Hartwig

AI-Ready Without Vendor Lock-in

The next major lock-in risk isn’t called AWS or Azure. It’s called OpenAI.

TL;DR — the 90-second summary

AI dependencies don’t only arise through infrastructure — they arise through proprietary APIs, non-portable embeddings, and model-specific prompting techniques. Building AI features that only work with a single provider repeats the cloud lock-in mistake at a new layer. Portable AI architecture is solvable — if you set the right course early.

The New Lock-in Pattern

Cloud lock-in is well known: proprietary services, no export paths, prohibitive migration costs. The industry has learnt from this — at least partially.

AI lock-in follows the same pattern, only at a new layer. And this time it moves faster, because the integration runs deeper: your application logic talks directly to the OpenAI API. Your embeddings were generated with the Ada-002 model — not portable to other embedding models without complete re-indexing. Your prompts exploit model-specific quirks of GPT-5.5. Your user data flows through the infrastructure of a US company that can unilaterally change its terms of service.

These are real dependencies — and they grow with every feature integration.

Where AI Dependencies Form

Layer 1: API dependency — The most direct lock-in: application code hard-coded against a proprietary API endpoint. Every provider switch requires code changes.

Layer 2: Model prompting dependency — Some prompting techniques work well with one model and poorly with others. Teams that invest deeply in model-specific quirks have built implicit lock-in.

Layer 3: Embedding dependency — Embeddings are vectors generated by a specific model. They are not directly portable between different embedding models.

Layer 4: Data flow dependency — Every request routed through an external AI provider transmits data to that provider’s infrastructure. For sensitive content, this is a compliance question, not just a cost question.

Portable AI Architecture: The Four Layers

Layer 1: API abstraction — No application code talks directly to an AI provider. Instead there is an internal abstraction — an interface that encapsulates model-specific details. Switching from OpenAI to another provider becomes a configuration change, not a code change: Application → AI interface → [OpenAI | Anthropic | Ollama | vLLM]

Layer 2: Model-agnostic prompting — Prompts based on concepts, not model-specific tricks. Clear instructions in natural language work better with any capable model than model-specific formatting magic.

Layer 3: Portable embeddings — Choose an embedding strategy that makes re-indexing possible and plannable. Practically: store the embedding model and version explicitly in the vector database metadata. Migration is then a batch job, not an emergency. Alternatively: choose embedding models available across multiple providers (e.g. text-embedding-3-small is also available via Azure; many open-weights models like nomic-embed-text run locally).

Layer 4: Data sovereignty — For use cases with sensitive data: local inference. Ollama, vLLM, llama.cpp — open-weights models (Llama 4, Mistral Large 3, Qwen 3) run on-premise and are GDPR-compatible without additional conditions.

OpenAI-Compatible APIs as Abstraction Layer

The de-facto standard for LLM APIs is the OpenAI API. That is helpful: most alternatives implement the same API.

Ollama: local inference, OpenAI-compatible
vLLM: high-performance local inference, OpenAI-compatible
Mistral AI, Together AI, Fireworks AI: cloud alternatives with OpenAI-compatible API
Azure OpenAI: Microsoft-hosted, EU data residency possible

Building against the OpenAI API format means you can switch with minimal configuration changes in most cases — if the abstraction is clean.

This is not complete protection (model behaviour differs), but it eliminates layer-1 lock-in.

Data Sovereignty in the AI Context

Every request to an external AI API transmits data. For public content, this is often acceptable. For customer data, internal documents, personal information, and strategic analyses, it is a trade-off that must be made explicitly.

The solution is not necessarily full localisation. It is clear data classification: which data may reach external AI APIs? Which data stays on-premise? Which use cases require local inference? These questions should be answered before the first feature goes live — not after.

When On-Premise Inference Makes Sense

Not every use case needs local inference. But there are clear scenarios where it makes sense: processing sensitive documents (contracts, HR data, internal financial data), latency-critical applications (real-time assistants accessing internal data), cost optimisation at high volume (beyond a threshold, own infrastructure is cheaper than API calls), and compliance requirements (industries with strict data localisation requirements).

Open-weights models are today performant enough for most mid-market use cases. Llama 4, Mistral Large 3, Qwen 3 — the quality gap to proprietary frontier models is closing continuously.

Frequently asked questions about AI architecture and lock-in

Wie viel Aufwand bedeutet eine saubere API-Abstraktion?+

Bei Neuentwicklungen: wenige Stunden, wenn von Anfang an eingeplant. Bei bestehenden Integrationen: hängt von der Tiefe der proprietären API-Nutzung ab. Je früher, desto günstiger.

Reicht es, die OpenAI-API durch Azure OpenAI zu ersetzen?+

Das löst die Datenflusskontrolle (EU-Hosting möglich), nicht aber den Modell-Lock-in. Es ist ein erster Schritt, keine vollständige Strategie.

Was ist mit Fine-Tuning?+

Fine-Tuning erzeugt den tiefsten Lock-in, weil das trainierte Modell proprietär ist. Open-Weights-Modelle ermöglichen Fine-Tuning, das ihr besitzt und portieren könnt. Wenn Fine-Tuning nötig ist, bevorzugt Open-Weights.

Muss ich für Lock-in-Freiheit auf Leistung verzichten?+

Nein. OpenAI-kompatible Alternativen und Open-Weights-Modelle sind für die meisten Business-Anwendungsfälle performant genug. Die Qualitätslücke zu Frontier-Modellen schrumpft jedes Quartal.

Conclusion

AI lock-in is the cloud lock-in of the 2020s — only faster, because integrations run deeper and dependencies are harder to see. Those building AI features now have a choice: portable architecture with some upfront effort, or proprietary dependency with growing repair costs.

The decision isn’t made at the first feature. It’s made at the question of whether there’s an interface.

Currently building AI features?

Request an Architecture Check →

We’ll identify where lock-in forms — before it gets expensive.

Home

Weiterlesen →