Nemotron 3 Ultra: NVIDIA ships the open heavyweight — and makes the sovereignty question concrete

2 June 2026. Yesterday at COMPUTEX, NVIDIA released Nemotron 3 Ultra — the largest model in the Nemotron 3 family announced in December 2025, whose weights had been promised for the first half of 2026. It is an open-weight reasoning model with around 500 billion parameters, a hybrid latent mixture-of-experts architecture and a one-million-token context window, built explicitly for agentic and multi-agent workloads. What matters is not the benchmark ranking but the licence: open weights for commercial use move the sovereignty question from theory into a concrete operations decision.

What happened

On 1 June 2026, NVIDIA shipped Nemotron 3 Ultra at COMPUTEX in Taipei. It is the largest tier of the Nemotron 3 family announced on 15 December 2025 (Nano ~30, Super ~100 and Ultra ~500 billion parameters; some reports put Ultra at 550 billion). The model uses a hybrid latent mixture-of-experts architecture with up to 50 billion active parameters per token, offers a one-million-token context window and was trained in the 4-bit NVFP4 format on NVIDIA's Blackwell architecture. NVIDIA provides not only the weights, released under the NVIDIA Open Model License, which permits commercial use, but also training recipes, datasets and the open-source libraries NeMo Gym and NeMo RL. The independent benchmarking firm Artificial Analysis rates Ultra as the most intelligent open model from a US developer, though it still trails China's Kimi K2.6.

Why it matters

What is truly notable is the combination of scale and openness. Releasing a near-frontier reasoning model of this class as open weights, with a commercial licence, training recipes and RL environments, is not a tech demo but a strategic statement: NVIDIA explicitly positions Nemotron under the banner of “Sovereign AI” and names Europe and South Korea as regions that want to adapt open models to their own data, regulations and values. Architecturally, Ultra confirms two trends. The first is the latent MoE design, which activates only around 50 of its nominal 500 billion parameters per token and thereby keeps inference costs manageable. The second is model routing as a stack pattern: NVIDIA partners send hard tasks to proprietary frontier models and the bulk of the work to efficient open models such as Nemotron. The open heavyweight is therefore not meant as an all-rounder but as a cost-efficient, controllable pillar in a mixed agent stack.
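To make the cost argument concrete, here is a back-of-the-envelope sketch. It assumes the common rule of thumb that a transformer forward pass costs roughly 2 FLOPs per active parameter per token and plugs in the article's rounded figures (500 billion total, 50 billion active). It ignores attention cost over long contexts, expert-routing overhead and the specifics of the hybrid architecture, so treat the numbers as orders of magnitude only.

```python
# Back-of-the-envelope: per-token forward-pass compute for a sparse MoE
# versus a hypothetical dense model of the same total size.
# Assumption: ~2 FLOPs per *active* parameter per token. Ignores attention
# over long contexts, routing overhead and hybrid-architecture details.

TOTAL_PARAMS = 500e9   # nominal size quoted in the article (rounded)
ACTIVE_PARAMS = 50e9   # "up to 50 billion" active per token (rounded)


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs for one token."""
    return 2.0 * active_params


def active_fraction(active: float, total: float) -> float:
    """Share of the weights that actually participate in one token."""
    return active / total


moe = forward_flops_per_token(ACTIVE_PARAMS)
dense = forward_flops_per_token(TOTAL_PARAMS)

print(f"active share:      {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.0%}")
print(f"MoE FLOPs/token:   {moe:.1e}")
print(f"dense FLOPs/token: {dense:.1e}")
print(f"compute ratio:     {dense / moe:.0f}x")
```

The saving applies to compute only: all experts still have to be held in GPU memory, which is why the total parameter count, not the active one, drives the hardware decision.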

What it means for the Mittelstand

For the German-speaking Mittelstand, Ultra is less a model for immediate self-hosting than a signal that forces an overdue decision: rent or own. Anyone running AI agents exclusively through US APIs potentially transfers personal or business-critical data to a third country with every prompt. That transfer belongs in the record of processing activities and the third-country transfer assessment, needs to be discussed with the data protection officer and, in regulated sectors, must also be assessed against MaRisk and DORA (Art. 28). Open weights flip the logic: a model run as an NVIDIA NIM microservice in your own data centre or in an EU cloud keeps the data in-house, and that is the concrete GDPR and sovereignty gain.
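One way to make "keeps the data in-house" enforceable in code, not only in policy documents, is a guard in front of every model call that lets prompts go only to approved endpoints. The sketch below is hypothetical: the host names, the allow-list and the exception class are placeholders for your own on-prem or EU-cloud deployments, and a real setup would enforce the same rule at the network egress as well.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: placeholders for your own data centre or
# EU-cloud tenancy. Replace with the hosts you have actually approved.
APPROVED_HOSTS = frozenset({
    "nim.internal.example",   # e.g. self-hosted NIM or vLLM on-prem
    "llm.eu-tenant.example",  # e.g. deployment in an EU cloud region
})


class ThirdCountryTransferBlocked(Exception):
    """Raised when a prompt would leave the approved hosting perimeter."""


def check_endpoint(base_url: str, approved: frozenset[str] = APPROVED_HOSTS) -> str:
    """Return the endpoint's host if approved; otherwise refuse the call."""
    host = urlparse(base_url).hostname  # lower-cased, port stripped
    if host is None or host not in approved:
        raise ThirdCountryTransferBlocked(f"{base_url!r} is not an approved endpoint")
    return host


print(check_endpoint("http://nim.internal.example:8000/v1"))
try:
    check_endpoint("https://api.us-provider.example/v1")
except ThirdCountryTransferBlocked as err:
    print("blocked:", err)
```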

The honest caveat belongs here too: Ultra, at around 500 billion parameters, is a data-centre decision — running it demands Blackwell-class GPUs and platform know-how. The realistic entry point is not Ultra but Nemotron 3 Nano (around 30 billion parameters, available today via vLLM, llama.cpp or LM Studio). And mind the procurement path: consuming Nemotron through a US serverless API brings back the third-country reflex that open weights had just dissolved. Sovereignty does not arise from the licence alone, but from the hosting path.
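Why Ultra is a data-centre decision and Nano is not follows from simple arithmetic on the weights alone. The sketch assumes BF16 at 16 bits per weight and NVFP4 at an effective 4.5 bits (4-bit values plus one 8-bit scale shared by a block of 16). It uses the article's rounded parameter counts and leaves out the KV cache, activations and runtime overhead, which all come on top.

```python
# Weight memory only; KV cache, activations and runtime overhead are extra.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    # Assumption: 4-bit values plus one 8-bit scale per block of 16 values.
    "NVFP4": 4.0 + 8.0 / 16,
}

MODELS = {  # rounded totals from the article
    "Nemotron 3 Nano": 30e9,
    "Nemotron 3 Ultra": 500e9,
}


def weight_gb(params: float, bits: float) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


for name, params in MODELS.items():
    for fmt, bits in BITS_PER_WEIGHT.items():
        print(f"{name:17} {fmt:6} ~{weight_gb(params, bits):7.1f} GB")
```

Ultra's weights alone run to hundreds of gigabytes even at 4 bits, which in practice typically means a multi-GPU server once the cache for long contexts is added, whereas a 4-bit Nano stays within reach of a single high-memory GPU.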

What it means for technical development

Technically, Ultra makes three developments mainstream. First, latent MoE combined with NVFP4 training: 4-bit precision on Blackwell cuts memory and training costs far enough that large open models can be built on existing infrastructure without a meaningful loss of accuracy. This is the path on which open models catch up with proprietary ones without needing their training budgets. Second, the one-million-token context window, which for multi-agent systems is less a marketing figure than an architectural precondition: long tool histories and entire codebases stay in view across multi-step runs without the context “drifting”.
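The core idea behind 4-bit block formats like NVFP4 fits in a few lines: values are grouped into small blocks, each block gets its own scale factor, and every value is rounded to one of the few magnitudes a 4-bit E2M1 float can represent. This is a simplified illustration, not NVIDIA's implementation. It keeps the block scale as a full-precision float instead of FP8 (E4M3), omits the per-tensor scale and says nothing about how training stays stable at this precision.

```python
# Simplified NVFP4-style block quantization (illustration only).
# Assumptions: block scale kept as a Python float (real NVFP4 stores it
# as FP8 E4M3 and adds a per-tensor scale); ties round to the smaller magnitude.

# Magnitudes representable by a 4-bit E2M1 float; the sign bit is separate.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # values sharing one scale factor


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Scale a block so its largest magnitude maps to 6.0, then snap each
    value to the nearest signed FP4 grid point."""
    amax = max(abs(x) for x in block)
    scale = amax / FP4_GRID[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(FP4_GRID, key=lambda g: abs(target - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    """Reconstruct approximate values from one scale and its FP4 codes."""
    return [scale * c for c in codes]


def quantize(values: list[float], block_size: int = BLOCK_SIZE) -> list[tuple[float, list[float]]]:
    """Split a flat list of weights into blocks and quantize each one."""
    return [quantize_block(values[i:i + block_size])
            for i in range(0, len(values), block_size)]


weights = [0.02, -0.11, 0.37, 0.9, -1.2, 0.6, -0.33, 0.08]
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
worst = max(abs(w - r) for w, r in zip(weights, restored))
print(f"scale={scale:.4f} codes={codes} worst abs error={worst:.4f}")
```

Because the widest gap in the grid is between 4 and 6, the rounding error per value is at most one scale unit, and that unit is set by each block's own largest value. This is the intuition for why fine-grained block scales make 4-bit formats workable.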

Third, the layer separation in the agent stack. The routing pattern NVIDIA promotes — a frontier model for the hard planning, open Nemotron for the cost-efficient bulk — maps directly onto standardisation via the Model Context Protocol (MCP) and agent-to-agent protocols: the model becomes interchangeable behind a tool and communication layer. Whoever builds their architecture cleanly against MCP and a model-agnostic routing layer can use an open model without locking into a single vendor. The bundled RL libraries (NeMo Gym, NeMo RL) are the path to specialising it for your own domain — a lever that closed APIs do not offer in this form.
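The layer separation described above can be sketched as a thin routing layer: agent code talks to one interface, and a routing table decides which backend serves each task type. Everything here is hypothetical: the task categories, the `Router` class and the two stub backends stand in for real clients, for example an OpenAI-compatible endpoint served by vLLM or NIM and a proprietary frontier API, which would sit behind the same interface.

```python
from dataclasses import dataclass
from typing import Protocol


class ChatModel(Protocol):
    """Anything that can answer a prompt; callers never see the vendor."""
    name: str

    def complete(self, prompt: str) -> str: ...


@dataclass
class StubModel:
    """Placeholder backend; swap in a real client with the same method."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt[:40]}"


# Hypothetical policy: only these task types go to the frontier model.
FRONTIER_TASKS = frozenset({"planning", "complex_reasoning"})


class Router:
    def __init__(self, open_model: ChatModel, frontier_model: ChatModel) -> None:
        self.open_model = open_model
        self.frontier_model = frontier_model

    def pick(self, task: str) -> ChatModel:
        # Anything not explicitly listed stays on the open, self-hosted tier.
        return self.frontier_model if task in FRONTIER_TASKS else self.open_model

    def run(self, task: str, prompt: str) -> str:
        return self.pick(task).complete(prompt)


router = Router(open_model=StubModel("nemotron-open"),
                frontier_model=StubModel("frontier-api"))
for task in ("summarisation", "classification", "planning"):
    print(task, "->", router.pick(task).name)
```

Defaulting unknown tasks to the open tier is a deliberate choice in this sketch: a prompt reaches the frontier API only when someone has listed its task type, which keeps third-country transfers an explicit decision. The opposite default favours quality over sovereignty.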

One concrete recommendation

Work through it in this order. First, map honestly which agent tasks genuinely need a frontier model and which an efficient open model can handle without loss of quality; most routine steps (summarisation, classification, retrieval) belong in the second category. Second, set up a sovereignty pilot with Nemotron 3 Nano, on-prem or in an EU cloud via vLLM or NIM, before thinking about hosting Ultra; the pilot shows whether your platform can carry the load and which data actually stays in-house. Third, choose the procurement path deliberately: your own GPUs or an EU-hosted NIM strengthen sovereignty, while a US serverless API reactivates the third-country reflex. Fourth, check the specific commercial terms of the NVIDIA Open Model License before productive use: “open weights” does not automatically mean the freedoms of a permissive licence such as Apache 2.0.

This article reflects our technical and strategic assessment. It does not replace legal advice or a data protection impact assessment.


About the author


Kai Ole Hartwig

Founder · Moselwal Digitalagentur · OnlyOle

Programming since 2002 – self-taught, set up my own business with KO-Web in 2012, now Moselwal. Over 100 projects, with a focus on security, performance, automation and quality.