NVIDIA's Nemotron 3 is a family of open models with a hybrid Mamba-Transformer MoE architecture, 1-million-token context, and full transparency of weights, data, and training recipes — a direct challenge to the assumption that the best AI must be proprietary.
Introduction
For most of AI's recent history, the most capable models have been locked behind proprietary walls — accessible only through rate-limited APIs, opaque pricing, and black-box architectures. NVIDIA's Nemotron 3 challenges that paradigm head-on.
Announced on December 15, 2025, Nemotron 3 is a family of open models, data, and libraries designed to power transparent, efficient, and specialized agentic AI development across industries. It isn't just a collection of impressive benchmarks — it is an architectural rethinking of what open models can be, and it may very well be the most consequential open AI release since Meta's LLaMA.
The Three-Tier Family: Nano, Super, and Ultra
The Nemotron 3 family consists of three models — Nano, Super, and Ultra — each delivering strong agentic, reasoning, and conversational capabilities. Each model is deliberately scoped for a different slice of the deployment spectrum.
Nano: Speed at the Edge
Nemotron 3 Nano is the smallest and most immediately available member of the family. It is a 3.2 billion active parameter model (31.6 billion total) that achieves better accuracy than the previous generation Nemotron 2 Nano while activating less than half of the parameters per forward pass.
Speed is its calling card: on an 8K input / 16K output setting with a single H200, Nemotron 3 Nano provides inference throughput that is 3.3x higher than Qwen3-30B-A3B and 2.2x higher than GPT-OSS-20B.
Super: Enterprise Efficiency
Nemotron 3 Super, released in March 2026, is a model with 120 billion total and 12 billion active parameters that delivers maximum compute efficiency for complex multi-agent applications such as software development and cybersecurity triage. The model uses a hybrid mixture-of-experts architecture that delivers up to 5x higher throughput and up to 2x higher accuracy than the previous Nemotron Super model.
Ultra: Deep Reasoning
Nemotron 3 Ultra is the crown jewel of the family. With approximately 500 billion parameters and up to 50 billion active per token, it serves as a high-end reasoning engine for complex agentic workflows involving deep analysis, long-horizon planning, and strategic decision-making. NVIDIA has not yet announced a specific release date, though the variant was teased in its original announcement.
The Architecture Breakthrough: Hybrid Mamba-Transformer MoE
The single most important technical decision in Nemotron 3 is its architecture. Rather than relying on a standard Transformer stack, Nemotron 3 integrates three architectural components into a single backbone:
- Mamba layers — for long-range dependencies with minimal memory overhead
- Transformer layers — for structural and logical reasoning
- MoE routing — for scalable compute efficiency
This combination solves real deployment problems. Standard Transformer attention scales quadratically with sequence length, which makes processing very long documents or maintaining long agent conversations prohibitively expensive. Mamba's state-space approach handles long sequences in linear time, which is what makes the 1-million-token context window practical rather than theoretical.
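A back-of-the-envelope comparison makes the scaling argument concrete. This is a toy cost model, not Nemotron's actual profile; the constants (including the state size of 16) are illustrative:

```python
def attention_cost(n: int) -> int:
    # Pairwise token interactions: grows as n^2
    return n * n

def ssm_cost(n: int, state_size: int = 16) -> int:
    # Fixed-size recurrent state updated once per token: grows as n
    return n * state_size

# At a 1M-token context, the quadratic term dominates by orders of magnitude:
ratio = attention_cost(1_000_000) / ssm_cost(1_000_000)
print(ratio)  # 62500.0
```

Whatever the real constants are, the gap between n² and n at a million tokens is what a pure-Transformer design has to pay for and a hybrid design largely avoids.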
The Mixture-of-Experts component adds another dimension. Only a subset of experts is activated for each token, reducing latency and improving throughput. A 120-billion-parameter model like Super effectively behaves like a 12-billion-parameter model in terms of compute at inference time.
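A minimal top-k routing sketch (toy NumPy code under assumed dimensions, not Nemotron's implementation) shows why only a fraction of the expert parameters does work for any given token:

```python
import numpy as np

def moe_forward(x, experts, gate_w, k=2):
    """Toy top-k mixture-of-experts layer: each token is routed to its
    k highest-scoring experts, so only k / len(experts) of the expert
    parameters are touched per token."""
    logits = x @ gate_w                                  # (tokens, n_experts)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stable softmax
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    topk = np.argsort(logits, axis=-1)[:, -k:]           # chosen expert ids
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for e in topk[t]:
            out[t] += probs[t, e] * (x[t] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts = 64, 8
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
gate_w = rng.standard_normal((d, n_experts))
x = rng.standard_normal((4, d))
y = moe_forward(x, experts, gate_w, k=2)
# 2 of 8 experts per token -> only 25% of expert parameters active
```

Super's 120B-total / 12B-active split is the same principle at production scale: the router, not the parameter count, determines the per-token compute bill.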
LatentMoE: More Experts at No Extra Cost
Super and Ultra take this further with LatentMoE. Tokens are projected into a smaller latent dimension for expert routing and computation, reducing communication costs while enabling more experts and higher accuracy per byte. The result: the model can call on 4x more experts at the same inference cost compared to a standard MoE design.
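The arithmetic behind that claim is easy to sketch. The dimensions below are hypothetical placeholders, not Nemotron's published sizes, but they show how shrinking the routed activation width frees interconnect budget for more experts:

```python
d_model = 4096        # hypothetical model width
d_latent = 1024       # hypothetical latent width (4x smaller)
bytes_per_act = 2     # bf16 activation bytes

# Per-token bytes shipped to expert ranks during MoE dispatch:
standard_traffic = d_model * bytes_per_act   # routing at full width
latent_traffic = d_latent * bytes_per_act    # routing in the latent space

# The same interconnect budget now funds 4x the expert traffic:
experts_multiplier = standard_traffic // latent_traffic
print(experts_multiplier)  # 4
```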
The 1-Million-Token Context Window
One of Nemotron 3's most practically significant features is its context window. With a 1-million-token context window, the model can remember more and connect information over long, multistep tasks.
To understand why this matters, consider what a million tokens means in practice:
- Software development: Load an entire codebase into context at once, enabling end-to-end code generation and debugging without document segmentation
- Financial analysis: Load thousands of pages of reports into memory, eliminating the need to re-reason across fragmented chunks
- Agent workflows: Maintain a full day's worth of agent conversation history without re-summarization
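As a rough planning aid for use cases like these, a crude character-count heuristic (not Nemotron's tokenizer; use the real tokenizer for exact counts) can estimate whether a corpus fits the window:

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English
    prose and code. Only a ballpark; real tokenizers vary."""
    return max(1, len(text) // 4)

def fits_in_context(documents, budget=1_000_000):
    """Return (estimated_tokens, fits) for a list of document strings."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return total, total <= budget

# A repeated 32-character snippet standing in for a small codebase:
docs = ["def add(a, b):\n    return a + b\n" * 1000]
tokens, ok = fits_in_context(docs)
print(tokens, ok)  # 8000 True
```

By this heuristic, a million tokens is on the order of 4 MB of source text, which is why "load the whole codebase" stops being a metaphor.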
The benchmark data backs this up: Nemotron 3 Nano maintains 87.5% accuracy at 64K tokens and 70.56% at 512K tokens, while Qwen3 30B-A3B caps out at 128K with only 60.69% accuracy. The context window holds up under real conditions.
Reinforcement Learning Post-Training: Teaching Models to Think
The way Nemotron 3 is trained post-pretraining sets it apart from conventional fine-tuning approaches. All models are post-trained using multi-environment reinforcement learning, enabling reasoning, multi-step tool use, and support for granular reasoning budget control.
Rather than training in a single simulated environment, the models are exposed to 15 distinct RL environments covering different reasoning types — math, code, tool use, multi-step planning, and more. Three trillion tokens of new Nemotron pretraining, post-training, and RL datasets supply the rich reasoning, coding, and multistep workflow examples.
Granular Reasoning Budget Control
One particularly clever feature is inference-time budget control. The developer specifies a maximum number of tokens the model may spend on its thinking trace; when the model reaches that budget, the developer can append the `` token to the sequence and let the model continue from the partial thinking trace. This gives developers fine-grained control over the accuracy-cost tradeoff in production, something closed-source models typically don't offer.
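A minimal control loop for this pattern might look like the sketch below. Everything here is hypothetical scaffolding: `generate_step` is an assumed token-at-a-time interface, and `end_think_id` stands in for the special end-of-thinking token, whose name the announcement leaves unspecified:

```python
def generate_with_thinking_budget(generate_step, prompt_ids,
                                  thinking_budget, end_think_id,
                                  max_answer_tokens=256, eos_id=0):
    """Sketch of inference-time reasoning budget control.

    generate_step(ids) -> next token id   (assumed interface)
    end_think_id: placeholder for the end-of-thinking token id
    from the model card (name elided in the announcement).
    """
    ids = list(prompt_ids)
    for _ in range(thinking_budget):       # spend the thinking budget
        nxt = generate_step(ids)
        if nxt == end_think_id:            # model finished thinking early
            break
        ids.append(nxt)
    ids.append(end_think_id)               # force the end of thinking
    for _ in range(max_answer_tokens):     # answer from the partial trace
        nxt = generate_step(ids)
        ids.append(nxt)
        if nxt == eos_id:
            break
    return ids

# Toy stand-in model: "thinks" token 7 until it sees the end marker
fake_step = lambda ids: 7 if 99 not in ids else 0
trace = generate_with_thinking_budget(fake_step, [1, 2],
                                      thinking_budget=5, end_think_id=99)
# trace ends with the forced end-of-thinking token followed by the answer
```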
NVFP4: Training at Unprecedented Precision Efficiency
For Super and Ultra, NVIDIA introduces NVFP4, a 4-bit floating-point format supported natively by the Blackwell GPU architecture. It significantly cuts memory requirements and speeds up training.
Historically, reducing numerical precision came at the cost of model quality. Downstream task evaluations show that NVFP4 accuracy closely follows BF16 trajectories throughout training — meaning quality is preserved while computational cost is dramatically reduced. On NVIDIA's Blackwell GB300 GPUs, peak FP4 throughput is 3x higher than FP8 throughput, translating directly into faster, cheaper training and inference.
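To build intuition for why block-scaled FP4 can track BF16, here is a toy round-to-nearest fake-quantizer over the E2M1 value grid. This is an illustrative sketch only; the real NVFP4 recipe, including FP8 per-block scales and the training-side details, differs:

```python
import numpy as np

# The 8 non-negative E2M1 magnitudes, mirrored for sign: 16 codes total
FP4_POS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.concatenate([-FP4_POS[::-1], FP4_POS])

def quantize_fp4(x: np.ndarray, block: int = 16) -> np.ndarray:
    """Toy block-scaled FP4 fake-quantization: scale each block of 16
    values so its max magnitude lands on 6.0 (the largest E2M1 value),
    round every entry to the nearest FP4 code, then rescale."""
    xb = x.reshape(-1, block)
    scale = np.abs(xb).max(axis=1, keepdims=True) / 6.0
    scale = np.where(scale == 0, 1.0, scale)   # avoid divide-by-zero
    idx = np.abs(xb[:, :, None] / scale[:, :, None] - FP4_GRID).argmin(axis=2)
    return (FP4_GRID[idx] * scale).reshape(x.shape)

weights = np.linspace(-1.0, 1.0, 32)
q = quantize_fp4(weights)
# Block maxima survive exactly; everything else rounds to the 16-code grid
```

The per-block scale is the whole trick: it keeps each block's dynamic range inside the tiny FP4 grid, which is why quality can stay close to higher-precision baselines.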
Openness as a Core Design Principle
Perhaps the most radical aspect of Nemotron 3 is what NVIDIA gives away alongside the model weights:
- Model weights openly released under the NVIDIA Open Model License
- Synthetic pretraining corpus — nearly 10 trillion tokens — inspectable and repurposable
- Training and post-training recipes within the Nemotron GitHub repository, enabling complete reproducibility
- NeMo Gym — 15 reinforcement learning training environments
- NeMo RL — post-training libraries
- NeMo Evaluator — evaluation tooling with step-by-step reproduction recipes
This is not the typical open-source release where weights are published but training details remain hidden. NVIDIA is releasing the entire stack.
As Jensen Huang put it at the announcement:
> *"Open innovation is the foundation of AI progress. With Nemotron, we're transforming advanced AI into an open platform that gives developers the transparency and efficiency they need to build agentic systems at scale."*
Industry Adoption: Already Moving Fast
The reception from industry has been striking. Early adopters include:
- Perplexity — offering users access to Nemotron 3 Super for search and as one of 20 orchestrated models
- CodeRabbit, Factory, Greptile — integrating the model into software development AI agents alongside proprietary models to achieve higher accuracy at lower cost
- Edison Scientific, Lila Sciences — powering agents for deep literature search, data science, and molecular understanding
- Enterprise platforms — Amdocs, Palantir, Cadence, Dassault Systèmes, and Siemens deploying and customizing the model to automate workflows
Cloud availability is broad: Nemotron 3 Super can be accessed at build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. Google Cloud's Vertex AI and Oracle Cloud Infrastructure are supported, with Amazon Bedrock and Microsoft Azure coming soon.
Why Nemotron 3 Could Be a Game Changer
The efficiency-accuracy tradeoff is broken — in a good way
Every previous generation of models forced a hard choice: accuracy or cost. Nemotron 3's hybrid architecture, combined with NVFP4 precision and MoE routing, collapses that tradeoff. Enterprises no longer need frontier-scale compute to deploy frontier-grade reasoning.
It redefines what "open" means
The AI ecosystem has long debated whether open-source models can compete with proprietary ones. Nemotron 3 doesn't just argue that they can — it provides the data, recipes, and infrastructure to make it happen. Any team with access to H100 or Blackwell GPUs can now reproduce, fine-tune, and own a model of genuinely competitive quality.
Agentic AI finally has its native infrastructure
The shift from single-model chatbots to collaborative multi-agent pipelines has been underway for some time, but most models were not built with that paradigm in mind. Nemotron 3's main contribution isn't pushing the limits of single-model reasoning, but making agent-based systems more practical to deploy and scale — exactly the right problem to solve as enterprises begin deploying autonomous agents at scale.
The benchmarks are independently verified
Nemotron 3 Super tops the Artificial Analysis leaderboard for efficiency and openness, with leading accuracy among models of its size. The model also powers the NVIDIA AI-Q research agent to the number-one position on the DeepResearch Bench and DeepResearch Bench II leaderboards.
Privacy-first deployment becomes viable
For privacy-sensitive deployments, such as on-premises or sovereign AI environments, Nemotron 3 is a natural fit: the models are open, can run entirely inside an organization's own infrastructure, and are designed to work together. Healthcare, legal, financial services, and government sectors that have been cautious about sending data to third-party APIs now have a credible alternative.
What Comes Next
The scheduled release of the Ultra model will complete the Nemotron 3 family, providing enterprises with options spanning the full spectrum of capability and efficiency requirements. Ultra's approximate 500-billion-parameter scale positions it as a deep reasoning engine — for tasks like long-horizon scientific research, strategic planning, and complex multi-step autonomous workflows that even Super isn't optimized for.
The broader Nemotron 3 ecosystem is also maturing rapidly. Tools like NeMo Gym's training environments are already being integrated by platforms like Prime Intellect and Unsloth. The Nemotron Agentic Safety Dataset is helping teams evaluate the safety of complex agent pipelines — a critical need as autonomous systems take on more consequential tasks.
Conclusion
Nemotron 3 is not a single model — it is an open platform, an architectural statement, and a direct challenge to the assumption that the best AI must be proprietary.
By combining a novel hybrid Mamba-Transformer MoE architecture, a 1-million-token context window, multi-environment RL post-training, NVFP4 precision efficiency, and full transparency of weights, data, and recipes, NVIDIA has built something the enterprise world has been waiting for: a family of models that can be owned, inspected, customized, and deployed without compromise.
As the agentic AI era accelerates, Nemotron 3 may well be the foundation on which much of it is built.
TunerLabs helps enterprises evaluate and integrate open models like Nemotron 3 into production agentic systems. Book a free AI strategy session to explore what this architecture could mean for your business.