SDS 976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

NVIDIA just dropped Nemotron 3 Super, a 120-billion-parameter open-weight model that only activates 12 billion parameters at a time and it’s built for the agentic AI era. In this Five-Minute Friday, Jon Krohn breaks down the model’s hybrid Mamba-Transformer architecture, its million-token context window, and why its combination of frontier-class reasoning with blazing-fast throughput matters for anyone building multi-agent systems. Find out how Nemotron 3 Super claimed the #1 spot on the DeepResearch Bench leaderboards, which companies are already adopting it, and where you can start using it today.

Interested in sponsoring a Super Data Science Podcast episode? Email natalie@superdatascience.com for sponsorship information.

In this Five-Minute Friday, Jon Krohn dives into NVIDIA’s Nemotron 3 Super, announced at the GTC conference. The model uses a Mixture-of-Experts (MoE) architecture that gives it the knowledge capacity of 120 billion parameters while only activating 12 billion at inference time, a massive efficiency win. But what sets it apart is its hybrid backbone: it combines Mamba layers (which process sequences in linear time) with traditional transformer attention layers for precise information retrieval, delivering a practical one-million-token context window. NVIDIA also introduced a novel technique called LatentMoE, which compresses tokens before routing them to experts, allowing four times as many specialists to weigh in on each prediction at the same cost. Add Multi-Token Prediction for up to 3x inference speedup, and the throughput numbers are impressive, up to 7.5x faster than comparably sized models.

Jon explains why this matters for multi-agent AI: these systems face two bottlenecks – “context explosion” (where workflows generate up to 15x more tokens than standard chat) and the “thinking tax” (where reasoning at every step becomes too slow and expensive). Nemotron 3 Super is designed to tackle both. The model currently powers NVIDIA’s AI-Q research agent to the #1 position on the DeepResearch Bench and DeepResearch Bench II leaderboards, and has claimed the top spot on Artificial Analysis for efficiency and openness in its size class.

On the openness front, NVIDIA went beyond releasing weights: they’re publishing over 10 trillion tokens of training data, 15 reinforcement learning environments, and their full evaluation recipes via the open-source NeMo Gym library. Adoption is already underway, with Perplexity, CodeRabbit, Greptile, Siemens, Palantir, and Cadence among the early integrators. Listen to the episode to hear where you can access the model today and Jon’s take on what it signals about the future of agentic AI.

ITEMS MENTIONED IN THIS PODCAST:

DID YOU ENJOY THE PODCAST?

Download The Transcript

Podcasts SDS 976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

Podcast Transcript

Share on

Related Podcasts

June 26, 2026

June 23, 2026

June 19, 2026

Podcasts SDS 976: NVIDIA’s Nemotron 3 Super: The Perfect LLM for Multi-Agent Systems

Share