NVIDIA Nemotron 3 Super Sets New Standard for Agentic AI with 1M Context, Hybrid MoE, and Open Weights
NVIDIA's Nemotron 3 Super shatters context and throughput barriers for agentic AI, while new research benchmarks reveal both the promise and limits of automated research agents and knowledge graph RAG. Dive into the architectures, numbers, and what’s production-ready now.
One-Line Summary
NVIDIA’s Nemotron 3 Super brings agentic AI into production, offering 5x higher throughput and a 1-million-token context window for complex, multi-agent workflows.
LLM & SOTA Models
NVIDIA Nemotron 3 Super: 1M-Token Context and 5x Throughput for Agentic AI
NVIDIA has unveiled Nemotron 3 Super, a 120-billion-parameter large language model (LLM) designed specifically for agentic AI, where multiple agents collaborate on long, complex tasks. What sets Nemotron 3 Super apart is its 1-million-token context window, which lets it keep the entire history of a workflow “in memory.” That is a major leap over typical LLMs, which often struggle to track more than a few thousand tokens effectively, leaving agents that forget instructions or lose sight of their goals. [1]
The model’s architecture is a hybrid mixture-of-experts (MoE): only 12 billion of its 120 billion parameters are active at any time. This “sparse activation” keeps compute costs low while preserving accuracy. Innovations such as Mamba layers (for efficient memory and compute), Latent MoE (activating multiple specialists at inference for the cost of one), and multi-token prediction (generating several tokens per step) deliver up to 5x higher throughput and up to 2x higher accuracy than previous Nemotron models. On NVIDIA’s Blackwell GPUs, the model runs up to 4x faster than on Hopper chips, thanks to NVFP4 precision. [2]
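The core of sparse activation is top-k expert routing: a small gating network scores every expert, but only the k best-scoring experts actually run, so compute scales with k rather than with the total expert count. The sketch below is a minimal, illustrative version; the dimensions, expert count, and linear-map “experts” are arbitrary assumptions for demonstration, not Nemotron’s actual configuration.

```python
import numpy as np

def topk_moe(x, gate_w, experts, k=2):
    """Route a token vector x to the top-k experts by gate score.

    Only k of len(experts) expert networks execute, which is the
    'sparse activation' idea: 2 of 16 experts run here, analogous
    to 12B of 120B parameters being active.
    """
    logits = x @ gate_w                      # one gate score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected experts
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
gate_w = rng.normal(size=(d, n_experts))
# Each 'expert' is just a random linear map for illustration.
expert_mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: x @ M for M in expert_mats]

y = topk_moe(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (8,)
```

With k=2 of 16 experts, the forward pass costs roughly an eighth of a dense layer of the same total size, while the gate can still specialize experts per token.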
Nemotron 3 Super is open-weight and comes with over 10 trillion tokens of training data, 15 reinforcement learning environments, and full evaluation recipes. This openness means developers can fine-tune or deploy the model on their own infrastructure, whether on-premises or in the cloud. Early adopters include Perplexity (for search), CodeRabbit (for coding agents), and large enterprises like Siemens and Palantir. The model is optimized for real-world agentic applications: for example, a software agent can load an entire codebase into context for debugging, or a finance agent can process thousands of pages of reports in one go. [1]
The release signals a shift: LLMs are moving beyond chatbots to become engines for automation, orchestration, and multi-step reasoning, especially in environments where context and memory are critical. [3]
Nemotron-Cascade 2: Smaller, Smarter Reasoning
NVIDIA also introduced Nemotron-Cascade 2, a 30-billion-parameter MoE model with only 3 billion active parameters. Cascade 2 is tuned for mathematical reasoning, coding, and instruction following, outperforming similarly sized models on benchmarks such as International Mathematical Olympiad problems and competitive coding. Its “cascade reinforcement learning” pipeline and multi-domain distillation make it highly sample-efficient and specialized for reasoning-intensive domains. [4]
Research Papers
AI Research Agents for Machine Learning: Search, Exploration, and Generalization in MLE-bench
A new study from Meta and collaborators explores how AI agents can automate machine learning research: not just generating code, but searching, experimenting, and improving solutions over time. The researchers formalize agents as search algorithms that apply “operators” (such as draft, debug, improve) under different search policies (greedy, Monte Carlo Tree Search, evolutionary). Their best combination raises the rate of winning a Kaggle medal on the MLE-bench Lite benchmark from 39.6% to 47.7%. [5]
A key insight: the choice of operators (the actions the agent can take) is often a bigger bottleneck than the search strategy itself. By designing smarter operators and pairing them with advanced search methods, the agents can generalize better and avoid overfitting. The study also introduces AIRA-dojo, a customizable framework for benchmarking research agents. [5]
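The operator-plus-policy framing can be made concrete with a toy greedy loop. The three operators and the score function below are hypothetical stand-ins for the paper’s draft/debug/improve actions and validation metric, chosen only to show the search structure: a policy repeatedly applies operators to the current best candidate and keeps whichever child scores highest.

```python
import random

random.seed(0)

# Stand-in operators: each transforms a candidate solution (a list of floats).
def draft(sol):   return sol + [random.gauss(0, 1)]          # add a new component
def debug(sol):   return [x for x in sol if abs(x) < 2] or sol  # drop outliers
def improve(sol): return [x + 0.1 for x in sol]              # nudge everything up

OPERATORS = [draft, debug, improve]

def score(sol):
    # Toy validation metric: prefer many values close to 1.0.
    return sum(1 - abs(x - 1.0) for x in sol)

def greedy_search(steps=20):
    best = [0.0]
    for _ in range(steps):
        # Greedy policy: expand with every operator, keep the best child.
        cand = max((op(best) for op in OPERATORS), key=score)
        if score(cand) > score(best):
            best = cand
    return best, score(best)

sol, s = greedy_search()
print(len(sol), round(s, 2))
```

Swapping `greedy_search` for MCTS or an evolutionary policy leaves `OPERATORS` untouched, which is exactly why the paper can vary the two independently and find that operator design is the bottleneck.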
Another finding: agents often overfit to validation metrics, which can mislead their search. The gap is substantial: selecting final solutions by test score rather than validation score alone would improve results by up to 13 percentage points, pointing to better selection criteria as a lever for real-world performance. [5]
Open Source & Repos
AgenticSciML: Multi-Agent Evolution for Scientific Discovery
AgenticSciML is an open-source framework that coordinates multiple LLM agents to propose, critique, and refine scientific experiments. Instead of a single agent, it uses a “swarm” of specialized roles (proposer, critic, engineer, etc.) and routes most tasks to fast, cheap models, reserving powerful models for creative reasoning. This approach can discover novel solutions while keeping costs low, at roughly $0.05–$0.50 per evolutionary generation. It’s especially suited for scientific computing projects in JAX, quantum cognition, and robotics. [6]
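The cost-control idea, routing most calls to a cheap model and reserving the expensive one for creative steps, amounts to a role-to-model table plus per-call accounting. The model names, role assignments, and prices below are invented for illustration, not AgenticSciML’s actual configuration.

```python
# Minimal sketch of cost-aware role routing in a multi-agent loop.
CHEAP, STRONG = "small-fast-model", "large-reasoning-model"

ROLE_MODEL = {
    "proposer": STRONG,   # creative ideation gets the expensive model
    "critic":   CHEAP,    # routine critique runs on the cheap model
    "engineer": CHEAP,    # code edits run on the cheap model
}

COST_PER_CALL = {CHEAP: 0.01, STRONG: 0.15}  # illustrative $/call

def run_generation(roles=("proposer", "critic", "critic", "engineer")):
    """Simulate one evolutionary generation and return its dollar cost."""
    total = 0.0
    for role in roles:
        model = ROLE_MODEL[role]
        # ... the real framework would call `model` here; this sketch only
        # tracks cost to show why routing keeps generations cheap ...
        total += COST_PER_CALL[model]
    return total

print(f"${run_generation():.2f} per generation")  # prints "$0.18 per generation"
```

Because only one of the four calls hits the strong model, the per-generation cost stays near the cheap model’s price, which is the mechanism behind the $0.05–$0.50 figure quoted above.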
Bilevel Autoresearch: Self-Improving Research Loops
Bilevel Autoresearch is a meta-framework in which an inner loop optimizes a task (such as tuning hyperparameters) while an outer loop invents new mechanisms for the inner loop, sometimes by generating new Python code. Experiments show that letting the outer loop autonomously discover mechanisms such as Tabu Search or a Multi-Scale Bandit can improve optimization performance by up to 5x over standard approaches. [7]
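A stripped-down version of the bilevel pattern: the inner loop optimizes a toy objective, and the outer loop evaluates candidate mechanisms and keeps the winner. The two mechanisms here (plain random search and a crude tabu-flavored variant) are illustrative stand-ins; in the actual framework, the outer loop would generate such mechanisms as new code rather than pick from a fixed menu.

```python
import random

random.seed(1)

def objective(x):
    # Toy inner task: minimize a 1-D function (minimum near x = 3).
    return (x - 3) ** 2 + 0.5 * abs(x)

def random_search(steps=200):
    # Inner-loop mechanism 1: plain uniform random search.
    best = min((random.uniform(-10, 10) for _ in range(steps)), key=objective)
    return objective(best)

def tabu_like(steps=200, tabu_radius=0.5):
    # Inner-loop mechanism 2: reject samples too close to recent visits,
    # a crude imitation of Tabu Search's "don't revisit" memory.
    visited, best_val = [], float("inf")
    for _ in range(steps):
        x = random.uniform(-10, 10)
        if any(abs(x - v) < tabu_radius for v in visited[-10:]):
            continue
        visited.append(x)
        best_val = min(best_val, objective(x))
    return best_val

# Outer loop: evaluate each candidate mechanism on the task, keep the winner.
MECHANISMS = {"random_search": random_search, "tabu_like": tabu_like}
results = {name: fn() for name, fn in MECHANISMS.items()}
winner = min(results, key=results.get)
print(winner, round(results[winner], 3))
```

The separation matters: the inner loop only sees the task, while the outer loop only sees mechanism-level performance, so the outer loop can swap in arbitrarily exotic strategies without the task definition changing.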
Why It Matters
NVIDIA’s Nemotron 3 Super is a milestone for agentic AI: it makes it practical to build agents that can remember and reason across huge workflows, not just answer chat prompts. This opens the door to automation in fields like software development, finance, and scientific research, where context and coordination are key. Meanwhile, research on AI agents and open frameworks shows that the future of AI is not just about bigger models, but smarter, more collaborative systems that can evolve and adapt. [1]