Nvidia releases open 120B-parameter Nemotron 3 Super for agentic AI
The open‑weight model claims up to 5x higher throughput and a 1M‑token context window to keep multi‑agent workflows on track. Nvidia also adds a unified multimodal model as investors pour $2B into China’s Moonshot AI.
One-Line Summary
Nvidia pushes open agentic AI models with long context and multimodal capabilities, while investors back open-weight challengers like Moonshot AI with $2B.
Big Tech
Nvidia unveils 120B-parameter Nemotron 3 Super for agentic AI
Nvidia’s Nemotron 3 Super is an open‑weight AI model with 120 billion parameters (12 billion active at inference) designed to power complex, multi‑agent applications with higher speed and accuracy. It targets real‑world agents that plan, call tools, and complete tasks, and Nvidia says it delivers up to 5x higher throughput than the prior Nemotron Super. 1
To address “context explosion” and the “thinking tax” in multi‑agent workflows, the model offers a 1‑million‑token context window so agents can retain full state and avoid goal drift across long tasks. Nvidia also cites leaderboard results, including the top spot on Artificial Analysis for efficiency and openness and the AI‑Q research agent ranking No. 1 on DeepResearch Bench and DeepResearch Bench II. 1
Under the hood is a hybrid mixture‑of‑experts design that blends Mamba and transformer layers; only 12B of 120B parameters are active per step, with a latent MoE technique and multi‑token prediction to speed generation. On Blackwell GPUs, running in NVFP4 cuts memory needs and pushes inference up to 4x faster than FP8 on Hopper with no accuracy loss, and Nvidia reports up to 2x higher accuracy versus the previous Nemotron Super. 1
Nvidia releases the model with open weights under a permissive license plus a full methodology, including over 10 trillion tokens of training and post‑training data and 15 reinforcement‑learning environments. Early adopters range from Perplexity and software‑agent vendors (CodeRabbit, Factory, Greptile) to enterprise platforms (Amdocs, Palantir, Cadence, Dassault Systèmes, Siemens). It’s available via build.nvidia.com, Perplexity, OpenRouter and Hugging Face, and is packaged as a NIM microservice with distribution across partners like Google Cloud Vertex AI and Oracle Cloud Infrastructure. 1
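Since the model is listed on OpenRouter, one quick way to try it is through OpenRouter's OpenAI-compatible chat-completions endpoint. The sketch below is a minimal, hedged example: the model slug `nvidia/nemotron-3-super` is an assumption for illustration (check OpenRouter's model list for the actual identifier), and you need your own API key.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, model="nvidia/nemotron-3-super", max_tokens=1024):
    """Build an OpenAI-compatible chat-completions payload.

    The model slug here is illustrative; look up the real one on OpenRouter.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt, api_key):
    """POST the payload to OpenRouter and return the first choice's text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same payload shape works against build.nvidia.com's hosted API or a local NIM microservice by swapping the URL and key.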
Industry & Biz
Moonshot AI raises $2B at $20B valuation on open-weight demand
Moonshot AI, the Beijing lab behind the Kimi series of open‑weight large language models, raises about $2 billion at a $20 billion valuation. The round is led by Meituan’s Long‑Z Investment, with Tsinghua Capital, China Mobile and CPE Yuanfeng participating, per a post from Huafeng Capital cited by TechCrunch. 2
Investor appetite for Chinese open‑weight models is growing as some users accept slightly lower top‑end performance in exchange for cheaper inference. Moonshot’s Kimi K2.5 impressed on coding benchmarks earlier in 2026, and its latest Kimi K2.6 is the second‑most used LLM on OpenRouter by usage rank. 2
On the business side, Moonshot’s annual recurring revenue topped $200 million in April, driven by paid subscriptions and API usage. Huafeng Capital also notes the company raised $3.9 billion over the past six months, following a late‑2025 valuation of $4.3 billion and an early‑2026 valuation of $10 billion after a $700 million raise. 2
New Tools
Nvidia debuts Nemotron 3 Nano Omni for multimodal agents
Nemotron 3 Nano Omni is an open multimodal model that combines vision, audio and language in one system so agents can reason across video, images, audio and text without juggling separate models. Nvidia positions it as a faster, more accurate path to production multimodal agents, citing leaderboard wins in document intelligence and audio‑video understanding. 3
By integrating vision and audio encoders inside a 30B‑A3B hybrid MoE architecture, it reduces latency from repeated handoffs and achieves 9x higher throughput versus other open omni models with similar interactivity. Early users like H Company report practical gains such as quickly interpreting full‑HD (1920×1080) screen recordings for real‑time computer‑use agents. 3
The model ships with open weights, datasets and training techniques, is available on Hugging Face, OpenRouter and build.nvidia.com as a NIM microservice, and can run from local Nvidia Jetson and DGX systems to data centers and major clouds. Enterprises including Aible, ASI, Eka Care, Foxconn, Palantir and others are adopting or evaluating it for perception, document intelligence and audio‑video workflows. 3
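For mixed text-and-image requests, OpenAI-style chat APIs (which OpenRouter and NIM endpoints generally follow) let you combine a text part with image parts in a single user message. The sketch below shows how such a payload is typically assembled; the model slug `nvidia/nemotron-3-nano-omni` is an assumption for illustration, not confirmed by the article.

```python
import base64

def image_part(path):
    """Encode a local image as a base64 data-URL content part (OpenAI style)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

def build_multimodal_request(question, image_paths,
                             model="nvidia/nemotron-3-nano-omni"):
    """Combine one text question with one or more images in a single message.

    The model slug is illustrative; check the provider's catalog for the real one.
    """
    content = [{"type": "text", "text": question}]
    content += [image_part(p) for p in image_paths]
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

The point of a unified omni model is that this one request replaces a pipeline of separate OCR, vision, and language calls, which is where the latency savings from avoiding handoffs come from.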
What This Means for You
If your team prototypes agents for research, operations or support, Nemotron 3 Super’s 1M‑token context and MoE efficiency point to fewer re‑prompts, less orchestration glue and more stable long‑running tasks (e.g., loading entire codebases or multi‑hundred‑page reports). That can compress cycle times and reduce failure modes like “goal drift.” 1
If your workflows span video, screen recordings or mixed documents, a unified multimodal stack like Nemotron 3 Nano Omni can cut latency and improve accuracy by avoiding vision/audio model handoffs. This is especially relevant for computer‑use agents, compliance document analysis and contact‑center quality review. 3
Open weights plus NIM packaging expand deployment choices — from local pilots to regulated, on‑prem environments — while keeping a path to cloud scaling. For non‑developer teams, that means faster POCs under existing data‑governance rules, then controlled expansion once value is proven. 1
Moonshot AI’s $2B raise signals growing enterprise interest in open‑weight options alongside proprietary APIs. Expect model portfolios: pairing open models for cost‑efficient inference with premium models for the hardest reasoning, evaluated case‑by‑case by accuracy, latency and total cost. 2
Action Items
- Try Nemotron 3 Super on a long document task: Use OpenRouter or Hugging Face to load a 50–100‑page report and test summarization plus follow‑up Q&A in one session.
- Pilot a multimodal workflow with Nemotron 3 Nano Omni: Upload a PDF with charts plus a short screen recording, and evaluate how quickly it answers questions that require both.
- Baseline your agent costs on long contexts: Run the same multi‑step task with your current setup and with a single long‑context run; record tokens and elapsed time to compare.
- Plan a governed pilot with IT: Review Nemotron 3 Super’s license and NIM deployment options, then schedule a 30‑minute session to choose on‑prem or cloud for a 2‑week trial.
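For the cost-baselining item above, the comparison only works if you record the same numbers for both runs. A small sketch, assuming an OpenAI-style API that returns a `usage` dict (`prompt_tokens`, `completion_tokens`) with each response:

```python
import time
from dataclasses import dataclass, field

@dataclass
class RunStats:
    """Accumulate token counts and wall-clock time across model calls."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, usage):
        """`usage` is the OpenAI-style usage dict returned with each response."""
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.calls += 1

    def summary(self):
        return {
            "calls": self.calls,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "elapsed_s": round(time.monotonic() - self.started, 2),
        }
```

Run the multi-step orchestration and the single long-context run each with its own `RunStats`, call `record()` after every response, and compare the two `summary()` dicts: the multi-step run typically shows more calls and repeated prompt tokens from re-sending context, which is exactly the overhead a 1M-token window is meant to remove.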