Nvidia releases open 120B-parameter Nemotron 3 Super for agentic AI
The open‑weight model claims up to 5x higher throughput and a 1M‑token context window to keep multi‑agent workflows on track. Nvidia also adds a unified multimodal model as investors pour $2B into China’s Moonshot AI.
One-Line Summary
Nvidia pushes open agentic AI models with long context and multimodal capabilities, while investors back open-weight challengers like Moonshot AI with $2B.
Big Tech
Nvidia unveils 120B-parameter Nemotron 3 Super for agentic AI
Nvidia’s Nemotron 3 Super is an open‑weight AI model with 120 billion parameters (12 billion active at inference) designed to power complex, multi‑agent applications with higher speed and accuracy. It targets real‑world agents that plan, call tools, and complete tasks, and Nvidia says it delivers up to 5x higher throughput than the prior Nemotron Super. 1
To address “context explosion” and the “thinking tax” in multi‑agent workflows, the model offers a 1‑million‑token context window so agents can retain full state and avoid goal drift across long tasks. Nvidia also cites leaderboard results, including the top spot on Artificial Analysis for efficiency and openness and the AI‑Q research agent ranking No. 1 on DeepResearch Bench and DeepResearch Bench II. 1
Under the hood is a hybrid mixture‑of‑experts design that blends Mamba and transformer layers; only 12B of 120B parameters are active per step, with a latent MoE technique and multi‑token prediction to speed generation. On Blackwell GPUs, running in NVFP4 cuts memory needs and pushes inference up to 4x faster than FP8 on Hopper with no accuracy loss, and Nvidia reports up to 2x higher accuracy versus the previous Nemotron Super. 1
Nvidia releases the model with open weights under a permissive license plus a full methodology, including over 10 trillion tokens of training and post‑training data and 15 reinforcement‑learning environments. Early adopters range from Perplexity and software‑agent vendors (CodeRabbit, Factory, Greptile) to enterprise platforms (Amdocs, Palantir, Cadence, Dassault Systèmes, Siemens). It’s available via build.nvidia.com, Perplexity, OpenRouter and Hugging Face, and is packaged as a NIM microservice with distribution across partners like Google Cloud Vertex AI and Oracle Cloud Infrastructure. 1
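Since the model is listed on OpenRouter, one quick way to try it is through OpenRouter's OpenAI-compatible chat-completions endpoint. The sketch below is a minimal, hedged example: the model slug `nvidia/nemotron-3-super` is an assumption for illustration (check OpenRouter's model list for the actual identifier), and you need your own API key.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, model="nvidia/nemotron-3-super", max_tokens=1024):
    """Build an OpenAI-compatible chat-completions payload.

    The model slug here is illustrative; look up the real one on OpenRouter.
    """
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

def ask(prompt, api_key):
    """POST the payload to OpenRouter and return the first choice's text."""
    req = urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the same payload shape works against build.nvidia.com's hosted API or a local NIM microservice by swapping the URL and key.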
Industry & Biz
Moonshot AI raises $2B at $20B valuation on open-weight demand
Moonshot AI, the Beijing lab behind the Kimi series of open‑weight large language models, raises about $2 billion at a $20 billion valuation. The round is led by Meituan’s Long‑Z Investment, with Tsinghua Capital, China Mobile and CPE Yuanfeng participating, per a post from Huafeng Capital cited by TechCrunch. 2
Investor appetite for Chinese open‑weight models is growing as some users accept slightly lower top‑end performance in exchange for cheaper inference. Moonshot’s Kimi K2.5 impressed on coding benchmarks earlier in 2026, and its latest Kimi K2.6 is the second‑most used LLM on OpenRouter by usage rank. 2
On the business side, Moonshot’s annual recurring revenue topped $200 million in April, driven by paid subscriptions and API usage. Huafeng Capital also notes the company raised $3.9 billion over the past six months, following a late‑2025 valuation of $4.3 billion and an early‑2026 valuation of $10 billion after a $700 million raise. 2
New Tools
Nvidia debuts Nemotron 3 Nano Omni for multimodal agents
Nemotron 3 Nano Omni is an open multimodal model that combines vision, audio and language in one system so agents can reason across video, images, audio and text without juggling separate models. Nvidia positions it as a faster, more accurate path to production multimodal agents, citing leaderboard wins in document intelligence and audio‑video understanding. 3
By integrating vision and audio encoders inside a 30B‑A3B hybrid MoE architecture, it reduces latency from repeated handoffs and achieves 9x higher throughput versus other open omni models with similar interactivity. Early users like H Company report practical gains such as quickly interpreting full‑HD (1920×1080) screen recordings for real‑time computer‑use agents. 3
The model ships with open weights, datasets and training techniques, is available on Hugging Face, OpenRouter and build.nvidia.com as a NIM microservice, and can run from local Nvidia Jetson and DGX systems to data centers and major clouds. Enterprises including Aible, ASI, Eka Care, Foxconn, Palantir and others are adopting or evaluating it for perception, document intelligence and audio‑video workflows. 3
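For mixed text-and-image requests, OpenAI-style chat APIs (which OpenRouter and NIM endpoints generally follow) let you combine a text part with image parts in a single user message. The sketch below shows how such a payload is typically assembled; the model slug `nvidia/nemotron-3-nano-omni` is an assumption for illustration, not confirmed by the article.

```python
import base64

def image_part(path):
    """Encode a local image as a base64 data-URL content part (OpenAI style)."""
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}}

def build_multimodal_request(question, image_paths,
                             model="nvidia/nemotron-3-nano-omni"):
    """Combine one text question with one or more images in a single message.

    The model slug is illustrative; check the provider's catalog for the real one.
    """
    content = [{"type": "text", "text": question}]
    content += [image_part(p) for p in image_paths]
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

The point of a unified omni model is that this one request replaces a pipeline of separate OCR, vision, and language calls, which is where the latency savings from avoiding handoffs come from.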
What This Means for You
If your team prototypes agents for research, operations or support, Nemotron 3 Super’s 1M‑token context and MoE efficiency point to fewer re‑prompts, less orchestration glue and more stable long‑running tasks (e.g., loading entire codebases or multi‑hundred‑page reports). That can compress cycle times and reduce failure modes like “goal drift.” 1
If your workflows span video, screen recordings or mixed documents, a unified multimodal stack like Nemotron 3 Nano Omni can cut latency and improve accuracy by avoiding vision/audio model handoffs. This is especially relevant for computer‑use agents, compliance document analysis and contact‑center quality review. 3
Open weights plus NIM packaging expand deployment choices — from local pilots to regulated, on‑prem environments — while keeping a path to cloud scaling. For non‑developer teams, that means faster POCs under existing data‑governance rules, then controlled expansion once value is proven. 1
Moonshot AI’s $2B raise signals growing enterprise interest in open‑weight options alongside proprietary APIs. Expect model portfolios: pairing open models for cost‑efficient inference with premium models for the hardest reasoning, evaluated case‑by‑case by accuracy, latency and total cost. 2
Action Items
- Try Nemotron 3 Super on a long document task: Use OpenRouter or Hugging Face to load a 50–100‑page report and test summarization plus follow‑up Q&A in one session.
- Pilot a multimodal workflow with Nemotron 3 Nano Omni: Upload a PDF with charts plus a short screen recording, and evaluate how quickly it answers questions that require both.
- Baseline your agent costs on long contexts: Run the same multi‑step task with your current setup and with a single long‑context run; record tokens and elapsed time to compare.
- Plan a governed pilot with IT: Review Nemotron 3 Super’s license and NIM deployment options, then schedule a 30‑minute session to choose on‑prem or cloud for a 2‑week trial.
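For the cost-baselining item above, the comparison only works if you record the same numbers for both runs. A small sketch, assuming an OpenAI-style API that returns a `usage` dict (`prompt_tokens`, `completion_tokens`) with each response:

```python
import time
from dataclasses import dataclass, field

@dataclass
class RunStats:
    """Accumulate token counts and wall-clock time across model calls."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    calls: int = 0
    started: float = field(default_factory=time.monotonic)

    def record(self, usage):
        """`usage` is the OpenAI-style usage dict returned with each response."""
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.calls += 1

    def summary(self):
        return {
            "calls": self.calls,
            "total_tokens": self.prompt_tokens + self.completion_tokens,
            "elapsed_s": round(time.monotonic() - self.started, 2),
        }
```

Run the multi-step orchestration and the single long-context run each with its own `RunStats`, call `record()` after every response, and compare the two `summary()` dicts: the multi-step run typically shows more calls and repeated prompt tokens from re-sending context, which is exactly the overhead a 1M-token window is meant to remove.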