NVIDIA’s 120B ‘Super’ model targets long‑run agents; Microsoft ships a 7B on‑device computer-use agent; MiniMax posts public weights with strong SWE-Pro
Agentic AI is shifting from chat to managed work: NVIDIA’s Nemotron 3 Super focuses on long-context planning with a hybrid Mamba‑Transformer MoE, Microsoft’s 7B Fara runs web tasks on-device, and MiniMax releases M2.7’s public weights with competitive coding benchmarks.
One-Line Summary
Agentic AI is moving from chat to sustained task execution, with new long-context reasoning models and compact, on-device computer-use agents.
LLM & SOTA Models
Nemotron 3 Super: An open hybrid Mamba‑Transformer MoE for agentic reasoning
This model is built to run as the “brain” of multi‑agent systems that need long, careful thinking without slowing to a crawl. NVIDIA’s Nemotron 3 Super packs 120B total parameters with 12B active at inference and a native 1M‑token context window; NVIDIA reports over 5x the throughput of the previous Nemotron Super and an 85.6% score on PinchBench, the strongest open model in its class. 1
Under the hood, it mixes Mamba‑2 sequence layers for efficiency with interleaved Transformer attention for precise recall, plus a Mixture‑of‑Experts that uses a new “latent MoE” to consult 4x as many specialists at the same cost by compressing tokens before expert routing. Multi‑token prediction (MTP) forecasts several future tokens per step, enabling built‑in speculative decoding and up to 3x speedups for long generations. Training leans on native NVFP4 (4‑bit) pretraining optimized for Blackwell, which NVIDIA says cuts memory and yields up to 4x faster inference on B200 vs FP8 on H100 while maintaining accuracy. 1
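NVIDIA has not published the exact latent-MoE formulation here, but the core idea of compressing tokens before expert routing can be sketched as follows. All dimensions, parameter names, and the random weights below are illustrative stand-ins, not Nemotron’s actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_latent, n_experts, top_k = 64, 16, 8, 2

# Illustrative parameters (random for the sketch).
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)   # compress
W_up   = rng.standard_normal((d_latent, d_model)) / np.sqrt(d_latent)  # decompress
router = rng.standard_normal((d_latent, n_experts)) / np.sqrt(d_latent)
experts = rng.standard_normal((n_experts, d_latent, d_latent)) / np.sqrt(d_latent)

def latent_moe(x):
    """Route in a compressed latent space, so more experts fit the same FLOP budget."""
    z = x @ W_down                       # compressed token, shape (d_latent,)
    logits = z @ router                  # score every expert on the cheap latent
    top = np.argsort(logits)[-top_k:]    # keep the top-k experts
    w = np.exp(logits[top]); w /= w.sum()
    out = sum(wi * (z @ experts[i]) for wi, i in zip(w, top))
    return out @ W_up                    # project back to model dimension

y = latent_moe(rng.standard_normal(d_model))
print(y.shape)  # (64,)
```

Because routing and expert math happen at `d_latent` rather than `d_model`, the per-token cost of consulting each expert shrinks, which is the mechanism behind “4x as many specialists at the same cost.”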
To align behavior with multi‑step agent workflows, Super uses 25T pretraining tokens (10T unique curated), ~7M supervised samples from a 40M pool, and reinforcement learning across 21 environments, totaling 1.2M+ environment rollouts. NVIDIA frames a “Super + Nano” pattern: smaller Nemotron 3 Nano handles straightforward steps, while Super plans and reasons through complex chains in domains like software development or cybersecurity triage. Weights, datasets, and recipes are released so teams can customize and deploy on their own infra. 1
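The “Super + Nano” split is a routing pattern rather than an API. A toy sketch of the dispatch logic, where the complexity heuristic and model stubs are invented purely for illustration:

```python
def route(step, nano, super_):
    """Pick a model per step: cheap steps to the small model, hard ones to the planner.

    The heuristic below (length plus keyword markers) is a deliberately crude
    stand-in for whatever real triage logic a production system would use.
    """
    complex_markers = ("plan", "debug", "multi-step", "triage")
    is_complex = len(step.split()) > 40 or any(m in step.lower() for m in complex_markers)
    return super_ if is_complex else nano

# Stub "models": in practice these would be calls to Nemotron 3 Nano / Super.
nano = lambda s: f"[nano] {s}"
super_ = lambda s: f"[super] {s}"

print(route("Rename variable x to count", nano, super_)("Rename variable x to count"))
```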
Fara‑7B: An efficient agentic model for computer use
This is a 7B‑parameter model that sees your browser like a person does (screenshots) and then clicks, types, and scrolls to finish tasks such as shopping, booking, or finding information. Microsoft says Fara‑7B runs competitively against much larger systems while being small enough for on‑device use, improving latency and privacy. It’s available on Microsoft Foundry and Hugging Face under an MIT license, with a quantized build for Copilot+ PCs and integration with the Magentic‑UI research prototype. 2
Instead of parsing accessibility trees, Fara‑7B relies solely on screenshots and predicts single‑step actions with brief “reasoning” plus tool calls (e.g., click(x,y), type(), web_search()). It is distilled from a multi‑agent synthetic pipeline and trained on 145,000 trajectories with 1 million steps across diverse websites and tasks. Benchmarks report WebVoyager 73.5%, Online‑Mind2Web 34.1%, DeepShop 26.2%, and the new WebTailBench 38.4%; Microsoft also notes higher efficiency vs UI‑TARS‑1.5‑7B (≈16 steps average vs ≈41). The team recommends sandboxed runs and avoiding sensitive data while Fara evolves. 2
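Microsoft documents only example action strings like those above, not a full grammar. A minimal parser/dispatcher sketch in that style, where the `page` interface is a hypothetical stand-in and the comma-split argument handling is deliberately naive:

```python
import re

# Matches single-step action strings like click(412, 188) or web_search('laptops').
ACTION_RE = re.compile(r"(?P<name>\w+)\((?P<args>.*)\)")

def parse_action(s):
    """Split one predicted action string into a name and a list of string args."""
    m = ACTION_RE.fullmatch(s.strip())
    if not m:
        raise ValueError(f"unparseable action: {s!r}")
    raw = m.group("args").strip()
    # Naive split: real arguments containing commas would need proper parsing.
    args = [a.strip().strip("'\"") for a in raw.split(",")] if raw else []
    return m.group("name"), args

def execute(name, args, page):
    """Dispatch one action against a browser page object (hypothetical interface)."""
    if name == "click":
        page.click(int(args[0]), int(args[1]))
    elif name == "type":
        page.type(args[0])
    elif name == "web_search":
        page.search(args[0])
    else:
        raise ValueError(f"unknown action: {name}")

name, args = parse_action("click(412, 188)")
print(name, args)  # click ['412', '188']
```

Running this loop inside a sandboxed browser, as Microsoft recommends, limits the blast radius of a misparsed or mispredicted action.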
Open Source & Repos
MiniMax M2.7 public weights: A self‑evolving agent model with strong SWE‑Pro
MiniMax releases the M2.7 model weights on Hugging Face, highlighting Mixture‑of‑Experts design and “Agent Teams” features for multi‑agent collaboration. Reported scores include 56.22% on SWE‑Pro and 57.0% on Terminal Bench 2, alongside 39.8% on NL2Repo and 55.6% on VIBE‑Pro, with results aimed at more realistic engineering tasks than toy coding puzzles. MiniMax also describes an internal “self‑evolution” loop that iterated 100+ rounds to improve scaffold settings, claiming ≈30% gains on internal sets. 3
Context for SWE‑bench: community analyses note that SWE‑bench Verified suffers from contamination, while SWE‑bench Pro (1,865 tasks) is harder and more trustworthy; top systems land around mid‑50% on Pro, making M2.7’s 56.22% competitive for agentic coding. Tooling (“scaffolding”) often drives large swings in scores, so treat numbers as system‑level, not model‑only. 4
Ecosystem adoption is underway: popular structured‑output libraries are adding MiniMax provider support, noting OpenAI‑compatible APIs and available models such as MiniMax‑M2.7 and MiniMax‑M2.7‑highspeed. This lowers switching costs for teams already using OpenAI SDKs with a custom base URL. 5
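“OpenAI-compatible” in practice means pointing existing tooling at a different base URL with the same request shape. A stdlib sketch that builds (without sending) such a request; the endpoint URL is an assumption for illustration, and the model name follows the article:

```python
import json
import urllib.request

# Assumed base URL for illustration; use the provider's documented endpoint.
BASE_URL = "https://api.minimax.io/v1"

def build_chat_request(api_key, prompt, model="MiniMax-M2.7"):
    """Build an OpenAI-style chat completion request (not sent here)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("YOUR_KEY", "Say hello.")
print(req.full_url)
# Sending: urllib.request.urlopen(req) returns the JSON chat completion.
```

Teams on the official OpenAI SDK get the same effect by passing a custom `base_url` when constructing the client, which is what makes switching providers cheap.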
Research Papers
EXAONE 4.5: LG AI Research’s open‑weight vision‑language model for documents and long context
This report introduces EXAONE 4.5 as LG AI Research’s first open‑weight VLM built by adding a dedicated visual encoder to the EXAONE 4.0 framework, enabling native multimodal pretraining. The team emphasizes document‑centric corpora and extends context up to 256K tokens, reporting competitive general benchmarks and state‑of‑the‑art results among similar‑scale models for document understanding and Korean contextual reasoning. 6
A related industry study from IBM Research explores training smaller, multi‑task code LLMs: at 7B scale, careful model‑merging of task‑specialized checkpoints reached 92.7% Pass@1 on HumanEval (Qwen Coder 2.5 7B), edging past a task‑specific fine‑tune at 90.9%, while retaining summarization ability. At smaller scales, data mixing was preferred. The takeaway for enterprises: merging/mixing can pack multiple skills into compact models without major regressions—useful alongside larger agentic systems. 7
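The IBM study’s exact merging recipe is not reproduced here; a minimal sketch of uniform checkpoint averaging (the simplest “model soup” style merge, one common way to combine task-specialized checkpoints):

```python
import numpy as np

def merge_checkpoints(state_dicts, weights=None):
    """Linearly merge checkpoints with identical shapes.

    state_dicts: list of {param_name: ndarray} from task-specialized fine-tunes.
    weights: optional per-checkpoint mixing coefficients (default: uniform).
    """
    n = len(state_dicts)
    if weights is None:
        weights = [1.0 / n] * n
    return {
        name: sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        for name in state_dicts[0]
    }

# Toy example: two one-parameter "checkpoints" for different tasks.
code_ckpt = {"w": np.array([1.0, 3.0])}
summ_ckpt = {"w": np.array([3.0, 1.0])}
merged = merge_checkpoints([code_ckpt, summ_ckpt])
print(merged["w"])  # [2. 2.]
```

Uniform averaging works only when the checkpoints share an architecture and a common fine-tuning ancestor; non-uniform weights let teams bias the merge toward the task that matters most.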
Community Pulse
Hacker News (82↑) — MiniMax M2.7’s public weights excite users who want strong local inference, with reminders to check licensing terms; many compare it to other capable local models.
"Absolutely - I'm one of these types of people who just want local inference myself. I have a Strix Halo rig and I'm thrilled to have Minimax M2.7 weights to run locally. Like I said, this is still an unambiguously good thing, and follows some of the spirit of open source. Just know that Minimax M2.7 is offered with a noncommercial license. If you use it for commercial purposes, you may be on the hook, liability-wise." — Hacker News 3
Why It Matters
Today’s updates show a split strategy for agentic AI: heavy “planner” models like Nemotron 3 Super push long‑context reasoning and throughput, while small, efficient agents like Fara‑7B bring computer‑use skills to the edge. Public weights such as MiniMax M2.7 broaden local experimentation—just mind license terms—while enterprise teams mix and merge smaller code models to control cost. 1 2 3
Try This Week
- MiniMax M2.7 local test: Download the public weights on Hugging Face and run a quick SWE‑bench‑style patch task—check license terms before use. https://huggingface.co/MiniMaxAI/MiniMax-M2.7 3
- Fara‑7B browser automation demo: Try the Magentic‑UI research prototype with Fara‑7B to automate a simple web form in a sandboxed environment. Start from the Microsoft Research blog links to Foundry/Hugging Face. 2