Arcee ships a 400B open-weight reasoning model as Anthropic mobilizes 'Glasswing' for zero-day hunting
Open-weight sovereignty meets coordinated AI defense: Trinity Large Thinking targets on-prem reasoning while Mythos Preview quietly finds decades-old bugs across critical software.
One-Line Summary
Open-weight giants and defensive AI collide: Arcee ships a 400B-parameter reasoning model while Anthropic’s Mythos mobilizes Big Tech to hunt thousands of software vulnerabilities.
LLM & SOTA Models
Arcee Trinity Large Thinking
Arcee has released Trinity Large Thinking, a reasoning-focused, open-weight large language model that the 26-person startup trained on a reported budget of about $20 million, a notably small outlay for the massive 400B-parameter base they tout. The model is downloadable and API-accessible under the permissive Apache 2.0 license, offering on-premise control for teams that want model sovereignty rather than closed APIs. Early benchmarks suggest it competes with top open models, though it does not surpass closed systems from Anthropic or OpenAI. 1
The pitch is straightforward: Trinity aims to be a high-capability reasoning model that Western orgs can self-host without the licensing caveats seen in some popular open alternatives. TechCrunch emphasizes that while it is not a head-to-head threat to Meta's latest Llama, it avoids "not-really-open" license debates and, according to OpenRouter usage data, is already a top pick in agent ecosystems like OpenClaw. In short, it trades absolute SOTA crowns for practical control, cost predictability, and deployment flexibility. 2
Arcee frames the release against recent turbulence in closed ecosystems — for example, changes to third‑party usage terms for coding agents — arguing that open‑weight, permissively licensed models reduce platform risk. For buyers, the key numbers here are the parameter scale (400B), the permissive license (Apache 2.0), and the reality check that performance is competitive with leading open models but not yet eclipsing closed leaders. 2
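Because the model is API-accessible as well as downloadable, calling it should look like any OpenAI-compatible chat-completions request, the shape OpenRouter and most self-hosted serving stacks accept. A minimal sketch follows; note the model slug `arcee-ai/trinity-large-thinking` is a placeholder assumption, since the article does not state the exact identifier:

```python
import json

# Hypothetical model slug: the exact identifier is not given in the article,
# so this string is an illustrative placeholder.
MODEL_ID = "arcee-ai/trinity-large-thinking"

def build_chat_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Build an OpenAI-compatible chat-completions payload, the request
    shape accepted by OpenRouter and common self-hosted servers."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("Explain why open-weight licensing matters, in two sentences.")
print(json.dumps(payload, indent=2))
```

The same payload works against a self-hosted deployment by pointing the HTTP client at your own endpoint instead of a hosted API, which is the sovereignty argument in practice.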
Open Source & Repos
Anthropic Project Glasswing
Project Glasswing is a new industry coalition — Amazon, Apple, Broadcom, Cisco, CrowdStrike, the Linux Foundation, Microsoft, Palo Alto Networks and more — centered on Anthropic’s unreleased Claude Mythos Preview model to proactively surface vulnerabilities in critical software. In early runs, Mythos reportedly finds “thousands” of previously unknown issues, including a 27‑year‑old OpenBSD bug and a 16‑year‑old FFmpeg flaw that automated tools scanned five million times without catching. Anthropic says all reported issues are patched. 3
Anthropic commits up to $100M in usage credits and $4M in direct funding to open-source security groups, but Mythos Preview remains restricted to partners due to dual-use risks. Coverage notes Mythos outperforms Claude Opus 4.6 on security-oriented tests like CyberGym, with launch partners spanning AWS, Apple, Google, JPMorganChase, Microsoft, and NVIDIA. The thesis: AI is collapsing the window from discovery to exploitation from months to minutes, so defenders need equivalent acceleration. 4
Analysts frame Glasswing as a rare “Manhattan Project”‑style alignment among rivals, driven by existential infrastructure risk. Beyond headline numbers, the meaningful shift is operational: pushing agentic coding and reasoning models into coordinated disclosure workflows, with required cross‑org sharing of findings to harden the open‑source substrate most systems depend on. Expect fast iteration: Anthropic hints frontier capabilities will step‑change within months, demanding continuous defense upgrades. 5
JuliusBrussee/caveman
Caveman is a lightweight "skill" for Claude Code (and Codex variants) that forces ultra-concise, telegram-style answers, claiming roughly 65–75% output-token reductions and up to about 3x faster response times in examples. It ships intensity levels (Lite/Full/Ultra) to strip greetings, hedging, and filler while keeping technical accuracy; reported before/after cases show large cuts on React re-rendering explanations, PostgreSQL pool configs, and Docker multistage builds. One-line install and session-wide toggles make it easy to trial. 6
Community roundups cite benchmarks like 1,180→159 tokens (87% cut) on a React debugging explanation and 2,347→380 (84% cut) for Postgres pooling notes. However, the maintainer clarifies that savings target output tokens, not hidden “reasoning tokens,” and that the skill itself consumes some context on load; proper evaluation should weigh input/output tokens, latency, and quality together. In short: promising in practice, but treat the “~75%” as preliminary. 7
Why it’s trending: API costs scale with tokens, and teams often read only the code plus short notes. Caveman exploits this by compressing prose without touching code generation. That said, some users report comprehension dips under “ultra” brevity — a reminder that prompting for brevity is a trade‑off. If you try it, measure end‑to‑end loop cost and correctness, not just token counters. 8
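The quoted reductions are easy to sanity-check. A short calculation over the before/after token counts cited above reproduces the claimed percentages (and is the same arithmetic you would run on your own transcripts before adopting the skill):

```python
def pct_cut(before: int, after: int) -> float:
    """Percent reduction in output tokens from a before/after pair."""
    return 100 * (before - after) / before

# Before/after output-token counts quoted in the community benchmarks above.
cases = {
    "React debugging explanation": (1180, 159),
    "Postgres pooling notes": (2347, 380),
}

for name, (before, after) in cases.items():
    print(f"{name}: {pct_cut(before, after):.0f}% cut")
# React debugging explanation: 87% cut
# Postgres pooling notes: 84% cut
```

As the maintainer cautions, this only measures visible output tokens; a fair evaluation would also account for hidden reasoning tokens, the context the skill itself consumes on load, and answer quality.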
METATRON: AI Pentest Assistant (Local)
METATRON is a fully offline, CLI‑driven penetration testing assistant for Parrot OS/Debian that runs recon tools (nmap, nikto, whois, dig, whatweb, curl), aggregates results, and analyzes them with a local LLM via Ollama — no API keys, cloud, or data exfiltration. Its model, “metatron‑qwen,” is a fine‑tuned variant of huihui_ai/qwen3.5‑abliterated:9b with a 16,384‑token context window, tuned for precise, non‑fluffy security analysis. 9
A standout is its agentic loop: the model can ask METATRON to run more scans as needed, rather than a single pass. It also correlates recon output with public CVEs via DuckDuckGo search without credentials, keeping the offline‑first posture. For auditability, it logs into a five‑table MariaDB schema (findings, severities, remediation, exploit attempts, session summaries) and exports HTML/PDF reports. 10
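The agentic loop described above can be sketched in a few lines. This is not METATRON's actual code, just a minimal, offline-runnable illustration of the cycle: run a recon tool, hand the findings to a local model, and let the model request a follow-up scan. The tool templates and the `ask_llm` stub (which would really shell out to something like `ollama run metatron-qwen`) are assumptions for illustration:

```python
import shlex

def build_recon_cmd(tool: str, target: str) -> list:
    """Map a tool name to a concrete command line (a subset of the toolbox)."""
    templates = {
        "nmap": "nmap -sV {t}",
        "nikto": "nikto -h {t}",
        "whatweb": "whatweb {t}",
    }
    return shlex.split(templates[tool].format(t=target))

def ask_llm(findings: str) -> str:
    """Stand-in for a local Ollama call, e.g.
    subprocess.run(["ollama", "run", "metatron-qwen", prompt], ...).
    Stubbed here with a trivial rule so the sketch runs offline."""
    return "nikto" if "80/tcp open" in findings else "done"

# One iteration of the loop: scan, analyze, maybe scan again.
findings = "80/tcp open  http  nginx"   # pretend nmap output
next_step = ask_llm(findings)
if next_step != "done":
    print("model requested follow-up scan:", build_recon_cmd(next_step, "10.0.0.5"))
```

In the real tool the loop would keep iterating until the model signals it is done, with each scan's output logged to the MariaDB schema for auditability.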
Positioning‑wise, METATRON speaks to orgs that can’t send sensitive banners or internal IPs off‑prem. It complements the Glasswing story: one hardens the commons with a restricted frontier model; the other empowers private assessments with a local 9B‑class model and deterministic tooling — different threat models, similar automation arc. 11
Community Pulse
Hacker News (56↑) — Arcee’s Trinity is seen as technically solid but incremental, with value in price/performance.
"Looks like an incremental improvement, technically. Seems to benchmark around Kimi K2.5 but it's cheaper and faster." — Hacker News
Hacker News (879↑) — Caveman earns praise for cutting tokens and friction, but users want clearer docs and less jargon.
"Fair enough! The simple answer is: we did a lot of work to make the model better at coding without requiring complicated installation or configuration. One command to install and run. All the benefits of claude code, without any of the limitations or rug pulls." — Hacker News
"I only understood half of the tech jargon in your answer. If I understood it all I’d probably run it myself. If someone who is less knowing than me is your customer, you need to explain in simpler terms!" — Hacker News
Why It Matters
Two complementary trends define today: open‑weight sovereignty and AI‑accelerated defense. Arcee bets that a permissive, high‑parameter reasoning model you can self‑host is now a strategic asset, even if it trails the very top closed systems. At the same time, Anthropic’s Mythos shows how frontier models change cyber timelines — finding decades‑old bugs and forcing defenders to coordinate at unprecedented speed and scale. 1 3
For practitioners, the middle layer is getting more practical: Caveman reminds us that prompt‑level compression can materially cut cost and latency, while METATRON shows local models can already drive useful, auditable workflows. The upshot: expect a stack where local, open, and frontier‑assisted tools coexist — with careful trade‑offs among capability, control, and risk. 6 9