Vol.01 · No.10 Daily Dispatch April 12, 2026

Latest AI News

AI · Papers · Daily Curation · Open Access
AI News · Research
6 min read

Google’s Gemma 4 sets a new open, on‑device AI baseline

Google releases Gemma 4 under Apache 2.0 with models that run from phones to workstations, while GLM‑5.1 shows open weights can now compete in real coding tasks. Here’s what changes for builders today.


One-Line Summary

Google releases Gemma 4, an open, multimodal model family built for on-device reasoning.

LLM & SOTA Models

Gemma 4: Open, multimodal AI that runs from phones to workstations

Think “local-first Copilot”: Gemma 4 handles text, images, video (and audio on smaller variants) directly on your device, not just in the cloud. It ships in four sizes — Effective 2B (E2B), Effective 4B (E4B), a 26B mixture-of-experts, and a 31B dense model — under an Apache 2.0 license. The 31B ranks #3 and the 26B ranks #6 among open models on Arena AI’s text leaderboard as of April 1, delivering unusually strong intelligence per parameter. Edge models offer 128K context; larger ones go up to 256K, all trained across 140+ languages. 1

What’s new for builders: native function calling, structured JSON, and system instructions for agents; high‑quality code generation offline; and vision features that include OCR and chart understanding. For laptops/workstations, unquantized bfloat16 weights fit on a single 80GB NVIDIA H100, with consumer‑GPU quantized builds available. On mobile and IoT, E2B/E4B are tuned for low latency and power draw, developed with the Pixel team and hardware partners like Qualcomm and MediaTek to run fully offline. 1

Ecosystem support lands on day one: vLLM adds deployment across NVIDIA, AMD, and Intel GPUs and Google TPUs, handling Gemma 4’s 128K–256K context windows and agentic I/O patterns. That means you can serve locally with vLLM, or scale on GKE/TPUs without changing your stack. If you prefer a GUI or desktop workflow, early integrations span Transformers, Ollama, llama.cpp, MLX, LM Studio, and more. 2
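To make the agentic I/O concrete, here is a minimal sketch of the request body you would POST to a local vLLM server’s OpenAI-compatible `/v1/chat/completions` endpoint, exercising the function-calling support described above. The model id `google/gemma-4-26b` is a hypothetical placeholder — substitute whatever checkpoint name you actually serve; the tool schema follows the standard OpenAI-compatible format that vLLM accepts.

```python
import json

# Hypothetical model id; substitute the actual Gemma 4 checkpoint you serve.
MODEL = "google/gemma-4-26b"

# A tool definition in the OpenAI-compatible function-calling format.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body for http://localhost:8000/v1/chat/completions once
# `vllm serve` has loaded the model.
payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "What's the weather in Oslo?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

print(json.dumps(payload, indent=2))
```

If the model decides to call the tool, the response carries a `tool_calls` entry with JSON arguments instead of plain text — the structured half of the “agent-ready” story.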

For teams that care about privacy and data control, Apache 2.0 plus on‑device inference offers digital sovereignty: build and deploy on‑prem or air‑gapped while reusing familiar tools (Vertex AI, Cloud Run, GKE) when you need to burst. Benchmarks continue to evolve, but today’s headline is clear: Gemma 4 moves beyond “chat” into agent‑ready reasoning on accessible hardware. 1

GLM‑5.1: Open‑weight coding model posts 58.4% on SWE‑bench Pro

GLM‑5.1, a 754‑billion‑parameter mixture‑of‑experts model from Z.AI, becomes the first open‑weight system to beat closed frontier models on SWE‑bench Pro with 58.4% — ahead of GPT‑5.4, Claude Opus 4.6, and Gemini 3.1 Pro on that specific test. It’s MIT‑licensed, comes with a 200K context window and up to 128K output tokens, and is trained for multi‑hour autonomous sessions; Z.AI showcases an 8‑hour run that built a functional Linux desktop across 655 steps without human intervention. Treat demos as claims until independently reproduced, but the direction is notable. 3

On human‑validated coding, GLM‑5.1 hits 1530 Elo on Arena.ai’s Code Arena (April 10), placing third behind two Claude Opus variants and ahead of listed GPT/Gemini entries. Coverage emphasizes that it leads coding‑centric benchmarks but trails top models on math/science reasoning — meaning it’s a strong coding specialist rather than an all‑around champion. 4

Why it matters: open weights plus high coding performance shift buyer calculus. You can self‑host (air‑gapped, fine‑tune on proprietary repos), avoid per‑token fees, and still get near‑frontier capability — if you have the hardware. Caveats include heavy infra needs, platform/tooling gaps versus turnkey IDE assistants, and the need for third‑party verification of long‑horizon reliability. 3

Open Source & Repos

PokeClaw and OpenClaw: On‑device phone control and a local‑first agent framework

PokeClaw is a GitHub project claiming to be the “first on‑device AI that controls your Android phone,” powered by Gemma 4 with no cloud and no API key. It targets Android 9+, ships under Apache 2.0, and focuses on local control rather than remote APIs — aligning with Gemma 4’s mobile‑first push. As with any phone‑control tool, test in a sandbox and audit permissions carefully. 5

In parallel, OpenClaw, covered in a separate review, is an open‑source, local‑first agent framework: bring your own model (closed API or local), wire up tools and channels, and automate multi‑step workflows with state and approvals. It’s aimed at builders who prefer self‑hosting and extensibility over a single polished chat UI; the write‑up underscores the trade‑offs (your infra, your safety rails) and why it’s trending among teams trying to own their runtime and data. 6

You’ll also see unofficial “Gemma 4 APK” download pages circulating. These emphasize offline use and privacy but come with the usual sideloading risks (malware, missing updates). If you experiment, follow platform security guidance and prefer official hubs (Google AI Edge Gallery, Play Store equivalents) when they exist. The APK pages themselves advise caution and outline generic sideload steps. 7 8

Hermes HUD UI: A browser dashboard for a persistent, tool‑using agent

Hermes HUD UI brings a web “consciousness monitor” to Hermes, the self‑hosted, model‑agnostic agent from Nous Research. The dashboard mirrors the popular TUI, showing identity, memory, skills, sessions, projects, cron jobs, costs, and live chat across 13 tabs, updating in real time via WebSocket. Quick start is a single repo clone and script; it expects Python 3.11+, Node 18+, and a running Hermes instance with data in ~/.hermes. 9
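Before running the quick-start script, it can help to verify the stated prerequisites (Python 3.11+, Node 18+, state in `~/.hermes`). A minimal preflight sketch — the `meets_minimum` helper and the check names are illustrative, not part of the HUD’s own tooling:

```python
import re
from pathlib import Path

def meets_minimum(version: str, minimum: tuple) -> bool:
    """Parse an 'X.Y[.Z]' version string (as printed by `python3 --version`
    or `node --version`) and compare against a (major, minor) minimum."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if not match:
        return False
    return (int(match.group(1)), int(match.group(2))) >= tuple(minimum)

# The HUD expects Python 3.11+, Node 18+, and Hermes state in ~/.hermes.
checks = {
    "python": meets_minimum("3.12.1", (3, 11)),   # substitute real output
    "node": meets_minimum("v18.19.0", (18, 0)),   # substitute real output
    "hermes_state": Path("~/.hermes").expanduser().exists(),
}
print(checks)
```

Swap the hard-coded version strings for the output of the real commands (e.g. via `subprocess.run`) when wiring this into an actual setup script.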

Hermes itself runs persistently on a local machine or low‑cost VPS, separates “conversation” from “execution,” and uses explicit tools (terminal, files, web) with configurable backends (local, Docker, SSH, serverless options). It stores state under ~/.hermes (configs, SOUL.md identity, memories, skills), supports profiles, and offers a messaging gateway with pairing/allowlists for safety. The guide stresses treating Hermes like infrastructure: auditable actions, sandboxed execution, and backups. 10 11
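Treating the agent like infrastructure starts with knowing what lives in its state directory. A small audit sketch, assuming only the layout named above (a `SOUL.md` identity file plus `memories` and `skills` directories under `~/.hermes`) — the function name and the summary shape are my own, not Hermes APIs:

```python
import tempfile
from pathlib import Path

def summarize_state(root: Path) -> dict:
    """Summarize a Hermes-style state directory: presence of the SOUL.md
    identity file and counts of memory/skill entries, per the documented
    layout. Useful before taking a backup or tightening permissions."""
    return {
        "identity": (root / "SOUL.md").exists(),
        "memories": len(list((root / "memories").glob("*")))
                    if (root / "memories").is_dir() else 0,
        "skills": len(list((root / "skills").glob("*")))
                  if (root / "skills").is_dir() else 0,
    }

# Demo against a throwaway directory (stand-in for ~/.hermes):
demo = Path(tempfile.mkdtemp())
(demo / "memories").mkdir()
(demo / "SOUL.md").write_text("# identity\n")
print(summarize_state(demo))
```

Point it at `Path("~/.hermes").expanduser()` on a real install; the same walk is a natural place to hook in backups.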

For teams piloting agents, this combo — a persistent agent plus a live, queryable dashboard — makes operations tangible: you can observe memory growth, cost trends, and tool behavior, then tighten permissions or refactor skills accordingly. That’s the difference between a tab you “chat” in and an assistant you operate. 9

Why It Matters

Gemma 4 moves “serious” reasoning and multimodality onto devices people already have — under a permissive license and with day‑zero infra support. Pair that with an open‑weight coding specialist like GLM‑5.1 and fast‑maturing agent stacks, and you get new freedom to run, tune, and monitor AI where your data lives. 1 2 3

Try This Week

  1. Gemma 4, no server: Try the E2B/E4B demos in Google AI Edge Gallery, or run 26B/31B with vLLM locally to exercise the 128K–256K context window and JSON/function-calling support. 1 2
  2. Hermes + HUD in 15 minutes: Install Hermes, then spin up Hermes HUD UI to visualize sessions, skills, and costs while running a small terminal task in Docker. 9 10
  3. Phone sandbox: Install PokeClaw on a spare Android device to test local control flows, reviewing permissions and action logs before daily use. 5

