Point, speak, act: DeepMind’s AI pointer comes to Chrome
DeepMind demos a pointer that understands what you select and why, turning pixels into actions like “compare these” or “get directions.” Microsoft details an agent system that uncovered 16 Windows vulnerabilities, while new repos sharpen agent workflows for everyday builders.
One-Line Summary
AI steps out of chat windows and into the interface itself: a context-aware pointer from DeepMind, an agentic security system from Microsoft, and open-source tools to run agent teams and multimodal backends.
Research Papers
DeepMind turns the mouse pointer into a context-aware helper
DeepMind shows an experimental pointer that lets you point and speak to act across apps, so you can ask for what you need without crafting long prompts. Powered by Gemini, the demos include pointing to an image of a building and saying “Show me directions,” or using Google AI Studio to edit an image or find places on a map. The idea is to meet you where you work—on the canvas—rather than shuttling information into a separate AI chat window. 1
The team frames four principles: keep you in flow across apps; “show and tell” by capturing the visual and semantic context around the pointer; embrace natural shorthand like “this” and “that” combined with pointing; and turn pixels into actionable entities like places, dates, or objects. In practice, that looks like hovering over a stats table to request a pie chart, or highlighting a recipe and asking to double the ingredients—without typing a detailed prompt. 1
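To make the "show and tell" principle concrete, here is a minimal sketch of the kind of context bundle such a pointer might capture. The names (PointerContext, resolve_deictic) and fields are illustrative assumptions, not DeepMind's implementation:

```python
from dataclasses import dataclass, field

@dataclass
class PointerContext:
    """What a point-and-speak system might capture around the cursor. Hypothetical."""
    utterance: str                  # natural shorthand, e.g. "double the ingredients"
    screen_crop: bytes              # pixels near the pointer (the "show" part)
    entities: list[dict] = field(default_factory=list)  # pixels turned into places, dates, objects

def resolve_deictic(ctx: PointerContext) -> dict:
    """Map "this"/"that" to an entity under the pointer.

    A real system would use a multimodal model to turn pixels into
    actionable entities; this sketch just returns the first one found.
    """
    if not ctx.entities:
        raise ValueError("nothing actionable under the pointer")
    return ctx.entities[0]

# Example: hovering over a stats table and asking for a chart.
ctx = PointerContext(
    utterance="make this a pie chart",
    screen_crop=b"",
    entities=[{"kind": "table", "columns": ["category", "count"]}],
)
target = resolve_deictic(ctx)  # the table, with no long prompt written
```

The point of the structure: the spoken request stays short because the visual and semantic context travels with it.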
DeepMind says these concepts are being woven into products. Starting today, you can use your pointer to ask Gemini in Chrome about the part of a page you care about (for example, selecting products to compare or pointing to where you want to visualize a couch), and "Magic Pointer" is coming to the new Googlebook laptop experience, with more experiments planned across platforms like Google Labs' Disco. The throughline is shifting the work of conveying context from you to the computer. 1
LLM & SOTA Models
Microsoft debuts MDASH, a multi-model agentic security harness
Microsoft introduces a system that coordinates more than 100 specialized AI agents across an ensemble of frontier and distilled large language models (LLMs) to find, debate, and prove exploitable bugs end-to-end. Using this harness, researchers found 16 new vulnerabilities across Windows networking and authentication, including four Critical remote code execution flaws in components like the Windows kernel TCP/IP stack and IKEv2. 2
On tests, the system found all 21 planted vulnerabilities with zero false positives in a private driver, recalled 96% of five years of Microsoft Security Response Center (MSRC) cases in clfs.sys and 100% in tcpip.sys, and scored 88.45% on the CyberGym benchmark of 1,507 real-world vulnerabilities—about five points above the next entry. Microsoft is using the harness internally and running a limited private preview with customers. 2
MDASH runs a structured pipeline—Prepare, Scan, Validate, Dedup, and Prove—with distinct auditor, debater, and prover agents plus domain plugins (for example, a CLFS proving plugin). Because targeting, validation, deduplication, and proving are model-agnostic, organizations can swap in newer models without losing prior configurations and plugins. 2
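Microsoft hasn't published MDASH's code, but the model-agnostic stage design is easy to sketch. The Python outline below is an assumption-laden illustration (Model, Stage, run_pipeline, and the passthrough stage are all hypothetical names), showing how stages can be configured independently of the model behind them:

```python
from typing import Callable, Protocol

class Model(Protocol):
    """Anything that answers a prompt; frontier or distilled, swappable."""
    def complete(self, prompt: str) -> str: ...

# A stage maps (model, findings) -> findings, so swapping in a newer
# model leaves stage configuration and domain plugins untouched.
Stage = Callable[[Model, list[dict]], list[dict]]

def run_pipeline(model: Model, target: str, stages: dict[str, Stage]) -> list[dict]:
    findings: list[dict] = [{"target": target}]
    for name in ("prepare", "scan", "validate", "dedup", "prove"):
        findings = stages[name](model, findings)
    return findings

# Placeholder stage: real auditor/debater/prover agents would live inside
# scan and validate, and domain plugins (e.g., a CLFS prover) inside prove.
def passthrough(model: Model, findings: list[dict]) -> list[dict]:
    return findings

class EchoModel:
    """Stand-in model so the sketch runs end-to-end."""
    def complete(self, prompt: str) -> str:
        return prompt

stages = {s: passthrough for s in ("prepare", "scan", "validate", "dedup", "prove")}
print(run_pipeline(EchoModel(), "tcpip.sys", stages))
```

The design choice worth copying: because each stage depends only on the Model interface, a model upgrade is a one-line swap rather than a pipeline rewrite.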
Open Source & Repos
notebooklm-py gives programmatic access to Google NotebookLM
notebooklm-py is an unofficial Python application programming interface (API) and agentic skill that exposes full programmatic control of Google NotebookLM—including capabilities the web UI doesn’t surface—via Python, a command-line interface (CLI), and AI agents like Claude Code, Codex, and OpenClaw. It’s MIT-licensed and published on PyPI. 3
Release v0.4.1 (May 11, 2026) notes additions like a “notebooklm auth refresh” CLI, a keepalive parameter on the NotebookLM client, an environment variable for refresh commands, and two new dataclass fields, with Python 3.10–3.14 indicated via badges. It’s aimed at teams automating NotebookLM workflows or wiring it into existing agent stacks. 3
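For teams evaluating it, usage might look roughly like the sketch below. Every identifier here (the import path, NotebookLMClient, keepalive, get_notebook, query) is an assumption extrapolated from the release notes rather than verified against the package, so check the project's README first:

```python
# Hypothetical sketch: class, method, and parameter names are assumptions
# based on the v0.4.1 release notes, not notebooklm-py's verified API.
from notebooklm import NotebookLMClient  # real import path may differ

client = NotebookLMClient(keepalive=True)   # keepalive parameter noted in v0.4.1
nb = client.get_notebook("research-notes")  # assumed accessor
print(nb.query("Summarize the key findings across my sources"))  # assumed method
# Auth can reportedly be refreshed from the shell: notebooklm auth refresh
```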
claude_codex_bridge makes terminal-first multi-agent teams visible and controllable
claude_codex_bridge (CCB) runs visible, supervised agent teams for Claude, Codex, Gemini, OpenCode, and Droid from one terminal workspace, with project memory and tmux-based supervision. It’s cross-platform (Linux, macOS, Windows) and currently lists version 6.1.15. 4
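CCB's internals aren't reproduced here, but the tmux-based supervision pattern itself is easy to demonstrate. A minimal sketch using the libtmux library (the session name and agent commands are placeholders; this is not CCB's code):

```python
import libtmux

# One tmux session, one window per agent, so a human can watch and
# intervene in any pane: the "visible, supervised" idea in miniature.
server = libtmux.Server()
session = server.new_session(session_name="agent-team", kill_session=True)

for agent_cmd in ("claude", "codex", "gemini"):  # placeholder CLI names
    window = session.new_window(window_name=agent_cmd, attach=False)
    window.panes[0].send_keys(f"echo starting {agent_cmd}", enter=True)

# Inspect the team from any terminal with: tmux attach -t agent-team
```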
The v6.1.14 release (May 13, 2026) documents a macOS Keychain fallback for Claude credentials and clarifies diagnostics boundaries—support bundles must not follow fallback Keychain symlinks—signaling attention to secret management in agent ops. 4
Pixeltable targets declarative, incremental backends for multimodal apps
Pixeltable describes itself as a declarative and incremental backend for multimodal AI applications, with an Apache 2.0 license and a published PyPI package. CI badges indicate active tests and nightly runs. 5
The pitch is a backend that helps teams structure data and processing for applications that mix text, images, or other modalities, while keeping workflows declarative and incremental to avoid full recomputes. 5
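As a taste of the declarative, incremental pattern, here is a small sketch following Pixeltable's documented computed-column idiom; treat exact signatures as version-dependent:

```python
import pixeltable as pxt

# Declare a table and a computed column once; Pixeltable recomputes
# incrementally as rows arrive instead of re-running the whole pipeline.
docs = pxt.create_table("docs", {"text": pxt.String})

@pxt.udf
def word_count(text: str) -> int:
    return len(text.split())

docs.add_computed_column(n_words=word_count(docs.text))

docs.insert([{"text": "multimodal apps mix text and images"}])
# Only the new row's n_words is computed; prior results are reused.
print(docs.select(docs.text, docs.n_words).collect())
```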
Community Pulse
Hacker News (245↑) — Mixed: some see voice+pointer as inclusive design, others call it a repackage of drag-and-drop and question scope and practicality. 6
"Isn't "drag the rectangle" and visual interaction exactly the point of the research in the article? Speech is the perfect side channel to this interaction, not a context switch to text. Also, I doubt DeepMind is designing for existing programmers and savvy computer users. They are thinking about the other billions of people in the world. Speech is the skill people will already have, not typing." — Hacker News 6
"I looked at the first example and I'm astonished that they took a standard click and drag mouse move, injected the need to speak into an llm and then acted like it was revolutionary. Imagine trying to convince someone in the 90s that that's a step forward." — Hacker News 6
Why It Matters
The center of gravity is shifting from chatboxes to on-screen, context-aware actions. DeepMind’s pointer concept—and early Chrome integration—suggests UI design that captures what you’re pointing at and why, reducing prompt writing and context switching for everyday tasks. 1
Microsoft’s results point to “the system around the model” as the durable advantage: orchestrated agents, validation, and proving can raise recall while keeping noise low. That design principle may spill over into enterprise tooling beyond security as teams look for model-agnostic, upgrade-friendly pipelines. 2
What to Try This Week
- Use the pointer demos described by DeepMind: open the DeepMind post and try the Google AI Studio examples to point-and-act on images or maps. 1
- Automate NotebookLM: `pip install notebooklm-py`, then script a small workflow (auth refresh + a data query) to see how a CLI-friendly agent can drive a research notebook. 3