Agents go mainstream: Google embeds Gemini, Nvidia ships Vera, OpenAI tests finance
Agents jumped from chat to action: Google made Gemini a built‑in helper, Nvidia shipped a CPU for agent orchestration, OpenAI opened a money view in ChatGPT, and a $5B TPU venture took shape — all pointing to faster, cheaper assistants in your daily tools.
This Week in One Line
Google embedded Gemini agents across its core apps and launched the faster 3.5 Flash model; Nvidia began shipping its Vera CPU to top labs; OpenAI opened a finance preview in ChatGPT; and Blackstone funded a $5B TPU venture — translation: agents are moving into everyday workflows.
Week in Numbers
- 12,000+ — Financial institutions that ChatGPT’s finance preview can connect to via Plaid (U.S. Pro users only). 1
- 88 — Custom Olympus cores inside Nvidia’s new Vera Central Processing Unit (CPU). 2
- $5B — Blackstone’s equity commitment to a new U.S. data-center and Tensor Processing Unit (TPU) venture with Google. 3
- 900M — Monthly active users (MAUs) of the Gemini app, per Google’s I/O update. 4
- $100/month — Price of Google’s new AI Ultra tier, with higher usage caps. 5
- $2.167B — Publicis’ all-cash deal to acquire LiveRamp, at $38.50 per share. 6
- $40M — Series B funding raised by Dust for its ‘multiplayer’ enterprise agent platform. 7
Top Stories
OpenAI pushes into enterprise agents and personal finance
OpenAI said GPT‑5.5 now powers Databricks customer agents that handle messy, document-heavy work; Databricks reports fewer parsing errors and the first score above 50% on its OfficeQA Pro benchmark, a mark prior models did not reach. For consumers, a preview finance experience in ChatGPT lets U.S. Pro users connect bank and investment accounts via Plaid across 12,000+ institutions and ask questions grounded in real balances and transactions; OpenAI frames it as learning-focused and not professional advice. For non-specialists, this points to AI that can handle scanned PDFs at work and answer money questions at home under clearer data controls. 8 1
Google turns Gemini from chatbot to embedded helper
Google detailed Gemini features inside Search, YouTube, and Workspace — from Ask YouTube that jumps to the right clip to voice-powered drafting in Docs — and reported scale: 900 million Gemini app MAUs, 2.5 billion monthly AI Overviews users, and 3.2 quadrillion tokens processed monthly across Google surfaces. The company also highlighted provenance work (SynthID in Chrome and Search) and infrastructure plans to meet demand. For everyday work, this reduces steps between a question and an answer inside tools you already use. 4
Google launches Gemini 3.5 Flash for faster agent work
Gemini 3.5 Flash debuts as Google’s speed-first model for planning, tool-calling, and coding. It is the default in the Gemini app and AI Mode in Search, and is available to developers via Google’s API and enterprise platforms. Google reports output token rates about 4× faster than other frontier models, often at less than half the cost, plus strong results on agent and multimodal benchmarks (e.g., 76.2% on Terminal‑Bench 2.1). For teams, this pairs lower latency with agent workflows that can split tasks among sub‑agents to finish real jobs. 9
Nvidia ships Vera CPU to frontier AI customers
Nvidia delivered its Vera systems (host CPUs tailored for agent orchestration, retrieval, and tool use) to Anthropic, OpenAI, SpaceXAI, and Oracle Cloud Infrastructure. Vera features 88 custom Olympus cores and 1.2 TB/s of memory bandwidth, designed to keep GPUs fed and agent response times low. Oracle plans “hundreds of thousands” of units starting in 2026, underscoring that CPU capacity can bottleneck agents as much as GPUs do. For builders, this signals that agent performance is an end‑to‑end systems problem, not just a big‑GPU problem. 2
Blackstone and Google form a $5B TPU capacity venture
Blackstone committed $5 billion in equity to launch a U.S. AI infrastructure company with Google that will offer data‑center capacity and Google’s Tensor Processing Units (TPUs) as a service, targeting the first 500 MW online by 2027. Led by Google veteran Benjamin Treynor Sloss, the venture gives enterprises another lane to specialized compute beyond Nvidia‑centric “neoclouds.” If you’re planning pilots, this points toward more routes to reserve non‑GPU accelerators with predictable terms. 3
Anthropic acquires Stainless to simplify SDKs
Anthropic is buying Stainless, whose software auto‑generates software development kits (SDKs) from API specs and is used by OpenAI, Google, and Cloudflare. Anthropic plans to wind down hosted Stainless products while customers retain rights to generated SDKs, a step that complements its Model Context Protocol focus and tightens the path from Claude to enterprise systems. For teams, expect less custom glue code to connect agents to internal services. 10 11
Google adds a $100 AI Ultra plan and lowers top-tier price
Google introduced a $100/month AI Ultra plan with 5× higher usage than Pro, 20TB storage, and access to agent-first tools, while reducing the top tier to $200/month. It is also moving the Gemini app from daily prompt caps to compute‑based limits that refresh every five hours. For heavy users, this reframes budgeting around task complexity (text vs image/video) instead of prompt counts. 5 12
Publicis to buy LiveRamp for $2.167B
Publicis agreed to acquire LiveRamp for an enterprise value of $2.167 billion ($38.50/share), calling it a bet on “data co‑creation” that powers dependable agents. LiveRamp will operate as a neutral, interoperable platform under Publicis’ Technology segment. For marketers, the message is clear: governed data collaboration can make or break agent quality. 6
Alibaba unveils faster Zhenwu chip and teases next Qwen model
Alibaba announced its Zhenwu M890 AI processor with about 3× the performance of its prior part, 144 GB of memory, and 800 GB/s interchip bandwidth, and said its next large model, Qwen3.7‑Max, is coming. Alibaba reports 560,000 Zhenwu units delivered to 400+ customers across 20 industries, pointing to more local compute options in China as Nvidia’s shipments face constraints there. If you serve users in China, put regional hardware availability on your roadmap. 13
Apple preps AI writing help and natural‑language Shortcuts
Reports indicate iOS/iPadOS 27 will add a grammar checker, “Help Me Write,” and natural‑language Shortcuts so users can describe automations in plain text, plus AI‑generated wallpapers. UI details include a translucent revision panel and a “Write With Siri” keyboard toggle. For teams that draft and approve on mobile, this could compress review cycles without new apps. 14 15
Trend Analysis
AI agents moved from chat to action. Google made Gemini an ambient helper inside Search, YouTube, and Workspace and launched Gemini 3.5 Flash as its default fast executor, while OpenAI’s finance preview, Figma’s on‑canvas assistant, and IrisGo’s desktop buddy show agents stepping into daily docs, money, design, and repetitive PC work. For most readers, this means fewer copy‑paste steps and more “do it for me” flows across familiar tools. 4 9 16 17 1
Under the hood, infrastructure is rebalancing to match agent workloads. Nvidia shipped Vera, a CPU designed for orchestration and tool‑use steps, while Blackstone put $5B into a TPU venture with Google and Decart raised capital to ease chip switching — all signals that CPUs, TPUs, and portability layers will shape latency and costs as much as GPUs. If your assistants stall, it may be a scheduling and data‑movement problem, not a model issue. 2 3 18
Access and pricing shifted too. Google added a $100 AI Ultra plan and moved to compute‑based quotas, aligning budgets with task complexity while reporting massive usage (900M Gemini app MAUs; 3.2 quadrillion tokens/month) across its surfaces. For teams, that implies metering by input type and length — not just prompt counts — when evaluating value for money. 5 4
Finally, compute and supply chains are diversifying. Alibaba’s Zhenwu rollouts, reported interest in Tenstorrent from Intel and Qualcomm, and Zyphra’s AMD‑first fundraise all point toward more non‑Nvidia paths, especially in regions with export constraints. Keep an eye on availability and tooling maturity before betting on alternatives for production. 13 19 20
Watch Points
- “Gemini Spark” — Google’s personal agent powered by 3.5 Flash; watch for broader beta access and guardrails. 9
- “500MW TPU venture” — signs of site selection and early customers in the Blackstone–Google JV will hint at enterprise access timelines. 3
- “Stainless wind‑down” — Anthropic is acquiring Stainless and plans to sunset hosted products; confirm SDK maintenance plans. 10
Open Source Spotlight
- Osaurus — Native macOS agent runner with local and cloud model backends, Model Context Protocol (MCP) server, and 20+ built‑in plugins; good for privacy‑sensitive, offline workflows. osaurus-ai/osaurus
- Firecrawl — Web search/scrape/clean toolkit now with a /parse endpoint for PDFs, DOCX, HTML, and more, returning clean Markdown/JSON; ideal for feeding agents predictable inputs. firecrawl/firecrawl
- Netron — Visual model viewer for ONNX, TensorFlow Lite, PyTorch, Core ML, and more, available as desktop apps and a one‑click browser version; handy for quick layer/tensor inspection. lutzroeder/netron
- Nvidia Video Search & Summarization — Reference blueprints for GPU‑accelerated vision agents that search and summarize video; a starting pattern for production apps. NVIDIA-AI-Blueprints/video-search-and-summarization
- Onyx — Open-source chat front end that works with many large language models; useful for teams testing providers behind a single interface. onyx-dot-app/onyx
What Can I Try?
- Put Gemini 3.5 Flash on a real task: in the Gemini app or AI Mode in Search, draft and revise a short brief, then time the full loop vs your current assistant. 9
- If eligible, connect the ChatGPT finance preview: link one account via Plaid and ask two budget or subscription questions to see grounding quality. 1
- Clean a PDF for your agent: upload it to Firecrawl’s /parse to get Markdown or JSON with tables preserved, then compare accuracy vs your current parser. 21
- Audit CPU-bound agent steps: list tool calls, retrieval, and code-exec phases; ask your vendor how CPU scheduling and memory bandwidth affect latency. 2
- Right-size Google AI spend: check whether the $100 AI Ultra tier’s higher usage and compute-based caps match your workload mix. 5
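For the timing and CPU-audit items above, a small harness can help separate "waiting" from "computing" before you talk to a vendor. The sketch below is one possible approach, not a vendor tool: `PhaseTimer`, `fake_retrieval`, and `fake_code_exec` are hypothetical names invented for illustration, and the two stand-in functions should be replaced with your own agent steps. It uses only the Python standard library and records wall-clock time and CPU time per named phase. A CPU/wall ratio near 1 suggests the phase is CPU-bound on your machine; a ratio near 0 suggests it is mostly waiting on I/O or a remote service.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Hypothetical helper: accumulates wall-clock and CPU time per named phase."""

    def __init__(self):
        self.totals = {}  # phase name -> [wall_seconds, cpu_seconds]

    @contextmanager
    def phase(self, name):
        wall_start = time.perf_counter()   # wall-clock time, includes waiting
        cpu_start = time.process_time()    # CPU time of this process, excludes sleep
        try:
            yield
        finally:
            entry = self.totals.setdefault(name, [0.0, 0.0])
            entry[0] += time.perf_counter() - wall_start
            entry[1] += time.process_time() - cpu_start

    def report(self):
        """Return (name, wall, cpu, cpu/wall) rows, slowest phase first."""
        rows = []
        for name, (wall, cpu) in self.totals.items():
            ratio = cpu / wall if wall > 0 else 0.0
            rows.append((name, wall, cpu, ratio))
        return sorted(rows, key=lambda row: row[1], reverse=True)


def fake_retrieval():
    # Stand-in for an I/O-bound step such as a search or database call.
    time.sleep(0.05)


def fake_code_exec():
    # Stand-in for a CPU-heavy step such as parsing or local code execution.
    sum(i * i for i in range(200_000))


if __name__ == "__main__":
    timer = PhaseTimer()
    with timer.phase("retrieval"):
        fake_retrieval()
    with timer.phase("code_exec"):
        fake_code_exec()
    for name, wall, cpu, ratio in timer.report():
        print(f"{name:10s} wall={wall:.3f}s cpu={cpu:.3f}s cpu/wall={ratio:.2f}")
```

Note that `time.process_time()` only counts CPU used by the local Python process, so work done on a remote server shows up as waiting. Treat the ratio as a rough first signal rather than a full profile.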