OpenAI cash, AWS pact, Nvidia’s Vera CPU, and faster ChatGPT anchor an agent-first week
Agent platforms and guardrails shipped, ChatGPT got faster by default, Nvidia targeted agent bottlenecks with a new CPU, and OpenAI locked in massive funding and AWS capacity — a week about taking agents from pilot to production.
This Week in One Line
OpenAI raised $122B and signed a 2 GW AWS Trainium pact; Nvidia unveiled a Vera CPU for agent loops; Google shipped an enterprise agent platform; ChatGPT switched to GPT-5.5 Instant — faster, more governed agents are moving into everyday work.
Week in Numbers
- $122B — OpenAI’s new funding round to “accelerate the next phase of AI,” at an $852B valuation. 1
- 88 cores — Nvidia’s Vera CPU packs 88 Arm-based Olympus cores to speed non‑GPU agent work. 2
- 2 GW — Trainium capacity OpenAI agreed to consume under its expanded AWS partnership. 3
- 6.5× — Download lift Appfigures attributes to image‑model launches versus chatbot upgrades. 4
- 220,000 GPUs — Compute at SpaceX’s Colossus 1 that Anthropic says it can tap, tied to higher Claude limits. 5
- €1B+ — SAP’s four‑year investment as it acquires Prior Labs to bring tabular AI into core products. 6
- 1,000,000 tokens — Context window in Nvidia’s Nemotron 3 Super for long, agentic tasks. 7
Top Stories
OpenAI raises $122B, outlines an agent‑first “superapp” and multi‑cloud stack
OpenAI announced a $122B round at an $852B valuation to “accelerate the next phase of AI,” positioning compute and product integration as core advantages. The company detailed a broad infrastructure strategy (multiple clouds and chip suppliers, plus an in‑house chip co‑designed with Broadcom) and a plan to unify ChatGPT, coding, search, and browsing into a single agent‑first surface. It also reported scale figures: 900M weekly active users, 50M+ subscribers, $2B in monthly revenue, APIs processing 15B tokens per minute, and an ads pilot topping $100M in annual recurring revenue (ARR). For non‑specialists, this signals AI that does multi‑step work, not just chat. 1
OpenAI and Amazon: 2 GW Trainium, Bedrock integration, and a stateful runtime
OpenAI and Amazon agreed to co‑create a “Stateful Runtime Environment” delivered via Amazon Bedrock and made AWS the exclusive third‑party cloud distributor for OpenAI Frontier. The deal includes about 2 gigawatts of AWS Trainium capacity, expands a prior $38B agreement by $100B over eight years, and adds a $50B Amazon investment ($15B upfront plus $35B contingent). For AWS shops, this points toward OpenAI agents aligned with Bedrock permissions and monitoring. 3
Nvidia unveils Vera CPU to clear agent bottlenecks
Vera is a data center CPU built for the “glue” around large language models (LLMs): code runs, tool calls, and orchestration loops. It features 88 Arm‑based Olympus cores, up to 1.2 TB/s memory bandwidth, and up to 1.8 TB/s coherent bandwidth to Rubin GPUs over NVLink‑C2C; Nvidia cites up to 50% faster “sandbox” performance than traditional CPUs. Racks can integrate up to 256 liquid‑cooled Vera CPUs to support 22,500+ concurrent CPU environments, promising steadier latency in agent workflows. 8 2
ChatGPT switches default to GPT‑5.5 Instant
OpenAI made GPT‑5.5 Instant the default ChatGPT model, emphasizing quicker answers with fewer sensitive‑topic mistakes and surfacing “memory sources” so users can view and edit the context powering replies. Reports cite higher scores versus GPT‑5.3 Instant (e.g., AIME 2025: 81.2 vs. 65.4) and latency on par with GPT‑5.4. In the API, GPT‑5.5 appears as “chat‑latest,” while paid users can still select GPT‑5.3 for three months to compare behavior. 9 10
Google launches Gemini Enterprise Agent Platform for workplace agents
Google Cloud introduced the Gemini Enterprise Agent Platform (evolving Vertex AI) to build, scale, govern, and monitor AI agents for employees. It bundles creation tools (Agent Studio, an upgraded Agent Development Kit), long‑running Agent Runtime with Memory Bank, identity/registry/gateway for governance, and quality controls via simulation, evaluation, and observability, with access to 200+ models in Model Garden. The message: standardized tooling so IT can manage while business users build. 11
Gemini File Search adds multimodal retrieval with page‑level citations
Google expanded the Gemini API’s File Search so apps can retrieve across text and images with custom metadata filters and return page‑level citations — a direct boost to Retrieval‑Augmented Generation (RAG) workflows that need verifiable sources. The feature is powered by Gemini Embedding 2, and developers have reported measured gains (e.g., +40% Recall@1 in one use case and 60%→~87% Match@20 in another), with support for mixed inputs per request. For teams, this means answers that point to the exact page and screenshot. 12 13
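The page‑level‑citation pattern itself is independent of any particular API: retrieve passages that carry document and page metadata, then attach that metadata to the answer. A minimal toy sketch (keyword overlap stands in for an embedding model; all names and the sample corpus are illustrative, not the Gemini File Search API):

```python
from dataclasses import dataclass

@dataclass
class Passage:
    doc: str   # source document name
    page: int  # page the passage came from
    text: str

def retrieve(query: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    # Toy retriever: rank passages by overlap between query terms and passage terms.
    terms = set(query.lower().split())
    return sorted(
        passages,
        key=lambda p: len(terms & set(p.text.lower().split())),
        reverse=True,
    )[:k]

def answer_with_citations(query: str, passages: list[Passage]) -> str:
    # Attach the document name and page number of every retrieved passage.
    hits = retrieve(query, passages)
    cites = "; ".join(f"{p.doc}, p. {p.page}" for p in hits)
    return f"{hits[0].text} [{cites}]"

corpus = [
    Passage("handbook.pdf", 3, "Expense reports are due by the fifth business day."),
    Passage("handbook.pdf", 9, "Remote work requires manager approval."),
]
print(answer_with_citations("When are expense reports due?", corpus))
```

The point is the shape of the output, not the retrieval quality: because metadata travels with each passage, every answer can name the exact page a reviewer should check.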
Microsoft open‑sources an Agent Governance Toolkit
Microsoft released an Agent Governance Toolkit with policy enforcement, zero‑trust identity, execution sandboxing, and reliability engineering patterns mapped to the OWASP Agentic Top 10 risks. The active repo (v3.4.0 on May 5) includes documentation and a quick start, plus recent fixes to reduce false‑positive contributor‑risk flags. For teams piloting tool‑calling agents, it’s a packaged baseline for guardrails. 14
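The policy‑gate‑plus‑sandbox idea can be sketched generically: a gate checks every tool call against policy before execution, and only callables in a registry can run at all. This is an illustrative pattern under assumed names, not the toolkit’s actual API:

```python
# Illustrative guardrail pattern (names are hypothetical, not the
# Microsoft toolkit's API): an allowlist decides whether an agent's
# tool call may run; everything else is denied before execution.
ALLOWED_TOOLS = {"search_docs", "read_file"}

def policy_gate(tool: str) -> None:
    # Policy enforcement point: raise before any side effect occurs.
    if tool not in ALLOWED_TOOLS:
        raise PermissionError(f"policy denied tool call: {tool}")

def run_tool(tool: str, args: dict, registry: dict) -> str:
    policy_gate(tool)                # 1. check policy first
    return registry[tool](**args)    # 2. only registered callables execute

registry = {"search_docs": lambda query: f"results for {query!r}"}
print(run_tool("search_docs", {"query": "agent guardrails"}, registry))
```

A real deployment would add identity checks and process‑level sandboxing, but the ordering is the essential part: the gate sees the call before the tool does.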
Anthropic secures SpaceX Colossus compute, raises usage limits
Anthropic announced access to the full Colossus 1 data center — over 300 MW and more than 220,000 Nvidia GPUs — alongside higher rate limits for Claude Pro/Max and API tiers. The company framed it as part of a multi‑gigawatt capacity build with commitments across Amazon, Google/Broadcom, Azure, and others. More headroom translates to longer coding and analysis sessions without reworking workflows. 5
Nvidia releases Nemotron 3 Super (and Nano Omni) for agentic AI
Nemotron 3 Super is an open‑weight model (120B parameters, 12B active) tuned for multi‑agent planning and tool use, with a 1‑million‑token context window to keep long tasks on track; Nvidia cites up to 5× higher throughput versus the prior Nemotron Super. A companion open multimodal model, Nemotron 3 Nano Omni, integrates vision and audio encoders to reduce hand‑off latency and claims up to 9× higher throughput than other open “omni” models at similar interactivity. Open weights and NIM packaging broaden deployment choices from local to cloud. 7 15
Apple reportedly tests model choice in iOS 27
Apple is said to be adding an “Extensions” setting that lets users pick which third‑party AI model powers features like Siri and Writing Tools on iOS 27 (and reportedly on iPadOS/macOS 27). Early tests include integrations from Google and Anthropic, signaling a user‑centric, multi‑model experience inside native apps. That could let people match specific tasks to model strengths without app‑switching. 16 17
Trend Analysis
Agent software hardened this week: platforms and guardrails shipped alongside verifiable retrieval. Google’s Gemini Enterprise Agent Platform formalizes build‑to‑governance workflows; Microsoft’s Agent Governance Toolkit wraps policy, identity, and sandboxes; Google’s File Search adds page‑level citations, moving RAG (Retrieval‑Augmented Generation) from “answers” to audit‑ready evidence. For teams, that’s a cleaner path from pilot to production with traceable outputs. 11 14 12
Infrastructure also tilted toward agent bottlenecks and long horizons. Nvidia’s Vera CPU targets the CPU‑bound steps that stall agents between LLM calls; OpenAI’s 2 GW AWS deal and Bedrock‑based stateful runtime point to persistent state and cloud alignment; Nvidia’s Nemotron 3 Super extends context to 1,000,000 tokens so agents can keep full task state without re‑prompt churn. The through‑line is shorter waits and fewer fragile hand‑offs. 8 3 7
Capital and capacity concentrated further around enterprise deployment. OpenAI’s $122B raise underscores a push to turn research into durable products; Anthropic’s access to 220,000 GPUs at SpaceX Colossus and broader multi‑GW commitments expand immediate headroom; pairing with AWS Bedrock suggests smoother procurement and governance for agent stacks in existing enterprise clouds. The signal: more AI will arrive inside systems companies already use. 1 5 3
User experience kept pace: ChatGPT moved to GPT‑5.5 Instant with editable “memory sources,” Google’s file‑grounded answers now cite exact pages, and Apple is reportedly testing per‑feature model choice. Together, these shifts imply assistants that are faster, more transparent, and easier to fit to task and tone inside everyday apps. 9 12 16
Watch Points
- “Stateful Runtime Environment” — If you see previews or docs, that’s OpenAI and AWS bringing persistent agent state into Bedrock, a likely on‑ramp for enterprise pilots. 3
- “iOS 27 Extensions” — Apple’s reported per‑feature model picker would normalize multi‑model use inside default apps; watch which providers get listed. 16
- “Dec 2, 2027 (EU high‑risk AI)” — A provisional EU deal delays some high‑risk obligations and adds an intimate‑deepfake ban taking effect Dec 2, 2027; expect compliance playbooks to adjust. 18
Open Source Spotlight
- Agent Governance Toolkit — Policy gates, zero‑trust identity, sandboxing, and reliability patterns for autonomous agents; a practical baseline for governance pilots. microsoft/agent-governance-toolkit
- Solace Agent Mesh — Event‑driven orchestration for multi‑agent systems that must react to real‑world messages and coordinate across business apps; good for integration‑heavy teams. SolaceLabs/solace-agent-mesh
- Ollama — One‑stop local runner for open‑weight models with installers, Docker image, and official Python/JS clients; handy for quick local prototypes. ollama/ollama
- Pi (agent harness) — Unified LLM API, coding‑agent CLI, TUI/web UIs, Slack bot, and vLLM deployment scaffolding; useful to scaffold coding agents across surfaces. earendil-works/pi
- ColorBench — Vision‑language benchmark focused on color perception and robustness; helpful for testing where visual illusions trip models. tianyi-lab/ColorBench
What Can I Try?
- Require cited answers: upload a long PDF and relevant screenshots to Gemini File Search, then ask 5 work questions and verify page‑level citations and visuals. 12
- Add a guardrail: run Microsoft’s Agent Governance Toolkit quick start, create one policy gate, and sandbox a tool call in a test workflow. 14
- Re‑run a task in ChatGPT: use GPT‑5.5 Instant for a recent brief or spreadsheet and audit “memory sources,” correcting anything stale. 9
- Draft an AWS pilot: write a one‑pager on a “Stateful Runtime” use case (inputs, permissions, success metric) and ask your AWS account owner about timing and access. 3
- Replace polling: set up Gemini API event‑driven webhooks for a long‑running job and verify signed deliveries with automatic retries. 19
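Verifying signed webhook deliveries follows a standard HMAC pattern regardless of provider. A minimal sketch, assuming an HMAC‑SHA256 hex signature computed over the raw request body (the header name and secret format vary by provider and are not taken from Gemini documentation):

```python
import hashlib
import hmac

def verify_signature(secret: bytes, body: bytes, signature_hex: str) -> bool:
    # Recompute the HMAC-SHA256 of the raw body and compare in constant
    # time to avoid timing side channels.
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)

secret = b"whsec_example"  # shared secret from your webhook config (illustrative)
body = b'{"job": "123", "status": "done"}'

# What a sender would attach to the delivery:
sig = hmac.new(secret, body, hashlib.sha256).hexdigest()

assert verify_signature(secret, body, sig)                    # genuine delivery
assert not verify_signature(secret, b'{"tampered": 1}', sig)  # modified body fails
```

Verify against the raw bytes as received, before any JSON parsing or re‑serialization, since whitespace changes alter the digest; retried deliveries should carry a fresh signature over the same body.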