Anthropic debuts hosted Claude agents, aiming to make AI actually ship
Anthropic’s new Managed Agents promise long-running, tool-using AI you don’t have to babysit. Here’s what changed, why enterprises care, and how you can try it this week.
One-Line Summary
Anthropic opens Claude Managed Agents to help teams deploy hosted AI agents.
Big Tech
OpenAI acquires Promptfoo to secure its AI agents
OpenAI buys Promptfoo, an AI security startup founded in 2024 whose tools probe large models for attacks and misbehavior. OpenAI says Promptfoo’s tech will plug into OpenAI Frontier to automate red teaming, evaluate agent workflows for security issues, and monitor activity for risk and compliance. Promptfoo says its tools are used by more than 25% of Fortune 500 companies, and OpenAI plans to keep building the open source offering. 1
The deal speaks to a practical gap: as agent features roll into business workflows, buyers want evidence that automated actions won’t leak data or be manipulated. OpenAI’s move mirrors a broader industry shift from model demos to governed, monitorable automation that enterprises can approve. No price is disclosed, but the integration target is OpenAI’s enterprise agent platform. 1
For teams choosing tools, this reinforces that “agent safety” is becoming table stakes: expect more built-in guardrails, audit logs, and policy controls across major platforms this year. If you’re evaluating vendor roadmaps, look for automated red teaming and trace-based evaluations, not just better prompts. 1
Industry & Biz
Cohere–Aleph Alpha merger talks: Europe–Canada tie-up on the table
Reuters reports that Canada’s Cohere and Germany’s Aleph Alpha are in merger talks, with Berlin supportive and willing to be a key customer of a combined company to advance digital public services. Discussions reportedly began early this year and have reached an advanced stage, with a plan to headquarter in both countries. Aleph Alpha declined to comment in detail, calling strategic partnership talks routine; Cohere said it continually evaluates strategic opportunities across Europe. 2
If completed, the tie-up could create a transatlantic alternative to US big tech AI stacks, with potential government anchor demand in Germany. For buyers, that could mean another “sovereign-friendly” option for regulated sectors and public services that need regional assurances. The timeline and structure remain unconfirmed. 3
Context: infrastructure alignment is accelerating. Anthropic, for example, signs a multi-year agreement to run Claude workloads on CoreWeave data centers, part of a broader trend of model providers locking in specialized AI cloud capacity. For customers, these partnerships can impact availability, latency, and regional deployment options. 4
U.S. National AI Policy Framework: preemption push and child safety focus
The White House releases a National Policy Framework for AI on March 20, 2026, calling for a single federal standard that preempts certain state AI laws while preserving state authority over child protection, consumer protection, zoning for AI infrastructure, and states’ own AI procurement. It emphasizes regulatory sandboxes, using existing agencies (no new AI regulator), workforce training, and a hands-off stance on copyright training data pending court outcomes. 5
Practically, companies should expect continued fragmentation in the short term: the Framework signals direction but needs Congress to act. Meanwhile, an earlier Executive Order sets up a federal AI Litigation Task Force to challenge state laws seen as obstructive and explores conditioning some federal funds on states’ AI policy posture. In short, expect rising scrutiny of disclosure mandates that conflict with federal positions. 6
Legal analysts note clear priorities: protect minors (age assurance, parental tools), avoid a new federal AI regulator, and support innovation via sandboxes and data access. For teams operating across states, keep consumer protection compliance front and center—those authorities are explicitly not preempted in the Framework proposals. 7
New Tools
Anthropic: Claude Managed Agents public beta
Anthropic launches Claude Managed Agents as a public beta on the Claude Platform: a suite of APIs for building and deploying cloud-hosted agents where Anthropic operates the infrastructure. It provides sandboxed code execution, state checkpointing, credential management, auth, and end-to-end tracing, plus long-running sessions that can continue for hours and recover from disconnects. Early users include Notion, Rakuten, Asana, Vibecode, and Sentry. Pricing adds $0.08 per active session-hour on top of standard Claude token rates. 8
Beyond basic tool use, Anthropic previews multi-agent coordination (agents launching sub-agents) and a self-evaluation loop that refines prompts toward success criteria. In internal tests on structured file generation, Managed Agents improve task success by up to 10 percentage points vs. a standard prompting loop, with larger gains on harder tasks. Developers can mix classic prompt–response with agent orchestration as needed. 8
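The self-evaluation loop described above can be sketched in a few lines: run the task, score the output against explicit success criteria, and fold the unmet criteria back into the prompt. This is an illustrative toy, not Anthropic’s implementation — `run_task`, the scoring rule, and the refinement step are all stand-ins.

```python
# Hypothetical sketch of a self-evaluation loop. None of these function
# names come from Anthropic's API; run_task stands in for a model call.

def run_task(prompt: str) -> str:
    """Stand-in for a model call; echoes required keywords it finds."""
    required = ["summary", "sources"]
    return " ".join(k for k in required if k in prompt)

def score(output: str, criteria: list[str]) -> float:
    """Fraction of success criteria satisfied by the output."""
    return sum(c in output for c in criteria) / len(criteria)

def refine(prompt: str, criteria: list[str], threshold: float = 1.0,
           max_rounds: int = 3) -> tuple[str, float]:
    best = score(run_task(prompt), criteria)
    for _ in range(max_rounds):
        if best >= threshold:
            break
        # Naive refinement: restate the unmet criteria in the prompt.
        missing = [c for c in criteria if c not in run_task(prompt)]
        prompt += " Be sure to include: " + ", ".join(missing) + "."
        best = score(run_task(prompt), criteria)
    return prompt, best

prompt, final = refine("Draft a report.", ["summary", "sources"])
print(final)  # 1.0 once both criteria appear in the prompt
```

The real systems presumably score with a model-based judge rather than keyword matching, but the loop structure (generate, evaluate, refine, stop at threshold) is the same.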
Why it matters: moving from prototypes to production usually stalls on plumbing—execution sandboxes, state, credentials, observability, and permissions. Managed Agents centralize that, shrinking the time and team size required to stand up practical automations like “read a repo and open a pull request” or “extract structured data across documents,” which early adopters are already shipping. 9
For non-engineering teams, the takeaway is simpler: you can delegate multi-step work to Claude with clearer guardrails and trace what happened in the console. Notion is testing Custom Agents inside workspaces, while Rakuten connects agents to Slack and Teams for spreadsheets, slides, and apps—hinting at familiar places you can pilot this without rebuilding your stack. 8
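Before piloting, it is worth a back-of-envelope cost check using the announced $0.08 per active session-hour. The token rate below is an assumption for illustration (actual rates depend on the model you choose), so treat this as a template, not a quote.

```python
# Rough pilot-cost estimate for a hosted agent session.
SESSION_RATE = 0.08   # USD per active session-hour (from the announcement)
TOKEN_RATE = 15.00    # USD per million output tokens (ASSUMED, model-dependent)

def pilot_cost(session_hours: float, output_tokens: int) -> float:
    """Session-hour charge plus token charge, in USD."""
    return session_hours * SESSION_RATE + output_tokens / 1e6 * TOKEN_RATE

# e.g. a 4-hour agent run that emits ~200k tokens
print(round(pilot_cost(4, 200_000), 2))  # 3.32
```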
Databricks acquires Quotient AI to strengthen agent evaluations
Databricks acquires Quotient AI, a team known for its quality-improvement work on GitHub Copilot, to bolster continuous evaluation and reinforcement learning for AI agents. Quotient analyzes full agent traces to detect issues like hallucinations, reasoning failures, and incorrect tool use, clustering them into datasets and reward signals used to monitor and fine-tune agents over time. 10
Databricks plans to embed Quotient across Genie (chat with your data), Genie Code (autonomous agent for data/ML workflows), and Agent Bricks (to build and scale agents on your data). The pitch: not just running agents in production, but improving them with every interaction via trace-driven signals. 10
For teams, this underscores a new baseline: if you deploy agents, you’ll need evaluations that reflect real-world traces, not just static benchmarks. Expect platform roadmaps (Databricks and others) to lean into trace capture, analytics, and feedback loops as core “quality ops” for agents. 10
Community Pulse
Hacker News (41↑) — Skepticism about Anthropic’s agent push centers on risk and overpromising.
"As usual it's a matter of degree. Opus is also not the worst at hacking things either. Sometimes it hacks things 'by accident' you see. If Mythos is better at it, then at some point, yeah, I can see how that might start to become a problem. Especially running unsupervised." — Hacker News 8
What This Means for You
- From “chatbot” to “teammate”: Hosted agents make it easier to hand off multi-step work (e.g., drafting slides from a brief, structuring spreadsheet outputs, kicking off a code change) without your team building execution sandboxes, state, or permissions from scratch. Think of it like hiring a temp worker that comes with time-tracking and activity logs built in. 8
- Governance is not optional: OpenAI’s Promptfoo deal signals that buyers now expect automated red teaming, traceability, and policy controls baked into agent platforms. If your boss asks “how do we know it didn’t touch PII?”, you’ll want vendor-native tracing and permission scopes ready to show. 1
- Strategy hedge: With Cohere–Aleph Alpha talks and model–infrastructure pairings like Anthropic–CoreWeave, the landscape is consolidating. For you, that means better-integrated stacks—but also more lock-in. Pilot in tools that speak your existing workplace surfaces (Slack, Teams, Notion) so you can switch backends later if needed. 2
- Policy watch: The U.S. Framework aims to preempt some state rules while preserving consumer protection and child-safety enforcement. If you operate across states, keep your disclosures, age assurance, and content safeguards current; federal preemption may narrow scope later, but state AGs remain active now. 5
Action Items
- Spin up a Claude Managed Agent pilot: In the Claude Console, define a simple multi-step task (e.g., summarize a PDF set to a spreadsheet) and review the built-in session tracing to see if it fits your team’s workflow.
- Test secure tool scopes: Create a low-risk credential (read-only drive or repo) and attach it to a Managed Agent task to practice setting permissions and reviewing access logs.
- Kick the tires on agent evaluations: Read Databricks’ Quotient AI post and map which trace signals (hallucination, tool errors) you’d want if your agent shipped—use it to draft your internal “go/no-go” checklist.
- Add an AI security feed to your stack: Bookmark a legal brief on the U.S. AI Framework and set up an AI security tracker like AI Sec Watch to stay ahead of policy and vulnerability updates.
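If you draft the go/no-go checklist suggested above, it can be as mechanical as comparing trace metrics against thresholds before promoting an agent. The metric names and threshold values here are placeholders to adapt to your own risk tolerance, not figures from any vendor.

```python
# Draft go/no-go gate for promoting an agent: every tracked trace metric
# must stay at or under its threshold. All values are placeholders.
THRESHOLDS = {
    "tool_error_rate": 0.05,     # max share of failed tool calls
    "hallucination_rate": 0.01,  # max share of ungrounded claims
}

def go_no_go(metrics: dict[str, float]) -> bool:
    """Fail closed: a missing metric counts as a worst-case 1.0."""
    return all(metrics.get(k, 1.0) <= limit for k, limit in THRESHOLDS.items())

print(go_no_go({"tool_error_rate": 0.02, "hallucination_rate": 0.0}))  # True
print(go_no_go({"tool_error_rate": 0.12, "hallucination_rate": 0.0}))  # False
```

Failing closed on missing metrics is deliberate: an agent whose traces don’t report a metric at all shouldn’t pass the gate by default.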