Microsoft to unveil its own coding AI to boost Copilot
Reuters says Microsoft is preparing a homegrown coding model and other specialized AI for Build, as Asana buys StackAI and Groq lines up $650M for inference.
Researchers show a recurring photo-perspective shortcut across model families and release SpatialTunnel to separate true 3D reasoning from image-position cues.
Agents jumped from chat to action: Google made Gemini a built‑in helper, Nvidia shipped a CPU for agent orchestration, OpenAI opened a money view in ChatGPT, and a $5B TPU venture took shape — all pointing to faster, cheaper assistants in your daily tools.
The funding pushes Anthropic past OpenAI in value, while the new model adds “effort control” and faster, cheaper responses; OpenAI, meanwhile, outlines election safeguards.
The upgrade focuses on practical control: a faster, cheaper fast mode, effort controls for cost/quality trade-offs, and parallel subagents for big code tasks, with testers reporting more ‘honest’ outputs.
The Devin maker cites fast enterprise uptake and a $492M run-rate as investors pile in, while YouTube begins auto-labeling photorealistic AI videos to boost transparency.
A single “native multimodal” embedding reports strong retrieval scores across major image, video, and text benchmarks, pointing to simpler pipelines for search, recommendations, and retrieval-augmented generation.
OpenAI’s CEO revises his early fears about job losses and stresses the ‘human part’ of work, while investors pour $113M into OpenRouter and momentum stocks ride the AI trade.
MotiMotion introduces a “reason-then-generate” approach to motion control and a new benchmark. Three agent-training papers target reliability from rewards to terminal feedback, and LocalAI ships a no‑GPU engine under the MIT License.
Three papers propose attractor-based reasoning, a Shannon scaling law, and staged vision training—pointing to better accuracy by tuning compute and reducing noise. Here’s what it means for budgets, prompts, and vendor evaluations.
New research frames inference as converging to learned ‘attractors,’ treats model training as a noisy channel with capacity limits, shows vision-language models learn more by separating seeing from thinking, and turns language-driven virtual photography into an executable 3D agent task.
Google is remaking Search around conversational agents and says Gemini 3.5 Flash powers AI Mode, while 10‑second AI video creation appears in its apps — as pricing and security pressures reshape how teams adopt AI.
A convex-optimization tokenizer replaces greedy rules with a global objective, improving bits-per-byte for language models and certifying how close the vocabulary is to optimal. Plus: live music diffusion on consumer laptops, AI’s forecasting limits, promptable 3D animals, and an incremental engine for always‑fresh agent context.
The Financial Times frames planned listings by SpaceX, OpenAI, and Anthropic as a reality check on AI demand, while Corsair introduces Grace Blackwell–based workstations and servers for private AI deployments.
LoREnc suppresses recoverable low‑rank signals so stolen weights or unauthorized adapters fail, while authorized adapters restore full quality with under 1% overhead. Also in focus: self‑regulated planning that saves tokens, safer shared caches for multi‑agent systems, and a study showing chatbots’ reliance on retrieval.
AutoRubric-T2I teaches a vision‑language judge to grade images with learned checklists, outperforming prior reward models while using under 0.01% of human preference data. New papers also push execution‑grounded coding agents and steadier long‑context attention.
Exa raises $250M at a $2.5B valuation, Parallel adds $100M — and Alibaba counters with a 3x-faster AI chip.
The new diffusion-based system adds variable-length generation and targeted inpainting, trained on licensed and Creative Commons data, and runs on consumer hardware. The team reports under-2-second outputs on an H200 and a few seconds on a MacBook Pro M4.
The AMD-aligned lab trains and serves open-weight models on AMD hardware. OpenAI, meanwhile, courts YC startups with $2M in tokens as Figma and IrisGo bring agents into everyday work.
A new paper argues the popular Rotary Positional Embedding loses its locality and token-order cues as context grows, while three studies push practical gains in efficient diffusion-MoE inference, VLM training, and clinical agents.
At I/O 2026, Google debuts Gemini 3.5 Flash and the new Omni family, says it now processes 3.2 quadrillion tokens a month, and pairs the push with a $100 AI Ultra plan and a $5B TPU JV with Blackstone.
Google’s new model emphasizes doing multi‑step work, not just chatting — it becomes the default in the Gemini app and powers a 24/7 agent, while open‑source tools focus on cleaner inputs and deployment patterns.
The first Vera systems leave Nvidia’s labs with 88 custom cores to handle agent workloads, while Apple lines up AI writing tools for iOS 27 and marketers chase data collaboration with Publicis–LiveRamp.
A new study shows passive JavaScript tracking can fingerprint browsing agents by their on-page actions. Alongside, a geospatial audit urges shared tests and weights, and LangChain ships a testing update.
Databricks reports a 46% error reduction on its OfficeQA Pro agent benchmark with GPT‑5.5, and ChatGPT opens a U.S. Pro preview that links to 12,000+ institutions. The rollout into real workflows lands as TechCrunch spotlights jurors debating trust in OpenAI’s leadership.
A few-shot reinforcement approach matches full-data baselines with just 128 examples, while a cleaned omni-modal benchmark clarifies real gains — and a macOS app packages local AI agents for everyday use.
AI left the chat window. OpenAI stood up a $4B deployment unit and a security program, Microsoft’s agentic system found 16 Windows flaws, Meta added encrypted no‑log chats, and Claude moved into SMB tools — with one concrete action you can try now.
Nectar Social raises $30M to scale autonomous marketing agents, while Colorado rewrites its AI law to focus on how tools influence decisions—reshaping what legal and HR must document.
A new image-generation paper reports consistent ImageNet-256 gains by keeping training steps on spherical paths — no architecture changes. Two more studies push single-image 3D from satellites, stress-test long video consistency, and lift Gemini 3.1 Pro’s coding Elo by 405 with a pairwise “tournament.”
The preview lets developers review outputs and approve changes from iOS and Android after linking to a Mac running Codex. OpenAI also spotlights enterprise uptake, with Sea reporting 87% weekly active usage among Codex users.
Lighthouse Attention compresses sequences around standard attention during pretraining, then removes itself after a short recovery phase. New papers also stress-test table understanding, speed up Mixture-of-Experts routing, and replay real news to grade adaptive agents, while a Kubernetes inference stack ships a breaking upgrade.
The new bundle ships with 15 ready-made workflows and deep connectors, with approvals built in. Training and a U.S. roadshow aim to help owners move beyond simple chat use.
FlowCompile builds a reusable set of accuracy–latency plans for structured agent pipelines before they run, reporting up to 6.4x speedups. Companion papers focus on shorter reasoning, communication-light MoE inference, and a full-stack voice agent benchmark.
New “instant AI workforce” Hirebase arrives in closed beta to run agents across Google Docs, Slack, and Notion — alongside Meta’s encrypted AI chat, Microsoft’s agentic security system, and Alphabet’s $2.1B bet on AI drug design.
DeepMind demos a pointer that understands what you select and why, turning pixels into actions like “compare these” or “get directions.” Microsoft details an agent system that uncovered 16 Windows vulnerabilities, while new repos sharpen agent workflows for everyday builders.
Daybreak combines GPT-5.5-Cyber and Codex Security to help organizations find and validate software flaws before attackers do. Google also outlines Gemini-powered automations coming to Android.
A new framework forecasts how much models learn from mixed, repeated datasets—reporting 0.15% mean and 0.96% max loss error up to 7B parameters and 425B tokens—so teams can pick data recipes with confidence.
The new, majority-controlled unit starts with Tomoro’s 150 engineers and 19 investment partners—arriving as Google flags AI-driven hacking and a U.S. group pushes safety screens for federal AI deals.
ROPD turns teacher responses into prompt-specific checklists to score student rollouts, beating logit-based on-policy distillation in most tests. New work on model selection, agent skills, and test-time scaling also targets lower-cost, safer AI deployment.
Nous Research’s Hermes Agent ships the “Tenacity” release and now leads OpenRouter by daily tokens. Lemonade’s latest update adds vLLM ROCm to speed up on-device large language models.
ActCam lets creators steer both motion and camera in generated footage. Meanwhile, a shared expert pool makes mixture-of-experts (MoE) models more efficient, and Hermes Agent climbs to #1 by usage.
Agent platforms and guardrails shipped, ChatGPT got faster by default, Nvidia targeted agent bottlenecks with a new CPU, and OpenAI locked in massive funding and AWS capacity — a week about taking agents from pilot to production.
CNBC reports Nvidia’s 2026 equity commitments top $40B, including $30B to OpenAI and up to $3.2B in Corning and $2.1B in IREN. Here’s what it means for capacity, pricing, and your roadmap.
A 1B‑active, 14B‑total expert model trained on 1 trillion tokens keeps near full‑model quality when loading just 25% (≈1% drop) or 12.5% (≈3% drop) of experts — a concrete path to lower memory use without giving up capability.
A fast-rising Chinese model maker is courting fresh capital while Brussels gives companies more time to implement high‑risk AI safeguards. Under courtroom scrutiny, OpenAI is emphasizing ‘trusted’ cyber access and controlled coding agents.
One‑line installers and a Docker image streamline local runs for Kimi‑K2.5, GLM‑5, MiniMax, DeepSeek, Qwen, and Gemma. New papers chart where AI‑written GPU kernels fail, organize audio‑plus‑vision learning, introduce a biomedical tool‑calling dataset, and prescribe training when good data is scarce.
The open‑weight model claims up to 5x higher throughput and a 1M‑token context window to keep multi‑agent workflows on track. Nvidia also adds a unified multimodal model as investors pour $2B into China’s Moonshot AI.
KinDER bundles 25 physics-grounded robot environments and a Gymnasium library to stress-test planning, while new benchmarks flag creativity and app-builder weaknesses — and a one-token confidence trick offers a cheaper hallucination filter.
OpenAI makes a faster, less error-prone model the ChatGPT default and adds visible memory sources. Meanwhile Apple tests multi-model choices for iOS 27, Anthropic secures SpaceX compute, and SAP buys a tabular-AI lab—signals that AI is moving from demos to deployment.
Two surveys codify how to design and govern the data flows behind RL-tuned reasoning models and evolving agent skills, while Google ships multi‑token prediction to speed Gemma 4 and developer webhooks for long jobs.
Google’s File Search now works across images and text and can cite the exact page it pulled from—pushing RAG toward audit-ready answers—while Microsoft ships an open-source toolkit to govern what agents can do.
HiL-Bench plants hidden blockers in coding and SQL tasks to test whether agents ask clarifying questions instead of guessing. Its Ask-F1 metric focuses on judgment, and early reinforcement learning results show this skill is trainable.
Appfigures says image model releases generate 6.5× more downloads than standard model updates — but only ChatGPT turned one surge into $70M in 28 days. At the same time, Anthropic and OpenAI are forming private‑equity joint ventures to push AI into mid‑market firms, while Morgan Stanley flags AI-driven flows into Hong Kong tech.
Odysseus trains a multimodal agent to make 100+ decisions in Super Mario Land and goes at least 3× farther than prior agents. Meanwhile, open models scale on tough exams and fresh benchmarks stress-test video lectures and visual honesty.
A massive raise gives OpenAI long-term compute across multiple clouds and chips. Paired with a new AWS tie-up and Nvidia’s agent-focused CPU, it signals how workplace AI will actually run.
A new paper proposes an event-driven cascade for computer-use agents: run a small policy by default and call a stronger model only when monitors flag stalls or semantic drift. Live workflow benchmarks and fresh visual datasets show why targeted compute and better evaluation matter.
Agentic AI went practical: GPT‑5.5 targets multi‑step work, OpenAI opened multi‑cloud and government channels, the Pentagon scaled Gemini to millions, and Nvidia unveiled a CPU for agent loops. Here’s what changed — and one hands‑on experiment to run.
The Defense Department inks agreements with Google, Nvidia, OpenAI and others to run AI tools on classified systems. Also today: Nvidia ships a multimodal model for faster agents, Meta buys a humanoid AI startup, and IBM rolls out an enterprise SDLC assistant.
Nvidia’s Nemotron 3 Nano Omni folds audio, vision, and text into one lightweight system. Also in focus: faster red‑teaming for long‑context attacks, evidence that fine‑tuning can shift safety, a consumer‑GPU training boost, and a self‑hosted personal agent.
Cohere moves to buy Germany’s Aleph Alpha at a $20B valuation as Cisco ships an open-source provenance tool and investors pour fresh capital into legal and clinical AI. For teams, this means more vendor choice, tighter compliance, and clearer paths to production.
Nemotron 3 Nano Omni unifies audio, vision, and language in a 30B‑A3B system with open weights. New papers highlight safety drift after fine‑tuning and cheaper, faster red‑teaming and training on consumer GPUs.
Microsoft, Alphabet, Amazon and Meta report as investors look for proof that massive AI capex is translating into cloud growth and profits—while ad platforms and customer support get fresh AI upgrades.
RADIO‑ViPE links natural‑language queries to 3D regions using only monocular video, while new research tightens multi‑turn agent reliability and compresses diffusion LLMs without losing quality.
U.S. agencies get a compliant path to GPT‑5.5 and AWS customers gain Bedrock access as lawmakers hear about cyber‑capable models, while China blocks Meta’s Manus deal and Citi lifts its 2030 AI market forecast to $4.2T.
GPT-5.5 matches GPT-5.4’s latency while posting higher scores on coding and computer-use benchmarks, and it’s rolling out to Plus, Pro, Business, and Enterprise in ChatGPT and Codex. API access is delayed pending additional safety work, as Nvidia touts a CPU built for agentic AI and investors back agent infrastructure and clinical AI.
Researchers tie fine-tuning–induced hallucinations to interference in a model’s existing knowledge and propose a self‑distillation recipe to steady outputs. Meanwhile, HyLo extends context up to 32× and Nvidia’s Nemotron 3 Nano Omni claims 9× higher multimodal throughput.
A new paper automates the prompts, tools, and evaluation loops that make agents work, while fresh RL techniques and a rigorous literature‑search benchmark expose what today’s systems still miss.
The reworked pact keeps Microsoft’s license through 2032 and removes AGI triggers, while OpenAI can take its models to AWS and Google Cloud. Enterprises get real choice as AI platform competition shifts to cloud marketplaces.
A process-aware reward model lifts data-analysis agents by 7.21% and 11.28% and delivers 78.73%/64.84% with reinforcement learning, while SketchVLM makes reasoning visible and promptfoo packages evals for teams.
Schwarz Group will invest €500 million as Cohere absorbs Germany’s Aleph Alpha to pitch a transatlantic alternative for regulated sectors. Also inside: Google Cloud’s agent platform in travel and AWS’s three-API agent harness.
Researchers introduce 3D-VCD, an inference-time check that contrasts a scene with a deliberately distorted version to suppress ungrounded tokens. Alongside, papers push adaptive diffusion training, RL that builds full websites, and a terminal-native coding agent you can run locally.
Big agent week: GPT‑5.5 tackles multi‑step work, DeepSeek slashes long‑context costs, and Google locks up billions in compute for Anthropic — with Gmail and Adobe bringing assistants into everyday workflows.
Cash meets compute: Google puts $10B in now with up to $30B more for Anthropic, as Amazon and OpenAI intensify the race. Plus, Cohere–Aleph Alpha and ComfyUI’s $500M valuation show where enterprises and creators are placing their bets.
Long-horizon “agentic” work is moving from demos to production: Zhipu’s GLM‑5.1 releases open weights with 8‑hour autonomous runs, while Moonshot’s Kimi K2.6 goes GA with 300‑agent swarms.
China’s DeepSeek ships two MoE model previews that it says approach frontier performance, while Cohere moves on a sovereign AI merger and Nvidia backs core data infrastructure.
DeepSeek V4 combines a 1.6T-parameter MoE with million‑token prompts and cut‑rate API pricing. New papers show how to make MoEs cheaper to serve, reshoot videos in 4D, and rein in vision‑language hallucinations.
Available to Plus, Pro, Business and Enterprise users, GPT-5.5 shows gains in coding, computer use, and knowledge-work benchmarks — and nudges OpenAI toward its “superapp” vision.
GPT-5.5 hits ChatGPT with GPT‑5.4‑level latency and stronger coding, browsing, and analysis skills. At the same time, Google’s Gemma 4 and Alibaba’s Qwen3.6‑27B push efficient open models, while new MoE research trims training compute.
Capital, channels, and speed converge: OpenAI is negotiating a PE-backed venture, deepening ties with Infosys, and cutting agent latency — while vertical tools in sales and support draw fresh funding and deals.
A new "micro model + cloud" handoff starts a reply locally and lets a larger model finish mid‑sentence, masking network lag. Alongside, fresh papers refine LoRA layer picking, stress‑test agent judges, and unify robot training from language to action.
ChatGPT Images 2.0 can now consult the web and create up to eight consistent visuals from one prompt. Meanwhile, policy heat and hiring wars remind teams to balance new power with compliance and capacity.
River-LLM uses a KV-sharing trick so decoder-only models can skip layers mid-generation without losing context, claiming real wall‑clock gains. Also in focus: a dataset cataloging 3,632 reward hacks in terminal agents and a healthcare model trained on 25B records across 7.2M patients.
Bloomberg reports Google is preparing new chips for inference after striking deals with Meta and Anthropic. At the same time, Adobe and Siemens push agentic AI into enterprise workflows, hinting at faster, cheaper automation ahead.
Analyzing 935 ablation experiments, researchers report a heavy‑tailed distribution of fitness effects in AI architecture tweaks—68% harmful, 19% neutral, 13% helpful—and logistic bursts of new ideas. The same issue also brings a new robotics benchmark and a practical fix for diffusion models’ sampling bias.
The wafer-scale chip maker reports $510M in 2025 revenue and moves ahead with a mid-May IPO plan, highlighting hyperscaler demand and pressure on Nvidia’s dominance.
RadAgent turns chest CT reading into a transparent, tool-using workflow and posts big gains in accuracy and robustness. Meanwhile, new agent papers and repos focus on navigable knowledge, coherent web UIs, and the 'harness' around models.
AWS becomes OpenAI’s go‑to distributor, Meta books 1+ gigawatt of custom AI chips, Anthropic upgrades Claude for tougher coding, and Chrome’s AI Mode goes split‑screen. The net: agents inch closer to doing real work, not just chat.
Claude Design builds a brand-aligned system from your files, then generates decks and app UIs you can hand off to Claude Code. Google, meanwhile, folds AI Mode deeper into shopping and travel tasks.
The new flagship boosts spreadsheet/presentation work, ships native computer-use for agents, and posts big gains on OSWorld and BrowseComp. Google counters with Gemma 4 under Apache 2.0 and a robotics model that reads analog gauges.
Chrome’s AI Mode now calls nearby stores to check stock and opens sites side-by-side with AI, while Adobe and OpenAI roll out assistants that complete multi-step creative and desktop tasks.
Researchers propose a three-phase residual stream that cuts perplexity by 7.2% at 123M parameters with just 1,536 extra weights and nearly 2x faster convergence. Alongside, new papers push RL fine-tuning and visual reasoning, while system optimizers squeeze 2–5x speed from kernels and compilers.
Factory, which builds autonomous coding agents, is in talks to raise a new round led by Khosla Ventures, with Keith Rabois set to join the board — squaring up against Anthropic, OpenAI, and Cursor.
Opus 4.7 is built to take on complex, hours-long tasks with fewer handholds, leading key coding benchmarks while adding stricter cybersecurity safeguards and higher‑resolution vision — all at the same price.
The social giant commits over one gigawatt of in-house MTIA accelerators and taps Broadcom’s design, packaging, and networking—while Broadcom CEO Hock Tan exits Meta’s board to advise on chips.
A new study shows desktop/web agents can cause serious harm even when users give innocuous instructions, while fresh training and architecture work races to make models faster and safer. We also track a major open-source agent release that brings enterprise-grade features to mobile and the browser.
A week after Anthropic’s Mythos preview, OpenAI unveils a more-permissive cyber variant to a restricted cohort and scales its Trusted Access program. At the same time, EU authorities say they are sidelined from testing Mythos, underscoring who controls frontier cyber AI.
A 120B-parameter hybrid Mamba-Transformer activates just 12B per token, serves 1M context, and claims up to 7.5x higher throughput than rivals — with weights and datasets on Hugging Face. It lands amid a broader MoE wave spanning text-to-image and open LLMs.
An internal memo from OpenAI’s new revenue chief touts Amazon as its enterprise channel and criticizes Microsoft’s constraints — while taking aim at Anthropic’s reported run-rate.
A first-of-its-kind survey maps how models get stuck attending to the wrong tokens — and what to do about it. Meanwhile, researchers ship a traceable agent debugger, a declarative agent workflow language, and a tougher quantum-code benchmark.
OpenAI plugs a pricing gap with a $100 ChatGPT Pro plan while Microsoft debuts in-house speech, voice, and image models. Japan’s SoftBank rallies industry for homegrown ‘physical AI,’ and U.S. regulators push ad giants on boycott conduct.
LG AI Research releases EXAONE 4.5 with native vision-language training and a 256K context window tuned for document-heavy use, while Nvidia’s Nemotron 3 Super targets agent workloads with a hybrid Mamba-Transformer MoE. Two vision papers push open-world 3D detection and parameter-efficient generation.
A busy week: Meta’s new model pushes its app into the Top 5, Anthropic limits access to a powerful bug‑finding AI, Microsoft ships three in‑house models, and Alibaba claims the top video generator — pointing to AI that’s more embedded, more gated, and more useful.
Anthropic withholds its new model amid cybersecurity risks while Oracle and Amazon channel billions into AI data centers and chips. Meanwhile, practical tools like Lucid’s Claude Connector and Upstage Studio bring agentic AI closer to daily workflows.
A new theory paper sets a floor on how few steps diffusion samplers can take, while fresh research tackles the open-loop vs. closed-loop gap in autonomous driving and makes coding agents harder to break. If you care about speed, today is about knowing the limits—and building around them.
A restricted-release security model, a Chinese video generator topping charts, and a record capital raise show how power is shifting in AI—who gets access, who sets the pace, and who pays for it.
A once-anonymous video generator, HappyHorse-1.0, is confirmed as Alibaba’s work after it raced to the top of global leaderboards. At the same time, a new paper and tools rethink how AI agents remember and manage state.
Meta’s first model under its Superintelligence Labs is here—and it’s already lifting downloads. At the same time, OpenAI and Anthropic shift to gated releases for their most cyber-capable models.
A mega-raise at OpenAI, Google’s open Gemma 4, Microsoft’s budget-friendly media models, and an AWS partnership point to AI that’s cheaper to run, easier to self-host, and closer to day‑to‑day work.
Money, policy, and engineering all moved this week: OpenAI’s $10B raise and a U.S. AI framework set the stage, Google’s KV‑cache compression points to cheaper inference, and an Anthropic leak spotlights cybersecurity stakes—plus a real-time, on‑device TTS to try.