Vol.01 · No.10 Daily Dispatch May 11, 2026

Latest AI News


AI video gets director-level camera control without training

ActCam lets creators steer both motion and camera in generated footage. Meanwhile, a shared expert pool makes mixture-of-experts (MoE) models more efficient, and Hermes Agent climbs to #1 by usage.

One-Line Summary

Control and efficiency take center stage: zero-shot video gets precise camera steering, a shared expert pool trims model waste, transformers close in on step-by-step reasoning, and an open-source agent surges in real-world use.

Research Papers

ActCam lets you steer both camera and actor in generated video

ActCam is a method that lets you generate a new video where the character’s motion comes from a “driving” clip and the camera follows a path you set — without retraining the model. It builds on pretrained image-to-video diffusion models that accept conditioning from scene depth and character pose, giving creators cinematography-like control rather than prompt trial-and-error. 1

The method produces pose and depth conditions that stay geometrically consistent across frames, then runs a single sampling process with a two-phase schedule: early steps condition on pose plus sparse depth to lock down scene structure, later steps drop depth and keep pose-only guidance to refine high-frequency details without over-constraining the output. It also enables per-frame control of intrinsic (e.g., focal length) and extrinsic (position/orientation) camera parameters. 1
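The two-phase schedule can be sketched as a conditioning switch inside an ordinary diffusion sampling loop. This is a toy illustration of the scheduling idea only: `model.denoise_step`, the latent shape, and the 40% switch point are all assumptions, not ActCam's actual interface or hyperparameters.

```python
import random

def two_phase_sample(model, pose_seq, depth_seq, num_steps=50, switch_frac=0.4):
    """Illustrative two-phase conditioning schedule (hypothetical interface).

    Early denoising steps condition on pose plus sparse depth to lock down
    scene structure; later steps drop depth and keep pose-only guidance.
    Returns the final latent and the per-step conditioning log.
    """
    latent = [random.gauss(0.0, 1.0) for _ in range(8)]  # toy latent vector
    switch_step = int(num_steps * switch_frac)
    schedule = []
    for t in reversed(range(num_steps)):
        if t >= switch_step:
            cond = {"pose": pose_seq, "depth": depth_seq}  # phase 1: structure
        else:
            cond = {"pose": pose_seq, "depth": None}       # phase 2: detail
        schedule.append("pose+depth" if cond["depth"] is not None else "pose-only")
        latent = model.denoise_step(latent, t, cond)
    return latent, schedule
```

Because diffusion samplers run from high noise to low noise, the early (high-`t`) steps set global geometry, which is why the depth condition is applied there and released later.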

Across benchmarks with diverse motions and challenging viewpoint changes, ActCam improves camera adherence and motion fidelity over pose-only control and other pose+camera methods, and is preferred in human evaluations, especially under large viewpoint shifts — pointing to more reliable, director-level control in practice. 1

UniPool decouples MoE experts from layers to cut waste

UniPool rethinks Mixture-of-Experts (MoE) transformers by replacing per-layer expert sets with a single shared pool accessed by independent routers in each layer. This treats expert capacity as a global budget, decoupling depth from linear expert growth and letting capacity flow to where it is most useful. 2

To keep training stable and balanced, UniPool adds a pool-level auxiliary loss that evens out expert utilization and adopts NormRouter for sparse, scale-stable routing. The authors also find redundancy in current allocations: replacing deeper layers’ learned top-k routers with uniform random routing reduces downstream accuracy by only 1.0–1.6 points in several production MoE models. 2
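A minimal sketch of the pooled-expert idea, based on my reading of the description above rather than the authors' code: one global expert pool, an independent router per layer doing top-k selection over the whole pool, and a pool-level balance term computed over global utilization. The scalar "experts" and variance-style loss are simplifications for illustration.

```python
import math
import random

class UniPoolMoE:
    """Toy shared-expert-pool MoE (illustrative, not the UniPool implementation).

    Every layer routes over the same global pool of experts, so expert
    capacity is a single budget shared across depth.
    """

    def __init__(self, num_layers, pool_size, top_k=2):
        self.top_k = top_k
        self.pool_size = pool_size
        # One shared pool: each expert is a toy scalar weight here.
        self.experts = [random.gauss(0, 0.02) for _ in range(pool_size)]
        # Independent router per layer, scoring every pooled expert.
        self.routers = [[random.gauss(0, 0.02) for _ in range(pool_size)]
                        for _ in range(num_layers)]
        self.usage = [0] * pool_size  # pool-level utilization counts

    def forward(self, x, layer):
        scores = [r * x for r in self.routers[layer]]
        top = sorted(range(self.pool_size),
                     key=lambda i: scores[i], reverse=True)[: self.top_k]
        exps = [math.exp(scores[i]) for i in top]  # softmax over selected experts
        z = sum(exps)
        out = 0.0
        for i, e in zip(top, exps):
            self.usage[i] += 1
            out += (e / z) * self.experts[i] * x
        return out

    def balance_loss(self):
        # Pool-level auxiliary term: penalize uneven utilization across the
        # WHOLE pool, not per layer (a stand-in for the paper's loss).
        mean = sum(self.usage) / self.pool_size
        return sum((u - mean) ** 2 for u in self.usage) / self.pool_size
```

The structural point is that `self.experts` has no layer index: making the pool smaller or larger changes total expert capacity without touching depth, which is what lets pool size act as a separate scaling knob.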

Across five LLaMA-architecture scales — 182M, 469M, 650M, 830M, and 978M parameters — trained on 30B tokens from The Pile, UniPool lowers validation loss by up to 0.0386 versus matched vanilla MoE baselines. Reduced-pool variants using only 41.6%–66.7% of the vanilla expert-parameter budget match or outperform layer-wise MoE, making pool size an explicit depth-scaling hyperparameter. 2

How transformers scale on implicit logical reasoning

This study tests whether transformers can do deductive reasoning internally — without writing out steps — on tasks built from Horn clauses, a basic kind of logical rule. The goal is to separate true reasoning from shortcuts and see how scale and training setup affect it. 3

When provability is decorrelated from spurious cues in the training data and the architecture is aligned with the underlying algorithm, sufficiently deep models with a bidirectional prefix mask approach the performance of explicit Chain-of-Thought (CoT) prompting across graph topologies and problem widths. 3
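A bidirectional prefix mask is a standard construction, and a sketch clarifies what it changes: tokens in the prefix (here, the Horn-clause facts and rules) attend to each other in both directions, while everything after the prefix attends causally. The paper's exact masking details may differ; this is the generic pattern.

```python
def prefix_mask(seq_len, prefix_len):
    """Build a bidirectional-prefix attention mask.

    mask[i][j] is True where position i may attend to position j:
    full bidirectional attention inside the prefix, causal elsewhere.
    """
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            if i < prefix_len and j < prefix_len:
                mask[i][j] = True   # prefix tokens see the whole prefix
            elif j <= i:
                mask[i][j] = True   # later tokens attend causally
    return mask
```

The intuition for the result above: letting the model read the entire problem statement bidirectionally removes an ordering handicap, so depth alone can carry more of the deduction.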

However, CoT remains necessary for depth extrapolation — when test problems require longer reasoning chains than seen during training. The result draws a line: scaling and better setup can close the gap for in-distribution logic, but step-by-step prompting still matters for harder generalization. 3

An AI workbench that collaborates like a mathematician

AI co-mathematician is an interactive workbench where agentic AI supports mathematicians through ideation, literature search, computation, proof attempts, and theory building — in an asynchronous, stateful workspace that mirrors human collaboration. 4

It manages uncertainty, clarifies intent, tracks failed hypotheses, and outputs native mathematical artifacts. In early tests, it helps researchers solve open problems, surface new directions, and find overlooked references. 4

On benchmarks, it reaches 48% on FrontierMath Tier 4, which the authors present as a new high among evaluated AI systems — suggesting strong problem-solving while keeping the workflow highly interactive. 4

Open Source & Repos

Hermes Agent surges to #1 on OpenRouter and ships v0.13.0

Hermes Agent is an open-source, self-improving agent from Nous Research that executes tasks, reflects on performance, and writes reusable skill files so it improves the longer you run it; the project uses the MIT license. 5

Marktechpost reports that as of May 10, 2026, Hermes leads OpenRouter’s daily app and agent rankings with 224 billion daily tokens versus OpenClaw’s 186 billion. The latest release, v0.13.0 on May 7, adds a Kanban multi-agent task board with heartbeat monitoring, a /goal command to stay locked on a target, Checkpoints v2 with state pruning, gateway auto-resume after restarts, and Google Chat support. 6

The analysis also outlines a low-friction migration path (hermes claw migrate) and frames a broader split in open-source agents: breadth of channels (OpenClaw) versus depth of learning (Hermes). Some teams run them together — orchestrating with one and executing repeatable loops with the other. 6

Why It Matters

Today’s papers and projects point to more controllable creative tools and more efficient scaling: ActCam moves video generators closer to camera-crew control, UniPool shows how to spend expert parameters where they count, transformer results calibrate when to ask for step-by-step reasoning, and Hermes’ uptake signals appetite for agents that learn from use. 1

This Week, Try

  1. Hermes Agent quickstart: Clone the repo and follow the README to run local tasks. https://github.com/NousResearch/hermes-agent
  2. ActCam paper walkthrough: Read the arXiv and compare the two-phase conditioning design to pose-only control. https://arxiv.org/abs/2605.06667v1
