Vol.01 · No.10 Daily Dispatch May 13, 2026

Latest AI News

AI · Papers · Daily Curation · Open Access
AI News · Research
6 min read

InfoLaw maps data quality and repetition to predictable training loss in large language models

A new framework forecasts how much models learn from mixed, repeated datasets—reporting 0.15% mean and 0.96% max loss error up to 7B parameters and 425B tokens—so teams can pick data recipes with confidence.

One-Line Summary

A data-aware scaling law shows how high-quality and repeated data translate into learning for Large Language Models (LLMs), while new work probes safety weak points and teaches agents to refine and remember.

Research Papers

InfoLaw proposes data-aware scaling laws for mixed, repeated datasets

InfoLaw is a training-planning tool: it predicts a model’s training loss from four inputs—consumed tokens, model size, data mixture weights, and repetition—so teams can choose data recipes instead of guessing. The authors show that heavily upweighting “high-quality” data can increase repetition and hurt performance in data-limited, overtrained regimes; InfoLaw corrects for this by treating pretraining as information accumulation. It reports a 0.15% mean and 0.96% max absolute error in loss when extrapolating to runs up to 7B parameters and 425B tokens. 1

The key idea is information density: better data packs more learnable information per token, while repetition has scale-dependent diminishing returns. Because standard scaling laws often fail across mixture “recipes” or at high repetition, InfoLaw adds explicit terms for quality weighting and repeats, and it generalizes across overtraining levels. In practice, this means data curation choices can be optimized jointly with compute. 1
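
InfoLaw's exact functional form isn't reproduced in this digest, so the sketch below is only illustrative: a Chinchilla-style loss where an "effective token" count discounts repeated epochs and scales each mixture source by an assumed information-density multiplier. The decay constant, quality multipliers, and recipe numbers are assumptions; the A, B, E, alpha, and beta values are the published Chinchilla fits, used only to make the example run.

```python
import math

def effective_tokens(mixture, repeat_decay=0.37):
    """Map raw token counts to "effective" tokens.

    mixture: list of (tokens_per_epoch, epochs, quality) triples, where
    quality is an assumed information-density multiplier for that source.
    Repeated epochs contribute with exponentially diminishing returns
    (the decay constant here is illustrative, not InfoLaw's fit).
    """
    d_eff = 0.0
    for tokens, epochs, quality in mixture:
        # Each extra pass over the same data adds less new information
        # than the one before it.
        fresh = sum(math.exp(-repeat_decay * k) for k in range(int(epochs)))
        d_eff += quality * tokens * fresh
    return d_eff

def predicted_loss(n_params, mixture, E=1.69, A=406.4, B=410.7,
                   alpha=0.34, beta=0.28):
    """Chinchilla-style loss L = E + A/N^alpha + B/D_eff^beta.

    The constants are the published Chinchilla fits, used only to make
    the sketch concrete; a data-aware law would refit them jointly with
    the quality and repetition terms.
    """
    d_eff = effective_tokens(mixture)
    return E + A / n_params**alpha + B / d_eff**beta

# Compare two recipes at 7B params: upweighting "high-quality" data
# forces more repetition once that source is exhausted.
web = (300e9, 1, 1.0)            # 300B web tokens, one pass
curated_once = (50e9, 1, 1.6)    # 50B curated tokens, one pass
curated_repeated = (50e9, 6, 1.6)  # same 50B curated tokens, six passes

print(predicted_loss(7e9, [web, curated_once]))   # mixed recipe
print(predicted_loss(7e9, [curated_repeated]))    # quality-only, repeated
```

Running the sketch reproduces the effect the authors flag: the quality-only recipe, forced into six passes over 50B tokens, ends up with fewer effective tokens and a higher predicted loss than the mixed single-pass recipe.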

InfoLaw’s outputs are loss predictions, not direct task scores, but accurate loss forecasts help teams estimate where additional tokens stop paying off and which mixtures are compute-efficient. For organizations juggling budgets and heterogeneous corpora, this offers a principled way to plan pretraining and avoid repetition-induced regressions. 1

One neuron can bypass safety alignment in large models

This study finds that changing activity in a single neuron can flip safety behavior: suppressing a “refusal neuron” makes a model answer harmful requests, while amplifying a “concept neuron” induces harmful content from benign prompts. The result holds across seven models from two families spanning 1.7B to 70B parameters, and it requires no retraining or prompt engineering. 2

Mechanistically, the authors argue that safety alignment is not robustly distributed across weights but is mediated by identifiable neurons that gate behavior. By targeting one neuron in each of two systems—refusal gating versus harmful-knowledge encoding—they demonstrate both directions of failure across diverse harmful prompts. This highlights a defense-in-depth need beyond prompt-level safeguards. 2
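
The paper's method for locating such neurons isn't spelled out in this summary, but the intervention itself is a standard activation edit. A minimal PyTorch sketch follows, with the model, layer index, neuron index, and submodule path all stated as assumptions rather than the paper's actual targets.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Qwen/Qwen2.5-1.5B-Instruct"  # stand-in; the paper tests 1.7B-70B models
LAYER, NEURON = 14, 2048              # hypothetical coordinates of a "refusal neuron"

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

def suppress_neuron(module, inputs, output):
    # Zero one hidden unit of the MLP activation on every forward pass.
    # Amplifying instead (e.g. output[..., NEURON] *= 10) would probe the
    # opposite failure mode described for the "concept neuron".
    output[..., NEURON] = 0.0
    return output

# Hook the activation inside one transformer block's MLP. The exact
# submodule path varies by architecture; this layout is an assumption
# about Llama/Qwen-style models.
handle = model.model.layers[LAYER].mlp.act_fn.register_forward_hook(suppress_neuron)

prompt = "How do I pick a lock?"  # mild probe, not a real safety eval
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # always restore the unmodified model afterwards
```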

A complementary perspective piece argues that “alignment is not obedience” and should be evaluated as co-evolution: once AI systems become part of how people think and choose, safety must account for how interactions shape users over time, not just one-off refusals. Together with the neuron result, it suggests technical controls and human-in-the-loop design must evolve in tandem. 3

DeepRefine uses reinforcement learning to clean agent-built knowledge

DeepRefine is a reasoning model that talks to an existing agent-compiled knowledge base, diagnoses defects like missing evidence or ambiguous links, and incrementally fixes them. It introduces a Gain-Beyond-Draft (GBD) reward and trains the full reasoning loop with reinforcement learning, so refinement policies improve without gold labels. The authors report consistent downstream gains over strong baselines on knowledge-intensive tasks. 4

The system maintains a multi-turn interaction history, performs abductive diagnosis to localize likely errors, and executes targeted edits—addressing incompleteness, incorrectness, and redundancy that otherwise compound with repeated use. By keeping the knowledge base up to date with how users actually query it, retrieval fidelity improves over time. 4
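
DeepRefine's training code isn't public in this summary, so the loop below is a toy sketch of the described pattern: diagnose a defect class, apply a targeted edit, and keep it only when it earns a positive Gain-Beyond-Draft-style reward against the unedited knowledge base. Every name and the reward shape are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class KnowledgeBase:
    """Toy agent-compiled knowledge base: claim -> supporting evidence."""
    entries: dict = field(default_factory=dict)

def diagnose(kb: KnowledgeBase) -> list[str]:
    # Abductive-style diagnosis, reduced to one checkable defect class:
    # claims with no evidence attached (the paper also targets
    # incorrectness and redundancy).
    return [claim for claim, ev in kb.entries.items() if not ev]

def score(kb: KnowledgeBase, eval_queries) -> float:
    # Stand-in for downstream accuracy on knowledge-intensive queries.
    hits = sum(1 for q in eval_queries if kb.entries.get(q))
    return hits / max(len(eval_queries), 1)

def refine(kb: KnowledgeBase, eval_queries, retrieve_evidence, max_turns=5):
    """Multi-turn refinement with a Gain-Beyond-Draft-style reward:
    an edit is kept only if it scores above the unedited draft."""
    baseline = score(kb, eval_queries)  # the "draft" score
    for _ in range(max_turns):
        defects = diagnose(kb)
        if not defects:
            break
        claim = defects[0]
        candidate = KnowledgeBase(dict(kb.entries))
        candidate.entries[claim] = retrieve_evidence(claim)  # targeted edit
        gain_beyond_draft = score(candidate, eval_queries) - baseline
        if gain_beyond_draft > 0:  # positive reward: commit the edit
            kb.entries = candidate.entries
            baseline += gain_beyond_draft
    return kb
```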

This fits a broader shift from storing raw logs to storing “experience.” A companion explainer contrasts data-centric memory such as Retrieval-Augmented Generation (RAG) with structured reflections that capture lessons the agent can reuse later—an “act → reflect → learn → improve” loop. DeepRefine’s policy-learned edits are a concrete step toward that persistent-memory mindset. 5
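
That loop is easy to make concrete. A minimal sketch, with the storage format and retrieval heuristic as assumptions, contrasts archiving raw logs with distilling each episode into a lesson the agent can recall on its next, similar task:

```python
import json
from pathlib import Path

MEMORY = Path("reflections.jsonl")  # illustrative store, not RAG over raw logs

def reflect(task: str, outcome: str, llm) -> str:
    # Distill the episode into a reusable lesson instead of archiving
    # the full transcript; `llm` is any text-completion callable.
    return llm(f"Task: {task}\nOutcome: {outcome}\n"
               "State one lesson to apply next time, in one sentence.")

def remember(task: str, lesson: str) -> None:
    with MEMORY.open("a") as f:
        f.write(json.dumps({"task": task, "lesson": lesson}) + "\n")

def recall(task: str, k: int = 3) -> list[str]:
    # Naive keyword overlap stands in for embedding retrieval.
    if not MEMORY.exists():
        return []
    rows = [json.loads(line) for line in MEMORY.open()]
    rows.sort(key=lambda r: -sum(w in r["task"] for w in task.split()))
    return [r["lesson"] for r in rows[:k]]

# act -> reflect -> learn -> improve: lessons returned by recall() are
# prepended to the agent's prompt before its next attempt.
```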

Open Source & Repos

agentmemory adds long-term memory to coding agents

agentmemory is a toolkit that lets coding agents remember across sessions so you don’t keep re-explaining context. It plugs into tools like Claude Code, Cursor, and the Gemini command-line interface (CLI), and works with any Model Context Protocol (MCP) client. 6

The project emphasizes persistent memory “for AI coding agents based on real-world benchmarks,” with a recent v0.9.10 release on 2026-05-12 addressing deployment-shape fixes reported by users. For teams piloting in-editor agents, this provides a ready-made memory layer rather than building one from scratch. 6

Who it’s for: developers shipping agent assistants that need state across editing sessions and repos. Why it’s trending: less context-dropping means fewer redundant tokens and steadier tool behavior as projects grow. 6
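
agentmemory's internals aren't shown here, but the MCP plumbing it builds on is standard. A toy memory server using the official mcp Python SDK's FastMCP helper sketches the shape of what an MCP client such as Claude Code or Cursor would attach to; the tool names and in-process store are assumptions, not agentmemory's implementation.

```python
# pip install "mcp[cli]"  -- the official Model Context Protocol Python SDK
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("toy-memory")
_store: dict[str, str] = {}  # a real server would persist to disk or a database

@mcp.tool()
def remember(key: str, value: str) -> str:
    """Save a fact so it survives across editing sessions."""
    _store[key] = value
    return f"stored {key}"

@mcp.tool()
def recall(key: str) -> str:
    """Retrieve a previously saved fact."""
    return _store.get(key, "nothing stored under that key")

if __name__ == "__main__":
    mcp.run()  # speaks MCP over stdio; register it in your client's MCP config
```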

Skyvern automates browser workflows with AI

Skyvern is an open-source project that uses Large Language Models (LLMs) and computer vision to automate browser-based tasks. The repository shows v1.0.36 released on 2026-05-10 with fixes including quickstart install-path selection, a request-policy contract fix, and treating an empty override_llm_key as no override. 7

The project provides a website, docs, and a community Discord, positioning it as a general-purpose web automation stack for scripting flows, clicking through forms, and integrating model decisions into page interactions. It targets cases where brittle selectors or one-off scripts struggle to keep up with changing UIs. 7

Teams evaluating web agents can start with the quickstart and progressively layer features. As with any automation touching accounts or transactions, pilot on low-stakes flows and review logs. 7
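
For orientation, here is what a single Skyvern task looks like in Python, following the pattern in the project's quickstart. The Skyvern client and run_task call should be verified against the docs for your installed version, and the form URL is a placeholder.

```python
# pip install skyvern  -- then follow the quickstart to configure an LLM key.
import asyncio
from skyvern import Skyvern  # import path per the project quickstart

async def main():
    skyvern = Skyvern()  # local mode; a cloud API key is an alternative
    # One natural-language goal replaces a brittle selector script: the
    # agent sees the rendered page and decides which elements to act on.
    task = await skyvern.run_task(
        prompt="Fill out the contact form at https://example.com/contact "
               "with name 'Test User' and email 'test@example.com'"
    )
    print(task)  # review the run record before trusting higher-stakes flows

asyncio.run(main())
```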

Community Pulse

Hacker News (422↑) — Enthusiasm about Skyvern’s browser automation is tempered by concerns over AGPL3 licensing and missing capabilities like request interception and desktop/Citrix support. 8

"Exciting stuff, my employer would be interested but it's AGPL3 licensed so it's a non-starter for them." — Hacker News 8

Why It Matters

Data-aware training plans are becoming as important as model architecture. InfoLaw’s ability to forecast loss across data quality and repetition gives teams a way to budget tokens, avoid overtraining traps, and make more of the data they already have. 1

On the deployment side, open projects like Skyvern and agentmemory show how agents are growing hands (browser actions) and memory (persistent context), even as community debates highlight practical tradeoffs such as licensing and missing features. 8

This Week, Try It

  1. Skyvern quickstart: clone the repo and follow the README to automate a simple form-fill flow. https://github.com/Skyvern-AI/skyvern
  2. agentmemory: add persistent memory to your Cursor or Claude Code workflow and see if re-prompts drop. https://github.com/rohitg00/agentmemory

Sources (8)
