Analogy-driven retrieval helps models reason, adding up to 7.1 points on AIME 2025
A new post-training recipe pairs an analogy-aware retriever with reinforcement fine-tuning. Meanwhile, HyperTool and EurekAgent show how coarser tool calls and environment design stabilize agents; EurekAgent reports a new result for under $11 in API cost.
One-Line Summary
AI agents advance by learning reusable reasoning patterns and by compressing tool use, while open-source projects ship updates for terminal coding agents and trading engines.
Research Papers
Analogy-aware retrieval lifts math reasoning with RA-RFT
This work teaches large language models (LLMs) to look up solved examples that use the same underlying “trick” instead of merely similar words, then fine-tunes the model to follow those traces. The method, called Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), reframes retrieval-augmented generation (RAG) around reasoning benefit rather than surface similarity. 1
Technically, RA-RFT distills a “gold-relevance” retriever that ranks contexts by expected reasoning help, and then applies reinforcement fine-tuning with those analogous demonstrations under verifiable outcome rewards. The authors also analyze retrieval diversity and find that reasoning-aware retrieval surfaces complementary strategies that act as distinct scaffolds for each problem. 1
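To make the contrast between surface similarity and reasoning benefit concrete, here is a minimal sketch. It assumes "reasoning help" can be estimated as the change in verified solve rate when a demonstration is placed in context; the paper's actual relevance signal may differ. The `Candidate` class, the helper functions, and every number are hypothetical toy values, not results from the source.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    surface_similarity: float    # toy stand-in for embedding similarity to the query
    solve_rate_with_demo: float  # toy fraction of sampled answers verified correct with this demo


def reasoning_benefit(candidate: Candidate, baseline_solve_rate: float) -> float:
    """Assumed relevance signal: solve-rate gain over answering with no demonstration."""
    return candidate.solve_rate_with_demo - baseline_solve_rate


def rank_by_benefit(candidates, baseline_solve_rate):
    """Order demonstrations by how much they are estimated to help reasoning."""
    return sorted(
        candidates,
        key=lambda c: reasoning_benefit(c, baseline_solve_rate),
        reverse=True,
    )


def rank_by_surface(candidates):
    """Order demonstrations the way a similarity-only retriever would."""
    return sorted(candidates, key=lambda c: c.surface_similarity, reverse=True)


BASELINE = 0.25  # toy solve rate with no demonstration in context
candidates = [
    Candidate("same_wording_different_trick", 0.92, 0.20),
    Candidate("different_wording_same_trick", 0.55, 0.60),
    Candidate("unrelated", 0.10, 0.25),
]

if __name__ == "__main__":
    print("surface:", [c.name for c in rank_by_surface(candidates)])
    print("benefit:", [c.name for c in rank_by_benefit(candidates, BASELINE)])
```

In this toy setup the two rankings disagree on the top result, which is the gap a retriever trained on such labels would be trying to close.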
Across challenging math benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning baselines such as Group Relative Policy Optimization (GRPO). For example, it lifts AIME 2025 average@32 accuracy over GRPO by 7.1 points for Qwen3‑1.7B and by 2.8 points for Qwen3‑4B, suggesting that retrieval quality is an axis of improvement orthogonal to reward design and training curricula. 1
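For readers unfamiliar with the two terms in that comparison, the sketch below shows how they are commonly computed. It assumes average@k means the per-problem fraction of correct answers among k samples, averaged over problems, and it uses the widely described GRPO advantage (reward minus group mean, divided by group standard deviation). The function names, the epsilon term, and the choice of population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def average_at_k(correct_flags_per_problem):
    """Average over problems of the fraction of k sampled answers marked correct (1) or not (0)."""
    return mean(mean(flags) for flags in correct_flags_per_problem)


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples drawn for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps keeps the division defined when all rewards match
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Two toy problems with k = 4 samples each.
    print(average_at_k([[1, 0, 1, 1], [0, 0, 1, 0]]))
    # One toy group of four verifiable outcome rewards.
    print(group_relative_advantages([1, 0, 0, 1]))
```

With verifiable outcome rewards, correct samples in a group receive positive advantages and incorrect ones negative, so no separate value model is needed.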
Practically, this points to a recipe for smaller models: pair a reasoning-aware retriever with reinforcement fine-tuning so the model learns reusable patterns, and watch whether the gains carry from math to coding and open‑ended tasks. 1
HyperTool compresses multi-step tool use into a single call
HyperTool is a unified, executable Model Context Protocol (MCP)–style interface that lets a model send one code block which calls multiple existing tools, passes intermediate values locally, and hides deterministic subroutines from the visible reasoning trace. It tackles the “execution‑granularity mismatch,” so the model no longer spends context managing low‑level dataflow step by step. 2
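A rough sketch of the idea, not HyperTool's actual interface: the three tool functions, the `run_block` helper, and the convention of returning a variable named `result` are all hypothetical. The point is only that one submitted block can chain several tools and keep intermediate values local, so just the final value returns to the model's context.

```python
# Hypothetical stand-ins for existing tools; names and return shapes are illustrative only.
def search_flights(origin, destination):
    return [{"id": "F1", "price": 320}, {"id": "F2", "price": 280}]


def get_exchange_rate(source, target):
    return 0.9


def book_flight(flight_id):
    return {"status": "booked", "flight_id": flight_id}


TOOLS = {
    "search_flights": search_flights,
    "get_exchange_rate": get_exchange_rate,
    "book_flight": book_flight,
}


def run_block(code, tools):
    """Run one model-written block with the tools in scope and return only `result`.

    Plain exec() is NOT a sandbox; a real deployment would isolate this step.
    """
    scope = dict(tools)
    exec(code, scope)
    return scope["result"]


# One block replaces four separate tool-call round trips.
BLOCK = '''
flights = search_flights("SFO", "NRT")
cheapest = min(flights, key=lambda f: f["price"])
rate = get_exchange_rate("USD", "EUR")
booking = book_flight(cheapest["id"])
result = {"flight_id": booking["flight_id"], "price_eur": round(cheapest["price"] * rate, 2)}
'''

if __name__ == "__main__":
    print(run_block(BLOCK, TOOLS))
```

The flight list, the exchange rate, and the booking response never enter the visible trace; the model sees one compact result instead of managing each hand-off itself.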
Training uses synthesized HyperTool‑format trajectories that are verified in real MCP environments. The approach raises average accuracy on MCP‑Universe from 15.69% to 35.29% for Qwen3‑32B (+19.60 points) and from 9.93% to 33.33% for Qwen3‑8B (+23.40 points), surpassing GPT‑OSS and Kimi‑k2.5 on average accuracy. 2
EurekAgent: engineering the environment for autonomous discovery
EurekAgent argues that the bottleneck in autonomous scientific discovery is shifting from scripting agent workflows to shaping the environment—resources, constraints, and interfaces—that guide agent behavior and curb reward hacking. In other words, design the lab around the agent as carefully as the agent itself. 3
It engineers four dimensions: permissions (bounded execution and isolated evaluation), artifacts (filesystem and Git‑based collaboration), budgets for exploration, and human‑in‑the‑loop controls. It reports state‑of‑the‑art results on math, kernel engineering, and machine learning tasks, including a new 26‑circle packing result found with under $11 in total API cost. 3
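One way to picture those four dimensions is as an explicit environment object that the agent runs inside. The sketch below is an illustration under assumptions, not EurekAgent's code: every class, field, path, and dollar limit is hypothetical.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push spending past the exploration budget."""


@dataclass
class Budget:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"{self.spent_usd + cost_usd:.2f} > {self.limit_usd:.2f}")
        self.spent_usd += cost_usd


@dataclass
class Environment:
    writable_roots: tuple         # permissions: where the agent may write
    evaluator_isolated: bool      # permissions: scoring code lives outside the agent's reach
    artifact_repo: str            # artifacts: shared Git-tracked workspace
    budget: Budget                # budgets: cap on exploration spend
    require_human_approval: bool  # human-in-the-loop: gate for sensitive actions

    def can_write(self, path: str) -> bool:
        # Lexical check only; a real sandbox would also resolve symlinks and "..".
        target = PurePosixPath(path)
        return any(target.is_relative_to(root) for root in self.writable_roots)


env = Environment(
    writable_roots=("/workspace",),
    evaluator_isolated=True,
    artifact_repo="/workspace/repo",
    budget=Budget(limit_usd=10.0),  # arbitrary illustrative cap
    require_human_approval=True,
)

if __name__ == "__main__":
    print(env.can_write("/workspace/run1/solution.py"))  # inside the writable root
    print(env.can_write("/eval/hidden_tests.py"))        # evaluator files stay off limits
```

Keeping the evaluator outside the writable roots is the kind of constraint that makes reward hacking harder: the agent can improve its solution but cannot edit the scorer.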
The authors open‑source code and results and call for “environment engineering” as a core direction for reliable, auditable research agents. 3
Open Source & Repos
Qwen Code: a terminal AI coding agent gets v0.18.0
Qwen Code is an open‑source AI coding agent that runs in your terminal, exposing a Command‑Line Interface (CLI) so developers can converse with an assistant while editing and running code without leaving the shell. The project provides multilingual docs and a Node.js package for quick setup. 4
Release v0.18.0 (Jun 12, 2026) includes maintenance updates and a CLI fix that skips “thought” text when copying output. It requires Node.js 22 or higher. 4
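Before installing, it can help to confirm that the local Node.js meets that minimum. The helper below is a small sketch that is not part of Qwen Code; the function names are hypothetical, and it relies only on `node --version` printing a string such as `v22.3.0`.

```python
import shutil
import subprocess

MIN_NODE_MAJOR = 22  # minimum stated in the release notes above


def parse_node_major(version_text: str) -> int:
    """Extract the major version from output such as 'v22.3.0'."""
    return int(version_text.strip().lstrip("v").split(".")[0])


def node_is_supported(version_text: str, minimum: int = MIN_NODE_MAJOR) -> bool:
    return parse_node_major(version_text) >= minimum


def installed_node_version():
    """Return the raw `node --version` output, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    completed = subprocess.run(
        [node, "--version"], capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    version = installed_node_version()
    if version is None:
        print("Node.js not found on PATH")
    elif node_is_supported(version):
        print(f"Node.js {version.strip()} meets the minimum")
    else:
        print(f"Node.js {version.strip()} is older than {MIN_NODE_MAJOR}")
```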
NautilusTrader: Rust‑native trading engine advances blockchain adapters
NautilusTrader is a production‑grade, Rust‑native trading engine with a deterministic event‑driven architecture for quant and algorithmic trading teams that need reproducible, low‑latency execution. It’s available as an open repository with an active release cadence. 5
Version 1.228.0 Beta (Jun 8, 2026) adds BSC chain support to the blockchain adapter with UniswapV3 and PancakeSwapV3 decentralized exchange (DEX) registrations, plus Aerodrome Slipstream pool‑event signatures and parsers for bootstrap and replay on Base. 5
Why It Matters
Shifting focus from bigger models to better structure—where to look (reasoning‑aware retrieval) and how to act (coarser‑grained tool execution, engineered environments)—is yielding measurable gains on hard reasoning tasks. 1
For builders, the takeaway is concrete: combine retrieval and reinforcement fine‑tuning, collapse routine tool micro‑steps, design budgets and permissions early, and lean on maturing OSS agents to speed daily workflows. 2
This Week to Try
- Qwen Code in your terminal: install the Node.js package and try coding with an assistant from the shell (see GitHub). https://github.com/QwenLM/qwen-code
- Skim HyperTool’s examples: read how one code block orchestrates multi‑tool workflows and compare accuracies on MCP‑Universe. https://arxiv.org/abs/2606.13663v1