Vol.01 · No.10 · CS · AI · Infra · May 13, 2026

AI Glossary

LLM & Generative AI

Agentic RAG

Plain Explanation

Traditional RAG fetches context once and hopes the answer is inside that single batch, which makes multi-hop questions hard. Agentic RAG puts a planner in the loop, like a careful librarian who adjusts queries, tries another catalog, reads specific passages, and only then compiles notes. The agent iterates (retrieve, read, assess gaps, re-retrieve, validate) until the evidence suffices, then composes the answer. The control policy decides when to search, which interface to use (keyword, semantic, or chunk read), and what to keep or discard before generation.
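
Below is a minimal sketch of that control loop in Python. The planner, generator, and tool callables are stand-ins invented for illustration, not any particular framework's API; a real policy would also prune evidence and justify each tool choice.

    def agentic_rag(question, planner, generator, tools, max_steps=3):
        """Retrieve, read, assess, and re-retrieve until the policy stops; then generate."""
        evidence = []
        for _ in range(max_steps):
            # The planner sees the question, evidence so far, and the tool names, and
            # returns e.g. {"tool": "semantic", "query": "..."} or {"tool": "stop"}.
            action = planner(question, evidence, list(tools))
            if action["tool"] == "stop":                 # policy-driven termination
                break
            hits = tools[action["tool"]](action["query"])
            # Keep only new passages; discarding noise matters as much as fetching.
            evidence.extend(h for h in hits if h not in evidence)
        # Generation runs once, over the curated evidence, after the loop ends.
        return generator(question, evidence)

In use, tools would map names such as "keyword", "semantic", and "chunk" to retrieval callables, and max_steps caps the cost of the loop.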

Examples & Analogies

  • Policy Q&A with evidence checks: a timeline question needs documents from different years; the agent starts broad, then narrows with exact terms and cites both sources.

  • Engineering runbook lookup: the agent reads a small chunk, detects a dependency, then retrieves a parent summary to confirm the wider procedure before drafting steps with citations (see the sketch after this list).

  • Literature mini-survey: the agent alternates semantic search for coverage with chunk reads for detail, pruning irrelevant hits until the notes match the intended scope.
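
A toy version of the runbook example, assuming hypothetical read_chunk and read_parent_summary helpers over a chunked corpus; the dependency test is deliberately naive.

    def gather_runbook_evidence(step_id, read_chunk, read_parent_summary):
        """Read the narrow chunk first; widen to the parent summary only if needed."""
        chunk = read_chunk(step_id)                        # smallest unit: cheap and focused
        evidence = [chunk]
        # Naive dependency check: the chunk points at something outside itself.
        if "see section" in chunk.lower() or "prerequisite" in chunk.lower():
            evidence.append(read_parent_summary(step_id))  # confirm the wider procedure
        return evidence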

At a Glance


Dimension            | Traditional RAG  | Active RAG                          | Agentic RAG
Control policy       | Fixed, one-shot  | Retrieve-on-confidence in one pass  | Explicit planner with multi-step control
Iterations           | None             | Local triggers                      | Global, multi-turn loops
Tool use             | Single retriever | Single retriever                    | Multiple tools (keyword, semantic, chunk read)
Context construction | Static top-k     | Injected during generation          | Dynamic select, discard, re-retrieve
Termination          | After one fetch  | End of generation                   | Policy decides when to stop

Agentic RAG separates planning from generation so the system can iterate, switch tools, and stop based on evidence sufficiency rather than a fixed pipeline.
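
One way to make "stop based on evidence sufficiency" concrete is a separate check that the planner consults between retrieval steps. The prompt and judge call below are illustrative, not a specific framework's interface.

    SUFFICIENCY_PROMPT = (
        "Question: {q}\n"
        "Evidence:\n{ev}\n"
        "Does the evidence fully answer the question? Reply YES or NO."
    )

    def evidence_sufficient(question, evidence, judge):
        """Ask a judge model whether retrieval can stop; runs between steps, before generation."""
        ev_text = "\n---\n".join(evidence)
        verdict = judge(SUFFICIENCY_PROMPT.format(q=question, ev=ev_text))
        return verdict.strip().upper().startswith("YES")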

Where and Why It Matters

  • Complex questions: if the first retrieval misses evidence, the planner can reformulate the query, switch retrieval tools, or inspect narrower chunks.

  • Token control: the system can read selectively instead of stuffing every top-k result into the prompt, keeping context smaller and cleaner.

  • Internal knowledge bases: policy, code, support, and research corpora often have hierarchy, so stepwise section-to-chunk reading is more reliable than one flat search.

  • Operational debugging: logged trajectories show whether a failure came from bad retrieval, wrong tool choice, weak reranking, or a verifier that stopped too early.
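
A hedged sketch of that kind of trajectory logging: record what the planner did at each step so a failure can be attributed to retrieval, tool choice, pruning, or early termination. The record schema here is an assumption, not a standard format.

    import json
    import time

    def log_step(trajectory, tool, query, n_hits, kept, decision):
        """Append one retrieval step to the trajectory; the field set is illustrative."""
        trajectory.append({
            "ts": time.time(),
            "tool": tool,          # which interface the planner chose
            "query": query,        # how the query was (re)formulated
            "n_hits": n_hits,      # how many hits retrieval returned
            "kept": kept,          # how many survived pruning or reranking
            "decision": decision,  # "continue", "switch_tool", or "stop"
        })

    def dump_trajectory(trajectory, path="trajectory.jsonl"):
        """Write one JSON line per step so failed runs can be inspected offline."""
        with open(path, "w") as f:
            for step in trajectory:
                f.write(json.dumps(step) + "\n")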

Common Misconceptions

  • ❌ "Agentic RAG means more retrieval." → ✅ It means policy-driven retrieval that can also discard, switch tools, or stop early.

  • ❌ "Active RAG and Agentic RAG are the same." → ✅ Active RAG triggers retrieval within one pass; Agentic RAG uses a multi-step planner.

  • ❌ "Agents remove the need for verification." → ✅ Research emphasizes step-level checks and verifiers to control risks.

How It Sounds in Conversation

  • "Promote this to Agentic RAG so the planner can re-query when the first pass looks thin."

  • "Expose keyword, semantic, and chunk-read tools, but cap the loop at three retrieval steps."

  • "The failure was a tool-selection issue, not a generation issue; it should have narrowed lexically before semantic search."

  • "The answer is correct, but one sentence is not covered by evidence. Add an evidence-coverage check before generation."

  • "Log process scores so we can see whether failures come from query rewriting, chunk reading, or early termination.

