Vol.01 · No.10 · CS · AI · Infra · May 13, 2026

AI Glossary

LLM & Generative AI

Context Engineering

Plain Explanation

Teams struggled because early prototypes worked once, then broke in production. The model had to guess from vague prompts, noisy history, and mismatched tools. As sessions grew, responses slowed and wandered off-topic, and costs crept up with every extra token.

Context engineering fixes this by deciding exactly what the model should read each turn. Picture a trip where you pack a small, curated carry-on instead of an overstuffed suitcase: only the instructions, facts, and tool outputs that matter for this step make it in. Everything else is left out or summarized so the model stays focused and fast.

Concretely, you control transient context (the current call’s instructions, messages, tools, and output format) and persistent context (state, long-term store, and life-cycle hooks between calls). You route in external knowledge via retrieval, inject user or session data, constrain outputs with schemas, and prune or compact history as it grows. This reduces wrong tool picks, keeps latency predictable, and contains token spend.
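
To make the transient/persistent split concrete, here is a minimal Python sketch. Every name in it (PersistentContext, assemble_context, the message shapes, the compaction rule) is an illustrative assumption, not any specific framework’s API:

    from dataclasses import dataclass, field

    @dataclass
    class PersistentContext:
        # Persistent context: survives between calls (long-term store, session state).
        preferences: dict = field(default_factory=dict)
        history: list = field(default_factory=list)

    def compact_history(history, keep_last=2):
        # Prune: fold older turns into a one-line summary, keep recent turns verbatim.
        if len(history) <= keep_last:
            return list(history)
        summary = "Earlier turns (summarized): " + "; ".join(t[:60] for t in history[:-keep_last])
        return [summary] + history[-keep_last:]

    def assemble_context(state, user_msg, retrieved_facts):
        # Transient context: exactly what the model reads on this call, and nothing else.
        messages = [{"role": "system", "content": f"Apply preferences: {state.preferences}"}]
        messages += [{"role": "user", "content": t} for t in compact_history(state.history)]
        messages += [{"role": "system", "content": f"Retrieved fact: {f}"} for f in retrieved_facts]
        messages.append({"role": "user", "content": user_msg})
        return messages

    state = PersistentContext(
        preferences={"tone": "concise"},
        history=["asked about refunds", "asked about shipping", "asked about billing tiers"],
    )
    print(assemble_context(state, "Which tier am I on?", ["Tiers: basic, pro, enterprise."]))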

Examples & Analogies

  • Policy-aware support triage: A helpdesk agent classifies a ticket, then reads only the three most relevant policy snippets and the last two conversation turns. Tool specs are limited to "lookup_policy" and "create_case" with clear input schemas so the model can’t make unsafe calls (see the tool-spec sketch after this list).
  • Release-note summarizer for DevOps: Before summarizing a week of commits, the agent retrieves the top-ranked diffs, inserts team-specific formatting rules as system instructions, and compacts older turns into a one-paragraph reminder. The output is constrained to a title, bullets, and a risks list.
  • Sales email assistant with preferences: The agent pulls the rep’s tone preference and customer tier from the long-term store, fetches two product facts via retrieval, and hides unrelated history. It emits a structured draft with fields for subject, opener, body, and next-step CTA.
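
For the triage example, tight tool visibility might look like the sketch below. The tool names come from the example itself; the JSON Schema fields and enum values are illustrative assumptions, and the exact wire format varies by provider:

    # Only two tools are exposed during triage; each input contract is strict,
    # so the model cannot invent fields or call anything outside this list.
    # Field names and enum values are illustrative assumptions.
    TOOLS = [
        {
            "name": "lookup_policy",
            "description": "Fetch a policy snippet by topic. Read-only.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "topic": {"type": "string", "enum": ["refunds", "shipping", "billing"]},
                },
                "required": ["topic"],
                "additionalProperties": False,
            },
        },
        {
            "name": "create_case",
            "description": "Open a support case with a validated priority.",
            "input_schema": {
                "type": "object",
                "properties": {
                    "summary": {"type": "string"},
                    "priority": {"type": "string", "enum": ["low", "normal", "urgent"]},
                },
                "required": ["summary", "priority"],
                "additionalProperties": False,
            },
        },
    ]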

At a Glance

                   | Prompt engineering           | Context engineering                            | RAG (retrieval)
Scope              | Wording and examples         | Full input assembly and governance             | Supplying external knowledge
Inputs managed     | Prompt text                  | Instructions, history, tools, outputs, schemas | Snippets from a corpus
Persistence        | Usually stateless            | Mix of transient and persistent state          | Stateless per query
Primary leverage   | Better phrasing              | Reliability, cost/latency control              | Information relevance ceiling
Common failure     | Nicely worded but uninformed | Bloat, missing or misformatted context         | Good retrieval but poor assembly

Context engineering composes prompting and retrieval with memory, tools, and output constraints to deliver the right information at the right time.

Where and Why It Matters

  • Retrieval as quality ceiling: For knowledge-grounded agents, the relevance of retrieved snippets bounds answer quality; perfect wording cannot rescue bad snippets.
  • System instructions as a top lever: Writing instructions like a spec (with role- and step-aware rules) turns many production failures into single-line fixes.
  • Tool visibility and schemas: Tight tool naming, descriptions, and input contracts reduce wrong tool picks and prevent expensive or unsafe API calls.
  • Session management practice: Summarization, reminders, and compaction keep long sessions coherent while containing token cost and latency growth (a compaction sketch follows this list).
  • Isolation via sub-agents: Delegating scoped work to specialized sub-agents caps prompt bloat and limits the blast radius of side quests.
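
One way to trigger compaction is against a token budget, as in this sketch. The budget, the four-characters-per-token heuristic, and the summarize() stub are illustrative assumptions, not a specific library’s behavior:

    TOKEN_BUDGET = 8_000

    def estimate_tokens(text: str) -> int:
        return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

    def summarize(turns: list[str]) -> str:
        # In production this would be a cheap model call; here, a stub.
        return "Reminder of earlier discussion: " + "; ".join(t[:50] for t in turns)

    def maybe_compact(history: list[str]) -> list[str]:
        # Only compact once the running estimate crosses the budget.
        if sum(estimate_tokens(t) for t in history) <= TOKEN_BUDGET:
            return history                    # under budget: leave history intact
        recent = history[-4:]                 # keep the last few turns verbatim
        return [summarize(history[:-4])] + recent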

Common Misconceptions

  • ❌ Myth: Context engineering is just a smarter prompt. → ✅ Reality: It governs instructions, history, tools, retrieved data, and output schemas across turns.
  • ❌ Myth: More context is always better. → ✅ Reality: Extra tokens raise latency and cost and can dilute attention; compact and filter aggressively.
  • ❌ Myth: RAG alone solves grounding. → ✅ Reality: Good snippets help, but assembly, instructions, tool configs, and memory policy decide final reliability.

How It Sounds in Conversation

  • "Let’s cap the context window to 8k and add a history summary after turn 12 to keep latency under 1s."
  • "Please tighten the tool schema—make customer_tier required so the model stops guessing."
  • "Our evals show retrieval is fine; the miss is system instructions—add the billing rule per session metadata."
  • "Token spend spiked; we need compaction and to hide web_search during the classification step."
  • "Ship the new output JSON schema so downstream parsers stop breaking on free-form answers."
