Context Engineering
Plain Explanation
Teams struggled because early prototypes worked once, then broke in production. The model had to guess from vague prompts, noisy history, and mismatched tools. As sessions grew, responses slowed and wandered off-topic, and costs crept up with every extra token.

Context engineering fixes this by deciding exactly what the model should read each turn. Picture a trip where you pack a small, curated carry-on instead of an overstuffed suitcase: only the instructions, facts, and tool outputs that matter for this step make it in. Everything else is left out or summarized so the model stays focused and fast.

Concretely, you control transient context (the current call’s instructions, messages, tools, and output format) and persistent context (state, long-term store, and life-cycle hooks between calls). You route in external knowledge via retrieval, inject user or session data, constrain outputs with schemas, and prune or compact history as it grows. This reduces wrong tool picks, keeps latency predictable, and contains token spend.
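The per-turn assembly described above can be sketched in code. This is a minimal illustration, not any specific framework's API: `TurnContext`, `assemble_context`, and the shape of `state` are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class TurnContext:
    """Transient context assembled fresh for one model call."""
    instructions: str
    messages: list   # recent turns only; older ones are pruned or compacted
    tools: list      # only the tools relevant to this step
    output_schema: dict

def assemble_context(state, retriever, user_msg, max_turns=4):
    """Pack a small 'carry-on': a few recent turns, a handful of retrieved
    snippets, and only the tools the current step needs."""
    recent = state["history"][-max_turns:]          # prune old history
    snippets = retriever(user_msg, k=3)             # route in external knowledge
    instructions = state["system_rules"] + "\nFacts:\n" + "\n".join(snippets)
    return TurnContext(
        instructions=instructions,
        messages=recent + [{"role": "user", "content": user_msg}],
        tools=[t for t in state["tools"] if t["step"] == state["step"]],
        output_schema=state["output_schema"],
    )
```

The key design choice is that nothing is carried over by default: every field of the transient context is rebuilt each turn from persistent state plus retrieval, so bloat cannot accumulate silently.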
Examples & Analogies
- Policy-aware support triage: A helpdesk agent classifies a ticket, then reads only the three most relevant policy snippets and last two conversation turns. Tool specs are limited to `lookup_policy` and `create_case` with clear input schemas so the model can’t make unsafe calls.
- Release-note summarizer for DevOps: Before summarizing a week of commits, the agent retrieves top-ranked diffs, inserts team-specific formatting rules as system instructions, and compacts older turns into a one-paragraph reminder. The output is constrained to a title, bullets, and a risks list.
- Sales email assistant with preferences: The agent pulls the rep’s tone preference and customer tier from long-term store, fetches two product facts via retrieval, and hides unrelated history. It emits a structured draft with fields for subject, opener, body, and next-step CTA.
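The triage example above hinges on tight tool specs. Here is an illustrative sketch of what those specs and a guard around them might look like; the exact wire format depends on the model provider, and the field names (`input_schema`, `required`) are assumptions for this example.

```python
# Illustrative tool specs for the helpdesk triage example.
TRIAGE_TOOLS = [
    {
        "name": "lookup_policy",
        "description": "Fetch the policy snippet matching a ticket category.",
        "input_schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string",
                             "enum": ["billing", "shipping", "returns"]},
            },
            "required": ["category"],
        },
    },
    {
        "name": "create_case",
        "description": "Open a support case with an explicit priority.",
        "input_schema": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
                "priority": {"type": "string",
                             "enum": ["low", "normal", "high"]},
            },
            "required": ["ticket_id", "priority"],
        },
    },
]

def validate_call(tools, name, args):
    """Reject calls to unknown tools and calls missing required fields."""
    spec = next((t for t in tools if t["name"] == name), None)
    if spec is None:
        return False
    return all(k in args for k in spec["input_schema"].get("required", []))
```

Because only two tools are visible and each has a strict schema, a hallucinated call like `web_search` or a `create_case` without a priority is rejected before it reaches any real API.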
At a Glance
| | Prompt engineering | Context engineering | RAG (retrieval) |
|---|---|---|---|
| Scope | Wording and examples | Full input assembly and governance | Supplying external knowledge |
| Inputs managed | Prompt text | Instructions, history, tools, outputs, schemas | Snippets from a corpus |
| Persistence | Usually stateless | Mix of transient and persistent state | Stateless per query |
| Primary leverage | Better phrasing | Reliability, cost/latency control | Information relevance ceiling |
| Common failure | Nicely worded but uninformed | Bloat, missing or misformatted context | Good retrieval but poor assembly |
Context engineering composes prompting and retrieval with memory, tools, and output constraints to deliver the right information at the right time.
Where and Why It Matters
- Retrieval as quality ceiling: For knowledge-grounded agents, the relevance of retrieved snippets bounds answer quality; perfect wording cannot rescue bad snippets.
- System instructions as a top lever: Writing instructions like a spec (with role- and step-aware rules) turns many production failures into single-line fixes.
- Tool visibility and schemas: Tight tool naming, descriptions, and input contracts reduce wrong tool picks and prevent expensive or unsafe API calls.
- Session management practice: Summarization, reminders, and compaction keep long sessions coherent while containing token cost and latency growth.
- Isolation via sub-agents: Delegating scoped work to specialized sub-agents caps prompt bloat and limits the blast radius of side quests.
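The session-management lever above (summarization, reminders, compaction) can be sketched as a single function. This is a naive stand-in: in practice `summarize` would be an LLM call, and the one-line fallback here exists only so the example runs.

```python
def compact_history(history, keep_last=4, summarize=None):
    """Replace everything before the last `keep_last` turns with one
    compact reminder message, bounding context growth per session."""
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    # Stand-in summarizer: truncate-and-join. A real system would call
    # the model here to produce a one-paragraph reminder.
    summarize = summarize or (lambda turns: "; ".join(t["content"][:40] for t in turns))
    reminder = {"role": "system",
                "content": "Summary of earlier conversation: " + summarize(old)}
    return [reminder] + recent
```

Running this after every N turns keeps the message list roughly constant in size, which is what makes latency and token spend predictable over long sessions.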
Common Misconceptions
- ❌ Myth: Context engineering is just a smarter prompt. → ✅ Reality: It governs instructions, history, tools, retrieved data, and output schemas across turns.
- ❌ Myth: More context is always better. → ✅ Reality: Extra tokens raise latency and cost and can dilute attention; compact and filter aggressively.
- ❌ Myth: RAG alone solves grounding. → ✅ Reality: Good snippets help, but assembly, instructions, tool configs, and memory policy decide final reliability.
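"Compact and filter aggressively" can be made concrete with a token budget. This sketch approximates token counts at roughly four characters per token; a production system would use the model's actual tokenizer, and the snippets are assumed to arrive pre-sorted by relevance.

```python
def fit_budget(snippets, max_tokens=1000, est=lambda s: len(s) // 4):
    """Greedily keep the highest-ranked snippets that fit a token budget.
    `est` is a crude ~4-chars-per-token approximation."""
    kept, used = [], 0
    for s in snippets:          # assumed sorted best-first
        cost = est(s)
        if used + cost > max_tokens:
            break
        kept.append(s)
        used += cost
    return kept
```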
How It Sounds in Conversation
- "Let’s cap the context window to 8k and add a history summary after turn 12 to keep latency under 1s."
- "Please tighten the tool schema: make `customer_tier` required so the model stops guessing."
- "Our evals show retrieval is fine; the miss is system instructions. Add the billing rule per session metadata."
- "Token spend spiked; we need compaction, and we should hide `web_search` during the classification step."
- "Ship the new output JSON schema so downstream parsers stop breaking on free-form answers."
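The output-schema point from the conversation above can be sketched as a strict parser. The field names (`title`, `bullets`, `risks`, matching the release-note example earlier) and the function name are illustrative, not from any specific library.

```python
import json

# Hypothetical output contract for the release-note summarizer:
# a title string, a list of bullet strings, and a list of risks.
REQUIRED_FIELDS = {"title": str, "bullets": list, "risks": list}

def parse_release_notes(raw):
    """Parse the model's JSON output and fail loudly on contract
    violations, so downstream parsers never see free-form text."""
    data = json.loads(raw)
    for name, typ in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), typ):
            raise ValueError(f"bad or missing field: {name}")
    return data
```

Failing at the boundary like this converts silent downstream breakage into a single, retryable error at the point where the model's output enters the system.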
References
- A Survey of Context Engineering for Large Language Models
Surveys retrieval, processing, and management components of context engineering.
- Context Engineering | Agently Docs
Layered context, agent vs request scope, structured IO, tools, and memory control with examples.
- Context engineering
Practical levers: retrieval first, instructions, tool config, session compaction, and monitoring.
- Context engineering in agents
Defines transient vs persistent context, agent loop, and middleware control points.
- The Guide to AI Context Engineering in 2026
Describes a minimal context stack and an orchestrator-centric assembly pattern.
- What is context engineering? Components, techniques, and best practices
Overview of techniques and best practices for controlling agent context.