# Notes on Context Engineering

## Notes on Anthropic's Context Engineering Blog Post

(Image taken from the Anthropic article)
- Why is context engineering relevant? Because of context rot: as a model's context grows, its ability to recall information from that context decreases.
- Models are also mostly trained on shorter sequences, so they have less experience with long-range dependencies that span a wide context.
- Great quote defining the goal of context engineering:
  > "Good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of some desired outcome."
- Organize prompts into sections (using XML tags or Markdown headers), aiming for the minimal set of information that fully outlines your expected behavior (see the sketch below).
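To make the "sections" idea concrete, here is a minimal sketch of a sectioned system prompt. The tag names (`<role>`, `<instructions>`, `<output_format>`) and the task are illustrative choices, not a required schema.

```python
# A minimal sketch of a sectioned system prompt. The section names are
# illustrative -- the point is clearly labelled, minimal sections.
SYSTEM_PROMPT = """
<role>
You are a support agent for an internal billing tool.
</role>

<instructions>
- Answer only from the provided documents.
- If the answer is not in the documents, say so explicitly.
</instructions>

<output_format>
Reply with a short answer followed by a bulleted list of sources.
</output_format>
""".strip()
```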
- Start with a minimal prompt and improve the instructions based on observed failure modes.
- "Just-in-time" context strategies mean dynamically loading what the agent needs, when it needs it (see the sketch below).
- Hybrid strategies (like using `CLAUDE.md` files for Claude Code) reflect the acknowledgement that as agents become more intelligent, they will require less human curation over time, which shapes the advice Anthropic gives.
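To make the just-in-time idea above concrete, here is a minimal sketch of a file-reading tool the agent can call to pull content into context only when it actually needs it. The tool name, schema shape, and character limit are assumptions for illustration, not any particular vendor's API.

```python
from pathlib import Path

def read_file(path: str, max_chars: int = 4000) -> str:
    """Load (a truncated slice of) a file's contents on demand."""
    text = Path(path).read_text(errors="replace")
    return text[:max_chars]

# Illustrative tool definition: the agent sees only lightweight identifiers
# (file paths) up front and calls this tool when it needs the content.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": "Read the contents of a file by path, only when needed.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}
```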
- Techniques to address context pollution constraints for long-horizon tasks:
  - Structured note-taking: the agent writes external notes for later retrieval on demand (see the sketch after this list)
    - For example, a to-do list for keeping track of progress, or a `NOTES.md` file for maintaining critical context
    - They illustrate this with an example of Claude playing Pokémon
    - They also built a memory tool, released with the Sonnet 4.5 launch
  - Sub-agent architectures (see the sketch after this list):
    - involve using specialized sub-agents, each with a focused task and a clean context window
    - ideal for complex research tasks
  - When to use each strategy:
    - compaction for extensive back-and-forth conversational tasks
    - note-taking for iterative development with clear milestones
    - multi-agent architectures for complex research and analysis where parallel exploration pays dividends
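As a concrete illustration of the structured note-taking idea, here is a minimal sketch of a notes tool an agent could call. This is not Anthropic's actual memory tool; the file name `NOTES.md` comes from the note above, everything else is assumed.

```python
from pathlib import Path

# Minimal sketch of structured note-taking: the agent persists notes outside
# the context window and pulls them back in only when it asks for them.
NOTES_FILE = Path("NOTES.md")  # file name from the note above; location is arbitrary

def append_note(note: str) -> str:
    """Store a note on disk so it survives context compaction or truncation."""
    with NOTES_FILE.open("a", encoding="utf-8") as f:
        f.write(f"- {note}\n")
    return "noted"

def read_notes() -> str:
    """Retrieve the accumulated notes on demand."""
    if NOTES_FILE.exists():
        return NOTES_FILE.read_text(encoding="utf-8")
    return "(no notes yet)"
```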
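And a minimal sketch of the sub-agent idea: each sub-task runs with a fresh, focused context, and only compact summaries flow back to the orchestrator. `call_model` is a hypothetical stand-in for whatever LLM client is in use.

```python
# Minimal sketch of a sub-agent architecture: sub-tasks run in clean context
# windows and only short summaries return to the orchestrator.
def call_model(system: str, user: str) -> str:
    # Hypothetical placeholder -- plug in your actual LLM client here.
    raise NotImplementedError

def run_subagent(task: str) -> str:
    system = (
        "You are a research sub-agent. Work only on the task below "
        "and reply with a summary of at most 200 words."
    )
    return call_model(system, task)

def orchestrate(tasks: list[str]) -> str:
    # Each sub-agent starts from a clean, focused context.
    summaries = [run_subagent(task) for task in tasks]
    return call_model(
        "Synthesize the sub-agent findings below into one answer.",
        "\n\n".join(summaries),
    )
```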
## Agentic Loops
- New skill to develop: designing agentic loops
- YOLO mode (running coding agents with automatic approval of their actions)
  - Risks:
    - bad shell commands deleting or mangling things
    - exfiltration attacks
    - attacks that use your machine as a proxy to attack another target
  - Options for mitigating those risks:
    - Run your agent in a secure sandbox
      - Acceptable options for most people:
        - https://github.com/apple/container
        - Docker
      - But remember this is not perfect! (container escapes exist)
    - Use someone else's computer
      - Best option for Simon: GitHub Codespaces
    - Just risk it! (but avoid exposing the agent to potentially malicious sources)
- Set credentials with tight budget limits on staging environments to contain any potential damage, then design a cool agentic loop for testing, exploring, and prototyping.
- Example scenarios for agentic loops: debugging, performance optimization, upgrading dependencies, optimizing container sizes.
- Automated tests! (see the loop sketch below)
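To tie the agentic loop and automated tests together, here is a minimal sketch of a test-driven loop. The agent invocation is a hypothetical placeholder, the iteration budget is an arbitrary assumption, and `pytest` stands in for whatever test runner the project uses.

```python
import subprocess

def run_agent_step(feedback: str) -> None:
    # Hypothetical placeholder -- invoke your coding agent here with the feedback.
    raise NotImplementedError

def run_tests() -> tuple[bool, str]:
    """Run the test suite and return (passed, combined output)."""
    proc = subprocess.run(["pytest", "-q"], capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def agentic_loop(max_iterations: int = 5) -> bool:
    """Let the agent iterate until the tests pass or the budget runs out."""
    feedback = "Make the test suite pass."
    for _ in range(max_iterations):  # hard budget so the loop always terminates
        run_agent_step(feedback)
        passed, output = run_tests()
        if passed:
            return True
        feedback = f"Tests are still failing:\n{output}"  # feed failures back in
    return False
```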