What is context engineering for AI agents?

Context engineering is the discipline of deciding what information, tools, memory, and history an AI agent sees when it runs, and how that context reaches the model. It is the successor to prompt engineering: instead of wording a single instruction, you assemble the full working set the model reasons over for a task. Glean's founder and CEO Arvind Jain calls bringing the right context to the model the genuinely hard problem in building agents, because a capable model with the wrong context still produces wrong work.

How is context engineering different from prompt engineering?

Prompt engineering is about the wording of the one instruction you give a model. Context engineering is about everything else the model sees: the data, documents, prior examples, tools, and history that surround that instruction. As models got better at following instructions, the bottleneck moved from how you phrase the ask to what information you put in front of the model. Andrej Karpathy frames the context window as your program and the model as the interpreter, which makes assembling that context the real programming task.

Why do AI agents fail or give wrong answers?

Most agent failures are context failures, not model failures. If you point a capable model at all of your internal data without curating what it should use, Arvind Jain says it often goes into a loop and gets confused, or pulls the wrong information and completes the task incorrectly. The fix is usually not a bigger model. It is giving the agent the specific context the task needs (the right documents, the relevant history, and the right tools) and leaving out what would distract it.

What is a context layer for AI agents?

A context layer is a single place an agent can go to get the information it needs, instead of each agent being wired separately to every system. Glean founder and CEO Arvind Jain describes a horizontal layer that connects to a company's systems so any agent, even one built inside Salesforce, can pull the right context through an MCP or REST API. A startup does not need Glean to use the idea: pick the few systems that hold your truth, make them queryable by an agent, and route every agent through that one source rather than re-wiring context for each bot.

How much autonomy should you give an AI agent?

Match autonomy to the cost of being wrong. Cresta CEO Ping Wu argues customer-facing agents need precision more than autonomy, and that for sensitive actions like moving money you are better off putting the logic in code than trusting a model. For low-stakes work like research, more freedom is fine because the result can be vaguely right and still useful. A good default is to keep a human in the loop for anything that changes data or leaves the company, and to let agents run unattended only where a mistake is cheap and visible.

Context engineering for AI agents that work

You can feel the gap the first time an agent gets something obviously wrong. The model is clearly capable, but it answers as if it has never seen your business, because it has not. In a Greylock Change Agents conversation, Arvind Jain, founder and CEO of Glean, put the real problem plainly: bringing the right context to the model is the genuinely hard problem in building agents. That problem has a name now: context engineering for AI agents. For anyone building agents, it is becoming the work that matters most.

What context engineering actually is

Context engineering is the discipline of deciding what an AI agent sees when it runs: the data, the tools, the prior examples, the history, and the instructions, and how all of that reaches the model. It is the successor to prompt engineering. Prompt engineering was about the wording of a single instruction. Context engineering for AI agents is about assembling the whole working set the model reasons over before it acts.

Andrej Karpathy has a clean way to think about it: the context window is your program, and the model is the interpreter. That reframes the job. You are no longer writing logic line by line; you are deciding what to put in front of the model. It is the same shift I wrote about in moving from vibe coding to agentic engineering. On the same Greylock panel, Ping Wu, CEO of Cresta and a co-founder of Google's Contact Center AI, pointed to recent guidance on building good agents through context engineering and tool use. The model is increasingly the easy part. The context is the part that is yours.

Why agents fail without the right context

Most agent failures are context failures, not model failures. Ask Arvind Jain how much of a hard task an agent gets right because of the model versus the company's own data, and his answer is both, working together. The failure mode is specific: if you skip the context work and just let a model loose on all of your internal information, he says, it often goes into a loop and gets confused, or it uses the wrong information and solves the task incorrectly.

That is worth sitting with, because it changes how you debug. A hallucination is often just the model running without the context it needed, so the instinct to reach for a bigger model is usually wrong. The agent did not need more raw intelligence; it needed the right documents, the relevant history, and the right tools, with the noise left out.

The upside of getting context right is just as concrete. Jain's example: a legal team receives contracts that run to ninety pages of dense terms, and it keeps a playbook of what the company will and will not accept. Give an agent that playbook, the company's past contracts, and the standards, and it can reason through a new contract and redline it, saving a lawyer weeks. None of that works without the context. The model is the same model everyone else has.

Context is captured, not written

Here is the part founders tend to miss. You do not sit down and write good context. You capture it from how work already happens. Glean's whole model, Jain says, is to observe how people complete tasks and bring "this collective human intelligence in one place," so that when you ask a model to run a business process it has the historical context to make smart decisions and do the work itself. The guiding question is simple: given any task, how do people actually complete it?

A startup does not need Glean's machinery to apply this. The mistake is writing an idealized prompt that describes how you wish the job were done. The fix is to capture how your best people actually do it: the resolved support tickets, the sales playbook that closes deals, the docs people really follow, the past examples that worked. That is how you give AI agents business context that reflects reality instead of a fantasy. And because captured context can still be wrong or stale, you have to measure whether it improves output, which is exactly what evals are for.

Build a context layer, not one-off prompts

Most company knowledge is siloed across a dozen systems. Jain's observation is that humans already deal with this: to finish one task, a person typically moves across four or five systems, checking Salesforce, then email, then a Slack channel. An agent has to work the same way, which means it cannot live inside a single app's data.

Glean's answer is a horizontal context layer that connects to everything, so any agent, even one built inside Salesforce, can reach the right context through an MCP or REST API instead of being re-wired by hand. You do not need to buy that to steal the principle. Pick the few systems that hold your company's truth, make them queryable by an agent, and route every agent through that one source rather than hard-coding context into each bot. Where that context physically lives, one store plus a tool registry, is its own build, and I covered that stack in internal AI infrastructure for startups. Context engineering is the layer of judgment above it: deciding what goes in, what stays out, and which agent gets what.
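A minimal sketch of the "route every agent through one source" idea, in Python. The `ContextLayer` class, the source functions, and the sample records are all hypothetical stand-ins for illustration; none of them come from Glean or any real product. Each system is registered once, and agents ask the layer for the slice they are allowed to use.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A source is any function that takes a query and returns matching snippets.
# Real sources would call a CRM, a ticketing system, or a wiki; these are stand-ins.
Source = Callable[[str], List[str]]


@dataclass
class ContextLayer:
    """Hypothetical single entry point that every agent queries for context."""
    sources: Dict[str, Source] = field(default_factory=dict)

    def register(self, name: str, source: Source) -> None:
        # Wire a system in once, here, rather than once per agent.
        self.sources[name] = source

    def fetch(self, query: str, allowed: List[str]) -> Dict[str, List[str]]:
        # Return snippets only from the sources this agent is allowed to use.
        return {
            name: self.sources[name](query)
            for name in allowed
            if name in self.sources
        }


def crm_source(query: str) -> List[str]:
    records = ["Acme renewal due in March", "Globex churn risk flagged"]
    return [r for r in records if query.lower() in r.lower()]


def tickets_source(query: str) -> List[str]:
    tickets = ["Acme: login bug resolved by resetting SSO"]
    return [t for t in tickets if query.lower() in t.lower()]


layer = ContextLayer()
layer.register("crm", crm_source)
layer.register("tickets", tickets_source)

# Two different agents, one context layer, different slices of it.
support_context = layer.fetch("acme", allowed=["tickets"])
sales_context = layer.fetch("acme", allowed=["crm", "tickets"])
```

Adding a new system means one more `register` call; no agent has to change. In practice the layer would sit behind whatever interface your agents already speak, but the shape stays the same.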

Curate it, or drown in agent sprawl

Once agents are easy to build, you get too many of them. Jain described one customer that built two thousand agents in six months, and said his own company runs thousands internally, to the point that any given task has eight different agents that all look and feel similar. His word for the result is a mess, and the answer is curation. Some companies now let anyone build an agent, but a small team approves and promotes the good ones into a published library, so most people only ever see high-quality agents.

The context lesson underneath the sprawl: ten agents with thin context are worse than two with good context. More bots do not compound; good context does. Treat your agents and the context behind them as a product surface you curate, not a pile that grows on its own.

Precision, autonomy, and the security line

As a CISSP, I keep returning to one part of this conversation: context is an access-control problem. The more context you hand an agent, the more you have to govern what it can see and do. The operators on the panel land in the same place from experience. Wu says customer-facing agents need precision more than autonomy. For sensitive actions such as a bank transfer, or anything that touches a system of record, he argues you are better off putting the logic in code than trusting a model. Jain says the vast majority of Glean's agents still run with a human in the loop, and recommends keeping a human there for any agent that mutates real state, like sending an email or writing a record. A data-enrichment agent that just adds notes to a CRM is safe to run unattended, because a person will see the output anyway.
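One way to put that decision in code rather than in the model is a small approval gate. This is a sketch under stated assumptions: the `Action` type, the `mutates_state` flag, and the `approve` callback are hypothetical, and the reviewers here are stand-ins for whatever review step a team actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    mutates_state: bool  # True for sending email, writing records, moving money
    run: Callable[[], str]


def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run read-only actions directly; hold state-changing ones for a human."""
    if action.mutates_state and not approve(action):
        return f"blocked: {action.name} needs human approval"
    return action.run()


# Hypothetical actions an agent might propose.
search = Action("search_docs", mutates_state=False, run=lambda: "3 documents found")
send = Action("send_email", mutates_state=True, run=lambda: "email sent")


def deny_all(action: Action) -> bool:
    # Stand-in reviewer that rejects everything; a real one would ask a person.
    return False


def approve_all(action: Action) -> bool:
    # Stand-in reviewer that accepts everything.
    return True


results = [
    execute(search, deny_all),
    execute(send, deny_all),
    execute(send, approve_all),
]
```

The check lives in `execute`, not in the prompt, so the model cannot talk its way past it. Which actions count as low-risk enough to skip review is a per-team judgment set in the flag.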

Translate that into a few rules for your own build:

Least privilege for context. Give an agent the context the task needs, not read access to everything. Permission-aware retrieval, where the agent only sees what the user is allowed to see, is the model to copy.
Keep secrets out of the context window. Anything you put in the context can end up in a log, so credentials and keys do not belong there.
Keep a human in the loop for any action that changes data or leaves the company.
Keep your context portable. Jain's point that a customer's data belongs to the customer, not to the app holding it, is also a buying rule: prefer tools that let your context flow, and avoid the ones that lock it in.
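The first two rules can be sketched together in Python. Everything here is illustrative: the `Document` type, the group names, and the redaction pattern are assumptions, and the regex is deliberately naive. A real deployment would rely on the source systems' own permissions and a dedicated secret scanner.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Document:
    text: str
    allowed_groups: FrozenSet[str]


# Illustrative pattern only: matches "api_key=..." or "password: ..." style pairs.
SECRET_PATTERN = re.compile(r"(?i)\b(api[_-]?key|password|token)\b\s*[:=]\s*\S+")


def redact(text: str) -> str:
    # Keep the field name, drop the value, so the secret never reaches the prompt or a log.
    return SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", text)


def retrieve(docs: List[Document], user_groups: FrozenSet[str]) -> List[str]:
    # Permission-aware: the agent sees a document only if the user could.
    visible = [d for d in docs if d.allowed_groups & user_groups]
    return [redact(d.text) for d in visible]


docs = [
    Document("Refund policy: 30 days.", frozenset({"support", "sales"})),
    Document("Payroll export, password: hunter2", frozenset({"finance"})),
    Document("Deploy notes, api_key=abc123 for staging", frozenset({"support"})),
]

# An agent acting for a support user never sees the finance document,
# and the key in the deploy notes is masked before it enters the context.
context = retrieve(docs, user_groups=frozenset({"support"}))
```

Filtering happens before redaction on purpose: a document the user cannot see should never be read into memory for the agent at all, masked or not.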

What to do this week

Pick your single most important agent or workflow and write down, in plain language, the context a good human would need to do that job: the documents, the past examples, the rules. That list is your spec.
Capture it from reality, not memory. Pull the actual tickets, contracts, playbooks, or transcripts your best people use instead of writing an idealized prompt.
Route the agent through one context source, not five hard-coded prompts. Even a single shared store beats per-bot wiring.
Give the agent the least context that does the job, and turn on human-in-the-loop for anything that writes data or contacts a customer.
Wire the change into an eval so you can tell whether more or different context improved the output.
Before you add a second agent, confirm the first one has the context it needs. One well-fed agent beats ten starving ones.
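The eval step in that list can be as small as the sketch below. `toy_agent` is a stand-in for a real model call, and the cases and context lines are invented for illustration. The part worth copying is the shape: a fixed set of cases, one score, and a comparison between two context sets.

```python
import re
from typing import Callable, Dict, List, Set

# Each case pairs a task with a phrase a correct answer must contain.
CASES: List[Dict[str, str]] = [
    {"task": "What is the refund window?", "expect": "30 days"},
    {"task": "Who approves discounts over 20%?", "expect": "VP of Sales"},
]


def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_agent(task: str, context: List[str]) -> str:
    # Stand-in for a model: answer with the context line that shares the most
    # words with the task, or admit ignorance when nothing overlaps.
    best = max(context, key=lambda line: len(words(line) & words(task)), default="")
    if not words(best) & words(task):
        return "I don't know"
    return best


def score(agent: Callable[[str, List[str]], str], context: List[str]) -> float:
    """Fraction of cases whose answer contains the expected phrase."""
    hits = sum(
        1 for case in CASES
        if case["expect"].lower() in agent(case["task"], context).lower()
    )
    return hits / len(CASES)


refund_doc = "The refund window is 30 days."
discount_doc = "Discounts over 20% need approval from the VP of Sales."

baseline_score = score(toy_agent, [refund_doc])
candidate_score = score(toy_agent, [refund_doc, discount_doc])
```

Here the second context set answers both cases and the first answers one, so the added document earned its place. Run the same comparison whenever you add or remove context, and keep the change only if the score holds or improves.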

The model is now the commodity. The hard part, the part that is actually your company's, is the context only you have and how you feed it to the machine. Building that into how your startup operates is what the AI Operating System for Startups is about.

Sources

The Enterprise Brain for AI Agents with Glean and Cresta (Greylock's Change Agents series), the conversation this article distills.
Enterprise Context for Glean's AI Agents (Greylock), the focused clip on bringing the right context to the model.
Profiles: Arvind Jain, founder and CEO of Glean, and Ping Wu, CEO of Cresta and a co-founder of Google's Contact Center AI.
On context as the new programming: Andrej Karpathy at Sequoia's AI Ascent, distilled in my piece on moving from vibe coding to agentic engineering.