
AI coding agents do not always fail because they lack context. On longer implementation tasks, they can fail because the active context contains too many competing versions of the requirement. The operational problem is not context length by itself. It is context authority.
I saw this while using a coding agent to build a feature that produced audit events across a multi-step workflow.
The feature itself was normal product work. A user moved through a business workflow, and different parts of that workflow needed to create audit records. Some events were caused directly by an authenticated user action. Others happened later, inside internal workers or event handlers, as the system kept processing the workflow.
Those later events were still part of the same business process, but they were no longer happening inside the original user request. That distinction was important.
The audit trail needed to show what happened, which tenant it happened in, what business object was affected, and whether the action should be attributed to a human user or to the system. The worker path could not pretend that a logged-in user existed. But the system also could not drop attribution just because the event was internal.
Each time I clarified one path, the agent handled it and disturbed the other. When I corrected the second, it returned to the first.
My first instinct was to keep explaining.
That made the session longer. Not clearer.
We have learned to treat missing context as one of the main reasons AI coding agents fail. That is often true. A coding agent without the right files, repo conventions, data model, API shape, or product intent will usually make a mess. It will invent patterns, miss boundaries, and produce changes that are locally plausible but wrong for the system.
But longer product work exposes a different failure mode.
Sometimes the agent does not need more information or a larger context window. It already has the information. The problem is that the information exists as a history of competing corrections rather than one coherent model of the requirement.
The agent had the context, but not the workflow model
The Ciphrix implementation was not about adding a single audit log call.
It was a feature with a workflow behind it. Multiple operations could emit audit events depending on how the workflow moved. Some operations were user-facing. Others were internal continuations of the same business process.
A simplified shape looked like this:
- A user initiates or changes a business object inside a tenant.
- The API layer authenticates the user and authorizes the external request.
- The system records an audit event for that user-initiated action.
- Internal processing continues through a worker or event handler.
- That internal step may update related state or emit another business event.
- The system records another audit event, but this one should be attributed to the system process, not to a fabricated user.
The cases were linked. They were not two unrelated features.
A single workflow could include both human-attributed and system-attributed events. Tenant context had to survive across both paths. The affected business object had to remain consistent. The audit trail had to remain meaningful across the whole workflow, not just at the initial API boundary.
This is where the coding agent struggled.
The first implementation path treated attribution as if it always meant an authenticated user. That made sense for the API path. It did not make sense for the worker path.
So I clarified that the internal worker had no human user and must not invent one.
The agent adjusted, but it went too far in the other direction. It weakened the actor information, which created a different problem: system-triggered actions still needed meaningful attribution. We were not trying to create anonymous audit records. We were trying to represent system activity honestly.
So I clarified that system activity must remain auditable.
The agent then moved back toward a human-style actor model.
The loop began.
The agent was not hallucinating random architecture. It was reacting to real constraints from the feature. The problem was that it kept treating the latest correction as a replacement for the previous one.
“No fabricated user” became “less attribution.”
“Still auditable” became “maybe it needs a user after all.”
The missing concept was not in the codebase. It was in the model of the workflow: attribution and authentication are related, but they are not the same thing.
A system action can be explicitly attributed without pretending it was performed by an authenticated human.
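A minimal sketch of that distinction, assuming a Python codebase and entirely hypothetical names (`UserActor`, `SystemActor`, `AuditEvent`, `describe`, and the example identifiers are illustrations, not the real implementation). The point is only that the actor is a typed value with two honest variants, so the worker path never needs to borrow a user identity:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UserActor:
    """A human actor; only the authenticated API path should create this."""
    user_id: str


@dataclass(frozen=True)
class SystemActor:
    """An internal process; carries no user id, roles, or permissions."""
    process: str


Actor = Union[UserActor, SystemActor]


@dataclass(frozen=True)
class AuditEvent:
    tenant_id: str  # tenant context survives both paths
    object_id: str  # the affected business object
    action: str
    actor: Actor    # always present, never anonymous


def describe(event: AuditEvent) -> str:
    """Render attribution without implying authentication."""
    if isinstance(event.actor, UserActor):
        who = f"user:{event.actor.user_id}"
    else:
        who = f"system:{event.actor.process}"
    return f"{event.tenant_id}/{event.object_id} {event.action} by {who}"


# Hypothetical events from one workflow: same tenant, same object, different actors.
api_event = AuditEvent("tenant-1", "order-42", "order.updated", UserActor("u-7"))
worker_event = AuditEvent(
    "tenant-1", "order-42", "order.reconciled", SystemActor("reconcile-worker")
)
print(describe(api_event))
print(describe(worker_event))
```

Both events share the tenant and the business object, and each carries an explicit actor. Neither requires the worker to invent a logged-in user.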
Coding-agent context accumulates faster than truth
A working session records the full path through a problem.
That path is messy by design. You start with an assumption. The agent makes a change. You review it. An edge case appears. You correct the agent. It finds another affected path. A test fails. A workaround is proposed. You reject it. A more accurate boundary emerges. The product requirement sharpens.
That is normal engineering work.
The issue is that the active context now contains many different kinds of information: the current requirement, old assumptions, rejected options, temporary workarounds, failed implementations, debugging clues, final decisions, and unrelated discoveries that happened while searching through nearby files.
To a human operator, those things have different authority.
We remember that one statement was a correction to an earlier mistake. Another was a local constraint. Another was a settled invariant. Another was just part of the investigation.
The context window does not naturally arrange them that way. Claude Code’s context window documentation for coding agents is useful here because it makes the working set visible: instructions, files, model responses, and tool content all become part of what the session is carrying. The agent sees a sequence and has to infer what is still true, what has been replaced, which instruction was local, which rule is universal, and whether the latest correction replaces or complements the earlier one.
The longer the session runs, the more that inference matters. Anthropic’s platform documentation describes the context window as the model’s working memory, which is the right frame for coding work: what remains active influences what the agent can notice, reuse, and accidentally reopen.
This is why “just add more context” can start working against the task. More material may help when the agent lacks information. But once the agent is carrying multiple active versions of the requirement, another correction can make the session more complete and less clear at the same time.
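One way to picture the difference between sequence and authority is to label each statement with the role it plays. The sketch below is an assumption for illustration only: the `Authority` categories, the `Note` type, and the filtering rule are my own framing, not a feature of any agent tooling.

```python
from dataclasses import dataclass
from enum import Enum


class Authority(Enum):
    INVARIANT = "invariant"          # settled, applies everywhere
    LOCAL = "local constraint"       # true for one path only
    SUPERSEDED = "superseded"        # replaced by a later decision
    INVESTIGATION = "investigation"  # debugging history, not a rule


@dataclass
class Note:
    text: str
    authority: Authority
    scope: str = "all"


def active_notes(notes, scope):
    """Keep only what should steer a decision in the given scope."""
    keep = []
    for note in notes:
        if note.authority is Authority.INVARIANT:
            keep.append(note)
        elif note.authority is Authority.LOCAL and note.scope == scope:
            keep.append(note)
    return keep


notes = [
    Note("Tenant context must be preserved across the workflow", Authority.INVARIANT),
    Note("Internal workers have no authenticated user", Authority.LOCAL, scope="worker"),
    Note("Attribution always means an authenticated user", Authority.SUPERSEDED),
    Note("Stack trace from the first failed attempt", Authority.INVESTIGATION),
]

worker_view = [n.text for n in active_notes(notes, "worker")]
api_view = [n.text for n in active_notes(notes, "api")]
```

A transcript holds all four notes with equal weight. A human reader applies something like `active_notes` without thinking about it; the agent has to infer it from order alone.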
Relevant context can still become noise
Context pollution is often described as irrelevant information entering the prompt.
That is the obvious version: a long log dump, an unrelated file, a previous task that never left the session, or a stack trace from an issue that has already been fixed.
But the more dangerous version is related context that is no longer useful for the next decision.
A rejected design is still relevant to the history of the task. An old failure is still relevant to the investigation. An earlier implementation is still relevant to understanding what changed. A correction is still valid within one use case. A broad product discussion may still be related to the feature.
But each of those can become harmful if it keeps competing with the settled requirement.
In the workflow-audit case, all of these statements were relevant:
- Audit records need attribution.
- User-initiated events should be attributed to the authenticated user.
- Internal workers have no authenticated user.
- Workers must not fabricate identity.
- System activity must still be auditable.
- Tenant context must be preserved across the workflow.
- API-layer authorization should not be re-created inside the internal handler.
None of those statements was noise in the ordinary sense. Each was true. The problem was that the conversation had not consolidated them into one actor model for the whole workflow.
The agent kept moving between them as if they were competing instructions.
That is what polluted context looks like in real coding work. It is not always a pile of unrelated content. Sometimes it is a pile of true statements without a clear order of authority.
Information can be relevant to the story of the task without being useful for the next decision. That is the practical side of context engineering for AI agents: not stuffing the prompt with everything available, but deciding which tokens should be active for the decision in front of the agent.
Corrections are not the same as a specification
Conversational coding workflows naturally become reactive.
“That case should work differently.”
“Do not add authentication there.”
“But the event still needs attribution.”
“Do not lose the tenant.”
“Do not change the API layer.”
“That earlier approach was wrong.”
This is a reasonable way to steer a human engineer in real time, especially when both people share enough background to infer the conceptual model behind the corrections.
With an agent, the same sequence can degrade into chronological instruction-following.
The latest correction receives too much weight. The earlier requirement is treated as less current. The agent may describe both cases accurately and still implement only the newest correction.
A specification does something different.
It states the complete set of valid cases. It separates concepts that must remain separate. It names the invariants that apply across paths. It makes the chosen model explicit. It defines the boundaries of the task.
That is why the clean session worked.
The original conversation described how the model was discovered.
The new prompt described the model itself:
- Build the workflow audit behaviour across both user-facing and internal execution paths.
- User-initiated business actions should create audit records attributed to the authenticated user within the correct tenant context.
- System-initiated business actions inside workers or event handlers should also create audit records, attributed explicitly to the system or internal process, without fabricating user authentication, roles, or permissions.
- Both paths must produce valid business audit records. Tenant context must always be preserved. Internal workers must not pretend to be authenticated users. Authorization remains at the external/API boundary. Attribution and authentication are related, but not identical. The audit system must represent both human and system actors deliberately.
Presented this way, the agent implemented the behaviour successfully.
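One way to make a specification like this checkable is to express its invariants as a validator over audit records. The sketch below is illustrative only: the dictionary record shape, the field names, and `check_audit_record` are assumptions, not the code the agent produced.

```python
from __future__ import annotations


def check_audit_record(record: dict) -> list[str]:
    """Return the invariants a record violates (an empty list means valid)."""
    problems = []
    if not record.get("tenant_id"):
        problems.append("tenant context missing")

    actor = record.get("actor") or {}
    kind = actor.get("kind")
    if kind == "user":
        if not actor.get("user_id"):
            problems.append("user actor without user id")
    elif kind == "system":
        if not actor.get("process"):
            problems.append("system actor without process name")
        # A system actor must not borrow authentication fields.
        for field in ("user_id", "roles", "permissions"):
            if field in actor:
                problems.append(f"system actor fabricates {field}")
    else:
        problems.append("actor missing or of unknown kind")
    return problems


good_user = {"tenant_id": "t1", "actor": {"kind": "user", "user_id": "u1"}}
good_system = {"tenant_id": "t1", "actor": {"kind": "system", "process": "worker"}}
fabricated = {
    "tenant_id": "t1",
    "actor": {"kind": "system", "process": "worker", "user_id": "u0", "roles": ["admin"]},
}
anonymous = {"tenant_id": "t1", "actor": None}
```

The two failure directions from the original session map onto the last two records: `fabricated` is the worker pretending to be a user, and `anonymous` is attribution being dropped altogether.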
The codebase had not become easier. The model had not suddenly become more capable. The underlying requirements had not changed.
Or, more accurately, the requirements had changed during discovery, but they were no longer being presented as discovery.
What changed was the form of the active context.
The model had enough information in both sessions. In the first, that information arrived as a history of disagreement. In the second, it arrived as a coherent specification.
The warning sign is oscillation
The point where I should have stopped was visible earlier.
The agent had started circling.
It returned to an approach that had already been rejected. Fixing one edge case disturbed another. The same conceptual distinction needed to be explained more than once. The implementation changed direction without converging. Each answer became more elaborate while the actual solution remained unstable.
That pattern is different from ordinary iteration.
Ordinary iteration reduces uncertainty. The patch gets smaller. The tests get more focused. The remaining failure becomes narrower. The agent starts preserving decisions from earlier turns.
Circling does the opposite.
The diff becomes harder to review. Correct code from two iterations earlier gets rewritten. Nearby abstractions get pulled into scope. The agent starts solving the problem again instead of finishing the agreed change. It explains the distinction correctly, then implements a patch that only respects half of it.
This is usually the point where another correction feels tempting. You think: fine, I will explain the missing bit one more time.
And sometimes that works.
But the real issue may be that the session now contains too many partially valid versions of the case.
Once the agent begins oscillating between valid requirements, it is no longer useful to treat the problem as a missing-instruction problem.
It has become a context-structure problem.
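The clearest signal, a return to an approach that had already been abandoned, can be sketched as a small check. This assumes the reviewer labels the approach taken in each iteration by hand; the labels and the `is_circling` helper are hypothetical, and the heuristic deliberately ignores the softer signs such as growing diffs.

```python
from __future__ import annotations


def is_circling(approaches: list[str]) -> bool:
    """True if the session returns to an approach it had already left."""
    abandoned = set()
    previous = None
    for current in approaches:
        if current in abandoned:
            return True
        if previous is not None and current != previous:
            # Moving to a different approach means the previous one was left behind.
            abandoned.add(previous)
        previous = current
    return False


# Hypothetical labels for the audit session: the third turn reopens the first approach.
oscillating = ["user-style actor", "weakened actor", "user-style actor"]
# Ordinary iteration: refine the same approach, then move on once.
converging = ["user-style actor", "user-style actor", "typed actor"]
```

Here `is_circling(oscillating)` is true and `is_circling(converging)` is false, which matches the difference between circling and ordinary iteration described above.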
A clean session fixed what more explanation could not
The manual fix was simple.
I stopped the active implementation and reconstructed the requirement outside the session.
That meant separating genuine requirements from debugging history, failed approaches, speculative ideas, and local workarounds. It also meant resolving the apparent contradiction that the agent had been stuck on.
In this case, the collapsed concepts were clear.
Attribution is not identical to authentication. A system actor is not a fabricated human. Tenant context does not imply user context. Audit recording is not the same as authorization. Workflow continuity does not mean every event shares the same actor.
Once those distinctions were explicit, the implementation packet became much smaller.
The fresh session did not need the whole transcript. It needed the resolved requirements, relevant files, architectural constraints, explicit non-goals, and success checks.
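As a rough sketch, that packet can be treated as a small structure rendered into the opening prompt of the fresh session. The `ImplementationPacket` type, its field names, and the file path are hypothetical placeholders rather than the actual packet I used.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ImplementationPacket:
    requirements: list[str]
    relevant_files: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)
    non_goals: list[str] = field(default_factory=list)
    success_checks: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Render non-empty sections as plain text for a fresh session."""
        sections = [
            ("Requirements", self.requirements),
            ("Relevant files", self.relevant_files),
            ("Constraints", self.constraints),
            ("Non-goals", self.non_goals),
            ("Success checks", self.success_checks),
        ]
        lines = []
        for title, items in sections:
            if not items:
                continue  # empty sections are left out entirely
            lines.append(f"{title}:")
            lines.extend(f"- {item}" for item in items)
            lines.append("")
        return "\n".join(lines).strip()


packet = ImplementationPacket(
    requirements=[
        "User-initiated actions are attributed to the authenticated user",
        "System-initiated actions are attributed explicitly to the system process",
    ],
    relevant_files=["audit/recorder.py"],  # hypothetical path
    non_goals=["Do not change API-layer authorization"],
    success_checks=["Worker path emits an audit record with a system actor"],
)
prompt = packet.render()
print(prompt)
```

Nothing in the structure has a slot for rejected approaches or debugging history, which is the point: if something does not fit one of these fields, it probably belongs in my notes rather than in the execution context.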
The old session remained useful for me. It contained the path we had taken through the problem. It included mistakes that helped expose the real boundary. It showed which approaches failed.
But that history did not need to remain active in every implementation decision.
Resetting the session is not the same as forgetting organisational knowledge. It is moving historical material out of the execution context.
The agent should not have to carry every failed attempt in order to implement the final decision.
More recently, I used subagents to partition context
I ran into a similar pattern later, but handled it differently.
The details were less clean than the workflow-audit example, but the shape was familiar. One agent was dealing with multiple distinct use cases. The concerns began interfering with each other. More clarification was not creating convergence.
This time, rather than manually abandoning the whole session, I asked the agent to split the problem across focused subagents.
Each subagent received one concern, limited context, and a narrow analysis boundary. The parent agent kept the broader coordination role and was responsible for reconciling the outputs.
That was the important progression.
The first time, I manually created a clean context.
More recently, I used orchestration to create several clean contexts.
A fresh session and a subagent are not the same mechanism, but they address the same underlying problem: not every concern needs access to the entire history. Distinct concerns can often be reasoned about independently. This is one of the useful properties of Claude Code subagents: the delegated coding work happens in its own context and returns results to the main conversation. The parent agent should receive conclusions, invariants, edge cases, and integration constraints, not every exploratory step.
This is not an argument that every task should be split across agents. Partitioning can create its own problems if the boundaries are poor. Two subagents can produce incompatible designs. They can optimise for their local case and leave integration unresolved. They can duplicate work and make the parent’s job harder.
The pattern only helps when the concerns are genuinely separable.
For example, one subagent can analyse the user-attributed path and another can analyse the system-attributed path. Each can return invariants, edge cases, required changes, and integration constraints. The parent can then create the unified design.
The value is not that multiple agents are more impressive than one.
The value is that each context has a cleaner job.
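The reconciliation step for the user-path and system-path split can be sketched as plain data handling. This is an assumption about what a structured return might look like, not the format of any particular subagent feature: `Findings`, `reconcile`, and the file paths are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Findings:
    concern: str
    invariants: set[str]
    required_changes: dict[str, str]  # file path -> summary of proposed change


def reconcile(results: list[Findings]):
    """Merge invariants and report files where proposals differ between concerns."""
    invariants = set()
    proposals = {}
    for result in results:
        invariants |= result.invariants
        for path, change in result.required_changes.items():
            proposals.setdefault(path, {})[result.concern] = change
    # A file is contested when more than one distinct change is proposed for it.
    conflicts = {
        path: by_concern
        for path, by_concern in proposals.items()
        if len(set(by_concern.values())) > 1
    }
    return invariants, conflicts


user_path = Findings(
    "user path",
    {"tenant context is preserved"},
    {"audit/recorder.py": "accept a user actor"},
)
system_path = Findings(
    "system path",
    {"tenant context is preserved", "no fabricated user in workers"},
    {
        "audit/recorder.py": "accept a system actor",
        "workers/reconcile.py": "emit audit event",
    },
)
invariants, conflicts = reconcile([user_path, system_path])
```

The contested file is exactly where the parent has to do design work: both subagents want to change the recorder, and the unified answer is one actor model that serves both paths.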
Reset, partition, or continue the coding-agent session?
I am starting to use a simple decision rule.
Continue the current session when the agent is still converging. It may have small mistakes, but it is preserving the core model. Each correction narrows the remaining work. The diff remains reviewable. The same settled decision is not being reopened.
Restart with a clean specification when the requirements are now understood, the implementation is bounded, and the current session is carrying too much failed history. This works well when one coherent model can represent the whole task and there is no real benefit in preserving the reasoning process inside the execution context.
Use focused subagents when the task contains distinct concerns that can be analysed independently, but the parent still needs to coordinate the whole change. This is useful when verbose investigation would pollute the main context and when the outputs can be returned in a structured form for reconciliation.
The main mistake is to use subagents as a way of creating more activity.
The goal is not more agents. It is cleaner authority.
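Written down as a function, the rule looks roughly like this. The boolean inputs are judgments the operator makes, and the order of the checks and the fallback are my assumptions; the rule itself does not say which option wins when both a reset and a partition would be reasonable.

```python
def next_move(
    converging: bool,
    requirements_understood: bool,
    one_coherent_model: bool,
    separable_concerns: bool,
) -> str:
    """Suggest how to proceed with a coding-agent session."""
    if converging:
        # Corrections are narrowing the work and settled decisions stay settled.
        return "continue"
    if requirements_understood and one_coherent_model:
        # The task fits one specification; failed history adds nothing.
        return "restart with a clean specification"
    if separable_concerns:
        # Distinct concerns can be analysed apart and reconciled by a parent.
        return "partition into focused subagents"
    # Assumed fallback: stop correcting and consolidate the requirement first.
    return "stop and consolidate the requirement"
```

For the workflow-audit session, the inputs would have been not converging, requirements understood, and one coherent actor model, which gives the restart.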
Each working context should have a clear responsibility. One agent should not be asked to carry the full product discussion, investigate every edge case, implement the database changes, update the API, modify the frontend, adjust tests, and review its own work while also remembering which earlier ideas were rejected.
Sometimes that works.
When it does not, the failure is expensive because everything is mixed together.
Planning, debugging, implementation, and review all blur into one long thread. When the result fails, it is hard to tell why. Was the requirement incomplete? Did the agent misunderstand a bounded concern? Did it reopen a decision? Did the implementation violate an invariant? Did an old workaround leak into the final patch?
Cleaner contexts make failure easier to diagnose, at least most of the time.
The goal is not less context. It is cleaner direction
This is where the argument is easy to misread.
I am not arguing for vague prompts. I am not arguing that agents should work from one-line instructions. I am not arguing that long context windows are bad.
Coding agents need context. For real product work, they often need a lot of it.
They need to know the objective. They need the valid use cases. They need the invariants. They need the current behaviour. They need the relevant files, schemas, services, tests, and conventions. They need decisions already made. They need security, tenancy, compatibility, and layering constraints. They need non-goals. They need success checks.
What they do not always need is the full history of how those things were discovered.
Planning and discovery may require broad context. Focused analysis may require one use case and a relevant code path. Implementation may require the resolved specification and nearby files. Review may require the original specification, the resulting diff, test results, and risk areas.
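A small sketch of that idea, with the role names and material keys invented for illustration (none of them come from a real tool): each role declares what it needs, and everything else stays out of its context.

```python
# None means "no filter": planning and discovery may see everything available.
CONTEXT_BY_ROLE = {
    "planning": None,
    "analysis": {"use_case", "code_path"},
    "implementation": {"resolved_spec", "nearby_files"},
    "review": {"resolved_spec", "diff", "test_results", "risk_areas"},
}


def select_context(role: str, available: dict) -> dict:
    """Return only the materials the given role should carry."""
    wanted = CONTEXT_BY_ROLE[role]
    if wanted is None:
        return dict(available)
    return {key: value for key, value in available.items() if key in wanted}


# Hypothetical materials; the values stand in for real content.
available = {
    "transcript": "full discovery conversation",
    "resolved_spec": "consolidated requirement",
    "nearby_files": "audit module and worker handler",
    "diff": "resulting patch",
    "test_results": "latest test run",
    "risk_areas": "tenancy and attribution",
}

implementation_context = select_context("implementation", available)
review_context = select_context("review", available)
```

The discovery transcript is available to planning but never reaches implementation or review, which is the separation the fresh session created by hand.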
There is no universal context bundle.
Context should change with the job. Google’s production guidance for multi-agent systems describes a similar operating rule as “scope by default”: give each model call or subagent the minimum context required, and let it reach for more through tools when needed.
That is the practical operating model I am beginning to trust.
Let the initial agent explore the messy problem. Watch for repeated oscillation between requirements. Stop adding corrections once the session begins circling. Extract the valid use cases and invariants. Separate resolved decisions from investigation history. Then either restart with a clean consolidated specification or partition distinct concerns into focused subagents.
Give each execution context only the information needed for its current role. Have the parent retain current conclusions and integration constraints, not every exploratory trace. Review the implementation against the consolidated requirement. Turn adjacent concerns into separate tasks rather than silently enlarging the current one.
The workflow-audit implementation was a small example, but it changed how I think about agentic coding work.
The fresh session did not have more information than the original one. It had less.
But what remained was authoritative: the workflow, the valid attribution paths, the distinction between authentication and attribution, and the invariants they shared.
More recently, splitting concerns into focused subagents was a more deliberate version of the same intervention.
In one case I reset the context. In the other I partitioned it.
Both worked because the agent no longer had to carry the full history of the disagreement into every new decision.
The context window is often treated like storage: the more we can fit into it, the more capable the agent should become.
I am starting to think it is better understood as a working environment.
What we leave inside it affects what the agent notices, reopens, and changes.
Coding agents do not always need the complete history of the task. They need a trustworthy representation of what is now true.

