Your Agent's Context Window Is State, Not Input
And there’s a man aboard that ship — position (2,1), crew mess, Deck 2 — that you don’t know about yet, who has been there for nine hours already, who has a keycard to Bay 4, and who is running out of time in a different direction entirely.
Read the parenthetical again.
That’s a Warden (an AI game moderator) narrating story details to a player. The parenthetical includes a raw coordinate from the game’s internal state, leaked verbatim into prose that should never have contained it in the first place. Here’s how it got there, and what it taught me about context windows.
What You’re Looking At
I’m building Zoltar, an AI game moderator for solo tabletop role-playing game (TTRPG) play. Game moderators (GMs) in TTRPGs are responsible for describing the world, voicing non-player characters (NPCs), calling for dice rolls, and tracking what’s true in the fiction as the player moves through it.
Like human GMs, part of Zoltar’s job is to maintain knowledge of the fictional world and keep that knowledge hidden from the player. After all, there’s not much fun in pursuing a mystery if a not-very-clever prompt injection will give you the whodunit. When Zoltar gives away details that the player doesn’t know about, that it knows the player doesn’t know about, it’s failing at its job. It’s not just breaking immersion, it’s obliterating immersion. Saying, “And there’s a man aboard that ship that you don’t know about yet,” is like revealing which actor plays Keyser Söze in the opening credits of The Usual Suspects.
Why It Happened
In the session above, the model maintained the current game state by pattern matching against the prior tool calls in the context window. The model used what it could find — everything it could find — rather than just what it had been instructed to use. It was greedy.
Here’s a simplified version of the state that was included in the context window:
{
  "gmContextStructured": {
    "entities": [
      {
        "id": "npc_vasek_orin",
        "type": "npc",
        "visible": false,
        "startingPosition": {
          "x": 2,
          "y": 1
        }
      }
    ]
  }
}
The model avoided leaking the identifier npc_vasek_orin. The prompt instructed it to keep identifiers hidden from the player, and it did. But it leaked the contents of the data structure, specifically the NPC’s x and y coordinates. It followed the letter of the instruction, but not its spirit.
The message history is a sort of Chekhov’s gun: if you introduce a gun on the wall in act one, the model will use it in act three.
Inject Authoritative State
The obvious fix is to track state separately and inject it on each new request. Luckily, I was already tracking state separately, so injecting it on each new request was just a matter of adding it to the API call.
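Injecting state on each request can be sketched roughly as follows. This is a minimal illustration, not Zoltar’s actual code: the `GameState`, `ChatMessage`, and `ModelRequest` types, the `buildRequest` function, and the wording of the state header are all assumptions made for the example. Only the entity fields mirror the JSON shown earlier.

```typescript
// Hypothetical shapes; only the entity fields come from the article's JSON sample.
interface Entity {
  id: string;
  type: string;
  visible: boolean;
  startingPosition: { x: number; y: number };
}

interface GameState {
  entities: Entity[];
}

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Assumed request shape: a system prompt plus the conversation messages.
interface ModelRequest {
  system: string;
  messages: ChatMessage[];
}

// Build a request whose system prompt carries a fresh snapshot of the
// separately tracked state, so the model never has to infer state from history.
function buildRequest(
  basePrompt: string,
  state: GameState,
  messages: ChatMessage[]
): ModelRequest {
  const stateBlock = JSON.stringify({ gmContextStructured: state }, null, 2);
  const system =
    basePrompt +
    "\n\nAUTHORITATIVE STATE (GM-only, never reveal to the player):\n" +
    stateBlock;
  return { system, messages };
}
```

The snapshot is rebuilt from the tracked state on every call, so the copy the model sees is always the current one rather than something it has to piece together from earlier turns.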
This might have been enough to solve the problem, but the message history was still sitting in the context window. With access to both the message history and the authoritative state, would the model be disciplined about which source to trust? I decided not to find out.
I figured that if state can be reconstructed from the message history, sooner or later it would be. It’s the foot-gun on the wall.
Remove the Unreliable Source
I decided to aggressively prune the message history, dropping all but the most recent N messages. This gives the model enough context to follow the scene and maintain the thread of the conversation, but not enough history to reconstruct the state.
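Keeping only the most recent N messages can be sketched in a few lines. The function name `pruneHistory` and its parameter `keepLast` are hypothetical, and the right value of N is a tuning decision the article does not specify.

```typescript
// Keep only the last `keepLast` messages; everything older is dropped.
// Generic over the message type so it works with any message shape.
function pruneHistory<T>(messages: readonly T[], keepLast: number): T[] {
  if (keepLast < 0 || Math.floor(keepLast) !== keepLast) {
    throw new RangeError("keepLast must be a non-negative integer");
  }
  // slice(-0) would return the whole array, so handle zero explicitly.
  if (keepLast === 0) {
    return [];
  }
  return messages.slice(-keepLast);
}
```

One thing to watch in a tool-use loop: a naive cut can separate a tool call from its result. If the API in use requires those to appear in pairs, the cut point may need to move to the nearest safe boundary.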
However, the model still needs to keep track of everything that has happened in the story. Otherwise, it will describe a corridor as 5 meters long in one breath and as 8 meters long in the next.
To address this problem, I introduced the idea of a canon log [1]. The log contains true statements about the fictional world that the AI derives from its own writing. It’s a log of facts. A human GM maintains this state unconsciously, as part of the storytelling process. The AI has to be told to do it.
Here’s an example opening narration from a playtest where the player is an android, designated LX-7:
The station groans around you — a low, structural complaint from somewhere aft. Emergency lighting bleeds amber across the corridor walls. The air recyclers are running too hard and the air tastes faintly of burnt polymer. A woman stares at you from the far end of the walkway connecting Module 2 to Module 1. Mid-forties, Helix Dynamics lanyard still around her neck… “There’s a comms log on the captain’s terminal. Encrypted… I need someone with your capabilities to pull it up.” A pause. “I need to make sure it’s handled correctly.”
Zoltar broke this down into three facts that it stored as canon:
- LX-7 arrived at Kairos-9 and made first contact with Maren Voss in the walkway connecting Module 2 to Module 1.
- Maren has told LX-7 the crew is gone, the orbit is degrading (~90 minutes), and that a critical encrypted comms log exists on the captain’s terminal in Module 1 that she wants “handled correctly.”
- The walkway between Module 2 and Module 1 is the location of LX-7’s first scene. Emergency amber lighting is active throughout the station.
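A canon log like the one above could be modeled as an append-only list of facts that is rendered back into the prompt on each request. The sketch below is an assumption about shape, not the real implementation: `CanonLog`, `CanonEntry`, the `turn` field, and the exact-match deduplication are all invented for illustration.

```typescript
// One recorded fact, tagged with the turn it was derived from (hypothetical field).
interface CanonEntry {
  turn: number;
  fact: string;
}

// Append-only log of facts about the fictional world.
class CanonLog {
  private entries: CanonEntry[] = [];

  // Record a fact. Returns false for blank or exact-duplicate facts.
  add(turn: number, fact: string): boolean {
    const normalized = fact.trim();
    if (normalized === "") {
      return false;
    }
    if (this.entries.some((e) => e.fact === normalized)) {
      return false;
    }
    this.entries.push({ turn, fact: normalized });
    return true;
  }

  // Number of facts recorded so far.
  count(): number {
    return this.entries.length;
  }

  // Render the log as a bulleted block suitable for inclusion in a prompt.
  render(): string {
    return this.entries.map((e) => "- " + e.fact).join("\n");
  }
}
```

Exact-match deduplication is the simplest possible policy and would not catch two differently worded versions of the same fact. Reconciling those is a harder problem that this sketch leaves alone.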
The canon log is what allowed me to prune the message history, which the model had been treating as implicit state. I replaced the message history with authoritative state because you can’t tell a model to ignore what’s in its context window; you have to not put it there.
Defense in Depth
I also added behavioral guardrails to prevent disclosing raw IDs, coordinates, and other game state to the player. As I described earlier, the model followed the directive to avoid disclosing raw IDs but found a loophole and revealed other bits of state. After this incident, I tightened those protections.
On their own, behavioral guardrails would have failed the same way the identifier rule did. They work because the structural fixes did the heavier lifting first.
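The guardrails described here are prompt-level instructions. A different, complementary layer would be a check on the model’s output before it reaches the player. The article does not describe such a check; the sketch below is purely an illustration of that idea, and `findStateLeaks` and its coordinate pattern are hypothetical.

```typescript
// Scan narration for known hidden identifiers and for raw "(x,y)" coordinate
// pairs. Returns every suspicious substring found; an empty array means clean.
function findStateLeaks(
  narration: string,
  hiddenIds: readonly string[]
): string[] {
  const leaks: string[] = [];

  // Exact-substring check for each identifier that must stay hidden.
  for (const id of hiddenIds) {
    if (narration.indexOf(id) !== -1) {
      leaks.push(id);
    }
  }

  // Parenthesized integer pairs such as "(2,1)" or "( 2, 1 )".
  const coordPattern = /\(\s*-?\d+\s*,\s*-?\d+\s*\)/g;
  const matches = narration.match(coordPattern);
  if (matches !== null) {
    for (const m of matches) {
      leaks.push(m);
    }
  }

  return leaks;
}
```

A pattern check like this catches only the shapes it was written for, which is the same weakness the identifier rule had. It is a backstop at best, and the structural fixes still do the heavier lifting.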
The Context Window Is State
For agents with tool-use loops, the context window isn’t input — it’s state. When you fill the context window with message history, state gets composed implicitly, and implicit state drifts.
Instead of providing the model with a complete history and trusting it to reconstruct state correctly, provide it with authoritative state. Keep it from reconstructing state on its own by aggressively pruning the message history. That has kept Zoltar from leaking its secrets to the player.
Footnotes
1. If you look at the actual code, this log is called proposedCanon for reasons that are out of scope for this article. When writing this article, I simplified the canon mechanics to stay focused on the problem at hand and not get sidetracked. “Proposed canon” will be addressed in a future article.