keeping my agents in the box and on the path

My agents run in a sandbox. For a while I treated that as the whole security story. The sandbox keeps an agent from writing files outside its home directory, and I had spent real effort making that boundary hold, so I felt covered.

I was not covered. A sandbox answers one question: where an agent is allowed to write. It says nothing about two others: which agent is allowed to do a given piece of work, and whether an agent is actually the one it claims to be. Those turned out to be separate problems from containment, with separate failures, and each one needed its own defense. This is what that looked like.

The first problem is escape. Can an agent write outside the box?

The defense here runs before any file write happens. A check sits in front of the write and asks where the file is actually going. It resolves symlinks to their real target first, so a link that points out of the sandbox does not sneak a write past the boundary by looking local. It refuses the tricks that hide a write inside something else: an interpreter one-liner, a shell construct that changes directory partway through, a path that climbs out with a string of parent references. It also protects the system's own enforcement files, so an agent cannot disable the very checks that are watching it.
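A minimal sketch of the path part of that check, in Python. The function name, its arguments, and the protected-file list are made up for illustration. It covers only symlinks, parent references, and protected files; spotting a write hidden inside a shell command or an interpreter one-liner is a separate job that this does not attempt.

```python
from pathlib import Path


def is_write_allowed(target, sandbox_root, protected=()):
    """Return True only if the real destination stays inside the sandbox
    and is not one of the protected enforcement files.

    `protected` holds paths relative to the sandbox root (an assumption
    made for this sketch).
    """
    root = Path(sandbox_root).resolve()

    # Relative paths are interpreted against the sandbox root.
    candidate = Path(target)
    if not candidate.is_absolute():
        candidate = root / candidate

    # resolve() follows symlinks and collapses ".." segments, so the
    # comparison below is against where the write would really land.
    # The target file itself does not need to exist yet.
    real = candidate.resolve()

    if not real.is_relative_to(root):  # Python 3.9+
        return False

    protected_real = {(root / p).resolve() for p in protected}
    return real not in protected_real
```

The important design choice is the order: resolve first, compare second. Comparing the path as typed is what lets a local-looking symlink through.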

I hardened that boundary against nine different ways out and then ran the whole thing live, on purpose, trying to get a write to land outside the box. Nothing got out. A symlink pointing at a file above the sandbox, blocked. A path-traversal trick to read a protected file, blocked. A change-directory-and-climb, blocked before the directory changed. The deny side held, and nothing landed outside the box.

If I had stopped there I would have shipped a system that was contained and still broken, because the second problem has nothing to do with where an agent writes.

The second problem is deviation. Can an agent skip the procedure, and can it lie about who it is?

NORD routes work in tiers. An orchestrator reads a request and hands it to a department head. The department head hands it to the specialist that actually does the work. That order is not decoration. It is how the right agent, with the right permissions, gets matched to the right task. I had it written down as the way the system works.

Written down is not enforced. I found this the honest way, by probing my own system. I asked the orchestrator to get a piece of work done, and instead of routing through a department head to a specialist, it spawned a generic worker, labeled it with the specialist's name in the prompt, and handed it the job directly. The entire tiered procedure was skipped. No department head. No routing check. A general-purpose agent wearing a specialist's name, doing specialist work, because nothing checked the spawn on the way out.

That is two failures wearing one coat. The procedure could be skipped, and an agent could present an identity that was not its own.

I closed the skip with a gate on the spawn itself. A worker cannot start unless it declares a real role that exists on the roster. No role, or a made-up role, and the spawn is refused before it runs. On top of that, a specialist now needs a one-time token to run, and that token only gets minted after a department head has actually replied. No department head in the loop means no token, which means no specialist. The order of operations is enforced by something other than my good intentions.
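Here is a rough sketch of what a gate like that can look like, in Python. The role names, the `SpawnGate` class, and its method names are all invented for this example, and the real gate hooks into the spawn path rather than being called by hand.

```python
import secrets

# Hypothetical roster: role name -> tier.
ROSTER = {
    "orchestrator": "orchestrator",
    "research-head": "head",
    "citation-checker": "specialist",
}


class SpawnGate:
    def __init__(self, roster):
        self.roster = dict(roster)
        self._tokens = {}  # unspent token -> specialist role it was minted for

    def mint_after_head_reply(self, specialist_role):
        """Call this only once a department head has actually replied."""
        if self.roster.get(specialist_role) != "specialist":
            raise ValueError(f"not a specialist on the roster: {specialist_role!r}")
        token = secrets.token_hex(16)
        self._tokens[token] = specialist_role
        return token

    def may_spawn(self, role, token=None):
        """Return True if the spawn may proceed, False if it is refused."""
        tier = self.roster.get(role)
        if tier is None:
            return False  # no role, or a made-up role
        if tier != "specialist":
            return True
        # Specialists need a token. pop() spends it, so it cannot be replayed.
        minted_for = self._tokens.pop(token, None)
        return minted_for == role
```

With this in place, `may_spawn("citation-checker")` is refused until `mint_after_head_reply("citation-checker")` has produced a token, and the same token is refused the second time it is presented.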

I closed the identity lie by moving where the identity comes from. The worker used to assert who it was by filling in a field that the worker itself controlled. Any agent could write any name into that field and get a valid claim back. That is not identity. That is a name tag you write yourself. I moved the minting up to the dispatcher instead. At the moment work is handed out, the dispatcher stamps the real identity of the agent it is sending to. The worker receives that stamp. It cannot forge a different one, and if the claim in its payload disagrees with the stamp, the work is rejected.
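One way to build a stamp the worker cannot forge is a keyed hash, where only the dispatcher holds the key. That choice is an assumption for this sketch, as are the `Dispatcher` class and the `agent_id` and `task_id` field names. The idea is the same as above: the identity is minted at dispatch time, and a payload that disagrees with it is rejected.

```python
import hashlib
import hmac
import secrets


class Dispatcher:
    def __init__(self):
        # The key lives with the dispatcher and is never handed to a worker.
        self._key = secrets.token_bytes(32)

    def stamp(self, agent_id, task_id):
        """Stamp the identity of the agent this task is being sent to."""
        message = f"{agent_id}\x00{task_id}".encode()
        return hmac.new(self._key, message, hashlib.sha256).hexdigest()

    def accept(self, payload, stamp):
        """Accept work only if the identity claimed in the payload
        matches the identity that was stamped at dispatch time."""
        expected = self.stamp(payload.get("agent_id", ""), payload.get("task_id", ""))
        # compare_digest avoids leaking how many characters matched.
        return hmac.compare_digest(expected, stamp)
```

A worker can still write any name it likes into its payload. It just cannot produce a stamp that agrees with that name, so the lie fails at `accept`.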

There is one piece of this I would lift out and reuse in any system meant to run unattended. When I first built the tokens, they expired on a clock. A short window, a handful of seconds, and then the token was stale. That is fine when a human is watching the whole thing happen in real time. It falls apart the moment the system idles or takes on a long piece of work, which is exactly what I want it to do when I am not there. A token that expires on a clock will expire in the middle of legitimate work.

I replaced the clock with a counter. The counter advances every time a new dispatch begins. A token is valid only if its number matches the current count. It goes stale the instant the next dispatch starts, not after some number of seconds have passed. Freshness stopped being about time and started being about position in the sequence. For a system that might sit idle for an hour, or grind on one job for longer than any timeout I would have picked, that change is the difference between working and failing at random.
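The counter idea fits in a few lines of Python. This is a sketch with an invented class name, not the code I run, and it leaves out persistence and locking.

```python
class DispatchCounter:
    """Freshness by position in the dispatch sequence, not by elapsed time."""

    def __init__(self):
        self._count = 0

    def begin_dispatch(self):
        """Advance the counter and return the number for this dispatch's token."""
        self._count += 1
        return self._count

    def is_fresh(self, token_number):
        """A token is valid only while its number matches the current count."""
        return token_number == self._count
```

Nothing here reads a clock. A token from `begin_dispatch()` stays fresh through an hour of idling or a very long job, and goes stale only when the next `begin_dispatch()` runs.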

The containment work is done. The routing and identity work is built and going into testing as I write this. Putting both of them together taught me a few things I did not have language for before.

Containment and identity are different threat models. Closing one does not close the other. A perfectly sandboxed agent can still skip your rules and impersonate another agent, and a perfectly authenticated agent can still write somewhere it should not. You need both, built separately, because they fail separately.

A procedure written in prose is not an enforcement layer. The tiered routing lived in my documentation for weeks, and my system bypassed it the first time I actually pushed on it. Documentation tells a well-behaved agent what to do. It does nothing to a misbehaving one. If the rule matters, it needs a gate in code, not a paragraph.

A system that is going to run without you watching should not tie its safety to a clock. Long idle time and long work are the normal state of unattended automation, not the exception. Anything that assumes a human is watching in real time will break the moment no one is.

If you are hardening your own agent system, three questions are worth sitting with. Can one of your agents write outside the place you gave it? Can it skip the routing you designed? Can it claim to be a different agent than the one you dispatched? I answered yes to all three about my own system before I answered no. The work is in turning each of those yeses into a no that a misbehaving agent cannot argue with.
