three things I won't let an AI agent do without my confirmation

I run agents across most of my operational work at NDC. They generate tickets, write documentation, route tasks, draft content, enforce standards, and keep track of what happened in the last session so the next one picks up where it left off. They handle a lot.

There are three things they do not handle without me confirming first. The list is short on purpose. I want agents moving fast on the routine stuff. Friction where it does not need to exist is just friction. The confirmation gates only exist where the downside of a wrong action is genuinely asymmetric: the cost of being wrong is much bigger than the cost of pausing for two seconds.

irreversible destructive actions

The first category is anything you cannot undo. Deleting files. Dropping database tables. Force-pushing to a remote branch. Running git reset --hard. Removing a dependency. Wiping a config.

An agent can suggest any of those. I confirm before it runs. Every time, without exception.

The reason is simple. A wrong reversible action costs me a few minutes to fix. A wrong irreversible action costs me permanently, or costs me the time it takes to reconstruct something from a backup, assuming a backup exists. There is no version of the math where saving two seconds of confirmation is worth that downside. The asymmetry is the whole argument.

This also keeps me honest about what the agent is actually doing. If I have to confirm a destructive operation, I have to actually read what the operation is. That's a feature, not a tax.

anything that reaches another person

The second category is any output that goes from the agent to a human being. Email. A message to a teammate. A reply to a client. A comment on a pull request. Anything posted to social media.

The moment another person reads the output, it carries my name. My agent's mistake becomes my message. I do not let agents write external-facing communication on my behalf. What they can do is surface what I should remember to address: a point from an earlier thread, a deliverable I committed to, a question that is still open from the last conversation. They put the outline in front of me. I write the actual message.

This matters most with client-facing communication. Clients are not in a position to distinguish between something I wrote and something an agent wrote on my behalf. They would read it as me. If the agent got the tone wrong, or misremembered a detail from context, or summarized a decision incorrectly, that would be my error to own. I would rather spend the time writing the message myself than spend thirty minutes walking back a version that landed wrong.

The same logic applies to anything with a teammate. The working relationship depends on communication that actually came from me. The agent helps me remember what to say. It does not say it.

spending money or calling paid APIs above a threshold

The third category is compute and cost. Calling an LLM on a large context window. Hitting a per-call-priced API beyond what the task should reasonably need. Spinning up cloud resources that bill by the hour.

An agent that gets stuck in a retry loop, or that does not understand that a task is scoped, can run up a real bill without any visible sign that something has gone wrong. The only protection against that is a gate. The gate is task-specific. I set what I expect a normal run to cost. If the agent is about to go meaningfully above that, it stops and waits for me to confirm.

The principle is not about saving every dollar. The principle is that runaway cost is almost always a symptom of runaway behavior. Either the task scope expanded beyond what I intended, or the agent is doing something I did not ask it to do, or it is retrying a failed operation it cannot fix. Any of those is worth stopping for.

the frame behind the list

I think about these three categories the same way I think about what I would and would not let a capable junior team member do without checking with me first.

A capable junior team member can handle most of the work. That's the point of having them. I'm not going to review every ticket they write or every line of code they commit. That defeats the purpose. Where I want to stay in the loop is the same three places: anything that cannot be undone, anything that goes out with my name on it, and anything that spends money beyond the budget I set.

That frame has held up across six months of building and running NORD. The list has not changed. The threshold values have been adjusted as I've learned what normal looks like. The categories themselves have stayed the same since I first wrote them down.

The real discipline is not adding to the list reactively. Every time an agent does something I did not expect, the instinct is to add a new confirmation gate. I have learned to wait. The rule I actually use: a new gate goes in only when the same issue shows up more than once and proves the system needs tightening there. One-off surprises usually get fixed at the instruction, context, or enforcement layer instead. Piling up confirmation requirements until the agent is asking for permission on everything defeats the whole point.

Three gates. That's the number I've landed on. It seems to be enough.