Watchdog for LLM Interference

Hardware watchdog on a black circuit board, green LEDs, data streams as light trails: an emblem of Sovereign Continuity

Field note 04 — Sovereign Continuity

The agent built the page. I watched it build. And then, somewhere between the seventh and eighth iteration, it started improvising in a direction nobody asked for.

Not catastrophically. Not even visibly, at first. A tool call here. A field there. A small drift in the data model. The kind of thing you only notice when you compare what the agent did with what the agent was supposed to do.

That gap has a name now. I call it LLM interference. And the only honest answer to it is a watchdog.

Bright green oscilloscope wave polluted by magenta interference
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: vintage analog oscilloscope screen on pure black background, bright emerald green sine wave being polluted by an aggressive magenta-pink interference signal riding on top, glitch artifacts, scanlines

01 — What is LLM interference?

LLM interference — the gap between what an LLM-powered agent was instructed to do and what it actually does, caused by ambiguous prompts, tool collisions, context drift, hallucinated state, or silent rule conflicts inside the system prompt.

The term is borrowed deliberately. In radio engineering, interference is not noise — it is a signal you didn’t intend, riding on top of the one you sent. It can be subtle. It can be cumulative. And it can, given enough time, drown the original signal completely.

That is exactly what happens inside an agentic system. The model receives a goal. It composes a plan. It calls tools. It reads results. It re-plans. At every step, small interferences accumulate — a misread parameter, an unnoticed tool-name collision, a contradictory instruction buried in line 1,200 of the system prompt — and the agent’s trajectory bends, smoothly, away from where it was supposed to go.

Researchers are starting to map the terrain. Microsoft Research has documented tool-space interference. The Arbiter framework out of UBC analyses system-prompt interference across Claude Code, Codex, and Gemini. Anthropic’s own SHADE-Arena studies sabotage and monitoring in long-horizon agents. The vocabulary is fragmenting. The phenomenon is one.

I prefer the broader term, because the fix is the same regardless of which sub-flavour bites you.

Cyan agent arm vs amber watchdog arm across a black abyss
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: two robotic arms facing each other across a deep black abyss, left arm glowing electric cyan-blue (the agent), right arm glowing burnt orange-amber (the watchdog), studio lighting, photorealistic

02 — The watchdog is not the agent

This is the part that gets confused most often, so let me state it bluntly:

A watchdog is not a smarter agent. A watchdog is a sober observer that does not share the agent’s hallucinations.

If you ask the same model that built the system to also evaluate the system, you have a defendant who is also the judge. The model’s failure modes are correlated with itself. It will rationalise its drift in exactly the language that made the drift invisible in the first place.

A real watchdog has three properties:

  1. Independence — different model, different runtime, ideally different vendor. The watchdog must be able to disagree.
  2. Append-only memory — a log the agent cannot rewrite, because an agent that can edit its own history is not being monitored, it is being trusted.
  3. Authority to halt — a watchdog that can only file a complaint is a comment, not a control. It must be able to interrupt.
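
Reduced to an interface, those three properties fit in a page of Python. A sketch only: the class, the method names, and the trivial plan check are my shorthand for illustration, not the logpy API.

  # Sketch: the three watchdog properties as a minimal interface.
  # All names are illustrative, not the actual logpy API.

  from dataclasses import dataclass
  import time


  @dataclass(frozen=True)
  class Verdict:
      ok: bool
      reason: str


  class Watchdog:
      """Property 1: independence. Runs in its own process, ideally on a
      different model, runtime, and vendor than the agent it watches."""

      def __init__(self, log_path: str):
          self._log_path = log_path  # the agent never gets a handle to this

      def record(self, event: dict) -> None:
          # Property 2: append-only memory. Open in append mode, never rewrite.
          with open(self._log_path, "a", encoding="utf-8") as f:
              f.write(f"{time.time():.3f} {event!r}\n")

      def evaluate(self, declared_plan: str, observed_action: str) -> Verdict:
          # The check lives outside the agent and is free to disagree with it.
          # A trivial substring comparison stands in for the real check.
          if observed_action not in declared_plan:
              return Verdict(ok=False, reason="action not in declared plan")
          return Verdict(ok=True, reason="")

      def interrupt(self, verdict: Verdict, halt) -> None:
          # Property 3: authority to halt. `halt` is a callable wired to the
          # agent's kill switch; a watchdog that cannot call one is a comment.
          if not verdict.ok:
              self.record({"halt": verdict.reason})
              halt(verdict.reason)
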
Four colored data streams converging into a black cube
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: four colored data streams flowing inward from corners into a single black obsidian cube in the center, streams in vivid cyan, magenta, lemon yellow, lime green, particles, light trails, isometric view

03 — What the watchdog watches

Concretely, in my own stack — the one I am building under the working name Sovereign Continuity — the watchdog observes four channels at once:

  agent ────▶ [ tool calls ]    ──┐
                                  │
  agent ────▶ [ context window ]──┼──▶  watchdog  ──▶  append-only log
                                  │       (logpy)
  agent ────▶ [ system prompt ] ──┤
                                  │
  agent ────▶ [ output to world ]─┘

The watchdog is not trying to understand the agent. It is trying to detect delta: changes between intended and observed behaviour. Did the tool call match the declared plan? Did the output cross a defined boundary? Did the agent invoke a capability it had not been granted? Did the prompt mutate?
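
None of those questions require intelligence to answer; they require a reference point. Here is the delta check as a Python sketch, under the assumption that the agent declares its plan and granted capabilities up front. The event shapes are invented for this illustration, not a real wire format.

  # Sketch: delta between declared and observed behaviour.
  # Event shapes are invented for illustration, not a real protocol.

  DECLARED_PLAN = {"read_file", "summarise", "write_report"}
  GRANTED_CAPABILITIES = {"read_file", "summarise", "write_report"}
  OUTPUT_BOUNDARY = 10_000  # max characters the agent may emit per step


  def check_event(event: dict) -> list[str]:
      """Return every delta an observed event introduces."""
      deltas = []
      if event["kind"] == "tool_call":
          if event["tool"] not in DECLARED_PLAN:
              deltas.append(f"tool '{event['tool']}' not in declared plan")
          if event["tool"] not in GRANTED_CAPABILITIES:
              deltas.append(f"tool '{event['tool']}' was never granted")
      elif event["kind"] == "output":
          if len(event["text"]) > OUTPUT_BOUNDARY:
              deltas.append("output crossed a defined boundary")
      elif event["kind"] == "prompt":
          if event["hash"] != event["expected_hash"]:
              deltas.append("system prompt mutated mid-run")
      return deltas


  # An interfering tool call surfaces immediately:
  print(check_event({"kind": "tool_call", "tool": "send_email"}))
  # ["tool 'send_email' not in declared plan", "tool 'send_email' was never granted"]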

This is closer in spirit to a hardware watchdog timer than to an LLM-as-judge. A watchdog timer in an embedded system does not know what the program is supposed to mean. It only knows whether the program is still checking in within a defined interval. If it isn’t, the system gets reset.

Software watchdogs for agentic AI need the same epistemic humility. Don’t try to be smarter than the agent. Try to be earlier than the failure.
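
That analogy translates almost line for line into software. A minimal check-in watchdog as a Python sketch: it assumes the agent calls kick() at the end of every step, and the 30-second interval and the timeout action are placeholders, not logpy behaviour.

  # Sketch: a software watchdog timer. It does not understand the agent;
  # it only knows whether the agent is still checking in on time.

  import threading


  class WatchdogTimer:
      def __init__(self, interval_s: float, on_timeout):
          self._interval = interval_s
          self._on_timeout = on_timeout  # e.g. halt the agent, reset the run
          self._timer = None

      def kick(self):
          # The agent calls this at the end of every step ("feeding the dog").
          if self._timer is not None:
              self._timer.cancel()
          self._timer = threading.Timer(self._interval, self._on_timeout)
          self._timer.daemon = True
          self._timer.start()

      def stop(self):
          if self._timer is not None:
              self._timer.cancel()


  # If the agent stalls, loops, or dies for more than 30 seconds without
  # a kick, the timeout fires and the run is interrupted.
  wd = WatchdogTimer(30.0, on_timeout=lambda: print("no check-in: halting agent"))
  wd.kick()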

Recursive moebius loop of red error traces
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: recursive feedback loop visualized as a moebius strip made of glowing red error traces and warning glyphs, accelerating motion blur, sparks and embers, deep black space

04 — Why this is now urgent

For most of the last two years, the agent question was: can it do the thing? That question is largely answered. The new question is: will it stop doing the thing?

An agent that performs a task once, under supervision, in a sandbox, with a human reading every step — that is a demo. An agent that performs a task continuously, in production, with tools that touch real systems, real data, real money, real people — that is an actor. Actors need oversight. Not because they are evil. Because they are recursive.

Every output of an agent becomes part of the input of the next agent. Errors compound. Drift compounds. Interference compounds. And the cost of the first uncaught failure is no longer a wrong answer — it is a wrong action, taken in the world, by a system that has no native concept of regret.

This is the gap a watchdog closes. Not by making the agent infallible. By making the agent auditable in real time.

Single server rack with violet light and a brass padlock
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: single secured server rack standing alone in a dark vault, deep violet ultraviolet light glowing from inside, brass padlock floating in front, blueprint grid faintly visible

05 — Sovereign, local, append-only

The last design choice is the one most people skip, and the one that matters most: the watchdog must be sovereign.

If the observer runs in the same cloud, owned by the same vendor, behind the same login as the agent, you have outsourced not just the agent but the audit. That is not oversight. That is theatre.

The watchdog I am building — logpy — runs locally, writes append-only, signs every entry with post-quantum-ready cryptography, and answers to no remote API. The agent can talk to it. The agent cannot rewrite it. The operator can read every line. The vendor can read none.

That is what sovereignty actually looks like in agentic AI. Not slogans. Plumbing.
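
The smallest honest version of that plumbing is a hash chain: every log entry commits to its predecessor, so rewriting any past line breaks every hash after it. In the sketch below, an HMAC stands in for the post-quantum signature scheme and the entry format is invented; neither reflects logpy's actual implementation.

  # Sketch: an append-only, tamper-evident log as a hash chain.
  # HMAC-SHA256 stands in for the post-quantum signature scheme; the
  # entry format is invented for this sketch, not logpy's.

  import hashlib
  import hmac
  import json
  import time

  SECRET = b"operator-held key, never shared with the agent"  # placeholder


  def append_entry(log_path: str, event: dict) -> None:
      prev_hash = "0" * 64  # the chain starts at the zero hash
      try:
          with open(log_path, "r", encoding="utf-8") as f:
              prev_hash = json.loads(f.readlines()[-1])["hash"]
      except (FileNotFoundError, IndexError):
          pass  # empty or missing log: first entry
      body = json.dumps({"ts": time.time(), "event": event, "prev": prev_hash},
                        sort_keys=True)
      tag = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
      with open(log_path, "a", encoding="utf-8") as f:
          # Append only. Each entry commits to its predecessor via "prev".
          f.write(json.dumps({"body": body, "hash": tag}) + "\n")


  def verify(log_path: str) -> bool:
      """Walk the chain; rewriting or reordering any past entry breaks it."""
      prev_hash = "0" * 64
      with open(log_path, "r", encoding="utf-8") as f:
          for line in f:
              entry = json.loads(line)
              expected = hmac.new(SECRET, entry["body"].encode(),
                                  hashlib.sha256).hexdigest()
              if entry["hash"] != expected:
                  return False
              if json.loads(entry["body"])["prev"] != prev_hash:
                  return False
              prev_hash = entry["hash"]
      return True

Verification needs only the operator-held key and the file. No remote API is involved, which is the point.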

Glowing brake disc with orange sparks on a black track
▸ ComfyUI · z-image-turbo · 8 steps · 1280×720
prompt: close-up of a glowing brake disc on a black race-track at night, electric orange sparks flying, motion blur of the wheel, dramatic side rim lighting

06 — Closing

Autonomous AI without a watchdog is a car without brakes. It might drive beautifully for an hour. It might drive beautifully for a year. The first time it doesn’t, the absence of the brake is not a feature you can retrofit at speed.

If you are building agentic systems and you do not yet have an independent observer with append-only memory and the authority to halt, you are not building autonomous AI. You are building a wager.

An agent without an independent watchdog is not autonomous. It is unsupervised.

Autonomy is not the absence of oversight. It is the ability to act responsibly while under it. The difference between a self-driving car and a train on rails. The rails — the watchdog — do not restrict the train. They are the reason it is allowed to move at speed at all.

I would rather build the brakes.

07 — Open questions

The watchdog idea is not finished. Three questions I do not yet have a clean answer to:

  • How do watchdogs scale in multi-agent systems? Does each agent need its own, or does a hierarchical meta-watchdog watch them all?
  • How do you handle watchdog interference? What happens when the watchdog itself drifts?
  • Should the watchdog be deliberately adversarial? Trained against itself, red-team style, so it learns the attack vectors before the attacker does?

If you have answers worth keeping, I want to read them.


A field note from the Sovereign Continuity architecture. The work is ongoing. The watchdog is real. The screenshot was not staged.