The Most Dangerous Agent Failure Is Not Hallucination
2026-05-14 · Agentic AI
Claim-Action-Evidence Discipline for tool-using AI agents
Most conversations about AI reliability still use chatbot vocabulary. Did the model hallucinate? Did it answer correctly? Was the reasoning good? Did it use a tool?
Those questions matter, but they are not enough once a model becomes an agent.
When a human delegates work to an AI agent, the core question changes:
Can the agent transform a human obligation into a real action, verify the resulting state, and report back truthfully?
That is the reliability problem I keep running into in production-like agent runtimes.
The most dangerous failure is not always that the model states something false.
The most dangerous failure is this:
The agent claims that an action changed the world, but the observable runtime state does not support that claim.
I call this a False Completion Claim, or more generally a Claim-Action-Evidence failure.
A normal failure says:
“I could not do it.”
A false completion says:
“Done.”
When nothing was done.
That distinction is everything.
A user can recover from a visible failure.
A user cannot recover from a false success they trust.
This is why tool-using agents need to be evaluated at the level of the whole loop, not only at the level of the final answer.
The loop is simple:
- The user expresses an obligation.
- The agent selects an action.
- The runtime executes or blocks that action.
- The observable state changes, or does not.
- The agent reports back.
Most failures become dangerous at the fifth and last step, when the agent reports back.
The agent may have partial evidence, ambiguous evidence, stale evidence, or no evidence at all. If it still reports completion, it has converted uncertainty into a false operational fact.
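The loop above can be written down as a record with one field per step, which makes a false completion something you can test for. This is a minimal sketch, not a standard schema: `LoopRecord`, its field names, and `is_false_completion` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopRecord:
    """One pass through the loop (hypothetical structure for illustration)."""
    obligation: str                 # what the user asked for
    action: Optional[str]           # what the agent chose to do, if anything
    executed: bool                  # True if the runtime ran the action, False if blocked
    observed_state: Optional[bool]  # True/False if checked, None if never checked
    claimed_done: bool              # what the agent reported to the user


def is_false_completion(record: LoopRecord) -> bool:
    """A "done" claim is supported only if the action ran and the
    resulting state was actually observed to match the obligation."""
    supported = record.executed and record.observed_state is True
    return record.claimed_done and not supported
```

Note that `observed_state=None` (never checked) counts against the claim just as much as `observed_state=False`. Absence of evidence does not support "Done."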
This is where the agent’s runtime “exoskeleton” matters: tools, files, permissions, memory, environment, logs, and available verification channels. A good agent must know what this exoskeleton exposes and what it does not expose.
It must not assume, based on its training data, what it can inspect. It must inspect the actual runtime.
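As a rough illustration of inspecting the runtime rather than assuming, the sketch below probes two things directly: whether given commands are on the PATH, and whether a working directory accepts a real write. The function name `probe_runtime` and the shape of its report are assumptions made for this example. A real agent runtime would expose its own capabilities in its own way.

```python
import os
import shutil
import tempfile


def probe_runtime(commands, workdir):
    """Check what this runtime actually exposes instead of assuming it."""
    report = {
        # shutil.which returns None when the executable is not on PATH
        "commands": {c: shutil.which(c) is not None for c in commands},
        "workdir_exists": os.path.isdir(workdir),
        "workdir_writable": False,
    }
    if report["workdir_exists"]:
        try:
            # A real write attempt is stronger evidence than a permission-bit check.
            # The temporary file is deleted again when the block exits.
            with tempfile.NamedTemporaryFile(dir=workdir):
                report["workdir_writable"] = True
        except OSError:
            pass  # leave workdir_writable as False
    return report
```

The point of the design is that every entry in the report comes from an observation made in this runtime, at this moment, and not from what is usually true elsewhere.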
For production systems, this creates a practical readiness rule:
- if the agent cannot act, it should say so;
- if it acted but cannot verify the resulting state, it should say so;
- if it verified only part of the obligation, it should scope its claim;
- if the runtime blocked or changed the plan, it should report the constraint;
- if it claims completion, the claim should be backed by observable evidence.
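The readiness rule above can be sketched as a small decision function. This is an illustrative simplification under stated assumptions: the obligation is modeled as a set of named parts, and `Report`, `choose_report`, and their parameters are hypothetical names, not an established API.

```python
from enum import Enum


class Report(Enum):
    CANNOT_ACT = "I cannot act here."
    BLOCKED = "The runtime blocked or changed the plan."
    UNVERIFIED = "I acted but cannot verify the result."
    PARTIAL = "I verified only part of the obligation."
    DONE = "Done, with evidence."


def choose_report(required, verified, can_act=True, blocked=False):
    """Return (report, missing_parts) for an obligation split into named parts.

    `required` is the set of parts the user asked for; `verified` is the set
    of parts for which the agent holds observable evidence.
    """
    if not can_act:
        return Report.CANNOT_ACT, set(required)
    if blocked:
        return Report.BLOCKED, set(required)
    confirmed = set(required) & set(verified)
    missing = set(required) - confirmed
    if not confirmed:
        # Nothing confirmed: never upgrade this to a completion claim
        return Report.UNVERIFIED, missing
    if missing:
        # Scope the claim to what was actually confirmed
        return Report.PARTIAL, missing
    return Report.DONE, missing
```

For example, `choose_report({"file_written", "email_sent"}, {"file_written"})` returns `Report.PARTIAL` together with `{"email_sent"}`, so the final message can name exactly what remains unconfirmed.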
The goal is not to make agents infallible.
The goal is to make them operationally honest.
The fix is not only better answers.
The fix is Claim-Action-Evidence Discipline:
USER OBLIGATION → CORRECT ACTION → OBSERVABLE STATE → TRUTHFUL FINAL CLAIM
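To make the chain concrete, here is a deliberately small sketch for one obligation: "this file must contain this text." The function name `write_and_report` and its status prefixes are made up for this example. The point is only that the final claim is derived from a read-back of the state, not from the fact that the write call returned.

```python
import os
import tempfile


def write_and_report(path, content):
    """Obligation: `path` must contain exactly `content`."""
    # CORRECT ACTION: attempt the write; a failure is reported, not hidden
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
    except OSError as e:
        return f"FAILED: could not write {path}: {e}"

    # OBSERVABLE STATE: read the file back instead of trusting the write call
    try:
        with open(path, encoding="utf-8") as f:
            observed = f.read()
    except OSError as e:
        return f"UNVERIFIED: wrote {path} but could not read it back: {e}"

    # TRUTHFUL FINAL CLAIM: say "done" only when the observation matches
    if observed == content:
        return f"DONE: {path} contains the expected {len(content)} characters"
    return f"MISMATCH: {path} does not contain the expected content"


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "note.txt")
    print(write_and_report(target, "hello"))
```

Each return path corresponds to a different honest report. Only one of them says the work is done, and it is reachable only through the verification step.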
A good agent does not always need to succeed.
A good agent must know the difference between:
- “I did it.”
- “I tried and failed.”
- “I cannot verify it.”
- “I need more information.”
- “The runtime blocks me here.”
Until agents preserve those distinctions reliably, they are not ready for serious delegation.
— Julien Talbot