Writing
Writing, talks, doctrine.
What AI changes in real work — through dated essays.
The thread
How do we name what AI transforms in real work?
Dated essays and method notes — one through-line.
Articles
Developed versions.
You do not remove a task. You move the real work.
Why an AI project does not merely remove tasks: it moves workload, exceptions, judgment, responsibility, and sometimes the meaning of work.
AI Agents Are Operators. They Need Ergonomics.
Benchmarks test models. Real work tests situated operators: model, harness, tools, permissions, evidence, recovery, and human belief.
Making Grok Act: Notes From a Production System-Prompt Fix
An XML system block that forbids intent narration: one repro (10 min → 33.9 s), corpus measures, an honest A/B, and what the patch cannot fix.
The Enterprise Agent Problem Is Belief
An agent is not enterprise-ready because it can act. It is enterprise-ready when the belief it creates about its action is calibrated to evidence.
An AI talk should talk about real work
Why a useful AI talk should start from what AI changes in work, decisions, responsibility, evidence, and human recovery, not from tools alone.
Raw Traces Are Not Evals
The missing layer between real agent failures and measurable model progress: reducing messy traces into replayable eval seeds without laundering the signal.
Tool Use Is Not Task Completion
Why agent reliability depends on preserving the boundary between intention, action, observable state, and truthful final claims.
The Most Dangerous Agent Failure Is Not Hallucination
Why the critical risk is not only hallucination but also an agent claiming work is done without observable evidence.
AI Agents Are Not Just Tools. They Are Work Systems
Why task completion is not enough: an agent redistributes verification, responsibility, coordination, and recovery work.
AI Is Not Integrated Into Work. It Reconfigures It.
Why AI must be analyzed from real activity: constraints, trade-offs, invisible cooperation, room to act, and responsibility.
The Benchmark Lie: Why Grok 4.20 Excels in Benchmarks but Fails in Production
A cognitive ergonomist dogfoods Grok 4.20 across 12+ production agents. What 40 years of human factors research says about what benchmarks miss in agent loops.
Connections
Beyond articles.
Contact
Email Julien
Share three lines: the situation, the decision to make, and what makes the topic difficult, costly, or sensitive.