AI coding tools still fail in boring ways: bugs in Claude Code, Codex, and Gemini CLI
A March 2026 empirical study of AI coding tools found that many user-visible failures are not exotic model errors but mundane engineering problems: API errors, terminal problems, command failures, configuration issues, and integration friction.
The data point
The arXiv paper "Engineering Pitfalls in AI Coding Tools" studies bugs in Claude Code, Codex, and Gemini CLI. In its reported distribution of user-facing symptoms, API errors account for 18.3%, terminal problems for 14%, and command failures for 12.7%.
Why this matters for builders
The biggest day-to-day losses in AI coding are often operational. A model can be strong and still waste a session through bad environment handling, repeated shell failures, or fragile tool calls. Teams should track tool friction as part of measuring AI productivity, not treat it as random noise.
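One way to make that tracking concrete is to count failure symptoms per session. The sketch below is a minimal, hypothetical illustration; the class and symptom names are invented here and are not from the study.

```python
# Hypothetical sketch: tallying tool-friction symptoms in an agent session
# so they show up in productivity metrics instead of being ignored as noise.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FrictionLog:
    """Counts user-facing failure symptoms observed during a session."""
    symptoms: Counter = field(default_factory=Counter)

    def record(self, symptom: str) -> None:
        self.symptoms[symptom] += 1

    def top(self, n: int = 3):
        # Most frequent symptoms first, mirroring the study's
        # symptom-distribution style of reporting.
        return self.symptoms.most_common(n)

log = FrictionLog()
for s in ["api_error", "command_failure", "api_error", "terminal_problem"]:
    log.record(s)
print(log.top(2))  # api_error counted twice, the others once
```

Aggregating these logs across sessions would give a team its own symptom distribution to compare against the paper's figures.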
Token waste connection
Every failed command can trigger another diagnostic loop. Every misconfigured CLI can burn context as the agent rereads files and retries. Tool reliability is therefore part of token economics.
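One defensive pattern against such loops is a hard retry budget, so a flaky command cannot burn unbounded context. This is a minimal sketch under assumed interfaces: `run_command` is an illustrative stand-in for a tool call, not an API from any of the tools discussed.

```python
# Hypothetical sketch: capping retries so repeated command failures
# cannot trigger an unbounded diagnose-and-retry loop.
def run_with_budget(run_command, max_attempts=3):
    """Retry a flaky tool call, but give up after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_command()
        if ok:
            return output
    raise RuntimeError(f"gave up after {max_attempts} attempts")

# Usage: a stand-in command that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return (calls["n"] >= 3, "done")

print(run_with_budget(flaky))  # succeeds on the third attempt, prints "done"
```

Bounding retries turns a potentially open-ended token cost into a fixed, budgetable one.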