AI coding tools still fail in boring ways: bugs in Claude Code, Codex, and Gemini CLI
A March 2026 empirical study of AI coding tools found that many user-visible failures are not exotic model errors but mundane engineering problems: API errors, terminal problems, command failures, configuration issues, and integration friction.
The data point
The arXiv paper "Engineering Pitfalls in AI Coding Tools" studies bugs in Claude Code, Codex, and Gemini CLI. In its reported distribution of user-facing symptoms, API errors account for 18.3%, terminal problems for 14%, and command failures for 12.7%.
Why this matters for builders
The biggest day-to-day losses in AI coding are often operational. A model can be strong and still waste a session through bad environment handling, repeated shell failures, or fragile tool calls. Teams should track tool friction as part of measuring AI productivity, not treat it as random noise.
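One way to make that tracking concrete is to count failure symptoms per session. The sketch below is a minimal, hypothetical illustration; the class and symptom names are invented here and are not from the study.

```python
# Hypothetical sketch: tallying tool-friction symptoms in an agent session
# so they show up in productivity metrics instead of being ignored as noise.
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class FrictionLog:
    """Counts user-facing failure symptoms observed during a session."""
    symptoms: Counter = field(default_factory=Counter)

    def record(self, symptom: str) -> None:
        self.symptoms[symptom] += 1

    def top(self, n: int = 3):
        # Most frequent symptoms first, mirroring the study's
        # symptom-distribution style of reporting.
        return self.symptoms.most_common(n)

log = FrictionLog()
for s in ["api_error", "command_failure", "api_error", "terminal_problem"]:
    log.record(s)
print(log.top(2))  # api_error counted twice, the others once
```

Aggregating these logs across sessions would give a team its own symptom distribution to compare against the paper's figures.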
Token waste connection
Every failed command can trigger another diagnostic loop. Every misconfigured CLI can burn context as the agent rereads files and retries. Tool reliability is therefore part of token economics.
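One defensive pattern against such loops is a hard retry budget, so a flaky command cannot burn unbounded context. This is a minimal sketch under assumed interfaces: `run_command` is an illustrative stand-in for a tool call, not an API from any of the tools discussed.

```python
# Hypothetical sketch: capping retries so repeated command failures
# cannot trigger an unbounded diagnose-and-retry loop.
def run_with_budget(run_command, max_attempts=3):
    """Retry a flaky tool call, but give up after max_attempts failures."""
    for attempt in range(1, max_attempts + 1):
        ok, output = run_command()
        if ok:
            return output
    raise RuntimeError(f"gave up after {max_attempts} attempts")

# Usage: a stand-in command that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    return (calls["n"] >= 3, "done")

print(run_with_budget(flaky))  # succeeds on the third attempt, prints "done"
```

Bounding retries turns a potentially open-ended token cost into a fixed, budgetable one.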