Perplexity Agent API adds fallback chains while deprecating older Gemini routes
Perplexity's latest developer updates are not just feature additions. They are a reminder that agent builders now have to manage two problems at once: orchestration reliability and constant provider churn.
The update adds a managed agent runtime, an unauthenticated GET /v1/models endpoint, and OpenAI-compatible routing while deprecating older Gemini routes.

What Perplexity changed
In its March and April 2026 docs updates, Perplexity positioned the Agent API as a managed runtime for agentic workflows, not just another wrapper over model calls. The company says the runtime can orchestrate retrieval, tool execution, reasoning, and multi-model fallback through one endpoint. It also added more third-party model options, including GPT-5.4, Claude Sonnet 4.6, NVIDIA Nemotron, and Gemini 3.1 Pro Preview.
At the same time, the changelog says older Gemini routes were deprecated and removed in quick succession. google/gemini-2.5-flash was removed on March 20, 2026. google/gemini-2.5-pro and google/gemini-3-pro-preview followed on April 1. Perplexity also added a new unauthenticated GET /v1/models endpoint so builders can inspect current availability before hard-coding assumptions.
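A pre-deploy check against that listing can be sketched as follows. The endpoint URL and the OpenAI-style response shape (a `data` list of objects with an `id` field) are assumptions based on the conventions the changelog implies, not confirmed documentation:

```python
# Sketch: verify pinned model IDs against the live catalog before deploying.
# MODELS_URL and the {"data": [{"id": ...}, ...]} response shape are assumed
# from OpenAI-compatible conventions, not confirmed Perplexity docs.
import json
import urllib.request

MODELS_URL = "https://api.perplexity.ai/v1/models"  # assumed base URL


def missing_models(pinned, catalog):
    """Return pinned model IDs absent from the catalog's `data` list."""
    available = {m["id"] for m in catalog.get("data", [])}
    return sorted(set(pinned) - available)


def fetch_catalog(url=MODELS_URL):
    # Unauthenticated GET, per the changelog; timeout keeps CI checks fast.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Wiring `missing_models(pinned, fetch_catalog())` into CI turns a silent route removal into a failing build instead of a runtime surprise.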
Why this matters for agent builders
There are two ways to read this release. The optimistic read is convenience: one API key, one agent runtime, one compatibility layer, and easier swapping across frontier providers. The more operational read is that routing is now part of your reliability surface. If your agent depends on a specific reasoning shape, search behavior, or structured-output quirk, a fallback chain is not a free abstraction.
That is especially true for long-running research agents and coding agents. A clean abstraction can still create messy spend when a fallback model makes extra tool calls, expands context more aggressively, or behaves differently under the same prompt contract. Model churn becomes token churn fast.
The TRH angle: reliability can hide waste
Builders often treat model fallback as purely positive because it improves uptime. It does improve uptime. But it can also mask a degraded cost profile. If one route fails and another route completes the job with longer reasoning, more searches, or weaker first-pass precision, the task still "works" while token efficiency quietly drops.
That is why Perplexity's update matters. It makes the API more useful, but it also makes observability more important. Teams should log which model actually answered, how many steps were used, how much context was consumed, and whether fallback materially changed the output or spend.
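A minimal per-call ledger for that kind of logging might look like the sketch below. The `model` and `usage.total_tokens` fields follow OpenAI-compatible response conventions and are assumptions here, not confirmed Perplexity field names:

```python
# Sketch: record which model actually served each call so fallback cannot
# silently change the cost profile. Response field names are assumptions
# based on OpenAI-compatible conventions.
from dataclasses import dataclass, field


@dataclass
class RunLedger:
    calls: list = field(default_factory=list)

    def record(self, requested_model, response):
        served = response.get("model")  # model that actually answered
        self.calls.append({
            "requested": requested_model,
            "served": served,
            "fell_back": served != requested_model,
            "total_tokens": response.get("usage", {}).get("total_tokens", 0),
        })

    def fallback_rate(self):
        # Fraction of calls answered by something other than the requested model.
        if not self.calls:
            return 0.0
        return sum(c["fell_back"] for c in self.calls) / len(self.calls)

    def total_tokens(self):
        return sum(c["total_tokens"] for c in self.calls)
```

Tracking the fallback rate alongside token totals makes the "works but costs more" failure mode visible in a dashboard instead of a monthly invoice.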
What builders should do next
First, stop assuming your preferred provider route will still exist next month. Poll the models endpoint, pin the models you truly depend on, and keep a tested migration map for each agent. Second, compare cost and behavior across fallback chains with the same task set instead of trusting "OpenAI-compatible" as a guarantee of equivalent output.
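The comparison step can be as simple as running one fixed task set through each candidate chain and totaling spend. In this sketch, `run_chain` is a hypothetical callable you supply that executes one task on one chain and returns an `(answer, tokens_used)` pair:

```python
# Sketch: compare total token spend across candidate fallback chains on the
# same task set. `run_chain(chain, task)` is a hypothetical caller-supplied
# function returning (answer, tokens_used) for one task.
def compare_chains(tasks, run_chain, chains):
    totals = {}
    for chain in chains:
        totals[chain] = sum(run_chain(chain, task)[1] for task in tasks)
    return totals
```

Holding the task set constant is the point: identical prompts across chains are the only way to attribute a spend difference to routing rather than workload.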
If you are building OpenClaw-style or terminal-first research flows, Perplexity's updated search integrations and structured results are useful. Just do not let convenience hide the fact that your effective runtime changed.