Nvidia's Nemotron 3 Ultra is now available on Vercel AI Gateway. For engineers building multi-step agent systems, that combination matters: a capable open reasoning model, accessible through a single gateway layer, without standing up your own inference infrastructure.
Nemotron 3 Ultra is a Mixture-of-Experts model built specifically for orchestrating long-running agent workflows. Its design targets the hard parts of agentic systems: planning, tool use, sub-agent delegation, and error recovery. These are not simple one-shot tasks. They require a model that can hold context across many turns and recover gracefully when something goes wrong.
The context window is 1 million tokens. That is a meaningful ceiling for agents that need to track extended conversation histories, large tool outputs, or accumulated state across many steps.
On the performance side, throughput reaches up to 350 tokens per second. The model also comes with up to 30% lower cost, though the source does not specify the baseline that figure compares against. Still, for teams running high-volume agent workflows, a cost reduction at that scale is worth paying attention to.
The Vercel AI Gateway integration means you can route requests to Nemotron 3 Ultra alongside other models without changing your application's core architecture. That is useful if you are already using the gateway and want to experiment with a reasoning-focused model on specific agent tasks, like planning loops or tool-heavy sub-tasks, while keeping other parts of your stack unchanged.
What should you do with this today? If you are running multi-turn agent workflows and hitting context limits or cost pressure, Nemotron 3 Ultra is worth a direct test. The 1M token window and the explicit design focus on planning and error recovery make it a concrete fit for the use cases where most agent systems break down. Pull it in through the Vercel AI Gateway, route a representative workflow through it, and measure against your current setup on both output quality and cost.