HTTP requests are cheap. Vercel prices them at roughly $2 per million, about $0.000002 per call. A single prompt to an agent running on a frontier model can cost $2. That is a millionfold cost multiplier, and it makes inference theft one of the highest-margin attacks a bad actor can run today.
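The multiplier is simple division, but it is worth checking with exact arithmetic. The sketch below uses only the two figures from the text ($2 per million HTTP requests, $2 per agent prompt); the constant and function names are illustrative, and `Fraction` is used so the result is exact rather than a rounded float.

```python
from fractions import Fraction

# Figures from the text, in dollars.
HTTP_COST_PER_REQUEST = Fraction(2, 1_000_000)  # ~$2 per million HTTP requests
AGENT_COST_PER_PROMPT = Fraction(2)             # ~$2 per agent prompt

def cost_multiplier(per_prompt: Fraction, per_request: Fraction) -> Fraction:
    """How many ordinary HTTP requests cost as much as one agent prompt."""
    return per_prompt / per_request

multiplier = cost_multiplier(AGENT_COST_PER_PROMPT, HTTP_COST_PER_REQUEST)
print(f"per HTTP request: ${float(HTTP_COST_PER_REQUEST):.6f}")  # $0.000002
print(f"multiplier: {multiplier}x")                              # 1000000x
```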
Vercel has seen this type of attack on their own APIs. That detail matters. This is not a theoretical threat model. It is something happening in production, against real infrastructure, right now.
The economics explain why. An attacker who finds an exposed AI endpoint can run inference essentially for free, with the bill landing on you. The cost asymmetry is extreme: cheap to probe, expensive to serve. Every unauthenticated or poorly protected AI route you ship is a liability with a clear dollar value attached.
Traditional API abuse was painful but bounded. Scrapers and bots driving HTTP traffic could inflate costs, but the per-request price kept damage manageable. AI inference flips that math completely. One abused agent call costs what a million ordinary requests would. Rate limits and IP blocks designed for conventional APIs are not calibrated for this scale of per-request expense.
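To see why request-count limits miss the problem, compare the worst-case hourly spend one client can drive while staying under the same limit. The 60 requests per minute figure below is a hypothetical limit chosen for illustration; the per-request costs are the ones from the text.

```python
from fractions import Fraction

# Hypothetical limit of the kind tuned for conventional APIs.
REQUESTS_PER_MINUTE = 60

# Per-request costs from the text, in dollars.
HTTP_COST = Fraction(2, 1_000_000)  # ~$2 per million HTTP requests
AGENT_COST = Fraction(2)            # ~$2 per agent prompt

def max_hourly_spend(requests_per_minute: int, cost_per_request: Fraction) -> Fraction:
    """Worst-case dollars one client can burn in an hour without tripping the limit."""
    return requests_per_minute * 60 * cost_per_request

http_exposure = max_hourly_spend(REQUESTS_PER_MINUTE, HTTP_COST)
agent_exposure = max_hourly_spend(REQUESTS_PER_MINUTE, AGENT_COST)
print(f"ordinary API: ${float(http_exposure):,.4f} per hour")
print(f"agent API:    ${float(agent_exposure):,.2f} per hour")
```

The same limit that caps an ordinary API at well under a cent per hour still lets a single client run up thousands of dollars of inference in that hour.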
"If you have AI endpoints exposed to the internet, the risk of abuse is high." That is a direct quote from the source, and it is the right frame for how builders should treat their AI endpoints.
What should you do today? Treat every AI endpoint as a high-value asset, not a regular API route. Apply authentication before anything touches your model. Add rate limiting scoped to the cost of inference, not the volume of HTTP calls. Monitor spend per user or session, not just aggregate traffic. If a single request can cost dollars, your abuse detection needs to fire at the request level, not after thousands of calls have landed.
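A minimal sketch of those steps, under loud assumptions: `InferenceBudget`, `handle_agent_request`, `API_KEYS`, and `ESTIMATED_COST_USD` are all hypothetical names, authentication is a toy in-memory key table, and the model call is stubbed. It shows the ordering that matters: authenticate first, then check a per-user dollar budget before the model runs, so the limiter fires on the very request that would overspend. Spend is tracked per user, which doubles as the monitoring signal.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical flat estimate: assume every agent call costs ~$2, the figure
# cited above. Real code would estimate from the model and token counts.
ESTIMATED_COST_USD = 2.00

@dataclass
class SpendWindow:
    start: float            # when this user's current window began
    spent_usd: float = 0.0  # dollars reserved in this window

class InferenceBudget:
    """Caps dollars of inference per user per fixed window, not request count.
    In-memory only; a multi-instance deployment would need a shared store."""

    def __init__(self, budget_usd: float, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.budget_usd = budget_usd
        self.window_seconds = window_seconds
        self.clock = clock
        self._windows: dict[str, SpendWindow] = {}

    def _current(self, user_id: str) -> SpendWindow:
        now = self.clock()
        window = self._windows.get(user_id)
        # Start a fresh window once the previous one has expired.
        if window is None or now - window.start >= self.window_seconds:
            window = SpendWindow(start=now)
            self._windows[user_id] = window
        return window

    def try_reserve(self, user_id: str, cost_usd: float) -> bool:
        """Reserve the cost before calling the model; refuse if over budget."""
        window = self._current(user_id)
        if window.spent_usd + cost_usd > self.budget_usd:
            return False
        window.spent_usd += cost_usd
        return True

    def spent(self, user_id: str) -> float:
        """Per-user spend in the current window, for monitoring and alerts."""
        return self._current(user_id).spent_usd

# Hypothetical API-key table standing in for your real authentication.
API_KEYS = {"key-alice": "alice"}

def handle_agent_request(api_key: Optional[str], prompt: str,
                         budget: InferenceBudget) -> tuple[int, str]:
    # 1. Authenticate before anything touches the model.
    user_id = API_KEYS.get(api_key or "")
    if user_id is None:
        return 401, "unauthorized"
    # 2. Enforce the dollar budget on this request, not after the fact.
    if not budget.try_reserve(user_id, ESTIMATED_COST_USD):
        return 429, "inference budget exceeded"
    # 3. Only now run the (stubbed) model.
    return 200, f"model output for: {prompt}"
```

With a $5 hourly budget, an authenticated user gets two $2 calls; the third is refused with a 429 before any inference runs, and unauthenticated calls never reach the budget check at all.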