Skip to main content

Oct 13 2025

Cerebras Inference: Now Available via Pay Per Token

The fastest AI inference in the world is now just $10 away.

Today, we’re making Cerebras Inference available to everyone through pay-per-token pricing. Start building on the world’s fastest AI infrastructure for as little as $10 — no contracts, no friction, just add your credit card and go.

We believe our developer tier delivers the most compelling inference API in the industry. Run the world’s leading open weight models from Qwen3 235B Instruct and Thinking, GPT OSS 120B, and Qwen3 Coder 480B — all at 20x the speed of closed source model providers running on GPUs. Moreover, we’ve heard your calls for higher rate limits – our developer tier has over 10x higher limits than our free tier, so you can build, iterate, and scale without friction.

Cerebras Code Revamped with Higher Rate Limits

Our self-serve pay-per-token tier is the easiest way to start building. But for developers in the flow – experimenting, coding, and creating — there’s Cerebras Code. Cerebras Code Pro ($50/month) and Cerebras Code Max ($200/month) are designed for high-volume vibe coding, with discounted per-token pricing for uninterrupted, agentic coding sessions. The Max plan also unlocks enhanced rate limits up to 1.5 million TPM, giving you the freedom to build without slowdown.

For production workloads, our monthly subscription and enterprise tiers continue to offer the highest capacity, priority routing, and dedicated support from our team.

Get Started Today

To get started, visit cloud.cerebras.ai to grab your API key. Deposit $10 through our Billing tab and start building with the world's fastest inference today.