GLM-4.7 from Z.ai is live on Cerebras at 1,000 TPS! Frontier intelligence for coding, tool-driven agents, and multi-turn reasoning.

Pricing

Inference API access

Free

The easiest way to get started with Cerebras

  • Access to all Cerebras powered models
  • The world’s fastest inference – 20x faster than OpenAI and Anthropic
  • Community support via Discord

Developer

Generous rate limits for power users

Everything in Free, plus:

  • Self-serve payment starting at just $10
  • 10x higher rate limits than free tier
  • Higher priority processing

Enterprise

Highest throughput, custom weights, and guaranteed uptime

Everything in Developer, plus:

  • Highest rate limits for production workloads
  • Lowest latency with dedicated queue priority
  • Support for custom model weights
  • Model fine-tuning and training services
  • Dedicated support team with response time guarantees

Cerebras Code

Pro
$50/month

  • Top open-source model access with fast, high-context completions.
  • Send up to 24 million tokens/day ($48/day worth of value)
  • Ideal for indie devs, simple agentic workflows, and weekend projects.

Max
$200/month

  • Top open-source model for heavy coding workflows.
  • Send up to 120 million tokens/day ($240/day worth of value)
  • Ideal for full-time development, IDE integrations, code refactoring, and multi-agent systems.
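
The "worth of value" figures quoted for the two Cerebras Code plans can be checked with a little arithmetic. The sketch below is illustrative only: it uses just the daily token caps, quoted daily values, and monthly prices listed above, and the per-million-token rate it derives is an inference from those numbers, not a published price. The function and variable names are hypothetical.

```python
# Figures taken from the plan descriptions above:
# (daily token cap, quoted daily value in USD, monthly price in USD)
PLANS = {
    "Pro": (24_000_000, 48.0, 50.0),
    "Max": (120_000_000, 240.0, 200.0),
}


def implied_rate_per_million(tokens_per_day: int, daily_value_usd: float) -> float:
    """USD per million tokens implied by a plan's quoted daily value."""
    return daily_value_usd / (tokens_per_day / 1_000_000)


def break_even_tokens(monthly_price_usd: float, rate_per_million: float) -> float:
    """Tokens per month at which usage, valued at the implied rate, equals the plan price."""
    return monthly_price_usd / rate_per_million * 1_000_000


for name, (cap, value, price) in PLANS.items():
    rate = implied_rate_per_million(cap, value)
    tokens = break_even_tokens(price, rate)
    print(f"{name}: ${rate:.2f} per million tokens implied, "
          f"break-even at {tokens:,.0f} tokens/month")
```

Both plans imply the same rate, so under this assumption the Max plan differs from Pro in its daily cap and monthly price rather than in the value assigned to each token.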

Developer tier pricing

*Preview models are intended for evaluation purposes only, and are not intended for use in production environments. They may be discontinued at short notice.

**Scheduled for deprecation Jan 20, 2026 as part of ongoing efforts to serve the most up-to-date models.

Partners

Get access to Cerebras Inference through our partner APIs