
Nov 18 2025

The world’s fastest GLM-4.6 – now available on Cerebras

Today, Cerebras is releasing GLM-4.6 — our most capable model yet on the Cerebras Inference API. GLM-4.6 brings major upgrades across reasoning, tool use, and coding, combining exceptional intelligence with an unmatched speed of 1,000 tokens per second on Cerebras. For many tasks, GLM-4.6 is comparable to Sonnet 4.5 in output while running 17x faster and 25% cheaper on Cerebras. GLM-4.6 is available today with our pay-as-you-go developer tier starting at $10 or our Cerebras Code plan starting at $50/month.

GLM-4.6

GLM-4.6 is widely regarded as one of the world’s top open coding models. Its predecessor, GLM-4.5, is the #1-ranked model for tool calling on the Berkeley Function Calling Leaderboard (BFCL), ahead of Opus 4.1, and GLM-4.6 itself performs on par with Sonnet 4.5 on LM Arena’s web-development leaderboard, based on thousands of user votes.

Across real-world usage, developers highlight four defining strengths:

  • Tool-calling reliability — Executes multi-step tool chains with precision, passing structured arguments cleanly, maintaining state across calls, and avoiding the looping or malformed-JSON errors common in earlier open models.
  • Web-development fluency — Generates full-stack, ready-to-run applications — from React + Tailwind front-ends to Node and Flask back-ends — with clean file structures, minimal syntax fixes, and strong contextual continuity across files.
  • Token efficiency — In zAI’s CC-Bench suite, GLM-4.6 used 26% fewer tokens than Kimi K2-0905 and 31% fewer than DeepSeek V3.1 Terminus, making it one of the most efficient open models available.
  • Code-editing accuracy — Based on live telemetry from Cline, a leading agentic IDE, GLM-4.6 achieved 94.5% accuracy in editing existing code — approaching Sonnet 4.5’s 96.2%.
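In practice, tool-calling reliability comes down to the model emitting well-formed JSON arguments that match a declared schema. A minimal sketch of what that checking looks like with an OpenAI-style tool definition — the `run_tests` function and its fields here are illustrative assumptions, not part of any particular benchmark:

```python
import json

# A hypothetical OpenAI-style tool definition for illustration.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",
        "description": "Run the project's test suite and report failures.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "verbose": {"type": "boolean"},
            },
            "required": ["path"],
        },
    },
}]

def validate_tool_call(call: dict) -> dict:
    """Check that a model-emitted tool call parses as JSON and supplies
    every required argument -- the malformed-JSON / missing-field failure
    mode that earlier open models often hit."""
    fn = next(t["function"] for t in tools
              if t["function"]["name"] == call["name"])
    args = json.loads(call["arguments"])  # raises ValueError on malformed JSON
    missing = [k for k in fn["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args

# A well-formed call, as a reliable tool-calling model should emit it:
args = validate_tool_call({"name": "run_tests",
                           "arguments": '{"path": "tests/", "verbose": true}'})
print(args["path"])  # → tests/
```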

In short, GLM-4.6 is a landmark open-weight release, closing the gap between leading open and closed models in reasoning, tool use, and coding.

GLM-4.6 on Cerebras – Coding at 1,000 TPS


GLM-4.6 continues Cerebras’s track record of being the world’s fastest inference provider. The above chart shows leading open and closed coding models, using the fastest provider for each. Cerebras runs GLM-4.6 at over 1,000 tokens per second – more than 3× faster than the leading Kimi K2 provider, and nearly 20× faster than Sonnet 4.5. Code edits that previously took two or three minutes now complete in seconds on Cerebras. Developers say that fast code generation on Cerebras eliminates the need to multi-task while waiting for the model, making development more engaging and enjoyable.

https://x.com/simonfarshid/status/1987777400214843670
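The latency difference is easy to quantify: completion time is just output length divided by generation speed. A back-of-the-envelope sketch — the 2,000-token edit size and 50 TPS baseline are illustrative assumptions, not figures from the chart:

```python
def generation_time(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to stream a completion at a given speed."""
    return tokens / tokens_per_second

EDIT_TOKENS = 2_000  # assumed size of a typical multi-file code edit

gpu_seconds = generation_time(EDIT_TOKENS, 50)        # assumed GPU-provider rate
cerebras_seconds = generation_time(EDIT_TOKENS, 1_000)  # Cerebras at 1,000 TPS

print(f"GPU provider: {gpu_seconds:.0f}s, Cerebras: {cerebras_seconds:.0f}s")
# → GPU provider: 40s, Cerebras: 2s
```

At these assumed sizes, an edit that would otherwise take most of a minute streams back in about two seconds — short enough that there is no incentive to context-switch while waiting.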

Best Price-Performance

With high-end products, users often pay a disproportionate premium for a moderate performance improvement. For example, a Ferrari is roughly three times faster than a Camry in 0-60 acceleration but costs ten times as much. Cerebras is 20x faster than GPU-based providers, but can you afford it?

Despite building a leapfrog inference product, we are priced at a reasonable premium to other providers. The result is that on a price-performance basis, Cerebras is still better value. For example:

  • Compared to GPT-5, Cerebras is 1.8x more expensive but 6x faster
  • Compared to Sonnet 4.5, Cerebras is 17× faster and 25% cheaper

Cerebras is not just fast, it’s better value and a better use of developers’ time than slower or cheaper coding models.
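One way to make the value claim concrete is speed-per-dollar: divide the speed multiple by the price multiple. Using only the numbers stated above (6x faster at 1.8x the price versus GPT-5; 17x faster at 0.75x the price, i.e. 25% cheaper, versus Sonnet 4.5):

```python
def perf_per_dollar(speed_multiple: float, price_multiple: float) -> float:
    """Relative tokens-per-second per dollar versus a comparison model."""
    return speed_multiple / price_multiple

# Figures as stated in the text:
vs_gpt5 = perf_per_dollar(6, 1.8)      # 6x faster, 1.8x the price
vs_sonnet = perf_per_dollar(17, 0.75)  # 17x faster, 0.75x the price

print(f"vs GPT-5: {vs_gpt5:.1f}x, vs Sonnet 4.5: {vs_sonnet:.1f}x")
# → vs GPT-5: 3.3x, vs Sonnet 4.5: 22.7x
```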

Available Today Starting at $10

GLM-4.6 is available today on all our offerings, including our pay-as-you-go developer tier starting at just $10. Our dev tier includes generous rate limits comparable to those of Anthropic and OpenAI, and is supported in Cline, VS Code, OpenCode, RooCode, and others.
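The Cerebras Inference API follows the familiar OpenAI-style chat-completions shape, so trying the model is a small script. A minimal stdlib sketch — the `glm-4.6` model identifier and the `api.cerebras.ai/v1/chat/completions` endpoint are assumptions here; check the Cerebras API docs for the exact names:

```python
import json
import os
import urllib.request

def build_request(prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for GLM-4.6 on
    Cerebras. Endpoint and model name are assumptions; see the docs."""
    payload = {
        "model": "glm-4.6",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://api.cerebras.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('CEREBRAS_API_KEY', '')}",
        },
    )

req = build_request("Write a Python function that reverses a linked list.")
print(json.loads(req.data)["model"])  # → glm-4.6

# Only send the request when an API key is actually configured:
if os.environ.get("CEREBRAS_API_KEY"):
    with urllib.request.urlopen(req) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
```

Because the request shape is OpenAI-compatible, existing clients and IDE integrations typically only need the base URL, API key, and model name changed.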

GLM-4.6 is also available on Cerebras Code, our popular monthly subscription plans for developers who use coding models every day. Based on user feedback, we’ve significantly increased token limits since launch, making these the best options for enthusiasts and professional coders alike:

  • Code Pro — $50/month: 1 million tokens per minute (TPM), with 24 million tokens per day
  • Code Max — $200/month: 1.5 million TPM, with 120 million tokens per day
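Those per-minute and per-day limits together imply how much sustained full-rate generation each plan supports per day. Quick arithmetic from the numbers above:

```python
def minutes_at_full_rate(daily_tokens: int, tpm: int) -> float:
    """Minutes per day a plan can sustain at its full per-minute rate."""
    return daily_tokens / tpm

pro_minutes = minutes_at_full_rate(24_000_000, 1_000_000)    # Code Pro
max_minutes = minutes_at_full_rate(120_000_000, 1_500_000)   # Code Max

print(f"Pro: {pro_minutes:.0f} min/day, Max: {max_minutes:.0f} min/day")
# → Pro: 24 min/day, Max: 80 min/day
```

At 1,000 tokens per second, even a few minutes of full-rate generation covers a large volume of daily coding work, which is why the daily caps are rarely the binding constraint in practice.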

These plans deliver major savings over pay-as-you-go and make Cerebras a practical choice for full-time, high-volume development.

GLM-4.6 on Cerebras combines near-frontier intelligence, world-class tool calling, and unmatched speed. It doesn’t replace Sonnet 4.5 for every task but complements it, giving developers the flexibility to choose the right model for the job. With platforms like Cline and OpenCode making model-switching seamless, developers can use GLM-4.6 for 80% of coding tasks and turn to deeper-reasoning models for the rest — saving both time and money.

Try GLM-4.6 on Cerebras today. As always, we welcome your feedback on Discord or X.