
Nov 18 2025

The world’s fastest GLM-4.6 – now available on Cerebras

Cerebras is launching GLM-4.6, our new flagship coding model, on our inference cloud. With coding ability approaching Sonnet 4.5 and an output speed of 1,000 tokens/s, GLM-4.6 on Cerebras combines incredible smarts with incredible speed, making it the ideal daily driver for developers.

GLM-4.6 is available today on our pay-as-you-go developer tier, starting at $10, and on our Cerebras Code plans, starting at $50/month. Cerebras models integrate natively with your favorite IDEs and coding agents, including VS Code, Cline, OpenCode, and RooCode.
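For developers on the pay-as-you-go tier, a request can be sketched with any OpenAI-compatible client. Below is a minimal sketch that builds (but does not send) a chat-completions request; the base URL and the `glm-4.6` model identifier are assumptions and should be verified against the Cerebras API documentation.

```python
import json
import urllib.request

# Assumed values -- verify against the Cerebras API docs.
CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1"  # assumed endpoint
MODEL = "glm-4.6"                                 # assumed model identifier

def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat-completions request."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens to watch the 1,000 TPS output live
    }
    return urllib.request.Request(
        f"{CEREBRAS_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("Write a Flask hello-world app.", api_key="sk-...")
```

Because the payload follows the standard chat-completions shape, the same request works unchanged from IDE integrations that speak the OpenAI protocol.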

GLM-4.6

GLM-4.6 is widely regarded as one of the world’s top open coding models. It ranks as the #1 model for tool calling on the Berkeley Function Calling Leaderboard (BFCL), ahead of Opus 4.1, and performs on par with Sonnet 4.5 on LM Arena’s web-development leaderboard, based on thousands of user votes.

Across real-world usage, developers highlight four defining strengths:

  • Tool-calling reliability — Executes multi-step tool chains with precision, passing structured arguments cleanly, maintaining state across calls, and avoiding the looping or malformed-JSON errors common in earlier open models.
  • Web-development fluency — Generates full-stack, ready-to-run applications — from React + Tailwind front-ends to Node and Flask back-ends — with clean file structures, minimal syntax fixes, and strong contextual continuity across files.
  • Token efficiency — In Z.ai’s CC-Bench suite, GLM-4.6 used 26% fewer tokens than Kimi K2-0905 and 31% fewer than DeepSeek V3.1 Terminus, making it one of the most efficient open models available.
  • Code-editing accuracy — Based on live telemetry from Cline, a leading agentic IDE, GLM-4.6 achieved 94.5% accuracy in editing existing code — approaching Sonnet 4.5’s 96.2%.

In short, GLM-4.6 is a landmark open-weight release, closing the gap between leading open and closed coding models. It doesn’t replace Sonnet for every task, but it completes the majority of tasks with strong accuracy.

GLM-4.6 on Cerebras – Coding at 1,000 TPS


GLM-4.6 continues Cerebras’s track record as the world’s fastest inference provider. The chart above compares leading open and closed coding models, using the fastest provider for each. Cerebras runs GLM-4.6 at over 1,000 tokens per second – more than 3× faster than the fastest Kimi K2 provider, and nearly 20× faster than Sonnet 4.5. Code edits that previously took two or three minutes now complete in under ten seconds on Cerebras. Developers tell us that because code changes now happen in real time, coding on Cerebras isn’t just faster – it’s far more enjoyable.
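The speedup translates directly into wait time. A rough sketch, assuming a ~10,000-token code edit and the throughput ratios quoted above:

```python
EDIT_TOKENS = 10_000  # assumed size of a large code edit

# Output speeds in tokens/second: the Cerebras figure is from this post;
# the others are derived from the quoted ratios (~3x and ~20x slower).
speeds_tps = {
    "Cerebras GLM-4.6": 1_000,
    "fastest Kimi K2 provider (~1/3 speed)": 1_000 / 3,
    "Sonnet 4.5 (~1/20 speed)": 1_000 / 20,
}

for provider, tps in speeds_tps.items():
    seconds = EDIT_TOKENS / tps
    print(f"{provider}: {seconds:,.0f} s")
# At 1,000 TPS the edit finishes in ~10 s; at ~50 TPS the same
# edit takes over three minutes, matching the claim in the text.
```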

Best Price-Performance

With high-end products, users often pay a disproportionate premium for a moderate performance improvement. For example, a Ferrari is roughly three times faster than a Camry in 0–60 acceleration but costs ten times as much. Cerebras is 20× faster than GPU-based providers – but can you afford it?

Despite building a leapfrog inference product, we are priced at a reasonable premium to other providers. The result is that on a price-performance basis, Cerebras is still better value. For example:

  • Versus GPT-5 Codex, Cerebras is 1.8× more expensive but 6× faster
  • Versus Sonnet 4.5, Cerebras is 17× faster and 25% cheaper
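These two comparisons imply a simple “value ratio” – speed gained per unit of price – which can be checked using only the figures above:

```python
def value_ratio(speed_multiple: float, price_multiple: float) -> float:
    """Speed gained per unit of price paid: >1 means better value."""
    return speed_multiple / price_multiple

# Versus GPT-5 Codex: 6x faster at 1.8x the price.
vs_codex = value_ratio(6.0, 1.8)     # ~3.3x better price-performance
# Versus Sonnet 4.5: 17x faster at 0.75x the price (25% cheaper).
vs_sonnet = value_ratio(17.0, 0.75)  # ~22.7x better price-performance

print(f"vs GPT-5 Codex: {vs_codex:.1f}x, vs Sonnet 4.5: {vs_sonnet:.1f}x")
```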

Cerebras is not just fast – it’s better value, and a better use of developers’ time than slower, cheaper coding models.

Available Today Starting at $10

GLM-4.6 is available today on all our offerings, including our pay-as-you-go developer tier starting at just $10. GLM-4.6 is also available on Cerebras Code, our popular monthly subscription plans for developers who use coding models every day. Based on user feedback, we’ve drastically increased token limits since launch, making these the best options for enthusiasts and professional coders alike:

  • Code Pro — $50/month: 1M tokens per minute, 24M tokens per day
  • Code Max — $200/month: 1.5M tokens per minute, 120M tokens per day

These plans deliver major savings over pay-as-you-go and make Cerebras a practical choice for full-time, high-volume development.
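To put the daily limits in perspective, here is a back-of-envelope calculation, assuming sustained output at the full 1,000 TPS:

```python
TPS = 1_000  # GLM-4.6 output speed on Cerebras, from this post

# Daily token budgets from the plans above.
plans = {
    "Code Pro ($50/mo)": 24_000_000,
    "Code Max ($200/mo)": 120_000_000,
}

for plan, daily_tokens in plans.items():
    hours = daily_tokens / TPS / 3600
    print(f"{plan}: ~{hours:.1f} h of continuous generation per day")
# Code Pro covers ~6.7 hours of nonstop full-speed output -- roughly a
# full working day. Code Max's budget (~33 h at this rate) cannot be
# exhausted in a day at 1,000 TPS; the per-minute limit binds first.
```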

GLM-4.6 is the first open-weight coding model that feels truly ready for everyday software development. It doesn’t replace Sonnet 4.5 for every task but complements it, giving developers the flexibility to choose the right model for the job. With platforms like Cline and OpenCode making model-switching seamless, developers can use GLM-4.6 for 80% of coding tasks and turn to deeper-reasoning models for the rest — saving both time and money.

Try GLM-4.6 on Cerebras today. As always, we welcome your feedback on Discord or X. Follow us on LinkedIn for all the latest updates.