Jun 02 2025

Cerebras May 2025 Newsletter

From beating Nvidia with a new world record for Meta’s Llama 4 Maverick model, to launching the fastest real-time reasoning with Qwen3 32B, to powering developer favorite tools like OpenRouter and Quora’s Poe, May was another record-breaking month for Cerebras.

  • Llama 4 Maverick - Cerebras more than doubled Nvidia’s published performance, achieving 2,500+ tokens/sec per user - and it’s coming soon via Meta’s Llama API.
  • Qwen3 32B – Reasoning now runs in real time for the first time, through the combination of blazing fast wafer-scale technology and Alibaba’s state-of-the-art model for code generation, tool calling, and advanced reasoning.
  • Supernova – Our biggest event of the year showcased the fastest AI inference and training in the world, and brought together customers, partners, and developers building what they can’t on GPUs.
  • Calling all developers – Have a bold idea? Join the Supernova Startup Program for priority access to Cerebras compute, new models, technical support, and marketing opportunities.

Cerebras Beats NVIDIA Blackwell in Meta Llama 4 Maverick Inference

Nvidia announced that 8 Blackwell GPUs in a DGX B200 achieved 1,000 tokens per second (TPS) per user on Meta’s Llama 4 Maverick. This week, the independent benchmark firm Artificial Analysis measured Cerebras at more than 2,500 TPS/user, more than doubling the performance of Nvidia’s flagship solution.

Cerebras has set a world record for LLM inference speed on the 400B parameter Llama 4 Maverick model, the largest and most powerful in the Llama 4 family. Artificial Analysis also tested multiple other vendors:

  • SambaNova – 794 t/s
  • Groq – 549 t/s
  • Amazon – 290 t/s
  • Google – 125 t/s
  • Microsoft Azure – 54 t/s

For businesses, this unlocks faster time to market for reasoning-heavy and agentic AI applications — where speed isn’t just a metric, it’s a change agent. Find out more. 

Realtime Reasoning is Here: Qwen 3 x Cerebras

Qwen3 32B is a dense, reasoning-optimized model that consistently outperforms on benchmarks like GPQA, ARC, and MATH — delivering intelligence scores on par with, or better than, GPT-4.1. It supports extended context windows and excels in tasks requiring multi-hop reasoning, scientific Q&A, and step-by-step logic. On Cerebras, Qwen runs at 2,400 tokens per second per user — making it the fastest production-grade deployment of Qwen in the world, ideal for real-time, high-IQ AI applications.

It’s designed as a drop-in replacement for GPT endpoints, enabling a seamless switch for production applications that demand faster, smarter inference. And for the first time in history, a reasoning model is running in real time — unlocking more intelligent results without delay.
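The "drop-in replacement" idea can be sketched in a few lines: because the endpoint speaks the OpenAI-style chat-completions format, switching an existing GPT integration is mostly a matter of changing the base URL and model name. The endpoint URL and model identifier below are illustrative assumptions, not details taken from this newsletter.

```python
import json

# Minimal sketch of swapping a GPT endpoint for a Cerebras-hosted Qwen3 32B.
# Assumption: an OpenAI-compatible chat-completions endpoint; the URL and
# model name here are hypothetical placeholders for illustration.
CEREBRAS_BASE_URL = "https://api.cerebras.ai/v1/chat/completions"  # assumed

def build_chat_request(prompt: str, model: str = "qwen-3-32b") -> dict:
    """Build an OpenAI-style chat-completion payload for the assumed endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # streaming lets the high per-user token rate show interactively
    }

payload = build_chat_request("Summarize multi-hop reasoning in one sentence.")
print(json.dumps(payload, indent=2))
```

An existing application would POST this payload to the new base URL with its API key; no other client-side changes should be needed under the compatibility assumption above.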

What will you build?

🌟 Supernova: Built for the Most Ambitious AI Builders

Our flagship event brought together ~1,000 attendees and luminary speakers from Meta, Microsoft, Google, Mayo Clinic, GSK, AlphaSense, and more to showcase how ultra-fast AI inference is catapulting their success. From instant voice, robotics, and customer experience demos running 20x faster than GPUs, to deep tech talks on agents, instant AI, and foundation model acceleration — Supernova wasn’t just an event, it was a launchpad for a new era of AI applications, powered by the speed of Cerebras.

Watch the Keynotes

We’re excited to announce the Supernova Startup Program, a new program that gives startups the AI tools they need to build the best product possible. Join today!

We’ve been moving fast — here are a few things you don’t want to miss:

  • OpenRouter and Quora Poe: Two of the most popular model-serving platforms now support Cerebras endpoints, delivering instant, high-speed responses for top open-source models like Qwen, Llama, and DeepSeek.
  • We are supporting Cerebral Valley’s Llama 4 hackathon with Meta for Developers on May 31st-June 1st in NYC! Sign up today! 
  • Cerebras x IBM: AI Without Compromise – We’re partnering with IBM to help enterprises accelerate AI adoption without choosing between bleeding-edge performance and enterprise-grade reliability.
  • No more waitlist for Cerebras Inference. Start here

Follow us to get on the list for 2025 events