Aug 05 2025

Cerebras Launches OpenAI’s gpt-oss-120B at a Blistering 3,000 tokens/sec

Cerebras is a day-one launch partner for OpenAI’s new open-weight model, gpt-oss-120B, now available on Cerebras Cloud. Developers can run the model at 3,000 tokens per second at the full 128K context length, with streaming, high-throughput inference that scales from prototype to production.

Cerebras makes it possible to integrate gpt-oss-120B into demanding workloads—including agentic reasoning, knowledge retrieval, and long-context generation—with ease and speed.

Performance and Pricing

  • Throughput: 3,000 tokens per second
  • Input: $0.25 per million tokens
  • Output: $0.69 per million tokens
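At these rates, per-request costs are easy to estimate. The sketch below is illustrative only; the token counts in the example are hypothetical, and the rates are the per-million-token prices listed above.

```python
# Sketch: estimate the cost of a single gpt-oss-120B request on
# Cerebras Cloud at the listed per-million-token rates.

INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.69 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in dollars for one request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: a 10,000-token prompt with a 2,000-token response.
cost = request_cost(10_000, 2_000)
print(f"${cost:.5f}")  # $0.00388
```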

About the Model: OpenAI’s gpt-oss-120B

gpt-oss-120B is OpenAI’s most capable open-weight model, released under the Apache 2.0 license. It uses a Mixture-of-Experts architecture with 117 billion total parameters, 5.1 billion active parameters per token, and a 128-expert configuration across 36 layers. The model supports a 128k context window, enabling complex multi-turn reasoning and long-form memory.
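The parameter figures above imply why Mixture-of-Experts inference is efficient: only a small slice of the weights is active for each token. A quick back-of-the-envelope check, using only the numbers quoted in this post:

```python
# Sketch: active-parameter fraction implied by the quoted figures.
total_params = 117e9   # total parameters
active_params = 5.1e9  # active parameters per token

active_fraction = active_params / total_params
print(f"{active_fraction:.1%}")  # roughly 4.4% of the weights per token
```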

The model was trained using techniques adapted from OpenAI’s proprietary systems, including reinforcement learning and supervised alignment. It delivers competitive results across key benchmarks, outperforming or matching o4-mini on MMLU, AIME, HealthBench, and TauBench.

gpt-oss-120B is designed for use in agentic workflows, is compatible with the OpenAI Responses API format, and supports flexible reasoning effort control.
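Because the model follows OpenAI-compatible API conventions, a request can be assembled with familiar chat-style fields. The sketch below only builds the request payload; the model identifier and the exact name of the reasoning-effort parameter are assumptions here, so check the Cerebras Cloud documentation for current values.

```python
# Sketch: assembling an OpenAI-style chat request for gpt-oss-120B.
# "gpt-oss-120b" and "reasoning_effort" are assumed names, not confirmed.

def build_request(prompt: str, effort: str = "medium") -> dict:
    """Assemble a chat-completion payload with a reasoning-effort hint."""
    return {
        "model": "gpt-oss-120b",                              # assumed model id
        "messages": [{"role": "user", "content": prompt}],
        "reasoning_effort": effort,                           # assumed field name
        "stream": True,                                       # streaming, as noted above
    }

payload = build_request("Summarize the key risks in this contract.", effort="high")
```

From here the payload would be sent with any OpenAI-compatible client pointed at the Cerebras Cloud endpoint.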

Get Started

Try gpt-oss-120B now for free on Cerebras Cloud.

For enterprise support, dedicated capacity, or custom deployment plans, contact us to learn more.