Cerebras is a day-one launch partner for OpenAI’s new open-weight model, gpt-oss-120B, now available on Cerebras Cloud. Developers can run the model at 3,000 tokens per second at the full 128k context, with streaming, high-throughput inference that scales from prototype to production.
Cerebras makes it possible to integrate gpt-oss-120B into demanding workloads—including agentic reasoning, knowledge retrieval, and long-context generation—with ease and speed.
Performance and Pricing
- Throughput: 3,000 tokens per second
- Input: $0.25 per million tokens
- Output: $0.69 per million tokens
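The rates above translate directly into per-request cost. A minimal sketch in Python (the token counts are illustrative, not from the source):

```python
# Estimate the cost of a single gpt-oss-120B request on Cerebras Cloud
# using the published per-million-token rates.
INPUT_RATE = 0.25 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.69 / 1_000_000  # dollars per output token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the published rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a long-context request with 100k input tokens and 2k output tokens.
print(f"${request_cost(100_000, 2_000):.4f}")  # → $0.0264
```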
About the Model: OpenAI’s gpt-oss-120B
gpt-oss-120B is OpenAI’s most capable open-weight model, released under the Apache 2.0 license. It uses a Mixture-of-Experts architecture with 117 billion total parameters, 5.1 billion active parameters per token, and a 128-expert configuration across 36 layers. The model supports a 128k context window, enabling complex multi-turn reasoning and long-form memory.
The model was trained using techniques adapted from OpenAI’s proprietary systems, including reinforcement learning and supervised alignment. It delivers competitive results across key benchmarks, outperforming or matching o4-mini on MMLU, AIME, HealthBench, and TauBench.
gpt-oss-120B is designed for use in agentic workflows, is compatible with the OpenAI Responses API format, and supports flexible reasoning effort control.
Get Started
Try gpt-oss-120B now, free, on Cerebras Cloud.
For enterprise support, dedicated capacity, or custom deployment plans, contact us to learn more.