Aug 06 2025

OpenAI GPT OSS 120B Runs Fastest on Cerebras

OpenAI’s GPT OSS 120B model is now available on Cerebras. The first open-weight reasoning model from OpenAI, GPT OSS 120B delivers accuracy that rivals o4-mini while running at up to 3,000 tokens per second on the Cerebras Inference Cloud. Reasoning tasks that take up to a minute to complete on GPUs finish in just one second on Cerebras. GPT OSS 120B is available today with 131K context at $0.25 per million input tokens and $0.69 per million output tokens.
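At those rates, per-request cost is easy to estimate. A minimal sketch in Python (the token counts below are hypothetical examples for illustration, not figures from this announcement):

```python
# Published Cerebras pricing for GPT OSS 120B (USD per million tokens).
PRICE_INPUT_PER_M = 0.25
PRICE_OUTPUT_PER_M = 0.69


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request at the published rates."""
    return (input_tokens / 1_000_000) * PRICE_INPUT_PER_M \
        + (output_tokens / 1_000_000) * PRICE_OUTPUT_PER_M


# Example: a 10,000-token prompt producing a 2,000-token reasoned answer.
cost = request_cost(10_000, 2_000)
print(f"${cost:.5f}")  # → $0.00388

# At roughly 3,000 tokens/s, those 2,000 output tokens stream back
# in well under a second.
generation_seconds = 2_000 / 3_000
```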

GPT OSS 120B is a 120-billion-parameter mixture-of-experts model that delivers near-parity performance with OpenAI’s popular o4-mini on core reasoning benchmarks. It excels at chain-of-thought tasks, tackling coding, mathematical reasoning, and health-related queries with class-leading accuracy and efficiency. With its weights released publicly under Apache 2.0, it offers transparency, fine-tuning flexibility, and the ability to run on the Cerebras Wafer-Scale Engine in the cloud and on-prem.

Cerebras is proud to offer launch-day support for GPT OSS 120B. On OpenRouter, Cerebras was measured at 3,045 tokens/s, 15x faster than the leading GPU cloud. Artificial Analysis found that Cerebras offered the best combination of speed and latency, with a time to first token of just 280 milliseconds and an output speed of 2,700 tokens/s.

Inference speed is critical for agentic and coding applications. Reasoning models are still used sparingly in production workloads because they can take up to a minute to produce a final answer. Cerebras runs GPT OSS 120B so fast that it returns answers as quickly as non-reasoning models. Artificial Analysis found that Cerebras was the only provider to return the first answer token within one second, comparable to popular instruct models like GPT-4.1 and Claude 4 Sonnet.

GPT OSS 120B is available today on the Cerebras Cloud and through our partners Hugging Face, OpenRouter, and Vercel.
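Since the Cerebras Cloud exposes an OpenAI-compatible chat-completions API, calling the model can be sketched as below. The base URL, model id (`gpt-oss-120b`), and parameter names here are assumptions for illustration; check the Cerebras API documentation for the exact values:

```python
import json
import os
import urllib.request

# Assumed endpoint and model id; verify against the Cerebras API docs.
BASE_URL = "https://api.cerebras.ai/v1"
MODEL_ID = "gpt-oss-120b"


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build an OpenAI-style chat-completions request for GPT OSS 120B."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 1024,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


req = build_chat_request(
    "Prove that the square root of 2 is irrational.",
    os.environ.get("CEREBRAS_API_KEY", ""),
)
# Sending the request needs a real API key:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the API follows the OpenAI wire format, existing OpenAI client libraries can typically be pointed at the Cerebras base URL instead of hand-building requests like this.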

Try GPT OSS 120B on Cerebras →