Today, we’re announcing that OpenAI’s new GPT-5.3-Codex-Spark model, powered by Cerebras, is rolling out in research preview. This marks the first release in the collaboration between Cerebras and OpenAI. Codex-Spark is designed for real-time software development, where responsiveness matters as much as intelligence. Powered by the Cerebras Wafer-Scale Engine, it runs at over 1,000 tokens/s, enabling near-instant feedback in live coding environments.
Agentic coding has fundamentally changed software development. For the first time, machines can work autonomously for hours or days without human supervision. But this mode of interaction can also leave developers feeling out of the loop, with long wait times and less opportunity to direct the work. Because software development is iterative, developers need to inject taste, direction, and sensibility along the way. Codex-Spark is designed for working with Codex in real time. It is fast, responsive, and steerable, putting the developer in the driver’s seat.
Codex-Spark is a highly capable small model optimized for fast inference. On agentic software engineering benchmarks such as SWE-bench Pro and Terminal-Bench 2.0, it produces more capable responses than GPT-5.1-Codex-mini while completing tasks in a fraction of the time.
Codex-Spark excels at making precise edits, revising plans, and answering contextual questions about your codebase. It’s a fast way to visualize new layouts, refine styling, and test interface changes.
"Cerebras has been a great engineering partner, and we’re excited about adding fast inference as a new platform capability. Bringing wafer-scale compute into production gives us a new way to keep Codex responsive for latency-sensitive work, and we’re excited to learn from developer feedback on how to compose our compute capabilities into one smooth workflow," said Sachin Katti, Head of Compute at OpenAI.
Codex-Spark is just a taste of what’s possible on Cerebras hardware. Our broader goal is to accelerate a wide spectrum of AI workloads across both real-time and asynchronous use cases. Our purpose-built Wafer-Scale Engine features the largest on-chip memory of any AI processor, enabling high-speed inference at thousands of tokens per second per user. The architecture scales out to thousands of systems, extending fast memory capacity into the multi-terabyte domain to support trillion-parameter models for both training and inference. We expect to bring this ultra-fast inference capability to the largest frontier models in 2026.
Codex-Spark is rolling out today as a research preview for ChatGPT Pro users across the Codex app, CLI, and VS Code extension, with API access rolling out to select design partners.
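For design partners with API access, streaming responses through the OpenAI Python SDK is one way to take advantage of the model’s low latency. The sketch below is illustrative only and assumes the model is exposed under the identifier `gpt-5.3-codex-spark`; the actual name available through the API may differ.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Assumed model identifier for illustration; the name exposed to
# API design partners may differ.
stream = client.responses.create(
    model="gpt-5.3-codex-spark",
    input="Rename the fetch_user helper to get_user and update its call sites.",
    stream=True,
)

# Print tokens as they arrive; at ~1,000 tokens/s, a typical edit
# streams back in well under a second.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```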