Dec 10, 2025
Case Study - Cognition x Cerebras
The Dawn of Real-Time Coding Agents
TL;DR
Powered by Cerebras Inference, Cognition’s SWE-1.5 and the SWE-grep family deliver frontier-level coding performance at up to 13x the speed of general-purpose models, keeping developers in flow while they explore codebases, ship features, and debug complex systems.
The Challenge
AI is redefining software development, turning natural language prompts into working code. But for an AI coding assistant to be useful, it must feel instantaneous and handle large, complex projects seamlessly. Until now, AI coding on GPUs meant frustrating delays: 20-to-30-second generation times that broke a developer’s concentration. Even slight lags forced context switching. Developers were stuck choosing between smaller, faster models that lacked skill and larger models that were too slow. The industry needed a solution that delivered more speed, consistency, and scale without compromising intelligence.
The Solution
Cognition co-designed its agents, models, and inference stack end-to-end, and chose Cerebras as the fastest inference provider to bring this experience to life in Windsurf and Devin.
SWE-1.5 is Cognition’s latest frontier-size coding model with hundreds of billions of parameters and near-SOTA performance on difficult software engineering tasks. Served on Cerebras, SWE-1.5 runs at up to 950 tokens/second—6x faster than Haiku 4.5 and 13x faster than Sonnet 4.5—so developers no longer have to choose between “thinks fast” and “thinks well.” Developers now use SWE-1.5 as a daily driver for exploring large repositories, building full-stack applications, editing configs, and making fast, precise changes, like updating Kubernetes manifests, in under five seconds.
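For a sense of what 950 tokens/second means in practice, here is a minimal back-of-the-envelope sketch in Python. The 2,000-token response size and the implied 13x-slower baseline rate are illustrative assumptions, not measured figures from the case study.

```python
# Back-of-the-envelope: why decode throughput dominates perceived latency.
# The 950 tok/s figure comes from the case study; the response size and the
# implied 13x-slower baseline rate are assumptions for illustration only.

def stream_time(tokens: int, tokens_per_second: float) -> float:
    """Seconds to stream a response of `tokens` at a given decode rate."""
    return tokens / tokens_per_second

RESPONSE_TOKENS = 2_000  # hypothetical size of a config-editing response

for name, rate in [("SWE-1.5 on Cerebras", 950.0), ("13x-slower baseline", 950.0 / 13)]:
    print(f"{name}: {stream_time(RESPONSE_TOKENS, rate):.1f}s")
# SWE-1.5 on Cerebras: 2.1s  -> within the "under five seconds" envelope
# 13x-slower baseline: 27.4s -> long enough to break a developer's flow
```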
But Cognition didn’t stop there. Its research showed that traditional agents spend more than 60% of their first turn just retrieving context, often taking 20+ seconds before making a single edit. In response, Cognition developed SWE-grep and SWE-grep-mini, specialized sub-agents for highly parallel code search. Running on Cerebras Inference, they power Windsurf’s new Fast Context sub-agent, matching the retrieval quality of frontier coding models while cutting context-gathering time by an order of magnitude. The result: deep code understanding in seconds, not minutes.
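To make the parallel-search idea concrete, here is a minimal sketch of fanning out several independent searches at once, so total retrieval time tracks the slowest single query rather than the sum of all of them. The queries, the use of ripgrep (`rg`), and the asyncio structure are illustrative assumptions, not Cognition’s actual SWE-grep implementation.

```python
import asyncio

async def search(pattern: str, repo: str = ".") -> tuple[str, list[str]]:
    """Run one ripgrep query against the repo and return matching lines."""
    proc = await asyncio.create_subprocess_exec(
        "rg", "--line-number", pattern, repo,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.DEVNULL,
    )
    out, _ = await proc.communicate()
    return pattern, out.decode().splitlines()

async def gather_context(patterns: list[str]) -> dict[str, list[str]]:
    # All searches run concurrently, so wall-clock time is roughly the
    # slowest single query rather than the sum of all queries.
    results = await asyncio.gather(*(search(p) for p in patterns))
    return dict(results)

if __name__ == "__main__":
    hits = asyncio.run(gather_context(["def main", "class Config", "TODO"]))
    for pattern, lines in hits.items():
        print(f"{pattern}: {len(lines)} matches")
```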
By co-optimizing the model (SWE-1.5), the agent harness (Cascade), and the inference layer (Cerebras), Cognition delivers a cohesive agent experience tuned on real engineering workflows, not just benchmarks. With SWE-1.5 and Fast Context on Cerebras, plus parallel tool calls and highly optimized pipelines, search and reasoning collapse into a few seconds. Reinforcement learning on rich, real-world coding environments, combined with ultra-fast inference, produces an agent that feels like a real pair-programming teammate.
Conclusion
Cognition’s SWE-1.5, SWE-grep, and SWE-grep-mini showcase what’s possible when agent labs and infrastructure providers co-design for speed and intelligence. From frontier-scale coding models to specialized retrieval sub-agents, Cerebras Inference provides the throughput and latency required to keep engineers in flow and unlock the next generation of software engineering agents.