Alibaba's Qwen3 Coder 480B Instruct model is now available on Cerebras. Qwen3 Coder is one of the top coding models in the world with coding ability that rivals Claude 4 Sonnet and Gemini 2.5. Running on the Cerebras Wafer Scale Engine, Qwen3 Coder reaches an unprecedented 2,000 tokens per second. Coding problems that take 20 seconds on Sonnet 4 finish in just one second on Cerebras. To make Qwen3 Coder widely accessible, we are also launching Cerebras Code – two monthly subscription plans with generous rate limits at $50 and $200 per month.
Just two weeks after launch, Alibaba’s Qwen3 Coder 480B has soared in adoption, reaching #2 in OpenRouter’s coding model leaderboard, overtaking Gemini 2.5, DeepSeek V3, Kimi K2, and Claude 4 Opus. It’s widely praised as the first model that matches Claude 4 Sonnet – the industry’s leading coding model – in accuracy and dependability in real world software engineering tasks.
Cerebras is proud to take the world’s leading open-weight coding model and turbocharge it to 2,000 tokens per second. That means developers can generate 1,000 lines of JavaScript in just 4 seconds, versus 30 seconds on Gemini 2.5 Flash or 80 seconds on Claude 4 Sonnet. Instant code-gen returns developers to flow-state, eliminating the painful start-stop cadence of slow, GPU based code generation.
Cline – the leading coding agent for VS Code – is a great way to use Cerebras Inference. Simply select ‘Cerebras’ in the API Provider dropdown and select qwen-3-coder-480b as the model. Cline takes high level commands and is able to one-shot webapps or make sophisticated changes to existing projects.
Qwen3 480B is available today on Cerebras Inference Cloud and our partners OpenRouter and HuggingFace at $2 per million input or output token. We serve the model from our US based data-centers with 131K context, FP8 precision, and zero data retention.
To make instant AI coding widely accessible, we are launching two monthly subscription plans – Cerebras Code Pro at $50/m and Cerebras Code Max at $200/m. These plans offer equal or higher rate limits than comparable plans from Cursor and Anthropic while giving you 20x higher coding speed.