May 20 2025

Bringing Cerebras Inference to Quora Poe’s Fast Growing AI Ecosystem

Now live on Poe: Cerebras, delivering the fastest inference in the world.

Build your own bots on Poe with Cerebras:
✅ Choose a model (Llama 3.1 8B, Llama 3.3 70B, Llama 4 Scout, DeepSeek R1 Distill Llama 70B, Qwen 3 32B)
✍️ Add a prompt template
🔗 Connect your data
🧠 Chain it with other tools

We are excited to take another step in our mission to make lightning-fast inference available everywhere. Cerebras Inference is now natively supported inside Quora’s Poe, the consumer-friendly chat hub that hosts a vibrant universe of user-created AI bots built on dozens of models.

Poe’s mission is to make world-class language models accessible in one place. By adding Cerebras to the roster, Poe’s 10M+ monthly users can now experience sub-second responses from models running entirely on Cerebras hardware, at speeds GPUs can’t match.

Why Poe + Cerebras?

  • Build your own bots. Pick any Cerebras model as a base bot, add a prompt template, connect your data sources, and even chain it with other models.
  • Remix conversations. Hand off a Cerebras chat to ElevenLabs for text-to-speech or to a research bot. No copy-and-paste necessary.
  • Universal multimodal. Attach images to your prompt, even when chatting with “text only” models.
  • Unified billing. Pay for Cerebras usage through Poe’s single invoice—start free or upgrade your Poe subscription for higher quotas.
  • Explore the ecosystem. Compare Cerebras’ wafer-scale latency and cost side by side with every other provider on Poe.

Get Started in Less Than 30 Seconds

  1. Open Poe and click Create Bot.
  2. Select Cerebras in the Base Model dropdown.
  3. Paste a prompt template or keep Poe’s default.
  4. (Optional) Attach an image—Cerebras handles multimodal logic.
  5. Hit Save. Your bot is live, shareable, and billable through Poe’s single invoice.
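Beyond the Poe UI, developers can also reach the same models directly through Cerebras Inference, which exposes an OpenAI-compatible chat completions endpoint. The sketch below, using only the Python standard library, builds such a request without sending it; the endpoint URL and the `llama-3.3-70b` model id reflect Cerebras’ public API docs but should be double-checked, and the `CEREBRAS_API_KEY` environment variable is an assumption for illustration.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint from Cerebras' public docs; verify
# against the current API reference before relying on it.
CEREBRAS_URL = "https://api.cerebras.ai/v1/chat/completions"


def build_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        CEREBRAS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    req = build_request(
        "llama-3.3-70b",  # assumed model id; check Cerebras' model list
        "Summarize wafer-scale inference in one sentence.",
        os.environ.get("CEREBRAS_API_KEY", ""),
    )
    # With a real key set, uncomment to send the request:
    # print(urllib.request.urlopen(req).read().decode())
    print(json.loads(req.data)["model"])  # prints llama-3.3-70b
```

Because the payload and headers follow the OpenAI chat format, the official `openai` Python client can be pointed at the same base URL instead, if you prefer a higher-level interface.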

Supported Models at Launch

  • Llama 3.1 8B
  • Llama 3.3 70B
  • Llama 4 Scout
  • DeepSeek R1 Distill Llama 70B
  • Qwen 3 32B

Want something else? Let us know on Discord or X—community demand drives our roadmap.

Lightning-fast inference unlocks new AI use cases and is now accessible to an even broader developer community. We can’t wait to see what you build.