Developers can now run Cerebras inference inside Docker containers, deployed with Docker Compose, to create safe and reproducible environments for AI-generated code. By combining Docker’s containerization with Cerebras’ inference speed, teams can build and evaluate new code quickly while keeping experiments isolated and repeatable.
Cerebras and Docker: Speed Meets Safety
Cerebras runs the fastest AI inference in the world, generating code at more than 2,500 tokens per second.
An easy and practical way to run Cerebras-powered agents is with Docker Compose. Compose simplifies running complex, multi-container applications, such as agentic systems that combine an agent loop, tools, and other supporting services. With a single configuration file and one command, you can start every service in your stack without depending on the details of any particular agentic framework. That makes Compose well suited to deploying Cerebras agents alongside local models.
Developers can pair a lightweight local model, such as Qwen3 4B (Q4_K_XL), for conversation with Cerebras for large-scale generation in the same setup. This structure makes it straightforward to balance speed, cost, and safety in daily development.
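A minimal Compose sketch of such a hybrid setup might look like the following. Note that the service names, image, model file, and environment variables here are illustrative assumptions, not the actual DevDuck configuration:

```yaml
# Illustrative sketch only: service names, images, and variables are assumptions.
services:
  local-model:
    # A lightweight local model served behind an OpenAI-compatible endpoint.
    image: ghcr.io/ggml-org/llama.cpp:server
    command: ["-m", "/models/qwen3-4b-q4_k_xl.gguf", "--port", "8000"]
    volumes:
      - ./models:/models

  agent:
    # The agent loop: uses the local model for conversation and Cerebras
    # for large-scale code generation.
    build: ./agent
    environment:
      - CEREBRAS_API_KEY=${CEREBRAS_API_KEY}
      - LOCAL_MODEL_URL=http://local-model:8000/v1
    depends_on:
      - local-model
```

The key idea is that both models live in one file: the local model is just another service on the Compose network, while the Cerebras key is injected from the environment.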
Together, Docker and Cerebras form a workflow that accelerates development while keeping environments clean and consistent. Developers can generate code with high-speed inference and execute it inside isolated containers that can be spun up and torn down on demand. This combination is scalable, repeatable, and designed to support both rapid experimentation and production-grade deployments.
Introducing DevDuck
We built DevDuck to showcase this in practice. DevDuck is a multi-agent system built with adk-python that uses Cerebras for code generation and Docker Compose to orchestrate its components. DevDuck includes three agents: a local agent that runs models on-device, a coordinator agent, and a Cerebras-based agent. These agents collaborate to generate and review code, routing requests internally between them. Before execution, Docker containers are initialized to sandbox all generated code, ensuring isolation and reproducibility. When tasks are complete, containers can be shut down cleanly, keeping the environment safe and consistent between runs.
DevDuck also visualizes every tool call. To see exactly what happened behind the scenes, click the DevDuck icon to inspect the flow of agent handoffs and tool calls. Here, for example, the visual log shows how the tools completed a sandbox initialization.
Try it Yourself: Instructions
DevDuck is easy to try for yourself. First, make sure that you have Docker installed, which you can download from the official website.
Next, open a terminal and run the following commands:
“git clone https://github.com/shelajev/docker-cerebras-demo”
Once you’ve cloned the repository, the final step is to set up your environment. DevDuck uses two models: one local model and one Cerebras inference model. To use the system, add your Cerebras API key, which you can obtain from the Cerebras platform, to the .env file.
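The .env file only needs to hold your key. A minimal example follows; note that the exact variable name is an assumption, so check the repository's .env template for the name it expects:

```shell
# .env — variable name is an assumption; consult the repo's .env template.
CEREBRAS_API_KEY=your-cerebras-api-key
```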
All that’s left to do is run the program. To build and start DevDuck, run:
“docker compose up --build”
The Compose setup spins up our agents and Docker’s MCP Gateway, which in this example manages the MCP tools for working with the Node sandbox containers.
You can initialize the sandbox with a single prompt. DevDuck has three separate agents, but the Cerebras agent does most of the heavy lifting and tool calling.
To initialize the sandbox, or use any tool, simply describe the task and the program will automatically route your request to the correct agent and take care of everything else. Here, the DevDuck agent automatically hands off to Cerebras, which then sets up the Docker Compose sandbox in seconds. You can say:
“Initialize the sandbox”
“Hey Cerebras, init the sandbox”
“Please initialize my container. Thank you!”
Because Compose hosts multiple containers side by side, our agents can hand off to one another and call tools with no extra work on your part. With Cerebras inference, expect sandbox initialization, file creation, and code generation to finish in seconds.
When the Cerebras agent has access to a sandbox, it can generate and run code in it. This particular example configures the sandbox without network access, which improves the security posture of the whole setup.
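In Compose, cutting off a service's network access can be done with network_mode set to none. A hedged sketch of such a sandbox service follows; the service name, image, and command are illustrative, not the repository's exact configuration:

```yaml
# Illustrative sketch: a Node sandbox with no network access.
services:
  sandbox:
    image: node:22-alpine
    # "none" detaches the container from all networks, so generated code
    # can run but cannot reach the internet or other services.
    network_mode: "none"
    # Keep the container alive so code can be executed in it on demand.
    command: ["sleep", "infinity"]
```

With this setting the container still has a filesystem and CPU to run generated code, but any attempt to open a socket to the outside world fails, which limits the blast radius of untrusted output.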
Beyond DevDuck
Altogether, Cerebras inference and Docker Compose provide a fast, isolated, and reproducible environment for AI-generated code. Developers can run local models alongside Cerebras agents, enabling hybrid workflows that balance speed, cost, and safety. Systems like DevDuck demonstrate how this setup allows code generation, execution, and testing to happen reliably and at scale.
Try Cerebras inference and Docker Compose yourself. Sign up for a Cerebras API key here, and get Docker Compose here.