In the past few years, we've developed a series of interesting workflows. Think Ralph loops and multi-agent orchestration systems. The idea is to write very descriptive prompts and run 8-hour sessions, or to keep 10 instances running on your machine at all times.
Most of this complexity stemmed from one issue: LLMs are slow. If you prompt and wait, you'll get less done than if you prompt and move on to the next task.
Spark is fast.
Codex Spark changes how developers work with AI. A coding model generating 1,200+ tokens/second makes real-time collaboration possible, but it also requires a different approach. At this speed, sloppy interactions have consequences, and working with LLMs needs to be much more deliberate.
This guide is a practical playbook for how we've been using GPT-5.3-Codex-Spark.
Know when to use Codex vs Spark
Codex now spans two complementary modes:
- Deep mode: large prompts, long-running tasks
- Fast mode: rapid, iterative collaboration
GPT-5.3-Codex-Spark is a Codex variant served on Cerebras WSE hardware, optimized for low-latency interactions and capable of generating over 1,200 tokens/second.
Because generation is near-instant with Spark, you can bake /diff, /review, unit tests, and even browser QA into every commit cycle without losing time. Debugging is easier. Writing careful, specific prompts is easier. Working in the details is easier.
Tip #1: Treat Codex Spark like a pair programmer
Instead of writing long, detailed, complex prompts with multiple requests, treat Codex Spark like you would a coworker. Interrupt-and-redirect is a core interaction pattern with Spark. For many developers, this is a new muscle.
Focus on one thing at a time: collaborate, don't delegate. Stay in the loop and don't navigate to other tabs or windows while the model is running. Spark is a power tool; its speed enables you to focus on the details while still getting things done. The instant feedback lets you move deeper into your repo without having to use multiple windows to do so.
Example ways to guide and steer the model:
- "Only change ___"
- "Don't touch types yet."
- "Show me the diff before continuing."
A good guideline for collaborating on a feature: start by creating tests, run them, loop until they pass, then move on to the next task. This workflow collapses what was traditionally a full product cycle into minutes.
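That loop can be pictured as a small script. This is a sketch only: `tdd_loop` is our own illustrative name, and the command you pass it is a placeholder for whatever runs your tests (`npm test`, `pytest -q`, etc.).

```shell
# Sketch of the test-first loop: re-run the suite until it passes,
# feeding each failure back to Spark. "$@" is whatever command runs
# your tests.
tdd_loop() {
  until "$@"; do
    echo "Suite still red: paste the failure into Spark and let it patch."
    # In a live session, Spark edits the code here before the next run.
  done
  echo "Suite green: /new and move on to the next task."
}
```

In practice you stay in the loop yourself, but the shape is the same: run, fail, fix, repeat, reset.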
Tip #2: Speed makes validation cheap
Validation and feedback cycles are foundational to successful interactions with AI agents, but they take time to build out. Keeping your evals, tests, rules, and docs up to date would typically be another maintenance burden.
Because of Spark's speed, tests and evaluations between steps add minimal overhead to the dev cycle. You can make a plan, explore, create tests and environments, update documentation, and run QA jobs in a matter of minutes.
Validation frameworks improve model performance and outputs, and reduce dev cycles spent fixing bugs and regressions.
- Pre-commit hooks: Spark can configure tailored hooks that automatically activate on certain events. For example, running QA when you commit.
- Test suites: Validate that new changes don't break existing functionality. Spark is fast enough to run your full suite between every task.
- Linting and type systems: Enforce code style and catch type errors at write time.
- Browser-based QA automation: Verify that UI changes actually work in the browser.
- Readiness reports and diff reviews: Summarize what changed and flag potential issues before you review anything yourself.
The above let you move as fast as Spark allows without having to backtrack and clean up after it. They also give the model feedback while it's working, which increases its autonomy. Many Codex-compatible harnesses like OpenCode, Pi, and Droid already ship with these features built in, so the setup cost is minimal.
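As a concrete example, the pre-commit hook pattern can be as simple as a script that runs each check and blocks the commit on the first failure. This is a sketch: the commands are placeholders for your project's real lint, test, and QA steps.

```shell
# Hypothetical .git/hooks/pre-commit helper: run every guardrail command,
# block the commit on the first failure. The commands passed in are
# placeholders for your real lint/test/QA steps.
run_checks() {
  for cmd in "$@"; do
    if ! sh -c "$cmd"; then
      echo "pre-commit: '$cmd' failed, aborting commit" >&2
      return 1
    fi
  done
  echo "pre-commit: all checks passed"
}

# In a real hook: run_checks "npm run lint" "npm test" "npx tsc --noEmit"
```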
Tip #3: Explore more implementation paths
With slower models, developers commit to an implementation path early because exploration is expensive. With Spark, you can generate and test multiple approaches before making that decision. This is especially valuable when you're still discovering how you want something to work or you'd like customization.
Without Codex Spark we might prompt: "Please implement a sidebar for navigation with 4 tabs (home, about, blog, and contact), using a Tokyo Night theme."
5 minutes later: "I built a sidebar with a Tokyo Night blue and purple theme with white text. On hover, we use a purple highlight."
With Codex Spark we could prompt: "I want a sidebar for navigation. I like Tokyo Night themes, but I'd like you to generate 10 versions of the sidebar with variants in themes and components."
1 minute later: "I built 10 navigation sidebars in /mocks. Each has its own theme and component hierarchy. I've opened them in your browser—let me know which you prefer so I can implement it in your app."
This even works for slides, photos, videos, and writing.
Tip #4: Run 1 session, not 10
Slower models pushed us toward parallelism. You'd open 10 tabs, fire off 10 prompts, and wait. You were incentivized to work from a bird's-eye view because you could only review a few hundred lines of code every few hours.
With Spark, you're generating thousands of lines a minute. You cannot let that pile up across multiple agents making changes at the same time. Reviewing code is the new bottleneck, and Spark works best as a pair programmer rather than a delegated workforce.
Use 2–3 sessions max, and only for truly independent tracks (e.g., backend vs. frontend, separate services).
Tip #5: Fresh context window, persistent memory
Codex calls a session a thread. Threads can be local (sandboxed) or cloud. You can run multiple threads, but avoid having multiple threads touch the same file.
In general, models produce their sharpest, most deterministic outputs when operating within a narrower, well-scoped context. As context grows, compaction and instruction drift become more likely because the model must compress more state. That said, Spark performs strongly across its full 128K window. The key isn't avoiding large context entirely, but rather using it intentionally.
At ~1,200 tokens/s, Spark can fill its 128K context window in roughly 2 minutes of continuous generation. To get the most out of Spark, start fresh sessions often and keep your research context inside your repo.
A useful mental model: treat the context window as working memory, not long-term storage. Keep durable state in files, keep sessions focused, and reset when goals are complete.
The best trick for managing context length is to break larger tasks into small, bounded goals. Prompt each session to make incremental progress toward one goal, so that every session ends with working, well-documented code. You'll get more deterministic behavior and fewer compaction surprises.
The following commands are how we're getting the best out of Spark:
- /new — Use this whenever you successfully complete a task.
- /fork — When a session reaches a state you like, fork it (and rename the copy) so you can branch from that point later.
- /permissions — At this speed, approvals are how you steer mid-flight; tighten them so Spark pauses for review instead of racing ahead.
- /review — Reviewing code with Spark is a breeze and will help you catch slop fast.
- /skills — Spark excels at executing predefined workflows instantaneously.
- /rename — Renaming sessions helps categorize good runs from bad runs. Use this often.
Tip #6: Externalize state and keep a persistent memory
In a pair-programming session, you'd normally take notes, discuss a work ticket, or share files back and forth. Keeping a written record of what's happening is what makes it possible to do good work with others while moving quickly.
Spark is fast enough to get the full benefits of documentation, tickets, and scratchboards without the writing and maintenance of these files becoming a burden.
A minimal external memory system:
- AGENTS.md — repo-specific norms, commands, gotchas (Codex CLI can scaffold this via /init).
- PLAN.md — checklist-style plan + definition of done
- PROGRESS.md — short running log: what changed, what failed, what’s next
- VERIFY.md — exact commands to prove it works (tests, lint, e2e)
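As an illustration, VERIFY.md can be nothing more than the commands themselves. The stack here is hypothetical; swap in your own.

```shell
# Hypothetical contents of VERIFY.md for a TypeScript repo: the exact
# commands that prove a checklist item is done.
npm run lint
npx tsc --noEmit
npm test
npx playwright test e2e/smoke.spec.ts
```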
This eliminates ambiguity and reduces drift across sessions. Git becomes your memory layer. Spark becomes a stateless executor.
When Spark executes, it should:
- Read PLAN.md
- Implement one checklist item
- Run VERIFY.md commands
- Update PROGRESS.md
This is the same core idea behind "Ralph loops", where each iteration uses clean context while memory persists via git and small files.
Ralph loops shine with Spark. Instead of running one overnight, you can run it in minutes and get just as much done. Ralph's architecture spawns new sessions for every unit of work automatically.
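A minimal Ralph-style driver can be sketched in a few lines of shell, assuming the codex exec CLI and the PLAN/PROGRESS/VERIFY files from the list above. The prompt and commit message are illustrative, not a prescribed format.

```shell
# Minimal Ralph-style loop: each iteration spawns a fresh session for one
# unit of work, while memory persists in git and the markdown files.
ralph_loop() {
  # Keep going while PLAN.md still has unchecked "- [ ]" items.
  while grep -q '^- \[ \]' PLAN.md 2>/dev/null; do
    codex exec "Read PLAN.md. Implement the first unchecked item only. Run the commands in VERIFY.md, update PROGRESS.md, and check the item off."
    git add -A && git commit -m "ralph: one PLAN.md item"
  done
}
```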
Tip #7: Use the right Codex model for the task
A helpful workflow that leverages the thoughtfulness of GPT-5.3-Codex and the speed of GPT-5.3-Codex-Spark: use GPT-5.3-Codex to generate a comprehensive plan, then switch to GPT-5.3-Codex-Spark to implement it quickly.
The Codex app and CLI now offer sub-agent features, so you can configure Spark to handle specific types of tasks: updating documentation, researching, cleaning up, or making targeted changes.
Codex then focuses on the work that actually requires deep context and reasoning. Total time to completion drops because you're reducing how much the slower model has to do.
Example #1: Architect with Codex, execute with Spark.
Example #2: Use Spark for focused tasks while Codex handles longer-horizon work.
You can think of the two models as two parts of a development lifecycle: GPT-5.3-Codex for research, planning, and final reviews, where it benefits from longer context and precision; GPT-5.3-Codex-Spark for rapid execution and iteration.
Tip #8: Tighter permissions and restrictions
At 1,200 tokens/s, Spark can cover a lot of ground very quickly. In the past, because models were slow, we asked them to do more per session with less oversight. Now we can set precise guardrails and correct the model as it works.
For example:
- Ban the model from deleting files: no rm -rf
- Max diff size: ≤ 150 LOC (or "≤ 30 LOC unless asked")
- Model cannot commit until all tests pass, types are correct, and a commit message is included
- Model can only read and edit—no creating new files
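The diff-size rule, for instance, is easy to enforce mechanically. A sketch of a pre-commit guard (the 150-line budget mirrors the rule above; the function name is our own):

```shell
# Hypothetical guard for the "max diff size" rule: read
# `git diff --cached --numstat` on stdin and fail once
# added + deleted lines exceed the budget.
check_diff_budget() {
  budget=$1
  changed=$(awk '{added += $1; deleted += $2} END {print added + deleted + 0}')
  if [ "$changed" -gt "$budget" ]; then
    echo "refusing commit: $changed changed lines > $budget budget" >&2
    return 1
  fi
}

# In a pre-commit hook: git diff --cached --numstat | check_diff_budget 150
```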
The Codex CLI has explicit controls for approvals and sandboxing. You can use GPT-5.3-Codex-Spark in remote environments to take advantage of its near-instant response times and SOTA intelligence. Follow common standards when running Codex Spark autonomously: use containerization, virtual machines, and sandboxes.
- The CLI supports --yolo modes
- codex exec defaults to a read-only sandbox; widen permissions only when needed.
- Use /permissions to tighten or relax mid-session.
The Codex CLI enforces these rules during runtime with little to no configuration needed from the end user. Explore similar tools to see what works for you.
Tip #9: Doc-gardening agent (anti-drift as a continuous process)
OpenAI describes a "doc-gardening" agent that scans for stale documentation and opens fix-up PRs, enforced via CI checks. Documentation is critical for increasing the likelihood that the model will produce high-quality code that works on the first run.
When inference wait times are negligible, generating and refining documentation becomes effortless. Our rule of thumb for all LLMs is to maintain two documentation patterns:
- Meant for humans — setup guides and higher-level information about the repo.
- Meant for LLMs — patterns, system prompts, standards, diagrams, and file locations.
You can couple Spark with CI tooling to automate the production, review, updating, and cleanup of your documentation. This keeps the context in your repo valuable and efficient, improving both your experience and your model's output.
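One sketch of what such a CI step could check (our own illustration, not OpenAI's implementation; paths are placeholders): flag any doc that is older than the newest file in the source tree it describes, then hand the stale doc to Spark for a refresh.

```shell
# Illustrative doc-gardening check: a doc is "stale" when any file in the
# source dir it describes has been modified more recently than the doc.
doc_is_stale() {
  src_dir=$1; doc=$2
  if [ -n "$(find "$src_dir" -type f -newer "$doc" -print -quit)" ]; then
    echo "stale: $doc is older than code in $src_dir" >&2
    return 0   # stale: trigger a fix-up task
  fi
  return 1     # fresh
}

# CI usage: doc_is_stale src docs/ARCHITECTURE.md && open a fix-up task
```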
Try it yourself!
Codex Spark on Cerebras WSE hardware changes the interaction model. Codex Spark is about giving developers control over code quality without sacrificing the benefits of using AI to accelerate their workflows.
Now, you can:
- Collaborate in tight loops without losing momentum
- Add guardrails without slowing down
- Explore multiple implementation paths instantly
- Keep context clean and deterministic
- Move fast without creating slop
You can get started with Codex Spark here.