Codex CLI from a Claude vantage point

Chapter one: why bother looking

The setup. There is one person at this microphone and one small studio in Kall, and the work that lands here is split between two coding agents that look almost identical on the surface. Both run in a terminal. Both read your repo, edit files, run commands, commit. Both have skills. Both have subagents. Both have hooks. From thirty thousand feet they are the same product.

But the execution model is different, and the differences matter once you have wired either one into a real pipeline. Codex CLI shipped from OpenAI in April two thousand twenty-five and now claims roughly four million weekly users. Claude Code ships from Anthropic and runs the show in this house. Codex sits in the wings as the second opinion, the disposable auditor, the one you call in when you want fresh eyes that have not been swimming in the conversational baggage of the current session.

Right now, the Codex usage in this studio runs entirely through the headless route. Five skills wrap around it. The slash codex skill calls Codex for an independent second opinion. Phase-complete dispatches Codex alongside Sonnet and Haiku as parallel reviewers at phase gates. Critique runs Codex post-experiment to catch overclaiming and methodological drift. Cowork runs Codex as a continuous lightweight check during pair programming. Drydock pulls Codex in as one of two reviewers on high-stakes diffs. Every single one of those calls Codex through the headless exec interface, gets the response, and folds it back into the Claude session.

That works. The question is whether running Codex directly from the command line, on its own terms, opens up patterns that the headless wrapper hides. The answer turns out to be yes, in three specific places: skills, subagents, and hooks. Each of them works differently from the Claude equivalent, and once you see the difference, you also see the opportunity.

So this report walks through those three areas, tells the truth about what Codex got right and where it is still catching up, and ends with a short list of concrete moves for someone who runs Claude as primary and wants to make Codex a sharper secondary tool. The tone is honest. Codex has real strengths. Codex also has real gaps. Both will get said.

One last frame before chapter two. The reason this comparison is worth making at all is that Codex and Claude Code have started copying each other's good ideas. AGENTS dot M D started in Codex world and now Claude Code reads it. Skills as folders with a SKILL dot M D file started in Anthropic land and now Codex has its own implementation. Hooks named PreToolUse and PostToolUse came from Claude Code and are now in Codex stable. Each tool now has its own version of what used to be the other's defining feature. The interesting question is no longer who invented what. The interesting question is which implementation is better for the specific work in front of you, and why.

That is what this is about.

The honest answer is that nobody has won yet.

Chapter two: skills, the architecture is different

Both products have skills. Both products store skills as a directory containing a SKILL dot M D file plus optional scripts. So far so similar. The architectural choices below that surface are where they diverge.

Claude Code's skill system finds skills in the skills subdirectories of installed plugins and the skills directory under your dot claude folder at startup. It loads those into the system prompt as a list. There is no per-project enable or disable. Every skill is visible at every session start, and the model picks based on the description match.

Codex does something different. It ships with a meta-skill called Discovery that is responsible for finding and loading other skills. The Discovery skill itself sits in a system folder under the Codex home directory. Your own skills live in a parallel folder. Project documentation can list additional skills, which means a project can add to or override the global skill set without you editing your home directory. Implicit invocation works much as it does in Claude Code. You type something, Codex matches your prompt against skill descriptions, and the matching skill loads. Explicit invocation uses a dollar prefix. You write dollar-skill-name in the prompt and that skill is forced.

The nicer part is the lifecycle. Codex has a built-in skill installer. You type dollar skill installer and it offers to fetch additional skills from repositories. It also has a skill creator. You ask it to make a new skill, and it walks through the questions, asks what triggers should fire it, and writes the SKILL dot M D for you. The blog post you read on first days with Codex flagged this same thing. You do not write skills from scratch in Codex. You ask Codex to write the skill, then edit.

The more serious architectural difference is plugins. A skill in Codex is a folder. A plugin is a distribution unit that bundles one or more skills together with optional MCP server configuration, app mappings, and presentation assets. If you want to share a workflow across machines or teammates, you package it as a plugin. If it is just for you, it stays a skill folder. That separation between authoring format and distribution format is cleaner than what Claude Code currently has.

There is also a small but real configuration touch. Codex lets you disable specific skills from configuration without deleting them. Two lines in config dot toml under double bracket skills dot config, point at the SKILL dot M D path, set enabled equals false. Done. The skill stays on disk for later, the model just stops seeing it. That is the kind of small ergonomic detail that adds up.
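For the record, a minimal sketch of that toggle in Python. It only renders the config fragment as text. The double bracket skills dot config table and the enabled flag come from the description above. The key name path and the example skill location are assumptions for illustration, so check them against your own config before pasting anything.

```python
def disable_skill_entry(skill_md_path: str) -> str:
    """Render a config.toml fragment that hides one skill from the model.

    Assumption: the key holding the SKILL.md location is called "path".
    """
    return (
        "[[skills.config]]\n"
        f'path = "{skill_md_path}"\n'
        "enabled = false\n"
    )


# Hypothetical skill location, for illustration only.
print(disable_skill_entry("/home/me/.codex/skills/release-notes/SKILL.md"))
```

The skill folder is never touched. Deleting the fragment, or flipping the flag back, brings the skill back into view.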

There is one optional file inside a Codex skill called agents slash openai dot yaml. It controls user-facing display name, icons, brand color, default prompt, and most interestingly an allow underscore implicit underscore invocation flag. Set that to false and the skill will only run when explicitly invoked with the dollar prefix. Apply that to skills that should never auto-trigger but should stay available on request, and you get a kind of privacy boundary that is harder to express in Claude Code.

The verdict for this studio: Codex skills are slightly more thought-through as an architecture, especially around distribution and per-skill controls. Claude Code skills are closer to a flat list with rich autonomy. Both work. Codex is the right place to put community-distributable skills. Claude is where the bespoke project-shaped skills live.

Chapter three: subagents, the explicit-spawn model

Now to the agents question.

Claude Code has the Task tool. The model decides when to spawn a subagent, decides which agent type to use, and the spawning is mostly invisible inside the conversation flow. You ask for a research task and the model calls Task with a description, gets results back, integrates them. It is fluent and forgiving and most of the time you do not even notice it happened.

Codex runs subagents differently. They only spawn when you explicitly ask for them. The phrasing in the documentation is direct. Quote, Codex only spawns subagents when you explicitly ask it to do so, end quote. There is no implicit delegation. If you want parallel work, you write a prompt that names the points you want covered and asks for one agent per point.

The standard Codex example is something like, review this pull request on six points, security, code quality, bugs, race conditions, test flakiness, maintainability, spawn one agent per point, wait for all of them, summarize. That phrasing is the contract. Codex orchestrates the parallel execution, routes follow-up instructions to active threads, returns a consolidated response when all complete.

The configuration for this lives in config dot toml under bracket bracket agents. Three settings matter. Max underscore threads sets the concurrent thread cap. Default is six. Max underscore depth defaults to one, which means the parent can spawn direct children but those children cannot spawn children of their own. Job underscore max underscore runtime underscore seconds caps individual jobs. Raising max depth above one turns broad delegation instructions into recursive fan-out, which burns tokens fast, so the default exists for good reason.
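To put a number on that fan-out, here is a back-of-the-envelope sketch in Python. It assumes every agent at every allowed level spawns the same number of children, which is a worst case and not something the documentation promises. It also borrows six as the branching factor purely for illustration. The real thread cap limits how many run at once, not how many get created in total.

```python
def worst_case_agents(children_per_agent: int, max_depth: int) -> int:
    """Total subagents if each agent at each allowed level spawns the same number of children."""
    return sum(children_per_agent ** level for level in range(1, max_depth + 1))


# Depth one is the default described above; two and three are hypothetical.
for depth in (1, 2, 3):
    print(depth, worst_case_agents(6, depth))
```

Under those assumptions depth one gives six agents, depth two gives forty-two, and depth three gives two hundred fifty-eight. Every one of them carries its own context and its own token bill.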

Codex also lets you write custom agent definitions in TOML. Each file defines one agent. The file can override model, reasoning effort, sandbox mode, MCP servers, and skills config independently from the parent session. The result is you can have a research agent that runs in read-only mode with a longer-context model, a fixer agent that runs with workspace-write, and a triage agent that runs on the cheap mini model. Each has its own profile. Each spawns when called by name.

There is one detail that matters for serious use. Subagents inherit the parent's sandbox policy and live runtime overrides. If you escalated to workspace-write during the parent turn, that propagates to children. Custom agent definitions can override defaults, but live overrides during the session win. This is a safety thing. It is also a thing that will trip you if you wonder why your read-only research agent suddenly has write access.

Compared to Claude Code's Task tool, the Codex model is more controllable, more verbose to invoke, and more honest about what it is doing. Claude's version is closer to magic. Codex's version is closer to a build script. For the kind of structured multi-reviewer work that drydock does, the Codex model is actually a better fit. You name your reviewers, you set their configs, you call them by name. No guessing.

The honest gap: a Claude Code skill that wants to dispatch parallel Codex agents has to write the prompt that Codex parses into agent jobs. Right now the skills do that implicitly through phrasing. There is room to make that more deliberate.

Chapter four: hooks, where Codex is still catching up

This is the chapter where the honest answer is that Codex is behind.

Hooks are deterministic scripts that run during the Codex lifecycle. The shape is the same as Claude Code. You define a matcher, you provide a command, the command runs at the right moment. The events are similar too. Codex supports PreToolUse, PostToolUse, PermissionRequest, SessionStart, UserPromptSubmit, and Stop. Configuration goes in tilde slash dot codex slash hooks dot json or inline in config dot toml under bracket bracket hooks dot, followed by the event name.

The mechanics work. Your hook script receives JSON on standard input, decides what to do, writes JSON back on standard output. The decision can be allow, deny, or pass through. Multiple matching hooks run in declaration order. Any deny wins. Same architectural shape Claude Code shipped first.
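A minimal sketch of that contract in Python, to make the shape concrete. The JSON in, JSON out flow and the any-deny-wins rule come from the description above. The field names tool underscore input, command, decision, and reason are hypothetical stand-ins, not the confirmed Codex schema, and the blocked fragments are just examples.

```python
import json

# Example fragments to refuse; adjust to your own policy.
BLOCKED_FRAGMENTS = ("git push --force", "mkfs")


def decide(event: dict) -> dict:
    """Deny risky shell commands; return an empty object to pass through."""
    # Hypothetical field names: "tool_input" and "command".
    command = event.get("tool_input", {}).get("command", "")
    for fragment in BLOCKED_FRAGMENTS:
        if fragment in command:
            return {"decision": "deny", "reason": f"blocked: {fragment}"}
    return {}  # stands in for "pass through" in this sketch


def run_hook(raw: str) -> str:
    """Take the JSON text a hook would read and return the JSON text it would write."""
    return json.dumps(decide(json.loads(raw)))


def combine(decisions: list[str]) -> str:
    """Merge several hooks' decisions: any deny wins, then allow, else pass."""
    if "deny" in decisions:
        return "deny"
    if "allow" in decisions:
        return "allow"
    return "pass"


print(run_hook('{"tool_input": {"command": "git push --force origin main"}}'))
print(combine(["allow", "pass", "deny"]))
```

In a real hook script you would feed run underscore hook the text read from standard input and print the result to standard output. Keeping the decision logic in a plain function means it can be tested without Codex in the loop.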

Here is the gap. As of recent versions, Codex hooks reliably fire for shell or Bash tool calls but not for apply underscore patch file edits or most MCP tool calls. There is an open issue, number sixteen seven three two, tracking that. There is also a separate issue, nineteen three eight five, asking why PreToolUse rejects the additional context field that exists in the Claude version. The team has called these Claude-style hooks publicly, and several pieces of Claude-style behavior are still being filled in.

What this means in practice. If your governance is built around shell command auditing, Codex hooks will work for you today. Cost logging on each Bash call, automatic redaction on sensitive paths, blocking dangerous commands before they run. All fine. If your governance is built around file edits or MCP tool calls, Codex hooks have holes. Most of the file-changing work in Codex happens through apply underscore patch and MCP, both of which currently slip past the hooks.

For the studio's purposes, this matters less than it might. The hooks pattern that solved Korpen messaging used a PostToolUse polling approach for Claude Code. The same pattern in Codex would work for shell calls but not for the apply patch path that Codex actually uses for most edits. So hooks-based notification in Codex is partial right now. You would catch the moments where Codex shells out, you would miss the moments where Codex edits files directly through the patch tool.

The advice is straightforward. Use Codex hooks if your audit story is shell-centric. Skip them, or supplement with an MCP-based audit channel, if your audit story needs to see file edits.

There is one thing Codex got right here that Claude Code did not. The configuration has both an inline TOML form and a separate JSON file form, and if you put hooks in both at the same layer, Codex loads both and emits a warning. That is a mature configuration design. Project-local hooks only load when the project's dot codex folder is trusted, which is a sensible default. User-level hooks load independently of project trust.

So Codex hooks: real, working, shell-biased, expanding. Use them with eyes open about the coverage gap, which the team looks likely to close over the next quarter or two.

Chapter five: practical upgrades for the secondary-tool setup

Time to land the plane.

Five concrete moves to use Codex CLI more directly while keeping Claude as the primary harness. Numbered for shopping-list ergonomics.

One. Add a project-scoped AGENTS dot M D to repositories where Codex sees real work. Codex walks the directory tree on every run and merges instructions from every layer along the path. That is more granular than a single home-level Claude dot M D. A repo with a payments folder can have a separate AGENTS dot override dot M D inside that folder, which Codex respects on top of the repo-level file. Use this for the studio's larger repositories. The cost is one file, the value is per-directory rules that travel with the code.
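A rough sketch of that layering in Python, for anyone who wants to see which files would be in play for a given working directory. It walks from the repo root down to the folder you are in and collects one instruction file per level. The rule that an override file replaces the plain file at the same level is an assumption made for this sketch, and the function name is made up. Codex's own lookup may differ in the details.

```python
from pathlib import Path
import tempfile


def instruction_files(repo_root: Path, workdir: Path) -> list[Path]:
    """List instruction files from the repo root down to workdir, outermost first."""
    root = repo_root.resolve()
    target = workdir.resolve()
    # Every directory on the path from root to target, inclusive.
    chain = [root]
    for part in target.relative_to(root).parts:
        chain.append(chain[-1] / part)
    found = []
    for directory in chain:
        # Assumption: an override file wins over the plain file in the same folder.
        for name in ("AGENTS.override.md", "AGENTS.md"):
            candidate = directory / name
            if candidate.is_file():
                found.append(candidate)
                break
    return found


# Throwaway repo with a payments folder, mirroring the example above.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / "payments").mkdir()
    (repo / "AGENTS.md").write_text("repo rules\n")
    (repo / "payments" / "AGENTS.override.md").write_text("payments rules\n")
    for path in instruction_files(repo, repo / "payments"):
        print(path.relative_to(repo.resolve()))
```

Later entries in the list sit deeper in the tree, so reading them in order gives the repo-level rules first and the folder-level rules on top.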

Two. Use the built-in dollar skill installer to pull two or three community skills that complement, not duplicate, the existing Claude skills. The dollar plan skill is interesting. The dollar review skill ships built-in and runs an independent review pass before commit. Try those before writing any new Codex skills locally.

Three. Define profiles in config dot toml. One named fast that uses the smaller model with low reasoning for codebase exploration and second-opinion calls. One named careful that uses the frontier model with high reasoning for reviews. Use minus minus profile fast for the cowork loop and minus minus profile careful for drydock-style reviews. This replaces ad-hoc model selection in the wrappers.

Four. For headless calls that already exist as Claude Code skills, switch to using minus minus output-schema with a JSON schema file. Right now those calls return free-form text that Claude has to parse. With a schema, Codex returns structured JSON that can be deserialized directly. This removes a class of fragility and makes the disposable-auditor pattern more reliable.
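A hedged sketch of what the wrapper side could look like, in Python. The exec subcommand and the output-schema flag come from this report. The schema fields verdict and findings, the argument order, and both function names are hypothetical. Nothing here actually launches Codex; it only builds the argument list and checks a reply.

```python
import json

# Hypothetical schema for a review verdict; field names are illustrative.
REVIEW_SCHEMA = {
    "type": "object",
    "properties": {
        "verdict": {"type": "string", "enum": ["pass", "fail"]},
        "findings": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["verdict", "findings"],
    "additionalProperties": False,
}


def build_command(prompt: str, schema_path: str) -> list[str]:
    """Argument list for a headless call with a schema file attached."""
    # Assumption: the flag goes before the prompt.
    return ["codex", "exec", "--output-schema", schema_path, prompt]


def parse_review(raw: str) -> dict:
    """Deserialize the reply and fail loudly if a required key is missing."""
    data = json.loads(raw)
    missing = [key for key in REVIEW_SCHEMA["required"] if key not in data]
    if missing:
        raise ValueError(f"reply is missing keys: {missing}")
    return data


print(build_command("Review the staged diff.", "review.schema.json"))
print(parse_review('{"verdict": "fail", "findings": ["unchecked null"]}'))
```

The point of the second function is the failure mode. With free-form text a bad reply degrades quietly. With a schema and a strict parse, a bad reply raises an error that the calling skill can see and retry.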

Five. Consider exposing Codex as an MCP server using codex mcp serve. Right now the wrappers shell out to codex exec. An MCP server would let Claude Code call Codex as a tool natively, which is cleaner for streaming and for session continuity. The cost is a small daemon. The benefit is cleaner glue.

Two patterns to leave on the cutting room floor. Codex Cloud tasks are interesting but require an environment configuration the studio does not yet have. Custom subagent TOML files are powerful but probably overkill until there is a specific recurring agent role that earns its own config. Both can come later.

The frame to take away. Codex is a different tool with different strengths. It is more configurable than Claude Code in some places, less mature in others. The headless wrapper pattern that already exists in this studio gets the obvious value out of it. Going one layer deeper, into project-scoped AGENTS files, structured-output exec calls, and a small set of installed community skills, gets the next tranche of value without fighting the secondary-tool framing.

Claude stays primary. Codex gets sharper. That is the play.

End of report.