Managed Agents: Or You Could Just Use the Runtime

Strip and Stand

Strip assumed complexity, find the simple layer, build on that. That's an old working rule, the one Pär picked up from the Hugo static-site project and has carried through most of his tooling since. Every time something looks intimidating, ask whether there's a smaller layer underneath you can just stand on. Often there is. Sometimes there isn't. The hard part is knowing which case you're in. The skill is not the rule itself; the skill is the diagnostic.

Anthropic shipped a new layer on the eighth of April, twenty twenty-six. They called it Claude Managed Agents. It is in public beta. It is an agent runtime, a piece of hosted infrastructure for running the kind of long-running, tool-calling, multi-step Claude loops that several previous episodes in this series have circled. You do not write the loop. You declare an agent, declare an environment, start a session, and Anthropic runs the loop for you. The question for any operator with an existing custom agent runner, for example the one running on Pär's Scaleway server right now, is which layer is now the simple one. That diagnostic is the spine of the episode.

The mechanic, briefly, before the editorial. Managed Agents exposes four object types and one beta header. The beta header is managed dash agents dash twenty twenty-six dash oh four dash oh one. The four objects are Agents, which are persisted versioned configs that bundle a model choice, a system prompt, a tool list, and the M C P servers the agent can reach. Environments, which describe the container the agent runs inside, including networking rules. Sessions, which are running instances that point at one Agent and one Environment. And Vaults, which hold OAuth credentials so the agent can authenticate against external services without your code touching the secret. You create the Agent and Environment once, in a setup step that lives outside the hot path. Every run after that opens a Session.

The session stream is the agent loop, viewed from outside. You open a server-sent-event connection to the Session. The agent reasons, the agent calls a tool, the tool returns, the agent reasons again, and the stream emits events the entire time. If the connection drops, you reconnect without losing state. The agent runs until it emits a session-status-idle event, or until you interrupt it. Pricing is standard Claude Platform token rates, plus eight cents per session-hour while the agent is active. Idle time is free.
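For readers following along in the show notes, here is a minimal sketch of that consume-and-reconnect pattern. It does not call the real service. FakeSession, Event, and run_until_idle are hypothetical names. Resuming by sequence number is an assumption made for illustration, since the episode does not say how the runtime resumes a stream. The event name strings are also placeholders.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class Event:
    seq: int   # position in the session's server-side event log (assumed)
    kind: str  # placeholder event name, not a confirmed wire format


class FakeSession:
    """Hypothetical stand-in for a hosted session; the log lives server-side."""

    def __init__(self, log: List[Event], drop_after: Optional[int] = None):
        self._log = log
        self._drop_after = drop_after  # simulate one dropped connection

    def stream(self, after_seq: int = 0) -> Iterator[Event]:
        for event in self._log:
            if event.seq <= after_seq:
                continue  # already delivered before the reconnect
            yield event
            if self._drop_after is not None and event.seq == self._drop_after:
                self._drop_after = None  # drop only once
                raise ConnectionError("stream dropped")


def run_until_idle(session: FakeSession) -> List[str]:
    """Consume events, reconnecting after drops, until the idle event."""
    seen: List[str] = []
    last_seq = 0
    while True:
        try:
            for event in session.stream(after_seq=last_seq):
                seen.append(event.kind)
                last_seq = event.seq
                if event.kind == "session-status-idle":
                    return seen
            return seen  # stream ended without an idle event
        except ConnectionError:
            continue  # reconnect and resume after last_seq


if __name__ == "__main__":
    log = [
        Event(1, "agent.message"),
        Event(2, "tool.call"),
        Event(3, "tool.result"),
        Event(4, "session-status-idle"),
    ]
    print(run_until_idle(FakeSession(log, drop_after=2)))
```

The design point is that the state sits with the session, not the client. The caller only remembers how far it got.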

What the Runtime Actually Includes

So that is the surface. Beneath the surface is something the previous nine episodes have, between them, sketched out by accident. Prompt caching on the system prompt of the Agent, free. The Memory tool that Episode Eight walked, available as a built-in primitive that the agent can read and write across sessions. The Skills primitive Episode Nine walked, available via skill identifiers attached to the Agent definition. The Files A P I, available as a mountable filesystem; agents can read and write files that survive the session container, with a separate Files endpoint for session-scoped uploads. Programmatic tool calling, fine-grained streaming, citations on tool returns, structured outputs on intermediate decisions, all of which earlier episodes have covered separately, are folded into the runtime as defaults.

Anthropic offers two ways to build with Claude. Claude Managed Agents provides the runner and infrastructure for running Claude as an autonomous agent. The Messages A P I provides direct model access for teams who want to build their own.

That is the docs framing, stated cleanly. The Agent object is versioned, which is the lever that makes Managed Agents fit into a normal software release process. Every change to the Agent definition, every prompt tweak, every tool added, creates an immutable new version. Sessions pin to a version at creation time, so a session launched on Agent version one keeps running on version one even after version two ships. Rollback is changing which version a new session pins to. A B testing is launching parallel sessions on different versions. This is the part that makes the runtime feel like a deployment system rather than a chat A P I. The reverse of this discipline is the common mistake the docs warn against: calling agents-dot-create on every invocation, which leaks orphaned Agent records, costs the create-call latency for nothing, and makes versioning meaningless. Create the Agent once, persist the I D, reuse it forever.
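The pinning behaviour is easier to see in a toy model than in a sentence. What follows is an in-memory sketch, not the real A P I. AgentRegistry, AgentVersion, and Session are hypothetical classes, and the model string is a placeholder. It only illustrates the rule described above: updates append immutable versions, and a session keeps the version it was created on.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AgentVersion:
    version: int
    model: str
    system_prompt: str
    tools: Tuple[str, ...]


@dataclass(frozen=True)
class Session:
    agent_id: str
    pinned: AgentVersion  # fixed at creation time, never updated


class AgentRegistry:
    """Hypothetical in-memory model of versioned agent definitions."""

    def __init__(self) -> None:
        self._history: Dict[str, List[AgentVersion]] = {}

    def create(self, agent_id: str, model: str, system_prompt: str,
               tools: Tuple[str, ...]) -> AgentVersion:
        if agent_id in self._history:
            raise ValueError("agent already exists; update it instead")
        first = AgentVersion(1, model, system_prompt, tools)
        self._history[agent_id] = [first]
        return first

    def update(self, agent_id: str, **changes) -> AgentVersion:
        # Every change produces a new immutable version; old ones stay put.
        latest = self._history[agent_id][-1]
        newer = replace(latest, version=latest.version + 1, **changes)
        self._history[agent_id].append(newer)
        return newer

    def start_session(self, agent_id: str,
                      version: Optional[int] = None) -> Session:
        # Default to the latest version; pass a number to pin an older one.
        history = self._history[agent_id]
        chosen = history[-1] if version is None else history[version - 1]
        return Session(agent_id, chosen)


if __name__ == "__main__":
    registry = AgentRegistry()
    registry.create("triage", "model-placeholder", "first prompt", ("bash",))
    running = registry.start_session("triage")
    registry.update("triage", system_prompt="second prompt")
    print(running.pinned.version, registry.start_session("triage").pinned.version)
```

In this sketch rollback is just start_session with an explicit older version number. An A B test is two start_session calls with different numbers.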

A few constraints worth knowing before pricing it into a project. Managed Agents runs directly on the Claude A P I, and on Claude Platform on A W S. It does not run on Amazon Bedrock, Vertex A I, or Microsoft Foundry as of recording; the docs are explicit that those deployments fall back to the Messages A P I plus tool use. Seven client options cover it: software development kits for Python, TypeScript, Go, Java, Ruby, and P H P, plus plain cURL. C-sharp is supported for the Messages A P I but not for Managed Agents.

One more shape worth flagging before we leave the surface: the session lifecycle is not a single round trip. Once a session starts, the runtime emits a continuous stream of events, and the caller is allowed to inject user events back into that stream while the agent is still working. A correction mid-execution, a clarifying instruction, an extra piece of context the agent did not have when it started. The runtime threads the new input into the next reasoning step rather than restarting. Sessions end one of two ways. Either the agent emits session-status-idle, signalling it considers the work done and is waiting for the next instruction, or the caller sends an interrupt event and the runtime stops the loop cleanly. Deleting a session does not delete the Agent, the Environment, the Vault, the files in the mounted filesystem, or the memory the agent wrote. Those persist independently, which is what makes the runtime a real deployment target rather than a chat that vanishes when you close the tab.

A couple of capabilities sit in a different tier called research preview, gated behind a separate access form. Outcomes is the headline one: a separate grader context, running in its own window, evaluates the agent's output against a rubric you defined, and sends the agent back to fix what missed the rubric. The demo Anthropic published walks a website-performance agent through three iterations against a Lighthouse score, going from sixty-two to seventy-eight to ninety-six, the grader sending it back each time. Multi-agent coordination, advanced orchestration, and cross-session persistent memory sit in the same research-preview tier. The base runtime is public beta; the interesting orchestration bits require a separate request.
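The grade-and-send-back shape of Outcomes can be sketched as a plain loop. This is not Anthropic's implementation. The names refine_until_pass, work, and grade are hypothetical, and so are the pass threshold of ninety and the cap on rounds. The canned scores simply replay the three numbers from the published demo.

```python
from typing import Callable, List, Optional, Tuple

Grade = Tuple[int, str]  # (score, feedback for the next attempt)


def refine_until_pass(
    work: Callable[[Optional[str]], str],
    grade: Callable[[str], Grade],
    threshold: int,
    max_rounds: int = 5,
) -> List[int]:
    """Run work, grade it separately, feed the feedback back in, repeat."""
    scores: List[int] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        output = work(feedback)          # the agent's attempt
        score, feedback = grade(output)  # the grader's verdict, own context
        scores.append(score)
        if score >= threshold:
            break                        # rubric met, stop iterating
    return scores


def demo() -> List[int]:
    """Replay the demo's scores with stub functions (no model calls)."""
    canned = iter([62, 78, 96])
    attempts: List[Optional[str]] = []

    def work(feedback: Optional[str]) -> str:
        attempts.append(feedback)
        return f"draft {len(attempts)}"

    def grade(output: str) -> Grade:
        score = next(canned)
        return score, f"{output} scored {score}"

    return refine_until_pass(work, grade, threshold=90)


if __name__ == "__main__":
    print(demo())
```

The round cap matters in any loop like this, because a grader that never passes would otherwise keep the agent working indefinitely.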

Where the Runtime Wins

For the kind of system that has made enterprise agent deployment hard, Managed Agents wins on the things that are genuinely tedious to build. The OAuth dance for every external service the agent needs to reach, handled by Vaults; you create the credential once, the agent attaches at session start, Anthropic refreshes the token automatically. The sandbox the agent runs inside, where bash and file operations happen, which would otherwise require running your own container infrastructure with the right isolation rules and the right pre-installed packages.

The session-resume-across-disconnect property, which is the agent equivalent of T C P retransmission, where a multi-hour task does not restart because your client laptop fell asleep or your home connection blinked. The console-level observability, which lets you replay every event, inspect every tool call, audit every decision the agent made; most teams treat observability as an afterthought and end up wishing they had not. And the continuous-integration shape of the versioning, which lets you ship agent definitions the same way you ship application code.

The case study Anthropic has been quoting is Rakuten. Across five business functions, product, sales, marketing, finance, operations, Rakuten cut average task turnaround from twenty-four days to five days, a seventy-nine percent reduction. Each function went live in under a week. The point of the case study is not that any agent makes anything seventy-nine percent faster. The point is that with the runtime handled, five different teams could each ship in a week, which is the part that does not happen when each team has to build its own loop.

Pricing scales sensibly for this shape of work. Eight cents per session-hour, on top of normal token rates, with idle time free, means a long-running research task that is mostly waiting for a tool to return costs almost nothing for the wait time. The agent burns money when it is actively reasoning. For a team running hundreds of concurrent agent sessions across business functions, this is cheaper than running their own Kubernetes cluster of agent runners. The math is real.
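The runtime fee is simple enough to work out by hand, but a small sketch makes the idle-is-free point concrete. The eight-cent rate comes from the episode. The function names, the token costs, and the example durations are made up for illustration.

```python
SESSION_HOUR_RATE_USD = 0.08  # per active session-hour, as quoted above


def session_runtime_fee(active_seconds: float) -> float:
    """Runtime fee for one session; idle seconds are simply not counted."""
    return SESSION_HOUR_RATE_USD * active_seconds / 3600.0


def session_total(token_cost_usd: float, active_seconds: float) -> float:
    """Token spend plus the runtime fee for the active time."""
    return token_cost_usd + session_runtime_fee(active_seconds)


if __name__ == "__main__":
    # Hypothetical task: six hours of wall clock, thirty minutes active.
    fee = session_runtime_fee(30 * 60)
    print(f"runtime fee: ${fee:.2f}")

    # Hypothetical fleet: one hundred sessions, two active hours each.
    fleet = 100 * session_runtime_fee(2 * 3600)
    print(f"fleet runtime fee: ${fleet:.2f}")
```

Under these assumptions the six-hour task pays for half an hour, and the token bill will usually outweigh the runtime fee.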

Where DIY Wins

[serious] So far, the case for Managed Agents has been about the cases where the runtime is doing genuinely hard work for you. Now the other side, which the briefing for this episode explicitly asked us to be honest about.

The runtime only runs Claude models. If you build production workflows on this infrastructure and later want to switch models or reduce dependency on Anthropic's cloud, migration requires rebuilding the orchestration layer.

That is one criticism, the lock-in one. The open-source alternatives, the CrewAI and Cabinet and Multica frameworks, all keep the orchestration layer in your code rather than in Anthropic's cloud. The other criticism, the one that matters more for an operator like Pär, is topology. Pär runs an agent runner called Director, an M C P server on his Scaleway machine, which serves a custom set of tools that read and write a custom PostgreSQL schema called Pärkit. Pärkit holds location data, B M W telemetry, calendar zone transitions, episode digests, project state. The tools the Director runner exposes are not generic agent tools; they are operator-specific. Get my location. Describe my day. List zone transitions in the last seven days. Append a session digest with these fields. Render a podcast script. These tools assume the schema. The schema assumes Pär.

Managed Agents is built to provision a sandbox container with a generic tool set, attach a memory store and a skills bundle, and orchestrate the loop. The runtime does not know about Pärkit. It cannot know about Pärkit, because Pärkit is one specific person's database, sitting on one specific machine, talking to one specific Apple Health pipeline and one specific car. The Director runner wraps that database in the agent loop, presents the right tools to the right Claude instance, and answers the questions a generic runtime cannot answer because a generic runtime is not the operator. The topology is the value.

The second DIY case is volume. Pär is treasurer of a small Swedish association called KallBadet, which runs the cold-water bathing spot in the village. KallBadet has roughly three hundred members and three hundred and nine bookkeeping vouchers for the most recent fiscal year. Pär closed the year using a Python pipeline that produces a S I E four export, the Swedish bookkeeping interchange format, and imports it into Fortnox; this year the same pipeline targets Bokio. Three accounts, three hundred and nine vouchers, all balances verified. That is the entire annual workflow. The cost of running it on Managed Agents would be the cost of paying the per-session fee, learning the Agent and Environment configuration and the Vault setup, and giving up the loop you understand in detail. None of which a three-account hobby workflow earns. A Python script in a folder is the right shape for that volume of work, and the cost of running it once a year is whatever the electricity bill says.
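To give a sense of how small that script-in-a-folder layer is, here is a sketch of the kind of balance check such a pipeline might run before export. This is not Pär's actual pipeline. The function names, voucher identifiers, account numbers, and amounts are all invented for illustration. The only rule it relies on is the double-entry one: every voucher's signed rows must sum to zero.

```python
from decimal import Decimal
from typing import Dict, List, Tuple

Row = Tuple[str, Decimal]  # (account number, signed amount: debit +, credit -)


def unbalanced_vouchers(vouchers: Dict[str, List[Row]]) -> List[str]:
    """Return the ids of vouchers whose rows do not sum to zero."""
    return [
        voucher_id
        for voucher_id, rows in vouchers.items()
        if sum((amount for _, amount in rows), Decimal("0")) != 0
    ]


def account_balances(vouchers: Dict[str, List[Row]]) -> Dict[str, Decimal]:
    """Sum every row per account across all vouchers."""
    totals: Dict[str, Decimal] = {}
    for rows in vouchers.values():
        for account, amount in rows:
            totals[account] = totals.get(account, Decimal("0")) + amount
    return totals


if __name__ == "__main__":
    # Invented sample data: two vouchers across three accounts.
    sample = {
        "A1": [("1930", Decimal("500.00")), ("3000", Decimal("-500.00"))],
        "A2": [("6000", Decimal("120.50")), ("1930", Decimal("-120.50"))],
    }
    print(unbalanced_vouchers(sample))
    print(account_balances(sample))
```

Decimal is used instead of float so that cents add up exactly, which matters when the check is an equality against zero.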

Two cases. Topology is the value, or the volume does not justify the overhead. Both are real reasons to keep the custom runner. Neither is a criticism of Managed Agents; both are about what the simpler layer happens to be in a given case.

The Hugo Question, Answered

[calm] Back to the principle, with the answer this episode actually allows.

The rule, strip assumed complexity, find the simple layer, build on that, does not say the simple layer is always lower. It says find it. For a team building customer-support-ticket-triage at the scale where a hundred concurrent sessions is normal, Managed Agents is almost certainly the simpler layer. The infrastructure work it abstracts away is real, the patterns it bakes in are sensible, and the pricing makes sense at that volume. For a solo operator with a custom database, a custom B M W telemetry feed, a custom Apple Health pipeline, and a custom podcast renderer, the simple layer is the three-hundred-line Python script that already exists. Migrating that to a managed runtime is not a simplification; it is a complication wearing a managed-runtime hat.

The interesting cases sit between those two. The team that built its agent runner in early twenty twenty-five, because nothing else existed, but never had a database integration that mattered, probably gains from the move now. The team that built its runner because their custom routing logic was the thing they actually sold, probably loses. The diagnostic question is: would a generic runtime be allowed to run my loop? If yes, the runtime is the simpler layer. If no, the runtime is one more piece of glue to maintain on top of the loop that already needs to exist.

For Pär and his Director, today, the answer is: Director stays. The runner is the operator's lived-in workshop, and the workshop is not portable. For an Årebladet Live licensee next year, who wants to run the same operator pattern without rebuilding the runner, Managed Agents might be exactly what closes the gap. Same shape, different operator, no custom database; the runtime gives them the loop and the audit trail and the session resume and the credential vault, and the operator-specific layer drops to a small skill bundle and a thin M C P server. That is a future episode of this story, not this one.

So the runtime is real. The question of whether to use it has a clean answer, not a universal one. The Hugo rule still holds, and the rule is the diagnostic, not the result. Strip the complexity, find the layer underneath, build on that. Sometimes that layer is a managed agent runtime in another company's cloud. Sometimes that layer is a Python script in a folder on a hobbyist's laptop, and the script has been running for three years and produces a S I E four file the bookkeeper imports in under a minute. Both are the same principle, applied honestly.