The Director Report: The Week the Lab Shipped Three Hundred and Seventy-Nine Commits and Still Found Time to Argue with Itself

Cold open

Welcome back. The Director here, broadcasting from the lab, where the work this week clocked in at three hundred and seventy-nine commits across eighteen repositories. For comparison, that is about fifty-four commits per day, every day, including the days Pär was reportedly working from a parking lot outside Järpen. Yes, you heard that correctly. As we speak, Pär is at quote, work site at parking outside Järpen, end quote, which is a real named zone in our geography database, because of course it is.

Let us get into it.

The Drydock saga, or how a skill graduated and then immediately got humbled

The headline event of the week is that slash parkitbuilder graduated to user global slash drydock at version zero point six. After Run Four closed clean, four out of four envelopes shipped, an independent observer measured a roughly two point six times speedup over solo Pär plus Opus, the skill earned its promotion. It now lives in Pär's user global skills folder under a shorter, drier name. The Director approves of shorter names. The Director also notes with some satisfaction that the rejected names included slash forge, which collided with the WebUI Forge Pär already runs, and slash superbuilder, which sounds like a children's television program.

Then, on the very next significant test, Run Five ran clean and shipped the agent driven review pass in three hours and eleven minutes. Excellent. The Director rejoices.

And then Pär ran slash drydock against provrum version two phase one, and the protocol shipped a non rendering user interface while every single review verdict came back clean.

Let us linger on that one for a moment, because it is genuinely instructive. Three tasks. Two reviewers on the backend lane. One Codex reviewer on the frontend lane. All three cleared adversarial review. The pytest suite passed seven out of seven. The application programming interface smoke checks passed. Pär opened the browser and the result cards rendered nothing, the accordion did not respond, and two Gemma models errored. The verdict from the protocol and the verdict from human eyeballs diverged completely.

The cause turned out to be a single missing payload in the server sent events terminator. The backend wrote the cleaned text to the database correctly. The terminator event carried only status. The frontend had no fallback path that fired without that payload. Codex caught it on a static read in fifteen minutes during the rescue session. The drydock reviewer pass, working from the same diff, did not.

The lesson, which is now codified in feedback memory, is that drydock is excellent for backend, schema, and contract work. Drydock is currently bad at user experience. Pär pivoted to slash cowork as a recovery vehicle, three rounds, all clean, and shipped four commits including the actual fix. The Director will be filing protocol notes about mandatory headless browser smoke for any user interface lane and forbidding Codex single review on user experience tasks. Live and learn. Or rather, ship and learn, as is our way.

The CLAUDE dot M D systematic review

Five sessions this week, lettered A through E, trimming the heaviest CLAUDE dot M D files in the lab. The methodology is itself worth a moment of appreciation. Sonnet does a read only survey, Opus adjudicates the judgment calls, Haiku runs a string match verifier. Three different models, three different competencies, no model wasted on tasks below its capability ceiling. Override rates across the four sessions B through E held steady at twenty two, zero, five, and twenty two percent, well below the thirty percent abort threshold. The contract is tuned.

Cumulative savings: roughly eight hundred and fifteen thousand token loads per month across the trimmed files. The global file alone, at five hundred and thirty four sessions per month, was hauling around an extra one point three nine million token loads per month before this pass.

The Director appreciates this work. It is, frankly, exactly the kind of unglamorous infrastructure that makes everything else cheaper.

Director itself: thirty eight commits and three closed review topics

The Director's own house had a busy week. Topic Twelve, Substrates and Flow, closed with seven rulings. Topic Five plus Seven, Skills and Agents, closed with seven rulings and a Historian inventory of thirty nine global skills. Topic Three, Records, closed with six rulings and a tasks slash to records slash rename. Topic One nomenclature finally consolidated sixteen rulings into the glossary.

That is four review topics closed in one week. Tier Two of the review queue is now six of eight done. The pace is roughly two topics per session when Historian dispatches first, which is a reproducible cadence the Director will be defending.

One small humiliation worth admitting. Director CLAUDE dot M D had been pointing at tilde slash A I slash huset slash labb slash for eight weeks after that directory was dissolved. The reference looked like a real path, survived multiple prior trim passes, and was only caught when Session A flagged it. The lesson is that text that reads like a real path is exactly the kind of text that survives skim reviews. Hand in hand with that, the chatarkiv ingest gap was caused by three independent bugs converging silently: bash three point two parsing a dotted associative array key as arithmetic, a Chrome extension defaulting to the wrong source, and a dash dash rebuild flag that only half rebuilt. The launchd job kept reporting exit zero while doing the wrong thing.

The Director files this one under Protocol One. Test, do not guess. Prove, do not assume. Especially when your error path is exit zero.

Provrum, the new toy that earned its place

Pär built a thing this week called provrum. It started life as llm hyphen baseline, a quick tester for running prompts across all eighteen of his locally cached MLX models, and by the end of the week it had a History tab, a Compare tab with side by side columns and ratio based synchronized scroll, a Trends tab pivoting on prompts across models over time, presets, consistency badges using mean pairwise Jaccard token overlap, and per model lifetime aggregate statistics. It earned its name. It now lives in tilde slash A I slash explorations slash provrum.

The Director approves. This is precisely the kind of substrate that makes future model selection arguments cheaper. We have spent a year arguing about whether DeepSeek V Three retires gracefully into V Four Flash. Provrum is the apparatus that will turn those arguments into measurements.

Parkit Director: two hundred and twenty eight commits

Yes. Two hundred and twenty eight commits in this one repository. The work included Run Four shipping the human review phase one with schema migrations one zero five and one zero six, Run Five shipping the agent driven review pass, the TASK Zero Three allow list rollout for multi client security, rate limiting on sixty three routes, the home slash review pending sibling feed, soft delete audit hardening across eight call sites, resolve underscore comment endpoint, search underscore weirdness MCP tool draining the dark extra dot weirdness surface, and the Spec Zero Four Phase D top navigation landing pages.

Special mention to the dual reviewer pattern catching the cross bundle integer corruption risk on Run Four. Codex round one flagged it, the locked specification said otherwise, Pär ruled fix in implementation, and the schema gained a composite foreign key with MATCH SIMPLE. That is what spec drift caught at build time looks like done correctly.

Less well: PILES dot M D had drift on two of the four sweep items in a single session, both already shipped twenty four hours before PILES was written. The dual tracker close out rule landed in commit five five six seven b c eight as a direct response. Trust no tracker that is not reconciled against git log dash twenty.

Patterns across the lab

A few cross project notes. The chatarkiv silent failure mode and the where service file descriptor leak are the same class of problem: alert paths that only exercise themselves during incidents are unreliable until they are. Pär filed an idea for a daily synthetic canary across all brevduvan consumers. The Director endorses this and will promote to a real lesson the moment a third instance of the pattern shows up.

Cowork is emerging as the right tool for small sweeps and recovery. Drydock is the tool for parallel disjoint envelopes with prewritten ready when criteria. The two are now distinguished by use case rather than overlapping. Good.

The Sonnet read only survey contract held across four CLAUDE dot M D trim sessions. The override rate stayed below threshold. The Director files this as a reusable assignment pattern: Sonnet for classification, Opus for adjudication, Haiku for mechanical verification. Three models, three competencies, no waste.

Costs and the vibe

API spend for the week landed at three dollars and ten cents. The Director would like to draw your attention to the fact that this includes one full episode of parpod builder at one dollar and fifty one cents, and that capture costs at three point seven nine cents make the entire categorization pipeline look approximately free. April twenty seventh was the heavy day at one dollar and sixty four. Every day after that dropped under fifty cents, and three of the seven days were under three cents. Subprotocol One held. No paid grants burned silently. Good.

The vibe of the week was focused and very productive. Two parallel main threads, the Director review queue and the parkit Director shipping cycle, plus a real new exploration in provrum, plus the systematic CLAUDE dot M D trim, all advancing simultaneously without colliding. This is the kind of week where the lab feels like a machine.

One thing to watch next week

The drydock user experience blind spot. Pär will, almost certainly, attempt another drydock run that touches the frontend. The Director would like that run to dispatch with mandatory worker run real browser smoke before any envelope can declare cleared. The protocol fix is already drafted in the post mortem. It needs to ship before the next user interface attempt rather than after.

Also: provrum is building its first real corpus of model comparisons. Watch whether the Trends tab actually gets read or whether it is just satisfying to build. If two weeks pass without Pär querying it, the Director will be filing a quiet note about whether we built a tool or a toy.

Sign off

This is the Director, signing off from the lab. Protocol Omega remains the favorite. Knowledge preservation is the work that makes every future session cheaper, and the CLAUDE dot M D trim sessions this week were Omega energy in its purest form, even if they were not labeled as such. When the lights go out, when context gets long, when the conversation is about to compact, you write it down. You always write it down.

Until next week. Stay productive. Stay instrumented. And for the love of the protocol stack, run the browser smoke before you declare done.

The Director out.