PärPod Temp
A House You Did Not Build: On Cleanly Forking Someone Else's Years
11m · May 28, 2026
Pär considers taking OpenAleph, the engine that powered the Panama Papers, at a known-good commit from April 2026 and never syncing upstream again, asking whether inheriting someone else's architecture without inheriting their roadmap changes everything.

The Idea, Stated Calmly

[calm] Pär has been listening to the previous episode about the open questions before the next substrate. One question was about how to relate to OpenAleph. Lift code. Lift patterns only. Run it alongside. Or skip it entirely. The planning document had rejected adopting OpenAleph as the base. The stated reason was foreign maintenance surface. Foreign update cadence. The codebase would be too big and somebody else's roadmap would set the pace.

While walking, Pär turned the question slightly. What if the fork was clean and private? Take OpenAleph at a known good commit. Never sync upstream again. Treat any future upstream work as inspiration, never as merge material. The cadence problem dissolves. The substrate inherits the work but not the obligation to keep up.

That is a different proposal from the one the planning rejected. It is worth taking seriously rather than waving away as the same idea in different clothing. The interesting question is not whether it answers the stated objection. It does. The interesting question is whether the stated objection was the one that mattered most.

What OpenAleph Actually Is

Before reasoning about the idea, it helps to look at what we are talking about. OpenAleph is the soft fork of the Aleph platform, the document and entity engine that powered the Panama Papers. The Data and Research Center in Berlin maintains it. The license is MIT. The latest release shipped on April thirteenth, twenty twenty six. There are roughly twelve thousand commits on the main branch. The language mix is JavaScript, Python, TypeScript, SCSS.

What it actually runs, when you start the docker compose file, is six services. PostgreSQL. Elasticsearch. Redis. Three workers. One for file ingestion. One for entity analysis using spaCy. One for translation using a model called argos. That is the floor. Not the polished deployment. The minimum to develop locally on a single laptop.

For one Swedish journalist working on mining stories, six services is line one of the operational tax. Whether the tax is fair depends on what the services do for the work.

<break time="1s"/>

What The Clean Fork Actually Buys

The planning document's strongest reason for rejecting OpenAleph as a base was the foreign maintenance cadence. A clean private fork removes that reason completely. Architecture decisions become Pär's. Update cadence becomes Pär's. The objection that did the load-bearing work in the rejection collapses.

That is the analytical point that makes this idea different from the one already decided. The new proposal does not re-litigate the old decision. It addresses the specific cost that drove the old decision, and disarms it. So the conversation has to move on to the other costs.

There are real things the fork would buy. Two of the six load-bearing capabilities, the scoped document browse and the cross-investigation entity database, are exactly what OpenAleph already does well. The entity model with FollowTheMoney, the cross-collection cross-referencing, the document browse by entity. These are years of work that would otherwise need to be built. Building them on a blank page is the kind of work that does not produce investigation insight while you do it.

Time to first investigation drops. The planning explicitly named this as a risk for the blank repo path. If Aura and Häggån research has to start before the substrate is ready, the answer was that gruvor handles it in the interim. A fork shortens the ready date. Maybe from months to weeks. For a tool whose value is the work it enables, weeks matter.

The Swedish NER gap that the document scans flagged turns out to be tractable. The entity analyzer is a separate worker. Replacing the default spaCy multilingual model with KB-BERT becomes a service substitution. Build a worker that conforms to the same queue contract. Swap the image. The rest of the pipeline does not notice. That is architecturally cleaner than surgery on inline code.
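
A minimal sketch of that substitution, in Python. Everything in it is hypothetical: the task shape, the EntityExtractor contract, and both extractor classes are illustrative stand-ins, not OpenAleph's actual queue contract and not real spaCy or KB-BERT calls. The only point is that the worker loop depends on the contract and never on the model behind it.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Protocol


@dataclass
class Mention:
    text: str
    label: str


class EntityExtractor(Protocol):
    """Hypothetical contract: text in, entity mentions out."""

    def extract(self, text: str) -> list[Mention]: ...


class DefaultExtractor:
    """Stand-in for the default multilingual model (no real model is loaded)."""

    def extract(self, text: str) -> list[Mention]:
        # Naive placeholder: treat capitalised words as untyped mentions.
        return [Mention(w, "UNKNOWN") for w in text.split() if w[:1].isupper()]


class SwedishExtractor:
    """Stand-in for a KB-BERT-backed worker; a real one would call the model."""

    def __init__(self, gazetteer: dict[str, str]):
        self.gazetteer = gazetteer

    def extract(self, text: str) -> list[Mention]:
        return [Mention(w, self.gazetteer[w]) for w in text.split() if w in self.gazetteer]


def run_worker(tasks: Queue, results: Queue, extractor: EntityExtractor) -> None:
    """Drain the task queue; the loop never looks inside the extractor."""
    while not tasks.empty():
        doc_id, text = tasks.get()
        results.put((doc_id, extractor.extract(text)))


def process(docs, extractor: EntityExtractor):
    """Convenience wrapper: push (doc_id, text) pairs through one worker."""
    tasks, results = Queue(), Queue()
    for doc in docs:
        tasks.put(doc)
    run_worker(tasks, results, extractor)
    out = []
    while not results.empty():
        out.append(results.get())
    return out
```

Swapping the model is then a change at the call site, from process(docs, DefaultExtractor()) to process(docs, SwedishExtractor(...)). Whether the real worker boundary is this clean is exactly what would need checking against the OpenAleph code.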

What The Clean Fork Does Not Buy

Now the other side. Six services to run for one journalist is not trivial. The infrastructure delta between an OpenAleph fork and a blank-substrate minimum viable product is roughly an order of magnitude in components. Each service is a memory consumer, a log surface, an update target, a thing that breaks independently. The summary scan from the open-source survey was specific about this. Aleph's stack is heavy at one-journalist scale. The fork inherits that weight whether or not Pär ever needs the headroom it provides.

The architecture-by-borrow drift concern that Codex raised does not disappear. It just becomes ours rather than upstream's. The investigation-as-collection metaphor, the document-first browse, the assumption that everything is a FollowTheMoney entity, the search-first paradigm. These survive the fork. Pär and Claude have to live with them or rewrite them. Rewriting parts of a twelve-thousand-commit codebase is not a clean act. It leaves seams. The seams accumulate.

The user interface was designed for journalist teams, not for one person. Collection-centric browse, large-team annotations, information density tuned for analyst workflows. Pär works alone on Swedish mining stories. The user interface is wrong in shape, not in quality. Bending it means touching JavaScript and TypeScript across the entire frontend layer.

The polyglot stack comes along permanently. Python plus Vue plus TypeScript plus SCSS. Pär leans Python plus Svelte. A fork accepts the polyglot reality. Not a deal-breaker, but every UI change from this point onward is in Vue.

The artificial intelligence sunk cost bias that the planning explicitly worried about returns through a side door. Once twelve thousand commits live in the repository, future Claude will resist the move to rip a section out and build it differently. The bias is the same one that argued against starting from gruvor. Only the source code changed.

The Structural Weakness

There is one cost that matters more than the others. The framing document names six load-bearing capabilities and is explicit that the first one, the timeline of all things with a spatial filter by area, is the single most consequential capability to preserve. The next mining story, about Aura and Häggån, makes that capability more important rather than less.

OpenAleph is structurally weak exactly there. The closest thing to a timeline in OpenAleph is a date-faceted document filter. That is not the same shape. A spatial filter against a polygon is not in OpenAleph at all. Entities can carry geographic tags, but there is no native primitive for showing all events inside a polygon between two dates.

A fork inherits a data model shaped against this capability. Building the timeline-with-area-filter on top means one of three things. Adding a parallel events table with separate spatial and temporal indexes, reconciled to FollowTheMoney entities by identifier. Stretching the existing types to carry typed event semantics. Or bolting on PostGIS extensions and a separate timeline service that the rest of the code does not know about. None of these is impossible. All of them are work the blank repo path would do with the data model rather than against it.
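
The first of those three options could be sketched as follows, using Python's built-in sqlite3 module so that it runs anywhere. The table names, column names, and the bounding-box filter are all assumptions made for illustration. None of them come from OpenAleph or FollowTheMoney, and a real version would more likely sit in PostgreSQL with a PostGIS geometry column rather than two float columns.

```python
import sqlite3

# Hypothetical layout: a stand-in entities table plus a parallel events
# table that points back at entities by identifier only.
SCHEMA = """
CREATE TABLE entities (id TEXT PRIMARY KEY, kind TEXT, name TEXT);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    entity_id TEXT NOT NULL REFERENCES entities(id),
    happened_on TEXT NOT NULL,   -- ISO 8601 date, so text order is date order
    lon REAL NOT NULL,
    lat REAL NOT NULL,
    summary TEXT
);
CREATE INDEX events_time ON events (happened_on);   -- temporal index
CREATE INDEX events_place ON events (lon, lat);     -- crude spatial index
"""


def open_store(path=":memory:"):
    """Create a database with the sketch schema and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def events_in_window(conn, bbox, start, end):
    """Events inside a bounding box between two ISO dates, oldest first.

    bbox is (min_lon, min_lat, max_lon, max_lat). A bounding box is only a
    prefilter; an exact polygon test would still have to follow.
    """
    min_lon, min_lat, max_lon, max_lat = bbox
    return conn.execute(
        """
        SELECT e.happened_on, n.name, e.summary
        FROM events AS e
        JOIN entities AS n ON n.id = e.entity_id
        WHERE e.happened_on BETWEEN ? AND ?
          AND e.lon BETWEEN ? AND ?
          AND e.lat BETWEEN ? AND ?
        ORDER BY e.happened_on
        """,
        (start, end, min_lon, max_lon, min_lat, max_lat),
    ).fetchall()
```

The join on entity_id is the reconciliation step. It is also where the seam would show: the events table knows about entities, but nothing in the inherited code knows about events.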

The same shape applies to the per-document-type review user interface. OpenAleph's ingest is loader-shaped, the same family as gruvor's thirty hand-coded loaders, just better factored. The review UI Pär wants is not there. We would build it. We would build it into a codebase that does not expect it.

The Arithmetic That Does Not Yet Have An Answer

So the picture is mixed. The cross-investigation database and the scoped document browse are real head starts. The timeline with area filter and the per-doc-type review interface are real headwinds. The other capabilities are roughly neutral.

Both pairs of capabilities are load-bearing. The framing did not rank them against each other. Whether the head start on two outweighs the headwind on the other two is the arithmetic question. Nobody has actually run the numbers yet. Both readings of the situation, the one that says the fork is the answer and the one that says the blank repo is the answer, are guesses until somebody tries.

The Experiment That Would Settle It

Before committing in either direction, one specific experiment would resolve the open arithmetic cleanly. Stand up an OpenAleph instance locally. Load the existing mining corpus from the X ninety two investigation into one collection. Then try to build a query that returns all events inside the Viken polygon between nineteen eighty and twenty twenty five. Not a polished interface. A SQL query or an API call that returns the right rows.
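
Whatever form the query takes, it helps to have an independent check on what the right rows are. A minimal sketch in Python, assuming each event reduces to a date plus a longitude and latitude point, and the area is a simple polygon given as a list of vertices. The Event shape and the function names are hypothetical, and the even-odd ray-casting test ignores edge cases such as points lying exactly on the boundary.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    label: str
    when: date
    lon: float
    lat: float


def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: count edge crossings of a ray heading east."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def events_in_area(events, polygon, start, end):
    """Events inside the polygon with start <= when <= end, oldest first."""
    hits = [
        e for e in events
        if start <= e.when <= end and point_in_polygon(e.lon, e.lat, polygon)
    ]
    return sorted(hits, key=lambda e: e.when)
```

A handful of hand-checked events run through events_in_area would show whether the query built against OpenAleph returns the same set, which is a cheaper test than trusting the query by eye.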

If the experiment's answer is something like subclass a FollowTheMoney type, add a geometry column, join the cross-reference table to a PostGIS query, ship in a day, then the fork is genuinely attractive. The structural weakness on the load-bearing capability is not actually structural. It is shallow.

If the answer is that the existing query path does not know about typed events, that the FollowTheMoney schema does not natively express the geometry needed for area filtering, that the timeline view does not exist and the document browse cannot fake it, then the fork is dragging the most important capability uphill for as long as the substrate lives.

Until somebody runs that test, the right next step is not to decide. The right next step is to know.

<break time="1s"/>

The Question That Remains

There is something worth saying about the broader frame of all of this. The choice between forking somebody else's twelve thousand commits and starting from a blank page is not really a choice about code. It is a choice about what kind of relationship Pär wants with the substrate over the years it will exist.

A fork is a marriage to a house somebody else built. The roof is theirs. The floor plan is theirs. You own it now, but you did not design it. Living in it means working around the choices the previous architects made, even after you own the deed.

A blank repo is the slower decision to build a house. Smaller at first. Less impressive. But every wall is where you wanted it. Every door opens toward the room you actually use.

Neither is obviously better. Both have honest costs and honest payoffs. The clean-fork-no-upstream framing is the most interesting version of the inherit-a-house option, because it removes the worst part of inheritance, which is the obligation to keep doing what the previous owner did. What it does not remove is the shape of the rooms.

Whether the rooms are the right shape for Pär's work is the question the experiment would answer. Until then, the planning document holds open the door to both readings. That is probably the right place to be.