PärPod Temp
02 — Build approach: blank repo with selective borrow
Episode 227m · May 28, 2026
A blank repository, starting May 26, 2026, selectively borrows function-level logic from the frozen gruvor codebase rather than inheriting its architecture: each borrowed function is rewritten from first principles, while permissively licensed open-source code may be lifted directly.

Date: 2026-05-26
Status: draft, first pass. Depends on 01-framing.md being ratified (it is, 2026-05-26 23:22).
Inputs: 01-framing.md (C1–C6 + hard constraints), Pars_view.md Q11–Q12, omega-1 §4 (what was valuable in gruvor), omega-2 §4 (the six forks), omega-2 §5 (operational micro-forks).
Purpose: specify what "blank repo with selective borrow from gruvor" means, why it is the chosen approach over the alternatives, and what choices remain open for Section 03 (function-level keep/borrow/drop) and Section 04 (MVP scope).

Pär ratified blank+borrow on 2026-05-26 23:22 over sliced-gruvor, OpenAleph-as-base, and Datashare-anchored. This document specifies that choice concretely. It does not re-litigate it.


1. The decision in one paragraph

The next substrate starts as a fresh repository, sibling to arebladet2, that borrows from gruvor at the function level — not the file or implementation level. Gruvor remains frozen in place until the new substrate is up; nothing is dragged forward by inertia. Direct code lift from permissive-license open source is explicitly fine — this is an internal tool, attribution discipline applies, but reproducing well-solved problems from scratch is not the goal. OpenAleph, Datashare, gitscrape, and similar are evaluated as candidate sources to lift from or run alongside this approach, not as alternative bases. Each function carried over from gruvor specifically is rewritten from its requirement, with gruvor's implementation read as one reference among several.


2. Why this and not the alternatives

Why not sliced-down gruvor (omega-2 Fork F or similar)

The omega-2 forks A/F treat "keep the spine, fix the entry points" as the lower-risk path. The reasons it does not get picked here:

Why not OpenAleph-as-base

OpenAleph (DARC, MIT-licensed, the open-source successor to alephdata/aleph) is a real candidate, named by Pär in Q7. Reasons it is not the base — stated honestly, since this is the rejection Codex flagged as the weakest in an earlier draft:

Why not Datashare-anchored

The omega-1 §5 sweet-spot hypothesis was "Datashare for documents + thin Swedish-pull layer + parmaps for maps." The omega-2 C1 standup found Datashare ingested 225 X92 docs in seconds with full-text search + in-browser PDF + entity browse. Real strengths. Reasons it is not the anchor:

Why blank+borrow specifically


3. What "blank" means concretely

What blank does not mean:


4. What "borrow" means concretely

Borrowing operates at the function level (not the file level) for gruvor, and at the code level (not just integration level) for permissive-license open source. The substrate is an internal tool; license discipline is real, but the framing is "use what is useful, attribute when required, do not reproduce gruvor's shape by inertia."

Four borrow tiers. The full keep/borrow/drop table is Section 03's job — this section commits to the tier shape only.

Tier 1 — Borrow the function from gruvor, rewrite the implementation. Things gruvor solved that the substrate needs: Bolagsverket-ingest, SGU/Bergsstaten pulls, the timeline model (capability, not data structure), spatial joins on EPSG:3006, PII-redaction discipline, the press-question sequencing methodology. The new substrate writes these from their requirement; gruvor's code is read as a reference, not pasted. Why rewrite-not-paste for gruvor specifically: gruvor's code carries the shape that omega-1/omega-2 flagged (4185-line CLI, route assumptions, scheduler patterns); the requirements are clean, the implementations are not.

Tier 1B — Lift code directly from permissive-license open source. OpenAleph (MIT), gitscrape, FollowTheMoney schema (MIT), spaCy sv_core_news_lg model, KB-BERT, smaller utility libraries. Where a permissive-license (MIT/BSD/Apache) project has a clean implementation of something the substrate needs, lifting code is explicitly fine. Internal tool, no redistribution concerns for permissive licenses. Discipline: attribute in source where the license requires it, keep a THIRD_PARTY.md log of where things came from, and re-evaluate license compatibility if the substrate ever graduates to a public deploy.
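One way the THIRD_PARTY.md discipline could be mechanised is sketched below. It is an illustration under stated assumptions, not the substrate's actual tooling: the Lift record, its field names, the PERMISSIVE allowlist (SPDX identifiers for the MIT/BSD/Apache families), and the vendor/ path in the usage note are all hypothetical.

```python
from dataclasses import dataclass

# Assumption: SPDX identifiers for the licence families this document calls permissive.
PERMISSIVE = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


@dataclass(frozen=True)
class Lift:
    """One piece of lifted code. All field names are hypothetical."""
    component: str   # what was lifted
    source: str      # upstream project it came from
    license: str     # SPDX identifier of the upstream licence
    local_path: str  # where the code landed in the new repo


def third_party_row(lift: Lift) -> str:
    """Render one table row for THIRD_PARTY.md, refusing non-permissive licences."""
    if lift.license not in PERMISSIVE:
        raise ValueError(
            f"{lift.source} is {lift.license}: not a Tier 1B lift, "
            "needs a deliberate decision"
        )
    return (
        f"| {lift.component} | {lift.source} | "
        f"{lift.license} | {lift.local_path} |"
    )
```

Calling third_party_row(Lift("entity schema", "FollowTheMoney", "MIT", "vendor/ftm_schema.py")) yields a row ready to append to the log. A record with any licence outside the allowlist raises instead of being logged quietly, which keeps the "re-evaluate license compatibility" step from depending on memory.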

Datashare is not in Tier 1B. Datashare is AGPL-licensed and gets handled separately. The clean boundary is running Datashare as a separate unmodified service (we issue API calls to it; nothing lifted into our repo). Lifting Datashare code into the substrate or modifying it and exposing the combined result over a network — including internally over our own UI — can trigger AGPL §13 source-offer obligations on the combined work. "Internal tool" does not exempt this; AGPL's network-conveyance trigger fires regardless of public-deploy status. If Section 03 wants Datashare's document-viewer or full-text-search UI, the default path is "embed an iframe to the unmodified Datashare service," not "lift its viewer code into our repo." Modifying Datashare or lifting any of its source files needs a deliberate decision, not a shrug.
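The "embed an iframe to the unmodified service" default can be sketched as a helper that only ever produces a URL pointing at the separately running instance. This is a minimal sketch under assumptions: the function name, the base URL, and the path in the usage note are placeholders, and no actual Datashare route is implied.

```python
from html import escape
from urllib.parse import urljoin


def datashare_iframe(base_url: str, path: str, title: str = "Document viewer") -> str:
    """Return iframe markup that points at the separately running service.

    Nothing from the service's source enters this repo: the only coupling
    is a URL. base_url and path are placeholders supplied by the caller.
    """
    # Normalise slashes so the path is always resolved under base_url.
    src = urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))
    # Escape attribute values so query strings cannot break the markup.
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'title="{escape(title, quote=True)}"></iframe>'
    )
```

For example, datashare_iframe("http://localhost:8080", "/some/doc?id=1&x=2") returns markup whose src is http://localhost:8080/some/doc?id=1&amp;x=2. The design point is that the substrate's UI composes the service's pages rather than its code, which is the boundary this section describes.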

Architecture-by-borrow drift — explicit risk. Tier 1B can import an entity model, document model, ingest pipeline, or schema piecemeal along with its architectural assumptions. The assumptions are not visible at the function level but compound at the C1/C4 level. Section 03's per-candidate check has to evaluate each lift against the framing's capabilities, not just against "does this function work."

Tier 2 — Migrate the data from gruvor, drop the code. Things where gruvor's output is durable but its production code is not worth carrying: the X92 dossier corpus, the 27 dossier folders' content, the parquet snapshots, the API credentials, BankID-pulled documents. The new substrate consumes these as inputs to its own bootstrapping.

Tier 3 — Drop entirely. Things that didn't earn their build cost (omega-1 §4 list): /companies/{orgnr} expansion, /graph route shape, scheduler/LaunchAgent/doctor stack as currently implemented, write surfaces (/inbox, /dropins, /story), scaffold routes, the 4185-line CLI. Nothing borrowed; not even as reference.

Tier 1B is the change from "blank repo with selective borrow from gruvor" to "blank repo with selective borrow, including direct code lift from permissive OSS." It is the path that closes the gap on document-corpus features (where OpenAleph is mature) and Swedish NER (where KB-BERT exists). Datashare is also mature on document-corpus features, but as AGPL code it stays outside Tier 1B and is reached only as a separate unmodified service. Section 03 names specific Tier 1B candidates per capability.
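The four tiers can be written down as a small data structure, which may help Section 03 keep its keep/borrow/drop table consistent with the tier shape. This is a hedged sketch: the enum member names, the two lookup tables, and the entering_repo helper are hypothetical, and the example rows are taken from the tier descriptions in this section for illustration only, not as Section 03 decisions.

```python
from enum import Enum


class Tier(Enum):
    REWRITE = "1"    # borrow the function from gruvor, rewrite the implementation
    LIFT_OSS = "1B"  # lift code directly from permissive-license open source
    MIGRATE = "2"    # migrate gruvor's data, drop its code
    DROP = "3"       # nothing borrowed, not even as reference


# What may physically enter the new repo under each tier.
COPIES_CODE = {
    Tier.REWRITE: False,  # gruvor code is read as a reference, not pasted
    Tier.LIFT_OSS: True,
    Tier.MIGRATE: False,
    Tier.DROP: False,
}
COPIES_DATA = {
    Tier.REWRITE: False,
    Tier.LIFT_OSS: False,
    Tier.MIGRATE: True,
    Tier.DROP: False,
}

# Illustrative rows only; the real table belongs to Section 03.
EXAMPLES = {
    "Bolagsverket-ingest": Tier.REWRITE,
    "FollowTheMoney schema": Tier.LIFT_OSS,
    "X92 dossier corpus": Tier.MIGRATE,
    "/graph route shape": Tier.DROP,
}


def entering_repo(examples: dict[str, Tier]) -> list[str]:
    """Names of items whose code or data is carried into the new repo."""
    return [
        name
        for name, tier in examples.items()
        if COPIES_CODE[tier] or COPIES_DATA[tier]
    ]
```

With the example rows above, entering_repo(EXAMPLES) lists only the Tier 1B and Tier 2 items. Tier 1 contributes a requirement and Tier 3 contributes nothing, so neither brings gruvor artefacts into the new repository.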


5. Repo location, name, and shape

Location

~/ai/tools/<name>/ — under tools (it is a tool, with code), not projects (which are research/explorations per ~/ai/projects/CLAUDE.md). Sibling to arebladet2/. Specifically not inside arebladet2/ per the framing's "modular for arebladet2, not built into arebladet2 yet" hard constraint.

Name (open question)

Candidate constraints:

This document does not pick a name. Pär picks the name; a few candidates to react to:

None of these is a recommendation. The naming choice is the kind of thing that benefits from one clean decision rather than from comparison; Pär picks or vetoes and proposes.

Shape (open, listed only to flag the questions Section 04 will close)

This subsection lists open shape questions, not decisions. Each is Section 04's to commit on. They are listed here only to keep Section 03 (component selection) from inheriting them as quiet assumptions:


6. How this serves the C1–C6 capabilities — compatibility check, not design

This is a no-obvious-incompatibility check, not a capability-delivery argument. Demonstrating that nothing in blank+borrow forbids a capability does not protect the load-bearing parts of that capability. The actual delivery is Section 04's (MVP scope) and beyond. Codex flagged this section as "compatibility theatre" in an earlier draft — this rewrite restates it honestly.

No capability is in obvious tension with blank+borrow. C1 and C4 carry unresolved load that Section 04 must address; they are flagged here so that Section 03 does not inherit them as "solved."


7. How this serves the hard constraints


8. Risks of this approach (named, not hand-waved)

Blank+borrow is not free. The honest cost list:

These are the risks. They are tractable. They are not reasons to reverse the decision.


9. What this section deliberately does not commit to


10. Acceptance signal for this section

This section is good enough to anchor Section 03 if Pär can answer yes to:

If any answer is no, this section is revised before Section 03 starts.