PärPod Temp
OSS scan — consolidation across four agents
11m · May 28, 2026
Four parallel OSS agents converged on OpenSanctions + Docling + MapLibre + PostGIS as load-bearing lifts for a Python-Svelte investigation platform—but Section 03 still has to prove each one earns its place.

Date: 2026-05-26 23:42

Inputs: four parallel general-purpose agents, each scoping a different capability slice. Raw scans live alongside this file:

Purpose: condense the four scans into a single shortlist Section 03 (03-keep-borrow-lift.md) can score against. Convergence across agents is the strongest signal; single-agent recommendations are noted but trusted less.
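The weighting rule above (agreement across agents counts for more than a single agent's pick) can be sketched as a simple ranking. This is an illustration only: the agent sets are copied from the Convergence column of the Section 1 table, and the function names are hypothetical, not part of any scan tooling.

```python
# Agent attributions copied from the Convergence column in Section 1.
# MapLibre is counted for agent 4 only; agent 1 referenced Leaflet instead.
CONVERGENCE = {
    "OpenSanctions trio": {1, 2, 3, 4},
    "OpenAleph": {1, 2, 3},
    "vis-timeline": {1, 4},
    "Label Studio config DSL": {1, 3},
    "MapLibre GL": {4},
    "PostGIS + TimescaleDB": {4},
    "Docling": {3},
    "olmOCR": {3},
    "KB-BERT": {4},
    "python-stdnum": {4},
    "Splink": {2},
    "memorious": {1},
}


def rank_by_convergence(convergence):
    """Return (component, agent_count) pairs, most-agreed-upon first."""
    return sorted(
        ((name, len(agents)) for name, agents in convergence.items()),
        key=lambda pair: (-pair[1], pair[0]),
    )


def single_agent_only(convergence):
    """Components named by exactly one agent: noted, but trusted less."""
    return sorted(name for name, agents in convergence.items() if len(agents) == 1)


if __name__ == "__main__":
    for name, count in rank_by_convergence(CONVERGENCE):
        print(f"{count} agent(s): {name}")
```

A raw count is deliberately crude. It says nothing about cases like olmOCR, which the table marks as single-agent but high-confidence, so Section 03 would still need to score each component on its merits.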


1. Strong convergence — load-bearing lift targets

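Multiple agents independently named most of these; single-agent picks are marked in the Convergence column. Most are MIT or Apache-2.0; the exceptions are BSD-3 (MapLibre), the PostgreSQL license (PostGIS), and LGPL (python-stdnum). No AGPL in the lift path.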

| Component | License | Lift target | Capability | Convergence |
|---|---|---|---|---|
| OpenSanctions trio: followthemoney (schema) + nomenklatura (entity resolver) + zavod/yente (ingest + search/reconcile API) | MIT | The entity layer spine. Persistent store of "X = X" decisions that survive across investigations. | C4 (and C6 architecturally) | Agents 1, 2, 3, 4 |
| OpenAleph | MIT | Reference architecture only: cherry-pick FtM-schema usage, ingest patterns, Vue viewer if useful. Don't adopt as base. | C3, C4 reference | Agents 1, 2, 3 |
| vis-timeline | Apache-2.0 / MIT dual | The boring-correct timeline component. Pair with MapLibre/Leaflet for bbox spatial filter. | C1 | Agents 1, 4 |
| MapLibre GL + maplibre-gl-draw | BSD-3 | Frontend map with polygon-draw for area filter. Avoids gruvor's point-only filtering bug. | C1 | Agent 4 (with agent 1 referencing Leaflet alternative) |
| PostGIS + TimescaleDB | PostgreSQL / Apache-2.0 | Storage layer for spatial+temporal events. Boring, correct. | C1 | Agent 4 |
| Label Studio config DSL | Apache-2.0 | Lift the XML labelling-config schema language. Don't lift the platform. Build a Svelte review UI on top of the DSL. | C2 | Agents 1, 3 |
| Docling (IBM) | MIT | Default document parser for Bolagsverket PDFs, prospectuses, regulator decisions. | C2 parse step | Agent 3 |
| olmOCR (AllenAI) | Apache-2.0 | The right tool for scanned Swedish historical docs (Bolagsverket scans, 1960s–80s prospectuses). | C2 OCR step | Agent 3 (alone but high-confidence) |
| KB-BERT (KBLab, KB.se) | MIT | Swedish NER fine-tuned on SUC 3.0. Non-negotiable replacement for OpenAleph's default spaCy multilingual NER on a Swedish corpus. | C2 entity step | Agent 4 |
| python-stdnum | LGPL | Orgnr validation plus many other Swedish identifier formats. | C6 ingest hygiene | Agent 4 |
| Splink (UK Ministry of Justice) | MIT, very active | Entity resolution at scale; kicks in above ~100k entities. dedupe (MIT) and recordlinkage (BSD-3) are smaller-scale alternatives. | C4 | Agent 2 |
| memorious | MIT | Crawler-for-press patterns. The cleanest path to C5 (press articles as first-class entities) when paired with a thin FtM Article subclass. | C5 | Agent 1 |
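To make the python-stdnum row concrete: a Swedish orgnr is a 10-digit number protected by a Luhn check digit, and python-stdnum ships a `stdnum.se.orgnr` module for validating it. The stdlib-only sketch below shows the kind of check involved. The function names here are illustrative, not the library's, and the sketch covers only length and checksum.

```python
def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def is_valid_orgnr(value: str) -> bool:
    """Accept '123456-7897' or '1234567897'; reject wrong length or checksum."""
    compact = value.replace("-", "").replace(" ", "")
    if len(compact) != 10 or not (compact.isascii() and compact.isdigit()):
        return False
    return luhn_ok(compact)


if __name__ == "__main__":
    for candidate in ("123456-7897", "1234567891", "12345"):
        print(candidate, is_valid_orgnr(candidate))
```

In the ingest path the library call would replace this hand-rolled version; the value of running it at ingest is that malformed identifiers are rejected before they reach the entity layer.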

Single converged recommendation across all four agents: the OpenSanctions trio (followthemoney + nomenklatura + zavod/yente) is the architectural spine the substrate should be built around. Everything else composes onto it.
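The "decisions that survive across investigations" idea can be sketched with a small union-find store that serialises to JSON. This is a hypothetical stand-in to show the shape of the data, not nomenklatura's actual API; the class name, method names, and entity IDs are all invented for illustration, and only positive ("same entity") judgements are modelled.

```python
import json


class DecisionStore:
    """Hypothetical persistent store of 'X = X' judgements (union-find)."""

    def __init__(self):
        self._parent = {}  # entity ID -> parent entity ID

    def _find(self, entity_id):
        self._parent.setdefault(entity_id, entity_id)
        root = entity_id
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the walk straight at the root.
        while self._parent[entity_id] != root:
            nxt = self._parent[entity_id]
            self._parent[entity_id] = root
            entity_id = nxt
        return root

    def decide_same(self, a, b):
        """Record a judgement that a and b refer to the same real-world entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            # Keep the smaller ID as canonical so results are deterministic.
            keep, drop = sorted((root_a, root_b))
            self._parent[drop] = keep

    def canonical(self, entity_id):
        """Canonical ID for an entity; unknown IDs are their own canonical ID."""
        return self._find(entity_id)

    def dumps(self):
        return json.dumps(self._parent, sort_keys=True)

    @classmethod
    def loads(cls, text):
        store = cls()
        store._parent = json.loads(text)
        return store


if __name__ == "__main__":
    store = DecisionStore()
    store.decide_same("inv1:acme-ab", "inv2:acme-aktiebolag")
    store.decide_same("inv2:acme-aktiebolag", "inv3:acme")
    restored = DecisionStore.loads(store.dumps())
    print(restored.canonical("inv3:acme"))
```

The point of the sketch is the round trip: a judgement made in one investigation is reloaded and honoured in the next without re-running resolution.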


2. Run-alongside-only (license boundary)

Lifting code from these would propagate strong copyleft. The clean path is "co-located service, called via API, unmodified." Section 03 can decide whether to run them, but lifting is off the table.

| Component | License | Why blocked from lift | Possible run-alongside use |
|---|---|---|---|
| Datashare | AGPL-3 | AGPL §13 network conveyance triggers source-offer obligations on the combined work, even internally. | C3 doc viewer + full-text search via iframe to unmodified service. |
| DocumentCloud | AGPL-3 | Same as Datashare. | Same shape if Datashare's posture isn't enough. |
| Marker | GPL-3 | Strong copyleft on direct lift. | Run as subprocess for fast Apple Silicon PDF→Markdown. |
| MinerU | AGPL-3 | Same as Datashare. | PDF parsing as subprocess. |
| Paperless-ngx | GPL-3 | Strong copyleft on direct lift. | Tag/correspondent/type model worth reading; don't lift code. |
| Khoj | AGPL-3 | Wrong abstraction anyway (RAG-shaped). | Skip. |
| FalkorDB | SSPL (per agent 2's knowledge; confirm) | SSPL is generally considered non-OSI and lift-hostile. | Skip, prefer Kùzu or Postgres. |
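The license boundary in Sections 1 and 2 amounts to a small decision rule, sketched below. The license labels are the ones used in this file's tables. The `REVIEW` posture is an assumption added for anything the tables do not classify explicitly, and LGPL is routed there because this file lists python-stdnum as a lift target without spelling out how its LGPL terms are handled.

```python
from enum import Enum


class Posture(Enum):
    LIFT = "lift"                    # code may be copied or adapted
    RUN_ALONGSIDE = "run-alongside"  # unmodified service or subprocess only
    SKIP = "skip"
    REVIEW = "review"                # assumption: not classified in this file


PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3", "PostgreSQL"}
STRONG_COPYLEFT = {"GPL-3", "AGPL-3"}
NON_OSI = {"SSPL"}


def posture_for(license_id: str) -> Posture:
    """Map a license label from the tables to the default handling posture."""
    if license_id in PERMISSIVE:
        return Posture.LIFT
    if license_id in STRONG_COPYLEFT:
        return Posture.RUN_ALONGSIDE
    if license_id in NON_OSI:
        return Posture.SKIP
    return Posture.REVIEW


if __name__ == "__main__":
    for lic in ("MIT", "AGPL-3", "SSPL", "LGPL"):
        print(lic, "->", posture_for(lic).value)
```

The rule only encodes the license dimension. Khoj, for example, is skipped in the table for being the wrong abstraction, which no license check would catch.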


3. Inspiration-only (license-blocked or maturity-blocked)

Read these for design ideas; do not lift code.

| Component | Why inspiration-only |
|---|---|
| forensic-architecture/timemap | Do-No-Harm license: lift design patterns, not code. Best reference for spatial+temporal investigation UI. |
| PANO, CJWorkbench | CC BY-NC (non-commercial). Ambiguous internal-use status; safer to read patterns and skip. |
| OpenCTI | Apache-2.0 but STIX schema is cyber-shaped; lift Investigation-page UX, not vocabulary. |
| Aleph (alephdata/aleph) | Sunsets Dec 31 2025. Dead-end. OpenAleph is the live branch. |
| Anytype | Apps are source-available, not OSS. UX inspiration only; no code lift. |
| Kùzu | MIT, but acquired by Apple in 2025; staleness risk on the open-source side. Prefer Postgres unless we genuinely need property-graph traversal. |
| TerminusDB | Apache-2.0, small team. The "git-for-data branches per investigation" idea is interesting, but maturity/team-size risk is real. Lift the idea, not the engine. |


4. Sweden-specific shifts surfaced by the scans


5. Open uncertainties from the scans


6. Suggested composition (synthesised, not committed)

Compiled from the strongest-converged signals. Section 03 is where this gets formally decided per-component.

Everything load-bearing is MIT, Apache-2.0, BSD, or compatible. No AGPL or strong-copyleft in the lift path. Datashare is the one AGPL service worth keeping as a run-alongside option for C3.
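For the spatial-temporal part of this composition (a polygon drawn on the MapLibre map, a time window from the timeline, events stored in PostGIS), the core operation is an area-plus-time filter. The sketch below is an in-memory stand-in showing the logic only; in the real layer the containment test would presumably be pushed down to PostGIS (for example via `ST_Within`). The `Event` type, function names, and coordinates are hypothetical, and the geometry is treated as planar.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    label: str
    lon: float
    lat: float
    when: datetime


def point_in_polygon(lon, lat, ring):
    """Ray-casting test; ring is a list of (lon, lat) vertices, closed implicitly."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def filter_events(events, ring, start, end):
    """Keep events inside the drawn polygon and inside [start, end]."""
    return [
        e for e in events
        if start <= e.when <= end and point_in_polygon(e.lon, e.lat, ring)
    ]


if __name__ == "__main__":
    stockholm_box = [(17.8, 59.2), (18.3, 59.2), (18.3, 59.5), (17.8, 59.5)]
    events = [
        Event("board meeting", 18.07, 59.33, datetime(2020, 5, 1)),
        Event("site visit", 11.97, 57.71, datetime(2020, 5, 1)),
        Event("old filing", 18.07, 59.33, datetime(2010, 1, 1)),
    ]
    hits = filter_events(events, stockholm_box, datetime(2019, 1, 1), datetime(2021, 1, 1))
    print([e.label for e in hits])
```

Filtering on an arbitrary polygon rather than a bounding box is what the polygon-draw control buys; a bbox filter is just the special case where the ring is a rectangle.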


7. What this consolidation deliberately does not do


8. The single sentence

If the substrate were sketched tonight from these scans, the answer would be: a Python-and-Svelte tool built around the OpenSanctions trio for entities, Docling+olmOCR+KB-BERT for ingest, MapLibre+vis-timeline+PostGIS for the spatial-temporal layer, with Datashare optionally co-located for document reading. Section 03 decides whether each component actually earns its place.