Five Kinds Of Both: On Letting FollowTheMoney And A Local Model Share One Substrate

The Question Inside The Question

[calm] Pär asked a sensible question while walking. Could we have FollowTheMoney and our own entity model side by side? The previous episode framed it as a binary. Adopt the OpenSanctions spine, or grow our own vocabulary. The question Pär actually wants to ask is whether both is on the table.

It is. But the answer is not one configuration. It is at least five. Asking whether two entity models can coexist is asking the wrong shape of question. The right question is which kind of coexistence. That distinction is what this episode tries to make legible.

The five configurations are real. Each has been built. Each has tradeoffs that show up only after some time living with them. By the end Pär should be able to look at any architecture proposal and ask, of these five, which one is this. That is the move that turns "can we" into "should we."

What FollowTheMoney Actually Is, In One Paragraph

[calm] Before the configurations, a grounding. FollowTheMoney is a Python library that defines entity types. Person, Company, Address, Asset, Ownership, Payment. The types are described in YAML files. The library handles serialization, validation, identifier fingerprinting, cross-references between entities. It is the data model behind OpenSanctions, behind ICIJ leaks, behind every Aleph instance. It is also extensible. You can point an environment variable at a directory of your own YAML files and your types become part of the active model. As far as I know that variable is FTM_MODEL_PATH, and it replaces the default schema directory rather than adding to it, so the directory needs the stock YAML files alongside your own. That last detail matters for what follows.

Configuration One. Two Stores, No Bridge

The simplest version. Spaden has its own database with its own tables for mining-physical things. Förekomst. Provborrhål. Dispens. Permit-detalj. Mineral assay. Geometry-attached objects, all the things FollowTheMoney does not name natively.

FollowTheMoney lives somewhere else, in a separate store, with its Companies and Persons and Assets, populated by OpenSanctions feeds or registry pulls. The two stores do not know about each other. No foreign keys, no joins, no shared identifiers.

This configuration is rare in practice because it almost never matches reality. A mining claim is owned by a company. A drill site is operated by a person who is also a board member somewhere. The entities in the local store reference the entities in the FollowTheMoney store constantly. A no-bridge configuration forces every cross-reference to be looked up by name string, which decays the moment names drift.
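A minimal sketch of that decay, in Python. Both stores are plain dictionaries, and every name and record here is made up for illustration. It shows only the failure: an exact name-string lookup that works today returns nothing once the registry side records the company under a slightly different name.

```python
# Hypothetical local store: a mining claim that names its owner by string only.
local_claims = [
    {"claim_id": "claim-1", "owner_name": "Exempel Gruv AB"},
]

# Hypothetical FollowTheMoney-side store, keyed by name because no shared ID exists.
ftm_companies_by_name = {
    "Exempel Gruv AB": {"schema": "Company", "country": "se"},
}


def find_owner(claim, companies_by_name):
    """Look up a claim's owner by exact name string, the only join available."""
    return companies_by_name.get(claim["owner_name"])


# Today the names happen to agree, so the lookup succeeds.
before = find_owner(local_claims[0], ftm_companies_by_name)

# Later the registry feed records the same company under a longer legal name.
ftm_companies_by_name["Exempel Gruvaktiebolag"] = ftm_companies_by_name.pop(
    "Exempel Gruv AB"
)

# The claim still carries the old string, so the cross-reference is silently lost.
after = find_owner(local_claims[0], ftm_companies_by_name)
```

Nothing errors out. The lookup simply stops finding the owner, which is why this kind of drift tends to go unnoticed.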

Use this only when the two stores genuinely never need to talk. For Spaden, they will need to talk. So this configuration is real, but it is not the right choice.

Configuration Two. Local Primary, FollowTheMoney As Side Store

Spaden's data model is the spine. Mining-physical entities are first-class. Förekomst has its own table, its own properties, its own queries. Same for the rest of the local vocabulary.

FollowTheMoney lives beside it, in its own store, populated from registries and external feeds. The bridge between them is a foreign-key-style reference: a stored identifier, not a constraint the database enforces, because the two stores are separate. A Förekomst record has a field that says "operated by FollowTheMoney entity Q seven eight nine one." Looking up the operator means walking from Spaden into FollowTheMoney by identifier.

This configuration is clean because the responsibilities are clear. Local model handles mining-physical things. FollowTheMoney handles the entities that cross investigations. Cross-investigation discovery flows through FollowTheMoney because that is where the entities with cross-investigation potential live.

The cost is that you maintain two query paths. A search for "everything related to this person" has to walk both sides. The local model needs to know about FollowTheMoney IDs without knowing about FollowTheMoney semantics. Doable, but the boundary requires discipline.
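Here is one way the two query paths could look, as a sketch under assumptions. The local store is an in-memory SQLite table, the side store is a dictionary standing in for wherever FollowTheMoney entities would really live, and the table, column, and function names are all hypothetical. The identifier is the one from the example above, written in digits.

```python
import sqlite3

# Local store: Spaden's own table. operator_ftm_id is a plain text column that
# holds a FollowTheMoney entity ID. It is a reference by convention only,
# because the entity it points at lives in a different store.
local = sqlite3.connect(":memory:")
local.execute(
    "CREATE TABLE forekomst (id TEXT PRIMARY KEY, mineral TEXT, operator_ftm_id TEXT)"
)
local.execute("INSERT INTO forekomst VALUES ('f-1', 'copper', 'Q7891')")
local.execute("INSERT INTO forekomst VALUES ('f-2', 'gold', 'Q7891')")

# Side store: a stand-in for a FollowTheMoney store, keyed by entity ID.
ftm_store = {
    "Q7891": {"schema": "Company", "properties": {"name": ["Example Operator AB"]}},
}


def operator_of(forekomst_id):
    """Walk from the local model into the side store by identifier."""
    row = local.execute(
        "SELECT operator_ftm_id FROM forekomst WHERE id = ?", (forekomst_id,)
    ).fetchone()
    return ftm_store.get(row[0]) if row else None


def everything_related_to(ftm_id):
    """The two query paths: one lookup per store, merged by the caller."""
    entity = ftm_store.get(ftm_id)
    rows = local.execute(
        "SELECT id FROM forekomst WHERE operator_ftm_id = ? ORDER BY id", (ftm_id,)
    ).fetchall()
    return {"entity": entity, "forekomst_ids": [r[0] for r in rows]}
```

The local side stores and compares the ID and never looks inside the entity. That is the discipline the boundary asks for.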

Configuration Three. FollowTheMoney Primary, Local Model As Extension

The inverse. Adopt FollowTheMoney as the primary data model. Use its environment variable extension mechanism to add new YAML schemas for mining-physical things. Förekomst becomes a subclass of Asset. Provborrhål becomes its own type with a geometry property. The whole local vocabulary lives inside the FollowTheMoney model as extensions.

Everything is FollowTheMoney. Everything serializes the same way, validates the same way, indexes the same way. Nomenklatura works on the whole thing without modification. The Aleph and OpenAleph ecosystems can ingest your data because it is FollowTheMoney-shaped.

The cost is foreign-vocabulary inheritance. The Asset type was designed for financial-crime use cases. Inheriting from it imports assumptions about ownership, valuation, transfer. Some of these will fit mining-physical reality. Some will not. The places where they do not fit show up as awkward subclass design or unused inherited properties.
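To make the inheritance cost concrete, here is a toy schema registry in Python. It mimics the idea of one type extending another and nothing more. It is not the FollowTheMoney API, and the property lists are invented for illustration rather than copied from the real Asset schema.

```python
# Toy schema registry: each schema lists its parents and its own properties.
# Property names are invented, not taken from FollowTheMoney's real schemata.
SCHEMATA = {
    "Thing": {"extends": [], "properties": ["name", "country"]},
    "Asset": {"extends": ["Thing"], "properties": ["owner", "amount", "currency"]},
    "Forekomst": {"extends": ["Asset"], "properties": ["mineral", "geometry"]},
}


def all_properties(schema_name):
    """Collect a schema's inherited properties first, then its own."""
    schema = SCHEMATA[schema_name]
    props = []
    for parent in schema["extends"]:
        props.extend(all_properties(parent))
    props.extend(schema["properties"])
    return props


def unused_inherited(schema_name, record):
    """Inherited properties that a given record never fills in."""
    own = set(SCHEMATA[schema_name]["properties"])
    return [
        prop
        for prop in all_properties(schema_name)
        if prop not in own and prop not in record
    ]


# A hypothetical deposit record that only uses the mining-physical fields.
record = {
    "name": "Example deposit",
    "country": "se",
    "mineral": "copper",
    "geometry": "POINT(20.2 67.8)",
}

leftover = unused_inherited("Forekomst", record)
```

Running a check like this over real records is one way to measure how much of the inherited vocabulary earns its keep.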

This is the configuration the planning document gestured at when it said "subclass an Asset type and let it carry mineral-specific properties." It is real and supported. Whether it is right depends on how much of the inherited vocabulary actually earns its keep.

Configuration Four. The Bridge Pattern

Spaden has its own model. FollowTheMoney is not used as primary storage at all. But a translator sits between them. The translator can read FollowTheMoney data from external sources and shape it into Spaden's model. The translator can also emit Spaden's data in FollowTheMoney format when needed for export or comparison.

The bridge is a one-way or two-way translator. Not a join, not a foreign key. A function that takes one representation and produces another.

The advantage of this configuration is autonomy. Spaden's model is whatever Spaden needs it to be. FollowTheMoney is a guest you can let in or out by running the translator. If you decide to drop FollowTheMoney later, you remove the translator. If you decide to lean on it more, you run the translator more often.

The cost is that the translator is a maintenance surface. Every change to either model requires updating the translator. The translation is also lossy in both directions, because the two models name slightly different things at slightly different levels of abstraction. The lossiness is fine if the use cases for translation are bounded. It becomes a problem if you want translation to be the primary integration mechanism.
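The lossiness is easy to see in a small round trip. This sketch assumes a hypothetical Spaden operator record with one name and one country. The FollowTheMoney side is a plain dictionary with an id, a schema, and properties held as lists of values, which is how FollowTheMoney serializes entities. The function names and the sample entity are made up.

```python
def ftm_to_spaden(entity):
    """Shape an FTM-style Company dict into a hypothetical Spaden operator record."""
    props = entity.get("properties", {})
    return {
        "operator_id": entity["id"],
        # FTM-style properties are multi-valued; this local model keeps one name.
        "name": (props.get("name") or [None])[0],
        "country": (props.get("jurisdiction") or [None])[0],
    }


def spaden_to_ftm(record):
    """Emit a Spaden operator record as an FTM-style Company dict."""
    properties = {}
    if record.get("name"):
        properties["name"] = [record["name"]]
    if record.get("country"):
        properties["jurisdiction"] = [record["country"]]
    return {"id": record["operator_id"], "schema": "Company", "properties": properties}


original = {
    "id": "Q7891",
    "schema": "Company",
    "properties": {
        "name": ["Example Operator AB", "Exempel Operatör AB"],
        "jurisdiction": ["se"],
        "incorporationDate": ["1998"],
    },
}

# Translate in, then back out. The second name and the date do not survive.
round_trip = spaden_to_ftm(ftm_to_spaden(original))
```

The round trip keeps the identifier, the first name, and the jurisdiction. Everything the local model has no slot for is gone. That is acceptable for a bounded export and costly as a primary integration path.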

This configuration is what most journalist tools that use FollowTheMoney as an interchange format are actually doing. They use it on the wire, not in the database.

Configuration Five. Shared Identity Layer, Independent Models

The most elegant configuration, and probably the most interesting one for Spaden. Use FollowTheMoney's identifier fingerprinting library. Every entity in Spaden that could plausibly cross investigations gets a fingerprint identifier computed from name plus country plus identifier numbers. The fingerprint is FollowTheMoney's algorithm, but it does not require you to use FollowTheMoney's data model.

Nomenklatura can then work against those identifiers. The cross-investigation magic — same entity in two investigations resolves to the same canonical identifier — flows from the identifier layer, not from the data model. Spaden's tables can stay shaped exactly the way Spaden needs them. The vocabulary is local. The identity is shared.

This configuration buys cross-investigation reconciliation without inheriting the anti-corruption framing of the FollowTheMoney schema. You can use Nomenklatura's resolver, its judgement store, its persistence of human decisions about which entities are the same. None of that requires that your Person record have the same shape as FollowTheMoney's Person record. It only requires that the identifier algorithm agrees.

The cost is that you reimplement the parts of FollowTheMoney that work with the identifier layer. Mostly that means the matchers — the functions that compare two entities and decide whether they might be the same. Some matchers ship with FollowTheMoney and assume FollowTheMoney property names. You write your own, or you adapt theirs.
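A sketch of what that identifier layer could look like, using only the Python standard library. Two loud assumptions. The hash below is a stand-in and not FollowTheMoney's actual algorithm. The Resolver class is a toy judgement store and not Nomenklatura's API. The point is only the shape: identity is computed from a few normalized fields, and human merge decisions are stored against those keys, not against any particular record layout.

```python
import hashlib
import unicodedata


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def identity_key(name, country, identifiers=()):
    """Stand-in fingerprint: SHA-1 over normalized name, country and identifiers."""
    parts = [normalize(name), country.lower()]
    parts.extend(sorted(normalize(value) for value in identifiers))
    return hashlib.sha1("|".join(parts).encode("utf-8")).hexdigest()


class Resolver:
    """Toy judgement store: remembers which keys a human decided are the same."""

    def __init__(self):
        self.canonical = {}

    def get_canonical(self, key):
        # Follow merge decisions until reaching a key that was never merged away.
        while key in self.canonical:
            key = self.canonical[key]
        return key

    def decide_same(self, left, right):
        left, right = self.get_canonical(left), self.get_canonical(right)
        if left != right:
            self.canonical[right] = left
        return left


# Same company, formatted differently in two investigations: the keys agree.
key_a = identity_key("Exempel Gruv AB", "SE", ["SE-0001"])
key_b = identity_key("EXEMPEL  GRUV AB.", "se", ["SE-0001"])

# A spelling the hash cannot bridge. A human decision links it instead.
key_c = identity_key("Exempel Gruvaktiebolag", "se")
resolver = Resolver()
resolver.decide_same(key_a, key_c)
```

Nothing in this sketch cares what the local record looks like beyond the three fields fed into the key. The matchers, which would propose candidates like the third key for a human to judge, are the part left to write or adapt.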

For Spaden, this configuration is the one worth taking seriously. The cross-investigation database capability is load-bearing. The mining-physical vocabulary is the part that does not fit FollowTheMoney natively. Configuration five lets the first part be off-the-shelf and the second part be ours. The vocabularies stay separate. The identities are shared. That is a clean line.

The Trap That Is Not A Configuration

There is a sixth thing that gets called "side by side" but is really a failure mode. Two stores with no bridge and no plan, accumulated because the project could not decide which model to use and built both. Two query paths, two ingest pipelines, two reconciliation strategies, two places where the same conceptual entity might exist as two different records.

This is what happens when the side-by-side question gets answered with "yes" without specifying which configuration. Each of the five configurations above is deliberate. The trap is the absence of deliberation. The trap is also reversible only at high cost, because once the data is in two stores under two vocabularies, migrating to one requires reconciliation work that nobody wants to do.

The lesson is that side by side is not a default. It is a choice that has to name itself.

The Honest Middle

For Spaden, the configurations worth considering are two and five. Configuration two is the pragmatic answer. Local model is primary, FollowTheMoney is a side store for cross-investigation entities, the bridge is foreign keys. Clear responsibilities, manageable complexity.

Configuration five is the elegant answer. Local model is primary, FollowTheMoney provides the identifier layer and the reconciliation engine, the data models stay independent. Cleaner separation, more work to set up, lower long-term cognitive load.

Configurations one and three are real but probably wrong. One does not match reality. Three inherits more vocabulary than mining-physical work needs. Configuration four is good for interchange but not as a primary integration strategy.

The choice between two and five is real, and Section three of the planning sequence should make it deliberately. Both are honest answers. They differ in where the cost lives. Configuration two pays its cost in foreign-key discipline forever. Configuration five pays its cost up front, in implementing the identifier layer and the matchers, and then mostly stops paying.

The Question Reframed

Pär's original question was can we have both. The answer is yes, in at least five distinct ways. The interesting question is which way.

If the cross-investigation capability matters most and the local vocabulary is genuinely mining-shaped, configuration five is worth the up-front cost. If the local model is going to grow slowly and most cross-investigation work happens through registries and external feeds, configuration two is simpler.

Either way, side by side is not the easy compromise it sounds like. It is a deliberate architectural choice that names what each side is responsible for and where the bridge lives. The configurations that work, work because somebody drew the line clearly. The configurations that fail, fail because nobody drew the line and the substrate accumulated two of everything.

The good news is that Spaden has not been built yet. The line can be drawn before there is anything to migrate. That is the moment when this question is cheapest to answer.