PärPod by Claude
Entity Resolution: Deciding When Two Companies Are One
5m · May 30, 2026

Seventeen Names, One Company

In your mining investigation you ran a tool over the company records tied to the exploration outfit at the center of it all. Seventeen separate entries went in. Seventeen came out as one. The tool decided that all of them, despite differences in spelling, punctuation, and form, pointed at the same real company. And here is the number that actually matters in this kind of work. Zero false merges. It never once glued two genuinely different companies together. For a journalist tracing who owns what across a web of registries, that zero is worth more than the seventeen. Because the whole game of entity resolution is not about being clever. It is about which way you are willing to be wrong.

A Name Is Not an Identity

Start with why this is hard at all, because it sounds like it should be trivial. Surely you just match the names. But the same company shows up in the world as a dozen different strings of text. One registry writes it in full capitals. Another tucks a little marker on the end that means it is publicly listed. A filing abbreviates limited to three letters. A typo creeps into one database. A holding entity sits one layer up with a near-identical name. To a plain text comparison these are all different. To a human reading carefully they are obviously the same outfit. The gap between those two judgments is the entire problem, and it cannot be closed by matching letters.
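To see how far letter matching falls short, here is a minimal Python sketch. The company name and its variants are invented for illustration, and `normalize` is a hypothetical helper, not something taken from any particular tool.

```python
import re

# Hypothetical variants of one company's name, invented for illustration.
variants = [
    "Northfield Exploration Limited",
    "NORTHFIELD EXPLORATION LIMITED",       # full capitals
    "Northfield Exploration Ltd.",          # abbreviated suffix
    "Northfeild Exploration Ltd",           # typo
    "Northfield Exploration Holdings Ltd",  # holding entity one layer up
]

# Assumed mapping of legal suffixes to a single form.
SUFFIXES = {"limited": "ltd", "ltd": "ltd"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, and map legal suffixes to one form."""
    words = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


exact_distinct = set(variants)
normalized_distinct = {normalize(v) for v in variants}

print(len(exact_distinct), "distinct strings by exact comparison")
print(len(normalized_distinct), "distinct strings after normalization")
```

Exact comparison sees five different strings. Cleaning up case, punctuation, and suffixes brings that down to three, but the typo and the holding entity still stand apart. Tidying the letters helps. It does not close the gap.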

So the tool does what a careful human does, only at scale and without getting bored. It does not ask, are these two strings identical. It asks, how much evidence is there that these two records describe the same real thing. Similar name, yes, but also, do they share a registration number, the one truly unique fingerprint a company has. Do they list the same address. The same directors. The same parent. Each shared detail is a vote. Pile up enough votes and you cross a line and declare, these are one. Stay below the line and you keep them apart. The line is a threshold, and where you place it is the most consequential decision in the whole exercise.
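The voting idea can be sketched in a few lines of Python. Everything specific here is an assumption: the `Record` fields, the weights, and the threshold of 4.0 are illustrative, and the tool from the investigation may weigh its evidence quite differently.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Record:
    """A hypothetical registry entry; the field names are assumptions."""
    name: str
    reg_number: Optional[str] = None
    address: Optional[str] = None
    directors: frozenset = frozenset()


# Illustrative weights: the registration number counts far more than a name.
WEIGHTS = {"reg_number": 5.0, "address": 1.5, "directors": 1.5, "name": 1.0}
THRESHOLD = 4.0  # assumed value; where this line sits is the real decision


def name_similarity(a: str, b: str) -> float:
    """Rough 0..1 similarity between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evidence_score(a: Record, b: Record) -> float:
    """Add up the votes from each piece of shared evidence."""
    score = WEIGHTS["name"] * name_similarity(a.name, b.name)
    if a.reg_number and a.reg_number == b.reg_number:
        score += WEIGHTS["reg_number"]
    if a.address and a.address == b.address:
        score += WEIGHTS["address"]
    if a.directors & b.directors:  # at least one director in common
        score += WEIGHTS["directors"]
    return score


def same_entity(a: Record, b: Record, threshold: float = THRESHOLD) -> bool:
    """Declare a match only when the evidence reaches the threshold."""
    return evidence_score(a, b) >= threshold
```

With these weights a shared registration number is enough on its own to cross the line, while an identical name with nothing else behind it scores at most 1.0 and stays well below it.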

Two Ways to Be Wrong

This is the part that matters, so slow down here. There are exactly two mistakes you can make, and they are opposites. You can merge two records that should have stayed separate, deciding that two different companies are one. That is a false merge. Or you can keep two records apart that should have been joined, missing that they were the same company all along. That is a missed link. With messy real-world records you cannot drive both to zero at once. Push your threshold lower, demand less evidence before merging, and you catch more of the real connections but you start gluing strangers together. Raise the threshold, demand more proof, and you stop gluing strangers but you start missing real ties. Every setting is a trade between those two failures.
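The trade is easy to watch in a small Python sketch. The scored pairs below are made up for illustration. Each one is an evidence score plus a label saying whether the two records really are the same company, the kind of label you would only have for a hand-checked sample.

```python
# Hypothetical hand-checked pairs: (evidence score, truly the same company?)
pairs = [
    (6.2, True), (5.4, True), (4.1, True), (3.3, True), (2.0, True),
    (3.6, False), (2.4, False), (1.1, False), (0.7, False),
]


def errors_at(threshold, scored_pairs):
    """Return (false_merges, missed_links) for a given merge threshold."""
    false_merges = sum(
        1 for score, same in scored_pairs if score >= threshold and not same
    )
    missed_links = sum(
        1 for score, same in scored_pairs if score < threshold and same
    )
    return false_merges, missed_links


for threshold in (1.0, 2.0, 3.0, 4.0, 5.0):
    false_merges, missed_links = errors_at(threshold, pairs)
    print(f"threshold {threshold}: "
          f"{false_merges} false merges, {missed_links} missed links")
```

On this invented sample, a threshold of 1.0 gives three false merges and no missed links. A threshold of 4.0 gives no false merges and two missed links. No setting in between reaches zero on both, because one pair of strangers scores higher than two of the genuine matches.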

Now, which one should a journalist fear more? Think about what each error does when it ends up in print. A missed link means your map of who owns what is incomplete. That is bad. You under-report the network, you miss a connection, the story is thinner than the truth. But a false merge is a different category of harm. It means you have publicly asserted that two separate companies, possibly with separate owners, are the same thing, when they are not. In an investigation about who controls mineral rights across a thousand square kilometers, that is not a thin story. That is an accusation about the wrong people. It is the kind of error that is wrong about a person, not just incomplete about a network.

Tuned Toward the Safer Failure

That is why your zero false merges is the headline and not the seventeen. The tool was tuned to be cautious about joining, to demand real shared evidence, registration numbers and not just lookalike names, before it would collapse two records into one. It would rather leave a genuine link unmerged, for you to catch by hand later, than ever stand up and declare two different companies identical. For most ordinary uses of this technology, cleaning a mailing list, deduplicating customers, the missed link is the error people fear and the false merge is a shrug. For accountability journalism the fear is flipped, and the settings should be flipped to match.
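One way to picture that cautious tuning is a rule with three outcomes instead of two. This is a sketch under stated assumptions, not a description of how the tool actually works: the function name `decide`, both thresholds, and the idea of a review pile are all hypothetical.

```python
def decide(score: float, reg_numbers_match: bool,
           merge_threshold: float = 5.0,
           review_threshold: float = 2.0) -> str:
    """Cautious three-way decision; all thresholds here are assumed values.

    Merge automatically only when the registration numbers agree and the
    overall evidence is strong. Anything merely suggestive goes to a human.
    """
    if reg_numbers_match and score >= merge_threshold:
        return "merge"
    if score >= review_threshold:
        return "review"    # a possible link, left for checking by hand
    return "separate"


print(decide(6.0, reg_numbers_match=True))   # strong evidence plus the fingerprint
print(decide(6.0, reg_numbers_match=False))  # lookalike without the fingerprint
print(decide(1.0, reg_numbers_match=False))  # too little evidence to bother a human
```

The design choice is that a lookalike without a matching registration number can never be merged automatically, however high its score. The worst it can do is land in front of a person.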

So the keeper here is not about any one tool. It is a way of thinking that applies every time you ask software to decide that two things are really the same thing. Identity is never a match of letters. It is a weighing of evidence against a threshold you choose. And choosing that threshold is choosing which mistake you can live with. Merge too eagerly and you accuse the innocent. Merge too timidly and you miss the truth. The seventeen records folding into one was the tool doing its job. The zero false merges was you, whether you realized it or not, deciding that in this work it is far better to miss a link than to invent one.