PärPod by Claude
Aleph: The Mythical Point and the Practical Toolkit
13m · May 17, 2026

A Point That Contains All Others

In a short story written in nineteen-forty-five, the Argentine writer Jorge Luis Borges described a small luminous sphere about two centimeters across, located in the basement of a house in Buenos Aires. The narrator descends, lies on his back on a tiled floor, and looks at it. What he sees is impossible to describe in linear language because what he sees is everything, simultaneously. Every place on earth, every face he has ever known, every moment of his life, the dust of his bedroom, the population of a Mexican city, his lover's distant correspondence, all happening at the same instant in a sphere no bigger than a marble.

The story is called The Aleph. It takes its name from the first letter of the Hebrew alphabet, and in the story the word names a metaphysical concept: the point in space that contains all other points without distortion. It is a dream of total awareness, and the narrator goes mad trying to describe what he saw.

Nearly seventy years later, in two thousand fourteen, a German programmer named Friedrich Lindenberg was sitting in the offices of Code for Africa in Cape Town. He was a Knight Fellow at the International Center for Journalists, which is the kind of grant that lets a software person spend a year embedded with a newsroom or a media collective instead of writing more software for advertising platforms. Lindenberg had been working with the African Network of Centers for Investigative Reporting. His project there was called Grano. The basic idea was simple to state and very complicated to build, which is a phrase you should mark down because it will describe most things in this episode. The idea was a tool that lets you trace the connections between public officials and the private businesses they secretly own.

He named the next version Aleph. The name was a joke and a thesis at the same time. The joke is that investigative reporters dream of a tool that would show them everything at once, like Borges' narrator on his basement floor. The thesis is that you cannot actually have such a tool, but you can build something that gives you a useful slice of it for the specific work you are doing right now.

The Problem Aleph Was Built To Solve

To understand why Aleph matters, you need to understand the specific shape of an investigative reporter's bad day. You have eight hundred pages of leaked emails. You have a spreadsheet from a company registry in Cyprus with twenty-three thousand rows. You have a stack of court filings from Romania, scanned poorly, in a language you do not speak. You have a friend's notes from a previous investigation that mentions someone with a similar name. You have a sanctions list updated this morning that you have not read yet.

Somewhere in all of this is the connection that turns four loose threads into a story. The question is whether you will find that connection in time, or whether you will spend three weeks of fourteen-hour days and end up no closer than when you started.

[serious]

The traditional answer is a wall covered in printouts and red string, which is funny in detective movies and not funny when it is your actual job. The slightly better answer is a folder of PDFs, a folder of spreadsheets, and a Google Docs file where you take notes. This works for one investigation. It does not work for the fifth one, where you need to ask whether anyone in this story showed up in any of the previous four.

Aleph is the answer to that question, designed by someone who had personally lived it. You drop in your documents. The system extracts entities from them: people, companies, places, identifiers. It indexes everything for full-text search. And critically, it cross-references those entities against datasets you have previously loaded, including the official company registries of dozens of countries, sanctions lists, leaked databases like the Panama Papers, and any prior investigation you have ingested.

When you search for a name, you are not just searching this one folder of documents. You are searching everything you have ever put in, and you are getting back not just text matches but entity matches. The Aleph search engine understands that Friedrich Lindenberg is the same person whether he is mentioned as F. Lindenberg or Friedrich Carl Lindenberg or Friedrich C. Lindenberg, and it will surface all three references when you search for any one of them.
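To make the idea of an entity match concrete, here is a minimal sketch in Python of how name variants might be compared. It is not Aleph's actual matching logic, which is considerably more sophisticated. The function names and the rule of thumb (same surname, given names that agree as prefixes or initials) are assumptions made up for illustration.

```python
import re
import unicodedata


def name_tokens(name: str) -> list[str]:
    """Lowercase, strip accents and punctuation, split into word tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode()
    return re.findall(r"[a-z]+", ascii_name.lower())


def same_person_candidate(a: str, b: str) -> bool:
    """True if two name strings could plausibly refer to the same person.

    Toy rule: the last token (treated as the surname) must be identical,
    and each leading token must be a prefix of its counterpart or vice
    versa, which is what lets an initial match a full given name.
    """
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    for x, y in zip(ta[:-1], tb[:-1]):
        if not (x.startswith(y) or y.startswith(x)):
            return False
    return True


variants = ["F. Lindenberg", "Friedrich Carl Lindenberg", "Friedrich C. Lindenberg"]
matches = [v for v in variants if same_person_candidate("Friedrich Lindenberg", v)]
```

A rule this loose produces candidates, not conclusions. Two different people can share a surname and an initial, so a real system would weigh further evidence such as birth dates, addresses, or identifiers before merging records.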

The Schema Underneath

Here is where it gets technically interesting, because the part of Aleph that matters most is not the user interface or the search results. It is the data model. The way the system thinks about what an investigation is made of.

Aleph stores its understanding of the world using a schema called Follow The Money. The name is the journalistic instruction made literal. The schema defines about a hundred different types of things that can exist in an investigation, and the relationships between them. There are people, companies, government bodies, vessels, aircraft, real estate properties, contracts, court cases, payments, sanctions, ownership relationships, and so on. Each type has specific fields, and each can connect to other types in specific ways.

When you ingest a leaked email, Aleph runs natural language processing across it, finds the words that look like names of people and companies and places, and creates Follow The Money records for each one. When you ingest a company registry filing, it parses the structured data into the same schema. When you ingest a sanctions list, same thing. Now everything in your investigation, regardless of where it came from, is speaking the same vocabulary.
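A rough sketch of what "speaking the same vocabulary" can look like in code: two very different inputs mapped into one record shape. The shape used here, an id, a schema name, and a dictionary of multi-valued properties, mirrors how Follow The Money entities are commonly serialized. The input field names (`reg_no`, `company_name`, `list_id`, and so on) are hypothetical, and the property names are written from memory, so check them against the published schema before relying on them.

```python
import hashlib


def make_entity(schema: str, key: str, **props: str) -> dict:
    """Build a record shaped like {id, schema, properties}.

    Every property value is stored as a list, because one entity can
    carry several names, addresses, or identifiers at once.
    """
    entity_id = hashlib.sha1(f"{schema}:{key}".encode()).hexdigest()
    return {
        "id": entity_id,
        "schema": schema,
        "properties": {k: [v] for k, v in props.items() if v},
    }


def from_registry_row(row: dict) -> dict:
    """Map one row of a (hypothetical) company registry export."""
    return make_entity(
        "Company",
        key=row["reg_no"],
        name=row["company_name"],
        registrationNumber=row["reg_no"],
        jurisdiction=row["country"],
    )


def from_sanctions_entry(entry: dict) -> dict:
    """Map one entry of a (hypothetical) sanctions list feed."""
    return make_entity(
        "Person",
        key=entry["list_id"],
        name=entry["full_name"],
        nationality=entry.get("citizenship", ""),
    )


company = from_registry_row(
    {"reg_no": "HE123", "company_name": "Example Holdings Ltd", "country": "cy"}
)
person = from_sanctions_entry({"list_id": "X-1", "full_name": "Jane Example"})
```

Once both records have the same outer shape, the code that searches, links, or exports them no longer needs to know which source they came from.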

[calm]

This sounds technical and abstract until you see what it lets you do. Suppose you are investigating mineral exploration permits in northern Sweden. You have the permit list from the Swedish Mining Authority. You have the corporate filings from the Swedish company registry. You have the parent company annual reports from the London Stock Exchange. You have a news article from an Australian newspaper from two thousand twenty-two about the same parent company.

In a traditional setup, those are four separate folders of files and you carry the connections in your head. In Aleph, all four ingest into the same schema. The Swedish permit holder appears as a company entity. Its registered officers are person entities, linked to the company by an officer-of relationship. The parent company is another company entity, linked by an ownership relationship with a specific percentage. The Australian article mentions the same parent company under a slightly different spelling, but Aleph's entity resolution figures this out and links them.

What you get, at the end, is a graph. Not a metaphor for a graph but an actual graph data structure that you can query. Who owns the most permits in this kommun? Which companies share directors with sanctioned entities? Has any of these officers appeared in any document we have ingested in the last five years? These are questions that previously required a wall of red string.
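One of those questions, which companies share directors with sanctioned entities, can be sketched in a few lines of Python once the relationships are stored as data. Everything below is invented for illustration: the names, the pairs, and the function itself. It assumes directorships have already been reduced to simple (person, company) pairs.

```python
from collections import defaultdict

# (person, company) pairs standing in for directorship relationships.
directorships = [
    ("anna", "nordmalm-ab"),
    ("anna", "baltic-holdings"),
    ("boris", "baltic-holdings"),
    ("carl", "fjall-prospektering"),
]
sanctioned = {"baltic-holdings"}


def companies_sharing_directors(pairs, flagged: set) -> dict:
    """Map each unflagged company to the directors it shares with a flagged one."""
    by_company = defaultdict(set)
    by_person = defaultdict(set)
    for person, company in pairs:
        by_company[company].add(person)
        by_person[person].add(company)

    shared = defaultdict(set)
    for company in flagged:
        for person in by_company[company]:
            # Every other board this person sits on is one hop away.
            for other in by_person[person] - flagged:
                shared[other].add(person)
    return {company: sorted(people) for company, people in shared.items()}


overlap = companies_sharing_directors(directorships, sanctioned)
```

The query is two dictionary lookups deep, which is the point: once the red string is a data structure, following it is cheap.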

Why The Name Matters

Lindenberg has been very explicit about why he named the tool after the Borges story. He has written that the name is meant to be aspirational, but not in the marketing sense. It is aspirational the way a goal is, something you are reaching for that you will never quite achieve. The mythical Aleph contains all points in the universe. The actual Aleph contains the points you have personally put into it, indexed against the points that other journalists in the OCCRP network have put into theirs.

This matters because it is honest about what the tool is. It is not a magical surveillance database. It is not a Palantir for journalism. It is a careful, slow, manual accumulation of structured information about specific entities, built up over years by many people working on related stories. The power comes not from any single ingestion but from the cumulative network effect of many ingestions all using the same schema.

There is a feature in Aleph called peek. If you search for a name and the system knows that name appears in someone else's private collection that you cannot access, it tells you that. You see no content. You just see that the connection exists. This lets two journalists who do not know each other discover that they are working on adjacent stories, and reach out to compare notes. It is a small feature in software terms. It is a large feature in journalism terms.
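The behaviour described here can be sketched as a search that returns two kinds of result: full hits from collections you may read, and a bare signal for collections you may not. This is a guess at the shape of the idea, not Aleph's implementation. The collection layout, the `readers` field, and the decision to reveal only a collection label are all assumptions.

```python
def search_with_peek(name: str, collections: list, user: str):
    """Return (hits, peeks) for a name search.

    hits:  matching documents from collections the user may read.
    peeks: labels of collections that match but are closed to the user.
    """
    needle = name.casefold()
    hits, peeks = [], []
    for collection in collections:
        matches = [doc for doc in collection["documents"] if needle in doc.casefold()]
        if not matches:
            continue
        if user in collection["readers"]:
            hits.extend(matches)
        else:
            # Reveal that a match exists, never what it says.
            peeks.append(collection["label"])
    return hits, peeks


collections = [
    {
        "label": "Mining permits 2024",
        "readers": {"par"},
        "documents": ["Permit granted to Example Holdings Ltd"],
    },
    {
        "label": "Private: shipping investigation",
        "readers": {"maria"},
        "documents": ["Invoice from Example Holdings Ltd to an unnamed agent"],
    },
]
hits, peeks = search_with_peek("example holdings", collections, user="par")
```

The design choice worth noticing is what crosses the access boundary. The existence of a match does, the content does not, and even that much disclosure is something the owner of the private collection would need to have agreed to.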

How You Could Actually Use This

For the kind of work you would care about, the relevant question is not whether to run a full Aleph instance. Running Aleph properly requires Elasticsearch and PostgreSQL and Redis and a queue worker and several other moving parts. It is a real system. You do not want to run it for one investigator working on local stories from a Swedish mountain village.

[calm]

What you want to take from Aleph is the schema. Follow The Money is published separately as a Python package, with a clean, portable definition of how to represent investigative data. You can use it without running Aleph at all. You build your own much smaller database, your own much simpler indexing, but you store your records in the Follow The Money vocabulary.

Why does this matter? Because the day might come when you have built up enough on a specific local beat that it is worth sharing with someone else. Another reporter doing a story on the same companies in a different region. An academic researcher studying corporate ownership patterns in the Nordic mining sector. A colleague at a national paper picking up a thread you started. If your data is in the Follow The Money schema, sharing it is a file export and an import. If your data is in your own private idiom, sharing it is a translation project.
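The claim that sharing becomes a file export and an import can be sketched with nothing but the Python standard library. The example assumes records are stored as dictionaries with an id, a schema name, and multi-valued properties, written one JSON object per line. The function names are made up for this sketch, and a real exchange would also validate the records against the published schema.

```python
import io
import json


def export_entities(entities, fh) -> None:
    """Write one JSON object per line."""
    for entity in entities:
        fh.write(json.dumps(entity, sort_keys=True) + "\n")


def import_entities(fh) -> list:
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


def merge_into(store: dict, incoming: list) -> None:
    """Merge incoming records into a store keyed by entity id."""
    for entity in incoming:
        target = store.setdefault(
            entity["id"],
            {"id": entity["id"], "schema": entity["schema"], "properties": {}},
        )
        for prop, values in entity["properties"].items():
            merged = target["properties"].setdefault(prop, [])
            merged.extend(v for v in values if v not in merged)


# My records, and a colleague's export that knows another spelling.
mine = {"c1": {"id": "c1", "schema": "Company", "properties": {"name": ["Example AB"]}}}
theirs = [{"id": "c1", "schema": "Company", "properties": {"name": ["Example Aktiebolag"]}}]

buffer = io.StringIO()
export_entities(theirs, buffer)
buffer.seek(0)
merge_into(mine, import_entities(buffer))
```

The merge only works this cleanly because both sides already agree on what a company record looks like and on how its id was assigned. That agreement is the part the shared schema gives you; matching ids across independently built datasets is a separate problem.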

There is a deeper benefit too. Adopting an external schema for your data forces you to be explicit about what kinds of relationships you care about. You cannot say "this company is somehow connected to that person." You have to say "this person is a director of this company between these dates" or "this person was a beneficial owner of this company at this percentage." The schema imposes precision. Precision is what turns hand-waving into reporting.
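Here is a small sketch of what that imposed precision can look like when you enforce it yourself. The relationship types and property names below are modelled on Follow The Money from memory, so treat them as assumptions to check against the published schema. The rule that an ownership record must carry a percentage is this sketch's own, stricter choice.

```python
# Relationship types this toy validator accepts, with the properties each must carry.
REQUIRED = {
    "Directorship": {"director", "organization"},
    "Ownership": {"owner", "asset", "percentage"},
}


def check_relationship(rel: dict) -> list[str]:
    """Return a list of problems; an empty list means the record is precise enough."""
    schema = rel.get("schema")
    if schema not in REQUIRED:
        return [f"unknown relationship type: {schema!r}"]
    present = {k for k, v in rel.get("properties", {}).items() if v}
    missing = sorted(REQUIRED[schema] - present)
    return [f"{schema} is missing {prop}" for prop in missing]


vague = {"schema": "Connection", "properties": {"from": ["p1"], "to": ["c1"]}}
precise = {
    "schema": "Directorship",
    "properties": {
        "director": ["p1"],
        "organization": ["c1"],
        "startDate": ["2019-03-01"],
        "endDate": ["2022-06-30"],
    },
}
vague_problems = check_relationship(vague)
precise_problems = check_relationship(precise)
```

A check like this is useful at the moment of note-taking. It refuses the "somehow connected" record while you still have the source document open and can look up what the relationship actually was.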

What Comes Out The Other End

The article that gets written, eventually, is not the database. It is a piece of journalism, in prose, that a reader can understand. But the database is what makes the article defensible. When an editor asks how you know that the Swedish permit holder is actually controlled by a parent company based in the Cayman Islands, you can show them the chain. Company X is one hundred percent owned by Company Y. Company Y filed an annual report on this date showing this director. The director's signature appears on this Cayman filing. Every step is a record. Every record has a source.
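Showing an editor the chain can be sketched as a walk up the ownership records, collecting the source for every step and refusing to continue past a link that has none. The company names, percentages, and source labels below are placeholders invented for this example, and the storage layout is an assumption.

```python
# asset -> (owner, percentage, source); every value here is a placeholder.
ownership = {
    "permit-holder-ab": ("holding-y-ltd", "100", "registry filing (placeholder ref A)"),
    "holding-y-ltd": ("cayman-parent", "100", "annual report (placeholder ref B)"),
}


def ownership_chain(start: str, records: dict) -> list:
    """Follow ownership upward from start, returning one sourced step per link."""
    chain, seen, current = [], {start}, start
    while current in records:
        owner, percentage, source = records[current]
        if not source:
            raise ValueError(f"unsourced link: {current} -> {owner}")
        chain.append((current, owner, percentage, source))
        if owner in seen:
            break  # circular ownership: stop rather than loop forever
        seen.add(owner)
        current = owner
    return chain


steps = ownership_chain("permit-holder-ab", ownership)
summary = [f"{asset} is {pct}% owned by {owner} [{src}]" for asset, owner, pct, src in steps]
```

Raising an error on an unsourced link is deliberate. A chain with one undocumented step is not a chain you can defend, and it is better to find that out from your own tooling than from a lawyer.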

This is what the schema buys you. Not just the ability to find connections, but the ability to defend them. The journalism is more credible because the data underneath it is more legible. Readers may never see the graph, but the existence of the graph is felt in the article's confidence.

[serious]

The lineage of this kind of work runs through the Panama Papers in twenty-sixteen, the Paradise Papers in twenty-seventeen, the Pandora Papers in twenty-twenty-one, and many smaller leaks that did not get globally branded names. The International Consortium of Investigative Journalists used Aleph and a related tool called Datashare to coordinate hundreds of reporters across dozens of countries on these stories. The data was in the schema. The reporters spoke the same vocabulary. The connections that nobody could have found alone, several hundred people found together.

The Quiet Permission

The thing worth taking away from Aleph, as a working journalist, is not the software itself. It is the recognition that investigative work has been formalized to the point where there are standard schemas and shared vocabularies. This was not true twenty years ago. It is true now. The Follow The Money schema is open. The OCCRP and ICIJ networks publish their methodologies. The data licenses on many public corporate registries are permissive, though the terms vary by country and are worth checking before you republish anything.

A reporter working from a Swedish mountain village in twenty-twenty-six has access to roughly the same conceptual infrastructure that ICIJ used for Panama Papers. The scale is different. The complexity is different. The patterns are exactly the same. Borges' narrator went mad because he saw all points at once. The actual Aleph keeps you sane by giving you a small, structured slice of the universe at a time, and lets you build that slice up slowly, one investigation at a time, until somewhere down the road you turn around and realize you have built something genuinely useful.

That is the trick of all good tools. They are mythical in their description and mundane in their use. You sit down on a Tuesday afternoon and you ingest twelve PDFs and you tag three companies and you write down what a director said in a corporate filing from two thousand nineteen. And five years later, when something interesting happens in your kommun, you already have most of the pieces. The Borges story ends in madness. The Lindenberg tool ends in articles that are correct.