The Teller And The Accountant: On Whether Spaden Needs One Database Or Two

The Question, Stated In Plain Words

[calm] Pär asked, while travelling, about the difference between two kinds of database workload. Transactional and analytical. The acronyms are inconvenient for spoken language. The shapes they describe are not.

The transactional kind is what a database does when it remembers one thing at a time. A new document arrives. A review decision gets recorded. An entity is marked as the same as another entity. Each of these is small. Each is fast. Each must not be lost.

The analytical kind is what a database does when it summarises many things at once. How many events happened in the Viken polygon between nineteen eighty and twenty twenty-five, grouped by year. How many documents mention this company across all investigations. What fraction of mining permits in Åre kommun also involved a Bolagsverket filing the same week. These questions touch a lot of rows. They take longer. They produce a single answer or a small table.

Both kinds of work happen in Spaden. The question is whether they happen in one database or two.

The Metaphor That Helps

The clearest way to think about this is two roles in a bank. The teller and the accountant.

The teller takes a single deposit, hands you a receipt, files the slip, and is done. Each interaction is fast. Each interaction is unique. The teller never gets to think about the bank's overall financial position. That is not the job. The job is being correct about each individual deposit.

The accountant never sees a single deposit go through. The accountant sees the entire month, summed and grouped and broken out by region and product line. The accountant takes longer than the teller. The accountant's answers are larger. The accountant tolerates a small delay between when a deposit happens and when it shows up in the report.

A database built for the teller is optimised for speed and correctness on small transactions. A database built for the accountant is optimised for speed on large scans and aggregations. These are different shapes. A database that is great at one is often only adequate at the other.

The technical names for the two workloads are Online Transaction Processing for the teller and Online Analytical Processing for the accountant. From here on, just teller-work and accountant-work.

What Spaden Actually Does

Spaden does both. Plenty of teller-work and meaningful accountant-work, mixed together.

The teller-work in Spaden looks like this. A new Bolagsverket document arrives. Spaden writes it to disk, indexes its text, creates entity records for the company and any people mentioned, ties it to an investigation, sets a status flag. All of this happens for one document at a time, hundreds of times a week. Pär reviews a sample in the per-doc-type review user interface; the validation is recorded, the loader's confidence creeps up. Pär resolves two entities that turn out to be the same company under different spellings; the resolution is recorded, the canonical identifier is updated. Each of these is a small write that must not be lost.
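
A minimal sketch of what "a small write that must not be lost" can look like in code. It uses Python's built-in sqlite3 module as a stand-in for Postgres so that it runs anywhere. The table names, column names, and the ingest_document helper are hypothetical, not Spaden's actual schema. The point is only that the document row and its entity row are written together or not at all.

```python
import sqlite3

def ingest_document(conn, doc_id, company, investigation):
    """Record one document and its entity link as a single all-or-nothing write."""
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "INSERT INTO documents (id, investigation, status) VALUES (?, ?, 'new')",
            (doc_id, investigation),
        )
        conn.execute(
            "INSERT INTO entities (name, document_id) VALUES (?, ?)",
            (company, doc_id),
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, investigation TEXT, status TEXT)")
conn.execute("CREATE TABLE entities (name TEXT NOT NULL, document_id TEXT NOT NULL)")

ingest_document(conn, "doc-1", "Example Gruv AB", "viken")

# A failing ingest (company name missing) must leave no half-written document behind.
try:
    ingest_document(conn, "doc-2", None, "viken")
except sqlite3.IntegrityError:
    pass

document_ids = [row[0] for row in conn.execute("SELECT id FROM documents ORDER BY id")]
print(document_ids)
```

The second ingest fails on its entity row, so the transaction rolls back and its document row disappears with it. That is the teller's guarantee: each deposit is either fully recorded or not recorded.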

The accountant-work in Spaden is the part Pär actually came to the substrate for. Show me all events in the Viken polygon between two dates, on a timeline. Show me every entity that appears across more than one investigation. Show me a heatmap of mining permit activity across Jämtland by decade. Find me companies whose board members also serve on Åre kommun's planning committee, across the entire archive. These touch many rows. They group, aggregate, filter, join. They are the queries that make the cross-investigation database real, not just nominal.
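
Two of these questions can be sketched as queries. This is an illustration under assumptions, again using sqlite3 as a stand-in for Postgres. The mentions and events tables are hypothetical, and a plain text area column stands in for a real polygon test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mentions (entity TEXT, investigation TEXT)")
conn.execute("CREATE TABLE events (happened_on TEXT, area TEXT)")
conn.executemany(
    "INSERT INTO mentions VALUES (?, ?)",
    [
        ("Example Gruv AB", "viken"),
        ("Example Gruv AB", "are-permits"),
        ("Example Skog AB", "viken"),
    ],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("1998-03-01", "viken"),
        ("1998-11-20", "viken"),
        ("2004-06-15", "viken"),
        ("2004-07-01", "elsewhere"),
    ],
)

# Entities that appear in more than one investigation.
shared_entities = conn.execute(
    """
    SELECT entity, COUNT(DISTINCT investigation)
    FROM mentions
    GROUP BY entity
    HAVING COUNT(DISTINCT investigation) > 1
    """
).fetchall()

# Events per year inside one area and one date range.
events_per_year = conn.execute(
    """
    SELECT substr(happened_on, 1, 4) AS yr, COUNT(*)
    FROM events
    WHERE area = ? AND happened_on BETWEEN ? AND ?
    GROUP BY yr
    ORDER BY yr
    """,
    ("viken", "1980-01-01", "2025-12-31"),
).fetchall()

print(shared_entities)
print(events_per_year)
```

Both queries read every matching row and return a small table. That is the accountant's shape: many rows in, few rows out.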

These two workloads are both load-bearing. Spaden cannot pick one. It has to do both.

Why The Shapes Are Different

A database tuned for teller-work stores rows side by side. Each row is one record. Pulling one row is fast because the database knows where each row begins and ends. Indexes help it jump to the right row without scanning.

A database tuned for accountant-work often stores columns side by side instead of rows. All the dates in one place. All the locations in another. This lets the database scan only the columns it needs for a question. If the question is "count by year," the database reads the year column and nothing else. The scan is fast even when there are tens of millions of rows, because in a table with ten or twenty columns of similar size the database is reading roughly one tenth or one twentieth of the total bytes.

For a teller-style write, columnar storage is bad. To add one row, the database has to write to every column's storage separately. For an accountant-style read, row storage is bad. The database has to read every column of every row even when it only cares about one. Each shape has its preferred use; each shape pays a cost when used for the other.
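
The trade-off can be made concrete with a toy model. This is a sketch only: Python lists and dictionaries stand in for on-disk layout, and the field names are invented for illustration.

```python
# The same three events, laid out two ways.
rows = [
    {"year": 1998, "area": "viken", "title": "Permit filed"},
    {"year": 1998, "area": "viken", "title": "Board change"},
    {"year": 2004, "area": "are", "title": "Permit granted"},
]
columns = {
    "year": [1998, 1998, 2004],
    "area": ["viken", "viken", "are"],
    "title": ["Permit filed", "Board change", "Permit granted"],
}

def count_by_year_rows(rows):
    """Row layout: every record is visited whole, even though only 'year' matters."""
    counts = {}
    for record in rows:
        counts[record["year"]] = counts.get(record["year"], 0) + 1
    return counts

def count_by_year_columns(columns):
    """Column layout: only the 'year' list is read."""
    counts = {}
    for year in columns["year"]:
        counts[year] = counts.get(year, 0) + 1
    return counts

def append_row(rows, record):
    """Row layout: one new record is one append."""
    rows.append(record)

def append_columns(columns, record):
    """Column layout: one new record means one append per column."""
    for name, values in columns.items():
        values.append(record[name])

print(count_by_year_rows(rows))
print(count_by_year_columns(columns))
```

Both counting functions give the same answer. The column version touches one list out of three. The column append touches all three. With twenty columns instead of three, both gaps widen in the same direction.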

This is the structural reason a single database that handles both perfectly is rare. Most actual systems compromise. They are good at one and acceptable at the other.

At Spaden's Scale, One Postgres Does Both

The good news is that Spaden is not at a scale where the compromise hurts.

One journalist over many years produces a corpus that is large by human standards but small by database standards. Tens of thousands of documents. Hundreds of thousands of entity-property rows. Low millions of events when the timeline grows to its mature shape. These are numbers a properly indexed Postgres handles without complaint. Both the teller-work and the accountant-work fit on one server, in one schema, with one set of backups.

The boring correct answer is Postgres with two extensions enabled. PostGIS handles the spatial queries: point-in-polygon, area filters, joins on geometry. TimescaleDB handles the time-series side: events grouped by year or month, with tables partitioned by time so that time-range scans touch only the relevant chunks. Both extensions are mature. Both run on a single Postgres instance. Both are supported by the existing VPS Postgres that the deploy decision is now leaning toward.
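
As a rough illustration, the Viken question might be phrased like this for a Postgres with PostGIS enabled. This is query text only, held in a Python string. It assumes a hypothetical events table with a happened_at timestamp column and a location geometry column, coordinates in SRID 4326, and a driver that accepts named placeholders in the psycopg style. The polygon coordinates are invented. ST_Contains and ST_GeomFromText are PostGIS functions; date_trunc is plain Postgres.

```python
# Query text only; running it needs a Postgres server with PostGIS enabled.
# Table and column names (events, happened_at, location) are hypothetical.
EVENTS_PER_YEAR_IN_POLYGON = """
SELECT date_trunc('year', happened_at) AS yr, count(*) AS n
FROM events
WHERE happened_at >= %(start)s
  AND happened_at < %(end)s
  AND ST_Contains(ST_GeomFromText(%(polygon_wkt)s, 4326), location)
GROUP BY 1
ORDER BY 1
"""

# Invented example parameters; the polygon is a small made-up rectangle.
params = {
    "start": "1980-01-01",
    "end": "2026-01-01",
    "polygon_wkt": "POLYGON((13.0 63.3, 13.2 63.3, 13.2 63.4, 13.0 63.4, 13.0 63.3))",
}

print(EVENTS_PER_YEAR_IN_POLYGON)
```

The date filter, the polygon filter, and the grouping all live in one statement against one database. That is what "one schema" buys.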

For Spaden's expected first year, this is the right answer. One database. Two extensions. Teller-work and accountant-work in the same place. Backups in the same place. Authentication in the same place. The complexity budget stays small.

When That Stops Being Enough

The Postgres-does-both answer holds until a specific kind of query starts misbehaving. The shape is usually the same. An accountant-work query, run interactively, takes uncomfortably long. Five seconds, then ten, then a minute. The user interface waits. Pär notices.

When that happens, there are two natural moves. Neither is dramatic.

The first move is to tune the existing Postgres. Better indexes for the queries that hurt. Materialised views that pre-compute the heavy aggregations. Partitioning by time so the database only scans recent partitions for recent queries. This buys another year of growth without a second database.
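
A sketch of the first move, under assumptions. SQLite has no materialised views, so a plain summary table rebuilt on demand stands in for one here; in Postgres the equivalent statements would be CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW. The table names and the refresh_events_per_year helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (happened_on TEXT, area TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("1998-03-01", "viken"), ("1998-11-20", "viken"), ("2004-06-15", "viken")],
)
# An index on the filtered columns lets range queries skip unrelated rows.
conn.execute("CREATE INDEX events_area_date ON events (area, happened_on)")

def refresh_events_per_year(conn):
    """Rebuild the pre-computed summary; a stand-in for refreshing a materialised view."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS events_per_year")
        conn.execute(
            """
            CREATE TABLE events_per_year AS
            SELECT area, substr(happened_on, 1, 4) AS yr, COUNT(*) AS n
            FROM events
            GROUP BY area, yr
            """
        )

refresh_events_per_year(conn)

# The interactive query now reads a handful of summary rows instead of every event.
summary = conn.execute(
    "SELECT yr, n FROM events_per_year WHERE area = ? ORDER BY yr", ("viken",)
).fetchall()
print(summary)
```

The cost is staleness. The summary is only as fresh as its last refresh, which is the small delay the accountant already tolerates.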

The second move, when tuning is not enough, is to add a separate analytical engine for the heavy accountant-work. The Postgres stays primary; it owns the writes and the canonical state. A separate engine reads from Postgres at intervals, holds a copy shaped for analytical scans, and answers the heavy queries fast. Pär does not see the split. The user interface routes queries to the right engine quietly.

The candidate for that second engine, at Spaden's scale, is almost certainly DuckDB.

The Quiet Strength Of DuckDB

DuckDB deserves its own paragraph because it is a different kind of analytical database than the ones a general audience hears about.

It is embedded. It does not run as a service. There is no daemon, no port, no separate process. It is a library that the application loads, like SQLite for the teller side. It reads parquet files directly off the disk or off the bucket. It is fast at the kind of scan-heavy queries that hurt a teller-tuned Postgres.

What this means for Spaden, practically, is that adding analytical capability does not require a new service to deploy, monitor, back up, or update. Nightly, Spaden writes a parquet export of the relevant tables to the main bucket. DuckDB reads those exports for any heavy query. The Postgres handles everything else.

The infrastructure cost is essentially zero. The integration cost is real but small. The split is reversible — if DuckDB ever becomes unnecessary, the export job goes away and Postgres is still doing everything.

This is the upgrade path. Start with one Postgres. When something specific slows down, add DuckDB beside it. Do not pre-emptively deploy ClickHouse, or Snowflake, or BigQuery. Those tools exist for problems Spaden does not have.

The Mistake Worth Naming

The mistake that would cost Spaden the most is committing to the analytical engine before there is evidence of need. The temptation looks like this. The substrate's load-bearing capability is cross-investigation discovery on a timeline with area filters. That sounds analytical. So we should design with analytical infrastructure from day one. ClickHouse is a real database. Let us start there.

This is wrong for three reasons. First, it commits to operational complexity before the workload justifies it. ClickHouse is a service with its own ingest patterns, its own backup story, its own update cadence. For one journalist's corpus, this is standing overhead that the analytical queries do not earn back.

Second, it bifurcates the data model on day one. Some things are in Postgres, some are in ClickHouse, and the syncing is its own engineering problem. Every new feature has to decide which side it lives on. Every join across them is a pipeline.

Third, and most importantly for an artificial-intelligence-built tool, it doubles the surface area Claude has to debug and edit. Postgres patterns are saturated in training data. ClickHouse patterns are present but thinner. Every round on the ClickHouse side runs at higher risk because the model has fewer examples to draw on. The round count rises. The build slows down. The exotic-stack rule from the planning documents applies here directly.

The right move is to be ready for the second engine without prematurely deploying it. Design the schema in Postgres so that nightly parquet exports are clean. Keep the analytical queries in a code path that could route to a different engine later. Do not actually route them until something hurts.
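
One way to keep that code path open, sketched under assumptions. The QueryRouter class, the make_sqlite_engine helper, and the "analytical" label are all hypothetical names, and sqlite3 stands in for Postgres. An engine here is just a function that takes query text and parameters and returns rows.

```python
import sqlite3
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple]
Engine = Callable[[str, tuple], Rows]

def make_sqlite_engine(conn: sqlite3.Connection) -> Engine:
    """Wrap a connection as a function that runs one query and returns its rows."""
    def run(sql: str, params: tuple = ()) -> Rows:
        return conn.execute(sql, params).fetchall()
    return run

class QueryRouter:
    """Sends each named kind of query to whichever engine is registered for it."""

    def __init__(self, default: Engine):
        self._default = default
        self._overrides: Dict[str, Engine] = {}

    def route(self, kind: str, engine: Engine) -> None:
        """Register a different engine for one kind of query."""
        self._overrides[kind] = engine

    def run(self, kind: str, sql: str, params: tuple = ()) -> Rows:
        engine = self._overrides.get(kind, self._default)
        return engine(sql, params)

primary = sqlite3.connect(":memory:")
primary.execute("CREATE TABLE events (yr INTEGER)")
primary.executemany("INSERT INTO events VALUES (?)", [(1998,), (1998,), (2004,)])

router = QueryRouter(default=make_sqlite_engine(primary))

# Day one: no overrides, so analytical queries also go to the primary database.
timeline = router.run(
    "analytical", "SELECT yr, COUNT(*) FROM events GROUP BY yr ORDER BY yr"
)
print(timeline)
```

On the morning something hurts, a single call to router.route with a second engine moves the heavy queries. The callers do not change. Until then the router is a few lines that cost nothing.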

The Actual Answer For Spaden

A summary, said carefully because the question deserves it.

Spaden runs both teller-work and accountant-work. For the first year and probably longer, a single Postgres with PostGIS and TimescaleDB extensions handles both. This is the boring correct answer and it is correct because Spaden's scale is small enough that the structural compromise of a single database does not hurt.

When a specific analytical query begins to take uncomfortable time, the path is to tune the Postgres first. Materialised views. Better indexes. Partitioning. These usually buy a long extension.

When tuning is not enough, add DuckDB alongside Postgres. Nightly parquet exports become the bridge. The user interface routes heavy queries to the analytical engine quietly. The infrastructure cost stays minimal because DuckDB is embedded, not a service.

Do not deploy ClickHouse, Snowflake, or other heavy analytical engines on day one. The complexity budget is precious. The exotic-infrastructure pull is real and worth resisting. The boring path is the right path until something specific breaks the boring path.

The Question Reframed

Pär asked whether Spaden needs one database or two. The answer is one for now, with a clear path to two when the work demands it. The two-database day will come, probably. But it does not come on day one. It comes on whichever specific morning a query that used to be quick takes long enough that Pär notices.

Designing for that day, without paying for it before it arrives, is the move. The substrate stays small while it is small. It scales by adding the cheapest possible analytical layer when the scale demands it. The teller and the accountant work at the same desk for now. When the accountant needs more space, the new desk goes next to the old one. The bank does not move.