It started with something perfectly reasonable. Four search skills, each pointing at a different archive. Emails over here. Newspaper articles over there. Nineteen hundred AI conversations in a third place. Reddit off in the corner doing its own thing. And the question was simple enough. Can we add a slash find command that searches all of them at once?
That is a thirty minute task. Read the skill files, write a new one that fans out across the backends, merge the results. Ship it. Move on with your day.
But something happens when you actually look at the pieces instead of just describing them. You start noticing things. Like the Reddit skill quietly burning Opus tokens on what amounts to a bash wrapper. Like the email archive sitting on ten thousand messages that nobody has cleaned in three years. Like the newspaper search still using raw grep across six hundred files because nobody ever built it a proper index.
The thirty minute task was already growing.
The first real fork in the road was the database question. Should the unified search live in SQLite, because that is what already works? Or PostgreSQL, because that is what serious projects use? The instinct was to just pick one and go.
Instead, three independent evaluations ran in parallel. One argued for SQLite with conviction. One argued against it with equal conviction. A third evaluated PostgreSQL on its own merits without knowing what the other two were saying.
The case against SQLite was blunt. The vector search extension's single maintainer went dark for six months in twenty twenty five. A community fork had to patch a memory leak. There is no approximate nearest neighbor index. You are proposing to stack three fragile C extensions on top of each other and call it infrastructure.
The case for was just as blunt. At fifteen thousand documents, brute force vector search takes two milliseconds on Apple Silicon. The entire knowledge base fits in one file. Backup is a copy command. There is no daemon, no port, no authentication, no vacuuming schedule. For a single user system, that is not simplicity. That is architecture.
And then the strange part. All three evaluations, arguing from completely different positions, converged on the same answer. Not SQLite. Not PostgreSQL. Both. Local SQLite for the fast interactive searches you run fifty times a day. PostgreSQL on the server for the deep semantic queries that need vector similarity and fuzzy matching and cross archive entity linking.
The question had grown again. It was not about where to put a search index anymore. It was about the topology of memory itself.
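If you want to picture that topology, a minimal sketch might look like this, assuming a local SQLite file with FTS5 and a pgvector-backed PostgreSQL on the server. The connection string, table names, and function names are illustrative, not the actual system.

```python
import sqlite3

import psycopg  # server tier; connection details below are made up

def search_fast(query: str) -> list[tuple]:
    """Local tier: SQLite FTS5. Millisecond answers, no network, no daemon."""
    db = sqlite3.connect("knowledge.db")
    return db.execute(
        "SELECT title FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT 20",
        (query,),
    ).fetchall()

def search_deep(query_embedding: list[float]) -> list[tuple]:
    """Server tier: pgvector similarity over the full archive."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"
    with psycopg.connect("postgresql://vps.example/knowledge") as conn:
        return conn.execute(
            "SELECT title, source FROM documents "
            "ORDER BY embedding <-> %s::vector LIMIT 20",
            (vec,),
        ).fetchall()
```

The split gives the fifty-times-a-day searches a path that never touches the network. The deep queries pay the round trip only when they actually need vector similarity.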
The real turn happened when someone said wait, read the other project first. The one that does not exist yet. The future editorial system for a small Swedish newspaper in the mountains.
Buried in its ideas document, written months earlier during a completely different conversation, was a section called Email and CRM Integration. The dream. While writing an email to a customer, see their full history. Every ad they have booked. Every article they appeared in. Every conversation about their account across three years of email threads.
And suddenly the search project was not a search project anymore.
The newspaper system needed an archive of articles. Already existed, nine hundred and seventy six of them, extracted by a previous experiment. It needed a people index. Already existed, six hundred and sixteen names pulled from those articles. It needed to connect email history to customer records. The email archive was right there, ten thousand messages deep.
The knowledge base was not a tool for finding things. It was the foundation layer for an entire business system that had not been built yet. The institutional memory that everything else would stand on.
Here is a detail that delighted no one. Swedish is a language that builds words by gluing other words together. A real estate agency is a fastighetsförmedling. A newspaper archive is a tidningsarkiv. A municipal council is a kommunfullmäktige.
Neither SQLite nor PostgreSQL can break these apart for search. The Snowball stemmer, which handles English beautifully, explicitly says it cannot split compound words without a dictionary. The Hunspell dictionary in PostgreSQL claims basic compound support but in practice treats fastighetsförmedling as one opaque token.
Search for fastighet. Get zero results. The word is right there, it is just wearing a disguise.
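You can reproduce the failure in a few lines. A minimal sketch, assuming a SQLite build with FTS5 and its default unicode61 tokenizer. PostgreSQL's Snowball configuration fails the same way, for the same reason.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
db.execute("INSERT INTO docs VALUES ('Ny fastighetsförmedling öppnar på torget')")

# The compound is indexed as one opaque token, so the part never matches.
hits = db.execute("SELECT * FROM docs WHERE docs MATCH 'fastighet'").fetchall()
print(hits)  # []

# A prefix query catches head-position compounds, but not tail-position
# ones like hyresfastighet. A patch, not a fix.
print(db.execute("SELECT * FROM docs WHERE docs MATCH 'fastighet*'").fetchall())
```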
This is not a database problem. This is a language problem. And it does not care which database you chose.
The solution, eventually, is a custom decomposition step in the ingestion pipeline. Break the compounds before they reach the index. But the discovery was humbling. Two hours of careful database evaluation, three independent research agents, dozens of web sources consulted. And the hardest problem turned out to be one that neither candidate could solve.
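What might that decomposition step look like? A greedy dictionary-based splitter is the simplest honest sketch. The word list here is a toy stand-in, a real pipeline would need a proper Swedish lexicon, and the function names are mine, not the pipeline's.

```python
# Toy lexicon. The real pipeline would load a full Swedish word list.
KNOWN = {"fastighet", "förmedling", "tidning", "arkiv", "kommun", "fullmäktige"}

def split_compound(word: str, min_part: int = 3) -> list[str] | None:
    """Split a compound into known parts, or return None.

    Tries the longest known head first and allows the linking -s- that
    Swedish inserts between parts (fastighet + s + förmedling).
    """
    word = word.lower()
    if word in KNOWN:
        return [word]
    for i in range(len(word) - min_part, min_part - 1, -1):
        head, tail = word[:i], word[i:]
        if head not in KNOWN:
            continue
        rest = split_compound(tail, min_part)
        if rest is None and tail.startswith("s"):
            rest = split_compound(tail[1:], min_part)  # drop the linking -s-
        if rest is not None:
            return [head] + rest
    return None

def index_tokens(word: str) -> list[str]:
    """Index the original token plus its parts, so both forms match."""
    return [word] + (split_compound(word) or [])

print(index_tokens("fastighetsförmedling"))
# ['fastighetsförmedling', 'fastighet', 'förmedling']
```

Run the parts through the indexer alongside the original and a search for fastighet finds the agency after all.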
The final expansion happened when the conversation zoomed out to the full ecosystem. Not just the newspaper. Not just the search tools. Everything. The personal productivity suite that acts as an operating system. The rental business. The consulting practice. The Raspberry Pi in the basement relaying SMS messages. The location service that knows where you are. The vehicle telemetry that knows where your car is.
All of it produces data. All of it has memory. And none of it talks to the rest about the past.
The database schema that emerged has a column called source. It can be chatarkiv or email or newspaper. But it can also be where or bmw or lifelab or capture or focus. Sources that do not feed into the system today. Sources that might never feed into it. But if they do, the schema does not need to change. A new source is just new rows.
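A sketch of that idea, shown against SQLite for brevity even though the real table lives in PostgreSQL. The column names and the example row are illustrative.

```python
import sqlite3

db = sqlite3.connect("knowledge.db")
db.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id         INTEGER PRIMARY KEY,
        source     TEXT NOT NULL,  -- 'chatarkiv', 'email', 'newspaper', or a future 'where', 'bmw', ...
        created_at TEXT,
        title      TEXT,
        body       TEXT NOT NULL
    )
""")

# A new source is just new rows. No migration, no schema change.
db.execute(
    "INSERT INTO documents (source, created_at, title, body) VALUES (?, ?, ?, ?)",
    ("where", "2025-03-14T09:12:00", "Location ping", "lat=62.03 lon=14.36"),
)
db.commit()
```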
Someone asked the question that made the design click. Say you are writing an article and you want to know where you were when you did that interview last year. Not saying we will ever build that. But the option should be there.
That is what future proofing actually looks like. Not building features for hypothetical users. Building a schema that does not flinch when reality gets weirder than the plan.
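Continuing the sketch above, that hypothetical query is almost embarrassingly small. The timestamps are made up, and the where rows may never exist.

```python
# Hypothetical: where was I around the time of that interview?
rows = db.execute(
    "SELECT created_at, body FROM documents "
    "WHERE source = 'where' AND created_at BETWEEN ? AND ?",
    ("2025-03-14T08:00", "2025-03-14T12:00"),
).fetchall()
```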
By the end of the session, the thirty minute search skill had become a PostgreSQL database with vector embeddings, Swedish full text search with Hunspell dictionaries, trigram fuzzy matching for OCR errors, entity extraction across ten thousand emails and six hundred newspaper issues, a hybrid search API, three ingestion scripts, a VPS setup script, and an architecture document that ties together four years of accumulated data into something that might actually deserve the word infrastructure.
Also the Reddit skill got fixed to use Haiku instead of Opus. That was genuinely a thirty second task.
The lesson, if there is one, is about the value of not starting. Of letting the question expand before you answer it. The search skill was the right place to begin. But if anyone had built it in thirty minutes and moved on, the newspaper system would have ended up inventing its own archive layer six months later. The email history would have stayed in its silo. The entity connections between a person in an article and a person in an email and a person in a booking would have remained invisible.
Sometimes the best engineering decision is to keep asking why until the question stops growing. And then build the thing the question was actually about.