How many transcripts does it take to fill a website? Thirty-six, apparently. A podcast site called parpod dot net had thirty-six episode pages, each with a title, a subtitle, an audio player, and a scroll-to-read button labeled "Read." The button scrolled to nothing. Every episode page ended at the frontmatter. The audio was there. The words were not.
The transcripts existed. They had always existed. One hundred and thirty-nine full narration scripts, sitting in a directory on the same machine, written months earlier as the source material for text-to-speech generation. Rich, detailed scripts with voice casting notes, sound effect cues, and source citations. Thirty-five-thousand-word deep dives on SQLite and fourteen-thousand-word investigations of OpenSSL's Heartbleed. They were all there, waiting for someone to connect the dots.
The problem was that nobody had introduced the scripts to the website. And as it turned out, the scripts did not speak the same language as the pages.
The website called one episode "five eighty-five conversations." The script called it "five hundred and eighty-five conversations in forty-four days." The website said "curl twenty eight years." The script said "curl twenty eight years from a swedish suburb." The website said "garbage in fine tuning disaster." The script said "garbage in a fine tuning disaster in three acts."
These are the same episodes. The same audio. The same words. But somewhere between the podcast generation pipeline and the website import, someone had trimmed the slugs down for readability, and in doing so severed the connection between the content and its source.
This is not a rare problem. It is one of the oldest problems in computing. Two systems that refer to the same thing by different names, each internally consistent, each unaware of the other's conventions. It is the problem that gave us foreign keys, canonical URLs, DOIs, ISBNs, and half the architecture of the modern internet. And at the scale of one person's podcast website, it manifested as thirty-six empty pages.
Three search agents launched simultaneously, each assigned a different quadrant of the filesystem. One searched the podcast tool directory. One searched the archives and Dropbox. One traced the audio file paths backward, looking for text files living alongside the MP3s.
"I found a hundred and thirty-nine markdown files in episodes slash sources. They look like full narration scripts."
The first agent came back in eighteen seconds. The transcripts were exactly where you would expect them to be, in a directory called "sources" inside the podcast tool. The other two agents came back with nothing useful, having spent their time bumping into permission walls and reporting back apologetically.
One hundred and thirty-nine scripts. Thirty-six website episodes. The math should have been simple.
The first matching attempt was embarrassingly naive. Take the filename from the website, look for the same filename in the sources directory, report a match or a miss.
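That first pass can be sketched in a few lines. The slugs below are toy stand-ins for the real filenames; the actual lists were read from the two directories.

```python
# Toy slugs standing in for the real filenames on each side;
# the actual lists came from the website and sources directories.
site_slugs = [
    "five-eighty-five-conversations",
    "curl-twenty-eight-years",
    "the-lindy-effect",
]
source_slugs = {
    "five-hundred-and-eighty-five-conversations-in-forty-four-days",
    "curl-twenty-eight-years-from-a-swedish-suburb",
    "the-lindy-effect",
}

# Exact filename lookup: anything renamed in the pipeline is a miss.
matches = [slug for slug in site_slugs if slug in source_slugs]
misses = [slug for slug in site_slugs if slug not in source_slugs]
print(f"{len(matches)} matches, {len(misses)} misses")  # 1 matches, 2 misses
```

An exact-lookup pass like this is worth running first precisely because it is cheap and its hits are unambiguous; everything left over needs a fuzzier strategy.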
Eleven matches. Twenty-five misses. Only a third of the episodes had identical slugs in both systems. The rest were invisible to a direct lookup. "Zettelkasten the wooden box" on the website was "zettelkasten the wooden box that thought for itself" in the sources. "The committees that built the world" had appended "the hidden organizations behind every tap click and call." Every title had been truncated, rearranged, or simplified somewhere in the pipeline.
The slugs were shortened for the website. The originals were written for TTS, where you write out every word because the synthesizer will try to pronounce abbreviations. Different audiences, different naming conventions, same content underneath.
This is the heart of it. The podcast scripts were written for a machine that reads aloud. Numbers become words. Subtitles get spelled out in full. "Five hundred and eighty-five conversations in forty-four days" is what you write when a voice synthesizer needs to say it correctly. "Five eighty-five conversations" is what you write when a human needs to scan a URL. Both are right. Neither knows the other exists.
The second attempt used title matching. Extract the title from the website's frontmatter, extract the first line from each source file, normalize both, and compare. This caught most of them. "The Lindy Effect: Why Old Things Refuse to Die" matched to a source file that opened with the same words. Thirty-three episodes now had matches.
Three remained.
"AI That Cleaned My Repos" was not in the sources directory at all. It had been generated from a different pipeline and its script had landed in a contentbuilder archive, a directory two levels up in the project hierarchy, filed under a naming convention that included the series code. The full path was contentbuilder slash archive slash git dash B five dash the dash ai dash that dash cleaned dash my dash repos dot md. It was findable, but only if you knew to look outside the normal directory.
"Casting Claude in a Role" was worse. Its source file had been in the podcast tool's inbox, a staging area for unprocessed scripts. After the audio was generated, the inbox file was deleted. The script existed only as eighty-character preview snippets in a build manifest, truncated fragments like "A company needed to analyze a dense quarterly earnings report. They asked Claude" and then nothing. The full text was gone.
And then there was "The Four Ds of Working with AI." Its source file existed. It had the right filename, complete with the umlaut intact. But when you opened it, the content inside was about a completely different topic. The file called "the four ds of working with ai" contained an episode about who is using AI to code, a research paper from the University of Technology Vienna. Somewhere in the pipeline, two scripts had been swapped into each other's containers. The label said one thing. The contents said another.
That is a pipeline bug. The generation system wrote the wrong content into the right filename. The audio is correct because it was generated before the swap happened. But the source file is corrupted.
Even after matching, the scripts could not be pasted directly into a website. They were written for a text-to-speech engine, not a web browser. Every script contained voice casting directives, sound effect cues, and source citation comments that would render as garbled text on the page.
A voice block like "Richard, why don't you just write one?" becomes a blockquote on the web. A sound cue like "sparkle" or "buzzer" disappears entirely. The source citations, careful inline comments pointing to podcast interviews and Wikipedia articles, get stripped because they interrupt the reading flow. What remains is the narration itself, clean prose organized by section headers, with dialogue formatted as quotes.
A Python script handled the conversion. Read each source file. Strip the title line because Hugo already has one in the frontmatter. Convert voice blocks to markdown blockquotes. Remove sound cues. Remove source comments. Remove metadata headers. Collapse excessive blank lines. Append the cleaned text to the Hugo episode file after the closing frontmatter delimiter.
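The cleanup step can be sketched like this. The directive syntax here is an assumption, voice lines written as [voice: Name], sound cues as [sfx: cue], and HTML-style source comments; the real pipeline's markup may look different.

```python
import re

def clean_script(text: str) -> str:
    # Directive syntax below is hypothetical, not the pipeline's actual markup.
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\[sfx:[^\]]*\]", stripped):
            continue  # sound cues disappear entirely
        if stripped.startswith("<!-- source:"):
            continue  # citations would interrupt the reading flow
        m = re.match(r"\[voice:\s*[^\]]+\]\s*(.*)", stripped)
        if m:
            out.append(f"> {m.group(1)}")  # voice block becomes a blockquote
            continue
        out.append(line)
    if out and out[0].startswith("# "):
        out = out[1:]  # Hugo frontmatter already carries the title
    cleaned = "\n".join(out)
    # Collapse runs of blank lines left behind by the removals.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip() + "\n"
```

The cleaned text would then be appended to each episode file after the closing frontmatter delimiter, exactly the step the script's last action describes.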
Thirty-four files processed. Five thousand lines of content added. The build completed in thirty-eight milliseconds.
The website had always been a shell. Episode pages with audio players and no text, like a library where every book had a cover and a spine but no pages inside. The transcripts were never missing. They were sitting in a directory on the same machine, filed under names that were two words too long.
This is the quiet version of a problem that scales to every organization that stores information. The content exists. It was created, reviewed, processed, and stored. But the system that needs it calls it something different, and so the content might as well not exist. A thirty-five-thousand-word script about SQLite, sitting three directories away from the empty page that was supposed to display it, separated by nothing but a naming convention.
Two episodes remain empty. One lost its script to a cleanup process. The other is a ghost file, a filename that promises one thing and delivers another. They are reminders that even in a system built by one person, on one machine, information can still slip through the cracks of its own naming conventions.
The other thirty-four pages now have words in them. The "Read" button scrolls to something. And the next time someone visits parpod dot net and wants to read a podcast instead of listening to it, they will find the text waiting for them, finally introduced to the page that was always meant to hold it.