Doing the Work to Learn the Work: The Quiet Pattern Behind Good Software

The Wizard Behind the Curtain

In the early days of Reddit, the founders had a problem that every new social platform has. There was no content. People showed up, saw nothing interesting, and left. The site was dead, even though the code worked perfectly well. They needed users posting before they could attract more users posting, and they had neither, which is the classic chicken-and-egg problem of building any community.

So they cheated. Steve Huffman and Alexis Ohanian, the founders, created hundreds of fake accounts and used them to post links. They scoured the internet for interesting things and posted them under invented usernames with invented voices. Each fake user had its own posting style, its own interests, its own commenting tone. To anyone visiting Reddit in late two-thousand-five, the site looked like a small but lively community of curious people sharing things. It was, in fact, two guys impersonating an entire community.

[calm]

This is not just a funny startup story for the founder mythology shelf. It is an instance of a pattern that shows up everywhere in software, and the pattern is worth giving a name. The founders were doing manually what the platform was supposed to do automatically. The fake users were a kind of wizard behind the curtain: humans pretending to be a system until the system became real on its own.

The Original Wizard of Oz

The pattern has an older name in human-computer interaction, the field that studies how people interact with computers. Researchers there call it the Wizard of Oz technique, after the L. Frank Baum book, where the great and powerful wizard turns out to be a small man behind a curtain operating levers. In a Wizard of Oz test, a researcher tells a participant they are interacting with a computer system. The participant types or speaks to the system. They see responses come back. They evaluate the experience as if it were real.

What the participant does not know is that the computer is not a computer. It is a human researcher in another room, reading their inputs and typing replies as fast as they can. The point of this is not the deception, although that part is funny. The point is that the researcher gets to test what the system should do before building it. They get real human responses to a system that does not exist yet.

They see where users stumble. They see what users expect to happen. They see what confuses people and what feels obvious. And they get all of this without writing a single line of the eventual product. This is a remarkable trick. By using something flexible but expensive to run (a human researcher) to simulate something rigid but cheap to run (a finished product), you get to learn what the rigid version should actually do. You buy clarity now, with effort, and spend it later on accurate engineering.

The Pattern, Generalized

Once you start looking for this pattern, you see it everywhere across the history of software and product development. The fake Reddit users were one instance. There are many others.

There is the story of the Mechanical Turk, the eighteenth-century chess-playing automaton, which actually concealed a skilled human chess player inside an ornate cabinet. The automaton fooled audiences across Europe for decades, even playing Napoleon and Benjamin Franklin. The trick is memorable enough that Amazon eventually named a service after it: Amazon Mechanical Turk, a platform where humans do small tasks that look like they could be automated but are not, at least not yet.

There is a more recent story about luggage tracking. In one company's early days, its app let you track your bag in real time. Internally, the tracking was a human employee opening a spreadsheet and typing in updates as bags moved through the warehouse. Eventually they built the real system. By then, they knew exactly which tracking events mattered, exactly how often customers wanted updates, and exactly what to do when a bag was delayed in a particular hub.

There are countless smaller stories. The startup whose AI customer service was a team of contractors in another country for the first six months. The fintech whose fraud risk model was a junior analyst doing each check by hand at a desk. The food delivery app whose dispatch algorithm was a human watching a map and texting drivers from their phone. These are not failures. They are not even shortcuts. They are field research, conducted by doing the work itself, with flexible human labor standing in for rigid code that nobody yet knows how to write correctly.

The AI Conversation as Wizard

[serious]

Here is the part that is genuinely new in our current moment. The Wizard of Oz pattern, for most of its history, required a human wizard. That made it expensive, slow, and hard to keep secret for long. The humans got tired. The humans wanted to be paid for their time. The humans eventually told someone what they were doing, and the trick was up, and the project either built the real system or shut down.

But there is now another option, which is a large language model in a conversational interface. The model is not a human. It does not get tired. It costs cents per interaction instead of dollars per hour. And it can pretend to be a system that does not yet exist, in ways that are good enough to do real work for real people who do not need to be told they are part of an experiment.

This means the Wizard of Oz pattern, which used to be a research technique reserved for early-stage product design and academic studies, is now available as a way to actually run a small business. Not just to test what the system should do. To run the system, using the conversational wizard, while you figure out what the deterministic version should eventually look like when you get around to building it.

A Small Concrete Example

Imagine you run a small business that needs to do its accounting. You have accounting software with a documented API. You could write a script that pulls your transactions, categorizes them, posts them to the right accounts, and generates the reports you need. The script would take a week to build, maybe more. You would need to know the API well. You would need to know your own bookkeeping needs deeply enough to encode every rule and exception into code.

Or you could open a conversation with an AI assistant. You could give it access to the API. You could ask it to do the work, transaction by transaction, while you watch and correct. You could do your accounting for three months this way. It would be slower than the eventual script, on a per-transaction basis, but faster than building the script before you fully understood what the script needed to do.

At the end of three months, you would have done your actual accounting, which is the thing you needed done in the first place. You would also know the API intimately, because you watched it being used hundreds of times in slightly different ways. You would know your own bookkeeping rules, because you corrected the assistant whenever it got them wrong. You would know which transactions are routine and which are weird, because you handled all of them personally with the wizard helping.

Now you write the script. It takes a day instead of a week, because the specification was being written continuously in the form of the conversation. Every weird edge case you encountered is in your head. Every API quirk is documented somewhere in the back-and-forth. The script is not the product of guessing at requirements. It is the product of having lived the requirements for ninety days. Guessing and living produce very different software, and the lived version is almost always better.
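Here is a minimal sketch of what "writing the script from the conversation" might look like, assuming you kept a simple log of what the assistant proposed and what you finally approved. The log format, the merchant names, and the helper names derive_rules and categorize are all invented for illustration. They are not part of any real accounting API.

```python
from collections import Counter, defaultdict

# Hypothetical log entries: (merchant, category the assistant proposed,
# category the human finally approved). All names and values are invented.
LOG = [
    ("ACME HOSTING", "Software", "Software"),
    ("ACME HOSTING", "Software", "Software"),
    ("CITY PARKING", "Travel", "Vehicle"),
    ("CITY PARKING", "Vehicle", "Vehicle"),
    ("AMAZON", "Office Supplies", "Office Supplies"),
    ("AMAZON", "Office Supplies", "Inventory"),
]


def derive_rules(log, min_count=2):
    """Keep a merchant -> category rule only when every approved answer agreed."""
    approved = defaultdict(Counter)
    for merchant, _proposed, final in log:
        approved[merchant][final] += 1

    rules = {}
    for merchant, counts in approved.items():
        category, n = counts.most_common(1)[0]
        # One distinct approved category, seen often enough, counts as routine.
        if len(counts) == 1 and n >= min_count:
            rules[merchant] = category
    return rules


def categorize(merchant, rules):
    """Routine merchants get a fixed answer; everything else is flagged."""
    return rules.get(merchant, "NEEDS_REVIEW")


RULES = derive_rules(LOG)
```

In this toy log, the two merchants whose approved answers never varied become rules, and the one whose answers disagreed stays flagged for a human. That split between routine and weird is exactly what the ninety days were supposed to teach you.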

The Spec That Writes Itself

[calm]

This is the deeper point worth sitting with. In traditional software development, the specification comes before the code. Someone writes down what the system should do, in a document, ahead of time. The code is built to match that specification. Then reality intervenes, as it always does, and the specification turns out to be wrong in a hundred small ways nobody could have predicted, and the code has to be rewritten, and the timeline slips.

In conversational prototyping, the specification is the conversation. It is being written, continuously, in the form of "do this, no not like that, like this instead." The corrections become the requirements. The successful interactions become the happy path. The strange edge cases become the test suite. None of this is written down formally as a document. It is captured in the artifact of the conversation, which can be reviewed later, summarized, and turned into actual documentation if you want.

The deterministic version, when it eventually gets built, is not a translation of requirements from one form to another. It is a compression of behavior. The conversation worked. The script will do the same thing the conversation did, more cheaply and more reliably, but with the design already validated by months of actual use in the real world.
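The claim that the strange edge cases become the test suite can be made concrete with a small sketch. It assumes the conversation was saved as records holding the input and the answer that survived your corrections. The record fields and the candidate function are hypothetical, chosen only to show the shape of the idea.

```python
# Hypothetical transcript records: what was asked, and the answer that
# survived human correction. The field names are assumptions for illustration.
TRANSCRIPT = [
    {"memo": "ACME HOSTING monthly", "amount": -20.00, "approved": "Software"},
    {"memo": "REFUND ACME HOSTING", "amount": 20.00, "approved": "Software"},
    {"memo": "CITY PARKING lot 4", "amount": -6.50, "approved": "Vehicle"},
]


def candidate(memo, amount):
    """A first-draft deterministic categorizer to check against the transcript."""
    text = memo.upper()
    if "ACME HOSTING" in text:
        return "Software"
    if "PARKING" in text:
        return "Vehicle"
    return "NEEDS_REVIEW"


def replay(transcript, fn):
    """Return the records where the function disagrees with the approved answer."""
    return [r for r in transcript if fn(r["memo"], r["amount"]) != r["approved"]]


FAILURES = replay(TRANSCRIPT, candidate)
```

An empty FAILURES list means the draft script reproduces everything the conversation got right, including the refund that arrived with a reversed sign. Any record that does show up there is a requirement the script has not compressed yet.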

What This Lets You Skip

The traditional process of building a new tool for yourself has several painful steps that everyone has experienced and nobody enjoys. First, decide what to build. Second, figure out the requirements in detail. Third, design the architecture for those requirements. Fourth, implement the design. Fifth, discover that the requirements were wrong in some fundamental way. Sixth, refactor everything and pretend it was always going to look like this.

Steps two, three, and five are the expensive ones. They cost time and clarity that you do not actually have when you are starting out, because you have not used the thing yet, and you do not know what you actually need from it. You are guessing, intelligently maybe, but still guessing about your own future behavior.

Conversational prototyping collapses these middle steps. You skip the upfront specification, you skip the upfront architecture, and you skip the late discovery that the requirements were wrong, along with the painful refactor that follows it. You start with the conversation doing the work. The work gets done. The understanding accumulates as you go. The deterministic version, when you build it, is built once and built right, because by then you actually know what you are doing and you are no longer guessing.

The Quiet Trap

There is a trap in this pattern worth naming clearly, because it is easy to fall into and the falling is not obvious until you have already fallen.

The trap is that the conversational version starts to feel like the finished product. It works. It does the job. The motivation to build the deterministic version fades, because the deterministic version starts to feel like just an optimization of something that is already working well enough.

The problem is that the conversational version depends on an external service that you do not control and never will. Pricing might shift suddenly. Models might change in ways that break your workflow. The service might go down for an afternoon on the worst possible day. The thing that felt like a finished product turns out to be temporary scaffolding made of someone else's infrastructure. You discover this on the day you need it to work and it does not, and now you are scrambling.

The discipline, then, is to treat the conversational version explicitly as scaffolding from the beginning. It is doing the work and writing the spec at the same time. The deterministic version is the finished thing that you will eventually build using everything you learned from the scaffolding. The two roles are different, and pretending otherwise is the kind of mistake that bites you nine months later. It feels small at first, and it is not small at all once you trace it back to its source.
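One way to keep the scaffolding honest is to route every item through the deterministic rules first and to treat the assistant as a fallback that is allowed to fail. The sketch below assumes a plain dictionary of rules and a callable standing in for the assistant. The names handle, flaky_assistant, and review_queue are invented for illustration, and no real service is called.

```python
def handle(merchant, rules, ask_assistant, review_queue):
    """Try the deterministic rule first, then the assistant, then a human queue."""
    if merchant in rules:
        return rules[merchant], "rule"
    try:
        answer = ask_assistant(merchant)
    except Exception:
        # Broad on purpose for this sketch: any failure of the external
        # service parks the item for a human instead of stopping the run.
        review_queue.append(merchant)
        return None, "queued"
    return answer, "assistant"


def flaky_assistant(merchant):
    """Stand-in for an external service during an outage (simulated)."""
    raise ConnectionError("service unavailable")


queue = []
rules = {"ACME HOSTING": "Software"}

by_rule = handle("ACME HOSTING", rules, flaky_assistant, queue)
during_outage = handle("NEW VENDOR", rules, flaky_assistant, queue)
when_up = handle("NEW VENDOR", rules, lambda merchant: "Supplies", [])
```

The second element of each result says which path answered. Counting those labels over time is a rough measure of how much of the work still rests on someone else's infrastructure, and how much has already been carved into rules you own.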

A New Old Pattern

This is one of those situations where something genuinely new in the world is best understood by reference to something genuinely old. The Wizard of Oz pattern is decades old in its named form, centuries old in its underlying instinct. The fake users on Reddit are one instance. The hidden humans in early AI products are another instance. The accounting done by a conversation with a language model is the latest instance, more efficient than the previous ones, more flexible, but the same pattern down at the bones of the thing.

[serious]

The shape of good software, in retrospect, is almost always determined by the work it was built to do. Software designed in the abstract, ahead of use, tends to be wrong about something important. Software designed by doing the work first and then formalizing it tends to be right about most things, because it was shaped by reality before it was carved into code that resists being changed.

The conversational era of computing is, among other things, a new substrate for this old pattern of doing the work to learn the work. The wizard is cheaper than it has ever been. The pretending is more convincing than it has ever been. And the work gets done while the spec gets written, by the same act, with the same effort, in the same conversation. That part is genuinely new. The instinct underneath it is older than software itself, just finally given a tool that makes it cheap enough to use as a default rather than a luxury.